XMLReader skip current element


For example, I have this fragment of an XML file:

<AppSettings>
    <Object ClassVersion="1.0.0.0" Type="AppSettings">
      <Fields>
        <Field Name="App_ID" Type="System.Int32">
          <Value>
            <int>-1</int>
          </Value>
        </Field>
        <Field Name="AppDate Type="System.DateTime">
          <Value>
            <dateTime>2007-05-25T00:00:00</dateTime>
          </Value>
        </Field>
        <Field Name="AppFileName" Type="System.String">
          <Value>
            <string>TEST 03222007.daf</string>
          </Value>
        </Field>
        <Field Name="AppVersion" Type="System.String">
          <Value>
            <string>1.0.3.3</string>
          </Value>
        </Field>
        <Field Name="_ClassVersion" Type="System.String">
          <Value>
            <string>1.0.0.0</string>
          </Value>
        </Field>
      </Fields>
    </Object>
  </AppSettings>

As you can see, it is corrupted: the Name attribute of the AppDate field
is missing its closing quote (").
I get an exception from reader.MoveToContent() (after I have read App_ID);
all of this is inside a try..catch block.
After that I end up with something like string fieldname == "AppDate
Type=";
I can't work out how to skip the corrupted AppDate element and jump to
AppFileName.
So, in the catch block, how can I jump to the next element? (At run time
I don't know the name of the next element.)

Thanks

On Jun 5, 4:29 pm, Alex <a@douweb.org> wrote:

<snip>

> As you can see, its corrupted, because AppDate doesn't gave second ".

Right. It's an invalid XML file. I would strongly recommend that you
completely reject such files - trying to cope with broken files like
this is a real pain, and I don't know whether XmlReader (or any of the
other .NET XML types) support it.

Jon

-----------------------------------------------Reply-----------------------------------------------

> <snip>

> > As you can see, its corrupted, because AppDate doesn't gave second ".

> Right. It's an invalid XML file. I would strongly recommend that you
> completely reject such files - trying to cope with broken files like
> this is a real pain, and I don't know whether XmlReader (or any of the
> other .NET XML types) support it.

> Jon

Sure, I made the file invalid by hand, because I want to improve my
code so that it avoids or handles this problem.

This is just a fragment; the file is currently 100KB and will grow
later.
The file is essentially an XML serialization of some classes I want to
persist.
So a lot of data is stored in it, and I really don't want the user to
have to fill it all in again.

So, if there is a solution for this, I will be glad to hear it.

-----------------------------------------------Reply-----------------------------------------------

XML has strict rules, the sample markup is not well-formed and therefore
the XML parser will not parse it but throw an exception. There is no way
to simply skip markup that is not well-formed. So you will not be able
to parse that markup successfully with XmlReader. You have to fix
whatever application generates the markup to produce well-formed XML.
With .NET using XmlWriter can help.
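
As a rough illustration of the XmlWriter suggestion (an editorial sketch, not code from the thread): if the file is produced through XmlWriter, the writer emits the attribute quotes and escapes special characters itself, so a missing closing quote cannot come from the generating code. The class and method names below are hypothetical; the element and attribute names are taken from the sample file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SettingsWriter
{
    // Hypothetical helper: writes one <Field> element in the shape used
    // by the sample file and returns it as a string.
    public static string WriteField(string name, string type, string valueTag, string value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;

        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Field");
            // The writer adds the quotes and escapes the attribute values.
            writer.WriteAttributeString("Name", name);
            writer.WriteAttributeString("Type", type);
            writer.WriteStartElement("Value");
            writer.WriteElementString(valueTag, value);
            writer.WriteEndElement(); // </Value>
            writer.WriteEndElement(); // </Field>
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(
            WriteField("AppDate", "System.DateTime", "dateTime", "2007-05-25T00:00:00"));
    }
}
```

This only protects against errors introduced when the file is written; it does nothing for a file that is edited by hand afterwards.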

--

        Martin Honnen --- MVP XML
        http://JavaScript.FAQTs.com/

-----------------------------------------------Reply-----------------------------------------------

ok :(

Is it possible to read the file some other way, still more or less
automatically, and skip a problem like that when I need to?
I mean not using XmlReader, because it can't jump ahead, but using
something else.
But I definitely don't want to write every single field to the XML file
by hand (this is just a serialization of the class fields I need).

In other words: if an exception occurs, skip that field.

?

-----------------------------------------------Reply-----------------------------------------------

On Tue, 05 Jun 2007 09:06:51 -0700, Alex <a@mail.ru> wrote:
> is it possible to read in some another way, but a bit automatically,
> and skip problem like that as i need ?
> i mean not to use XmlReader, because it can't jump, but use smth else.
> But for sure i dont want to write to xmlfile all-all fields manually
> (this is just serialization of classes' fields i need).

> but, if exception appears - skip field

No.  The general-purpose XML classes have no practical way to make  
intelligent decisions about where to start looking again for valid data.  
The only way to do what you want, even in some limited way, is to do  
everything yourself.

You as a person can look at the file visually and tell where valid data  
again starts, but that's because you have a LOT of "meta-information"  
about the XML and can recognize things that would never appear inside  
quoted text, but which are definitely part of the XML structure.  If you  
want your code to handle that, you will need to write it yourself, taking  
advantage of this knowledge.  If you do this, you will likely want to  
implement your entire XML reading code from scratch, so that when you run  
across something that doesn't make sense you can recover immediately based  
on where you've already read.
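
To make the "do it yourself, using what you know about the file" idea concrete, here is a minimal editorial sketch (not from the thread, and not a general recovery mechanism). It leans on two assumptions about this particular format that must be checked before relying on it: <Field> elements are never nested, and the text "</Field>" never appears inside a value. Under those assumptions each <Field>...</Field> block can be cut out of the raw text and parsed on its own, so one broken block does not take the others with it. The names FieldSalvager and Salvage are hypothetical, and the code assumes LINQ to XML (System.Xml.Linq) is available.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class FieldSalvager
{
    // Assumption: <Field> elements are not nested and "</Field>" never
    // occurs inside a value. "\b" keeps "<Fields>" from matching.
    private static readonly Regex FieldBlock =
        new Regex(@"<Field\b.*?</Field>", RegexOptions.Singleline);

    // Returns Name -> value text for every block that parses on its own;
    // blocks that do not parse are added to 'rejected' unchanged.
    public static Dictionary<string, string> Salvage(string text, List<string> rejected)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (Match match in FieldBlock.Matches(text))
        {
            try
            {
                XElement field = XElement.Parse(match.Value);
                string name = (string)field.Attribute("Name");
                XElement value = field.Element("Value");
                if (name == null || value == null)
                {
                    rejected.Add(match.Value);
                    continue;
                }
                result[name] = value.Value.Trim();
            }
            catch (XmlException)
            {
                // This block is not well-formed by itself: skip it.
                rejected.Add(match.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string text =
            "<Fields>" +
            "<Field Name=\"App_ID\" Type=\"System.Int32\"><Value><int>-1</int></Value></Field>" +
            "<Field Name=\"AppDate Type=\"System.DateTime\"><Value><dateTime>2007-05-25T00:00:00</dateTime></Value></Field>" +
            "<Field Name=\"AppFileName\" Type=\"System.String\"><Value><string>TEST 03222007.daf</string></Value></Field>" +
            "</Fields>";

        List<string> rejected = new List<string>();
        Dictionary<string, string> fields = Salvage(text, rejected);
        Console.WriteLine("Recovered {0} field(s), rejected {1}.", fields.Count, rejected.Count);
    }
}
```

With the sample from the original post, App_ID and AppFileName are recovered and the AppDate block is rejected. Other kinds of damage (for example a missing </Field>) make one block swallow the next, in which case both are rejected, which is exactly the kind of limit described above.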

Personally, I would not bother.  As has been pointed out, the XML is  
simply invalid.  It's not going to be invalid unless some user hand-edits  
the file and starts mucking it up, and once you assume users may do that,  
it is impossible to ensure that you can in any sensible way recover from  
their doings.  You should definitely make sure that bad data doesn't bring  
your application crashing down, but it's not reasonable for a user to  
expect you to come up with some graceful way to reconstruct the invalid  
data in the general case, and so you should probably not waste a lot of  
time implementing code that does so.
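
A minimal editorial sketch of "reject the file without crashing" (not from the thread; SettingsLoader and TryLoad are hypothetical names): load the whole document up front, catch XmlException, and hand the caller a message instead of letting the exception escape. What the application does next (fall back to defaults, restore a backup copy, ask the user) is left open.

```csharp
using System;
using System.Xml;

public static class SettingsLoader
{
    // Returns the parsed document, or null with a description of the
    // problem in 'error' when the text is not well-formed XML.
    public static XmlDocument TryLoad(string xmlText, out string error)
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            doc.LoadXml(xmlText);
            error = null;
            return doc;
        }
        catch (XmlException ex)
        {
            // Reject the whole file; no attempt is made to repair it.
            error = "Settings file rejected: " + ex.Message;
            return null;
        }
    }

    public static void Main()
    {
        string error;
        XmlDocument doc = TryLoad("<Field Name=\"AppDate Type=\"System.DateTime\"/>", out error);
        Console.WriteLine(doc == null ? error : "Loaded.");
    }
}
```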

Pete

-----------------------------------------------Reply-----------------------------------------------

Alex <a@mail.ru> wrote:
> > Right. It's an invalid XML file. I would strongly recommend that you
> > completely reject such files - trying to cope with broken files like
> > this is a real pain, and I don't know whether XmlReader (or any of the
> > other .NET XML types) support it.

> Sure, i made file to be invalid manually, because i want to add some
> improvements to my code, to avoid or solve this problem.

Is there any real reason why you need to handle an invalid XML file?
Most XML-based applications don't, as far as I'm aware. (Obviously XML
editors have to, but other than that...)

> This is just fragment, now file size is 100KB and will be bigger
> later.
> Also, this file is like XmlSerialization of some classes i want to be
> serialized.
> So, the data which stored are big, and i really don't want user to
> fill out all again.

> So, if there is some solution about this, i will be glad to here.

Why would the user have to fill anything out again? Why are you
expecting invalid XML?

--
Jon Skeet - <s@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

-----------------------------------------------Reply-----------------------------------------------

On Tue, 05 Jun 2007 10:47:27 -0700, Jon Skeet [C# MVP] <s@pobox.com>  
wrote:

> Is there any real reason why you need to handle an invalid XML file?
> Most XML-based applications don't, as far as I'm aware. (Obviously XML
> editors have to, but other than that...)

Well, and in fact I'm not sure that XML editors have to either.  As an  
imprecise but similar example, consider Visual Studio's code editor.  If  
you miss some sort of closing quote, comment closure, closing bracket,  
etc. the editor makes no attempt to recover from that.  It just shows you  
that there's a problem, treating the file as "valid" all the way up to the  
point where it knows for sure it's not valid (which is often the end of  
the file).

I can imagine someone writing an XML editor that goes to a lot of effort  
to try to detect and correct invalid XML, just as the OP wants to do in  
his program.  But it would surprise me if this is the norm, even when  
looking only at XML editors.

Pete

-----------------------------------------------Reply-----------------------------------------------

Peter Duniho <NpOeStPe@nnowslpianmk.com> wrote:
> > Is there any real reason why you need to handle an invalid XML file?
> > Most XML-based applications don't, as far as I'm aware. (Obviously XML
> > editors have to, but other than that...)

> Well, and in fact I'm not sure that XML editors have to either.  As an  
> imprecise but similar example, consider Visual Studio's code editor.  If  
> you miss some sort of closing quote, comment closure, closing bracket,  
> etc. the editor makes no attempt to recover from that.  It just shows you  
> that there's a problem, treating the file as "valid" all the way up to the  
> point where it knows for sure it's not valid (which is often the end of  
> the file).

It depends on quite how broken you make it.

If you miss off a semi-colon or have a random extra character like "+"
between statements, it's still syntactically invalid, but it recovers
quickly. An extra closing brace certainly confuses it though, yes.

> I can imagine someone writing an XML editor that goes to a lot of effort  
> to try to detect and correct invalid XML, just as the OP wants to do in  
> his program.  But it would surprise me if this is the norm, even when  
> looking only at XML editors.

Maybe it's just the ones I've used - and that's only from memory,
admittedly...

--
Jon Skeet - <s@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

-----------------------------------------------Reply-----------------------------------------------

On Tue, 05 Jun 2007 12:47:18 -0700, Jon Skeet [C# MVP] <s@pobox.com>  
wrote:

> It depends on quite how broken you make it.

> If you miss off a semi-colon or have a random extra character like "+"
> between statements, it's still syntactically invalid, but it recovers
> quickly. An extra closing brace certainly confuses it though, yes.

I suppose "recovers" is in the eye of the beholder.  What I see when one  
leaves off a semi-colon is that the end of the statement where the  
semi-colon was expected is flagged.  However, the only reason it can do  
that is that it is apparent upon seeing the first thing that doesn't make  
sense in that statement (ie, the next statement) where the error is.

But I don't really see that the editor has "recovered".  It is simply  
pointing out the first place it has detected a problem.  Just as the  
compiler won't compile a file even though it could usually correctly infer  
the correct location of the semicolon, it's not really like the VS editor  
has judged the remainder of the file correct and accurate.  In fact, it  
gives up on a variety of automatic stuff once it's stumbled (for example,  
I've lost count of the number of times that I don't get Intellisense  
feedback because of a localized compiler-type error in my source code).

Compilers, code editors, and XML editors alike can all make inferences  
about what the input data *should* look like, and try to produce correct  
behavior based on those inferences.  But my experience (granted, limited  
in the case of XML editors, but not so limited in other areas) is that if  
the input data does not comply exactly with what's expected, the user is  
simply told "this data is bad...I'm not going any further until you fix  
it".

Pete

-----------------------------------------------Reply-----------------------------------------------

It recovers to the extent that it's able to find errors later on, and
you can still use Intellisense etc.

For example, take this code:

using System;

public class Test
{
    static void Main()
    {
        int x = 5
        int y = 10;

        Console.WriteLine("Hello");
    }

}

If you type another "Console." underneath the current call to
Console.WriteLine, VS (2005 at least) offers Intellisense.

It's hard for me to judge exactly how well VS does as opposed to
ReSharper, but if you change Console.WriteLine to Console.Foo, I
certainly get some feedback that Foo isn't a valid member of Console.

> Just as the  
> compiler won't compile a file even though it could usually correctly infer  
> the correct location of the semicolon, it's not really like the VS editor  
> has judged the remainder of the file correct and accurate.  In fact, it  
> gives up on a variety of automatic stuff once it's stumbled (for example,  
> I've lost count of the number of times that I don't get Intellisense  
> feedback because of a localized compiler-type error in my source code).

You should try Eclipse some time - it will compile (in some cases, at
least) syntactically invalid code, generating code which throws an
exception when it's got to somewhere that the compilation broke. Not
terribly handy, but quite cute.

> Compilers, code editors, and XML editors alike can all make inferences  
> about what the input data *should* look like, and try to produce correct  
> behavior based on those inferences.  But my experience (granted, limited  
> in the case of XML editors, but not so limited in other areas) is that if  
> the input data does not comply exactly with what's expected, the user is  
> simply told "this data is bad...I'm not going any further until you fix  
> it".

Certainly things are more limited after an error, but there's often
still *some* functionality available. If I find the time I might see
what a few XML editors do past an error - whether they still
automatically close tags, find further errors etc. Certainly the VS
2005 XML editor was able to automatically close the "blech" tag in the
below XML, despite the previous error:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <baz text="Hello otherText="There"/>

    <blech></blech>
  </bar>
</foo>

Also if you change </blech> to </blech2> it notices that as a second
error.

--
Jon Skeet - <s@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

-----------------------------------------------Reply-----------------------------------------------

On Tue, 05 Jun 2007 13:40:01 -0700, Jon Skeet [C# MVP] <s@pobox.com>  
wrote:

> [...]
> You should try Eclipse some time - it will compile (in some cases, at
> least) syntactically invalid code, generating code which throws an
> exception when it's got to somewhere that the compilation broke. Not
> terribly handy, but quite cute.

Well, sure.  I can appreciate "cute".  :)  But as you say, not terribly  
handy.  Likewise, just how handy would it be to just skip over an invalid  
section of XML, when you have no idea what the overall effect of doing so  
would be?  Just because the remaining XML can be parsed, that doesn't mean  
that it can be *used* without the part that was erroneous.

> [...] Certainly the VS
> 2005 XML editor was able to automatically close the "blech" tag in the
> below XML, despite the previous error:

I certainly agree that it *can* be done.  I just am not convinced it makes  
sense to bother writing the code to do so.  It does seem to me that in an  
editor, where the user is actively modifying the data, it makes more sense  
to put the effort in, but even there I wouldn't necessarily insist on it  
(even in VS there are limits to what it can recover from, and frankly it  
only handles the simplest situations).  I expect it's something you see in  
editors that are intended to be feature-laden, considered "heavy-duty"  
(that's certainly how I'd describe VS).

In a situation where the data is static though, I don't see the use in  
recovering.  You never know when the data that was in error was critical  
to the use of the larger XML document.  Just because you can successfully  
parse the rest of the document doesn't mean you should, just as just  
because a compiler could make an assumption about where to insert a  
missing semi-colon doesn't mean it should.

Pete

-----------------------------------------------Reply-----------------------------------------------

Peter Duniho <NpOeStPe@nnowslpianmk.com> wrote:
> > [...]
> > You should try Eclipse some time - it will compile (in some cases, at
> > least) syntactically invalid code, generating code which throws an
> > exception when it's got to somewhere that the compilation broke. Not
> > terribly handy, but quite cute.

> Well, sure.  I can appreciate "cute".  :)  But as you say, not terribly  
> handy.  Likewise, just how handy would it be to just skip over an invalid  
> section of XML, when you have no idea what the overall effect of doing so  
> would be?  Just because the remaining XML can be parsed, that doesn't mean  
> that it can be *used* without the part that was erroneous.

On the other hand, if I open an invalid XML file it's nice to know
whether there's just one error or whether the whole thing is pooched.

> > [...] Certainly the VS
> > 2005 XML editor was able to automatically close the "blech" tag in the
> > below XML, despite the previous error:

> I certainly agree that it *can* be done.  I just am not convinced it makes  
> sense to bother writing the code to do so.  It does seem to me that in an  
> editor, where the user is actively modifying the data, it makes more sense  
> to put the effort in, but even there I wouldn't necessarily insist on it  
> (even in VS there are limits to what it can recover from, and frankly it  
> only handles the simplest situations).  I expect it's something you see in  
> editors that are intended to be feature-laden, considered "heavy-duty"  
> (that's certainly how I'd describe VS).

Agreed in the last bit - and I'm *certainly* not suggesting that the OP
should try to recover.

> In a situation where the data is static though, I don't see the use in  
> recovering.  You never know when the data that was in error was critical  
> to the use of the larger XML document.  Just because you can successfully  
> parse the rest of the document doesn't mean you should, just as just  
> because a compiler could make an assumption about where to insert a  
> missing semi-colon doesn't mean it should.

Oh absolutely. I was only talking about editors, where it can be handy
to be able to show more than the first error.

Even with static document reading, it *may* be useful to bomb out with
an error which has a good stab at working out where all the error parts
are, rather than just the first one. That's not the same as really
trying to recover though.

--
Jon Skeet - <s@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

-----------------------------------------------Reply-----------------------------------------------

On Tue, 05 Jun 2007 15:00:04 -0700, Jon Skeet [C# MVP] <s@pobox.com>  
wrote:

> On the other hand, if I open an invalid XML file it's nice to know
> whether there's just one error or whether the whole thing is pooched.

Sure, I agree.  If you're using an editor, that would be a nice feature to  
have.  But that still doesn't mean it would be a ubiquitous feature in all  
XML editors (though I can see how it might appear in advanced editors).

> [...]
> Even with static document reading, it *may* be useful to bomb out with
> an error which has a good stab at working out where all the error parts
> are, rather than just the first one. That's not the same as really
> trying to recover though.

Nope.  :)

If I wanted to provide feedback as to a place to look for the error, I  
would tell the user the last place in the file where I had valid data.  
That's not really the same as trying to do anything fancy with figuring  
out the erroneous part though.  All it requires is keeping track of how far  
into the file you got before you failed to generate new valid data.
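
One way to keep track of how far the reader got, as an editorial sketch rather than code from the thread: read node by node, remember the line of the last node that was read successfully, and stop at the first XmlException. The names XmlProgress and LastGoodLine are hypothetical. The sketch assumes the reader returned by XmlReader.Create exposes IXmlLineInfo, and falls back to 0 if it does not.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlProgress
{
    // Reads until the end of the input or the first well-formedness error.
    // Returns the line of the last node that was read successfully (0 if
    // none); 'failure' is null when the whole input was read.
    public static int LastGoodLine(string xmlText, out XmlException failure)
    {
        failure = null;
        int lastGoodLine = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xmlText)))
        {
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;
            try
            {
                while (reader.Read())
                {
                    if (lineInfo != null && lineInfo.HasLineInfo())
                    {
                        lastGoodLine = lineInfo.LineNumber;
                    }
                }
            }
            catch (XmlException ex)
            {
                // Stop here; nothing after this point is parsed.
                failure = ex;
            }
        }
        return lastGoodLine;
    }

    public static void Main()
    {
        string xml = "<a>\n  <b>1</b>\n  <c x=\"1 y=\"2\"/>\n</a>";
        XmlException failure;
        int line = LastGoodLine(xml, out failure);
        if (failure != null)
        {
            Console.WriteLine("Valid up to line {0}; error reported at line {1}, position {2}.",
                line, failure.LineNumber, failure.LinePosition);
        }
    }
}
```

XmlException already carries LineNumber and LinePosition for the point of failure, so the extra bookkeeping only adds where valid data was last seen.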

It's the parsing bad data that I think is normally going to be outside the  
scope of typical software.  Sorry if I seem to have taken this thread off  
on a tangent.  I just got set off by the statement that an XML editor  
*has* to handle errors.  An XML editor *could* in fact just display the  
text beyond the error and tell the user "I'm not going to help you with  
this until you fix it".  :)

Pete
