Entity References and DOM


Question

I have a set of 12,000 or so XML files that I need to parse. As it turns out, they contain entity references for characters like ‐ and © (for example, &copy;). The DOM treats these as separate XML elements, which is really annoying, because they are not.

Is there any way to make the C# DOM parser ignore these entity references? They are making the parsing a pain in the a$$.

As an example, I have this:

<article>

...

...

<p>this is a paragraph and it is &copy; to my company</p>

..

...

</article>

What happens is that it thinks that &copy; is a new element. The content of the <p> element is "this is a paragraph and it is", and the content of &copy; is "to my company". What's worse is that, for some unexplained reason, </p> ends the non-existent <copy> tag...

Any way around this?

.:BoeManE:.


4 answers to this question

Take a look at the ResolveEntity method of the XmlValidatingReader class. There is an example in MSDN that may help.
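For anyone finding this later, here is a rough sketch of what using ResolveEntity might look like. It is not the MSDN example itself. The class name, the ReadText helper and the sample string are made up for illustration, and it assumes the entity is declared in a DTD (here an internal subset) so the reader has something to resolve.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ResolveEntityDemo
{
    // Hypothetical sample: assumes the files declare the entity in a DTD.
    public const string Sample =
        "<!DOCTYPE article [<!ENTITY copy \"&#169;\">]>" +
        "<article><p>this is a paragraph and it is &copy; to my company</p></article>";

    // Collects all text content, resolving general entity references by hand.
    public static string ReadText(string xml)
    {
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader reader = new XmlValidatingReader(textReader);
        reader.ValidationType = ValidationType.None;

        // Expand only character entities, so general entities such as &copy;
        // are reported as EntityReference nodes.
        reader.EntityHandling = EntityHandling.ExpandCharEntities;

        StringBuilder text = new StringBuilder();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.EntityReference)
            {
                // After this call, the following Read() calls return the
                // entity's replacement content.
                reader.ResolveEntity();
            }
            else if (reader.NodeType == XmlNodeType.Text)
            {
                text.Append(reader.Value);
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ReadText(Sample));
    }
}
```

Note that XmlValidatingReader is marked obsolete in later .NET versions, where XmlReader.Create with XmlReaderSettings is the suggested replacement, so expect a compiler warning there.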

Would you have an example of how the XmlValidatingReader ties in with the XmlDocument? Do you have to use an XmlValidatingReader instead of an XmlTextReader?

.:BoeManE:.

Just got it :)

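using System.Xml;
// Wrap the XmlTextReader in an XmlValidatingReader, which can resolve entity references.
XmlTextReader reader = new XmlTextReader(filename);
XmlValidatingReader varReader = new XmlValidatingReader(reader);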

Thanks
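To answer the earlier question about how the XmlValidatingReader ties in with the XmlDocument, something along these lines should work. This is only a sketch: the class name, the LoadParagraphText helper and the inline sample are hypothetical, and it assumes the entity is declared in a DTD that the reader can see. The validating reader is simply passed to XmlDocument.Load in place of the text reader.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityDomDemo
{
    // Hypothetical sample: assumes the files declare the entity in a DTD.
    public const string Sample =
        "<!DOCTYPE article [<!ENTITY copy \"&#169;\">]>" +
        "<article><p>this is a paragraph and it is &copy; to my company</p></article>";

    // Loads the XML into a DOM through a validating reader and returns
    // the text of the <p> element.
    public static string LoadParagraphText(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader varReader = new XmlValidatingReader(reader);

        // No DTD validation here; the reader is only used to expand entities.
        varReader.ValidationType = ValidationType.None;
        varReader.EntityHandling = EntityHandling.ExpandEntities;

        XmlDocument doc = new XmlDocument();
        doc.Load(varReader);

        return doc.SelectSingleNode("/article/p").InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(LoadParagraphText(Sample));
    }
}
```

For files on disk, new XmlTextReader(filename) can be used instead of the StringReader version, as in the two lines above.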
