boemane Posted August 24, 2004

I have a set of 12,000 or so XML files that I need to parse. As it turns out, they contain entity references, such as the one for the hyphen character (‐) and &copy;. The DOM treats these as separate XML elements, which is really annoying, because they are not.

Is there any way to make the C# DOM parser ignore these entity references? They are making the parsing a pain in the a$$.

As an example, I have this:

<article>
...
...
<p>this is a paragraph and it is &copy; to my company</p>
..
...
</article>

What happens is that it thinks &copy; is a new element. The content of the <p> element is "this is a paragraph and it is", and the content of &copy; is "to my company". What's worse is that, for some unexplained reason, </p> ends the non-existent <copy> tag...

Any way around this?

.:BoeManE:.
azcodemonkey Posted August 24, 2004

Take a look at the ResolveEntity method of the XmlValidatingReader class. There is an example in MSDN that may help.
boemane Posted August 25, 2004 (Author)

Would you have an example of how the XmlValidatingReader ties in with the XmlDocument? Do you have to use an XmlValidatingReader instead of an XmlTextReader?

.:BoeManE:.
boemane Posted August 25, 2004 (Author)

Just got it :)

    XmlTextReader reader = new XmlTextReader(filename);
    XmlValidatingReader varReader = new XmlValidatingReader(reader);

Thanks
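For readers on a newer framework version, where XmlValidatingReader is marked obsolete, the same idea can be sketched with XmlReader.Create. This is a minimal sketch, not the poster's code: the class name EntityDemo, the method name LoadExpanded, and the inline sample document are made up for illustration, and it assumes the files declare their entities in a DTD (here an internal DOCTYPE subset stands in for whatever DTD the real files use). A reader created this way expands entity references, so the DOM receives plain text instead of separate entity reference nodes.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityDemo
{
    // Hypothetical helper: loads XML text into an XmlDocument with
    // general entities such as &copy; expanded into ordinary text.
    public static XmlDocument LoadExpanded(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // Allow the DOCTYPE that declares the entities to be parsed.
            DtdProcessing = DtdProcessing.Parse
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }

    public static void Main()
    {
        // Sample input (an assumption): the entity is declared inline
        // so that the example is self-contained.
        string xml =
            "<!DOCTYPE article [<!ENTITY copy \"&#169;\">]>" +
            "<article><p>this is a paragraph and it is &copy; to my company</p></article>";

        XmlDocument doc = LoadExpanded(xml);
        XmlNode p = doc.SelectSingleNode("/article/p");

        // The paragraph text now contains the copyright sign in place
        // of the entity reference.
        Console.WriteLine(p.InnerText);
    }
}
```

If an entity is referenced but never declared, the reader throws an XmlException instead of expanding it, so this approach only helps when the DTD is available to the parser.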
azcodemonkey Posted August 25, 2004

There is a ResolveEntity method on the XmlTextReader, too. Damn, I'm rusty with XML! :)