Tricks and Tips for storing html in xml

sathenzar · 1,022 · March 9, 2010

Hi everyone. I'm using C# to create XML files of data, and I never had a problem until I added a new type of data: an HTML description block. Once I did this, all hell broke loose.

Is there a guide somewhere, or a set of tricks, for making sure your HTML is XML-safe so you can do something like <MyElement Description="my html description here" />? I keep running into roadblocks.

Thanks :)

StudioFortress · March 9, 2010

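You could wrap the HTML string in a CDATA section, i.e. <![CDATA[ my_html_description ]]>. Note that CDATA only works in element content, not inside an attribute value. Also, if the HTML itself contains a CDATA block (or just the sequence ]]>), the section will end early and the XML will be messed up.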
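
The ]]> problem can be worked around by splitting the section wherever that sequence appears. Here is a minimal C# sketch, assuming the description is stored in a child element rather than an attribute. The class and method names (CDataExample, WrapInCData and so on) are made up for illustration, and the element is built by hand only to make the splitting visible:

```csharp
using System.Xml.Linq;

public static class CDataExample
{
    // Wraps html in a CDATA section. Any "]]>" inside the text is split across
    // two adjacent sections so it cannot terminate the section early.
    public static string WrapInCData(string html)
    {
        string safe = html.Replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    // Builds <description>...</description> around the CDATA text (hypothetical element name).
    public static string BuildElement(string html)
    {
        return "<description>" + WrapInCData(html) + "</description>";
    }

    // Reads the text back; Value concatenates the adjacent CDATA sections.
    public static string ReadElement(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

Reading the element back with ReadElement should return the original HTML, including any ]]> it contained.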

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to &quot;

< to &lt;

> to &gt;

& to &amp;

Obviously you'll need to reverse this process when you read the HTML back out, unless you read it through an XML parser, which does the unescaping for you.
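
If you do the replacement by hand, replace & first; otherwise the ampersands introduced by the other replacements get escaped a second time. In C# you may not need to do it by hand at all: a sketch, assuming the file is built with System.Xml.Linq rather than by string concatenation, in which case the attribute value is escaped on write and unescaped on read. HtmlAttributeExample, Write and Read are illustrative names only:

```csharp
using System.Xml.Linq;

public static class HtmlAttributeExample
{
    // Stores raw HTML in the Description attribute; XAttribute escapes
    // characters such as <, & and " when the element is serialized.
    public static string Write(string html)
    {
        var element = new XElement("MyElement", new XAttribute("Description", html));
        return element.ToString();
    }

    // Parses the XML and returns the attribute value with escapes already reversed.
    public static string Read(string xml)
    {
        return (string)XElement.Parse(xml).Attribute("Description");
    }
}
```

With this approach Read(Write(html)) should hand back the same HTML string that went in, without any manual replace calls.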

boogerjones · 3,529 · March 10, 2010

Don't complicate things. Just Base64-encode it.
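
A minimal sketch of the Base64 route in C#, assuming the HTML is encoded as UTF-8 before conversion (the encoding choice and the Base64Example name are assumptions, not something stated in the thread):

```csharp
using System;
using System.Text;

public static class Base64Example
{
    // Encodes HTML as Base64; the output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    // none of which need escaping in an XML attribute.
    public static string Encode(string html)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(html));
    }

    // Reverses Encode and returns the original HTML string.
    public static string Decode(string encoded)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
    }
}
```

The trade-off is size and readability: the encoded text is roughly a third larger than the input and cannot be read by eye in the file.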

Antaris · 5,526 · March 10, 2010

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

<MyElement>
  <description>
    <p>Here is a <strong>formatted</strong> description</p>
  </description>
</MyElement>

If you enforce XHTML syntax on your descriptions, you can guarantee that the HTML description is well-formed. There are downsides to that: if the HTML is not well-formed, the whole XML stream will hit errors while being parsed. You need to weigh the risks of each design.
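
One way to contain that risk is to check each description on its own before it goes into the file. A rough C# sketch, assuming System.Xml.Linq is available; XhtmlCheckExample and TryBuildDescription are hypothetical names, and the check only tests XML well-formedness, not whether the markup is valid XHTML:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheckExample
{
    // Tries to parse the HTML as the content of a <description> element.
    // Returns false instead of throwing when the markup is not well-formed XML.
    public static bool TryBuildDescription(string html, out XElement description)
    {
        try
        {
            description = XElement.Parse("<description>" + html + "</description>");
            return true;
        }
        catch (XmlException)
        {
            description = null;
            return false;
        }
    }
}
```

Descriptions that fail the check could then fall back to one of the escaping approaches instead of breaking the whole document. Typical HTML such as an unclosed <br> or an &nbsp; entity would be rejected here.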

sathenzar · 1,022 · March 10, 2010

Thanks for all your help so far. I don't know why Base64 never occurred to me. For now I just added two static functions to my program that do the character replacement, although, come to think of it, Base64 encoding might be more efficient for large descriptions.

Thanks to all of you; you've all given me something to think about, and either way you all provided answers. I would do it the way you said, Antaris, but since the description is entered manually by the end user, I have no guarantee that it's well-formed. I tested it that way and ran into hundreds of parsing errors.

Rob · 7,644 · March 10, 2010

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

Eric · 15,340 · March 10, 2010

You could use Uri.EscapeDataString() and Uri.UnescapeDataString() to make sure the value you're using doesn't have any funky characters in it.
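
A short sketch of that idea, wrapping the two calls named above (the UriEscapeExample wrapper is just an illustrative name):

```csharp
using System;

public static class UriEscapeExample
{
    // Percent-encodes the HTML so characters like <, > , " and & become %XX sequences.
    public static string Escape(string html)
    {
        return Uri.EscapeDataString(html);
    }

    // Reverses Escape and returns the original HTML.
    public static string Unescape(string escaped)
    {
        return Uri.UnescapeDataString(escaped);
    }
}
```

For example, Escape("<p>") yields "%3Cp%3E". Like Base64, the stored value is no longer readable HTML, though plain letters and digits stay as they are.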

The_Decryptor · 20,536 · March 10, 2010

The best solution of course is to use a tag soup parser to construct a DOM, then parse it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

AnthonySterling · March 11, 2010

On 10/03/2010 at 17:31, The_Decryptor said:

The best solution of course is to use a tag soup parser to construct a DOM, then parse it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

This. ;)

boogerjones · 3,529 · March 11, 2010

On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go through the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in the XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
