• 0

Tricks and Tips for storing html in xml


Question

Hi everyone. I'm using C# to create XML data files, and I never had a problem until I added a new type of data: an HTML description block. Once I did, all hell broke loose.

Is there a guide somewhere, or a set of tricks, for making sure your HTML is XML-safe, so you can do something like <MyElement Description="my html description here" />? I keep running into roadblocks.

Thanks :)

9 answers to this question

  • 0

You could wrap the HTML string in a CDATA section, i.e. <![CDATA[ my_html_description ]]>. However, if the HTML itself contains a CDATA block (or just the sequence ]]>), the section will end early and the XML will be messed up.
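To spell out the caveat: a CDATA section ends at the first ]]>, so that sequence is what needs handling. A common workaround is to split it across two adjacent CDATA sections. Note also that CDATA is only allowed in element content, not in attribute values, so this route means moving the description out of the Description attribute. Below is a minimal Python sketch of the idea (the helper name wrap_cdata is made up for illustration; the same string replacement would work in C#):

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    # ']]>' becomes ']]' + end of section + start of a new section + '>'
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

html = '<p>Tom & Jerry <![CDATA[ nested ]]> done</p>'
xml_doc = "<MyElement><Description>" + wrap_cdata(html) + "</Description></MyElement>"

# The parser joins the adjacent CDATA sections back into the original string.
root = ET.fromstring(xml_doc)
assert root.find("Description").text == html
```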

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to &quot;

< to &lt;

> to &gt;

& to &amp;

Obviously you'll need to reverse this process when you read the HTML back out (an XML parser does this for you when it reads the attribute value).
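As a rough illustration of the escaping approach, here is a Python sketch showing both a manual escape/unescape pair and the library route, where the XML writer and parser do the work. The sample HTML string is invented. One detail worth noting when doing the replacement by hand: the ampersand has to be replaced first, otherwise the & in the other escape codes gets escaped again. C#'s XML writers (XmlWriter, XDocument) also escape attribute values automatically.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

html = '<p class="intro">Fish & chips</p>'

# Manual route: escape() handles &, <, > itself (ampersand first, so the
# other escape codes are not double-escaped); the quote is passed as an extra.
escaped = escape(html, {'"': "&quot;"})
restored = unescape(escaped, {"&quot;": '"'})

# Library route: the serializer escapes the attribute value on write and
# the parser unescapes it on read, so no hand-written replacement is needed.
elem = ET.Element("MyElement", {"Description": html})
xml_text = ET.tostring(elem, encoding="unicode")
round_tripped = ET.fromstring(xml_text).get("Description")

print(escaped)
print(xml_text)
```

Both routes should give back the original string; the library route is usually the safer of the two because it cannot forget a character.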

  • 0

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

<MyElement>
  <description>
    <p>Here is a <strong>formatted</strong> description</p>
  </description>
</MyElement>

If you enforce XHTML syntax on your descriptions, you can ensure that the HTML description is structurally sound. There are downsides to that: if the HTML is not well-formed, the whole XML stream will hit errors while being parsed. You need to weigh the risks of each design.
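One way to manage that risk is to check each description for well-formedness before it goes into the document, and reject (or fall back to escaping) anything that fails. A hedged Python sketch, where embed_description is a hypothetical helper name and the element names follow the example above:

```python
import xml.etree.ElementTree as ET

def embed_description(html_fragment: str) -> str:
    """Return <MyElement> XML with the fragment as child markup, or raise ValueError."""
    try:
        # Wrapping in <description> lets a fragment with several top-level nodes parse.
        description = ET.fromstring("<description>" + html_fragment + "</description>")
    except ET.ParseError as err:
        raise ValueError(f"description is not well-formed XHTML: {err}") from err
    root = ET.Element("MyElement")
    root.append(description)
    return ET.tostring(root, encoding="unicode")

print(embed_description("<p>Here is a <strong>formatted</strong> description</p>"))

try:
    embed_description("<p>Unclosed <br> tag</p>")  # <br> is never closed
except ValueError as err:
    print("rejected:", err)
```

Keep in mind that HTML-only entities such as &nbsp; are also rejected by a plain XML parser, so user-entered HTML will often fail this check.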

  • 0

Thanks for all your help so far. I don't know why Base64 never occurred to me. For now I've just added two static functions to my program that convert the HTML, although, come to think of it, Base64-encoding it might be more efficient for large descriptions.

Thanks to all of you; you've all given me something to think about, and either way you all provided answers. I would do it the way you said, Antaris, but since the description is entered manually by the end user, I have no guarantee that it's well-formed. I tested it that way and ran into hundreds of parsing errors.

  • 0

The best solution, of course, is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will come out 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)
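To give a feel for the tag soup idea, here is a deliberately simplified Python sketch built on the standard library's lenient HTMLParser. It is not a real tag soup parser: it balances tags, quotes attributes, and re-escapes text, but it does not apply HTML's implied-end-tag rules (for example a second <p> closing the first), and it does not filter invalid attribute names or control characters. The names SoupToXhtml and tidy are made up for this example; in practice a dedicated HTML parsing library would be the better choice.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class SoupToXhtml(HTMLParser):
    """Re-emit loosely written HTML as balanced, XML-safe markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Quote every attribute; valueless ones become name="name".
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray or void closing tag: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while True:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

def tidy(html: str) -> str:
    parser = SoupToXhtml()
    parser.feed(html)
    parser.close()
    while parser.open_tags:  # close whatever the author left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)

print(tidy("<p>Fish & chips<br><b>bold <i>both</b> text"))
```

The output can then be embedded as child elements, since it should parse as XML once wrapped in a single parent element.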

  • 0
  On 10/03/2010 at 17:31, The_Decryptor said:

The best solution, of course, is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will come out 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

This. ;)

  • 0
  On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go through the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in an XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
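For reference, the encoding route might look something like the following Python sketch. The helper names encode_description and decode_description are hypothetical, and UTF-8 is an assumed choice of text encoding. Base64 output uses only letters, digits, +, / and =, none of which need escaping in an XML attribute.

```python
import base64
import xml.etree.ElementTree as ET

def encode_description(html: str) -> str:
    """HTML text -> Base64 string that is safe inside an XML attribute."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")

def decode_description(encoded: str) -> str:
    """Inverse of encode_description."""
    return base64.b64decode(encoded).decode("utf-8")

# Malformed HTML is fine here: the XML parser never sees the markup.
html = '<p>Broken <b>markup & "quotes"'
elem = ET.Element("MyElement", {"Description": encode_description(html)})
xml_text = ET.tostring(elem, encoding="unicode")

restored = decode_description(ET.fromstring(xml_text).get("Description"))
assert restored == html
print(xml_text)
```

The trade-off is size and readability: Base64 emits 4 characters for every 3 bytes of input, so the stored description is roughly a third larger than the raw HTML, and it can no longer be read or searched in the XML file directly.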