Tricks and Tips for storing html in xml


Question

Hi everyone. I'm using C# to create XML files of data, and I never had a problem until I added a new type of data: an HTML description block. Once I did this, all hell broke loose.

Is there a guide somewhere with a set of tricks for making sure your HTML is XML-safe, so you can do something like <MyElement Description="my html description here" />? Because I keep running into roadblocks.

Thanks :)

9 answers to this question

  • 0

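You could surround the HTML string in a CDATA section, i.e. <![CDATA[ my_html_description ]]>. However, if the HTML itself contains a CDATA block (or just the sequence ]]>), the section will end early and the XML will be messed up. Also note that CDATA only works in element content, not inside an attribute value.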
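A minimal sketch of the CDATA approach, in Python rather than the asker's C#. It assumes the description is stored as a child element called Description (a name chosen here for illustration), and it works around the embedded-CDATA problem by splitting every "]]>" across two adjacent sections. The helper name to_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting any embedded "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# HTML that itself contains a CDATA block, the problem case described above
html_description = "<script>//<![CDATA[\nalert('hi');\n//]]></script><p>Done</p>"

xml_text = (
    "<MyElement><Description>"
    + to_cdata(html_description)
    + "</Description></MyElement>"
)

# The parser joins the adjacent CDATA sections back into a single string
restored = ET.fromstring(xml_text).find("Description").text
print(restored == html_description)
```

The split works because the first section ends right after "]]" and the next one starts with ">", so the forbidden sequence never appears inside a single section.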

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to &quot;

< to &lt;

> to &gt;

& to &amp;

Obviously you'll need to reverse this process when you read the HTML back out, unless you load the file with an XML parser, which undoes the escaping for you.
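The replacements above can be sketched with Python's standard library (Python stands in for C# here). One detail that matters when doing it by hand: the ampersand has to be replaced first, otherwise the "&" in "&lt;" gets escaped a second time. The sample description is made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

html_description = '<p class="intro">Fish & chips</p>'

# escape() replaces &, < and > (ampersand first); the extra mapping
# adds the double quote so the result is safe inside "..." attributes
escaped = escape(html_description, {'"': "&quot;"})
xml_text = '<MyElement Description="' + escaped + '" />'

# Reading it back: the XML parser reverses the escaping on its own
element = ET.fromstring(xml_text)
print(element.get("Description") == html_description)

# Alternative: hand the raw string to the XML library and let it escape
auto_xml = ET.tostring(
    ET.Element("MyElement", {"Description": html_description}),
    encoding="unicode",
)
```

Letting the XML library write the attribute is usually the safer of the two, since it also deals with details such as line breaks, which a parser otherwise turns into spaces inside attribute values.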

  • 0

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

<MyElement>
  <description>
    <p>Here is a <strong>formatted</strong> description</p>
  </description>
</MyElement>

If you enforce XHTML syntax on your descriptions, you can ensure that the HTML description is structurally sound. There are downsides to that: if the HTML is not well-formed, the whole XML stream will fail to parse. You need to weigh the risks of each design.
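A rough Python sketch of that trade-off: the description is parsed as XML before it is embedded, so a badly formed fragment is rejected up front instead of corrupting the whole file. The function name embed_description is made up for this example.

```python
import xml.etree.ElementTree as ET

def embed_description(xhtml_fragment):
    """Embed an XHTML fragment as real child elements of <MyElement>."""
    wrapped = "<description>" + xhtml_fragment + "</description>"
    description = ET.fromstring(wrapped)  # raises ET.ParseError if not well-formed
    parent = ET.Element("MyElement")
    parent.append(description)
    return ET.tostring(parent, encoding="unicode")

good = embed_description("<p>Here is a <strong>formatted</strong> description</p>")

try:
    embed_description("<p>Here is a <strong>broken description</p>")
    rejected = False
except ET.ParseError:
    rejected = True  # the unclosed <strong> is caught before anything is written

print(good)
print(rejected)
```

Keep in mind that HTML-only entities such as &nbsp; are not defined in plain XML, so a fragment using them is rejected as well.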

  • 0

Thanks for all your help so far. I don't know why Base64 never occurred to me. I just added two static functions to my program that convert the HTML back and forth, although, come to think of it, Base64-encoding might be more efficient for large descriptions.

Thanks to all of you; you've all given me something to think about, and either way you all provided answers. I would do it the way you said, Antaris, but since the description is entered manually by the end user, I have no guarantee that it's well-formed. I tested it that way and ran into hundreds of parsing errors.

  • 0

The best solution, of course, is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will be 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)
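To illustrate the tag soup idea, here is a deliberately simplified Python sketch built on the standard library's lenient html.parser. It is not a real tag soup parser: it only auto-closes open elements, ignores stray end tags, and drops comments, and it assumes tag and attribute names are also valid XML names. The names SoupToTree and tidy_html are invented for this example; a dedicated tag soup library would be far more robust.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class SoupToTree(HTMLParser):
    """Build an ElementTree from possibly malformed HTML (simplified)."""

    def __init__(self, root_tag="description"):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element(root_tag)
        self.stack = [self.root]  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value
        attributes = {name: (value or "") for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything nested
        # inside it; an end tag with no matching open element is ignored
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text that follows a child goes into its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def tidy_html(html_fragment):
    """Return the fragment as well-formed XML inside a <description> element."""
    parser = SoupToTree()
    parser.feed(html_fragment)
    parser.close()  # flushes trailing text; open elements close on serialization
    return ET.tostring(parser.root, encoding="unicode")

tidied = tidy_html("<p>Unclosed <b>bold<br>text & more")
print(tidied)
```

The output can be dropped straight into the XML file as a child element, since the serializer closes every open element and escapes the bare ampersand.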

  • 0
  On 10/03/2010 at 17:31, The_Decryptor said:

The best solution, of course, is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will be 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

This. ;)

  • 0
  On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go through the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in the XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
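For comparison, a small Python sketch of the Base64 route, assuming the description is UTF-8 text stored in the Description attribute from the original question. The helper names are invented for this example.

```python
import base64
import xml.etree.ElementTree as ET

def encode_description(html_text):
    """Turn arbitrary HTML into an attribute-safe Base64 string."""
    return base64.b64encode(html_text.encode("utf-8")).decode("ascii")

def decode_description(encoded):
    """Reverse encode_description()."""
    return base64.b64decode(encoded).decode("utf-8")

# Deliberately malformed HTML: unclosed tags, a bare ampersand, quotes
html_description = '<p>Unclosed <b>tag & "quotes"'

element = ET.Element(
    "MyElement", {"Description": encode_description(html_description)}
)
xml_text = ET.tostring(element, encoding="unicode")

# Round trip: parse the XML, then decode the attribute
restored = decode_description(ET.fromstring(xml_text).get("Description"))
print(restored == html_description)
```

The Base64 alphabet contains no characters that are special in XML, so nothing in the description can break the file. The cost is that the stored value is about a third larger than the original and no longer human-readable.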