Tricks and Tips for storing html in xml


Question

Hi everyone. I'm using C# to create xml files of data and I've never had a problem doing what I've been doing until I incorporated a new type of data to include, which is an html description block. Once I did this all hell broke loose.

Is there a guide somewhere, or a set of tricks, to make sure your html is xml safe so you can do something like <MyElement Description="my html description here" />? I keep running into roadblocks.

Thanks :)

  • 0

You could surround the HTML string in a CDATA section, i.e. <![CDATA[ my_html_description ]]>. However, if the HTML itself contains a CDATA block, its closing ]]> will end your section early and the XML will be messed up.
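A minimal C# sketch of the CDATA approach, assuming the description is stored in a child element (CDATA sections are only allowed in element content, not inside an attribute value such as Description="..."). The class and method names are made up for illustration. Any "]]>" in the HTML is split across two CDATA sections, so a nested CDATA block cannot end the section early.

```csharp
using System;
using System.Xml.Linq;

public static class CDataExample
{
    // Wraps text in a CDATA section. Every "]]>" is split so that "]]" closes
    // one section and ">" opens the next; parsers join the pieces back together.
    public static string WrapInCData(string html)
    {
        string safe = html.Replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    // Builds a small document around the description, parses it, and returns
    // the description text as an XML reader would see it.
    public static string RoundTrip(string html)
    {
        string xml = "<MyElement><Description>" + WrapInCData(html)
                   + "</Description></MyElement>";
        XElement root = XElement.Parse(xml);
        return root.Element("Description").Value;
    }
}
```

Calling CDataExample.RoundTrip with HTML that contains its own CDATA block should hand back the same string. Characters that XML forbids outright (most control characters) are still not allowed inside CDATA, so this only covers the markup problem.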

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to &quot;

< to &lt;

> to &gt;

& to &amp;

Obviously you'll need to reverse this process when you remove the HTML.
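Here is a rough C# sketch of that replacement, assuming hand-rolled helpers (XmlEscaper and its method names are hypothetical). The order matters: & has to be escaped first and unescaped last, otherwise the & in the other escape codes gets escaped a second time. The last method shows the alternative of letting System.Xml.Linq do the escaping when it writes the attribute; a parser then undoes it on read, so no manual reversal is needed on that path.

```csharp
using System;
using System.Xml.Linq;

public static class XmlEscaper
{
    // Replaces the XML special characters with their escape codes.
    public static string Escape(string text)
    {
        return text
            .Replace("&", "&amp;")    // must come first
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;");
    }

    // Reverses Escape. "&amp;" is handled last so that an escaped
    // "&amp;lt;" comes back as "&lt;" and not as "<".
    public static string Unescape(string text)
    {
        return text
            .Replace("&quot;", "\"")
            .Replace("&gt;", ">")
            .Replace("&lt;", "<")
            .Replace("&amp;", "&");   // must come last
    }

    // Alternative: pass the raw HTML to XAttribute and let the
    // serializer escape it while writing the element.
    public static string ToElementXml(string html)
    {
        return new XElement("MyElement",
            new XAttribute("Description", html)).ToString();
    }
}
```

With this sketch, XmlEscaper.Escape("<b>\"Q&A\"</b>") gives &lt;b&gt;&quot;Q&amp;A&quot;&lt;/b&gt;, and reading the Description attribute back through XElement.Parse returns the original HTML.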

  • 0

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

<MyElement>
  <description>
    <p>Here is a <strong>formatted</strong> description</p>
  </description>
</MyElement>

If you enforce XHTML syntax on your descriptions, you can ensure that the html description is well formed and semantically strong. There are downsides to that: if the html is not well formed, the whole xml stream will encounter errors whilst being parsed. You need to weigh the risks of each design.
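One way to limit that risk, sketched in C# under the assumption that System.Xml.Linq is available: try to parse the description as XML first, embed it as real child elements only if that succeeds, and otherwise store it as plain text (which the serializer escapes). DescriptionBuilder and Build are hypothetical names.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class DescriptionBuilder
{
    // Returns <MyElement><description>...</description></MyElement>.
    // Well-formed markup becomes real child elements; anything else is
    // stored as escaped text so the outer document stays parseable.
    public static XElement Build(string html)
    {
        XElement description;
        try
        {
            // The wrapper element lets fragments with several
            // top-level nodes parse as a single tree.
            description = XElement.Parse(
                "<description>" + html + "</description>",
                LoadOptions.PreserveWhitespace);
        }
        catch (XmlException)
        {
            // Not well formed (unclosed tags, entities such as &nbsp; that
            // XML does not define, and so on): keep it as text instead.
            description = new XElement("description", html);
        }
        return new XElement("MyElement", description);
    }
}
```

The trade-off is that a reader of the file has to cope with both shapes: sometimes description has child elements, sometimes it is only a text node holding the raw HTML.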

  • 0

Thanks for all your help so far. I don't know why base64 never occurred to me. I just added 2 static functions in my program that swap in the escape codes, although, come to think of it, base64 encoding might be more efficient for large descriptions.

Thanks to all of you; you've all given me something to think about, and either way you all provided answers. I would do it the way you said, Antaris, but since the description is put in manually by the end-user I have no guarantee that it's well-formed. I tested it that way and ran into hundreds of parsing errors.

  • 0

The best solution of course is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will come out 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible).

  • 0
  On 10/03/2010 at 17:31, The_Decryptor said:

The best solution of course is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will come out 1:1, invalid HTML will come out valid).

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible).

This. ;)

  • 0
  On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go through the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in the XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
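For reference, a minimal C# sketch of the encoding route, assuming UTF-8 text and the Description attribute from the original question (Base64Description and its method names are made up for illustration). Base64 output only uses letters, digits, +, / and =, so nothing in it needs XML escaping.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class Base64Description
{
    // HTML text -> UTF-8 bytes -> Base64 string.
    public static string Encode(string html)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(html));
    }

    // Base64 string -> UTF-8 bytes -> HTML text.
    public static string Decode(string encoded)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
    }

    // Writes <MyElement Description="...base64..." />.
    public static string ToXml(string html)
    {
        return new XElement("MyElement",
            new XAttribute("Description", Encode(html))).ToString();
    }

    // Reads the attribute back and decodes it. Any HTML validation
    // would plug in after this step.
    public static string FromXml(string xml)
    {
        string encoded = XElement.Parse(xml).Attribute("Description").Value;
        return Decode(encoded);
    }
}
```

Base64Description.FromXml(Base64Description.ToXml(html)) should return the original string whatever markup it contains. The cost is size and readability: Base64 emits four characters for every three bytes, so the stored description is about a third larger than the raw HTML.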