• 0

Tricks and Tips for storing html in xml


Question

Hi everyone. I'm using C# to create xml files of data and I've never had a problem doing what I've been doing until I incorporated a new type of data to include, which is an html description block. Once I did this all hell broke loose.

Is there a guide somewhere to like a set of tricks to make sure your html is xml safe so you can do something like <MyElement Description="my html description here" />. B/c I keep running into road blocks.

Thanks :)

9 answers to this question

Recommended Posts

  • 0

You could surround the HTML string in a CDATA string, i.e. <![CDATA[ my_html_description ]]> However if the html contains a CDATA block then it will be messed up.

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to "

< to <

> to >

& to &

Obviously you'll need to reverse this process when you remove the HTML.

  • 0

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

&lt;MyElement&gt;
  &lt;description&gt;
    &lt;p&gt;Here is a &lt;strong&gt;formatted&lt;/strong&gt; description&lt;/p&gt;
  &lt;/description&gt;
&lt;/MyElement&gt;

If you enforce an xhtml syntax on your descriptions, that way you can ensure that the html description is semantically strong. There are downisdes to that, in that if the html is not well formed, the whole xml stream will encounter errors whilst being parsed. You need to decide your risks for each design.

  • 0

Thanks for all your help so far. I don't know why base64 never occurred to me though I just added 2 static functions in my program that change the code although it might be more efficient to base64 encode it when coming to large descriptions come to think of it.

Thanks to all of you you've all given me something to think about, either way you all provided answers. I would do it the way you said Antaris but then since the description is put in manually by the end-user I have no guarantee that it's well-formed. I tested it that

way and ran into hundreds of parsing errors.

  • 0

The best solution of course is to use a tag soup parser to construct a DOM, then parse it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

  • 0
  On 10/03/2010 at 17:31, The_Decryptor said:

The best solution of course is to use a tag soup parser to construct a DOM, then parse it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

This. ;)

  • 0
  On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go though the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in the XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Now I may not quite understand this, so someone tell me if I'm off the mark here, but does this mean they'll be potentially removing drivers for now unsupported systems, such as old processors and chipsets? In the past 15 years, Windows has been amazing at just installing on any device, and often having zero, or just a few unessential drivers missing on first install. It would be a shame for that experience to go, though I understand the reasoning, or at least their financial reasoning for it!
    • Microsoft is removing legacy drivers from Windows Update by Usama Jawad Last month, we learned that Microsoft is making major changes to the development of hardware drivers in Windows. This included the retirement of Windows Metadata and Internet Services (WMIS), along with the process for pre-production driver signing. Now, the Redmond tech firm has informed partners that it will be getting rid of old drivers in Windows Update. In what is being described as a "strategic" move to improve the security posture and compatibility of Windows, Microsoft has announced that it will be performing a cleanup of legacy drivers that are still being delivered through Windows Update. Right now, the first phase only targets drivers that already have modern replacements present in Windows Update. As a part of its cleanup process, Microsoft will expire legacy drivers so that it is not offered to any system. This expiration involves removing audience segments in the Hardware Development Center. Partners can still republish a driver that was deemed as legacy by Microsoft, but the firm may require a justification. Once the Redmond tech giant completes its first phase of this cleanup, it will give partners a six-month grace period to share any concerns. However, if no concerns are brought forward, the drivers will be permanently eradicated from Windows Update. Microsoft has emphasized that this will be a regular activity moving forward and while the current phase only targets legacy drivers with newer replacements, the next phases may expand the scope of this cleanup and remove other drivers too. That said, each time the company takes a step in this direction, it will inform partners so that there is transparency between both parties. Microsoft believes that this move will help improve the security posture of Windows and ensure that an optimized set of drivers is offered to end-users. The firm has asked partners to review their drivers in Hardware Program so that there are no unexpected surprises during this cleanup process.
    • No idea, but I had a client the other week that lost the entire drive to it. I suggested relying on the Samsung T7's instead. The Sandisk Extreme's had reliability issues too.
    • I use it every day so personally yes I need it, or rather I want it. I use OpenShell though, not the garbage modern Start Menu. I just counted and at the moment I have a total of 92 program shortcuts organized into six folders almost exactly the way I did back in Windows 95. I can get to any program I want to run very quickly. I never use Search to find or run programs.
    • I do miss the Apps view from Windows 8.1 Update.
  • Recent Achievements

    • One Month Later
      KynanSEIT earned a badge
      One Month Later
    • One Month Later
      gowtham07 earned a badge
      One Month Later
    • Collaborator
      lethalman went up a rank
      Collaborator
    • Week One Done
      Wayne Robinson earned a badge
      Week One Done
    • One Month Later
      Karan Khanna earned a badge
      One Month Later
  • Popular Contributors

    1. 1
      +primortal
      685
    2. 2
      ATLien_0
      274
    3. 3
      Michael Scrip
      220
    4. 4
      +FloatingFatMan
      171
    5. 5
      Steven P.
      160
  • Tell a friend

    Love Neowin? Tell a friend!