• 0

Tricks and Tips for storing html in xml


Question

Hi everyone. I'm using C# to create xml files of data and I've never had a problem doing what I've been doing until I incorporated a new type of data to include, which is an html description block. Once I did this all hell broke loose.

Is there a guide somewhere, like a set of tricks, to make sure your html is xml safe so you can do something like <MyElement Description="my html description here" />? Because I keep running into roadblocks.

Thanks :)

9 answers to this question

  • 0

You could surround the HTML string in a CDATA section, i.e. <![CDATA[ my_html_description ]]>. However, if the HTML itself contains a CDATA block (or just the sequence ]]>), the outer section will end early and the XML will be messed up.
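A minimal sketch of the CDATA approach, including the usual workaround for the nested-CDATA problem: every "]]>" in the text is split across two adjacent CDATA sections. The class and method names (CDataHelper, WrapInCData, BuildElement, ReadDescription) are made up for illustration, and putting the description in a child element rather than an attribute is an assumption, since CDATA cannot appear inside an attribute value.

```csharp
using System;
using System.Xml.Linq;

public static class CDataHelper
{
    // Wraps text in a CDATA section. Any "]]>" in the text would close the
    // section early, so it is split: "]]" ends one section, ">" starts the next.
    public static string WrapInCData(string text)
    {
        string safe = text.Replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    // Builds the XML by hand so the splitting is visible (hypothetical layout).
    public static string BuildElement(string html)
    {
        return "<MyElement><Description>" + WrapInCData(html) + "</Description></MyElement>";
    }

    // Reading back needs no special handling: the parser joins the
    // adjacent CDATA sections into one text value.
    public static string ReadDescription(string xml)
    {
        return XElement.Parse(xml).Element("Description")?.Value ?? string.Empty;
    }
}
```

Calling CDataHelper.ReadDescription(CDataHelper.BuildElement(html)) should give back the original string, even when html contains "]]>".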

You could also scan through your HTML replacing special XML characters with their equivalent XML escape code:

" to &quot;

< to &lt;

> to &gt;

& to &amp;

Obviously you'll need to reverse this process when you read the HTML back out (an XML parser does this for you when it returns the attribute or text value).
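In C# the replacements above rarely need to be done by hand, because the XML classes escape values when writing and unescape them when reading. Here is a rough sketch using LINQ to XML; the class name AttributeExample is hypothetical, and the element and attribute names are taken from the question.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeExample
{
    // XAttribute escapes &, <, and " while serializing, so the raw HTML
    // string can be passed in unchanged.
    public static string Write(string html)
    {
        var element = new XElement("MyElement", new XAttribute("Description", html));
        return element.ToString();
    }

    // The parser reverses the escaping; Value is the original HTML.
    public static string Read(string xml)
    {
        return XElement.Parse(xml).Attribute("Description")?.Value ?? string.Empty;
    }
}
```

One caveat: XML parsers normalize whitespace in attribute values, so a multi-line description is usually safer as element content than as an attribute.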

  • 0

^ Base64 encoding is probably a little more complicated than simply having the description as its own element:

<MyElement>
  <description>
    <p>Here is a <strong>formatted</strong> description</p>
  </description>
</MyElement>

If you enforce XHTML syntax on your descriptions, you can ensure that the HTML description is well-formed and can live in the XML as real elements. There are downsides to that: if the HTML is not well-formed, the whole XML stream will fail to parse. You need to weigh the risks of each design.

  • 0

Thanks for all your help so far. I don't know why base64 never occurred to me. I just added 2 static functions in my program that escape the special characters, although come to think of it, it might be more efficient to base64 encode it when it comes to large descriptions.

Thanks to all of you, you've all given me something to think about; either way you all provided answers. I would do it the way you said, Antaris, but since the description is put in manually by the end-user I have no guarantee that it's well-formed. I tested it that way and ran into hundreds of parsing errors.
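One way to cope with user-entered HTML that may or may not be well-formed is to test each description first and fall back to escaped text when it fails. This is only a sketch under that assumption; FragmentCheck, IsWellFormed and BuildDescription are illustrative names, not part of any library.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class FragmentCheck
{
    // Wraps the fragment in a dummy root and lets the XML parser decide.
    // HTML-only constructs such as <br> or &nbsp; will fail this check.
    public static bool IsWellFormed(string html)
    {
        try
        {
            XElement.Parse("<root>" + html + "</root>");
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Well-formed input becomes real child elements; anything else is
    // stored as plain text, which LINQ to XML escapes when serializing.
    public static XElement BuildDescription(string html)
    {
        if (IsWellFormed(html))
        {
            return XElement.Parse("<description>" + html + "</description>");
        }
        return new XElement("description", html);
    }
}
```

The reader then has to handle both shapes: child elements for the well-formed case, a text value for the fallback.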

  • 0

The best solution of course is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

  • 0
  On 10/03/2010 at 17:31, The_Decryptor said:

The best solution of course is to use a tag soup parser to construct a DOM, then serialize it back out again (valid HTML will be 1:1, invalid HTML will come out valid)

Failing that, storing it as CDATA would be the next best (assuming you can't enforce XHTML, which is quite possible)

This. ;)

  • 0
  On 10/03/2010 at 17:17, Rob said:

Part of XML's charm is that it is human-readable, to an extent. Base64 encoding it removes that.

He didn't say that it needed "charm." Why go through the pain and pitfalls of a tag parser if you don't need it? If all he's doing is storing the HTML in the XML attribute, I still think encoding is the best idea because it's the simplest and would reduce to zero the chance of screwing up his XML. If he needs validation of the HTML, he can plug that in after the decoding step.
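For reference, the encoding route comes down to two small static functions. This sketch assumes UTF-8 as the byte encoding, and the class name Base64Html is made up.

```csharp
using System;
using System.Text;

public static class Base64Html
{
    // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    // none of which need escaping in an XML attribute or element.
    public static string Encode(string html)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(html));
    }

    // Reverses Encode; throws FormatException if the input is not valid Base64.
    public static string Decode(string encoded)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
    }
}
```

The trade-off raised earlier in the thread still applies: the stored value is roughly a third larger than the original and no longer human-readable.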