[Python] Extracting specific things from XML


Question

Hey guys,

Extremely frustrating problem, have wasted hours on this and can't find a solution.

I'm using dom/minidom through Python and can't use any other external Python modules.

Basically, we have the following example XML:


[CODE]
<a id="1">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
<a id="2">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
<a id="3">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
[/CODE]

What I'm attempting to do is only pull data from the first [i]c[/i] of every [i]b[/i] element which is nested inside [i]a[/i]. All I'm able to do at the moment is pull all data contained in all available [i]c[/i]'s.

[CODE]
sText = dom.getElementsByTagName('a')
for node in sText:
    sTextList = node.getElementsByTagName('c')
    for i in sTextList:
        dataText = i.firstChild.data
        print dataText.encode('utf-8'), "<br />"
[/CODE]

Currently, all this achieves is:

[CODE]
1
2
3
1
2
3
1
2
3
[/CODE]

Is there a way to pull just the FIRST [i]c[/i] (as in, only "1") instead of all of them with the above code? This is driving me insane...

Thanks in advance!

5 answers to this question

This is more drawn out than it needs to be, but it works. You could cut several of my steps short if you wanted. Note that I wrapped your XML in a single [i]doc[/i] root element so that it parses as one document.


[CODE]
from xml.dom.minidom import parseString
doc = """
<doc>
<a id="1">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
<a id="2">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
<a id="3">
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
</doc>
"""
dom = parseString(doc)
# Walk each <a>, then each <b> inside it, and print only the first <c>.
sText = dom.getElementsByTagName('a')
for node in sText:
    sTextList = node.getElementsByTagName('b')
    for i in sTextList:
        print(i.getElementsByTagName('c')[0].childNodes[0].data)
[/CODE]
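For what it's worth, here is a rough sketch of how the same idea could be shortened. It relies on getElementsByTagName searching all descendants, so it goes straight to the [i]b[/i] elements and assumes every [i]b[/i] of interest sits inside an [i]a[/i], as in the sample. The function name first_c_values and the SAMPLE string are made up for illustration; they are not from the original post.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data mirroring the XML from the question,
# wrapped in a single <doc> root so it parses as one document.
SAMPLE = """<doc>
<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
</doc>"""


def first_c_values(xml_text):
    """Return the text of the first <c> inside every <b> in the document."""
    dom = parseString(xml_text)
    values = []
    for b in dom.getElementsByTagName('b'):
        c_nodes = b.getElementsByTagName('c')
        # Skip <b> elements with no <c>, or whose first <c> has no text.
        if c_nodes and c_nodes[0].firstChild is not None:
            values.append(c_nodes[0].firstChild.data)
    return values


print(first_c_values(SAMPLE))
```

The guard on c_nodes is only there so that a [i]b[/i] without any [i]c[/i] (or with an empty one) is skipped instead of raising an error; drop it if your data always has at least one non-empty [i]c[/i].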

This topic is now closed to further replies.