[Python] Extracting specific things from XML


Question

Hey guys,

Extremely frustrating problem, have wasted hours on this and can't find a solution.

I'm using xml.dom.minidom in Python and can't use any external Python modules.

Basically, we have the following example XML:

[CODE]
<a id="1">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="2">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="3">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
[/CODE]

What I'm attempting to do is only pull data from the first [i]c[/i] of every [i]b[/i] element which is nested inside [i]a[/i]. All I'm able to do at the moment is pull all data contained in all available [i]c[/i]'s.

[CODE]
sText = dom.getElementsByTagName('a')
for node in sText:
    sTextList = node.getElementsByTagName('c')
    for i in sTextList:
        dataText = i.firstChild.data
        print dataText.encode('utf-8'), "<br />"
[/CODE]

Currently, all this achieves is:

[CODE]
1
2
3
1
2
3
1
2
3
[/CODE]

Is there a way I can pull just the FIRST [i]c[/i] (as in, only "1") instead of all of them with the above code? This is driving me insane...

Thanks in advance!

5 answers to this question

  • 0

Your loop is grabbing every [i]c[/i] under each [i]a[/i], and then for each individual [i]c[/i] it grabs the first child's data. So you are still just looping through every [i]c[/i] and displaying its data, rather than stopping at the first one.
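
To see this concretely, here is a minimal sketch. The `<doc>` wrapper element, the shortened sample, and the variable names are my own assumptions, added so the snippet parses on its own. It counts how many c elements getElementsByTagName returns for each a, then takes only the first one by indexing the returned list:

```python
from xml.dom.minidom import parseString

# Assumption: a made-up <doc> root wraps the sample so it parses standalone.
xml_text = (
    '<doc>'
    '<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '</doc>'
)
dom = parseString(xml_text)

counts = []
firsts = []
for a_node in dom.getElementsByTagName('a'):
    # getElementsByTagName returns every <c> descendant of this <a>.
    c_nodes = a_node.getElementsByTagName('c')
    counts.append(len(c_nodes))
    # Index 0 picks the first <c> only; no inner loop is needed.
    firsts.append(c_nodes[0].firstChild.data)

print(counts)
print(firsts)
```

Each a reports three c elements, which is why the inner loop prints all of them; indexing with [0] instead of looping is the only change needed.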

  • 0

This is more drawn out than it needs to be, but it works. You could cut several of my steps short if you wanted.

[CODE]
from xml.dom.minidom import parseString

# The sample is wrapped in a single <doc> root element so that it parses.
doc = """
<doc>
  <a id="1">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
  <a id="2">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
  <a id="3">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
</doc>
"""
dom = parseString(doc)
sText = dom.getElementsByTagName('a')
for node in sText:
    sTextList = node.getElementsByTagName('b')
    for i in sTextList:
        # Index [0] takes only the first <c> inside this <b>.
        print(i.getElementsByTagName('c')[0].childNodes[0].data)
[/CODE]
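
As one way to cut those steps short, here is a sketch built around a hypothetical helper, `first_c_values`. The helper name, the trimmed sample data, and the choice to skip empty elements are my assumptions, not part of the answer above. It loops over the b elements directly and skips any b that has no c (or whose first c is empty) instead of raising an error:

```python
from xml.dom.minidom import parseString


def first_c_values(xml_text):
    """Return the text of the first <c> inside each <b> (hypothetical helper)."""
    dom = parseString(xml_text)
    values = []
    for b_node in dom.getElementsByTagName('b'):
        c_nodes = b_node.getElementsByTagName('c')
        # Skip a <b> with no <c>, or a first <c> with no text (firstChild is None).
        if c_nodes and c_nodes[0].firstChild is not None:
            values.append(c_nodes[0].firstChild.data)
    return values


# Made-up sample: the second <b> is empty on purpose to exercise the guard.
sample = (
    '<doc>'
    '<a id="1"><b><c>1</c><c>2</c></b></a>'
    '<a id="2"><b></b></a>'
    '<a id="3"><b><c>7</c></b></a>'
    '</doc>'
)
print(first_c_values(sample))
```

This assumes every b sits inside an a, as in the sample XML. If b can also appear elsewhere in the document, keep the outer loop over a so that only the nested ones are visited.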

This topic is now closed to further replies.