[Python] Extracting specific things from XML

November 9, 2012

Hey guys,

Extremely frustrating problem, have wasted hours on this and can't find a solution.

I'm using dom/minidom through Python and can't use any other external Python modules.

Basically, we have the following example XML:

[CODE]
<a id="1">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>
<a id="2">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>
<a id="3">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>
[/CODE]
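One detail worth knowing before parsing this sample: as posted, it has three top-level <a> elements. minidom, like any XML parser, only accepts a document with a single root element, which is why the working answer further down wraps everything in <doc>. Below is a minimal sketch, assuming the fragment is held in a string. The helper `parse_fragment` and the wrapper name `doc` are illustrative choices, not part of the original thread.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# The sample from the question: three sibling <a> elements and no single root.
FRAGMENT = """
<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
"""


def parse_fragment(fragment, root="doc"):
    """Wrap a multi-root fragment in one (hypothetical) root element, then parse it."""
    return parseString("<%s>%s</%s>" % (root, fragment, root))


# Parsing the raw fragment fails: expat rejects content after the first root element.
try:
    parseString(FRAGMENT)
    raw_rejected = False
except ExpatError:
    raw_rejected = True

wrapped = parse_fragment(FRAGMENT)
print(raw_rejected)                               # True
print(wrapped.documentElement.tagName)            # doc
print(len(wrapped.getElementsByTagName("a")))     # 3
```

Once wrapped, all three <a> elements are reachable from the single `doc` root.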

What I'm trying to do is pull data only from the first [i]c[/i] of every [i]b[/i] element nested inside an [i]a[/i]. At the moment, all I can do is pull the data from every [i]c[/i].

[CODE]
sText = dom.getElementsByTagName('a')
for node in sText:
    sTextList = node.getElementsByTagName('c')
    for i in sTextList:
        dataText = i.firstChild.data
        print dataText.encode('utf-8'), "<br />"
[/CODE]

Currently, all this achieves is:

[CODE]
1
2
3
1
2
3
1
2
3
[/CODE]

Is there a way to pull just the FIRST [i]c[/i] (that is, only "1") instead of all of them with the above code? This is driving me insane...

Thanks in advance!

November 9, 2012

Your loop grabs all the [i]c[/i]s, then for each individual [i]c[/i] you take its first child's data, so you are still just looping through every [i]c[/i] and printing its data.
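Put differently, `getElementsByTagName('c')` already returns a NodeList of every <c> under the current <a>, in document order, so instead of looping over it you can index it. Below is a minimal sketch of that change to the loop from the question. It assumes Python 3 `print()` syntax and a compact copy of the sample wrapped in a hypothetical <doc> root. The guard for an <a> with no <c> is an addition and is not something the thread discusses.

```python
from xml.dom.minidom import parseString

# Compact copy of the question's sample, wrapped in a single (hypothetical) <doc> root.
dom = parseString(
    "<doc>"
    '<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>'
    "</doc>"
)

first_values = []
for node in dom.getElementsByTagName("a"):
    sTextList = node.getElementsByTagName("c")  # every <c> inside this <a>
    if sTextList:  # an empty NodeList is falsy, so an <a> without <c> is skipped
        dataText = sTextList[0].firstChild.data  # index [0] instead of looping
        first_values.append(dataText)
        print(dataText)
```

With this sample, the loop prints 1 three times, once per <a>.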

November 9, 2012

I haven't used Python in 5 or so years but... let me install it and write your program...

November 9, 2012

You're a legend, thanks for that. I'll still be trying to sort this out - not really moving too well right now...

November 9, 2012

This is more drawn out than it needs to be... but it works... You could cut several of my steps short if you wanted.

[CODE]
from xml.dom.minidom import parseString
doc = """
<doc>
<a id="1">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
<a id="2">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
<a id="3">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
</doc>
"""
dom = parseString(doc)
sText = dom.getElementsByTagName('a')
for node in sText:
    # look at each <b> inside this <a>
    sTextList = node.getElementsByTagName('b')
    for i in sTextList:
        # [0] keeps only the first <c>; childNodes[0] is its text node
        print(i.getElementsByTagName('c')[0].childNodes[0].data)
[/CODE]
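As for cutting steps short: if every <b> sits inside an <a>, as in this sample, the outer <a> loop can be dropped and each <b> visited directly. Below is a minimal sketch under that assumption. The function name `first_c_values` is hypothetical. It also shows why `b.firstChild` is not a usable shortcut here: with indented XML like the sample, the first child of <b> is a whitespace Text node, not the first <c>.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SAMPLE = """
<doc>
<a id="1">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="2">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="3">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
</doc>
"""


def first_c_values(xml_text):
    """Return the text of the first <c> in each <b> (hypothetical helper).

    Assumes every <b> is nested inside an <a>, so the <a> loop is skipped.
    """
    values = []
    for b in parseString(xml_text).getElementsByTagName("b"):
        cs = b.getElementsByTagName("c")
        if cs:  # skip a <b> that has no <c>
            values.append(cs[0].firstChild.data)
    return values


dom = parseString(SAMPLE)
first_b = dom.getElementsByTagName("b")[0]
# With indented XML, <b>'s first child is whitespace text, not the first <c>.
print(first_b.firstChild.nodeType == Node.TEXT_NODE)  # True
print(first_c_values(SAMPLE))  # ['1', '1', '1']
```

Searching with `getElementsByTagName("c")` and taking `[0]` avoids having to skip over those whitespace nodes by hand.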

November 9, 2012

You are AMAZING, thank you so much - it worked right away! :D

Mind is finally at ease with this - been a very long day...

Thread participants, in posting order: Snowstorm (question), then 5 answers from Tuishimi, Tuishimi, Snowstorm, Tuishimi, Snowstorm.
