



[Python] Extracting specific things from XML

python xml

5 replies to this topic

#1 Snowstorm


    Neowinian

  • Joined: 24-October 01

Posted 09 November 2012 - 04:23

Hey guys,

This is an extremely frustrating problem; I've wasted hours on it and can't find a solution.

I'm using DOM/minidom (xml.dom.minidom) in Python and can't use any other external Python modules.

Basically, we have the following example XML:

<a id="1">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>
<a id="2">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>
<a id="3">
  <b>
	<c>1</c>
	<c>2</c>
	<c>3</c>
  </b>
</a>

What I'm trying to do is pull data only from the first c element of every b element nested inside an a element. At the moment, all I can do is pull the data from every c.

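# dom is the document returned by xml.dom.minidom.parse() or parseString()
sText = dom.getElementsByTagName('a')
for node in sText:
    # This finds every <c> anywhere under the current <a>, not just the first one
    sTextList = node.getElementsByTagName('c')
    for i in sTextList:
        dataText = i.firstChild.data
        print(dataText, "<br />")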

Currently, all this achieves is:

1
2
3
1
2
3
1
2
3
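A minimal Python 3 sketch that reproduces the output above, assuming the fragment is wrapped in a single root element (called doc here, as in the later reply). minidom only parses well-formed XML, so three top-level a elements on their own would raise an ExpatError. The values are collected into a list instead of being printed with HTML line breaks.

```python
from xml.dom.minidom import parseString

# The three <a> elements wrapped in one root; "doc" is an arbitrary name.
XML = """<doc>
<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
</doc>"""

dom = parseString(XML)

values = []
for a in dom.getElementsByTagName('a'):
    # getElementsByTagName matches every descendant <c>, so the inner loop sees all three
    for c in a.getElementsByTagName('c'):
        values.append(c.firstChild.data)

print(values)
# ['1', '2', '3', '1', '2', '3', '1', '2', '3']
```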

Is there a way I just pull the FIRST c (as in, only "1") instead of all of them with the above code? This is driving me insane...

Thanks in advance!


#2 Tuishimi


    Michinator

  • Joined: 19-November 10
  • OS: Windows 8

Posted 09 November 2012 - 04:29

Your loop grabs all the c elements... then for each individual c it takes its first child's data... so you are still just looping through every c and displaying its data.
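To make that concrete: getElementsByTagName searches all descendants, so calling it on an a returns every c beneath it. A minimal sketch, assuming you only care about direct children, filters childNodes instead and stops at the first matching c in each b. The helper name child_elements is hypothetical, not part of minidom.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

XML = """<doc>
<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
</doc>"""


def child_elements(parent, tag):
    """Yield the direct child elements of parent whose tag name is tag."""
    for child in parent.childNodes:
        # Skip whitespace text nodes, comments, etc.
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == tag:
            yield child


dom = parseString(XML)
firsts = []
for a in dom.getElementsByTagName('a'):
    for b in child_elements(a, 'b'):
        # next(..., None) takes only the first matching <c> and ignores the rest
        c = next(child_elements(b, 'c'), None)
        if c is not None:
            firsts.append((a.getAttribute('id'), c.firstChild.data))

print(firsts)
# [('1', '1'), ('2', '1'), ('3', '1')]
```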

#3 Tuishimi


    Michinator

  • Joined: 19-November 10
  • OS: Windows 8

Posted 09 November 2012 - 04:34

I haven't used Python in 5 or so years but... let me install it and write your program...

#4 OP Snowstorm


    Neowinian

  • Joined: 24-October 01

Posted 09 November 2012 - 04:36

You're a legend, thanks for that. I'll still be trying to sort this out - not really moving too well right now...

#5 Tuishimi

Tuishimi

    Michinator

  • Joined: 19-November 10
  • OS: Windows 8

Posted 09 November 2012 - 04:52

This is more drawn out than it needs to be... but it works... You could cut several of my steps short if you wanted.

from xml.dom.minidom import parseString
doc = """
<doc>
<a id="1">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
<a id="2">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
<a id="3">
  <b>
		<c>1</c>
		<c>2</c>
		<c>3</c>
  </b>
</a>
</doc>
"""
dom = parseString(doc)
sText = dom.getElementsByTagName('a')
for node in sText:
    # Get the <b> elements inside this <a>
    sTextList = node.getElementsByTagName('b')
    for i in sTextList:
        # Take only the first <c> of this <b> and print its text
        print(i.getElementsByTagName('c')[0].childNodes[0].data)
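As the reply says, some steps can be cut short. A minimal Python 3 sketch of the same idea as a single list comprehension: for every b inside every a, index [0] into that b's c elements. It assumes every b contains at least one c with text; an empty b would raise IndexError, and an empty c would make firstChild None.

```python
from xml.dom.minidom import parseString

XML = """<doc>
<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
</doc>"""

dom = parseString(XML)

# For every <b> under every <a>, keep only the text of its first <c>.
first_values = [
    b.getElementsByTagName('c')[0].firstChild.data
    for a in dom.getElementsByTagName('a')
    for b in a.getElementsByTagName('b')
]

print(first_values)  # ['1', '1', '1']
```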


#6 OP Snowstorm

Snowstorm

    Neowinian

  • Joined: 24-October 01

Posted 09 November 2012 - 05:01

You are AMAZING, thank you so much - it worked right away! :D

Mind is finally at ease with this - been a very long day...


