
6 posts in this topic

Posted

Hey guys,

This is an extremely frustrating problem. I've wasted hours on it and can't find a solution.

I'm using xml.dom.minidom in Python and can't use any external Python modules.

Basically, we have the following example XML:

[CODE]
<a id="1">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="2">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
<a id="3">
  <b>
    <c>1</c>
    <c>2</c>
    <c>3</c>
  </b>
</a>
[/CODE]
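One thing worth checking with this sample: it has three sibling [i]a[/i] elements and no common parent, so on its own it is not a well-formed XML document, and minidom's parseString rejects it. A minimal sketch, assuming the XML is available as a string, is to wrap it in a dummy root element before parsing (the name root is a hypothetical choice; any unused tag name would do):

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# The question's XML: three sibling <a> elements with no common parent.
fragment = """<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>
<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>
"""

try:
    parseString(fragment)
    fragment_parses_alone = True
except ExpatError as err:
    # Expat refuses a second top-level element after the first one closes.
    fragment_parses_alone = False
    print("fragment rejected: %s" % err)

# Wrapping everything in one (hypothetical) <root> element makes it parseable.
dom = parseString("<root>" + fragment + "</root>")
print(len(dom.getElementsByTagName('a')))  # 3
```

The answer further down in this thread does the same thing with a <doc> wrapper.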

What I'm trying to do is pull data only from the first [i]c[/i] of every [i]b[/i] element nested inside an [i]a[/i]. At the moment, all I can do is pull the data from every [i]c[/i].

[CODE]
# dom is the already-parsed document
sText = dom.getElementsByTagName('a')
for node in sText:
    # All <c> elements anywhere under this <a>
    sTextList = node.getElementsByTagName('c')
    for i in sTextList:
        dataText = i.firstChild.data
        print dataText.encode('utf-8'), "<br />"
[/CODE]

Currently, all this produces is:

[CODE]
1
2
3
1
2
3
1
2
3
[/CODE]

Is there a way to pull just the FIRST [i]c[/i] (that is, only "1") instead of all of them with the above code? This is driving me insane...

Thanks in advance!


Posted

Your loop grabs all the [i]c[/i] elements, then for each individual [i]c[/i] you read the data of its first child, so you are still just looping through every [i]c[/i] and printing its data.
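Following that diagnosis, a minimal sketch of the fix keeps the original loop over the [i]a[/i] elements but indexes the [i]c[/i] list instead of iterating over it. The sample input is the question's XML collapsed onto single lines and wrapped in a hypothetical <root> element so it parses as one document; firstChild.data assumes each [i]c[/i] holds plain text.

```python
from xml.dom.minidom import parseString

# Hypothetical sample input: the question's XML inside a single <root> element.
dom = parseString(
    "<root>"
    '<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>'
    "</root>"
)

first_values = []
for node in dom.getElementsByTagName('a'):
    c_elements = node.getElementsByTagName('c')  # every <c> under this <a>, in document order
    if c_elements:                               # skip an <a> that has no <c> at all
        first_c = c_elements[0]                  # keep only the first one
        first_values.append(first_c.firstChild.data)

for value in first_values:
    print(value)  # prints 1 once per <a>
```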


Posted

I haven't used Python in five years or so, but let me install it and write your program...


Posted

You're a legend, thanks for that. I'll keep trying to sort this out in the meantime, though I'm not making much progress right now...


Posted

This is more drawn out than it needs to be, but it works. You could cut several of my steps short if you wanted.

[CODE]
from xml.dom.minidom import parseString
doc = """
<doc>
  <a id="1">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
  <a id="2">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
  <a id="3">
    <b>
      <c>1</c>
      <c>2</c>
      <c>3</c>
    </b>
  </a>
</doc>
"""
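# Parse the string into a DOM tree
dom = parseString(doc)
sText = dom.getElementsByTagName('a')
for node in sText:
    # Every <b> nested inside this <a>
    sTextList = node.getElementsByTagName('b')
    for i in sTextList:
        # Take only the first <c> under this <b> and print its text
        print(i.getElementsByTagName('c')[0].childNodes[0].data)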
[/CODE]
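As for cutting steps short: one possible shorter version loops over the [i]b[/i] elements directly instead of going through each [i]a[/i] first. This sketch also uses a hypothetical helper, first_child_element, that looks only at direct children, because getElementsByTagName matches any descendant, which could pick up a [i]c[/i] nested deeper than expected. It assumes each [i]c[/i] contains plain text.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample input, same shape as the XML in the question.
dom = parseString(
    "<doc>"
    '<a id="1"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="2"><b><c>1</c><c>2</c><c>3</c></b></a>'
    '<a id="3"><b><c>1</c><c>2</c><c>3</c></b></a>'
    "</doc>"
)


def first_child_element(parent, tag_name):
    """Return the first direct child element of parent named tag_name, or None."""
    for child in parent.childNodes:
        # Skip text/whitespace nodes; only element nodes have a tagName.
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == tag_name:
            return child
    return None


results = []
for b in dom.getElementsByTagName('b'):
    c = first_child_element(b, 'c')
    if c is not None and c.firstChild is not None:
        results.append(c.firstChild.data)

for value in results:
    print(value)  # prints 1 once per <b>
```

If the real documents can contain [i]b[/i] elements outside an [i]a[/i], the original nested loop is the safer choice.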


Posted

You are AMAZING, thank you so much - it worked right away! :D

My mind is finally at ease with this; it's been a very long day...

