Totally confused on using HTML Agility Pack

December 12, 2014

I have no idea how to use this thing. I need to scrape data off a page, specifically http://e-juice-recipes.com/recipe/reign-drops-throne-clone-banana-nut-bread/, in the <tbody id="results_table"> and all the tr's and td's under it.

I have no idea where to start - documentation seems non-existent.

Can anyone help?

Many thanks :-)

8,753 · December 12, 2014

A quick Google search yields a lot of relevant info, for example http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack (have you checked that out?).

Note that you don't necessarily need to use it. Often you can get the data you want from a web page just by looking for lines that contain a particular string. Depending on the complexity of what you want to do, learning a library might be more conceptual overhead than writing the parsing code yourself.
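
For illustration, the string-search idea might look something like this in VB.NET. This is only a sketch: the sample HTML, the class="price" marker and the ExtractBetween helper are made up for the example and are not taken from the page in question.

```vb
Imports System
Imports Microsoft.VisualBasic

Module StringSearchSketch
    ' Hypothetical helper: returns the text between two markers,
    ' or Nothing if either marker is missing.
    Function ExtractBetween(text As String, startMarker As String, endMarker As String) As String
        Dim startIndex As Integer = text.IndexOf(startMarker, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += startMarker.Length

        Dim endIndex As Integer = text.IndexOf(endMarker, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return Nothing

        Return text.Substring(startIndex, endIndex - startIndex)
    End Function

    Sub Main()
        ' Invented sample; a real program would use the downloaded page source here.
        Dim html As String = "<html><body>" & vbLf &
                             "<span class=""price"">12.50</span>" & vbLf &
                             "</body></html>"

        ' Scan line by line and only look closer at lines containing the marker.
        For Each currentLine As String In html.Split(ControlChars.Lf)
            If currentLine.Contains("class=""price""") Then
                ' Take whatever sits between the end of the opening tag and the next tag.
                Console.WriteLine(ExtractBetween(currentLine, ">", "<"))
            End If
        Next
    End Sub
End Module
```

This kind of code breaks as soon as the page layout changes, which is the trade-off against learning a proper parser.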

1,222 · December 12, 2014

If you're using PHP, the PHP Simple HTML DOM Parser does this very easily.

http://simplehtmldom.sourceforge.net/

2,549 · December 13, 2014

I'd also recommend a DOM parser; combine it with regex and you've got a powerful combination.

December 14, 2014

I'm not designing with PHP, so the DOM parser won't work.

I'll take a look at the link Andre S. posted - thanks :-)

2,549 · December 14, 2014

What are you using then? ASP? Node.js?

If you use Node.js, or even do it client side, it's easier still, since you can then use jQuery to filter the HTML object.

Edit: seems you're writing for .NET, so I suppose an application or an ASP website :P

6,612 · December 14, 2014

HTML Agility Pack is a .NET library that's designed for this sort of thing, similar to Python's BeautifulSoup if you're familiar with it.

That said, yeah, the documentation for it is rather weak, but there are plenty of examples and answered questions on the web (Andre's link, for example). It's a pretty easy library to use; the trickiest part is getting the node path just right so that it scrapes the right data. Once you get that right, the rest is really simple. I've used it a couple of times.
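
As a rough illustration of the node-path part, here is a minimal VB.NET sketch. It assumes the HtmlAgilityPack NuGet package is referenced, and the HTML string is an invented static sample that reuses the tbody id from the question, not the real page. The XPath expression is the piece that usually needs adjusting.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module AgilityPackSketch
    Sub Main()
        ' Invented static sample; the markup of a real page may differ.
        Dim html As String = "<table><tbody id=""results_table"">" &
                             "<tr><td>Ingredient A</td><td>10</td></tr>" &
                             "<tr><td>Ingredient B</td><td>5</td></tr>" &
                             "</tbody></table>"

        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' Node path: every tr directly under the tbody with id="results_table".
        Dim rows As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//tbody[@id='results_table']/tr")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If rows Is Nothing Then
            Console.WriteLine("No rows found; check the node path.")
            Return
        End If

        For Each row As HtmlNode In rows
            ' Relative path: the td cells of this row only.
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            If cells Is Nothing Then Continue For

            Dim values As New List(Of String)()
            For Each cell As HtmlNode In cells
                ' Decode entities such as &amp; and trim surrounding whitespace.
                values.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim())
            Next
            Console.WriteLine(String.Join(" | ", values))
        Next
    End Sub
End Module
```

To work against a live URL instead of a string, HtmlWeb.Load can fetch and parse the page in one step. Either way, the library only sees the HTML the server sends.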

2,549 · December 14, 2014

I see a bit of a problem, by the way: the #results_table is generated with JS after the page has loaded...

Which means you have to scrape the JS at the bottom of the page instead and interpret that :/

I'm currently looking for a simpler possibility.

Edit: nope, the values are put into the JS block at the bottom of the page and then used to calculate the actual content of the HTML table with Recipe() (recipe.js).

So your only option is scraping that JS block, then interpreting the code and doing the calculations done by Recipe() yourself...
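
A sketch of what scraping the script block could look like in VB.NET, using HTML Agility Pack plus a regular expression. The inline script, the pg/vg property names and the numbers are invented placeholders. The real variable layout, and the math Recipe() does with it, would have to be read from the page source and recipe.js.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Module ScriptBlockSketch
    Sub Main()
        ' Invented sample: an empty table plus an inline script holding the raw values.
        Dim html As String = "<html><body>" &
                             "<table><tbody id=""results_table""></tbody></table>" &
                             "<script>var recipe = new Recipe({ pg: 30, vg: 70 });</script>" &
                             "</body></html>"

        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' Inline scripts only; scripts with a src attribute have no body to read.
        Dim scripts As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//script[not(@src)]")
        If scripts Is Nothing Then
            Console.WriteLine("No inline script blocks found.")
            Return
        End If

        ' Matches simple name: number pairs such as pg: 30 (hypothetical format).
        Dim pairPattern As New Regex("(\w+)\s*:\s*(\d+(?:\.\d+)?)")

        For Each scriptNode As HtmlNode In scripts
            Dim code As String = scriptNode.InnerHtml
            ' Skip scripts that do not mention the Recipe( call.
            If Not code.Contains("Recipe(") Then Continue For

            For Each m As Match In pairPattern.Matches(code)
                Console.WriteLine(m.Groups(1).Value & " = " & m.Groups(2).Value)
            Next
        Next
    End Sub
End Module
```

This only recovers the raw inputs. The table values themselves would still need the Recipe() calculations reimplemented on the .NET side.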

December 20, 2014

Interesting you found that. Thank you.

If it's that difficult I'll just abandon the idea, it's not worth that much hassle.

Oh, and I'm only developing a VB.NET application; it was just going to pull data for a textbox in the program, so the DOM and PHP stuff wouldn't apply.

8 answers to this question

Posters, in thread order: Mr. Black (question), Andre S., lj300, Seahorsepip, Mr. Black, Seahorsepip, Max Norris, Seahorsepip, Mr. Black