Mr. Black, posted December 12, 2014:
I have no idea how to use this thing. I need to scrape data off a page, specifically http://e-juice-recipes.com/recipe/reign-drops-throne-clone-banana-nut-bread/: the <tbody id="results_table"> element and all the tr and td elements under it. I have no idea where to start, and the documentation seems nonexistent. Can anyone help? Many thanks :-)
Andre S. (Veteran), posted December 12, 2014:
A quick Google search yields a lot of relevant info here: http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack. Have you checked that out? Note that you don't necessarily need to use that library. Often you can get the data you want from a web page just by looking for lines that contain a particular string. Depending on the complexity of what you want to do, learning a library might carry more conceptual overhead than writing the parsing code yourself.
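A minimal sketch of the line-matching approach described above. The thread is about VB.NET, so Python is used here only as a neutral illustration. The `find_lines` helper and the sample HTML are hypothetical. In a real program the markup would come from an HTTP request (for example via `urllib.request.urlopen`) rather than a string literal.

```python
def find_lines(html: str, needle: str) -> list[str]:
    """Return every line of `html` that contains `needle`, with surrounding whitespace stripped."""
    return [line.strip() for line in html.splitlines() if needle in line]


# Hypothetical markup standing in for a downloaded page.
sample = """<html>
<body>
<tbody id="results_table">
    <tr><td>Ingredient A</td><td>5%</td></tr>
</tbody>
</body>
</html>"""

for line in find_lines(sample, "<td>"):
    print(line)
```

This works only when each piece of data sits on a predictable line. Once the markup is spread across lines or nested unpredictably, a real parser is the safer choice.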
lj300, posted December 12, 2014:
If you're using PHP, the PHP Simple HTML DOM Parser does this very easily: http://simplehtmldom.sourceforge.net/
Seahorsepip (Veteran), posted December 13, 2014:
I'd also recommend a DOM parser. Combine it with regex and you've got a powerful combination.
Mr. Black (author), posted December 14, 2014:
I'm not designing with PHP, so the DOM parser won't work. I'll take a look at the link Andre S. posted. Thanks :-)
Seahorsepip (Veteran), posted December 14, 2014:
Quoting Mr. Black: "I'm not designing with PHP, so the DOM parser won't work."
What are you using then? ASP? Node.js? If you use Node.js, or even do it client side, it's easier, since you can then use jQuery to filter the HTML object.
Edit: it seems you're writing for .NET, so I suppose it's an application or an ASP website :P
Max Norris, posted December 14, 2014:
Quoting Seahorsepip: "What are you using then? ASP? Node.js?"
HTML Agility Pack is a .NET library designed for this sort of thing, similar to Python's BeautifulSoup if you're familiar with it. That said, yes, its documentation is rather weak, but there are plenty of examples and answered questions on the web (Andre's link, for example, plus lots of demos out there to look at). It's a pretty easy library to use. The trickiest part is getting the node path (the XPath expression) just right so that it scrapes the correct data. Once you get that right, the rest is really simple. I've used it a couple of times.
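For comparison, here is roughly what "getting the node path right" amounts to, sketched with Python's standard-library `html.parser` rather than HTML Agility Pack, whose API is not shown here. The `ResultsTableParser` class and the sample markup are hypothetical. The parser tracks whether it is inside `<tbody id="results_table">` and collects the text of each `<td>`, grouped by `<tr>`. It only finds rows that are actually present in the HTML the server sends.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect <td> text, grouped by <tr>, inside the <tbody> with the given id (hypothetical helper)."""

    def __init__(self, table_id: str = "results_table") -> None:
        super().__init__()
        self.table_id = table_id
        self.in_table = False  # currently inside the target <tbody>?
        self.in_cell = False   # currently inside a <td> of that <tbody>?
        self.rows: list[list[str]] = []
        self._cell: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])  # start a new row
        elif self.in_table and tag == "td":
            self.in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tbody" and self.in_table:
            self.in_table = False  # assumes no nested <tbody> elements
        elif tag == "td" and self.in_cell:
            self.in_cell = False
            if self.rows:
                self.rows[-1].append("".join(self._cell).strip())

    def handle_data(self, data):
        if self.in_cell:
            self._cell.append(data)


# Hypothetical markup: only the second <tbody> should be read.
sample = """<table>
<tbody id="other"><tr><td>ignored</td></tr></tbody>
<tbody id="results_table">
<tr><td>Ingredient A</td><td>5%</td></tr>
<tr><td>Ingredient B</td><td>2.5%</td></tr>
</tbody>
</table>"""

parser = ResultsTableParser()
parser.feed(sample)
print(parser.rows)
```

In HTML Agility Pack the same selection would typically be written as an XPath expression along the lines of `//tbody[@id='results_table']/tr`, with each row's `td` children read in turn.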
Seahorsepip (Veteran), posted December 14, 2014:
I see a bit of a problem, by the way: #results_table is generated with JavaScript after the page has loaded. That means you have to scrape the JS block at the bottom of the page instead and interpret it :/ I'm currently looking for a simpler possibility.
Edit: nope. The values are put into the JS block at the bottom of the page and then used to calculate the actual content of the HTML table with Recipe() (recipe.js). So your only option is to scrape that block of JS code, interpret it, and do the calculations Recipe() does yourself...
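A rough sketch of the "scrape the JS block" route, under a hypothetical assumption: that the inline script assigns a JSON-compatible object literal to a variable. The variable name `recipeData`, the sample script, and the field names are invented for illustration. The real page's script, and the calculations performed by `Recipe()` in recipe.js, would still have to be inspected and reproduced by hand.

```python
import json
import re


def extract_js_object(html: str, var_name: str) -> dict:
    """Find `var <var_name> = {...};` in inline JavaScript and parse the object literal.

    Assumes the literal is valid JSON (double-quoted keys and strings, no functions,
    no trailing commas). Real scripts often are not, so treat this as a sketch.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;",
        re.DOTALL,  # allow the object literal to span several lines
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no object literal assigned to {var_name!r}")
    return json.loads(match.group(1))


# Hypothetical page fragment: the real script's variable and field names are unknown.
page = """<script src="recipe.js"></script>
<script>
var recipeData = {"ingredients": [{"name": "Ingredient A", "percent": 5.0}], "batch_ml": 30};
</script>"""

data = extract_js_object(page, "recipeData")
for ingredient in data["ingredients"]:
    print(ingredient["name"], ingredient["percent"])
```

The non-greedy `.*?` stops at the first `}` that is followed by a semicolon. That is good enough for a flat or lightly nested literal, but it is not a JavaScript parser.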
Mr. Black (author), posted December 20, 2014:
Quoting Seahorsepip: "#results_table is generated with JavaScript after the page has loaded... So your only option is to scrape that block of JS code, interpret it, and do the calculations Recipe() does yourself..."
Interesting that you found that. Thank you. If it's that difficult, I'll just abandon the idea; it's not worth that much hassle. Oh, and I'm developing a VB.NET application only. It was just going to pull data into a textbox in the program, so the DOM and PHP suggestions wouldn't apply.