
20 posts in this topic

Posted

Hey guys,
I wanted to work on a new project just for fun, and I thought a program that downloads imgur galleries would be cool. It mostly works, but after about 51 images the program usually starts downloading duplicates.

I've looked everywhere I can think of and can't find the problem. My guess is that the server is resetting my connection. I wanted a challenge, so I tried to get around it: I tried creating a new URL connection each time, and I also created a separate class that did the downloading and created a new object of that class every time, but neither worked.

Thoughts?

Source
http://shayconcepts.com/programming/ImgurGalleryDownloader/imgurDL.zip

Posted

You are parsing the HTML page. Don't do that. Use imgur's API instead - from the documentation, you can just append ".xml" to any gallery page and it returns the gallery as an XML file.

In general, don't parse HTML pages unless there is no other way. It's not easy and it can change any day, making your program useless.
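
A minimal Java sketch of the ".xml" approach, using only the JDK's built-in DOM parser. Appending ".xml" comes from the post above; the element name you pass in (for example "hash") is an assumption about imgur's XML layout, so check an actual response before relying on it. GalleryXml, toXmlUrl and textOfElements are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class GalleryXml {

    /** Appends ".xml" to a gallery page URL, dropping any trailing slash first. */
    static String toXmlUrl(String galleryUrl) {
        String base = galleryUrl.endsWith("/")
                ? galleryUrl.substring(0, galleryUrl.length() - 1)
                : galleryUrl;
        return base + ".xml";
    }

    /**
     * Returns the trimmed text of every element called tagName, in document order.
     * Which tag holds the image ID (e.g. "hash") is an assumption, not confirmed here.
     */
    static List<String> textOfElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Parser setup failed or the XML was malformed
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

In the real program you would hand the connection's InputStream straight to parse() instead of going through a String; the string version just makes the parsing easy to try offline.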

Posted

[quote name='Aethec' timestamp='1349382618' post='595225913']In general, don't parse HTML pages unless there is no other way.[/quote][url="http://stackoverflow.com/a/1732454/154766"]Especially not with regex[/url].

Posted

[quote name='Aethec' timestamp='1349382618' post='595225913']
You are parsing the HTML page. Don't do that. Use imgur's API instead - from the documentation, you can just append ".xml" to any gallery page and it returns the gallery as an XML file.

In general, don't parse HTML pages unless there is no other way. It's not easy and it can change any day, making your program useless.
[/quote]

Thanks! I'll take a look.

EDIT: it seems it only returns 56 images though :/

[quote name='Dr_Asik' timestamp='1349401869' post='595226475']
[url="http://stackoverflow.com/a/1732454/154766"]Especially not with regex[/url].
[/quote]
I learned about regex and it has worked really well for me! I'll take a look at your link and find a better way to parse HTML. Thanks!

Posted

[quote name='thatguyandrew1992' timestamp='1349410461' post='595226607']I learned about regex and it has worked really well for me! I'll take a look at your link and find a better way to parse HTML. Thanks![/quote]If you really need to parse HTML, you should use a proper HTML library; for Java there is HtmlUnit and Jsoup at least. Better would be to not parse any HTML and use imgur's API as Aethec suggested.

Posted

[quote name='Dr_Asik' timestamp='1349417150' post='595226741']
If you really need to parse HTML, you should use a proper HTML library; for Java there is HtmlUnit and Jsoup at least. Better would be to not parse any HTML and use imgur's API as Aethec suggested.
[/quote]
But when I use the API and append .xml, it doesn't show ALL the images in the gallery.

Posted

To get other pages, append /page/XXX to an existing gallery URL. XXX is the page number. Don't forget the .xml, too.
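
A sketch of walking the pages this way. Two things are assumptions, not confirmed in the thread: which number the first page has (so it is a parameter) and that a page with no images marks the end. fetchIds is a hypothetical hook that would download one page's XML and return its image IDs; a LinkedHashSet drops IDs that repeat across pages while keeping their order, which also guards against the duplicate problem from the first post.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class GalleryPager {

    /** Builds the URL for one page: <gallery>/page/<n>.xml */
    static String pageUrl(String galleryUrl, int page) {
        String base = galleryUrl.endsWith("/")
                ? galleryUrl.substring(0, galleryUrl.length() - 1)
                : galleryUrl;
        return base + "/page/" + page + ".xml";
    }

    /**
     * Requests pages firstPage, firstPage + 1, ... until a page yields no IDs
     * or maxPages pages have been read. Returns unique IDs in first-seen order.
     */
    static List<String> collectAll(String galleryUrl, int firstPage, int maxPages,
                                   Function<String, List<String>> fetchIds) {
        Set<String> seen = new LinkedHashSet<>();
        for (int page = firstPage; page < firstPage + maxPages; page++) {
            List<String> ids = fetchIds.apply(pageUrl(galleryUrl, page));
            if (ids.isEmpty()) {
                break; // assumed: an empty page means we ran past the last one
            }
            seen.addAll(ids); // duplicates across pages are ignored
        }
        return new ArrayList<>(seen);
    }
}
```

The maxPages cap is there so a server that keeps returning the same page can't send the loop around forever.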

Posted

[quote name='Aethec' timestamp='1349445902' post='595227419']
To get other pages, append /page/XXX to an existing gallery URL. XXX is the page number. Don't forget the .xml, too.
[/quote]
Thank you so much!! How did you find this? The API doc?

Posted

[quote name='thatguyandrew1992' timestamp='1349467142' post='595228327']
Thank you so much!! How did you find this? The API doc?
[/quote]
[url="http://api.imgur.com/gallery"]http://api.imgur.com/gallery[/url]

Posted

[quote name='ZakO' timestamp='1349467357' post='595228337']
[url="http://api.imgur.com/gallery"]http://api.imgur.com/gallery[/url]
[/quote]
Thanks!

Posted

So I'm using an XML parser with the /page/#.xml URLs and it's working perfectly! Superb help, thanks guys!

Now, if a server disconnects me, how can I re-establish the connection as if I had restarted my program?

Posted

Sometimes I get java.net.ConnectException: Connection timed out...

Posted

try {
// le magic
} catch (Exception e)
{
// re-establish connection
}

Posted

[quote name='ArialBlue' timestamp='1349748023' post='595234629']
try {
// le magic
} catch (Exception e)
{
// re-establish connection
}
[/quote][font=courier new,courier,monospace]catch (ConnectException e)[/font] ... Always catch the specific exception you expect, never the catch-all [font=courier new,courier,monospace]Exception[/font] base class.
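
A minimal sketch of what catching specific exceptions buys you, assuming the download goes through HttpURLConnection: different subclasses of IOException call for different reactions. DownloadErrors, Outcome and IoAction are hypothetical names for illustration.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class DownloadErrors {

    enum Outcome { OK, RETRY, SKIP, FAIL }

    /** Any download step that may throw an I/O error. */
    interface IoAction { void run() throws IOException; }

    /** Runs one step and picks a reaction based on the specific exception type. */
    static Outcome attempt(IoAction action) {
        try {
            action.run();
            return Outcome.OK;
        } catch (ConnectException | SocketTimeoutException e) {
            // Server unreachable or too slow (e.g. "Connection timed out"): worth another try
            return Outcome.RETRY;
        } catch (FileNotFoundException e) {
            // HttpURLConnection throws this for HTTP 404: the image is gone, retrying won't help
            return Outcome.SKIP;
        } catch (IOException e) {
            // Any other I/O problem: report it instead of looping blindly
            return Outcome.FAIL;
        }
    }
}
```

Note that the more specific catch clauses must come before catch (IOException e); the compiler rejects the reverse order.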

Posted

If I catch that ConnectException, how do I re-establish the connection?

Posted

If you're only downloading files from Imgur, you don't have to start or restart the connection - you just download files, and if the download fails, retry.
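
A sketch of "download, and if it fails, retry" in plain Java. Each call to download() opens a fresh HttpURLConnection, so there is no long-lived connection to restore. The class and method names, the 10-second timeouts and the doubling delay between attempts are all illustrative choices, not anything imgur requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RetryingDownloader {

    /** A task that may fail with an I/O error. */
    interface IoTask<T> { T call() throws IOException; }

    /** Runs task up to maxAttempts times, doubling the wait after each failure. */
    static <T> T withRetries(IoTask<T> task, int maxAttempts, long initialDelayMs)
            throws IOException, InterruptedException {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller decide
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    /** Downloads one file over a new connection; returns the number of bytes written. */
    static long download(String url, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000); // fail instead of hanging on connect
        conn.setReadTimeout(10_000);    // fail instead of hanging mid-download
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would look like withRetries(() -> download(imageUrl, path), 3, 500), where imageUrl and path come from your own code.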

Posted

[quote name='Aethec' timestamp='1349819345' post='595236593']
If you're only downloading files from Imgur, you don't have to start or restart the connection - you just download files, and if the download fails, retry.
[/quote]
Hmm ok. Thanks! I will see what I can do.

Posted

Ok guys, I need some more help. How do I get the xml of an album? They are apparently different than galleries! The xml page isn't coming up. Or if there isn't an xml, what is recommended to get the images?

Ex
http://imgur.com/a/HKtAR/

Posted

[quote name='thatguyandrew1992' timestamp='1350016078' post='595241781']
Ok guys, I need some more help. How do I get the xml of an album? They are apparently different than galleries! The xml page isn't coming up. Or if there isn't an xml, what is recommended to get the images?

Ex
[url="http://imgur.com/a/HKtAR/"]http://imgur.com/a/HKtAR/[/url]
[/quote]
It's all there in the API documentation: http://api.imgur.com/resources_anon#album

Example: http://api.imgur.com/2/album/HKtAR
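
A small sketch that turns an album page URL like http://imgur.com/a/HKtAR/ into the API URL, following the pattern in the example link above. The class and method names are hypothetical; the result would then go through the same XML parsing as the gallery pages.

```java
import java.net.URI;

public class AlbumUrls {

    /** Extracts the album ID from a page URL such as http://imgur.com/a/HKtAR/ */
    static String albumId(String albumPageUrl) {
        String path = URI.create(albumPageUrl).getPath(); // e.g. "/a/HKtAR/"
        String marker = "/a/";
        int start = path.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("Not an album URL: " + albumPageUrl);
        }
        String rest = path.substring(start + marker.length()); // e.g. "HKtAR/"
        int slash = rest.indexOf('/');
        String id = slash >= 0 ? rest.substring(0, slash) : rest;
        if (id.isEmpty()) {
            throw new IllegalArgumentException("No album ID in: " + albumPageUrl);
        }
        return id;
    }

    /** Builds the album API URL in the form http://api.imgur.com/2/album/<id> */
    static String albumApiUrl(String albumPageUrl) {
        return "http://api.imgur.com/2/album/" + albumId(albumPageUrl);
    }
}
```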

Posted

[quote name='ZakO' timestamp='1350050749' post='595242513']
It's all there in the API documentation: [url="http://api.imgur.com/resources_anon#album"]http://api.imgur.com...rces_anon#album[/url]

Example: [url="http://api.imgur.com/2/album/HKtAR"]http://api.imgur.com/2/album/HKtAR[/url]
[/quote]
Thanks! I'm still learning how to use APIs, and I know this is all obvious. I thought I had looked through the docs. I'll be sure to look extra carefully next time! >.<
