
Java Question (Should be easy)


Question

Could someone look at my code and tell me why findWithinHorizon is returning more than my regular expression is looking for? I can't figure it out. I think if I take out the file extension part it will work, but with it in, the match takes in that whole line, including extra HTML code. Enter 2776 when the program starts. The program downloads an HTML page and searches it for my regular expression, but it isn't working! This code is basically another program I wrote that DID work, but when I changed it to work for C&H it stopped working :/

Source code:

http://pastebin.com/cxTv30yW

THANKS!


  • 0

I haven't tested it, but I suspect it's because you're using a greedy match in your regex. Make it non-greedy by adding ? after .*:

[CODE]theImage = sc2.findWithinHorizon("[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]", 0);[/CODE]

I've never used Java, so I'm unfamiliar with its regex matcher, but it looks like (?i) can make part of a pattern case-insensitive, so you could write the extension part more readably too:

[CODE]theImage = sc2.findWithinHorizon("www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)", 0);[/CODE]
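To try the suggested pattern without downloading a page, here is a small sketch that runs it through Scanner.findWithinHorizon on a made-up HTML snippet (the ExtensionMatchDemo name and the Artist/example.PNG path are hypothetical). Because this pattern starts at www, the returned match leaves off the http:// prefix.

```java
import java.util.Scanner;

// Sketch: find the first comic image URL in some HTML using the lazy,
// case-insensitive pattern suggested above. The HTML below is invented.
class ExtensionMatchDemo {
    // (?i) turns on case-insensitive matching from that point onward,
    // so only the extension part ignores case.
    static final String PATTERN =
        "www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)";

    // Returns the first match, or null if the pattern is not found.
    static String firstImage(String html) {
        try (Scanner sc = new Scanner(html)) {
            // A horizon of 0 means "search the remaining input without a limit".
            return sc.findWithinHorizon(PATTERN, 0);
        }
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://www.explosm.net/db/files/Comics/Artist/example.PNG\">"
                    + "<a href=\"/comics/2777/\">Next</a>";
        System.out.println(firstImage(html)); // www.explosm.net/db/files/Comics/Artist/example.PNG
    }
}
```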

  • 0

EDIT:

.*?

I think this worked. I'll post back in a bit.


I'll try that!

I don't have the ability to compile your code right now, so I can't give you a straight answer. But I can help you debug your code yourself!

What's being stored in theImage after findWithinHorizon runs? What was the expected value?

This is what's being stored in theImage

http://pastebin.com/PirKMMXU

This is what I want

http://www.explosm.n...s/Kris/well.png

Why would it grab so much extra? Is it because something matching that regex appears again later in that line?

  • 0

By default, regex quantifiers are greedy; adding ? makes them lazy.

For example, given the string abcdabc, the pattern .*b matches up to the last b character (abcdab), but .*?b gives just ab.
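The abcdabc example can be checked directly with java.util.regex. This minimal sketch (GreedyLazyDemo is a made-up name) runs both patterns and returns the first match of each.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reproduces the abcdabc example: the same pattern with and without
// the lazy ? after .*
class GreedyLazyDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "abcdabc";
        System.out.println(firstMatch(".*b", s));  // abcdab (greedy: runs to the last b)
        System.out.println(firstMatch(".*?b", s)); // ab (lazy: stops at the first b)
    }
}
```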

In your case, it matched the beginning of the URL and then ran on to the last jpg, jpeg, gif or png on the line. Adding the ? makes it stop at the first one.
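To see the overrun on something closer to the real page, here is a sketch that runs the bracketed pattern from the question, with and without the ?, over a made-up line containing two comic images (the A/one.png and B/two.gif paths and the OverrunDemo name are hypothetical; the real page layout may differ). Without the ?, the match runs from the first http:// to the last matching extension on the line.

```java
import java.util.Scanner;

// Compares the pattern without and with the lazy ? on one line of fake HTML.
class OverrunDemo {
    static final String GREEDY =
        "[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*[.][jJpPgG][pPnNiI][eE]?[gGfF]";
    static final String LAZY =
        "[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]";

    // First match of regex anywhere in input, or null.
    static String find(String regex, String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findWithinHorizon(regex, 0);
        }
    }

    public static void main(String[] args) {
        String line = "<img src=\"http://www.explosm.net/db/files/Comics/A/one.png\">"
                    + "<img src=\"http://www.explosm.net/db/files/Comics/B/two.gif\">";
        System.out.println(find(GREEDY, line)); // swallows the first tag and ends at ...B/two.gif
        System.out.println(find(LAZY, line));   // stops at ...A/one.png
    }
}
```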

There's a better explanation here: http://www.regular-expressions.info/repeat.html

  • 0


Oh that makes sense!

OK, so at page 2717 the link does NOT contain www, so I changed my regular expression.
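The revised pattern isn't shown, so here is one hypothetical way to make the www. prefix optional: wrap it in a non-capturing group followed by ?. This sketch reuses the lazy, case-insensitive idea suggested earlier in the thread; OptionalWwwDemo and the sample URLs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pattern: (?:www\.)? allows the "www." prefix zero or one
// times, so links with and without it are both found.
class OptionalWwwDemo {
    static final Pattern IMAGE = Pattern.compile(
        "http://(?:www\\.)?explosm\\.net/db/files/Comics/.*?\\.(?i)(?:jpe?g|gif|png)");

    // Returns the first image URL in the text, or null if there is none.
    static String firstImage(String html) {
        Matcher m = IMAGE.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstImage("src=\"http://www.explosm.net/db/files/Comics/A/one.png\""));
        System.out.println(firstImage("src=\"http://explosm.net/db/files/Comics/B/two.JPG\""));
    }
}
```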

  • 0

EDIT: I think this fixed it:

[CODE]finally {
    //System.out.println("hi");
    continue; // also discards any exception still propagating from the try block
}[/CODE]
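A caveat about this fix, offered as a sketch: when a finally block ends with continue, the try statement completes with that continue, and any exception thrown in the try block is silently discarded (javac accepts this; -Xlint:finally warns about it). The hypothetical FinallyContinueDemo below shows a loop running straight past a simulated missing page this way.

```java
// Sketch: continue inside finally swallows the exception from the try block.
class FinallyContinueDemo {
    // Counts how many pages the loop visits; missingPage is simulated.
    static int pagesVisited(int[] pages, int missingPage) {
        int visited = 0;
        for (int page : pages) {
            try {
                if (page == missingPage) {
                    throw new IllegalStateException("page " + page + " does not exist");
                }
            } finally {
                visited++;
                continue; // the IllegalStateException above is discarded here
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(pagesVisited(new int[] {2716, 2717, 2718}, 2717)); // 3
    }
}
```

A catch block that handles the specific exception and then calls continue reaches the same place while keeping unexpected exceptions visible.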

OK, so I have another question. Sometimes a page number doesn't exist, which causes an error. I try to catch the error and then move on. The catch block catches it, but then the catch keeps looping. I added a scanner.next(); to consume the bad input. That stops the catch from looping (or, I guess, stops the program from looping on bad input), but then the program just stops in the catch and doesn't continue past the scanner.next(). What can I do? I set up the next link in the catch block, so I need the program to leave the catch block and start over at the top of the while loop.

http://pastebin.com/Kpajd6ER
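A minimal sketch under stated assumptions: fetchPage is a hypothetical stand-in for the download step, and missing pages are simulated with a set of page numbers. It shows two things: a continue at the end of a catch block jumps straight back to the top of the while loop, and Scanner.nextInt() leaves a token it cannot parse in place, so next() is needed to discard it. If the Scanner is reading System.in, next() blocks until another token is typed, which might look like the program stopping.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class SkipAndContinueDemo {
    // Hypothetical stand-in for downloading a page; missing pages throw.
    static String fetchPage(int page, Set<Integer> missing) throws FileNotFoundException {
        if (missing.contains(page)) {
            throw new FileNotFoundException("no page " + page);
        }
        return "<html>page " + page + "</html>";
    }

    // Visits pages first..last and returns the ones that were downloaded.
    static List<Integer> crawl(int first, int last, Set<Integer> missing) {
        List<Integer> downloaded = new ArrayList<>();
        int page = first;
        while (page <= last) {
            String html;
            try {
                html = fetchPage(page, missing);
            } catch (FileNotFoundException e) {
                page++;   // set up the next page
                continue; // back to the top of the while loop
            }
            // A real version would search html for the image here.
            downloaded.add(page);
            page++;
        }
        return downloaded;
    }

    // Reads integers, skipping tokens that are not numbers.
    static List<Integer> readNumbers(Scanner sc) {
        List<Integer> numbers = new ArrayList<>();
        while (sc.hasNext()) {
            int n;
            try {
                n = sc.nextInt();
            } catch (InputMismatchException e) {
                sc.next(); // nextInt() did not consume the bad token, so discard it
                continue;
            }
            numbers.add(n);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(crawl(2716, 2719, Set.of(2717)));           // [2716, 2718, 2719]
        System.out.println(readNumbers(new Scanner("2776 abc 2777"))); // [2776, 2777]
    }
}
```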

  • 0

I have to ask: where did you learn to put square brackets around almost every character in your regex pattern? You don't need to, and it just makes the pattern harder to read. Also, it doesn't matter much in your pattern, but you should escape the . in "explosm.net": an unescaped full stop has a special meaning, match any one character.

Good job overall though!
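A quick way to see why the unescaped full stop matters: Pattern.matches checks the whole string, and with a bare . the pattern also accepts a made-up host like explosmXnet. EscapeDotDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

// Shows the difference between an unescaped and an escaped full stop.
class EscapeDotDemo {
    // True if the whole of s matches regex.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("explosm.net", "explosm.net"));   // true
        System.out.println(matchesWhole("explosm.net", "explosmXnet"));   // true: . matches any character
        System.out.println(matchesWhole("explosm\\.net", "explosmXnet")); // false: \. matches only a dot
    }
}
```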

  • 0


Are you referring to the http://www part? I did the www that way because not all the links have it. I did that to the http while I was trying to debug and just left it like that. If you're referring to my file extensions, I think those are needed.

  • 0


I meant the http part. Square brackets only matter when you use them like [ab], which matches a or b, but you have them around single characters: [h][t][t][p][:][/][/]. It's not a problem; I was just curious because I've never seen anyone do that before.
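A small sketch (CharClassDemo is a made-up name) comparing a single-character class with the plain character, and a two-character class like [ab]:

```java
import java.util.regex.Pattern;

// A one-character class like [h] matches exactly what h alone matches;
// a class with several characters matches any ONE of them.
class CharClassDemo {
    // True if the whole of input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[h][t][t][p][:][/][/]", "http://")); // true
        System.out.println(matches("http://", "http://"));               // true
        System.out.println(matches("[ab]", "a"));                        // true
        System.out.println(matches("[ab]", "b"));                        // true
        System.out.println(matches("[ab]", "ab"));                       // false: one character only
    }
}
```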

  • 0


Just a debugging thing, but I left it in haha
