• 0

Java Question (Should be easy)


Question

Could someone look at my code and tell me why findWithinHorizon is returning more than what my regular expression is looking for? I cannot figure it out. I think it would work if I took the file extension part out, but with it in there the match takes in the whole line, extra HTML included. Enter 2776 when the program starts. The program will download an HTML page and search it for my regular expression, but it isn't working! This code is basically another program I made that DID work, but it stopped working when I changed it for C&H :/

Source code-

http://pastebin.com/cxTv30yW

THANKS!

https://www.neowin.net/forum/topic/1072709-java-question-should-be-easy/

12 answers to this question

  • 0

I haven't tested it, but I suspect it's because you're using a greedy match in your regex. Change it to be non-greedy by adding ? after .*

[CODE]theImage = sc2.findWithinHorizon("[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]", 0);[/CODE]

I've never used Java so I'm unfamiliar with its regex matcher but it looks like (?i) can make a part of a pattern case insensitive, so you could do something a bit easier to read for the extension too:

[CODE]theImage = sc2.findWithinHorizon("www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)", 0);[/CODE]
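To see the difference in isolation, here is a minimal sketch, assuming the HTML line contains more than one image URL. The sample string and the file names in it are made up for illustration; only findWithinHorizon and the pattern pieces come from the posts above.

```java
import java.util.Scanner;

class HorizonDemo {
    // Hypothetical sample input: two image URLs on the same line of HTML.
    static final String HTML =
        "<img src=\"http://www.explosm.net/db/files/Comics/a/first.png\">"
      + "<img src=\"http://www.explosm.net/db/files/Comics/b/second.GIF\">";

    // Returns the first match of regex in HTML; horizon 0 means "no search limit".
    static String find(String regex) {
        try (Scanner sc = new Scanner(HTML)) {
            return sc.findWithinHorizon(regex, 0);
        }
    }

    public static void main(String[] args) {
        String prefix = "http://www\\.explosm\\.net/db/files/Comics/";
        String ext = "\\.(?i)(jpe?g|gif|png)"; // (?i) makes the extension case-insensitive

        // Greedy: runs on to the LAST extension on the line (ends in second.GIF).
        System.out.println(find(prefix + ".*" + ext));
        // Lazy: stops at the FIRST extension (ends in first.png).
        System.out.println(find(prefix + ".*?" + ext));
    }
}
```

The greedy version returns everything from the first URL through the end of the second one, which is the "extra HTML" symptom described in the question.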

  • 0

EDIT****

.*?

I think this worked. I'll post back in a bit

I'll try that!

I don't have the ability to compile your code right now, so I can't give you a straight answer. But I can help you debug your code yourself!

What's being stored in theImage after findWithinHorizon is run? What was the expected value?

This is what's being stored in theImage

http://pastebin.com/PirKMMXU

This is what I want

http://www.explosm.n...s/Kris/well.png

Why would it grab so much extra? Is it because that regex appears in that line again?

  • 0

By default, quantifiers in regex are greedy; adding ? makes them lazy.

For example, given the string abcdabc, the pattern .*b matches up to the last b character (abcdab), while .*?b matches just ab.

In your case it was matching the beginning of the URL and then running on to the last occurrence of jpg, jpeg, gif or png. Adding the ? makes it stop at the first one.

There's a better explanation here: http://www.regular-expressions.info/repeat.html
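The abcdabc example can be checked directly in Java. This is a small sketch using java.util.regex; the class and helper names are just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(".*b", "abcdabc"));  // greedy: abcdab
        System.out.println(firstMatch(".*?b", "abcdabc")); // lazy:   ab
    }
}
```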

  • 0

Oh that makes sense!

OK, so at page 2717 the link does NOT contain www, so I changed my regular expression.
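The changed expression isn't shown in the thread. One possible way to accept links with or without www is an optional group, sketched below. The sample inputs are invented, and this is not necessarily what the poster ended up using.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalWww {
    // (?:www\.)? makes the "www." prefix optional; the rest follows the pattern suggested earlier.
    static final Pattern IMAGE_URL = Pattern.compile(
        "http://(?:www\\.)?explosm\\.net/db/files/Comics/.*?\\.(?i)(?:jpe?g|gif|png)");

    // Returns the first image URL found in html, or null if there is none.
    static String find(String html) {
        Matcher m = IMAGE_URL.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, one with "www." and one without.
        System.out.println(find("<img src=\"http://www.explosm.net/db/files/Comics/a/one.png\">"));
        System.out.println(find("<img src=\"http://explosm.net/db/files/Comics/b/two.jpg\">"));
    }
}
```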

  • 0

EDIT:

I think this fixed it:

[CODE]finally {
    // System.out.println("hi");
    continue;
}[/CODE]

OK, so I have another question. Sometimes a page number doesn't exist, which causes an error. I try to catch the error and then move along. The catch block catches it, but then the catch loops. I added a scanner.next(); to consume the bad input. That stops the catch from looping (or I guess the program from looping on bad input), but then it just sits in the catch and doesn't continue past the scanner.next(). What can I do? I set up the next link in the catch block, so I need the program to leave the catch block and start over at the top of the while loop.

http://pastebin.com/Kpajd6ER
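For reference, a common shape for "skip the bad page and keep going" is to advance the page counter after the try/catch, so the loop moves on whether or not the fetch failed. The sketch below is not the pastebin code: fetchPage is a hypothetical stand-in that pretends every third page is missing. One thing to keep in mind about the finally { continue; } approach is that a continue inside finally also discards any exception that was not caught.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SkipMissingPages {
    // Hypothetical stand-in for the download step: pretends every third page is missing.
    static String fetchPage(int page) throws IOException {
        if (page % 3 == 0) {
            throw new IOException("page " + page + " not found");
        }
        return "<html>page " + page + "</html>";
    }

    // Tries each page from first to last and returns the numbers that were fetched.
    static List<Integer> crawl(int first, int last) {
        List<Integer> fetched = new ArrayList<>();
        int page = first;
        while (page <= last) {
            try {
                fetchPage(page);
                fetched.add(page);
            } catch (IOException e) {
                // Log and fall through; nothing here waits for input.
                System.err.println("Skipping " + page + ": " + e.getMessage());
            }
            page++; // runs after success or failure, so the loop always advances
        }
        return fetched;
    }

    public static void main(String[] args) {
        System.out.println(crawl(2776, 2781));
    }
}
```

Because the increment sits outside the try/catch, no continue is needed, and a failed page cannot cause the same page to be retried forever.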

  • 0

I have to ask... where did you learn to put square brackets around almost every character in your regex pattern? You don't need to do that; it just makes the pattern harder to read. Also, it doesn't matter much in your pattern, but you should be escaping the . in "explosm.net": an unescaped full stop has a special meaning, match any single character.

Good job overall though!
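Both points are easy to verify with a quick sketch. The test strings are made up, and Pattern.matches requires the whole input to match.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // True if the entire input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dot matches any character, so "explosmXnet" slips through.
        System.out.println(matches("explosm.net", "explosmXnet"));   // true
        // Escaped dot matches only a literal full stop.
        System.out.println(matches("explosm\\.net", "explosmXnet")); // false
        System.out.println(matches("explosm\\.net", "explosm.net")); // true

        // A character class with a single member is the same as the bare character.
        System.out.println(matches("[h][t][t][p][:][/][/]", "http://")); // true
        System.out.println(matches("http://", "http://"));               // true
    }
}
```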

  • 0

Are you referring to the http://www part? I did the www because not all the links have that. And I just did that to the http when I was trying to debug; I just left it like that. If you are referring to my file extensions, I think it's needed.

  • 0

I meant the http part. Square brackets only have meaning if you use them like [ab] which matches a or b, but you have it around single characters: [h][t][t][p][:][/][/]. It's not a problem, I was just interested because I've never seen anyone do that before.

  • 0

Just a debugging thing but left it haha

This topic is now closed to further replies.