• 0

Java Question (Should be easy)


Question

Could someone look at my code and tell me why findInHorizon is giving me more than what my regular expression is looking for? I cannot figure it out. I think when I take the file extension part out, it will work, but when it's there takes in that whole line with extra html code. Enter 2776 when the program starts. The program will download an HTML page and look through for my regular expression, but it isn't working! This code is basically another program I made that DID work, but now when I changed it to work for C&H it stopped :/

Source code-

http://pastebin.com/cxTv30yW

THANKS!

Link to comment
https://www.neowin.net/forum/topic/1072709-java-question-should-be-easy/
Share on other sites

12 answers to this question

Recommended Posts

  • 0

I haven't tested it but I suspect it'll be because you're using a greedy match in your regex, change it to be non-greedy using ? after .*

theImage = sc2.findWithinHorizon("[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]", 0);[/CODE]

I've never used Java so I'm unfamiliar with its regex matcher but it looks like (?i) can make a part of a pattern case insensitive, so you could do something a bit easier to read for the extension too:

[CODE]theImage = sc2.findWithinHorizon("www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)", 0);[/CODE]

  • 0

EDIT****

.*?

I think this worked. I'll post back in a bit

I haven't tested it but I suspect it'll be because you're using a greedy match in your regex, change it to be non-greedy using ? after .*

theImage = sc2.findWithinHorizon("[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]", 0);[/CODE]

I've never used Java so I'm unfamiliar with its regex matcher but it looks like (?i) can make a part of a pattern case insensitive, so you could do something a bit easier to read for the extension too:

[CODE]theImage = sc2.findWithinHorizon("www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)", 0);[/CODE]

I'll try that!

I don?t have the ability to compile your code right now, so I can?t give you a straight answer. But, I can help you to debug your code yourself!

What?s being stored in theImage after findWithinHorizon is run? What was the expected value?

This is what's being stored in theImage

http://pastebin.com/PirKMMXU

This is what I want

http://www.explosm.n...s/Kris/well.png

Why would it grab so much extra? Is it because that regex appears in that line again?

  • 0

By default quantifiers in regex are greedy, adding ? makes them lazy.

For example: if you had the string abcdabc using a pattern of .*b would match up until the last b character: abcdab but using .*?b would give just ab

In your case it was matching the beginning of the URL, and then trying to find the last match for jpg, jpeg, gif or png. Adding the ? makes it stop on the first match.

There's a better explanation here: http://www.regular-expressions.info/repeat.html

  • 0

By default quantifiers in regex are greedy, adding ? makes them lazy.

For example: if you had the string abcdabc using a pattern of .*b would match up until the last b character: abcdab but using .*?b would give just ab

In your case it was matching the beginning of the URL, and then trying to find the last match for jpg, jpeg, gif or png. Adding the ? makes it stop on the first match.

Oh that makes sense!

Ok so at page 2717 the link does NOT contain www. So I changed my reg expression

  • 0

EDIT;

I think this fixed it

finally{

//System.out.println("hi");

continue;

}

Ok so I have another question. Sometimes the pages numbers dont exist which causes an error. I try to catch the error, then move along. The catch block catches it but loops the catch. I added a scanner.next(); to capture the bad input. That will stop the catch from looping (or I guess the program from looping with bad input) but then it just stops in the catch and doesn't continue past the scanner.next(). What can I do? I setup the next link in the catch block so I need the the program to leave the catch block and start over at the top of the while loop.

http://pastebin.com/Kpajd6ER

  • 0

I have to ask... where did you learn to put square brackets around almost every character in your regex pattern? You don't need to do that, it's just making it harder to read. Also, it doesn't matter much in your pattern, but you should be escaping the . in "explosm.net", the full-stop has a special meaning, match any character.

Good job overall though!

  • 0

I have to ask... where did you learn to put square brackets around almost every character in your regex pattern? You don't need to do that, it's just making it harder to read. Also, it doesn't matter much in your pattern, but you should be escaping the . in "explosm.net", the full-stop has a special meaning, match any 1 character.

Good job overall though!

Are you referring to the http://www part? I did the www because not all the links have that. And I just did that to the http when I was trying to debug. i just let it like that. If you are referring to my file extensions, I think it's needed.

  • 0

Are you referring to the http://www part? I did the www because not all the links have that. And I just did that to the http when I was trying to debug. i just let it like that. If you are referring to my file extensions, I think it's needed.

I meant the http part. Square brackets only have meaning if you use them like [ab] which matches a or b, but you have it around single characters: [h][t][t][p][:][/][/]. It's not a problem, I was just interested because I've never seen anyone do that before.

  • 0

I meant the http part. Square brackets only have meaning if you use them like [ab] which matches a or b, but you have it around single characters: [h][t][t][p][:][/][/]. It's not a problem, I was just interested because I've never seen anyone do that before.

Just a debugging thing but left it haha

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • JetBrains is working to cut false positives in RustRover 2026.2 by David Uzondu Recently, JetBrains released the fifth EAP build of its dedicated IDE, RustRover 2026.2, bringing improvements like a Run gutter icon for criterion_main! macro benchmarking and a feature that alerts you when there are unused traits in your current scope. Now, the company is out with a blog post addressing one of the "most common" complaints from users: false positives. In RustRover, a false positive occurs when the editor incorrectly highlights something as an error even though the project compiles and runs successfully. This mismatch flags a gap between the IDE's internal intelligence and the actual compiler. When the editor flashes red warnings over perfectly valid code, developers lose trust in the tool, which stalls momentum. Traditionally, RustRover runs cargo check to detect compiler errors and warnings, but it also relies on its own code analysis engine to power real-time features. To provide quick feedback, this engine parses your source code into a syntax tree while inferring types and resolving names as you type. Because this engine must work on broken, half-written code and react instantly, its logic sometimes diverges from the compiler's, producing false positives that do not exist in the compiler's eyes. JetBrains said that it has a "dedicated task force" focused specifically on identifying and fixing false positives by analyzing user reports and examining large-scale open-source projects. To speed up this process, the team built an internal system modeled after Crater, the famous Rust project that compiles and runs tests for every single crate published on crates.io. This automated pipeline compares the diagnostics from RustRover's analysis with actual compiler output to catch discrepancies before they reach users, ensuring smoother workflows. RustRover, for those who're unaware, is a dedicated IDE designed specifically for Rust developers. It's been around for a couple of years now, providing features like built-in debugging via LLDB, seamless cargo integration, advanced macro expansion, and HTML support. JetBrains distributes the app under two licensing models: a paid commercial subscription and a free option for non-commercial use.
    • Last year I bought the 2TB variant for $114 on Amazon. That's crazy that the 1TB is now 67% more expensive for half the storage, even with the newer T9 already on the market. And that's considered a good deal.
    • You can disable all non needed features from Brave. There is also Brave Origin which removes them entirely and it is free for Linux.
    • I wish I could use Brave but the tab suspension feature is horrible. It doesn't suspend them like Edge does. Even after 2h open with 70+ tabs (same as Edge), it has 2GB more consumption than Edge for no reason.
    • TeamViewer 15.78.4.0 by Razvan Serea TeamViewer is the fast, simple and friendly solution for remote access over the Internet - all applications in one single, very affordable module. Remote control of computers over the Internet, Instantly take control over a computer anywhere on the Internet, even through firewalls. No installation required, just use it fast and secure. Training, sales and teamwork, TeamViewer can also be used to present your desktop to a partner on the Internet. Show and share your software, PowerPoint presentations etc. File transfer, chat and more, Share your files, chat, switch the direction during a teamwork session, and a lot more is included in TeamViewer. TeamViewer key features: Cross-platform remote access (Windows, macOS, Linux, Android, iOS, IoT) Attended and unattended remote control Secure file transfer between devices Remote printing to local printers Multi-monitor support with easy switching Wake-on-LAN for sleeping devices Session links for quick connections (no password sharing) Web client access (no installation needed) End-to-end encryption (AES-256) Two-factor authentication and access controls AI-powered session insights and reporting Mass deployment and device management tools Customizable allow/block lists for security Command line and script execution remotely Performance monitoring and analytics dashboards TeamViewer 15.78.4.0 changelog: Improvements Permissions inheritance has been improved, increasing reliability when permissions are assigned to user group managers. Bugfixes Fixed a bug where 'Show details' button was not showing up on command bar upon selection of a device group. Fixed a bug which was causing the legacy groups to disappear when applying hide offline filter in basic view. Fixed a bug where devices were loading infinitely after login. Fixed a bug which was causing crash in application. Download: TeamViewer 15.78.4.0 | 32-bit | Portable | Mac | ~70.0 MB (Free for personal use) View: TeamViewer Home Page | Release Notes | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
  • Recent Achievements

    • One Year In
      Primer1st earned a badge
      One Year In
    • Experienced
      JayZJay went up a rank
      Experienced
    • Reacting Well
      Sir_Timbit earned a badge
      Reacting Well
    • Week One Done
      rubentuben8 earned a badge
      Week One Done
    • Week One Done
      ARaclen earned a badge
      Week One Done
  • Popular Contributors

    1. 1
      +primortal
      524
    2. 2
      PsYcHoKiLLa
      231
    3. 3
      Edouard
      135
    4. 4
      ATLien_0
      88
    5. 5
      Steven P.
      82
  • Tell a friend

    Love Neowin? Tell a friend!