Java Question (Should be easy)


Question

Could someone look at my code and tell me why findWithinHorizon is returning more than my regular expression should match? I can't figure it out. I think it will work if I take the file extension part out, but with it in place the match takes in the whole line, including extra HTML code. Enter 2776 when the program starts. The program downloads an HTML page and searches it with my regular expression, but it isn't working! This code is based on another program I wrote that DID work, but it stopped working once I changed it for C&H :/

Source code-

http://pastebin.com/cxTv30yW

THANKS!

https://www.neowin.net/forum/topic/1072709-java-question-should-be-easy/

12 answers to this question

  • 0

I haven't tested it, but I suspect it's because you're using a greedy match in your regex. Change it to be non-greedy by adding ? after .*

[CODE]theImage = sc2.findWithinHorizon("[h][t][t][p][:][/][/]www.explosm.net/db/files/Comics/.*?[.][jJpPgG][pPnNiI][eE]?[gGfF]", 0);[/CODE]

I've never used Java, so I'm unfamiliar with its regex matcher, but it looks like (?i) can make part of a pattern case-insensitive, so you could write the extension in a way that's a bit easier to read:

[CODE]theImage = sc2.findWithinHorizon("www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)", 0);[/CODE]
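For anyone who wants to check the (?i) idea without downloading a page, here is a minimal sketch using java.util.regex directly. The class name, the helper firstImage, and the HTML fragment are made up for illustration; only the pattern comes from the post above. In Java an inline (?i) turns on case-insensitive matching from that point onward, so the host and path stay case-sensitive while the extension does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveExtensionDemo {
    // Same pattern as suggested above; (?i) only affects what follows it.
    static final Pattern IMAGE_URL = Pattern.compile(
            "www\\.explosm\\.net/db/files/Comics/.*?\\.(?i)(jpe?g|gif|png)");

    /** Returns the first match in the text, or null when nothing matches. */
    static String firstImage(String text) {
        Matcher m = IMAGE_URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical HTML fragment, not copied from the real page.
        String html = "<img src=\"http://www.explosm.net/db/files/Comics/Kris/well.PNG\"> "
                + "<a href=\"other.png\">";
        System.out.println(firstImage(html));
    }
}
```

The upper-case extension is still found, and the lazy .*? stops at the first extension instead of running on to other.png.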

  • 0

EDIT****

.*?

I think this worked. I'll post back in a bit

I'll try that!

I don't have the ability to compile your code right now, so I can't give you a straight answer. But I can help you debug your code yourself!

What's being stored in theImage after findWithinHorizon is run? What was the expected value?

This is what's being stored in theImage

http://pastebin.com/PirKMMXU

This is what I want

http://www.explosm.n...s/Kris/well.png

Why would it grab so much extra? Is it because that regex appears in that line again?

  • 0

By default, quantifiers in regex are greedy; adding ? after one makes it lazy.

For example, given the string abcdabc, the pattern .*b matches everything up to the last b character (abcdab), while .*?b matches only up to the first one (ab).

In your case it was matching the beginning of the URL and then running on to the last occurrence of jpg, jpeg, gif or png on the line. Adding the ? makes it stop at the first one.

There's a better explanation here: http://www.regular-expressions.info/repeat.html
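The abcdabc example is easy to try in Java. This is only a small sketch; the class name and the firstMatch helper are invented for the demo and are not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    /** Returns the first substring matched by regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* grabs as much as it can, then backs off to the last b.
        System.out.println(firstMatch(".*b", "abcdabc"));   // prints abcdab
        // Lazy: .*? grabs as little as it can, so it stops at the first b.
        System.out.println(firstMatch(".*?b", "abcdabc"));  // prints ab
    }
}
```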

  • 0

Oh that makes sense!

OK, so at page 2717 the link does NOT contain www, so I changed my regular expression.
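The changed expression isn't shown in the thread, so the following is only a guess at one way to do it: wrap the www. part in an optional group, (?:www\\.)?, so links with and without it both match. The class name, the firstImage helper, and the two sample links are assumptions made up for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalWwwDemo {
    // (?:www\\.)? makes the "www." prefix optional without capturing it.
    static final Pattern IMAGE_URL = Pattern.compile(
            "http://(?:www\\.)?explosm\\.net/db/files/Comics/.*?\\.(?i)(?:jpe?g|gif|png)");

    /** Returns the first image link found in the HTML, or null when there is none. */
    static String firstImage(String html) {
        Matcher m = IMAGE_URL.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical fragments standing in for two different comic pages.
        System.out.println(firstImage("<img src=\"http://www.explosm.net/db/files/Comics/a/one.png\">"));
        System.out.println(firstImage("<img src=\"http://explosm.net/db/files/Comics/b/two.gif\">"));
    }
}
```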

  • 0

EDIT:

I think this fixed it

[CODE]finally {
    //System.out.println("hi");
    continue;
}[/CODE]

OK, so I have another question. Sometimes a page number doesn't exist, which causes an error. I try to catch the error and then move along. The catch block catches it, but then the catch keeps looping. I added a scanner.next(); to consume the bad input. That stops the catch from looping (or, I guess, stops the program from looping on the bad input), but then it just stops in the catch and doesn't continue past the scanner.next(). What can I do? I set up the next link in the catch block, so I need the program to leave the catch block and start over at the top of the while loop.

http://pastebin.com/Kpajd6ER
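Without the pastebin code to hand, this is only a sketch of the general shape, not a fix for the actual program. The idea is to advance to the next page number before the try block, so a failure can never retry the same page, and to put the continue in the catch rather than in a finally (a continue inside finally also throws away any exception that was still in flight, which tends to hide real bugs). PageFetcher, fetchRange, and the fake fetcher are hypothetical names standing in for the real download code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SkipMissingPagesDemo {
    /** Hypothetical stand-in for the code that downloads one comic page. */
    interface PageFetcher {
        String fetch(int page) throws IOException;
    }

    /** Fetches every page in [first, last], skipping pages whose download fails. */
    static List<String> fetchRange(int first, int last, PageFetcher fetcher) {
        List<String> pages = new ArrayList<>();
        int page = first;
        while (page <= last) {
            int current = page;
            page++; // advance first, so a failure cannot repeat the same page forever
            try {
                pages.add(fetcher.fetch(current));
            } catch (IOException e) {
                System.err.println("Skipping page " + current + ": " + e.getMessage());
                continue; // jump back to the top of the while loop
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Fake fetcher for the demo: pretend page 2777 does not exist.
        PageFetcher fake = p -> {
            if (p == 2777) {
                throw new FileNotFoundException("no such page");
            }
            return "html for " + p;
        };
        System.out.println(fetchRange(2776, 2778, fake)); // prints [html for 2776, html for 2778]
    }
}
```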

  • 0

I have to ask... where did you learn to put square brackets around almost every character in your regex pattern? You don't need to do that; it just makes the pattern harder to read. Also, it doesn't matter much in your pattern, but you should escape the . in "explosm.net": an unescaped full stop has a special meaning, matching any one character.

Good job overall though!

  • 0

Are you referring to the http://www part? I did that for the www because not all the links have it. I only did it to the http while I was trying to debug, and I just left it like that. If you are referring to my file extensions, I think it's needed.

  • 0

I meant the http part. Square brackets only have meaning if you use them like [ab] which matches a or b, but you have it around single characters: [h][t][t][p][:][/][/]. It's not a problem, I was just interested because I've never seen anyone do that before.
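Both points, brackets around single characters and the unescaped dot, can be checked quickly with Pattern.matches, which tests whether the whole input matches. This is just an illustrative sketch; the class name, the fullMatch helper, and the explosmXnet test string are made up.

```java
import java.util.regex.Pattern;

class BracketAndDotDemo {
    /** True when the entire input matches the regex. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Single-character classes behave exactly like the bare characters.
        System.out.println(fullMatch("[h][t][t][p][:][/][/]", "http://")); // true
        System.out.println(fullMatch("http://", "http://"));               // true

        // Brackets start to matter once a class lists more than one character.
        System.out.println(fullMatch("[ab]", "b"));                        // true

        // An unescaped dot accepts any one character, so "explosmXnet" slips through.
        System.out.println(fullMatch("explosm.net", "explosmXnet"));       // true
        System.out.println(fullMatch("explosm\\.net", "explosmXnet"));     // false
        System.out.println(fullMatch("explosm\\.net", "explosm.net"));     // true
    }
}
```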

  • 0

Just a debugging thing but left it haha

This topic is now closed to further replies.