[Java] How to get HTML links within a URL


Question

I'm just starting out on my Networking Assignment and I'm already stuck.

The assignment asks me to check a user-provided website for links and to determine whether each link is active or inactive by reading its header info.

So far, after googling, all I have is this code, which retrieves the page. I don't understand how to go through the retrieved content and look for HTML links.

Here's the code:

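import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class UrlChecker {
    public static void main(String[] args) throws Exception {
        // Open a connection to the page whose links will be checked.
        URL yahoo = new URL("http://yahoo.com");
        URLConnection yc = yahoo.openConnection();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()))) {
            String inputLine;
            // Print the page source line by line.
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
        }
    }
}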

Please help.


2 answers to this question

You'll need to either use a regular expression to find matches in the text or use an HTML parser. There are plenty of parsers out there if you google, but if you don't want to use an external library, you might look at http://download.java.net/jdk7/docs/api/javax/swing/text/html/parser/ParserDelegator.html, which invokes a callback method for each kind of element it encounters. What you'll want is to add code in the handleStartTag method (anchors have a closing tag, so they are reported there rather than in handleSimpleTag) to look for HTML.Tag.A tags and print out their href attribute.
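
As a rough illustration of the ParserDelegator route, here is a minimal sketch using only JDK classes. The names LinkExtractor, extractLinks and isActive are made up for this example, and the rule that a 2xx or 3xx response to a HEAD request counts as "active" is an assumption; your assignment may define it differently. The 5-second timeouts are arbitrary too.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; not part of the original post.
class LinkExtractor {

    // Returns the href value of every <a> tag in the HTML read from 'html'.
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {          // skip anchors without an href
                        links.add(href.toString());
                    }
                }
            }
        };
        // 'true' makes the parser ignore any charset declared inside the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    // Assumption: a link is "active" if a HEAD request yields a 2xx or 3xx status.
    static boolean isActive(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        try {
            conn.setRequestMethod("HEAD");   // ask for the headers only, no body
            conn.setConnectTimeout(5000);    // arbitrary 5-second timeouts
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            return code >= 200 && code < 400;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Inline sample so the demo runs without network access.
        String sample = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<br><a name=\"top\">no href</a></body></html>";
        for (String link : extractLinks(new StringReader(sample))) {
            System.out.println(link);
        }
    }
}
```

In your program you would pass the InputStreamReader you already have to extractLinks instead of the StringReader, then call isActive on each result. Keep in mind that href values can be relative (e.g. /about), so they may need to be resolved against the page's address with new URL(baseUrl, href) before checking them.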

My assignment doesn't allow me to use external libraries, so I will look into ParserDelegator.

Thanks!
