Question

I am writing a web crawler tool in Java. When I type in a website name, how can I make it connect to that site over http or https without specifying the protocol myself?

try {
   Jsoup.connect("google.com").get();
} catch (IOException ex) {
   Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

But I get the error:

java.lang.IllegalArgumentException: Malformed URL: google.com

What can I do? Are there any classes or libraries that do this?

What I'm trying to do: I have a list of 165 courses, each with 65 to 71 HTML pages that have links all throughout them. I am writing a Java program to test whether each link is broken.

Solution

You can write your own simple method to try both protocols, like:

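// Returns false if the site answers over http, true if it only answers over https.
// Throws IOException if neither protocol works.
static boolean usesHttps(final String urlWithoutProtocol) throws IOException {
    try {
        // Try plain http first.
        Jsoup.connect("http://" + urlWithoutProtocol).get();
        return false;
    } catch (final IOException e) {
        // http failed, so fall back to https; a failure here propagates to the caller.
        Jsoup.connect("https://" + urlWithoutProtocol).get();
        return true;
    }
}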

Then, your original code can be:

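try {
    // Detect the protocol once, then build the full URL from the result.
    boolean shouldUseHttps = usesHttps("google.com");
    String url = (shouldUseHttps ? "https://" : "http://") + "google.com";
    Jsoup.connect(url).get();
} catch (final IOException ex) {
    Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}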

Note: you should call usesHttps() only once per URL, to figure out which protocol to use. Once you know that, connect using Jsoup.connect() directly with the full URL. This is more efficient because it avoids repeating the failed attempt.
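
One way to follow that advice is to remember the result for each address. The class below is only a sketch, not part of the original answer: ProtocolResolver, resolve() and probe() are hypothetical names, and the reachability check is passed in as a Predicate so that the caching logic does not depend on Jsoup. It tries http first and then https, the same order as usesHttps() above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper: resolves a protocol-less address once and remembers the result. */
final class ProtocolResolver {
    // Assumed to return true when the given full URL can be fetched.
    // With Jsoup this could be something like:
    //   url -> { try { Jsoup.connect(url).get(); return true; }
    //            catch (IOException e) { return false; } }
    private final Predicate<String> reachable;

    // Plain HashMap: this sketch is not thread-safe.
    private final Map<String, Optional<String>> cache = new HashMap<>();

    ProtocolResolver(Predicate<String> reachable) {
        this.reachable = reachable;
    }

    /** Returns the full URL for the address, probing at most once per address. */
    Optional<String> resolve(String address) {
        Optional<String> known = cache.get(address);
        if (known == null) {
            known = probe(address);
            cache.put(address, known);
        }
        return known;
    }

    private Optional<String> probe(String address) {
        // An address that already names a protocol is checked as-is.
        if (address.startsWith("http://") || address.startsWith("https://")) {
            return reachable.test(address) ? Optional.of(address) : Optional.empty();
        }
        // Same order as usesHttps(): http first, then https.
        for (String scheme : new String[] {"http://", "https://"}) {
            String candidate = scheme + address;
            if (reachable.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Calling resolve("google.com") a second time returns the remembered URL without any new request. An empty Optional means neither protocol worked, which for a broken-link checker is itself a useful result.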

License: CC-BY-SA with attribution
Not affiliated with StackOverflow