Selenium WebDriver – How To Find Broken Links On a Page

Overview:

In this article, we will see how to use Selenium WebDriver to find broken links on a web page.

Utility Class:

To demonstrate how it works, I will use the simple utility below. Its single method returns the HTTP response code for the given URL, or 0 if the request could not be made at all.

import java.net.HttpURLConnection;
import java.net.URL;

public class LinkUtil {

    // hits the given url and returns the HTTP response code (0 if the request fails)
    public static int getResponseCode(String link) {
        HttpURLConnection con = null;
        int responseCode = 0;
        try {
            URL url = new URL(link);
            con = (HttpURLConnection) url.openConnection();
            responseCode = con.getResponseCode();
        } catch (Exception e) {
            // malformed URL, unknown host, non-HTTP scheme or I/O error: keep 0
        } finally {
            if (null != con)
                con.disconnect();
        }
        return responseCode;
    }

}
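The utility above issues a default GET request with no timeouts, so a slow or unresponsive host can stall the whole scan. As a minimal sketch, assuming a HEAD request is enough to judge a link, the variant below adds explicit timeouts. The class name `TimedLinkUtil` and the `timeoutMillis` parameter are hypothetical and not part of the original utility.

```java
import java.net.HttpURLConnection;
import java.net.URL;

class TimedLinkUtil {

    // Hypothetical variant of LinkUtil.getResponseCode: sends a HEAD request
    // with connect/read timeouts. Returns 0 when the request cannot be completed.
    static int getResponseCode(String link, int timeoutMillis) {
        HttpURLConnection con = null;
        try {
            con = (HttpURLConnection) new URL(link).openConnection();
            con.setRequestMethod("HEAD");          // headers only, no response body
            con.setConnectTimeout(timeoutMillis);  // limit time spent connecting
            con.setReadTimeout(timeoutMillis);     // limit time spent waiting for the response
            return con.getResponseCode();
        } catch (Exception e) {
            // malformed URL, non-HTTP scheme, timeout or I/O error
            return 0;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }
}
```

Some servers reject HEAD requests (for example with 405), so a link reported as bad by this variant may be worth re-checking with a normal GET.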

Usage:

The rest is simple. Find all the elements that have an href attribute, use the utility class above to get the response code for each link, and group the links by response code.

driver.get("https://www.yahoo.com");

Map<Integer, List<String>> map = driver.findElements(By.xpath("//*[@href]")) // find all elements which have an href attribute
                .stream()                             // process them one by one
                .map(ele -> ele.getAttribute("href")) // get the value of href
                .map(String::trim)                    // trim the text
                .distinct()                           // there could be duplicate links, so keep unique ones
                .collect(Collectors.groupingBy(LinkUtil::getResponseCode)); // group the links based on the response code
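An href can also hold values such as mailto: or javascript: links, which the utility can only report as 0. The sketch below shows the same grouping pipeline over a plain list of href strings, with a filter that keeps only http/https URLs. `LinkGrouping`, `isHttpLink`, `groupByCode` and the `checker` parameter are hypothetical names for illustration; in the real scan `checker` would be `LinkUtil::getResponseCode`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class LinkGrouping {

    // True only for absolute http/https URLs; mailto:, javascript:, tel: etc. are skipped.
    static boolean isHttpLink(String href) {
        String lower = href.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    // Trims, filters and de-duplicates the hrefs, then groups them by the code
    // the checker returns for each one.
    static Map<Integer, List<String>> groupByCode(List<String> hrefs,
                                                  Function<String, Integer> checker) {
        return hrefs.stream()
                .map(String::trim)
                .filter(LinkGrouping::isHttpLink)
                .distinct()
                .collect(Collectors.groupingBy(checker));
    }
}
```

Passing the checker in as a parameter also makes it possible to try the pipeline with a stub, without a browser or network access.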

Now we can look up the URLs for whichever response code we are interested in.

map.get(200) // will contain all the good urls
map.get(403) // will contain all the 'Forbidden' urls
map.get(404) // will contain all the 'Not Found' urls
map.get(0)   // will contain all the urls that could not be requested (e.g. unknown host)
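One thing to keep in mind: a map built with groupingBy only has keys for codes that actually occurred, so `map.get(404)` returns null when no link came back with 404. A small sketch of a null-safe lookup and a per-code summary follows; `LinkReport`, `linksFor` and `summarize` are hypothetical helper names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LinkReport {

    // Returns the links recorded for a response code, or an empty list when none had it.
    static List<String> linksFor(Map<Integer, List<String>> byCode, int code) {
        return byCode.getOrDefault(code, List.of());
    }

    // Builds one line per response code, sorted by code, e.g. "404 -> 2 link(s)".
    static String summarize(Map<Integer, List<String>> byCode) {
        return new TreeMap<>(byCode).entrySet().stream()
                .map(e -> e.getKey() + " -> " + e.getValue().size() + " link(s)")
                .collect(Collectors.joining("\n"));
    }
}
```

With the map from the scan above, `LinkReport.linksFor(map, 404)` can be iterated directly without a null check.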

We can simplify this even further: just partition the URLs by whether the response code is 200 or not.

Map<Boolean, List<String>> map = driver.findElements(By.xpath("//*[@href]"))  // find all elements which have an href attribute
                  .stream()
                  .map(ele -> ele.getAttribute("href"))   // get the value of href
                  .map(String::trim)                      // trim the text
                  .distinct()                             // there could be duplicate links, so keep unique ones
                  .collect(Collectors.partitioningBy(link -> LinkUtil.getResponseCode(link) == 200)); // partition based on response code

The good and bad URLs can then be read straight from the map, as shown here.

map.get(true) // will contain all the good urls
map.get(false) // will contain all the bad urls
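Comparing against exactly 200 marks other successful responses, such as 204, as bad. As a sketch, assuming any 2xx or 3xx status should count as a working link, the partition could use a range check instead. `LinkHealth`, `isHealthy`, `partition` and the `checker` parameter are hypothetical names; `checker` stands in for `LinkUtil::getResponseCode`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

class LinkHealth {

    // Assumption: any 2xx or 3xx status counts as a working link; 0 and 4xx/5xx do not.
    static boolean isHealthy(int code) {
        return code >= 200 && code < 400;
    }

    // Splits the unique links into healthy (true) and broken (false).
    static Map<Boolean, List<String>> partition(List<String> links,
                                                ToIntFunction<String> checker) {
        return links.stream()
                .distinct()
                .collect(Collectors.partitioningBy(link -> isHealthy(checker.applyAsInt(link))));
    }
}
```

Unlike groupingBy, the map returned by partitioningBy always contains both the true and the false key, so `get(false)` yields an empty list rather than null when every link is fine.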

Print all the bad URLs.

map.get(false)
   .stream()
   .forEach(System.out::println);

 

Happy Testing & Subscribe 🙂


6 thoughts on “Selenium WebDriver – How To Find Broken Links On a Page”

  1. I am able to run this code; however, I am unable to print using the line “map.get(false)
    .stream()
    .forEach(System.out::println);”

    1. KD,

      Please check the updated code. The map must be declared with its exact type, like this: “Map<Boolean, List<String>> map”. In most cases the IDE itself will suggest this correction.

  2. Just a quick question: does it report broken links for all the links and sub-links of the main website URL provided?

    1. KD, it scans the entire page for elements that have an href attribute, that is, links to other sites, and checks whether each linked site is reachable or not.
