Crawling all Links with Selenium and NUnit

October 24, 2008

The simplest test you can run against a web application is to follow every link it presents.

Before the crawl can start, the test has to log in, and it needs a way to detect ASP.NET server errors.

Our crawling test needs two collections: the URLs we already processed, and a queue of URLs to be processed.

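// URLs already processed (or never to be followed), keyed by URL
private System.Collections.Hashtable ht = new System.Collections.Hashtable();
// URLs still waiting to be opened
private System.Collections.Generic.Queue<string> quUrls = new System.Collections.Generic.Queue<string>();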
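
The interplay of the two collections can be tried out without a browser. The following is only a sketch under assumptions: CrawlSketch and Crawl are hypothetical names, the links dictionary stands in for the pages Selenium would open, and a HashSet (available from .NET 3.5) replaces the Hashtable. Unlike the crawler in this post, it marks a URL as seen when it is queued, so the queue never has to be searched.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // Visits every page reachable from start exactly once and returns the
    // pages in the order they were processed. "links" is a stand-in for the
    // browser: it maps a URL to the URLs found on that page.
    public static List<string> Crawl(
        string start,
        IDictionary<string, string[]> links,
        IEnumerable<string> doNotFollow)
    {
        // URLs already queued or processed, plus those we never want to open.
        HashSet<string> seen = new HashSet<string>(doNotFollow);
        Queue<string> todo = new Queue<string>();
        List<string> order = new List<string>();

        seen.Add(start);
        todo.Enqueue(start);

        while (todo.Count > 0)
        {
            string url = todo.Dequeue();
            order.Add(url);

            string[] found;
            if (!links.TryGetValue(url, out found))
                continue; // page without outgoing links

            foreach (string link in found)
            {
                // Add returns false if the URL was already seen, so every
                // URL is queued at most once.
                if (seen.Add(link))
                    todo.Enqueue(link);
            }
        }
        return order;
    }
}
```

Passing the logout URL in doNotFollow plays the same role as the "dont follow" entry in the hashtable.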

The first entry in the URL queue is the page that follows the login. The hashtable can be pre-filled with URLs that must not be followed (such as the logout link):

public void OpenAllLinks()
{
    Configuration.Login(selenium, uMandIndex);
    quUrls.Enqueue(selenium.GetLocation());

    int uDone = 0;
    string sUrl;
    sUrl = selenium.GetEval("window.document.getElementById('ctl00_hlLogout').href");
    ht.Add(sUrl, "dont follow");

    while (quUrls.Count > 0)
    {
        sUrl = quUrls.Dequeue();
        ht.Add(sUrl, "done");
        uDone++;
        NUnitLog.Trace("processing " + sUrl + " scanned " + ht.Count.ToString() +
            " todo " + quUrls.Count.ToString() + " done " + uDone.ToString());

        selenium.Open(sUrl);
        selenium.WaitForPageToLoad("30000");

        string sHtml = selenium.GetHtmlSource();
        if (DetectServerError(sHtml))
        {
            HandleServerError(sUrl, sHtml);
            continue;
        }

        string sCount = selenium.GetEval(
            "window.document.getElementsByTagName('a').length");
        NUnitLog.Trace(sCount + " links");

After retrieving the page, we collect all href attributes. selenium.GetAllLinks returns the ids of the links on the page, with an empty string for every link that has no id, so we first assign an id to each link that lacks one. For performance reasons, this is done with a single JavaScript call:

        NUnitLog.Trace(selenium.GetEval(@"
var links = window.document.getElementsByTagName('a'), i, ii = 0;
for (i = 0; i < links.length; i++) {
    if (links[i].id == '') {
        links[i].id = 'hl_' + i; ii++;
    }
}
ii;") + " links updated");

        string[] rgsLinks = selenium.GetAllLinks();

        foreach (string sLink in rgsLinks)
        {
            if (!string.IsNullOrEmpty(sLink) &&
                sLink != "ctl00_hlHelp")
            {
                string sUrlLink = selenium.GetEval(
                    "window.document.getElementById('" + sLink + "').href");
                if (!string.IsNullOrEmpty(sUrlLink) &&
                    sUrlLink.StartsWith(Configuration.Host) &&
                    !sUrlLink.Contains(".ashx"))
                {
                    string sUrlLinkBase = sUrlLink;
                    if (sUrlLinkBase.Contains("?"))
                    {
                        sUrlLinkBase = Regex.Replace(sUrlLinkBase, "=.+?&", "&");
                        sUrlLinkBase = Regex.Replace(sUrlLinkBase, "=.+", "");
                    }

                    if (!ht.ContainsKey(sUrlLinkBase) &&
                        !quUrls.Contains(sUrlLinkBase))
                    {
                        NUnitLog.Trace("queuing " + sUrlLink);
                        quUrls.Enqueue(sUrlLink);

                        if (sUrlLinkBase != sUrlLink)
                            ht.Add(sUrlLinkBase, "pseudo");
                    }
                }
            }
        }
    }
}

The sUrlLinkBase variable exists to avoid requesting the same .aspx page again with different parameters. Two regular expressions strip all parameter values, leaving only the parameter names in the URL. If this reduced URL is neither in the hashtable of processed URLs nor in the queue, the original link is queued and the reduced URL is recorded in the hashtable. This step is optional; disable it if you want to crawl each and every generated page.
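
The value-stripping step can be checked on its own, without a browser. The following is a minimal sketch, not the code used in the crawler: UrlKey and StripParameterValues are hypothetical names, and it swaps the two regular expressions for a single pattern, =[^&]*, which also copes with empty values such as "id=".

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlKey
{
    // Removes every "=value" from the query string and keeps only the
    // parameter names. URLs without a query string are returned unchanged.
    public static string StripParameterValues(string url)
    {
        int queryStart = url.IndexOf('?');
        if (queryStart < 0)
            return url;

        // Split at the "?" so that only the query string is rewritten.
        string path = url.Substring(0, queryStart + 1);
        string query = url.Substring(queryStart + 1);

        // "=[^&]*" matches "=" plus everything up to the next "&" or the
        // end of the string, including an empty value.
        return path + Regex.Replace(query, "=[^&]*", "");
    }
}
```

For a made-up URL such as http://host/list.aspx?id=5&sort=asc, the sketch returns http://host/list.aspx?id&sort, the same key the two expressions in the crawler produce for that input.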


Detecting ASP.Net Server Error in Selenium

October 10, 2008

If you request a URL with Selenium, the first thing you want to check is whether the page can be displayed, or whether it generates a Server Error message.

If a server error occurred, the string “Server Error in” appears in the generated HTML text. We then strip everything that is not inside the <BODY> tag, remove all HTML tags (they do not contain any relevant information in case of an error), and log and store the remaining error message.

selenium.Open(sUrl);
selenium.WaitForPageToLoad("30000");

string sHtml = selenium.GetHtmlSource();
if (sHtml.Contains("Server Error in "))
{
    verificationErrors.AppendLine("");
    verificationErrors.AppendLine(sUrl);
    NUnitLog.Error(sUrl);

    try    { sHtml = sHtml.Substring(sHtml.IndexOf("<BODY")); } catch { }
    try    { sHtml = sHtml.Substring(0, sHtml.IndexOf("</BODY>")); } catch { }
    try    { sHtml = Regex.Replace(sHtml, "<[^>]+?>", ""); } catch { }
    try    { sHtml = Regex.Replace(sHtml, " +", " "); } catch { }
    try    { sHtml = Regex.Replace(sHtml, @"(\s\n)+", "\n"); } catch { }
    NUnitLog.Error(sHtml);
    verificationErrors.AppendLine(sHtml);
}

Here, selenium is of type ISelenium and verificationErrors is a StringBuilder, following the Selenium convention. NUnitLog is a custom class that handles logging and error messages.
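
The chain of try/catch blocks can also be written as a small helper that works on the HTML string alone. This is a sketch under assumptions: ServerErrorText, IsServerError and Extract are hypothetical names, and the body match is case-insensitive because the tag case in the returned source can differ between browsers. It is meant to produce roughly the same text as the code above, not to replace it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServerErrorText
{
    // True if the page looks like an ASP.NET error page.
    public static bool IsServerError(string html)
    {
        return html != null && html.Contains("Server Error in ");
    }

    // Reduces an error page to its readable text: keeps only the body,
    // drops all tags, and collapses runs of whitespace.
    public static string Extract(string html)
    {
        string text = html;

        // Keep only what is inside <body>...</body>, if a body is present.
        Match body = Regex.Match(text, @"<body[^>]*>(.*?)</body>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (body.Success)
            text = body.Groups[1].Value;

        text = Regex.Replace(text, "<[^>]+>", "");     // remove tags
        text = Regex.Replace(text, "[ \t]+", " ");     // collapse spaces and tabs
        text = Regex.Replace(text, @"\s*\n\s*", "\n"); // collapse blank lines
        return text.Trim();
    }
}
```

In the test, the result of Extract would be passed to NUnitLog.Error and appended to verificationErrors whenever IsServerError returns true.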


First Steps with Selenium and NUnit

October 1, 2008

After installing Selenium and NUnit, I started my first experiments with the testing framework. Starting from the results of the Selenium IDE, which generates C# source of recorded test cases, I quickly created a test case project in VS2005.

In NUnit, each test case is a class with the [TestFixture] attribute, whose public methods are tagged with [SetUp], [TearDown] and [Test].

Helper Class

Since I intended to write several tests, the first step was to encapsulate Selenium initialization and login steps into a helper class:

public class Helper
{
  public static ISelenium StartSelenium()
  {
    ISelenium selenium = new DefaultSelenium("localhost", 4444, "*iexplore", "http://web.server");
    selenium.Start();
    return selenium;
  }

  public static string Login(ISelenium selenium)
  {
    selenium.Open("default.aspx");
    selenium.Type("username", "User");
    selenium.Type("password", "Password123");
    selenium.FireEvent("username", "keyup");
    selenium.Click("login");
    selenium.WaitForPageToLoad("30000");
    return selenium.GetLocation();
  }
}

A test class simply calls the StartSelenium() method from the [SetUp] method, and the [Test] method uses Login() to log in to the web application.

JavaScript Events

The selenium.Type() method only sets an input control's text; it does not simulate key or mouse events. If a button is enabled through JavaScript events, you need to fire the event explicitly, as Login() does above with FireEvent("username", "keyup").

Tracing

The NUnit GUI displays all output written with System.Diagnostics.Trace.WriteLine() in its Trace tab.

Integration with log4net

This blog has basic instructions on how to integrate log4net (in VB.NET, though).

To have log4net available in every test case, I created a class with the [SetUpFixture] attribute whose [SetUp] method creates the logger. This provides logging for every test in the namespace.

