Page-specific actions in Selenium NUnit crawler

In my previous posts on Selenium and NUnit I described how to crawl your web application by following all the links on every page, and hashing the visited addresses.

My crawler can optionally reduce a page’s URL to a kind of signature consisting of the address and the names of its parameters. For example, a URL such as

mypage.aspx?page=1&section=2

would be reduced to

mypage.aspx?page&section
If we want to add page-specific actions, the simplest approach is a large switch/case statement that selects the actions to be performed depending on the current address (and thus on the signature of the current URL). Let’s define a delegate type for these actions:

delegate void PageTest();


string sUrlPattern = sUrl.Substring(sUrl.LastIndexOf("/") + 1);
if (sUrlPattern.Contains("?"))
{
    sUrlPattern = Regex.Replace(sUrlPattern, "=.+?&", "&");  // non-greedy: strip every value but the last
    sUrlPattern = Regex.Replace(sUrlPattern, "=.+", "");     // greedy: strip the last value
}
List<PageTest> lifnTests = new List<PageTest>();
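To make the reduction testable in isolation, the two regex steps can be wrapped in a small helper; the method name ReduceUrl and the sample URLs are mine, not from the original crawler:

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlSignature
{
    // Reduce a URL to its signature: file name plus parameter names, values stripped.
    public static string ReduceUrl(string sUrl)
    {
        string sUrlPattern = sUrl.Substring(sUrl.LastIndexOf("/") + 1);
        if (sUrlPattern.Contains("?"))
        {
            // non-greedy: strip every value that is followed by another parameter
            sUrlPattern = Regex.Replace(sUrlPattern, "=.+?&", "&");
            // greedy: strip the value of the last parameter
            sUrlPattern = Regex.Replace(sUrlPattern, "=.+", "");
        }
        return sUrlPattern;
    }
}
```

With such a helper, the switch/case can dispatch on ReduceUrl(selenium.GetLocation()).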

We can then add page tests to the list inside a switch on sUrlPattern:

    case "mypage.aspx?page&section":
        lifnTests.Add(delegate() { TestMyPageWithPageAndSection(); });
        break;

The List<PageTest> now contains all the test functions that apply to the parameters of the current page, and we simply invoke them in turn:

foreach (PageTest pt in lifnTests)
    pt();

An action consists of the usual Selenium commands:

private void TestMyPageWithPageAndSection()

Of course, the necessary try/catch blocks, logging, etc. need to be added so that NUnit can run through a complete application test even in case of an error.
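A minimal sketch of such an action, assuming the selenium field and the NUnitLog class from the other posts; the link locator ctl00_hlDetails is made up for illustration:

```csharp
private void TestMyPageWithPageAndSection()
{
    try
    {
        // hypothetical page-specific check: follow a link that must exist on this page
        selenium.Click("ctl00_hlDetails");
        selenium.WaitForPageToLoad("30000");
    }
    catch (Exception ex)
    {
        // log and continue, so a single failing page does not abort the whole crawl
        NUnitLog.Trace("TestMyPageWithPageAndSection failed: " + ex.Message);
    }
}
```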

Selenium NUnit crawler speed-up

I improved the speed of my Selenium link crawling algorithm by directly extracting the href URLs of all hyperlinks, instead of retrieving the hyperlinks by ID and querying their href attributes:

string sLinks = selenium.GetEval(@"
    var s = '', i = 0;
    for (i = 0; i < window.document.getElementsByTagName('a').length; i++) {
        s = s + ' ' + window.document.getElementsByTagName('a')[i].href;
    }
    s;");

string[] rgsLinks = sLinks.Split(' ');

The string array now contains all URLs found in the current page.

As each call to the Selenium API is passed to the Selenium server, then to the browser, which evaluates it and returns the result to the server, which passes it back to the client, collecting all URLs in a single call is much faster than querying each a.href attribute individually.

Crawling all Links with Selenium and NUnit

The simplest test to perform on a web application is to simply follow all links presented by the application.

To start the web application test, we have to log in first, and we should be able to detect .Net server errors.

Our crawling test needs two collections: the URLs we already processed, and a queue of URLs to be processed.

private System.Collections.Hashtable ht;
private System.Collections.Generic.Queue<string> quUrls;

The first entry in the URL Queue is the page following the login. The hashtable can be filled with URLs not to be followed (such as a link to logout):
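How the two collections are created is not shown above; a plausible initialization in the [SetUp] method, using the Helper class from the earlier post, might look like this (a sketch, not the original code):

```csharp
[SetUp]
public void SetupTest()
{
    selenium = Helper.StartSelenium();
    ht = new System.Collections.Hashtable();
    quUrls = new System.Collections.Generic.Queue<string>();
}
```

After logging in, the crawler can then enqueue selenium.GetLocation() as the first URL to process.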

public void OpenAllLinks()
{
    Configuration.Login(selenium, uMandIndex);

    int uDone = 0;
    string sUrl;

    // never follow the logout link
    sUrl = selenium.GetEval("window.document.getElementById('ctl00_hlLogout').href");
    ht.Add(sUrl, "dont follow");

    while (quUrls.Count > 0)
    {
        sUrl = quUrls.Dequeue();
        ht.Add(sUrl, "done");
        NUnitLog.Trace("processing " + sUrl + " scanned " + ht.Count.ToString() +
            " todo " + quUrls.Count.ToString() + " done " + uDone.ToString());

        selenium.Open(sUrl);  // navigate to the page

        string sHtml = selenium.GetHtmlSource();
        if (DetectServerError(sHtml))
            HandleServerError(sUrl, sHtml);

        string sCount = selenium.GetEval(
            "window.document.getElementsByTagName('a').length");
        NUnitLog.Trace(sCount + " links");

After retrieving the page, we collect all href attributes. As selenium.GetAllLinks only returns elements that have an id, we first need to set the id attribute of unnamed links. For performance reasons, this is done with a single Javascript call:

        NUnitLog.Trace(selenium.GetEval(@"
            var i = 0, ii = 0;
            for (i = 0; i < window.document.getElementsByTagName('a').length; i++) {
                if (window.document.getElementsByTagName('a')[i].id == '') {
                    window.document.getElementsByTagName('a')[i].id = 'hl_' + i; ii++;
                }
            }
            ii;") + " links updated");

        string[] rgsLinks = selenium.GetAllLinks();

        foreach (string sLink in rgsLinks)
        {
            if (!string.IsNullOrEmpty(sLink) &&
                sLink != "ctl00_hlHelp")
            {
                string sUrlLink = selenium.GetEval(
                    "window.document.getElementById('" + sLink + "').href");
                if (!string.IsNullOrEmpty(sUrlLink) &&
                    sUrlLink.StartsWith(Configuration.Host) &&
                    !ht.ContainsKey(sUrlLink))
                {
                    string sUrlLinkBase = sUrlLink;
                    if (sUrlLinkBase.Contains("?"))
                    {
                        sUrlLinkBase = Regex.Replace(sUrlLinkBase, "=.+?&", "&");
                        sUrlLinkBase = Regex.Replace(sUrlLinkBase, "=.+", "");
                    }

                    if (!ht.ContainsKey(sUrlLinkBase) &&
                        !quUrls.Contains(sUrlLink))  // not already queued
                    {
                        NUnitLog.Trace("queuing " + sUrlLink);
                        quUrls.Enqueue(sUrlLink);

                        if (sUrlLinkBase != sUrlLink)
                            ht.Add(sUrlLinkBase, "pseudo");
                    }
                }
            }
        }

The sUrlLinkBase value is calculated to avoid calling the same .aspx page with different parameters: the two regular expressions remove all parameter values, leaving only the parameter names in the URL. If this reduced URL is not yet in the hashtable of processed URLs, we queue the full URL. This reduction is optional; disable it if you want to crawl each and every generated page.

Detecting ASP.Net Server Error in Selenium

If you request a URL with Selenium, the first thing you want to check is whether the page can be displayed, or whether it generates a Server Error message.

If a server error occurred, the string “Server Error in” appears in the generated HTML. We then strip everything outside the <BODY> tag, remove all HTML tags (they carry no relevant information in case of an error), and log and store the remaining error message.


string sHtml = selenium.GetHtmlSource();
if (sHtml.Contains("Server Error in "))
{
    // keep only the contents of the <BODY> tag
    try { sHtml = sHtml.Substring(sHtml.IndexOf("<BODY")); } catch { }
    try { sHtml = sHtml.Substring(0, sHtml.IndexOf("</BODY>")); } catch { }
    // remove tags and collapse whitespace
    try { sHtml = Regex.Replace(sHtml, "<[^>]+?>", ""); } catch { }
    try { sHtml = Regex.Replace(sHtml, " +", " "); } catch { }
    try { sHtml = Regex.Replace(sHtml, @"(\s\n)+", "\n"); } catch { }
}

selenium is of type ISelenium, and verificationErrors is a StringBuilder, following the Selenium convention. NUnitLog is a custom class that handles logging and error messages.
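The DetectServerError and HandleServerError calls used by the crawler can be factored into a small class; the class and method names below are my sketch, using the stripping logic shown above:

```csharp
using System;
using System.Text.RegularExpressions;

static class ServerError
{
    // true if the ASP.Net error page marker is present in the HTML source
    public static bool DetectServerError(string sHtml)
    {
        return sHtml.Contains("Server Error in ");
    }

    // reduce the error page to its plain-text message
    public static string StripErrorHtml(string sHtml)
    {
        try { sHtml = sHtml.Substring(sHtml.IndexOf("<BODY")); } catch { }
        try { sHtml = sHtml.Substring(0, sHtml.IndexOf("</BODY>")); } catch { }
        try { sHtml = Regex.Replace(sHtml, "<[^>]+?>", ""); } catch { }
        try { sHtml = Regex.Replace(sHtml, " +", " "); } catch { }
        try { sHtml = Regex.Replace(sHtml, @"(\s\n)+", "\n"); } catch { }
        return sHtml.Trim();
    }
}
```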

First Steps with Selenium and NUnit

After installing Selenium and NUnit, I started my first experiments with the testing framework. Starting from the results of the Selenium IDE, which generates C# source of recorded test cases, I quickly created a test case project in VS2005.

In NUnit, each test case is a class with the [TestFixture] attribute, whose public methods are tagged with [SetUp], [TearDown] and [Test].

Helper Class

Since I intended to write several tests, the first step was to encapsulate Selenium initialization and login steps into a helper class:

public class Helper
{
  public static ISelenium StartSelenium()
  {
    ISelenium selenium = new DefaultSelenium("localhost", 4444, "*iexplore", "http://web.server");
    selenium.Start();  // launch the browser session
    return selenium;
  }

  public static string Login(ISelenium selenium)
  {
    selenium.Type("username", "User");
    selenium.Type("password", "Password123");
    selenium.FireEvent("username", "keyup");  // trigger the page's Javascript, e.g. to enable the login button
    // (submit the login form here, e.g. by clicking the login button)
    return selenium.GetLocation();
  }
}

A test class simply calls the StartSelenium() method from the [SetUp] method, and the [Test] method uses Login() to log in to the web application.
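Put together, a test class using the helper might look like this; the class and method names are mine:

```csharp
using NUnit.Framework;
using Selenium;

[TestFixture]
public class LoginTest
{
    private ISelenium selenium;

    [SetUp]
    public void SetupTest()
    {
        selenium = Helper.StartSelenium();
    }

    [TearDown]
    public void TeardownTest()
    {
        selenium.Stop();  // close the browser session
    }

    [Test]
    public void TheLoginTest()
    {
        string sLocation = Helper.Login(selenium);
        Assert.IsFalse(string.IsNullOrEmpty(sLocation));
    }
}
```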

Javascript Events

The selenium.Type() method only sets an input control’s text, but does not simulate key or mouse events. If you have a button that is enabled through Javascript events, you need to fire an event explicitly.
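For example, after typing the user name, the handler attached to the field can be triggered explicitly (the control name is illustrative):

```csharp
selenium.Type("username", "User");
selenium.FireEvent("username", "keyup");  // runs the page's onkeyup handler, e.g. to enable the submit button
```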


NUnit GUI displays all calls to System.Diagnostics.Trace.WriteLine() in its Trace tab.
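So any test method can emit diagnostic output directly:

```csharp
System.Diagnostics.Trace.WriteLine("opened " + selenium.GetLocation());  // shown in the NUnit GUI Trace tab
```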

Integration with log4net

This blog has basic instructions on how to integrate log4net (in VB.Net, though).

To have log4net functionality in every test case, I created a class with the [SetUpFixture] attribute whose [SetUp] method creates the logger. This provides logging functionality for every test in the namespace.
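A sketch of such a fixture, assuming log4net is configured through the App.config file (the class name is mine):

```csharp
using log4net;
using log4net.Config;
using NUnit.Framework;

[SetUpFixture]
public class LogSetup
{
    public static ILog Log;

    [SetUp]
    public void Setup()
    {
        XmlConfigurator.Configure();  // read the log4net section from App.config
        Log = LogManager.GetLogger(typeof(LogSetup));
    }
}
```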

Automated Web Application Testing using Selenium and NUnit

A recent post on stackoverflow raised the question of which tools to use for automated tests of web applications.

This question has been an interesting issue for me, as the largest web application I develop and maintain has over the years grown to some 300 aspx files.

Required software to implement tests using Selenium and NUnit:

- Selenium IDE (a Firefox plugin)
- Selenium Remote Control (requires Java 1.5 or higher)
- NUnit

Selenium Remote Control consists of an HTTP server component requiring Java (1.5 or higher) and a .Net client library. Method calls to the library issue commands to the Selenium server, which in turn sends these commands to the browser.

First, install the Selenium IDE plugin for Firefox. You can use the IDE to record your actions in the browser and replay them. It also generates scripts of the recorded actions (called Test Cases) in HTML, C#, and other programming languages.

After installing Selenium Remote Control, the Selenium HTTP server can be started using the command line

c:\path\to\java.exe -jar c:\path\to\selenium-server.jar

Selenium server listens on port 4444 by default.

Install NUnit and create a Visual Studio project which references the nunit.framework.dll and ThoughtWorks.Selenium.Core.dll libraries. Your project is now ready to compile the test cases generated by Selenium IDE.

(I recommend changing the browser string in the DefaultSelenium constructor to “*iexplore” for first experiments)

Run the NUnit GUI application and open the newly created assembly. The left-hand tree shows assemblies, namespaces, and test case names.

After starting the Selenium server, select a node and press Run. This opens a browser window which executes the commands defined in the test case; after completion, the browser is closed again. (If you don’t see a browser window, use Task Manager to watch the list of processes.)