I improved the speed of my Selenium link-crawling algorithm by extracting the href URLs of all hyperlinks in a single Eval call, instead of retrieving the hyperlinks by ID and querying their href attributes one at a time:
    string sLinks = selenium.Eval(@"
        var links = window.document.getElementsByTagName('a');  // collect all anchor elements once
        var s = '';
        for (var i = 0; i < links.length; i++) {
            s = s + ' ' + links[i].href;                        // append each href, space-separated
        }
        s;");
    string[] rgsLinks = sLinks.Trim().Split(' ');  // Trim drops the leading separator so the first entry isn't empty
The resulting string array contains the URLs of every link on the current page.
Each call to the Selenium API makes a full round trip: the client sends the command to the Selenium server, the server forwards it to the browser, the browser evaluates it, and the result travels back through the server to the client. Collapsing all of the work into a single Eval call pays that round-trip cost once instead of once per link, which makes this approach far faster than querying each anchor's href attribute individually.
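For contrast, here is a sketch of the per-element approach this replaces (assuming the Selenium RC C# client; GetXpathCount and GetAttribute are standard RC calls, but the XPath indexing is just one illustrative way to address each link). With n links it pays n + 1 round trips instead of one:

    // Slow variant: one round trip for the count, then one per link.
    decimal cLinks = selenium.GetXpathCount("//a");          // round trip #1
    string[] rgsSlow = new string[(int)cLinks];
    for (int i = 0; i < rgsSlow.Length; i++)
    {
        // one additional round trip per link just to read its href
        rgsSlow[i] = selenium.GetAttribute("xpath=//a[" + (i + 1) + "]@href");
    }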
Newbie here. Just out of curiosity, is there an advantage to using your version over using window.document.links to gather all the hrefs? Or are they about the same?
Blame my inexperience with JavaScript and the DOM…
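Something like this is what I had in mind (untested sketch, just swapping document.links into your version):

    string sLinks = selenium.Eval(@"
        var links = window.document.links;  // all <a> and <area> elements that have an href
        var s = '';
        for (var i = 0; i < links.length; i++) {
            s = s + ' ' + links[i].href;
        }
        s;");
    string[] rgsLinks = sLinks.Trim().Split(' ');

As far as I know, document.links only includes <a> and <area> elements that actually carry an href, so it would skip pure name anchors, but it also picks up <area> links that getElementsByTagName('a') misses.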