TPAC 2017

The WebDriver working group meeting at TPAC this year marked the culmination of six years of hard work defining WebDriver as a standard. Up to this point, the work has largely been about specifying the behaviour of existing implementations: we have tightened up semantics where drivers have behaved differently, corrected inconsistencies inherited from the Selenium wire protocol, and written a test suite.

We are quickly reaching a point where we can deliver a consistent cross-browser mechanism for instrumenting and automating web browsers. Vendors are already reaping the benefits of this by employing WebDriver in the testing of standards, and web authors will soon see a range of new features added.

New windows

One long-awaited feature is for WebDriver to be able to open new windows and tabs. Today people use many different techniques, such as injecting a…) script to work around this deficiency. Doing that is problematic because new windows will be children of the current window, potentially opening them up to leeching.
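
As a rough illustration of the script-injection workaround described above, the sketch below builds the HTTP request a client could send to WebDriver's Execute Script command. It assumes the injected script calls window.open(). The session id and the helper name execute_script_request are hypothetical, and nothing is sent over the network.

```python
import json


def execute_script_request(session_id, script, args=None):
    """Build (method, path, body) for WebDriver's Execute Script command.

    This only constructs the request; sending it to a driver is left out.
    """
    path = f"/session/{session_id}/execute/sync"
    body = json.dumps({"script": script, "args": args or []})
    return "POST", path, body


# "abc123" is a made-up session id used purely for illustration.
method, path, body = execute_script_request(
    "abc123", "window.open('about:blank');"
)
print(method, path)
print(body)
```

A window opened this way is a child of the window that ran the script, which is exactly the drawback noted above.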

Some users are exploiting the fact that certain drivers let you perform key combinations that affect the OS or the browser. For example, a ^T or ⌘T combination will normally open a new tab in desktop browsers, but WebDriver is restricted to web content and is not meant to let you interact with the surrounding system.

There is currently no platform API that lets you open a new untainted top-level browsing context, so a dedicated WebDriver command would fill that gap and complement the other window manipulation commands well.

Logging

In the Selenium project many drivers implemented a rudimentary logging API that made it possible to get different logs, such as console and performance logs, driver logs, and Selenium Grid logs. We talked about logging several years ago but decided to put it on hold in order to narrow the scope of the first draft and focus on getting the fundamentals right.

Simon came up with a new strawman proposal that lets you request log services from arbitrary remote ends between you and the final endpoint node. What sets the new API apart from the existing Selenium logging API is that it distinguishes logs from individual classes of nodes. Your favourite WebDriver-in-the-cloud provider might provide a service for taking screenshots for every failing test, and with this new API it will be possible to request those in a uniform way.

Permissions

Because WebDriver is a specification text, other standards can now build on its definitions to meet their own needs. We are seeing an example of this with the Permissions API, which is extending WebDriver to instrument getUserMedia(). The ability to write automated tests for permissions allows shared test suites like Web Platform Tests to promote consistency amongst browsers. It also means that web authors will get the opportunity to test, for example, geolocation for maps and other types of media.
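
To make the idea of an extension command more concrete, here is a minimal sketch of what a request to set a permission might look like. It assumes the shape of the Set Permission command as drafted in the Permissions specification (a POST to /session/{session id}/permissions with a descriptor and a state), and the draft may change. The helper name and session id are hypothetical, and the request is only constructed, not sent.

```python
import json

# Permission states assumed from the Permissions specification.
VALID_STATES = {"granted", "denied", "prompt"}


def set_permission_request(session_id, name, state):
    """Build (method, path, body) for the assumed Set Permission command."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown permission state: {state!r}")
    path = f"/session/{session_id}/permissions"
    body = json.dumps({"descriptor": {"name": name}, "state": state})
    return "POST", path, body


# "abc123" is a made-up session id used purely for illustration.
method, path, body = set_permission_request("abc123", "geolocation", "granted")
print(method, path)
print(body)
```

A test could issue such a request before loading a page that asks for the user's location, so that no permission prompt blocks the automated run.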

WebVR

Can you imagine driving a WebVR headset using WebDriver? Well, the WebVR working group can. It will be an unconventional use of WebDriver, but it turns out that WebDriver’s API lends itself well to the kind of spatial navigation that is needed to control headsets.

To go into virtual reality mode in your browser you first need permission from the user, and it’s therefore exciting to see that we are starting to build an ecosystem of tools for browser instrumentation with the addition of the Permissions API.