Update from WebDriver WG meeting in July 2016

The W3C Browser Testing and Tools Working Group met again on 13-14 July 2016 in Redmond, WA to discuss the progress of the WebDriver specification. I will try to summarise the discussions, but if you’re interested in all the details, the meetings have been meticulously scribed.

I wrote about the progress from our TPAC 2015 meeting previously, and we appear to have made good progress since then. The specification text is nearing completion, although it is missing a few important chapters: Some particularly obvious omissions are the complete lack of input handling, and a big, difficult void where advanced user actions are meant to be.

Actions

James has been hard at work drafting a proposal for action semantics, which we went over in great detail. I think it’s fair to say there had been conceptual agreement in the working group on what the actions were meant to accomplish, but that the details of how they were going to work were extremely scarce.

WebDriver tries to innovate on the actions as they appear in Selenium. Actions in Selenium were originally meant to provide a way to pipeline a sequence of interactions—such as pressing down a mouse button, moving the mouse, and releasing it—through a complex data structure to a single command endpoint. The idea was that this would help address some of the race conditions that are intrinsically part of the one-directional design of the protocol, and reduce latency which may be critical when interacting with a document.

Unfortunately the pipelining design to reduce the number of HTTP requests was never quite implemented in Selenium, and the API design suffered from over-specialisation of different types of input devices and actions. The specification attempts to rectify this by generalising the range of input device classes, and by associating the actions that can be performed with a certain class. This means we are moving away from a flat sequence of types, such as [{type: "mouseDown"}, {type: "mouseMove"}, {type: "mouseUp"}], to a model where each input device has its own “track”. This limits the actions you can perform with each device, which makes some conceptual sense because it would be impossible to, e.g., type keys with a mouse or press a mouse button with a stylus/pen input device.

The side-effect of this design is that it allows for parallelisation of actions from one or more types of input devices. This is an important development, as it makes it possible to combine primitives for input methods such as touch: In reality, a device cannot determine whether two fingers are “associated” with the same hand. So instead of defining high-level actions such as pinch and flick, it gives you the right level of granularity to combine actions from two or more touch “finger” devices to synthesise more complex movements. We believe this is a good approach with the right level of granularity that doesn’t try to over-specify or shoehorn in primitives that might not make sense in a cross-browser automation setting.
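To make the “track” idea a little more concrete, here is a minimal sketch in Python. The action names, the dictionary shapes, and the to_ticks helper are all assumptions made for illustration; they are not the wire format of the proposal, which had not landed in the specification text at the time of writing. The sketch gives each touch “finger” its own ordered track and then groups the n-th action of every track into one step, so that actions from several devices can be carried out together.

```python
from itertools import zip_longest

# Hypothetical per-device tracks: one ordered list of actions per input
# device. Names and shapes are illustrative only, not the spec's format.
tracks = {
    "finger1": [
        {"type": "pointerDown", "x": 100, "y": 100},
        {"type": "pointerMove", "x": 50, "y": 50},
        {"type": "pointerUp"},
    ],
    "finger2": [
        {"type": "pointerDown", "x": 120, "y": 120},
        {"type": "pointerMove", "x": 170, "y": 170},
        {"type": "pointerUp"},
    ],
}


def to_ticks(tracks):
    """Group the n-th action of every track into one step ("tick").

    A device whose track is shorter than the others simply does not
    appear in the later ticks.
    """
    device_ids = list(tracks)
    ticks = []
    for step in zip_longest(*tracks.values()):
        tick = {
            device: action
            for device, action in zip(device_ids, step)
            if action is not None
        }
        ticks.append(tick)
    return ticks


ticks = to_ticks(tracks)
for number, tick in enumerate(ticks):
    print(number, {device: action["type"] for device, action in tick.items()})
```

In this sketch the two fingers move in opposite directions within the same step, which is roughly how a pinch-out could be synthesised from primitives without the protocol ever defining a “pinch” action.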

I’m looking forward to seeing James’ work land in the specification text. I think some explanatory notes and examples will probably be required to fully explain this concept to both implementors and users.

Input locality

A known limitation of Selenium that we are not proud of is that it does not have a good story for input with alternative keyboard layouts. We have explicitly phrased the specification in such a way that it doesn’t make it impossible to retrofit in support for multiple layouts in the future. But right now we want to finish the baseline of the specification before we try moving into this.

The current design ideas floating around are to have some way of setting a keyboard layout either through a command or a capability. This would allow / to generate key events for Shift and ? on an American layout, and Shift and 7 on Norwegian layout. The biggest reason this is hard is because we need to find the right key code conversion tables for what would happen when typing for example .

Untrusted SSL certificates

We had a big discussion on invalid, self-signed, and untrusted SSL certificates. The general agreement in the WG is that it would be good to have functionality to allow a WebDriver session to bypass the security checks associated with them, as WebDriver may be run in an environment where it is difficult or even impossible to instrument the browser/environment in such a way that they are accepted implicitly (e.g. by modifying the root store).

Different browser vendors raised questions over whether this would pass security review as implementing such a feature increases the attack surface in one of the most critical components in web browsers. A counterargument is that by the point your browser has WebDriver enabled, you probably have bigger things to worry about than the fact that untrusted certificates are implicitly accepted.

We also found that this is highly inconsistently implemented in Selenium. For the two drivers that support it, FirefoxDriver (written and maintained by Selenium) has an acceptSslCerts capability that takes a boolean to switch off security checks, and chromedriver (by Google) by contrast accepts all certificates by default. The remaining drivers have no support for it.

This leaves the working group free to decide on a new and consistent approach. One point of concern is that a boolean to disable all security checks seems like an overly coarse design. A suggested alternative is to provide a list of domains to disable the checks for, where wildcards can be expanded to cover every domain or every subdomain, so that, e.g., ["*"] would be equivalent to setting acceptSslCerts to true in today’s Firefox implementation, but ["*.sny.no"] would only disable the checks for subdomains of sny.no.
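As a rough illustration of how such a list could be evaluated, here is a small Python sketch. The function name checks_disabled and the matching rules are my own assumptions; the working group has not decided on any of this. It uses shell-style wildcard matching from the standard library's fnmatch module, compared case-insensitively.

```python
from fnmatch import fnmatchcase


def checks_disabled(host, patterns):
    """Return True if certificate checks would be skipped for host.

    patterns is a list of shell-style wildcard patterns such as
    ["*"] or ["*.sny.no"]. This is an illustrative assumption, not
    behaviour defined by the specification.
    """
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


print(checks_disabled("test.sny.no", ["*.sny.no"]))
print(checks_disabled("example.org", ["*.sny.no"]))
print(checks_disabled("example.org", ["*"]))
```

Note that with this naive matching "*.sny.no" covers test.sny.no but not the bare sny.no. Whether the bare domain should be included is exactly the kind of detail a real design would have to pin down.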

Navigation and URLs

Because WebDriver taps into the browser’s navigation algorithm at a much later point than when a user interacts with the address bar, we decided that malformed URLs should consistently return an error. We have also changed the prose to no longer mislead users to think that navigating in effect means the same as using the address bar; the address bar is not a concept of the web platform.

There was a proposal from Mozilla to allow navigation to relative URLs, so that one could navigate to, e.g., "/foo" to go to that path on the current domain, similar to how window.location = "/foo" works. This was unfortunately voted down. I feel it would be useful, even just for consistency, for the WebDriver navigation command to mirror the platform API, modulo security checks.
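For reference, this is roughly what resolving a relative URL against the current document would look like, sketched in Python with the standard library's urllib.parse. The resolve function and the example.org URLs are hypothetical, and the validity check (requiring a scheme after resolution) is a deliberately simplistic stand-in for real URL parsing.

```python
from urllib.parse import urljoin, urlsplit


def resolve(current_url, target):
    """Resolve target against current_url, much like window.location does.

    Raises ValueError when the result has no scheme, as a crude stand-in
    for the "malformed URLs return an error" rule.
    """
    resolved = urljoin(current_url, target)
    if not urlsplit(resolved).scheme:
        raise ValueError(f"malformed URL: {target!r}")
    return resolved


print(resolve("https://example.org/a/b", "/foo"))
```

With no current document to resolve against, as in resolve("", "/foo"), the sketch raises ValueError, which matches the behaviour the group settled on for URLs that cannot be parsed on their own.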

Desired vs. required capabilities

A big discussion during the meeting concerned the continuing confusion around capabilities: Many feel they are an intermediary node concept that is best left undefined in the core specification text itself, because the specification explicitly does not define any qualities or expectations about local ends (client bindings) or intermediary nodes (a Selenium server or proxy that gives you a session).

There was however consensus around the fact that having a way to pick a browser configuration from some matrix was a good idea. The uncertainty, I think, comes largely from driver implementors who feel that once capabilities reach the driver there is very little that can be done about the sort of conflict resolution that required and desired capabilities warrant.

For example, what does it mean to desire a profile and how do you know if the provided profile is valid? We were unable to reach any agreement on this and decided to punt the topic to our next meeting in Lisbon.
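To show where the difficulty lies, here is one way an intermediary node could pick a configuration from a matrix, sketched in Python. Everything in it is an assumption for the sake of illustration: the pick_configuration function, the example matrix, and the rule that required capabilities must match exactly while desired ones only influence the ranking. It is not something the working group agreed on.

```python
def pick_configuration(available, required, desired):
    """Pick a configuration from a matrix of available ones.

    Every required capability must match exactly. Among the remaining
    candidates, the one matching the most desired capabilities wins;
    ties go to the earliest entry. Returns None if nothing qualifies.
    """
    candidates = [
        config
        for config in available
        if all(config.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda config: sum(
            config.get(key) == value for key, value in desired.items()
        ),
    )


# Hypothetical matrix of configurations an intermediary node can offer.
matrix = [
    {"browserName": "firefox", "platform": "linux"},
    {"browserName": "firefox", "platform": "windows"},
    {"browserName": "chrome", "platform": "linux"},
]

print(pick_configuration(matrix, {"browserName": "firefox"}, {"platform": "windows"}))
```

Simple equality works for values like a browser name. It says nothing about something like a profile, where “matching” has no obvious meaning, and that is the part we could not agree on.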

Test coverage

In order to push the specification to “Rec” (short for Recommendation) one must have at least two interoperable implementations by two separate vendors. To determine that they are interoperable, one needs a test suite. I’ve written previously about the test harness I wrote for the Web Platform Tests that integrates WebDriver spec tests with wptrunner.

We have a few exhaustive tests for a couple of chapters, but I hope to continue this work this quarter.

Next meeting

The working group is meeting again at TPAC, which this year is in Lisbon (how civilised!), in late September. I’m enormously looking forward to visiting, as I’ve never been.

We hope to resolve the outstanding capabilities discussion and make final decisions on a few more minor outstanding issues then.