Implementing Selenium with Python and Qt

I'm writing a python module that allows me to 'drive' a site using Qt. This means that I can navigate the site, fill forms, submit them and read the resulting pages and scrape them, Selenium style. The reasons I'm using Qt are that it has enough support for the site I'm driving (it's the web frontend of the SIP telephony solution we're using, which has an incomplete API and I have to automatize several aspects not covered by it); there are python bindings; and because I can do it headless: instead of using browser instances, I simply instanciate one QWebPage1 per thread and that's it.

The first thing I learned today is that JS objects representing the DOM elements have two sets of value holders: attributes and properties. The properties is what in Python we call attributes: the object's elements which are accesible with the '.' operator and hold instance values. The attributes are in fact the HTML element's attributes that gave the properties' initial values. That is, given the following HTML element:

<input type="text" name="foo° id="bar" value="quux">

the initial JS object's attributes and properties will have those values. If you change the value with your browser, the value property of that element will be changed, but not the value attribute. When you submit the form, the value properties of all the form elements are used, so if you "only' change the value attribute, that won't be used. So, forget attributes. Also, the DOM is the representation of the actual state of the page, but this state is never reflected in the HTML source that you can ask your browser to show, but you see those changes reflected in the browser's debugger. It's like they really wanted3 to keep initial values apart from current state2.

On the Qt side, QWebElement is only the DOM element representation, not the JS object4, so you can't access the properties via its API, but by executing JS5:

e = DOMRoot.findFisrt('[name="foo"]')
e.evaluateJavaScript("this.value = 'abracadabra'")

Tonight I finished fixing the most annoying bug I had with this site. To add a user I have to fill a form that is split in 7 'tabs' (which means 7 <div>s with fields where only one is shown at a time). One of the fields on the second tab has a complex JS interaction and I was cracking my skull trying to make it work. Because the JS is reacting to key presses, setting the value property was not triggering it. Next I tried firing a KeyboardEvent in JS, but I didn't succeed. Maybe it was me, maybe the fact that the engine behind QWebPage is the original Webkit and for some reason its JS support is lacking there, who knows.

But the good guys from #qtwebkit gave me a third option: just send plain QKeyEvents to the input element. Luckily we can do that, the web engine is completely built in Qt and supports its event system and more. I only had to give focus to the widget.

Again, I tried with JS and failed7, so I went back cheating with Qt behind curtains. QWebElemnt.geometry() returns the QRect of the QWidget that implements the input element; I just took the .center() of it, and generated a pair of mouse button press/release events in that point. One further detail is that the .geometry() won't be right unless I force the second tab to be shown, forcing the field to be drawn. Still, for some reason getting a reference to the input field on page load (when I'm trying to figure out which fields are available, which in the long run does not make sense, as fields could easily be created or destroyed on demand with JS) does not return an object that will be updated after the widget is repositioned, so asking its geometry returns ((0, -1), (-1, 0)), which amounts to an invalid geometry. The solution is to just get the reference to the input field after forcing the div/tab to be shown.

Finally, I create a pair of key press/release events for each character of the string I wanted as value, and seasoned everything with a lot of QMainLoop.processEvents(). Another advantage of using the Qt stuff is that while I was testing I could plug a QWebView, sprinkle some time.sleep() of various lengths, and see how it behaved. Now I can simply remove that to be back to headlessness.

I'm not sure I'll publish the code; as you can see, it's quite hacky and it will require a lot of cleanup to be able to publish it without a brown paper bag in my head.


  1. Yes, I'm using qt5.5 because that's what I will have available in the production server. 

  2. Although as I said, you can change the attributes and so you lose the original values. 

  3. I guess the answer is in in the spec. 

  4. I think i got it: QWebElement is the C++ class that is used in WebKit to represent the HTML tree, the real DOM, while somewhere deeper in there are the classes representing the JS objects which you just can't reach6:. 

  5. This clearly shows that there is a connection between the DOM object and the JS one, you just can't access it via the API. 

  6. This is the original footnote: Or something like that. Look, I'm an engineer and I usually want to know how things work, but since my first exposure to HTML, CSS and JS, back in the time when support was flaky and fragmented on purpose, I always wanted to stay as far away from them as possible. Things got much better, but as you can see the details are still somewhat obscure. I guess, I hope the answer is in the spec. 

  7. With this I mean that I executed something and it didn't trigger the events it should, and there's no practical way to figure out why.