A Capybara future

UPDATE: Some of these proposed changes have made it into Capybara 2.1. Please read our introduction to Capybara 2.1.

Over the last month or so, I’ve spent a lot of time thinking about the current state of Capybara.

As you may know, we released version 2.0 a while ago, which brought a number of significant changes. The biggest change, and for many the most aggravating change, was that matches now needed to be unambiguous. That is that if we try to do click_link("Remove") and there are multiple remove links on the page, we would throw an exception. I believe that this was a very good change. I have over the last years spent a lot of time tracking down issues where Capybara was simply interacting with the wrong element on the page.

The mistake we made however was that we still allowed substring matches, so that fill_in("Password", :with => "capybara"), not only matched "Password" but also "Password confirmation". This leaves us with a situation where we have to bend over backwards to avoid the ambiguity error. Clearly we should have gone all the way and disallowed substring matches.

Furthermore we should have provided a sane default, strictness, and allowed that default to be changed in specific cases, through options. For example we could have allowed:

click_link("Remove") # very strict
click_link("Remove", :exact => false) # allow substrings
click_link("Remove", :ambiguous => true) # pick any Remove link at random
click_link("Remove", :ambiguous => true, :exact => false) # 1.x behaviour

But even if we had made in hindsight those smarter choices, that still leaves us with an inherent ambiguity in what the arguments to action methods, like click_link actually do. This is what the current documentation says about fill_in:

Locate a text field or text area and fill it in with the given text. The field can be found via its name, id or label text.

This is actually not the whole truth. Fields can also be found via their placeholder, which isn’t documented. When found via name, id or placeholder the match needs to be exact, via label it is allowed to be a substring.

The XPath selector we use to find fields like this is quite complicated, and consequently quite slow as well.

What if fill_in worked like this instead:

fill_in "Some label", :with => "Text"
fill_in :id => "some-id", :with => "Text"
fill_in :placeholder => "Fill in some text", :with => "Text"
fill_in :name => "some[name]", :with => "Text"

We still preserve the simplicity of the normal case of selecting fields by their label, yet we make it clear when we deviate from that. This would also allow us to compile an XPath selector specific for this scenario, which would probably be quite a bit faster.

For click_link, and click_button the default would be to find them by their text, with :id and :title being valid alternatives.

I’m thinking that combining these two changes, the addition of the exact and ambiguous options, as well as defaulting to only finding fields, links and buttons by their labels or text respectively, would give us clearer, faster more easily understandable tests.

Taking it further

Requiring options such as :id works quite well for click_link and fill_in, but it works less well for select, which selects an element from a select box.

select "Programmer", :from => { :id => "profession" }
select "Programmer", :id => "profession"
select "Programmer", :from_id => "profession"

None of these options strike me as particularly elegant.

Departing from the current API even further, we could take a slice from the watir-webdriver API and instead change the API to look something like this:

text_field("Name", :exact => false).fill_in("Jonas")
button("Create", :ambiguous => false).click

This would make the API much more consistent, while perhaps sacrificing readability a little. In addition, this would be an even more radical departure from the current API, and for various reasons, we would probably not be able to provide a compatibility layer for this new style.

Please tell me what you think!

This blog post is just me thinking out loud. I haven’t decided yet that these are changes that we should make, and I would love to hear what you think. Do you think these changes are necessary? Is it worth the API breakage? Should we switch over completely or work to maintain backward compatibility at least for the time being through a configuration option? Do you have any other alternatives or suggestions? Please let me know.