The Promise and Failure of Record and Playback

I came across the video below of Bret Pettichord’s keynote presentation to the Selenium Conference in 2011, “Science and Stories and Test Automation”. Much of the talk covers his experience with test automation, specifically the promise and failure of record and playback over the last 20 years (I think). The historical perspective alone makes the video worth watching.

Toward the end of the keynote, Pettichord (best known in the testing community as one of the authors of Lessons Learned in Software Testing) calls out the Selenium community for falling victim to popularity with Selenium IDE, its version of record and playback, just as the commercial market did with record and playback tools. Instead, Pettichord says, doing what we know is right should triumph in the long run. In test automation, that means creating maintainable and useful automation code, not building a product (like Selenium IDE) that makes automation easier to use but doesn’t actually work.



Running RSpec acceptance tests in TeamCity

At work we use TeamCity as our CI service to automate the build and deployment of our software to a number of pre-production environments for testing and evaluation. Since we’re already bottling up all the build and deployment steps for our software, I figured we could piggyback on this process and kick off a simple login test. It seems faster and easier to have an automated test tell us whether we can log in than to wait until someone stumbles across the problem. After all, who cares if a server has the latest code if you can’t log in to use it?

Note: I’m calling the test that attempts to log in to our system a sanity test. It could just as easily be described as a smoke test.

The strategy looked something like:

  • Make sure tests are segmented (at least one ‘sanity’ test)
  • Hook up tests to Jenkins as a proof of concept
  • Once the configuration works in Jenkins (and I’ve figured out any additional pre-reqs), reconfigure tests in TeamCity to run “sanity tests” (which is a tag)
  • If sanity tests prove stable, add additional tests or segments

Segmenting tests is a great way to run only certain parts of the test suite at a time. Initially this would be a single login test, since logging in is a precursor to anything else we’d want to do in the app. In RSpec, our test framework, this was done by specifying a ‘sanity’ tag.

There didn’t appear to be any guidelines or instructions on the interwebs for configuring TeamCity to run RSpec tests, but I found them for another CI server, Jenkins. Unlike TeamCity, Jenkins is super easy to set up and configure: download the war file from the Jenkins homepage, launch it from the terminal and create a job! Our test code is written in Ruby, which means I can kick off our tests from the command line using a Rakefile. Once a job was created and the command line details were properly sorted, I was running tests with the click of a button! (Reporting and other improvements took a little longer.) Note that we don’t use any special RSpec runner for this, just a regular old command line, although we do have a Gemfile with all relevant dependencies listed.
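For reference, here is a minimal sketch of a Rakefile that runs just the tagged segment. The task names (sanity, all) are hypothetical, and the sanity task simply shells out to RSpec’s --tag filter; our real Rakefile and configuration settings aren’t shown.

```ruby
# Rakefile: a sketch with hypothetical task names.
# Run the login segment with:  bundle exec rake sanity
require 'rake'

desc 'Run only the specs tagged :sanity (e.g. the login test)'
task :sanity do
  # In a spec file the tag is just metadata, for example:
  #   RSpec.describe 'Login', :sanity do ... end
  # rspec --tag sanity then runs only examples carrying that tag.
  sh 'bundle exec rspec --tag sanity'
end

desc 'Run the whole suite'
task :all do
  sh 'bundle exec rspec'
end
```

A CI job (Jenkins or TeamCity) then only needs a single command line step that calls the rake task.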

Configuring TeamCity

Since I couldn’t find any guidelines on how to configure TeamCity to run RSpec acceptance tests, I’m hoping this helps. We already had the server running, so this assumes all you need to do is add your tests to an existing TeamCity server. After some trial and error, here’s how we got it to work:

  1. Created a new build configuration to run the sanity tests
  2. Added version control settings for the automation repo
  3. Within the build configuration added 3 steps:
    1. Install Bundler. A command line step that runs a custom script when the previous step finishes. It mostly handles the configuration information for Sauce Labs (our Selenium grid provider) and installs the first prerequisite, Bundler.
    2. Bundle Install. Also a command line step running a custom script; it installs the second prerequisite, the gems listed in our Gemfile.
    3. Run Tests. The final command line step, using rake and my test configuration settings to run the actual tests.
  4. Added a build trigger to launch the sanity tests after a successful completion of the previous step (deploy to server)
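Setting aside our custom scripts (which also carry the Sauce Labs configuration and aren’t shown), the three command line steps boil down to roughly the following. This is a sketch that assumes Ruby is already installed on the agent; the rake task name is hypothetical.

```shell
# Step 1: Install Bundler (assumes Ruby and RubyGems already exist on the agent)
gem install bundler

# Step 2: Bundle Install (installs the gems listed in the automation repo's Gemfile)
bundle install

# Step 3: Run Tests (hypothetical rake task that runs only the 'sanity' tagged specs)
bundle exec rake sanity
```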

After this was all put in place, I manually triggered the configuration to see how well the process worked. There were quite a few hiccups along the way. One of the more interesting problems was discovering that the TeamCity agent machine had outdated versions of Ruby and RubyGems. The RubyGems version was so out of date it couldn’t be upgraded; it had to be reinstalled, which is never much fun over an RDP session.

Once the execution went well, I deliberately triggered a failure. When the tests fail, they print “tests failed” to the build log. Unfortunately the server didn’t seem to recognize when a failure occurred, so I went back and added a specific “failure condition” (another configuration option) that looks for the phrase “tests failed” and, if it finds it, marks the build as failed. Simple enough!
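TeamCity performs that text match itself. The Ruby sketch below only illustrates the logic (the method name is hypothetical), and it also shows why the marker should be distinctive: any log line containing it will fail the build, even one that merely mentions it.

```ruby
# Sketch of a "fail the build if the log contains this text" condition.
FAILURE_MARKER = 'tests failed'.freeze

# Returns true when any line of the build log contains the marker text.
def build_failed?(log_text, marker = FAILURE_MARKER)
  log_text.each_line.any? { |line| line.include?(marker) }
end

passing_log = "Running sanity specs...\n1 example, 0 failures\n"
failing_log = "Running sanity specs...\n1 example, 1 failure\ntests failed\n"

puts build_failed?(passing_log)
puts build_failed?(failing_log)
```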

What’s next?

We’ve been running this sanity test for a few months now. It’s quite valuable to know when an environment is in an unusable state, yet I think visibility is still a challenge. Although the failures report directly into a Slack channel, I’m not sure how quickly they are noticed or whether the information reported in a failed test is useful.

A few articles I’ve read suggest using the CI server to launch integration tests instead of the UI level acceptance tests we are running. I think what we are doing is valuable and I’d like to expand it. I wonder what additional sanity tests or test segments we should add to this process. Are there more or better ways to do what we’re doing now? Please share your experiences!

Selecting a few Platform Configuration Tests

I’ve been developing a GUI acceptance test suite to increase the speed of specific types of feedback about our software releases. In addition to my local environment, I’ve been using Sauce Labs to extend our platform coverage (mostly by browser and operating system) and to speed up our tests by running more of them in parallel.

This is pretty similar to what I consider traditional configuration testing: making sure your software can run in various configurations with variables such as RAM, OS, video card, etc. On the web, though, the variables are a little different, and I am more concerned with browser differences than, say, operating system differences. Nevertheless, with over 700 browser and OS platforms at Sauce Labs, I still need to decide which configurations to start with and which to add over time in the hope of finding failures.


I figured the best place to start was with our current users and since the only “hard” data we had comes from Google Analytics, I pulled up the details of two variables (OS and Browser). Using as a replacement for my company’s data, our most commonly used platform configurations include:

  • Browsers:
    • Chrome 47 & 48,
    • Firefox 43 & 44,
    • Safari 8 & 9, and
    • Internet Explorer 10 & 11
  • Operating Systems:
    • Windows 7,
    • Windows 8.1,
    • Windows 10,
    • Mac OSX 10.11 and
    • Mac OSX 10.10

Ignoring a few constraints, like IE running only on Windows and Safari only on Mac, testing browsers and operating systems in combination could yield up to 40 different configurations (8 browsers x 5 operating systems). We also designed our application to be responsive, so at some point we’ll probably want to test a few browser resolutions. With 40 initial configurations and 3 different browser resolutions, that could yield up to 120 different configurations (40 x 3). Realistically, even for an automated suite running only one or two functional tests, that’s too many to start with.
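The arithmetic is easy to check in a few lines of Ruby. This sketch uses the browser and OS lists above and applies only the two constraints just mentioned; the platforms Sauce Labs actually offers add further constraints, so treat the constrained count as an upper bound.

```ruby
# Browser and OS values from the analytics lists above (ACTS-style names).
BROWSERS = %w[Chrome47 Chrome48 Firefox43 Firefox44 Safari8 Safari9 IE10 IE11].freeze
WINDOWS  = %w[Windows7 Windows8.1 Windows10].freeze
MACS     = %w[OSX10.10 OSX10.11].freeze
OSES     = (WINDOWS + MACS).freeze

# Only the two constraints named in the text: IE needs Windows, Safari needs a Mac.
def supported?(browser, os)
  return WINDOWS.include?(os) if browser.start_with?('IE')
  return MACS.include?(os) if browser.start_with?('Safari')
  true
end

all_pairs   = BROWSERS.product(OSES)
valid_pairs = all_pairs.select { |browser, os| supported?(browser, os) }
resolutions = 3

puts "Unconstrained browser x OS:  #{all_pairs.size}"
puts "With IE/Safari constraints:  #{valid_pairs.size}"
puts "Unconstrained x resolutions: #{all_pairs.size * resolutions}"
```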

Reduce the number of tests

We can start by adding in those constraints I mentioned above and then focus on just the variables we want to ensure we have coverage of. To get a better picture of the number of possible configuration tests, I used the ACTS (Automated Combinatorial Testing for Software) tool and based my constraints on what Sauce Labs has available today. After I added OS and browsers, it looked something like this:

ACTS Window

If I wanted every browser and operating system combination to be covered (all-pairs), then according to ACTS there aren’t 40 different configurations, just 24. That’s more manageable but still too many to start with. If I focus on my original concern of covering just the browsers, I get back to a more manageable set of configurations:

Test  OS          Browser
1     Windows8.1  Chrome47
2     Windows10   Chrome48
3     OSX10.11    Firefox43
4     Windows7    Firefox44
5     OSX10.10    Safari8
6     OSX10.11    Safari9
7     Windows10   IE10
8     Windows7    IE11

Eight configurations are far more manageable and less time-consuming than 24, especially since each configuration will also run a few dozen functional tests.
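ACTS builds its test sets with proper combinatorial algorithms. The Ruby below is not ACTS’s algorithm, just a hand-rolled greedy sketch of the same idea: cover every browser version once, pick a compatible OS for each (assuming IE needs Windows and Safari needs a Mac), and rotate through operating systems so they all get some coverage. Its output won’t necessarily match the table above, which was based on Sauce Labs’ real platform list.

```ruby
BROWSERS = %w[Chrome47 Chrome48 Firefox43 Firefox44 Safari8 Safari9 IE10 IE11].freeze
WINDOWS  = %w[Windows7 Windows8.1 Windows10].freeze
MACS     = %w[OSX10.10 OSX10.11].freeze

# Assumed constraints: IE only on Windows, Safari only on a Mac.
def compatible_oses(browser)
  return WINDOWS if browser.start_with?('IE')
  return MACS if browser.start_with?('Safari')
  WINDOWS + MACS
end

# Greedy plan: one row per browser, choosing the least-used compatible OS so far.
def browser_plan(browsers)
  usage = Hash.new(0) # how many times each OS has been picked
  browsers.map do |browser|
    os = compatible_oses(browser).min_by { |candidate| usage[candidate] }
    usage[os] += 1
    [browser, os]
  end
end

browser_plan(BROWSERS).each_with_index do |(browser, os), i|
  puts format('%d  %-11s %s', i + 1, os, browser)
end
```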

A Good Start

Selecting the configurations is probably the easiest part of configuration testing (and the only part I’ve shown here), but I’ve found it’s worth thinking through. (The harder part is designing the automated acceptance tests to produce useful failures.) Using ACTS at this point may seem like overkill when we could have just selected the browsers from the beginning, but it didn’t take much time and should make it easier in the future when we want to add more variables or change the values of our existing ones.

Screen resolution vs Resizing a window in Selenium

The main product I test was designed with a responsive web design layout, so it could theoretically be used on anything from desktop computers to tablets and smartphones. Practically speaking, this means different viewable window sizes (viewport sizes) will cause the browser to place elements of our application in different locations on the screen. When running my Selenium acceptance tests, I wanted to be able to specify different viewport sizes both locally and remotely on Sauce Labs. While the sizes may not make a difference to Selenium, they give me another variable to specify if I choose to do responsive testing. The examples I found weren’t very helpful, so I decided to make my own for both.

Resizing your window. Locally, the browser can be resized to a specific width and height using the resize_to() method. For a window size of 1280×1024, the line of code we are looking for is:

window.resize_to(1280, 1024)

In my selenium-examples repo this code goes into the spec_helper file and looks like @driver.manage.window.resize_to(1280, 1024) as you see on line 20:

If you don’t use a spec helper, you can include this code in your setup method or right after you create your WebDriver instance.
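Because Sauce Labs expresses resolution as a “WIDTHxHEIGHT” string, one option is to drive the local window size from the same kind of setting. The helper below is a hypothetical sketch: it reads an environment variable named resolution (defaulting to 1280x1024, an assumption) and converts it into the two integers resize_to expects.

```ruby
# Hypothetical helper: turn "WIDTHxHEIGHT" (e.g. "1280x1024") into [width, height].
def parse_resolution(value)
  match = /\A(\d+)x(\d+)\z/.match(value.to_s.strip)
  raise ArgumentError, "expected WIDTHxHEIGHT, got #{value.inspect}" unless match
  # to_i parses decimal digits, so a leading zero is not treated as octal
  [match[1].to_i, match[2].to_i]
end

width, height = parse_resolution(ENV.fetch('resolution', '1280x1024'))
puts "Resizing to #{width} x #{height}"

# In a spec_helper, the result would feed the Selenium call, e.g.:
#   @driver.manage.window.resize_to(width, height)
```

Keeping one setting for both local and remote runs means a responsive-layout test only has to change one value.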

Setting screen resolution. Resizing your window works great locally, but what if you want to run your tests remotely at Sauce Labs? How do we ensure the screen resolution is large enough to support a larger window size? Luckily, Sauce Labs opens its browsers at the maximum window size, so all we have to do is set the screen resolution.

According to the Sauce Labs platform configurator, we want to use the ‘screenResolution’ capability, as shown below:

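caps = Selenium::WebDriver::Remote::Capabilities.firefox # assumed starting point; 43.0 is a Firefox version
caps['platform'] = 'Windows 8'
caps['version'] = '43.0'
caps['screenResolution'] = '1280x1024'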

If we go back to the example-selenium repo, you’ll see I’m actually using caps["screenResolution"] = ENV['resolution'] in the spec_helper at line 11.

I’m setting screen resolution through an environment variable so I can update it in the config_cloud file, just as I might update other global settings like operating system or browser version. This is important because in some cases I have to either adjust the resolution or, in the case of Safari, actually comment it out. For some reason Sauce Labs doesn’t have many resolution options for Mac OS X, which is a bit annoying. The latest versions of OS X don’t even support a resolution of 1280×1024.
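A minimal sketch of that idea, assuming capabilities are built as a plain Ruby hash from environment variables. Only resolution matches the variable used in the spec_helper; browser, platform and version are hypothetical variable names, and the defaults are illustrative. Instead of commenting the setting out, it skips screenResolution for Safari.

```ruby
# Hypothetical sketch: build Sauce Labs capabilities from environment settings.
def sauce_caps(env = ENV)
  browser = env.fetch('browser', 'firefox')
  caps = {
    'browserName' => browser,
    'platform'    => env.fetch('platform', 'Windows 8'),
    'version'     => env.fetch('version', '43.0')
  }
  resolution = env['resolution']
  # Few resolutions are offered for Mac OS X, so leave it out for Safari
  caps['screenResolution'] = resolution if resolution && browser.downcase != 'safari'
  caps
end

p sauce_caps('browser' => 'chrome', 'resolution' => '1280x1024')
```

Accepting a Hash (rather than reading ENV directly) makes the method easy to check in isolation.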

Humans and Machines: Getting The Model Wrong

It seems like one of the more prominent and perpetual debates within the software testing community is the delineation between what the computer and human can and should do. Stated another way, this question becomes “what parts of testing fall to the human to design, run and evaluate and what parts fall to the computer?” My experience suggests the debate comes from the overuse and misuse of the term Test Automation (which in turn has given rise to the testing vs. checking distinction). Yet if we think about it, this debate is not just one within the specialty of software testing, it’s a problem the whole software industry constantly faces (and to a greater extent the entire economy) about the value humans and machines provide. While the concerns causing this debate may be valid, whenever we hear this rhetoric we need to challenge its premise.

In his book Zero to One, Peter Thiel, a prominent investor and entrepreneur who co-founded PayPal and Palantir Technologies, argues most of the software industry (and in particular Silicon Valley) has gotten this model wrong. Computers don’t replace humans, they extend us, allowing us to do things faster which when combined with the intelligence and intuition of a human mind creates an awesome hybrid.

Peter Thiel and Elon Musk at PayPal

He shares an example from PayPal:

Early on, PayPal had to combat fraudulent charges that were seriously affecting the company’s profitability (and reputation). They were losing millions of dollars per month. Thiel’s co-founder Max Levchin assembled a team of mathematicians to study the fraudulent transfers and wrote some complex software to identify and cancel bogus transactions.

But it quickly became clear that this approach wouldn’t work either: after an hour or two, the thieves would catch on and change their tactics. We were dealing with an adaptive enemy, and our software couldn’t adapt in response.

The fraudsters’ adaptive evasions fooled our automatic detection algorithms, but we found that they didn’t fool our human analysts as easily. So Max and his engineers rewrote the software to take a hybrid approach: the computer would flag the most suspicious transactions on a well-designed user interface, and human operators would make the final judgment as to their legitimacy.

Thiel says he eventually realized the premise that computers are substitutes for humans was wrong. People can substitute for one another; that’s what globalization is all about. People compete for the same resources, like jobs and money, but computers are not rivals, they are tools. (In fact, long-term research on the impact of robots on labor and productivity seems to agree.) A machine will never want the next great gadget or a beachfront villa for its next vacation, just more electricity (and it isn’t even smart enough to know it). People are good at making plans and decisions but bad at dealing with enormous sets of data. Computers struggle to make basic decisions that are easy for humans but can deal quickly with big sets of data.

Substitution seems to be the first thing people (writers, reporters, developers, managers) focus on. Depending on where you sit in an organization, substitution is either the thing you’d like to see (reduce costs – either in terms of time savings or in headcount reduction) or the thing you dread the most (being replaced entirely or your work reduced). Technology articles consistently focus on substitution like how to automate this and that or how cars are learning to drive themselves and soon we’ll no longer need taxi or truck drivers.

Why then do so many people miss the distinction between substitution and complementarity, including so many in our field?


The Apple Watch won’t change Testing

Probably. The Apple Watch won’t change Testing, probably.

Last month uTest announced a contest to win an Apple Watch. All you had to do was provide a response to this post:

In just a paragraph, describe how the Apple Watch will or will not change software testing as we know it today, or how testers are testing.

42mm Apple Watch

While I probably should have clarified the rules a bit (how do you define a paragraph?), I responded with:

If software testing is an empirical, technical investigation of a product for the stakeholders and a new product is introduced, that doesn’t change what software testing is or why we test. It might add some new dimensions to the product (the watch being an extension of the phone, tactile touch interface, etc.) that as testers we have to consider. It might change the importance or risk of some of those dimensions (change in interface, platform, etc.). Or it might change which test techniques we apply and how we apply them (think of stress testing a watch or computer aided testing / automation) but it probably won’t change testing.

I feel like elaborating. (tl;dr skip to the last paragraph)

As I was trying to formulate an answer, I was thinking about how a test strategy might change between two similar or complementary devices: an iPhone and an Apple Watch. The differences might suggest what changes were necessary to the model I was using. That model looked something like the Heuristic Test Strategy Model, and the changes I noticed were within product elements and test techniques.

For example we might see a difference between the iPhone and the Apple Watch in:

  • Operations. I imagine the environment and general use cases are a bit different and more extreme for a watch. I’m extremely careful about damaging my phone but I seem to always strike my arms and wrist against doors and walls without knowing it. The fitbit I wear around my non-dominant wrist speaks to this extreme or disfavored use.
  • Platforms. The hardware and software are different. The OS that runs on the watch is new (watchOS), in addition to the apps. What level of support does the iPhone provide to the Apple Watch (it’s a required external companion)?
  • Interfaces. The user interface seems like the most obvious difference, given the small display and the crown as the way back to the home screen. What about the new charging interface, or how data is exchanged between the watch and the phone?

Those are just a few dimensions of the Apple Watch I could think of in the thirty or so minutes I took. How many more am I forgetting? (We should examine as many of those dimensions as possible when testing).

Then I started looking at the top Test Techniques listed in the HTSM. How many of them can we apply to testing mobile devices like an iPhone and now the Apple Watch?

  • Function Testing
  • Domain Testing
  • Stress Testing
  • Flow Testing
  • Scenario Testing
  • Claims Testing
  • User Testing
  • Risk Testing
  • Automated Testing

All of them! The challenge might be in applying these techniques. I’ve heard mobile GUI automation like Appium has come a long way in a short time but still has problems and isn’t at the level of Selenium WebDriver. My own experience with mobile simulators suggests they are much less reliable than their desktop counterparts.

After going through this brief exercise I came away thinking this model was still just as applicable. Although the business case is still being made amongst consumers and without knowing any specific project factors, my testing thought process remained much the same. This isn’t to say there isn’t a need for a specialty of testers who are really good at applying test techniques to mobile devices; only that the mental model is still the same. Testing as a whole isn’t really changing.

Why It Matters

I’ve always found joy in exploring new products and the impact they have on our lives. Although it’s assumed that when you work in technology you are a technophile (someone who loves technology and lives and breathes gadgets and software), that’s not always the case. I find it just as interesting how quickly I abandon certain things as how much others stick with me and how much I use them.

I’m still evaluating the Apple Watch. As progress slows on the development of smartphones, I’m starting to question the relentless upgrade process, waiting instead for things I feel are worth spending the money on. The Apple Watch falls into this same category. I don’t typically wear watches, but as I said before, I do wear a fitbit. I like knowing how active or inactive I am. Whether I’m ready for a relentless upgrade cycle of expensive watches over the next 5 years, in addition to whatever new phones come out, is an entirely different story.

As for the uTest May contest, I came in third place. Thanks to uTest and everyone who voted! Maybe if I win a few more contests I’ll be able to justify getting my own Apple Watch? Even though it probably won’t change testing much, how can I say no?

When to use a Gemfile

I’ve been building a GUI acceptance test automation suite locally in Ruby using the RSpec framework.  When it was time to get the tests running remotely on Sauce Labs, I ran into the following error:

RSpec::Core::ExampleGroup::WrongScopeError: `example` is not available from within an example (e.g. an `it` block) or from constructs that run in the scope of an example (e.g. `before`, `let`, etc). It is only available on an example group (e.g. a `describe` or `context` block).
occurred at /usr/local/rvm/gems/ruby-2.1.2/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:642:in `method_missing'

It took a few minutes debugging before I spotted the error:


Source of problem: my remote tests were using a different version of RSpec than I was using locally. Solution: create a Gemfile to specify the version of RSpec I’m using.

Since I didn’t realize I needed a Gemfile, my question was: in general, when should someone use a Gemfile? According to the manual, a Gemfile

describes the gem dependencies required to execute associated Ruby code. Place the Gemfile in the root of the directory containing the associated code.

For example, in this context, I would place a Gemfile in any folder from which I specifically want to run tests. In my case that meant a few specific locations:

  • At the root of the folder – where I run the whole suite of tests
  • In the /spec/ folder – where I typically run tests at an individual level

At a minimum I’d specify:

  • A Global Source
  • Each Gem I use locally that Sauce Labs will need to use

In the end it might look something like this:
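A minimal sketch of such a Gemfile, assuming the suite needs RSpec, selenium-webdriver and rake. The pinned RSpec version is hypothetical; use whatever rspec --version reports on your own machine so the remote environment installs the same one.

```ruby
# Gemfile (sketch)

# The global source every gem below is installed from
source 'https://rubygems.org'

# Hypothetical pin: match the RSpec version you run locally
gem 'rspec', '~> 2.14'
gem 'selenium-webdriver'
gem 'rake'
```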

Test, adapt, and re-test.

Installing ChromeDriver on macOS

The ChromeDriver getting started guide isn’t super helpful if you are unfamiliar with including the ChromeDriver location in your PATH environment variable. (The PATH variable is how Selenium finds the downloaded ChromeDriver executable.) Also, it’s a lot of work for something so common. Never fear, here is a better way:

Installing on macOS:

Listed in order of easiest to hardest install, these are the best ways to install ChromeDriver on a Mac:

  • The easiest way to install ChromeDriver is to use your package manager such as brew or npm to install the driver.
    • In your terminal window with the Homebrew package manager:
      • Install ChromeDriver with brew install --cask chromedriver
      • Confirm it was installed using chromedriver --version and seeing it returns a version. If it errors it wasn’t installed
    • Other package managers like npm have similar commands npm install chromedriver
  • Run Chrome and ChromeDriver in a container using Docker. Simply download an image that bundles both, start it and point your code at the right address.
  • Specify it in your Selenium setup code and check it into source control like any other configuration detail. If you go this route, you can include additional drivers like GeckoDriver (for Firefox) as well.
  • Download the driver and add its location to your System PATH.

Which methods have you found the easiest or most success with? Which methods didn’t work for you? Please leave a comment below. 


Including the ChromeDriver location in macOS System PATH

The ChromeDriver getting started guide isn’t super helpful if you are unfamiliar with including the ChromeDriver location in your PATH environment variable. The PATH variable is how Selenium finds the downloaded ChromeDriver executable. Don’t get me wrong, I’ve updated PATH variables on Windows for years but never on a Mac, until now:

System PATH Setup

The following instructions add a folder of your choosing to your Mac’s PATH. (Alternatively, you can copy the ChromeDriver file into a directory that is already on your PATH.)

  1. Download the ChromeDriver executable.
  2. Next, tell Selenium where the executable is by adding its folder to your PATH. To do this:
    1. Open up Terminal
    2. Run sudo nano /etc/paths
    3. Enter your password
    4. Go to the bottom of the file and enter the path you wish to add
    5. My PATH looks like: /Users/myname/Documents/WebDriver
    6. Control-x to quit
    7. Y to save
    8. Press enter to confirm
  3. To double check, quit Terminal and relaunch it. Run echo $PATH. You should see your newly added path in the stream of other paths already there.
  4. Finally, update your tests to run using Chrome and run your tests!

After running your tests, if your PATH isn’t set up correctly you get this helpful message:

Selenium::WebDriver::Error::WebDriverError: Unable to find the chromedriver executable. Please download the server from and place it somewhere on your PATH. More info at
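Roughly speaking, that error means no directory on your PATH held an executable named chromedriver. The Ruby helper below is a hypothetical sketch (not Selenium’s own lookup code) that walks PATH the same general way, which makes it a quick check of your setup before running the tests.

```ruby
# Hypothetical helper, similar to the `which` command: return the full path of
# an executable found in one of the PATH directories, or nil if none has it.
def find_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

location = find_on_path('chromedriver')
if location
  puts "chromedriver found at #{location}"
else
  puts 'chromedriver is not on PATH; check /etc/paths and restart Terminal'
end
```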

Did this work for you? Did it solve your problem? Please leave a comment below. 


TDD and Software Testers

I’ve been following along with the series of conversations with Martin Fowler, Kent Beck and David Heinemeier Hansson (DHH) entitled Is TDD Dead. The whole conversation about what’s good, bad and ugly with test driven development (TDD) is interesting in my role as a software tester and from an overall system / quality perspective. What works, what doesn’t? What do some programmers like about it and what do others fear? Does TDD translate into a better product? Etc.

According to Fowler’s website, part 3 of the series covers

…the various ways in which we get feedback while programming and the role of QA in providing feedback to developers.

The whole series is worth a watch but if you are just interested in TDD and the role it plays when you have software testers (or QA), watch it here:

The three people involved have varying experiences: Fowler has worked for many years with software testers in enterprise software, Beck now works at Facebook, where they have few testers (and has his own experience with dysfunctional QA), and DHH runs Basecamp. It’s an interesting and relevant discussion because it comes from a programmer’s point of view (programmer testing). My view is that testing is an investigation designed to reveal information about a product. Beck frames it as feedback that builds confidence in the code. I think both views are valuable, and the differences in techniques and approaches yield very different ways of viewing quality.

The title “TDD is dead” reminds me of the saying “Test is dead”. Neither title is accurate (though both are catchy), but understanding the differences in views can help us when talking to stakeholders who have similar feelings or views.