Article: Is Your Test Suite Brittle? Maybe It’s Too DRY

MMS • Kimberly Hendrick

Key Takeaways

Don’t repeat yourself, or “DRY”, is a useful principle to apply to both application code and test code.
The misapplication of the DRY technique can make tests hard to understand, maintain, and change.
While code duplication may not be so harmful to your tests, allowing duplication of concepts causes the same maintainability problems in test code as in application code.
When applying DRY to tests, clearly distinguish between the three steps of a test: arrange, act, and assert.
TDD provides many benefits and can promote a shorter feedback loop and better test coverage.

Those of us who write automated tests do so for many reasons and gain several benefits. We gain increased trust in the correctness of the code, confidence that allows us to refactor, and faster feedback from our tests on the design of the application code.

I’m a huge proponent of TDD (Test Driven Development) and believe TDD provides all the benefits stated above, along with an even shorter feedback loop and better test coverage.

One crucial design principle in software development is DRY – Don’t Repeat Yourself. However, as we will see, when DRY is applied to test code, it can cause the test suite to become brittle – difficult to understand, maintain, and change. When the tests cause us maintenance headaches, we may question whether they are worth the time and effort we put into them.

Can this happen because our test suite is “too DRY”? How can we avoid this problem and still benefit from writing tests? In this article, I’ll delve into this topic. I will present some indications that a test suite is brittle, guidelines to follow when reducing test duplication, and better ways to DRY up tests.

Note: I won’t discuss the definitions of different types of tests in this article. Instead, I focus on tests where duplication is common.

These are often considered unit tests but may also occur in tests that don’t fit a strict definition of a “unit test.” For another viewpoint on types of tests, read A Simpler Testing Pyramid: Getting the Most out of Your Tests.

What is DRY?

DRY is an acronym for “Don’t Repeat Yourself,” coined by Andy Hunt and Dave Thomas in The Pragmatic Programmer. They defined it as the principle that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

The advantage of DRY code is that if a concept changes in the application, it requires a change in only one place. This makes a codebase easier to read and maintain and reduces the chances of bugs. Beautiful, clean designs can emerge when domain concepts are represented in a single place in the application.

DRY Application Code

DRY is not always easy to apply. Indeed, code that merely looks similar can tempt us to create unnecessary abstractions, leading to more complicated code instead of a cleaner design. One useful criterion is that DRY is concerned with reducing duplication of concepts, not duplication of typing. Keeping this in mind helps guide its application while avoiding common pitfalls.

For example, we often use literal values in our code. Is the number 60 that appears in several locations an instance of duplication, or does it have different meanings in each case? A helpful evaluation can be to ask: “If the value had to change, would we want it to change everywhere?” 60 will (hopefully) always be the number of seconds in a minute, but 60 somewhere else may represent a speed limit. This integer is not a great candidate to pull into a globally shared variable for the sake of DRY.
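
As a minimal sketch of that question in Kotlin, the two constants below share a value but not a meaning. The names SECONDS_PER_MINUTE, SPEED_LIMIT_KMH, toMinutes, and isSpeeding are hypothetical and chosen only for illustration:

```kotlin
// Two constants that happen to share a value but represent different concepts.
const val SECONDS_PER_MINUTE = 60
const val SPEED_LIMIT_KMH = 60

// Converts a duration in seconds to whole minutes.
fun toMinutes(seconds: Int): Int = seconds / SECONDS_PER_MINUTE

// Reports whether a speed exceeds the limit.
fun isSpeeding(speedKmh: Int): Boolean = speedKmh > SPEED_LIMIT_KMH

fun main() {
    println(toMinutes(180))  // 3
    println(isSpeeding(75))  // true
}
```

If the speed limit later drops to 50, only SPEED_LIMIT_KMH changes. A single shared constant would have silently changed the time conversion as well.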

As another example, imagine a method that loops over a collection and performs an action. This method might look a lot like another method that loops over the same collection and performs a slightly different action. Should these two methods be extracted to remove the duplication? Perhaps, but not necessarily. One way of looking at it is if a feature change would require them both to change simultaneously, they are most likely closely related and should be combined. But it takes more than looking at the code “shape” to know if it should be DRYed up.

Reasoning in terms of duplication of concepts helps avoid wrong decisions.
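
The loop example can be sketched as follows. The Order type and both functions are hypothetical; they are meant only to show two methods with the same shape that represent different concepts and would likely change for different reasons:

```kotlin
// Hypothetical domain type used only for this illustration.
data class Order(val id: Int, val totalCents: Int, val shipped: Boolean)

// Sums the value of orders that have shipped (a revenue concept).
fun shippedRevenueCents(orders: List<Order>): Int {
    var sum = 0
    for (order in orders) {
        if (order.shipped) sum += order.totalCents
    }
    return sum
}

// Collects the ids of orders still waiting to ship (a fulfilment concept).
fun pendingOrderIds(orders: List<Order>): List<Int> {
    val ids = mutableListOf<Int>()
    for (order in orders) {
        if (!order.shipped) ids.add(order.id)
    }
    return ids
}

fun main() {
    val orders = listOf(
        Order(id = 1, totalCents = 1_000, shipped = true),
        Order(id = 2, totalCents = 2_500, shipped = false),
        Order(id = 3, totalCents = 500, shipped = true),
    )
    println(shippedRevenueCents(orders)) // 1500
    println(pendingOrderIds(orders))     // [2]
}
```

Merging these into one generic "loop with a flag" helper would remove some typing, but it would couple a revenue rule to a fulfilment rule. Unless a feature change would alter both at once, keeping them separate is probably the safer choice.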

DRY Tests

DRY in test code often presents a similar dilemma. While excessive duplication can make tests lengthy and difficult to maintain, misapplying DRY can lead to brittle test suites. Does this suggest that the test code warrants more duplication than the application code?

DRY vs. DAMP/WET

A common solution to brittle tests is to use the DAMP acronym to describe how tests should be written. DAMP stands for “Descriptive and Meaningful Phrases” or “Don’t Abstract Methods Prematurely.” Another acronym (we love a good acronym!) is WET: “Write Everything Twice,” “Write Every Time,” “We Enjoy Typing,” or “Waste Everyone’s Time.”

The literal definition of DAMP has good intentions – descriptive, meaningful phrases and knowing the right time to extract methods are essential when writing software. However, in a more general sense, DAMP and WET are opposites of DRY. The idea can be summarized as follows: Prefer more duplication in tests than you would in application code.

However, the same concerns of readability and maintainability exist in application code as in test code. Duplication of concepts causes the same problems of maintainability in test code as in application code.

Brittle Example

Let’s review some brittle test code written in Kotlin.

The example below shows a common pattern that may present differently depending on the testing language and framework. For example, in RSpec, the long setUp() method may be many let! statements instead.

class FilterTest {
   private lateinit var filter: Filter

   private lateinit var book1: Book
   private lateinit var book2: Book
   private lateinit var book3: Book
   private lateinit var book4: Book
   private lateinit var author: Author
   private lateinit var item1: Item
   private lateinit var item2: Item

   @BeforeEach
   fun setUp() {
       book1 = createBook("Test title", "Test subtitle", 
                          "2000-01-01", "2012-02-01")
       book2 = createBook("Not found", "Not found", 
                          "2000-01-15", "2012-03-01")
       book3 = createBook("title 2", "Subtitle 2", null, 
                          "archived", "mst")
       createBookLanguage("EN", book1)
       createBookLanguage("EN", book3)
       author = createAuthor()
       book4 = createBook("Another title 2", "Subtitle 2", 
                          null, "processed", "", "", 
                          listOf("b", "c"), author)
       val user = createUser()
       createProduct(user, null, book4)
       val salesTeam = createSalesTeam()
       createProduct(null, salesTeam, book4)
       val price1 = createPrice(book1)
       val price2 = createPrice(book3)
       item1 = createItem("item")
       createPriceTag(item1, price1)
       item2 = createItem("item2")
       createPriceTag(item2, price2)
       val mstDiscount = createDiscount("mstdiscount")
       val specialDiscount = createDiscount("special")
       createBookDiscount(mstDiscount, book1)
       createBookDiscount(specialDiscount, book2)
       createBookDiscount(mstDiscount, book2)
   }

   @Test
   fun `filter by title`() {
       filter = Filter(searchTerm = "title")
       onlyFindsBooks(filter, book1, book3, book4)
   }

   @Test
   fun `filter by last`() {
       filter = Filter(searchTerm = "title", last = "5 days")
       onlyFindsBooks(filter, book3)
   }

   @Test
   fun `filter by released from and released to`() {
       filter = Filter(releasedFrom = "2000-01-10", 
                       releasedTo = "2000-01-20")
       onlyFindsBooks(filter, book2)
   }

   @Test
   fun `filter by released from without released to`() {
       filter = Filter(releasedFrom = "2000-01-02")
       onlyFindsBooks(filter, book2, book3, book4)
   }

   @Test
   fun `filter by released to without released from`() {
       filter = Filter(releasedTo = "2000-01-01")
       onlyFindsBooks(filter, book1)
   }

   @Test
   fun `filter by language`() {
       filter = Filter(language = "EN")
       onlyFindsBooks(filter, book1, book3)
   }

   @Test
   fun `filter by author ids`() {
       filter = Filter(authorUuids = author.uuid)
       onlyFindsBooks(filter, book4)
   }

   @Test
   fun `filter by state`() {
       filter = Filter(state = "archived")
       onlyFindsBooks(filter, book3)
   }

   @Test
   fun `filter by multiple item_uuids`() {
       filter = Filter(itemUuids = listOf(item1.uuid, item2.uuid))
       onlyFindsBooks(filter, book1, book3)
   }

   @Test
   fun `filtering by discounts with substring`() {
       filter = Filter(anyDiscount = listOf("discount"))
       assertTrue(filter.results().isEmpty())
   }

   @Test
   fun `filtering by discounts with single discount string`() {
       filter = Filter(anyDiscount = listOf("special"))
       onlyFindsBooks(filter, book2)
   }

   @Test
   fun `filtering by discounts with non-existent discount`() {
       filter = Filter(anyDiscount = listOf("foobar"))
       assertTrue(filter.results().isEmpty())
   }

   @Test
   fun `filtering by discounts with multiple of the same discount`() {
       filter = Filter(anyDiscount = 
           listOf("mstdiscount", "mstdiscount", "special"))
       onlyFindsBooks(filter, book1, book2)
   }

   private fun onlyFindsBooks(filter: Filter, vararg foundBooks: Book) {
       val uuids = foundBooks.map { it.uuid }.toSet()
       assertEquals(uuids, filter.results().map { it.uuid }.toSet())
   }
}

When studying code like this, it’s common to first focus on the setup steps, then digest each test and figure out how they relate to the setup (or vice versa). Looking at only the setup in isolation provides no clarity, nor does focusing on each test individually. This is an indication of a brittle test suite. Ideally, each test can be read as its own little universe with all context defined locally.

In the above example, the setUp() method creates all the books and related data for all the tests. As a result, it is unclear which books are required for which tests. In addition, the numerous details make it challenging to discern which ones are relevant and which are required for book creation in general. Notice how many things would break if the required data for creating books were to change.

When focusing on the tests themselves, each test does the minimum to call the application code and assert the results. The specific book instance(s) referenced in the assertion is buried in the setUp() method at the top. It’s unclear what purpose onlyFindsBooks serves in the tests. You might be tempted to add a comment on these tests to remind you of the relevance of each book’s attributes in each test.

It is clear that the initial developers had good intentions in creating the objects all in one place. If the initial feature only had two or three filters available, creating all the objects at the top might have made the code more concise. As the tests and objects grew, however, they outgrew this setup method. Subsequent filter features led developers to add more fields to the books and expect whichever book suited the test to be returned. Imagine trying to figure out which object was meant to be returned as we began to compose different combinations of the filters together!

To figure out what onlyFindsBooks() does, you’ll need to scroll more to find the hidden assertions. This method has enough logic that it takes a minute to connect the dots between what is passed in from the test and what the assertion is.

Finally, the filter instance declaration is far from the tests.

For example, let’s focus on this test for filtering by language:

@Test
fun `filter by language`() {
   filter = Filter(language = "EN")
   onlyFindsBooks(filter, book1, book3)
}

What makes book1 and book3 match the criteria of language = "EN" that was passed in? Why wouldn’t book2 also come back from this call? To answer those questions, you need to scroll to the setup, load the entire context of all the setup into your mind, and then attempt to spot the similarities and differences between all the books.

Even more challenging is this test:

@Test
fun `filter by last`() {
   filter = Filter(searchTerm = "title", last = "5 days")
   onlyFindsBooks(filter, book3)
}

Where does “5 days” come from? Is it related to a value hidden in the createBook() method for book3?

The author of this code applied the DRY technique to extract duplication but ended up with a test suite that is hard to understand and will break easily.

What to Look For

Many clues in the above code indicate that DRY has been misapplied. Some indications that tests are brittle and need refactoring include:

Tests are not their own little universe (see Mystery Guest): Do you find yourself scrolling up and down to understand each test?
Relevant details are not highlighted: Are there comments in tests to clarify relevant test details?
The intention of the test is unclear: Is there any boilerplate or “noise” required for setup but not directly related to the test?
Domain concepts are duplicated: Does changing application code break many tests?
Tests are not independent: Do many tests break when modifying one?

Solutions

In this section, we will present two possible solutions to the problems described above: the Three As principle and the use of object methods.

Three As

Tests may be seen as having three high-level parts. Often, these are referred to as the “Three As”:

Arrange – any necessary setup, including the variable the test is focused on
Act – the call to the application code (aka SUT, Subject Under Test)
Assert – the verification step that includes the expectation or assertion.

These steps are also referred to as Given, When, and Then.

The ideal test has only three lines, one for each of the As. This may not be feasible in reality, but it’s still a worthwhile objective to keep in mind. In fact, tests that match this pattern are easier to read:

// Arrange
val expected = createObject()

// Act
val result = sut.findObject()

// Assert
assertEquals(expected, result)

Object Creation Methods

Strategic use of object creation methods can highlight relevant details and hide irrelevant (but necessary) boilerplate behind meaningful domain names. This strategy is inspired by two others: Builder Pattern and Object Mother. While the example code we reviewed earlier uses methods to build test objects, it lacks some key benefits.

Object creation methods should:

Be named with a domain name that indicates which type of object it creates
Have defaults for all required values
Allow overrides for any values used directly by tests
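
The three properties above might look like this in Kotlin, using default and named arguments. This is only a sketch: the article's real Book class and createBook() are not shown, so the fields and default values here are assumptions made for illustration.

```kotlin
import java.util.UUID

// Hypothetical domain type; the real Book class is not shown in the article.
data class Book(
    val uuid: String,
    val title: String,
    val subtitle: String,
    val releasedOn: String?,
    val state: String,
)

// Named after the domain object it creates. Every required value has a default,
// and any value a test cares about can be overridden by name.
fun createBook(
    title: String = "Default title",
    subtitle: String = "Default subtitle",
    releasedOn: String? = "2000-01-01",
    state: String = "processed",
): Book = Book(
    uuid = UUID.randomUUID().toString(),
    title = title,
    subtitle = subtitle,
    releasedOn = releasedOn,
    state = state,
)

fun main() {
    val anyBook = createBook()                        // boilerplate stays hidden
    val archivedBook = createBook(state = "archived") // only the relevant detail is visible
    println(anyBook.state)      // processed
    println(archivedBook.state) // archived
}
```

Compare createBook(state = "archived") with the positional call createBook("title 2", "Subtitle 2", null, "archived", "mst") from the brittle example: the named override tells the reader which attribute matters to the test.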

Let’s change one of the tests from our example code to follow the Three As and use object creation methods:

@Test
fun `filter by language`() {
   val englishBook = createBook()
   createBookLanguage("EN", englishBook)
   val germanBook = createBook()
   createBookLanguage("DE", germanBook)

   val results = Filter(language = "EN").results()
   
   val expectedUuids = listOf(englishBook).map { it.uuid }
   val actualUuids = results.map { it.uuid }
   assertEquals(expectedUuids, actualUuids)
}

The changes made here are:

We modified the createBook() method to hide the boilerplate and allow overriding of the relevant details of the language value (the createBook() definition is not shown).
We renamed book variables to indicate their relevant differences.
We inlined the filter variable to make the Act step visible. This also allows it to be a constant instead of a variable, thus decreasing mutability.
We inlined the onlyFindsBooks() method and renamed temporary variables. This allows the separation of the Act step from the Assert step and clarifies the assertion.

Now, the three steps are much easier to identify. We can easily see why we are creating two books and how they differ. It is clear that the Act step is looking only for "EN" and that we expect only the English book to be returned.

At four lines of code, the Arrange step is longer than ideal, but every line is relevant to this test, and it’s easy to see why. We could combine creating a book and associating the language into a single method. Doing so makes the test code more complex and tightly couples the creation of books with languages in our test code, so it may cause more confusion than clarity. If, however, “book written in language” is a concept that exists in the domain, this might be the right call.

The logic in the Assert step could be better. That’s enough logic and noise to make it hard to understand if it were to fail.

Let’s extract those two areas and see how it looks:

@Test
fun `filter by language`() {
   val englishBook = createBookWrittenIn("EN")
   val germanBook = createBookWrittenIn("DE")

   val results = Filter(language = "EN").results()

   assertBooksEqual(listOf(englishBook), results)
}

private fun createBookWrittenIn(language: String): Book {
   val book = createBook()
   createBookLanguage(language, book)
  
   return book
}

private fun assertBooksEqual(expected: List<Book>, actual: List<Book>) {
   val expectedUuids = expected.map { it.uuid }
   val actualUuids = actual.map { it.uuid }
   assertEquals(expectedUuids, actualUuids)
}

This test requires nothing in the setUp() method, making it easy to understand without scrolling. You can dive into the details of the helper methods (createBookWrittenIn and assertBooksEqual), but the test is readable even without doing so.

As we apply these changes throughout the rest of the test suite, we’ll be forced to consider which books with which attributes are required for each test. The relevant details will stand out as we continue.

We may look at all the tests together and feel uncomfortable that we’re creating so many books! But we’re ok with that duplication because we know that while it looks like a duplication of code, it is not a duplication of concepts. Each test creates books representing different ideas, e.g., a book written in English vs a book released on a certain date.
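
As a sketch of how a second test might create its own books, here is the release-date filter rewritten in the same style. To keep the example runnable on its own, Book and Filter are simplified in-memory stand-ins and createBookReleasedOn is a hypothetical helper; none of them is the article's real code.

```kotlin
import java.time.LocalDate

// Simplified, hypothetical stand-ins for the article's Book and Filter.
data class Book(val uuid: Int, val releasedOn: LocalDate)

val library = mutableListOf<Book>()

// Creation method named after the one detail this test cares about.
fun createBookReleasedOn(date: String): Book {
    val book = Book(uuid = library.size + 1, releasedOn = LocalDate.parse(date))
    library.add(book)
    return book
}

class Filter(
    private val releasedFrom: String? = null,
    private val releasedTo: String? = null,
) {
    // Keeps books whose release date lies inside the (inclusive) range.
    fun results(): List<Book> = library.filter { book ->
        (releasedFrom == null || !book.releasedOn.isBefore(LocalDate.parse(releasedFrom))) &&
            (releasedTo == null || !book.releasedOn.isAfter(LocalDate.parse(releasedTo)))
    }
}

// In a real suite this would be an @Test method.
fun `filter by released from and released to`() {
    library.clear()
    val bookBeforeRange = createBookReleasedOn("2000-01-01")
    val bookInRange = createBookReleasedOn("2000-01-15")
    val bookAfterRange = createBookReleasedOn("2000-02-01")

    val results = Filter(releasedFrom = "2000-01-10", releasedTo = "2000-01-20").results()

    check(results == listOf(bookInRange))
}

fun main() {
    `filter by released from and released to`()
    println("passed")
}
```

The test creates three books, just as the language test created two, yet nothing is shared between them: the dates that explain the expected result sit next to the assertion.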

Benefits

Our setup method will be empty, and each test will be readable in isolation. Changing our application code (e.g., the book constructor) will only require changing the method in one place. Changing the setup or expectation of a single test will not cause all the tests to fail. The extracted helper methods have meaningful names that fit into the Three As pattern.

Guidelines

Here is a summary of the key guidelines that we followed, as well as additional guidelines:

Each test matches the Three As pattern: Arrange, Act, Assert. The three-part pattern (setup, action, expectations) should be easily distinguishable when looking at the test.

Arrange

Setup code does not include assertions.
Each test clearly indicates relevant differences from other tests.
Setup methods do not include any relevant differences (they are instead local to each test).
Boilerplate “noise” is extracted and easy to reuse.
Tests run and fail independently. Tests are each their own tiny universe with all the context they need.
Avoid randomness that causes tests to be non-deterministic. Test failures should be deterministic to avoid flaky tests that fail intermittently.
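
Time is a common source of non-determinism, as a filter such as last = "5 days" suggests. One way to keep such a test deterministic is sketched below. It assumes the application code can accept a java.time.Clock, which the article does not state, and releasedWithinLast is a hypothetical helper.

```kotlin
import java.time.Clock
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneOffset

// Hypothetical helper: true when releasedOn falls within the last `days` days,
// measured from the clock that is passed in rather than the system clock.
fun releasedWithinLast(days: Long, releasedOn: LocalDate, clock: Clock): Boolean {
    val today = LocalDate.now(clock)
    return !releasedOn.isBefore(today.minusDays(days)) && !releasedOn.isAfter(today)
}

fun main() {
    // A fixed clock makes "today" the same on every run.
    val fixedClock = Clock.fixed(Instant.parse("2024-03-10T12:00:00Z"), ZoneOffset.UTC)

    println(releasedWithinLast(5, LocalDate.parse("2024-03-07"), fixedClock)) // true
    println(releasedWithinLast(5, LocalDate.parse("2024-03-01"), fixedClock)) // false
}
```

With the clock fixed inside the test, the dates that make a book "recent" are visible in the Arrange step, and the result does not depend on the day the suite runs.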

Act

The SUT (Subject Under Test) and the main thing being tested (target behavior) are easy to identify.

Assert

Favor literal (hardcoded) values in assertions instead of variables. An exception is when well-named variables provide additional clarity.
Tests don’t have complicated logic or loops. Loops create interdependent tests. Complicated logic is brittle and hard to understand.
Assertions don’t repeat the implementation code.
Consider fewer assertions per test. Breaking up a test with a large set of assertions into multiple tests with fewer assertions provides more feedback on the failures. Multiple assertions may indicate too many responsibilities in the application code.
Prefer assertions that provide more information when they fail. For example, one assertion that the result matches an array provides more information than multiple assertions that count the items in the array and then verify each item individually. Tests stop on the first failure, so feedback from subsequent assertions is lost.
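
The last guideline can be sketched as follows. The small assertEquals below is a stand-in written for this example so that it runs without a test framework; real frameworks produce their own, differently worded messages.

```kotlin
// A tiny stand-in for a test framework's assertEquals, so the sketch runs on its own.
fun <T> assertEquals(expected: T, actual: T) {
    if (expected != actual) throw AssertionError("expected <$expected> but was <$actual>")
}

// Several small assertions: the first failure hides everything after it.
fun assertPieceByPiece(titles: List<String>) {
    assertEquals(2, titles.size)
    assertEquals("Dune", titles[0])
    assertEquals("Emma", titles[1])
}

// One assertion against a literal list: a failure shows the whole result.
fun assertWholeList(titles: List<String>) {
    assertEquals(listOf("Dune", "Emma"), titles)
}

// Runs an assertion and returns its failure message, or null if it passed.
fun failureMessage(block: () -> Unit): String? =
    try { block(); null } catch (e: AssertionError) { e.message }

fun main() {
    val actual = listOf("Dune", "Emma", "Ulysses")
    println(failureMessage { assertPieceByPiece(actual) })
    // expected <2> but was <3>
    println(failureMessage { assertWholeList(actual) })
    // expected <[Dune, Emma]> but was <[Dune, Emma, Ulysses]>
}
```

The first message only says the count is wrong. The second shows which unexpected item came back, which is usually what you need to diagnose the failure.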

A Note about Design

Sometimes, it is difficult to follow the above guidelines because the tests are trying to tell you something about the application design. Some test smells that provide feedback to the application code design include:

If this:

Too much setup could indicate a large surface area being tested; too much is being tested.
Wanting to extract a variable (thus coupling tests) because a literal is being tested repeatedly may indicate the application has too many responsibilities.

Then:

Consider that the application code has too many responsibilities and apply the Single Responsibility principle.

If this:

Comments are necessary to make the test understandable

Then:

Rename a variable, method, or test name to be more meaningful
Consider application code refactoring to provide more meaningful names or split up responsibilities

Additionally, don’t be afraid to wait until removing duplication feels “right.” Prefer duplication until it’s clearer what the tests are telling you. If an extraction or refactor goes wrong, it may be best to inline code and try again.

A Note about Performance

Performance concerns are one more reason developers are driven to extract duplicated code. Slow tests are certainly a cause for concern, but the cost of creating duplicate objects is often overstated, especially when compared to the time spent maintaining brittle tests. Respond to the pain caused by a lot of test setup by redesigning the application code. This results in both better design and lightweight tests.

If you do encounter performance problems with tests, begin by investigating the reasons for the slowness. Consider whether the tests are telling you something about the architecture. You may find a performance solution that doesn’t compromise the test clarity.

Conclusion

DRY is a valuable principle to apply to both application code and test code. When applying DRY to tests, though, clearly distinguish between the three steps of a test: Arrange, Act, and Assert. This will help highlight the differences between each test and keep the boilerplate from making tests noisy. If your tests feel brittle (often break with application code changes) or hard to read, don’t be afraid to inline them and re-extract along more meaningful domain seams.

It is important to remember that good design principles apply to application and test code. Test code requires the same ease of maintenance and readability as application code, and while code duplication may not be so harmful to your tests, allowing duplication of concepts causes the same problems of maintainability in test code as in application code. Hence, the same level of care should be given to the test code.

About the Author

Kimberly Hendrick
