The social lives of unit tests
We often think about ourselves as social creatures. In software development, unit tests have social lives of their own.
The overarching goal of unit testing is to provide a reliable safety net that allows us, the developers, to modify and enhance software without fear of breaking existing functionality. This safety net is crucial, giving us the confidence to refactor code quickly and prevent our codebase from devolving into the dreaded “ball of mud”. Ultimately, every developer seeks the agility to change code swiftly and confidently, relying on tests to catch any lapses in logic or unforeseen side effects.
Yet, unit testing, pivotal in software development, is fraught with misconceptions that can dilute its effectiveness. A major point of contention is the definition of a "unit", leading to two distinct approaches to testing, each profoundly shaping our methods and focus areas.
My favorite definition of a unit test comes from Vladimir Khorikov, and it’s a refinement of Kent Beck’s attributes of unit tests. In his book "Unit Testing Principles, Practices, and Patterns", Khorikov provides a concise breakdown of what unit tests should aim to achieve:
a unit test (1) verifies a unit of software, (2) in isolation and (3) quickly.
While the third point is straightforward, as everyone wants a fast feedback loop from running the test suite, the first two warrant further discussion.
1. What’s a unit?
Jay Fields, who coined "solitary" and "sociable" unit testing, highlights that sociable tests aim to check broader behaviors, offering a stark contrast to the solitary tests that focus on smaller, isolated chunks of behavior.
The solitary approach, frequently identified with the London School of testing, views a unit as a single class or method, isolating it from external collaborators through the use of mocks. This method, while streamlining initial testing phases, tends to complicate maintenance as dependencies and system complexity grow.
Solitary unit tests focus on smaller behaviors — specific computations or orchestration of collaborators — using test doubles to isolate each unit and ensuring that mocked interactions adhere to predefined contracts.
However, this perspective can be restrictive and misleading, as it focuses attention narrowly on isolated functions rather than on the overall behavior of the system.
Contrastingly, the sociable approach, often aligned with the Detroit School of testing, defines a unit as a unit of behavior that encompasses interactions with its dependencies, unless external dependencies such as I/O operations demand isolation for efficiency.
Sociable unit tests embrace the real interactions between components, ensuring that the system behaves as expected in a more integrated environment.
This brings us to the second point.
2. What do we isolate?
The natural answer would be to isolate the unit under test.
If a unit is a class or a method, we want to test this unit in isolation from all external factors that might impact it.
All dependencies of that class should be replaced with test doubles (like mocks) in tests (except for immutable dependencies).
By doing this we are making sure that failures are localized to the unit itself rather than being influenced by external factors or the behavior of dependencies.
So… that means we will mock out every dependency of that class.
Our test landscape would then consist of one test class per production class, each surrounded by mocks of its collaborators.
What’s wrong with solitary unit testing?
A common complaint with this approach is that any modification in the codebase necessitates changes in the tests.
By focusing the unit on a specific method or class you automatically create a tight coupling between the tests and the implementation, rather than the behavior. This often entails extensive use of mocks to simulate interactions between the class under test and its external dependencies.
The core problem here is precisely this: these kinds of tests are heavily mocked, and mock-based tests are rigid. The coupling shows up in tests that verify whether specific methods were called with particular arguments, rather than focusing on the outcomes those interactions should achieve. Any deviation from that implementation, even a purely organizational change, will cause some tests to fail.
Furthermore, by isolating each component and replacing dependencies with mocks, developers may inadvertently lock themselves into a particular design too early. This approach demands a deep understanding of the domain model upfront, which is not always available, and so it often leads to premature design decisions.
Over-reliance on mocks and substitutes leads to brittle tests that are highly sensitive to changes in the implementation of dependencies, spiraling into a maintenance burden, as changes in the system necessitate widespread updates to the accompanying mocks.
Not only does this hamper the agility of the development process, it also contradicts the core purpose of unit tests: to document the code's expected behavior and make sure the code actually has it.
No, the core purpose of tests is not to mirror the code structure. The core promise of tests is to help us, to make refactoring safe and easy to do.
Moreover, the argument that tests become stale over time typically reflects a misunderstanding of unit testing's role. If tests are designed to verify behavior, they evolve only when these behaviors change, thereby serving as a living documentation of system expectations.
The Problem with Deep Domain Testing
When tests drill too deeply into the domain, instead of focusing on the API or the service's external interfaces, developers end up with an inflated test suite where many tests offer little to no additional coverage. These overly detailed tests often duplicate the scenarios already covered by broader tests, adding maintenance overhead without any of the corresponding benefits. Moreover, when a new class is extracted from an existing behavior, the necessity to create additional tests and modify existing ones can dissuade developers from making beneficial refactorings.
What should trigger a new unit test?
Spoiler alert: it’s not adding a new class.
What should trigger the need for writing a unit test is implementing a requirement. This means the unit is a UNIT OF BEHAVIOR. It's about fulfilling a functional need, not just verifying implementation details. This functional need is what should drive test creation, because it reflects what the software is expected to do.
In contrast, writing a test simply to check if a method, CalculateSum, correctly adds up two numbers focuses too much on implementation details. Such details, albeit important, should not dictate the testing strategy.
Instead, testing should be based on higher-level objectives or user needs.
This is where your tests should be targeted, ensuring they are relevant and directly tied to the requirements of the software.
How do we implement sociable unit testing?
Focus on the public exports of your module, your facade. This preserves module encapsulation, crucial for maintaining flexibility.
Don’t focus on testing internals. These are implementation details.
Now, I know there’s a common practice to make internals visible to test projects. In C# we have the InternalsVisibleTo attribute. Refrain from using it if you want to maintain the encapsulation of your module. It’s this encapsulation that actually preserves the system’s flexibility.
There’s one exception to this rule, and that’s when touching legacy code. There it’s perfectly fine to go ahead and break the encapsulation, as it is more valuable for that code to get under tests than to preserve its encapsulation. For new software, don’t use InternalsVisibleTo. This is what improves your development speed in the long run.
Again, when we talk about sociable unit testing, the system under test is not just a class. The system under test is the set of exports of a module: its facade. This is your Domain in DDD that you’re testing. I’m not going to dive too much into this, as it’s a whole topic on its own, but simply think in terms of hexagonal architecture.
Allow your tests to be sociable. Allow them to communicate with real dependencies. The unit of isolation here is the test itself.
Circling back to Vladimir Khorikov’s three critical attributes of a unit test:
Unit of Software: This refers to the smallest piece of software that can be isolated for testing, which exhibits specific, meaningful behavior. This unit is not necessarily limited to a single method or class, but can involve any functional aspect significant from a business perspective.
Isolation: This attribute emphasizes testing the unit in a controlled environment where dependencies are either simulated or managed to ensure the focus remains on the unit itself, rather than on external interactions.
Quick Execution: The ability of a test to run swiftly complements the agile development process, enabling developers to receive immediate feedback and continuously refine the system with minimal disruption.
Code example
Let’s consider the domain of a Human Resources Management System (HRMS). This system is responsible, among other things, for managing employee records and payroll processes. We'll focus on a domain model that covers an Employee’s payroll calculation, considering factors like overtime payments.
The Payroll business logic is implemented by the StandardPayRatePolicy, as follows: if it’s standard time, then multiply number of hours by standard rate, otherwise, if overtime, multiply the rate by 1.5 first, and then multiply by the number of hours worked.
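As a minimal sketch of that domain model, the rule could be written as follows. It is written in Python for brevity rather than the C# the post refers to, and everything beyond the names Employee, StandardPayRatePolicy and IPayRatePolicy (method names, parameters, the use of Decimal) is an assumption made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol

OVERTIME_MULTIPLIER = Decimal("1.5")


class PayRatePolicy(Protocol):
    """Python stand-in for the IPayRatePolicy interface (signature is assumed)."""

    def calculate_pay(self, hours: Decimal, rate: Decimal, overtime: bool) -> Decimal: ...


class StandardPayRatePolicy:
    def calculate_pay(self, hours: Decimal, rate: Decimal, overtime: bool) -> Decimal:
        if overtime:
            # Overtime: multiply the rate by 1.5 first, then by the hours worked.
            return rate * OVERTIME_MULTIPLIER * hours
        # Standard time: hours multiplied by the standard rate.
        return rate * hours


@dataclass
class Employee:
    name: str
    hourly_rate: Decimal
    policy: PayRatePolicy

    def calculate_pay(self, hours: Decimal, overtime: bool = False) -> Decimal:
        # The employee delegates the actual calculation to its pay rate policy.
        return self.policy.calculate_pay(hours, self.hourly_rate, overtime)
```

The Employee here only delegates to the policy, which is what makes the choice of testing style so visible in the tests that follow.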
In solitary unit testing, this would start with creating a test class for the Employee class. And since we want to isolate the unit under test, we will mock every dependency of that class, IPayRatePolicy in our case.
The result is a test that distorts the behavior it claims to verify. It’s a perfect depiction of how tests cluttered with mocks tend to focus too much on the interaction with mocks rather than the real outcomes, making it challenging to discern what is actually being tested and why tests fail when implementation details change.
Furthermore, frequent changes in the codebase will probably lead to outdated mocks, making it difficult to ensure that tests still reflect the true behavior of the system.
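A solitary test of this kind might look roughly like the sketch below. It uses Python's unittest.mock, and the Employee shown is a minimal hypothetical stand-in whose method names and signature are assumptions.

```python
from decimal import Decimal
from unittest.mock import Mock


class Employee:
    """Minimal stand-in for the Employee class; it delegates to a pay rate policy."""

    def __init__(self, hourly_rate, policy):
        self.hourly_rate = hourly_rate
        self.policy = policy

    def calculate_pay(self, hours, overtime=False):
        return self.policy.calculate_pay(hours, self.hourly_rate, overtime)


def test_calculate_pay_delegates_to_policy():
    policy = Mock()  # stands in for IPayRatePolicy
    policy.calculate_pay.return_value = Decimal("300")
    employee = Employee(Decimal("20"), policy)

    result = employee.calculate_pay(Decimal("10"), overtime=True)

    # Both checks are about the conversation with the mock. The 1.5x overtime
    # rule never runs, and 300 is simply the number the mock was told to return.
    policy.calculate_pay.assert_called_once_with(Decimal("10"), Decimal("20"), True)
    assert result == Decimal("300")
```

Renaming the policy method or reordering its parameters would break this test even though no payroll rule changed.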
But what about PayRatePolicy?
For it, we would end up creating a new test class, which indeed would be focused on the behavior itself.
But what happens when we realize some of that behavior needs to be extracted away to keep our code base clean?
You guessed it. Tests will start failing and new tests will need to be written to accommodate the new implementation details. But did the business logic change? Not one bit.
Let’s now turn to sociable unit tests.
We end up with fewer tests overall, which are laser focused on our business logic. Regardless of how many tweaks and splits into separate classes we do, as long as the behavior stays the same, the tests won’t fail. Our trust in the test suite increases, and our overall development speed together with it.
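To illustrate, here is a sketch of sociable tests that exercise Employee together with the real StandardPayRatePolicy. The OvertimeRule class is a hypothetical result of a later refactoring, and all method names and signatures are assumptions. The tests never mention OvertimeRule, so they read the same before and after the extraction.

```python
from decimal import Decimal


class OvertimeRule:
    """Hypothetical class extracted from the policy during a refactoring."""

    def multiplier(self, overtime):
        return Decimal("1.5") if overtime else Decimal("1")


class StandardPayRatePolicy:
    def __init__(self):
        self._overtime_rule = OvertimeRule()

    def calculate_pay(self, hours, rate, overtime):
        return rate * self._overtime_rule.multiplier(overtime) * hours


class Employee:
    def __init__(self, hourly_rate, policy):
        self.hourly_rate = hourly_rate
        self.policy = policy

    def calculate_pay(self, hours, overtime=False):
        return self.policy.calculate_pay(hours, self.hourly_rate, overtime)


def test_standard_hours_are_paid_at_the_standard_rate():
    employee = Employee(Decimal("20"), StandardPayRatePolicy())
    assert employee.calculate_pay(Decimal("8")) == Decimal("160")


def test_overtime_hours_are_paid_at_one_and_a_half_times_the_rate():
    employee = Employee(Decimal("20"), StandardPayRatePolicy())
    assert employee.calculate_pay(Decimal("2"), overtime=True) == Decimal("60")
```

Each test name states a business rule, and each assertion checks an outcome rather than a call made to a collaborator.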
General guidelines for writing sociable tests
Utilize specific domain entities
Start by using specific entities that are central to your domain. This helps ground your tests in the reality of your application's environment. You can then modify these entities according to the specific requirements of each test. It ensures that the tests are relevant and provide valuable insights into how the system behaves under different conditions.
Make asserts meaningful
In sociable unit testing, asserts are not just checks; they serve as documentation for the business logic they test. Each assert should clearly communicate what it’s testing and why that behavior is important. This makes tests easier to understand and maintain, and it turns the test suite into a form of documentation that can be useful for both developers and non-technical stakeholders.
Focus on the Behavior, not the Class
Shift your focus from testing specific classes to testing behaviors across classes. This change in perspective helps ensure that the tests assess the system’s functionality and user-facing behaviors rather than just the internal logic of its components. By focusing on behaviors, sociable unit tests can provide more meaningful assurances about the system's readiness for production.
Efficient execution strategies
Sociable unit tests can be more resource-intensive than solitary tests, primarily because they involve more components and require more setup. To manage this, consider not running all sociable unit tests all the time, especially during the early stages of development. Alternatively, you can run all the tests but do so in parallel to reduce the time impact. Modern CI/CD tools and test frameworks offer capabilities to run tests in parallel, leveraging cloud resources and multi-threading to handle the increased load efficiently.
Concerns and challenges
Breaking the testing pyramid
There is a common misconception that sociable unit tests blur the lines with integration tests, potentially breaking the traditional testing pyramid.
However, that’s not the case, as each type of test serves a distinct purpose:
Unit Tests
While they involve multiple classes, the focus of sociable unit tests is on the logic within a specific module or component, regardless of how it is organized. They are interested in the smallest units of behavior, which in this context, are the public exports of a module—essentially the functionalities exposed to the users or other modules. The key is what happens in the facade as a result of that logic, not just that the integration points work. We are testing internal logic (Class B, C, Z) through that facade (Class A).
Integration Tests
These tests are designed to verify the collaboration between such modules and ensure that various parts of the system work together as expected.
End to End Tests
They make sure the whole system, from end to end, communicates well.
Addressing Architectural Challenges
Sometimes, creating a reusable setup for sociable unit tests proves too challenging, indicating a potential issue with the architecture itself. Difficulty in configuring tests for certain interactions might suggest that the components in question do not belong in the same context or domain. This difficulty can serve as a valuable indicator that it may be necessary to reevaluate and potentially redefine the architectural or domain boundaries. Such a reevaluation can lead to a more intuitive and coherent system architecture that better supports both development and testing.
The Role of Mocks and Test Doubles
Despite the emphasis on using real dependencies, the use of mocks and test doubles cannot be completely eliminated, especially when dealing with external services where non-determinism is a factor. Even staunch advocates of classical testing acknowledge their value in removing non-determinism, particularly in awkward collaborations.
Favor using fakes—simplified implementations of complex interfaces—over mocks. While mocks are useful for isolating behavior, they can lead to tests that are too closely tied to the implementation details of a component. Fakes allow for testing the interaction with dependencies in a way that is less brittle and more representative of actual interactions within the application.
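As a rough sketch of the difference, the test below uses a fake rather than a mock. PayrollService and InMemoryTimesheetStore are hypothetical names invented for this example; the assumption is that the production store would talk to a database.

```python
from decimal import Decimal


class InMemoryTimesheetStore:
    """Fake: a working, simplified implementation backed by a dict."""

    def __init__(self):
        self._hours = {}

    def record(self, employee_id, hours):
        self._hours[employee_id] = self._hours.get(employee_id, Decimal("0")) + hours

    def hours_for(self, employee_id):
        return self._hours.get(employee_id, Decimal("0"))


class PayrollService:
    """Hypothetical facade; in production the store would be database-backed."""

    def __init__(self, store, hourly_rate):
        self._store = store
        self._hourly_rate = hourly_rate

    def log_hours(self, employee_id, hours):
        self._store.record(employee_id, hours)

    def pay_due(self, employee_id):
        return self._store.hours_for(employee_id) * self._hourly_rate


def test_pay_due_reflects_all_logged_hours():
    payroll = PayrollService(InMemoryTimesheetStore(), hourly_rate=Decimal("20"))
    payroll.log_hours("emp-1", Decimal("5"))
    payroll.log_hours("emp-1", Decimal("3"))
    # The assertion is on the outcome, not on which store methods were called.
    assert payroll.pay_due("emp-1") == Decimal("160")
```

Because the fake really stores and returns data, the service is free to change how it talks to the store without this test needing an update.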
When to Use Test Doubles
The decision to use test doubles should consider:
Complexity of the SUT: In highly complex systems, using test doubles can help manage this complexity by simplifying the test environment. However, for systems where interactions are crucial to functionality, real dependencies might be necessary.
Test Maintenance Effort: Test doubles can reduce initial setup time but may increase maintenance effort if the system evolves and the interfaces between components change.
Efficient Execution Strategies
Given the often extensive nature of sociable unit tests, it's impractical to run all tests frequently. Typically, only those tests that pertain to the code currently under development are run regularly. This targeted approach, which I refer to as the "compile suite," is essential for maintaining the balance between depth of testing and runtime.
Advantages and Challenges of Sociable Unit Tests
Pros:
Validate Refactoring: One of the significant advantages of sociable unit tests is their ability to validate the system's behavior after refactoring. Because these tests are less tied to implementation details, they allow for substantial code modifications without the need to rewrite the tests, provided the external behavior remains consistent.
Simplified Maintenance: By reducing the reliance on mocks and stubs, there is less overhead in configuring and updating tests. This simplification leads to easier maintenance since changes in the codebase do not necessitate widespread updates across numerous mock setups.
Behavioral Coverage: They are excellent for capturing emergent behaviors in complex systems that might be missed by more granular tests. These tests validate the integration of components, ensuring that the system works as expected in a setup that closely mimics the production environment.
Reinforce Better Modularization: By testing how well components interact with their real dependencies, sociable unit tests encourage better system design and modularization.
Indicate Boundaries: These tests help define and reinforce the boundaries of a component's responsibilities by clearly outlining how it should interact with other parts of the system.
Enhanced Reliability: Using actual code interactions rather than relying on potentially flawed mock contracts increases the reliability of tests. Real interactions expose the system to more variable conditions and edge cases, improving the robustness of the software.
Cons:
Verbose/Complex Setup: Setting up sociable unit tests can be more complex and time-consuming than solitary tests because they often involve configuring real system dependencies.
(Can be) Slower: Because they interact with real components rather than mocks, sociable unit tests may execute more slowly, which could impact the speed of the development cycle. Whenever that really is the case, use test doubles.
Conclusion
By fostering tests that mirror real-world use rather than just code structure, sociable unit tests enhance not just the robustness but also the relevance of our testing practices. They prepare software not just to work, but to work in the way users truly need. This not only stabilizes the foundation of our applications but also streamlines the development process, making our software more adaptable and resilient to changes.