Quantifying Good Code

When a team does code reviews, there are many opportunities for opinion-driven arguments.  These can be difficult to resolve without well-defined criteria that can be objectively evaluated.  When team members come from different backgrounds, they bring different style preferences and different ideas about what makes code good.  This post is an attempt to establish objective criteria, which will hopefully result in conversations about code that are driven more by data than opinion.

Criteria 1: Code should accomplish the intended objective

This probably seems obvious, but it's an objective metric and a good place to start. If the code doesn't do what it's supposed to, it shouldn't pass code review. Of course, assessing whether it works properly (and in all cases) is often not straightforward. This leads us to a next objective criteria of good code.

Criteria 2: Code should be tested

One of the most important tools for a developer are automated tests. Proving code works in all cases is difficult, and rigorously proving code works is incredibly challenging in most cases, so it's important to at least develop a body of evidence of the cases in which your code works. If code is not checked by automated tests, it probably shouldn't pass code review. There are times when it's difficult to test code using automated tools. Sometimes there are good reasons for this difficulty, but most of the time the cause of the difficulty is the structure of the code. This leads us to the next objective criteria.

Criteria 3: Code should be (easily) testable

Code that is testable is easily checked by automated tests. A good sign that code is testable is if the corresponding unit tests are straightforward--requiring only small amounts of simple setup code to check that the behavior of a function or class is correct. If unit tests must mock many objects or duplicate the behavior of collaborating units to check that a unit under test is behaving correctly, the code is not easily testable.

If unit tests are easy to write, programmers will be much more likely to write more of them. When code is easily testable, automated tests become the straightforward way for a programmer to run their code. The alternative is spinning up the whole application and manually running through some steps. That's a lot of work when an automated test that does the same thing can be written just a quickly.

Usually code that is easily tested is also broken down into small chunks. Classes are small, have few dependencies, and adhere to the single responsibility principle. Functions likewise are small, require few arguments, and avoid having side effects wherever possible. This leads to the next criteria.

Criteria 4: Code should be broken down into small units

Programming is all about breaking problems down into smaller, more tractable problems. When a code solution is broken into small pieces of functionality that come together to solve the larger problem, it is easier to test each piece in isolation. It also becomes easier to re-use common pieces when different parts of code require solutions to the same problems.

Long methods are particularly troublesome. They usually result in unclear coupling between all the variables in that function's scope. Usually long methods persist for this reason--it's difficult to untangle which sub-steps require which variables. When those dependencies are understood, they are often very difficult to separate without duplicating the setup code that establishes some of the variables in the first place.

Long methods do more than one thing, so proper behavior in all cases is difficult to check in tests. To check behavior of a middle step (say step 2 in a 3-step long-method process), you have to mock out all the behavior required for step 1 to properly execute before code even gets to step 2, then to check the results, you have to make sure all the dependencies behave sufficiently well to make it through step 3 with a return value that can be checked. Thus the tests for step 2 require detailed knowledge of steps 1 and 3 as well.

Sandi Metz defines four simple rules for developers to follow and produce good code.

1. Your class can be no longer than 100 lines of code.
2. Your methods can be no longer than five lines of code.
3. You can pass no more than four parameters and you can’t just make it one big hash.
4. When a call comes into your Rails controller, you can only instantiate one object to do whatever it is that needs to be done. And your view can only know about one instance variable.

You can break these rules if you can talk your pair into agreeing with you.

The rules are focused on ruby / rails development and the limits probably should be expanded for boiler-plate rich Java codebases, but the idea of small units is consistent with this smallness criteria.

Criteria 5: Code should be necessary

Every class, function, and interface should be necessary to accomplish the stated goal of the task. Code that is not being used (aka "dead code") complicates the reader's job of understanding a given section of code as they first need to learn that it is dead, and then actively ignore the code. Dead code adds more visual clutter when reading code, and more mental clutter when comprehending its purpose. Dead code also likely violates criteria 1 and 2 in that it likely doesn't work and likely isn't well tested. This becomes especially risky as future features may see the dead code and assume it works, use it, and later be forced to go fix the parts that are broken.

Unnecessary code can also be good code that does more than the stated objective. Sometimes this is good, and the stated objective should be expanded. Other times it would be more easily reviewed and tested as a separate code review, even if it is good code and a worthwhile feature.

It can be tempting to solve two or more problems in the same pull request "while I'm in there..." It's good for a team to have a guiding principle that encourages focus on the task at hand and small pull request / code reviews.  This allows the objective task to be quickly accomplished, and lowers the risk of code review quagmire. Less code to review means a faster review and less fodder for disagreement or nit-picking.

Criteria 6: Complexity should be necessary

Along the same lines as not allowing dead code, we should also avoid introducing unnecessary complexity. Classes and functions should not have more indirection, abstraction, or configurability than is required to accomplish the stated objective. It can be tempting to add extra abstraction and configurability in view of the future, thinking it's noble to do more work today some someone in the future can benefit.

However, this introduces two new problems. Firstly, until that future arrives, everyone who reads or modifies the code needs to deal with the extra abstractions and configuration options even though they aren't really necessary. Secondly, when the future actually arrives, it might not be exactly as expected (the future rarely is). Then the work that was done previously needs to be re-worked to accomplish the differing requirements of the actual future requirements. Typically it's easier to write fresh code from scratch than to read, understand, and modify existing code that was designed for different requirements. Because of these downsides, it's best to delay the extra complexity until it actually becomes required for today's coding objective.

Criteria 7: Code should be correct

When writing simple straightforward code it can be tempting to gloss over the corner cases and requirements that force additional boilerplate or complexity. For example, it can be easy to overlook checking for nulls and other bad inputs. Ignoring those details can make code look cleaner and read more clearly, but violates Criteria 1 and exposes potential for latent bugs.

Good tests can flush out problems of nulls and bad inputs, but there are other similar complications that are easy to overlook. For example, behavior in multi-threaded environments needs to be considered carefully. Usually the simple, straightforward, immutable solution will also work best in multi-threaded environments.

Another detail that's easy to overlook is object lifecycle when dependency injection is automated by some framework. JAX-RS / Jersey will automate injection of a Context into a Resource on each request, which can lead to unexpected behavior. These details need be dealt with, and covered in automated tests as much as possible.

Criteria 8: Code should be consistent with style standards

As basic and tedious as it may seem, it's important for a team to have coding standards and adhere to them. Basic code style is a tempting topic to argue over, and will come up continually until a team has a standard. The topic will continue to arise until that standard is universally enforced. These standards are by nature objective and enforcement is easy to automate. Every project should be set up to check for adherence to code standards as part of the build. If there are nits about style that regularly arise during reviews (like "add newline at end of file"), add that to the automated checks.

With automated checks, coders are forced to address style issues when their build fails. Then code reviews aren't bogged down in stylistic nit-picking, and the quality of discussion and quantity of good will toward co-workers is elevated. It's much better for team dynamics to have everyone get slightly annoyed at themselves when their build fails because they broke the style guidelines than to have people get annoyed at the team member who cares about consistency and continuously points out every style violation in code reviews (or at the team member who consistently violates the standards).

Summary

This is an attempt to collect a few objective metrics for assessing code quality. There is surely more that could be said about things like introducing new library dependencies, avoiding duplication within a codebase (DRY), rules of thumb for adopting frameworks (perhaps something like: "the amount of code required to interact with the framework should be less than the amount of code required to accomplish the same goal without the framework"), conventions for which libraries and classes to use for what purposes (eschewing java.util.Date for example), conventions for static methods and *Util classes (avoid them as much as possible), etc.

However, I think these criteria nail down some frequent concerns that arise when reviewing code that seem to devolve into subjective comparison of opinion. If I can convince my coworkers that these high level objectives are acceptable, then we can focus discussion on whether a particular aspect of code (for example) is strictly necessary or not, without having to re-hash the argument of whether code-for-the-future is a good idea (YAGNI).