Zen of Defensive Programming: Code Safety

By Guido in Programming April 25, 2022

We are now reaching the part that people usually associate with Defensive Programming: Code Safety. It is often considered to be the only subject in defensive programming but, as you have seen in the previous installments, it represents only a small subset of the subject as a whole.

Code safety typically relies on two tools: Assertions and Guard Clauses.

Assertions

Assertions are available in most modern languages and are used to ensure that assumptions are met and enforced when a function is called. Depending on your platform and language, a failed assertion will terminate the program or raise an exception, letting you know that a condition occurred that should never have occurred.

Think of a function’s signature as a contract with the outside world. You feed certain arguments into the function and you expect a result out of it. Let’s say a function takes a numeric input representing a month and returns a string with its name. The implicit contract involved in this case states that months only have a range from 1 to 12. This never changes! Anything outside that range is illegal and cannot be processed correctly. In fact, if anyone ever tries to access a month outside that valid range, you should be made aware of it, for safety purposes!

std::string nameMonth( const int month_ )
{
  const std::string monthName[] = { "January", "February", … };
  return monthName[ month_ - 1 ];
}

Anyone passing a value outside the legal range will cause havoc with the return value. If you’re lucky, it will result in a crash. If you’re not so lucky, the program may actually continue to work and bad data will poison your system. An assertion can ensure that whoever is using this function will provide only values within the legal range.

std::string nameMonth( const int month_ )
{
  const std::string monthName[] = { "January", "February", … };
  assert( month_ > 0 && month_ <= 12 );
  return monthName[ month_ - 1 ];
}

This way an assertion can instantly alert us if someone is trying to misuse the function. We can then investigate what is causing it and fix the underlying problem. Assertions are a debugging tool, and as such, they generate code only in debug builds. In C and C++, the assert macro expands to nothing when NDEBUG is defined, which release configurations typically do, so in production builds the checks are removed and create no overhead.

That can create a pitfall, however, that programmers need to be fully conscious of. Never perform any operations in an assertion call because the entire code line will be excluded from the production build!

assert( addNode( newNode ) < MaxNodes );

In the production build of your code, the addNode() function call will never be executed because the entire expression is stripped out before the code is compiled! Never use expressions in your assertions that have side effects or modify state, as in the example above.
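
One way to stay clear of this pitfall is to perform the operation first and assert only on the stored result. The following is a minimal sketch; addNode(), MaxNodes, and the node storage are hypothetical stand-ins, since this article does not define them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only.
constexpr std::size_t MaxNodes = 4;
static std::vector<int> nodes;

// Appends a node and returns the new node count.
std::size_t addNode( const int newNode_ )
{
  nodes.push_back( newNode_ );
  return nodes.size();
}

// The call with the side effect runs in every build. Only the check
// itself disappears when NDEBUG is defined.
std::size_t addNodeChecked( const int newNode_ )
{
  const std::size_t nodeCount = addNode( newNode_ );
  assert( nodeCount < MaxNodes );
  return nodeCount;
}
```

Because the result is stored in a variable before it is checked, the node is added in debug and production builds alike.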

It is advisable to use assertions liberally in your code. Make sure, though, to use them exclusively to check for inherently contracted conditions. Do not use them to report run-time checks, such as a situation where you were unable to open a file or could not allocate memory. These types of cases are prone to happen and need to be handled gracefully at runtime.

Assertions are used to confirm that the function contract is upheld, such as expected ranges, data types, and the like. Ask yourself whose fault it is if a value is invalid. If the system generates it, it requires a runtime check. If it’s the programmer’s fault, it’s a case for an assertion!
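
To illustrate the distinction, here is a small sketch that uses both kinds of checks in one function. The helper readFirstLine() is a name I made up for this example, and it assumes a C++17 compiler for std::optional.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: returns the first line of a file, or std::nullopt
// if the file cannot be opened.
std::optional<std::string> readFirstLine( const std::string &path_ )
{
  // Programmer's fault: passing an empty path violates the contract.
  assert( !path_.empty() && "Path must not be empty!" );

  // System's fault: a missing file can happen at any time in production,
  // so it is handled at runtime and reported to the caller.
  std::ifstream file( path_ );
  if ( !file.is_open() )
  {
    return std::nullopt;
  }

  std::string line;
  std::getline( file, line );
  return line;
}
```

The caller can then decide how to recover from a missing file, while an empty path is flagged during development.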

Assertions are typically very barebones affairs that do not provide a lot of information in and of themselves, other than, perhaps, a stack trace. To make them more useful, you can apply a little trick and easily add a message to them. In the case of the month name example earlier, we can extend our assertion like this:

assert( month_ > 0 && month_ <= 12 && "Illegal month provided!" );

Upon termination, the program will display the expression that triggered the assertion, message included, making it easier to see at a glance what happened. Since a string literal always evaluates to true, tagging the message on with && does not affect the outcome of the actual check itself.

Other languages may have slightly more sophisticated assertions. If you are working in Python, for example, you can write the assertion like this:

assert month_ > 0 and month_ <= 12, "Illegal month provided!"

A failed check will then raise an error that looks like this:

AssertionError: Illegal month provided!

Oftentimes, it may be a better solution to roll your very own assertion implementation. You could add a proper message to it that provides you with better information. You could also implement severity levels, or force it to break into a debugger on the line in question. Or, perhaps, you’d like to log the assertion message to a file for later reference.
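
As a rough idea of what such a home-grown assertion could look like, here is a minimal sketch. ZEN_ASSERT, formatAssertFailure(), and reportAssertFailure() are names I invented for illustration. A macro is used here only because it is the simplest portable way to capture the expression text, file, and line of the call site.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Builds the diagnostic text. Kept separate so it could also be written
// to a log file.
inline std::string formatAssertFailure( const char *expr_, const char *msg_,
                                        const char *file_, const int line_ )
{
  return std::string( file_ ) + "(" + std::to_string( line_ ) +
         "): assertion '" + expr_ + "' failed: " + msg_;
}

// Prints the diagnostic and terminates the program.
[[noreturn]] inline void reportAssertFailure( const char *expr_, const char *msg_,
                                              const char *file_, const int line_ )
{
  std::fprintf( stderr, "%s\n",
                formatAssertFailure( expr_, msg_, file_, line_ ).c_str() );
  std::abort();
}

// Like assert(), the check is compiled out when NDEBUG is defined.
#ifdef NDEBUG
  #define ZEN_ASSERT( expr_, msg_ ) ( (void)0 )
#else
  #define ZEN_ASSERT( expr_, msg_ ) \
    ( ( expr_ ) ? (void)0 \
                : reportAssertFailure( #expr_, ( msg_ ), __FILE__, __LINE__ ) )
#endif
```

A call such as ZEN_ASSERT( month_ > 0 && month_ <= 12, "Illegal month provided!" ) then reports the failed expression, its location, and the message. Severity levels or a debugger break could be added inside reportAssertFailure().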

In terms of Defensive Programming, we want to use assertions aggressively. This means you should consider asserting every incoming function argument to ensure all your assumptions are met. Only then should you begin processing them.

If your function expects unit values, add an assertion that ensures the value is between 0.0 and 1.0. If you expect a valid pointer, perform a NULL-check. Make sure your assumptions are met. Do not create sloppy interfaces that forgive and try to accommodate illegal values with workarounds. In the long run, these kinds of fixes will only weaken your code. Create a concise contract and enforce it. Always!
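
Here is a short sketch of what asserting every incoming argument could look like. The blend() function and its parameters are hypothetical and exist only to show a pointer check next to a unit-value check.

```cpp
#include <cassert>

// Hypothetical function: linearly blends between values_[ 0 ] and values_[ 1 ].
// Contract: values_ points to two floats and t_ is a unit value.
float blend( const float *values_, const float t_ )
{
  assert( values_ != nullptr && "Values pointer must not be NULL!" );
  assert( t_ >= 0.0f && t_ <= 1.0f && "Blend factor must be a unit value!" );

  // All assumptions are confirmed, so processing can begin.
  return values_[ 0 ] + ( values_[ 1 ] - values_[ 0 ] ) * t_;
}
```

Note that the function does not clamp an out-of-range factor or silently return a default value. An illegal argument is reported, not accommodated.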

Guard Clauses

Somewhat in the same vein are guard clauses. They are the runtime equivalent of assertions; precondition checks that make sure that values are safe to be processed.

If your function expects a pointer, but it is, indeed, valid to provide it with a NULL-pointer to skip specific processing, a guard clause would be used to properly catch and handle it.

void processNode( const node *node_ )
{
  if ( node_ != nullptr )
  {
    // … do some processing
  }
}
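
The same guard is often written as an early return, which keeps the main processing path free of extra nesting. The variant below is only a sketch: the node type, the return value, and the contract that node values are never negative are assumptions I added so the example is complete.

```cpp
#include <cassert>

// Hypothetical node type for illustration only.
struct node
{
  int value;
};

// Returns the number of nodes processed so the effect is observable.
int processNode( const node *node_ )
{
  // Guard clause: a NULL pointer is a legal input meaning "nothing to do".
  if ( node_ == nullptr )
  {
    return 0;
  }

  // Assertion: an assumed contract on the data, checked in debug builds only.
  assert( node_->value >= 0 && "Node values must not be negative!" );

  // ... do some processing
  return 1;
}
```

The guard handles a legitimate runtime situation, while the assertion covers a condition that would indicate a programming error.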

These types of guard clauses are in place to support your code lattice. To correctly implement them, it is, however, critical that you fully understand your use cases. It is very easy to overuse them and to allow premature pessimization to sneak into your thinking. Be on the lookout for guards that are not necessary.

Avoid creating guard clauses for cases that can never realistically occur or are outside your application’s domain. If your function takes data from a generator, for example, and the generator, by definition, produces output of a specific type and range, adding a guard clause to ensure the integrity of the data only adds unnecessary overhead and noise to your code.

Code Safety

Writing safe code is a key tenet of the defensive programming paradigm. It does not necessarily imply writing code that is impervious to attacks, though that is something to consider. Rather, it implies code that is handling itself cleanly and safely.

Type safety is at the top of this list. Leverage the compiler’s knowledge of your code and data, as we discussed before, by making sure it compiles without warnings. Furthermore, prefer datatypes that the compiler can type check because that is the only way the compiler can point out to you when type issues arise.

In practical terms, this means avoiding C-style preprocessor #defines and macros. The preprocessor allows you to sidestep many of the compiler’s built-in type checks, putting your code at risk. If you need to define a number, use a const declaration instead of a #define. A const is tied to a data type that the compiler keeps an eye on. If the type is misused or misassigned, the compiler will tell you. A #define oftentimes does not provide that extra layer of protection. What’s worse, preprocessor macros can introduce truly gnarly side effects that can turn into Sisyphean tasks to uncover. We will touch upon that in a later installment. For the time being, please just accept my advice to make liberal use of the const keyword, or final if you’re coding in Java, and remove #define from your arsenal altogether, unless you need it for a compile-time abstraction.
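
As a small illustration, here is how typed constants and a function can stand in for their preprocessor counterparts. All names are made up for this sketch, and the use of constexpr goes beyond the plain const recommended above. It is one option in C++11 and later, not a requirement.

```cpp
#include <cassert>

// Preprocessor versions, shown only as comments for comparison:
//   #define MAX_MONTHS 12
//   #define SQUARE( x ) ( ( x ) * ( x ) )

// Typed constants: the compiler knows these are an int and a double.
const int MaxMonths = 12;
constexpr double SecondsPerMinute = 60.0;

// A function in place of a function-like macro. The argument is
// type-checked and evaluated exactly once.
constexpr int square( const int value_ )
{
  return value_ * value_;
}
```

Passing an expression such as i++ to square() increments i once, whereas the macro version would expand the expression twice.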

It all boils down to one key thing in the end. We want the compiler or language interpreter to be able to find as many errors, bugs, and problem areas as soon as possible. We want to leverage the tools and knowledge the compiler offers us and put it to use for us. This will save time down the line and make code more reliable right out of the gate.

Security

Before we move on to concrete ways and examples that will illustrate good defensive coding practices, we also have to briefly talk about security.

We’ve all heard of denial-of-service attacks and exploits that create vulnerabilities in applications. You may think this does not apply to your project, but the sad state of affairs is that no one is safe. If there is code, there will eventually be someone trying to break, hack and exploit it. Therefore, it is important to keep a few things in mind when writing new code.

One of the most common problems is buffer overflows, particularly in conjunction with text strings, because they are easy to locate and exploit. The best way to avoid these kinds of exploits is to make sure that your string buffers are always correctly sized. Further, make sure to use size-bounded operations. If you are using something like strcpy(), for example, you should forget it even exists and use strcpy_s() instead if you’re working in Windows, or strlcpy() on platforms that provide it, such as the BSDs and macOS. These functions take the size of the destination buffer as an argument and thereby inherently limit access to it. Familiarize yourself with the entire family of size-bounded library functions of the language you are using.
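
Since strcpy_s() and strlcpy() are not available everywhere, the sketch below uses std::snprintf() from the standard library as a portable stand-in for a size-bounded copy. It is not one of the functions named above, and copyBounded() is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Copies src_ into dest_ without ever writing more than destSize_ bytes.
// The result is always null-terminated. Returns true if the whole string
// fit and false if it had to be truncated.
bool copyBounded( char *dest_, const std::size_t destSize_, const char *src_ )
{
  assert( dest_ != nullptr && src_ != nullptr && destSize_ > 0 );

  // snprintf returns the length the full string would have needed.
  const int needed = std::snprintf( dest_, destSize_, "%s", src_ );
  return needed >= 0 && static_cast<std::size_t>( needed ) < destSize_;
}
```

With an 8-byte buffer, copying "March" succeeds, while "September" is cut off after seven characters and the function returns false, leaving it to the caller to decide how to handle the truncation.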

Naturally, exploits are not limited to string operations or buffer overflows. Be aware of potential problems that might arise if users abuse your system, particularly through any input coming from the outside world. Whether it is keyboard input or a message arriving from a remote sender or a service, always, always, always make sure to do a length check before doing anything with the data. The next step is a sanity check to make sure the data is actually in a valid format, and so forth, before letting it trickle down into your actual system for processing.
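
To make that order of checks concrete, here is a minimal sketch that validates untrusted text before turning it into a month number. The parseMonth() function is a hypothetical example, and it assumes a C++17 compiler for std::optional.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Turns untrusted text into a month number between 1 and 12.
// Returns std::nullopt for anything that is not acceptable.
std::optional<int> parseMonth( const std::string &input_ )
{
  // Step 1: length check before looking at the content.
  if ( input_.empty() || input_.size() > 2 )
  {
    return std::nullopt;
  }

  // Step 2: format check. Only decimal digits are allowed.
  int month = 0;
  for ( const char c : input_ )
  {
    if ( !std::isdigit( static_cast<unsigned char>( c ) ) )
    {
      return std::nullopt;
    }
    month = month * 10 + ( c - '0' );
  }

  // Step 3: range check before the value enters the rest of the system.
  if ( month < 1 || month > 12 )
  {
    return std::nullopt;
  }
  return month;
}
```

These are runtime checks, not assertions, because bad input from the outside world is expected to happen. Once a value has passed them, inner functions such as nameMonth() can rely on their assertions alone.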

This concludes the general overview of what is meant when we discuss defensive programming. In the next installment, I will begin showcasing specific examples, practices, and coding style habits that will take you to the next level! Stay tuned.
