“Do not think that I have come to bring peace to the earth. I have not come to bring peace, but a sword.” ~Matthew 10:34
School of hard knocks
C and C++ hold up the world of system software like Atlas holds up the earth. Every piece of software we take for granted is almost certainly upheld by one (or both) of these languages. For the uninitiated, these languages are not the backbone of system software because of their ease of use, brevity, or slick syntax. Quite the opposite: writing code in C and C++ is hard to get right. The syntax for function pointers is still something I have to Google, and I wrote C/C++ professionally for several years. Rather, these languages are ubiquitous because they offer extreme control over how the software works, and allow programmers to describe exactly what they want to happen at the hardware level.
What these languages don't do, however, is allow the programmer to relax.
This is valid C code (function signature omitted for brevity):
/* Make a stack-allocated array */
uint8_t my_array[4];
/* point this pointer to the start of the array */
int* my_int = (int*) my_array;
/* modify the array */
my_array[3] = 0xde;
my_array[2] = 0xad;
my_array[1] = 0xbe;
my_array[0] = 0xef;
/*
 * Show that the value my_int points to has changed without
 * ever writing through my_int.
 * Prints "my_int = 0xdeadbeef" (on a little-endian machine)
 */
printf("my_int = 0x%x\n", *my_int);
The C language is but a humble servant; it lets me perform almost any operation like this without warning. This contrived example is rather simple to understand because we can see where my_int was pointed at another piece of memory. But what happens if the software becomes much more complex than this?
These languages let me create what I like to call eldritch-horror nightmare bugs. These are bugs that are made of concentrated primal evil that no mere mortal could endure. They are so difficult to find and eliminate that it's better just to rewrite the broken code from scratch. It's so easy to forget to initialize a variable, misplace a heap-allocated structure, refer to the same piece of memory in multiple places, have a piece of memory refer to itself, index an array out of bounds, or dereference NULL that every C programmer's immune system will develop innate responses to these situations.
The programmer will start initializing every variable directly after creation, use reference counting to keep track of multiple references to the same piece of memory, use sizeof to determine array length, remember that arrays decay to pointers when passed to functions and pass the length as a separate argument, and start checking for NULL before every dereference just in case. It gets tiring very quickly, especially when the programmer looks at the function they just wrote and realizes that ALL of the control flow is error checking. And so is the return value.
Defensive programming
Taken together, these idiomatic practices are termed defensive programming. Look both ways before you cross the street. Don't count your chickens before they hatch. Trust, but verify. Look before you leap. In all cases, take an extra hit on a missed branch prediction, muddle up the code a little bit with control flow, and avoid these errors by detecting them as early as possible. This style of programming focuses on avoiding errors by defending against their propagation. This requires that every inch of the attack surface be accounted for.

The thing that bugs me (no pun intended) about this approach is that now I must write a ton of code dedicated to doing something that my program shouldn't even be doing in the first place. The code becomes littered with convoluted error checking, state transitions, and recovery. Even worse, there's very little chance that every case was accounted for. Does anyone really check to make sure that heap memory was allocated for every new or malloc?
After adding all of these branches that have very little chance of ever executing, what remains is a tangled web of error checking and dead code. In step with it, the amount of unit testing for these branches grows. The complexity only compounds when the code is only dead due to compiler settings, and could become zombified by accidentally tweaking a flag. To err on the side of caution, the dead code is allowed to stay, just in case the error occurs eventually.
I struggled with this scenario over and over and over while writing embedded C++. To make matters worse, we were required to have exceptions disabled for our hard-real-time system. The customer also required us to have no dead code and to test all execution paths in unit tests, but also to use defensive programming techniques to check for errors. Taken together, these requirements forced us to write tests which induced errors on purpose. This is a good idea in theory, but it quickly led to some absurd code. We actually had to turn off certain kinds of optimizations so our completely dead code would not be removed by the compiler. Yikes.
One of the most annoying parts was checking for NULL before dereferencing a heap-allocated value. This means that the error case would only ever happen if a heap allocation failed in the OS. Barring the fact that a failed malloc means that something went horribly wrong, how does one induce this error in unit testing?
// How do you even write a unit test for this?
// Maybe a mock malloc? Or use the preprocessor
// to swap malloc for something else?
int do_a_thing() {
    void* my_ptr = malloc(100);
    if (my_ptr == NULL) {
        // How do we reach this code in a test?
        return 1;
    }
    // How do we test if the memory got freed?
    free(my_ptr);
    return 0;
}
Many, many hours were spent answering these types of questions and forcing these errors to occur in our testing. The preprocessor abominations I wrote are unspeakable. I found myself constantly const_cast-ing away const-ness to make things point somewhere else. I even wrote some evil code to get around private methods and variables in C++ classes. It was an absolute mess.
// i is const and private; not even Tom himself
// could change this value.
class A {
private:
    const int i = 123;
};

A* a_ptr = new A();

// Damn the encapsulation, full speed ahead!
// This works because i is the first member, so it sits
// at the very start of the object's memory
int& i_ref = *(int*)(a_ptr);
i_ref = 321;
Introspection
Writing evil tests for code coverage seemed like a bad idea, but it also felt like the only way to cover our error handling branches at the time. At some point I realized it was time for some deep introspection. Should I really need to be doing this? Should I really be using the preprocessor to swap malloc for a version that fails? Should I really be violating the sanctity of encapsulation just in case? I couldn't see any other option at the time, so I struggled on.
I read the Object Oriented Design Patterns book and applied these patterns liberally, but I still kept running into the necessity of defensive programming for the underlying code. Every class had a bunch of parameter checking on the constructor, a bunch of parameter checking on the function signatures, return codes (usually enums), well-documented exceptions, and properly const-defined guzindas and guzoutas. I was trying to be as explicit as I could possibly be with how my code was defined so that I didn't have to check more error cases. And it was exhausting.
// This is a valid method signature in C++. Imagine trying
// to refactor a bunch of these.
const int * const do_something(const double * const data) const;
Writing C and C++, I learned computer architecture and compiler theory in extreme detail. I learned a huge number of techniques for managing pointers, leveraging polymorphism, and programming defensively. I learned more than anyone should ever need to know about how structures are laid out in memory. My programs seemed to work, and I understood why they worked. But, always in the back of my mind, something did not sit right with defensive programming. It stuck in my mind like an unscratched itch.
A change of scenery
As my career changed directions, my new work environment allowed much more experimentation and exploration for different ways to solve problems. I was still unsatisfied with defensive programming. We wrote C# at the new company.
I found myself checking a lot less for low-level edge cases because of the nature of C#, but the defensive programming was still there. Checking for null before doing any kind of dereference is so common in C# that the language has dedicated operators for it: the null-conditional ?. and the null-coalescing ??. It was not uncommon to write code like this:
// MyParam is a type whose value can be null
public static double DoSomething(MyParam p)
{
    // Dereference p if able, then dereference the return val if able.
    // The type here is "nullable double", where the value can
    // be null or a double.
    double? v = p?.MethodThatMightReturnNullOrDouble();

    // If the above operation returned null,
    // then return 0.0
    return v ?? 0.0;
}
This code is functionally equivalent to:
public static double DoSomething(MyParam p)
{
    double v = 0.0;
    if (p != null)
    {
        double? result = p.MethodThatMightReturnNullOrDouble();
        if (result != null)
        {
            v = result.Value;
        }
    }
    return v;
}
As you can see, the ?. and ?? operators are just hiding all of the same defensive control flow that we would write before. This feels very nice at first, and definitely reduces accidental null dereferences. But this is the same pig with new lipstick. All of the messy control flow is still there, it's just hidden behind syntactic sugar.
Then, just about a year later, my software lead told me about a book called Functional Programming in C# by Enrico Buonanno. He and I had a bunch of discussions on error handling and control flow, and we had nearly identical experiences dealing with null in our careers. We both had the unscratched itch brought on by defensive programming.
I bought the book and began consuming it, and quickly realized this functional style of programming was a whole new world of wonder. It's hard to describe the euphoria I had when I realized that this was the answer I had been looking for this entire time. The Option type eliminated all uses of null, which eliminated that most common and pernicious source of error: null dereferences. Exception handling also morphed into something enjoyable; the Either type allowed us to handle errors explicitly, without try-catching. Finally, I felt I had the right tools to scratch the itch.
Let your yes be yes
At the end of the day, programs are really just data-transforming engines. Inputs go in, stuff happens, data comes out. The complexity is determined by the problem being solved, and compounded by the available APIs and the choices of the developers who implement the software. One of these choices is the language used to describe the solution.
Much ink has been spilled trying to define the multitude of programming paradigms, so I figured I'd muddy the waters even more by defining a brand new one: offensive programming.
Offensive programming is any programming method which does not require input validation or return value checking.
Many languages and libraries support this paradigm, but I'm choosing C# examples using the LanguageExt library since that's what I'm most familiar with. Rust is a prime example of this paradigm in a procedural language. For functional languages, almost all of them fit this paradigm. In most other cases, libraries are needed.
First principles
The offensive programmer has two extremely important responsibilities when following this paradigm: define the data, and transform it explicitly.
Define the data
First, and I think most importantly, offensive programming requires us to define our data domains with extreme specificity. This is the primary weapon of offensive programming because it allows us to completely ignore validation. In other words, if the data domain is defined perfectly, then there is no chance that it could ever contain an incorrect value. I want to be able to trust that a type can describe exactly what I want it to describe, and nothing more.
The easiest example of what this looks like is a fundamental functional data type: Option<T>. This type has two possible values in its domain: Some(T) or None, where T is a generic type parameter and the value of Some(T) is not null. The programmer cannot accidentally misuse this value before it exists, cannot accidentally compare an Option<A> to an Option<B> (where A and B are not comparable), and cannot dereference null because null is not in the domain.
// SchoolBus definition omitted for brevity
using static LanguageExt.Prelude;

// No value yet
Option<SchoolBus> school_bus = None;

// Now school_bus has a value
school_bus = Some(new SchoolBus());
At first, using Option feels a lot like checking for null. But if you think about it, null could mean anything. It could mean that something was deleted, it could mean that something wasn't initialized yet, it could mean that an operation failed. But the meaning must be tracked and checked by hand throughout the code. It's very easy for null to lose its original meaning, to compare two nulls that are not related, or to forget it's there at all. Option solves this problem explicitly: it's Some or None, end of conversation.
But how do we use this data in our code?
Transform the data explicitly
The second responsibility of offensive programming requires us to explicitly define our data transformations using functions. Functional programming is loosely defined as “a programming paradigm defined by composing functions together to solve a problem.” Stated in terms of offensive programming, functions are now maps between data domains. By combining explicit data domains with explicit transformations, the programmer can charge forward into any problem without fear.
These maps are best stated in terms of function signatures. For example, a function which maps from domain A to domain B would look like this:
// Transform a value in domain A to a value in domain B
A -> B
// Or something more familiar
double -> int
// Or even more familiar
int round(double value);
The offensive programming paradigm unlocks some programming superpowers which are hard to overstate. The first is that error handling is now explicit, expressed through types and by mapping errors to specific handlers. There is zero ambiguity allowed here: the function must either perform the operation, return an explicit error value, or crash out of the software entirely. Rust programmers are very familiar with this concept, and it's one of the reasons the language is so beloved.
I want to emphasize here that side-effects break this paradigm. If you create a function which changes some memory not in the current stack frame, you have circumvented the contract. This can be impossible to avoid entirely, but most code can be refactored to mitigate this.
Going back to the Option example, we can define one of these transformations like this:
// Signature: Unit -> Option<EmptyBus>
Option<EmptyBus> school_bus = TryGetSchoolBusFromMotorpool();

// Signature: EmptyBus -> FullBus
Option<FullBus> full_bus = school_bus.Map(bus => PickUpStudents(bus));

// Signature: FullBus -> EmptyBus
Option<EmptyBus> empty_bus = full_bus.Map(bus => DropOffStudents(bus));

// By the time we get here, either there was no bus available
// and nothing happened, or the students were dropped off
// at school. There are no other possibilities.
After a while, chaining these types of transformations together becomes extremely powerful and descriptive. Code starts to read like spoken language, and I can trust that every corner of the input and output domains are handled.
Finally, armed with this new offensive programming paradigm, I am free to write code how I've always wanted to: without fear.
Handling errors
To keep in line with our first principle, errors must be part of an input or output domain in order to be possible. With errors now defined explicitly in a data domain, top-level try/catch blocks are a thing of the past. Errors can still happen because no system is perfect, but they are now explicitly handled. Languages with null and exceptions will still have errors, and those still need to be checked. But they can be contained to the lowest level possible, and immediately transformed into a valid input or output domain value.
For example, exceptions in C# would now be values passed out of a function that had the possibility of failing.
// The Either<TLeft, TRight> type can be one of two possible values,
// Left(TLeft) or Right(TRight).
// In this function, Left is Exception, Right is string.
private Either<Exception, string> ParseFile(File f)
{
    if (!f.Exists())
        return Left<Exception, string>(new FileNotFoundException("f does not exist"));

    // Try/catch as usual, but don't allow the exception to propagate
    try
    {
        return Right(f.Parse());
    }
    catch (Exception e)
    {
        return Left(e);
    }
}
public Option<string> TryParseFile(File f)
{
    // Match collapses both cases into one value.
    // If the exception is file-not-found, return None.
    // If the exception is something we can't handle, log the
    // error and crash.
    return ParseFile(f).Match(
        Right: contents => Some(contents),
        Left: ex =>
        {
            if (ex is FileNotFoundException)
                return Option<string>.None;
            LogErrorAndCrash(ex);
            return Option<string>.None; // unreachable
        });
}
Some classes of errors are very hard to test for, such as the failed heap allocation I mentioned before. To continue playing offense, I often take one of these routes:
Ignore this as a possibility, and charge forward
Write a factory-like function with some defensive code that can return an error if it fails
Ignorance is bliss and I usually choose option 1, relying on testing and profiling to find memory leaks that could cause this class of error. But when writing code for embedded systems, or systems with very tight tolerances, writing factory-like methods ensures that the allocation was checked before being used. As an added bonus, option 2 allows me to inject errors for unit testing.
Handling shared memory
As mentioned before, one of the most nightmarish bugs any programmer could encounter is when multiple references to the same memory occur, and one of them is used to modify the data that the others were not expecting. The best way to mitigate this is to not modify the data in the first place. This is why side-effects break the offensive programming paradigm: if data is able to change unexpectedly, the programmer must access it defensively. Without the ability to modify shared data, the programmer can continue playing offense.
There are several ways to solve this problem, usually through the form of immutable data. Functional languages tend to put massive weight on this concept at the expense of copying data, or implementing complex iterator algorithms. Computer scientists much smarter than me have written papers on paradigms like copy-on-write, and they are worth considering when trying to maximize memory efficiency. Some languages (like Rust) make this determination for the programmer, while others (like C or C++) let the programmer choose. The offensive programmer must weigh these considerations carefully depending on their language, and shift back to defense if necessary for program efficiency.
Summary
Armed with offensive programming, I am quickly realizing the benefits in my code. Entire classes of errors (like dereferencing null) are eliminated. Exception handling just in case is a thing of the past. Unit testing is extremely easy because 90% of my functions are pure maps from an input domain to an output domain, with no side effects. Because there are no side effects, I also have de facto immutable data, eliminating a whole class of shared-memory errors. Return types explicitly tell me whether a function can return an error, and I can explicitly handle it. For the first time in a long time, I am completely unafraid of using function parameters without checking them. Data types are now contracts, and the compiler ensures those contracts cannot be broken.
At last, I can finally program without fear.
In a nutshell
Offensive programming can be performed in any programming language, and requires only a few principles.
When defining a data type, ensure its domain is defined as explicitly as possible.
When mapping data from one domain to another, do it without side effects.
When the first two are not possible, switch to defense.