
[Originally written by Zingage Principal Engineer Ethan Resnick as an internal memo]
Carefully Balance What Your Code Promises and What It Demands; Err on the Side of Promising Less
Every piece of code promises some things to its users — e.g., “this API endpoint will return an object with fields x, y, z”; or, “this object supports methods a, b, c”.
Put simply, the more a piece of code promises, the harder it is to change: a new version either has to keep delivering everything the old version promised, or every user of the old code must be updated to stop depending on the parts of the old promise that the new code will no longer fulfill.
Therefore, the key to long-term engineering velocity is to keep the set of things your code promises small. This usually boils down to not returning data and not supporting operations that aren’t truly necessary.
Some examples of this in practice:
- We’ve gradually removed methods from our repository classes, and stopped adding a bunch of methods by default, because keeping that API surface small makes it easier to refactor how the underlying data is stored/queried. Similarly, we’re phasing out the ability to load entities with their relations, and to save an entity with a bunch of related entities, because those promised abilities are very hard to maintain under many refactors (e.g., as we move off TypeORM, or as we split data across dbs/services). (A minimal sketch of a narrowed repository surface follows this list.)
- As a large-scale, non-Zingage example, consider how the QUIC protocol, which powers HTTP/3, encrypts more than just the message’s content — e.g., it also encrypts details about what protocol features/extensions are being used. These details aren’t actually private, but they’re encrypted primarily so that boxes between the sending and receiving ends of the HTTP request (e.g., routers, firewalls, caches, proxies, etc) can’t see these details, and therefore can’t create implementations that rely on this information in any way. Exposing less information to these middle boxes was a conscious design decision aimed at making the protocol more evolvable, after TCP proved essentially impossible to change at internet scale.
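To make the repository example concrete, here is a minimal sketch; the Visit entity and the method names are hypothetical, not our actual classes:

// A deliberately small surface: only the operations callers actually need today.
interface Visit {
  id: string;
}

interface VisitRepository {
  findById(id: string): Promise<Visit | null>;
  save(visit: Visit): Promise<void>;
}

// What we avoid: a wide, auto-generated surface (find, findOneBy, findAndCount,
// relation loading, cascading saves, ...) where every extra method is another
// promise that future refactors (e.g., moving off TypeORM) must keep honoring.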
However, there is one real tension when adhering to this principle of “keeping the contract small”: the flip side of a piece of code promising fewer things is that its users can depend on less.
A great example is whether a field in a returned object can be null. If the code that produces the object promises that the field will be non-null, that promise directly limits the system’s evolvability: supporting a new use case where there is no sensible value for the field (so it should be null) becomes a complicated task of updating all the code’s users to handle null. However, until a use case arises where the field does need to be null, promising that the field will be non-null simplifies all the code that works with it; that code doesn’t have to be built prematurely to handle the case of the field being missing.
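As a hypothetical illustration of that trade-off (the Invoice type and its dueDate field are made up for this example):

// Promising more: every consumer can use dueDate directly, with no null checks.
type Invoice = { id: string; dueDate: Date };

// Later, a use case appears where there is no sensible due date. Honoring the old
// promise is no longer possible, so the contract (and every consumer) must change:
type InvoiceAfterChange = { id: string; dueDate: Date | null };

// Had the field been nullable from the start, consumers would already handle null,
// but all of them would also have paid that cost before it was ever needed.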
So, when deciding exactly what your code should promise, consider:
- Are the users of the code under your control? For example, our main API is only used by our frontend, which we control. Therefore, it’s relatively straightforward to make a breaking change in the contract exposed by the API, as we can also update the frontend. Contrast this with an API endpoint in the partnership API, which is called by our customers: we don’t control their code, so changing the endpoint’s contract to no longer return certain data requires a long, complicated coordination process with them. In situations like this, where you don’t control the consumers, keeping your promises small is essential.
- How easy is it to identify and coordinate with all the code’s users? In the case of an endpoint called by our customers, we at least have a comprehensive list of our customers and a way to email all of them; plus, we have a way to see which customers are using which endpoints (through API usage logs or similar). This makes it possible, if time-consuming, to coordinate breaking changes with them. However, imagine we had an API endpoint that was open to the world with no authentication; in that case, we’d have no way to know who’s using it, and no effective way to coordinate with them to update their code to accommodate a breaking change. Unauthenticated open endpoints are one extreme; the other extreme might be a TypeScript utility function in our API repo. In a case like that, if we change the function’s return type or make it take an extra argument, TypeScript will literally guide us to all the callers that the change breaks (see the sketch after this list).
- Corollary: Invest in tracking the usage of your APIs, because the more easily you can identify callers, the easier it is to change the API.
- How much would the callers benefit from a particular addition to your code’s promises? Promising that a field won’t be null is a great example of promising something that isn’t strictly necessary to promise — and yet doing so can be worth it if it saves a lot of users of the code a lot of boilerplate and/or edge-case-handling work, which they might need were the field marked as nullable.
- How hard will it be to uphold the promise over time?
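To illustrate the utility-function case at the easy end of that spectrum, a small hypothetical sketch (the function is invented for this example):

// Before: formatName(first: string): string
// After the signature changes, every existing call site becomes a compile error, so
// the compiler itself produces the complete list of users that must be updated:
function formatName(first: string, last: string): string {
  return `${first} ${last}`;
}

// formatName("Ada");            // error: Expected 2 arguments, but got 1
formatName("Ada", "Lovelace");   // fixed call site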
Finally, it’s not just the case that code promises things to its users; it also demands some things from them — usually in the form of required arguments. Here again, there’s a similar balancing act: the more your code demands, the more flexible it is — future use cases may be much easier to support if your code can count on having certain arguments provided to it, and those arguments cost it nothing (it can always ignore some of them, or loosen its demands later by making some optional). However, everything your code demands is something that its users must, in turn, be able to promise, so each demand can complicate all of the code’s users (e.g., they may have to plumb the data for a required argument through to the call site).
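A small hypothetical sketch of that balancing act (the function and its arguments are invented for illustration):

// Demanding `now` up front keeps options open: the function can ignore it today and
// start using it tomorrow without a breaking change for callers...
function nextReminderDate(dueAt: Date, now: Date): Date {
  return dueAt; // `now` is unused for the moment, but is already part of the contract
}

// ...but every caller must be able to supply it. Loosening the demand later is cheap:
function nextReminderDateLoosened(dueAt: Date, now: Date = new Date()): Date {
  return dueAt;
}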
Make Illegal States Unrepresentable
A huge part of what makes programming hard is that the system can be in an inordinate number of states, and it’s very hard to write bug-free code that properly accounts for all of them. By making illegal states/inputs unrepresentable, we can greatly reduce the number of bugs and make our code more reliable and easier to reason about, with fewer assertions needed.
See https://www.hillelwayne.com/post/constructive/, which refers to this as “constructive data modeling” and reviews some common approaches.
One common manifestation of this principle in our code base is the use of discriminated unions rather than having multiple cases smooshed into one object type with some nullable properties.
To give a concrete example, you should avoid something like:
type ProfileOrBusinessId = { profileId?: string; businessId?: string }
Instead, prefer:
type ProfileOrBusinessId =
  | { type: "BUSINESS_ID", id: string }
  | { type: "PROFILE_ID", id: string }
The difference is that the former type allows zero or two ids to be provided — both of which should be illegal — and forces all the consuming code to handle both illegal cases. This can create cascading complexity (e.g., if a consuming function throws when it gets no ids, then its caller has to be prepared to catch that error too), and different bits of consuming code could easily end up doing different things in the case where both ids are provided (e.g., if they check the properties in a different order). By contrast, the second type requires exactly one id to be provided.
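One payoff of the second shape is that consuming code can simply switch on the discriminant, and, under strict compiler settings, the compiler checks that the switch stays exhaustive. A minimal sketch, using the union definition above:

function describeRef(ref: ProfileOrBusinessId): string {
  switch (ref.type) {
    case "BUSINESS_ID":
      return `business ${ref.id}`;
    case "PROFILE_ID":
      return `profile ${ref.id}`;
    // No "neither id" or "both ids" branches exist, because those states cannot be
    // represented; if a new variant is added to the union later, this function stops
    // type-checking (under strict settings) until the new case is handled.
  }
}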
“Parse, Don’t Validate”
After your code verifies something about the structure of an input value it’s working with, strongly consider encoding that result into the types. See also https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
Our extensive use of tagged string types is a good example of this: after we verify something at runtime, we record it in the types so that we can write code that, at the type level, requires the right kind of argument. (See, e.g., OwnedEntityId and isOwnedEntityId.)
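For reference, here is a minimal sketch of the tagged-string pattern; it is not the actual OwnedEntityId implementation, and the brand shape and the prefix check are assumptions made for illustration:

// A "tagged" (branded) string type: structurally a string, but carrying a
// compile-time-only tag that plain strings don't have.
type OwnedEntityId = string & { readonly __brand: "OwnedEntityId" };

// The runtime check happens once, at the boundary, and its result is recorded in the
// type. (The "oe_" prefix is a made-up rule for this sketch.)
function isOwnedEntityId(raw: string): raw is OwnedEntityId {
  return raw.startsWith("oe_");
}

// Downstream code can then demand the parsed type and never re-validate:
function loadOwnedEntity(id: OwnedEntityId): void {
  // ... fetch by id, knowing the id has already been checked
}

declare const input: string;
if (isOwnedEntityId(input)) {
  loadOwnedEntity(input); // `input` is now typed as OwnedEntityId
}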
Duplicate Coincidentally-Identical Code
Two pieces of code that currently do the same thing should only reference common logic or types for that functionality (e.g., a shared base class or a shared utility function) — that is, the code should only be “made DRY” — if the two pieces of code should always, automatically evolve together.
Some concrete examples:
- The input type (e.g., DTO) for updating an entity should not inherit from the type used when creating the entity. If the update type were to extend the creation type, any new field that can be provided on creation would automatically also be allowed in the input to an update. However, not every new field should be updateable: having fields be automatically accepted in an update input can create security risks, and it extends the contract that your code has to uphold. On balance, then, it’s likely better to force the developer to decide, for each new field, whether and how that field should be allowed on update — even if that requires a bit more duplication and boilerplate for fields that are settable on both create and update. (This example is just a variation of the “fragile base class” problem; a short sketch of it follows this list.)
- The types used in the application to reflect the shape of values stored in the database should not be the same as, or in any way linked to, the types that the application’s services use to reflect their legal inputs, outputs, or intermediate values. The reason is that a change to the type the application is working with does not automatically change the shape of the data in the database; that data only changes if the developer remembers to explicitly migrate it. Therefore, having the type used for data in the database be linked to (and therefore automatically change with) other types turns the database types into lies, and actively hides the need to first migrate the data in the database, which the compiler would otherwise flag as a mismatch between the database types and the new application types.
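Here is a hypothetical sketch of the first example; the entity and field names are invented:

// Tempting but fragile: every field later added to the creation type would silently
// become updatable as well, e.g.:
//   class UpdateClientDto extends CreateClientDto {}

// Preferred: duplicate the overlapping fields, so each new field forces an explicit
// decision about whether (and how) it may be changed after creation.
type CreateClientDto = {
  name: string;
  ownerId: string; // set once, at creation time...
};

type UpdateClientDto = {
  name?: string;   // ...and deliberately not repeated here
};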
In general, consider that, when multiple pieces of code reference some shared logic, updating that logic in the one place where it’s defined will cause the change to cascade everywhere. That’s the whole point of DRY, but it’s a double-edged sword: sometimes it prevents bugs by keeping every user of the shared logic in sync on the latest (most-correct) version; but sometimes the automatic propagation of changes (to places where the assumptions/behavior of the new code should not apply) is itself the cause of bugs. Hence the original guidance: DRY up code that you’re reasonably confident should always, automatically evolve together, or code where, because of the specifics of the domain, the benefits of automatic change propagation outweigh the risks.