Add feature flags with the end in mind

Since I first started using feature flags as the primary way to separate release from deploy and start shipping faster, unblocking git-flow and getting better control of shit, the feature-flagging train has really gathered some speed. I don’t need to go through the pros and cons: https://featureflags.io is a great resource, alongside all the internet discussion on the topic.

Complexity Spiral

Something I think that site doesn’t tackle in its Best Practices, which it ought to, is optimising for removal. Feature Flags are great, but they multiply code paths and add complexity to everything. Every time you add a binary feature flag that affects all existing code paths, you’re doubling the possibilities. By the time you’ve done that 3 times, you’ve created 8x the complexity (2³).

In larger repos, with multiple teams committing and running their own releases and experiments, that soon mounts up. Without proper management, I’ve seen repos easily run up 300+ feature flags. Assuming only 2 variants per flag (and some have many variants: red, blue, green…) we’re talking more combinations of code paths than grains of sand on earth.
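To put a number on that, here is a rough sketch of the arithmetic. It assumes every flag is binary and independent of the others, which is the worst case rather than what any real codebase looks like, and codePaths is just a name I made up for illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// codePaths returns the number of flag-state combinations for n independent
// binary flags, i.e. 2^n. big.Int is used because the result no longer fits
// in a uint64 once n reaches 64.
func codePaths(n uint) *big.Int {
	return new(big.Int).Lsh(big.NewInt(1), n)
}

func main() {
	for _, n := range []uint{3, 10, 64, 300} {
		p := codePaths(n)
		fmt.Printf("%3d flags -> a %d-digit number of combinations\n", n, len(p.String()))
	}
}
```

At 300 flags the count is a 91-digit number. Nobody is testing all of those.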

Now imagine debugging something occurring on one of those combos, somewhere 🤔.
The combination explosion makes simple services unfathomably complex. Understanding anything requires comprehension of feature flag state for tens or hundreds of flags. Contributing to a codebase with a thousand upstream possibilities is a nasty, smelly affair.

Like all things that buy you time early, their debt compounds without proper management.

Deep Roots

Trees dig deeper and deeper roots with age. Feature flags not implemented well in code can do just the same.

A flag like ShowNewWidgets might need to be read in a number of places. There might also be a desire to access flags at the root of the code path. This is particularly true if accessing the flag requires user context, which you don’t want to pass around everywhere.

For these two reasons it’s common to see something as high up as the request controller, reading a whole bunch of flags:

func requestHandler(w http.ResponseWriter, r *http.Request) {  
   user := parseUserDetails(r)  
   shouldUseFeatureA := features.IsOn(FeatureA)  
   shouldUseFeatureB := features.IsOn(FeatureB)  
   ....  
   shouldUseFeatureZ := features.IsOn(FeatureZ)  
   ....  
}

Once we’ve got all these values, they then have to be passed down through the code path to where they’re used. Something like the following is common:

opts := Opts{
   IncludeWidgets:    shouldUseFeatureA,
   OverrideOldDBRepo: shouldUseFeatureZ,
   UseV2Query:        shouldUseFeatureB || shouldUseFeatureX,
}

…and as these values are passed down, their names change and the logic that applies to them increasingly obfuscates the origin of the value, the flags that determine them:

func someOtherFunc(o Opts) {  
   var sqlStatement string  
     
   if o.UseV2Query && o.OverrideOldDBRepo {  
      sqlStatement = fasterQuery  
   } else { ... }  
   ...  
}

Now if I want to come and remove a flag, there’s a barrier to what should be a simple task.

If FeatureB is rolled out, I want to delete it from the code to remove a set of combinations and simplify things. But how would I go about that safely? features.IsOn(FeatureB) will always resolve to true now, so I could simply replace that statement with shouldUseFeatureB := true but that isn’t actually removing any complexity. The checks against that value still exist inside the codebase, and that is the cognitive overhead we want to reduce. We want to remove all such checks and only use the new codepath.

Opts.UseV2Query is an OR of FeatureB and FeatureX. We need to consider the impact of this if we remove FeatureB. With FeatureB always true, we can’t just delete its operand from the check, or else we alter the outcome of the OR condition: the whole expression is now always true, whatever FeatureX says. Further down, Opts.UseV2Query determines which query to use, again in combination with another flag, this time in an AND condition. Once again we need to be sure that our removal of the flag leaves the code in the same state as if the flag remained, 100% on.

Indeed, as flags are passed deeper and deeper, we need to evaluate each and every place that the flag contributes to a decision, in order to remove the flag, its repercussions and the cognitive overhead it contributes to the source code. The deeper it is passed, and the more it is used, the harder it becomes to remove. The roots are dug too deep.
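One way to convince yourself a removal is safe is to pin the flag on and compare the old and new code across every remaining flag state. The sketch below does that for the FeatureB example above. The names before, after and equivalent are mine, and "legacyQuery" is a stand-in for whatever lives in the elided else branch.

```go
package main

import "fmt"

// Opts mirrors the two fields of the options struct that matter here.
type Opts struct {
	UseV2Query        bool
	OverrideOldDBRepo bool
}

// before picks a query the way the original code does, with FeatureB still
// read as a flag and folded into UseV2Query.
func before(featureB, featureX, featureZ bool) string {
	o := Opts{
		UseV2Query:        featureB || featureX,
		OverrideOldDBRepo: featureZ,
	}
	if o.UseV2Query && o.OverrideOldDBRepo {
		return "fasterQuery"
	}
	return "legacyQuery" // hypothetical: the original else branch is elided
}

// after is the candidate simplification once FeatureB is 100% on.
// true || featureX is always true, so UseV2Query drops out and only
// FeatureZ still decides the outcome.
func after(featureZ bool) string {
	if featureZ {
		return "fasterQuery"
	}
	return "legacyQuery"
}

// equivalent checks every remaining flag state with FeatureB pinned to true.
func equivalent() bool {
	for _, x := range []bool{false, true} {
		for _, z := range []bool{false, true} {
			if before(true, x, z) != after(z) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("safe to remove FeatureB:", equivalent())
}
```

Notice the side effect: with FeatureB gone, FeatureX no longer influences this path at all. That is easy to miss when the value has been renamed twice on its way down.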

Short & Sweet

Your feature flagging tool should have a TTL option on any flag, a date you prescribe after which it will make your life hell until that flag is gone. Try a short default. Most experiments don’t need more than a few weeks, and flags for safe releases can be removed the day after the release is rolled out. Your tool should have alerts for that too.
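If your tool doesn’t offer a TTL, you can approximate one yourself. This is a minimal sketch, assuming you keep a list of flags with expiry dates somewhere in the repo. The Flag type, the expired function and the YYYY-MM-DD format are all my own invention, not any particular tool’s API.

```go
package main

import (
	"fmt"
	"time"
)

const dateLayout = "2006-01-02" // Go's reference date for YYYY-MM-DD

// Flag is a hypothetical registry entry: a flag name plus the date after
// which it should be gone from the code.
type Flag struct {
	Name    string
	Expires string // YYYY-MM-DD
}

// expired returns the names of flags whose expiry date is before today.
func expired(flags []Flag, today string) ([]string, error) {
	now, err := time.Parse(dateLayout, today)
	if err != nil {
		return nil, fmt.Errorf("parse today %q: %w", today, err)
	}
	var out []string
	for _, f := range flags {
		exp, err := time.Parse(dateLayout, f.Expires)
		if err != nil {
			return nil, fmt.Errorf("flag %s: bad expiry %q: %w", f.Name, f.Expires, err)
		}
		if now.After(exp) {
			out = append(out, f.Name)
		}
	}
	return out, nil
}

func main() {
	flags := []Flag{
		{Name: "ShowNewWidgets", Expires: "2020-01-31"}, // long overdue
		{Name: "FeatureB", Expires: "2999-12-31"},       // effectively no TTL yet
	}
	overdue, err := expired(flags, time.Now().Format(dateLayout))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range overdue {
		fmt.Printf("flag %q is past its TTL, remove it\n", name)
	}
}
```

Run something like this in CI and fail the build on any overdue flag, and you get the “make your life hell” part for free.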

There are several advantages to short, sharp, in-and-out flag use.

First, it minimises the spreading effect of any flag-derived value. Fewer people will have time to use it, in fewer places. Get it out before the roots dig too deep and latch onto too many things.

Second, the sooner you remove a flag, the more knowledge you have to hand about the flag. I’ve had flags that have been there for months, and I can’t remember what they did. I’ve seen flags lying around for years, where everyone involved with the flag has since left the company or retired. Get the flag out while its role and use are fresh in memory.

Third, if you get a flag out of the code as soon as it has no use, you get it out before you have time to add another. If you move onto another project, with old flags still in the code, you’re likely to add another flag before you remove that one. Flag combinations grow and things become exponentially complex. To fight this, you could try the one in, one out rule. You can’t add a flag without removing one.

Short-lived flags are good hygiene. Long-lived flags are tech debt.

Hot Potatoes

Reading flags in bulk, and passing their values around like parcels, is dangerous. The value of a feature flag is determined externally from the code. That’s its greatest strength and its greatest threat. If I don’t know that the value I’m dealing with is being toggled as part of an experiment, I can’t treat it with the caution it deserves. When it comes to removing that value, good luck.

To prevent the dissemination of a flag off into the horizon, apply one rule: pretend it’s PII. Access it, read it, and drop it. Don’t pass it, save it or rename it. Every time a flag gets passed, there’s an opportunity to rename it, an opportunity often taken. Renaming it hides its source, making it harder and harder to remove the complexity later.

“but, but, I need to read it in multiple places, and I don’t want code repetition”

Deal with it. “A little copying is better than a little dependency” and WAAAY better than a lot of tech debt.

“Ugh, but accessing a flag requires passing around a user’s details”

That’s true, and that can be a pain. But if you were passing a flag value, why can’t you pass user values? In Go you should have constant access to the context. That’s the perfect place to store user information relevant to the entire code path.

“mmm, but I don’t want my domain layers to depend on a specific feature flag implementation”

That’s a good instinct. There’s no reason why they need to. Your context-stored user type can easily abstract this away, with a context helper.

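package featureflags

import "context"

// ctxKey is a private type so the context key can't collide with other packages.
type ctxKey string

const resolverKey ctxKey = "feature_flag_resolver"

// resolver hides the concrete feature flag implementation from callers.
type resolver interface {
    GetBinaryFlag(flagName string) bool
    GetMultivariateFlag(flagName string) string
}

// SetResolver stores the resolver on the context at the code path entrance.
func SetResolver(ctx context.Context, r resolver) context.Context {
    return context.WithValue(ctx, resolverKey, r)
}

// GetBinaryFlag returns false when no resolver is on the context.
func GetBinaryFlag(ctx context.Context, flagName string) bool {
    r, ok := ctx.Value(resolverKey).(resolver)
    if !ok {
        return false
    }
    return r.GetBinaryFlag(flagName)
}

// GetMultivariateFlag returns "" when no resolver is on the context.
func GetMultivariateFlag(ctx context.Context, flagName string) string {
    r, ok := ctx.Value(resolverKey).(resolver)
    if !ok {
        return ""
    }
    return r.GetMultivariateFlag(flagName)
}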

Your controller can create the resolver, with the user info it has at the code path entrance. Each time you need to access a flag, all you need is its name and context.
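Here is roughly how that wiring might look end to end. It’s a sketch, squeezed into one file so it runs on its own: the helper is trimmed to binary flags, and staticResolver, handle and pickQuery are hypothetical stand-ins for your real flag client, controller and domain code.

```go
package main

import (
	"context"
	"fmt"
)

type ctxKey string

const resolverKey ctxKey = "feature_flag_resolver"

// resolver is the helper's interface, trimmed to binary flags for brevity.
type resolver interface {
	GetBinaryFlag(flagName string) bool
}

func SetResolver(ctx context.Context, r resolver) context.Context {
	return context.WithValue(ctx, resolverKey, r)
}

func GetBinaryFlag(ctx context.Context, flagName string) bool {
	r, ok := ctx.Value(resolverKey).(resolver)
	if !ok {
		return false
	}
	return r.GetBinaryFlag(flagName)
}

// staticResolver is a hypothetical stand-in for a real flag client that has
// been bound to one user at the top of the request.
type staticResolver struct {
	userID string
	on     map[string]bool
}

func (s staticResolver) GetBinaryFlag(flagName string) bool { return s.on[flagName] }

// pickQuery is domain code. It reads the flag by name at the point of use
// and never passes the value on.
func pickQuery(ctx context.Context) string {
	if GetBinaryFlag(ctx, "FeatureB") {
		return "fasterQuery"
	}
	return "legacyQuery"
}

// handle plays the controller: it builds the resolver from the user details
// it has, puts it on the context, and hands only the context down.
func handle(userID string, flags map[string]bool) string {
	ctx := SetResolver(context.Background(), staticResolver{userID: userID, on: flags})
	return pickQuery(ctx)
}

func main() {
	fmt.Println(handle("user-123", map[string]bool{"FeatureB": true}))
	fmt.Println(handle("user-456", nil))
}
```

The flag name appears exactly once, as a string, right where the decision is made. That is what makes it searchable later.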

Now, to remove any flag from the codebase, we can reliably search for each place it is used, and get a clear gauge of its use and impact. In most cases, you will simply be able to delete the condition around it.

This drastically reduces the work required, and risk taken, to delete a flag. That minimises flags in the codebase, forks in the user journey, debug-time and cognitive overhead, and it improves safety.

Optimise for removal.


