Microservices Are Too (Conceptually) Big

The idea of microservices has attracted a lot of attention, particularly following James Lewis and Martin Fowler’s article. There’s lots of good advice and sensible thought in that piece, but despite some noble attempts to pin them down, microservices have suffered a great deal of semantic diffusion, prompting questions along the lines of ‘Is anyone actually doing microservices properly?’.

I believe that much of this comes down to two valuable architectural patterns being frequently conflated in discussions of microservices: independent services and single purpose applications.

Independent services

Independent services allow teams to align themselves with a capability in the business and focus on it without constantly having to consider the implementation details of other services. To achieve this, they communicate with the other services they interact with over stable, well-defined interfaces.

Single purpose applications

Single purpose applications do one thing.

If there’s more than one property which defines whether the app is healthy or not, it’s probably trying to do more than one thing.

If you think there’s more than one metric you’d like to scale the application on, it’s probably trying to do more than one thing.

The aim of this approach is to give the design of an app focus and provide simple answers to key operability questions. Often such apps are small enough that, if things change, you can rewrite them without having to negotiate budgetary approval to do so.

Why does the distinction matter?

While two independent services accessing the same data store is an anti-pattern, this isn’t necessarily the case for single purpose applications. As long as they form part of the same service, having different apps responsible for writing to and querying from the same store can be a beneficial design.

There are significant benefits in making apps independently deployable, but routinely deploying apps that form part of the same service together, as an optimisation (e.g. to maximise cache consistency), can be sensible. Similarly, it might sometimes make sense to deploy such apps in a specific order to ease a migration. However, if you find yourself regularly deploying more than one service at a time, or frequently worrying about the sequence of their releases, they’re probably not very independent.

As ever, it’s better to focus on the qualities that help us achieve our goals (teams able to evolve independently, resilience, simplicity) than to strive for some platonic ideal of doing microservices ‘properly’.

Immutable Servers

Functional programming has become increasingly popular over the past few years. One of the key ideas of functional programming is the preference for immutability. Another important development has been the rise of DevOps and the idea of infrastructure as code. If we value immutability in our code, and we treat our infrastructure as code, can we gain benefit by having immutable servers?

What do I mean by an immutable server? Clearly, when we peer into the details, much of any server is mutable. But in terms of our interactions with them as developers and maintainers of the system (installing packages, updating configuration, deploying new versions of an application), we can keep to a simple rule: once the server has initialised, we don’t attempt to make any changes. This can mean creating machine images with everything ready to start. It may also include some initialisation scripts, perhaps using a tool like Chef Solo or the direct application of Puppet, but it definitely doesn’t include allowing a central Puppet or Chef server to effect changes.

It will, naturally, be necessary to make changes to the service running on the server, but it’s important to make the distinction that it’s the service that we’re ultimately interested in updating. The server is an implementation detail. Deployment methods based on bringing up parallel instances, or on growing and shrinking auto-scaling groups, fit perfectly with this idiom.

Why might it be desirable to have immutable servers? The key reason for favouring immutability, whether in software or in the servers it runs on, is to minimise the possible states of the system. This makes it much easier to reason about the state the system is currently in. There’s no need to worry about whether or not a particular change was applied to the servers: we know what state they started in, so that’s the state they’re in now.

Of course, no choice is without its caveats: virtualisation and fast boot times are necessary to make this approach practical, and it’s much easier to apply to stateless applications than to datastores. Having used this approach for the past nine months, though, I’m convinced it’s a valuable way of working.

Why You Should Copy and Paste

DON’T REPEAT YOURSELF!

It’s a good rule of thumb, but I worry about the number of developers for whom it’s a comforting mantra.

Why is it we should avoid repetition in our code? Sustaining the pace of delivery and minimising defects are goals most of us can agree on. Repetition leads to updating the same thing in many places, which can be time consuming and error prone. Therefore we should never repeat ourselves, right? I don’t believe it’s so simple.

What often seems to be ignored is the cost of avoiding repetition. Large companies appear to gain huge benefits from spreading shared costs, such as HR, yet some start-ups, without these apparent efficiencies but with the ability to change rapidly, can overhaul them. The same advantage can be seen in small, independent software systems, and we should accept that repetition may be a cost worth paying to achieve it.

Where the instances of repetition occur in close proximity, e.g. within the same function or class, the cost of removing it can be trivial, but as we move out to separate codebases maintained by separate teams, the costs can become prohibitive. Developers who haven’t maintained a library that other systems depend on tend to underestimate the overhead of doing so, and presume that it will automatically lead to those systems moving seamlessly in step. In reality, downstream systems often overlook an updated library, and the assumed benefit of everyone receiving a change never materialises.

An even more fundamental problem is that what at first appears to be repetition can, upon further development, be revealed as a number of subtly but importantly different use cases. In these cases, attempting to cram them all into the same code can unnecessarily complicate the design, impeding delivery speed and increasing the chance of defects.

So what are the critical factors to examine when deciding whether copy and paste is an appropriate response to repetition?

  • Locality – sharing code locally has a much lower overhead than sharing between teams
  • Frequency – dealing with code copied and pasted once is a lot more palatable than doing it five times
  • Scope – the overhead of managing the library or other abstraction that avoids repetition is more likely to be repaid when the scope of the repetition is larger
  • Clarity of abstraction – collections of utilities without a single defining theme attract many small changes making them more complex for their consumers to keep up with
  • Implementation maturity – fresh code, and especially not-yet-written code, can often appear to involve repetition, but upon closer inspection turns out to have different needs

In many cases there are small amounts of code that are useful to two or three different projects, but aren’t on the critical path of those systems. Copying and pasting such code is the responsible thing to do.

Streams of Pleasure

Sometimes you have to work with something that’s slow and provides an interface you’d rather not deal with. Hopefully it’s a little less dumb and more useful than:
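
A sketch of such a dependency (the MagicChecker name and the details are illustrative, not the original listing): it blocks for a second per call and signals failure by throwing.

```scala
object MagicChecker {
  // Slow and awkward: blocks for a second per call, and signals
  // "not magic" by throwing instead of returning a value.
  def check(candidate: Int): Unit = {
    Thread.sleep(1000)
    if (candidate != 3)
      throw new IllegalArgumentException(s"$candidate is not magic")
  }
}
```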

but that will serve as an example.

Coming from Java, if you wanted to use this to find a magic number from a list of candidates as efficiently as possible, you might be tempted to write something like:
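
A sketch of that temptation (only the FailingNumberFinder name and its result come from the transcript below; the slow check is inlined as a stand-in so the example is self-contained):

```scala
object FailingNumberFinder {
  // Stand-in for the slow dependency: throws unless given the magic number.
  private def check(candidate: Int): Unit =
    if (candidate != 3) throw new IllegalArgumentException(s"$candidate is not magic")

  def findMagicNumber(candidates: Seq[Int]): Option[Int] = {
    for (candidate <- candidates) {
      try {
        check(candidate)
        return Some(candidate) // inside a closure, this return becomes an exception...
      } catch {
        case _: Throwable => // ...which this catch quietly swallows
      }
    }
    None
  }
}
```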

This doesn’t work as you might expect:

scala> FailingNumberFinder.findMagicNumber(1 to 5)
res0: Option[Int] = None

Thanks to De La Soul, we all know 3 is a magic number, so what’s going on? Scala implements return from inside a closure, such as the body of a for loop, by throwing an exception, and so the catch block swallows the attempt to return early.

We can wrap the attempt to search in a function to avoid this:
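
One way to sketch that (again, only the UglyNumberFinder name is taken from the transcript; the structure is an assumption):

```scala
object UglyNumberFinder {
  // Stand-in for the slow dependency: throws unless given the magic number.
  private def check(candidate: Int): Unit =
    if (candidate != 3) throw new IllegalArgumentException(s"$candidate is not magic")

  // The try/catch now lives in its own function, so the catch can no
  // longer intercept the exception that implements the early return.
  private def attempt(candidate: Int): Option[Int] =
    try {
      check(candidate)
      Some(candidate)
    } catch {
      case _: Throwable => None
    }

  def findMagicNumber(candidates: Seq[Int]): Option[Int] = {
    for (candidate <- candidates) {
      attempt(candidate) match {
        case some @ Some(_) => return some
        case None           =>
      }
    }
    None
  }
}
```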

scala> UglyNumberFinder.findMagicNumber(1 to 5)
res1: Option[Int] = Some(3)

but ultimately there are good reasons why using the return keyword in Scala is considered bad form.

If we weren’t worried how many times we called the troublesome function, we might instead use something more idiomatic, such as:
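
For example (EagerNumberFinder is an invented name; the point is the style, not the exact code):

```scala
object EagerNumberFinder {
  // Stand-in for the slow dependency: throws unless given the magic number.
  private def check(candidate: Int): Unit =
    if (candidate != 3) throw new IllegalArgumentException(s"$candidate is not magic")

  private def attempt(candidate: Int): Option[Int] =
    try { check(candidate); Some(candidate) }
    catch { case _: Throwable => None }

  // Clean and declarative, but eager: attempt is called for every
  // candidate, even after the magic number has been found.
  def findMagicNumber(candidates: Seq[Int]): Option[Int] =
    candidates.flatMap(attempt).headOption
}
```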

Fortunately there’s a simple way to maintain this style without having to accept the performance penalty:
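
A sketch of the lazy version (StreamNumberFinder is an invented name; in Scala 2.13 and later, LazyList plays the role of Stream):

```scala
object StreamNumberFinder {
  // Stand-in for the slow dependency: throws unless given the magic number.
  private def check(candidate: Int): Unit =
    if (candidate != 3) throw new IllegalArgumentException(s"$candidate is not magic")

  private def attempt(candidate: Int): Option[Int] =
    try { check(candidate); Some(candidate) }
    catch { case _: Throwable => None }

  // toStream defers evaluation: flatMap runs only as elements are
  // demanded, and headOption demands nothing past the first hit, so
  // for 1 to 5 the slow check runs for candidates 1, 2 and 3 only.
  def findMagicNumber(candidates: Seq[Int]): Option[Int] =
    candidates.toStream.flatMap(attempt).headOption
}
```

The shape is identical to the eager version; only the toStream call changes.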

Scala’s Stream allows any collection to be wrapped such that its members are lazily evaluated. The standard functions for transforming collections, such as map and flatMap, are also evaluated lazily across a Stream, so we only call the slow function as many times as absolutely necessary.