Recently I was reading an article with some advice on how to make your system resilient. As I was reading it, I stumbled across the following piece of advice.
…if you receive a message multiple times that says “increment X by 20”, you will probably end up with an inconsistent value. Maybe prefer “current-value” type messages where if you receive them multiple times, they don’t add to any inconsistencies in the data. (Source: http://blog.christianposta.com/microservices/3-easy-things-to-do-to-make-your-microservices-more-resilient/)
There are times when this is valid, but as this is the new year I’m reviewing the mistakes of the past and updating the guidance for my team so that we avoid the issues of the past 10 years. As part of this annual review, I noticed that more and more of the same old patterns were creeping back in, and this is a perfect example of something that is so commonly done and so regularly the wrong approach.
To dissect this, let’s take an example or two. These are all the same type of operation, and good examples of what we so regularly do so poorly:
- Withdrawing money from an ATM
- Adding progress to a project
- Administering drugs to a patient
In all of these situations, the default go-to implementation from my staff and clients was to load the current value, modify it, and then inform the service of the new value. In each situation I heard the following supporting arguments:
- If we withdraw $20 and the request to update the bank balance has an issue and gets re-sent, we don’t want to withdraw too much money from the account when we only dispensed $20
- If we say we’ve done 10 hours work in Project A today, and the request gets re-sent a second time, we don’t want to record 20 hours
- If we give a patient 10mg of morphine, and the request updating the remaining amount of morphine the patient is to get that day gets re-sent a second or third time, we don’t want to record an extra 20mg used against the patient’s daily allowance, otherwise they won’t get all the medicine they need.
In every case this logic sounded very solid. The problem it introduces is that in every situation you now have to ensure the order of these requests is maintained and that no one else can do anything with the service being updated. Putting this back into real-world examples:
- Someone may be withdrawing $20 from an account at an ATM while someone else is purchasing a pie and a drink at the local store at the same time – payment networks are slow as we all know.
- Two workers may be recording their hours at 5 p.m. (the end of the day for them).
- A nurse might be recording the morphine dose administered while the doctor is updating the daily allowance.
In all these situations we’ve now added a complexity. One MUST happen before the other AND it MUST complete BEFORE the second can take place AND the second MUST NOT load its current value until the entire system is free for it. This introduces a vast amount of system locking to control the order of operations.
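To make the ordering problem concrete, here is a minimal sketch in Python of the project-hours scenario. The `HoursService` class and its method names are hypothetical, invented purely for illustration. Two workers load the total before either has saved, so the second "current-value" update silently erases the first, while the same updates sent as increments do not care about order.

```python
class HoursService:
    """Hypothetical in-memory service holding the hours logged on a project."""

    def __init__(self, total: int = 0) -> None:
        self.total = total

    def get_total(self) -> int:
        return self.total

    def set_total(self, value: int) -> None:
        # "Current-value" message: safe to re-send, but overwrites whatever is there.
        self.total = value

    def add_hours(self, delta: int) -> None:
        # "Increment" message: order does not matter, but a re-send counts twice.
        self.total += delta


# Both workers load the total at 5 p.m., before either has saved.
service = HoursService(total=100)
seen_by_a = service.get_total()
seen_by_b = service.get_total()
service.set_total(seen_by_a + 10)  # worker A records 10 hours
service.set_total(seen_by_b + 8)   # worker B records 8 hours, erasing A's update
lost_update_total = service.get_total()

# The same two updates sent as increments do not depend on ordering.
service = HoursService(total=100)
service.add_hours(10)
service.add_hours(8)
increment_total = service.get_total()

print(lost_update_total, increment_total)  # 108 118
```

Without a lock around the load-modify-save sequence, worker A's 10 hours are gone. The increment version keeps both, at the cost of needing some protection against duplicate delivery.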
In all of these situations I reminded the team, and questioned the client, about what was more important: recording that something has actually happened, as quickly and easily as possible, and only THEN updating what the computer system THINKS is the state of the real world. The flaw we found happening most regularly was that our staff and clients kept trying to treat the computer system as the absolute record of fact… when in reality the world we live in is the absolute record of fact. If a warehouse management tool tells you that you have no stock of an item and yet you’ve just put it onto a truck and sent it to a customer… it means the software is wrong and you should be allowed to record that it really did just get sent to the customer. Going back to our above scenarios:
- The fact that the $20 note was dispensed at the ATM is more important than ensuring you have $20 in the account. The bank can put the account into overdraft and charge a fee… ironically making it more profitable for them to enable people to withdraw more money than they have… up to a point.
- Workers need to record their hours so they get paid… that matters more than the report that says how much time (and therefore cost) has been spent on the project.
- Recording that medicine has already been administered is paramount. Once it’s administered, if the system detects that an overdose has happened you alert a doctor or a nurse… you don’t stop them from being able to record it.
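The medicine case can be sketched as "record first, check afterwards". This is only an illustration in Python: `DoseLog`, `record_dose`, and the `alerts` list are hypothetical names. A real system would persist the log and notify staff through whatever channel it already has.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DoseLog:
    """Hypothetical record of the doses given to one patient in a day."""

    daily_allowance_mg: int
    doses_mg: List[int] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def record_dose(self, amount_mg: int) -> None:
        # The dose has already been given in the real world, so always record it.
        self.doses_mg.append(amount_mg)
        # Only afterwards compare the system's view with the allowance and alert.
        total = sum(self.doses_mg)
        if total > self.daily_allowance_mg:
            self.alerts.append(
                f"Total {total}mg exceeds allowance {self.daily_allowance_mg}mg"
            )


log = DoseLog(daily_allowance_mg=30)
for amount in (10, 10, 20):
    log.record_dose(amount)

print(log.doses_mg)  # [10, 10, 20]
print(log.alerts)    # ['Total 40mg exceeds allowance 30mg']
```

The third dose goes over the allowance, yet it is still recorded. The system raises an alert instead of refusing the write.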
In all of these systems, what turned out to be the correct approach was that each request to increment or decrement an amount should have had a unique tracking number, so that if it was sent multiple times, the duplicates would be ignored. In some cases it makes even more sense to design your system in a way where you can say “the value I’m incrementing was at version 21 when the change was requested”. This way the service can ensure the value being updated hasn’t moved since. This second approach is quite valid for booking systems. Booking a seat at a concert usually involves loading a map of available seats, the user picking seats, and then requesting that those seats be reserved for the user. In this case the second approach would work really well, as the system can check whether the requested seats have changed. This makes more sense than just checking if the seat is still available, as the concert might have been cancelled, which means the seats will be available at the time, just not for this concert.
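Both ideas can be sketched together in a few lines of Python. This is a minimal in-memory illustration under assumptions of my own: the `Balance` class, the `apply` method, the `VersionConflict` exception, and the tracking-number format are all hypothetical. A real service would need to store the seen tracking numbers durably and apply the change atomically.

```python
from typing import Optional, Set


class VersionConflict(Exception):
    """Raised when the value changed after the caller last read it."""


class Balance:
    """Hypothetical in-memory value that accepts increment/decrement requests."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.version = 0
        self._seen: Set[str] = set()

    def apply(self, tracking_id: str, delta: int,
              expected_version: Optional[int] = None) -> bool:
        # A re-sent request carries the same tracking number, so ignore it.
        if tracking_id in self._seen:
            return False
        # Optional check: reject the change if the value has moved since it was read.
        if expected_version is not None and expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self._seen.add(tracking_id)
        self.value += delta
        self.version += 1
        return True


account = Balance(value=100)
account.apply("atm-0001", -20)
account.apply("atm-0001", -20)    # duplicate delivery, ignored
account.apply("store-0001", -7)   # concurrent purchase, any order works
print(account.value, account.version)  # 73 2

try:
    account.apply("adjust-0001", 5, expected_version=1)  # stale: version is now 2
    conflict = False
except VersionConflict:
    conflict = True
print(conflict)  # True
```

The tracking number alone is enough for the ATM, hours, and medicine cases, where every real-world event must be recorded. The version check suits the booking case, where a stale view should cause the request to be rejected and retried.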