During the last year, I’ve done quite a few webinars, talks and workshops about Event Sourcing. People in the audience often asked me: when should I not use this pattern? A broader version of the question came up during the Ask Me Anything session with Udi Dahan, organised by Virtual DDD: under what circumstances are things like Event-Driven Architecture (EDA), Service-Oriented Architecture (SOA), CQRS, et al., not applicable?
During the session, Udi argued that when building one of the first iterations of software for a startup, you need to focus on validating the idea before thinking about things like scalability and, overall, “doing it right”. Therefore, the more complex software design patterns are premature in that kind of environment.
It resonates with one of my own answers to the audience when asked about Event Sourcing in particular. I usually tell people that when you don’t really know what you’re doing, going with Event Sourcing might introduce unnecessary complexity. There’s one caveat here, though. I might not be expressing my thought clearly enough, so let me elaborate on this a bit more.
Quality of early software versions
Early versions of any software are built with many workarounds, hacks and shortcuts. Deliberately accepting such compromises on quality is what we call technical debt, meaning that it is something that will be repaid, with interest. In a startup environment, such decisions are often well-justified. When we don’t really know if the software will actually solve the problem at hand, or whether the problem is even worth solving, the debt might even be written off together with the software that accumulated it. Like a startup company that burns investors’ money and then goes out of business, early versions of software can go bankrupt, and their debt is then written off the books.
Keeping this in mind, we can clearly see that spending a lot of time on technical excellence in a piece of software that might never even be used doesn’t make much sense.
I won’t talk about EDA or Event Sourcing here, as I don’t believe that applying those patterns makes your software better by definition. It might, but it can also go the opposite way. However, I’d like to point out that Domain-Driven Design (DDD), which is not really a software pattern or architecture, might save you months of work, especially during the early phases. So, how could DDD help?
Let’s get back to the question, and to some of the answers I mentioned earlier. Presume the startup idea hasn’t been properly validated, and we don’t even know if the problem we want to solve is relevant. That is, in fact, the very area where DDD shines. Collaborating with your future customers (users), called domain experts in DDD, brings tons of valuable insights about what your customers struggle with and how their problems can be solved. Conducting a few Event Storming sessions with real customers would definitely change your view of the domain and of your proposed solutions. You will find things you didn’t know about or forgot to consider, things you thought were important but aren’t, and the other way around.
That being said, I am not telling you to use Aggregates, Repositories or Value Objects. That is not the point. But you’d be able to see, understand and model the behaviour of your system in a way that purely writing software and discussing progress within your team would never allow.
In fact, doing just that, in iterations, could prevent your software from going bankrupt in the first place! If that’s the case, should we plan for bankruptcy from the start, or prepare our system to run a marathon instead of a sprint (and die)?
Another issue with poorly designed software, especially in a startup environment, is that Phase Two never comes. What was intended as technical debt essentially becomes a burden that you won’t have time to address. Why? Because startups can rarely afford to spend time on solidifying their systems: they have limited engineering capacity, always struggle with funding, and have a very ambitious product backlog. It’s totally fine to scrap five versions of badly written software because they don’t fit the purpose. When the sixth version turns out to be a success, it will also be badly written, because we planned to scrap it too, but eventually found it useful. Don’t expect a refactoring break to make it better. It works, so you’ll just need to build more and more features on this weak foundation. Eventually, it will grow into a Big Ball of Mud and disintegrate under its own weight.
Image credit: Bonkers World
Now, the question is: can we find some middle ground?
Event Sourcing in a startup
I sometimes build small products on my own, to get away from the daily routine and to keep my technical and product-oriented skills in shape. It looks like a startup of one, and I am sure some of my colleagues do the same.
Once, I built a working system to support my holiday property rental business. It was a monolith, and it used a document database for persistence. Lots of workarounds, hardcoded bits and the usual hacks were right there. In such a scenario, I am also the domain expert, as I know exactly which issues I want to solve, simply because they are my own issues. In that regard, I am not ready to call it a clean experiment, as you’d rarely get that level of understanding of the problem space from your prospective customers. Still, it’s good enough to call it a lab experiment.
Models will be wrong
After a short while, I found out that my domain model was wrong. I hadn’t spent enough time modelling a variety of scenarios and only focused on the most obvious ones. When I figured out a better model, I found myself stuck with the system I’d built. Why was that? Simply because all I had was a bunch of documents in the database, representing the current system state. Although the model was not right, the state itself was correct; it just didn’t have enough data. I could also say that the model was not entirely wrong, but it was missing an important context, which I didn’t even know existed.
What I realised at that time is something I clearly remember today. The behaviour of my system was right. All the commands I had were valid and useful. However, as I used state-based persistence, I didn’t capture the behaviour explicitly; I updated the system state instead, as we do “by default” almost everywhere. For the new model, I needed a different representation of that behaviour, as another piece of state. So, here is what I learned:
Here’s an example of state, a Booking state in MongoDB:

```json
{
  "_id": "ac2fd0edd2d74f249afea3f9014934ad",
  "amount": "5600",
  "bookingChannel": "booking.com",
  "checkInDate": [{"$numberLong": "637498908000000000"}, 0],
  "checkOutDate": [{"$numberLong": "637500636000000000"}, 0],
  "externalBookingNumber": "2955008750",
  "guest": {
    "name": "Ole Nordmann",
    "email": "[email protected]",
    "phone": "+78123123123"
  },
  "paidInFull": false,
  "prepaid": false,
  "property": {
    "_id": "0392c950d8ea4840850d098af0de12df",
    "name": "Great Apartment"
  }
}
```
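To make the contrast concrete, here is a minimal sketch, in Python rather than the .NET stack the system was built with, of how the same booking might have been captured as events, with a document like the one above derived from them by a left fold. The event names (`BookingAdded`, `BookingPaid`), their fields and the dates are hypothetical and only loosely mirror the document.

```python
from dataclasses import dataclass
from datetime import date
from functools import reduce
from typing import Union


@dataclass(frozen=True)
class BookingAdded:
    # Hypothetical event: a booking arrived from a booking channel.
    booking_id: str
    property_id: str
    channel: str
    check_in: date
    check_out: date
    amount: int


@dataclass(frozen=True)
class BookingPaid:
    # Hypothetical event: the guest paid the full amount.
    booking_id: str


Event = Union[BookingAdded, BookingPaid]


def apply(state: dict, event: Event) -> dict:
    """Return the next state; the stored events themselves are never modified."""
    if isinstance(event, BookingAdded):
        return {
            "_id": event.booking_id,
            "property": event.property_id,
            "bookingChannel": event.channel,
            "checkInDate": event.check_in,
            "checkOutDate": event.check_out,
            "amount": event.amount,
            "paidInFull": False,
        }
    if isinstance(event, BookingPaid):
        return {**state, "paidInFull": True}
    return state


def current_state(events: list[Event]) -> dict:
    # The current state is just a left fold over the event history.
    return reduce(apply, events, {})


history = [
    BookingAdded("ac2f", "0392", "booking.com", date(2021, 2, 25), date(2021, 2, 27), 5600),
    BookingPaid("ac2f"),
]
state = current_state(history)
```

The point of the sketch is that the document becomes one possible projection of the history rather than the only record of what happened.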
What I eventually found out is that when I need to check the availability for a new booking, just going through all the future bookings for that room is very hard. And if you deal with room categories instead of individual rooms, it becomes even harder. I needed something called a Day, a concept you would find in many domains that deal with scheduling.
Had my system been event-sourced from day one, I could drop and rebuild the system state, or introduce new representations of state very easily, by writing new read-model projections. I could’ve created them separately from the currently running production system and still used production data, as the new projections wouldn’t touch anything that already works.
Compare that with changing the schema of a state database. When doing such a change, I’d need a migration, which must run once. It must be rigorously tested before it goes to production, otherwise the whole system stops working. A migration cannot run continuously in production, side by side with the production system, so that you can experiment with it. No, you run it and it’s done. If it goes horribly wrong, you restore the whole thing from a backup (if you have one), then try again.
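A minimal sketch of that idea, assuming a simplified in-memory event log: a new read model is produced by replaying the whole log from the first event into a fresh, empty store, so the log and the existing read models are only read, never changed. `RecordedEvent`, the event type names and the `bookings_per_channel` query model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RecordedEvent:
    position: int  # global position in the event log
    type: str      # hypothetical event names, e.g. "BookingAdded"
    data: dict


Handler = Callable[[dict, RecordedEvent], None]


def rebuild(log: Iterable[RecordedEvent], handle: Handler) -> dict:
    """Replay the whole log, from the first event, into a brand-new read model."""
    read_model: dict = {}
    for event in sorted(log, key=lambda e: e.position):
        handle(read_model, event)
    return read_model


def bookings_per_channel(model: dict, event: RecordedEvent) -> None:
    """A hypothetical new query model: active bookings per booking channel."""
    channel = event.data.get("channel")
    if event.type == "BookingAdded":
        model[channel] = model.get(channel, 0) + 1
    elif event.type == "BookingCancelled":
        model[channel] = model.get(channel, 0) - 1


log = [
    RecordedEvent(0, "BookingAdded", {"bookingId": "b1", "channel": "booking.com"}),
    RecordedEvent(1, "BookingAdded", {"bookingId": "b2", "channel": "airbnb"}),
    RecordedEvent(2, "BookingCancelled", {"bookingId": "b2", "channel": "airbnb"}),
    RecordedEvent(3, "BookingAdded", {"bookingId": "b3", "channel": "booking.com"}),
]
per_channel = rebuild(log, bookings_per_channel)
```

If the new model turns out to be wrong, it can simply be thrown away and rebuilt with a different handler; nothing comparable to restoring a backup is involved.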
How would I produce a number of Day things from a bunch of bookings? I’d need to split each booking, per room, into a set of days, where each day can be free or occupied. I also wanted to manage ongoing tasks, so a day would also have things like cleaning the apartment after departure, but that is from a different bounded context altogether! For a start, that would not even be a migration; it would be something to run often, to update those days from new, cancelled and updated bookings. The Booking thing is not going away; the Day thing is a derivative of the Booking and, potentially, of other things.
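Here is a minimal sketch of such a Day projection, assuming booking events arrive with a booking id, a property id and check-in/check-out dates (hypothetical handler names and shapes, not the actual system). Each booking is split into the nights it occupies, and a cancellation frees only the days that booking still holds.

```python
from collections import defaultdict
from datetime import date, timedelta


def nights(check_in: date, check_out: date) -> list[date]:
    """Every night of a stay; the check-out day itself stays free."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


class DayProjection:
    """Hypothetical read model: which booking occupies each (property, day)."""

    def __init__(self) -> None:
        self.days: dict[tuple[str, date], str] = {}
        self._keys_by_booking: dict[str, list[tuple[str, date]]] = defaultdict(list)

    def on_booking_added(self, booking_id: str, property_id: str,
                         check_in: date, check_out: date) -> None:
        for night in nights(check_in, check_out):
            key = (property_id, night)
            self.days[key] = booking_id
            self._keys_by_booking[booking_id].append(key)

    def on_booking_cancelled(self, booking_id: str) -> None:
        # Free only the days that this booking still occupies.
        for key in self._keys_by_booking.pop(booking_id, []):
            if self.days.get(key) == booking_id:
                del self.days[key]

    def is_free(self, property_id: str, check_in: date, check_out: date) -> bool:
        return all((property_id, n) not in self.days for n in nights(check_in, check_out))


p = DayProjection()
p.on_booking_added("b1", "apt", date(2021, 2, 25), date(2021, 2, 27))
```

An updated booking could be handled, under the same assumptions, as a cancellation followed by a new addition. Because the projection is fed continuously by events, it stays current without any one-off migration.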
I remember spending many hours trying to fix my state-based system. I don’t even remember if I gave up or eventually did it. What I do remember is that every minute of those hours I regretted not having made my system event-sourced from the start.
Experimentation
As a follow-up to the previous section, I can also share my experience with another iteration of that system of mine. This time, it’s fully event-sourced. I made quite a few mistakes in the model (again), as I had to build a quick working prototype serving real users seeking accommodation in my holiday apartment, and it worked. Now, did I gain anything from Event Sourcing? I definitely did, and it’s something that any startup company would highly value. As I mentioned before, the behaviour of my system is rather obvious for most scenarios. The UI and UX part is not, but only when it comes to showing information on the screen. Again, as commands represent intent, they are part of the behavioural model of the system, and that part is mostly fine. I discovered a need to build a few screens that would require either heavy queries against the existing representation of state, or elements of state that were entirely missing. And you know what? I had no issue at all building those query models from the events I have. In addition, I could build two different read models and run them side by side, both continuously fed by new events. I can build different versions of one particular UI component, deploy them both to production and do A/B testing on real users, without doing any damage to any other piece of the system, including the source-of-truth database, which is EventStoreDB.
Another aspect of supporting experimentation is using events to trigger other behaviour, the process we usually call integration. For example, I have a subsystem that receives parsed incoming emails from SendGrid, and incoming texts via Twilio or Nexmo (now Vonage). Those things are events, and I treated them as such from the start. All I knew was that those messages could be coming from guests or from a booking channel, like Airbnb or booking.com. But I didn’t know the format of those events, whether I needed to parse the text, or, essentially, what to do next. So, at the start, I just saved those events to my event store, as-is. After accumulating some data, I was able to understand how to process those events and trigger other behaviour, like communicating with guests, recording new bookings, or processing cancellations. It isn’t Event Sourcing but what we call EDA (Event-Driven Architecture), and when it comes to integration, it’s an invaluable tool.
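Running two read models side by side can be sketched with per-projection checkpoints. This is a simplified in-memory stand-in, not the EventStoreDB client API: each projection remembers the position of the last event it processed, so a newly added version catches up on the full history and then keeps receiving new events alongside the old one. `Projection`, `Subscription` and both `arrivals_*` handlers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Event = dict
Handler = Callable[[dict, Event], None]


@dataclass
class Projection:
    name: str
    handle: Handler
    state: dict = field(default_factory=dict)
    checkpoint: int = -1  # position of the last event this projection processed


class Subscription:
    """Hypothetical in-memory stand-in for a catch-up subscription to the event log."""

    def __init__(self, log: list[Event]) -> None:
        self.log = log
        self.projections: list[Projection] = []

    def add(self, projection: Projection) -> None:
        # A new projection starts before the first event and catches up on history.
        self.projections.append(projection)

    def pump(self) -> None:
        # Deliver to each projection only the events after its own checkpoint.
        for p in self.projections:
            for position in range(p.checkpoint + 1, len(self.log)):
                p.handle(p.state, self.log[position])
                p.checkpoint = position


def arrivals_v1(state: dict, e: Event) -> None:
    # Version A: guest names per arrival date.
    if e["type"] == "BookingAdded":
        state.setdefault(e["checkIn"], []).append(e["guest"])


def arrivals_v2(state: dict, e: Event) -> None:
    # Version B: arrivals grouped by property, then by date.
    if e["type"] == "BookingAdded":
        by_date = state.setdefault(e["property"], {})
        by_date.setdefault(e["checkIn"], []).append(e["guest"])


log: list[Event] = [{"type": "BookingAdded", "property": "Great Apartment",
                     "checkIn": "2021-02-25", "guest": "Ole Nordmann"}]
sub = Subscription(log)
a = Projection("arrivals-v1", arrivals_v1)
sub.add(a)
sub.pump()
log.append({"type": "BookingAdded", "property": "Great Apartment",
            "checkIn": "2021-03-01", "guest": "Kari Nordmann"})
b = Projection("arrivals-v2", arrivals_v2)
sub.add(b)
sub.pump()  # v1 receives only the new event, v2 catches up on both
```

Each version can back a different variant of the same UI component, and dropping the losing variant after an A/B test means deleting one projection and its state.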
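The "store first, decide later" approach might look like the following minimal sketch. Step one records every incoming message as an event without interpreting it; step two, written later, routes stored messages to behaviour. The sender domain `booking.example`, the body pattern and the command names are invented for illustration and do not reflect any real booking channel's format or the SendGrid/Twilio payloads.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncomingMessage:
    # Stored exactly as received; the payload is not interpreted on arrival.
    source: str  # e.g. "sendgrid" or "twilio"
    sender: str
    body: str


event_store: list[IncomingMessage] = []


def on_webhook(source: str, sender: str, body: str) -> None:
    """Step one: just record the fact that a message arrived."""
    event_store.append(IncomingMessage(source, sender, body))


# Step two, added later once enough real messages had been collected.
# The sender domain and the body pattern are hypothetical.
BOOKING_PATTERN = re.compile(r"New booking (?P<number>\d+)")


def route(message: IncomingMessage) -> Optional[tuple[str, str]]:
    """Decide which behaviour a stored message should trigger."""
    if message.sender.endswith("@booking.example"):
        match = BOOKING_PATTERN.search(message.body)
        if match:
            return ("RecordBooking", match.group("number"))
        return ("ReviewManually", message.body)
    return ("ForwardToGuestConversation", message.sender)


on_webhook("sendgrid", "noreply@booking.example", "New booking 2955008750 for Great Apartment")
on_webhook("twilio", "+4712345678", "Hi, what time can we check in?")
routes = [route(m) for m in event_store]
```

Because the raw messages are already stored, the routing logic can be rewritten and re-run over the accumulated history as understanding improves.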
Issues with Event Sourcing
Of course, silver bullets don’t exist. There’s always a fly in the ointment, and I’ve got one too.
Although I said the behaviour of my system is rather obvious, it has quirks. For example, I can receive an availability block from the calendar feed of the booking provider. I modelled it as an IncompleteBookingAdded event, part of the Booking aggregate. However, that might not be entirely correct. It might be a block for a different reason, like a block the booking channel applies because of certain limitations that I set for that channel myself, for example, a minimum stay duration. As the calendar feed doesn’t include any essential booking details, I never really know what it is and how to deal with it. When I receive a text from the booking channel, I get much more information, and I can reliably parse it into a proper booking, but what shall I do with the matching calendar feed event? Do I need this IncompleteBooking thing to be part of my Booking aggregate? Probably not. But I already have those events, so what do I do with them? I’d rather move them to a separate aggregate type, and therefore a separate stream, but I can’t, as events are immutable. Another problem was very technical. I started using NodaTime but forgot to configure serialisation properly. So, right now, many of the events already in the production database have some dates serialised incorrectly. Now that I have configured serialisation properly, those events don’t deserialise anymore!
I expected many such things to happen, but I accepted the challenge. For me, the benefit of explicitly storing the behaviour outweighs the issues I am currently facing. I also know for a fact that my issues can be solved by applying patterns from Greg Young’s book Versioning in an Event Sourced System, so the problems I have are purely technical, and technical problems aren’t the hardest ones to solve.
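One such pattern is an upcaster: a function that sits between the event store and the deserialiser and converts old event shapes into the current one on every read, so the stored bytes stay immutable. Below is a minimal Python sketch that assumes, purely hypothetically, that the broken events stored dates as `{"year": ..., "month": ..., "day": ...}` objects while correct ones use ISO-8601 strings; the real NodaTime misconfiguration may have produced a different shape, and the field names are also hypothetical.

```python
import json
from datetime import date

DATE_FIELDS = ("checkIn", "checkOut")  # hypothetical field names


def upcast_date(value):
    """Normalise a date field to ISO-8601, whatever shape it was stored in.

    Assumption: old events hold an object such as {"year": 2021, "month": 2, "day": 25}
    (a hypothetical broken shape); fixed events already hold "2021-02-25".
    """
    if isinstance(value, dict):
        return date(value["year"], value["month"], value["day"]).isoformat()
    return value


def upcast(raw: bytes) -> dict:
    """Runs on every read; the stored event is never rewritten."""
    data = json.loads(raw)
    for name in DATE_FIELDS:
        if name in data:
            data[name] = upcast_date(data[name])
    return data


def deserialise(raw: bytes) -> tuple[date, date]:
    data = upcast(raw)
    return date.fromisoformat(data["checkIn"]), date.fromisoformat(data["checkOut"])


old = b'{"checkIn": {"year": 2021, "month": 2, "day": 25}, "checkOut": {"year": 2021, "month": 2, "day": 27}}'
new = b'{"checkIn": "2021-02-25", "checkOut": "2021-02-27"}'
```

The same read-time translation idea can also remap event types, which is one way to present the misplaced IncompleteBookingAdded events differently without touching the stream they were written to.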
Microservices
Now I’d like to touch on the SOA bit. Are services overkill in a startup? I doubt it, as long as you don’t go overboard, of course.
In the system I mentioned, there are four services right now:
- Backoffice
- Guest portal
- Messaging
- Calendar feed sync
My reason for splitting those subsystems into individual services is quite obvious. They have very different concerns, and I can clearly identify them as bounded contexts, although some of them might be multi-context services.
Services are awesome
Could I have built them all as a monolith? Yes, why not. Would it be easier to maintain? Definitely not. Here are my arguments.
First, I don’t want to deploy the whole thing when only a part of it changes. The Backoffice is for me, as the property manager. It has authentication and authorisation bits that don’t apply to any other component. I am also free to deploy it at any time, as it only affects me. Deploying the Guest portal might happen right when a guest is trying to check in or, worse, is in the middle of a payment session. Deploying the Messaging service without a proper rollout strategy might lead to lost messages, so I must have at least one instance running at all times. The Calendar sync service has a similar, but less extreme, limitation: in the worst case, I miss a sync call from the booking channel, but the channel will call again later. Therefore, I am free to design a different rollout strategy for each of the services, which keeps the balance between provisioning too much and too little.
Second, I feel better working on a contextualised piece of software. It doesn’t cause me as much cognitive overload, as I know exactly what it does and what I need to do with it. Not to mention different developers or teams working on different services; even just for me, it is already a relief. I have a tendency to pick things up when I see them, so in a monolithic system I often find myself dragged away from the initial task, as I start to notice other, unrelated things that I want to fix “by the way”. As a result, I might never finish the task I really wanted to do. It might not be a problem for people who are more disciplined and focused; it’s just my experience.
As a consequence, I also keep issues in separate repositories. When I work on a single service, I can clearly see what needs to be done right there, without applying any issue filter or similar. Seeing issues about incoming text message processing on the same board as a better guest check-in experience is quite distracting.
Two of my current services also have a UI. Since the services are separated, I don’t need to care about using the same visual styling, the same SPA framework or the same build system. I choose what’s best for the job in that particular subdomain and service, without affecting anything else.
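Per-service rollout strategies map naturally onto the `strategy` field of a Kubernetes Deployment (`Recreate`, or `RollingUpdate` with `maxSurge` and `maxUnavailable`). The sketch below only builds plain manifest dictionaries in Python; it is not the Pulumi code the system uses, and the replica counts, numbers, service keys and image name are assumptions chosen to mirror the reasoning above.

```python
def rolling(max_unavailable: int, max_surge: int) -> dict:
    # Kubernetes Deployment .spec.strategy for a rolling update.
    return {"type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": max_unavailable, "maxSurge": max_surge}}


# Service names follow the text; the numbers are assumptions.
ROLLOUT = {
    # Only affects the property manager, so a short outage during deploys is fine.
    "backoffice": {"replicas": 1, "strategy": {"type": "Recreate"}},
    # Never drop below current capacity while a guest may be paying.
    "guest-portal": {"replicas": 2, "strategy": rolling(0, 1)},
    # At least one instance must always run, otherwise messages may be lost.
    "messaging": {"replicas": 2, "strategy": rolling(0, 1)},
    # A missed sync call is retried by the channel, so a brief gap is tolerable.
    "calendar-sync": {"replicas": 1, "strategy": rolling(1, 0)},
}


def deployment(name: str, image: str) -> dict:
    """Build an apps/v1 Deployment manifest using the service's rollout settings."""
    settings = ROLLOUT[name]
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": settings["replicas"],
            "strategy": settings["strategy"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


messaging = deployment("messaging", "registry.example/messaging:1.0")
```

With `maxUnavailable: 0`, a rolling update only removes an old pod after a new one is ready, which is what keeps the Messaging service continuously available; the Backoffice can simply be recreated.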
I can also build a new version of the Guest Portal and do A/B tests with two different versions, without any concerns about parallel changes in the Backoffice or any other isolated component.
Concerns about services
As with Event Sourcing, splitting the system into services comes with some trade-offs.
First, I need a transport medium to share events between the services. Luckily, as my system is event-sourced, I already persist events in a way that I can consume them elsewhere, real-time. I will have to introduce integration events later on, but I don’t have an immediate need for that, and I accept the coupling for the moment. Again, it’s a technical problem, and it’s easy to solve.
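When integration events do become necessary, one common approach is a small translation layer that maps internal domain events to a deliberately narrow public contract, so other services stop depending on the Backoffice's internal event shapes. The sketch below is hypothetical: both event types, their fields and the choice of an ISO-8601 string for the date are assumptions, not part of the existing system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BookingAdded:
    # Internal domain event of the Backoffice (hypothetical shape).
    booking_id: str
    property_id: str
    guest_email: str
    amount: int
    check_in: date
    check_out: date


@dataclass(frozen=True)
class GuestArrivalScheduled:
    # Hypothetical public contract: only what other services need.
    booking_id: str
    guest_email: str
    arrival: str  # ISO-8601 date, so consumers don't depend on our date types


def to_integration(event: object) -> Optional[GuestArrivalScheduled]:
    """Translate domain events into integration events; ignore everything else."""
    if isinstance(event, BookingAdded):
        return GuestArrivalScheduled(event.booking_id, event.guest_email,
                                     event.check_in.isoformat())
    return None


domain_event = BookingAdded("ac2f", "0392", "ole@example.com", 5600,
                            date(2021, 2, 25), date(2021, 2, 27))
published = to_integration(domain_event)
```

Internal fields such as the amount stay private, so the Backoffice can keep evolving its own events without breaking the Messaging service or the Guest portal.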
Second, I have to build and maintain proper delivery pipelines for all the services. Once more, it’s a technical problem, but it has to be solved immediately. Here I am in luck (maybe not the right word) again, as I know how to do it. I have a managed Kubernetes cluster in DigitalOcean, my own private GitLab instance, and a good foundation for automated deployments using Pulumi. Since I have the knowledge and experience needed to do all that DevOps stuff, everything on the operational side runs smoothly. Some might prefer serverless functions, which might do the job even better (not always) with even less effort. So, for me, maintaining a reasonable fleet of microservices is not as much of a burden as it used to be in the .NET world a few years ago.
Conclusion
As I previously mentioned, dealing with Event Sourcing, DDD, microservices, etc. in a startup environment might not be the right thing for you. This is especially relevant if you don’t have much experience with some or all of it. Then again, if you don’t do it, you never get the experience anyway…
As I got more proficient in applying these patterns, and more comfortable with the DevOps part, I stopped worrying so much about the so-called accidental complexity that people tend to associate with Event Sourcing and SOA. It’s like a muscle you need to train; then you just start using it. Alas, what I keep seeing is systems being built without any model, with monolithic databases, by teams that struggle to focus, teams that keep arguing about their endless dependencies.
Software development, like any other human activity, is full of compromises and trade-offs, so you need to choose your battles and the lesser evil. So, before you decide to build a monolith first, make sure to consider the other point of view.
As for me, I’d rather build an event-sourced, event-driven system, separated in a few services. Even in a startup. Especially in a startup!