CQRS, Fonctionnel, Event Sourcing & Domain Driven Design – Arnaud Lemaire – PHP Tour 2018


Before starting, a quick disclaimer: I hope you drank some coffee during the break, because I have about a hundred slides to show you in 40 minutes. If anyone is interested, my Twitter handle is @lilobase and I’ll post everything there. There is some code in these slides; don’t try to read everything on the screen, you won’t be able to. The idea is that you can review afterwards the examples I’m going to display step by step.
Today we’re going to talk about an opposition that we see frequently: Data Driven versus Domain Driven. Roughly, the question is whether we are talking about which data I’m going to manipulate, or about which behaviors I can expect at the application level.
To give you an example: we usually link Data Driven to CRUD architectures, where this kind of order is sent to the system: “I would like to update an address.” The problem when we send this kind of order is that we have no idea whether the address was updated because it was wrong or because the person has moved. At the system and business level, those are not the same thing. What usually happens with a CRUD system is that you totally lose the intent of the user. And to exaggerate a little: when you use CRUD, your added value is approximately that of an Access database, a bit nicer and accessible via the web.
So, we’re going to work from a small example. Imagine that someone asks you to build software to manage sport centers. Very often, when people start, they ask this kind of question: we’re going to draw an ER diagram (or UML) to describe what the data is, what the relations are, where the integrity constraints and the foreign keys are, etc. Then we associate that with a series of controllers, actions and resources, and we end up mixing both.
Even if you’re not drawing an ER diagram (or UML) in MySQL Workbench, a project that starts with the description of its Doctrine entities is doing exactly the same thing. Describing the entities is like drawing an ER diagram; you’re simply doing it in code. I tend to discourage this, because here again there is no consideration for the behavior side and the business side of what you’ve been asked for. You’ve only thought about the technical side. You will usually go very fast at the beginning, but the team’s productivity quickly stalls: you realise that the data is not really shaped for the new behaviors you need to implement, so you start mangling things and spend a lot of time fighting against the technical side.
The idea of Domain Driven is to do the opposite. Instead of defining the technical side first and then trying, with more or less success, to fit the domain inside it, we ask which business intents are expressed by the information system, and we adapt the technology to the domain.
Very often, we start by listing what we call Use Cases, i.e. what the system must be able to do, and some collaborators, which are business entities that own some behaviors. At this stage I really don’t care what data will be inside. What I’m interested in are the behaviors that can be assigned to my system. Of course, the goal is not to have an exhaustive list before starting, just enough to start. This is also an architecture whose goal is to discover business needs step by step, and to help uncover the needs expressed by marketing, by product managers, etc.
On this topic, a small piece of advice that will improve your productivity a lot: never write any code that is not directly linked to a use case.
Because if you write code that is outside of a use case, you’re writing code “just in case”, code that doesn’t answer a business goal. We see that very often; we call it over-engineering, and we end up with this kind of thing. It is very nice to have a spaceship, but it is very expensive to maintain and you potentially don’t need it to solve your problem.
So let’s talk about the entities I mentioned. Simply put, they are objects, which is pretty classic, and we use them to carry behaviors. We split these objects into two categories. Entities have a lifecycle: they are born, they live, they die. Most of the objects we manipulate in the system are entities. Value objects, by contrast, are equal based on their content. Here you will find really classic things like dates: two dates are equal even though they are not the same object and don’t share an identity. It is the same with money: 5 euros equal 5 euros even though they are different objects. You also have things that are a bit more business related, like a billing interval: we could imagine invoicing annually or monthly, and this kind of duration doesn’t have an identity. A person is the opposite: two people can have the same first name and last name, but that doesn’t mean they are the same person; they have identities. That is what you need to remember; I won’t say more about it, this was just a quick reminder.
We organise all of that into what we call an “Aggregate”. An aggregate is a set of objects that collaborate to represent a business concept. Inside, you will find entities and value objects working together, all coordinated by a root object that we call the “Aggregate Root”.
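The difference between value objects and entities described above can be sketched in a few lines. The slides of the talk were in PHP and are not reproduced here; this is a Python illustration, and the names `Money` and `Member` and their fields are assumptions made for the example.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class Money:
    """Value object: equality is based on content, there is no identity."""
    amount_cents: int
    currency: str


@dataclass(eq=False)
class Member:
    """Entity: equality is based on identity, not on attributes."""
    first_name: str
    last_name: str
    member_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Member) and self.member_id == other.member_id

    def __hash__(self) -> int:
        return hash(self.member_id)


five_euros = Money(500, "EUR")
also_five_euros = Money(500, "EUR")   # a different object, but an equal value
alice = Member("Alice", "Martin")
homonym = Member("Alice", "Martin")   # same name, but a different person
```

Comparing `five_euros` with `also_five_euros` yields equality, while `alice` and `homonym` stay distinct because each one received its own identifier.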
If I’m talking about it, it’s because when we look at architectures that claim to be CQRS, this model is very often missing. It is important in several respects. First, the aggregate root (or root entity) protects the rest of the aggregate from potential integrity problems. It is a simple means of guaranteeing that your system cannot end up in a really bad state, because every operation on this set of objects has to go through the root. Without it you lose an enormous advantage: unless the aggregate is badly designed, all of its objects share the same lifecycle, meaning they live, evolve and die at the same time.
To give you an example, with invoices you could have the Invoice as root and the invoice lines behind it. An invoice line has no reason to exist without its Invoice. Here I show you another example around the notion of membership: a person can sign up for activities or hold a subscription, and both are tightly linked to the fact that this person is a member of the sport center. These objects live together.
The big advantage is in terms of persistence: there is no reason to compute any diff inside the aggregate. We fetch it as a whole and save it as a whole. And when I say save it as a whole, I really mean overwriting all the data, because the root has guaranteed that the data was always valid and that the whole aggregate was always correct. This is the main advantage: if you don’t design these aggregates into your system, you won’t benefit from the persistence simplicity that this type of architecture provides.
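As a rough Python illustration of the invoice example, here is a minimal aggregate whose root is the only entry point for changes. The two invariants (no change after finalization, strictly positive amounts) are hypothetical and chosen only to show the root doing its job.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InvoiceLine:
    label: str
    amount_cents: int


class Invoice:
    """Aggregate root: every change to the lines goes through it."""

    def __init__(self, invoice_id: str) -> None:
        self.invoice_id = invoice_id
        self._lines: List[InvoiceLine] = []
        self._finalized = False

    def add_line(self, label: str, amount_cents: int) -> None:
        # Hypothetical invariants, for illustration only.
        if self._finalized:
            raise ValueError("a finalized invoice cannot be modified")
        if amount_cents <= 0:
            raise ValueError("a line amount must be positive")
        self._lines.append(InvoiceLine(label, amount_cents))

    def finalize(self) -> None:
        if not self._lines:
            raise ValueError("an empty invoice cannot be finalized")
        self._finalized = True

    @property
    def lines(self) -> Tuple[InvoiceLine, ...]:
        # Read-only copy, so callers cannot bypass the root.
        return tuple(self._lines)

    def total_cents(self) -> int:
        return sum(line.amount_cents for line in self._lines)
```

Because the lines can only change through `add_line`, the invoice is valid at every moment, which is what makes it safe to load and save it as a whole.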
To be able to do that, the entities are not Doctrine (ORM) entities. They really are dedicated PHP objects that collaborate and expose the business behaviors expected from these concepts.
As I said just before, all of this is organised into what we call Use Cases. We split these Use Cases into two main parts. The first is everything related to reading: the visualisations we send to users so that they can make decisions. Once they are able to make a decision, they express an Intent, which we call a Command. From now on I’ll use the terms Query and Command, each of which simply means a use case of the system.
What does it look like? A command is a DTO (Data Transfer Object), a really simple one that holds an intent and the data needed to express that intent. This command, very often in a dedicated namespace, has a dedicated handler in charge of orchestrating the related business logic. Typically, here you have the handler related to the previous command: we specify the associated command (we’ll see why just after), and the collaborators are injected through the constructor.
This is also an important design point, because these collaborators are always interfaces. The handler only knows about interfaces; at run time or boot time, we inject a concrete implementation. This is what we call Hexagonal architecture, and it lets us completely decouple our application. What usually locks down the maintenance of an application is technical considerations and technical collaborators, so being able to change these technical collaborators easily is a huge guarantee of maintainability.
Finally, the handler has a “handle” method, which performs the business operation.
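A command and its handler, as described above, could be sketched as follows in Python (the slides used PHP). `RegisterMember`, `MemberRepository`, `next_identity` and the in-memory implementation are hypothetical names for this example; `listen_to` mirrors the `listenTo` method mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Dict, Protocol
import uuid


@dataclass(frozen=True)
class RegisterMember:
    """Command: a DTO carrying the intent and the data it needs."""
    first_name: str
    last_name: str


@dataclass
class Member:
    member_id: str
    first_name: str
    last_name: str


class MemberRepository(Protocol):
    """Interface known by the handler; implementations are injected."""
    def next_identity(self) -> str: ...
    def add(self, member: Member) -> None: ...


class RegisterMemberHandler:
    def __init__(self, members: MemberRepository) -> None:
        self._members = members

    def listen_to(self) -> type:
        return RegisterMember

    def handle(self, command: RegisterMember) -> str:
        member = Member(self._members.next_identity(),
                        command.first_name, command.last_name)
        self._members.add(member)
        return member.member_id  # only the identifier is returned


class InMemoryMemberRepository:
    """Concrete implementation, convenient in tests."""
    def __init__(self) -> None:
        self.members: Dict[str, Member] = {}

    def next_identity(self) -> str:
        return str(uuid.uuid4())

    def add(self, member: Member) -> None:
        self.members[member.member_id] = member


repository = InMemoryMemberRepository()
handler = RegisterMemberHandler(repository)
member_id = handler.handle(RegisterMember("Alice", "Martin"))
```

The last three lines are the whole test setup for one business use case: build the handler with a fake collaborator, send a command, check the result.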
Here you can see a fairly traditional cycle, which consists of creating an aggregate from external data, persisting it and returning. Note that I’m not returning the aggregate’s data but only its identifier; we’ll see later that the data itself is retrieved on the Read side.
Using it looks like this. You can see it is pretty simple, and this is typically what we do in a test. A test is 5 or 6 lines and covers one business use case. That is a very interesting level to test at, because you start validating that what you ask of your system is correctly executed. So this is all very nice; the main problem, again, is persistence.
That is what we’re going to talk about now, starting with the notion of CQS. You can notice there is a missing R here, because CQS has a particularity: the repository object, in charge of the database interactions, is shared between both sides. This kind of repository has to be seen as an interface that exposes business objects to the rest of the domain, with a concrete implementation injected at run time but never known by your domain code.
And again, you’re never supposed to have any indication of the ORM that might be used behind it. If what comes out of a DDD repository lets me guess which tool was used to persist the aggregate, that is generally a very bad sign. It means technical capabilities are leaking into your domain, and a very insidious coupling can set in: we developers are lazy, myself first, and the day there is a way to cheat by reaching the ORM layer directly from the domain, we’ll happily do it.
A repository looks roughly like this. You can see the business interface, in the domain, with no technical indication of how the work is done. Domain entities are systematically returned, and another important thing is that the identifiers are controlled by the domain. Identifiers that come from the database mean you are coupled to your persistence layer, and coupling with persistence means that the day you want things to evolve, you will be in trouble. Because the identifier is known by the business layer, it must be controlled by the business layer. This is really the principle of these architectures: an absolute border between the business layer and the technical services.
Now let’s talk about queries, the Read side. Queries are DTOs too, and one of the big advantages of these architectures is that we find the same principles everywhere: a query also lives in a dedicated namespace and has a handler that declares the query it satisfies. The handler also receives its collaborators through interfaces, and its handle method simply returns a model or view model intended for the user. With CQS, the query side usually uses the same repository as the Write side.
Very quickly a problem appears: we tend to need behaviors that are not really business related applied to every command. Typically a logging system, performance monitoring, validation of the payload coming into the system, or authentication. This is why we build a bus, which gives us an abstraction layer between the command coming from the top and the moment it is executed by the handler. As I said, you have logging, error management, etc. What we have to remember here is that the command bus and the query bus are separate. This is very important, because they can carry unrelated behaviors.
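The read side under CQS, as described above, might look like this in Python. `GetMemberCard`, `MemberCardView` and the repository methods are hypothetical; the point is only that the query handler follows the same shape as a command handler and reuses the repository of the write side.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Member:
    member_id: str
    first_name: str
    last_name: str


class MemberRepository(Protocol):
    """Domain-facing interface: no hint of the storage technology."""
    def get(self, member_id: str) -> Optional[Member]: ...
    def add(self, member: Member) -> None: ...


@dataclass(frozen=True)
class GetMemberCard:
    """Query: a DTO naming what the user wants to see."""
    member_id: str


@dataclass(frozen=True)
class MemberCardView:
    """View model returned to the user."""
    member_id: str
    display_name: str


class GetMemberCardHandler:
    def __init__(self, members: MemberRepository) -> None:
        self._members = members  # same repository as the write side (CQS)

    def listen_to(self) -> type:
        return GetMemberCard

    def handle(self, query: GetMemberCard) -> Optional[MemberCardView]:
        member = self._members.get(query.member_id)
        if member is None:
            return None
        return MemberCardView(member.member_id,
                              f"{member.first_name} {member.last_name}")


class InMemoryMemberRepository:
    def __init__(self) -> None:
        self._members: Dict[str, Member] = {}

    def get(self, member_id: str) -> Optional[Member]:
        return self._members.get(member_id)

    def add(self, member: Member) -> None:
        # The identifier was chosen by the caller, not by the storage.
        self._members[member.member_id] = member
```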
For instance, on the query side you can easily add a cache, because you’re only reading. Very often on the read side we have a Redis farm connected in parallel, and if we see that the same query was already executed in the last n minutes, there is no need to reach the handler; we directly send back the cached response. Of course, if you start doing that on the command side you’re going to have a lot of trouble, which is why we split the buses completely.
A middleware looks like this; it is a very simple object to build. Again, these architectures are really simple to set up. What I’m showing you here is almost production code; only a few lines are missing to be production ready. A middleware receives the next operation in its constructor, along with the collaborators it needs. For example, I’m going to show a middleware responsible for measuring the time taken to execute a command. In the dispatch method you can execute some code before, then you execute the next operation, which triggers the handler; you then retrieve the response, execute some code after the operation, and return the response. You can see that this construction lets us add behaviors that could otherwise be very complicated to add, in just 5 or 6 lines.
The dispatcher is the tool in charge of matching a command with its handler, and it looks like this. We take all the handlers and register them; we’ll see later how we find them. When I receive a command, I simply look up the handler associated with my command class, which is why we needed to implement the listenTo method, and then I execute the heart of the handler to call the related business logic.
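The timing middleware and the dispatcher described above can be sketched like this in Python. The `record` callback, the `Ping` command and its handler are assumptions made to keep the example self-contained; a real setup would inject a logger or metrics client instead.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Protocol, Tuple


class Handler(Protocol):
    def listen_to(self) -> type: ...
    def handle(self, command: Any) -> Any: ...


class Middleware(Protocol):
    def dispatch(self, command: Any) -> Any: ...


class Dispatcher:
    """Last link of the chain: matches a command with its handler."""
    def __init__(self, handlers: Iterable[Handler]) -> None:
        self._handlers: Dict[type, Handler] = {
            handler.listen_to(): handler for handler in handlers
        }

    def dispatch(self, command: Any) -> Any:
        return self._handlers[type(command)].handle(command)


class TimingMiddleware:
    """Measures how long the rest of the chain takes."""
    def __init__(self, next_middleware: Middleware,
                 record: Callable[[str, float], None]) -> None:
        self._next = next_middleware
        self._record = record

    def dispatch(self, command: Any) -> Any:
        started = time.perf_counter()             # code before
        response = self._next.dispatch(command)   # triggers the handler
        elapsed = time.perf_counter() - started   # code after
        self._record(type(command).__name__, elapsed)
        return response


@dataclass(frozen=True)
class Ping:
    message: str


class PingHandler:
    def listen_to(self) -> type:
        return Ping

    def handle(self, command: Ping) -> str:
        return command.message.upper()


timings: List[Tuple[str, float]] = []
bus = TimingMiddleware(Dispatcher([PingHandler()]),
                       lambda name, seconds: timings.append((name, seconds)))
response = bus.dispatch(Ping("hello"))
```

Since every middleware exposes the same `dispatch` method as the dispatcher, middlewares can be stacked in any order around it.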
We compose all of that into what we call the bus. Here you have a factory for Symfony services, for example, and it gives our system a single place from which to invoke all of our mutation operations.
I haven’t yet said how I get the list of all handlers. I quite like to use Symfony’s dependency injection: I ask the container to tag every service implementing a specific interface, here the CommandHandler interface, and then, in the services configuration, the bus service receives everything carrying that tag. This way you never have to think about it: you just add new use cases to your system and it works. That’s the goal.
Here is an example of how to use it in a controller. Very simply, you inject the bus in the constructor, and dependency injection gives it to you. In the action bound to the route, you extract from the payload the intent to be propagated and dispatch it to the bus, which triggers everything we’ve seen, and you send the response back to the client. Here again, we can see that the coupling to the framework is really weak. This is the only place where I start to have code specific to my technical building blocks. Everything else is plain PHP objects that I can use in any context.
So if we look back at what I’ve shown: a command arrives in a bus, goes through the middlewares, then is dispatched to its handler; the information comes back up and we send it to the client.
There is another thing we like to do: guarantee that everything happening in a handler belongs to the same transactional unit from the software’s point of view. Either the handler succeeds in executing the whole business operation, or absolutely nothing is executed.
Again, with this middleware-based architecture, it is really easy to always guarantee this. Here is what it looks like at the middleware level: we start a transaction at the beginning and execute the rest in a “try/catch”; if everything works, we flush and commit; on any exception, we roll back. This is really not complicated, and we remove complexity with just 5 lines.
All of this brings us to domain events. So far we haven’t done much: we take something, create an intent, execute things, persist them. Very often, with this kind of application, we have orchestration needs. To address them, in addition to returning a simple acknowledgement or non-acknowledgement from a handler, we add a series of events that describe the truth of what happened.
It is really important to keep them distinct, because we very often see confusion between commands and events. A command can fail: it is an intent that we send to the system and that we hope will succeed. An event is an established fact: it has happened, and we can make business decisions based on it.
Another mistake: never broadcast these events as they are on a global messaging bus. We see this very often, and the problem is that you couple implementation details, which are extremely local to the information system, to everything else. When people say they have problems with event versioning, the real problem is usually that they can no longer change these very local events, because they were broadcast as is. You always need a layer in between that maps local events to global events.
An event is also a DTO, in the domain namespace, and events are returned by the handler. Before, we only put the identifier in the command response; now I’ve added an event.
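The transactional middleware described at the start of this passage can be sketched as follows in Python. The `Transaction` interface is hypothetical; with Doctrine the equivalent steps would go through the entity manager, and the flush step is folded into `commit` here.

```python
from typing import Any, List, Protocol


class Transaction(Protocol):
    def begin(self) -> None: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class Middleware(Protocol):
    def dispatch(self, command: Any) -> Any: ...


class TransactionalMiddleware:
    """Runs the rest of the chain inside one transaction: all or nothing."""
    def __init__(self, next_middleware: Middleware,
                 transaction: Transaction) -> None:
        self._next = next_middleware
        self._transaction = transaction

    def dispatch(self, command: Any) -> Any:
        self._transaction.begin()
        try:
            response = self._next.dispatch(command)
            self._transaction.commit()
            return response
        except Exception:
            self._transaction.rollback()
            raise  # the caller still learns that the command failed


class RecordingTransaction:
    """Test double that only records which steps were called."""
    def __init__(self) -> None:
        self.calls: List[str] = []

    def begin(self) -> None:
        self.calls.append("begin")

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


class Succeeds:
    def dispatch(self, command: Any) -> Any:
        return "ok"


class Fails:
    def dispatch(self, command: Any) -> Any:
        raise RuntimeError("business rule violated")
```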
For your information, here is the “withValue” signature so that you can understand what’s happening: simply, everything I add ends up in an events bag.
These events have associated handlers, built exactly the same way. You have the injected collaborators, the related event declared in the “listenTo” method, and the “handle” method, which executes whatever you want to do for the event. Something to point out: an event handler always returns void. It is really meant to represent a side effect, and you’re not supposed to rely on the response of an event handler. That doesn’t mean you should ignore whether its execution succeeded, but this should be managed internally in the handler and must not disturb the execution of the system. You don’t want a payment system to fail when your mail gateway is down. That is roughly the idea behind this kind of signature.
Here you can see a handler that sends an email when a new member joins the sport center. Typically, this is where we put all these little side effects: things that are not part of the use case itself, that don’t belong to the business rule of adding a new member to the sport center, but have to be triggered as a reaction to it. It models this kind of problem really well.
If we look back at what happens: a command comes in and is sent to its handler; the handler returns an event, which is dispatched. The event dispatcher invokes the corresponding event handler, and of course, because these are events, we can have multiple handlers attached to the same event. Here you can trigger a chain of reactions.
The dispatcher looks exactly like what we saw earlier for commands. The idea is to take all the events and send them with “dispatch” through a middleware.
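A command response carrying events, and an event dispatcher that allows several handlers per event, could be sketched like this in Python. `with_value` mirrors the “withValue” mentioned in the talk, but its exact signature is an assumption, as are `MemberRegistered`, the `mailer` callable and both handlers.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class MemberRegistered:
    """Event: a fact that has already happened."""
    member_id: str
    email: str


@dataclass(frozen=True)
class CommandResponse:
    value: Any
    events: Tuple[Any, ...] = ()

    @staticmethod
    def with_value(value: Any, *events: Any) -> "CommandResponse":
        return CommandResponse(value, tuple(events))  # the events bag


class EventHandler(Protocol):
    def listen_to(self) -> type: ...
    def handle(self, event: Any) -> None: ...


class EventDispatcher:
    def __init__(self, handlers: Iterable[EventHandler]) -> None:
        self._handlers: Dict[type, List[EventHandler]] = {}
        for handler in handlers:
            self._handlers.setdefault(handler.listen_to(), []).append(handler)

    def dispatch(self, events: Iterable[Any]) -> None:
        for event in events:
            for handler in self._handlers.get(type(event), []):
                handler.handle(event)  # several handlers per event


class SendWelcomeEmail:
    def __init__(self, mailer: Callable[[str, str], None]) -> None:
        self._mailer = mailer
        self.failures: List[str] = []

    def listen_to(self) -> type:
        return MemberRegistered

    def handle(self, event: MemberRegistered) -> None:
        try:
            self._mailer(event.email, "Welcome!")
        except Exception as error:
            # Handled locally: a broken mail gateway must not fail the command.
            self.failures.append(str(error))


class CountNewMembers:
    def __init__(self) -> None:
        self.count = 0

    def listen_to(self) -> type:
        return MemberRegistered

    def handle(self, event: MemberRegistered) -> None:
        self.count += 1
```

Each handler returns nothing and deals with its own failures, so a failing side effect does not prevent the other handlers from running.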
You can see that the events are also sent through a bus, which means we can stack middlewares on the way in and on the way out. The difference is that a handler returning multiple events triggers multiple dispatches. The event dispatcher receives the handlers; here you can use Symfony’s dependency injection to manage this collection, as we generally do, so we always repeat the same patterns. The only difference in the “dispatch” method is that several handlers can be associated with the same event.
It is really important to keep this in mind, because it is what makes projection systems possible. A projection relies on the fact that any event handler can update a datastore: from an event, we can update data. For instance, if you have a referral program, you need to know who brought whom to the sport center. That is recursive, and not something you want to compute on demand. But computing it at subscription time is perfect, because you do it only once and save the result in a dedicated table. That is exactly what we do: very often we have projectors that prepare data and other views of the data, and this is what we call a projection.
The thing we really have to keep in mind is that the source of truth is held by the repository, which works with its own data; the datastores fed by projections must not be shared with the ones used by the command side. In return, projections let you build statistics tables that are anonymised by default, which is really good for GDPR. You can also send data to other databases, including legacy databases. That gives you a brand new system feeding a legacy system that doesn’t know it is no longer the one being used. And you can also update other, unrelated datastores, such as geographic databases or search engines.
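The referral example can be sketched as a projector, here in Python. The `MemberJoined` event and its `referred_by` field are hypothetical, and a plain dictionary stands in for the dedicated table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MemberJoined:
    member_id: str
    referred_by: Optional[str]


class ReferralChainProjector:
    """Event handler that maintains a read-only view of referral chains."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self._table = table  # stand-in for a dedicated table

    def listen_to(self) -> type:
        return MemberJoined

    def handle(self, event: MemberJoined) -> None:
        chain: List[str] = []
        if event.referred_by is not None:
            # The referrer's own chain was computed when they joined,
            # so no recursion is needed at read time.
            chain = [event.referred_by] + self._table.get(event.referred_by, [])
        self._table[event.member_id] = chain


referral_chains: Dict[str, List[str]] = {}
projector = ReferralChainProjector(referral_chains)
projector.handle(MemberJoined("alice", None))
projector.handle(MemberJoined("bob", "alice"))
projector.handle(MemberJoined("carol", "bob"))
```

After these three events, reading the full chain for a member is a single lookup; the recursive work was done once, when each member joined.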
Here again, the goal is to prepare the data for its future usage, and for instance to stop performing text search in MySQL. Of course we can mix all of these methods: from one operation, we can build as many representations as we like.
This is what brings us to CQRS. The R is back, because in CQS the repository is shared, whereas with CQRS the repository goes to one side only: it stays on the command side, and the query side is more of a “free for all”.
It means the repository looks like this. You can see there are no query methods anymore; exaggerating a little, it looks like the interface of a map. That is really the idea: on the command side, persistence should be seen as something very simple, as if I were working with a map to save my data.
Here is what it could look like; this is an implementation using Doctrine. Nothing special. You can notice that I have mappers that completely separate the Doctrine entity from the domain entity. In PHP we can use a Trait that gives access to private data in order to hydrate objects manually with very little effort. The only tip here: you can use Doctrine’s merge operation when adding new data to the persistence layer, to re-manage an entity that Doctrine is no longer aware of because we detached it from the “unit of work”. Be aware that merge will be removed in Doctrine 3, and I have no idea how we will do it then. Usually the same Trait serves both directions: “mapFromEntity” and “mapFromDoctrine” come from one Trait injected in both classes.
On the other side, this is what it looks like: the query is able to access the datastore directly. To do that we give it a direct connection, and in its core it executes a direct query. There is no reason to make the query side more complicated than that.
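The split described above, a map-like repository on the command side and direct queries on the read side, can be illustrated in Python with the standard `sqlite3` module standing in for Doctrine and MySQL. The table layout, the class names and the `ListMemberNames` query are assumptions for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    member_id: str
    first_name: str
    last_name: str


class SqliteMemberRepository:
    """Command side: looks like a map, with no query methods."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def get(self, member_id: str) -> Optional[Member]:
        row = self._connection.execute(
            "SELECT id, first_name, last_name FROM members WHERE id = ?",
            (member_id,),
        ).fetchone()
        return Member(*row) if row else None

    def add(self, member: Member) -> None:
        # Save the aggregate as a whole, overwriting any previous state.
        self._connection.execute(
            "INSERT OR REPLACE INTO members (id, first_name, last_name) "
            "VALUES (?, ?, ?)",
            (member.member_id, member.first_name, member.last_name),
        )


@dataclass(frozen=True)
class ListMemberNames:
    """Query DTO without parameters."""


class ListMemberNamesHandler:
    """Query side: talks to the datastore directly, without the repository."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def handle(self, query: ListMemberNames) -> List[str]:
        rows = self._connection.execute(
            "SELECT first_name || ' ' || last_name FROM members "
            "ORDER BY last_name, first_name"
        )
        return [row[0] for row in rows]


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE members (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
```

Here both sides read the same table for brevity; with projections in place, the query handler could just as well read from a statistics table or a search index.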
Even when we use an ORM, we very often end up writing DQL (with Doctrine, for example) rather than relying on the ORM’s magic methods, because we’re trying to query potentially complex data. So on the query side, we allow ourselves to access the data directly. And because we may have different representations, the query can fetch the data from wherever it needs. If the query performs a search, it can call the search engine; if it is a statistics query, it can read from the stats tables; if it needs to traverse a graph, it can read from a graph database. And because this is read-only, it scales really easily: you take query systems, you line them up, and that’s it. There is no problem at all.
I have only 5 minutes left to talk about Event Sourcing. Until now, nothing I’ve shown is Event Sourcing. This is really important, because a lot of people equate CQRS and Event Sourcing. Event Sourcing requires a CQRS approach first, but we can definitely use CQRS without Event Sourcing.
Basically, our Aggregates now return events. These Aggregates are able to apply the events to themselves and to return the resulting state along with the event. The event applier is basically a Trait that can manipulate the object’s private members, so it can simply update the Aggregate’s state from a given event.
This brings something nice: the handler no longer needs to persist explicitly. It is the event dispatching system that inserts the events into the database. Behind it we have the repository, and you can notice there is no “save” anymore: the repository loads the events for a given Aggregate and applies them one by one to the Aggregate before returning it. So we get an Aggregate whose state is the latest known, according to the events that have been inserted into the system.
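An event-sourced aggregate and its repository could be sketched as follows in Python. This is a simplified illustration: the aggregate mutates itself in `apply` instead of returning a new state, and the in-memory `EventStore`, the events and the `subscribe` rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class MemberJoined:
    member_id: str
    name: str


@dataclass(frozen=True)
class SubscriptionStarted:
    member_id: str
    plan: str


class Membership:
    """Aggregate whose state is derived only from the events applied to it."""

    def __init__(self) -> None:
        self.member_id: Optional[str] = None
        self.name: Optional[str] = None
        self.plans: List[str] = []

    def apply(self, event: Any) -> "Membership":
        if isinstance(event, MemberJoined):
            self.member_id, self.name = event.member_id, event.name
        elif isinstance(event, SubscriptionStarted):
            self.plans.append(event.plan)
        return self

    def subscribe(self, plan: str) -> SubscriptionStarted:
        if self.member_id is None:
            raise ValueError("unknown member")
        if plan in self.plans:
            raise ValueError("already subscribed to this plan")
        event = SubscriptionStarted(self.member_id, plan)
        self.apply(event)
        return event  # the handler emits it; nothing is saved here


class EventStore:
    """In-memory stand-in for the events table."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Any]] = {}

    def append(self, aggregate_id: str, event: Any) -> None:
        self._streams.setdefault(aggregate_id, []).append(event)

    def load(self, aggregate_id: str) -> List[Any]:
        return list(self._streams.get(aggregate_id, []))


class MembershipRepository:
    """No save(): the aggregate is rebuilt by replaying its events."""

    def __init__(self, store: EventStore) -> None:
        self._store = store

    def get(self, member_id: str) -> Membership:
        membership = Membership()
        for event in self._store.load(member_id):
            membership.apply(event)
        return membership
```

A handler would call `get`, invoke `subscribe`, and hand the returned event to the dispatching system, which appends it to the store.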
This simplifies the handlers considerably: I get an Aggregate, I work with it, I emit the events it gave me, and this is what it looks like.
Basically, Event Sourcing is simply the idea of using the events as the source of truth. Where we previously saved a state by replacing the previous state on each update, we now save everything that happened in the system. This really gives us the possibility to create projections on demand. If at some point the business asks for a data visualisation covering the last 5 years, you have no issue providing it, because you saved the history of all the events of the system.
I’m going to finish with this. These architectures give you a lot of freedom with respect to the technical tooling, because we aim to simplify persistence as much as possible, and because all the ties to the transport layers are extremely light. We can really stay as close as possible to what the business asked us to implement in the software.
I’m going to stop here; we have 2 minutes for questions.
