Why Build Distributed Systems?



The wave of hype and excitement around microservices and their tooling leads developers to distribute applications prematurely, without fully thinking through why they should split an application into multiple remote services. The microservices architectural style promises many benefits, such as separation of concerns, division of labor, fluid and flexible delivery of changes, implementation technology flexibility, performance, precise scalability and cloud readiness. I can see why someone might suggest this approach; however, the decision to distribute a solution should be based on requirements and not made for the sake of implementing a particular architectural style such as microservices.

When people talk about microservices, they are often implicitly talking about services that communicate with other services remotely using some form of HTTP request/response or RPC. This style can increase complexity and severely impact performance during failure scenarios such as server crashes, database crashes or database deadlocks. Consequently, the decision to split a solution into separate services should not be based solely on this style.

Ideally, the decision to distribute a system should be based on requirements, because they describe how your system will function and inform how it will be designed. During the requirements analysis phase of the SDLC, developers should weigh that decision against the requirements, which represent the specific features and functions that define the behavior of the system.

The features and functions identified during the requirements analysis phase are known as functional requirements. The requirements gathered should also include a list of attributes that can be used to judge the operation of the system. These are known as non-functional requirements and, together with the functional requirements, they serve as inputs to the design phase. Specifically, the non-functional requirements that should influence the decision to distribute are fault tolerance, performance, scale-out and security.

Fault tolerance. This is the ability of the system to continue to operate despite the failure of a particular service. For example, in an e-commerce system, if the shipping service went down, the order and billing services would continue to operate. By splitting an application into multiple remote services, you can avoid a single point of failure, because the functionality of the system is spread across services that can operate independently.
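
To make the e-commerce example concrete, here is a minimal sketch in Python of an order service that keeps accepting orders while shipping is down. Every name in it (OrderService, FlakyShippingClient, ShippingUnavailable, pending_shipments) is hypothetical and invented for illustration, and the "remote" shipping service is simulated in-process rather than called over a network.

```python
class ShippingUnavailable(Exception):
    """Raised by the hypothetical shipping client when the remote service is down."""


class FlakyShippingClient:
    """Stand-in for a remote shipping service; `available` simulates an outage."""

    def __init__(self, available: bool) -> None:
        self.available = available

    def schedule(self, order_id: str) -> str:
        if not self.available:
            raise ShippingUnavailable(order_id)
        return f"shipment-for-{order_id}"


class OrderService:
    """Accepts orders even when the shipping dependency cannot be reached."""

    def __init__(self, shipping: FlakyShippingClient) -> None:
        self.shipping = shipping
        # Order ids whose shipment still has to be scheduled once shipping recovers.
        self.pending_shipments = []

    def place_order(self, order_id: str) -> dict:
        # Ordering and billing work would happen here; it does not depend on shipping.
        result = {"order_id": order_id, "status": "accepted", "shipment": None}
        try:
            result["shipment"] = self.shipping.schedule(order_id)
        except ShippingUnavailable:
            # Degrade gracefully: keep the order and remember to ship it later.
            self.pending_shipments.append(order_id)
        return result
```

The point of the sketch is that independence has to be designed in: the order service only survives the outage because it treats shipping as optional at the moment of purchase and records the work to retry.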

Performance. Splitting an application into independently deployable components gives you the ability to scale up the resources supporting a heavily used component while reducing the impact on other parts of the application that do not see the same degree of use. Performance can also be increased by deploying multiple instances of a service as nodes or by implementing architectural patterns designed specifically to increase throughput. In either case, performance should improve when you add resources to the machine hosting the service or add instances of the service to support increased usage.


Scale-out. When we talk about scalability, we are typically referring to the ability of an application to maintain an optimal level of performance as usage increases. As we saw with the performance requirement, performance can be increased as a result of scalability; however, a key difference between the scalability requirement and the performance requirement is intent. With performance, the intent is explicitly to increase performance, whereas with scalability the intent is growth. For example, by scaling out, a service can be extended to handle a new workflow or to meet a desired quality of service by customer type. Check out my previous blog post, “Scalability: What does it really mean?”, for a more detailed explanation of the different meanings of scalability.

Security. Splitting an application into separate services allows you to reduce the area of a domain that any one service can reach, in order to achieve a certain degree of security. For example, in a banking application you may want to separate logins from transactions so that login code has no access to financial data. In this case, you would create a separate transaction service to handle banking transactions. In practice, this is just a separation of concerns, but the intent is to split the application to make it more secure.

In closing, the decision to build a distributed system should not be driven by a particular architectural style or set of tools. Instead, it should be determined by requirements. As developers, we should think about why we are building a distributed application in the first place and understand the complexities and trade-offs that can result. Simply splitting an application up because of a requirement doesn’t guarantee that you will achieve the capability you expect. Not distributing, or deferring the decision to distribute, is a perfectly acceptable alternative. You can achieve some of the benefits by splitting your application into well-defined “in-app” components that can be pushed out into distinct services later, when the need arises. This approach can give you time to identify which quality attributes are most important and help you avoid unnecessary complexity.
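
One way to picture an “in-app” component that can be pushed out later is to hide it behind an interface from day one. The Python sketch below is only an illustration under assumed names: ShippingService, InProcessShipping, RemoteShipping, checkout_total, the pricing formula, and the `transport` callable are all hypothetical and not taken from any real library.

```python
from typing import Protocol


class ShippingService(Protocol):
    """The boundary the rest of the application depends on."""

    def quote(self, weight_kg: float) -> float: ...


class InProcessShipping:
    """Today: a plain in-app component, called like any other object."""

    def quote(self, weight_kg: float) -> float:
        # Made-up pricing rule, used only to give the example a result.
        return round(5.0 + 1.5 * weight_kg, 2)


class RemoteShipping:
    """Later, if requirements call for it: same interface, remote call inside.

    `transport` is a hypothetical callable that sends a request to a remote
    shipping service and returns the decoded response as a dict.
    """

    def __init__(self, transport) -> None:
        self.transport = transport

    def quote(self, weight_kg: float) -> float:
        response = self.transport({"op": "quote", "weight_kg": weight_kg})
        return response["price"]


def checkout_total(items_total: float, weight_kg: float, shipping: ShippingService) -> float:
    # Checkout only knows the interface, so it does not change when shipping moves.
    return round(items_total + shipping.quote(weight_kg), 2)
```

Because checkout_total depends only on the interface, swapping InProcessShipping for RemoteShipping is a wiring change rather than a rewrite. The remote version still brings the failure modes discussed earlier, so the swap is worth making only when a requirement justifies it.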

Are there any other reasons to build distributed systems that I might have missed? Any more insights you’d like to share on deferring the decision to distribute? Please comment below.