In the “Demystifying Microservices for Java EE Developers” guide I wrote for Payara, I list several advantages and disadvantages of a microservices architecture. The guide names the disadvantages but doesn’t really go into how to mitigate or even eliminate the risks. In this post I’ll go over these disadvantages and how to reduce the risks they present.

Additional operational overhead

When an application is developed as a series of microservices, there will be some operational overhead: instead of deploying one application into production, several small applications (i.e. your microservices) need to be deployed. A few approaches can reduce the cost of this overhead.

Deploy to the cloud

When deploying to the cloud, the cloud vendor can provide some of its resources to help with operational tasks for your application, freeing your team from having to perform them. Additionally, most cloud vendors provide elasticity, meaning that your applications can be scaled up or down on demand as their load increases or decreases. This helps with the scalability of your application as a whole.

Implement a DevOps approach

With a DevOps approach, your development team assists with operational tasks such as deployment and monitoring.

Use an automated deployment tool

There are several tools on the market that can automate deployments, some free and open source, some commercial. Bamboo and Jenkins are two examples that are popular in the Java world. Puppet is another example, popular with languages typically used with Linux, such as Python or Ruby.

Use an Automated Performance Management (APM) Tool

Monitoring performance becomes a lot harder if you deploy your application as a series of independent modules (microservices). An automated performance management tool can help with this. Some examples include AppDynamics, New Relic, Vector and Prometheus.

Use a log aggregation tool

When deploying an application as a series of microservices, instead of having a single log file to monitor, we typically have several, maybe hundreds of log files to monitor. Doing this by hand is not practical. To mitigate this risk, a log aggregation tool such as Splunk, Graylog or Loggly should be used.

Increased Debugging Complexity

When debugging an application that follows a microservices architecture, it isn’t always obvious which of your microservices is causing the problem. A single user action (e.g. saving data on an HTML form) could trigger invocations to several microservices, making it harder to pinpoint the cause of the issue.

Additional Tooling

Log aggregation tools and performance management tools, such as the ones mentioned above, may help here as well.

Implement a policy of collective code ownership

“Collective code ownership” means that no individual or team owns the code; ownership is shared across all teams and all developers in your organization. Suppose a user reports a problem to your team, and it turns out that the problem is not in your code but in another service that your code depends on. With a policy of collective code ownership, your team can fix the problem themselves instead of waiting for the other team to get around to it.

Correlation Identifiers

When a request first enters your system, generate a correlation id and pass it along with every microservice invocation that request triggers. Have each microservice log the correlation id as it is invoked. This makes it easier to trace microservice invocations when going through log files.

Distributed Transactions

Distributed transactions happen when we start a transaction and, while that transaction is in progress, an invocation to a microservice over the network takes place. Typically we want to avoid distributed transactions, as there is a high probability they will time out and roll back.

Implement your microservices as atomic units

Commit all your transactions before making invocations across the network.

Use compensating transactions

If you commit a transaction that depends on a call to a microservice, and the call fails, initiate a compensating transaction to revert the changes made by the original transaction.

Susceptibility to the fallacies of distributed computing

L. Peter Deutsch came up with the fallacies of distributed computing. Microservices, being inherently distributed, are susceptible to these fallacies.

Implement the circuit breaker design pattern

Modeled after an electrical circuit breaker, the circuit breaker design pattern works as follows: your code attempts an invocation over the network. If the call fails, your code retries a predetermined number of times. If the invocation still does not succeed after repeated attempts, the circuit breaker trips, and your code can handle the failure gracefully (how to do this depends on your specific application requirements).

Use a load balancer

A load balancer such as nginx or F5 can distribute the load across several instances of your services. Most load balancers provide failover capabilities as well.

Deploy to the cloud

The elasticity feature of most cloud providers will also help mitigate susceptibility to the fallacies of distributed computing, since capacity can be added on demand as load increases.
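To make the correlation identifier idea more concrete, here is a minimal Java sketch. It assumes the id travels between services as a request header. The class name `CorrelationIds`, the header name `X-Correlation-ID` and the log line format are all hypothetical choices made for this illustration, not something the approach prescribes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; the header name is an assumption. */
final class CorrelationIds {
    static final String HEADER = "X-Correlation-ID";
    private static final Logger LOG = Logger.getLogger(CorrelationIds.class.getName());

    private CorrelationIds() {}

    /** Reuses the id that arrived with the request, or starts a new trace if there is none. */
    static String resolve(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    /** Builds the headers for an outgoing call so the next microservice sees the same id. */
    static Map<String, String> outgoingHeaders(String correlationId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, correlationId);
        return headers;
    }

    /** Logs a message prefixed with the id, so aggregated logs can be filtered by it. */
    static String log(String correlationId, String service, String message) {
        String line = "[correlationId=" + correlationId + "] " + service + ": " + message;
        LOG.info(line);
        return line;
    }
}
```

The first service to receive a request would call `resolve` with an empty header map, which generates a fresh id. Every downstream service would call it with the headers it received and get the same id back. Searching the aggregated logs for that id then shows the whole chain of invocations. Real HTTP header lookups are case-insensitive, which this plain `Map` does not model.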
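The advice on atomic units and compensating transactions can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the in-memory list stands in for a database, each list operation is treated as its own committed transaction, and `OrderServiceSketch` and `PaymentClient` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service for illustration; a List stands in for a real database. */
final class OrderServiceSketch {

    /** Hypothetical client for another microservice; assumed to throw on failure. */
    interface PaymentClient {
        void charge(String orderId);
    }

    private final List<String> committedOrders = new ArrayList<>();

    /** Returns true if the order and the remote charge both succeeded. */
    boolean placeOrder(String orderId, PaymentClient payments) {
        // 1. The local transaction is committed before any call over the network.
        committedOrders.add(orderId);
        try {
            // 2. The remote invocation happens with no transaction in progress.
            payments.charge(orderId);
            return true;
        } catch (RuntimeException e) {
            // 3. Compensating transaction: revert the change committed in step 1.
            committedOrders.remove(orderId);
            return false;
        }
    }

    /** Returns a copy of the orders that are currently committed. */
    List<String> orders() {
        return new ArrayList<>(committedOrders);
    }
}
```

Because the order is committed before the payment call, no transaction is held open while waiting on the network. The price is that the order is briefly visible before the charge is confirmed, and the compensating step must be able to undo it cleanly.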
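The circuit breaker behavior described in this post (retry a fixed number of times, then trip and fall back) might look roughly like the following Java sketch. `SimpleCircuitBreaker` is a hypothetical class written for this illustration. It leaves out things a production-grade breaker would normally have, such as timeouts and automatic recovery after a cool-down period.

```java
import java.util.function.Supplier;

/** Hypothetical, deliberately minimal circuit breaker; not thread-safe. */
final class SimpleCircuitBreaker {
    private final int maxAttempts;
    private boolean open; // true once the breaker has tripped

    SimpleCircuitBreaker(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Tries the remote call up to maxAttempts times, then trips and uses the fallback. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (open) {
            // Already tripped: skip the network entirely and degrade gracefully.
            return fallback.get();
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.get();
            } catch (RuntimeException e) {
                // This attempt failed; the loop retries until attempts run out.
            }
        }
        open = true; // all attempts failed, so the breaker trips
        return fallback.get();
    }

    boolean isOpen() {
        return open;
    }

    /** Closes the breaker again, e.g. after an operator confirms the service is back. */
    void reset() {
        open = false;
    }
}
```

A caller would wrap each network invocation, for example `breaker.call(() -> client.fetch(), () -> cachedValue)`, where `client` and `cachedValue` are placeholders. What the fallback returns depends on your specific application requirements.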
14 Jul · Fri 2017
Mitigating Risks of a Microservices Architecture
Posted by David R. Heffelfinger
@ 02:38 PM EDT