Understanding Mule Exceptions

There are two related things I’ve noticed over a long career as a software professional. Developers (myself included) often ignore exceptions and rarely go looking for them. And many developers log heavily during development, but the production logging lacks the coverage that operations actually needs. This is because the end user’s needs are usually not spelled out in the requirements, and logging generally begins as a debugging aid for developers. For these reasons logging is almost always present, but the information in the production logs can be thin, vague, and sometimes inaccurate.

I’m currently working for a huge company with 13,000 employees whose operations team uses Splunk to derive quality information from their Mule server logs. Their struggle goes beyond a steep Splunk learning curve; it also touches how they handle Mule exceptions, which exceptions are critical to operations, and how they log those exceptions. Since the Splunk queries are written before these exceptions are ever thrown in production, the exceptions have to be anticipated. This means development needs to focus on thorough testing to ferret out the exceptions that could plausibly occur in production.

While the exception strategy on my project is custom and proprietary, I’ll share that the default exception strategy is overridden, and my goal is to help the development staff determine what exceptions can be thrown, intercept the critical ones, and log them using a well-specified format. For now these critical exceptions drive alerts; many of them will probably be covered by health checks in the future. All Mule applications will need to be analyzed for operational concerns.

Once the critical failures can be replicated within the Mule runtime environment, the developer needs to debug with an investigative eye and determine how best to identify the exception, error, or issue within the application’s Mule configuration. In my case, two things need to happen: first, the exception is caught, and second, the event is logged using a consistent format in every case. This log format is known to the Splunk developer so that the queries cover it.

My goal for this post is not to show you how to develop a custom exception handling strategy; it is written specifically to provide a process for identifying the issues, errors, and exceptions that affect the healthy operation of all Mule services.

Process

First, develop an initial list of issues you think the application could encounter. Consider anything and write it all down. Remember, this is a process and not a final draft. We’ll use an imaginary customer API service that depends on several other API services, and all of the services depend on a large database instance with multiple schemas (databases). Consider what might affect the health of this application (the customer API).

At first glance, I would review the Mule XML configuration and look for any connections in the flows that reach outside of the application. These can be connections to other servers, file systems, databases, message brokers, and so on; a sketch of the kinds of configuration elements to look for follows the list below. These are operational failure points and application dependencies where a failure will result in the parent service being down. Our application has the following connections:

  • Database connection (JDBC) for initial customer record(s)
  • Connection to Life Insurance
  • Connection to Homeowner’s Insurance
  • Connection to Auto Insurance

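For reference, these dependencies usually show up in the Mule XML as connector configuration elements. The snippet below is only a sketch of what you might find while scanning a configuration; the element names, property placeholders, hosts, and ports are invented for this example and will differ in your application.

  <!-- Database dependency: the JDBC connection used for the initial customer records -->
  <db:mysql-config name="Customer_DB" host="${db.host}" port="${db.port}"
                   user="${db.user}" password="${db.password}"
                   database="${db.schema}" doc:name="MySQL Configuration"/>

  <!-- Downstream API dependencies: one HTTP requester configuration per service -->
  <http:request-config name="Life_Insurance_API" host="${life.api.host}"
                       port="${life.api.port}" doc:name="HTTP Request Configuration"/>
  <http:request-config name="Home_Insurance_API" host="${home.api.host}"
                       port="${home.api.port}" doc:name="HTTP Request Configuration"/>
  <http:request-config name="Auto_Insurance_API" host="${auto.api.host}"
                       port="${auto.api.port}" doc:name="HTTP Request Configuration"/>
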
Can you think of anything else that this application might consider a health issue? I would consider performance a health issue, but monitoring performance is best handled externally. We’ll go with this list for now and change it if needed. From here we need to simulate these issues somehow against the running application. Since the dependent services can simply be turned off or left undeployed, let’s start with the database connection.

Generally, a database error comes in the form of an SQLException (the Java exception class). But there are distinctions among database errors. Sometimes the database may not exist. It could be an authentication error due to bad or outdated credentials. Maybe the database server is down for maintenance. To catch these exceptions accurately we need to debug and look at the exception objects and their members.

Your database connection will have credentials somewhere in the application; they may be encrypted in a properties file. Find out where these credentials reside and change the password so that it is incorrect. Test your application and watch for the failure. If you see an SQLException, you will probably see a string message to go along with it. You’ll want to debug in Mule to find the object and the distinguishing text. Once you fully understand the error and the data it carries, you can catch the exception in your application like this:

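Here is a minimal sketch of what such a block might look like in a Mule 3 configuration. The “Access denied” text is what a MySQL driver typically reports for bad credentials; that string, the DB-AUTH-FAILURE log prefix, and the application name are all assumptions for illustration, so substitute whatever you actually see in the debugger.

  <catch-exception-strategy doc:name="Catch Database Exceptions">
    <choice doc:name="Identify database failure">
      <when expression="#[exception.causedBy(java.sql.SQLException) &amp;&amp; exception.message.contains('Access denied')]">
        <!-- Bad or outdated credentials -->
        <logger level="ERROR"
                message="DB-AUTH-FAILURE customer-api could not authenticate to the database: #[exception.message]"
                doc:name="Log authentication failure"/>
      </when>
    </choice>
  </catch-exception-strategy>
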
The distinguishing text in the when expression above is what you find by inspecting the Exception object and its message member. Now let’s try a different scenario.

Find the JDBC URL in your application that corresponds to the database connection and change the database name (schema) in it. This simulates a connection to a non-existent database schema. Run the application and watch it break. Then run the application again in debug mode and inspect the exception object. You can catch this one like so:

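Again, a sketch only: this is another when branch added inside the same choice. “Unknown database” is the text a MySQL driver typically returns for a missing schema, and the exact wording depends on your driver, so use the distinguishing text you found while debugging.

  <when expression="#[exception.causedBy(java.sql.SQLException) &amp;&amp; exception.message.contains('Unknown database')]">
    <!-- The schema named in the JDBC URL does not exist -->
    <logger level="ERROR"
            message="DB-SCHEMA-FAILURE customer-api could not find the database schema: #[exception.message]"
            doc:name="Log missing schema"/>
  </when>
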
One point to note here is that the message payload is consumed by the catch exception strategy, so further processing is required if you want to deliver a final message to the user. That is handled inside the when expressions. What happens if something not matched by the when expressions is caught? Processing will end unless an otherwise section is added, like so:
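
A sketch of the same choice with an otherwise branch added; the earlier when branches are elided, and the generic payload text and DB-UNEXPECTED-FAILURE prefix are illustrative.

  <choice doc:name="Identify database failure">
    <when expression="#[exception.causedBy(java.sql.SQLException) &amp;&amp; exception.message.contains('Access denied')]">
      <!-- ...log the authentication failure as shown earlier... -->
    </when>
    <when expression="#[exception.causedBy(java.sql.SQLException) &amp;&amp; exception.message.contains('Unknown database')]">
      <!-- ...log the missing schema as shown earlier... -->
    </when>
    <otherwise>
      <!-- Anything not matched above still gets logged and still gets a response -->
      <logger level="ERROR"
              message="DB-UNEXPECTED-FAILURE customer-api database error: #[exception.message]"
              doc:name="Log unexpected failure"/>
      <set-payload value="An unexpected error occurred while processing your request."
                   doc:name="Set generic error message"/>
    </otherwise>
  </choice>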

Intercept Logging

Remember that I said earlier we need the development staff to identify and intercept critical exceptions and log them clearly. On my contract I’m assisting the Mule developers in intercepting operational (health-related) issues and then passing them on to the custom default handler. For some HTTP status errors, the applications already handle the error and pass control, via a flow reference, to processHTTPErrorResponse. For 5xx errors they were handling the catch but not logging with enough detail to troubleshoot the issue.

I want to add catch exception strategy blocks where needed, log the exception with clarity, and then continue execution into the processHTTPErrorResponse flow, which returns JSON responses relevant to the HTTP status code. With that in place, we have a log message per critical issue and good information for the operations team to troubleshoot application issues in production.
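
Here is a rough sketch of that pattern. Only the processHTTPErrorResponse flow name comes from the actual applications; the CUSTOMER-API-CRITICAL log prefix and the rootId field are my own illustrative choices for a format the Splunk queries could key on.

  <catch-exception-strategy doc:name="Catch and Route to Error Response">
    <!-- Log in the agreed-upon format so the Splunk queries can find it -->
    <logger level="ERROR"
            message="CUSTOMER-API-CRITICAL rootId=#[message.rootId] detail=#[exception.message]"
            doc:name="Log critical failure"/>
    <!-- Continue into the shared flow that builds the JSON error response -->
    <flow-ref name="processHTTPErrorResponse" doc:name="processHTTPErrorResponse"/>
  </catch-exception-strategy>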
