4 February 2026

8 mins read

Simple tricks to improve your application’s maintainability (Part 3): Correlation IDs and end-to-end traceability

Wiktor Sztwiertnia

Senior Java Software Engineer

Correlation IDs and end-to-end traceability

TL;DR: When execution crosses threads, queues, and time, correlation IDs are the only reliable way to keep log stories intact.

In Part 1 (Why logs stop being useful in real systems), we saw how basic logging patterns fail under concurrency and scale.

In Part 2 (User context without breaking your design), we added user context using MDC, without polluting business logic.

That approach solves many real production problems – but not all of them. Let’s look at what happens when user-based context is no longer enough.

Correlation ID

Filtering logs by user ID is effective in many situations, but there are important cases where it simply doesn’t work.

Every system that uses authentication must expose at least one open endpoint, such as a login endpoint. That endpoint can be called by unauthenticated users, which means there is no userId available in MDC.

There are also other scenarios where user identity either doesn’t exist or isn’t useful:

guest users,
background jobs,
scheduled tasks,
asynchronous processing.

Let’s look at a concrete example.

❌ Problem: async execution breaks the story

A guest user orders an item and optionally applies a discount coupon. Payment processing happens asynchronously.

The logs look like this:

2025-10-25 19:32:25,446 INFO  305188 [t=Thread-2] demo.PrintingDemo    Guest used coupon code = *****   :   [u=]
2025-10-25 19:32:25,451 INFO  305188 [t=Thread-2] demo.PrintingDemo    Guest started payment process   :   [u=]
2025-10-25 19:32:25,454 INFO  305188 [t=Thread-1] demo.PrintingDemo    Guest skipped coupon code   :   [u=]
2025-10-25 19:32:25,461 INFO  305188 [t=Thread-3] demo.PrintingDemo    Guest skipped coupon code   :   [u=]
2025-10-25 19:32:25,548 INFO  305188 [t=Thread-1] demo.PrintingDemo    Guest started payment process   :   [u=]
2025-10-25 19:32:25,567 INFO  305188 [t=Thread-3] demo.PrintingDemo    Guest started payment process   :   [u=]
2025-10-25 19:32:25,772 INFO  305188 [t=Async--6] demo.PrintingDemo    Guest aborted payment   :   [u=]
2025-10-25 19:32:25,967 INFO  305188 [t=Async--7] demo.PrintingDemo    Guest finished payment   :   [u=]
2025-10-25 19:32:26,231 INFO  305188 [t=Async--8] demo.PrintingDemo    Guest finished payment   :   [u=]

The question we need to answer is simple: Did the guest who used the coupon code finish the payment?

From these logs alone, it’s impossible to tell. Thread ID won’t help, because payment finalisation runs on a separate thread pool (Async--6, Async--7, Async--8). PID won’t help either, since every line comes from the same process (305188). And there is no user ID at all.

We have all the events, but no reliable way to tell which ones belong together.

✅ Solution: invent an execution identifier

If user ID, thread ID, and PID are insufficient, what else can we use? The answer is surprisingly simple: we don’t need to reuse an existing identifier – we can create one.

Any locally unique value will work: a random string, a random number, or a UUID. Because this identifier is used to show relationships between log entries, we call it a correlation ID.

A correlation ID represents a single execution, regardless of:

how many threads are involved,
whether the execution is synchronous or asynchronous,
or how long it takes.

Safety vs. readability

Choosing the right correlation ID format involves a trade-off between uniqueness (safety) and readability (operational usability).

A standard UUID is the safest option:

4d2108d1-35a6-41a3-9ed2-1b146c78bb9c

It is unique for all practical purposes, even across systems (a random version 4 UUID carries 122 random bits), but at 36 characters it is long and awkward to work with. To shorten it, one might Base64-encode its 16 bytes:

TSEI0TWmQaOe0hsUbHi7nA==

However, Base64 introduces visual ambiguity (0, O, l, I) and is error-prone when copied manually.
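
To see where the shorter string comes from, here is a minimal sketch using only the JDK (the class name `UuidEncodings` is hypothetical). It writes the UUID's 128 bits as 16 big-endian bytes and Base64-encodes them: 24 characters with padding, 22 without, compared with 36 for the canonical form. For the UUID above it reproduces the Base64 value shown.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper: compact textual encodings of a UUID.
final class UuidEncodings {

    private UuidEncodings() {}

    /** Writes the UUID's 128 bits as 16 big-endian bytes (ByteBuffer's default order). */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Standard Base64 with '=' padding: always 24 characters for 16 bytes. */
    static String base64(UUID uuid) {
        return Base64.getEncoder().encodeToString(toBytes(uuid));
    }

    /** The same encoding without padding: 22 characters. */
    static String base64NoPadding(UUID uuid) {
        return Base64.getEncoder().withoutPadding().encodeToString(toBytes(uuid));
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("4d2108d1-35a6-41a3-9ed2-1b146c78bb9c");
        System.out.println(id);                  // 36 characters
        System.out.println(base64(id));          // prints TSEI0TWmQaOe0hsUbHi7nA==
        System.out.println(base64NoPadding(id)); // prints TSEI0TWmQaOe0hsUbHi7nA
    }
}
```

Shorter, yes, but the result still mixes easily confused characters.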

A common compromise is Base58, which drops these ambiguous characters (as well as + and /) and was designed to be read and copied by humans; it is best known from Bitcoin addresses. If absolute global uniqueness is not required, a short Base58 identifier such as aMXaBGD is often sufficient.
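
A short Base58 ID takes only a few lines to produce. The sketch below (hypothetical `ShortIds` class) assumes Bitcoin's Base58 alphabet and a length of seven, matching the IDs used in this article. Rather than Base58-encoding an existing number, it draws each character uniformly at random, which is equivalent to picking a random number below 58^7 (about 2.2 × 10^12).

```java
import java.security.SecureRandom;

// Hypothetical helper: random short IDs drawn from the Bitcoin Base58 alphabet.
final class ShortIds {

    /** Bitcoin's Base58 alphabet: no 0, O, I or l, and no '+' or '/'. */
    static final String BASE58 =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    /** Seven characters, like the correlation IDs in this article. */
    static final int DEFAULT_LENGTH = 7;

    private static final SecureRandom RANDOM = new SecureRandom();

    private ShortIds() {}

    /** Returns a random ID; each character is chosen uniformly from 58 symbols. */
    static String next(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASE58.charAt(RANDOM.nextInt(BASE58.length())));
        }
        return sb.toString();
    }

    static String next() {
        return next(DEFAULT_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(next()); // random on every run, e.g. something shaped like aMXaBGD
    }
}
```

`SecureRandom` is thread-safe, so one shared instance can serve all requests; if unpredictability of the IDs does not matter, `ThreadLocalRandom` is a cheaper alternative.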

Dual-ID strategy

If you need both safety and good usability, a practical approach is to log two identifiers:

a short, human-readable ID (for searching and communication),
a full UUID (for forensic certainty).

In the rare case of a collision, filter the logs by the short ID first, then use the UUID to tell the executions apart.
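
How rare is "rare"? A birthday-bound estimate gives a feel for it. The sketch below (hypothetical `CorrelationIds` record) pairs a short ID from any Base58 generator with a fresh random UUID, and estimates the probability that at least two of n random short IDs of k characters collide as 1 - exp(-n(n-1) / (2 * 58^k)).

```java
import java.util.UUID;

/**
 * Hypothetical pair of identifiers for the dual-ID strategy: a short ID for
 * searching and conversations, and a full UUID for forensic certainty.
 */
record CorrelationIds(String shortId, UUID fullId) {

    /** Pairs a short ID (from any Base58 generator) with a fresh random UUID. */
    static CorrelationIds withNewUuid(String shortId) {
        return new CorrelationIds(shortId, UUID.randomUUID());
    }

    /**
     * Birthday-bound estimate of the probability that at least two of {@code ids}
     * random short IDs of {@code length} Base58 characters collide:
     * 1 - exp(-n(n-1) / (2 * 58^length)).
     */
    static double collisionProbability(long ids, int length) {
        double space = Math.pow(58, length);
        double n = ids;
        // -expm1(-x) computes 1 - e^(-x) accurately even when x is tiny.
        return -Math.expm1(-n * (n - 1) / (2 * space));
    }
}
```

With seven characters the estimate is roughly 0.2% for 100,000 IDs but roughly 20% for 1,000,000 IDs, so at higher volumes the UUID stops being a formality. Logging both would mean a second MDC key in the pattern (for example a hypothetical `correlationUuid`).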

Integrating correlation ID into MDC

Just like the user ID, the correlation ID should be stored in MDC at the beginning of each execution. The logging pattern is then extended as follows:

%d{ISO8601} %-5level ${PID} [t=%thread] %-48logger{48} [c=%X{correlationId}] %msg   :   [u=%X{userId}] %n%throwable
                                                       |
                                                       +--- correlation ID
                                                            …shows relation between log lines

Here:

%X{correlationId} reads the value from MDC,
every log line now carries execution identity.
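
One way to scope the correlation ID to a single execution is sketched below. To keep it dependency-free, a small `ThreadLocal` map (`LogContext`, a hypothetical name) stands in for SLF4J's `MDC`; with SLF4J you would call `MDC.put`, `MDC.get` and `MDC.remove` instead. The `CorrelationScope.run` helper is also hypothetical: in a web application the same put-and-restore logic would typically sit in a servlet filter or interceptor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for SLF4J's MDC: one String map per thread.
final class LogContext {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private LogContext() {}

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void remove(String key) { CONTEXT.get().remove(key); }
}

// Hypothetical entry-point helper: runs one execution under a correlation ID.
final class CorrelationScope {
    /** Must match the key read by %X{correlationId} in the log pattern. */
    static final String KEY = "correlationId";

    private CorrelationScope() {}

    static <T> T run(String correlationId, Supplier<T> work) {
        String previous = LogContext.get(KEY); // non-null only for nested scopes
        LogContext.put(KEY, correlationId);
        try {
            return work.get();
        } finally {
            // Restore the outer value, or clear the key entirely.
            if (previous == null) {
                LogContext.remove(KEY);
            } else {
                LogContext.put(KEY, previous);
            }
        }
    }
}
```

The `finally` block matters because worker threads are usually pooled: without it, the next execution on the same thread would inherit a stale ID.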

Full async example

With correlation IDs in place, the earlier asynchronous example becomes readable:

2025-10-25 21:05:49,293 INFO  317606 [t=Thread-1] demo.PrintingDemo    [c=FB6bA3f] Guest used coupon code = *****   :   [u=]
2025-10-25 21:05:49,375 INFO  317606 [t=Thread-1] demo.PrintingDemo    [c=FB6bA3f] Guest started payment process   :   [u=]
2025-10-25 21:05:51,203 INFO  317606 [t=Async--5] demo.PrintingDemo    [c=FB6bA3f] Guest aborted payment   :   [u=]

Filtering logs by correlation ID immediately answers our original question: The guest who used the coupon aborted the payment.

No guessing and no manual reconstruction.
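
One step this example relies on deserves spelling out: MDC is bound to the current thread, so the `Async--5` worker only prints `[c=FB6bA3f]` if the context is copied when the task is handed over. Below is a minimal sketch of that copy, again with a `ThreadLocal` stand-in for `MDC` (`AsyncLogContext` is a hypothetical name). The context is captured on the submitting thread and installed on the worker for the task's duration; with SLF4J, the corresponding calls are `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;

// Stand-in for SLF4J's MDC (one map per thread), plus hand-over helpers.
final class AsyncLogContext {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private AsyncLogContext() {}

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }

    /** Snapshot of this thread's context (cf. MDC.getCopyOfContextMap()). */
    static Map<String, String> copy() { return new HashMap<>(CONTEXT.get()); }

    /** Replaces this thread's context (cf. MDC.setContextMap(...)). */
    static void set(Map<String, String> context) { CONTEXT.set(new HashMap<>(context)); }

    /** Wraps a task so it runs with the context captured at submission time. */
    static Runnable wrap(Runnable task) {
        Map<String, String> captured = copy();      // evaluated on the submitting thread
        return () -> {
            Map<String, String> previous = copy();  // evaluated on the worker thread
            set(captured);
            try {
                task.run();
            } finally {
                set(previous);                      // restore the worker's own context
            }
        };
    }

    /** Decorates an Executor so every task it runs inherits the caller's context. */
    static Executor propagating(Executor delegate) {
        return task -> delegate.execute(wrap(task));
    }
}
```

If you use Spring, the same wrapping can be registered as a `TaskDecorator` on a `ThreadPoolTaskExecutor`.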

Correlation ID beyond logs

Correlation IDs become even more powerful when they leave the logging system.

If we include the correlation ID in API error responses, a user’s bug report might contain something like this:

{
    "timestamp": "2025-10-25T21:43:56Z",
    "message": "Unable to process order with negative price: -64",
    "userId": "ferdynand@oo.pl",
    "error": "Bad request",
    "status": 400,
    "method": "POST",
    "path": "/api/orders",
    "correlationId": "pHVAVwv"
}

Debugging then becomes straightforward. We simply search logs for pHVAVwv and immediately see the full execution:

2025-10-25 21:43:56,210 INFO  326565 [t=Thread-1] demo.PrintingDemo    [c=pHVAVwv] Price of item = 64   :   [u=ferdynand@oo.pl]
2025-10-25 21:43:56,234 INFO  326565 [t=Thread-1] demo.PrintingDemo    [c=pHVAVwv] Due to shortages, quantity lowered by 3 items   :   [u=ferdynand@oo.pl]
2025-10-25 21:43:56,234 INFO  326565 [t=Thread-1] demo.PrintingDemo    [c=pHVAVwv] Items in order = -1   :   [u=ferdynand@oo.pl]
2025-10-25 21:43:56,249 ERROR 326565 [t=Thread-1] demo.PrintingDemo    [c=pHVAVwv] Malformed request   :   [u=ferdynand@oo.pl]
java.lang.IllegalStateException: Unable to process order with negative price: -64
	at demo.PrintingDemo.runSequence(PrintingDemo.java:92)
	at demo.PrintingDemo.lambda$onApplicationReady$0(PrintingDemo.java:66)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base/java.lang.Thread.run(Thread.java:1447)
2025-10-25 21:43:56,252 INFO  326565 [t=Thread-1] demo.PrintingDemo    [c=pHVAVwv] Request body:
{"quantity":2}   :   [u=ferdynand@oo.pl]

At this point, there is no detective work involved.
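
Below is a sketch of building the error body from the logging context, so the ID in the response is the same one printed as `[c=...]`. The `ApiError` record mirrors the error JSON shown earlier and is hypothetical; the `context` map stands in for `MDC.getCopyOfContextMap()`, which may return `null` when the context is empty. In a Spring application this would typically be called from an `@ExceptionHandler` method in a `@RestControllerAdvice` class, with a JSON mapper turning the record into the response body.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;

/** Hypothetical error body mirroring the JSON example; component names match its keys. */
record ApiError(
        String timestamp,
        String message,
        String userId,
        String error,
        int status,
        String method,
        String path,
        String correlationId) {

    /**
     * Builds the body from the logging context so the response carries the same
     * correlation ID as the log lines. {@code context} stands in for
     * MDC.getCopyOfContextMap(), which may return null when nothing is set.
     */
    static ApiError of(int status, String error, String message,
                       String method, String path, Map<String, String> context) {
        Map<String, String> ctx = (context == null) ? Map.of() : context;
        return new ApiError(
                Instant.now().truncatedTo(ChronoUnit.SECONDS).toString(), // ISO-8601, UTC
                message,
                ctx.get("userId"),
                error,
                status,
                method,
                path,
                ctx.get("correlationId"));
    }
}
```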

Reaching beyond technical users

Non-technical users often report problems by sending screenshots. They might capture an error snackbar rather than an HTTP response.

Without an identifier, finding the root cause from such a screenshot is hard, let alone finding it quickly. We already include the correlation ID in error responses, which is fine because the ID is not a secret. And if it is safe to expose in a response, it is safe to show anywhere else, so why not put it directly into the error snackbar?

If the correlation ID is visible in that message, even a screenshot is enough to locate the relevant logs. This is also why readability matters.

Had we used a complex identifier such as a Base64-encoded UUID, its visual ambiguity would make retyping it from a screenshot error-prone. A short Base58 ID like WpJ9ZWr, by contrast, is easy to read, copy, and search.

Taking the correlation ID from the snackbar screenshot (WpJ9ZWr) and searching our log system for it, we quickly find the relevant request details:

2025-10-25 22:19:52,938 INFO  338180 [t=Thread-1] demo.PrintingDemo    [c=WpJ9ZWr] Price of item = 3   :   [u=mr.1337@pwnd.it]
2025-10-25 22:19:52,961 INFO  338180 [t=Thread-1] demo.PrintingDemo    [c=WpJ9ZWr] Due to shortages, quantity lowered by 3 items   :   [u=mr.1337@pwnd.it]
2025-10-25 22:19:52,961 INFO  338180 [t=Thread-1] demo.PrintingDemo    [c=WpJ9ZWr] Items in order = -1   :   [u=mr.1337@pwnd.it]
2025-10-25 22:19:53,150 ERROR 338180 [t=Thread-1] demo.PrintingDemo    [c=WpJ9ZWr] Malformed request   :   [u=mr.1337@pwnd.it]
java.lang.IllegalStateException: Unable to process order with negative price: -36
	at demo.PrintingDemo.runSequence(PrintingDemo.java:92)
	at demo.PrintingDemo.lambda$onApplicationReady$0(PrintingDemo.java:66)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base/java.lang.Thread.run(Thread.java:1447)
2025-10-25 22:19:53,153 INFO  338180 [t=Thread-1] demo.PrintingDemo    [c=WpJ9ZWr] Request body:
{"quantity":2}   :   [u=mr.1337@pwnd.it]

What if no one saw the failure?

Sometimes failures happen without any user noticing – for example, in scheduled jobs. In these cases, alerts or notifications are usually triggered.

If you already have an alerting system, the same principle applies: Every alert should include a correlation ID. That way, an alert leads directly to the relevant execution in logs.
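
For scheduled jobs, nothing upstream hands us an ID, so the job has to mint one. In the minimal sketch below (hypothetical `MonitoredJob` class), the ID is created when the job starts, exposed to the job's code through a `ThreadLocal` standing in for the `correlationId` MDC entry, and embedded in the alert text if the job fails. The `newId` supplier could be any short-ID generator, and the `alerts` consumer is a placeholder for whatever alerting channel you use.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical wrapper for scheduled jobs: the job mints its own correlation ID
// and every failure alert carries it.
final class MonitoredJob {

    // Stand-in for MDC's "correlationId" entry; with SLF4J use MDC.put/MDC.remove.
    static final ThreadLocal<String> CURRENT_ID = new ThreadLocal<>();

    private MonitoredJob() {}

    /**
     * Runs one job execution under a fresh correlation ID. If the job throws a
     * RuntimeException, sends an alert containing the ID and rethrows, so the
     * scheduler still sees the failure.
     */
    static void run(String jobName, Supplier<String> newId,
                    Runnable job, Consumer<String> alerts) {
        String correlationId = newId.get();
        CURRENT_ID.set(correlationId);
        try {
            job.run();
        } catch (RuntimeException e) {
            // Same [c=...] format as the log pattern, so the alert text is directly searchable.
            alerts.accept("Job '" + jobName + "' failed [c=" + correlationId + "]: "
                    + e.getMessage());
            throw e;
        } finally {
            CURRENT_ID.remove();
        }
    }
}
```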

Conclusion

This article series only scratches the surface of making maintenance work easier.

In more complex systems, correlation IDs must be propagated across services, and dedicated observability tooling may be a better investment.

However, experience shows that the vast majority of production systems are still single services or simple deployments. In those cases, a few small, well-placed improvements – user context, MDC, and correlation IDs – can dramatically reduce the time spent understanding failures.

Sometimes, simple tricks really are enough.
