
*This is the final part of a three-part series by Markus Eisele. Part 1 can be found here, and Part 2 here.*
In the first article we looked at the Java developer’s dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second article we explored why new types of applications are needed, and how AI changes the shape of enterprise software. This article focuses on what those changes mean for architecture. If applications look different, the way we structure them has to change as well.
The Traditional Java Enterprise Stack
Enterprise Java applications have always been about structure. A typical system is built on a set of layers. At the bottom is persistence, often with JPA or JDBC. Business logic runs above that, enforcing rules and processes. On top sit REST or messaging endpoints that expose services to the outside world. Crosscutting concerns like transactions, security, and observability run through the stack. This model has proven durable. It has carried Java from the early servlet days to modern frameworks like Quarkus, Spring Boot, and Micronaut.
The success of this architecture comes from clarity. Each layer has a clear responsibility. The application is predictable and maintainable because you know where to add logic, where to enforce policies, and where to plug in monitoring. Adding AI does not remove these layers. But it does add new ones, because the behavior of AI doesn’t fit into the neat assumptions of deterministic software.
New Layers in AI-Infused Applications
AI changes the architecture by introducing layers that never existed in deterministic systems. Three of the most important ones are fuzzy validation, context-sensitive guardrails, and observability of model behavior. In practice you’ll encounter even more components, but validation and observability are the foundation that makes AI safe in production.
Validation and Guardrails
Traditional Java applications assume that inputs can be validated. You check whether a number is within range, whether a string is not empty, or whether a request matches a schema. Once validated, you process it deterministically. With AI outputs, this assumption no longer holds. A model might generate text that looks correct but is misleading, incomplete, or harmful. The system cannot blindly trust it.
This is where validation and guardrails come in. They form a new architectural layer between the model and the rest of the application. Guardrails can take different forms:
- Schema validation: If you expect a JSON object with three fields, you must check that the model’s output matches that schema. A missing or malformed field should be treated as an error.
- Policy checks: If your domain forbids certain outputs, such as exposing sensitive data, returning personal identifiers, or generating offensive content, policies must filter those out.
- Range and type enforcement: If the model produces a numeric score, you need to confirm that the score is valid before passing it into your business logic.
Enterprises already know what happens when validation is missing. SQL injection, cross-site scripting, and other vulnerabilities have taught us that unchecked inputs are dangerous. AI outputs are another kind of untrusted input, even if they come from inside your own system. Treating them with suspicion is a requirement.
In Java, this layer can be built with familiar tools. You can write bean validation annotations, schema checks, or even custom CDI interceptors that run after each AI call. The important part is architectural: Validation must not be hidden in utility methods. It has to be a visible, explicit layer in the stack so that it can be maintained, evolved, and tested rigorously over time.
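As a minimal sketch, assume the model is expected to return a JSON object with the fields `answer`, `confidence`, and `sources`, and that Jackson is available for parsing. The class name, field names, and exception choice are illustrative, not taken from any particular library:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative guardrail that treats model output as untrusted input.
public class ModelOutputGuard {

    private final ObjectMapper mapper = new ObjectMapper();

    public JsonNode validate(String rawModelOutput) {
        final JsonNode node;
        try {
            node = mapper.readTree(rawModelOutput);
        } catch (Exception e) {
            throw new IllegalStateException("Model output is not valid JSON", e);
        }

        // Schema validation: the fields we expect must be present.
        for (String field : new String[] {"answer", "confidence", "sources"}) {
            if (!node.hasNonNull(field)) {
                throw new IllegalStateException("Missing field in model output: " + field);
            }
        }

        // Range and type enforcement: confidence must be a number in [0, 1].
        double confidence = node.get("confidence").asDouble(-1);
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalStateException("Confidence out of range: " + confidence);
        }

        return node; // Only validated output reaches business logic.
    }
}
```

Wrapping a check like this in a CDI interceptor that runs after every AI call keeps it out of individual services, which is exactly what makes it a layer rather than a utility method.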
Observability
Observability has always been critical in enterprise systems. Logs, metrics, and traces allow us to understand how applications behave in production. With AI, observability becomes even more important because behavior is not deterministic. A model might give different answers tomorrow than it does today. Without visibility, you cannot explain or debug why.
Observability for AI means more than logging a result. It requires:
- Tracing prompts and responses: Capturing what was sent to the model and what came back, ideally with identifiers that link them to the original request
- Recording context: Storing the data retrieved from vector databases or other sources so you know what influenced the model’s answer
- Tracking cost and latency: Monitoring how often models are called, how long they take, and how much they cost
- Detecting drift: Identifying when the quality of answers changes over time, which may indicate a model update or degraded performance on specific data
For Java developers, this maps to existing practice. We already integrate OpenTelemetry, structured logging frameworks, and metrics exporters like Micrometer. The difference is that now we need to apply those tools to AI-specific signals. A prompt is like an input event. A model response is like a downstream dependency. Observability becomes an additional layer that cuts through the stack, capturing the reasoning process itself.
Consider a Quarkus application that integrates with OpenTelemetry. You can create spans for each AI call; add attributes for the model name, token count, latency, and cache hits; and export those metrics to Grafana or another monitoring system. This makes AI behavior visible in the same dashboards your operations team already uses.
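As a minimal sketch of that idea, assuming the Quarkus OpenTelemetry extension provides the injected `Tracer`; the span and attribute names are illustrative rather than an established convention:

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class TracedAiClient {

    @Inject
    Tracer tracer; // provided by the Quarkus OpenTelemetry extension

    public String ask(String prompt) {
        Span span = tracer.spanBuilder("ai.completion").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            long start = System.nanoTime();
            String response = callModel(prompt);
            // Illustrative attribute names; align them with your own conventions.
            span.setAttribute("ai.model", "local-llm");
            span.setAttribute("ai.prompt.chars", prompt.length());
            span.setAttribute("ai.latency.ms", (System.nanoTime() - start) / 1_000_000);
            return response;
        } finally {
            span.end();
        }
    }

    private String callModel(String prompt) {
        return "{}"; // placeholder for the real model invocation
    }
}
```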
Mapping New Layers to Familiar Practices
The key insight is that these new layers do not replace the old ones. They extend them. Dependency injection still works. You inject a guardrail component into a service the same way you inject a validator or a logger. Fault tolerance libraries like MicroProfile Fault Tolerance or Resilience4j are still useful. You can wrap AI calls with timeouts, retries, and circuit breakers. Observability frameworks like Micrometer and OpenTelemetry are still relevant. You just point them at new signals.
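For instance, a sketch using MicroProfile Fault Tolerance annotations around a model call; the thresholds are illustrative and `callModel` stands in for the real invocation:

```java
import java.time.temporal.ChronoUnit;

import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class ResilientAiService {

    @Timeout(value = 10, unit = ChronoUnit.SECONDS) // models can be slow; bound the wait
    @Retry(maxRetries = 2)                          // remote model calls fail transiently
    @CircuitBreaker(requestVolumeThreshold = 10)    // stop hammering a failing endpoint
    @Fallback(fallbackMethod = "fallbackAnswer")
    public String answer(String question) {
        return callModel(question);
    }

    String fallbackAnswer(String question) {
        return "The assistant is currently unavailable. Please try again later.";
    }

    private String callModel(String question) {
        return "..."; // placeholder for the real model invocation
    }
}
```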
By treating validation and observability as layers, not ad hoc patches, you maintain the same architectural discipline that has always defined enterprise Java. That discipline is what keeps systems maintainable when they grow and evolve. Teams know where to look when something fails, and they know how to extend the architecture without introducing brittle hacks.
An Example Flow
Imagine a REST endpoint that answers customer questions. The flow, sketched in code after the list, looks like this:
1. The request comes into the REST layer.
2. A context builder retrieves relevant documents from a vector store.
3. The prompt is assembled and sent to a local or remote model.
4. The result is passed through a guardrail layer that validates the structure and content.
5. Observability hooks record the prompt, context, and response for later analysis.
6. The validated result flows into business logic and is returned to the client.
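Put into code, a minimal sketch of this flow might look as follows. `ContextBuilder`, `ModelClient`, and `AiCallRecorder` are hypothetical collaborators standing in for steps 2, 3, and 5, and `ModelOutputGuard` is the guardrail sketched earlier:

```java
import com.fasterxml.jackson.databind.JsonNode;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

// Hypothetical collaborators; real implementations would wrap a vector
// store, a model client, and a metrics/tracing backend.
interface ContextBuilder { String retrieve(String question); }
interface ModelClient { String complete(String prompt); }
interface AiCallRecorder { void record(String prompt, String context, String response); }

@Path("/questions")
public class QuestionResource {

    @Inject ContextBuilder contextBuilder;  // step 2: retrieval
    @Inject ModelClient modelClient;        // step 3: model call
    @Inject ModelOutputGuard guard;         // step 4: guardrails
    @Inject AiCallRecorder recorder;        // step 5: observability

    @POST
    public String ask(String question) {                                      // step 1: REST layer
        String context = contextBuilder.retrieve(question);                   // step 2
        String prompt = "Context:\n" + context + "\n\nQuestion: " + question; // step 3
        String raw = modelClient.complete(prompt);
        JsonNode validated = guard.validate(raw);                             // step 4
        recorder.record(prompt, context, raw);                                // step 5
        return validated.toString();                                          // step 6: into business logic
    }
}
```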
This flow has clear layers. Each one can evolve independently. You can swap the vector store, upgrade the model, or tighten the guardrails without rewriting the whole system. That modularity is exactly what enterprise Java architectures have always valued.
A concrete example might be using LangChain4j in Quarkus. You define an AI service interface, annotate it with the model binding, and inject it into your resource class. Around that service you add a guardrail interceptor that enforces a schema using Jackson. You add an OpenTelemetry span that records the prompt and tokens used. None of this requires abandoning Java discipline. It’s the same stack thinking we’ve always used, now applied to AI.
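A sketch of that wiring, assuming the quarkus-langchain4j extension is on the classpath and a model is configured in application.properties; the interface name and messages are illustrative:

```java
import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

// The AI service: a plain interface bound to the configured model.
@RegisterAiService
interface CustomerAssistant {

    @SystemMessage("You answer customer questions concisely and cite your sources.")
    String answer(String question); // the String parameter becomes the user message
}

@Path("/assistant")
public class AssistantResource {

    @Inject
    CustomerAssistant assistant; // injected like any other CDI bean

    @POST
    public String ask(String question) {
        // A guardrail interceptor and an OpenTelemetry span would wrap this call.
        return assistant.answer(question);
    }
}
```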
Implications for Architects
For architects, the main implication is that AI doesn’t remove the need for structure. If anything, it increases it. Without clear boundaries, AI becomes a black box in the middle of the system. That’s not acceptable in an enterprise environment. By defining guardrails and observability as explicit layers, you make AI components as manageable as any other part of the stack.
This leads to evaluation. In this context, evaluation means systematically measuring how an AI component behaves, using tests and monitoring that go beyond traditional correctness checks. Instead of expecting exact outputs, evaluations look at structure, boundaries, relevance, and compliance. They combine automated tests, curated prompts, and sometimes human review to build confidence that a system is behaving as intended. In enterprise settings, evaluation becomes a recurring activity rather than a one-time validation step.
Evaluation itself becomes an architectural concern that reaches beyond just the models themselves. Hamel Husain describes evaluation as a first-class system, not an add-on. For Java developers, this means building evaluation into CI/CD, just as unit and integration tests are. Continuous evaluation of prompts, retrieval, and outputs becomes part of the deployment gate. This extends what we already do with integration testing suites.
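As a minimal sketch of such a gate, an illustrative JUnit 5 evaluation test is shown below; the curated cases and assertions are assumptions, and a real suite would be broader and may add human review:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;
import org.junit.jupiter.api.Test;

// Illustrative evaluation test run in CI/CD as a deployment gate.
class AnswerEvaluationTest {

    // Curated prompts paired with a property the answer must satisfy.
    record EvalCase(String question, String mustMention) {}

    List<EvalCase> cases = List.of(
        new EvalCase("How do I reset my password?", "password"),
        new EvalCase("What is your refund policy?", "refund"));

    @Test
    void answersStayOnTopicAndWithinBounds() {
        for (EvalCase c : cases) {
            String answer = callAssistant(c.question()); // the AI service under test
            assertTrue(answer.toLowerCase().contains(c.mustMention()),
                "Answer drifted off topic for: " + c.question());
            assertTrue(answer.length() < 2_000,
                "Answer exceeded the expected length bound");
        }
    }

    private String callAssistant(String question) {
        // Placeholder: call the model, or a recorded stub, here.
        return "To reset your password, open account settings. Our refund policy is...";
    }
}
```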
This approach also helps with skills. Teams already know how to think in terms of layers, services, and crosscutting concerns. By framing AI integration in the same way, you lower the barrier to adoption. Developers can apply familiar practices to unfamiliar behavior. This is critical for staffing. Enterprises should not depend on a small group of AI specialists. They need large teams of Java developers who can apply their existing skills with only moderate retraining.
There is also a governance aspect. When regulators or auditors ask how your AI system works, you need to show more than a diagram with a “call LLM here” box. You need to show the validation layer that checks outputs, the guardrails that enforce policies, and the observability that records decisions. This is what turns AI from an experiment into a production system that can be trusted.
Looking Forward
The architectural shifts described here are only the beginning. More layers will emerge as AI adoption matures. We’ll see specialized and per-user caching layers to control cost, fine-grained access control to limit who can use which models, and new forms of testing to verify behavior. But the core lesson is clear: AI requires us to add structure, not remove it.
Java’s history gives us confidence. We’ve already navigated shifts from monoliths to distributed systems, from synchronous to reactive programming, and from on-premises to cloud. Each shift added layers and patterns. Each time, the ecosystem adapted. The arrival of AI is no different. It’s another step in the same journey.
For Java developers, the challenge is not to throw away what we know but to extend it. The shift is real, but it’s not alien. Java’s history of layered architectures, dependency injection, and crosscutting services gives us the tools to handle it. The result is not prototypes or one-off demos but applications that are reliable, auditable, and ready for the long lifecycles that enterprises demand.
In our book, Applied AI for Enterprise Java Development, we explore these architectural shifts in depth with concrete examples and patterns. From retrieval pipelines with Docling to guardrail testing and observability integration, we show how Java developers can take the ideas outlined here and turn them into production-ready systems.
