Overview
This architecture study documents the patterns and decisions required for a production-grade enterprise RAG system. The core requirements:
- Connect to internal knowledge sources securely
- Retrieve relevant context from multiple systems
- Respect existing access controls (permission-aware retrieval)
- Provide audit trails for enterprise compliance
Key Challenges
Enterprise AI systems require architectural patterns that go beyond prototypes:
- Handling permissions when users query across distributed systems
- Defining chunking strategies that preserve semantic context
- Achieving real-time retrieval performance at enterprise scale
- Establishing eval frameworks to measure quality systematically
Architecture Highlights
- Connector Gateway: Unified interface to SharePoint, Confluence, internal APIs
- Embedding Pipeline: Automated chunking, vectorization, and indexing with metadata preservation
- Retrieval Engine: Hybrid search (semantic + keyword) with permission filtering
- LLM Integration: AWS Bedrock with guardrails and response validation
- Audit Layer: Full traceability of queries, sources, and responses
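The retrieval flow above can be sketched in plain Python. This is a minimal illustration, not the production implementation: the term-overlap scorer stands in for BM25, the `semantic_scores` dict stands in for a vector-store query, and group names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: set  # group ACL copied into index metadata at ingest time

def keyword_score(query: str, chunk: Chunk) -> float:
    # Naive term overlap as a stand-in for BM25 keyword scoring.
    q_terms = set(query.lower().split())
    c_terms = set(chunk.text.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def hybrid_search(query, chunks, semantic_scores, user_groups, top_k=3, alpha=0.7):
    """Blend semantic and keyword scores, dropping chunks the user cannot read."""
    scored = []
    for chunk in chunks:
        if not (chunk.allowed_groups & user_groups):
            continue  # permission filtering happens inside retrieval, not after
        score = (alpha * semantic_scores.get(chunk.doc_id, 0.0)
                 + (1 - alpha) * keyword_score(query, chunk))
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

The key design choice is that the permission check runs inside the retrieval loop, so an unauthorized chunk can never influence ranking or reach the LLM context.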
Technical Stack
- Backend: Python (FastAPI), AWS Lambda for connectors
- Vector Store: Pinecone with metadata filtering
- LLM: Claude (via AWS Bedrock) with prompt engineering for grounded responses
- Auth: SSO integration with role-based access control (RBAC)
- UI: React-based chat interface with citation links
Patterns Established
1. Permission Passthrough Pattern. This work established the permission passthrough pattern, now foundational to our enterprise RAG systems. Every retrieved document is filtered against the user's actual permissions in the source systems, so the assistant never surfaces content the user could not open directly.
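The pattern can be sketched as a re-check against the source system at query time. The `can_read` callback and the chunk fields here are hypothetical; in practice this would call the SharePoint or Confluence permission API with the user's own credentials.

```python
def permission_passthrough(chunks, user_token, can_read):
    """Re-check each candidate chunk against the source system's live ACLs.

    `can_read(user_token, source, doc_id)` is a hypothetical callback that
    asks the source system (SharePoint, Confluence, ...) using the user's
    own credentials, so revoked access takes effect immediately rather
    than waiting for the index to re-sync.
    """
    return [c for c in chunks if can_read(user_token, c["source"], c["doc_id"])]
```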
2. Semantic Chunking Strategy. We defined a context-aware chunking approach that preserves semantic boundaries (sections, paragraphs) rather than cutting at arbitrary token limits. This pattern improved retrieval relevance by 30% and has been adopted across our RAG implementations.
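A minimal sketch of the idea, assuming paragraphs are separated by blank lines and using a character budget as a stand-in for a token budget: whole paragraphs are packed into chunks, and a paragraph is never split mid-thought.

```python
def semantic_chunks(text: str, max_chars: int = 400) -> list:
    """Pack whole paragraphs into chunks instead of cutting at a fixed offset."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```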
3. Citations Build Trust. Users needed to verify AI responses, so we added inline citations linking back to source documents. This single feature drove adoption more than any other.
4. Eval-Driven Iteration. We built an eval harness with 200+ question-answer pairs curated from real user queries. Every architecture change had to improve eval scores, which prevented "vibes-based" optimization.
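The harness loop can be sketched as follows. The substring check is a deliberate simplification standing in for exact-match or LLM-graded scoring, and `answer_fn` represents whatever pipeline variant is under test.

```python
def run_evals(qa_pairs, answer_fn):
    """Score a pipeline against curated question / expected-fact pairs.

    `answer_fn` is the pipeline under test; the substring check is a
    stand-in for stricter exact-match or LLM-graded scoring.
    """
    passed = sum(
        1 for question, expected in qa_pairs
        if expected.lower() in answer_fn(question).lower()
    )
    return passed / len(qa_pairs)

def passes_gate(baseline_score, candidate_score):
    # A change ships only if it does not regress the eval score.
    return candidate_score >= baseline_score
```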
Strategic Insights
This work demonstrates that enterprise RAG is fundamentally a permissions problem, not just a retrieval problem. The key insight: any RAG system deployed in an enterprise must treat permissions as a first-class architectural concern from day one.
Three Architectural Principles Established:
- Security-First Architecture: Permissions cannot be retrofitted. Permission-aware retrieval must be designed into the connector layer, not added as a filter on top.
- Semantic Chunking Over Token Limits: Context preservation matters more than arbitrary limits. Chunking strategies should respect document structure and semantic boundaries.
- Eval-Driven Development: Production RAG requires systematic measurement. Build eval harnesses before you build features, not after.
These principles now inform how our organization approaches all AI system development.
Impact & Adoption
The patterns established in this work have been adopted across multiple teams:
- Permission Passthrough Pattern: Now the standard approach for all enterprise AI systems requiring data access. Three other teams building RAG systems adopted this pattern directly.
- Semantic Chunking: Our chunking library has been extracted and is now used across 5+ internal projects, becoming the de facto standard for document processing.
- Eval Framework: The eval harness approach influenced how the organization builds AI systems. Teams now start with evals, not prototypes.
Recognition: These patterns were presented at an internal architecture review, which led to the approach being documented in the organization's AI development guidelines.
Outcome
The assistant became the primary interface for internal knowledge discovery. Teams use it for onboarding, policy lookups, technical troubleshooting, and cross-team knowledge sharing. The platform is now expanding to support additional use cases like contract analysis and incident response workflows.