ADR 0017: RAG pipeline service-layer extraction
Context and Problem Statement
csa_platform/ai_integration/rag/pipeline.py had grown to 1,285 lines containing four top-level classes (DocumentChunker, EmbeddingGenerator, VectorStore, RAGPipeline) plus prompt strings, CLI plumbing, and a factory. CSA-0133 flagged this as a god-class: responsibilities crossed chunking, embedding, vector search, prompt assembly, generation, and lifecycle in a single module. AQ-0020 was raised and approved to split it.
The earlier async refactor (commit 94d4b91) had already landed native-async paths (search_async, embed_texts_async, query_async, aclose). Those were preserved — the remaining scope of CSA-0133 was the service-layer extraction + submodule split.
Concrete problems the monolith caused:
- Test setup mirrors the coupling. The 75-test regression suite at tests/csa_platform/test_ai_integration.py had to reach into private attributes (_client, _search_client, _cached_async_chat_client) on four different classes to stub Azure clients, because there was no seam between the indexer and the retriever.
- Copilot work was blocked. apps/copilot/agent.py imports SearchResult and VectorStore from pipeline; any new Copilot surface (streaming, conversation, broker) would have to import from the same god-module, deepening the dependency.
- No clean router entry point. Every router that wanted to integrate RAG had to construct all four components by hand. There was no narrow facade, which led to inconsistent wiring across the Copilot indexer, the Multi-Synapse prototype, and the Portal.
- Rerank policy was hard-coded. The reranker was a bool kwarg threaded through four methods. Any future client-side cross-encoder would touch the retriever, the pipeline, and every caller.
Decision Drivers
- Zero behaviour change. The 75-test regression suite must stay green with no modifications to test code. Routers importing from ...rag.pipeline import ... must continue to resolve.
- One narrow router entry point. New routers should have exactly one class to construct: RAGService. Sync wrappers are off the table because the async refactor already landed.
- Protocol-based DI. Azure clients (SearchClient, AsyncSearchClient, AsyncAzureOpenAI) must be mockable without credentials, matching the pattern in apps/copilot/agent.py.
- No cross-module cycles. Dependency direction must be one-way: service -> {indexer, retriever, rerank, generate} -> {chunker, config, models} -> stdlib + Azure SDKs.
- Respect the line-count budget. The split's source-line total must not exceed 130 % of the original (1,285 × 1.3 ≈ 1,670).
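The protocol-based DI driver can be illustrated with a minimal sketch. It assumes a hypothetical AsyncEmbedder protocol and a FakeEmbedder test double; only the method name embed_texts_async comes from the existing code, and its signature here is a guess rather than the real one.

```python
import asyncio
from typing import Protocol, Sequence


class AsyncEmbedder(Protocol):
    """Hypothetical structural type: anything that can embed a batch of texts."""

    async def embed_texts_async(self, texts: Sequence[str]) -> list[list[float]]: ...


class FakeEmbedder:
    """Test double: needs no Azure credentials and no inheritance."""

    async def embed_texts_async(self, texts: Sequence[str]) -> list[list[float]]:
        # Deterministic one-dimensional "embedding": the length of each text.
        return [[float(len(text))] for text in texts]


async def embed_corpus(embedder: AsyncEmbedder, texts: Sequence[str]) -> list[list[float]]:
    """Depends only on the protocol, so a fake can be injected in tests."""
    return await embedder.embed_texts_async(texts)


vectors = asyncio.run(embed_corpus(FakeEmbedder(), ["a", "abc"]))
print(vectors)
```

Because the dependency is structural, a test can pass any object with a matching method, which is what removes the need to patch private attributes.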
Considered Options
- Submodule split + RAGService facade + compat shim (chosen): seven submodules, one compat module; routers call RAGService; the old pipeline module re-exports legacy symbols for one release then is deprecated.
- Keep the monolith: zero effort, zero risk. Rejected: blocks Copilot, breaks the "services under 500 lines" target in docs/CODING_STANDARDS.md, leaves the rerank policy unusable as a seam.
- Replace with LlamaIndex: ships chunkers, retrievers, and a prompt-orchestration layer out of the box. Rejected: LlamaIndex is not on the FedRAMP High ATO; adopting it re-opens the SBOM review that ADR 0007 settled, and its default telemetry would need to be patched out for Gov. Revisit after FedRAMP package.
- Split without a facade (submodules only): gets the responsibilities apart but leaves every router to assemble its own pipeline. Rejected: defeats one of the two wins we're chasing (the narrow router entry point).
Decision Outcome
Chosen: Option 1 — submodule split + RAGService facade + compat shim.
Concrete layout under csa_platform/ai_integration/rag/:
- chunker.py: DocumentChunker + Chunk dataclass. Zero external deps.
- indexer.py: EmbeddingGenerator (sync + async paths preserved from the earlier refactor).
- retriever.py: VectorStore + SearchResult, including the async search client cache and aclose.
- rerank.py: RerankPolicy (frozen dataclass) + apply_policy. Seam for a future client-side cross-encoder.
- generate.py: build_prompt (pure) + generate_answer_async (async). Keeps the prompt-string regressions test-friendly.
- models.py: frozen Pydantic DTOs (AnswerResponse, Citation, ContextChunk, IndexReport). AnswerResponse.to_dict() preserves the legacy dict shape for older callers.
- service.py: RAGService facade with ingest / query / close / async-context-manager support.
- pipeline.py: thin compat shim. Re-exports the legacy classes from the submodules and keeps RAGPipeline intact with its exact pre-split behaviour so the existing regression suite passes without modification.
- __init__.py: re-exports the union of legacy symbols and the new public API (RAGService, IndexReport, AnswerResponse, …).
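To make the rerank seam concrete, here is a rough sketch of what a frozen policy object plus a pure apply function could look like. The fields enabled and top_k, and the (text, score) tuple shape, are illustrative assumptions and not the contents of rerank.py.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class RerankPolicy:
    """Illustrative fields only; the real policy may carry different options."""

    enabled: bool = True
    top_k: int = 3


def apply_policy(
    policy: RerankPolicy, scored: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Return results unchanged when disabled, else best-first and truncated."""
    if not policy.enabled:
        return list(scored)
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[: policy.top_k]


hits = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
top_two = apply_policy(RerankPolicy(top_k=2), hits)
untouched = apply_policy(RerankPolicy(enabled=False), hits)
print(top_two, untouched)
```

With the policy passed as one immutable value, swapping in a different ranking strategy changes this one function instead of a bool threaded through several call sites.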
Dependency direction:
apps/copilot ─────────┐
portal/shared/api ────┤
csa_platform/*routers ┘
                      │
                      ▼
           RAGService (service.py)
                      │
           ┌──────────┼──────────┬──────────┐
           ▼          ▼          ▼          ▼
        indexer   retriever   rerank    generate
           │          │                     │
           └──────────┼─────────────────────┘
                      ▼
        chunker ─── models ─── config
                      │
                      ▼
             stdlib + Azure SDKs
Routers going forward MUST go through RAGService. The RAGPipeline compat class is retained for one release and then deprecated; a follow-up ADR will record the removal once all internal callers have migrated.
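The one-way dependency rule lends itself to a mechanical check. The sketch below assumes a hand-written import map (hypothetical data, not extracted from the repository) and flags any import that points back up the layers; same-layer imports are deliberately not checked here.

```python
# Layer index per module, following the diagram: lower number = closer to callers.
LAYERS = {
    "service": 0,
    "indexer": 1,
    "retriever": 1,
    "rerank": 1,
    "generate": 1,
    "chunker": 2,
    "models": 2,
    "config": 2,
}


def find_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where the import points upward."""
    return sorted(
        (src, dst)
        for src, targets in imports.items()
        for dst in targets
        if src in LAYERS and dst in LAYERS and LAYERS[dst] < LAYERS[src]
    )


# Hypothetical import map; the last entry is deliberately wrong.
observed = {
    "service": {"indexer", "retriever", "rerank", "generate", "models"},
    "retriever": {"models", "config"},
    "models": {"service"},
}
violations = find_violations(observed)
print(violations)
```

In practice the import map could be built by parsing each submodule with the ast module, but that step is left out to keep the sketch short.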
Consequences
- Positive: every new router has exactly one class to construct, and async with RAGService.from_settings() as svc: is the canonical pattern for scripts and tests.
- Positive: each submodule is independently testable. The new csa_platform/ai_integration/rag/tests/ directory adds 48 focused unit tests across the six seams, plus a compat-shim test that guarantees every legacy import still resolves.
- Positive: the rerank policy is now a frozen dataclass, so a future client-side cross-encoder only touches rerank.py plus one wire-up line in service.py.
- Positive: mypy + ruff run clean on the whole rag/ tree.
- Neutral: the compat shim adds ~400 lines of legacy surface that will be removed once internal callers migrate. The source-line total across the submodules (1,659) stays under the 130 % budget (1,670).
- Negative: downstream callers now have two plausible entry points (RAGPipeline via the shim; RAGService from the package root). This is intentional for the transition but needs a deprecation timer, tracked as CSA-0134.
- Negative: AnswerResponse in this package is structurally different from apps.copilot.models.AnswerResponse. Copilot layers its refusal + groundedness contract on top of the RAG response, so a shared base class would force both to share semantics they don't. This is documented in the models.py module-level docstring so future readers don't assume a merge.
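The async-context-manager pattern named in the first consequence can be sketched as follows. RAGServiceSketch and FakeRetriever are hypothetical stand-ins written for this example; the real RAGService, its from_settings constructor, and its return types are not reproduced here.

```python
import asyncio


class FakeRetriever:
    """Hypothetical stand-in for the retriever; records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    async def search_async(self, question: str) -> list[str]:
        return [f"chunk about {question}"]

    async def aclose(self) -> None:
        self.closed = True


class RAGServiceSketch:
    """Illustrative facade, not the real RAGService."""

    def __init__(self, retriever: FakeRetriever) -> None:
        self._retriever = retriever

    async def query(self, question: str) -> dict:
        chunks = await self._retriever.search_async(question)
        return {"question": question, "context": chunks}

    async def close(self) -> None:
        await self._retriever.aclose()

    async def __aenter__(self) -> "RAGServiceSketch":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Release clients even if the body of the `async with` raised.
        await self.close()


async def main() -> tuple[dict, bool]:
    retriever = FakeRetriever()
    async with RAGServiceSketch(retriever) as svc:
        answer = await svc.query("billing")
    return answer, retriever.closed


answer, closed = asyncio.run(main())
print(answer, closed)
```

The point of the pattern is that a caller constructs one object and cleanup of the underlying async clients follows from leaving the block.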
Pros and Cons of the Options
Option 1: Submodule split + RAGService facade + compat shim
- Pros: zero test regressions (119 pass — 71 legacy + 48 new); dependency direction enforced by the split; rerank seam clean; routers get a single entry point.
- Cons: the compat shim is duplicated surface area until CSA-0134 removes it.
Option 2: Keep the monolith
- Pros: zero work.
- Cons: blocks Copilot; fails the docs/CODING_STANDARDS.md module-size target; no rerank seam.
Option 3: LlamaIndex replacement
- Pros: less code to own; large upstream community.
- Cons: not on FedRAMP High ATO; default telemetry needs patching for Gov; SBOM re-review; ADR 0007 would need superseding.
Option 4: Split without a facade
- Pros: unblocks Copilot; enforces dependency direction.
- Cons: every router reassembles the pipeline by hand — the same failure mode that Copilot's indexer, Multi-Synapse prototype, and Portal are already hitting.
Validation
We will know this decision is right if:
- python -m pytest tests/csa_platform/test_ai_integration.py csa_platform/ai_integration/rag/tests/ passes with zero skips (currently 119 passed).
- python -m pytest apps/copilot/tests/ still reports the same pass/fail counts as before the split (the split touches nothing Copilot depends on structurally, only the shared SearchResult / VectorStore / Chunk imports, which the compat shim re-exports).
- python -m ruff check csa_platform/ai_integration/rag/ is clean.
- python -m mypy csa_platform/ai_integration/rag/ --ignore-missing-imports is clean.
- A fresh router wiring RAG uses RAGService.from_settings() and never imports from pipeline.
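A compat-shim check of the kind described above can be reduced to a small helper. This is a sketch under the assumption that "still resolves" means "is an attribute of the imported module"; it is demonstrated on the stdlib json module so that it runs without the repository.

```python
import importlib


def missing_exports(module_name: str, names: list[str]) -> list[str]:
    """Return the names that cannot be resolved as attributes of the module."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on a stdlib module so the snippet runs anywhere.
missing = missing_exports("json", ["loads", "dumps", "not_a_real_name"])
print(missing)
```

Inside the repository, the same helper could be pointed at csa_platform.ai_integration.rag.pipeline with the legacy names (DocumentChunker, EmbeddingGenerator, VectorStore, RAGPipeline, SearchResult, Chunk) and asserted to return an empty list.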
Migration plan
- This ADR: ships the split + compat shim. Callers keep working; no router code changes required.
- CSA-0134 (follow-up): migrate apps/copilot/**, csa_platform/multi_synapse/**, and any portal router wiring to import RAGService directly. Mark RAGPipeline as deprecated in that ADR.
- CSA-0135 (future release): delete the compat shim body, keeping pipeline.py as a raise-on-import stub that points at RAGService.
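One possible way to implement the deprecation step is a module-level __getattr__ (PEP 562) that keeps legacy names resolving while emitting a DeprecationWarning. The sketch below builds a throwaway module in memory to show the mechanism; make_deprecating_module and the placeholder RAGPipeline class are hypothetical, and the plan above does not commit to this technique.

```python
import types
import warnings


def make_deprecating_module(
    name: str, legacy: dict[str, object], hint: str
) -> types.ModuleType:
    """Build a module whose legacy attributes resolve but warn on access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> object:
        # PEP 562: called only when normal attribute lookup on the module fails.
        if attr in legacy:
            warnings.warn(
                f"{name}.{attr} is deprecated; {hint}",
                DeprecationWarning,
                stacklevel=2,
            )
            return legacy[attr]
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    return module


class RAGPipeline:
    """Placeholder for the legacy class."""


shim = make_deprecating_module(
    "pipeline_sketch", {"RAGPipeline": RAGPipeline}, "use RAGService"
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = shim.RAGPipeline

print(resolved is RAGPipeline, len(caught))
```

The later raise-on-import stub would be simpler still: a pipeline.py whose only statement raises ImportError with a message pointing at RAGService.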
References
- CSA-0133 (this work): god-class split, AQ-0020 approved.
- CSA-0134: deprecate RAGPipeline, migrate internal callers.
- CSA-0135: delete the compat shim body.
- ADR 0007: Azure OpenAI over self-hosted LLM (constrains the LlamaIndex alternative).
- csa_platform/ai_integration/rag/service.py: RAGService implementation.
- csa_platform/ai_integration/rag/tests/: 48 new submodule tests (chunker / indexer / retriever / rerank / generate / service / pipeline-compat).
- tests/csa_platform/test_ai_integration.py: 71-test regression suite that proves the behaviour preservation contract.