FUTURE SCOPE
Organization Context Enrichment
This feature is out of scope for the current Workerline Enhancements sprint. It is documented here as a standalone specification for future implementation.
Overview
Allow organizations to upload documents (PDFs, text files, policy manuals) that are automatically processed and injected as context into AI conversations for that organization's sessions. This enriches the AI's responses with org-specific knowledge — company policies, vessel procedures, benefit details, etc.
How It Works
Org Instructions (Existing)
The survey_instructions table already exists in the schema. Enhancement: improved dashboard UI with template suggestions and preview of how instructions affect AI behavior.
Org-Specific Documents (New)
Upload PDFs/text files → extract text → store → concatenate into the AI system prompt for that org's sessions (simple text injection, not semantic retrieval).
```mermaid
sequenceDiagram
    participant Admin as 🖥️ Admin
    participant API as ⚡ scb-api
    participant DB as 💾 SQLite
    Admin->>API: Upload PDF
    API->>API: Extract + chunk text
    API->>DB: Store org_document
    Note over API,DB: During conversations...
    participant W as 🚢 Worker
    participant OAI as 🤖 OpenAI
    W->>API: Chat message
    API->>DB: Load org_documents
    API->>API: Build prompt + doc excerpts
    API->>OAI: Chat completion (with context)
    OAI-->>W: Streaming response
```
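A minimal sketch of the upload-time extraction step in the flow above, assuming a Node/TypeScript API and the pdf-parse library named in the implementation notes; the function name and file-type handling are illustrative, not the actual scb-api code.

```typescript
// Sketch only: extract plain text from an uploaded file buffer.
// Assumes pdf-parse for PDFs; plain-text files are decoded directly.
import pdfParse from "pdf-parse";

export async function extractText(
  buffer: Buffer,
  fileType: "pdf" | "text"
): Promise<string> {
  if (fileType === "pdf") {
    const parsed = await pdfParse(buffer); // resolves to { text, numpages, ... }
    return parsed.text;
  }
  return buffer.toString("utf-8");
}
```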
Data Model
```mermaid
erDiagram
    orgs ||--o{ org_documents : has
    org_documents {
        text id PK
        text org_id FK
        text name
        text file_type
        text extracted_text
        text created_at
    }
```
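One possible SQLite shape for the table above, sketched as a better-sqlite3 migration; the driver choice, database file name, and default expressions are assumptions rather than part of the spec.

```typescript
// Sketch of org_documents from the ER diagram above.
// better-sqlite3 is an assumed driver; the spec only specifies SQLite.
import Database from "better-sqlite3";

const db = new Database("scb.db"); // hypothetical database file name

db.exec(`
  CREATE TABLE IF NOT EXISTS org_documents (
    id             TEXT PRIMARY KEY,
    org_id         TEXT NOT NULL REFERENCES orgs(id),
    name           TEXT NOT NULL,
    file_type      TEXT NOT NULL,          -- 'pdf' | 'text'
    extracted_text TEXT NOT NULL,
    created_at     TEXT NOT NULL DEFAULT (datetime('now'))
  );
`);
```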
API Routes
| Method | Route | Auth | Description |
| --- | --- | --- | --- |
| POST | /org/:id/documents | Admin | Upload org document (PDF/text) |
| GET | /org/:id/documents | Admin | List org documents |
| DELETE | /org/:id/documents/:docId | Admin | Delete org document |
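A sketch of how the upload route could be wired, assuming an Express-style router with multer for multipart uploads; the framework choice, import paths, and the extractText / insertOrgDocument helpers are assumptions for illustration. Only verifyOrgAccess is named by the spec.

```typescript
// Illustrative route wiring only; everything except verifyOrgAccess is assumed.
import express from "express";
import multer from "multer";
import { randomUUID } from "node:crypto";
import { extractText } from "./extract";       // hypothetical module from the sketch above
import { verifyOrgAccess } from "./auth";      // existing middleware (assumed import path)
import { insertOrgDocument } from "./db";      // hypothetical DB helper

const upload = multer({ storage: multer.memoryStorage() });
const router = express.Router();

router.post(
  "/org/:id/documents",
  verifyOrgAccess,
  upload.single("file"),
  async (req, res) => {
    const fileType = req.file!.mimetype === "application/pdf" ? "pdf" : "text";
    const extractedText = await extractText(req.file!.buffer, fileType);

    const doc = {
      id: randomUUID(),
      org_id: req.params.id,
      name: req.file!.originalname,
      file_type: fileType,
      extracted_text: extractedText,
    };
    insertOrgDocument(doc);

    res.status(201).json({ id: doc.id, name: doc.name });
  }
);
```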
Implementation Notes
- PDF text extraction via pdf-parse or a similar lightweight library
- Simple text injection into the system prompt (no vector DB / RAG for MVP); see the sketch after this list
- Character limit per org (~8K chars) to prevent prompt overflow
- Dashboard UI: upload widget, list with delete, preview of extracted text
- Access scoped by org via the existing verifyOrgAccess middleware
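A minimal sketch of the simple-concatenation approach, assuming the official openai Node SDK; the 8K character cap, excerpt formatting, and function names are illustrative assumptions, not the production pipeline.

```typescript
// Sketch: concatenate extracted document text into the system prompt,
// capped per org (limit value and cutoff strategy are assumptions).
import OpenAI from "openai";

const ORG_CONTEXT_CHAR_LIMIT = 8_000;

type OrgDoc = { name: string; extracted_text: string };

function buildOrgContext(docs: OrgDoc[]): string {
  let context = "";
  for (const doc of docs) {
    const excerpt = `\n\n[${doc.name}]\n${doc.extracted_text}`;
    if (context.length + excerpt.length > ORG_CONTEXT_CHAR_LIMIT) break; // plain cutoff, no ranking
    context += excerpt;
  }
  return context;
}

// Hypothetical call site inside the existing chat pipeline:
async function chatWithOrgContext(
  openai: OpenAI,
  basePrompt: string,
  docs: OrgDoc[],
  userMessage: string
) {
  return openai.chat.completions.create({
    model: "gpt-4", // per the spec's GPT-4 mention
    stream: true,
    messages: [
      { role: "system", content: basePrompt + buildOrgContext(docs) },
      { role: "user", content: userMessage },
    ],
  });
}
```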
Future enhancement: If document volume grows, consider vector embeddings + similarity search (RAG) for more precise context injection. For now, simple text concatenation is sufficient.
Foundation Analysis — What Already Exists
This feature has a small footprint because it builds almost entirely on existing infrastructure:
- survey_instructions table — already exists in the production schema (not shown in the main spec's simplified ERD) for org-level AI prompt customization. Context enrichment extends this pattern.
- OpenAI GPT-4 integration — existing chat completion pipeline already accepts system prompts. Document excerpts are injected as additional system context.
- verifyOrgAccess middleware — existing auth layer scopes all admin routes by org.
- Dashboard (scb) — existing admin UI patterns (CRUD lists, file upload) can be reused for the document management interface.
- File storage — R2/S3 integration (if used for learn content uploads in the current sprint) can be reused for document storage.
Key insight: This is primarily a data-pipeline + dashboard-UI task. No new external services, no new infrastructure patterns. The heaviest lift is the text extraction and the dashboard upload interface.
Estimated Cost
Organization Context Enrichment: $2,200 · 2–3 days · agentic coding workflow · builds entirely on existing infrastructure
Pricing note: This estimate assumes the same AI-driven development workflow used in the base sprint ($13,200 for 7 features / 2–3 weeks). Context Enrichment is a small, well-bounded task — 3 API routes, 1 table, PDF text extraction, and a basic dashboard upload UI. No new external services or infrastructure patterns.
What's Included
| Deliverable | Details |
| --- | --- |
| Document upload API | POST/GET/DELETE routes with org-scoped access. PDF + plain text support. |
| Text extraction pipeline | Automatic text extraction on upload via pdf-parse. Chunking to stay within prompt limits (~8K chars per org). |
| AI context injection | Org documents concatenated into the GPT-4 system prompt during worker conversations. Full text injected up to the ~8K character limit per org — no topic-based filtering in MVP (simple concatenation, not semantic search). |
| Dashboard UI | Upload widget, document list with preview of extracted text, delete confirmation. Integrated into existing org settings page. |
| Improved instructions UI | Better editing experience for existing survey_instructions with template suggestions. |
Ongoing Costs
- OpenAI: Marginal increase in prompt tokens (~$0.001–0.005 per conversation from injected context)
- Storage: Negligible (text files are small; PDFs stored in existing R2/S3)