High-Level Architecture
Design Principles
- Serverless-first — No idle infrastructure. Compute scales to zero and up to thousands of concurrent requests automatically.
- Multi-tenant isolation — Every user’s data is logically isolated at the database, vector store, and API layers. Org memory uses separate vector collections per team and organization.
- Semantic by default — Every memory is embedded on write. Search is always vector-based, not keyword matching.
- Graph-enriched — Entities and relationships are extracted from memories into a knowledge graph, enabling traversal and structural reasoning alongside vector search.
- Identity-aware — User preferences, expertise, and goals are tracked as a persistent identity model, enabling personalized recall.
- Event-driven — Embedding, extraction, backups, and sync operations are triggered by database events, not polling.
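
To make the multi-tenant isolation principle concrete, here is a minimal sketch of how a request's readable collections might be resolved before any search runs. The `user_`, `team_`, and `org_` naming scheme and both function names are assumptions made for illustration, not the platform's actual identifiers.

```python
from typing import Iterable, List, Optional


def accessible_collections(user_id: str, team_ids: Iterable[str], org_id: Optional[str]) -> List[str]:
    """Return the collection names a caller may read (hypothetical naming scheme)."""
    names = [f"user_{user_id}"]
    # One collection per team the caller belongs to, in a stable order.
    names += [f"team_{team_id}" for team_id in sorted(team_ids)]
    if org_id is not None:
        names.append(f"org_{org_id}")
    return names


def assert_can_read(collection: str, allowed: List[str]) -> None:
    """Reject any read that targets a collection outside the caller's scope."""
    if collection not in allowed:
        raise PermissionError(f"collection {collection!r} is outside the caller's scope")
```

Resolving the allowed set once per request and checking every read against it keeps the isolation rule in a single place rather than scattered across query code.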
Data Flow
Store
- Your app sends a memory to the API
- Request is authenticated and rate-limited at the gateway
- The memory is persisted to the relational store
- A vector embedding is generated and stored in the vector index
- Entities and relationships are extracted into the knowledge graph
- Response returns with the memory ID
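
The store steps above can be sketched with in-memory stand-ins for each backing service. Everything here is hypothetical: `MemoryService`, `toy_embed` (a hash-based placeholder for a real embedding model), and the capitalized-word entity extraction are illustrative assumptions, and gateway authentication and rate limiting are left out.

```python
import hashlib
import math
import uuid
from dataclasses import dataclass, field


def toy_embed(text, dims=8):
    """Placeholder for a real embedding model: hashes words into a unit vector."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class MemoryService:
    rows: dict = field(default_factory=dict)     # relational store stand-in
    vectors: dict = field(default_factory=dict)  # vector index stand-in
    graph: dict = field(default_factory=dict)    # knowledge graph stand-in: entity -> memory IDs

    def store(self, user_id, text):
        memory_id = str(uuid.uuid4())
        # Persist the memory first so later steps always have a source of truth.
        self.rows[memory_id] = {"user_id": user_id, "text": text}
        # Embed on write.
        self.vectors[memory_id] = toy_embed(text)
        # Naive entity extraction (assumption): treat capitalized words as entities.
        for word in text.split():
            if word[:1].isupper():
                self.graph.setdefault(word.strip(".,"), set()).add(memory_id)
        return memory_id
```

A call such as `MemoryService().store("u1", "Alice prefers Python.")` returns the new memory ID after all three stores have been updated.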
Recall
- Your app sends a natural language query
- The query is embedded into the same vector space
- Cosine similarity search runs across all accessible collections
- Results are ranked, deduplicated, and enriched with metadata
- If relevant, knowledge graph context is attached (related entities, contradictions)
- Top matches are returned with relevance scores
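
As a rough illustration of the recall steps, the sketch below scores candidates by cosine similarity across several collections, ranks them, and drops duplicates. The `recall` function, the tuple layout of each stored item, and deduplication by normalized text are assumptions; a production vector database would do the similarity search itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(query_vec, collections, top_k=3):
    """collections maps a collection name to a list of (memory_id, text, vector) tuples."""
    scored = []
    for name, items in collections.items():
        for memory_id, text, vec in items:
            scored.append((cosine_similarity(query_vec, vec), memory_id, text, name))
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda row: row[0], reverse=True)

    seen, results = set(), []
    for score, memory_id, text, name in scored:
        key = text.strip().lower()  # assumption: same normalized text means duplicate
        if key in seen:
            continue
        seen.add(key)
        results.append({"id": memory_id, "text": text, "collection": name, "score": score})
        if len(results) == top_k:
            break
    return results
```

Each result carries its source collection and score, which mirrors the metadata enrichment and relevance scores described in the steps above.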
Org Memory
Org memories follow an event-driven pipeline:
- Memory is committed to a team workspace
- A database trigger fires asynchronously
- The embedding is generated and stored in the team’s vector collection
- When promoted to org scope, the memory is re-embedded into the org-level collection
- All operations are audit-logged automatically
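
The pipeline above might look like the following sketch, where a simple queue stands in for asynchronous database triggers. `OrgMemoryPipeline`, its method names, and the `team_`/`org_` collection names are hypothetical; the `embed` callable is whatever embedding function you supply.

```python
from collections import defaultdict


class OrgMemoryPipeline:
    def __init__(self, embed):
        self.embed = embed
        self.collections = defaultdict(dict)  # collection name -> {memory_id: vector}
        self.audit_log = []
        self.pending = []  # queued events, standing in for async trigger delivery

    def commit(self, team_id, memory_id, text):
        """Commit to a team workspace; embedding happens later, off the request path."""
        self._audit("commit", memory_id, f"team_{team_id}")
        self.pending.append((f"team_{team_id}", memory_id, text))

    def promote(self, org_id, memory_id, text):
        """Promote to org scope, which queues a re-embed into the org collection."""
        self._audit("promote", memory_id, f"org_{org_id}")
        self.pending.append((f"org_{org_id}", memory_id, text))

    def drain(self):
        """Process queued events in order, as a trigger handler would."""
        while self.pending:
            collection, memory_id, text = self.pending.pop(0)
            self.collections[collection][memory_id] = self.embed(text)
            self._audit("embed", memory_id, collection)

    def _audit(self, action, memory_id, scope):
        self.audit_log.append({"action": action, "memory_id": memory_id, "scope": scope})
```

Because `commit` and `promote` only enqueue work, the caller gets a response before any embedding runs, which is the point of triggering on events instead of polling.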
Infrastructure
| Component | Purpose |
|---|---|
| API Gateway | Request routing, authentication, rate limiting |
| Serverless Compute | Stateless request handling, embedding generation |
| Vector Database | High-dimensional similarity search with collection-level isolation |
| Knowledge Graph Store | Entity and relationship storage, traversal queries |
| Relational Database | Memory metadata, user data, org structure, audit logs |
| Cache | Per-user rate limit tracking, session state |
| Object Storage | Static assets, backups |
| CDN | Global edge delivery for the dashboard and docs |
| Secrets Manager | Credential storage with rotation support |
| Scheduled Jobs | Daily vector store backups, hourly document sync |
Reliability
- Daily automated backups of all vector collections
- Hourly document sync for connected integrations
- Cold start optimization — Lambda layers pre-package dependencies for sub-second warm-up
- Graceful degradation — If the cache layer is unavailable, rate limiting falls back to per-instance tracking rather than failing open
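
The graceful degradation behavior can be sketched as a rate limiter that prefers a shared cache counter and falls back to a per-instance fixed window when the cache is unreachable. The `cache.incr(key, ttl)` interface, `CacheUnavailable`, and `RateLimiter` are all assumptions for illustration, not a real client library.

```python
import time


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when the cache cannot be reached."""


class RateLimiter:
    def __init__(self, cache, limit, window_seconds=60, clock=time.monotonic):
        self.cache = cache            # assumed to expose incr(key, ttl) -> current count
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.local = {}               # user_id -> (window_start, count), this instance only

    def allow(self, user_id):
        try:
            count = self.cache.incr(user_id, self.window)
        except CacheUnavailable:
            # Degrade to local counting so limits still apply; never fail open.
            count = self._local_incr(user_id)
        return count <= self.limit

    def _local_incr(self, user_id):
        now = self.clock()
        start, count = self.local.get(user_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0     # the previous window has expired
        count += 1
        self.local[user_id] = (start, count)
        return count
```

Per-instance tracking is looser than a shared counter, since each instance enforces the limit independently, but requests stay bounded while the cache recovers.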