Staging Environment (formerly Temp)
A multi-stage data processing pipeline with automatic garbage collection.
Core Concept
Staging is NOT a junk drawer. It's an intentional data flow pipeline:
    Raw Data → Stage 1 (dump) → Stage 2 (refine) → Stage 3 (organize) → Stage N (final)
                     ↓                 ↓                                        ↓
                  [wipe]      [optional persist]              [persist to KB/Context/Corpus]
Stage Lifecycle
Stage 1: Raw Dump
- Web scrape markdown
- Document extraction
- API response data
- Large, messy, temporary
- NEVER persists; wiped as soon as Stage 2 is created
Stages 2-N: Refinement
- Each stage is cleaner than the previous
- AI decides how many stages are needed
- Could be 2 stages, could be 5
- Speed: milliseconds per stage
- Optional persist to permanent environment
Final Stage
- Polished output
- Ready for delivery or storage
- Persisted to appropriate environment
- Staging chain cleaned up afterward (see the lifecycle sketch below)
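A minimal sketch of this lifecycle, assuming stages live as plain dicts in an in-memory registry; `stages`, `new_stage`, and `advance_stage` are hypothetical names mirroring the data structure under Implementation Notes below:

```python
import time
import uuid

STAGE_TTL_SECONDS = 30  # backtrack window before a superseded stage is wiped
stages = {}             # hypothetical in-memory registry: stage id -> stage dict

def new_stage(content):
    """Stage 1: dump raw content straight into staging."""
    stage = {
        'id': f"stg_{uuid.uuid4().hex[:8]}",
        'stage_number': 1,
        'parent_stage': None,
        'content': content,
        'created_at': time.time(),
        'persist_to': None,
        'cleanup_after': None,  # set once this stage is superseded
    }
    stages[stage['id']] = stage
    return stage

def advance_stage(parent, refined_content):
    """Create Stage N+1 from Stage N and schedule the parent for wiping."""
    child = new_stage(refined_content)
    child['stage_number'] = parent['stage_number'] + 1
    child['parent_stage'] = parent['id']
    parent['cleanup_after'] = time.time() + STAGE_TTL_SECONDS  # Stage N wipes once N+1 exists
    return child
```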
Automatic Garbage Collection
Rules
- Stage 1 wipes when Stage 2 exists
- Stage N wipes when Stage N+1 exists
- All stages wipe after final persist (sketched below)
- 30-second delay before wipe (backtrack window)
- No manual cleanup needed
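The wipe-after-final-persist rule could then be a walk up the parent chain; this reuses the hypothetical `stages` registry and `STAGE_TTL_SECONDS` from the lifecycle sketch above:

```python
def schedule_chain_wipe(final_stage):
    """After the final persist, mark the entire staging chain for wiping."""
    deadline = time.time() + STAGE_TTL_SECONDS  # 30-second backtrack window
    stage = final_stage
    while stage is not None:
        stage['cleanup_after'] = deadline
        stage = stages.get(stage['parent_stage'])  # None once we pass Stage 1
```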
Why This Matters
- Prevents memory bloat from large dumps
- Staging stays lean automatically
- No junk drawer accumulation
- AI can process massive data without choking
Persist-to-Environment
AI decides where refined data belongs; a lookup sketch follows the table:
| Stage Content | Destination |
|---|---|
| Structured docs | KB |
| Summaries | Context |
| Raw extracted text | Corpus |
| Contact info found | Contact |
| URLs discovered | Links |
| Project items | Track |
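In the simplest case this routing could be a lookup table; the content-kind labels below are illustrative assumptions, not a fixed taxonomy:

```python
DESTINATIONS = {
    'structured_docs': 'kb',
    'summary': 'context',
    'raw_text': 'corpus',
    'contact_info': 'contact',
    'urls': 'links',
    'project_items': 'track',
}

def route(content_kind):
    """Map a stage's content kind to a persist destination, or None to discard."""
    return DESTINATIONS.get(content_kind)
```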
Example Flow
1. Web scrape Corlera.com → Stage 1 (raw markdown)
2. Extract key sections → Stage 2 (structured data)
3. Organize into outline → Stage 3 (KB-ready)
4. Generate summary → Stage 4 (Context-ready)
5. Persist Stage 3 → KB
6. Persist Stage 4 → Context
7. Notify user with summary
8. Cleanup: All stages wiped
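The same flow as hypothetical code, reusing `new_stage`, `advance_stage`, and `schedule_chain_wipe` from the sketches above; the stand-in workers are placeholders for whatever does the real scraping, extraction, and persistence:

```python
# Stand-in workers; real implementations would do the heavy lifting.
def scrape(url): return {'markdown': f'raw dump of {url}'}
def extract_sections(stage): return {'sections': ['about', 'services']}
def build_outline(stage): return {'outline': ['1. About', '2. Services']}
def summarize(stage): return {'summary': 'Two-section site overview'}
def persist(stage, destination): stage['persist_to'] = destination
def notify_user(content): print(content)

def run_pipeline(url):
    s1 = new_stage(scrape(url))                   # 1. raw markdown
    s2 = advance_stage(s1, extract_sections(s1))  # 2. structured data
    s3 = advance_stage(s2, build_outline(s2))     # 3. KB-ready
    s4 = advance_stage(s3, summarize(s3))         # 4. Context-ready
    persist(s3, 'kb')                             # 5.
    persist(s4, 'context')                        # 6.
    notify_user(s4['content']['summary'])         # 7.
    schedule_chain_wipe(s4)                       # 8. all stages wiped

run_pipeline('Corlera.com')
```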
Speed Requirements
- Stage transitions: milliseconds
- Full pipeline (dump to persist): seconds
- AI doesn't wait; it flows
- Parallel processing where possible (sketched below)
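For the parallel-processing point, persists with no ordering dependency could fan out on a thread pool; a minimal sketch, assuming a `persist(stage, destination)` callable like the stand-in above:

```python
from concurrent.futures import ThreadPoolExecutor

def persist_all(targets, persist):
    """Run independent (stage, destination) persists concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(persist, stage, dest) for stage, dest in targets]
        for future in futures:
            future.result()  # re-raise any persist error

# e.g. persist_all([(s3, 'kb'), (s4, 'context')], persist)
```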
Implementation Notes
Data Structure
import time

stage = {
    'id': 'stg_abc123',
    'stage_number': 2,
    'parent_stage': 'stg_xyz789',
    'content': ...,                     # stage payload (elided)
    'created_at': time.time(),
    'persist_to': None,                 # or 'kb', 'context', 'corpus'
    'cleanup_after': time.time() + 30,  # wiped 30s after being superseded
}
Cleanup Process
- Background worker checks cleanup_after
- Wipes expired stages
- Runs every 10 seconds
- No manual intervention
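A minimal sketch of that worker, again assuming the in-memory `stages` registry from the lifecycle sketch; a real deployment would presumably hang this off whatever scheduler the host system already runs:

```python
import threading
import time

def cleanup_worker(interval=10):
    """Every 10 seconds, wipe any stage whose cleanup_after has passed."""
    while True:
        now = time.time()
        expired = [sid for sid, s in stages.items()
                   if s['cleanup_after'] is not None and s['cleanup_after'] <= now]
        for sid in expired:
            del stages[sid]  # wipe; no manual intervention
        time.sleep(interval)

threading.Thread(target=cleanup_worker, daemon=True).start()
```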
Rename: Temp → Staging
Why rename:
- 'Temp' implies a junk drawer
- 'Staging' implies an intentional pipeline
- Better reflects multi-stage processing
- Aligns with data engineering terminology