Staging Environment (formerly Temp)
A multi-stage data processing pipeline with automatic garbage collection.
Core Concept
Staging is NOT a junk drawer. It's an intentional data flow pipeline:
    Raw Data → Stage 1 (dump) → Stage 2 (refine) → Stage 3 (organize) → Stage N (final)
                     ↓                 ↓                                        ↓
                  [wipe]      [optional persist]              [persist to KB/Context/Corpus]
Stage Lifecycle
Stage 1: Raw Dump
- Web scrape markdown
- Document extraction
- API response data
- Large, messy, temporary
- NEVER persists; wiped as soon as Stage 2 is created
Stages 2-N: Refinement
- Each stage is cleaner than the previous
- AI decides how many stages are needed
- Could be 2 stages, could be 5
- Speed: milliseconds per stage
- Optional persist to permanent environment
Final Stage
- Polished output
- Ready for delivery or storage
- Persisted to appropriate environment
- Staging chain cleaned up afterward (see the lifecycle sketch below)
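A minimal sketch of this lifecycle, assuming stages live as plain dicts in an in-memory registry; `stages`, `new_stage`, and `advance_stage` are hypothetical names mirroring the data structure under Implementation Notes below:

```python
import time
import uuid

STAGE_TTL_SECONDS = 30  # backtrack window before a superseded stage is wiped
stages = {}             # hypothetical in-memory registry: stage id -> stage dict

def new_stage(content):
    """Stage 1: dump raw content straight into staging."""
    stage = {
        'id': f"stg_{uuid.uuid4().hex[:8]}",
        'stage_number': 1,
        'parent_stage': None,
        'content': content,
        'created_at': time.time(),
        'persist_to': None,
        'cleanup_after': None,  # set once this stage is superseded
    }
    stages[stage['id']] = stage
    return stage

def advance_stage(parent, refined_content):
    """Create Stage N+1 from Stage N and schedule the parent for wiping."""
    child = new_stage(refined_content)
    child['stage_number'] = parent['stage_number'] + 1
    child['parent_stage'] = parent['id']
    parent['cleanup_after'] = time.time() + STAGE_TTL_SECONDS  # Stage N wipes once N+1 exists
    return child
```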
Automatic Garbage Collection
Rules
- Stage 1 wipes when Stage 2 exists
- Stage N wipes when Stage N+1 exists
- All stages wipe after final persist (sketched below)
- 30-second delay before wipe (backtrack window)
- No manual cleanup needed
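The wipe-after-final-persist rule could then be a walk up the parent chain; this reuses the hypothetical `stages` registry and `STAGE_TTL_SECONDS` from the lifecycle sketch above:

```python
def schedule_chain_wipe(final_stage):
    """After the final persist, mark the entire staging chain for wiping."""
    deadline = time.time() + STAGE_TTL_SECONDS  # 30-second backtrack window
    stage = final_stage
    while stage is not None:
        stage['cleanup_after'] = deadline
        stage = stages.get(stage['parent_stage'])  # None once we pass Stage 1
```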
Why This Matters
- Prevents memory bloat from large dumps
- Staging stays lean automatically
- No junk drawer accumulation
- AI can process massive data without choking
Persist-to-Environment
AI decides where refined data belongs; a lookup sketch follows the table:
| Stage Content | Destination |
|---|---|
| Structured docs | KB |
| Summaries | Context |
| Raw extracted text | Corpus |
| Contact info found | Contact |
| URLs discovered | Links |
| Project items | Track |
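In the simplest case this routing could be a lookup table; the content-kind labels below are illustrative assumptions, not a fixed taxonomy:

```python
DESTINATIONS = {
    'structured_docs': 'kb',
    'summary': 'context',
    'raw_text': 'corpus',
    'contact_info': 'contact',
    'urls': 'links',
    'project_items': 'track',
}

def route(content_kind):
    """Map a stage's content kind to a persist destination, or None to discard."""
    return DESTINATIONS.get(content_kind)
```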
Example Flow
1. Web scrape Corlera.com → Stage 1 (raw markdown)
2. Extract key sections → Stage 2 (structured data)
3. Organize into outline → Stage 3 (KB-ready)
4. Generate summary → Stage 4 (Context-ready)
5. Persist Stage 3 → KB
6. Persist Stage 4 → Context
7. Notify user with summary
8. Cleanup: All stages wiped
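The same flow as hypothetical code, reusing `new_stage`, `advance_stage`, and `schedule_chain_wipe` from the sketches above; the stand-in workers are placeholders for whatever does the real scraping, extraction, and persistence:

```python
# Stand-in workers; real implementations would do the heavy lifting.
def scrape(url): return {'markdown': f'raw dump of {url}'}
def extract_sections(stage): return {'sections': ['about', 'services']}
def build_outline(stage): return {'outline': ['1. About', '2. Services']}
def summarize(stage): return {'summary': 'Two-section site overview'}
def persist(stage, destination): stage['persist_to'] = destination
def notify_user(content): print(content)

def run_pipeline(url):
    s1 = new_stage(scrape(url))                   # 1. raw markdown
    s2 = advance_stage(s1, extract_sections(s1))  # 2. structured data
    s3 = advance_stage(s2, build_outline(s2))     # 3. KB-ready
    s4 = advance_stage(s3, summarize(s3))         # 4. Context-ready
    persist(s3, 'kb')                             # 5.
    persist(s4, 'context')                        # 6.
    notify_user(s4['content']['summary'])         # 7.
    schedule_chain_wipe(s4)                       # 8. all stages wiped

run_pipeline('Corlera.com')
```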
Speed Requirements
- Stage transitions: milliseconds
- Full pipeline (dump to persist): seconds
- AI doesn't wait; it flows
- Parallel processing where possible (sketched below)
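For the parallel-processing point, persists with no ordering dependency could fan out on a thread pool; a minimal sketch, assuming a `persist(stage, destination)` callable like the stand-in above:

```python
from concurrent.futures import ThreadPoolExecutor

def persist_all(targets, persist):
    """Run independent (stage, destination) persists concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(persist, stage, dest) for stage, dest in targets]
        for future in futures:
            future.result()  # re-raise any persist error

# e.g. persist_all([(s3, 'kb'), (s4, 'context')], persist)
```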
Implementation Notes
Data Structure
import time

stage = {
    'id': 'stg_abc123',
    'stage_number': 2,
    'parent_stage': 'stg_xyz789',
    'content': ...,                     # stage payload (elided)
    'created_at': time.time(),
    'persist_to': None,                 # or 'kb', 'context', 'corpus'
    'cleanup_after': time.time() + 30,  # wiped 30s after being superseded
}
Cleanup Process
- Background worker checks cleanup_after
- Wipes expired stages
- Runs every 10 seconds
- No manual intervention
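A minimal sketch of that worker, again assuming the in-memory `stages` registry from the lifecycle sketch; a real deployment would presumably hang this off whatever scheduler the host system already runs:

```python
import threading
import time

def cleanup_worker(interval=10):
    """Every 10 seconds, wipe any stage whose cleanup_after has passed."""
    while True:
        now = time.time()
        expired = [sid for sid, s in stages.items()
                   if s['cleanup_after'] is not None and s['cleanup_after'] <= now]
        for sid in expired:
            del stages[sid]  # wipe; no manual intervention
        time.sleep(interval)

threading.Thread(target=cleanup_worker, daemon=True).start()
```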
Rename: Temp → Staging
Why rename:
- 'Temp' implies a junk drawer
- 'Staging' implies an intentional pipeline
- Better reflects multi-stage processing
- Aligns with data engineering terminology