Data Flow Architecture

Web Intelligence: AI's Eyes on the Web

Web Intelligence is a standalone environment that serves as the AI's primary interface for web data collection. It is NOT limited to contact enrichment - it enables ANY task requiring web data.

Core Pattern

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Web Intelligence│───▢│    Staging      │───▢│     Target      β”‚
β”‚  (6670/6671)    β”‚    β”‚  (6680/6681)    β”‚    β”‚   Environment   β”‚
β”‚  crawl/scrape   β”‚    β”‚ temporary hold  β”‚    β”‚final destinationβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
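
In code, the pattern reduces to collect, stage, commit. The sketch below is illustrative only: the StagingStore class and the flow() wiring are hypothetical stand-ins, not the real environment APIs behind the ports shown above.

  import uuid

  class StagingStore:
      """Hypothetical stand-in for the Staging environment (6680/6681)."""
      def __init__(self):
          self._items = {}

      def store(self, payload) -> str:
          staging_id = str(uuid.uuid4())   # handle for later reprocessing
          self._items[staging_id] = payload
          return staging_id

      def fetch(self, staging_id: str):
          return self._items[staging_id]

  def flow(collect, staging, commit):
      """Collect from the web, hold in Staging, then write to a target."""
      raw = collect()                      # Web Intelligence: crawl/scrape
      staging_id = staging.store(raw)      # Staging: temporary hold
      commit(staging.fetch(staging_id))    # Target environment: final write
      return staging_id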

Why Staging?

The Staging Environment is a multi-tiered intermediary that:

- Holds raw web data temporarily during processing
- Allows transformation and enrichment before routing
- Decouples web collection from final storage
- Enables retry/reprocessing without re-crawling (see the sketch after this list)
- Supports audit trails and processing logs
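
A hedged sketch of that retry point, reusing the hypothetical StagingStore above: because the raw payload persists in Staging, a failed transform is retried from the staged copy rather than by re-crawling.

  def process_with_retry(staging, staging_id, transform, attempts=3):
      """Retry a transform against staged data; no re-crawl on failure."""
      last_error = None
      for _ in range(attempts):
          try:
              return transform(staging.fetch(staging_id))
          except Exception as err:   # broad catch, for illustration only
              last_error = err
      raise last_error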

Downstream Environments

Web Intelligence data can flow to ANY Nexus environment:

Environment   Port        Use Case
KB            6625/6626   Documentation, knowledge articles
Documents     TBD         Reports, scraped content
Contacts      6630/6631   Contact enrichment
Corpus        TBD         Training data, transcripts
Links         6635/6636   URL bookmarks, discovered links
Track         6640/6641   Project research data
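
Expressed as a hypothetical routing table (ports copied from above; TBD entries stay unassigned):

  DOWNSTREAM_PORTS = {
      "KB":        (6625, 6626),
      "Documents": None,           # TBD in the source table
      "Contacts":  (6630, 6631),
      "Corpus":    None,           # TBD in the source table
      "Links":     (6635, 6636),
      "Track":     (6640, 6641),
  }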

Example Data Flows

1. Documentation Scraping

web.scrape(docs_url)
  ↓
staging.store(raw_html)
  ↓
web.to_markdown()
  ↓
kb.create(title, markdown_content)

Use case: Scraping API documentation for LARS reference.
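
A minimal sketch of this flow, assuming hypothetical web, staging, and kb client objects with the method names shown above (how to_markdown() receives the staged HTML is an assumption):

  def scrape_docs_to_kb(web, staging, kb, docs_url: str, title: str):
      raw_html = web.scrape(docs_url)            # collect
      staging_id = staging.store(raw_html)       # temporary hold
      markdown = web.to_markdown(staging.fetch(staging_id))
      return kb.create(title, markdown)          # final destination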

2. Competitive Research

web.research(competitor, depth='deep')
  ↓
staging.store(research_data)
  ↓
Process and structure
  ↓
documents.create(research_report)

Use case: Building competitive analysis reports.
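
One possible shape for the "process and structure" step, assuming findings arrive from Staging as dicts with a topic field (an assumed schema, not the real one):

  def structure_report(research_data: list) -> dict:
      """Bucket raw research findings into report sections by topic."""
      report = {}
      for finding in research_data:
          section = finding.get("topic", "other")
          report.setdefault(section, []).append(finding)
      return report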

3. Contact Enrichment

web.research(person_company, depth='standard')
  ↓
staging.store(enrichment_data)
  ↓
Extract entities and verify
  ↓
contact.update(contact_id, enriched_fields)

Use case: Adding business intelligence to CRM contacts.
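
A sketch of the "extract entities and verify" step, assuming each staged field carries a value/confidence pair (an assumed payload shape, not the real Staging schema): only fields that clear the threshold reach contact.update().

  def verified_fields(enrichment_data: dict, threshold: float = 0.8) -> dict:
      return {
          field: item["value"]
          for field, item in enrichment_data.items()
          if item.get("confidence", 0.0) >= threshold
      }

  sample = {"title": {"value": "CTO", "confidence": 0.95},
            "phone": {"value": "555-0100", "confidence": 0.4}}
  print(verified_fields(sample))   # {'title': 'CTO'}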

4. YouTube Transcripts

web.fetch_transcript(video_url)
  ↓
staging.store(raw_transcript)
  ↓
Process and clean
  ↓
corpus.add(transcript_document)

Use case: Building training data from educational videos.
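
A possible "process and clean" step, assuming [mm:ss] timestamp markers in the raw transcript (the actual format depends on the source):

  import re

  def clean_transcript(raw: str) -> str:
      """Strip timestamp markers and collapse whitespace."""
      no_stamps = re.sub(r"\[\d{1,2}:\d{2}(?::\d{2})?\]", " ", raw)
      return re.sub(r"\s+", " ", no_stamps).strip()

  print(clean_transcript("[00:01] Hello [00:04] world"))   # -> Hello world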

5. Link Discovery

web.crawl(site_url, depth=2)
  ↓
web.discover_links()
  ↓
web.filter_links(criteria)
  ↓
links.save_batch(discovered_urls)

Use case: Building curated bookmark collections.
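
The criteria accepted by web.filter_links() are not specified here, so this sketch shows one plausible filter as a plain function: keep only HTTPS links on an allowed domain.

  from urllib.parse import urlparse

  def keep_link(url: str, allowed_domain: str) -> bool:
      parts = urlparse(url)
      return parts.scheme == "https" and parts.netloc.endswith(allowed_domain)

  urls = ["https://docs.example.com/a", "http://ads.example.net/b"]
  print([u for u in urls if keep_link(u, "example.com")])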

LARS Integration

LARS (Local AI Runtime System) uses Web Intelligence as its primary tool for gathering external information:

  1. Research tasks β†’ web.research()
  2. Documentation β†’ web.scrape() + web.to_markdown()
  3. Monitoring β†’ web.crawl() for changes
  4. Enrichment β†’ web.research() with entity extraction

All results flow through Staging, allowing LARS to process data incrementally and route to appropriate destinations.
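
A hypothetical dispatch table mirroring the four task types above; the wiring (including composing to_markdown() over scrape() output) is an assumption:

  def dispatch(task_type: str, web, **kwargs):
      """Route a LARS task type to the matching web.* call."""
      if task_type == "research":
          return web.research(kwargs["query"], depth=kwargs.get("depth", "standard"))
      if task_type == "documentation":
          return web.to_markdown(web.scrape(kwargs["url"]))
      if task_type == "monitoring":
          return web.crawl(kwargs["url"], depth=kwargs.get("depth", 1))
      if task_type == "enrichment":
          return web.research(kwargs["query"], depth="standard")
      raise ValueError(f"unknown task type: {task_type}")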

Key Principles

  1. Separation of Concerns: Web Intelligence collects; Staging holds; Target environments store
  2. Flexibility: Same crawl data can route to multiple destinations
  3. Idempotency: Staging enables safe reprocessing (sketched after this list)
  4. Audit Trail: Processing steps tracked in Staging
  5. Universal Interface: One web tool serves all AI tasks
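
Principle 3 can be made concrete with content-addressed staging, sketched here under the same hypothetical in-memory store as above: storing the same crawl result twice is a no-op, so reprocessing is always safe to repeat.

  import hashlib, json

  class IdempotentStaging:
      def __init__(self):
          self._items = {}

      def store(self, payload: dict) -> str:
          """Key by content hash: re-storing identical data is a no-op."""
          key = hashlib.sha256(
              json.dumps(payload, sort_keys=True).encode()
          ).hexdigest()
          self._items.setdefault(key, payload)
          return key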