Architecture Overview
Understand the architecture and design principles behind pxlpeak's scalable analytics platform.
System Overview
pxlpeak is built on a modern, event-driven architecture designed for:
- High throughput: Process billions of events daily
- Low latency: Real-time analytics with sub-second query response
- Global scale: Edge locations worldwide for fast data collection
- Privacy-first: Data isolation, encryption at rest and in transit
┌─────────────────────────────────────────────────────────────────────┐
│ Data Collection │
├─────────────────────────────────────────────────────────────────────┤
│ Browser SDK │ Server SDK │ Mobile SDK │ API │ Webhooks │
└───────┬───────┴──────┬───────┴──────┬───────┴───┬───┴───────┬──────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Edge Network (CDN) │
│ • Global PoPs • Request Validation • Rate Limiting │
└───────────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Ingestion Layer │
│ • Event Queue (Kafka) • Schema Validation • Deduplication │
└───────────────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Real-time │ │ Batch │ │ Attribution │
│ Processing │ │ Processing │ │ Engine │
│ (Flink) │ │ (Spark) │ │ (Custom) │
└───────┬───────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Storage Layer │
├──────────────┬──────────────┬──────────────┬───────────────────────┤
│ ClickHouse │ PostgreSQL │ Redis │ Object Storage │
│ (Analytics) │ (Metadata) │ (Cache) │ (Raw Events) │
└──────────────┴──────────────┴──────────────┴───────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Query Layer │
│ • REST API • GraphQL • Real-time Subscriptions │
└─────────────────────────────────────────────────────────────────────┘
Data Collection Layer
Browser SDK
The browser SDK is designed for minimal footprint and maximum compatibility:
// Core tracker: ~3KB gzipped
// Full SDK with plugins: ~8KB gzipped
// Initialization
pxlpeak.init({
siteId: 'site_xxx',
transport: 'beacon', // Use Beacon API for reliability
batchSize: 10, // Batch events before sending
flushInterval: 5000, // Flush every 5 seconds
retryAttempts: 3, // Retry failed requests
sessionTimeout: 30, // Session timeout in minutes
cookieOptions: {
secure: true,
sameSite: 'Lax',
domain: '.example.com' // Cross-subdomain tracking
}
});
Collection Flow:
- Events captured in memory buffer
- Batched based on size or time threshold
- Sent via Beacon API (or XHR fallback)
- Queued in IndexedDB if offline
- Synced when connection restored
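The same flow in miniature, as illustrative TypeScript rather than the SDK's actual internals (`EventBuffer` and `ENDPOINT` are placeholder names):
type AnalyticsEvent = { event: string; properties?: Record<string, unknown> };
const ENDPOINT = 'https://collect.example.com/v1/events'; // placeholder URL

class EventBuffer {
  private buffer: AnalyticsEvent[] = [];

  constructor(private batchSize = 10, flushInterval = 5000) {
    // Time-based flush; size-based flush happens in add()
    setInterval(() => this.flush(), flushInterval);
  }

  add(event: AnalyticsEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const payload = JSON.stringify(this.buffer);
    this.buffer = [];
    // sendBeacon survives page unloads; fall back to fetch when it refuses the payload
    if (!navigator.sendBeacon(ENDPOINT, payload)) {
      fetch(ENDPOINT, { method: 'POST', body: payload, keepalive: true })
        .catch(() => { /* offline: persist to IndexedDB and sync later */ });
    }
  }
}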
Server-Side Collection
Server SDKs provide reliable tracking for backend events:
// Node.js SDK with batching
import { PxlPeak } from '@pxlpeak/sdk';
const pxlpeak = new PxlPeak({
apiKey: process.env.PXLPEAK_API_KEY,
batch: {
enabled: true,
maxSize: 100,
flushInterval: 5000
},
retry: {
maxAttempts: 3,
backoff: 'exponential'
}
});
// Events are queued and sent in batches
pxlpeak.track('purchase', {
transactionId: 'TXN-123',
value: 99.99
});
// Ensure all events sent before process exit
process.on('SIGTERM', async () => {
await pxlpeak.flush();
process.exit(0);
});
Edge Network
Global Distribution
Events are collected through our global edge network:
| Region | Locations | Latency Target |
|--------|-----------|----------------|
| North America | 15 PoPs | < 20ms |
| Europe | 12 PoPs | < 25ms |
| Asia Pacific | 10 PoPs | < 30ms |
| South America | 4 PoPs | < 40ms |
Request Processing at Edge
┌─────────────────────────────────────────────────────────────────┐
│ Edge Worker Processing │
├─────────────────────────────────────────────────────────────────┤
│ 1. TLS Termination │
│ 2. Request Validation (API key, payload structure) │
│ 3. Rate Limiting (per API key, per IP) │
│ 4. Geo-enrichment (country, region, city) │
│ 5. Bot Detection (signature matching, behavior analysis) │
│ 6. Forward to nearest ingestion cluster │
└─────────────────────────────────────────────────────────────────┘
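A sketch of that request path as an edge worker handler; the helpers and the internal ingestion URL are hypothetical, not the production worker:
// Hypothetical helpers; the production worker differs
declare function validateApiKey(key: string | null): boolean;
declare function isRateLimited(request: Request): Promise<boolean>;
declare function lookupGeo(request: Request): { country: string; region: string; city: string };
declare function detectBot(request: Request): boolean;

async function handleCollect(request: Request): Promise<Response> {
  // 2. Validate the API key and payload structure before any further work
  const body = await request.json().catch(() => null);
  if (!body || !validateApiKey(request.headers.get('x-api-key'))) {
    return new Response('invalid request', { status: 400 });
  }
  // 3. Rate limit per API key and per IP
  if (await isRateLimited(request)) {
    return new Response('rate limited', { status: 429 });
  }
  // 4-5. Geo-enrich and flag likely bots, then 6. forward to the nearest ingestion cluster
  const enriched = { ...body, _enriched: { geo: lookupGeo(request), bot: detectBot(request) } };
  await fetch('https://ingest.internal/v1/events', { // placeholder internal URL
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(enriched),
  });
  return new Response(null, { status: 202 });
}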
Edge Data Enrichment
Events are enriched at the edge before ingestion:
{
"event": "page_view",
"client_id": "cid_xxx",
"properties": {
"page_path": "/products"
},
"_enriched": {
"geo": {
"country": "US",
"region": "California",
"city": "San Francisco",
"timezone": "America/Los_Angeles"
},
"device": {
"type": "desktop",
"browser": "Chrome",
"browser_version": "120",
"os": "macOS",
"os_version": "14.2"
},
"edge": {
"pop": "SFO",
"received_at": "2026-01-12T14:30:00.123Z"
}
}
}
Ingestion Layer
Event Queue
Apache Kafka serves as the central event bus:
Event Flow:
┌──────────────────┐
Ingestion API ──────▶│ Kafka Cluster │
│ ┌────────────┐ │
│ │ Partition 0│ │──────▶ Real-time Consumer
│ ├────────────┤ │
│ │ Partition 1│ │──────▶ Batch Consumer
│ ├────────────┤ │
│ │ Partition 2│ │──────▶ Attribution Consumer
│ └────────────┘ │
└──────────────────┘
Configuration:
- Partitions: 256 per topic
- Replication: 3x
- Retention: 7 days
- Throughput: 1M events/sec per partition
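For illustration, the same topic settings expressed with kafkajs; the broker address, client id, and topic name are assumptions for the example:
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'ingestion-api', brokers: ['kafka-1:9092'] });

async function createEventsTopic(): Promise<void> {
  const admin = kafka.admin();
  await admin.connect();
  await admin.createTopics({
    topics: [{
      topic: 'events',
      numPartitions: 256,    // Partitions: 256 per topic
      replicationFactor: 3,  // Replication: 3x
      configEntries: [
        { name: 'retention.ms', value: String(7 * 24 * 60 * 60 * 1000) }, // Retention: 7 days
      ],
    }],
  });
  await admin.disconnect();
}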
Schema Validation
Events are validated against JSON Schema:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"required": ["event", "client_id", "timestamp"],
"properties": {
"event": {
"type": "string",
"maxLength": 64,
"pattern": "^[a-z][a-z0-9_]*$"
},
"client_id": {
"type": "string",
"pattern": "^cid_[a-zA-Z0-9]+$"
},
"timestamp": {
"type": "string",
"format": "date-time"
},
"properties": {
"type": "object",
"additionalProperties": true,
"maxProperties": 100
}
}
}
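In the ingestion service, a schema like this can be compiled once and applied to every incoming event; a sketch using Ajv (the schema file path is assumed):
import Ajv from 'ajv';
import addFormats from 'ajv-formats'; // required for the "date-time" format
import eventSchema from './event-schema.json'; // the schema shown above (path assumed)

const ajv = new Ajv();
addFormats(ajv);
const validateEvent = ajv.compile(eventSchema);

function isValidEvent(payload: unknown): boolean {
  if (validateEvent(payload)) return true;
  // Rejected events carry the validation errors for debugging
  console.warn('schema validation failed', validateEvent.errors);
  return false;
}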
Deduplication
Events are deduplicated using a combination of:
- Event ID hash: SHA-256 of event content
- Time window: 5-minute deduplication window
- Bloom filter: Memory-efficient duplicate detection
// Deduplication configuration
const dedupConfig = {
  windowMs: 300000,          // 5 minutes
  bloomFilterSize: 10000000, // 10M bits
  hashFunctions: 7
};
async function shouldProcess(event: Event): Promise<boolean> {
  const eventId = hash(event); // SHA-256 of the event content
  // Check bloom filter first (fast path)
  if (bloomFilter.mightContain(eventId)) {
    // Verify in Redis (accurate check)
    if (await redis.exists(`dedup:${eventId}`)) {
      return false; // Duplicate
    }
  }
  // Add to bloom filter and Redis
  bloomFilter.add(eventId);
  await redis.setex(`dedup:${eventId}`, dedupConfig.windowMs / 1000, '1');
  return true;
}
Processing Layer
Real-time Processing (Apache Flink)
Flink handles time-sensitive computations:
// Real-time session computation
DataStream<Event> events = env.addSource(kafkaConsumer);
DataStream<Session> sessions = events
.keyBy(event -> event.getClientId())
.window(EventTimeSessionWindows.withGap(Time.minutes(30)))
.process(new SessionProcessFunction());
// Real-time funnel analysis
DataStream<FunnelStep> funnelProgress = events
.keyBy(event -> event.getClientId())
.process(new FunnelTrackingFunction(funnelConfig));
// Anomaly detection
DataStream<Anomaly> anomalies = events
.keyBy(event -> event.getSiteId())
.window(SlidingEventTimeWindows.of(Time.minutes(5), Time.minutes(1)))
.process(new AnomalyDetectionFunction());
Real-time Capabilities:
| Feature | Latency | Use Case |
|---------|---------|----------|
| Live visitor count | < 1s | Dashboard widgets |
| Session tracking | < 5s | User journey analysis |
| Funnel updates | < 10s | Real-time optimization |
| Alert triggers | < 30s | Anomaly detection |
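On the consuming side, a dashboard widget can read these live updates from the WebSocket subscription described later under Real-time Subscriptions; a minimal client sketch (the endpoint URL and render function are placeholders):
// Hypothetical render function for a live-visitors widget
declare function updateLiveVisitorWidget(metrics: unknown): void;

const socket = new WebSocket('wss://realtime.example.com/v1/sites/site_xxx'); // placeholder URL

socket.addEventListener('message', (msg: MessageEvent<string>) => {
  const { type, data } = JSON.parse(msg.data);
  if (type === 'metrics') {
    // The server pushes a metrics frame roughly every 5 seconds
    updateLiveVisitorWidget(data);
  }
});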
Batch Processing (Apache Spark)
Spark handles heavy aggregations and historical analysis:
# Daily aggregation job
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder \
.appName("DailyAggregation") \
.config("spark.sql.adaptive.enabled", "true") \
.getOrCreate()
# Read events from object storage; `date` is supplied by the job scheduler
events = spark.read.parquet(f"s3://events/{date}/*")
# Compute daily metrics
daily_metrics = events \
.groupBy("site_id", "source", "medium") \
.agg(
F.countDistinct("client_id").alias("users"),
F.count("*").alias("events"),
F.countDistinct("session_id").alias("sessions"),
F.sum(F.when(F.col("event") == "purchase", F.col("value"))).alias("revenue")
)
# Write to ClickHouse
daily_metrics.write \
.format("clickhouse") \
.option("table", "daily_metrics") \
.mode("append") \
.save()
Attribution Engine
Custom-built attribution processing:
// Attribution processor
class AttributionEngine {
private models: Map<string, AttributionModel>;
private lookbackWindow: number; // touchpoint lookback window, in milliseconds
async processConversion(conversion: Conversion): Promise<Attribution[]> {
// Get all touchpoints for this user
const touchpoints = await this.getTouchpoints(
conversion.profileId,
conversion.timestamp,
this.lookbackWindow
);
// Calculate attribution for each model
const results: Attribution[] = [];
for (const [modelName, model] of this.models) {
const credits = model.calculate(touchpoints, conversion);
for (const credit of credits) {
results.push({
conversionId: conversion.id,
model: modelName,
touchpointId: credit.touchpointId,
source: credit.source,
medium: credit.medium,
campaign: credit.campaign,
credit: credit.value,
revenue: conversion.value * credit.value
});
}
}
return results;
}
}
// Attribution models
const models = {
last_click: new LastClickModel(),
first_click: new FirstClickModel(),
linear: new LinearModel(),
time_decay: new TimeDecayModel({ halfLifeDays: 7 }),
position_based: new PositionBasedModel({ first: 0.4, middle: 0.2, last: 0.4 }),
data_driven: new DataDrivenModel({ minConversions: 1000 })
};
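As an example of how a model plugs into this engine, the linear model splits credit evenly across all touchpoints; the interfaces below are simplified assumptions about the shapes used above:
// Minimal, assumed shapes for illustration (the real interfaces are richer)
interface Touchpoint { id: string; source: string; medium: string; campaign: string; }
interface Conversion { id: string; value: number; }
interface Credit { touchpointId: string; source: string; medium: string; campaign: string; value: number; }

class LinearModel {
  // Splits credit evenly across every touchpoint in the conversion path
  calculate(touchpoints: Touchpoint[], _conversion: Conversion): Credit[] {
    if (touchpoints.length === 0) return [];
    const share = 1 / touchpoints.length;
    return touchpoints.map(tp => ({
      touchpointId: tp.id,
      source: tp.source,
      medium: tp.medium,
      campaign: tp.campaign,
      value: share, // credits sum to 1.0, so the engine splits revenue evenly
    }));
  }
}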
Storage Layer
ClickHouse (Analytics)
ClickHouse powers our analytics queries:
-- Table schema optimized for time-series analytics
CREATE TABLE events (
site_id String,
event_name String,
client_id String,
session_id String,
timestamp DateTime64(3),
date Date MATERIALIZED toDate(timestamp),
-- Dimensions
source String,
medium String,
campaign String,
country String,
device_type String,
browser String,
-- Properties (flexible schema)
properties String, -- JSON
-- Metrics
value Float64,
-- Indexes
INDEX idx_source source TYPE bloom_filter GRANULARITY 4,
INDEX idx_campaign campaign TYPE bloom_filter GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (site_id, date, event_name, client_id)
TTL date + INTERVAL 2 YEAR;
-- Materialized view for real-time aggregation
CREATE MATERIALIZED VIEW events_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (site_id, date, hour, source, medium)
AS SELECT
site_id,
date,
toHour(timestamp) as hour,
source,
medium,
count() as events,
uniqExact(client_id) as users,
uniqExact(session_id) as sessions,
sum(value) as revenue
FROM events
GROUP BY site_id, date, hour, source, medium;
Query Performance:
| Query Type | Data Volume | Response Time |
|------------|-------------|---------------|
| Real-time dashboard | Last 24h | < 100ms |
| Weekly report | 7 days | < 500ms |
| Monthly trends | 30 days | < 1s |
| Year-over-year | 365 days | < 5s |
PostgreSQL (Metadata)
PostgreSQL stores configuration and metadata:
-- Site configuration
CREATE TABLE sites (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL REFERENCES workspaces(id),
name TEXT NOT NULL,
domain TEXT NOT NULL,
timezone TEXT DEFAULT 'UTC',
currency TEXT DEFAULT 'USD',
settings JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Conversion goals
CREATE TABLE conversion_goals (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
site_id UUID NOT NULL REFERENCES sites(id),
name TEXT NOT NULL,
event_name TEXT NOT NULL,
value_type TEXT CHECK (value_type IN ('fixed', 'dynamic')),
fixed_value DECIMAL(10, 2),
attribution_window_days INTEGER DEFAULT 30,
counting_method TEXT DEFAULT 'every',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Segments
CREATE TABLE segments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
site_id UUID NOT NULL REFERENCES sites(id),
name TEXT NOT NULL,
description TEXT,
rules JSONB NOT NULL,
user_count INTEGER DEFAULT 0,
last_computed TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Redis (Cache)
Redis provides caching and real-time features:
// Cache layers
const cacheConfig = {
// Query cache (5 min TTL)
queries: {
prefix: 'cache:query:',
ttl: 300,
maxSize: 10000
},
// Session data (30 min TTL)
sessions: {
prefix: 'session:',
ttl: 1800
},
// Rate limiting
rateLimit: {
prefix: 'rate:',
window: 1
},
// Real-time counters
counters: {
prefix: 'counter:',
ttl: 60
}
};
// Real-time visitor count
async function getLiveVisitors(siteId: string): Promise<number> {
const key = `live:${siteId}`;
return await redis.pfcount(key);
}
async function recordVisitor(siteId: string, clientId: string) {
const key = `live:${siteId}`;
await redis.pfadd(key, clientId);
await redis.expire(key, 60); // 1 minute window
}
Query Layer
REST API
RESTful API for all operations:
// API route handlers
app.get('/v1/sites/:siteId/analytics/traffic', async (req, res) => {
const { siteId } = req.params;
const { startDate, endDate } = req.query;
// metrics/dimensions arrive as comma-separated query strings; split them and
// validate against an allow-list before interpolating into the SQL below
const metrics = String(req.query.metrics).split(',');
const dimensions = String(req.query.dimensions).split(',');
// Check cache
const cacheKey = `query:${hash(req.query)}`;
const cached = await redis.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
// Execute query
const result = await clickhouse.query(`
SELECT
${dimensions.join(', ')},
${metrics.map(m => getMetricSQL(m)).join(', ')}
FROM events
WHERE site_id = {siteId:String}
AND date BETWEEN {startDate:Date} AND {endDate:Date}
GROUP BY ${dimensions.join(', ')}
ORDER BY ${getOrderBy(metrics[0])} DESC
LIMIT 1000
`, { siteId, startDate, endDate });
// Cache result
await redis.setex(cacheKey, 300, JSON.stringify(result));
res.json(result);
});
Real-time Subscriptions
WebSocket subscriptions for live data:
// WebSocket handler
wss.on('connection', (ws, req) => {
const { siteId, subscription } = parseAuth(req);
// Subscribe to real-time events
const subscriber = redis.duplicate();
subscriber.subscribe(`events:${siteId}`);
subscriber.on('message', (channel, message) => {
const event = JSON.parse(message);
if (matchesSubscription(event, subscription)) {
ws.send(JSON.stringify({
type: 'event',
data: event
}));
}
});
// Live metrics updates
const metricsInterval = setInterval(async () => {
const metrics = await getLiveMetrics(siteId);
ws.send(JSON.stringify({
type: 'metrics',
data: metrics
}));
}, 5000);
ws.on('close', () => {
subscriber.unsubscribe();
clearInterval(metricsInterval);
});
});
Security Architecture
Data Isolation
┌─────────────────────────────────────────────────────────────────────┐
│ Multi-Tenant Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Workspace A │ │ Workspace B │ │ Workspace C │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Site 1 │ │ Site 3 │ │ Site 5 │ │
│ │ Site 2 │ │ Site 4 │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ Row-Level Security │
│ │ │
│ ┌──────────────────────┴────────────────────────┐ │
│ │ Shared Infrastructure │ │
│ │ • Kafka (partitioned by workspace) │ │
│ │ • ClickHouse (filtered by site_id) │ │
│ │ • PostgreSQL (RLS policies) │ │
│ └───────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
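In practice, tenant scoping is enforced in the query path itself: the workspace is resolved from the authenticated API key and every storage query is filtered by site_id, as the diagram notes. A minimal sketch, with hypothetical helper and client shapes:
// Hypothetical helper and client shapes for illustration
declare function resolveWorkspace(apiKey: string): Promise<{ siteIds: string[] }>;
declare const clickhouse: { query(sql: string, params: Record<string, unknown>): Promise<unknown> };

async function queryTraffic(apiKey: string, siteId: string, range: { start: string; end: string }) {
  // Tenancy is resolved server-side from the API key, never trusted from the request
  const workspace = await resolveWorkspace(apiKey);
  if (!workspace.siteIds.includes(siteId)) {
    throw new Error('forbidden: site does not belong to this workspace');
  }
  // Analytics queries are parameterized and always filtered by site_id
  return clickhouse.query(
    `SELECT count() AS events
       FROM events
      WHERE site_id = {siteId:String}
        AND date BETWEEN {start:Date} AND {end:Date}`,
    { siteId, start: range.start, end: range.end }
  );
}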
Encryption
| Layer | Method |
|-------|--------|
| Transit | TLS 1.3 |
| At Rest | AES-256 |
| API Keys | Argon2id hashing |
| PII | Field-level encryption |
Compliance
- GDPR: Data subject access/deletion requests
- CCPA: Consumer data rights
- SOC 2: Security controls audit
- HIPAA: Available for healthcare (BAA required)
Scalability
Horizontal Scaling
Load Characteristics:
- Events/second: 100K - 10M
- Queries/second: 1K - 100K
- Storage growth: ~1TB/day at scale
Scaling Strategy:
┌─────────────────────────────────────────────────────────────────┐
│ Layer │ Scaling Method │ Current Capacity │
├─────────────────┼─────────────────────────┼────────────────────┤
│ Edge │ Auto-scale CDN │ Unlimited │
│ Ingestion │ Kafka partitions │ 256 → 1024 │
│ Processing │ Flink task slots │ 100 → 1000 │
│ ClickHouse │ Sharding + replicas │ 3 → 100 nodes │
│ PostgreSQL │ Read replicas │ 1 → 10 replicas │
│ Redis │ Cluster mode │ 6 → 60 nodes │
└─────────────────────────────────────────────────────────────────┘
Disaster Recovery
- RPO (Recovery Point Objective): < 1 minute
- RTO (Recovery Time Objective): < 15 minutes
- Multi-region replication: Active-passive with automatic failover
- Backup frequency: Continuous (streaming) + daily snapshots
Next: See Workspaces to learn about organizing your projects and teams.