
Infrastructure Overview

Nex-T1 is designed for production deployment with enterprise-grade infrastructure components, comprehensive monitoring, and flexible orchestration options.

Architecture Components

FastAPI Backend

High-performance API with LangGraph orchestration, JWT auth, and SSE streaming

PostgreSQL + pgvector

Persistent storage with vector similarity search capabilities

Observability Stack

Prometheus metrics + Grafana dashboards for real-time monitoring

Container Orchestration

Docker Compose for single-host, Kubernetes for clustered deployments

System Architecture

[Diagram: Nex-T1 Infrastructure Architecture]

Docker Compose Deployments

Local Development

The docker-compose.yml configuration provides a complete development environment:
Service    | Image                            | Purpose                            | Ports
db         | pgvector/pgvector:pg16           | PostgreSQL with vector extensions  | 5432
app        | Custom build from ./app          | FastAPI application                | 8000
prometheus | prom/prometheus:latest           | Metrics collection                 | 9090
grafana    | grafana/grafana:latest           | Metrics visualization              | 3000
cadvisor   | gcr.io/cadvisor/cadvisor:latest  | Container metrics                  | 8080
Quick Start:
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app

# Check health
curl http://localhost:8000/health

# Access Grafana
open http://localhost:3000
Development Features:
  • Hot-reload enabled with volume mounts (./app/app)
  • Reads .env.development for configuration
  • Health checks ensure service readiness
  • Shared monitoring network for service discovery
  • Persistent volumes: postgres-data, grafana-storage

Production Deployment

The docker-compose.prod.yml configuration includes production hardening:
# docker-compose.prod.yml highlights
services:
  db:
    environment:
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    volumes:
      - postgres-data-prod:/var/lib/postgresql/data
      - ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5

  app:
    environment:
      - APP_ENV=production
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - app
Production Features:
  • Resource limits and reservations
  • Enhanced health checks with retry logic
  • SSL/TLS termination via Nginx reverse proxy
  • Log rotation with size limits
  • Custom network subnet (langgraph-network)
  • Separated data volumes (postgres-data-prod, prometheus-data)
  • Database initialization scripts
Deployment Commands:
# Deploy production stack
docker-compose -f docker-compose.prod.yml up -d

# Update application only
docker-compose -f docker-compose.prod.yml up -d --no-deps app

# View resource usage
docker stats

# Backup database
docker exec -t db pg_dump -U ${DB_USER} nexis > backup.sql

Kubernetes Deployment

For multi-node clusters and auto-scaling, use the Kubernetes manifests in kubernetes/deployment.yaml.

Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nex-t1-api
  labels:
    app: nex-t1
    component: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nex-t1
      component: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: nex-t1
        component: api
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: api
        image: nexis/nex-t1:latest
        ports:
        - containerPort: 8000
          name: http
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
        startupProbe:
          httpGet:
            path: /health
            port: 8000
          failureThreshold: 30
          periodSeconds: 10
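All three probes target the same /health route served by the application. A minimal sketch of such an endpoint (FastAPI; the handler's actual module and response shape are assumptions):
from fastapi import APIRouter

router = APIRouter()

@router.get("/health")
async def health() -> dict:
    # Kept deliberately cheap: probes run every few seconds,
    # so avoid expensive dependency checks here
    return {"status": "ok"}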

Service & Ingress

apiVersion: v1
kind: Service
metadata:
  name: nex-t1-api
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8000"
    prometheus.io/path: "/metrics"
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8000
    protocol: TCP
  selector:
    app: nex-t1
    component: api

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nex-t1-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nex-t1-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
Kubernetes Features:
  • Auto-scaling: CPU/memory-based HPA with 3-20 replica range
  • Rolling updates: Zero-downtime deployments with surge control
  • Health probes: Startup, liveness, and readiness checks
  • Security: Non-root user, capability drops, read-only configs
  • Observability: Prometheus scrape annotations on pods
  • Resource management: Requests and limits for CPU/memory
Prerequisites:
  • Namespace: nex-t1 or custom
  • Secrets: Database credentials, API keys
  • ConfigMaps: Application configuration
  • Ingress Controller: Nginx, Traefik, or cloud provider
  • Persistent Volume Claims: For PostgreSQL storage
Deployment Commands:
# Create namespace
kubectl create namespace nex-t1

# Apply secrets (base64 encoded)
kubectl apply -f kubernetes/secrets.yaml -n nex-t1

# Deploy application
kubectl apply -f kubernetes/deployment.yaml -n nex-t1

# Check rollout status
kubectl rollout status deployment/nex-t1-api -n nex-t1

# Scale manually
kubectl scale deployment nex-t1-api --replicas=5 -n nex-t1

# View logs
kubectl logs -f deployment/nex-t1-api -n nex-t1

# Port forward for testing
kubectl port-forward svc/nex-t1-api 8000:80 -n nex-t1

Metrics & Observability

Prometheus Metrics

Nex-T1 exposes comprehensive metrics at /metrics via the starlette_prometheus middleware. Configuration (app/core/metrics.py):
from starlette_prometheus import metrics, PrometheusMiddleware

app.add_middleware(PrometheusMiddleware)
app.add_route("/metrics", metrics)
Custom Metrics Tracked:
Metric                                  | Type      | Description
http_requests_total                     | Counter   | Total HTTP requests by method, path, status
http_request_duration_seconds           | Histogram | Request latency distribution
llm_inference_duration_seconds          | Histogram | LLM API call duration
llm_inference_tokens_total              | Counter   | Total tokens consumed by model
mcp_tool_invocations_total              | Counter   | MCP tool usage by server and tool name
coordinator_execution_duration_seconds  | Histogram | LangGraph coordinator execution time
active_sse_connections                  | Gauge     | Current streaming connections
database_query_duration_seconds         | Histogram | PostgreSQL query latency
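These application-level metrics would be registered with the Prometheus client library alongside the middleware; a sketch for two of them (metric names come from the table above, the label sets are assumptions):
from prometheus_client import Counter, Histogram

llm_inference_tokens_total = Counter(
    "llm_inference_tokens_total",
    "Total tokens consumed by model",
    labelnames=["provider", "model"],
)

llm_inference_duration_seconds = Histogram(
    "llm_inference_duration_seconds",
    "LLM API call duration in seconds",
    labelnames=["provider", "model"],
)

# Inside the LLM client wrapper (illustrative):
# with llm_inference_duration_seconds.labels("anthropic", model).time():
#     response = await client.messages.create(...)
# llm_inference_tokens_total.labels("anthropic", model).inc(tokens_used)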
Prometheus Configuration (prometheus/prometheus.yml):
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'nex-t1-api'
    static_configs:
      - targets: ['app:8000']
    metrics_path: '/metrics'

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Grafana Dashboards

Access Grafana at http://localhost:3000 (default: admin/admin). Pre-configured Dashboards:
  1. API Overview: Request rates, latency percentiles, error rates
  2. LLM Performance: Token usage, inference time, model breakdown
  3. Multi-Agent System: Route distribution, coordinator timings, tool usage
  4. Infrastructure: CPU, memory, disk I/O, network throughput
  5. Database: Query performance, connection pool, cache hit rates
Sample PromQL Queries:
# Request rate by endpoint
rate(http_requests_total[5m])

# 95th percentile latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])

# LLM token usage by model
sum by (model) (rate(llm_inference_tokens_total[1h]))

Structured Logging

Nex-T1 uses structlog for JSON-formatted logs optimized for log aggregation systems. Configuration (app/core/logging.py):
import structlog

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    cache_logger_on_first_use=True,
)
Log Output:
  • File: ./logs/app.log (JSONL format, rotated daily)
  • Console: Pretty-printed in development, JSON in production
  • Environment-aware: Configured via APP_ENV setting
Example Log Entry:
{
  "event": "chat_request_completed",
  "level": "info",
  "logger": "app.api.chatbot",
  "timestamp": "2025-01-15T10:30:45.123456Z",
  "user_id": "usr_abc123",
  "session_id": "ses_xyz789",
  "message_count": 5,
  "duration_ms": 1250,
  "llm_provider": "anthropic",
  "tokens_used": 850
}
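An entry like the one above is produced by a single structured log call at the end of the request handler; a sketch (field values are illustrative):
import structlog

logger = structlog.get_logger("app.api.chatbot")

# Each keyword argument becomes a key in the JSON log line
logger.info(
    "chat_request_completed",
    user_id="usr_abc123",
    session_id="ses_xyz789",
    message_count=5,
    duration_ms=1250,
    llm_provider="anthropic",
    tokens_used=850,
)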

Security & Rate Limiting

Authentication

All protected endpoints require JWT bearer authentication:
Authorization: Bearer <jwt_token>
Token Flow:
  1. User registers/logs in → receives JWT access token
  2. Token included in Authorization header for subsequent requests
  3. Token validated on each request via JWTBearer dependency
  4. Expired tokens rejected with 401 status
Configuration (app/core/security.py):
from datetime import timedelta

JWT_SECRET_KEY = settings.JWT_SECRET_KEY
JWT_ALGORITHM = "HS256"
JWT_ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 24  # 24 hours
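The token flow above reduces to signing and verifying HS256 tokens with these settings. A minimal sketch, assuming the python-jose library (the project's actual helper names in app/core/security.py may differ):
from datetime import datetime, timedelta, timezone

from jose import JWTError, jwt

def create_access_token(subject: str) -> str:
    # Claims carry the subject and an absolute expiry timestamp
    expires = datetime.now(timezone.utc) + timedelta(minutes=JWT_ACCESS_TOKEN_EXPIRE_MINUTES)
    return jwt.encode({"sub": subject, "exp": expires}, JWT_SECRET_KEY, algorithm=JWT_ALGORITHM)

def decode_access_token(token: str) -> dict:
    # JWTError covers bad signatures and expired tokens; the JWTBearer
    # dependency turns this into a 401 response
    try:
        return jwt.decode(token, JWT_SECRET_KEY, algorithms=[JWT_ALGORITHM])
    except JWTError as exc:
        raise ValueError("Invalid or expired token") from exc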

CORS Configuration

Cross-Origin Resource Sharing controlled via settings.ALLOWED_ORIGINS:
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_ORIGINS,  # ["https://nex-t1.ai"]
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Rate Limiting

Implemented via slowapi with per-endpoint and default limits. Configuration (app/core/limiter.py):
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

# Default limits
RATE_LIMIT_DEFAULT = "100/minute"

# Per-endpoint limits
RATE_LIMIT_ENDPOINTS = {
    "/api/v1/chatbot/chat": "20/minute",
    "/api/v1/multi-agent/execute": "10/minute",
    "/api/v1/auth/register": "5/hour",
}
Example Usage:
from app.core.limiter import limiter

@router.post("/chat")
@limiter.limit("20/minute")
async def chat_endpoint(request: Request, ...):
    # Endpoint logic
    pass
Rate Limit Headers:
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1642262400
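For the default limit and the X-RateLimit-* headers to take effect, the limiter has to be wired into the application; a sketch of slowapi's standard setup in app/main.py (the project's exact wiring may differ):
from slowapi import _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.middleware import SlowAPIMiddleware

from app.core.limiter import limiter

app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
app.add_middleware(SlowAPIMiddleware)  # enforces the default limit on all routes

# Note: the X-RateLimit-* response headers are emitted when the Limiter is
# constructed with headers_enabled=True (an assumption about this setup)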

API Documentation

Nex-T1 provides multiple interactive documentation interfaces:
Interface    | URL                  | Description
Swagger UI   | /docs                | OpenAPI interactive explorer with “Try it out”
Redoc        | /redoc               | Clean, responsive API reference
Scalar       | /reference           | Modern API documentation with examples
OpenAPI JSON | /api/v1/openapi.json | Machine-readable OpenAPI 3.1 spec
Convenience Redirect:
/openapi.json → /api/v1/openapi.json
Scalar Configuration (/scalar.config.json):
{
  "theme": "purple",
  "layout": "modern",
  "showSidebar": true,
  "withDefaultFonts": true
}

Background Schedulers

Market Overview Scheduler

Optional automated market snapshots for research agents. Configuration:
ENABLE_MARKET_OVERVIEW_SCHEDULER=true  # Set in .env
Schedule:
  • Morning: 08:00 America/New_York
  • Evening: 20:00 America/New_York
Function:
  • Aggregates market data from MCP servers (DeFiLlama, Binance, etc.)
  • Stores snapshot in database for fast retrieval
  • Used by research agent for market analysis queries
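A cron-style sketch of the two daily runs described above, assuming APScheduler (the project's actual scheduler implementation is not shown here):
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = AsyncIOScheduler(timezone="America/New_York")

async def capture_market_overview() -> None:
    # Aggregate MCP market data (DeFiLlama, Binance, ...) and persist the snapshot
    ...

# 08:00 and 20:00 America/New_York, matching the schedule above
scheduler.add_job(capture_market_overview, CronTrigger(hour="8,20", minute=0))
scheduler.start()  # call from the running event loop, e.g. a FastAPI startup hook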

Task Scheduler

Background task execution framework for async operations. Features:
  • Job queuing and prioritization
  • Retry logic with exponential backoff
  • Task status tracking and history
  • Started automatically on app startup
Example Task:
from app.services.scheduler import task_scheduler

@task_scheduler.register
async def process_pending_transactions():
    # Background task logic
    pass

MCP (Model Context Protocol) Integration

Nex-T1 optionally mounts MCP servers for extended capabilities. Configuration (app/main.py):
ENABLE_MCP_SERVERS=true  # Set in .env

if settings.ENABLE_MCP_SERVERS:
    await mount_mcp_servers(app)
Available MCP Servers:
Server    | Purpose                               | Endpoints
DeFiLlama | Protocol TVL, yields, market data     | /api/v1/multi-agent/defillama/*
Binance   | Exchange data, order books, tickers   | /api/v1/multi-agent/binance/*
Bitcoin   | Blockchain queries, transaction data  | /api/v1/multi-agent/bitcoin/*
Exa       | Web search and content extraction     | /api/v1/multi-agent/exa/*
Exa MCP Details: See docs/EXA_MCP.md for API key setup and configuration.

Tool Invocation:
# List available tools
curl https://api.nex-t1.ai/api/v1/multi-agent/defillama/tools

# Invoke tool
curl -X POST https://api.nex-t1.ai/api/v1/multi-agent/defillama/invoke \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tool_name": "get_protocol_tvl",
    "arguments": {"protocol": "uniswap"}
  }'
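The same invocation from Python, sketched with httpx (endpoint, payload, and token handling mirror the curl example above):
import os

import httpx

response = httpx.post(
    "https://api.nex-t1.ai/api/v1/multi-agent/defillama/invoke",
    headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    json={"tool_name": "get_protocol_tvl", "arguments": {"protocol": "uniswap"}},
    timeout=30.0,
)
response.raise_for_status()
print(response.json())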

Environment Variables

Complete reference for infrastructure configuration:

Application

APP_ENV=production                    # Environment: development|production
APP_NAME="Nex-T1"                     # Application name
APP_VERSION="1.0.0"                   # Version string
DEBUG=false                           # Debug mode
LOG_LEVEL=info                        # Logging level

Database

DB_HOST=localhost                     # PostgreSQL host
DB_PORT=5432                          # PostgreSQL port
DB_USER=nexis                         # Database user
DB_PASSWORD=<secure-password>         # Database password
DB_NAME=nexis                         # Database name
DB_POOL_SIZE=20                       # Connection pool size
DB_MAX_OVERFLOW=40                    # Max overflow connections

Authentication

JWT_SECRET_KEY=<secret-key>           # JWT signing key (CRITICAL)
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=1440  # Token expiry (24 hours)
SESSION_SECRET_KEY=<secret-key>       # Session encryption key

LLM Providers

ANTHROPIC_API_KEY=<api-key>           # Anthropic Claude
OPENAI_API_KEY=<api-key>              # OpenAI GPT models
DEFAULT_LLM_PROVIDER=anthropic        # Default provider
DEFAULT_MODEL=claude-3-5-sonnet-20241022  # Default model

Security

ALLOWED_ORIGINS=["https://nex-t1.ai"] # CORS origins
RATE_LIMIT_DEFAULT=100/minute         # Default rate limit

Observability

ENABLE_METRICS=true                   # Prometheus metrics
ENABLE_TRACING=false                  # Distributed tracing
SENTRY_DSN=<dsn>                      # Error tracking

Schedulers

ENABLE_MARKET_OVERVIEW_SCHEDULER=true # Market snapshot scheduler
ENABLE_TASK_SCHEDULER=true            # Background task scheduler

MCP Servers

ENABLE_MCP_SERVERS=true               # Mount MCP servers
EXA_API_KEY=<api-key>                 # Exa search API key
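The code samples in earlier sections read these values through a settings object; a minimal sketch of how such a class could be declared with pydantic-settings (the project's actual config module is not shown, and only a few fields are listed):
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Values come from the environment or the .env file at startup
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    APP_ENV: str = "development"
    DB_HOST: str = "localhost"
    DB_PORT: int = 5432
    JWT_SECRET_KEY: str                  # required; no default on purpose
    ALLOWED_ORIGINS: list[str] = ["https://nex-t1.ai"]
    ENABLE_MCP_SERVERS: bool = False

settings = Settings()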

Deployment Checklist

1. Pre-deployment

  • Update .env.production with secure secrets
  • Configure database backup strategy
  • Set up SSL certificates (Let’s Encrypt or cloud provider)
  • Configure DNS records for domain
  • Review rate limits and resource quotas
2. Infrastructure Setup

  • Provision compute resources (VMs, Kubernetes cluster)
  • Deploy PostgreSQL with pgvector extension
  • Set up Prometheus + Grafana monitoring
  • Configure reverse proxy (Nginx/Traefik)
  • Set up log aggregation (ELK, Loki, cloud logging)
3. Application Deployment

  • Build Docker image: docker build -t nexis/nex-t1:latest .
  • Push to registry: docker push nexis/nex-t1:latest
  • Deploy with Compose or Kubernetes manifests
  • Run database migrations
  • Verify health checks: curl https://api.nex-t1.ai/health
4. Post-deployment Verification

  • Test authentication flow (register/login)
  • Test chat endpoints (non-streaming and SSE)
  • Test multi-agent routes (quote, execute, risk)
  • Verify MCP tool integrations
  • Check metrics in Grafana dashboards
  • Review logs for errors
5. Monitoring & Maintenance

  • Set up alerting rules (Prometheus Alertmanager)
  • Configure backup automation (database, logs)
  • Document incident response procedures
  • Schedule regular security audits
  • Plan capacity scaling based on load

Troubleshooting

Common Issues

Database Connection Errors

Symptoms:
  • API returns 500 errors on all requests
  • Logs show sqlalchemy.exc.OperationalError
Solutions:
  1. Verify database is running: docker ps | grep db
  2. Check connection string in .env
  3. Ensure pgvector extension installed: CREATE EXTENSION IF NOT EXISTS vector;
  4. Test connection: psql -h localhost -U nexis -d nexis
High Memory Usage

Symptoms:
  • API container using >8GB RAM
  • OOMKilled events in Kubernetes
Solutions:
  1. Review connection pool settings (DB_POOL_SIZE, DB_MAX_OVERFLOW)
  2. Check for memory leaks in LLM client connections
  3. Reduce concurrent requests with rate limiting
  4. Increase container memory limits
  5. Enable request streaming to reduce buffering
Slow LLM Responses

Symptoms:
  • Chat requests timeout after 30s
  • llm_inference_duration_seconds metric very high
Solutions:
  1. Check LLM provider status and quotas
  2. Verify API keys are valid and not rate-limited
  3. Use streaming endpoints for faster perceived performance
  4. Implement response caching for common queries
  5. Consider using smaller/faster models for simple tasks
Missing Metrics in Grafana

Symptoms:
  • Empty dashboards in Grafana
  • Prometheus targets show “DOWN”
Solutions:
  1. Verify /metrics endpoint accessible: curl http://localhost:8000/metrics
  2. Check Prometheus targets: http://localhost:9090/targets
  3. Review prometheus/prometheus.yml configuration
  4. Ensure services on same Docker network
  5. Check firewall rules for port 8000

Performance Tuning

Database Optimization

-- Create indexes for frequent queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_chat_history_session ON chat_history(session_id, created_at DESC);

-- Enable query statistics
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Analyze slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Application Tuning

# Increase worker processes
uvicorn app.main:app --workers 4 --host 0.0.0.0 --port 8000

# Tune connection pool
DATABASE_URL = "postgresql://user:pass@host/db?pool_size=20&max_overflow=40"

# HTTP/2: Uvicorn serves HTTP/1.1 only (--http selects the h11 or httptools
# implementation); terminate HTTP/2 and TLS at the Nginx reverse proxy in front
# of the app, or run under an HTTP/2-capable server such as Hypercorn if needed
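On the connection-pool line above: whether pool parameters in the URL query string are honored depends on the driver, so passing them to the engine directly is the unambiguous route. A sketch, assuming SQLAlchemy's asyncio engine with asyncpg (the project's actual session setup is not shown):
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host/db",
    pool_size=20,        # matches DB_POOL_SIZE
    max_overflow=40,     # matches DB_MAX_OVERFLOW
    pool_pre_ping=True,  # recycle dead connections before use
)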

Kubernetes Resource Optimization

# Right-size resource requests
resources:
  requests:
    cpu: "500m"      # 0.5 CPU cores
    memory: "1Gi"    # 1 GB RAM
  limits:
    cpu: "2000m"     # 2 CPU cores
    memory: "4Gi"    # 4 GB RAM

# Enable horizontal scaling
hpa:
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilization: 70
  targetMemoryUtilization: 80

Support & Resources

For production support, security inquiries, or custom deployment assistance, contact [email protected].