# Infrastructure Overview
Nex-T1 is designed for production deployment with enterprise-grade infrastructure components, comprehensive monitoring, and flexible orchestration options.

## Architecture Components
- **FastAPI Backend**: high-performance API with LangGraph orchestration, JWT auth, and SSE streaming
- **PostgreSQL + pgvector**: persistent storage with vector similarity search capabilities
- **Observability Stack**: Prometheus metrics + Grafana dashboards for real-time monitoring
- **Container Orchestration**: Docker Compose for single-host, Kubernetes for clustered deployments
## System Architecture

The FastAPI application sits behind a reverse proxy, persists data to PostgreSQL with pgvector, and exports metrics to the Prometheus/Grafana observability stack.

## Docker Compose Deployments

### Local Development

The `docker-compose.yml` configuration provides a complete development environment:
| Service | Image | Purpose | Ports |
|---|---|---|---|
| db | pgvector/pgvector:pg16 | PostgreSQL with vector extensions | 5432 |
| app | Custom build from ./app | FastAPI application | 8000 |
| prometheus | prom/prometheus:latest | Metrics collection | 9090 |
| grafana | grafana/grafana:latest | Metrics visualization | 3000 |
| cadvisor | gcr.io/cadvisor/cadvisor:latest | Container metrics | 8080 |
- Hot-reload enabled with volume mounts (`./app` → `/app`)
- Reads `.env.development` for configuration
- Health checks ensure service readiness
- Shared `monitoring` network for service discovery
- Persistent volumes: `postgres-data`, `grafana-storage`
### Production Deployment

The `docker-compose.prod.yml` configuration includes production hardening:

- Resource limits and reservations
- Enhanced health checks with retry logic
- SSL/TLS termination via Nginx reverse proxy
- Log rotation with size limits
- Custom network subnet (`langgraph-network`)
- Separated data volumes (`postgres-data-prod`, `prometheus-data`)
- Database initialization scripts
Kubernetes Deployment
For multi-node clusters and auto-scaling, use the Kubernetes manifests inkubernetes/deployment.yaml.
Deployment Configuration
Service & Ingress
Horizontal Pod Autoscaler
- Auto-scaling: CPU/memory-based HPA with 3-20 replica range
- Rolling updates: Zero-downtime deployments with surge control
- Health probes: Startup, liveness, and readiness checks
- Security: Non-root user, capability drops, read-only configs
- Observability: Prometheus scrape annotations on pods
- Resource management: Requests and limits for CPU/memory
The manifests also expect the following cluster resources:

- Namespace: `nex-t1` or custom
- Secrets: database credentials, API keys
- ConfigMaps: application configuration
- Ingress Controller: Nginx, Traefik, or cloud provider
- Persistent Volume Claims: for PostgreSQL storage
## Metrics & Observability

### Prometheus Metrics

Nex-T1 exposes comprehensive metrics at `/metrics` via the `starlette_prometheus` middleware. Custom metrics are defined in `app/core/metrics.py`:
| Metric | Type | Description |
|---|---|---|
| `http_requests_total` | Counter | Total HTTP requests by method, path, status |
| `http_request_duration_seconds` | Histogram | Request latency distribution |
| `llm_inference_duration_seconds` | Histogram | LLM API call duration |
| `llm_inference_tokens_total` | Counter | Total tokens consumed, by model |
| `mcp_tool_invocations_total` | Counter | MCP tool usage, by server and tool name |
| `coordinator_execution_duration_seconds` | Histogram | LangGraph coordinator execution time |
| `active_sse_connections` | Gauge | Current streaming connections |
| `database_query_duration_seconds` | Histogram | PostgreSQL query latency |
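As a rough sketch, metrics like these can be declared with `prometheus_client` and exposed by mounting the `starlette_prometheus` middleware; the label names below are assumptions, and the actual definitions in `app/core/metrics.py` may differ:

```python
from fastapi import FastAPI
from prometheus_client import Counter, Gauge, Histogram
from starlette_prometheus import PrometheusMiddleware, metrics

app = FastAPI()

# Illustrative custom metrics; label names are assumptions based on the table above.
llm_inference_duration_seconds = Histogram(
    "llm_inference_duration_seconds", "LLM API call duration", ["model"]
)
llm_inference_tokens_total = Counter(
    "llm_inference_tokens_total", "Total tokens consumed", ["model"]
)
active_sse_connections = Gauge(
    "active_sse_connections", "Current streaming connections"
)

# Standard starlette_prometheus wiring: built-in HTTP metrics plus a /metrics route.
app.add_middleware(PrometheusMiddleware)
app.add_route("/metrics", metrics)
```

Histograms are then observed with `llm_inference_duration_seconds.labels(model=...).observe(elapsed)` and counters incremented with `.inc()`.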
Prometheus scrapes this endpoint according to the scrape configuration in `prometheus/prometheus.yml`.
### Grafana Dashboards

Access Grafana at `http://localhost:3000` (default credentials: admin/admin).
Pre-configured Dashboards:
- API Overview: Request rates, latency percentiles, error rates
- LLM Performance: Token usage, inference time, model breakdown
- Multi-Agent System: Route distribution, coordinator timings, tool usage
- Infrastructure: CPU, memory, disk I/O, network throughput
- Database: Query performance, connection pool, cache hit rates
### Structured Logging

Nex-T1 uses `structlog` for JSON-formatted logs optimized for log aggregation systems.

Configuration (`app/core/logging.py`):

- File: `./logs/app.log` (JSONL format, rotated daily)
- Console: pretty-printed in development, JSON in production
- Environment-aware: configured via the `APP_ENV` setting
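A minimal sketch of this kind of setup, assuming stock `structlog` processors (the actual `app/core/logging.py` may add file rotation and environment switching on top):

```python
import logging

import structlog

# JSON logs with ISO timestamps and log levels, suitable for aggregation systems.
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,   # request-scoped context
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)

log = structlog.get_logger()
log.info("request_completed", path="/api/v1/chat", status=200, duration_ms=42)
# -> {"path": "/api/v1/chat", "status": 200, "duration_ms": 42,
#     "event": "request_completed", "level": "info", "timestamp": "..."}
```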
## Security & Rate Limiting

### Authentication

All protected endpoints require JWT bearer authentication:

- User registers/logs in → receives a JWT access token
- Token is included in the `Authorization` header on subsequent requests
- Token is validated on each request via the `JWTBearer` dependency
- Expired tokens are rejected with a 401 status

Token creation and validation live in `app/core/security.py`.
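A sketch of that flow using FastAPI's `HTTPBearer` and PyJWT; the library choice and constant names here are assumptions, not necessarily what `app/core/security.py` uses:

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT; the project may use python-jose instead
from fastapi import HTTPException, Request
from fastapi.security import HTTPBearer

SECRET_KEY = "change-me"  # loaded from settings/secrets in a real deployment
ALGORITHM = "HS256"

def create_access_token(subject: str, expires_minutes: int = 30) -> str:
    """Issue a signed token with an expiry claim."""
    payload = {
        "sub": subject,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)

class JWTBearer(HTTPBearer):
    """Dependency that validates the bearer token on every request."""

    async def __call__(self, request: Request) -> str:
        credentials = await super().__call__(request)
        try:
            jwt.decode(credentials.credentials, SECRET_KEY, algorithms=[ALGORITHM])
        except jwt.PyJWTError:
            # Covers expired signatures too -> rejected with 401, as described above.
            raise HTTPException(status_code=401, detail="Invalid or expired token")
        return credentials.credentials
```

Protected endpoints then declare `token: str = Depends(JWTBearer())` to require authentication.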
### CORS Configuration

Cross-Origin Resource Sharing is controlled via `settings.ALLOWED_ORIGINS`.
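Typical wiring with FastAPI's standard `CORSMiddleware`; the example origin and the remaining options are assumptions:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://app.nex-t1.ai"],  # settings.ALLOWED_ORIGINS in the real app
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```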
### Rate Limiting

Rate limiting is implemented via `slowapi` with per-endpoint and default limits, configured in `app/core/limiter.py`.
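A minimal `slowapi` setup consistent with that description; the specific limit strings are assumptions:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# app/core/limiter.py: shared limiter with a default limit per client IP.
limiter = Limiter(key_func=get_remote_address, default_limits=["60/minute"])

# app/main.py: register the limiter and its 429 handler.
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/v1/health")
@limiter.limit("10/minute")  # per-endpoint override
async def health(request: Request):  # slowapi requires the Request parameter
    return {"status": "ok"}
```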
## API Documentation

Nex-T1 provides multiple interactive documentation interfaces:

| Interface | URL | Description |
|---|---|---|
| Swagger UI | /docs | OpenAPI interactive explorer with “Try it out” |
| Redoc | /redoc | Clean, responsive API reference |
| Scalar | /reference | Modern API documentation with examples |
| OpenAPI JSON | /api/v1/openapi.json | Machine-readable OpenAPI 3.1 spec |
Scalar reads its configuration from `/scalar.config.json`.
## Background Schedulers

### Market Overview Scheduler

Optional automated market snapshots for research agents. Configuration:

- Morning run: 08:00 America/New_York
- Evening run: 20:00 America/New_York
- Aggregates market data from MCP servers (DeFiLlama, Binance, etc.)
- Stores snapshots in the database for fast retrieval
- Used by the research agent for market analysis queries
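The scheduler implementation isn't shown here; the following is a sketch of the cron wiring under the assumption of an APScheduler-style setup, with `capture_market_snapshot` as a hypothetical job name:

```python
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = AsyncIOScheduler(timezone="America/New_York")

async def capture_market_snapshot() -> None:
    """Aggregate MCP market data (DeFiLlama, Binance, ...) and persist it."""
    ...  # placeholder: fetch, aggregate, store in the database

# Morning (08:00) and evening (20:00) snapshots, as listed above.
scheduler.add_job(capture_market_snapshot, CronTrigger(hour=8, minute=0))
scheduler.add_job(capture_market_snapshot, CronTrigger(hour=20, minute=0))
scheduler.start()  # typically called from the FastAPI startup hook
```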
### Task Scheduler

Background task execution framework for async operations. Features:

- Job queuing and prioritization
- Retry logic with exponential backoff
- Task status tracking and history
- Started automatically on app startup
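The retry behavior described above follows the standard exponential-backoff pattern; a minimal illustration, not the project's actual task framework:

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def run_with_retry(
    task: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> T:
    """Retry an async task with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await task()
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles per attempt (1s, 2s, 4s, ...) plus random jitter.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.random())
    raise RuntimeError("unreachable")
```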
## MCP (Model Context Protocol) Integration

Nex-T1 optionally mounts MCP servers for extended capabilities; the servers are configured and mounted in `app/main.py`:
| Server | Purpose | Endpoints |
|---|---|---|
| DeFiLlama | Protocol TVL, yields, market data | /api/v1/multi-agent/defillama/* |
| Binance | Exchange data, order books, tickers | /api/v1/multi-agent/binance/* |
| Bitcoin | Blockchain queries, transaction data | /api/v1/multi-agent/bitcoin/* |
| Exa | Web search and content extraction | /api/v1/multi-agent/exa/* |
See `docs/EXA_MCP.md` for API key setup and configuration.

Tool invocation:
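A hypothetical client-side invocation of a mounted MCP route with `httpx`; the path segment after the `/defillama/` prefix and the response shape are assumptions based on the endpoint table above:

```python
import asyncio

import httpx

async def get_defillama_data(token: str) -> dict:
    """Call a DeFiLlama MCP route through the Nex-T1 API (illustrative path)."""
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.get(
            "/api/v1/multi-agent/defillama/tvl",  # hypothetical tool endpoint
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return resp.json()

# asyncio.run(get_defillama_data("eyJ..."))  # token from the login endpoint
```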
## Environment Variables

Infrastructure configuration is read from the environment files (`.env.development`, `.env.production`) and covers the following categories:

- Application
- Database
- Authentication
- LLM Providers
- Security
- Observability
- Schedulers
- MCP Servers
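As an illustration of how these variables surface in the application, a `pydantic-settings` class could expose them like this; only names that appear elsewhere on this page are shown, and the defaults are assumptions:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Illustrative subset of the infrastructure settings."""

    model_config = SettingsConfigDict(env_file=".env.development")

    APP_ENV: str = "development"          # selects logging/console behavior
    DB_POOL_SIZE: int = 5                 # referenced in Troubleshooting below
    DB_MAX_OVERFLOW: int = 10
    ALLOWED_ORIGINS: list[str] = ["http://localhost:3000"]  # CORS allow-list

settings = Settings()
```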
## Deployment Checklist

### 1. Pre-deployment

- Update `.env.production` with secure secrets
- Configure database backup strategy
- Set up SSL certificates (Let’s Encrypt or cloud provider)
- Configure DNS records for domain
- Review rate limits and resource quotas

### 2. Infrastructure Setup

- Provision compute resources (VMs, Kubernetes cluster)
- Deploy PostgreSQL with pgvector extension
- Set up Prometheus + Grafana monitoring
- Configure reverse proxy (Nginx/Traefik)
- Set up log aggregation (ELK, Loki, cloud logging)

### 3. Application Deployment

- Build Docker image: `docker build -t nexis/nex-t1:latest .`
- Push to registry: `docker push nexis/nex-t1:latest`
- Deploy with Compose or Kubernetes manifests
- Run database migrations
- Verify health checks: `curl https://api.nex-t1.ai/health`

### 4. Post-deployment Verification

- Test authentication flow (register/login)
- Test chat endpoints (non-streaming and SSE)
- Test multi-agent routes (quote, execute, risk)
- Verify MCP tool integrations
- Check metrics in Grafana dashboards
- Review logs for errors

### 5. Monitoring & Maintenance

- Set up alerting rules (Prometheus Alertmanager)
- Configure backup automation (database, logs)
- Document incident response procedures
- Schedule regular security audits
- Plan capacity scaling based on load
## Troubleshooting

### Common Issues

#### Database connection failures

Symptoms:

- API returns 500 errors on all requests
- Logs show `sqlalchemy.exc.OperationalError`

Resolution:

- Verify the database is running: `docker ps | grep db`
- Check the connection string in `.env`
- Ensure the pgvector extension is installed: `CREATE EXTENSION IF NOT EXISTS vector;`
- Test the connection: `psql -h localhost -U nexis -d nexis`
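To confirm both connectivity and the pgvector extension from Python, a quick check along these lines can help (the connection URL is a placeholder):

```python
import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def check_db(url: str) -> None:
    engine = create_async_engine(url)
    async with engine.connect() as conn:
        # scalar is None if the pgvector extension is missing.
        version = (
            await conn.execute(
                text("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
            )
        ).scalar_one_or_none()
        print("pgvector version:", version)
    await engine.dispose()

asyncio.run(check_db("postgresql+asyncpg://nexis:password@localhost:5432/nexis"))
```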
#### High memory usage

Symptoms:

- API container using >8GB RAM
- OOMKilled events in Kubernetes

Resolution:

- Review connection pool settings (`DB_POOL_SIZE`, `DB_MAX_OVERFLOW`)
- Check for memory leaks in LLM client connections
- Reduce concurrent requests with rate limiting
- Increase container memory limits
- Enable request streaming to reduce buffering
#### Slow LLM responses

Symptoms:

- Chat requests time out after 30s
- `llm_inference_duration_seconds` metric very high

Resolution:

- Check LLM provider status and quotas
- Verify API keys are valid and not rate-limited
- Use streaming endpoints for faster perceived performance
- Implement response caching for common queries (see the sketch below)
- Consider using smaller/faster models for simple tasks
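One way to implement the caching suggestion is a small TTL cache keyed on the prompt; this in-memory sketch is illustrative only, and a shared store such as Redis would be needed across replicas:

```python
import hashlib
import time

_cache: dict[str, tuple[float, str]] = {}

def _key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode()).hexdigest()

def get_cached(prompt: str, ttl: float = 300.0) -> str | None:
    """Return a cached LLM response if it is fresher than ttl seconds."""
    hit = _cache.get(_key(prompt))
    if hit and time.monotonic() - hit[0] < ttl:
        return hit[1]
    return None

def store(prompt: str, response: str) -> None:
    _cache[_key(prompt)] = (time.monotonic(), response)
```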
#### Metrics not appearing in Grafana

Symptoms:

- Empty dashboards in Grafana
- Prometheus targets show “DOWN”

Resolution:

- Verify the `/metrics` endpoint is accessible: `curl http://localhost:8000/metrics`
- Check Prometheus targets at `http://localhost:9090/targets`
- Review the `prometheus/prometheus.yml` configuration
- Ensure services are on the same Docker network
- Check firewall rules for port 8000
## Performance Tuning

### Database Optimization

Tune the connection pool (`DB_POOL_SIZE`, `DB_MAX_OVERFLOW`) to match expected concurrency; undersized pools queue requests, while oversized ones can exhaust PostgreSQL connections.
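A sketch of how those pool settings typically feed into the SQLAlchemy engine; the exact parameter plumbing in the project is an assumption:

```python
from sqlalchemy.ext.asyncio import create_async_engine

# Hypothetical wiring of the pool settings referenced above.
engine = create_async_engine(
    "postgresql+asyncpg://nexis:password@db:5432/nexis",
    pool_size=5,          # settings.DB_POOL_SIZE: steady-state connections
    max_overflow=10,      # settings.DB_MAX_OVERFLOW: burst headroom
    pool_pre_ping=True,   # evict dead connections before use
)
```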
### Application Tuning

Prefer streaming endpoints to reduce buffering, cache frequent LLM responses, and keep rate limits aligned with available capacity.
### Kubernetes Resource Optimization

Set resource requests and limits from observed usage, and let the HPA (3-20 replicas) absorb load spikes.
## Support & Resources

- **Docker Compose Files**: `docker-compose.yml` and `docker-compose.prod.yml`
- **Kubernetes Manifests**: Deployment, Service, HPA, and Ingress configs
- **Grafana Dashboards**: pre-built dashboard JSON for import

For production support, security inquiries, or custom deployment assistance, contact [email protected].