This document shows how the pieces fit together. The gateway handles auth and routing, individual services handle forecasting and thesis evaluation, and a shared schema package keeps everything speaking the same language. Each piece can be developed and tested on its own, which makes the whole thing easier to work with.
If you want to contribute or just poke around to see how it works, start here.
Three main areas: apps for the frontend and gateway, services for backend logic, and packages for shared code. Infra config lives in its own folder.
```
repo/
├── apps/
│   ├── web/              # Next.js frontend
│   └── api-gateway/      # FastAPI with auth, routing, OpenAPI
├── services/
│   ├── data_state/       # Data ingestion and feature computation
│   ├── forecast/         # Deep learning and tree ensemble inference
│   ├── calibration/      # Temperature scaling, isotonic, conformal
│   ├── narrative/        # LLM parsing and claim templates
│   ├── explain_redteam/  # SHAP, integrated gradients, analog search
│   ├── metrics/          # Calibration dashboards and evaluation
│   └── backtest/         # Time machine and walk-forward replay
├── packages/
│   ├── schemas/          # Shared Pydantic models
│   ├── client-js/        # Generated JavaScript SDK
│   └── client-py/        # Generated Python SDK
└── infra/
    ├── docker/           # Container definitions
    ├── k8s/              # Kubernetes manifests
    └── terraform/        # Infrastructure as code
```
All requests go through the gateway. It handles auth and rate limiting, and routes each request to the appropriate service.
The main endpoint. Send a list of tickers, pick your horizons, say what events you care about. You get back quantile distributions, event probabilities, and regime info. The thermostat controls how picky the system is: crank it up for fewer but more confident calls.
"abstain": true with a reason like "low_signal_to_noise_in_current_regime". Silence beats noise.
See exactly what the model saw when it made a call. Returns the feature vector for a ticker at a given time, plus provenance info that proves no future data snuck in.
Got a thesis? This tells you if the data backs it up. Pass a claim type like "momentum_overheated" and get back the relevant metrics, where they sit historically, and what would flip the verdict.
The thesis translator takes your investment idea in plain English and turns it into structured claims we can test. An LLM does the parsing, but evidence scoring is fully deterministic. The LLM formats and categorizes. It never makes up data.
Claim Taxonomy
Ten claim types to start. Each one maps to specific metrics and scoring logic.
Paste a headline or thesis. The system pulls out entities and maps statements to claim types. Output is validated against a strict schema. Bad claim type or missing entities? Rejected.
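A simplified stand-in for that validation step. The real schema lives in packages/schemas as Pydantic models; the claim-type names and dict shape here are illustrative only.

```python
# Toy claim validation: unknown claim types and empty entity lists are
# rejected. The allowed set below is a sample, not the full taxonomy.
ALLOWED_CLAIM_TYPES = {"momentum_overheated", "valuation_stretched"}

def validate_claim(claim: dict) -> dict:
    """Raise ValueError for a bad claim type or missing entities."""
    if claim.get("claim_type") not in ALLOWED_CLAIM_TYPES:
        raise ValueError(f"unknown claim_type: {claim.get('claim_type')!r}")
    if not claim.get("entities"):
        raise ValueError("claim has no entities")
    return claim

ok = validate_claim({"claim_type": "momentum_overheated", "entities": ["NVDA"]})
```

The strictness is the point: an LLM's output only enters the pipeline once it round-trips through the schema.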
Once you have structured claims, this scores them. You get support level, the metrics behind it, what would change the call, and red team analysis on how the thesis could blow up.
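One way the deterministic scoring could work, sketched here under assumed thresholds: place the claim's metric at its historical percentile and map that to a support level. The function names and cutoffs are illustrative, not the service's actual logic.

```python
# Hedged sketch of deterministic evidence scoring: no LLM involved,
# just a percentile lookup and fixed (illustrative) thresholds.
from bisect import bisect_right

def historical_percentile(value: float, history: list[float]) -> float:
    """Fraction of historical observations at or below `value`."""
    ordered = sorted(history)
    return bisect_right(ordered, value) / len(ordered) if ordered else 0.5

def support_level(percentile: float) -> str:
    if percentile >= 0.9:
        return "strong_support"
    if percentile >= 0.7:
        return "moderate_support"
    return "weak_support"

# A momentum reading of 2.8 sits at the 80th percentile of this sample.
pct = historical_percentile(2.8, [0.5, 1.0, 1.2, 1.9, 2.1, 2.4, 2.6, 2.7, 3.0, 3.5])
```

Because the scoring is a pure function of the data, the same claim and the same history always yield the same verdict.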
Want to track a thesis over time? This saves it and runs periodic checks. The system keeps snapshots of evidence state and pings you when your triggers get hit.
For any forecast, this tells you what drove it and what could go wrong. Not about justifying the call. About being transparent.
Searches for historical setups that looked like this one and shows you how they played out. It specifically hunts for times the obvious trade failed.
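A toy version of that analog search: nearest neighbours over feature vectors, each carrying its realized outcome. The real service's distance metric and feature set are not specified here; Euclidean distance and these episode records are assumptions.

```python
# Toy analog search: find the k historical episodes whose feature
# vectors are closest to the current setup, outcomes included.
import math

def top_analogs(query: list[float], episodes: list[dict], k: int = 2) -> list[dict]:
    """Return the k episodes nearest to `query` by Euclidean distance."""
    return sorted(episodes, key=lambda e: math.dist(query, e["features"]))[:k]

episodes = [
    {"date": "2020-02", "features": [1.0, 0.2], "outcome": "breakout_failed"},
    {"date": "2018-10", "features": [0.9, 0.3], "outcome": "breakout_failed"},
    {"date": "2016-07", "features": [-0.5, 1.4], "outcome": "breakout_held"},
]
analogs = top_analogs([1.0, 0.25], episodes)
```

Surfacing the outcomes of the closest matches is what lets the service show you the times the obvious trade failed.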
When a forecast request hits, here is how data flows through:
Inference Flow
1. data_state service returns feature vectors with the correct as_of timestamp
2. forecast service: the deep model (TFT or TCN) produces a quantile distribution, LightGBM produces event probabilities, and the stacker combines both
3. calibration service: temperature scaling and isotonic calibration per horizon and regime, then the conformal interval wrapper
Model Artifacts
Trained models live in object storage. Each artifact has a version and URI so you can roll back or A/B test easily.
deep_model.onnx
The main deep learning model exported to ONNX format for fast inference. Can also be TorchScript if needed.
lgbm.txt
LightGBM booster in text format. Handles tabular features and produces complementary predictions.
stacker.pkl
Meta-learner that combines deep and tree model outputs, trained on held-out validation data.
calibration_params.json
Temperature scaling factors and isotonic regression mappings, organized by horizon and regime.
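A plausible shape for this file, given the organization by horizon and regime described above. The exact keys are not documented here; every field name and value below is an assumption for illustration.

```json
{
  "version": "2024-06-01",
  "horizons": {
    "5d": {
      "regimes": {
        "low_vol":  { "temperature": 1.3, "isotonic_breakpoints": [[0.0, 0.0], [0.4, 0.31], [1.0, 1.0]] },
        "high_vol": { "temperature": 1.8, "isotonic_breakpoints": [[0.0, 0.0], [0.4, 0.22], [1.0, 1.0]] }
      }
    }
  }
}
```

Keeping the parameters in a versioned JSON artifact, rather than baked into model weights, means calibration can be refit and rolled back independently of the models themselves.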