
This document shows how the pieces fit together. The gateway handles auth and routing, individual services handle forecasting and thesis evaluation, and a shared schema package keeps everything speaking the same language. Each piece can be developed and tested on its own, which makes the whole thing easier to work with.

If you want to contribute or just poke around to see how it works, start here.

Enough reading. The API skeleton is live.
Repository Structure

Three main areas: apps for the frontend and API gateway, services for backend logic, and packages for shared code. Infra config lives in its own folder.

repo/
├── apps/
│   ├── web/                      # Next.js frontend
│   └── api-gateway/              # FastAPI with auth, routing, OpenAPI
├── services/
│   ├── data_state/               # Data ingestion and feature computation
│   ├── forecast/                 # Deep learning and tree ensemble inference
│   ├── calibration/              # Temperature scaling, isotonic, conformal
│   ├── narrative/                # LLM parsing and claim templates
│   ├── explain_redteam/          # SHAP, integrated gradients, analog search
│   ├── metrics/                  # Calibration dashboards and evaluation
│   └── backtest/                 # Time machine and walk forward replay
├── packages/
│   ├── schemas/                  # Shared Pydantic models
│   ├── client-js/                # Generated JavaScript SDK
│   └── client-py/                # Generated Python SDK
└── infra/
    ├── docker/                   # Container definitions
    ├── k8s/                      # Kubernetes manifests
    └── terraform/                # Infrastructure as code
API Gateway Routes

All requests go through the gateway. It handles auth and rate limiting, and routes each request to the right service.

/v1/forecast
/v1/state
/v1/evidence
/v1/thesis/*
/v1/explain
/v1/redteam
/v1/metrics/*
/v1/backtest/*
Core Endpoints
POST /v1/forecast Get probability distributions and event forecasts

The main endpoint. Send a list of tickers, pick your horizons, say what events you care about. You get back quantile distributions, event probabilities, and regime info. The thermostat controls how picky the system is: crank it up for fewer but more confident calls.

Request
{
  "tickers": ["AAPL", "MSFT"],
  "as_of": "2026-01-24T16:00:00Z",
  "horizons": ["1d", "5d", "20d"],
  "thermostat": 0.75,
  "outputs": {
    "quantiles": [0.1, 0.5, 0.9],
    "events": [
      {"type": "return_gt", "value": 0.0},
      {"type": "drawdown_gt", "value": 0.05},
      {"type": "vol_spike_gt", "value": 0.3}
    ]
  }
}
Response
{
  "as_of": "2026-01-24T16:00:00Z",
  "feature_set_version": "fs_v3",
  "model_version": "moe_tft_lgbm_v12",
  "regime": {
    "label": "chop",
    "probs": {"trend": 0.12, "chop": 0.62, "stress": 0.22, "rate_shock": 0.04}
  },
  "results": [
    {
      "ticker": "AAPL",
      "by_horizon": {
        "1d": {
          "return_quantiles": {"0.1": -0.012, "0.5": 0.002, "0.9": 0.014},
          "event_probs": {
            "return_gt_0": 0.54,
            "drawdown_gt_0.05": 0.06,
            "vol_spike_gt_0.3": 0.18
          },
          "abstain": false,
          "confidence": 0.71
        }
      }
    }
  ]
}
Thermostat behavior: High thermostat means higher bar for making a call. If the signal is too noisy, you get "abstain": true with a reason like "low_signal_to_noise_in_current_regime". Silence beats noise.
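A minimal sketch of what thermostat-gated abstention could look like. The linear threshold mapping and the floor parameter are illustrative assumptions, not the production logic:

```python
def should_abstain(confidence: float, thermostat: float, floor: float = 0.5) -> bool:
    """Abstain when model confidence falls below a thermostat-scaled bar.

    At thermostat=0 the bar sits at `floor`; at thermostat=1 it approaches
    1.0, so higher thermostat values yield fewer, more confident calls.
    """
    required = floor + (1.0 - floor) * thermostat
    return confidence < required
```

Any monotone mapping from thermostat to a confidence bar would give the same qualitative behavior: crank it up, get fewer calls.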
GET /v1/state Retrieve computed feature vectors with provenance

See exactly what the model saw when it made a call. Returns the feature vector for a ticker at a given time, plus provenance info that proves no future data snuck in.

Response
{
  "ticker": "AAPL",
  "as_of": "2026-01-24T16:00:00Z",
  "feature_set_version": "fs_v3",
  "features": {
    "ret_1d": 0.003,
    "vol_20d": 0.21,
    "ma_dist_50d": 0.04,
    "breadth_proxy": 0.58
  },
  "provenance": {
    "no_leakage": true,
    "inputs": [
      {"source": "ohlcv", "available_time_max": "2026-01-24T16:00:00Z"},
      {"source": "fundamentals", "available_time_max": "2026-01-10T00:00:00Z"}
    ]
  }
}
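Clients can sanity-check the no_leakage claim themselves by comparing each input's available_time_max against as_of. A sketch using the response fields above:

```python
from datetime import datetime


def verify_no_leakage(response: dict) -> bool:
    """True if every provenance input was available at or before as_of."""
    # fromisoformat on older Pythons doesn't accept a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    as_of = datetime.fromisoformat(response["as_of"].replace("Z", "+00:00"))
    for inp in response["provenance"]["inputs"]:
        available = datetime.fromisoformat(
            inp["available_time_max"].replace("Z", "+00:00")
        )
        if available > as_of:
            return False
    return True
```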
GET /v1/evidence Check evidence for specific claim types

Got a thesis? This tells you if the data backs it up. Pass a claim type like "momentum_overheated" and get back the relevant metrics, where they sit historically, and what would flip the verdict.

Response
{
  "ticker": "AAPL",
  "claim_type": "momentum_overheated",
  "as_of": "2026-01-24T16:00:00Z",
  "metrics": [
    {"name": "ma_dist_200d", "value": 0.17, "percentile_5y": 0.92},
    {"name": "rsi_14", "value": 73.4, "percentile_5y": 0.89}
  ],
  "support": {"label": "supported", "score": 82},
  "change_my_mind": [
    {"metric": "rsi_14", "condition": "drop_below", "value": 60},
    {"metric": "ma_dist_200d", "condition": "drop_below", "value": 0.08}
  ]
}
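A sketch of how a client might watch the change_my_mind triggers. The drop_below condition comes from the example; rise_above is an assumed additional condition type:

```python
def verdict_flipped(metrics: dict, triggers: list) -> bool:
    """Check whether any change_my_mind condition has been met."""
    ops = {
        "drop_below": lambda value, threshold: value < threshold,
        "rise_above": lambda value, threshold: value > threshold,  # assumed type
    }
    return any(
        ops[t["condition"]](metrics[t["metric"]], t["value"]) for t in triggers
    )
```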
Thesis Translator

The thesis translator takes your investment idea in plain English and turns it into structured claims we can test. An LLM does the parsing, but evidence scoring is fully deterministic. The LLM formats and categorizes. It never makes up data.

Claim Taxonomy

Ten claim types to start. Each one maps to specific metrics and scoring logic.

momentum_overheated
valuation_stretched
macro_rate_sensitive
earnings_revision_up
risk_off_shift
volatility_expansion
sector_rotation
crowded_positioning
mean_reversion_setup
catalyst_event_risk
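One way the taxonomy might be encoded in the shared schemas package (a sketch; the real package uses Pydantic models, which are not shown here):

```python
from enum import Enum


class ClaimType(str, Enum):
    """The ten claim types from the taxonomy above."""
    MOMENTUM_OVERHEATED = "momentum_overheated"
    VALUATION_STRETCHED = "valuation_stretched"
    MACRO_RATE_SENSITIVE = "macro_rate_sensitive"
    EARNINGS_REVISION_UP = "earnings_revision_up"
    RISK_OFF_SHIFT = "risk_off_shift"
    VOLATILITY_EXPANSION = "volatility_expansion"
    SECTOR_ROTATION = "sector_rotation"
    CROWDED_POSITIONING = "crowded_positioning"
    MEAN_REVERSION_SETUP = "mean_reversion_setup"
    CATALYST_EVENT_RISK = "catalyst_event_risk"


def is_valid_claim_type(value: str) -> bool:
    """True if the string is one of the ten known claim types."""
    return value in {c.value for c in ClaimType}
```

Subclassing str keeps the enum JSON-serializable, which matters when these values travel through the gateway.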
POST /v1/thesis/parse Extract structured claims from narrative text

Paste a headline or thesis. The system pulls out entities and maps statements to claim types. Output is validated against a strict schema. Bad claim type or missing entities? Rejected.

Request
{
  "text": "AI hype is peaking; semis look overheated ahead of earnings.",
  "hint_tickers": ["NVDA", "AVGO"],
  "horizon": "20d"
}
Response
{
  "entities": [
    {"type": "sector", "value": "semiconductors"},
    {"type": "ticker", "value": "NVDA"}
  ],
  "claims": [
    {
      "claim_text": "Semis are overheated",
      "claim_type": "momentum_overheated",
      "entities": [{"type": "sector", "value": "semiconductors"}]
    },
    {
      "claim_text": "Earnings are a major near-term risk",
      "claim_type": "catalyst_event_risk",
      "entities": [{"type": "ticker", "value": "NVDA"}]
    }
  ]
}
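A rough sketch of the rejection rule described above: a claim with an unknown claim type or no entities fails validation. The function name and error handling are illustrative; real validation presumably lives in the shared Pydantic models:

```python
def validate_parse_output(parsed: dict, allowed_types: set) -> dict:
    """Raise ValueError on a bad claim_type or missing entities."""
    for claim in parsed.get("claims", []):
        if claim.get("claim_type") not in allowed_types:
            raise ValueError(f"unknown claim_type: {claim.get('claim_type')}")
        if not claim.get("entities"):
            raise ValueError("claim has no entities")
    return parsed
```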
POST /v1/thesis/evaluate Score claims against current evidence

Once you have structured claims, this scores them. You get support level, the metrics behind it, what would change the call, and red team analysis on how the thesis could blow up.

Response
{
  "as_of": "2026-01-24T16:00:00Z",
  "scorecard": [
    {
      "claim_type": "momentum_overheated",
      "support": {"label": "mixed", "score": 61},
      "evidence": [
        {"metric": "ma_dist_200d", "value": 0.22, "percentile_5y": 0.95}
      ],
      "change_my_mind": [
        {"metric": "ma_dist_200d", "condition": "drop_below", "value": 0.12}
      ]
    }
  ],
  "red_team": [
    {
      "failure_mode": "trend_regime_can_stay_overheated_longer",
      "early_warning": "breadth_breakdown"
    }
  ]
}
POST /v1/thesis/track Save thesis for ongoing monitoring

Want to track a thesis over time? This saves it and runs periodic checks. The system keeps snapshots of evidence state and pings you when your triggers get hit.

How tracking works: A background job runs daily (or hourly) to re-check tracked theses. Trigger gets hit? Event logged, notification sent. You can replay the whole history to see how your thinking evolved.
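The tracking loop described above could be sketched like this. The fetch_metrics and notify arguments are hypothetical injected dependencies, and only the drop_below condition from the examples is handled:

```python
def recheck_theses(tracked: list, fetch_metrics, notify) -> list:
    """One pass of the periodic tracking job: re-evaluate every tracked
    thesis and fire a notification when a trigger condition is met.

    `tracked` holds dicts with "ticker" and "triggers"; `fetch_metrics`
    and `notify` are injected so the sketch stays testable.
    """
    events = []
    for thesis in tracked:
        metrics = fetch_metrics(thesis["ticker"])
        for trig in thesis["triggers"]:
            hit = (
                trig["condition"] == "drop_below"
                and metrics[trig["metric"]] < trig["value"]
            )
            if hit:
                event = {"ticker": thesis["ticker"], "trigger": trig}
                events.append(event)
                notify(event)
    return events
```

Returning the fired events as a list is what makes replaying the history possible: each pass appends its events to a log.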
Explain and Red Team
POST /v1/explain Get drivers and failure modes for a forecast

For any forecast, this tells you what drove it and what could go wrong. Not about justifying the call. About being transparent.

Response
{
  "drivers": [
    {"feature": "vol_20d", "impact": 0.18, "direction": "lower_vol_increases_confidence"},
    {"feature": "breadth_proxy", "impact": 0.12, "direction": "higher_breadth_supports_upside"}
  ],
  "why_wrong": [
    {"risk": "macro_surprise", "trigger": "cpi_surprise_up"},
    {"risk": "regime_shift_to_stress", "trigger": "vol_20d_spikes_above_0.35"}
  ]
}
POST /v1/redteam Find historical analogs and failure patterns

Searches for historical setups that looked like this one and shows you how they played out. It specifically hunts for times the obvious trade failed.

Why this matters: Markets reward humility. Knowing how your setup has blown up before is as valuable as knowing when it worked.
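As a toy illustration of analog search, here is a nearest-neighbor lookup over feature vectors. The real service presumably uses a richer similarity than plain Euclidean distance:

```python
import math


def find_analogs(query: dict, history: list, k: int = 3) -> list:
    """Return the k historical setups whose feature vectors are closest
    to the query, by Euclidean distance over the query's feature names."""
    def dist(features: dict) -> float:
        return math.sqrt(
            sum((features[name] - value) ** 2 for name, value in query.items())
        )

    return sorted(history, key=lambda row: dist(row["features"]))[:k]
```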
Model Serving

When a forecast request hits, here is how data flows through:

Inference Flow

1. Gateway receives the forecast request and validates parameters
2. Calls the data_state service for feature vectors with the correct as_of timestamp
3. Calls the forecast service: the deep model (TFT or TCN) produces the distribution, LightGBM produces events, and the stacker combines both
4. Calls the calibration service: temperature scaling and isotonic calibration per horizon and regime, then the conformal interval wrapper
5. Gateway assembles the response with the model version and returns it to the client
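The steps above can be sketched as a single orchestration function; data_state, forecast, and calibration are stand-ins for the real service clients, and the method names are illustrative:

```python
def serve_forecast(request: dict, data_state, forecast, calibration) -> dict:
    """Orchestration sketch of the inference flow above."""
    # Step 2: fetch feature vectors as of the requested timestamp.
    features = data_state.get_state(request["tickers"], request["as_of"])
    # Step 3: raw predictions from the deep + tree + stacker ensemble.
    raw = forecast.predict(features)
    # Step 4: calibrate per horizon, then wrap with conformal intervals.
    calibrated = calibration.apply(raw, request["horizons"])
    # Step 5: assemble the response for the client.
    return {"as_of": request["as_of"], "results": calibrated}
```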

Model Artifacts

Trained models live in object storage. Each artifact has a version and URI so you can roll back or A/B test easily.

deep_model.onnx

The main deep learning model exported to ONNX format for fast inference. Can also be TorchScript if needed.

lgbm.txt

LightGBM booster in text format. Handles tabular features and produces complementary predictions.

stacker.pkl

Meta learner that combines deep and tree model outputs. Trained on held-out validation data.

calibration_params.json

Temperature scaling factors and isotonic regression mappings, organized by horizon and regime.
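As a worked example of how a stored temperature factor applies to an event probability (a sketch; the isotonic and conformal steps are omitted):

```python
import math


def apply_temperature(p: float, temperature: float) -> float:
    """Apply temperature scaling to a probability: divide the logit by T,
    then map back through the sigmoid. T > 1 pulls p toward 0.5 (softens
    overconfident outputs); T < 1 sharpens them."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / temperature))
```

For example, an overconfident 0.9 with T = 2 is softened toward 0.5, while T = 1 leaves the probability unchanged.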