Satellite-Driven Strategy — AlgoGators Quant Fund

HarvestSignal

Climate Intelligence → Ag Futures

A systematic trading strategy for agricultural commodity futures, driven by satellite-derived climate data from NASA's POWER project. Ridge regression models translate production-weighted weather stress into directional signals in corn and soybean markets — capturing dislocations that fundamentals-driven traders are slow to price.

Ann. Return: 15–20% (backtest est.)
Sharpe Ratio: 1.2–1.8 (backtest est.)
Win Rate: 54–58% (backtest est.)
Max Drawdown: −30% to −40% (backtest est.)
01 — Strategy

Core Thesis

Agricultural commodity prices are fundamentally determined by supply — and supply is fundamentally determined by weather. Droughts, heat waves, and cold snaps materially impact crop yields, but these signals are often underpriced during the growing season because most market participants rely on backward-looking reports: USDA crop condition data, export inspections, inventory estimates. By the time consensus catches up, the dislocation has already closed.

HarvestSignal takes a different path. NASA's POWER project provides validated satellite-derived climate data at daily resolution for any point on Earth, available with only a 1–2 day lag. By quantifying heat, cold, and drought stress across the major US corn belt regions — production-weighted by actual harvest volumes — and training Ridge regression models on the historical relationship between these stresses and 10-day forward futures returns, the system attempts to systematically front-run the consensus repricing.

The model is intentionally parsimonious. Complex architectures tend to overfit severely with only 15–20 growing seasons of data. Ridge regression's L2 regularization offers a favorable bias-variance tradeoff and produces interpretable coefficients that can be examined and challenged, which is critical for a strategy that needs to be trusted in production.

02 — Data Infrastructure

Three Data Sources

The system is built on freely available, high-quality data sources: no proprietary data feeds and no paid weather subscriptions. A local caching layer avoids repeating identical requests, which keeps usage within NASA POWER's rate limits.
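A minimal sketch of such a cache, assuming the public POWER daily point endpoint (`https://power.larc.nasa.gov/api/temporal/daily/point`), the `AG` community, and the parameter names `T2M_MAX`, `T2M_MIN`, and `PRECTOTCORR`; check these against the POWER API documentation. The `PowerCache` class, its file layout, and the injectable `fetch` callable are hypothetical, not the project's actual implementation.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the POWER daily point API.
POWER_DAILY_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"
PARAMETERS = "T2M_MAX,T2M_MIN,PRECTOTCORR"


def build_power_url(lat: float, lon: float, start: str, end: str) -> str:
    """Build a daily point request URL; dates are YYYYMMDD strings."""
    query = {
        "parameters": PARAMETERS,
        "community": "AG",
        "latitude": lat,
        "longitude": lon,  # west longitudes are negative, e.g. 93.5°W -> -93.5
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{POWER_DAILY_URL}?{urlencode(query)}"


class PowerCache:
    """Hypothetical file-backed cache keyed by the full request URL."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], dict]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # performs the real HTTP call and returns parsed JSON
        self.network_calls = 0

    def _path(self, url: str) -> Path:
        key = hashlib.sha256(url.encode()).hexdigest()
        return self.cache_dir / f"{key}.json"

    def get(self, lat: float, lon: float, start: str, end: str) -> dict:
        url = build_power_url(lat, lon, start, end)
        path = self._path(url)
        if path.exists():
            # Cache hit: no request is sent, so the rate limit is untouched.
            return json.loads(path.read_text())
        payload = self.fetch(url)
        self.network_calls += 1
        path.write_text(json.dumps(payload))
        return payload
```

In use, `fetch` could wrap any HTTP client that returns the decoded JSON body. A production cache would also need an expiry rule for requests covering the most recent days, since near-real-time values may still change after they are first cached.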

🛰
NASA POWER
power.larc.nasa.gov
Satellite-derived daily climate data on a global grid at roughly half-degree resolution. Available from 1981 to near real-time (1–2 day lag). Provides max/min temperature, precipitation, solar radiation, and humidity. Validated and gap-filled by NASA Langley Research Center.
📈
Yahoo Finance
ZC=F · ZS=F · ZW=F
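Continuous front-month futures pricing via the yfinance Python library, with no API costs. The trade-off for skipping roll adjustment is that returns spanning a contract roll include the price gap between the expiring and the next contract. Corn (ZC=F) is the Phase 1 focus; soybeans (ZS=F), wheat (ZW=F), and cotton are on the expansion roadmap.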
📋
CFTC CoT
Weekly positioning data
CFTC Commitments of Traders reports provide weekly positioning context for managed money, commercials, and small speculators. Intended as a sentiment overlay and positioning filter, helping avoid entries when positioning is structurally unfavorable.
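One way a positioning filter could work, sketched under assumptions: managed-money net length is expressed as a share of open interest, ranked against its own history, and new entries in the crowded direction are blocked. The function names and the 90th-percentile threshold are hypothetical illustrations, not parameters taken from the strategy.

```python
from bisect import bisect_left
from typing import List


def net_mm_share(mm_long: float, mm_short: float, open_interest: float) -> float:
    """Managed-money net position as a fraction of total open interest."""
    return (mm_long - mm_short) / open_interest


def percentile_rank(history: List[float], value: float) -> float:
    """Share of historical observations strictly below `value` (0..1)."""
    ordered = sorted(history)
    return bisect_left(ordered, value) / len(ordered)


def allow_entry(direction: int, history: List[float], current: float,
                crowded: float = 0.9) -> bool:
    """Block longs when positioning is crowded long, shorts when crowded short.

    direction: +1 for a long entry, -1 for a short entry, 0 for flat.
    The `crowded` threshold is an illustrative assumption.
    """
    rank = percentile_rank(history, current)
    if direction > 0:
        return rank < crowded
    if direction < 0:
        return rank > 1.0 - crowded
    return True
```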
03 — Production Weighting

Corn Belt Coverage

Climate data is not simply averaged across the US. Each production region is weighted by its share of corn output among the four modeled states. This ensures that a drought in Iowa, which carries a 35% weight in the composite, has proportionally more impact on the composite stress signal than the same drought in Indiana (15%).

Region      Weight   Coordinates
Iowa        35%      42.0°N 93.5°W
Illinois    30%      40.0°N 89.0°W
Nebraska    20%      41.5°N 99.8°W
Indiana     15%      40.3°N 86.1°W

The soybean expansion uses a similar framework (Iowa 30%, Illinois 30%, Minnesota 20%, Indiana 20%), reflecting the slight northward shift in peak soybean production relative to corn.
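A minimal sketch of the weighting, using the corn and soybean weights stated above; the `composite_stress` helper is hypothetical. It makes the Iowa-versus-Indiana comparison concrete: the same regional stress value contributes 3.5 to the corn composite from Iowa but only 1.5 from Indiana.

```python
from typing import Dict

# Weights as stated in the text (shares among the modeled states).
CORN_WEIGHTS = {"Iowa": 0.35, "Illinois": 0.30, "Nebraska": 0.20, "Indiana": 0.15}
SOY_WEIGHTS = {"Iowa": 0.30, "Illinois": 0.30, "Minnesota": 0.20, "Indiana": 0.20}


def composite_stress(regional: Dict[str, float], weights: Dict[str, float]) -> float:
    """Production-weighted average of one stress feature across regions."""
    missing = set(weights) - set(regional)
    if missing:
        raise KeyError(f"missing regions: {sorted(missing)}")
    total = sum(weights.values())  # renormalize in case weights do not sum to 1
    return sum(weights[r] * regional[r] for r in weights) / total


# Same 10-day stress count placed in Iowa versus Indiana.
iowa_drought = {"Iowa": 10.0, "Illinois": 0.0, "Nebraska": 0.0, "Indiana": 0.0}
indiana_drought = {"Iowa": 0.0, "Illinois": 0.0, "Nebraska": 0.0, "Indiana": 10.0}
```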

04 — Feature Engineering

Weather Stress Indicators

Three distinct forms of crop stress are quantified. Each is computed as a binary daily indicator (stress / no stress) and then aggregated into rolling 7-day, 14-day, and 30-day cumulative windows. The resulting counts of stress days capture how often and how persistently adverse conditions occur; the binary indicator itself records whether a threshold was crossed, not by how much.

Heat Stress Threshold
> 32°C
Max daily temperature above 32°C. Heat during pollination reduces pollen viability, directly reducing kernel set. Z-score normalized against a 30-year historical baseline.
Cold Stress Threshold
< 10°C
Min daily temperature below 10°C. Delays stand establishment early in the season and reduces final yield potential. Particularly damaging before the V6 leaf stage.
Drought Stress Threshold
< 25 mm
7-day cumulative precipitation below 25 mm (about one inch per week). Water deficit during vegetative growth and grain fill is the single most damaging yield risk, and it is especially critical during the July pollination window.
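A minimal sketch of the three binary indicators and their 7/14/30-day rolling counts, using the thresholds above. Function names are hypothetical, and treating the first six days as "not in drought" (because no full week of rainfall exists yet) is an assumption of this sketch.

```python
from typing import Dict, List

HEAT_C = 32.0      # daily max temperature threshold
COLD_C = 10.0      # daily min temperature threshold
DRY_MM_7D = 25.0   # 7-day cumulative precipitation threshold
WINDOWS = (7, 14, 30)


def trailing_sum(values: List[float], window: int) -> List[float]:
    """Sum of the last `window` values at each day (partial at the start)."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running)
    return out


def stress_indicators(tmax: List[float], tmin: List[float],
                      precip: List[float]) -> Dict[str, List[int]]:
    """Binary daily heat, cold, and drought flags for one region."""
    heat = [1 if t > HEAT_C else 0 for t in tmax]
    cold = [1 if t < COLD_C else 0 for t in tmin]
    rain_7d = trailing_sum(precip, 7)
    # The first six days lack a full week of rainfall and are not flagged.
    drought = [1 if i >= 6 and r < DRY_MM_7D else 0 for i, r in enumerate(rain_7d)]
    return {"heat": heat, "cold": cold, "drought": drought}


def windowed_features(indicators: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Rolling stress-day counts, e.g. 'heat_7d', 'drought_30d'."""
    return {f"{name}_{w}d": trailing_sum(series, w)
            for name, series in indicators.items() for w in WINDOWS}
```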
01 ——
Seasonal Deviation (Z-Score)
Each climate variable is z-score normalized against its historical baseline for that calendar day and location. This removes the average seasonal cycle, isolating genuine anomalies from normal warm summers or dry spells.
02 ——
Multi-Window Aggregation
Rolling 7-day, 14-day, and 30-day cumulative sums of each stress indicator. Short windows capture acute shocks; longer windows identify sustained stress periods that have a disproportionate impact on final yields.
03 ——
Production Weighting
Regional stress indicators are combined using production volume weights before entering the model. Iowa's drought matters more than Indiana's drought — the weighting scheme reflects this precisely.
04 ——
Growing Season Gating
Trading is restricted to the corn growing season (April–September). Out-of-season weather has negligible impact on futures prices. The gate prevents the model from trading on signals that have no fundamental anchor.
05 ——
PCA Dimensionality Reduction
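After standard scaling, PCA keeps enough components to explain 95% of the variance in the stress feature set. This removes redundancy (stress indicators over adjacent time windows are strongly correlated), stabilizing Ridge regression coefficient estimates. Because the Ridge coefficients then apply to principal components, reading them as sensitivities to individual stress features requires mapping them back through the PCA loadings.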
06 ——
Feature Selection Gate
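Variance-based filtering limits each model to its 15 highest-variance features. The filter only discriminates between features before standard scaling, since every scaled feature has unit variance. It reduces the noise contribution and speeds up monthly retraining cycles, with the aim of preserving predictive power on the core stress signals.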
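Steps 01 and 04 can be sketched as follows, assuming the baseline is keyed by (month, day) and built only from years before the date being scored, to stay point-in-time. Function names are hypothetical; a production baseline might also pool a few neighboring calendar days to stabilize the estimates.

```python
from datetime import date
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Baseline = Dict[Tuple[int, int], Tuple[float, float]]


def doy_baseline(history: List[Tuple[date, float]]) -> Baseline:
    """Mean and std of one variable for each (month, day) across past years.

    `history` should contain only observations dated before the scoring date.
    """
    buckets: Dict[Tuple[int, int], List[float]] = {}
    for d, v in history:
        buckets.setdefault((d.month, d.day), []).append(v)
    return {k: (mean(vs), pstdev(vs)) for k, vs in buckets.items() if len(vs) >= 2}


def seasonal_zscore(d: date, value: float, baseline: Baseline) -> float:
    """Deviation of `value` from its calendar-day norm, in standard deviations."""
    mu, sd = baseline[(d.month, d.day)]
    return 0.0 if sd == 0 else (value - mu) / sd


GROWING_MONTHS = range(4, 10)  # April through September


def season_gate(d: date, signal: float) -> float:
    """Force the signal to zero outside the growing season."""
    return signal if d.month in GROWING_MONTHS else 0.0
```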
05 — Prediction Model

Rolling Ridge Regression

A Ridge regression model predicts 10-day forward futures returns as a linear function of the current weather stress feature vector. Models are retrained monthly on a rolling 3-year window to allow the weather-price relationship to evolve with shifting market structure, without overfitting to distant historical data.

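Prediction Model

$$\hat{y}_t = \beta_0 + \beta_1\,\mathrm{HeatStress}^{7d}_t + \beta_2\,\mathrm{HeatStress}^{14d}_t + \beta_3\,\mathrm{HeatStress}^{30d}_t + \beta_4\,\mathrm{ColdStress}^{7d}_t + \beta_5\,\mathrm{ColdStress}^{14d}_t + \cdots + \beta_n\,\mathrm{DroughtStress}^{30d}_t$$

where $\hat{y}_t$ is the predicted 10-day forward return on ZC=F and the realized return is $y_t = \hat{y}_t + \varepsilon_t$.
All features are production-weighted across the 4 Corn Belt regions and z-score normalized.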
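The regression target $y_t$ could be computed as below, assuming a plain list of daily closes (for example, obtained with `yfinance.download("ZC=F")`). The `mask_roll_windows` helper and its `roll_indices` input are hypothetical: because the continuous ZC=F series is not roll-adjusted, this sketch simply drops labels whose 10-day window crosses a contract roll.

```python
from typing import List, Optional, Sequence

HORIZON = 10  # prediction horizon in trading days


def forward_returns(closes: Sequence[float], horizon: int = HORIZON) -> List[Optional[float]]:
    """Simple forward return P[t+h] / P[t] - 1; None where t+h is beyond the data."""
    out: List[Optional[float]] = []
    for t, price in enumerate(closes):
        if t + horizon < len(closes):
            out.append(closes[t + horizon] / price - 1.0)
        else:
            out.append(None)  # label not yet observable
    return out


def mask_roll_windows(labels: List[Optional[float]], roll_indices: Sequence[int],
                      horizon: int = HORIZON) -> List[Optional[float]]:
    """Drop labels whose window (t, t+h] contains a roll.

    A roll index r marks the first close from the new contract, so the
    jump between closes r-1 and r lies inside the window when t < r <= t+h.
    """
    masked = list(labels)
    for t in range(len(masked)):
        if any(t < r <= t + horizon for r in roll_indices):
            masked[t] = None
    return masked
```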
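L2 Regularization (Ridge)

$$\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2$$

The regularization term $\alpha\lVert\beta\rVert_2^2$ penalizes large coefficients, limiting overfitting on a small sample: each rolling training window spans only three growing seasons, out of 15–20 in the full history. The intercept $\beta_0$ is conventionally left unpenalized. α defaults to 1.0 and is tuned via
time-series cross-validation on the training window.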
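A minimal NumPy sketch of the fitting stack described so far: variance filter, standard scaling, PCA at 95% variance, then closed-form Ridge, with α chosen by expanding-window time-series cross-validation. The class and function names, the α grid, and the 10-row purge gap between training and test folds are assumptions of this sketch, not the project's code.

```python
import numpy as np

HORIZON = 10  # label horizon; also used as the purge gap between CV folds


def select_top_variance(X: np.ndarray, k: int = 15) -> np.ndarray:
    """Column indices of the k highest-variance features (applied before scaling)."""
    k = min(k, X.shape[1])
    return np.argsort(X.var(axis=0))[::-1][:k]


class StressRidge:
    """Sketch of: variance filter -> standard scaling -> PCA (95%) -> Ridge."""

    def __init__(self, alpha: float = 1.0, k: int = 15, var_keep: float = 0.95):
        self.alpha, self.k, self.var_keep = alpha, k, var_keep

    def fit(self, X: np.ndarray, y: np.ndarray) -> "StressRidge":
        self.cols = select_top_variance(X, self.k)
        Xs = X[:, self.cols]
        self.mu = Xs.mean(axis=0)
        self.sd = Xs.std(axis=0)
        self.sd[self.sd == 0] = 1.0  # guard against constant columns
        Z = (Xs - self.mu) / self.sd
        # PCA via SVD of the centered, scaled training matrix.
        _, s, vt = np.linalg.svd(Z, full_matrices=False)
        cum = np.cumsum(s**2) / np.sum(s**2)
        n = min(int(np.searchsorted(cum, self.var_keep)) + 1, len(s))
        self.components = vt[:n].T  # (k, n) loadings
        P = Z @ self.components
        # Closed-form Ridge on the components. P has zero-mean columns, so
        # fitting centered y leaves the intercept (mean of y) unpenalized.
        self.intercept = float(y.mean())
        A = P.T @ P + self.alpha * np.eye(n)
        self.beta = np.linalg.solve(A, P.T @ (y - self.intercept))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        Z = (X[:, self.cols] - self.mu) / self.sd
        return self.intercept + (Z @ self.components) @ self.beta


def tune_alpha(X, y, alphas=(0.1, 1.0, 10.0), n_splits=3, gap=HORIZON):
    """Expanding-window CV: train on the past, score on the next block.

    The last `gap` rows before each test block are purged because their
    10-day labels overlap the test period.
    """
    fold = len(y) // (n_splits + 1)
    best_alpha, best_err = alphas[0], np.inf
    for a in alphas:
        errs = []
        for i in range(1, n_splits + 1):
            train = slice(0, i * fold - gap)
            test = slice(i * fold, (i + 1) * fold)
            model = StressRidge(alpha=a).fit(X[train], y[train])
            errs.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
        if np.mean(errs) < best_err:
            best_alpha, best_err = a, float(np.mean(errs))
    return best_alpha
```

In a walk-forward loop, `tune_alpha` would run on each training window before the final `fit` with the selected α.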
Walk-Forward Protocol
For each date t in the backtest:
  1. Use only data available through t−1 (strict point-in-time), including training labels: a 10-day forward return is usable only once its full horizon has elapsed
  2. Use the Ridge model most recently retrained (monthly) on the preceding 3 years of growing-season data
  3. Fit PCA (95% variance) and feature selection on the training features only, then apply them to the features at t
  4. Generate prediction Ŷ_t for the next 10 trading days
  5. Convert to directional signal: sign(Ŷ_t) if |Ŷ_t| > min_signal_strength, otherwise flat
  6. Size position via volatility normalization (target 12% annual vol)
  7. Mark-to-market at close, charging a 10 bps transaction cost per side (on entry and on exit)
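The point-in-time rule in steps 1 and 2 can be sketched as a schedule generator over a list of trading days, assuming retraining on the first trading day of each month. All names are hypothetical; the key detail is that a training row s is admitted only if its label, which needs the close at s+10, is known by t−1.

```python
from datetime import date
from typing import Iterator, List, Tuple

HORIZON = 10            # label horizon in trading days
TRAIN_YEARS = 3         # rolling training window
SEASON = range(4, 10)   # April through September


def monthly_retrain_points(days: List[date]) -> List[int]:
    """Index of the first trading day of each month."""
    return [i for i, d in enumerate(days)
            if i == 0 or (d.year, d.month) != (days[i - 1].year, days[i - 1].month)]


def training_rows(days: List[date], t: int) -> List[int]:
    """Rows usable when retraining at index t, using information through t-1 only."""
    start = date(days[t].year - TRAIN_YEARS, days[t].month, 1)
    return [s for s in range(t)
            if days[s] >= start
            and days[s].month in SEASON
            and s + HORIZON <= t - 1]  # label P[s+10]/P[s]-1 is known by t-1


def walk_forward(days: List[date]) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (retrain index, end of prediction block, training rows)."""
    points = monthly_retrain_points(days)
    for j, t in enumerate(points):
        end = points[j + 1] if j + 1 < len(points) else len(days)
        rows = training_rows(days, t)
        if rows:  # skip months with no usable history yet
            yield t, end, rows
```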
06 — Signal Pipeline

Six-Stage Signal Pipeline

Raw satellite data enters one end; a sized futures position comes out the other. Each stage has a single responsibility, making the system easy to audit, debug, and extend to new commodities.

01
Climate Fetch
Pulls daily temperature and precipitation for each region from the NASA POWER API. A local cache layer avoids redundant calls.
02
Stress Compute
Binary heat/cold/drought indicators. Rolling 7d/14d/30d cumulative aggregation. Z-score normalization against historical baseline.
03
Region Aggregate
Production-weight each region's stress into composite corn belt indicators. Iowa 35%, Illinois 30%, Nebraska 20%, Indiana 15%.
04
Model Predict
PCA → Ridge regression → predicted 10-day return. Models retrained monthly on 3-year rolling window.
05
Season Gate
Signal only active during April–September growing season. Out-of-season output forced to zero regardless of model prediction.
06
Size & Execute
Volatility-normalized position sizing. 5% fixed base size. Max 20% single position. 10 bps/side transaction cost applied.
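A minimal sketch of stage 06. How the 5% base size, the 12% volatility target, and the 20% cap combine is not spelled out above, so this sketch assumes one reading: base size scaled by target over realized volatility, then capped. `MIN_SIGNAL` is a hypothetical threshold, and P&L charges 10 bps per side on every change in position.

```python
import math
from typing import List

MIN_SIGNAL = 0.002       # hypothetical |predicted return| threshold
BASE_SIZE = 0.05         # 5% base position
MAX_SIZE = 0.20          # 20% single-position cap
TARGET_VOL = 0.12        # annualized volatility target
COST_PER_SIDE = 0.0010   # 10 bps


def direction(pred: float, min_signal: float = MIN_SIGNAL) -> int:
    """sign(pred) when the prediction clears the threshold, else flat."""
    if abs(pred) <= min_signal:
        return 0
    return 1 if pred > 0 else -1


def realized_vol(daily_returns: List[float], periods: int = 252) -> float:
    """Annualized sample standard deviation of daily returns."""
    n = len(daily_returns)
    m = sum(daily_returns) / n
    var = sum((x - m) ** 2 for x in daily_returns) / (n - 1)
    return math.sqrt(var * periods)


def position_size(pred: float, vol: float) -> float:
    """Signed fraction of capital (assumed rule: base * target/realized vol, capped)."""
    d = direction(pred)
    if d == 0 or vol <= 0:
        return 0.0
    return d * min(MAX_SIZE, BASE_SIZE * TARGET_VOL / vol)


def net_returns(positions: List[float], asset_returns: List[float]) -> List[float]:
    """Daily P&L: prior position times today's return, minus 10 bps per side traded."""
    out, prev = [], 0.0
    for pos, r in zip(positions, asset_returns):
        out.append(prev * r - COST_PER_SIDE * abs(pos - prev))
        prev = pos
    return out
```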
07 — Performance

Backtest Estimates

Based on walk-forward simulation over 2010–2023 on corn futures (ZC=F), with transaction costs of 10 bps per side, 5% base position sizing, monthly model retraining, and growing-season restriction. All figures are estimates from the backtest framework; live results are pending Phase 4 deployment.

Return Metrics
Annualized Return 15–20%
Sharpe Ratio 1.2–1.8
Sortino Ratio 1.5–2.2
Calmar Ratio 0.4–0.6
Max Drawdown −30% to −40%
Trading Statistics
Win Rate 54–58%
Profit Factor 1.4–1.8×
Transaction Costs 10 bps/side
Initial Capital $1,000,000
Prediction Horizon 10 trading days

★ The elevated drawdown profile (−30% to −40%) is characteristic of weather-driven dislocation strategies. Severe droughts can persist for an entire growing season, creating extended adverse periods before mean reversion reasserts. This makes the strategy unsuitable for low-volatility mandates but potentially compelling for absolute return allocators with appropriate risk tolerance and long time horizons. An estimated Sharpe ratio of 1.2–1.8 would indicate strong risk-adjusted returns despite the deep drawdowns; at a 12% volatility target, drawdowns of this size imply that losses cluster in a few prolonged episodes rather than accruing evenly.
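For reference, the metrics above could be computed from a daily net-return series as follows. This sketch assumes a zero risk-free rate, 252 trading days per year, a zero target for Sortino downside deviation, and win rate measured per active day; the backtest framework may define win rate per trade instead.

```python
import math
from typing import Dict, List


def performance(daily: List[float], periods: int = 252) -> Dict[str, float]:
    """Annualized return, Sharpe, Sortino, max drawdown, Calmar, win rate, profit factor."""
    n = len(daily)
    mean = sum(daily) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in daily) / (n - 1))
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in daily) / n)

    # Compound the equity curve and track the worst peak-to-trough decline.
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in daily:
        equity *= 1.0 + r
        peak = max(peak, equity)
        mdd = min(mdd, equity / peak - 1.0)

    ann_return = equity ** (periods / n) - 1.0
    gains = sum(r for r in daily if r > 0)
    losses = -sum(r for r in daily if r < 0)
    active = [r for r in daily if r != 0]  # flat days excluded from win rate
    return {
        "ann_return": ann_return,
        "sharpe": mean / sd * math.sqrt(periods) if sd > 0 else float("nan"),
        "sortino": mean / downside * math.sqrt(periods) if downside > 0 else float("nan"),
        "max_drawdown": mdd,
        "calmar": ann_return / abs(mdd) if mdd < 0 else float("nan"),
        "win_rate": sum(r > 0 for r in active) / len(active) if active else float("nan"),
        "profit_factor": gains / losses if losses > 0 else float("inf"),
    }
```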

08 — Risk Factors

Known Limitations

Rigorous quantitative strategy development requires honest accounting of model risks. Four risks are material for this strategy.

Model Risk
Relationship Stationarity
The strategy assumes the weather-price relationship is stationary. Climate change, advances in drought-resistant seed genetics, and shifting global trade patterns may gradually erode historical coefficients. Monthly retraining helps, but structural breaks can cause sustained underperformance before detection.
Execution Risk
Slippage at Extremes
Weather signals are strongest during volatile conditions — droughts, heat waves — precisely when futures bid-ask spreads widen. The 10 bps cost assumption may prove optimistic at scale. Real-world slippage could meaningfully erode the edge, particularly for larger position sizes.
Data Risk
NASA POWER Gaps
NASA POWER data occasionally contains gaps or anomalies, particularly for the most recent dates. Validation checks are implemented in data_validator.py but cannot eliminate all quality issues. The cache layer also risks serving stale data if invalidation logic fails.
Regime Risk
Orthogonal Macro Shocks
When commodity markets are dominated by factors orthogonal to weather — USD strength, Chinese demand shocks, policy interventions, export sanctions — weather signals fail systematically. No regime filter is currently implemented; the strategy will continue trading through unfavorable macro environments.
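The NASA POWER data risk can be partly mitigated with simple per-series checks. A minimal sketch, assuming POWER's JSON returns each parameter as a `YYYYMMDD -> value` mapping and marks missing values with a fill value of −999; the plausibility bounds and the `validate_series` name are hypothetical and unrelated to the project's `data_validator.py`.

```python
from datetime import date, timedelta
from typing import Dict, List

FILL_VALUE = -999.0  # assumed POWER missing-data marker
# Hypothetical plausibility bounds per parameter (°C, °C, mm/day).
BOUNDS = {"T2M_MAX": (-50.0, 60.0), "T2M_MIN": (-60.0, 45.0), "PRECTOTCORR": (0.0, 500.0)}


def validate_series(param: str, series: Dict[str, float],
                    start: date, end: date) -> List[str]:
    """List problems in one parameter series: missing days, fill values, outliers."""
    problems = []
    lo, hi = BOUNDS[param]
    d = start
    while d <= end:
        key = d.strftime("%Y%m%d")
        v = series.get(key)
        if v is None:
            problems.append(f"{param} {key}: missing")
        elif v == FILL_VALUE:
            problems.append(f"{param} {key}: fill value")
        elif not lo <= v <= hi:
            problems.append(f"{param} {key}: out of range ({v})")
        d += timedelta(days=1)
    return problems
```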
09 — Roadmap

Expansion Phases

Phase 1 (corn, US) is the production baseline. Each subsequent phase extends the edge into new markets, models, and execution venues.

Phase 1 — Current
Corn, US Corn Belt
Rolling Ridge regression on heat, cold, drought stress. ZC=F continuous front-month. Monthly retraining, growing season only.
Phase 2 — Expansion
Multi-Commodity
Extend to soybeans, wheat, cotton, and coffee with commodity-specific thresholds, regional weights, and growing season calendars already defined in config.
Phase 3 — Modeling
Ensemble + XGBoost
Explore gradient boosting and ensemble methods once the multi-commodity dataset is large enough to support deeper architectures without overfitting. XGBoost model skeleton already implemented.
Phase 4 — Live
IB API Deployment
Live trading via the Interactive Brokers API with real-time data pipelines, position monitoring, and automated daily execution, plus exploration of options strategies built on implied-volatility regime forecasting.