A bandit-based framework that automatically selects the best variance reduction strategy for Monte Carlo option pricing without prior knowledge of the payoff structure. Achieves near-oracle performance on well-separated option types, with a theoretical O(K log N/N) bound on the MSE gap to the oracle.
Alexander Robbins
We propose Adaptive Strategy-selected Variance Reduction (ASVR), a bandit-based framework that automatically selects the best variance reduction strategy for Monte Carlo option pricing without requiring prior knowledge of the option payoff structure.
ASVR runs a two-phase explore-then-exploit procedure over a pool of K variance reduction strategies and achieves expected MSE within O(K log N/N) of the oracle (the best fixed strategy).
Across eight option types spanning European, Asian, barrier, and lookback payoffs, ASVR always outperforms plain Monte Carlo and comes within 10% of the best fixed strategy (near-oracle performance) on 2 of the 8, using a single, parameter-free implementation.
We also propose a Bayesian fusion estimator that combines exploration estimates via inverse-variance weighting; with oracle weights this is provably variance-minimising, and with estimated weights it achieves genuine MSE-ratio improvement on 6 out of 8 option types.
All experiments use N = 10,000 paths, 252 steps, and 200 independent trials with parameters S₀ = K = 100, r = 0.05, σ = 0.20, T = 1 (in this parameter list K is the strike; elsewhere K is the number of strategies in the pool).
1 Introduction — Motivation and Research Question
2 Problem Setup — Monte Carlo, VR Strategies, Oracle
3 The ASVR Algorithm — Two-Phase Framework & Theory
4 Implementation Details — Pool Design & Halton QMC
5 Experimental Results — Performance across 8 option types
6 Discussion — IS Parameterization & Extensions
7 Conclusion
28 pages | 5 figures | Appendices with proofs & data
Monte Carlo simulation is the standard method for pricing path-dependent and high-dimensional derivatives. However, its O(N^(−1/2)) convergence rate can be slow. Variance reduction (VR) techniques dramatically improve efficiency, but different strategies excel on different option types.
Control variates dominate European calls but fail on puts. Latin hypercube sampling works for Asian puts but not lookback calls. Importance sampling is powerful but requires careful parameterization and must match the option's sensitivity direction. In practice, strategies are chosen by domain knowledge or trial and error, both of which are impractical for automated pricing pipelines.
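To make the effect of a variance reduction strategy concrete, here is a minimal sketch that prices a European call with plain Monte Carlo and with a control variate, using the GBM parameters quoted in the abstract. The choice of control (the discounted terminal price, whose exact mean is S₀), the function names, and the single-step terminal sampling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Parameters quoted in the abstract; STRIKE is the option strike.
S0, STRIKE, R, SIGMA, T = 100.0, 100.0, 0.05, 0.20, 1.0

def simulate(n, rng):
    """Return discounted call payoffs and discounted terminal prices for n paths."""
    z = rng.standard_normal(n)
    s_t = S0 * np.exp((R - 0.5 * SIGMA**2) * T + SIGMA * np.sqrt(T) * z)
    disc = np.exp(-R * T)
    return disc * np.maximum(s_t - STRIKE, 0.0), disc * s_t

def plain_estimate(n, rng):
    """Plain Monte Carlo: sample mean and estimated variance of that mean."""
    payoff, _ = simulate(n, rng)
    return payoff.mean(), payoff.var(ddof=1) / n

def control_variate_estimate(n, rng):
    """Control variate using the discounted terminal price (exact mean S0).

    Assumption: this control is chosen for illustration only. Estimating
    beta from the same sample introduces a small bias of order 1/n.
    """
    payoff, control = simulate(n, rng)
    beta = np.cov(payoff, control)[0, 1] / control.var(ddof=1)
    adjusted = payoff - beta * (control - S0)
    return adjusted.mean(), adjusted.var(ddof=1) / n

rng = np.random.default_rng(0)
plain_price, plain_var = plain_estimate(10_000, rng)
cv_price, cv_var = control_variate_estimate(10_000, rng)
print(f"plain: {plain_price:.3f} (var {plain_var:.5f})")
print(f"control variate: {cv_price:.3f} (var {cv_var:.5f})")
```

Both estimators target the same price; only the variance of the estimate differs, which is the quantity a strategy selector has to compare.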
ASVR solves this by running a two-phase explore-then-exploit procedure: allocate exploration paths to test each strategy's variance on the unknown payoff, then devote the remainder of the budget to the estimated best strategy. The result is a parameter-free algorithm that's never worse than plain Monte Carlo and achieves near-oracle performance when strategy separation is large.
ASVR was tested on 8 option types across 200 independent trials with 10,000 paths each, using the same GBM parameters throughout. The algorithm always improves over plain Monte Carlo and achieves near-oracle performance on well-separated option types.
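ASVR is a multi-armed bandit algorithm that identifies the best variance reduction strategy without prior knowledge of the payoff structure. It operates in two phases: an exploration phase that spends n_exp paths on each of the K strategies to estimate its variance on the unknown payoff, and an exploitation phase that devotes the remaining budget to the strategy with the lowest estimated variance.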
In experiments, n_exp = 100 and K = 9, consuming 900 of 10,000 total paths (9% exploration overhead). The oracle-gap theorem guarantees E[MSE(ASVR)] ≤ MSE(oracle) + C · K log N / N, where the constant C depends on strategy separation.
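The explore-then-exploit procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the pool has three hypothetical strategies (plain, antithetic, control variate) rather than the paper's nine, the payoff is a European call sampled at the terminal date only, and each strategy reports its own estimated variance of the mean for a given path budget.

```python
import numpy as np

S0, STRIKE, R, SIGMA, T = 100.0, 100.0, 0.05, 0.20, 1.0

def call_and_control(z):
    """Discounted call payoff and discounted terminal price for normals z."""
    s_t = S0 * np.exp((R - 0.5 * SIGMA**2) * T + SIGMA * np.sqrt(T) * z)
    disc = np.exp(-R * T)
    return disc * np.maximum(s_t - STRIKE, 0.0), disc * s_t

def plain(n, rng):
    y, _ = call_and_control(rng.standard_normal(n))
    return y.mean(), y.var(ddof=1) / n

def antithetic(n, rng):
    # n paths form n // 2 antithetic pairs; each pair average is one sample.
    z = rng.standard_normal(n // 2)
    y = 0.5 * (call_and_control(z)[0] + call_and_control(-z)[0])
    return y.mean(), y.var(ddof=1) / len(y)

def control_variate(n, rng):
    y, c = call_and_control(rng.standard_normal(n))
    beta = np.cov(y, c)[0, 1] / c.var(ddof=1)
    adj = y - beta * (c - S0)  # the discounted terminal price has mean S0
    return adj.mean(), adj.var(ddof=1) / n

def explore_then_exploit(strategies, n_total, n_exp, rng):
    """Spend n_exp paths per strategy, then give the rest to the apparent best."""
    explore = [strategy(n_exp, rng) for strategy in strategies]
    best = int(np.argmin([var for _, var in explore]))
    n_left = n_total - n_exp * len(strategies)
    estimate, variance = strategies[best](n_left, rng)
    return best, estimate, variance, explore

pool = [plain, antithetic, control_variate]
rng = np.random.default_rng(1)
best, estimate, variance, explore = explore_then_exploit(pool, 10_000, 100, rng)
print(f"selected {pool[best].__name__}: price {estimate:.3f}, var {variance:.5f}")
```

With three strategies and n_exp = 100, exploration uses 300 of the 10,000 paths; the paper's setting (K = 9) uses 900. Because the selection rests on variances estimated from only n_exp paths, strategies with similar variances can be confused, which matches the sensitivity to strategy separation described in the surrounding text.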
For strategies with large variance separation (European call: CV vs LHS gap is 3×), the bound is tight and empirical performance matches theory. For options with small separation (Asian put: LHS vs moment matching gap is only 1.2×), finite-sample noise dominates and the gap to oracle widens.
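After both phases, ASVR has K exploration estimates and one exploitation estimate. Rather than discarding exploration paths, the fusion estimator combines all K+1 estimates via inverse-variance weighting. In its standard form, each estimate μ̂_i is weighted in proportion to 1/σ̂²_i, giving μ̂_fusion = Σ_i (μ̂_i / σ̂²_i) / Σ_i (1 / σ̂²_i).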
With oracle weights (true variances), the fusion variance is guaranteed ≤ exploit variance. In practice with estimated weights, two failure modes arise: estimation noise when σ̂²_k is noisy, and selection bias (winner's curse) when σ̂²_k* is optimistically biased. Despite these, fusion wins on 6 out of 8 option types and produces 2.1–5.2% MSE improvements.
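A minimal sketch of standard inverse-variance weighting illustrates why oracle weights cannot hurt. The function name and the numbers in the usage lines are hypothetical, and the fused-variance formula assumes independent, unbiased estimates with known variances, which is exactly the assumption that estimated weights violate.

```python
import numpy as np

def inverse_variance_fusion(estimates, variances):
    """Combine independent unbiased estimates with weights proportional to 1/variance.

    Returns the fused estimate, its variance (valid only when the supplied
    variances are the true ones and the estimates are independent), and the
    normalised weights.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances
    weights = precision / precision.sum()
    fused = float(weights @ estimates)
    fused_var = float(1.0 / precision.sum())
    return fused, fused_var, weights

# Hypothetical numbers: one precise estimate and one noisy estimate.
fused, fused_var, weights = inverse_variance_fusion([10.0, 11.0], [1.0, 4.0])
print(fused, fused_var, weights)
```

Since the fused variance is 1 / Σ_i (1 / σ²_i), adding any estimate with finite variance can only lower it, so it never exceeds the smallest input variance. With estimated variances the weights themselves are random, which is where the estimation-noise and winner's-curse failure modes enter.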
The full paper includes detailed proofs of the oracle-gap theorem, variance reduction strategy implementations, extended benchmark tables across all 8 option types, and discussion of the Halton failure mode in high dimensions.
Paper: "ASVR: Adaptive Strategy-selected Variance Reduction for Monte Carlo Option Pricing"
Author: Alexander Robbins
Date: May 13, 2026
Pages: 28
Keywords: variance reduction, Monte Carlo, multi-armed bandit, explore-then-exploit, Bayesian fusion, quasi-Monte Carlo