Systematic quantitative research on SPY zero-day-to-expiration (0DTE) options strategies: rigorous backtesting, layered filter validation, and explicit rules for promoting findings from data point to candidate tier. More than 450 configurations tested across roughly 30,000 simulated trades as of April 2026.
This research program develops and validates systematic 0DTE options strategies on SPY, with findings cross-validated against QQQ, IWM, and SPX. All research uses Option Alpha's backtest engine with $0.05/leg slippage (representing disciplined limit-order execution with 5–10 second patience), two-year windows, and mandatory multi-regime validation before any strategy promotes to candidate tier.
The framework exists to prevent premature conclusions, enforce rigorous testing, and build genuine mechanistic understanding of market edge — not pattern-matching or wishful thinking.
The zero-live-trades metric is intentional. The research is complete enough to deploy, but every validated strategy is still only a backtest. Live validation is the next phase: each real trade yields more information than dozens of additional backtests. Disclosing this gap honestly matters more than pretending the research is "done."
Seven principles govern all research decisions. They were established iteratively as research revealed specific failure modes; each exists because an actual research error triggered its creation.
A single backtest result — positive or negative — does not close a research question. It narrows the search space for that specific configuration under those specific conditions.
Wrong framing: "This produced 2.09 PF, it's the best we can get." "This structure is dead." "Let's lock this in as the rule."
Correct framing: "This configuration fails at these conditions; filters or alternative parameters may reveal different behavior." "This is our current best baseline; refinement could push it higher."
Strategies don't reveal their full edge in a single test. They reveal it across a sequence: baseline → filter → regime gate → parameter optimization.
Example: the afternoon Iron Condor candidate (B30-C) wasn't discovered as PF 2.82. It emerged through:
A baseline PF of 1.2 can become PF 2.0+ with the right filters applied. Starting points matter less than the refinement process.
Before every batch, expected ranking and PF range are committed explicitly. This serves as a calibration check that prevents hindsight bias and triggers investigation when results surprise.
Tracking prediction accuracy across batches reveals where intuition fails. For example, the B65 wing sweep contradicted the prediction: raw PF climbed monotonically through $25 wings, but capital-adjusted edge peaked at $5 wings. That surprise is what produced Principle 7.
All research uses $0.05/leg slippage minimum. This represents disciplined limit-order execution with 5–10 second patience. Higher slippage variants ($0.07–$0.10/leg) are used for stress testing.
Mid-price backtests lie. A strategy that works only at theoretical mid-fills is not a strategy — it's a mirage that won't replicate in live trading. Several early research findings were invalidated when realistic slippage was applied.
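
A minimal sketch of why mid-price results can mislead. The trade P/Ls below are invented for illustration, and the cost model (slippage charged per leg on both entry and exit, 100-share multiplier) is an assumption about how the $0.05/leg setting translates into dollars, not a description of Option Alpha's internals.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss; inf if there are no losing trades."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def apply_slippage(mid_pnls, legs, slippage_per_leg=0.05, fills=2, multiplier=100):
    """Subtract slippage from each mid-price P/L (dollars per contract).

    fills=2 assumes slippage is paid on both entry and exit; use fills=1
    for trades that are held to expiration and never closed.
    """
    cost = slippage_per_leg * legs * fills * multiplier
    return [p - cost for p in mid_pnls]


# Hypothetical mid-price P/Ls for a 2-leg spread, dollars per contract.
mid_pnls = [40, 35, 30, -90, 45, 25, -60, 38]
net_pnls = apply_slippage(mid_pnls, legs=2)

pf_mid = profit_factor(mid_pnls)
pf_net = profit_factor(net_pnls)
print(f"PF at mid: {pf_mid:.2f}  PF after slippage: {pf_net:.2f}")
```

With these made-up numbers the strategy looks profitable at mid and falls below PF 1.0 once the per-leg cost is charged, which is the failure mode this principle guards against.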
Two strategies with PF 1.5 that lose money on the same days = one strategy levered up. Two strategies with PF 1.3 that lose on different days = real diversification.
Correlation analysis is mandatory before declaring two strategies stackable. Combined drawdown matters more than combined expected value. This is an open research question for B65-A and B30-C, which may or may not fire on overlapping days.
Monthly equity-curve validation is required before candidate promotion. Specifically:
These tests catch strategies whose aggregate PF hides bimodal regime dependence. B63-C (morning SPS without VIX filter) showed aggregate PF 1.46 — but monthly breakdown revealed 17 months underwater before recovery. Adding the VIX ≥ 20 filter produced B65-A, which passed all three regime tests cleanly.
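
One way to surface this kind of regime masking is to measure the longest underwater stretch on the monthly equity curve. The sketch below uses an invented 24-month P/L series shaped like the pattern described (a long underwater period followed by recovery); it is not the actual B63-C data, and the function name is hypothetical.

```python
def longest_underwater_months(monthly_pnl):
    """Longest run of consecutive months where cumulative P/L sits below its prior peak."""
    equity = peak = 0.0
    longest = current = 0
    for pnl in monthly_pnl:
        equity += pnl
        if equity >= peak:
            peak = equity   # new high-water mark ends the underwater run
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical monthly P/L: one good month, a long grind below the peak, then recovery.
monthly = (
    [50]
    + [-20, 10, -15, 5, -10, 5, -20, 10, -5, 5, -10, 5, -5, 5, -10, 10, 5]
    + [60, 40, 50, 30, 40, 45]
)

print("total P/L:", sum(monthly))
print("longest underwater stretch (months):", longest_underwater_months(monthly))
```

The total P/L is positive, so an aggregate metric would look acceptable, while the underwater count exposes how long the account would have sat below its high-water mark.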
The methodology explicitly treats self-gating (strategy skips days via filter) as protective, not a limitation. B65-A fires ~11–12 times per year; the remaining ~240 trading days are skipped by design.
When a sweep varies risk per trade (e.g., wing width on SPS), raw PF systematically favors wider wings — but max risk scales linearly while P/L scales sub-linearly. Return-on-risk is the correct tiebreaker.
Empirical example from the B65 wing sweep:
| Wing | PF | Avg P/L | Max Risk | RoR |
|---|---|---|---|---|
| $5 | 1.78 | $12 | $548 | 48.9% |
| $10 | 2.02 | $18 | $1,049 | 39.7% |
| $25 | 2.17 | $22 | $2,674 | 18.7% |
The $5-wing variant wins on capital efficiency (48.9% vs 18.7%) despite lower raw PF (1.78 vs 2.17). This principle was established during the B65 research and applied retroactively to revise candidate selection.
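
The tiebreak can be sketched from the table above. The RoR formula is not stated in the source; the version here (total P/L over the sample divided by max risk per trade, with N=23 taken from the B65-A sample size) is an inferred assumption that approximately reproduces the table's RoR column. Small differences come from the rounded Avg P/L values.

```python
N_TRADES = 23  # B65-A sample size; assumed to apply to every wing width

sweep = [
    {"wing": 5, "pf": 1.78, "avg_pnl": 12, "max_risk": 548},
    {"wing": 10, "pf": 2.02, "avg_pnl": 18, "max_risk": 1049},
    {"wing": 25, "pf": 2.17, "avg_pnl": 22, "max_risk": 2674},
]


def return_on_risk(row, n_trades=N_TRADES):
    """Assumed definition: total P/L across the sample / max risk per trade."""
    return row["avg_pnl"] * n_trades / row["max_risk"]


best_by_pf = max(sweep, key=lambda r: r["pf"])
best_by_ror = max(sweep, key=return_on_risk)

for row in sweep:
    print(f"${row['wing']:>2} wing  PF {row['pf']:.2f}  RoR {return_on_risk(row):.1%}")
print("best by raw PF: $%d wing" % best_by_pf["wing"])
print("best by RoR:    $%d wing" % best_by_ror["wing"])
```

Ranking by raw PF selects the $25 wing, while ranking by return on risk selects the $5 wing, matching the conclusion above.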
Research progresses through five phases. Each has explicit closure criteria to prevent premature promotion or endless refinement.
Goal: Identify all configurations showing positive expectancy or marginal positive signal.
Method: Test broad combinations of structure × entry time × exit time × delta × wings. Each batch isolates ONE variable and holds the others constant, with at most five variants per batch (Option Alpha's 5-test comparison limit). Realistic slippage applied.
Closure: Phase 1 closes when major dimensions have been mapped sufficiently to identify Phase 2 candidates — NOT when "the answer" is found.
Goal: Concentrate edge via individual filters.
Filter dimensions tested:
Output: Filter heatmap per candidate. Filters producing >20% PF improvement are flagged for Phase 3.
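
The flagging rule can be sketched as below. The filter names and PF values are placeholders, and reading ">20% PF improvement" as a relative change against the unfiltered baseline is an assumption.

```python
def flag_filters(baseline_pf, filtered_pf, min_improvement=0.20):
    """Return {filter_name: relative PF improvement} for filters above the threshold."""
    flagged = {}
    for name, pf in filtered_pf.items():
        improvement = (pf - baseline_pf) / baseline_pf
        if improvement > min_improvement:
            flagged[name] = improvement
    return flagged


# Hypothetical single-filter results for one candidate.
baseline_pf = 1.20
filtered_pf = {"filter_a": 1.50, "filter_b": 1.30, "filter_c": 0.95}

flagged = flag_filters(baseline_pf, filtered_pf)
for name, gain in flagged.items():
    print(f"{name}: +{gain:.0%} PF, flag for Phase 3")
```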
Goal: Combine validated filters to maximize edge without overfitting.
Sample size rules:
B65-A (heavy-down + continuation + VIX ≥ 20) is a 3-filter combination with N=23. That is below the 30-trade threshold, but the finding was accepted given its mechanistic coherence and cross-regime contrast.
Methods:
Method: Smallest viable size (1 contract). Track every trade against backtest expectation. Compare actual slippage to modeled. Document system-rule deviations.
Current status: no candidate has entered Phase 5. The research-to-execution gap is the highest-priority item on the open research list.
Configurations are classified into four tiers by how many qualification criteria they satisfy:
| Tier | Criteria met | Treatment |
|---|---|---|
| Candidate | 7 of 7 | Live-deployable, size carefully |
| Research-tier | 4–6 of 7 | Research-worthy, additional validation needed |
| Data point | Below 4 | Data point only, not actionable |
| Closed | — | Falsified or validated-and-subsumed |
| Metric | Threshold | Rationale |
|---|---|---|
| Profit factor | > 1.25 | Below this, slippage variance flips negative |
| Win rate | > 70% | Psychological sustainability; small drawdown clusters |
| Max drawdown | Account-appropriate | Account survival under worst case |
| Sample size | > 50 trades | Statistical reliability after filtering |
| Avg P/L per trade | > $8 net slippage | Worth cognitive load and capital tie-up |
| Monthly validation | Passes all three regime tests | Catches bimodal / regime-masked findings |
| Correlation (if stacking) | < 0.5 | Real diversification, not amplification |
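
A minimal sketch of how the criteria count maps to a tier. The `Results` container and its field names are hypothetical, the example values are invented, and treating the correlation criterion as satisfied when a strategy is not being stacked is an assumption. The "Closed" tier is a manual status and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Results:
    profit_factor: float
    win_rate: float                  # fraction between 0 and 1
    drawdown_ok: bool                # judged against the account, not a fixed number
    n_trades: int
    avg_pnl_net: float               # dollars per trade, net of slippage
    monthly_validation_passed: bool
    stack_correlation: Optional[float] = None  # None = not being stacked


def criteria_met(r: Results) -> int:
    """Count how many of the seven qualification criteria are satisfied."""
    checks = [
        r.profit_factor > 1.25,
        r.win_rate > 0.70,
        r.drawdown_ok,
        r.n_trades > 50,
        r.avg_pnl_net > 8,
        r.monthly_validation_passed,
        r.stack_correlation is None or r.stack_correlation < 0.5,
    ]
    return sum(checks)


def classify(met: int, total: int = 7) -> str:
    """Map a criteria count to a tier name."""
    if not 0 <= met <= total:
        raise ValueError("criteria count out of range")
    if met == total:
        return "Candidate"
    if met >= 4:
        return "Research-tier"
    return "Data point"


# Hypothetical configurations, not actual batch results.
strong = Results(2.1, 0.78, True, 64, 15.0, True)
thin_sample = Results(2.1, 0.78, True, 40, 15.0, True)

print(classify(criteria_met(strong)))
print(classify(criteria_met(thin_sample)))
```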
| Priority | Metric | Why |
|---|---|---|
| Primary | Profit Factor | Risk-adjusted return |
| Secondary | Total P&L ($) | Absolute dollars |
| Tertiary | Win Rate | Consistency / psychology |
| Check | Max Drawdown | Tail risk / account survival |
| Check | Avg Win / Avg Loss | Outcome asymmetry |
A configuration with PF 2.8, $37 avg P&L, 78% WR beats one with PF 1.6, $45 avg P&L, 85% WR. The second has better headline numbers but worse risk-adjusted return.
Strategies that have cleared the qualification criteria. Each includes batch-level progression showing how it emerged from broader research.
Candidate · Monthly validated
Short Put Spread, 9:40 AM entry, 5-hour hold, filtered to stressed regime reversal setups. Currently the flagship morning-session candidate.
| Parameter | Value |
|---|---|
| Structure | Short Put Spread (credit) |
| Short put | −0.20 delta |
| Long put | $5.00 below short put leg, exactly |
| Entry time | 9:40 AM ET |
| Exit rule | 5 hours (2:40 PM ET) or expiration |
| Filters (AND) | Change% ≤ −0.5% · Open CHG% ≤ 0% · VIX prior close ≥ 20 |
| Slippage | $0.05/leg entry and exit |
When VIX prior close ≥ 20, overnight fear is already priced in. A heavy gap down (Change% ≤ −0.5%) that continues through 9:40 represents the last wave of sell-first-think-later flow. These sellers typically exhaust within 5 hours, at which point put-side premium deflates and the SPS collects credit as IV normalizes.
The mechanism specifically requires pre-existing fear, not fresh fear. A calm-market gap down (VIX < 15) behaves differently — the gap itself generates new fear, continuation dominates, and SPS gets run over. This was confirmed in the VIX regime sweep (B64).
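
The three-part AND gate can be written as a single predicate. This is a sketch: the function and parameter names are hypothetical, and the inputs are assumed to be pre-computed percentages matching the backtester's Change% and Open CHG% values, whose exact definitions are not given here.

```python
def b65a_fires(change_pct: float, open_chg_pct: float, vix_prior_close: float) -> bool:
    """True when all three B65-A filters pass.

    change_pct      -- Change% at evaluation time, in percent (e.g. -0.8)
    open_chg_pct    -- Open CHG% at evaluation time, in percent
    vix_prior_close -- previous session's VIX close
    """
    heavy_down = change_pct <= -0.5
    continuation = open_chg_pct <= 0.0
    stressed_regime = vix_prior_close >= 20.0
    return heavy_down and continuation and stressed_regime


# Hypothetical mornings.
print(b65a_fires(-0.8, -0.2, 22.5))  # all three conditions met
print(b65a_fires(-0.8, -0.2, 14.0))  # calm regime: gate stays closed
print(b65a_fires(-0.3, -0.1, 25.0))  # gap not heavy enough
```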
| VIX regime | PF | N | Verdict |
|---|---|---|---|
| ≥ 20 | 2.02 | 23 | Winner |
| 15–20 | 0.99 | 15 | Breakeven |
| 10–15 | 0 | 1 | Insufficient data |
| < 15 | 0.67 | 15 | Losing |
| No filter | 1.46 | 39 | Regime-masked average |
| Wing | PF | Avg P/L | Return on Risk |
|---|---|---|---|
| $5 (B65-A) | 1.78 | $12 | 48.9% |
| $10 | 2.02 | $18 | 39.7% |
| $15 | 2.12 | $20 | 28.3% |
| $20 | 2.15 | $21 | 22.7% |
| $25 | 2.17 | $22 | 18.7% |
Passes all three regime tests cleanly. Monthly validation on the $5-wing deployment variant specifically is an open item — expected to pass since the same trades fire, but needed for full candidate certification.
To confirm the mechanism is asymmetric rather than a general high-vol mean-reversion effect, the symmetric mirror was tested: Long Put Spread on heavy-up + continuation + VIX ≥ 20.
Result: PF 0.62, WR 27.8%, N=18. Thesis falsified. Up-day continuation trends further (does not reverse), confirming B65-A's mechanism is specifically down-day fear exhaustion, not generic reversal. See Closed Findings for full analysis.
Candidate
Afternoon Iron Condor entered at 1:30 PM ET on down-regime days. Asymmetric construction (Put 15δ / Call 25δ) exploits put-side volatility skew. Held to expiration per the validated hold-to-expiry finding.
| Parameter | Value |
|---|---|
| Structure | Iron Condor (asymmetric) |
| Short put | −0.15 delta |
| Short call | +0.25 delta |
| Wing width | $10 |
| Entry time | 1:30 PM ET |
| Exit rule | Hold to expiration |
| Regime | Down-day regime (applied at 1:30 PM) |
Emerged from the afternoon IC research sequence that established 1:30 PM as the dominant entry time across ~100+ configurations. The asymmetric construction (tighter put side, wider call side) reflects the structural SPY skew — put-side IV is inflated by constant institutional protective buying, so selling tighter puts captures more of that premium while the wider call side reduces upside breach risk.
Sister finding B28-D (afternoon Jade Lizard on strong-up days, PF 4.16) represents the opposite regime — up-days benefit from call-side directional premium rather than symmetric condor construction.
Candidate
Morning strategy on overnight gap > +0.5%. Strong-up opens tend to trend further through the morning, supporting directional premium capture.
The underlying structure and mechanism align with the asymmetric-reversal finding: up-day momentum continues (as confirmed by B66's falsification of the symmetric reversal hypothesis), which makes directional upside strategies appropriate when the filter fires.
Findings with meaningful positive signal that have not yet cleared all qualification criteria. Candidates for future validation work.
Research-tier
Morning premium strategy on slight overnight gap up. PF 1.34. Moderate edge but not yet cleared through full qualification. Pending: sample size confirmation, monthly validation, and cross-ticker check.
Research-tier
Aggregate PF 1.46 over 2 years. Regime-masked (17-month underwater period). Superseded by B65-A with the VIX ≥ 20 filter. Kept as research-tier for completeness; do not deploy without the VIX gate.
Research-tier
Wider-wing variants of B65-A. All have higher raw PF (2.02–2.17) but worse capital efficiency. Retained as research-tier for edge cases where account size makes wider wings acceptable, but B65-A ($5 wing) is the default deployment target.
Research-tier
Baseline heavy-down SPS (PF 1.35) that kicked off the research sequence leading to B65-A. Superseded but documented for lineage.
Research-tier
The VIX regime sweep on heavy-down SPS that identified VIX ≥ 20 as the gating variable. B64-A (VIX ≥ 20, $10 wing) is the direct parent of B65-A with alternate wing width.
Questions that have been tested with sufficient rigor to close pending new evidence. These are not to be retested without a specific mechanistic reason — retesting without new data is the "confirmation bias trap."
Closed
Hypothesis: if heavy-down + continuation + VIX ≥ 20 reverses up (B65-A thesis), the symmetric heavy-up + continuation setup should reverse down — tradable via Long Put Spread.
Result:
Interpretation: the reversal mechanism is asymmetric. Three mechanistic reasons:
WR 27.8% is the telling metric. A short-duration debit spread requires calling direction correctly, and the market moved against the position roughly 72% of the time.
Implication: down-day reversal edge is real; up-day reversal is not. The two sides cannot be traded symmetrically.
Closed
Heavy-down + continuation filter with VIX < 15: PF 0.67, avg P/L −$10. Calm markets lack the fear-premium inflation that powers B65-A's reversal edge. Morning SPS should not be deployed when VIX is < 20.
Closed
Aggregate PF is 1.46 over two years, but the monthly equity curve reveals a 17-month underwater period followed by a 7-month recovery. All of the edge is concentrated in the recovery regime. Without the VIX ≥ 20 gate, the strategy is bimodal and fails Principle 6. Superseded by B65-A.
Closed
Every profit-target and stop-loss management variation tested (across dozens of configurations) reduced or eliminated edge relative to hold-to-expiration. Hold-to-expiry is the validated approach. This is counterintuitive, since managing losers feels prudent, but the data is unambiguous.
Closed
Batches 2, 4, 6b established that morning IC entries with early exits (before afternoon) consistently produce negative edge once realistic slippage is applied. Early research findings that suggested otherwise were invalidated once $0.05/leg slippage was enforced.
Closed
Batch 3a vs 3b established that the earlier entry wins on every metric for the unfiltered case. However, this finding is subsumed by the broader result that 1:30 PM entry beats every tested morning entry time (Batch 5). Afternoon IC is the correct lane for symmetric structures.
Closed
Across IC, IB, BWIC, BW IB, and SPS structures (Batches 4, 6b), exiting at 1:00 PM consistently destroys edge. Hold to expiration, or at minimum to late afternoon.
Findings from external sources that have been integrated as reference data or additional structural candidates. External research is treated distinctly from proprietary backtests — it's either reference data (descriptive, not tested) or external strategy (tested framework from another practitioner, integrated but not validated in our engine).
External strategy · EXT-02
Jack Sokin's Peggy Bank uses percentage-OTM strike selection with a reward/risk cascade (0.5% / 0.4% / 0.3% OTM) rather than delta or fixed-dollar offsets. This is a structurally different parameterization from our research family.
Integration status: %-OTM structures added to the Decision Engine's structure library alongside dollar-offset variants. Not yet independently backtested as a standalone candidate. Initial engine output suggests Peggy-style structures converge on similar answers to dollar-offset variants when analog days are well-matched (~0.75% OTM SPS ≈ $5 OTM SPS at SPY $710).
Why %-OTM matters: more robust to SPY price drift over time. A $3 OTM strike at SPY $400 has very different meaning from $3 OTM at SPY $700. Percentage-based parameterization normalizes across price regimes.
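
The normalization argument is simple arithmetic, sketched below. The helper names are hypothetical; the spot prices and offsets are the ones quoted in the text.

```python
def pct_otm(spot: float, dollars_otm: float) -> float:
    """Dollar offset expressed as a percentage of the underlying price."""
    return dollars_otm / spot * 100


def dollars_otm(spot: float, pct: float) -> float:
    """Percentage offset converted back to dollars at a given underlying price."""
    return spot * pct / 100


# The same $3 offset at two SPY price levels.
print(f"$3 OTM at SPY $400: {pct_otm(400, 3):.2f}% OTM")
print(f"$3 OTM at SPY $700: {pct_otm(700, 3):.2f}% OTM")

# A 0.75% offset at SPY $710, in dollars.
print(f"0.75% OTM at SPY $710: ${dollars_otm(710, 0.75):.2f}")
```

The same $3 offset is roughly 0.75% out of the money at SPY $400 but only about 0.43% at SPY $700, and 0.75% at SPY $710 works out to a little over $5, consistent with the convergence noted above.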
Reference data
PEG study provides containment probabilities for SPX over different lookback windows (30D / 90D / 180D) and entry times. Key confirmed patterns:
Status: reference data, not integrated as a filter. Worth testing as a future research direction.
Reference data
External composite liquidity ranking for 220 tickers validates the SPX → SPY → QQQ → IWM → XSP hierarchy. Quantitatively backs the transfer-gap hypothesis: findings on the top 3–5 tickers transfer cleanly across the family; findings on lower-liquidity tickers are less reliable.
Reference data
Empirical overnight range probabilities for SPX 1DTE by entry time. Reference data, not used as a filter. Potentially useful for bridging between 0DTE research and 1DTE applications.
Beyond the validated candidate strategies, research has produced a daily decision-support framework that extends the candidate-filter approach with historical analog matching. This complements (does not replace) the validated filters.
Traditional backtest analysis asks "what is the expected P/L of trade X today?" This requires modeling options prices, IV surfaces, and slippage accurately — a tall order with free-tier data.
The containment framework reframes the question: "On the historical days most similar to today, what fraction of them closed inside a given structure's profit zone?" This is a binary outcome question (contained / max loss / partial) that can be answered purely from underlying price data without options pricing models.
Don't predict P/L. Predict containment. The trade edge expresses itself across many decisions conditional on containment, not via accurate per-trade P/L forecasts.
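
A minimal sketch of the containment count for a short put spread, using only underlying prices. The analog days are invented, the analog-matching step itself is not shown, and the outcome rule ignores the credit received (so the true break-even sits slightly below the short strike); all names are hypothetical.

```python
from collections import Counter


def classify_sps_outcome(entry_price, close_price, short_offset, wing):
    """Classify one day for a short put spread placed short_offset dollars below entry."""
    short_strike = entry_price - short_offset
    long_strike = short_strike - wing
    if close_price >= short_strike:
        return "contained"   # both puts expire worthless
    if close_price <= long_strike:
        return "max_loss"    # spread is fully in the money
    return "partial"         # close landed between the strikes


def containment_rate(analog_days, short_offset, wing):
    """Fraction of analog days that closed inside the profit zone, plus raw counts."""
    outcomes = Counter(
        classify_sps_outcome(entry, close, short_offset, wing)
        for entry, close in analog_days
    )
    return outcomes["contained"] / len(analog_days), outcomes


# Hypothetical analog days: (price at entry time, closing price).
analogs = [
    (500.0, 502.1),
    (480.0, 476.5),
    (510.0, 498.0),
    (495.0, 495.5),
    (505.0, 501.0),
    (520.0, 513.0),
]

rate, outcomes = containment_rate(analogs, short_offset=5.0, wing=5.0)
print(f"containment: {rate:.0%}  breakdown: {dict(outcomes)}")
```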
| Tier | Trigger | Action |
|---|---|---|
| GREEN | Validated filter fires AND top structure ≥ 80% containment on analogs | Standard size |
| YELLOW | One signal strong, the other mixed | Reduced size (½ of standard) |
| ORANGE | No validated fire, top structure 60–80% | Small experimental or skip |
| RED | No validated fire AND top structure < 60% | Stand down |
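
The tier table can be sketched as a small decision function. GREEN, ORANGE, and RED follow the table directly. The YELLOW rule is an interpretation: the table only says "one signal strong, the other mixed", so this sketch assigns YELLOW to every combination the other three rows do not cover. The function name is hypothetical.

```python
def decision_tier(filter_fires: bool, top_containment: float) -> str:
    """Map the two signals to a tier.

    filter_fires    -- whether a validated filter fired today
    top_containment -- containment fraction (0..1) of the top structure on analog days
    """
    if filter_fires and top_containment >= 0.80:
        return "GREEN"
    if not filter_fires and top_containment < 0.60:
        return "RED"
    if not filter_fires and top_containment < 0.80:
        return "ORANGE"
    # Assumed: filter fired with weaker containment, or no fire with strong containment.
    return "YELLOW"


for fires, containment in [(True, 0.85), (True, 0.70), (False, 0.85), (False, 0.70), (False, 0.50)]:
    print(fires, containment, decision_tier(fires, containment))
```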
Untested or insufficiently tested questions, priority ranked.
Research errors this methodology is explicitly designed to prevent. Each exists because we encountered it.
Treating any single backtest as conclusive. Negative results don't kill ideas; they narrow the conditions where ideas might work.
Believing the highest PF in current data is the ceiling. The afternoon IC went from PF 1.4 to PF 2.82 through layered refinement. Today's "best" is tomorrow's baseline.
Comparing strategies tested under different conditions (slippage, time periods, regime filters). Always normalize comparison conditions before drawing conclusions.
Wanting larger sample sizes that don't exist. 0DTE SPY has only ~3.5 years of real data. Cross-ticker and cross-period validation substitute for raw N where appropriate.
Running backtests without slippage and treating results as achievable. Realistic execution cost destroys naive edges. Always use $0.05/leg minimum.
Adding multiple positive-PF strategies without checking if they lose on the same days. Combined drawdown can be worse than expected if correlations are high.
Running additional tests to confirm a desired result rather than challenge a finding. Best practice: actively look for tests that could falsify your current hypothesis.
Adding strategies because they're profitable, even if execution complexity exceeds capacity. A single PF 2.0 strategy executed flawlessly beats three PF 1.4 strategies executed sloppily.
Added in v3.1. Building execution infrastructure before live-validating the strategy it's meant to support. The risk: polishing the delivery mechanism for an edge that hasn't been proven to survive real markets.
Option Alpha: real options pricing, 5 simultaneous backtests for comparison, 2-year window typical, $0.01–$0.25/leg slippage configurable.
Wealthsimple (CAD margin): 1.5% FX fee per USD conversion, wider spreads than US-direct brokers. Suitable for initial live testing; transferable to other brokers for scale.
| Source | Data | Notes |
|---|---|---|
| Polygon (free tier) | SPY, QQQ, IWM, TLT, UUP daily & intraday | 2-year history cap on free tier |
| FRED (St. Louis Fed) | VIX daily close | Free, 3-year+ history available |
| Option Alpha | Historical options chains (indirect via backtester) | Actual options pricing, not modeled |
| Economic calendar | FOMC / CPI / NFP / GDP dates | Manual population from ForexFactory |
Methodology is versioned explicitly. Each revision documents what changed and why.
| Version | Date | Changes |
|---|---|---|
| v1 | Apr 16, 2026 | Initial methodology document. Established after morning bridge research session demonstrated need for explicit framework to prevent premature conclusions. Seven principles, five phases, seven qualification criteria, nine anti-patterns. |
| v2 | ~Apr 18, 2026 | Added monthly validation formalization (Principle 6). Added Reference Data vs External Research distinction (Principle 8 in some revisions). Added handoff-document discipline for session continuity. |
| v3 | ~Apr 19, 2026 | Refined qualification criteria to include out-of-sample validation and correlation. Expanded anti-patterns. Integrated morning research findings (B49, B53, B55 families) into candidate tier framework. |
| v3.1 | Apr 21, 2026 | Current. Added Principle 7 (capital-efficiency over raw PF) from B65 wing-sweep research. Added asymmetric reversal finding (B66 falsification). Added Decision Engine / containment framework as research-tier deliverable. Added "premature web app" anti-pattern. |
A "closed" question can be reopened if:
Example: B63 family was tentatively closed after monthly validation failure, then reopened when the VIX regime sweep (B64) revealed the regime-masked edge. B65-A is the result.