Research Methodology & Findings

Systematic quantitative research on SPY zero-day-to-expiration options strategies. Rigorous backtesting, layered filter validation, and explicit rules for promoting findings from data point to candidate tier. Roughly 450 configurations tested across ~30,000 simulated trades as of April 2026.

Research overview

This research program develops and validates systematic 0DTE options strategies on SPY, with findings cross-validated against QQQ, IWM, and SPX. All research uses Option Alpha's backtest engine with $0.05/leg slippage (representing disciplined limit-order execution with 5–10 second patience), two-year windows, and mandatory multi-regime validation before any strategy is promoted to candidate tier.

The framework exists to prevent premature conclusions, enforce rigorous testing, and build genuine mechanistic understanding of market edge — not pattern-matching or wishful thinking.

Current research state (April 2026)

Configurations tested: 450+
Simulated trades: ~30,000
Candidate-tier: 3
Research-tier: ~12
Closed findings: 15+
Live trades: 0

The zero live trades metric is intentional. The research is complete enough to deploy, but every validated strategy is still a backtest. Live validation is the next phase — each real trade yields more information than dozens of additional backtests. Honest disclosure of this gap matters more than pretending the research is "done."

Key structural findings

  • Entry time dominates most other variables. 1:30 PM ET entries substantially outperform morning entries for Iron Condor strategies held to expiration.
  • VIX 20–25 is a universal "sweet spot" producing the strongest profit factors across tickers for IC strategies.
  • Asymmetric delta selection (Put 10–15δ / Call 20–35δ) outperforms symmetric configurations by exploiting structural put-side volatility skew.
  • Hold-to-expiration is mandatory. All profit target and stop loss combinations reduced or eliminated edge across dozens of tested configurations.
  • Morning strategies are regime-dependent. Morning SPS without a VIX regime filter is bimodal and unreliable; the same strategy with VIX ≥ 20 gate becomes PF 1.78 with clean monthly validation.
  • Down-day reversal is asymmetric. Stressed-down-day fear exhausts quickly (reversal edge exists); stressed-up-day continuation does not mean-revert (inverse test falsified at PF 0.62).

Core principles (7)

Seven principles govern all research decisions. They were established iteratively as research revealed specific failure modes; each exists because an actual research error triggered its creation.

Principle 1 — Tests are data points, not verdicts

A single backtest result — positive or negative — does not close a research question. It narrows the search space for that specific configuration under those specific conditions.

Wrong framing: "This produced 2.09 PF, it's the best we can get." "This structure is dead." "Let's lock this in as the rule."

Correct framing: "This configuration fails at these conditions; filters or alternative parameters may reveal different behavior." "This is our current best baseline; refinement could push it higher."

Principle 2 — Edge is found through layered refinement

Strategies don't reveal their full edge in a single test. They reveal it across a sequence: baseline → filter → regime gate → parameter optimization.

Example: the afternoon Iron Condor candidate (B30-C) wasn't discovered as PF 2.82. It emerged through:

  1. Generic 1:30 PM IC → PF ~1.4
  2. Hold-to-expiry rule applied → PF ~1.7
  3. VIX regime filter (20–25) → PF 2.5+
  4. Asymmetric delta (Put 10δ / Call 35δ) → PF 2.82

A baseline PF of 1.2 can become PF 2.0+ with the right filters applied. Starting points matter less than the refinement process.

Principle 3 — Predict before testing

Before every batch, expected ranking and PF range are committed explicitly. This serves as a calibration check that prevents hindsight bias and triggers investigation when results surprise.

Tracking prediction accuracy across batches reveals where intuition fails. For example, the B65 wing sweep contradicted the prediction: raw PF climbed monotonically through $25 wings, but capital-adjusted edge peaked at $5 wings. That surprise is what led to Principle 7.

Principle 4 — Realistic execution costs are non-negotiable

All research uses $0.05/leg slippage minimum. This represents disciplined limit-order execution with 5–10 second patience. Higher slippage variants ($0.07–$0.10/leg) are used for stress testing.

Mid-price backtests lie. A strategy that works only at theoretical mid-fills is not a strategy — it's a mirage that won't replicate in live trading. Several early research findings were invalidated when realistic slippage was applied.
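To make the cost concrete, the sketch below converts per-leg slippage into dollars per trade. It assumes the standard 100-share option multiplier and that slippage is charged on every leg at entry, and again at exit only when the position is closed before expiration. The function name `slippage_cost` and the `exits_before_expiration` flag are illustrative, not part of Option Alpha's engine.

```python
CONTRACT_MULTIPLIER = 100  # standard multiplier for US equity/ETF options


def slippage_cost(legs: int, per_leg: float = 0.05, contracts: int = 1,
                  exits_before_expiration: bool = False) -> float:
    """Dollar slippage for one trade; exit slippage applies only if closed early."""
    fills = 2 if exits_before_expiration else 1  # entry only, or entry plus exit
    return round(legs * per_leg * CONTRACT_MULTIPLIER * contracts * fills, 2)


sps_timed_exit = slippage_cost(legs=2, exits_before_expiration=True)  # two-leg spread, closed early
ic_held = slippage_cost(legs=4)                                       # four-leg condor held to expiry
ic_stress = slippage_cost(legs=4, per_leg=0.10)                       # stress-test variant
```

Under these assumptions a two-leg spread closed early and a four-leg condor held to expiration both pay $20 per contract at $0.05/leg, and the stress variant doubles the condor's cost to $40.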

Principle 5 — Negative correlation > stacked positive PF

Two strategies with PF 1.5 that lose money on the same days = one strategy levered up. Two strategies with PF 1.3 that lose on different days = real diversification.

Correlation analysis is mandatory before declaring two strategies stackable. Combined drawdown matters more than combined expected value. This is an open research question for B65-A and B30-C, which may or may not fire on overlapping days.
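A minimal sketch of that check, assuming each strategy's results are available as a date-to-P/L mapping and that a day with no trade counts as zero P/L. The helper names and the sample figures are hypothetical; none of this is output from the research database.

```python
from math import sqrt


def align(pnl_a: dict, pnl_b: dict):
    """Align two {date: pnl} maps on the union of dates; a day without a trade is 0."""
    days = sorted(set(pnl_a) | set(pnl_b))
    return [pnl_a.get(d, 0.0) for d in days], [pnl_b.get(d, 0.0) for d in days]


def pearson(xs, ys) -> float:
    """Pearson correlation; undefined (division by zero) if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def max_drawdown(pnls) -> float:
    """Largest peak-to-trough drop of the cumulative P/L curve, as a positive number."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


# Hypothetical daily P/L for two strategies that share one bad day (d2).
strategy_a = {"d1": 40, "d2": -100, "d3": 30, "d4": 35}
strategy_b = {"d1": 20, "d2": -80, "d3": 25, "d5": 30}

xs, ys = align(strategy_a, strategy_b)
corr = pearson(xs, ys)
combined_dd = max_drawdown([x + y for x, y in zip(xs, ys)])
```

In this made-up example the correlation comes out well above 0.5 and the combined drawdown equals the sum of both strategies' worst days, which is the "one strategy levered up" case described above.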

Principle 6 — Regime-masking awareness

Monthly equity-curve validation is required before candidate promotion. Specifically:

  • ≥ 60% of active months positive
  • Longest losing streak ≤ 3 months
  • No 15+ month bleed-then-recovery pattern

These tests catch strategies whose aggregate PF hides bimodal regime dependence. B63-C (morning SPS without VIX filter) showed aggregate PF 1.46 — but monthly breakdown revealed 17 months underwater before recovery. Adding the VIX ≥ 20 filter produced B65-A, which passed all three regime tests cleanly.

The methodology explicitly treats self-gating (strategy skips days via filter) as protective, not a limitation. B65-A fires ~11–12 times per year; the remaining ~240 trading days are skipped by design.
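The three regime tests can be sketched as a single function over a chronological list of monthly P/L. Two interpretations here are assumptions rather than definitions from the research: the losing streak is counted over active months only, and the "bleed-then-recovery" test is read as the longest run of calendar months spent below the prior equity peak. `None` marks a self-gated month with no trades.

```python
def monthly_regime_tests(monthly_pnl, min_positive=0.60, max_streak=3, max_underwater=15):
    """monthly_pnl: one entry per calendar month in order; None means no trades fired.

    Requires at least one active month.
    """
    active = [p for p in monthly_pnl if p is not None]
    positive_share = sum(p > 0 for p in active) / len(active)

    streak = longest_streak = 0
    for p in active:                      # streak counted over active months only
        streak = streak + 1 if p < 0 else 0
        longest_streak = max(longest_streak, streak)

    equity = peak = 0.0
    underwater = longest_underwater = 0
    for p in monthly_pnl:                 # underwater run counted in calendar months
        equity += p or 0.0
        if equity >= peak:
            peak, underwater = equity, 0
        else:
            underwater += 1
        longest_underwater = max(longest_underwater, underwater)

    return {
        "positive_share": positive_share,
        "longest_losing_streak": longest_streak,
        "longest_underwater_months": longest_underwater,
        "passes": (positive_share >= min_positive
                   and longest_streak <= max_streak
                   and longest_underwater < max_underwater),
    }


# Hypothetical series: a self-gated strategy vs. a 17-month bleed followed by recovery.
gated = monthly_regime_tests([None, 50, None, -20, 40, None, 30])
bleeding = monthly_regime_tests([-10] * 17 + [60] * 7)
```

The second series ends with a positive total yet fails all three tests, which is the regime-masked pattern the aggregate PF hides.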

Principle 7 — Capital efficiency trumps raw profit factor

When a sweep varies risk per trade (e.g., wing width on SPS), raw PF systematically favors wider wings — but max risk scales linearly while P/L scales sub-linearly. Return-on-risk is the correct tiebreaker.

Empirical example from the B65 wing sweep:

Wing | PF | Avg P/L | Max Risk | RoR
$5 | 1.78 | $12 | $548 | 48.9%
$10 | 2.02 | $18 | $1,049 | 39.7%
$25 | 2.17 | $22 | $2,674 | 18.7%

The $5-wing variant wins on capital efficiency (48.9% vs 18.7%) despite lower raw PF (1.78 vs 2.17). This principle was established during the B65 research and applied retroactively to revise candidate selection.
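The tiebreaker can be illustrated with the sweep figures themselves. This sketch takes the RoR column as reported, since the exact return-on-risk formula is not stated here, and only compares how fast risk and average P/L grow across wing widths. The dictionary keys are illustrative names.

```python
# Rows copied from the B65 wing sweep table; "ror" is the reported return on risk.
wing_sweep = [
    {"wing": 5,  "pf": 1.78, "avg_pl": 12, "max_risk": 548,  "ror": 0.489},
    {"wing": 10, "pf": 2.02, "avg_pl": 18, "max_risk": 1049, "ror": 0.397},
    {"wing": 25, "pf": 2.17, "avg_pl": 22, "max_risk": 2674, "ror": 0.187},
]

best_by_pf = max(wing_sweep, key=lambda row: row["pf"])
best_by_ror = max(wing_sweep, key=lambda row: row["ror"])

# How much risk and average P/L grow when moving from the $5 to the $25 wing.
risk_multiple = wing_sweep[-1]["max_risk"] / wing_sweep[0]["max_risk"]
pl_multiple = wing_sweep[-1]["avg_pl"] / wing_sweep[0]["avg_pl"]
```

Risk grows by roughly 4.9x from the $5 to the $25 wing while average P/L grows by roughly 1.8x, so ranking by PF and ranking by RoR pick opposite ends of the sweep.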

Research phase framework (5)

Research progresses through five phases. Each has explicit closure criteria to prevent premature promotion or endless refinement.

Phase 1 — Surface mapping

Goal: Identify all configurations showing positive expectancy or marginal positive signal.

Method: Test broad combinations of structure × entry time × exit time × delta × wings. Each batch isolates ONE variable and holds others constant (Option Alpha's 5-test comparison limit). Realistic slippage applied.

Closure: Phase 1 closes when major dimensions have been mapped sufficiently to identify Phase 2 candidates — NOT when "the answer" is found.

Phase 2 — Filter testing

Goal: Concentrate edge via individual filters.

Filter dimensions tested:

  • Volatility: VIX bands (10–15, 15–20, 20–25, 25–32, 32+), VIX trend (rising/falling 5-day), VVIX, IV rank
  • Technical: Price vs 10/20/50/200-day MAs, RSI bands, Bollinger Band position, ATR, opening range
  • Market structure: Gap size/direction, prior-day close position, day-of-week, days-since-1%-move, inside/outside days
  • Calendar: FOMC/CPI/NFP/GDP proximity, earnings season, quarter-end, holiday-shortened weeks
  • Cross-asset: USD/JPY, 10Y yield, sector rotation, BTC correlation, crude oil

Output: Filter heatmap per candidate. Filters producing >20% PF improvement are flagged for Phase 3.
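The flagging rule can be sketched as follows, using the usual definition of profit factor (gross profit divided by gross loss). The function names and the sample P/L values are hypothetical, and the sketch reads ">20% PF improvement" as a relative improvement over the unfiltered baseline.

```python
def profit_factor(pnls) -> float:
    """Gross profit divided by gross loss over a list of per-trade P/L values."""
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss


def flag_for_phase3(baseline_pnls, filtered_pnls, min_improvement: float = 0.20) -> bool:
    """True when the filtered PF beats the baseline PF by more than min_improvement."""
    return profit_factor(filtered_pnls) > profit_factor(baseline_pnls) * (1 + min_improvement)


# Hypothetical per-trade P/L in dollars; the filter keeps a subset of the baseline trades.
baseline = [30, 30, -50, 30, -40, 30, 30]
filtered = [30, 30, -40, 30, 30]
flagged = flag_for_phase3(baseline, filtered)
```

A flag from this rule says nothing about sample size, so it is best read together with the minimum-trade rules in Phase 3.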

Phase 3 — Combination testing

Goal: Combine validated filters to maximize edge without overfitting.

Sample size rules:

  • Single filter: minimum 50 trades remaining after filter
  • 2-filter combination: minimum 30 trades
  • 3-filter combination: minimum 20 trades (rare to justify)

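B65-A (heavy-down + continuation + VIX ≥ 20) is a 3-filter combination with N=23. That clears the 20-trade minimum for 3-filter combinations but falls short of the 30-trade minimum applied to 2-filter combinations; it is accepted given mechanistic coherence and cross-regime contrast.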
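The sample size rules reduce to a small lookup. This is a sketch with hypothetical names; the thresholds are the three listed above.

```python
# Number of filters in the combination -> minimum trades remaining after filtering.
MIN_TRADES = {1: 50, 2: 30, 3: 20}


def meets_sample_rule(n_filters: int, n_trades: int) -> bool:
    """Check a filtered sample against the Phase 3 minimum for its filter count."""
    if n_filters not in MIN_TRADES:
        raise ValueError("the rules above cover 1 to 3 filters only")
    return n_trades >= MIN_TRADES[n_filters]


b65a_ok = meets_sample_rule(n_filters=3, n_trades=23)
```

With N=23, a 3-filter combination passes, while the same count would fail the 2-filter and single-filter minimums.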

Phase 4 — Out-of-sample validation

Methods:

  • Walk-forward (train 2024, test 2025–2026)
  • Cross-ticker (SPY → QQQ, IWM, SPX)
  • Cross-period (2022–2023 if applicable)
  • Regime stability (multiple VIX environments)
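The walk-forward method can be sketched as a date split followed by a profit-factor comparison. The trade list below is invented for illustration, and the function names are hypothetical; a real run would read trades from the backtest export.

```python
from datetime import date


def profit_factor(pnls) -> float:
    """Gross profit divided by gross loss."""
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss


def walk_forward(trades, train_year: int = 2024):
    """Split (date, pnl) trades into train_year vs. later years and report PF for each."""
    train = [p for d, p in trades if d.year == train_year]
    test = [p for d, p in trades if d.year > train_year]
    return {"train_pf": profit_factor(train), "test_pf": profit_factor(test),
            "train_n": len(train), "test_n": len(test)}


# Hypothetical trades for illustration only; the 2023 trade falls outside both windows.
trades = [
    (date(2023, 12, 4), 15), (date(2024, 3, 5), 40), (date(2024, 8, 5), -30),
    (date(2024, 11, 1), 35), (date(2025, 4, 7), 45), (date(2025, 10, 10), -25),
    (date(2026, 1, 12), 30),
]
result = walk_forward(trades)
```

Parameters and filters would be fixed on the training window before looking at the test window; the split only helps if the test trades played no part in choosing them.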

Phase 5 — Live deployment

Method: Smallest viable size (1 contract). Track every trade against backtest expectation. Compare actual slippage to modeled. Document system-rule deviations.

Current status: no candidate has entered Phase 5. The research-to-execution gap is the highest-priority item on the open research list.

Qualification criteria & tiers

Configurations are classified into four tiers by how many qualification criteria they satisfy:

Tier | Criteria met | Treatment
Candidate | 7 of 7 | Live-deployable, size carefully
Research-tier | 4–6 of 7 | Research-worthy, additional validation needed
Data point | Below 4 | Data point only, not actionable
Closed | Falsified or validated-and-subsumed

Seven qualification criteria

Metric | Threshold | Rationale
Profit factor | > 1.25 | Below this, slippage variance flips negative
Win rate | > 70% | Psychological sustainability; small drawdown clusters
Max drawdown | Account-appropriate | Account survival under worst case
Sample size | > 50 trades | Statistical reliability after filtering
Avg P/L per trade | > $8 net of slippage | Worth cognitive load and capital tie-up
Monthly validation | Passes all three regime tests | Catches bimodal / regime-masked findings
Correlation (if stacking) | < 0.5 | Real diversification, not amplification
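A minimal sketch of how the criteria count maps to a tier. Two choices are assumptions: "account-appropriate" drawdown is expressed as a caller-supplied dollar limit, and the correlation criterion is treated as met when the strategy is not being stacked. The Closed tier is a research decision, not a count, so it is not modeled. All metric values below are hypothetical.

```python
def criteria_met(m: dict, max_dd_limit: float, stacking: bool = False) -> int:
    """Count how many of the seven qualification criteria a configuration satisfies."""
    checks = [
        m["profit_factor"] > 1.25,
        m["win_rate"] > 0.70,
        abs(m["max_drawdown"]) <= max_dd_limit,        # "account-appropriate" is user-defined
        m["trades"] > 50,
        m["avg_pl"] > 8,                               # dollars, net of slippage
        m["passes_monthly_validation"],
        (m["correlation"] < 0.5) if stacking else True,  # assumed met when not stacking
    ]
    return sum(checks)


def tier(n_met: int) -> str:
    if n_met == 7:
        return "Candidate"
    if n_met >= 4:
        return "Research-tier"
    return "Data point"


# Hypothetical configurations for illustration only.
strong = {"profit_factor": 1.6, "win_rate": 0.74, "max_drawdown": -300, "trades": 64,
          "avg_pl": 11, "passes_monthly_validation": True}
partial = {"profit_factor": 1.34, "win_rate": 0.66, "max_drawdown": -250, "trades": 41,
           "avg_pl": 9, "passes_monthly_validation": False}

strong_tier = tier(criteria_met(strong, max_dd_limit=500))
partial_tier = tier(criteria_met(partial, max_dd_limit=500))
```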

Ranking hierarchy (when choosing among candidates)

Priority | Metric | Why
Primary | Profit Factor | Risk-adjusted return
Secondary | Total P&L ($) | Absolute dollars
Tertiary | Win Rate | Consistency / psychology
Check | Max Drawdown | Tail risk / account survival
Check | Avg Win / Avg Loss | Outcome asymmetry

A configuration with PF 2.8, $37 avg P&L, 78% WR beats one with PF 1.6, $45 avg P&L, 85% WR. The second has better headline numbers but worse risk-adjusted return.

Validated candidate strategies

Strategies that have cleared the qualification criteria. Each includes batch-level progression showing how it emerged from broader research.

B65-A ★ — Morning Heavy-Down Reversal

Candidate · Monthly validated

Short Put Spread, 9:40 AM entry, 5-hour hold, filtered to stressed regime reversal setups. Currently the flagship morning-session candidate.

Setup specification

Structure: Short Put Spread (credit)
Short put: −0.20 delta
Long put: $5.00 below short put leg, exactly
Entry time: 9:40 AM ET
Exit rule: 5 hours (2:40 PM ET) or expiration
Filters (AND): Change% ≤ −0.5% · Open CHG% ≤ 0% · VIX prior close ≥ 20
Slippage: $0.05/leg entry and exit
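The AND-combined filters can be expressed as one predicate. This is a sketch: the parameter names are hypothetical stand-ins for the Change%, Open CHG%, and VIX prior close fields in the specification, and how each field is computed is left to the backtest engine.

```python
def b65a_filters_fire(change_pct: float, open_chg_pct: float, vix_prior_close: float) -> bool:
    """All three B65-A entry filters must hold at the 9:40 AM ET check."""
    return (
        change_pct <= -0.5          # Change% at or below -0.5%
        and open_chg_pct <= 0.0     # Open CHG% at or below 0%
        and vix_prior_close >= 20.0 # VIX prior close at or above 20
    )


stressed_down = b65a_filters_fire(change_pct=-0.8, open_chg_pct=-0.2, vix_prior_close=23.5)
calm_down = b65a_filters_fire(change_pct=-0.8, open_chg_pct=-0.2, vix_prior_close=14.0)
```

The second call has the same price action but a calm VIX, so the filter stays off; that is the self-gating behavior described under Principle 6.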

Validated metrics (2-year backtest)

Profit factor: 1.78
Win rate: 69.6%
Avg P/L: +$12
Max DD: −$105
Return on risk: 48.9%
Sample: N=23

Mechanism

When VIX prior close ≥ 20, overnight fear is already priced in. A heavy gap down (Change% ≤ −0.5%) that continues through 9:40 represents the last wave of sell-first-think-later flow. These sellers typically exhaust within 5 hours, at which point put-side premium deflates and the SPS collects credit as IV normalizes.

The mechanism specifically requires pre-existing fear, not fresh fear. A calm-market gap down (VIX < 15) behaves differently — the gap itself generates new fear, continuation dominates, and SPS gets run over. This was confirmed in the VIX regime sweep (B64).

How it was discovered — batch progression

  1. B51-A (precursor): Heavy-down SPS, no VIX filter, 9:40 + 5hr. PF 1.35, avg P/L $3. Thin but positive.
  2. B61 — structure screen: Five morning down-day structures. Identified morning SPS as the most promising of five tested structures.
  3. B62 — delta sweep: 10δ/15δ/20δ/25δ/30δ on the B61 filter. Result was an inverted U that peaked at 20δ (PF 1.31 vs 1.25 at 15δ). Higher R:R at deeper deltas was offset by worse tail risk.
  4. B63 — wing sweep without VIX filter: $3/$5/$10/$15/$20 wings on 20δ SPS. $10 wing won at PF 1.57. BUT: monthly equity curve showed 17-month underwater period followed by 7-month recovery. Failed Principle 6 regime-masking test.
  5. B64 — VIX regime sweep on B63-C: Five VIX regime variants. The result was decisive:
    VIX regime | PF | N | Verdict
    ≥ 20 | 2.02 | 23 | Winner
    15–20 | 0.99 | 15 | Breakeven
    10–15 | 0 | 1 | Insufficient data
    < 15 | 0.67 | 15 | Losing
    No filter | 1.46 | 39 | Regime-masked average
    The VIX filter is the gating variable. "Fear exhausts fast" requires actual fear in the tape.
  6. B65 — wing sweep with VIX filter: Re-swept wings with the VIX ≥ 20 filter active. Same 23 trades fired across all wings; only risk changed.
    Wing | PF | Avg P/L | Return on Risk
    $5 (B65-A) | 1.78 | $12 | 48.9%
    $10 | 2.02 | $18 | 39.7%
    $15 | 2.12 | $20 | 28.3%
    $20 | 2.15 | $21 | 22.7%
    $25 | 2.17 | $22 | 18.7%
    Per Principle 7, $5 wing wins on capital efficiency despite lower raw PF. This gave the methodology document its capital-efficiency rule.

Monthly validation evidence (on $10 wing variant; $5 wing pending)

  • Active months positive: 77.8% (7 of 9 months with fires)
  • Longest losing streak: 1 month
  • Max cumulative drawdown: −$101
  • Self-gated inactive months: 16 of 25 calendar months (protective)

Passes all three regime tests cleanly. Monthly validation on the $5-wing deployment variant specifically is an open item — expected to pass since the same trades fire, but needed for full candidate certification.

Falsification check — inverse symmetry (B66)

To confirm the mechanism is asymmetric rather than a general high-vol mean-reversion effect, the symmetric mirror was tested: Long Put Spread on heavy-up + continuation + VIX ≥ 20.

Result: PF 0.62, WR 27.8%, N=18. Thesis falsified. Up-day continuation trends further (does not reverse), confirming B65-A's mechanism is specifically down-day fear exhaustion, not generic reversal. See Closed Findings for full analysis.

Deployment caveats

  • Sample size (N=23) below the 50-trade minimum in qualification criteria. Accepted given the decisive regime contrast (PF 2.02 vs 0.67) and mechanistic coherence.
  • Live-test for 5–10 fires required before scaling size. Backtest slippage model may not match real Wealthsimple fills.
  • Correlation with B30-C (afternoon down-regime IC) not yet measured — may fire on overlapping days, reducing stackability.
  • Win rate 69.6% is just below the 70% qualification threshold. Within sampling variance at N=23.

B30-C — Afternoon Iron Condor, Down Regime

Candidate

Afternoon Iron Condor entered at 1:30 PM ET on down-regime days. Asymmetric construction (Put 15δ / Call 25δ) exploits put-side volatility skew. Held to expiration per the validated hold-to-expiry finding.

Setup specification

Structure: Iron Condor (asymmetric)
Short put: −0.15 delta
Short call: +0.25 delta
Wing width: $10
Entry time: 1:30 PM ET
Exit rule: Hold to expiration
Regime: Down-day regime (applied at 1:30 PM)

Validated metrics

Profit factor: 2.24
Structure: IC P15/C25

Mechanism & lineage

Emerged from the afternoon IC research sequence that established 1:30 PM as the dominant entry time across ~100+ configurations. The asymmetric construction (tighter put side, wider call side) reflects the structural SPY skew — put-side IV is inflated by constant institutional protective buying, so selling tighter puts captures more of that premium while the wider call side reduces upside breach risk.

Sister finding B28-D (afternoon Jade Lizard on strong-up days, PF 4.16) represents the opposite regime — up-days benefit from call-side directional premium rather than symmetric condor construction.

B49-A — Morning Strong-Up

Candidate

Morning strategy on overnight gap > +0.5%. Strong-up opens tend to trend further through the morning, supporting directional premium capture.

Profit factor: 2.92

Underlying structure and mechanism align with the asymmetric-reversal finding: up-day momentum continues (as confirmed by B66's falsification of the symmetric reversal hypothesis), making directional up-side strategies appropriate when filter fires.

Research-tier findings (partial validation)

Findings with meaningful positive signal that have not yet cleared all qualification criteria. Candidates for future validation work.

B53-E — Morning Slight-Up (gap 0 to +0.5%)

Research-tier

Morning premium strategy on slight overnight gap up. PF 1.34. Moderate edge but not yet cleared through full qualification. Pending: sample size confirmation, monthly validation, and cross-ticker check.

B63 family — Heavy-down SPS without VIX filter

Research-tier

Aggregate PF 1.46 over 2 years. Regime-masked (17-month underwater period). Superseded by B65-A with the VIX ≥ 20 filter. Kept as research-tier for completeness; do not deploy without the VIX gate.

B65 wing variants (B65-B through E)

Research-tier

Wider-wing variants of B65-A. All have higher raw PF (2.02–2.17) but worse capital efficiency. Retained as research-tier for edge cases where account size makes wider wings acceptable, but B65-A ($5 wing) is the default deployment target.

B51-A — Original morning heavy-down SPS (precursor)

Research-tier

Baseline heavy-down SPS (PF 1.35) that kicked off the research sequence leading to B65-A. Superseded but documented for lineage.

B64 VIX regime sweep (full family)

Research-tier

The VIX regime sweep on heavy-down SPS that identified VIX ≥ 20 as the gating variable. B64-A (VIX ≥ 20, $10 wing) is the direct parent of B65-A with alternate wing width.

Closed findings (falsified or superseded)

Questions that have been tested with sufficient rigor to close pending new evidence. These are not to be retested without a specific mechanistic reason — retesting without new data is the "confirmation bias trap."

B66 — Inverse symmetry FALSIFIED

Closed

Hypothesis: if heavy-down + continuation + VIX ≥ 20 reverses up (B65-A thesis), the symmetric heavy-up + continuation setup should reverse down — tradable via Long Put Spread.

Result:

Profit factor: 0.62
Win rate: 27.8%
Sample: N=18

Interpretation: the reversal mechanism is asymmetric. Three mechanistic reasons:

  • Volatility skew structure. Put-side IV is structurally inflated on SPY (constant institutional put buying for portfolio insurance). When selling puts into a panic, you're fading inflated premium. No equivalent inflation exists on the call side to fade.
  • VIX term dynamics. VIX tends to decline on up-days (hurts debit spreads via IV crush). VIX stabilizes on down-days (helps credit spreads).
  • Microstructure asymmetry. Gap-up + continuation days are driven by positive momentum flows (earnings beats, relief rallies, short covering) — trending behavior. Gap-down + continuation days are driven by fear — burnout behavior.

WR 27.8% is the telling metric. You needed to call direction correctly on a short-duration debit spread; the market moved against you 72% of the time.

Implication: down-day reversal edge is real; up-day reversal is not. The two sides cannot be traded symmetrically.

Morning SPS in low-VIX regime (< 15)

Closed

Heavy-down + continuation filter with VIX < 15: PF 0.67, avg P/L −$10. Calm markets lack the fear-premium inflation that powers B65-A's reversal edge. Morning SPS should not be deployed when VIX is < 20.

Morning SPS unfiltered — regime-masked

Closed

Aggregate PF 1.46 over two years, but monthly equity curve reveals 17-month underwater period followed by 7-month recovery. All edge concentrated in the recovery regime. Without VIX ≥ 20 gate, strategy is bimodal and fails Principle 6. Superseded by B65-A.

Profit targets & stop losses on IC/IB/SPS/SCS

Closed

Every profit-target and stop-loss management variation tested (across dozens of configurations) reduced or eliminated edge relative to hold-to-expiration. Hold-to-expiry is the validated approach. This is counterintuitive — managing losers feels prudent — but data is unambiguous.

Morning IC with short hold + slippage

Closed

Batches 2, 4, 6b established that morning IC entries with early exits (before afternoon) consistently produce negative edge once realistic slippage is applied. Early research findings that suggested otherwise were invalidated once $0.05/leg slippage was enforced.

9:35 AM entry beats 10:00 AM for unfiltered IC

Closed

Batch 3a vs 3b established the earlier entry wins on every metric for the unfiltered case. However, this finding is subsumed by the broader result that 1:30 PM entry beats every tested morning entry time (Batch 5). Afternoon IC is the correct lane for symmetric structures.

Premature exits (1:00 PM) destroy edge

Closed

Across IC, IB, BWIC, BW IB, and SPS structures (Batches 4, 6b), exiting at 1:00 PM consistently destroys edge. Hold-to-expiration or hold-to-late-afternoon minimum.

External research integration

Findings from external sources that have been integrated as reference data or additional structural candidates. External research is treated distinctly from proprietary backtests — it's either reference data (descriptive, not tested) or external strategy (tested framework from another practitioner, integrated but not validated in our engine).

Peggy Bank framework (Jack Sokin)

External strategy · EXT-02

Jack Sokin's Peggy Bank uses percentage-OTM strike selection with a reward/risk cascade (0.5% / 0.4% / 0.3% OTM) rather than delta or fixed-dollar offsets. This is a structurally different parameterization from our research family.

Integration status: %-OTM structures added to the Decision Engine's structure library alongside dollar-offset variants. Not yet independently backtested as a standalone candidate. Initial engine output suggests Peggy-style structures converge on similar answers to dollar-offset variants when analog days are well-matched (~0.75% OTM SPS ≈ $5 OTM SPS at SPY $710).

Why %-OTM matters: more robust to SPY price drift over time. A $3 OTM strike at SPY $400 has very different meaning from $3 OTM at SPY $700. Percentage-based parameterization normalizes across price regimes.
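The normalization argument is plain arithmetic, sketched below with two hypothetical helper functions. The spot prices are the ones quoted in the text.

```python
def dollars_to_pct_otm(spot: float, dollars_otm: float) -> float:
    """Express a fixed-dollar strike offset as a percentage of the underlying price."""
    return 100.0 * dollars_otm / spot


def pct_otm_to_dollars(spot: float, pct_otm: float) -> float:
    """Express a percentage-OTM offset in dollars at a given underlying price."""
    return spot * pct_otm / 100.0


pct_at_400 = dollars_to_pct_otm(spot=400, dollars_otm=3)   # $3 OTM with SPY at $400
pct_at_700 = dollars_to_pct_otm(spot=700, dollars_otm=3)   # $3 OTM with SPY at $700
dollars_at_710 = pct_otm_to_dollars(spot=710, pct_otm=0.75)
```

The same $3 offset is 0.75% of the underlying at $400 but only about 0.43% at $700, and 0.75% OTM at $710 is about $5.33, which is consistent with the rough equivalence to a $5 offset noted above.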

PEG containment study (Jack Sokin)

Reference data

PEG study provides containment probabilities for SPX over different lookback windows (30D / 90D / 180D) and entry times. Key confirmed patterns:

  • Later entry time → higher containment (supports our 1:30 PM afternoon IC finding)
  • Longer lookback → higher containment (makes sense — more time for the market to settle)
  • R:R ratio filter is a novel filter dimension our research has not tested

Status: reference data, not integrated as a filter. Worth testing as a future research direction.

Liquidity ranking (220 tickers, composite score)

Reference data

External composite liquidity ranking for 220 tickers validates the SPX → SPY → QQQ → IWM → XSP hierarchy. Quantitatively backs the transfer-gap hypothesis: findings on the top 3–5 tickers transfer cleanly across the family; findings on lower-liquidity tickers are less reliable.

SPX 1DTE overnight range probabilities

Reference data

Empirical overnight range probabilities for SPX 1DTE by entry time. Reference data, not used as a filter. Potentially useful for bridging between 0DTE research and 1DTE applications.

Decision engine & containment framework

Beyond the validated candidate strategies, research has produced a daily decision-support framework that extends the candidate-filter approach with historical analog matching. This complements (does not replace) the validated filters.

Philosophy — containment over P/L prediction

Traditional backtest analysis asks "what is the expected P/L of trade X today?" This requires modeling options prices, IV surfaces, and slippage accurately — a tall order with free-tier data.

The containment framework reframes the question: "On the historical days most similar to today, what fraction of them closed inside a given structure's profit zone?" This is a binary outcome question (contained / max loss / partial) that can be answered purely from underlying price data without options pricing models.

Don't predict P/L. Predict containment. The trade edge expresses itself across many decisions conditional on containment, not via accurate per-trade P/L forecasts.

Framework components

  1. Master feature dataset (500 days × 72 features): gap, open momentum at 9:35/9:40/9:45/10:00, RSI(14), ATR(14), MAs (20/50/200), VIX level and 60-day percentile rank, cross-asset prior-day moves (QQQ/IWM/TLT/UUP), intraday checkpoints every 30 minutes from 9:30 to 4:00.
  2. Analog search: strict hard-filter mode first (gap ±0.3%, open momentum ±0.15%, VIX ±3), falls back to k-nearest weighted distance if fewer than 15 strict matches.
  3. Structure sweep: evaluates ~75 structure variants across symmetric IC, asymmetric IC, SPS, SCS, Iron Butterfly — with both dollar-offset and percentage-OTM parameterizations.
  4. Containment ranking: for each structure, computes contained% (closed inside profit zone) and max loss% on the analog set. Net edge = contained% − max_loss%.
  5. Validated strategy augmentation: overlays whether B65-A, B49-A, B53-E filters fire on today's setup. Analog engine and validated filters produce independent signals; tier decision uses both.
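Components 2 and 4 can be sketched in plain Python. Everything beyond the quoted tolerances and the 15-match threshold is an assumption: the feature names, the equal weights, the tolerance-scaled distance used for the k-nearest fallback, and the `close_move` field (entry-to-close move in percent) are all hypothetical, and the history is synthetic.

```python
from math import sqrt, inf

# Strict tolerances quoted in the component list (gap and momentum in %, VIX in points).
STRICT = {"gap": 0.3, "open_mom": 0.15, "vix": 3.0}
MIN_STRICT_MATCHES = 15


def find_analogs(today, history, k=15, weights=None):
    """Return (analog_days, used_fallback) for dicts keyed by the STRICT feature names."""
    strict = [d for d in history
              if all(abs(d[f] - today[f]) <= tol for f, tol in STRICT.items())]
    if len(strict) >= MIN_STRICT_MATCHES:
        return strict, False
    w = weights or {f: 1.0 for f in STRICT}  # hypothetical equal weights

    def distance(d):
        # Scale each difference by its strict tolerance so the features are comparable.
        return sqrt(sum(w[f] * ((d[f] - today[f]) / STRICT[f]) ** 2 for f in STRICT))

    return sorted(history, key=distance)[:k], True


def net_edge(analogs, profit_low, profit_high, loss_low, loss_high):
    """contained% minus max_loss%, judged on each analog's entry-to-close move (in %)."""
    n = len(analogs)
    contained = sum(profit_low <= d["close_move"] <= profit_high for d in analogs)
    max_loss = sum(d["close_move"] <= loss_low or d["close_move"] >= loss_high
                   for d in analogs)
    return contained / n - max_loss / n


# Synthetic history: 16 close matches plus 4 days from a very different regime.
today = {"gap": -0.6, "open_mom": -0.1, "vix": 22.0}
moves = [0.2] * 12 + [-1.0] * 2 + [-2.0] * 2
history = [{"gap": -0.6 + 0.01 * i, "open_mom": -0.1, "vix": 22.0, "close_move": m}
           for i, m in enumerate(moves)]
history += [{"gap": 1.0, "open_mom": 0.5, "vix": 13.0, "close_move": 0.4}] * 4

analogs, used_fallback = find_analogs(today, history)
# Put-spread style zone: profit above -0.75%, max loss below -1.45%, no upside limit.
edge = net_edge(analogs, profit_low=-0.75, profit_high=inf, loss_low=-1.45, loss_high=inf)
```

Returning the `used_fallback` flag alongside the analog set lets downstream logic discount decisions that rest on weaker matches.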

Tier decision logic

Tier | Trigger | Action
GREEN | Validated filter fires AND top structure ≥ 80% containment on analogs | Standard size
YELLOW | One signal strong, the other mixed | Reduced size (½ contract)
ORANGE | No validated fire, top structure 60–80% | Small experimental or skip
RED | No validated fire AND top structure < 60% | Stand down
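The tier logic can be sketched as a short decision function. The GREEN, ORANGE, and RED branches follow the stated triggers; the YELLOW branch is an interpretation, reading "one signal strong, the other mixed" as exactly one of the two signals being strong (a validated fire, or containment of at least 80%). The function name is hypothetical.

```python
def decision_tier(validated_fire: bool, top_containment: float) -> str:
    """Map the two independent signals to a tier; top_containment is a fraction (0 to 1)."""
    if validated_fire and top_containment >= 0.80:
        return "GREEN"
    if validated_fire or top_containment >= 0.80:
        return "YELLOW"   # assumption: exactly one of the two signals is strong
    if top_containment >= 0.60:
        return "ORANGE"
    return "RED"


tiers = [
    decision_tier(True, 0.85),
    decision_tier(True, 0.55),
    decision_tier(False, 0.90),
    decision_tier(False, 0.70),
    decision_tier(False, 0.40),
]
```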

Honest limitations

  • No credit estimation. Containment ≠ P/L. Real fills, slippage, and credit collected still determine actual returns.
  • Analog heuristics are not optimized. Filter widths and soft weights are chosen reasonably but not validated. Worth tuning over time.
  • Sample size variability. Some days produce 50+ analogs; others force fallback to k-nearest with weaker matches. Trust decisions less when fallback is used.
  • Assumes regime continuity. 2024–2026 patterns drive the analog base. Market structure changes (real bear market, rate cycle reversal, 0DTE flow shifts) could invalidate priors temporarily.

Open research questions

Untested or insufficiently tested questions, priority ranked.

  1. Live execution gap for B65-A — highest priority. Every validated strategy is still a backtest. One real fire teaches more than ten additional backtests. Execution plan: single contract, document fill vs modeled slippage, compare actual credit to backtest.
  2. B65-A and B30-C correlation analysis — do they fire on overlapping days? Determines whether they stack or duplicate.
  3. Monthly validation on B65-A $5-wing specifically — $10-wing passed cleanly; $5-wing is the deployment target and needs its own monthly breakdown.
  4. Peggy Bank %-OTM validation as standalone candidate — integrated into engine, not yet backtested independently.
  5. Day-of-week stratification — diagnostics suggest Thursday and Monday behave differently on down-gap mornings.
  6. Cross-ticker validation of B65-A — does the thesis hold on QQQ, IWM, XSP?
  7. R:R filter dimension — novel filter implied by Peggy Bank framework; not tested in our research family.
  8. Long volatility structures — straddle/strangle untested in any framework.
  9. Walk-forward validation — train on 2024, test on 2025–2026, for each validated candidate.
  10. Asymmetric configurations tuned for morning conditions: we've tested asymmetric afternoon IC; morning asymmetric variants are less explored.

Anti-patterns to avoid

Research errors this methodology is explicitly designed to prevent. Each exists because we encountered it.

The "verdict trap"

Treating any single backtest as conclusive. Negative results don't kill ideas; they narrow the conditions where ideas might work.

The "best find" trap

Believing the highest PF in current data is the ceiling. The afternoon IC went from PF 1.4 to PF 2.82 through layered refinement. Today's "best" is tomorrow's baseline.

The "apples to oranges" trap

Comparing strategies tested under different conditions (slippage, time periods, regime filters). Always normalize comparison conditions before drawing conclusions.

The "more data wishful thinking" trap

Wanting larger sample sizes that don't exist. 0DTE SPY has only ~3.5 years of real data. Cross-ticker and cross-period validation substitute for raw N where appropriate.

The "backtest mid-fills" trap

Running backtests without slippage and treating results as achievable. Realistic execution cost destroys naive edges. Always use $0.05/leg minimum.

The "stacking without correlation" trap

Adding multiple positive-PF strategies without checking if they lose on the same days. Combined drawdown can be worse than expected if correlations are high.

The "confirmation bias" trap

Running additional tests to confirm a desired result rather than challenge a finding. Best practice: actively look for tests that could falsify your current hypothesis.

The "cognitive load" trap

Adding strategies because they're profitable, even if execution complexity exceeds capacity. A single PF 2.0 strategy executed flawlessly beats three PF 1.4 strategies executed sloppily.

The "premature web app" trap

Building execution infrastructure before live-validating the strategy it is meant to support. The risk: polishing the delivery mechanism for an edge that has not been proven to survive real markets.

Data & infrastructure

Backtest engine

Option Alpha: real options pricing, 5 simultaneous backtests for comparison, 2-year window typical, $0.01–$0.25/leg slippage configurable.

Live execution

Wealthsimple (CAD margin): 1.5% FX fee per USD conversion, wider spreads than US-direct brokers. Suitable for initial live testing; transferable to other brokers for scale.

Reference data sources

Source | Data | Notes
Polygon (free tier) | SPY, QQQ, IWM, TLT, UUP daily & intraday | 2-year history cap on free tier
FRED (St. Louis Fed) | VIX daily close | Free, 3-year+ history available
Option Alpha | Historical options chains (indirect via backtester) | Actual options pricing, not modeled
Economic calendar | FOMC / CPI / NFP / GDP dates | Manual population from ForexFactory

Known data gaps (free tier)

  • Real-time intraday data — Polygon free is 15-minute delayed. Briefing tool uses prior-close data plus user-entered morning observables.
  • VIX intraday — not available on Polygon free tier. Daily VIX from FRED is sufficient for current research.
  • Options chain snapshots — requires Polygon Starter tier ($29/mo). Would enable direct credit/debit validation.
  • Dealer gamma exposure (GEX) — requires paid services (SpotGamma, SqueezeMetrics). Potentially useful for mechanism confirmation.

Output deliverables produced

  • Quantitative research report (Word format, versions 1 & 2)
  • Trading decision tool (HTML, local, versions 1 through 13)
  • Backtest database (Excel, 449 configurations tested)
  • Master feature dataset (Python / pandas, 500 days × 72 features)
  • Morning Briefing script (Python, terminal-based)
  • Decision Engine (Python, containment analysis)
  • Web application (this site, sections 1–2 live, sections 3–5 planned)

Methodology version history

Methodology is versioned explicitly. Each revision documents what changed and why.

Version | Date | Changes
v1 | Apr 16, 2026 | Initial methodology document. Established after morning bridge research session demonstrated need for explicit framework to prevent premature conclusions. Seven principles, five phases, seven qualification criteria, nine anti-patterns.
v2 | ~Apr 18, 2026 | Added monthly validation formalization (Principle 6). Added Reference Data vs External Research distinction (Principle 8 in some revisions). Added handoff-document discipline for session continuity.
v3 | ~Apr 19, 2026 | Refined qualification criteria to include out-of-sample validation and correlation. Expanded anti-patterns. Integrated morning research findings (B49, B53, B55 families) into candidate tier framework.
v3.1 | Apr 21, 2026 | Current. Added Principle 7 (capital-efficiency over raw PF) from B65 wing-sweep research. Added asymmetric reversal finding (B66 falsification). Added Decision Engine / containment framework as research-tier deliverable. Added "premature web app" anti-pattern.

Reopen criteria (closed findings)

A "closed" question can be reopened if:

  • Market structure changes meaningfully (real bear market, sustained regime shift)
  • New data periods become available
  • Cross-ticker or cross-period evidence contradicts prior finding
  • New filter dimensions reveal previously hidden edge

Example: B63 family was tentatively closed after monthly validation failure, then reopened when the VIX regime sweep (B64) revealed the regime-masked edge. B65-A is the result.