Research Methodology & Findings

Systematic quantitative research on SPY zero-days-to-expiration (0DTE) options strategies: rigorous backtesting, layered filter validation, and explicit rules for promoting findings from data point to candidate tier. More than 450 configurations have been tested across ~30,000 simulated trades as of April 2026.

Research overview
Core principles
Phase framework
Qualification criteria
Validated candidates
Research-tier findings
Closed findings
External research
Decision engine
Open questions
Anti-patterns
Data & infrastructure
Version history

Research overview

This research program develops and validates systematic 0DTE options strategies on SPY, with selected findings cross-validated against QQQ, IWM, and SPX. All research uses Option Alpha's backtest engine with $0.05/leg slippage (representing disciplined limit-order execution with 5–10 seconds of patience), two-year windows, and mandatory multi-regime validation before any strategy is promoted to candidate tier.

The framework exists to prevent premature conclusions, enforce rigorous testing, and build genuine mechanistic understanding of market edge — not pattern-matching or wishful thinking.

Current research state (April 2026)

Metric	Value
Configurations tested	450+
Simulated trades	~30,000
Candidate-tier	5
Research-tier	~12
Closed findings	15+
Live trades	0

The zero live trades metric is intentional. The research is complete enough to deploy, but every validated strategy is still a backtest. Live validation is the next phase — each real trade yields more information than dozens of additional backtests. Honest disclosure of this gap matters more than pretending the research is "done."

Key structural findings

Entry time dominates most other variables. 1:30 PM ET entries substantially outperform morning entries for Iron Condor strategies held to expiration.
VIX 20–25 is a universal "sweet spot" producing the strongest profit factors across tickers for IC strategies.
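Asymmetric delta selection (Put 10–15δ / Call 20–35δ) outperforms symmetric configurations for afternoon Iron Condors by exploiting structural put-side volatility skew; strong-up mornings instead favor the inverse asymmetry (B49-A, P25/C15).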
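Hold-to-expiration is mandatory for afternoon entries. Every profit-target and stop-loss combination tested reduced or eliminated edge across dozens of configurations; the morning candidates exit at a fixed time (11:00 AM) rather than on P/L triggers.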
Morning strategies are regime-dependent. Morning SPS without a VIX regime filter is bimodal and unreliable; the same strategy with a VIX ≥ 20 gate reaches PF 1.78 with clean monthly validation.
Down-day reversal is asymmetric. Stressed-down-day fear exhausts quickly (reversal edge exists); stressed-up-day continuation does not mean-revert (inverse test falsified at PF 0.63).

Core principles (7)

Seven principles govern all research decisions. They were established iteratively as research revealed specific failure modes; each exists because an actual research error triggered its creation.

Principle 1 — Tests are data points, not verdicts

A single backtest result — positive or negative — does not close a research question. It narrows the search space for that specific configuration under those specific conditions.

Wrong framing: "This produced 2.09 PF, it's the best we can get." "This structure is dead." "Let's lock this in as the rule."

Correct framing: "This configuration fails at these conditions; filters or alternative parameters may reveal different behavior." "This is our current best baseline; refinement could push it higher."

Principle 2 — Edge is found through layered refinement

Strategies don't reveal their full edge in a single test. They reveal it across a sequence: baseline → filter → regime gate → parameter optimization.

Example: the afternoon Iron Condor candidate (B30-C) wasn't discovered as PF 2.82. It emerged through:

Generic 1:30 PM IC → PF ~1.4
Hold-to-expiry rule applied → PF ~1.7
VIX regime filter (20–25) → PF 2.5+
Asymmetric delta (Put 10δ / Call 35δ) → PF 2.82

A baseline PF of 1.2 can become PF 2.0+ with the right filters applied. Starting points matter less than the refinement process.

Principle 3 — Predict before testing

Before every batch, expected ranking and PF range are committed explicitly. This serves as a calibration check that prevents hindsight bias and triggers investigation when results surprise.

Tracking prediction accuracy across batches reveals where intuition fails. For example, the B32 Change% sweep inverted the predicted extremes: predicted A>B>E>C>D, actual D>B>E>C>A, so the configurations expected to rank best and worst swapped places. The bullish-continuation mechanism was refuted. Calibration failures like this prompted the principle that predictions should be made sparingly.
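Calibration can be tracked numerically across batches. A minimal sketch, assuming each ranking is recorded best-to-worst with no ties: the Spearman rank correlation is 1.0 for a perfect prediction and −1.0 for a perfect inversion. The function name `spearman_rho` is hypothetical and not part of any tool named in this document.

```python
def spearman_rho(predicted, actual):
    """Spearman rank correlation between two best-to-worst orderings (no ties).

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is each item's rank difference.
    """
    if sorted(predicted) != sorted(actual):
        raise ValueError("both rankings must contain the same labels")
    n = len(predicted)
    pred_rank = {label: i for i, label in enumerate(predicted, start=1)}
    act_rank = {label: i for i, label in enumerate(actual, start=1)}
    d_squared = sum((pred_rank[label] - act_rank[label]) ** 2 for label in predicted)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# B32 Change% sweep: predicted A>B>E>C>D, actual D>B>E>C>A
print(round(spearman_rho(list("ABECD"), list("DBECA")), 2))  # -0.6
```

The B32 result scores −0.6 rather than −1.0 because only the two endpoints swapped; the middle three configurations were ranked as predicted.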

Principle 4 — Realistic execution costs are non-negotiable

All research uses $0.05/leg slippage minimum. This represents disciplined limit-order execution with 5–10 second patience. Higher slippage variants ($0.07–$0.10/leg) are used for stress testing.

Mid-price backtests lie. A strategy that works only at theoretical mid-fills is not a strategy — it's a mirage that won't replicate in live trading. Several early research findings were invalidated when realistic slippage was applied.

Principle 5 — Negative correlation > stacked positive PF

Two strategies with PF 1.5 that lose money on the same days = one strategy levered up. Two strategies with PF 1.3 that lose on different days = real diversification.

Correlation analysis is mandatory before declaring two strategies stackable. Combined drawdown matters more than combined expected value. This is an open research question for B49-A (morning strong-up) and B28-D (afternoon strong-up) — both fire on the same Change% > +0.5% filter and could be highly correlated rather than additive.
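A minimal sketch of the correlation check, assuming each strategy's backtest can be exported as per-day P/L keyed by date (a hypothetical format). Days on which only one strategy fired count as zero for the other, so a shared filter shows up in the number instead of being hidden. A second helper reports how often the two lose on the same day, which is the failure mode this principle targets. The 0.5 cutoff is the correlation threshold from the qualification criteria; the sample data is invented for illustration.

```python
from math import sqrt


def daily_pnl_correlation(pnl_a, pnl_b):
    """Pearson correlation of two strategies' daily P/L (dicts: date -> P/L)."""
    days = sorted(set(pnl_a) | set(pnl_b))
    a = [pnl_a.get(d, 0.0) for d in days]  # no fire that day -> 0 P/L
    b = [pnl_b.get(d, 0.0) for d in days]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def shared_loss_share(pnl_a, pnl_b):
    """Fraction of strategy A's losing days on which strategy B also lost."""
    losing_a = [d for d, p in pnl_a.items() if p < 0]
    if not losing_a:
        return 0.0
    return sum(1 for d in losing_a if pnl_b.get(d, 0.0) < 0) / len(losing_a)


# Hypothetical daily P/L for a morning and an afternoon strategy
morning = {"2025-01-02": 20, "2025-01-03": -40, "2025-01-06": 15}
afternoon = {"2025-01-03": -60, "2025-01-06": 25, "2025-01-07": 10}

corr = daily_pnl_correlation(morning, afternoon)
print(f"correlation {corr:.2f}, stackable: {corr < 0.5}")
print(f"shared losing days: {shared_loss_share(morning, afternoon):.0%}")
```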

Principle 6 — Regime-masking awareness

Monthly equity-curve validation is required before candidate promotion. Specifically:

≥ 60% of active months positive
Longest losing streak ≤ 3 months
No 15+ month bleed-then-recovery pattern

These tests catch strategies whose aggregate PF hides bimodal regime dependence. B54-D (slight-down LP, earlier batch) showed aggregate PF 1.03, but the monthly breakdown revealed a May 2024–Aug 2025 losing period offset by a Sep 2025–Feb 2026 recovery. Two sub-regimes were masking each other. Applying the same check to B53-E (slight-up LP) confirmed its edge was NOT bimodal: its monthly performance is genuinely stable.
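The three monthly checks can be scripted against a chronological list of monthly P/L. A minimal sketch, assuming only active months (at least one trade) are passed in. The third check has no formula in the text; here it is approximated, as an assumption, by the longest run of consecutive months the cumulative equity curve spends below its running peak.

```python
def monthly_validation(monthly_pnl, min_positive=0.60, max_losing_streak=3, max_underwater=15):
    """Apply the three Principle 6 checks to chronological P/L of active months."""
    positive_share = sum(1 for p in monthly_pnl if p > 0) / len(monthly_pnl)

    # Longest run of consecutive losing months
    longest_streak = streak = 0
    for p in monthly_pnl:
        streak = streak + 1 if p < 0 else 0
        longest_streak = max(longest_streak, streak)

    # Assumed proxy for "bleed then recovery": longest stretch below the equity peak
    equity = peak = 0.0
    underwater = longest_underwater = 0
    for p in monthly_pnl:
        equity += p
        if equity < peak:
            underwater += 1
        else:
            peak, underwater = equity, 0
        longest_underwater = max(longest_underwater, underwater)

    return {
        "positive_share": positive_share,
        "longest_losing_streak": longest_streak,
        "longest_underwater_months": longest_underwater,
        "passes": (positive_share >= min_positive
                   and longest_streak <= max_losing_streak
                   and longest_underwater < max_underwater),
    }


steady = [50, -20, 30, 40, -10, 25]
bleed_then_recover = [-30, 10, 10] * 6 + [100]   # hypothetical bleed-then-recovery shape
print(monthly_validation(steady)["passes"])              # True
print(monthly_validation(bleed_then_recover)["passes"])  # False
```

The second series passes the first two checks (68% positive months, no losing streak longer than one month) and fails only the underwater test, which is the kind of pattern an aggregate PF alone would not reveal.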

The methodology explicitly treats self-gating (strategy skips days via filter) as protective, not a limitation. B49-A fires approximately 38 times per year (77 trades over 2 years); the remaining days are skipped by design. B28-D fires ~65 times per year (131 trades over 2 years) with the same filter.

Principle 7 — Capital efficiency trumps raw profit factor

When a sweep varies risk per trade (e.g., wing width on SPS), raw PF systematically favors wider wings — but max risk scales linearly while P/L scales sub-linearly. Return-on-risk is the correct tiebreaker.

Empirical example from the B49 wing width comparison:

Config	Wing	PF	Avg P/L	Max DD	Retail yield
B48-A	$5	2.46	$12	−$181	$24 (2-contract)
B49-A	$10	2.92	$19	−$202	$19 (1-contract)

Both: IC P25Δ/C15Δ at 9:40 AM, hold to 11:00 AM, filter Change% > +0.5%. Same underlying data, only wing width differs. N=77 trades each.

The $5-wing variant (B48-A, PF 2.46) yields $24/trade at 2-contract size vs $19/trade at 1-contract $10-wing (B49-A, PF 2.92) on retail account sizes. B49-A wins the profit-factor contest; B48-A wins the dollar-output contest at constrained margin. Both are valid candidates; choice depends on account size.
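A small worked comparison makes the tiebreaker concrete. It assumes the margin figures from the B49-A margin-adjusted tradeoff note, read as roughly $450 per $5-wide contract and $900 per $10-wide contract (approximately wing width × 100 less the credit received), and a hypothetical $900 margin budget per trade. Under those assumptions it reproduces the $24 vs $19 yields in the table and shows B48-A earning more per dollar of margin.

```python
# Per-contract figures: average P/L from the B49 wing-width table; margin is an
# assumed approximation (wing width x 100, less the credit received).
CONFIGS = {
    "B48-A ($5 wings)": {"avg_pnl": 12, "margin": 450},
    "B49-A ($10 wings)": {"avg_pnl": 19, "margin": 900},
}


def sizing_summary(margin_budget):
    """Contracts that fit a margin budget, dollars per trade, and return on margin."""
    summary = {}
    for name, c in CONFIGS.items():
        contracts = margin_budget // c["margin"]
        summary[name] = {
            "contracts": contracts,
            "dollars_per_trade": c["avg_pnl"] * contracts,
            "return_on_margin": c["avg_pnl"] / c["margin"],
        }
    return summary


for name, row in sizing_summary(margin_budget=900).items():
    print(f"{name}: {row['contracts']} contract(s), ${row['dollars_per_trade']}/trade, "
          f"{row['return_on_margin']:.2%} return on margin")
```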

Research phase framework (5)

Research progresses through five phases. Each has explicit closure criteria to prevent premature promotion or endless refinement.

Phase 1 — Surface mapping

Goal: Identify all configurations showing positive expectancy or marginal positive signal.

Method: Test broad combinations of structure × entry time × exit time × delta × wings. Each batch isolates ONE variable across up to five variants (Option Alpha compares at most five backtests at a time) and holds the others constant. Realistic slippage is applied.

Closure: Phase 1 closes when major dimensions have been mapped sufficiently to identify Phase 2 candidates — NOT when "the answer" is found.

Phase 2 — Filter testing

Goal: Concentrate edge via individual filters.

Filter dimensions tested:

Volatility: VIX bands (10–15, 15–20, 20–25, 25–32, 32+), VIX trend (rising/falling 5-day), VVIX, IV rank
Technical: Price vs 10/20/50/200-day MAs, RSI bands, Bollinger Band position, ATR, opening range
Market structure: Gap size/direction, prior-day close position, day-of-week, days-since-1%-move, inside/outside days
Calendar: FOMC/CPI/NFP/GDP proximity, earnings season, quarter-end, holiday-shortened weeks
Cross-asset: USD/JPY, 10Y yield, sector rotation, BTC correlation, crude oil

Output: Filter heatmap per candidate. Filters producing >20% PF improvement are flagged for Phase 3.
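The Phase 2 flagging rule reduces to comparing each filtered subset's profit factor with the unfiltered baseline. A minimal sketch, assuming per-trade net P/L lists can be exported from the backtester; the filter names and trade values are hypothetical.

```python
def profit_factor(pnl):
    """Gross profit divided by gross loss for per-trade net P/L values."""
    gross_win = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss


def flag_filters(baseline_pnl, filtered, threshold=0.20):
    """Return {filter: relative PF improvement} for filters above the threshold."""
    base_pf = profit_factor(baseline_pnl)
    flagged = {}
    for name, pnl in filtered.items():
        improvement = profit_factor(pnl) / base_pf - 1
        if improvement > threshold:
            flagged[name] = round(improvement, 3)
    return flagged


baseline = [30, 30, -50, 20, -20]              # PF 1.14
candidates = {
    "VIX 20-25": [30, 30, -50, 20],            # PF 1.60
    "Friday only": [30, -50, 20],              # PF 1.00
}
print(flag_filters(baseline, candidates))      # {'VIX 20-25': 0.4}
```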

Phase 3 — Combination testing

Goal: Combine validated filters to maximize edge without overfitting.

Sample size rules:

Single filter: minimum 50 trades remaining after filter
2-filter combination: minimum 30 trades
3-filter combination: minimum 20 trades (rare to justify)

B49-A (Change% > +0.5% alone) is a 1-filter specification with N=77 trades — clears all sample thresholds. Multi-filter combinations (e.g., Change% + VIX bands, Change% + day-of-week) have been tested but current candidates all use single-filter specifications.
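The Phase 3 sample-size rules can be encoded as a lookup keyed by the number of stacked filters. A minimal sketch; treating each minimum as inclusive (exactly 50 trades passes a single filter) is an assumption, and the function name is hypothetical.

```python
# Minimum trades that must survive, keyed by number of stacked filters (Phase 3 rules)
MIN_TRADES_BY_FILTER_COUNT = {1: 50, 2: 30, 3: 20}


def clears_sample_size(filter_count, trades_remaining):
    """True if a filtered configuration keeps enough trades for its filter count."""
    if filter_count not in MIN_TRADES_BY_FILTER_COUNT:
        raise ValueError("the rules only cover 1- to 3-filter combinations")
    return trades_remaining >= MIN_TRADES_BY_FILTER_COUNT[filter_count]


print(clears_sample_size(1, 77))   # B49-A: single filter, N=77
print(clears_sample_size(2, 25))   # hypothetical 2-filter combination with 25 trades
```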

Phase 4 — Out-of-sample validation

Methods:

Walk-forward (train 2024, test 2025–2026)
Cross-ticker (SPY → QQQ, IWM, SPX)
Cross-period (2022–2023 if applicable)
Regime stability (multiple VIX environments)
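The walk-forward method as described amounts to a single train/test split of dated trades, comparing profit factors on each side. A minimal sketch, assuming trades are available as (date, net P/L) pairs; the trade values are hypothetical, and the out-of-sample to in-sample PF ratio is one simple degradation measure, not a threshold the methodology defines.

```python
from datetime import date


def profit_factor(pnl):
    gross_win = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss


def walk_forward(trades, train_end=date(2024, 12, 31)):
    """Train on trades up to train_end (2024 by default), test on everything after."""
    train = [pnl for day, pnl in trades if day <= train_end]
    test = [pnl for day, pnl in trades if day > train_end]
    in_pf, out_pf = profit_factor(train), profit_factor(test)
    return {"in_sample_pf": in_pf, "out_of_sample_pf": out_pf, "oos_to_is": out_pf / in_pf}


trades = [
    (date(2024, 3, 1), 40), (date(2024, 6, 3), -20), (date(2024, 9, 9), 30),
    (date(2025, 2, 3), 25), (date(2025, 8, 4), -30), (date(2026, 1, 5), 35),
]
print(walk_forward(trades))
```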

Phase 5 — Live deployment

Method: Smallest viable size (1 contract). Track every trade against backtest expectation. Compare actual slippage to modeled. Document system-rule deviations.
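Comparing actual to modeled slippage needs a per-leg figure in the same units as the backtest's $0.05/leg. A minimal sketch, assuming the mid price of the whole structure is recorded at order time alongside the fill; the function name and fill values are hypothetical.

```python
MODELED_SLIPPAGE_PER_LEG = 0.05  # backtest assumption, in option-price dollars


def slippage_per_leg(mid, fill, legs, is_credit):
    """Realized slippage per leg versus the structure's mid price.

    Positive means the fill was worse than mid: a smaller credit received,
    or a larger debit paid.
    """
    shortfall = (mid - fill) if is_credit else (fill - mid)
    return shortfall / legs


# Hypothetical fills: a 4-leg iron condor sold and a single long put bought
ic = slippage_per_leg(mid=1.20, fill=1.04, legs=4, is_credit=True)
lp = slippage_per_leg(mid=2.10, fill=2.18, legs=1, is_credit=False)
for name, s in (("IC", ic), ("Long put", lp)):
    print(f"{name}: {s:.3f}/leg, within model: {s <= MODELED_SLIPPAGE_PER_LEG}")
```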

Current status: no candidate has entered Phase 5. The research-to-execution gap is the highest-priority item on the open research list.

Qualification criteria & tiers

Configurations are classified into four tiers by how many qualification criteria they satisfy:

Tier	Criteria met	Treatment
Candidate	7 of 7	Live-deployable, size carefully
Research-tier	4–6 of 7	Research-worthy, additional validation needed
Data point	Below 4	Data point only, not actionable
Closed	—	Falsified or validated-and-subsumed

Seven qualification criteria

Metric	Threshold	Rationale
Profit factor	> 1.25	Below this, slippage variance can flip expectancy negative
Win rate	> 70%	Psychological sustainability; small drawdown clusters
Max drawdown	Account-appropriate	Account survival under worst case
Sample size	> 50 trades	Statistical reliability after filtering
Avg P/L per trade	> $8 net of slippage	Worth cognitive load and capital tie-up
Monthly validation	Passes all three monthly tests (Principle 6)	Catches bimodal / regime-masked findings
Correlation (if stacking)	< 0.5	Real diversification, not amplification
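The counting rule behind the tiers can be expressed directly. A minimal sketch with hypothetical metric names: "account-appropriate" drawdown and the monthly verdict are passed in as booleans because they are judgment calls; correlation is treated as passing when the strategy is not being stacked; and the long-premium adjustment described for B53-E (Win$/Loss$ ≥ 1.5 in place of WR > 70%) is included as an option. Reading B51-A's unreported monthly validation as a fail is also an assumption.

```python
def qualification_tier(m):
    """Count the seven criteria met and map the count to a tier."""
    if m.get("long_premium"):
        win_check = m["win_loss_ratio"] >= 1.5   # long-premium substitute
    else:
        win_check = m["win_rate"] > 0.70
    checks = {
        "profit_factor": m["pf"] > 1.25,
        "win_rate": win_check,
        "max_drawdown": m["max_dd_ok"],
        "sample_size": m["n"] > 50,
        "avg_pnl": m["avg_pnl"] > 8,
        "monthly_validation": m["monthly_ok"],
        "correlation": m.get("correlation") is None or m["correlation"] < 0.5,
    }
    met = sum(checks.values())
    tier = "Candidate" if met == 7 else "Research-tier" if met >= 4 else "Data point"
    return tier, met, [name for name, ok in checks.items() if not ok]


b49a = {"pf": 2.92, "win_rate": 0.766, "n": 77, "avg_pnl": 19,
        "max_dd_ok": True, "monthly_ok": True}
b51a = {"pf": 1.35, "win_rate": 0.661, "n": 62, "avg_pnl": 3,
        "max_dd_ok": True, "monthly_ok": False}
print(qualification_tier(b49a))  # ('Candidate', 7, [])
print(qualification_tier(b51a))  # ('Research-tier', 4, ['win_rate', 'avg_pnl', 'monthly_validation'])
```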

Ranking hierarchy (when choosing among candidates)

Priority	Metric	Why
Primary	Profit Factor	Risk-adjusted return
Secondary	Total P&L ($)	Absolute dollars
Tertiary	Win Rate	Consistency / psychology
Check	Max Drawdown	Tail risk / account survival
Check	Avg Win / Avg Loss	Outcome asymmetry

A configuration with PF 2.8, $37 avg P&L, 78% WR beats one with PF 1.6, $45 avg P&L, 85% WR. The second has better headline numbers but worse risk-adjusted return.

Validated candidate strategies

Strategies that have cleared the qualification criteria. Each includes batch-level progression showing how it emerged from broader research.

B49-A ★★★ — Morning Strong-Up Iron Condor

Candidate · Monthly validated

Iron Condor with inverse-asymmetric deltas, 9:40 AM entry, 11:00 AM early close, filtered to strong-up overnight gap setups. Currently the most robust morning-session finding in the database.

Setup specification

Structure	Iron Condor, asymmetric deltas
Put side	Short 25δ put / long put $10 below the short strike (wing width $10)
Call side	Short 15δ call / long call $10 above the short strike (wing width $10)
Entry	9:40 AM ET
Exit	11:00 AM ET (1h 20m hold — early close, NOT held to expiration)
Filter	Change% > +0.5% (SPY opens more than 0.5% above prior close)
Slippage	$0.05/leg

Validated metrics (2-year backtest)

Profit factor

2.92

Win rate

76.6%

Sample

N=77

Avg P&L

$19

Max drawdown

−$202

Mechanism

Strong-up opens continue trending for approximately 90 minutes rather than mean-reverting. Closing at 11:00 AM collects theta during the continuation without holding through midday reversal risk. The inverse asymmetry (short put closer to the money at 25Δ, short call further OTM at 15Δ) reflects a specific market reality: on strong-up days, overnight fear premium on the put side is maximally inflated at the open and deflates sharply, and capturing that deflation is where the edge is. Call-side premium is minimal (the market is already at intraday highs), so placing the short call further OTM sacrifices little.

Monthly validation evidence

MOST ROBUST MORNING FINDING IN DATABASE. 18 of 23 active months positive (78%), worst month −$101, longest losing streak only 2 months. No regime-masking pattern. Top 3 months removed still leaves +$671 over 20 months. Filter self-gates: months with zero strong-up days produced zero trades and zero risk (Feb 2025, Sep 2025). Strategy cannot fire when its edge conditions are absent.

Margin-adjusted tradeoff

At retail account scale, one $10-wide contract (B49-A as specified) uses ~$900 of margin, roughly the same as two $5-wide contracts (B48-A) at ~$450 each. On a $1,798 account, B48-A yields $24/trade on strong-up days versus $19/trade for B49-A. B49-A has the better profit factor; B48-A has the better dollar output at retail sizing. Both are documented candidates.

Deployment caveats

Entry timing is highly sensitive for this structure: PF falls from 2.92 at 9:40 to 1.47 at 9:41, so the peak is one minute wide. This materially tightens the execution window for live deployment.
Afternoon B28-D (Jade Lizard) uses the same Change% > +0.5% filter, so the two form a natural same-day strong-up pairing, with ~3.5 hours of freed capital between the 11:00 AM close and the 2:30 PM entry.
Live execution validation still pending. Zero real fires logged.

B53-E ★★★ — Morning Slight-Up Long Put

Candidate · Monthly validated

Long Put (single leg, buy premium, not a spread), 9:40 AM entry, 11:00 AM early close, filtered to tepid overnight gap-up setups. Fills the slight-up morning regime gap.

Setup specification

Structure	Long Put — single put option purchased outright
Delta	40 delta (near-the-money, higher delta for more sensitivity to move)
Entry	9:40 AM ET
Exit	11:00 AM ET (1h 20m hold)
Filter	Change% in (0%, +0.5%] (SPY opens slightly above prior close)
Slippage	$0.05/leg

Validated metrics (2-year backtest)

Profit factor

1.34

Win rate

39.7%

Sample

N=234

Avg P&L

$11

Max drawdown

−$1,118

Mechanism

Slight-up overnight gaps tend to mean-revert through midday more reliably than they continue. A long put at 40Δ captures this reversion. The structurally low win rate (40%) is expected for long-premium strategies — the edge comes from R:R asymmetry (2.0x — Avg Win $95, Avg Loss $36), not win frequency. The delta sweep produced a coherent monotonic pattern (PF 1.27 to 1.34 across 15-40δ variants), validating real edge vs. noise.
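Profit factor, win rate and payoff ratio are tied together: PF = (WR × avg win) / ((1 − WR) × avg loss). Rearranged, PF and WR imply the payoff ratio, and the payoff ratio implies a breakeven win rate. A minimal sketch using B53-E's reported PF and WR; the function names are hypothetical.

```python
def implied_payoff_ratio(pf, win_rate):
    """Avg win / avg loss implied by PF = (WR * avg_win) / ((1 - WR) * avg_loss)."""
    return pf * (1 - win_rate) / win_rate


def breakeven_win_rate(payoff_ratio):
    """Win rate at which PF = 1 for a given avg win / avg loss ratio."""
    return 1 / (1 + payoff_ratio)


ratio = implied_payoff_ratio(pf=1.34, win_rate=0.397)   # B53-E
print(f"implied payoff ratio {ratio:.2f}, breakeven WR {breakeven_win_rate(ratio):.1%}")
```

The implied ratio of about 2.0 matches the 2.0x R:R figure, and the resulting breakeven win rate of roughly 33% sits several points below the observed 39.7%.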

Monthly validation evidence

16 of 25 months positive (64%), longest losing streak 3 months, no regime-masking pattern. Edge is real and regime-stable. Caveat: concentration risk — top 3 months (Feb 2026, Jan 2026, Jul 2024) contribute 73% of total P/L. Top-3-removed = +$767 over 22 months (still positive but thin). Deploy with sizing discipline.

Tier positioning

Tier: one notch below the afternoon champions (B28-D at PF 4.16, B30-C at PF 2.24). Candidate-tier under the long-premium-adjusted criteria (a Win$/Loss$ ratio ≥ 1.5 replaces the standard WR > 70% threshold, since long-premium strategies are structurally low-WR).

B28-D ★★★ — Afternoon Strong-Up Jade Lizard

Candidate · Monthly validated

HIGHEST STABLE PROFIT FACTOR IN THE ENTIRE DATABASE. Jade Lizard with symmetric deltas, 2:30 PM entry, hold to expiration, filtered to strong-up days.

Setup specification

Structure	Jade Lizard: short put + short call + long call, with the put side protected only by a far-OTM long put (simulated naked)
Put side	Short 15 delta, long put $100 below (simulated naked via an extremely wide wing)
Call side	Short 15 delta, Long call $5 above (defined-risk upside)
Entry	2:30 PM ET
Exit	Hold to expiration (4:00 PM close)
Filter	Change% > +0.5% (same filter as B49-A morning)
Slippage	$0.05/leg

Validated metrics (2-year backtest)

Profit factor

4.16

Win rate

91.6%

Sample

N=131

Max DD

−$248

Return/DD

1,014%

Mechanism

Strong-up days benefit disproportionately from the Jade Lizard's asymmetric structure. The wide put wing (simulated-naked) absorbs the rare reversal fully. The tight $5 call wing captures full theta on continued upside. 120W/11L over 131 trades. Return on Drawdown of 1,014% is exceptional.

Monthly validation evidence

21 of 25 months profitable. The 4 losing months were all small: Aug 2024 −$180, Dec 2024 −$96, Jul 2025 −$64, Nov 2025 −$10. Top 3 months removed still leaves +$1,079 positive — edge is not concentrated in a few outlier months.

Critical finding: the filter did NOT cherry-pick bad months. December 2024 had 4 trades within the filter (vs 20 unfiltered) and performed near-normal (−$96 vs −$805 unfiltered). February 2025 had only 3 trades and was actually positive (+$59). The filter screens individual days by their move relative to the prior close, not whole months. Real structural edge, not regime luck.

Deployment gate

3-year validation on 2023 data still pending per methodology. Natural pairing with B49-A morning: both fire on Change% > +0.5%, so a strong-up day produces a two-trade sequence with capital recycling between 11:00 AM and 2:30 PM.

B30-C ★★★ — Afternoon Down-Regime Iron Condor

Candidate

Iron Condor with asymmetric deltas ($10 wings), 1:30 PM entry, hold to expiration, filtered to down-day afternoon setups. Fills the down-regime gap in the three-regime afternoon portfolio.

Setup specification

Structure	Iron Condor, asymmetric deltas
Put side	Short 15δ put / long put $10 below the short strike (wing width $10)
Call side	Short 25δ call / long call $10 above the short strike (wing width $10)
Entry	1:30 PM ET
Exit	Hold to expiration (4:00 PM close)
Filter	Change% < −0.1% (down day)
Slippage	$0.05/leg

Validated metrics (2-year backtest)

Profit factor

2.24

Win rate

82.5%

Sample

N=177

Avg P&L

$27

Max DD

−$752

Mechanism

Down-day afternoon entries catch theta decay through the close after the initial down move has played out. The asymmetric deltas reflect the regime: on down days the put side has already been tested (stress absorbed), while the call side is the safer premium-collection opportunity, so the short call sits closer to the money (25Δ) than the short put (15Δ). Return on Drawdown 627%.

Comparative

Beats the symmetric $5-wide IC on down days (PF 2.08) AND the down-day Jade Lizard (PF 1.68). The winning configuration is the specific combination of $10 wings with P15/C25 asymmetry; B29-D uses the same P15/C25 deltas with $5 wings.

Monthly validation pending — not yet run.

B31-A ★★★ — Afternoon Flat-Regime Iron Butterfly

Candidate

Iron Butterfly ATM, $5 wings, 1:30 PM entry, hold to expiration, filtered to flat days. Fills the flat-regime gap in the three-regime afternoon portfolio.

Setup specification

Structure	Iron Butterfly — both shorts at-the-money
Strikes	Short put ATM / Short call ATM / Long put ATM − $5 / Long call ATM + $5
Entry	1:30 PM ET
Exit	Hold to expiration (4:00 PM close)
Filter	Change% in [−0.5%, +0.5%] (flat day)
Slippage	$0.05/leg

Validated metrics (2-year backtest)

Profit factor

1.97

Win rate

71.4%

Sample

N=276

Worst loss

−$354

Return/DD

931%

Mechanism

Iron butterfly at ATM is structurally purpose-built for narrow-range days. The 1:30 PM entry catches the afternoon theta burn without morning gap risk. Previously overlooked because IB had been tested at 10:00 AM (PF 0.99, losing). 1:30 PM is the correct entry time — entry timing dominates structure at 0DTE.

Methodology flag ⚠

B31-A filter (Change% −0.5% to +0.5%) overlaps with B30-C filter (Change% < −0.1%) on days where Change% falls between −0.5% and −0.1%. On those days, both strategies fire. This requires an overlap diagnostic before a clean three-regime portfolio claim can be made. Currently: clears 5/7 candidate criteria; pending out-of-sample and correlation testing.
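The overlap diagnostic starts with knowing which afternoon filters fire for a given Change% value. A minimal sketch, assuming a single Change% reading per day; in practice B28-D evaluates at 2:30 PM and the other two at 1:30 PM, so a real diagnostic would check each filter at its own entry time. Function names and the sample Change% history are hypothetical.

```python
INF = float("inf")

# Afternoon filters on Change% (percent): (low, high, low_inclusive, high_inclusive)
AFTERNOON_FILTERS = {
    "B28-D": (0.5, INF, False, False),    # Change% > +0.5%
    "B31-A": (-0.5, 0.5, True, True),     # Change% in [-0.5%, +0.5%]
    "B30-C": (-INF, -0.1, False, False),  # Change% < -0.1%
}


def in_range(x, low, high, low_inc, high_inc):
    above = x >= low if low_inc else x > low
    below = x <= high if high_inc else x < high
    return above and below


def firing(change_pct):
    """Names of afternoon strategies whose filter accepts this Change% value."""
    return [name for name, bounds in AFTERNOON_FILTERS.items() if in_range(change_pct, *bounds)]


def overlap_share(history):
    """Fraction of days on which more than one afternoon filter fires."""
    return sum(1 for x in history if len(firing(x)) > 1) / len(history)


history = [0.8, 0.2, -0.3, -0.05, -0.7, -0.45, 0.6, 0.1]  # hypothetical Change% values
print(firing(-0.3))            # ['B31-A', 'B30-C']
print(overlap_share(history))  # 0.25
```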

B51-A — Morning Heavy-Down SPS (research-tier)

Research-tier

Short Put Spread 10Δ, $10 wings, 9:40 AM entry, 11:00 AM early close, filter Change% < −0.5%. PF 1.35, WR 66.1%, N=62, Avg P/L $3, MaxDD −$265.

First positive morning heavy-down finding in the database. Mechanism: in fear regimes (heavy-down), a lower put delta (further OTM at 10Δ) wins because cushion matters more than premium capture. In bullish regimes, a put delta closer to the money wins because premium capture matters more.

Fails the WR (> 70%) and Avg P/L (> $8) candidate thresholds. Research-tier only, not deployment-ready. Currently the best morning heavy-down result; no candidate-tier morning heavy-down strategy exists.

Closed findings (falsified or superseded)

Questions that have been tested with sufficient rigor to close pending new evidence. These are not to be retested without a specific mechanistic reason — retesting without new data is the "confirmation bias trap."

Profit targets & stop losses on IC/IB/SPS/SCS

Closed

Every profit-target and stop-loss management variation tested (across dozens of configurations) reduced or eliminated edge relative to hold-to-expiration. Hold-to-expiry is the validated approach. This is counterintuitive — managing losers feels prudent — but data is unambiguous.

Morning IC with short hold + slippage

Closed

Batches 2, 4, 6b established that morning IC entries with early exits (before afternoon) consistently produce negative edge once realistic slippage is applied. Early research findings that suggested otherwise were invalidated once $0.05/leg slippage was enforced.

9:35 AM entry beats 10:00 AM for unfiltered IC

Closed

Batch 3a vs 3b established the earlier entry wins on every metric for the unfiltered case. However, this finding is subsumed by the broader result that 1:30 PM entry beats every tested morning entry time (Batch 5). Afternoon IC is the correct lane for symmetric structures.

Premature exits (1:00 PM) destroy edge

Closed

Across IC, IB, BWIC, BW IB, and SPS structures (Batches 4, 6b), exiting at 1:00 PM consistently destroys edge. Hold to expiration, or at minimum into late afternoon.

External research integration

Findings from external sources that have been integrated as reference data or additional structural candidates. External research is treated distinctly from proprietary backtests — it's either reference data (descriptive, not tested) or external strategy (tested framework from another practitioner, integrated but not validated in our engine).

Peggy Bank framework (Jack Sokin)

External strategy · EXT-02

Jack Sokin's Peggy Bank uses percentage-OTM strike selection with a reward/risk cascade (0.5% / 0.4% / 0.3% OTM) rather than delta or fixed-dollar offsets. This is a structurally different parameterization from our research family.

Integration status: %-OTM structures added to the Decision Engine's structure library alongside dollar-offset variants. Not yet independently backtested as a standalone candidate. Initial engine output suggests Peggy-style structures converge on similar answers to dollar-offset variants when analog days are well-matched (~0.75% OTM SPS ≈ $5 OTM SPS at SPY $710).

Why %-OTM matters: more robust to SPY price drift over time. A $3 OTM strike at SPY $400 has very different meaning from $3 OTM at SPY $700. Percentage-based parameterization normalizes across price regimes.
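Converting between the two parameterizations is simple arithmetic. A minimal sketch, assuming SPY strikes on a $1 grid near the money and rounding to the nearest listed strike (a real implementation might instead round away from spot); the function names are hypothetical, and the code does not reproduce the Peggy Bank reward/risk cascade itself.

```python
def pct_otm_strike(spot, pct_otm, side, increment=1.0):
    """Strike pct_otm percent below (put) or above (call) spot, on the strike grid."""
    offset = spot * pct_otm / 100
    raw = spot - offset if side == "put" else spot + offset
    return round(raw / increment) * increment


def dollar_offset_as_pct(spot, dollars):
    """Express a fixed dollar offset as percent OTM at a given spot price."""
    return 100 * dollars / spot


print(pct_otm_strike(710, 0.75, "put"))        # 705.0, i.e. $5 below spot
print(round(dollar_offset_as_pct(400, 3), 2))  # 0.75
print(round(dollar_offset_as_pct(700, 3), 2))  # 0.43
```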

PEG containment study (Jack Sokin)

Reference data

PEG study provides containment probabilities for SPX over different lookback windows (30D / 90D / 180D) and entry times. Key confirmed patterns:

Later entry time → higher containment (supports our 1:30 PM afternoon IC finding)
Longer lookback → higher containment (makes sense — more time for the market to settle)
R:R ratio filter is a novel filter dimension our research has not tested

Status: reference data, not integrated as a filter. Worth testing as a future research direction.

Liquidity ranking (220 tickers, composite score)

Reference data

External composite liquidity ranking for 220 tickers validates the SPX → SPY → QQQ → IWM → XSP hierarchy. Quantitatively backs the transfer-gap hypothesis: findings on the top 3–5 tickers transfer cleanly across the family; findings on lower-liquidity tickers are less reliable.

SPX 1DTE overnight range probabilities

Reference data

Empirical overnight range probabilities for SPX 1DTE by entry time. Reference data, not used as a filter. Potentially useful for bridging between 0DTE research and 1DTE applications.

Decision engine & containment framework

Beyond the validated candidate strategies, research has produced a daily decision-support framework that extends the candidate-filter approach with historical analog matching. This complements (does not replace) the validated filters.

Philosophy — containment over P/L prediction

Traditional backtest analysis asks "what is the expected P/L of trade X today?" This requires modeling options prices, IV surfaces, and slippage accurately — a tall order with free-tier data.

The containment framework reframes the question: "On the historical days most similar to today, what fraction of them closed inside a given structure's profit zone?" This is a binary outcome question (contained / max loss / partial) that can be answered purely from underlying price data without options pricing models.

Don't predict P/L. Predict containment. The trade edge expresses itself across many decisions conditional on containment, not via accurate per-trade P/L forecasts.
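The contained / max loss / partial classification only needs each analog day's move from the entry time to the close. A minimal sketch for an iron condor, assuming strikes are expressed as percent offsets from the entry price and, because credit is not estimated, treating the short strikes as the edge of the profit zone. Net edge is computed as contained% minus max-loss%, the ranking score the framework uses. Function names, moves, and structures are hypothetical.

```python
def classify(move_pct, short_put, short_call, long_put, long_call):
    """Expiry outcome of an iron condor given the entry-to-close move in percent."""
    if short_put <= move_pct <= short_call:
        return "contained"
    if move_pct <= long_put or move_pct >= long_call:
        return "max_loss"
    return "partial"


def containment_stats(moves, structure):
    outcomes = [classify(m, **structure) for m in moves]
    contained = outcomes.count("contained") / len(outcomes)
    max_loss = outcomes.count("max_loss") / len(outcomes)
    return {"contained": contained, "max_loss": max_loss, "net_edge": contained - max_loss}


def rank_structures(moves, structures):
    """Structures sorted by net edge (contained% minus max-loss%), best first."""
    stats = {name: containment_stats(moves, s) for name, s in structures.items()}
    return sorted(stats.items(), key=lambda item: item[1]["net_edge"], reverse=True)


analog_moves = [0.1, -0.3, 0.5, -0.9, 1.2, 0.0, -0.2, 0.35]
structures = {
    "IC +/-0.4% shorts": {"short_put": -0.4, "short_call": 0.4, "long_put": -0.8, "long_call": 0.8},
    "IC +/-0.6% shorts": {"short_put": -0.6, "short_call": 0.6, "long_put": -1.0, "long_call": 1.0},
}
for name, s in rank_structures(analog_moves, structures):
    print(name, s)
```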

Framework components

Master feature dataset (500 days × 72 features): gap, open momentum at 9:35/9:40/9:45/10:00, RSI(14), ATR(14), MAs (20/50/200), VIX level and 60-day percentile rank, cross-asset prior-day moves (QQQ/IWM/TLT/UUP), intraday checkpoints every 30 minutes from 9:30 to 4:00.
Analog search: strict hard-filter mode first (gap ±0.3%, open momentum ±0.15%, VIX ±3), falls back to k-nearest weighted distance if fewer than 15 strict matches.
Structure sweep: evaluates ~75 structure variants across symmetric IC, asymmetric IC, SPS, SCS, Iron Butterfly — with both dollar-offset and percentage-OTM parameterizations.
Containment ranking: for each structure, computes contained% (closed inside profit zone) and max loss% on the analog set. Net edge = contained% − max_loss%.
Validated strategy augmentation: overlays whether the 5 candidate-tier strategies (B49-A, B53-E, B28-D, B30-C, B31-A) and research-tier B51-A fire on today's setup. Analog engine and validated filters produce independent signals; tier decision uses both.
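The strict-then-fallback analog search can be sketched with plain dictionaries. The hard-filter widths (gap ±0.3%, open momentum ±0.15%, VIX ±3) and the 15-match threshold come from the text. The fallback distance (each feature scaled by its strict-filter width), returning k = 15 nearest days, and the field names are assumptions, in keeping with the note that the analog heuristics are not optimized.

```python
from math import sqrt

# Strict hard-filter widths from the framework description
STRICT_WIDTHS = {"gap": 0.3, "open_mom": 0.15, "vix": 3.0}


def find_analogs(today, history, min_strict=15, k=15):
    """Return (analog days, used_fallback) for today's setup.

    today / history entries are dicts with 'gap' and 'open_mom' (percent) and 'vix'.
    """
    strict = [
        day for day in history
        if all(abs(day[f] - today[f]) <= width for f, width in STRICT_WIDTHS.items())
    ]
    if len(strict) >= min_strict:
        return strict, False

    # Assumed fallback metric: each feature scaled by its strict-filter width
    def distance(day):
        return sqrt(sum(((day[f] - today[f]) / width) ** 2 for f, width in STRICT_WIDTHS.items()))

    return sorted(history, key=distance)[:k], True


today = {"gap": 0.2, "open_mom": 0.1, "vix": 18.0}
near = [{"date": f"n{i}", "gap": 0.2, "open_mom": 0.1, "vix": 18.0 + 0.1 * i} for i in range(5)]
far = [{"date": f"f{i}", "gap": 2.0, "open_mom": 1.0, "vix": 30.0 + i} for i in range(30)]

analogs, used_fallback = find_analogs(today, near + far)
print(len(analogs), used_fallback)  # only 5 strict matches, so the fallback returns 15 days
```

Returning the `used_fallback` flag alongside the analogs lets downstream logic discount decisions built on weaker k-nearest matches, as the limitations section recommends.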

Tier decision logic

Tier	Trigger	Action
GREEN	Validated filter fires AND top structure ≥ 80% containment on analogs	Standard size
YELLOW	One signal strong, the other mixed	Reduced size (½ contract)
ORANGE	No validated fire, top structure 60–80%	Small experimental or skip
RED	No validated fire AND top structure < 60%	Stand down
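The tier table can be written as a small decision function. A minimal sketch: the table does not spell out every combination, so reading YELLOW as either "validated fire with top containment below 80%" or "no fire with containment of 80% or more" is an assumption, as are the function and parameter names.

```python
def decision_tier(validated_fire, top_containment):
    """Map today's two signals to GREEN / YELLOW / ORANGE / RED.

    validated_fire: whether any validated candidate filter fires today.
    top_containment: best structure's contained fraction on the analog set (0 to 1).
    """
    strong_analogs = top_containment >= 0.80
    if validated_fire and strong_analogs:
        return "GREEN"    # standard size
    if validated_fire or strong_analogs:
        return "YELLOW"   # assumed reading of "one strong, the other mixed": half size
    if top_containment >= 0.60:
        return "ORANGE"   # small experimental or skip
    return "RED"          # stand down


for fire, c in [(True, 0.85), (True, 0.70), (False, 0.90), (False, 0.70), (False, 0.50)]:
    print(fire, c, decision_tier(fire, c))
```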

Honest limitations

No credit estimation. Containment ≠ P/L. Real fills, slippage, and credit collected still determine actual returns.
Analog heuristics are not optimized. Filter widths and soft weights are chosen reasonably but not validated. Worth tuning over time.
Sample size variability. Some days produce 50+ analogs; others force fallback to k-nearest with weaker matches. Trust decisions less when fallback is used.
Assumes regime continuity. 2024–2026 patterns drive the analog base. Market structure changes (real bear market, rate cycle reversal, 0DTE flow shifts) could invalidate priors temporarily.

Open research questions

Untested or insufficiently tested questions, priority ranked.

Live execution gap for all candidates — highest priority. Every validated strategy is still a backtest. One real fire teaches more than ten additional backtests. Execution plan: single contract starting with B49-A (morning) and B28-D (afternoon) on strong-up days, document fill vs modeled slippage, compare actual credit to backtest.
B49-A and B28-D correlation analysis — both fire on Change% > +0.5% so they correlate by construction. Question is whether their intraday P/L correlates beyond the filter overlap. Determines whether they stack or duplicate.
Monthly validation on B30-C and B31-A — these candidates clear structural criteria but have not been through monthly breakdown yet. B28-D, B49-A, and B53-E are already monthly-validated.
Peggy Bank %-OTM validation as standalone candidate — integrated into engine, not yet backtested independently.
Day-of-week stratification — diagnostics suggest Thursday and Monday behave differently on down-gap mornings.
Cross-ticker validation of B49-A and B28-D — does the strong-up edge hold on QQQ, IWM, XSP? B30-C has partial support via QQQ VIX 20-25 work (PF 2.90) but strong-up morning IC has only been SPY-validated.
R:R filter dimension — novel filter implied by Peggy Bank framework; not tested in our research family.
Long volatility structures — straddle/strangle untested in any framework.
Walk-forward validation — train on 2024, test on 2025–2026, for each validated candidate.
Asymmetric configurations tuned for morning conditions — we've tested asymmetric afternoon IC; morning asymmetric variants are less explored.

Anti-patterns to avoid

Research errors this methodology is explicitly designed to prevent. Each exists because we encountered it.

The "verdict trap"

Treating any single backtest as conclusive. Negative results don't kill ideas; they narrow the conditions where ideas might work.

The "best find" trap

Believing the highest PF in current data is the ceiling. The afternoon IC went from PF 1.4 to PF 2.82 through layered refinement. Today's "best" is tomorrow's baseline.

The "apples to oranges" trap

Comparing strategies tested under different conditions (slippage, time periods, regime filters). Always normalize comparison conditions before drawing conclusions.

The "more data wishful thinking" trap

Wanting larger sample sizes that don't exist. 0DTE SPY has only ~3.5 years of real data. Cross-ticker and cross-period validation substitute for raw N where appropriate.

The "backtest mid-fills" trap

Running backtests without slippage and treating results as achievable. Realistic execution cost destroys naive edges. Always use $0.05/leg minimum.

The "stacking without correlation" trap

Adding multiple positive-PF strategies without checking if they lose on the same days. Combined drawdown can be worse than expected if correlations are high.

The "confirmation bias" trap

Running additional tests to confirm a desired result rather than challenge a finding. Best practice: actively look for tests that could falsify your current hypothesis.

The "cognitive load" trap

Adding strategies because they're profitable, even if execution complexity exceeds capacity. A single PF 2.0 strategy executed flawlessly beats three PF 1.4 strategies executed sloppily.

The "premature web app" trap

Added in v3.1. Building execution infrastructure before live-validating the strategy it's meant to support. The risk: polishing the delivery mechanism for an edge that hasn't been proven to survive real markets.

Data & infrastructure

Backtest engine

Option Alpha: real options pricing, 5 simultaneous backtests for comparison, 2-year window typical, $0.01–$0.25/leg slippage configurable.

Live execution

Wealthsimple (CAD margin): 1.5% FX fee per USD conversion, wider spreads than US-direct brokers. Suitable for initial live testing; transferable to other brokers for scale.

Reference data sources

Source	Data	Notes
Polygon (free tier)	SPY, QQQ, IWM, TLT, UUP daily & intraday	2-year history cap on free tier
FRED (St. Louis Fed)	VIX daily close	Free, 3-year+ history available
Option Alpha	Historical options chains (indirect via backtester)	Actual options pricing, not modeled
Economic calendar	FOMC / CPI / NFP / GDP dates	Manual population from ForexFactory

Known data gaps (free tier)

Real-time intraday data — Polygon free is 15-minute delayed. Briefing tool uses prior-close data plus user-entered morning observables.
VIX intraday — not available on Polygon free tier. Daily VIX from FRED is sufficient for current research.
Options chain snapshots — requires Polygon Starter tier ($29/mo). Would enable direct credit/debit validation.
Dealer gamma exposure (GEX) — requires paid services (SpotGamma, SqueezeMetrics). Potentially useful for mechanism confirmation.

Output deliverables produced

Quantitative research report (Word format, versions 1 & 2)
Trading decision tool (HTML, local, versions 1 through 13)
Backtest database (Excel, 449 configurations tested)
Master feature dataset (Python / pandas, 500 days × 72 features)
Morning Briefing script (Python, terminal-based)
Decision Engine (Python, containment analysis)
Web application (this site, sections 1–2 live, sections 3–5 planned)
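The Decision Engine's internals are not described in this section. The following is a hypothetical sketch of one way to measure containment over historical sessions. It assumes "contained" means the underlying stayed inside a band set as percentages below and above the entry price, which can be asymmetric like the put/call delta split. It checks either the expiration close alone (the relevant test for positions held to expiration) or the full post-entry path. The `Session` type, its fields, and the band widths are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    entry: float  # underlying price at entry
    high: float   # highest price after entry
    low: float    # lowest price after entry
    close: float  # price at expiration (the session close for 0DTE)


def contained(session, lower_pct, upper_pct, use_path=False):
    """True if price stayed inside [entry*(1-lower_pct), entry*(1+upper_pct)]."""
    lower = session.entry * (1 - lower_pct)
    upper = session.entry * (1 + upper_pct)
    if use_path:
        return lower <= session.low and session.high <= upper
    return lower <= session.close <= upper


def containment_rate(sessions, lower_pct, upper_pct, use_path=False):
    """Share of sessions that were contained under the chosen definition."""
    if not sessions:
        return 0.0
    hits = sum(contained(s, lower_pct, upper_pct, use_path) for s in sessions)
    return hits / len(sessions)


# Made-up sessions: contained; closed inside but breached intraday; breached.
sessions = [Session(500, 502, 498, 501),
            Session(500, 504, 499, 501),
            Session(500, 501, 495, 496)]
print(containment_rate(sessions, 0.005, 0.005))                 # 0.666...
print(containment_rate(sessions, 0.005, 0.005, use_path=True))  # 0.333...
```

The gap between the two rates shows how often price breached a short strike intraday but still expired inside the band.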

Methodology version history

Methodology is versioned explicitly. Each revision documents what changed and why.

Version	Date	Changes
v1	Apr 16, 2026	Initial methodology document. Established after a morning bridge research session demonstrated the need for an explicit framework to prevent premature conclusions. Seven principles, five phases, seven qualification criteria, nine anti-patterns.
v2	~Apr 18, 2026	Added monthly validation formalization (Principle 6). Added Reference Data vs External Research distinction (Principle 8 in some revisions). Added handoff-document discipline for session continuity.
v3	~Apr 19, 2026	Refined qualification criteria to include out-of-sample validation and correlation. Expanded anti-patterns. Integrated morning research findings (B49, B53, B55 families) into candidate tier framework.
v3.1	Apr 21, 2026	Current. Added Principle 7 (capital-efficiency over raw PF) from B49 wing-width comparison. Integrated inverse asymmetry finding (B48-A, P25/C15 beats P15/C25 for strong-up mornings). Added Decision Engine / containment framework as research-tier deliverable. Added "premature web app" anti-pattern.

Reopen criteria (closed findings)

A "closed" question can be reopened if:

Market structure changes meaningfully (real bear market, sustained regime shift)
New data periods become available
Cross-ticker or cross-period evidence contradicts prior finding
New filter dimensions reveal previously hidden edge

Example: the Iron Butterfly was tentatively closed after early morning batches (10:00 AM entry produced PF 0.99) but was reopened when B31-D tested it at 1:30 PM and found PF 1.67 unfiltered. B31-A (the flat-regime filtered variant, PF 1.97) is the candidate-tier result.
