The goal of this project is to build a modular financial data platform that ingests raw equities, options, and macroeconomic datasets and transforms them into analytics-ready features for trading and investment research.
📂 Project Layout
docs/ # architecture diagrams, notes
artifacts/ # Power BI files, exported charts
Build Logs/ # weekly build logs
notebooks/ # Jupyter notebooks (Week 1, Week 2, etc.)
🛣 Project Roadmap
This project is built and shipped in weekly artifacts. Each week delivers a small but meaningful piece of the pipeline.
Week 1 – Single-Ticker Prototype ✅
Week 2 – Multi-Ticker Ingest & Finance Chart ✅
Week 3 – Expanding History & Feature Store ✅
Week 4 – Signals, Backtest & 3-Page Power BI ✅
Week 5 – Costs, Controls & Tuning ✅
Week 6 – Rolling Metrics, Cost Modeling & Cleaner Pipeline ✅
Week 7 – Push-Button Backtest, Parameter Sweeps & One-Command Workflows ✅
Week 8 – Deterministic Parameter Tuning & Clean Analytics ✅
Week 9 – Performance Simulation & Validation ✅
Objective: Extend the Finance Data OS pipeline to simulate trade-level execution, validate equity reconciliation, and visualize performance metrics in Power BI.
Pipeline Summary:
Simulate: Load tuned parameters and signals; apply execution logic (slippage, commission, fees).
Validate: Check primary-key (PK) uniqueness, null policies, and reconciliation between trades and equity.
Visualize: Publish Power BI dashboards (Trade Blotter, Equity vs. Drawdown, KPI cards).
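The Simulate step applies execution costs to each fill. The sketch below shows one common way to do that, assuming slippage is quoted in basis points and always moves the price against the trader, commission is a flat amount per share, and fees are a basis-point charge on traded notional. The names `Fill`, `apply_execution_costs`, `long_round_trip_pnl` and all default cost values are hypothetical and illustrative; they are not taken from the repo's `simulate()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    """One executed order after costs (hypothetical structure)."""
    price: float       # fill price after slippage
    commission: float  # per-share commission, in currency units
    fees: float        # proportional fees on traded notional


def apply_execution_costs(
    mid_price: float,
    qty: int,
    side: str,
    slippage_bps: float = 5.0,            # illustrative default
    commission_per_share: float = 0.005,  # illustrative default
    fee_bps: float = 0.5,                 # illustrative default
) -> Fill:
    """Apply slippage, commission, and fees to a single order.

    Slippage moves the price against the trader: buys fill higher,
    sells fill lower.
    """
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    direction = 1 if side == "buy" else -1
    price = mid_price * (1 + direction * slippage_bps / 10_000)
    commission = abs(qty) * commission_per_share
    fees = abs(qty) * price * fee_bps / 10_000
    return Fill(price=price, commission=commission, fees=fees)


def long_round_trip_pnl(entry: Fill, exit_fill: Fill, qty: int) -> float:
    """Net PnL of a long position: price change minus all execution costs."""
    gross = (exit_fill.price - entry.price) * qty
    costs = entry.commission + entry.fees + exit_fill.commission + exit_fill.fees
    return gross - costs


entry = apply_execution_costs(100.0, 100, "buy")
exit_fill = apply_execution_costs(110.0, 100, "sell")
pnl = long_round_trip_pnl(entry, exit_fill, 100)
```

Keeping costs as explicit parameters makes it easy to sweep them alongside strategy parameters and to show slippage and fees as separate columns in a trade blotter.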
Artifacts Created:
/lake/trade_mart_v3/trade-test_2025w09.parquet
/lake/equity_curve_daily_v3/eq-test_2025w09.parquet
/lake/signals_mart_v3/combined_week9.parquet
/lake/tuning_mart_v3/combined_week9.parquet
Power BI Deliverables:
Trade Blotter (PnL, Slippage, Fees, Entry/Exit Reason)
Equity (NAV) vs. Drawdown (%) Chart
KPIs: Sharpe (252d), CAGR, Win Rate
Slicers: Run ID, Symbol, Entry Reason
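For reference, the KPIs above can be computed from a daily NAV series and a list of closed-trade PnLs as in this sketch. It assumes a zero risk-free rate, the sample standard deviation, 252 trading days per year, and that a "win" is a trade with strictly positive net PnL; the repo's exact conventions are not stated here, and the function names are hypothetical.

```python
import math
import statistics


def daily_returns(nav: list[float]) -> list[float]:
    """Simple day-over-day returns from a NAV series."""
    return [b / a - 1 for a, b in zip(nav, nav[1:])]


def sharpe_252(returns: list[float]) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)  # sample standard deviation
    if sd == 0:
        return 0.0  # flat returns: ratio is undefined, report 0
    return statistics.mean(returns) / sd * math.sqrt(252)


def cagr(nav: list[float], periods_per_year: int = 252) -> float:
    """Compound annual growth rate from the first to the last NAV point."""
    years = (len(nav) - 1) / periods_per_year
    return (nav[-1] / nav[0]) ** (1 / years) - 1


def win_rate(trade_pnls: list[float]) -> float:
    """Share of closed trades with strictly positive net PnL."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)
```

Multiply `cagr` and `win_rate` by 100 to match the percentage units used in the Results table.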
Results:
| Metric | Value |
|---|---|
| Sharpe (252d) | 1.34 |
| CAGR (%) | 21.96 |
| Win Rate (%) | 59.43 |
Validation Summary: ✅ PK uniqueness ✅ Null policy ✅ Drawdown ≤ 0 ✅ Reconciliation (trades ↔ equity)
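The four checks in the validation summary can be expressed as small pure functions, as in this sketch. It assumes trades are rows keyed by a hypothetical `(run_id, symbol, trade_id)` primary key with a net `pnl` column, equity rows carry `nav` and `drawdown`, and NAV changes only through trade PnL (no cash interest, all positions closed). All names here are hypothetical and are not the repo's `validate_run()`.

```python
import math


def pk_unique(rows: list[dict], pk: tuple[str, ...]) -> bool:
    """True if no two rows share the same primary-key tuple."""
    keys = [tuple(row[col] for col in pk) for row in rows]
    return len(keys) == len(set(keys))


def nulls_ok(rows: list[dict], required: tuple[str, ...]) -> bool:
    """True if every required column is present and neither None nor NaN."""
    for row in rows:
        for col in required:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                return False
    return True


def drawdown_ok(equity: list[dict]) -> bool:
    """Drawdown is measured from the running peak, so it must never exceed 0."""
    return all(row["drawdown"] <= 0 for row in equity)


def reconciles(trades: list[dict], equity: list[dict],
               start_nav: float, tol: float = 1e-6) -> bool:
    """Ending NAV should equal starting NAV plus the sum of net trade PnL."""
    expected = start_nav + sum(t["pnl"] for t in trades)
    return math.isclose(equity[-1]["nav"], expected, abs_tol=tol)


def validation_summary(trades: list[dict], equity: list[dict],
                       start_nav: float) -> dict[str, bool]:
    """Run every check and print one PASS/FAIL line per check."""
    pk = ("run_id", "symbol", "trade_id")
    results = {
        "PK uniqueness": pk_unique(trades, pk),
        "Null policy": nulls_ok(trades, pk + ("pnl",)),
        "Drawdown <= 0": drawdown_ok(equity),
        "Reconciliation (trades <-> equity)": reconciles(trades, equity, start_nav),
    }
    for name, passed in results.items():
        print(f"{'PASS' if passed else 'FAIL'} {name}")
    return results
```

Returning a dict as well as printing lets the same checks drive both a CI assertion and a human-readable log.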
⚡ Quick Start (Follow along with me!)
- Clone the repo:
- Set up virtual environment:
- Install dependencies:
- Run the notebooks:
📝 Build Logs
Build Log – Week 1 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk1
Build Log – Week 2 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk2
Build Log – Week 3 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk3
Build Log – Week 4 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk4
Build Log – Week 5 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk5
Build Log – Week 6 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk6
Build Log – Week 7 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk7
Build Log – Week 8 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk8
Build Log – Week 9 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk9
🗺️ Week 9 – What's New
Pipeline Enhancements
- Added a full simulation-validation loop (simulate(), validate_run(), reconciliation parity).
- Introduced trade_mart_v3 and equity_curve_daily_v3 with a deterministic structure.
- Implemented timestamp normalization (UTC-safe ordering).
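UTC-safe ordering means converting every timestamp to UTC before sorting, so events from different time zones interleave correctly. A minimal sketch, assuming naive timestamps should be treated as UTC (the repo's actual policy for naive inputs is not documented here) and using a hypothetical `symbol` tie-breaker so identical timestamps always sort the same way:

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime, naive_tz: timezone = timezone.utc) -> datetime:
    """Return an aware UTC datetime; naive inputs are assumed to be in naive_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=naive_tz)
    return ts.astimezone(timezone.utc)


def order_events(events: list[dict]) -> list[dict]:
    """Sort by UTC time, then by symbol, so ties resolve identically every run."""
    return sorted(events, key=lambda e: (to_utc(e["ts"]), e["symbol"]))


# 09:30 at UTC-5 and 15:30 at UTC+1 are the same instant (14:30 UTC).
e1 = {"symbol": "MSFT", "ts": datetime(2025, 3, 3, 9, 30, tzinfo=timezone(timedelta(hours=-5)))}
e2 = {"symbol": "SPY", "ts": datetime(2025, 3, 3, 14, 0)}  # naive, treated as UTC
e3 = {"symbol": "AAPL", "ts": datetime(2025, 3, 3, 15, 30, tzinfo=timezone(timedelta(hours=1)))}
ordered = order_events([e1, e2, e3])
```

Sorting on raw local times would place MSFT (09:30) first even though SPY's 14:00 UTC event happened earlier.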
Testing & Reliability
- Expanded pytest coverage for validation, PK uniqueness, and reconciliation.
- Enforced schema-on-append verification.
- Validation summary now logs ✅ pass/fail states for all checks.
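Schema-on-append verification rejects a batch before it touches the target table. The sketch below shows the idea on in-memory rows together with a pytest-style test (pytest collects plain `test_` functions, so no import is needed). `TRADE_SCHEMA`, `SchemaMismatch`, `verify_schema_on_append`, and `append_rows` are hypothetical names; a real Parquet mart would compare against the file's stored schema instead of a dict.

```python
TRADE_SCHEMA = {"run_id": str, "symbol": str, "trade_id": int, "pnl": float}


class SchemaMismatch(ValueError):
    """Raised when a batch does not match the target table's schema."""


def verify_schema_on_append(batch: list[dict], schema: dict[str, type]) -> None:
    """Check column names and value types of every row before any write."""
    for i, row in enumerate(batch):
        if set(row) != set(schema):
            missing = sorted(set(schema) - set(row))
            extra = sorted(set(row) - set(schema))
            raise SchemaMismatch(f"row {i}: missing={missing} extra={extra}")
        for col, typ in schema.items():
            # Note: bool values pass isinstance(x, int) because bool subclasses int.
            if not isinstance(row[col], typ):
                raise SchemaMismatch(
                    f"row {i}: {col} expected {typ.__name__}, got {type(row[col]).__name__}"
                )


def append_rows(table: list[dict], batch: list[dict], schema: dict[str, type]) -> None:
    """Validate the whole batch first, then append; a failure leaves table unchanged."""
    verify_schema_on_append(batch, schema)
    table.extend(batch)


def test_append_rejects_extra_column():
    table: list[dict] = []
    bad = [{"run_id": "r1", "symbol": "AAPL", "trade_id": 1, "pnl": 1.0, "note": "x"}]
    try:
        append_rows(table, bad, TRADE_SCHEMA)
    except SchemaMismatch:
        pass
    else:
        raise AssertionError("expected SchemaMismatch")
    assert table == []  # nothing was written
```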
Visualization & Reporting
- Built a Trade Blotter table (PnL, Commission, Fees, Entry/Exit Reason).
- Added an Equity (NAV) vs. Drawdown (%) chart with a dual axis.
- Created KPI cards: Sharpe (252d), CAGR %, Win Rate %.
- Added slicers for Run ID, Symbol, and Entry Reason for filtering and analysis.
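The drawdown series behind the dual-axis chart is the percentage distance of each NAV point below its running peak. A short sketch, assuming a strictly positive NAV series; `drawdown_pct` and `chart_rows` are hypothetical helpers that produce one row per date with NAV for the left axis and drawdown for the right axis:

```python
def drawdown_pct(nav: list[float]) -> list[float]:
    """Percent drawdown of each NAV point from its running peak (always <= 0)."""
    series = []
    peak = float("-inf")
    for value in nav:
        peak = max(peak, value)  # highest NAV seen so far
        series.append((value / peak - 1.0) * 100.0)
    return series


def chart_rows(dates: list[str], nav: list[float]) -> list[dict]:
    """Rows for a dual-axis chart: NAV on the left axis, drawdown % on the right."""
    return [
        {"date": d, "nav": v, "drawdown_pct": dd}
        for d, v, dd in zip(dates, nav, drawdown_pct(nav))
    ]


dd = drawdown_pct([100.0, 120.0, 90.0, 130.0])
max_drawdown = min(dd)  # deepest point below a prior peak
```

Because the peak only ratchets upward, every value is at most 0, which is exactly the "Drawdown ≤ 0" invariant the validation step checks.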
Performance & Usability
- Introduced the --max-combos and --allow-large safety flags for simulation runs.
- Batched Parquet writes for faster runs and cleaner logs.
- Seed-controlled simulation for reproducibility.
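The exact behavior of --max-combos and --allow-large is not spelled out here, so this sketch assumes one plausible reading: --max-combos caps the size of a parameter grid and --allow-large overrides that cap. The --seed flag, the grid keys, the default of 500, and the helper names are all hypothetical. Seeding a local `random.Random` instance (rather than the global module state) and sorting grid keys keeps both the combo order and any sampling identical across runs.

```python
import argparse
import itertools
import random


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Parameter sweep (illustrative sketch)")
    p.add_argument("--max-combos", type=int, default=500,
                   help="refuse grids larger than this many combinations")
    p.add_argument("--allow-large", action="store_true",
                   help="override the --max-combos guard")
    p.add_argument("--seed", type=int, default=42,
                   help="RNG seed for reproducible runs (hypothetical flag)")
    return p


def plan_sweep(grid: dict[str, list], max_combos: int, allow_large: bool) -> list[dict]:
    """Expand a parameter grid, refusing oversized sweeps unless explicitly allowed."""
    n = 1
    for values in grid.values():
        n *= len(values)
    if n > max_combos and not allow_large:
        raise SystemExit(f"{n} combos exceeds --max-combos={max_combos}; "
                         "pass --allow-large to run anyway")
    keys = sorted(grid)  # sorted keys give a deterministic combo order
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[k] for k in keys))]


def sample_combos(combos: list[dict], k: int, seed: int) -> list[dict]:
    """Draw k combos with a local RNG so results do not depend on global state."""
    rng = random.Random(seed)
    return rng.sample(combos, k)


args = build_parser().parse_args(["--max-combos", "4"])
grid = {"slow": [20, 50, 100], "fast": [5, 10]}
```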
Week 9 – Performance Simulation & Validation
- End-to-End System Thinking: I learned how each mart (signals → tuning → trades → equity) connects as a complete system. Every step now produces validated outputs that feed the next stage, turning raw data into a reliable simulation.
- Deterministic Design Matters: Reproducibility isn't optional. Setting seeds, controlling time zones, and enforcing schema validation ensured that identical inputs always yield identical outputs. This made debugging predictable and CI-safe.
- Validation Is the Final Guardrail: Validation scripts that check PK uniqueness, null policies, and equity-trade reconciliation gave me confidence that the pipeline's math actually holds up. They shifted the mindset from "does it run?" to "is it right?"
- Power BI as an Analysis Surface: Building the Trade Blotter and Equity vs. Drawdown views clarified how to communicate system performance visually. Every KPI (Sharpe, CAGR, Win Rate) now ties directly to verified data rather than estimates.
- Clean Models → Clear Insights: Simplifying relationships to a star schema (Date → Facts, Symbol → Facts) made the visuals snap into place. Data lineage now feels intuitive rather than tangled.
Dashboard Pages:
🤝 Contributing
This is an open project for learning and sharing best practices in data engineering for financial markets. Suggestions, issues, and PRs are welcome.
📜 License
MIT License — see LICENSE for details.