
End-to-end big data system for financial markets: ingest, transform, and visualize market & macro data


Si944-byte/Finance-Data-OS


Finance Data OS

License: MIT | Python


🚀 Project Goal

To build a modular financial data platform that takes raw equities, options, and macroeconomic datasets and transforms them into analytics-ready features for trading and investment research.


🏗️ Architecture (Phase 1)

[Diagram: Phase 1 architecture (Mermaid chart, exported 2025-09-04)]

📂 Project Layout

docs/ # architecture diagrams, notes

artifacts/ # Power BI files, exported charts

Build Logs/ # Weekly build logs

notebooks/ # Jupyter notebooks (Week 1, Week 2, etc.)


🛣 Project Roadmap

This project is built and shipped in weekly artifacts. Each week delivers a small but meaningful piece of the pipeline.

Week 1 – Single-Ticker Prototype ✅

Week 2 – Multi-Ticker Ingest & Finance Chart ✅

Week 3 – Expanding History & Feature Store ✅

Week 4 – Signals, Backtest & 3-Page Power BI ✅

Week 5 – Costs, Controls & Tuning ✅

Week 6 – Rolling Metrics, Cost Modeling & Cleaner Pipeline ✅

Week 7 – Push-Button Backtest, Parameter Sweeps & One-Command Workflows ✅

Week 8 – Deterministic Parameter Tuning & Clean Analytics ✅

Week 9 – Performance Simulation & Validation ✅

Objective: Extend the Finance Data OS pipeline to simulate trade-level execution, validate equity reconciliation, and visualize performance metrics in Power BI.

Pipeline Summary:

Simulate: Load tuned parameters and signals; apply execution logic (slippage, commission, fees).

Validate: Check PK uniqueness, null policies, and reconciliation between trades and equity.

Visualize: Publish Power BI dashboards (Trade Blotter, Equity vs. Drawdown, KPI cards).
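To make the "apply execution logic" step concrete, here is a minimal sketch of how slippage, commission, and fees might be charged on a single long round trip. It is not the project's `simulate()`: the `CostModel` fields, the basis-point slippage model, and the per-share commission are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical cost parameters; the real pipeline's settings may differ.
    slippage_bps: float = 5.0           # adverse price move per side, in basis points
    commission_per_share: float = 0.005  # charged on entry and on exit
    fee_bps: float = 0.0                # fees on filled notional, per side


def round_trip_pnl(entry_px: float, exit_px: float, qty: int,
                   costs: CostModel) -> dict:
    """Break down a long round trip into gross PnL, cost components, and net PnL."""
    slip = costs.slippage_bps / 10_000
    # Slippage fills the buy higher and the sell lower than the quoted price.
    fill_in = entry_px * (1 + slip)
    fill_out = exit_px * (1 - slip)
    gross = (exit_px - entry_px) * qty
    slippage_cost = ((fill_in - entry_px) + (exit_px - fill_out)) * qty
    commission = 2 * costs.commission_per_share * qty
    fees = costs.fee_bps / 10_000 * (fill_in + fill_out) * qty
    # Net equals gross minus every cost component.
    net = (fill_out - fill_in) * qty - commission - fees
    return {"gross": gross, "slippage": slippage_cost,
            "commission": commission, "fees": fees, "net": net}
```

For example, buying 100 shares at 100 and selling at 110 with 10 bps slippage and $0.01/share commission gives a gross of 1,000 and a net of roughly 977. Keeping each component separate mirrors the blotter's PnL, Slippage, and Fees columns.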

Artifacts Created:

/lake/trade_mart_v3/trade-test_2025w09.parquet

/lake/equity_curve_daily_v3/eq-test_2025w09.parquet

/lake/signals_mart_v3/combined_week9.parquet

/lake/tuning_mart_v3/combined_week9.parquet

Power BI Deliverables:

Trade Blotter (PnL, Slippage, Fees, Entry/Exit Reason)

Equity (NAV) vs. Drawdown (%) Chart

KPIs: Sharpe (252d), CAGR, Win Rate

Slicers: Run ID, Symbol, Entry Reason
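The KPI cards can be cross-checked outside Power BI. The sketch below assumes a daily NAV series, per-trade net PnL values, 252 trading days per year, a zero risk-free rate, and sample standard deviation. The function names are hypothetical, and the dashboard's own measures may define these metrics slightly differently.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # annualization factor behind "Sharpe (252d)"


def daily_returns(nav: list[float]) -> list[float]:
    """Simple day-over-day returns from a NAV series."""
    return [nav[i] / nav[i - 1] - 1 for i in range(1, len(nav))]


def sharpe_252(nav: list[float]) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = daily_returns(nav)
    return mean(r) / stdev(r) * math.sqrt(TRADING_DAYS)


def cagr_pct(nav: list[float]) -> float:
    """Compound annual growth rate in %, treating each NAV point as one trading day."""
    years = (len(nav) - 1) / TRADING_DAYS
    return ((nav[-1] / nav[0]) ** (1 / years) - 1) * 100


def win_rate_pct(trade_pnls: list[float]) -> float:
    """Share of closed trades with strictly positive net PnL, in %."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls) * 100


def drawdown_pct(nav: list[float]) -> list[float]:
    """Percent drop from the running peak at each point; never positive."""
    out, peak = [], float("-inf")
    for v in nav:
        peak = max(peak, v)
        out.append((v / peak - 1) * 100)
    return out
```

Note that whether break-even trades count as wins, and whether CAGR uses trading or calendar days, are design choices that can move the reported numbers.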

Results:

| Metric | Value |
|---|---|
| Sharpe (252d) | 1.34 |
| CAGR (%) | 21.96 |
| Win Rate (%) | 59.43 |

Validation Summary:

✅ PK uniqueness

✅ Null policy

✅ Drawdown ≤ 0

✅ Reconciliation (trades ↔ equity)
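The four validation checks can be expressed as plain checks over in-memory rows. This is a hedged sketch, not the project's `validate_run()`. The column names (`run_id`, `trade_id`, `net_pnl`, `nav`, `drawdown`), the composite primary key, the assumption that equity rows are sorted by date, and the reconciliation rule (ending NAV equals starting capital plus summed net trade PnL, within a tolerance) are all assumptions.

```python
def validate(trades: list[dict], equity: list[dict], start_capital: float,
             tol: float = 1e-6) -> dict[str, bool]:
    """Run four pass/fail checks; equity rows are assumed sorted by date."""
    results = {}

    # 1. PK uniqueness: the hypothetical key (run_id, trade_id) must not repeat.
    keys = [(t["run_id"], t["trade_id"]) for t in trades]
    results["pk_unique"] = len(keys) == len(set(keys))

    # 2. Null policy: required columns must be present and non-None.
    required = ("run_id", "trade_id", "net_pnl")
    results["no_nulls"] = all(t.get(c) is not None for t in trades for c in required)

    # 3. Drawdown must never be positive on the equity curve.
    results["drawdown_le_0"] = all(row["drawdown"] <= 0 for row in equity)

    # 4. Reconciliation: ending NAV = start capital + net trade PnL (nulls skipped).
    expected = start_capital + sum(
        t["net_pnl"] for t in trades if t.get("net_pnl") is not None)
    results["reconciled"] = abs(equity[-1]["nav"] - expected) <= tol
    return results
```

Returning a name-to-boolean map makes it easy to log a ✅/❌ line per check, similar to the summary above.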


⚡ Quick Start (Follow along with me!)

  1. Clone the repo: [screenshot]

  2. Set up a virtual environment: [screenshot]

  3. Install dependencies: [screenshot]

  4. Run the notebooks: [screenshot]


📝 Build Logs

Build Log – Week 1 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk1

Build Log – Week 2 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk2

Build Log – Week 3 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk3

Build Log – Week 4 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk4

Build Log – Week 5 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk5

Build Log – Week 6 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk6

Build Log – Week 7 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk7

Build Log – Week 8 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk8

Build Log – Week 9 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk9


🗺️ Week 9 — What's New

Pipeline Enhancements

  • Added full simulation-validation loop (simulate(), validate_run(), reconciliation parity).

  • Introduced trade_mart_v3 and equity_curve_daily_v3 with deterministic structure.

  • Implemented timestamp normalization (UTC-safe ordering).
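UTC-safe ordering can be sketched with the standard library. Every timestamp is converted to an aware UTC `datetime` before sorting, so rows from different source timezones interleave correctly. Normalizing first also avoids a `TypeError`, because Python refuses to order naive and aware datetimes together. Tagging naive timestamps with an assumed zone (UTC by default here; an exchange zone such as `zoneinfo.ZoneInfo("America/New_York")` could be passed instead) is an assumption, as are the helper names.

```python
from datetime import datetime, timezone, tzinfo


def to_utc(ts: datetime, assume_tz: tzinfo = timezone.utc) -> datetime:
    """Return an aware UTC datetime; naive inputs are tagged with assume_tz first."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


def sort_utc(rows: list[dict], key: str = "ts",
             assume_tz: tzinfo = timezone.utc) -> list[dict]:
    """Normalize each row's timestamp to UTC, then sort on it (stable for ties)."""
    normalized = [{**r, key: to_utc(r[key], assume_tz)} for r in rows]
    return sorted(normalized, key=lambda r: r[key])
```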

Testing & Reliability

  • Expanded pytest coverage for validation, PK uniqueness, and reconciliation.

  • Enforced schema-on-append verification.

  • Validation summary now logs ✅ pass/fail states for all checks.
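Schema-on-append verification can be sketched as a guard that checks an incoming batch against the stored schema before anything is written. The schema format (column name mapped to a Python type), `SchemaMismatch`, and `append_rows` are hypothetical. A Parquet-backed implementation would more likely compare Arrow schemas.

```python
class SchemaMismatch(ValueError):
    """Raised when an append batch does not match the target table's schema."""


def check_schema(schema: dict[str, type], rows: list[dict]) -> None:
    """Reject rows with missing/extra columns or values of the wrong type."""
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            missing = sorted(set(schema) - set(row))
            extra = sorted(set(row) - set(schema))
            raise SchemaMismatch(f"row {i}: missing={missing} extra={extra}")
        for col, typ in schema.items():
            val = row[col]
            # None is allowed here; null policy is enforced by a separate check.
            if val is not None and not isinstance(val, typ):
                raise SchemaMismatch(
                    f"row {i}: {col!r} is {type(val).__name__}, expected {typ.__name__}")


def append_rows(table: list[dict], schema: dict[str, type], rows: list[dict]) -> None:
    """Validate the whole batch first, so a bad batch leaves the table untouched."""
    check_schema(schema, rows)
    table.extend(rows)
```

Validating the full batch before extending keeps appends all-or-nothing, which is what makes the check useful in CI.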

Visualization & Reporting

  • Built Trade Blotter table (PnL, Commission, Fees, Entry/Exit Reason).

  • Added Equity (NAV) vs Drawdown (%) chart with dual axis.

  • Created KPI cards: Sharpe (252d), CAGR %, Win Rate %.

  • Added slicers for Run ID, Symbol, Entry Reason for filtering and analysis.

Performance & Usability

  • Introduced the --max-combos and --allow-large safety flags for simulation runs.

  • Batched Parquet writes for faster runs and cleaner logs.

  • Seed-controlled simulation for reproducibility.
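The safety flags and seeding can be illustrated with `argparse` and `random`. Only the flag names `--max-combos` and `--allow-large` come from this README. Several details are assumptions:

- the semantics (refuse a parameter grid larger than `--max-combos` unless `--allow-large` is passed);
- the `--seed` flag and the default cap;
- the example grid.

The seeded shuffle is just a stand-in for whatever stochastic step the real simulation has.

```python
import argparse
import itertools
import random


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Hypothetical simulation runner")
    p.add_argument("--max-combos", type=int, default=500,
                   help="refuse parameter grids larger than this (assumed default)")
    p.add_argument("--allow-large", action="store_true",
                   help="override the --max-combos guard")
    p.add_argument("--seed", type=int, default=42,
                   help="RNG seed (assumed flag)")
    return p


def plan_runs(grid: dict[str, list], args: argparse.Namespace) -> list[dict]:
    """Expand the grid, enforce the size guard, and order runs deterministically."""
    combos = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
    if len(combos) > args.max_combos and not args.allow_large:
        raise SystemExit(f"{len(combos)} combos exceeds --max-combos={args.max_combos}; "
                         "pass --allow-large to proceed")
    # A local RNG seeded from the CLI: the same seed always yields the same order.
    rng = random.Random(args.seed)
    rng.shuffle(combos)
    return combos
```

Using a local `random.Random(seed)` instead of the module-level RNG keeps reproducibility from depending on whatever else in the process consumes random numbers.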


🧠 What I Learned

Week 9 – Performance Simulation & Validation

  1. End-to-End System Thinking: I learned how each mart (signals → tuning → trades → equity) connects into a complete system. Every step now produces validated outputs that feed the next stage, turning raw data into a reliable simulation.

  2. Deterministic Design Matters: Reproducibility isn’t optional. Setting seeds, controlling timezones, and enforcing schema validation ensured that identical inputs always yield identical outputs. This made debugging predictable and CI-safe.

  3. Validation Is the Final Guardrail: Validation scripts that check PK uniqueness, null policies, and equity-trade reconciliation gave me confidence that the pipeline’s math actually holds up. They shifted the mindset from “does it run?” to “is it right?”

  4. Power BI as an Analysis Surface: Building the Trade Blotter and Equity vs. Drawdown views clarified how to communicate system performance visually. Every KPI (Sharpe, CAGR, Win Rate) now ties directly to verified data, not estimates.

  5. Clean Models → Clear Insights: Simplifying relationships to a star schema (Date → Facts, Symbol → Facts) made the visuals snap into place. Data lineage now feels intuitive rather than tangled.


🗂️ Artifacts (Week 8: current week)

Dashboard Pages:

Page 1 – Signals (screenshot, 2025-10-28)

Page 2 – Backtest (screenshot, 2025-10-28)

Page 3 – Tuning Results (screenshot, 2025-10-28)

Page 4 – Performance Overview (screenshot, 2025-10-28)

Page 5 – About (screenshot, 2025-10-27)


🤝 Contributing

This is an open project for learning and sharing best practices in data engineering for financial markets. Suggestions, issues, and PRs are welcome.


📜 License

MIT License — see LICENSE for details.
