Python portfolio project: download, clean & visualize real-world datasets with cybersecurity focus

DigiFenix777/python-crash-course-downloading-data-project

🌍 Data Visualization with CSV, JSON, and GeoJSON

🔑 Executive Summary

Explored Python for cybersecurity and data analysis through a portfolio project focused on downloading, cleaning, and visualizing real-world datasets. Demonstrated the ability to parse CSV, JSON, and GeoJSON data and to generate both static and interactive visualizations using Matplotlib and Plotly. The project also applies refactoring, error handling, and reproducibility practices (virtual environments, a requirements file, and a structured repo).

Cybersecurity-transferable skills:

  • Parsing and cleaning log files, vulnerability scan results, and breach datasets.
  • Accessing and processing security APIs (e.g., VirusTotal, HaveIBeenPwned, Shodan).
  • Building visualizations of attack trends to communicate risks to stakeholders.
  • Applying refactoring and error handling that mirror practices in secure coding.

👉 Recruiters and hiring managers: This project demonstrates my ability to work with real-world data and lays the groundwork for automation and analysis in cybersecurity contexts.


📌 Overview

This project demonstrates how to ingest, process, and visualize structured and semi-structured data (CSV, JSON, GeoJSON) using Python. It includes:

  • Parsing and plotting weather data from CSV files.
  • Downloading JSON/GeoJSON datasets (earthquakes, wildfires).
  • Applying data cleaning, error checking, and refactoring.
  • Generating visualizations with Matplotlib and Plotly, including static and interactive charts.
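
The CSV parsing and error-checking steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in `src/`: the sample rows are made up, the column names (`DATE`, `TMAX`, `TMIN`) are assumptions about the weather files, and `read_highs_lows` is a hypothetical helper name.

```python
import csv
from datetime import datetime
from io import StringIO

# Made-up sample rows; the column names are assumptions about the
# weather CSV files, not copied from the repository's data.
SAMPLE_CSV = """\
"STATION","NAME","DATE","TMAX","TMIN"
"X001","EXAMPLE STATION","2021-07-01","61","51"
"X001","EXAMPLE STATION","2021-07-02","","50"
"X001","EXAMPLE STATION","2021-07-03","66","52"
"""


def read_highs_lows(lines):
    """Return (dates, highs, lows), skipping rows with missing readings."""
    reader = csv.reader(lines)
    header_row = next(reader)
    # Look up column positions by name instead of hard-coding indexes.
    date_i = header_row.index("DATE")
    high_i = header_row.index("TMAX")
    low_i = header_row.index("TMIN")

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[date_i], "%Y-%m-%d")
            high = int(row[high_i])
            low = int(row[low_i])
        except ValueError:
            # An empty or malformed field lands here; report it and move on.
            print(f"Missing data for {row[date_i]}")
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows


dates, highs, lows = read_highs_lows(StringIO(SAMPLE_CSV))
```

With a real file, the same function could be fed `path.read_text().splitlines()`, and the three lists passed on to a plotting call. Skipping incomplete rows keeps dates, highs, and lows aligned, at the cost of gaps in the series.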

🧠 Skills & Concepts

  • Data parsing from CSV, JSON, and GeoJSON
  • Dataset exploration
  • Time-series visualization with datetime
  • Plot customization (colors, scales, shading)
  • Global data mapping (earthquakes, fires)
  • Refactoring with list comprehensions and automated headers
  • Professional Git/GitHub workflow (branching, version control, .gitignore)
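
As a rough sketch of the list-comprehension refactoring and metadata-driven titles mentioned above, the snippet below parses a tiny, made-up GeoJSON document. The key names follow the general shape of a USGS feed (`features`, `properties`, `geometry`, `metadata`), but the values are invented and the variable names are illustrative rather than taken from the project's scripts.

```python
import json

# Made-up miniature of an earthquake GeoJSON feed; real files hold
# many more fields per feature.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "metadata": {"title": "Example Earthquake Feed"},
  "features": [
    {"type": "Feature",
     "properties": {"mag": 1.6, "title": "M 1.6 - example location A"},
     "geometry": {"type": "Point", "coordinates": [-150.7, 61.7, 56.3]}},
    {"type": "Feature",
     "properties": {"mag": 4.8, "title": "M 4.8 - example location B"},
     "geometry": {"type": "Point", "coordinates": [142.3, 27.1, 10.0]}}
  ]
}
"""

all_eq_data = json.loads(SAMPLE_GEOJSON)
all_eq_dicts = all_eq_data["features"]

# One comprehension per series replaces a loop full of append calls.
# GeoJSON coordinates are ordered longitude, latitude, then depth.
mags = [eq["properties"]["mag"] for eq in all_eq_dicts]
lons = [eq["geometry"]["coordinates"][0] for eq in all_eq_dicts]
lats = [eq["geometry"]["coordinates"][1] for eq in all_eq_dicts]
eq_titles = [eq["properties"]["title"] for eq in all_eq_dicts]

# Take the chart title from the feed metadata rather than hard-coding it.
chart_title = all_eq_data["metadata"]["title"]
```

Reading the title from the metadata means the chart label stays correct when a different feed file is swapped in.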

🗂️ Repository Structure

project-data-visualization/
│
├── data/               # Input CSV/JSON/GeoJSON datasets
│   ├── earthquake_data/
│   │   ├── eq_data_1_day_m1.geojson
│   │   ├── eq_data_30_day_m1.geojson
│   │   ├── eq_data_past_30_days_m4plus.geojson
│   │   ├── readable_eq_data_geojson
│   │   └── README.md
│   ├── weather_data/
│   │   ├── death_valley_2021_full.csv
│   │   ├── death_valley_2021_simple.csv
│   │   ├── greater_seattle_2024_dense.csv
│   │   ├── README.md
│   │   ├── redmond_wa_2024_simple.csv
│   │   ├── sitka_weather_07-2021_simple.csv
│   │   ├── sitka_weather_2021_full.csv
│   │   └── sitka_weather_2021_simple.csv
│   ├── wildfire_data/
│   │   ├── world_fires_1_day.csv
│   │   └── world_fires_7_day.csv
│   └── README.md
│
├── images/portfolio/             # Exported static plots
│   ├── Recent_Earthquakes.png
│   ├── Sitka_Death_Valley_Comparison.png
│   └── World_Wildfires.png
│
├── src/                # Python scripts for each exercise
│   ├── 16_1_death_valley_rainfall.py
│   ├── 16_1_sitka_rainfall.py
│   ├── 16_2_sitka_death_valley_comparison.py
│   ├── 16_4_automatic_indexes.py
│   ├── 16_6_refactoring.py
│   ├── 16_7_automated_title.py
│   ├── 16_8_recent_earthquakes.py
│   ├── 16_9_world_fires.py
│   ├── death_valley_highs_lows.py
│   ├── eq_explore_data.py
│   ├── main.py
│   ├── redmond_wa_rain_snow_december_2024.py
│   ├── sitka_highs.py
│   └── sitka_highs_lows.py
│ 
├── tests/
├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt

🚀 How to Run

  1. Clone this repo and move into the folder:

    git clone git@github.com:your-username/project-data-visualization.git
    cd project-data-visualization
  2. (Optional) Create a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run any exercise script, for example:

    python src/16_8_recent_earthquakes.py

🔎 Skills Spotlight

This project highlights skills in:

  • Python scripting for data processing
  • CSV, JSON, and GeoJSON importing and processing
  • Data visualization using Matplotlib and Plotly
  • Working with real-world datasets (climate, seismic, fire activity)
  • Applying PEP 8, project structuring, and GitHub portfolio practices

📝 Lessons Learned

Through this project, I strengthened both my Python skills and their application to cybersecurity:

  • Data parsing & cleaning → transferable to analyzing log files, firewall exports, and SIEM data.
  • Datetime handling → useful for correlating events across multiple security data sources.
  • Visualization (Matplotlib & Plotly) → critical for reporting incidents and trends to executives or compliance auditors.
  • Error handling & validation → aligns with defensive programming principles in security tools.
  • Refactoring & automation → prepares me to build reusable scripts for common security workflows.

Problems, errors, and issues overcome during this project:

  • Gained hands-on experience resolving Git merge conflicts and managing divergent branches.
  • Learned how to configure and update branch protection rules to balance collaboration with control.
  • Improved workflow efficiency by using PyCharm's Git integration and falling back to the CLI when needed.
  • Built resilience by troubleshooting environment and formatting issues in VS Code and PyCharm.
  • Reinforced the importance of clear data directory structures for consistent project organization.

🛠️ Skills Spotlight

  • Version Control: Git branching, conflict resolution, and pull request management.
  • Tool Proficiency: PyCharm, VS Code, GitHub Desktop, and CLI.
  • Problem-Solving: Overcame environment friction and repo cleanup challenges.
  • Cybersecurity Relevance: Applied structured, methodical approaches to error handling and configuration—transferable to securing and maintaining resilient systems.

This project helped me bridge Python fundamentals with practical cybersecurity applications, positioning me to develop tools that improve detection, response, and reporting.


📊 Exercises Implemented

  • 16-1: Sitka Rainfall → Visualized daily rainfall (Sitka & Death Valley).
  • 16-2: Sitka–Death Valley Comparison → Standardized y-axes for temperature comparisons.
  • 16-4: Automatic Indexes → Automated detection of CSV header indexes and titles.
  • 16-6: Refactoring → Simplified earthquake data parsing with list comprehensions.
  • 16-7: Automated Title → Dynamically pulled dataset titles from GeoJSON metadata.
  • 16-8: Recent Earthquakes → Visualized past 30 days of earthquake data (incl. 2025 Kamchatka quake).
  • 16-9: World Fires → Plotted NASA FIRMS fire data with intensity-based opacity and color scaling.
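
One possible way to derive intensity-based opacity for the fire map in 16-9 is a linear rescale of brightness into a fixed range. The sketch below is an assumption about how such scaling could work, not the method used in `src/16_9_world_fires.py`: the rows are made up, the column names are assumed, and `scale` is a hypothetical helper.

```python
import csv
from io import StringIO

# Made-up rows; the column names are assumptions about the fire CSV layout.
SAMPLE_FIRES = """\
latitude,longitude,brightness
-12.5,131.0,300.0
10.2,-5.3,350.0
45.1,100.4,400.0
"""


def scale(values, low=0.2, high=1.0):
    """Linearly rescale values into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All readings equal: avoid dividing by zero, use the top of the range.
        return [high for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) / span * (high - low) for v in values]


rows = list(csv.DictReader(StringIO(SAMPLE_FIRES)))
lats = [float(row["latitude"]) for row in rows]
lons = [float(row["longitude"]) for row in rows]
brightness = [float(row["brightness"]) for row in rows]

# One opacity per marker: dimmer fires come out more transparent.
opacities = scale(brightness)
```

The resulting list could be passed as per-marker opacity to a scatter-style map, with the raw brightness values driving the color scale. A floor of 0.2 keeps the faintest fires visible instead of fading them out entirely.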

🌍 Data Sources

  • NOAA Climate Data — temperature and rainfall data for Sitka, Alaska, and Death Valley, California. ncdc.noaa.gov
  • USGS Earthquake Hazards Program — real-time and historical earthquake data in GeoJSON format: earthquake.usgs.gov
  • NASA Earthdata (FIRMS) — global active fire data: earthdata.nasa.gov/firms
  • Datasets and exercises adapted from Python Crash Course, 3rd Edition by Eric Matthes (No Starch Press).

📚 Attribution

Based on exercises from:
Matthes, E. (2023). Python Crash Course (3rd ed.). No Starch Press.
Book website


🧩 License

Distributed under the MIT License.
See LICENSE for details.
