Network Security Project for a Phishing Dataset

Overview

This is a full-scale MLOps project that implements an end-to-end machine learning pipeline for network security applications. The project covers the complete ML lifecycle, including data ingestion, preprocessing, model training, evaluation, deployment, and continuous monitoring using cloud platforms like AWS.

Features

  • Automated Data Ingestion from structured network datasets
  • Data Preprocessing & Validation using defined schemas (see the sketch after this list)
  • Model Training & Evaluation using state-of-the-art ML algorithms
  • Cloud Integration with AWS S3 for model storage and retrieval
  • CI/CD Pipeline for continuous integration and deployment
  • Dockerization for seamless deployment
  • Batch Prediction Pipeline for inference on new data
  • Logging & Exception Handling for debugging and monitoring
  • MLflow + DagsHub Integration for experiment tracking and reproducibility
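
To make the schema-driven validation step concrete, here is a minimal sketch of how an ingested CSV might be checked against data_schema/schema.yaml. The validate_dataframe helper and the schema's "columns" key are assumptions for illustration, not the project's actual API.

# Minimal sketch of schema-based validation. The helper below and the
# schema's "columns" key are hypothetical; the project's real checks live
# in networksecurity/components.
import pandas as pd
import yaml

def validate_dataframe(csv_path: str, schema_path: str) -> bool:
    """Return True if the CSV has exactly the columns declared in the schema."""
    with open(schema_path) as f:
        schema = yaml.safe_load(f)  # assumed shape: {"columns": {"col_name": "dtype", ...}}
    expected = set(schema["columns"])
    actual = set(pd.read_csv(csv_path).columns)
    if expected != actual:
        print(f"Missing: {expected - actual}, unexpected: {actual - expected}")
        return False
    return True

if __name__ == "__main__":
    print(validate_dataframe("Network_Data/phisingData.csv", "data_schema/schema.yaml"))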

Project Structure

├── mohammedsaim-quadri-networksecurity/
│   ├── README.md                # Project Documentation
│   ├── app.py                   # Main Application Entry Point
│   ├── Dockerfile               # Docker Configuration for Deployment
│   ├── main.py                  # Training Pipeline Execution
│   ├── push_data.py             # Data Pusher for Cloud Storage
│   ├── requirements.txt         # Dependencies
│   ├── setup.py                 # Package Setup
│   ├── test_mongo.py            # MongoDB Connectivity Testing
│   ├── data_schema/
│   │   └── schema.yaml          # Data Schema Definitions
│   ├── final_models/            # Trained Models
│   │   ├── model.pkl            # Final ML Model
│   │   └── preprocessor.pkl     # Data Preprocessing Pipeline
│   ├── Network_Data/            # Raw Network Datasets
│   │   └── phisingData.csv
│   ├── networksecurity/         # Core Codebase
│   │   ├── cloud/               # AWS S3 Integration
│   │   ├── components/          # ML Pipeline Components
│   │   ├── constant/            # Constants & Configs
│   │   ├── entity/              # Entity Definitions
│   │   ├── exception/           # Custom Exception Handling
│   │   ├── logging/             # Logging Mechanisms
│   │   ├── pipeline/            # Training & Prediction Pipelines
│   │   └── utils/               # Utility Functions
│   ├── templates/
│   │   └── table.html           # UI Component
│   ├── valid_data/
│   │   └── test.csv             # Validated Test Data
│   └── .github/workflows/
│       └── main.yml             # GitHub Actions Workflow
Pipeline Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#bbdefb', 'edgeLabelBackground':'#ffffff'}}}%%

graph TD;
    
    %% Data Pipeline %%
    A(["🗄️ Data Collection"]) -->|Ingest Data| B(["📥 Data Ingestion"])
    B -->|Validate Data| C(["✅ Data Validation"])
    C -->|Transform Data| D(["🔄 Data Transformation"])
    D -->|Train Model| E(["🤖 Model Training"])
    E -->|Evaluate Model| F(["📊 Model Evaluation"])
    F -->|Log to MLflow| X(["🧪 MLflow Tracking"])
    F -->|Store Model| G(["📁 Model Registry (S3)"])

    %% Deployment %%
    G -->|Deploy Model| H(["🚀 Model Deployment"])
    H -->|Serve Predictions| I(["🌍 Web Service (Flask API)"])
    I -->|User Requests| J(["🖥️ Frontend/Table UI"])

    %% MLOps & Monitoring %%
    G -->|Monitor Model| K(["📡 Model Monitoring"])
    K -->|Trigger Retraining| D

    %% Subgraphs for Organization %%
    subgraph "🔧 MLOps Pipeline"
      A
      B
      C
      D
      E
      F
      G
      H
      I
      J
      K
    end

    subgraph "☁️ Cloud Storage"
      L(["🗂️ AWS S3"])
      G --> L
      H --> L
    end

    subgraph "🛠️ CI/CD Pipeline"
      M(["⚙️ GitHub Actions"])
      M -->|Automate Deployment| H
    end

    subgraph "🔬 Experiment Tracking"
      X -->|Push Logs| Y(["📊 DagsHub MLflow UI"])
    end
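
In code, the stages above are typically chained so that each component consumes the previous stage's artifact. The sketch below is a plausible wiring only; the class and method names are assumptions modeled on the networksecurity/components layout, not a verbatim copy of main.py.

# Plausible wiring of the training pipeline (class/method names are
# assumptions based on the repository layout, not the actual main.py).
from networksecurity.components.data_ingestion import DataIngestion
from networksecurity.components.data_validation import DataValidation
from networksecurity.components.data_transformation import DataTransformation
from networksecurity.components.model_trainer import ModelTrainer

def run_training_pipeline():
    ingestion = DataIngestion().initiate_data_ingestion()              # raw data -> train/test files
    validation = DataValidation(ingestion).initiate_data_validation()  # schema checks
    transformation = DataTransformation(validation).initiate_data_transformation()  # writes preprocessor.pkl
    return ModelTrainer(transformation).initiate_model_trainer()       # writes model.pkl, logs to MLflow

if __name__ == "__main__":
    run_training_pipeline()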

Installation & Setup

Prerequisites

  • Python 3.8+
  • Docker
  • AWS Account & CLI Setup
  • GitHub Actions for CI/CD
  • DagsHub Account
  • MLflow installed

Install Dependencies

pip install -r requirements.txt

Run Model Training

python main.py

Run Application Locally

python app.py

Build & Run Docker Container

docker build -t networksecurity-app .
docker run -p 5000:5000 networksecurity-app

CI/CD & Deployment

GitHub Actions Workflow (.github/workflows/main.yml)

This project implements Continuous Integration (CI) and Continuous Deployment (CD) using GitHub Actions:

  • Linting & Unit Testing
  • Building and Pushing Docker Image to AWS ECR
  • Deploying Model & Application

AWS Integration

  • S3 Bucket: Stores model artifacts
  • ECR: Stores containerized app
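
As a reference point, pushing a trained model to the S3 bucket with boto3 looks roughly like this (the bucket name and key are placeholders; credentials come from the AWS CLI setup):

# Minimal boto3 sketch for storing a model artifact in S3. Bucket and key
# names are placeholders; credentials are read from the AWS CLI config.
import boto3

def upload_model(local_path: str, bucket: str, key: str) -> None:
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    print(f"Uploaded {local_path} to s3://{bucket}/{key}")

upload_model("final_models/model.pkl", "my-networksecurity-bucket", "models/model.pkl")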

MLflow & DagsHub Integration

To manage and track experiments across the pipeline, we use MLflow integrated with DagsHub as a remote tracking server. This enables:

  • Centralized logging of metrics, parameters, artifacts, and models
  • Versioned experiment tracking across multiple runs
  • Remote access to experiment dashboards via DagsHub
  • Collaboration and reproducibility using Git-backed tracking

Setting Up Experiment Tracking

  • Ensure mlflow is installed via pip install mlflow
  • Set the tracking URI to DagsHub:
mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
  • Use mlflow.start_run(), mlflow.log_params(), mlflow.log_metrics(), and mlflow.log_artifact() in training scripts to track runs

An example is available in the Model Training component of the pipeline.
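
A minimal sketch of such a tracked run (parameter and metric names below are placeholders):

# Minimal MLflow + DagsHub tracking sketch; parameter and metric names
# are placeholders, and <username>/<repo> must be filled in.
# DagsHub auth is supplied via the MLFLOW_TRACKING_USERNAME and
# MLFLOW_TRACKING_PASSWORD environment variables.
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
mlflow.set_experiment("network-security")

with mlflow.start_run():
    mlflow.log_params({"n_estimators": 100, "max_depth": 8})
    mlflow.log_metrics({"f1_score": 0.97, "precision": 0.96, "recall": 0.95})
    mlflow.log_artifact("final_models/model.pkl")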

Future Improvements

  • Implement Model Drift Detection
  • Integrate Live Monitoring with Prometheus/Grafana
  • Enable AutoML for Hyperparameter Tuning
  • Expand to Real-time Threat Detection
  • Deploy using Kubernetes for horizontal scaling
