Network Security Project for a Phishing Dataset

Overview

This is a full-scale MLOps project that implements an end-to-end machine learning pipeline for network security applications. The project covers the complete ML lifecycle, including data ingestion, preprocessing, model training, evaluation, deployment, and continuous monitoring using cloud platforms like AWS.

Features

  • Automated Data Ingestion from structured network datasets
  • Data Preprocessing & Validation using defined schemas (see the sketch after this list)
  • Model Training & Evaluation using state-of-the-art ML algorithms
  • Cloud Integration with AWS S3 for model storage and retrieval
  • CI/CD Pipeline for continuous integration and deployment
  • Dockerization for seamless deployment
  • Batch Prediction Pipeline for inference on new data
  • Logging & Exception Handling for debugging and monitoring
  • MLflow + DagsHub Integration for experiment tracking and reproducibility
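
To make the schema-driven validation step concrete, here is a minimal sketch of how an ingested CSV might be checked against data_schema/schema.yaml. The validate_dataframe helper and the schema's "columns" key are assumptions for illustration, not the project's actual API.

# Minimal sketch of schema-based validation. The helper below and the
# schema's "columns" key are hypothetical; the project's real checks live
# in networksecurity/components.
import pandas as pd
import yaml

def validate_dataframe(csv_path: str, schema_path: str) -> bool:
    """Return True if the CSV has exactly the columns declared in the schema."""
    with open(schema_path) as f:
        schema = yaml.safe_load(f)  # assumed shape: {"columns": {"col_name": "dtype", ...}}
    expected = set(schema["columns"])
    actual = set(pd.read_csv(csv_path).columns)
    if expected != actual:
        print(f"Missing: {expected - actual}, unexpected: {actual - expected}")
        return False
    return True

if __name__ == "__main__":
    print(validate_dataframe("Network_Data/phisingData.csv", "data_schema/schema.yaml"))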

Project Structure

├── mohammedsaim-quadri-networksecurity/
│   ├── README.md                # Project Documentation
│   ├── app.py                   # Main Application Entry Point
│   ├── Dockerfile               # Docker Configuration for Deployment
│   ├── main.py                  # Training Pipeline Execution
│   ├── push_data.py             # Data Pusher for Cloud Storage
│   ├── requirements.txt         # Dependencies
│   ├── setup.py                 # Package Setup
│   ├── test_mongo.py            # MongoDB Connectivity Testing
│   ├── data_schema/
│   │   └── schema.yaml          # Data Schema Definitions
│   ├── final_models/            # Trained Models
│   │   ├── model.pkl            # Final ML Model
│   │   └── preprocessor.pkl     # Data Preprocessing Pipeline
│   ├── Network_Data/            # Raw Network Datasets
│   │   └── phisingData.csv
│   ├── networksecurity/         # Core Codebase
│   │   ├── cloud/               # AWS S3 Integration
│   │   ├── components/          # ML Pipeline Components
│   │   ├── constant/            # Constants & Configs
│   │   ├── entity/              # Entity Definitions
│   │   ├── exception/           # Custom Exception Handling
│   │   ├── logging/             # Logging Mechanisms
│   │   ├── pipeline/            # Training & Prediction Pipelines
│   │   └── utils/               # Utility Functions
│   ├── templates/
│   │   └── table.html           # UI Component
│   ├── valid_data/
│   │   └── test.csv             # Validated Test Data
│   └── .github/workflows/
│       └── main.yml             # GitHub Actions Workflow
Pipeline Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#bbdefb', 'edgeLabelBackground':'#ffffff'}}}%%

graph TD;
    
    %% Data Pipeline %%
    A(["🗄️ Data Collection"]) -->|Ingest Data| B(["📥 Data Ingestion"])
    B -->|Validate Data| C(["✅ Data Validation"])
    C -->|Transform Data| D(["🔄 Data Transformation"])
    D -->|Train Model| E(["🤖 Model Training"])
    E -->|Evaluate Model| F(["📊 Model Evaluation"])
    F -->|Log to MLflow| X(["🧪 MLflow Tracking"])
    F -->|Store Model| G(["📁 Model Registry (S3)"])

    %% Deployment %%
    G -->|Deploy Model| H(["🚀 Model Deployment"])
    H -->|Serve Predictions| I(["🌍 Web Service (Flask API)"])
    I -->|User Requests| J(["🖥️ Frontend/Table UI"])

    %% MLOps & Monitoring %%
    G -->|Monitor Model| K(["📡 Model Monitoring"])
    K -->|Trigger Retraining| D

    %% Subgraphs for Organization %%
    subgraph "🔧 MLOps Pipeline"
      A
      B
      C
      D
      E
      F
      G
      H
      I
      J
      K
    end

    subgraph "☁️ Cloud Storage"
      L(["🗂️ AWS S3"])
      G --> L
      H --> L
    end

    subgraph "🛠️ CI/CD Pipeline"
      M(["⚙️ GitHub Actions"])
      M -->|Automate Deployment| H
    end

    subgraph "🔬 Experiment Tracking"
      X -->|Push Logs| Y(["📊 DagsHub MLflow UI"])
    end
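
In code, the stages above are typically chained so that each component consumes the previous stage's artifact. The sketch below is a plausible wiring only; the class and method names are assumptions modeled on the networksecurity/components layout, not a verbatim copy of main.py.

# Plausible wiring of the training pipeline (class/method names are
# assumptions based on the repository layout, not the actual main.py).
from networksecurity.components.data_ingestion import DataIngestion
from networksecurity.components.data_validation import DataValidation
from networksecurity.components.data_transformation import DataTransformation
from networksecurity.components.model_trainer import ModelTrainer

def run_training_pipeline():
    ingestion = DataIngestion().initiate_data_ingestion()              # raw data -> train/test files
    validation = DataValidation(ingestion).initiate_data_validation()  # schema checks
    transformation = DataTransformation(validation).initiate_data_transformation()  # writes preprocessor.pkl
    return ModelTrainer(transformation).initiate_model_trainer()       # writes model.pkl, logs to MLflow

if __name__ == "__main__":
    run_training_pipeline()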

Installation & Setup

Prerequisites

  • Python 3.8+
  • Docker
  • AWS Account & CLI Setup
  • GitHub Actions for CI/CD
  • DagsHub Account
  • MLflow installed

Install Dependencies

pip install -r requirements.txt

Run Model Training

python main.py

Run Application Locally

python app.py

Build & Run Docker Container

docker build -t networksecurity-app .
docker run -p 5000:5000 networksecurity-app

CI/CD & Deployment

GitHub Actions Workflow (.github/workflows/main.yml)

This project implements Continuous Integration (CI) and Continuous Deployment (CD) using GitHub Actions:

  • Linting & Unit Testing
  • Building and Pushing Docker Image to AWS ECR
  • Deploying Model & Application

AWS Integration

  • S3 Bucket: Stores model artifacts
  • ECR: Stores containerized app
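
As a reference point, pushing a trained model to the S3 bucket with boto3 looks roughly like this (the bucket name and key are placeholders; credentials come from the AWS CLI setup):

# Minimal boto3 sketch for storing a model artifact in S3. Bucket and key
# names are placeholders; credentials are read from the AWS CLI config.
import boto3

def upload_model(local_path: str, bucket: str, key: str) -> None:
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    print(f"Uploaded {local_path} to s3://{bucket}/{key}")

upload_model("final_models/model.pkl", "my-networksecurity-bucket", "models/model.pkl")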

MLflow & DagsHub Integration

To manage and track experiments across the pipeline, we use MLflow integrated with DagsHub as a remote tracking server. This enables:

  • Centralized logging of metrics, parameters, artifacts, and models
  • Versioned experiment tracking across multiple runs
  • Remote access to experiment dashboards via DagsHub
  • Collaboration and reproducibility using Git-backed tracking

Setting Up Experiment Tracking

  • Ensure mlflow is installed via pip install mlflow
  • Set the tracking URI to DagsHub:
mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
  • Use mlflow.start_run(), mlflow.log_params(), mlflow.log_metrics(), and mlflow.log_artifact() in training scripts to track runs

An example is available in the Model Training component of the pipeline.
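
A minimal sketch of such a tracked run (parameter and metric names below are placeholders):

# Minimal MLflow + DagsHub tracking sketch; parameter and metric names
# are placeholders, and <username>/<repo> must be filled in.
# DagsHub auth is supplied via the MLFLOW_TRACKING_USERNAME and
# MLFLOW_TRACKING_PASSWORD environment variables.
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
mlflow.set_experiment("network-security")

with mlflow.start_run():
    mlflow.log_params({"n_estimators": 100, "max_depth": 8})
    mlflow.log_metrics({"f1_score": 0.97, "precision": 0.96, "recall": 0.95})
    mlflow.log_artifact("final_models/model.pkl")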

Future Improvements

  • Implement Model Drift Detection
  • Integrate Live Monitoring with Prometheus/Grafana
  • Enable AutoML for Hyperparameter Tuning
  • Expand to Real-time Threat Detection
  • Deploy using Kubernetes for horizontal scaling
