AfrahMS/Phishing-URLs-Detection-Using-Sequential-and-Parallel-ML

🔐 Phishing URL Detection using Sequential and Parallel ML Techniques

This repository contains the implementation and experimental setup for the research paper:

“Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis”
published in Sensors (MDPI), 2023.

The project investigates how far parallel computing techniques can reduce training time for machine learning and deep learning models without compromising detection performance. The largest speedup observed was 3.51×.


📌 Abstract

Phishing attacks remain one of the most prevalent cybersecurity threats, exploiting malicious URLs to deceive users and steal sensitive information.
This project applies machine learning (ML) and deep learning (DL) techniques to detect phishing URLs and explores the impact of parallel execution using multithreading and multiprocessing in Python.

Both sequential and parallel implementations are evaluated and compared in terms of:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Execution time
  • Speedup
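The metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's evaluation code: the function names are hypothetical, phishing is assumed to be the positive class (label 1), and speedup is taken as the usual ratio of sequential to parallel execution time.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1-score as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def speedup(sequential_seconds, parallel_seconds):
    """Speedup = sequential time / parallel time (values > 1 mean faster)."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return sequential_seconds / parallel_seconds
```

For example, `speedup(10.0, 4.0)` returns `2.5`, meaning the parallel run took 40% of the sequential time.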

🧠 Models Implemented

  • Random Forest (RF)
  • Naïve Bayes (NB)
  • Convolutional Neural Network (CNN)
  • Long Short-Term Memory (LSTM)

Each model is trained and tested using:

  • Sequential execution
  • Parallel execution with:
    • Python backend threading
    • Threading with n_jobs
    • Manual threading
    • Multiprocessing
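As a rough sketch of how sequential, thread-based, and process-based runs can be timed against each other, the example below uses only the Python standard library. It is an assumption-laden illustration rather than the repository's code: `train_one` is a hypothetical stand-in for fitting one model, and the helper names and `BACKENDS` mapping are invented for this example.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical mapping from a backend name to a standard-library executor.
BACKENDS = {
    "threads": ThreadPoolExecutor,
    "processes": ProcessPoolExecutor,
}


def train_one(seed):
    """Hypothetical stand-in for fitting one model; burns a little CPU."""
    total = 0
    for i in range(50_000):
        total = (total + seed * i) % 1_000_003
    return total


def run_sequential(task, items):
    """Run task on each item in order; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [task(item) for item in items]
    return results, time.perf_counter() - start


def run_parallel(task, items, backend="threads", workers=4):
    """Run task on the items with a pool; return (results, elapsed seconds)."""
    executor_cls = BACKENDS[backend]
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(task, items))  # map preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    seeds = list(range(8))
    seq_results, seq_time = run_sequential(train_one, seeds)
    thr_results, thr_time = run_parallel(train_one, seeds, backend="threads")
    assert seq_results == thr_results  # same outputs, different schedule
    print(f"sequential: {seq_time:.3f}s, threads: {thr_time:.3f}s")
```

Passing `backend="processes"` runs the same task in worker processes; the task must then be a picklable top-level function, and the call should stay under the `if __name__ == "__main__":` guard. On standard CPython builds, threads tend to help only when the work releases the global interpreter lock (as many native library routines do), so pure-Python CPU-bound work like `train_one` usually benefits more from processes.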

🛠️ Technologies & Tools

  • Python
  • Scikit-learn
  • TensorFlow / Keras
  • Joblib
  • Google Colab
  • Parallel Computing (Multithreading & Multiprocessing)

📊 Dataset

  • Dataset: Malicious and Benign Webpages Dataset
  • Training Samples: 54,000 (balanced)
  • Testing Samples: 12,000
  • Features: URL-based, content-based, and network-based features
  • Feature Selection: Correlation, ANOVA, and Chi-square
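To illustrate the correlation part of the feature selection step, here is a minimal sketch that ranks features by the absolute Pearson correlation with the label. It assumes a feature-to-label correlation filter, which the description above does not specify; the function names and the feature names in the usage note are hypothetical, and the ANOVA and chi-square steps are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels, top_k):
    """Return the top_k feature names by |correlation| with the labels.

    columns: dict mapping a feature name to its list of numeric values.
    """
    scores = {
        name: abs(pearson(values, labels))
        for name, values in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With hypothetical columns `{"url_length": [10, 20, 30, 40], "constant": [1, 1, 1, 1]}` and labels `[0, 0, 1, 1]`, `rank_features(..., top_k=1)` selects `url_length`, since the constant column scores zero. Scikit-learn's `sklearn.feature_selection` module provides `f_classif` (ANOVA F-test) and `chi2` scorers that could play the same role for the other two criteria.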

📈 Key Findings

  • Parallel execution reduced training time relative to sequential execution
  • Maximum speedup achieved: 3.51×
  • No degradation in detection performance observed when using parallel techniques
  • Best accuracy achieved by Naïve Bayes (96.01%)
  • 100% recall achieved by RF, CNN, and LSTM models
