This repository contains the implementation and experimental setup related to the research paper:
“Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis”
published in Sensors (MDPI), 2023.
The project investigates how parallel computing techniques can reduce training time for machine learning and deep learning models (up to a 3.51× speedup in the reported experiments) without compromising detection performance.
Phishing attacks remain one of the most prevalent cybersecurity threats, exploiting malicious URLs to deceive users and steal sensitive information.
This project applies machine learning (ML) and deep learning (DL) techniques to detect phishing URLs and explores the impact of parallel execution using multithreading and multiprocessing in Python.
Both sequential and parallel implementations are evaluated and compared in terms of:
- Accuracy
- Precision
- Recall
- F1-score
- Execution time
- Speedup
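
The metrics listed above follow their standard definitions. As a minimal sketch, assuming binary labels where `1` marks a phishing URL (the repository may encode labels differently), they could be computed as shown here. The function names are illustrative and not taken from the repository. Speedup is taken as sequential time divided by parallel time.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def speedup(sequential_seconds, parallel_seconds):
    """Speedup = sequential execution time / parallel execution time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return sequential_seconds / parallel_seconds
```

A speedup above 1 means the parallel run finished faster than the sequential one. Execution times would typically be measured with `time.perf_counter()` around the training call.
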

The following models are evaluated:
- Random Forest (RF)
- Naïve Bayes (NB)
- Convolutional Neural Network (CNN)
- Long Short-Term Memory (LSTM)

Each model is trained and tested using:
- Sequential execution
- Parallel execution with:
  - Python backend threading
  - Threading with `n_jobs`
  - Manual threading
  - Multiprocessing
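
To illustrate how a sequential run and a manually threaded run can be timed against each other, here is a minimal sketch using only the standard library. It is not the repository's code: `train_stub` is a hypothetical stand-in for model training that sleeps to simulate work that releases the GIL, and running one thread per model is an assumption about how the work might be split.

```python
import threading
import time


def train_stub(name, seconds, results):
    """Hypothetical stand-in for training one model.

    time.sleep simulates work that releases the GIL; pure-Python
    CPU-bound code would not speed up under threads.
    """
    time.sleep(seconds)
    results[name] = f"{name}-trained"


def run_sequential(jobs):
    """Run every job one after another and return (results, elapsed)."""
    results = {}
    start = time.perf_counter()
    for name, seconds in jobs:
        train_stub(name, seconds, results)
    return results, time.perf_counter() - start


def run_threaded(jobs):
    """Run every job in its own thread and return (results, elapsed)."""
    results = {}
    threads = [
        threading.Thread(target=train_stub, args=(name, seconds, results))
        for name, seconds in jobs
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for all workers before stopping the clock
    return results, time.perf_counter() - start


# Illustrative job list: model names from this README, made-up durations.
JOBS = [("RF", 0.05), ("NB", 0.05), ("CNN", 0.05), ("LSTM", 0.05)]

if __name__ == "__main__":
    _, t_seq = run_sequential(JOBS)
    _, t_par = run_threaded(JOBS)
    print(f"sequential: {t_seq:.3f}s, threaded: {t_par:.3f}s, "
          f"speedup: {t_seq / t_par:.2f}x")
```

A process-based variant would swap the threads for `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor`, which sidesteps the GIL at the cost of process start-up and data transfer. For scikit-learn estimators that accept it, such as `RandomForestClassifier`, the `n_jobs` parameter requests parallel execution without any manual thread handling.
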

Tools and technologies:
- Python
- Scikit-learn
- TensorFlow / Keras
- Joblib
- Google Colab
- Parallel Computing (Multithreading & Multiprocessing)

Dataset and preprocessing:
- Dataset: Malicious and Benign Webpages Dataset
- Training Samples: 54,000 (balanced)
- Testing Samples: 12,000
- Features: URL-based, content-based, and network-based features
- Feature Selection: Correlation, ANOVA, and Chi-square
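
As an illustration of the correlation-based part of feature selection, the sketch below ranks features by the absolute Pearson correlation between each feature column and the class label. It is a simplified stand-in and not the repository's pipeline: the feature names and values in the usage example are hypothetical, and the ANOVA and Chi-square criteria are not implemented here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Return feature names sorted by |correlation with the label|."""
    scores = {
        name: abs(pearson(values, labels))
        for name, values in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def select_top_k(columns, labels, k):
    """Keep the k features most correlated with the label."""
    return rank_features(columns, labels)[:k]
```

For example, with made-up data:

```python
labels = [1, 1, 0, 0]  # assumed encoding: 1 = phishing, 0 = benign
columns = {
    "url_length": [90, 80, 20, 30],   # hypothetical feature
    "has_https": [0, 1, 0, 1],        # hypothetical feature
}
print(select_top_k(columns, labels, k=1))
```

For the other two criteria, scikit-learn provides `f_classif` (ANOVA F-test) and `chi2` in `sklearn.feature_selection`, usually combined with `SelectKBest`. Whether the repository uses these exact helpers is an assumption.
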

Key results:
- Parallel execution reduced training time, with a maximum speedup of 3.51×
- No performance degradation observed when using parallel techniques
- Best accuracy achieved by Naïve Bayes (96.01%)
- 100% recall achieved by RF, CNN, and LSTM models