Malware Classifier: ML-Powered PE File Detection | Quantic MSSE Intro to Machine Learning Project
Author: WisdomWord GH
Uploaded: 2026-02-28
Views: 8
Description:
Malware Classifier using Machine Learning
Quantic School of Business and Technology - MSSE Program
This video demonstrates a production-ready malware classification system that analyzes PE (Portable Executable) files using machine learning. The project was completed in partial fulfillment of the Introduction to Machine Learning course at Quantic School of Business and Technology.
Live Application:
https://tetteh-apotey-malware-classif...
GitHub Repository:
https://github.com/life2allsofts/malw...
(Private - quantic-grader added as collaborator)
PROJECT OVERVIEW
The application uses an XGBoost model with 17 PE header features to classify executable files as malware or benign software. Key features include:
98.03% accuracy on test set
99.59% AUC-ROC for excellent discrimination
17 PE header features (no data leakage)
Bias correction using a 0.6 decision threshold
Fully automated CI/CD pipeline (51 successful runs)
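The 0.6 threshold can be read as a decision rule on the model's malware probability. The sketch below assumes the model outputs P(malware) and that scores at or above the cutoff are labeled MALWARE; the function name classify and that comparison direction are assumptions, not taken from the project code.

```python
MALWARE_THRESHOLD = 0.6  # cutoff stated in the project overview


def classify(p_malware: float, threshold: float = MALWARE_THRESHOLD) -> str:
    """Map P(malware) in [0, 1] to a label.

    Assumption: scores at or above the threshold count as MALWARE.
    """
    if not 0.0 <= p_malware <= 1.0:
        raise ValueError(f"probability out of range: {p_malware}")
    return "MALWARE" if p_malware >= threshold else "BENIGN"


if __name__ == "__main__":
    for p in (0.35, 0.55, 0.60, 0.92):
        # 0.55 would be MALWARE at the usual 0.5 cutoff, but BENIGN at 0.6
        print(f"{p:.2f} -> {classify(p)}")
```

Under this rule a file scored 0.55 is labeled BENIGN, whereas the conventional 0.5 cutoff would flag it. If the uncorrected model leaned toward the MALWARE class, raising the cutoff trades a little recall for fewer false positives.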
APPLICATION FEATURES
File Upload Analysis
Upload .exe, .dll, .sys, .ocx, .scr, .cpl files
Extracts SHA-256 hash and entropy
Real-time prediction with confidence scores
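A minimal sketch of the hash and entropy step, using only the Python standard library. It assumes "entropy" means Shannon entropy over the whole file's bytes (0 to 8 bits per byte); the project may instead compute it per PE section, and the helper names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def _entropy_from_counts(counts: Counter, total: int) -> float:
    """Shannon entropy in bits per byte, from byte-value frequencies."""
    if total == 0:
        return 0.0
    # sum of p * log2(1/p); written this way so a constant buffer gives +0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sha256_and_entropy(data: bytes) -> tuple[str, float]:
    """Hex SHA-256 digest and byte entropy of an in-memory buffer."""
    return hashlib.sha256(data).hexdigest(), _entropy_from_counts(Counter(data), len(data))


def file_sha256_and_entropy(path: str, chunk_size: int = 1 << 20) -> tuple[str, float]:
    """Same as sha256_and_entropy, but streams the file in chunks."""
    hasher = hashlib.sha256()
    counts: Counter = Counter()
    total = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            hasher.update(chunk)
            counts.update(chunk)  # iterating bytes yields ints 0..255
            total += len(chunk)
    return hasher.hexdigest(), _entropy_from_counts(counts, total)


if __name__ == "__main__":
    digest, entropy = sha256_and_entropy(bytes(range(256)) * 4)
    print(digest, round(entropy, 3))  # uniform byte values -> entropy 8.0
```

Packed or encrypted executables tend to sit near the top of the 0 to 8 range, which is one reason entropy is a common malware signal. Reading in chunks keeps memory use flat for large uploads.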
Manual Input
Enter all 17 PE features manually
Sample templates for testing
Understand how features influence predictions
Batch Processing
CSV upload for multiple files
Download predictions.csv with results
Ideal for bulk analysis
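The batch mode can be approximated with the standard csv module: read one row per file, score it, and append result columns before writing predictions.csv. In this sketch score_fn stands in for the trained XGBoost model, and the output column names malware_probability and prediction are assumptions; the project's actual CSV schema is not shown here.

```python
import csv
import io
from typing import Callable, Iterable, TextIO

DEFAULT_THRESHOLD = 0.6  # decision threshold stated in the project overview


def predict_csv(
    src: Iterable[str],
    dst: TextIO,
    score_fn: Callable[[dict[str, str]], float],
    threshold: float = DEFAULT_THRESHOLD,
) -> int:
    """Copy each input row to dst with two appended result columns.

    score_fn is a hypothetical stand-in for the trained model: it receives one
    row (column name -> string value) and returns P(malware) in [0, 1].
    Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["malware_probability", "prediction"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    written = 0
    for row in reader:
        p = score_fn(row)
        row["malware_probability"] = f"{p:.4f}"
        row["prediction"] = "MALWARE" if p >= threshold else "BENIGN"
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    # Toy input and a toy scorer (entropy / 8) purely for demonstration.
    # In an app, dst could be open("predictions.csv", "w", newline="").
    upload = io.StringIO("file_id,entropy\nsample_a,7.6\nsample_b,3.1\n")
    out = io.StringIO()
    predict_csv(upload, out, lambda row: float(row["entropy"]) / 8.0)
    print(out.getvalue())
```

Passing the model in as a function keeps the CSV handling testable without loading XGBoost; in the app the same loop would wrap the real predictor.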
Model Information
Feature importance visualization
Confusion matrix and performance metrics
Complete transparency
CI/CD PIPELINE
The project includes a fully automated GitHub Actions pipeline that:
Runs tests on every push (16 tests in 46 seconds)
Checks for prior bias and model sanity
Auto-deploys to Hugging Face Spaces on success
Performs smoke tests to verify deployment
Total workflow runs: 51 | Latest status: Passing
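The repository's test code is not reproduced here, so the following is only a pytest-style sketch of what a "prior bias and model sanity" check might look like. It assumes the model's malware probabilities for a small mixed reference set are available as a list; the 5% to 95% bounds and the function names are arbitrary, hypothetical choices.

```python
from typing import Sequence


def sanity_problems(
    probs: Sequence[float],
    n_inputs: int,
    threshold: float = 0.6,
    rate_bounds: tuple[float, float] = (0.05, 0.95),
) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    if len(probs) != n_inputs:
        problems.append(f"expected {n_inputs} predictions, got {len(probs)}")
    # the chained comparison is False for NaN, so NaN is flagged too
    if any(not 0.0 <= p <= 1.0 for p in probs):
        problems.append("probability outside [0, 1]")
    if probs:
        rate = sum(p >= threshold for p in probs) / len(probs)
        low, high = rate_bounds
        if not low <= rate <= high:
            problems.append(f"malware rate {rate:.0%} outside [{low:.0%}, {high:.0%}]")
    return problems


def test_model_is_not_collapsed():
    # Hypothetical stand-in for the model's P(malware) on a small, mixed
    # reference set kept in the repository.
    probs = [0.02, 0.10, 0.45, 0.71, 0.93, 0.99]
    assert sanity_problems(probs, n_inputs=6) == []
```

A collapsed model that labels everything MALWARE (or everything BENIGN) fails the rate check even when its output format is valid, which is the kind of regression a push-triggered pipeline should block before deployment.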
MODEL PERFORMANCE
Metric       Value
Accuracy     98.03%
Precision    98.24%
Recall       98.37%
F1-Score     98.30%
AUC-ROC      99.59%
Confusion Matrix (Test Set):
                  Predicted BENIGN   Predicted MALWARE
Actual BENIGN     1648               42
Actual MALWARE    38                 2287
False Positives (benign files flagged as malware): 42
False Negatives (malware classified as benign): 38
Total Errors: 80 (1.99% error rate on 4,015 test samples)
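The threshold-based metrics follow directly from the four counts in the confusion matrix, with MALWARE as the positive class (AUC-ROC needs the raw scores, so it cannot be derived this way). A small helper makes the arithmetic explicit; the function name is hypothetical, and it does not guard against empty classes.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Standard binary metrics with MALWARE as the positive class."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)  # share of MALWARE verdicts that were correct
    recall = tp / (tp + fn)     # share of actual malware that was caught
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),  # share of benign files kept BENIGN
    }


if __name__ == "__main__":
    # Counts from the test-set confusion matrix
    m = metrics_from_confusion(tn=1648, fp=42, fn=38, tp=2287)
    for name, value in m.items():
        print(f"{name:>11}: {value:.2%}")
```

For these counts it prints accuracy 98.01%, error rate 1.99%, precision 98.20%, recall 98.37%, F1 98.28% and specificity 97.51%. Recall and the error rate match the reported figures, while the accuracy, precision and F1 in the metrics table are a few hundredths of a percentage point higher than these counts imply.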
TECHNOLOGIES USED
Machine Learning: XGBoost, scikit-learn, pandas, numpy
Web Framework: Flask, Jinja2 templates
Deployment: Hugging Face Spaces, Docker
CI/CD: GitHub Actions
AI Tools: DeepSeek AI (97%), ChatGPT (2%), GitHub Copilot (1%)
DOCUMENTATION
All project documentation is available in the GitHub repository:
Evaluation and Design:
https://github.com/life2allsofts/malw...
AI Tooling Strategy:
https://github.com/life2allsofts/malw...
Deployment Information:
https://github.com/life2allsofts/malw...
Results and Metrics:
https://github.com/life2allsofts/malw...
ABOUT THE DEVELOPER
Isaac Tetteh-Apotey
MSSE Candidate, Quantic School of Business and Technology
Geomatics Engineer & Software Engineering Researcher
GitHub: https://github.com/life2allsofts
Portfolio: https://tetteh-apotey.vercel.app/
LinkedIn: / isaac-tetteh-apotey-67408b89
PROJECT TIMELINE
Started: February 17, 2026
Completed: February 28, 2026
Development Time: 11 days
CI/CD Runs: 51 successful workflows
DISCLAIMER
This application is intended for educational and research purposes only. The model should not be used as the sole determinant for malware classification in production environments without additional validation.
For questions about this project, please reach out via GitHub or LinkedIn.