Detecting Data Exfiltration Anomalies in Academic Networks Using the Isolation Forest Algorithm
Date
2025
Publisher
KCA University
Abstract
Academic networks face an elevated risk of data exfiltration because they hold sensitive personal information and research data. Traditional supervised detection models rely on labeled datasets, which are often unavailable in resource-constrained institutions. This study investigates the applicability of the unsupervised Isolation Forest algorithm for detecting anomalous network traffic indicative of data exfiltration. The research used the CICIDS2017 dataset, focusing on the Thursday-WorkingHours-Afternoon-Infiltration subset. Key features, including Flow Duration, Total Fwd Packets, Flow Bytes/s, Flow IAT Mean, and Destination Port, were preprocessed and normalized for modeling. The model achieved a precision of 1.00, a recall of 0.99, and an F1-score of 1.00 for anomalous traffic detection, flagging approximately 4.8% of flows as anomalous. Comparative analysis with previous methods, including supervised Random Forest and SVM, showed that Isolation Forest offers competitive accuracy with lower computational overhead and does not require labeled data. The findings highlight the algorithm’s suitability for academic network monitoring, providing an effective early-warning mechanism while emphasizing the importance of threshold tuning to reduce false positives.
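
For readers unfamiliar with the technique, the following is a minimal, dependency-free sketch of how an Isolation Forest scores flows and how a flagging threshold can be tuned. It is an illustration only, not the study's implementation, which the abstract does not describe; in practice a library implementation such as scikit-learn's `IsolationForest` would normally be used. All function names, the two-feature synthetic "flows", and the choice to flag the top 4.8% of scores (borrowed from the proportion reported above) are assumptions made for this example. Each point receives the standard score 2^(-E[h(x)] / c(n)), where h(x) is its path length in a random tree and c(n) is the average path length for n samples.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:  # this feature cannot separate the remaining rows
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {
        "feat": feat,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(row, node):
    """Depth at which `row` lands, plus c(size) for the unexpanded leaf."""
    depth = 0
    while "size" not in node:
        node = node["left"] if row[node["feat"]] < node["split"] else node["right"]
        depth += 1
    return depth + avg_path_length(node["size"])

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score per row; higher means the row is isolated sooner."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))          # subsample size per tree
    max_depth = math.ceil(math.log2(psi))      # usual height limit
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [
        2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
        for r in rows
    ]

# Synthetic stand-in data (NOT CICIDS2017): 200 ordinary flows plus one
# extreme flow, each described by two already-normalized features.
data_rng = random.Random(42)
flows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
flows.append((10.0, 10.0))

scores = anomaly_scores(flows)

# Threshold tuning: flag the top 4.8% of scores (an assumed contamination rate).
k = max(1, round(0.048 * len(flows)))
threshold = sorted(scores, reverse=True)[k - 1]
flagged = [i for i, s in enumerate(scores) if s >= threshold]
```

Raising or lowering the assumed 4.8% cut-off moves `threshold` and directly trades missed anomalies against false positives, which is the tuning concern the abstract raises.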
Keywords
Anomaly Detection, Data Exfiltration, Machine Learning, Isolation Forest, Academic Networks