This project aims to perform anomaly detection on raw logs using ensemble machine learning models; it uses IsolationForest and Levenshtein distance to determine outliers. The following steps outline the process:
- Input Data: Input data can be a root directory of static logs or live logs (logs that are currently being written to).
- Database Record Creation: The program will walk the directory and find logs to monitor, creating a database record for each one.
- Log Feeding: Logs are fed into the program simply by being present in the specified root directory, which is polled for changes.
- Log Monitoring: Similar to tail, each log file's attributes (size, modified date, created date, etc.) are continually hashed and compared against the database to detect changes (see the polling sketch just after this list). This approach allows us to monitor a large number of logs and process them in chunks, instead of opening file watchers and eating up resources.
- File Polling Interval: The polling interval is configurable, depending on how frequently you'd like to check the logs for changes; every monitored log is checked on each poll.
- Log Change Handling: When a log change is found, a chunk of the log is sent to the log parser.
- Log Cleanup and Parsing: Logs are cleaned up and parsed, removing any empty lines and duplicates within that chunk (a minimal cleanup sketch appears below, after the run commands).
- Individual Model Creation: One model is created per log, so that each machine learning model is trained on data specific to that application's logging profile.
- Master Model: Once the individual models are trained on their log data, you can enable the master model, which polls the individual models and performs anomaly detection at a bird's-eye level. This gives a reasonably accurate view of the anomalies detected across the entire logging root directory or your application suite.
- Future Integration: This is a POC and will likely include integration with a timeseries database to better visualize and tag the anomalies as they come into the monitoring system (similar to Splunk).
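The attribute-hash polling described above might look roughly like the sketch below. The `poll_for_changes` helper, the in-memory `known_hashes` dict standing in for the database records, the `*.log` glob, and the 5-second default interval are illustrative assumptions, not the project's actual API:

```python
import hashlib
import time
from pathlib import Path

def attribute_hash(path: Path) -> str:
    """Hash cheap file attributes (size, mtime) instead of reading the whole file."""
    stat = path.stat()
    fingerprint = f"{path}:{stat.st_size}:{stat.st_mtime}"
    return hashlib.sha256(fingerprint.encode()).hexdigest()

def poll_for_changes(log_dir: str, known_hashes: dict, poll_interval: float = 5.0):
    """Walk the root directory on each poll and yield paths whose attributes changed."""
    while True:
        for path in Path(log_dir).rglob("*.log"):
            digest = attribute_hash(path)
            if known_hashes.get(path) != digest:
                known_hashes[path] = digest
                yield path  # the caller reads the new chunk and hands it to the parser
        time.sleep(poll_interval)
```

Because only attributes are hashed per poll, the cost of watching many logs stays low; file contents are read only after a change is detected.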
```bash
./run.sh
python3 main.py --log_dir sample_input_logs
```
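The chunk cleanup step mentioned earlier (dropping empty lines and duplicates) could be as small as the following sketch; `clean_chunk` is a hypothetical name, not the project's actual function:

```python
def clean_chunk(chunk: str) -> list[str]:
    """Strip whitespace, drop empty lines, and de-duplicate lines while preserving order."""
    seen = set()
    cleaned = []
    for line in chunk.splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            cleaned.append(line)
    return cleaned

print(clean_chunk("INFO start\n\nINFO start\nERROR timeout\n"))
# ['INFO start', 'ERROR timeout']
```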
Let's start by defining the core classes, functions, and methods that will be necessary for this project:
- `LogRetriever`: This class will be responsible for retrieving logs from different sources. It will have methods like `retrieve_from_cloudwatch` and `retrieve_from_filesystem`.
- `DatabaseManager`: This class will handle all database operations. It will have methods like `store_log_entry`, `get_log_entry`, `update_log_entry`, and `delete_log_entry`.
- `LogMonitor`: This class will monitor the logs for changes. It will have methods like `check_for_changes` and `handle_log_change`.
- `TaskScheduler`: This class will handle the scheduling of tasks. It will have methods like `schedule_task` and `execute_task`.
- `LogParser`: This class will parse the log lines. It will have methods like `parse_log_line`.
- `ModelManager`: This class will handle all operations related to the model. It will have methods like `feed_log_line`, `extract_features`, `train_model`, `detect_anomalies`, and `update_model`.
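As a rough illustration of how `extract_features`, `train_model`, and `detect_anomalies` could fit together around IsolationForest and Levenshtein distance, here is a minimal sketch; the feature set, the hand-rolled edit-distance function, the `contamination` value, and the use of scikit-learn are assumptions for illustration, not the final implementation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two log lines."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def extract_features(lines: list[str], reference: str) -> np.ndarray:
    """Per-line numeric features: length, token count, and edit distance to a reference line."""
    return np.array([[len(l), len(l.split()), levenshtein(l, reference)] for l in lines])

# One model per log file, trained on that log's own history.
lines = ["INFO user login ok", "INFO user login ok", "ERROR disk full on /var"]
X = extract_features(lines, reference=lines[0])
model = IsolationForest(contamination=0.1, random_state=42).fit(X)
print(model.predict(X))  # -1 marks lines the model considers anomalous
```

A master model could then aggregate the per-log anomaly scores (for example via `model.decision_function`) to provide the bird's-eye view described above.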