Submitted as term project for CS5231, comparing a rule-based and an unsupervised ML-based approach to log scoring

toopieare/CS5231_Proj

Process Flow Visualization Tool for Audit Logs

This tool analyzes and visualizes process relationships, behaviors, and timelines from Linux audit logs generated by auditbeat. It scores each process twice, once with rule-based checks and once with an unsupervised machine learning model (an autoencoder), and presents both sets of results side by side.

Features

  • Parses auditbeat NDJSON log files
  • Builds process hierarchy trees and activity timelines
  • Identifies suspicious process behaviors and security concerns
  • Generates visualizations using Mermaid.js
  • Provides multiple visualization views:
    • Process Tree View: Shows hierarchical relationships
    • Timeline View: Shows process activities over time
    • Analysis Comparison: Compares traditional and ML analysis results
  • Color-coded process classification:
    • 🟢 Green: Root processes (PID 1)
    • 🔵 Blue: Normal processes
    • 🟠 Orange: Privileged processes (running as root)
    • 🔴 Red: Suspicious processes
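
The first two features (NDJSON parsing and process tree building) can be sketched as follows. This is a minimal illustration, not the code in src/data or src/analysis. The field names `process.pid`, `process.ppid`, and `process.parent.pid` are assumptions about the auditbeat event layout, and the function names are hypothetical.

```python
import json
from collections import defaultdict


def parse_ndjson(text):
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupt lines
    return events


def build_process_tree(events):
    """Map each parent PID to the sorted list of its child PIDs."""
    children = defaultdict(set)
    for event in events:
        proc = event.get("process", {})
        pid = proc.get("pid")
        # Assumption: the parent PID is in process.parent.pid or process.ppid.
        ppid = proc.get("parent", {}).get("pid", proc.get("ppid"))
        if pid is not None and ppid is not None:
            children[ppid].add(pid)
    return {ppid: sorted(pids) for ppid, pids in children.items()}


# Made-up sample events, including one malformed line.
SAMPLE_LOG = "\n".join([
    '{"process": {"pid": 100, "ppid": 1, "name": "sshd"}}',
    '{"process": {"pid": 200, "parent": {"pid": 100}, "name": "bash"}}',
    '{"process": {"pid": 201, "parent": {"pid": 100}, "name": "bash"}}',
    'not json',
])
tree = build_process_tree(parse_ndjson(SAMPLE_LOG))
```

Storing children in sets means repeated events for the same process do not create duplicate edges.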

Security Analysis

The tool performs security analysis in four areas:

Process Name Analysis

  • Detection of hex-encoded and obfuscated names
  • Recognition of known suspicious process names
  • Identification of attack indicators
  • Analysis of unusual character distributions
  • Detection of Unicode/non-ASCII characters
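
A rough sketch of how name checks like these might look. The regular expression, the entropy threshold of 3.5 bits, the minimum length of 8, and the `KNOWN_SUSPICIOUS` set are all illustrative assumptions, not values taken from security_analyzer.py.

```python
import math
import re
from collections import Counter

# Assumption: a name made only of 8 or more hex digits (whole bytes) counts as hex-encoded.
HEX_NAME = re.compile(r"^(?:[0-9a-fA-F]{2}){4,}$")
# Illustrative list only; the project's actual list is not shown in this README.
KNOWN_SUSPICIOUS = {"nc", "ncat", "xmrig"}


def shannon_entropy(name):
    """Bits per character of the name's character distribution."""
    counts = Counter(name)
    total = len(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def name_flags(name):
    """Return the list of name-based indicators that fire for a process name."""
    flags = []
    if HEX_NAME.match(name):
        flags.append("hex_encoded")
    if name in KNOWN_SUSPICIOUS:
        flags.append("known_suspicious")
    if not name.isascii():
        flags.append("non_ascii")
    # Assumed thresholds: only score names of 8+ characters, flag above 3.5 bits.
    if len(name) >= 8 and shannon_entropy(name) > 3.5:
        flags.append("high_entropy")
    return flags
```

For example, `name_flags("deadbeefcafe")` reports only `hex_encoded`, while a name containing a Cyrillic look-alike letter reports `non_ascii`.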

Behavior Analysis

  • System call pattern analysis
  • Privilege escalation attempts
  • File system operations
  • Network activity patterns
  • Process manipulation
  • Resource usage patterns

Machine Learning Analysis

  • Unsupervised anomaly detection
  • Process behavior pattern learning
  • Automatic feature extraction
  • Syscall frequency analysis
  • Temporal pattern recognition
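
Syscall frequency analysis can be pictured as turning each process into a fixed-length vector of relative syscall frequencies, which an anomaly detector can then consume. The sketch below assumes events have already been reduced to `(pid, syscall)` pairs; the vocabulary and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical vocabulary; a real one would be derived from the log itself.
SYSCALL_VOCAB = ["execve", "open", "connect", "setuid"]


def syscall_frequency_vectors(events, vocab=SYSCALL_VOCAB):
    """Return {pid: [fraction of that pid's syscalls matching each vocab entry]}."""
    counts = defaultdict(Counter)
    for pid, syscall in events:
        counts[pid][syscall] += 1
    vectors = {}
    for pid, counter in counts.items():
        total = sum(counter.values())  # includes syscalls outside the vocabulary
        vectors[pid] = [counter[name] / total for name in vocab]
    return vectors


# Made-up events for two processes.
events = [
    (100, "open"), (100, "open"), (100, "execve"), (100, "connect"),
    (200, "setuid"), (200, "mmap"),
]
vectors = syscall_frequency_vectors(events)
```

Normalizing by the total count keeps long-running and short-lived processes comparable.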

Security Alerts

  • Suspicious execution patterns
  • Failed operations
  • Privilege abuse
  • Network abuse patterns
  • File system tampering
  • Command injection attempts

Installation

  1. Clone the repository:
git clone https://github.com/toopieare/CS5231_Proj.git
cd CS5231_Proj
  2. Create a virtual environment (recommended):
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

  1. Place your auditbeat log file in the input directory and update config.py to indicate the correct filename
  2. Run the tool:
python main.py
  3. View the visualizations:
  • Open output/process_flow.html for the process hierarchy view
  • Open output/process_gantt.html for the timeline view

Project Structure

CS5231_Proj/
├── config.py           # Configuration settings
├── main.py            # Main application entry point
├── requirements.txt   # Python dependencies
├── src/
│   ├── analysis/      # Analysis modules
│   │   ├── behavior_analyzer.py    # Traditional behavior analysis
│   │   ├── ml_behavior_analyzer.py # Machine learning analysis
│   │   ├── process_tree.py        # Process hierarchy building
│   │   ├── security_analyzer.py    # Security checks
│   │   └── analysis_reporter.py    # Analysis comparison reporting
│   ├── data/          # Data processing
│   │   ├── data_processor.py      # Log data processing
│   │   └── log_loader.py          # Log file handling
│   ├── utils/         # Utility functions
│   └── visualization/ # Visualization generators
│       ├── html_generator.py      # HTML output generation
│       └── mermaid_generator.py   # Diagram generation
├── input/             # Input log files
└── output/           # Generated visualizations

Understanding the Visualizations

Process Tree View

  • Shows hierarchical relationships between processes
  • Indicates process states and security concerns
  • Provides detailed process information and alerts
  • Color-coded for quick status identification

Timeline View

  • Shows process lifetimes and activities
  • Groups processes by type and behavior
  • Indicates activity levels and suspicious patterns
  • Provides temporal context for process behaviors

Edge and Node Types

  • Normal edges (-->) indicate standard relationships
  • Bold edges (==>) indicate suspicious process relationships
  • Red nodes indicate security concerns
  • Orange nodes indicate privileged operations
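
The edge and node conventions above map naturally onto Mermaid flowchart syntax. The following is a sketch of one way to emit such a diagram, not the code in mermaid_generator.py. The process dictionary layout, the hex color values, and the priority order in `classify` (suspicious first, then PID 1, then root-owned) are assumptions.

```python
# Hypothetical fill colors for the four process classes.
COLORS = {
    "root": "#2ecc71",
    "normal": "#3498db",
    "privileged": "#e67e22",
    "suspicious": "#e74c3c",
}


def classify(proc):
    """Pick a class for a process; the priority order is an assumption."""
    if proc.get("suspicious"):
        return "suspicious"
    if proc["pid"] == 1:
        return "root"
    if proc.get("uid") == 0:
        return "privileged"
    return "normal"


def to_mermaid(processes):
    """Render processes as a Mermaid flowchart with styled nodes and edges."""
    lines = ["graph TD"]
    known_pids = {proc["pid"] for proc in processes}
    for proc in processes:
        pid = proc["pid"]
        label = f"{proc['name']} ({pid})"
        lines.append(f'    p{pid}["{label}"]:::{classify(proc)}')
    for proc in processes:
        ppid = proc.get("ppid")
        if ppid not in known_pids:
            continue  # no edge when the parent is absent from the log
        # Bold edge (==>) into a suspicious process, normal edge (-->) otherwise.
        arrow = "==>" if classify(proc) == "suspicious" else "-->"
        lines.append(f"    p{ppid} {arrow} p{proc['pid']}")
    for name, color in COLORS.items():
        lines.append(f"    classDef {name} fill:{color}")
    return "\n".join(lines)


# Made-up process list.
processes = [
    {"pid": 1, "name": "systemd", "uid": 0},
    {"pid": 100, "ppid": 1, "name": "sshd", "uid": 0},
    {"pid": 200, "ppid": 100, "name": "bash", "uid": 1000},
    {"pid": 300, "ppid": 200, "name": "6e63", "uid": 1000, "suspicious": True},
]
diagram = to_mermaid(processes)
```

The resulting string can be pasted into any Mermaid renderer to check the styling.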

ML-Based Analysis

  • Uses an autoencoder for anomaly detection
  • Learns normal process behavior patterns
  • Identifies unusual syscall patterns
  • Provides anomaly scores
  • Adapts to system-specific patterns
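
To illustrate the idea of autoencoder-based anomaly scoring, here is a deliberately tiny pure-Python sketch. It is not the model in ml_behavior_analyzer.py, whose architecture and library are not described in this README. The class name, layer sizes, learning rate, and sample vectors are all assumptions. The model learns to reconstruct normal syscall-frequency vectors, and the reconstruction error serves as the anomaly score.

```python
import math
import random


class TinyAutoencoder:
    """One tanh hidden layer and a linear output, trained by plain SGD."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_inputs)]
        self.b2 = [0.0] * n_inputs

    def _forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return hidden, output

    def score(self, x):
        """Anomaly score: mean squared reconstruction error."""
        _, output = self._forward(x)
        return sum((o - v) ** 2 for o, v in zip(output, x)) / len(x)

    def fit(self, samples, epochs=300, lr=0.1):
        for _ in range(epochs):
            for x in samples:
                hidden, output = self._forward(x)
                err = [o - v for o, v in zip(output, x)]
                # Gradient at the hidden layer, computed before w2 is updated.
                grad_h = [
                    sum(err[i] * self.w2[i][j] for i in range(len(x)))
                    * (1 - hidden[j] ** 2)
                    for j in range(len(hidden))
                ]
                for i in range(len(x)):
                    for j in range(len(hidden)):
                        self.w2[i][j] -= lr * err[i] * hidden[j]
                    self.b2[i] -= lr * err[i]
                for j in range(len(hidden)):
                    for m in range(len(x)):
                        self.w1[j][m] -= lr * grad_h[j] * x[m]
                    self.b1[j] -= lr * grad_h[j]


# Made-up frequency vectors: normal processes are dominated by the first syscall.
normal = [
    [0.80, 0.15, 0.05], [0.75, 0.20, 0.05], [0.85, 0.10, 0.05],
    [0.78, 0.17, 0.05], [0.82, 0.13, 0.05],
]
unusual = [0.05, 0.15, 0.80]

model = TinyAutoencoder(n_inputs=3, n_hidden=1)
model.fit(normal)
normal_scores = [model.score(x) for x in normal]
unusual_score = model.score(unusual)
```

Because the model only ever sees normal vectors, it reconstructs them well and reconstructs the unusual vector poorly, so `unusual_score` ends up above every entry in `normal_scores`. Training on the target system's own logs is what lets such a detector adapt to system-specific patterns.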

Comparative Visualization

  • Interactive scatter plot of analysis scores
  • Score distribution histograms
  • Detailed process-level metrics
  • Category-based score breakdowns
  • Debug information for validation
