GitHub Issues Contract Violation Analyzer

A tool that analyzes GitHub issues to identify and categorize API contract violations, using large language models (LLMs) called through a configurable OpenAI API client.

Features

Advanced Contract Analysis: Leverages LLMs to analyze GitHub issues for potential API contract violations
Multiple Storage Options:
- JSON storage for detailed analysis results
- CSV export functionality for data analysis
- MongoDB integration for scalable data storage
Robust Data Processing:
- Support for both direct GitHub API fetching and CSV file input
- Automatic checkpointing for long-running analyses
- Intermediate results saving
- Graceful shutdown handling
Modular Architecture:
- Pluggable storage backends
- Extensible analyzer framework
- Configurable LLM clients
Progress Tracking: Real-time progress monitoring with customizable trackers

Installation

Clone the repository:

git clone https://github.com/thromel/llm-contracts-research.git
cd llm-contracts-research

Install dependencies:

pip install -r requirements.txt

Create a .env file with your configuration (the repository includes a .env.template as a starting point):

# API Keys
GITHUB_TOKEN=your_github_token
OPENAI_API_KEY=your_openai_key

# OpenAI Settings
OPENAI_MODEL=your_model_name
OPENAI_BASE_URL=your_api_base_url
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=2000
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0

# MongoDB Settings (Optional)
MONGODB_URI=your_mongodb_uri
MONGODB_DB=your_database_name
MONGODB_ENABLED=true

# Analysis Settings
BATCH_SIZE=50
MAX_COMMENTS_PER_ISSUE=10
DEFAULT_LOOKBACK_DAYS=1000
SAVE_INTERMEDIATE=true
JSON_EXPORT=true
CSV_EXPORT=true
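
As an illustration of how the analysis settings above might be read at startup, here is a minimal sketch. It is not the contents of src/config/settings.py: the names AnalysisSettings, load_settings, and _env_bool are hypothetical, and the fallback defaults simply mirror the example values shown above (with MongoDB assumed off when unset, since it is marked optional).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Treat common truthy strings as True; any other value is False."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AnalysisSettings:
    """Hypothetical container for the analysis-related .env keys."""
    batch_size: int
    max_comments_per_issue: int
    default_lookback_days: int
    save_intermediate: bool
    json_export: bool
    csv_export: bool
    mongodb_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> AnalysisSettings:
    """Build settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    return AnalysisSettings(
        # Defaults below are assumptions copied from the example .env values.
        batch_size=int(env.get("BATCH_SIZE", "50")),
        max_comments_per_issue=int(env.get("MAX_COMMENTS_PER_ISSUE", "10")),
        default_lookback_days=int(env.get("DEFAULT_LOOKBACK_DAYS", "1000")),
        save_intermediate=_env_bool(env, "SAVE_INTERMEDIATE", True),
        json_export=_env_bool(env, "JSON_EXPORT", True),
        csv_export=_env_bool(env, "CSV_EXPORT", True),
        mongodb_enabled=_env_bool(env, "MONGODB_ENABLED", False),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to test without touching the real environment.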

Project Structure

src/
├── analysis/
│   ├── core/
│   │   ├── analyzers/
│   │   │   ├── contract_analyzer.py    # Core contract analysis logic
│   │   │   ├── github.py              # GitHub-specific analysis
│   │   │   └── orchestrator.py        # Analysis orchestration
│   │   ├── clients/
│   │   │   ├── github.py              # GitHub API client
│   │   │   └── openai.py             # OpenAI API client
│   │   ├── processors/
│   │   │   ├── cleaner.py            # Response cleaning
│   │   │   ├── validator.py          # Analysis validation
│   │   │   └── checkpoint.py         # Checkpoint management
│   │   ├── storage/
│   │   │   ├── json_storage.py       # JSON storage implementation
│   │   │   ├── csv_storage.py        # CSV storage implementation
│   │   │   └── mongodb/              # MongoDB integration
│   │   └── dto/                      # Data transfer objects
│   └── main.py                       # Main entry point
├── config/
│   └── settings.py                   # Configuration settings
└── utils/
    └── logger.py                     # Logging utilities

Usage

Basic Usage

Analyzing issues from a GitHub repository:

python -m src.analysis.main --repo owner/repo --issues 100

Analyzing issues from a CSV file:

python -m src.analysis.main --input-csv path/to/issues.csv

Advanced Options

--resume: Resume from the last checkpoint if available
--checkpoint-interval N: Create checkpoints every N issues (default: 5)
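
The flags documented above could be declared with argparse roughly as follows. This is a sketch rather than the project's actual parser: build_parser is a hypothetical name, and treating --repo and --input-csv as mutually exclusive is an assumption based on the two usage modes shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="python -m src.analysis.main")

    # Assumption: issues come from exactly one source per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo", help="Repository in owner/repo form")
    source.add_argument("--input-csv", help="Path to a CSV file of issues")

    parser.add_argument("--issues", type=int,
                        help="Number of issues to analyze")
    parser.add_argument("--resume", action="store_true",
                        help="Resume from the last checkpoint if available")
    parser.add_argument("--checkpoint-interval", type=int, default=5,
                        help="Create checkpoints every N issues (default: 5)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```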

Storage Configuration

The analyzer supports multiple storage backends that can be configured in your .env file:

JSON Storage: Enable with JSON_EXPORT=true
CSV Storage: Enable with CSV_EXPORT=true
MongoDB Storage: Enable with MONGODB_ENABLED=true and configure connection settings
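
To make the flag-driven backend selection concrete, the sketch below maps the JSON_EXPORT and CSV_EXPORT flags to storage classes through a small factory. All class and function names here are hypothetical stand-ins, not the project's own; the backends serialize to strings instead of writing files, and MongoDB is left out so the example stays self-contained.

```python
import csv
import io
import json
from typing import Dict, List, Mapping, Protocol, Type


class Storage(Protocol):
    """Common interface every backend is assumed to expose."""
    def save(self, records: List[dict]) -> str: ...


class JsonStorage:
    def save(self, records: List[dict]) -> str:
        return json.dumps(records, indent=2)


class CsvStorage:
    def save(self, records: List[dict]) -> str:
        buf = io.StringIO()
        if records:
            # Column order follows the keys of the first record.
            writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
            writer.writeheader()
            writer.writerows(records)
        return buf.getvalue()


# Maps each .env flag to the backend it enables.
_BACKENDS: Dict[str, Type] = {
    "JSON_EXPORT": JsonStorage,
    "CSV_EXPORT": CsvStorage,
}


def create_storages(env: Mapping[str, str]) -> List[Storage]:
    """Instantiate one backend per flag that is set to 'true'."""
    return [
        backend()
        for flag, backend in _BACKENDS.items()
        if env.get(flag, "").strip().lower() == "true"
    ]
```

Because callers only see the save method, adding another backend in this sketch means adding one class and one entry in _BACKENDS.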

Examples

Analyze 50 issues with custom checkpoint interval:

python -m src.analysis.main --repo openai/openai-python --issues 50 --checkpoint-interval 10

Resume a previously interrupted analysis:

python -m src.analysis.main --repo openai/openai-python --issues 50 --resume

Analyze issues from a CSV file:

python -m src.analysis.main --input-csv data/raw/github_issues.csv

Output Files

The analyzer generates several output files in the data/analyzed directory:

JSON Output:
- github_issues_analysis_TIMESTAMP_raw.json: Raw analysis data
- github_issues_analysis_TIMESTAMP_final.json: Final analysis results
CSV Output:
- github_issues_analysis_TIMESTAMP_final.csv: Tabular format of analysis results
Checkpoints:
- analysis_checkpoint.json: Temporary checkpoint file
- intermediate/: Directory containing intermediate analysis results
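
The interplay between analysis_checkpoint.json, --checkpoint-interval, and --resume can be sketched as below. This is one plausible scheme under stated assumptions, not the project's checkpoint.py: the file layout ({"results": [...]}), the function names, and the assumption that issues arrive in the same order on every run are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def load_checkpoint(path: Path) -> List[dict]:
    """Return previously saved results, or an empty list if none exist."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))["results"]


def save_checkpoint(path: Path, results: List[dict]) -> None:
    """Write to a temp file, then replace, so a crash never leaves a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"results": results}, fh)
    os.replace(tmp_name, path)


def run(issues: Sequence, analyze: Callable[[object], dict], path: Path,
        interval: int = 5, resume: bool = False) -> List[dict]:
    """Analyze issues in order, checkpointing every `interval` results."""
    results = load_checkpoint(path) if resume else []
    # Assumes a stable issue order, so saved results cover the first N issues.
    for issue in issues[len(results):]:
        results.append(analyze(issue))
        if len(results) % interval == 0:
            save_checkpoint(path, results)
    save_checkpoint(path, results)
    return results
```

With this scheme, an interrupted run loses at most interval - 1 analyzed issues, which is the trade-off behind choosing a smaller or larger checkpoint interval.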

Architecture

Core Components

Analyzers:
- ContractAnalyzer: Core analysis logic for contract violations
- GitHubIssuesAnalyzer: GitHub-specific implementation
- AnalysisOrchestrator: Coordinates the analysis process
Storage:
- Modular storage system with support for multiple backends
- Factory pattern for storage creation
- Adapter pattern for consistent interface
Processors:
- Response cleaning and validation
- Checkpoint management
- Progress tracking

Design Patterns

Factory Pattern: Used for storage backend creation
Strategy Pattern: Used for different analysis strategies
Adapter Pattern: Used for storage implementations
Observer Pattern: Used for progress tracking
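
As a small illustration of the Observer pattern applied to progress tracking, the sketch below lets any number of listeners subscribe to a tracker and be notified on each step. ProgressTracker and its methods are hypothetical names chosen for this example and are not taken from the project's code.

```python
from typing import Callable, List

# A listener receives (completed, total) after every update.
ProgressListener = Callable[[int, int], None]


class ProgressTracker:
    """Subject that notifies subscribed listeners as work advances."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0
        self._listeners: List[ProgressListener] = []

    def subscribe(self, listener: ProgressListener) -> None:
        self._listeners.append(listener)

    def advance(self, step: int = 1) -> None:
        # Clamp so the reported count never exceeds the total.
        self.done = min(self.total, self.done + step)
        for listener in self._listeners:
            listener(self.done, self.total)


if __name__ == "__main__":
    tracker = ProgressTracker(total=3)
    tracker.subscribe(lambda done, total: print(f"{done}/{total} issues analyzed"))
    for _ in range(3):
        tracker.advance()
```

The analysis loop only calls advance; whether progress goes to the console, a log file, or a progress bar is decided by whichever listeners are subscribed.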

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests
Submit a pull request

Development Guidelines

Follow PEP 8 style guide
Add type hints to all functions
Write unit tests for new features
Update documentation for significant changes

License

This project is licensed under the MIT License - see the LICENSE file for details.

How to Contribute

We welcome contributions from the community! If you'd like to contribute improvements, fixes, or new features, please follow these guidelines:

Fork the repository and clone your fork.
Create a new branch for your changes (e.g., feature/your-feature or fix/issue-number).
Make your changes with clear, concise commit messages.
Ensure that your code adheres to the project's coding style (PEP 8).
Write tests for your changes where applicable.
Push your branch and open a pull request describing your changes.
Consult the issue tracker before starting major changes to avoid duplicating effort.

Thank you for your interest in contributing to GitHub Issues Contract Violation Analyzer!
