A command-line tool that converts code repositories into text format, making them suitable for use as context in Large Language Models (LLMs). Supports both local repositories and GitHub remote repositories.
- Convert local Git repositories to text format
- Convert GitHub repositories to text format (public and private)
- Process specific subfolders in monorepos
- Respect .gitignore patterns for local repositories
- Skip binary files automatically
- Structured output with clear file demarcation
- Token counting with OpenAI tokenizer
- Cost estimation for GPT-3.5 and GPT-4
pip install repo-to-singlefile
- Convert a local repository:
repo-to-singlefile /path/to/local/repo output.txt
- Convert a public GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt
- Convert a private GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt --github-token YOUR_GITHUB_TOKEN
Process only specific subfolders in a repository:
- Local monorepo:
repo-to-singlefile /path/to/repo output.txt --subfolder packages/mylib
- GitHub monorepo:
repo-to-singlefile https://github.com/owner/repo output.txt --subfolder packages/mylib
The generated text file contains the contents of all text files in the repository, with clear headers separating each file:
### File: src/main.py ###
[content of main.py]
### File: src/utils.py ###
[content of utils.py]
...
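If you need to get individual files back out of the generated text, the headers make that straightforward. The following is a minimal sketch, not part of the tool: it assumes every header has exactly the `### File: <path> ###` shape shown above, and `split_output` is a hypothetical helper name.

```python
import re

# Assumed header shape, copied from the example output: "### File: <path> ###"
HEADER = re.compile(r"^### File: (.+) ###$")


def split_output(text: str) -> dict:
    """Map each file path to the text that follows its header."""
    files = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header starts a new section.
            current = match.group(1)
            files[current] = []
        elif current is not None:
            files[current].append(line)
    # Join the collected lines and trim surrounding blank lines.
    return {path: "\n".join(lines).strip() for path, lines in files.items()}


sample = (
    "### File: src/main.py ###\n"
    "print('hi')\n"
    "\n"
    "### File: src/utils.py ###\n"
    "X = 1\n"
)
sections = split_output(sample)
print(sorted(sections))
```

A source line that happens to match the header pattern would be treated as a new section, so treat this as a convenience for inspection rather than a lossless round trip.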
After processing, you'll see a summary that includes:
- Total token count
- Total character count
- Estimated costs for GPT-3.5 and GPT-4 usage
Example summary:
==================================================
CONVERSION SUMMARY
==================================================
Total tokens: 15,234
Total characters: 45,678
Estimated costs (based on current OpenAI pricing):
GPT-4:
- Input cost: $0.46
- Output cost: $0.91
GPT-3.5:
- Input cost: $0.02
- Output cost: $0.03
==================================================
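The cost figures are a per-token rate multiplied by the token count. The sketch below reproduces the example numbers under assumed rates (USD per 1,000 tokens); these rates were picked because they match the example summary, not read from the tool or from a current price list, and `estimate_costs` is a hypothetical helper name.

```python
# Assumed USD rates per 1,000 tokens. They reproduce the example summary
# above but may not match the tool's internals or current OpenAI pricing.
ASSUMED_RATES = {
    "GPT-4": {"input": 0.03, "output": 0.06},
    "GPT-3.5": {"input": 0.0015, "output": 0.002},
}


def estimate_costs(total_tokens: int) -> dict:
    """Return the estimated cost per model and direction, rounded to cents."""
    return {
        model: {
            kind: round(total_tokens / 1000 * rate, 2)
            for kind, rate in rates.items()
        }
        for model, rates in ASSUMED_RATES.items()
    }


costs = estimate_costs(15234)
for model, by_kind in costs.items():
    print(model, by_kind)
```

The output-cost line treats the whole token count as if it were generated text, so it is best read as an upper bound for comparison, not as a bill for a single request.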
The tool automatically:
- Respects .gitignore patterns in local repositories
- Skips binary files
- Processes common text file extensions:
  - Python (.py)
  - JavaScript (.js)
  - Java (.java)
  - C++ (.cpp, .h)
  - Web (.html, .css)
  - Documentation (.md)
  - Config files (.yml, .yaml, .json)
  - Shell scripts (.sh)
  - Text files (.txt)
  - XML files (.xml)
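To make the filtering rules concrete, here is one way such a check could look. It is a sketch under assumptions: the extension set is copied from the list above, but the binary test (a NUL byte in the first 1024 bytes) is a common heuristic chosen for illustration, and `looks_binary` and `should_include` are hypothetical names, not the tool's actual functions.

```python
from pathlib import Path

# Extensions copied from the list above.
TEXT_EXTENSIONS = {
    ".py", ".js", ".java", ".cpp", ".h", ".html", ".css",
    ".md", ".yml", ".yaml", ".json", ".sh", ".txt", ".xml",
}


def looks_binary(head: bytes) -> bool:
    # Assumed heuristic: text files rarely contain a NUL byte near the start.
    return b"\x00" in head[:1024]


def should_include(path: str, head: bytes) -> bool:
    """Keep a file only if its extension is listed and its start looks like text."""
    has_text_extension = Path(path).suffix.lower() in TEXT_EXTENSIONS
    return has_text_extension and not looks_binary(head)


print(should_include("src/main.py", b"print(1)\n"))
print(should_include("logo.png", b"\x89PNG\r\n"))
```

Here `head` stands for the first bytes read from the file; reading only a small prefix keeps the check cheap on large repositories.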
For private repositories, you'll need a GitHub personal access token:
- Generate a token at https://github.com/settings/tokens
- Use the token with the --github-token option:
repo-to-singlefile https://github.com/owner/private-repo output.txt --github-token YOUR_TOKEN
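If you call the tool from a script, you may prefer not to hard-code the token. The sketch below builds the same command from an environment variable; `GITHUB_TOKEN` is an assumed variable name (the tool is not documented here as reading it on its own), and `build_command` is a hypothetical helper.

```python
import os
from typing import List, Optional


def build_command(repo: str, output_path: str, token: Optional[str] = None) -> List[str]:
    """Assemble the argument list for repo-to-singlefile."""
    command = ["repo-to-singlefile", repo, output_path]
    if token:
        # --github-token is the option documented above.
        command += ["--github-token", token]
    return command


# GITHUB_TOKEN is an assumed variable name; use whatever your setup provides.
command = build_command(
    "https://github.com/owner/private-repo",
    "output.txt",
    os.environ.get("GITHUB_TOKEN"),
)
print(command[:3])

# To actually run it:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems. The token is still visible in the process arguments while the command runs.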
The tool provides clear error messages for common issues:
- Invalid repository paths or URLs
- Missing subfolders
- Permission denied errors
- Binary file skipping
- Token counting errors
- Clone the repository:
git clone https://github.com/yourusername/repo-to-singlefile.git
cd repo-to-singlefile
- Install dependencies:
pip install -e .
- Run the tests:
pytest
When accessing private GitHub repositories, make sure your token has the necessary permissions:
- For public repositories: No token needed
- For private repositories: Token needs the repo scope
When specifying a subfolder:
- Ensure the path is relative to the repository root
- Use forward slashes (/) even on Windows
- Check that the subfolder exists in the repository
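The first two checks above can be automated before the value is passed to --subfolder. This is a minimal sketch, assuming the value may arrive with Windows-style backslashes; `normalize_subfolder` is a hypothetical helper, not something the tool provides.

```python
from pathlib import PureWindowsPath


def normalize_subfolder(subfolder: str) -> str:
    """Return a repository-relative path that uses forward slashes."""
    # PureWindowsPath accepts both "\\" and "/" as separators.
    path = PureWindowsPath(subfolder)
    if path.anchor:
        # A drive letter or leading slash means the path is not relative.
        raise ValueError("subfolder must be relative to the repository root")
    if ".." in path.parts:
        raise ValueError("subfolder must stay inside the repository")
    return path.as_posix()


print(normalize_subfolder("packages\\mylib"))
```

Whether the subfolder actually exists still has to be checked against the repository itself.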
For very large repositories:
- Consider processing specific subfolders
- Be aware of rate limits when using the GitHub API
- Monitor token costs for large codebases
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a pull request
This project is licensed under the MIT License.
- Report bugs through GitHub issues
- Submit feature requests through GitHub issues
- For security issues, please see SECURITY.md