md5sift is a Python CLI tool that generates checksum reports for files in local directories or on network shares. It can filter files by extension or by a predefined file list, and it writes reports in CSV format.
- Bulk Checksum Generation: Calculate hashes for multiple files in a directory.
- File Filtering: Filter files by extension or from a provided file list.
- Multi-threaded Processing: Faster checksum generation with multi-threading.
- CSV Output: Generate comprehensive CSV reports including file paths, MD5 hashes, and timestamps.
- Algorithm Options: Supports the `md5`, `sha1`, and `sha256` hashing algorithms.
- Verbose Mode: Real-time progress updates.
- Test Mode: Process a subset of files for quick validation.
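
The bulk, multi-threaded hashing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not md5sift's actual implementation: `hash_file`, `hash_many`, and the 1 MiB chunk size are hypothetical names and values chosen for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Algorithms listed in the feature list above.
SUPPORTED = ("md5", "sha1", "sha256")


def hash_file(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of one file, read in fixed-size chunks."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, algorithm="md5", threads=None):
    """Hash several files concurrently and return {path: hexdigest}."""
    paths = list(paths)
    workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: hash_file(p, algorithm), paths)
        return dict(zip(paths, digests))
```

Threads suit this workload because most of the time is spent waiting on disk or network reads rather than on Python bytecode.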
md5sift can be installed and run as a Python script or via an RPM package.
- Python 3.x
- Git (optional)
Option 1: Clone from GitHub:
git clone https://github.com/madebyjake/md5sift.git && cd md5sift
Option 2: Download ZIP and extract.
To install via RPM package:
- Download the RPM from the Releases page.
- Install using a package manager:
sudo rpm -ivh md5sift-<ver>-1.noarch.rpm # using RPM
sudo yum install md5sift-<ver>-1.noarch.rpm # using YUM
sudo dnf install md5sift-<ver>-1.noarch.rpm # using DNF
Replace `<ver>` with the package version.
To build the RPM package from source, refer to the Building the RPM Package section.
Depending on the chosen installation method, md5sift can be run as a Python script or via the command-line interface.
NOTE:
- The default scan path is the current directory if `-s`/`--scan-path` isn’t provided.
- The default output file is `hash_report.csv` in the current directory if `-o`/`--output` isn’t specified.
Run directly using Python:
python3 md5sift.py -s <scan_directory> -o <output_file> [OPTIONS]
After RPM installation, run:
md5sift -s <scan_directory> -o <output_file> [OPTIONS]
Argument | Description |
---|---|
`-s`, `--scan-path` | Path to the directory to scan. Defaults to the current directory. |
`-o`, `--output` | Path to the output CSV file. Defaults to `md5_report.csv`. |
`-e`, `--extension` | Filter files by a specific extension (e.g., `.txt`). |
`-f`, `--filelist` | Path to a CSV file containing specific file names to process. |
`-v`, `--verbose` | Enable verbose mode for progress updates. |
`-t`, `--threads` | Number of threads (default: CPU core count). |
`--test` | Run in test mode and process a limited number of files. |
`-a`, `--algorithm` | Hashing algorithm (`md5`, `sha1`, `sha256`). Defaults to `md5`. |
`--exclude` | Paths or directories to exclude from scanning. |
`-h`, `--help` | Show help message. |
`--version` | Show version information. |
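
For readers who want to see how such an interface fits together, the argument table can be mirrored with `argparse` roughly as below. This is a sketch and not the project's own parser: `build_parser` is a hypothetical name, the value types are assumptions (for example, that `--test` takes a file count and `--exclude` accepts several paths), and `--version` is omitted because the version string is not given here.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="md5sift")
    parser.add_argument("-s", "--scan-path", default=".")
    # Default taken from the table above.
    parser.add_argument("-o", "--output", default="md5_report.csv")
    parser.add_argument("-e", "--extension")
    parser.add_argument("-f", "--filelist")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-t", "--threads", type=int, default=os.cpu_count())
    # Assumption: --test takes the number of files to process.
    parser.add_argument("--test", type=int, metavar="N")
    parser.add_argument("-a", "--algorithm",
                        choices=["md5", "sha1", "sha256"], default="md5")
    # Assumption: several paths may follow a single --exclude.
    parser.add_argument("--exclude", nargs="*", default=[])
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-s", "/data", "-a", "sha256", "--test", "10"])
```

`argparse` adds `-h`/`--help` automatically, so it needs no explicit entry.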
The examples below use the `md5sift` command provided by the RPM package. When running the script directly, replace `md5sift` with `python3 md5sift.py`.
Scan a Directory and Save to CSV
md5sift -s /path/to/scan -o /path/to/output/report.csv
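As a rough picture of what a report with file paths, hashes, and timestamps might contain, the sketch below writes one CSV row per file. The column names, the use of the file's modification time as the timestamp, and the `write_report` helper are all assumptions made for illustration; the real report layout may differ.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone


def write_report(paths, output_file, algorithm="md5"):
    """Write one CSV row per file and return the number of rows written."""
    count = 0
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Hypothetical header; the tool's real column names are not documented here.
        writer.writerow(["path", algorithm, "modified_utc"])
        for path in paths:
            with open(path, "rb") as source:
                # Whole-file read keeps the sketch short; large files call for chunking.
                digest = hashlib.new(algorithm, source.read()).hexdigest()
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            writer.writerow([path, digest, modified.isoformat()])
            count += 1
    return count
```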
Filter by File Extension
md5sift -s /path/to/scan -o /path/to/output/report.csv -e .txt
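Extension filtering of this kind can be sketched with `os.walk`. The `iter_files` helper is hypothetical, and the case-insensitive match is an assumption; the tool itself may compare extensions differently.

```python
import os


def iter_files(scan_path, extension=None):
    """Yield file paths under scan_path, optionally limited to one extension."""
    wanted = extension.lower() if extension else None
    for root, _dirs, files in os.walk(scan_path):
        for name in sorted(files):
            # Assumption: ".TXT" and ".txt" are treated as the same extension.
            if wanted is None or name.lower().endswith(wanted):
                yield os.path.join(root, name)
```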
Use a File List and Verbose Mode
md5sift -s /path/to/scan -o /path/to/output/report.csv -f /path/to/filelist.csv -v
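The expected layout of the file list is not spelled out here. The sketch below assumes the simplest case, one file name in the first column of each row, and uses the hypothetical helpers `load_filelist` and `select_listed` to show how such a list could narrow a scan.

```python
import csv
import os


def load_filelist(csv_path):
    """Return the set of file names found in the first column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Empty rows and blank first cells are skipped.
        return {row[0].strip() for row in csv.reader(handle)
                if row and row[0].strip()}


def select_listed(paths, wanted_names):
    """Keep only the paths whose base name appears in wanted_names."""
    return [p for p in paths if os.path.basename(p) in wanted_names]
```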
Test Mode (Process First 10 Files)
md5sift -s /path/to/scan -o /path/to/output/report.csv --test 10
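Limiting a run to the first N files can be done lazily, without walking the whole tree first. A minimal sketch, assuming the file paths arrive as an iterable; `limit_files` is a hypothetical name.

```python
from itertools import islice


def limit_files(paths, test_count=None):
    """Return all paths, or only the first test_count when a limit is given."""
    if test_count is None:
        return list(paths)
    # islice stops consuming the iterable once the limit is reached.
    return list(islice(paths, test_count))
```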
Use SHA-256 and Exclude Directories
md5sift -s /path/to/scan -o /path/to/output/report.csv -a sha256 --exclude /path/to/exclude_dir
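One common way to implement directory exclusion is to prune `os.walk` in place so excluded trees are never entered. The sketch below assumes excludes are compared as absolute paths; `walk_excluding` is a hypothetical helper, and the tool's actual matching rules may differ.

```python
import os


def walk_excluding(scan_path, excludes=()):
    """Yield file paths under scan_path, skipping excluded files and directories."""
    excluded = {os.path.abspath(p) for p in excludes}
    for root, dirs, files in os.walk(scan_path):
        # Assigning to dirs[:] prunes the walk, so excluded trees are never read.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) not in excluded]
        for name in files:
            path = os.path.join(root, name)
            if os.path.abspath(path) not in excluded:
                yield path
```

Pruning matters on network shares, where descending into a large excluded tree only to discard its files would be slow.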
- By default, `INFO` level logging is enabled.
- Use `-v` (`--verbose`) for real-time progress updates.
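
A setup along these lines can be expressed with Python's standard `logging` module. The sketch assumes that verbose mode lowers the threshold to `DEBUG` so per-file progress messages become visible; the logger name `md5sift_sketch` and the helper `configure_logging` are hypothetical, and the tool may report progress through a different mechanism.

```python
import logging


def configure_logging(verbose=False):
    """Return a logger at INFO by default, or DEBUG when verbose is set."""
    logger = logging.getLogger("md5sift_sketch")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:
        # Attach the handler once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```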
- To build the RPM package, install the required dependencies:
sudo dnf install rpm-build python3-devel python3-setuptools
- From the project root directory, generate the `md5sift.spec` file:
python3 setup.py genspec
- Build the RPM package:
python3 setup.py bdist_rpm
The RPM package is generated in the `dist/` directory.
This project is licensed under the MIT License.
Contributions are welcome! Please refer to the CONTRIBUTING.md file for guidance.
Please open an issue for support or feedback.