Showing 18 changed files with 202 additions and 107 deletions.
@@ -1,4 +1,5 @@
 /target
-/Cargo.lock
+/*/target
+Cargo.lock
 *.py
 /.vscode
@@ -1,16 +1,6 @@
-[package]
-name = "tf-binding-rs"
-version = "0.1.4"
-edition = "2021"
-description = "Fast transcription factor binding site prediction and FASTA manipulation in Rust"
-license = "MIT"
-
-# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
-
-[dependencies]
-ndarray = "0.16.1"
-polars = { version = "0.44.2", features = ["lazy", "dtype-struct", "log"] }
-serde = { version = "1.0.215", features = ["derive"] }
-thiserror = "2.0.3"
-statrs = "0.17.1"
-phf = {version = "0.11.2", features = ["macros"]}
+[workspace]
+resolver = "2"
+members = [
+    "tf-binding-rs",
+    "motif-scanner"
+]
@@ -0,0 +1,11 @@
+[package]
+name = "motif-scanner"
+version = "0.1.0"
+edition = "2021"
+description = "Command line tool for scanning DNA sequences for transcription factor binding sites"
+authors = ["Jiayu Huang <[email protected]>"]
+license = "MIT"
+
+[dependencies]
+tf-binding-rs = { path = "../tf-binding-rs" }
+clap = { version = "4.5.21", features = ["derive"] }
Empty file.
@@ -0,0 +1,3 @@
+fn main() {
+    println!("Hello, world!");
+}
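
The `main` above is still a placeholder, while the README in this commit documents the invocation `motif-scanner input.csv motifs.meme output.csv --cutoff 0.2 --mu 9.0`. As a rough illustration of that argument shape only, here is a sketch that uses nothing but the standard library. The crate itself depends on `clap`, so the real parser will look different. The `Args` struct, the `parse_args` function, and both default values are assumptions made for this example, not part of the commit.

```rust
use std::path::PathBuf;

/// Hypothetical argument set mirroring the README usage line.
#[derive(Debug, Clone, PartialEq)]
struct Args {
    input: PathBuf,
    motifs: PathBuf,
    output: PathBuf,
    cutoff: f64,
    mu: f64,
}

/// Parses `<input> <motifs> <output> [--cutoff X] [--mu Y]`.
fn parse_args<I: IntoIterator<Item = String>>(raw: I) -> Result<Args, String> {
    let mut paths: Vec<PathBuf> = Vec::new();
    let mut cutoff = 0.2; // assumed default, not taken from the crate
    let mut mu = 9.0; // assumed default, not taken from the crate
    let mut iter = raw.into_iter();

    while let Some(arg) = iter.next() {
        match arg.as_str() {
            // Options that consume the following token as a number.
            "--cutoff" | "--mu" => {
                let value = iter
                    .next()
                    .ok_or_else(|| format!("{} expects a value", arg))?;
                let number: f64 = value
                    .parse()
                    .map_err(|_| format!("{}: '{}' is not a number", arg, value))?;
                if arg == "--cutoff" {
                    cutoff = number;
                } else {
                    mu = number;
                }
            }
            other if other.starts_with("--") => {
                return Err(format!("unknown option {}", other));
            }
            // Everything else is treated as a positional file path.
            other => paths.push(PathBuf::from(other)),
        }
    }

    match paths.as_slice() {
        [input, motifs, output] => Ok(Args {
            input: input.clone(),
            motifs: motifs.clone(),
            output: output.clone(),
            cutoff,
            mu,
        }),
        other => Err(format!("expected 3 file paths, got {}", other.len())),
    }
}

fn main() {
    // Fixed demo arguments so the sketch runs without a real command line.
    let demo = ["input.csv", "motifs.meme", "output.csv", "--cutoff", "0.2", "--mu", "9.0"];
    let args = parse_args(demo.iter().map(|s| s.to_string())).expect("demo arguments are valid");
    println!("{:?}", args);
}
```

Because each option consumes the next token unconditionally, a negative value such as `--mu -3.0` is read as a number rather than mistaken for a flag.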
@@ -1,113 +1,76 @@
-# tf-binding-rs (In Development)
+# TF Binding Analysis Tools

+This workspace contains tools for analyzing transcription factor (TF) binding sites in DNA sequences:

+- **[tf-binding-rs](tf-binding-rs/)**: A Rust library for TF binding site prediction and sequence analysis
+- **[motif-scanner](motif-scanner/)**: A command-line tool for scanning DNA sequences for TF binding sites

+## 🧬 tf-binding-rs

 [<img alt="github" src="https://img.shields.io/badge/github-peter6866/tf--binding--rs-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/peter6866/tf-binding-rs)
 [<img alt="crates.io" src="https://img.shields.io/crates/v/tf-binding-rs.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/tf-binding-rs)

-A Rust library for predicting transcription factor (TF) binding site occupancy in DNA sequences. This toolkit provides efficient implementations for:
+A Rust library providing efficient implementations for:

 - FASTA file manipulation and sequence processing
-- Position Weight Matrix (PWM) handling and Energy Weight Matrix (EWM) conversion
-- TF binding site occupancy prediction using statistical thermodynamics
-- Binding energy landscape and occupancy probability calculations
+- Position Weight Matrix (PWM) handling
+- Energy Weight Matrix (EWM) conversion
+- TF binding site occupancy prediction
 - Multi-TF occupancy analysis

-## Features

-- 🧬 Fast FASTA file reading and writing
-- 📊 PWM/EWM-based binding site analysis
-- 🔍 Efficient sequence scanning with energy matrices
-- 📈 Occupancy landscape calculation for multiple TFs
-- 🧮 Statistical thermodynamics-based predictions

-## Installation

-Add this to your `Cargo.toml`:

-```toml
-[dependencies]
-tf-binding-rs = "0.1.1"
-```

-Or install using cargo:

-```bash
-cargo add tf-binding-rs
-```

-## Examples

-### Reading FASTA Files

 ```rust
 use tf_binding_rs::fasta;
+use tf_binding_rs::occupancy;

 fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Read sequences from a FASTA file
-    let sequences = fasta::read_fasta("path/to/sequences.fasta")?;

-    // Print sequence information
-    println!("Number of sequences: {}", sequences.height());

-    // Calculate GC content
-    let gc_stats = fasta::gc_content(&sequences)?;
-    println!("GC content analysis: {:?}", gc_stats);

+    let ewms = occupancy::read_pwm_to_ewm("motifs.meme")?;
+    let sequence = "ATCGATCGTAGCTACGT";
+    let landscape = occupancy::total_landscape(sequence, &ewms, -3.0)?;
+    println!("Binding landscape:\n{}", landscape);
     Ok(())
 }
 ```

-### Working with PWM Files

-```rust
-use tf_binding_rs::occupancy;
+## 🔍 motif-scanner

-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Read PWM motifs from MEME format file
-    let pwm_collection = occupancy::read_pwm_files("path/to/motifs.meme")?;
+A command-line tool for scanning DNA sequences and predicting TF binding sites. Features:

-    // Process each motif
-    for (motif_id, pwm) in pwm_collection {
-        println!("Processing motif: {}", motif_id);
-        println!("Matrix dimensions: {:?}", pwm.shape());
-    }
+- Batch processing of sequence files
+- Occupancy score calculation
+- Multiple output formats (CSV, Parquet)
+- Filtering by occupancy threshold

-    Ok(())
-}
+```bash
+motif-scanner input.csv motifs.meme output.csv --cutoff 0.2 --mu 9.0
 ```

-### Working with PWMs and Energy Matrices

-```rust
-use tf_binding_rs::occupancy;
+## Installation

-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Read PWMs and convert to Energy Weight Matrices
-    let ewm_collection = occupancy::read_pwm_to_ewm("path/to/motifs.meme")?;
+### From Source

-    // Calculate binding landscape for a sequence
-    let sequence = "ATCGATCGTAGCTACGT";
-    let mu = -3.0; // chemical potential
+```bash
+# Clone the repository
+git clone https://github.com/peter6866/tf-binding-rs
+cd tf-binding-rs

-    // Get occupancy predictions for all TFs
-    let occupancy_landscape = occupancy::total_landscape(
-        &sequence,
-        &ewm_collection,
-        mu
-    )?;
+# Build both the library and the scanner
+cargo build --release --workspace

-    println!("Occupancy predictions:\n{}", occupancy_landscape);
-    Ok(())
-}
+# Install the motif-scanner binary
+cargo install --path motif-scanner
 ```

-## Use Cases
+### From crates.io

+```bash
+# Install just the motif-scanner tool
+cargo install motif-scanner

-- Genomic sequence analysis
-- TF binding site prediction and quantification
-- Multi-factor binding landscape analysis
-- Regulatory sequence characterization
-- Statistical thermodynamics of protein-DNA interactions
+# For library usage, add to your Cargo.toml:
+[dependencies]
+tf-binding-rs = "0.1.4"
+```

 ## Documentation

-For detailed API documentation, visit [docs.rs/tf-binding-rs](https://docs.rs/tf-binding-rs)
+- [tf-binding-rs API Documentation](https://docs.rs/tf-binding-rs)
+- [motif-scanner Usage Guide](motif-scanner/README.md)
@@ -0,0 +1,15 @@
+[package]
+name = "tf-binding-rs"
+version = "0.1.4"
+edition = "2021"
+description = "Fast transcription factor binding site prediction and FASTA manipulation in Rust"
+license = "MIT"
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[dependencies]
+ndarray = "0.16.1"
+polars = { version = "0.44.2", features = ["lazy", "dtype-struct", "log"] }
+thiserror = "2.0.3"
+statrs = "0.17.1"
+phf = {version = "0.11.2", features = ["macros"]}
@@ -0,0 +1,113 @@
+# tf-binding-rs (In Development)
+
+[<img alt="github" src="https://img.shields.io/badge/github-peter6866/tf--binding--rs-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/peter6866/tf-binding-rs)
+[<img alt="crates.io" src="https://img.shields.io/crates/v/tf-binding-rs.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/tf-binding-rs)
+
+A Rust library for predicting transcription factor (TF) binding site occupancy in DNA sequences. This toolkit provides efficient implementations for:
+
+- FASTA file manipulation and sequence processing
+- Position Weight Matrix (PWM) handling and Energy Weight Matrix (EWM) conversion
+- TF binding site occupancy prediction using statistical thermodynamics
+- Binding energy landscape and occupancy probability calculations
+- Multi-TF occupancy analysis
+
+## Features
+
+- 🧬 Fast FASTA file reading and writing
+- 📊 PWM/EWM-based binding site analysis
+- 🔍 Efficient sequence scanning with energy matrices
+- 📈 Occupancy landscape calculation for multiple TFs
+- 🧮 Statistical thermodynamics-based predictions
+
+## Installation
+
+Add this to your `Cargo.toml`:
+
+```toml
+[dependencies]
+tf-binding-rs = "0.1.1"
+```
+
+Or install using cargo:
+
+```bash
+cargo add tf-binding-rs
+```
+
+## Examples
+
+### Reading FASTA Files
+
+```rust
+use tf_binding_rs::fasta;
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Read sequences from a FASTA file
+    let sequences = fasta::read_fasta("path/to/sequences.fasta")?;
+
+    // Print sequence information
+    println!("Number of sequences: {}", sequences.height());
+
+    // Calculate GC content
+    let gc_stats = fasta::gc_content(&sequences)?;
+    println!("GC content analysis: {:?}", gc_stats);
+
+    Ok(())
+}
+```
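
For readers who want to see what FASTA reading and a GC-content calculation involve, the following is a minimal standard-library sketch. It is not the crate's implementation: the README example calls `sequences.height()`, which suggests the library returns a data-frame type, whereas the hypothetical `parse_fasta` and `gc_fraction` helpers here work on plain strings. Skipping ambiguous bases such as `N` is also an assumption of this sketch.

```rust
/// Splits FASTA text into (header, sequence) pairs.
/// Sequence lines that follow a header are concatenated.
fn parse_fasta(text: &str) -> Vec<(String, String)> {
    let mut records: Vec<(String, String)> = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            records.push((header.to_string(), String::new()));
        } else if let Some(last) = records.last_mut() {
            last.1.push_str(line);
        }
        // Sequence data before the first header is ignored.
    }
    records
}

/// Fraction of G/C among the unambiguous bases A, C, G, T (case-insensitive).
/// Returns `None` when the sequence has no unambiguous base at all.
fn gc_fraction(seq: &str) -> Option<f64> {
    let mut gc = 0usize;
    let mut total = 0usize;
    for base in seq.bytes() {
        match base.to_ascii_uppercase() {
            b'G' | b'C' => {
                gc += 1;
                total += 1;
            }
            b'A' | b'T' => total += 1,
            _ => {} // skip N, gaps, and any other symbol
        }
    }
    if total == 0 {
        None
    } else {
        Some(gc as f64 / total as f64)
    }
}

fn main() {
    let text = ">seq1 example\nACGT\nGG\n>seq2\nNNNN\n";
    for (header, seq) in parse_fasta(text) {
        match gc_fraction(&seq) {
            Some(gc) => println!("{}: GC fraction {:.3}", header, gc),
            None => println!("{}: no unambiguous bases", header),
        }
    }
}
```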
+
+### Working with PWM Files
+
+```rust
+use tf_binding_rs::occupancy;
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Read PWM motifs from MEME format file
+    let pwm_collection = occupancy::read_pwm_files("path/to/motifs.meme")?;
+
+    // Process each motif
+    for (motif_id, pwm) in pwm_collection {
+        println!("Processing motif: {}", motif_id);
+        println!("Matrix dimensions: {:?}", pwm.shape());
+    }
+
+    Ok(())
+}
+```
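
To make the MEME input concrete, here is a small sketch of pulling motif identifiers and their letter-probability rows out of MEME-format text. It assumes a four-letter DNA alphabet and reads only the `MOTIF` and `letter-probability matrix` lines. The `parse_meme` function and its return type are hypothetical, and the crate's own `read_pwm_files` may behave differently.

```rust
/// One motif: its identifier and one [A, C, G, T] probability row per position.
type Motif = (String, Vec<[f64; 4]>);

/// Extracts motifs from MEME-format text (DNA alphabet assumed).
fn parse_meme(text: &str) -> Result<Vec<Motif>, String> {
    let mut motifs: Vec<Motif> = Vec::new();
    let mut in_matrix = false;

    for line in text.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("MOTIF") {
            // The identifier is the first token after the MOTIF keyword.
            let id = rest
                .split_whitespace()
                .next()
                .ok_or_else(|| "MOTIF line without an identifier".to_string())?;
            motifs.push((id.to_string(), Vec::new()));
            in_matrix = false;
        } else if line.starts_with("letter-probability matrix") {
            if motifs.is_empty() {
                return Err("matrix found before any MOTIF line".to_string());
            }
            in_matrix = true;
        } else if in_matrix {
            let fields: Vec<&str> = line.split_whitespace().collect();
            if fields.len() != 4 {
                // A blank line or any other record ends the matrix.
                in_matrix = false;
                continue;
            }
            let mut row = [0.0; 4];
            for (slot, field) in row.iter_mut().zip(&fields) {
                *slot = field
                    .parse::<f64>()
                    .map_err(|e| format!("bad probability '{}': {}", field, e))?;
            }
            if let Some(current) = motifs.last_mut() {
                current.1.push(row);
            }
        }
    }
    Ok(motifs)
}

fn main() {
    let text = "MEME version 4\n\nALPHABET= ACGT\n\nMOTIF demo\n\
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0\n\
 0.7 0.1 0.1 0.1\n 0.1 0.1 0.7 0.1\n";
    match parse_meme(text) {
        Ok(motifs) => {
            for (id, rows) in &motifs {
                println!("motif {} has {} positions", id, rows.len());
            }
        }
        Err(msg) => println!("parse error: {}", msg),
    }
}
```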
+
+### Working with PWMs and Energy Matrices
+
+```rust
+use tf_binding_rs::occupancy;
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Read PWMs and convert to Energy Weight Matrices
+    let ewm_collection = occupancy::read_pwm_to_ewm("path/to/motifs.meme")?;
+
+    // Calculate binding landscape for a sequence
+    let sequence = "ATCGATCGTAGCTACGT";
+    let mu = -3.0; // chemical potential
+
+    // Get occupancy predictions for all TFs
+    let occupancy_landscape = occupancy::total_landscape(
+        &sequence,
+        &ewm_collection,
+        mu
+    )?;
+
+    println!("Occupancy predictions:\n{}", occupancy_landscape);
+    Ok(())
+}
+```
+
+## Use Cases
+
+- Genomic sequence analysis
+- TF binding site prediction and quantification
+- Multi-factor binding landscape analysis
+- Regulatory sequence characterization
+- Statistical thermodynamics of protein-DNA interactions
+
+## Documentation
+
+For detailed API documentation, visit [docs.rs/tf-binding-rs](https://docs.rs/tf-binding-rs)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.