restructure to workspace
peter6866 committed Nov 17, 2024
1 parent af04b49 commit 2910a55
Showing 18 changed files with 202 additions and 107 deletions.
12 changes: 6 additions & 6 deletions .github/workflows/rust.yml
@@ -4,13 +4,13 @@ on:
  push:
  branches: ['master']
  paths:
- - 'src/**'
- - 'tests/**'
+ - 'tf-binding-rs/**'
+ - 'motif-scanner/**'
  pull_request:
  branches: ['master']
  paths:
- - 'src/**'
- - 'tests/**'
+ - 'tf-binding-rs/**'
+ - 'motif-scanner/**'

  env:
  CARGO_TERM_COLOR: always
@@ -22,6 +22,6 @@ jobs:
  steps:
  - uses: actions/checkout@v4
  - name: Build
- run: cargo build --verbose
+ run: cargo build --workspace --verbose
  - name: Run tests
- run: cargo test --verbose
+ run: cargo test --workspace --verbose
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
  /target
- /Cargo.lock
+ /*/target
+ Cargo.lock
  *.py
  /.vscode
22 changes: 6 additions & 16 deletions Cargo.toml
@@ -1,16 +1,6 @@
- [package]
- name = "tf-binding-rs"
- version = "0.1.4"
- edition = "2021"
- description = "Fast transcription factor binding site prediction and FASTA manipulation in Rust"
- license = "MIT"
-
- # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
-
- [dependencies]
- ndarray = "0.16.1"
- polars = { version = "0.44.2", features = ["lazy", "dtype-struct", "log"] }
- serde = { version = "1.0.215", features = ["derive"] }
- thiserror = "2.0.3"
- statrs = "0.17.1"
- phf = {version = "0.11.2", features = ["macros"]}
+ [workspace]
+ resolver = "2"
+ members = [
+ "tf-binding-rs",
+ "motif-scanner"
+ ]
11 changes: 11 additions & 0 deletions motif-scanner/Cargo.toml
@@ -0,0 +1,11 @@
[package]
name = "motif-scanner"
version = "0.1.0"
edition = "2021"
description = "Command line tool for scanning DNA sequences for transcription factor binding sites"
authors = ["Jiayu Huang <[email protected]>"]
license = "MIT"

[dependencies]
tf-binding-rs = { path = "../tf-binding-rs" }
clap = { version = "4.5.21", features = ["derive"] }
Empty file added motif-scanner/README.md
Empty file.
3 changes: 3 additions & 0 deletions motif-scanner/src/main.rs
@@ -0,0 +1,3 @@
fn main() {
    println!("Hello, world!");
}
129 changes: 46 additions & 83 deletions readme.md
@@ -1,113 +1,76 @@
# tf-binding-rs (In Development)
# TF Binding Analysis Tools

This workspace contains tools for analyzing transcription factor (TF) binding sites in DNA sequences:

- **[tf-binding-rs](tf-binding-rs/)**: A Rust library for TF binding site prediction and sequence analysis
- **[motif-scanner](motif-scanner/)**: A command-line tool for scanning DNA sequences for TF binding sites

## 🧬 tf-binding-rs

[<img alt="github" src="https://img.shields.io/badge/github-peter6866/tf--binding--rs-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/peter6866/tf-binding-rs)
[<img alt="crates.io" src="https://img.shields.io/crates/v/tf-binding-rs.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/tf-binding-rs)

A Rust library for predicting transcription factor (TF) binding site occupancy in DNA sequences. This toolkit provides efficient implementations for:
A Rust library providing efficient implementations for:

- FASTA file manipulation and sequence processing
- Position Weight Matrix (PWM) handling and Energy Weight Matrix (EWM) conversion
- TF binding site occupancy prediction using statistical thermodynamics
- Binding energy landscape and occupancy probability calculations
- Position Weight Matrix (PWM) handling
- Energy Weight Matrix (EWM) conversion
- TF binding site occupancy prediction
- Multi-TF occupancy analysis

## Features

- 🧬 Fast FASTA file reading and writing
- 📊 PWM/EWM-based binding site analysis
- 🔍 Efficient sequence scanning with energy matrices
- 📈 Occupancy landscape calculation for multiple TFs
- 🧮 Statistical thermodynamics-based predictions

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
tf-binding-rs = "0.1.1"
```

Or install using cargo:

```bash
cargo add tf-binding-rs
```

## Examples

### Reading FASTA Files

```rust
use tf_binding_rs::fasta;
use tf_binding_rs::occupancy;

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read sequences from a FASTA file
let sequences = fasta::read_fasta("path/to/sequences.fasta")?;

// Print sequence information
println!("Number of sequences: {}", sequences.height());

// Calculate GC content
let gc_stats = fasta::gc_content(&sequences)?;
println!("GC content analysis: {:?}", gc_stats);

let ewms = occupancy::read_pwm_to_ewm("motifs.meme")?;
let sequence = "ATCGATCGTAGCTACGT";
let landscape = occupancy::total_landscape(sequence, &ewms, -3.0)?;
println!("Binding landscape:\n{}", landscape);
Ok(())
}
```

### Working with PWM Files

```rust
use tf_binding_rs::occupancy;
## 🔍 motif-scanner

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read PWM motifs from MEME format file
let pwm_collection = occupancy::read_pwm_files("path/to/motifs.meme")?;
A command-line tool for scanning DNA sequences and predicting TF binding sites. Features:

// Process each motif
for (motif_id, pwm) in pwm_collection {
println!("Processing motif: {}", motif_id);
println!("Matrix dimensions: {:?}", pwm.shape());
}
- Batch processing of sequence files
- Occupancy score calculation
- Multiple output formats (CSV, Parquet)
- Filtering by occupancy threshold

Ok(())
}
```bash
motif-scanner input.csv motifs.meme output.csv --cutoff 0.2 --mu 9.0
```

### Working with PWMs and Energy Matrices

```rust
use tf_binding_rs::occupancy;
## Installation

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read PWMs and convert to Energy Weight Matrices
let ewm_collection = occupancy::read_pwm_to_ewm("path/to/motifs.meme")?;
### From Source

// Calculate binding landscape for a sequence
let sequence = "ATCGATCGTAGCTACGT";
let mu = -3.0; // chemical potential
```bash
# Clone the repository
git clone https://github.com/peter6866/tf-binding-rs
cd tf-binding-rs

// Get occupancy predictions for all TFs
let occupancy_landscape = occupancy::total_landscape(
&sequence,
&ewm_collection,
mu
)?;
# Build both the library and the scanner
cargo build --release --workspace

println!("Occupancy predictions:\n{}", occupancy_landscape);
Ok(())
}
# Install the motif-scanner binary
cargo install --path motif-scanner
```

## Use Cases
### From crates.io

```bash
# Install just the motif-scanner tool
cargo install motif-scanner

- Genomic sequence analysis
- TF binding site prediction and quantification
- Multi-factor binding landscape analysis
- Regulatory sequence characterization
- Statistical thermodynamics of protein-DNA interactions
# For library usage, add to your Cargo.toml:
[dependencies]
tf-binding-rs = "0.1.4"
```

## Documentation

For detailed API documentation, visit [docs.rs/tf-binding-rs](https://docs.rs/tf-binding-rs)
- [tf-binding-rs API Documentation](https://docs.rs/tf-binding-rs)
- [motif-scanner Usage Guide](motif-scanner/README.md)
15 changes: 15 additions & 0 deletions tf-binding-rs/Cargo.toml
@@ -0,0 +1,15 @@
[package]
name = "tf-binding-rs"
version = "0.1.4"
edition = "2021"
description = "Fast transcription factor binding site prediction and FASTA manipulation in Rust"
license = "MIT"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
ndarray = "0.16.1"
polars = { version = "0.44.2", features = ["lazy", "dtype-struct", "log"] }
thiserror = "2.0.3"
statrs = "0.17.1"
phf = {version = "0.11.2", features = ["macros"]}
113 changes: 113 additions & 0 deletions tf-binding-rs/readme.md
@@ -0,0 +1,113 @@
# tf-binding-rs (In Development)

[<img alt="github" src="https://img.shields.io/badge/github-peter6866/tf--binding--rs-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/peter6866/tf-binding-rs)
[<img alt="crates.io" src="https://img.shields.io/crates/v/tf-binding-rs.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/tf-binding-rs)

A Rust library for predicting transcription factor (TF) binding site occupancy in DNA sequences. This toolkit provides efficient implementations for:

- FASTA file manipulation and sequence processing
- Position Weight Matrix (PWM) handling and Energy Weight Matrix (EWM) conversion
- TF binding site occupancy prediction using statistical thermodynamics
- Binding energy landscape and occupancy probability calculations
- Multi-TF occupancy analysis

## Features

- 🧬 Fast FASTA file reading and writing
- 📊 PWM/EWM-based binding site analysis
- 🔍 Efficient sequence scanning with energy matrices
- 📈 Occupancy landscape calculation for multiple TFs
- 🧮 Statistical thermodynamics-based predictions

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
tf-binding-rs = "0.1.1"
```

Or install using cargo:

```bash
cargo add tf-binding-rs
```

## Examples

### Reading FASTA Files

```rust
use tf_binding_rs::fasta;

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read sequences from a FASTA file
let sequences = fasta::read_fasta("path/to/sequences.fasta")?;

// Print sequence information
println!("Number of sequences: {}", sequences.height());

// Calculate GC content
let gc_stats = fasta::gc_content(&sequences)?;
println!("GC content analysis: {:?}", gc_stats);

Ok(())
}
```
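
For readers unfamiliar with the statistic, GC content is the fraction of bases in a sequence that are G or C. The sketch below is a standalone illustration using only the standard library. `gc_fraction` is a hypothetical helper introduced here and is not the implementation behind `fasta::gc_content`.

```rust
/// Fraction of G/C bases in a DNA sequence (case-insensitive).
/// Hypothetical helper for illustration only; not part of tf-binding-rs.
/// Returns `None` for an empty sequence. Other characters (for example `N`)
/// count toward the length but not toward the G/C total.
fn gc_fraction(seq: &str) -> Option<f64> {
    if seq.is_empty() {
        return None;
    }
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    Some(gc as f64 / seq.len() as f64)
}
```

Calling `gc_fraction("ATCG")` gives `Some(0.5)`, since two of the four bases are G or C.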

### Working with PWM Files

```rust
use tf_binding_rs::occupancy;

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read PWM motifs from MEME format file
let pwm_collection = occupancy::read_pwm_files("path/to/motifs.meme")?;

// Process each motif
for (motif_id, pwm) in pwm_collection {
println!("Processing motif: {}", motif_id);
println!("Matrix dimensions: {:?}", pwm.shape());
}

Ok(())
}
```
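
As background for the PWM to EWM conversion mentioned above, one common convention sets the energy of each base relative to the most probable base at that position, E = -RT ln(p / p_max). The sketch below assumes that convention; the function name `pwm_to_ewm`, the `pseudocount` and `rt` parameters, and the A, C, G, T column order are all assumptions made for illustration and may differ from what `occupancy::read_pwm_to_ewm` does.

```rust
/// Convert a PWM (rows = positions, columns = A, C, G, T probabilities)
/// into relative energies using E = -RT * ln(p / p_max) at each position.
/// Hypothetical sketch; the formula and parameters are assumptions,
/// not the confirmed behavior of tf-binding-rs.
fn pwm_to_ewm(pwm: &[[f64; 4]], pseudocount: f64, rt: f64) -> Vec<[f64; 4]> {
    pwm.iter()
        .map(|row| {
            // Largest adjusted probability at this position (the consensus base).
            let max = row
                .iter()
                .map(|p| p + pseudocount)
                .fold(f64::MIN, f64::max);
            let mut energies = [0.0; 4];
            for (e, p) in energies.iter_mut().zip(row) {
                // A nonzero pseudocount keeps zero-probability bases finite.
                *e = -rt * ((p + pseudocount) / max).ln();
            }
            energies
        })
        .collect()
}
```

Under this convention the consensus base at each position has energy zero and every other base has a positive energy penalty.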

### Working with PWMs and Energy Matrices

```rust
use tf_binding_rs::occupancy;

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Read PWMs and convert to Energy Weight Matrices
let ewm_collection = occupancy::read_pwm_to_ewm("path/to/motifs.meme")?;

// Calculate binding landscape for a sequence
let sequence = "ATCGATCGTAGCTACGT";
let mu = -3.0; // chemical potential

// Get occupancy predictions for all TFs
let occupancy_landscape = occupancy::total_landscape(
&sequence,
&ewm_collection,
mu
)?;

println!("Occupancy predictions:\n{}", occupancy_landscape);
Ok(())
}
```
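
To give some intuition for the chemical potential `mu` used above, a standard statistical-thermodynamic model treats the occupancy of a site with relative energy E as 1 / (1 + exp(E - mu)), with both quantities in units of RT. The sketch below assumes that model and scans only the forward strand. `base_index`, `occupancy`, and `scan_forward` are hypothetical names, and the sign convention for `mu` is an assumption that may not match `occupancy::total_landscape`.

```rust
/// Map a base to its column index (A, C, G, T); `None` for anything else.
fn base_index(b: u8) -> Option<usize> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Assumed model: probability that a site with relative energy `energy`
/// is bound at chemical potential `mu` (both in units of RT).
fn occupancy(energy: f64, mu: f64) -> f64 {
    1.0 / (1.0 + (energy - mu).exp())
}

/// Forward-strand occupancy at every window start.
/// A window containing a non-ACGT character yields `None`.
fn scan_forward(seq: &str, ewm: &[[f64; 4]], mu: f64) -> Vec<Option<f64>> {
    let bytes = seq.as_bytes();
    if ewm.is_empty() || bytes.len() < ewm.len() {
        return Vec::new();
    }
    bytes
        .windows(ewm.len())
        .map(|window| {
            // Sum the per-position energies of the bases in this window.
            let mut energy = 0.0;
            for (b, row) in window.iter().zip(ewm) {
                energy += row[base_index(*b)?];
            }
            Some(occupancy(energy, mu))
        })
        .collect()
}
```

In this model a site whose energy equals `mu` is bound half the time, and raising `mu` increases occupancy at every site.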

## Use Cases

- Genomic sequence analysis
- TF binding site prediction and quantification
- Multi-factor binding landscape analysis
- Regulatory sequence characterization
- Statistical thermodynamics of protein-DNA interactions

## Documentation

For detailed API documentation, visit [docs.rs/tf-binding-rs](https://docs.rs/tf-binding-rs)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion src/occupancy.rs → tf-binding-rs/src/occupancy.rs
@@ -4,7 +4,6 @@ use crate::types::*;
  use polars::lazy::dsl::*;
  use polars::prelude::*;
  use std::collections::HashMap;
- use std::fmt::format;
  use std::fs::File;
  use std::io::{BufRead, BufReader};
  use std::iter::Peekable;
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
