Cell Local Environment Neighborhood Scan (CellLENS) is a computational method that integrates cross-domain information from tissue samples to learn a single-cell representation embedding. By analyzing spatial proteomic and spatial transcriptomic datasets across different tissue types and disease settings, CellLENS identifies biologically relevant cell populations that were previously challenging to detect due to the loss of tissue morphological information.
For more details, see our preprint.
⚠️ Active Development: This repository is under active development. The current version is intended for review and early-access testing. A full installation guide and tutorial will be available soon.
CellLENS is hosted on PyPI and can be installed via pip. We recommend working within a virtual environment. The package requires CUDA support, as it uses PyTorch.
conda create -n celllens python=3.9 # Create a new environment
conda activate celllens # Activate environment
pip install celllens==0.1.0 # Install CellLENS
After installation, import the module as:
import celllens
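Since CellLENS uses PyTorch and requires CUDA support, it can be helpful to confirm that a GPU is visible before training. A minimal check using standard PyTorch calls (not a CellLENS API):

```python
import torch

# Verify that PyTorch can see a CUDA-capable GPU before running CellLENS.
print(torch.__version__)
print(torch.cuda.is_available())           # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the first visible GPU
```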
For usage details, check out our tutorials.
📌 Tutorial I - Simplified usage with feature expression & cell locations.
📌 Tutorial II - Full version using feature expression, cell locations, & tissue images.
📌 Tutorial III - Advanced usage with ViT-based image feature learning.
| Name | Function | Description | Recommendation |
|---|---|---|---|
| `nbhd_composition` | `SNAP_Dataset()` | Number of nearest neighbors (cells) used to compute the 'neighborhood composition vector', which CellLENS uses during training to learn local cellular patterns. Default = 20 | Generally no need to change this parameter. Can be tuned between 10 and 50, depending on the scale of the local cellular pattern you want to capture in the tissue. |
| `feature_neighbor` | `SNAP_Dataset()` | Number of nearest neighbors used to link nodes (cells) in the feature (e.g., protein expression) similarity graph for GNN training. Default = 15 | No need to change this parameter in most cases. |
| `spatial_neighbor` | `SNAP_Dataset()` | Number of nearest neighbors used to link nodes (cells) in the spatial (location) similarity graph for GNN training. Default = 15 | No need to change this parameter in most cases. |
| `pca_components` | `.initialize()` | Number of PCA components to retain from the feature expression input. Default = 25 | Users should choose this value themselves, much like the conventional PCA component selection in, for example, scRNA-seq studies. |
| `celltype` | `.initialize()` | Column to use as initial cell type labels. If set to 'feature_labels', Leiden clustering is used to generate the initial labels; alternatively, the user can supply pre-generated cell type labels obtained by other means (e.g., previous annotation, label transfer). Default = 'feature_labels' | If no pre-generated cell type information is available, use the default; if coarse cell type information is available, supply it here. |
| `cluster_res` | `.initialize()` | Resolution parameter for the Leiden clustering that determines the initial labels used to compute the 'neighborhood composition vector', which serves as input to the CellLENS model. Alternatively, users may specify a fixed number of clusters instead of a resolution (see `n_clusters` below). Default = 0.5 | A resolution of 0.5 works in most cases, but may need adjustment when the dataset is very large or very small. We suggest monitoring the verbose print-outs during this step; initial labels of 8-15 types generally yield the best results. |
| `n_clusters` | `.initialize()` | Number of clusters to generate during Leiden clustering. If None, the specified resolution is used to run Leiden clustering instead. Default = None | Same considerations as above: the user can specify the number of initial label types generated during Leiden clustering. We suggest around 8-15 clusters. |
| `size` | `.prepare_images()` | Size in pixels of the image crop taken around each individual cell. The crop is then used to extract local tissue-level morphological information in the CNN model. Default = 512 | Users should choose this value themselves, since the physical distance per pixel differs between modalities. Generally, a crop covering a physical distance of 50-200 μm works well. |
| `truncation` | `.prepare_images()` | Pixel intensity quantile used as the threshold to binarize (0, 1) the images input to the CNN model. Pixels with intensity above this quantile are set to 1; pixels below it are set to 0. Default = 0.9 | We generally found the 0.7-0.9 quantile range works well. Users can visually inspect images binarized at different quantiles before running the CNN model. |
| `cnn_latent_dim` | `CellSNAP()` | Size of the latent dimension (extracted image features) from the CNN model. Default = 128 | No need to change this parameter in most cases. |
| `gnn_latent_dim` | `CellSNAP()` | Size of the latent dimension (the extracted fused representation) from the duo-GNN model. Default = 32 | No need to change this parameter in most cases. |
| `fc_out_dim` | `CellSNAP()` | Output dimension of the expression GNN and input to the MLP head. A larger value encourages the model to learn more expression-related information. Default = 33 | No need to change this parameter in most cases. |
| `cnn_out_dim` | `CellSNAP()` | Output dimension of the spatial GNN and input to the MLP head. A larger value encourages the model to learn more image morphology-related information. Default = 11 | No need to change this parameter in most cases. |
| `round` | `.get_snap_embedding()` | Number of times the CellLENS duo-GNN model is trained; the repeated training produces a more robust final representation. Default = 5 | No need to change this parameter in most cases. |
| `k` | `.get_snap_embedding()` | SVD is run on the embeddings produced by the repeated rounds of duo-GNN training, and the top k dimensions are retained as the final representation. Default = 32 | No need to change this parameter in most cases. |
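For orientation, here is a minimal sketch of how the parameters above fit together in a typical run. The class and method names (`SNAP_Dataset`, `CellSNAP`, `.initialize()`, `.prepare_images()`, `.get_snap_embedding()`) come from the table; the exact import paths, required data arguments (e.g., the expression matrix, cell locations, and tissue images), and return values are assumptions on our part, so please defer to the tutorials for the authoritative API.

```python
import celllens

# Build the dataset object. Data arguments (expression matrix, cell
# locations, image paths) are omitted here -- see the tutorials for formats.
dataset = celllens.SNAP_Dataset(
    nbhd_composition=20,   # neighbors for the neighborhood composition vector
    feature_neighbor=15,   # neighbors in the feature similarity graph
    spatial_neighbor=15,   # neighbors in the spatial similarity graph
)

# Derive initial labels via Leiden clustering on the PCA-reduced expression.
dataset.initialize(
    pca_components=25,
    celltype='feature_labels',  # or a column of pre-generated labels
    cluster_res=0.5,            # or n_clusters=10 for a fixed cluster count
)

# Crop a 512x512-pixel image around each cell, binarized at the 0.9 quantile.
dataset.prepare_images(size=512, truncation=0.9)

# Train the duo-GNN model and retrieve the fused single-cell embedding.
# Passing the dataset to the model constructor is an assumption.
model = celllens.CellSNAP(
    dataset,
    cnn_latent_dim=128,
    gnn_latent_dim=32,
    fc_out_dim=33,
    cnn_out_dim=11,
)
embedding = model.get_snap_embedding(round=5, k=32)  # SVD over 5 training rounds
```

The resulting embedding can then be clustered or visualized with standard single-cell tools, just like any other low-dimensional cell representation.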
If you find CellLENS useful, please cite our preprint.
@article{yourcitation2024,
  author  = {Your Name et al.},
  title   = {CellLENS: A Spatial Multi-Omics Representation Learning Method},
  journal = {bioRxiv},
  year    = {2024},
  doi     = {10.1101/2024.05.12.593710}
}
We welcome contributions and feedback! Please open an issue or submit a pull request.