Cell Local Environment Neighborhood Scan (CellLENS) is a computational method that integrates cross-domain information from tissue samples to learn a single-cell representation embedding. By analyzing spatial proteomic and spatial transcriptomic datasets across different tissue types and disease settings, CellLENS identifies biologically relevant cell populations that were previously challenging to detect due to the loss of tissue morphological information.
For more details, see our preprint.
⚠️ Active Development: This repository is under active development. The current version is intended for review and early-access testing. A full installation guide and tutorial will be available soon.
CellLENS is hosted on PyPI and can be installed via pip. We recommend working within a virtual environment. The package requires CUDA support, as it uses PyTorch.
conda create -n celllens python=3.9 # Create a new environment
conda activate celllens # Activate environment
pip install celllens==0.1.0 # Install CellLENS
After installation, import the module as:
import celllens
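Since CellLENS uses PyTorch and requires CUDA support, it can be helpful to confirm that a GPU is visible before training. A minimal check using standard PyTorch calls (not a CellLENS API):

```python
import torch

# Verify that PyTorch can see a CUDA-capable GPU before running CellLENS.
print(torch.__version__)
print(torch.cuda.is_available())           # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the first visible GPU
```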
For usage details, check out our tutorials.
📌 Tutorial I - Simplified usage with feature expression & cell locations.
📌 Tutorial II - Full version using feature expression, cell locations, & tissue images.
📌 Tutorial III - Advanced usage with ViT-based image feature learning.
| Name | Function | Description | Recommendation |
|---|---|---|---|
| `nbhd_composition` | `SNAP_Dataset()` | Number of nearest neighbors (cells) used to compute the 'neighborhood composition vector', which CellLENS uses during training to learn local cellular patterns. Default = 20 | Generally no need to change this parameter. Can be tuned between 10 and 50, depending on the scale of the local cellular pattern you want to capture in the tissue. |
| `feature_neighbor` | `SNAP_Dataset()` | Number of nearest neighbors used to link nodes (cells) in the feature (e.g., protein expression) similarity graph for GNN training. Default = 15 | No need to change this parameter in most cases. |
| `spatial_neighbor` | `SNAP_Dataset()` | Number of nearest neighbors used to link nodes (cells) in the spatial (location) similarity graph for GNN training. Default = 15 | No need to change this parameter in most cases. |
| `pca_components` | `.initialize()` | Number of PCA components to retain from the feature expression input. Default = 25 | Users should choose this value themselves, much like the conventional PCA component selection in, for example, scRNA-seq studies. |
| `celltype` | `.initialize()` | Column to use as initial cell type labels. If set to 'feature_labels', Leiden clustering is used to generate the initial labels; alternatively, the user can supply pre-generated cell type labels obtained by other means (e.g., previous annotation, label transfer). Default = 'feature_labels' | If no pre-generated cell type information is available, use the default; if coarse cell type information is available, supply it here. |
| `cluster_res` | `.initialize()` | Resolution parameter for the Leiden clustering that determines the initial labels used to compute the 'neighborhood composition vector', which serves as input to the CellLENS model. Alternatively, users may specify a fixed number of clusters instead of a resolution (see `n_clusters` below). Default = 0.5 | A resolution of 0.5 works in most cases, but may need adjustment when the dataset is very large or very small. We suggest monitoring the verbose print-outs during this step; initial labels of 8-15 types generally yield the best results. |
| `n_clusters` | `.initialize()` | Number of clusters to generate during Leiden clustering. If None, the specified resolution is used to run Leiden clustering instead. Default = None | Same considerations as above: the user can specify the number of initial label types generated during Leiden clustering. We suggest around 8-15 clusters. |
| `size` | `.prepare_images()` | Size in pixels of the image crop taken around each individual cell. The crop is then used to extract local tissue-level morphological information in the CNN model. Default = 512 | Users should choose this value themselves, since the physical distance per pixel differs between modalities. Generally, a crop covering a physical distance of 50-200 μm works well. |
| `truncation` | `.prepare_images()` | Pixel intensity quantile used as the threshold to binarize (0, 1) the images input to the CNN model. Pixels with intensity above this quantile are set to 1; pixels below it are set to 0. Default = 0.9 | We generally found the 0.7-0.9 quantile range works well. Users can visually inspect images binarized at different quantiles before running the CNN model. |
| `cnn_latent_dim` | `CellSNAP()` | Size of the latent dimension (extracted image features) from the CNN model. Default = 128 | No need to change this parameter in most cases. |
| `gnn_latent_dim` | `CellSNAP()` | Size of the latent dimension (the extracted fused representation) from the duo-GNN model. Default = 32 | No need to change this parameter in most cases. |
| `fc_out_dim` | `CellSNAP()` | Output dimension of the expression GNN and input to the MLP head. A larger value encourages the model to learn more expression-related information. Default = 33 | No need to change this parameter in most cases. |
| `cnn_out_dim` | `CellSNAP()` | Output dimension of the spatial GNN and input to the MLP head. A larger value encourages the model to learn more image morphology-related information. Default = 11 | No need to change this parameter in most cases. |
| `round` | `.get_snap_embedding()` | Number of times the CellLENS duo-GNN model is trained; the repeated training produces a more robust final representation. Default = 5 | No need to change this parameter in most cases. |
| `k` | `.get_snap_embedding()` | SVD is run on the embeddings produced by the repeated rounds of duo-GNN training, and the top k dimensions are retained as the final representation. Default = 32 | No need to change this parameter in most cases. |
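For orientation, here is a minimal sketch of how the parameters above fit together in a typical run. The class and method names (`SNAP_Dataset`, `CellSNAP`, `.initialize()`, `.prepare_images()`, `.get_snap_embedding()`) come from the table; the exact import paths, required data arguments (e.g., the expression matrix, cell locations, and tissue images), and return values are assumptions on our part, so please defer to the tutorials for the authoritative API.

```python
import celllens

# Build the dataset object. Data arguments (expression matrix, cell
# locations, image paths) are omitted here -- see the tutorials for formats.
dataset = celllens.SNAP_Dataset(
    nbhd_composition=20,   # neighbors for the neighborhood composition vector
    feature_neighbor=15,   # neighbors in the feature similarity graph
    spatial_neighbor=15,   # neighbors in the spatial similarity graph
)

# Derive initial labels via Leiden clustering on the PCA-reduced expression.
dataset.initialize(
    pca_components=25,
    celltype='feature_labels',  # or a column of pre-generated labels
    cluster_res=0.5,            # or n_clusters=10 for a fixed cluster count
)

# Crop a 512x512-pixel image around each cell, binarized at the 0.9 quantile.
dataset.prepare_images(size=512, truncation=0.9)

# Train the duo-GNN model and retrieve the fused single-cell embedding.
# Passing the dataset to the model constructor is an assumption.
model = celllens.CellSNAP(
    dataset,
    cnn_latent_dim=128,
    gnn_latent_dim=32,
    fc_out_dim=33,
    cnn_out_dim=11,
)
embedding = model.get_snap_embedding(round=5, k=32)  # SVD over 5 training rounds
```

The resulting embedding can then be clustered or visualized with standard single-cell tools, just like any other low-dimensional cell representation.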
If you find CellLENS useful, please cite our preprint.
@article{yourcitation2024,
  author  = {Your Name et al.},
  title   = {CellLENS: A Spatial Multi-Omics Representation Learning Method},
  journal = {bioRxiv},
  year    = {2024},
  doi     = {10.1101/2024.05.12.593710}
}
We welcome contributions and feedback! Please open an issue or submit a pull request.