Question regarding "GMM definition mode" #100

Open
sjspielman opened this issue Aug 17, 2023 · 0 comments

Hi CopyKAT maintainers, thanks for this package! I'm opening this issue to learn more about one aspect of the method, described in the publication as follows:

The cluster with minimal estimated variance is defined as the ‘confident diploid cells’ by following a strict classification criterion. Potential misclassifications may occur when the data have only a few normal cells or when the tumor cells have near-diploid genomes with limited copy number aberration (CNA) events. In this case, CopyKAT provides a ‘GMM definition’ mode to identify the diploid normal cells one by one, where a mixture of three Gaussian models of gene expression in single cells is assumed to represent genomic gains, losses and neutral states. A single cell is then defined as a confident diploid cell when genes in neutral states account for at least 99% of the expressed genes.
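
To check that I'm reading this paragraph correctly, here is a minimal Python sketch of how I understand the per-cell rule. It is not CopyKAT's implementation (the package is written in R). The function names, the initial means of -0.2/0/0.2, the 0.1 tolerance around zero, and the choice to measure the neutral share as the total mixture weight of components centered near zero are all my own assumptions. Only the three-component Gaussian mixture and the 99% threshold come from the text above.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_three_gaussians(values, init_means=(-0.2, 0.0, 0.2), init_sd=0.1, n_iter=100):
    """Fit a 1-D mixture of three Gaussians with plain EM (illustrative only).

    The three components are meant to stand for loss, neutral and gain.
    The starting values are assumptions, not taken from the paper.
    """
    n = len(values)
    means = list(init_means)
    sds = [init_sd] * 3
    weights = [1.0 / 3.0] * 3
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gene.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            if nk < 1e-8:
                continue  # effectively empty component: keep its old mean and sd
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor to avoid a collapsed component
    return weights, means, sds


def neutral_fraction(values, tol=0.1):
    """Share of genes attributed to components centered near zero (assumed 'neutral')."""
    weights, means, _ = fit_three_gaussians(values)
    return sum(w for w, m in zip(weights, means) if abs(m) <= tol)


def is_confident_diploid(values, min_neutral=0.99, tol=0.1):
    """Apply the 99% rule quoted above to one cell's relative expression values."""
    return neutral_fraction(values, tol) >= min_neutral


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated cells: one with all genes near zero, one with gains and losses.
    flat_cell = [rng.gauss(0.0, 0.02) for _ in range(1000)]
    altered_cell = (
        [rng.gauss(-0.3, 0.02) for _ in range(200)]
        + [rng.gauss(0.0, 0.02) for _ in range(600)]
        + [rng.gauss(0.3, 0.02) for _ in range(200)]
    )
    print("flat cell neutral fraction:", neutral_fraction(flat_cell))
    print("altered cell neutral fraction:", neutral_fraction(altered_cell))
```

If this reading is right, a cell whose genes all sit near zero would pass the 99% rule, while a cell with a sizable block of gained or lost genes would not. Please correct me if the actual criterion works differently.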

I am hoping to use CopyKAT on some pediatric scRNA-seq data, which have far fewer aberrations than a typical adult cancer sample. Since I expect misclassifications in pediatric data, I would like to understand how to enable the "GMM definition mode" referenced in this paragraph. However, I don't see anything about this setting in the main copykat() function. Is this mode applied automatically by the package depending on certain internal results, or is there something I should specify when calling copykat() to invoke it?
I've already seen that a correlation-based distance measure is probably preferable to Euclidean distance for my circumstances, so now I'm looking for other adjustments like this that could help the CopyKAT algorithm work with my pediatric data.

Thanks very much for any advice here!
