Question about batch effects in refine.bio datasets #455
Hi @kengcher,
Thanks for your questions and for using refine.bio. The dataset you mention (GSE99039) is submitter-processed, which means we were unable to process the data from raw files and use whatever values the authors submitted to GEO (in this case, it is reported to be RMA normalized values). We do quantile normalize submitter-processed data for delivery, but have less control over what happens prior to that step. We do not perform any batch correction (e.g., ComBat).
Looking at the description for this particular experiment, I would want to know if that separation corresponds to idiopathic PD vs. controls, but you do mention that the separation does not match any of the metadata in your post.
Hope this helps! Let me know if you have additional questions.
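For readers unfamiliar with the quantile normalization step mentioned above, here is a minimal sketch of the textbook procedure in Python with NumPy. It is not refine.bio's implementation: the function name `quantile_normalize` is hypothetical, and the reference distribution here is simply the mean of the sorted values within the matrix, which is an assumption (a pipeline may use a different, fixed reference). Note that the procedure never looks at batch labels, so it forces every sample onto the same distribution without being a batch correction in the ComBat sense.

```python
import numpy as np


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile normalize the columns (samples) of a genes x samples matrix.

    Assumption: the reference distribution is the mean of the sorted
    columns of this same matrix. Ties are broken by position rather
    than averaged, which keeps the sketch short.
    """
    # Rank of each value within its own column (0 = smallest).
    order = np.argsort(matrix, axis=0)
    ranks = np.argsort(order, axis=0)

    # Reference distribution: average the k-th smallest value across samples.
    reference = np.sort(matrix, axis=0).mean(axis=1)

    # Replace every value by the reference value at the same rank.
    return reference[ranks]
```

After this step every sample has the same set of values, only assigned to different genes, so a separation that still shows up in PCA reflects differences in gene ordering between groups of samples rather than differences in overall intensity.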
Hi Jaclyn,
Thanks for getting back!
GEO indicates that the CEL files are available:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99039
How does refine.bio decide whether to process from CEL or otherwise?
The two clusters do not match controls vs diseases:
[image: image.png]
On Mon, Mar 29, 2021 at 11:40 AM, Jaclyn Taroni wrote: [quoted copy of the comment above omitted]
We've looked into why this particular experiment was not processed from raw files and believe we may have identified a fix, which we will now need to test. If the fix works, we expect to make a version of this experiment processed from raw files available within the next few weeks. We're in the middle of some infrastructure changes for the project, so we appreciate your patience!
Hi!
I am trying to understand whether batch effects are corrected for in the refine.bio pipeline.
I downloaded the dataset GSE99039 (microarray) from refine.bio and then looked at it using PCA. I noticed that the dataset from refine.bio seems to have a clear separation that does not match any of the metadata.
[image: refine.bio PCA]
Hence, I would like to ask:
i. Where in the pipeline is the (quantile?) normalization performed?
ii. For the normalized data pipeline, was any batch correction performed?
Thank you.
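The check described in this post, running PCA and comparing the resulting separation against metadata, could be sketched roughly as follows. Everything here is illustrative: `pca_scores` and `crosstab_pc1` are hypothetical helper names, the data are synthetic (GSE99039 itself is not used), and splitting samples by the sign of PC1 is a crude stand-in for identifying the two clusters by eye.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return per-sample scores on the top principal components.

    expr is a genes x samples matrix; each gene is mean-centered first.
    """
    x = expr.T - expr.T.mean(axis=0, keepdims=True)  # samples x genes
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def crosstab_pc1(scores: np.ndarray, labels) -> dict:
    """Count samples on each side of PC1 = 0 for every metadata label."""
    counts = {}
    for score, label in zip(scores[:, 0], labels):
        key = ("PC1 > 0" if score > 0 else "PC1 <= 0", label)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Synthetic data: a hidden batch shifts half of the genes in half of the
# samples, while the (hypothetical) diagnosis label is unrelated to it.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 20
hidden_batch = np.array([0] * 10 + [1] * 10)
diagnosis = ["PD", "control"] * 10

expr = rng.normal(size=(n_genes, n_samples))
expr[:100, hidden_batch == 1] += 5.0

scores = pca_scores(expr)
print(crosstab_pc1(scores, diagnosis))              # label mixed across both sides
print(crosstab_pc1(scores, hidden_batch.tolist()))  # batch lines up with PC1
```

If a known metadata column produced a table like the second one, it would explain the separation; if every available column looks like the first, the split is coming from something not recorded in the metadata, such as processing batch.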