This repository contains the data analysis scripts used for the paper, in R Markdown (Rmd) notebook format. The notebooks briefly discuss the rationale behind the approaches taken and show the closest programmatic equivalents of the figures in the paper. The pipeline has been written so that a relatively unskilled user can clone a 'lightweight' repository and run each script straight away: the scripts are intended to be platform-independent, they install the necessary libraries themselves, and all other resources are either already included or downloaded programmatically. As I (JdN) am relatively unskilled myself, and the scripts have only been tested in RStudio on Mac OS X, some adaptation may be required; please feel free to raise an issue or contact me directly.
Scripts #1 to #4 deal with the bulk RNAseq analysis of data generated by Jerome Korzelius and colleagues. This pipeline was written by Joaquín de Navascués, based on earlier work by Aleix Puig-Barbé.
Scripts #5 to #7 deal with the analysis of scRNAseq data previously published by the Perrimon lab (Hung et al., 2020) and the Fly Cell Atlas consortium (Li et al., 2022). This pipeline was written by Vinícius Dias Nirello and adapted for sharing by Joaquín de Navascués. It uses an integration of the Hung et al. (2020) data that the authors computed from their data available at GEO and shared directly with us. Because the C++ computing libraries and compilers working under the hood of R differ between machines, these integrated data and their UMAP representation cannot easily be reproduced. We therefore provide an additional script (integration_Hung2020, based on the scripts from that publication) that shows how the analysis could have been done purely from data deposited in public repositories, and that pipes its results into script #5.
Note: This is a lightweight version of the scripts; no data are stored here. Instead, the data are downloaded automatically (and often deleted after loading), and the only figures produced are those for the notebooks (figures for the data are saved in a separate folder). However, an archived version with the datasets and final figures is available from Zenodo:
- Joaquín de Navascués @jdenavascues/ORCID
- Vinícius Dias Nirello Google Scholar/GitHub
- Aleix Puig-Barbé @AleixPuig7/ORCID/GitHub
Code snippets taken from Stack Overflow and other sources are credited with links where they are used.
Work supported by:
- funding from Cardiff University and the University of Essex
- NC3Rs SKT grant NC/W001047/1
- DFG Grant KO5594/1-1
- an EMBO Long-Term Fellowship
- FAPESP fellowship #2021/00393-9
- FAPESP São Paulo Excellence Chair #2019/16113-5
- ERC Advanced Grant no. 268515