Documentation is the most important part of any scientific project. Please read the instructions here carefully and follow them. This will ensure:
- You make good progress on your project in an organized manner
- When you look back at your work one year from now, you will be able to still make sense of it and use it (it is surprisingly easy and common to become completely blank about old code, old decisions or old work in a long project!)
- When you leave and someone else takes up your project, they are able to actually use the progress you made and move the project forward. This will benefit you as well when the project leads to a manuscript!
You can read this as well.
- Create a repository for your project using this repository as a template (see the "Use this template" button on the top right? Create a new repository from there!).
- Select "Include all branches" on the next page.
- The owner should be `csndl-iitd`.
- Give the repository a name (and description, if you want) based on your project.
- Keep it public and hit "Create repository".
- Install GitHub Desktop on your computer (not necessary, but it will make your life easier).
- Clone the newly created repository to your computer using GitHub Desktop.
- Now read the rest of this page to understand how to use your repository.
All project-related discussion will happen through issues on GitHub.com. Create the first issue, named "Main discussion thread", if it is not already present, and add the overall goal of the project there.
In our weekly meetings, we will decide action items that you need to work on. Add each one either as a new issue or as a continued discussion in an existing issue, and reference it in the Main discussion thread. Also systematically assign each issue to yourself or to whoever is responsible for it.
For each weekly meeting you should create a milestone on GitHub.com. All the action items that we discuss for the next week should be assigned to the next milestone. Once you finish an action item, add the results to the corresponding issue. That way the results are automatically collected together in the milestone before our next meeting.
All project-related reading you do should be added as PDFs to the references folder. The files should be named following the convention `author et al_year_title.pdf`. Refer to this file in your issues when you refer to the paper. Typically you will also have an issue related to reading the paper, where you can summarize whatever in the paper was relevant to our project.
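To make the naming convention concrete, here is a minimal Python sketch that builds such a file name. The function name `reference_filename`, the single-author handling, and the choice to drop characters that are unsafe in file names are assumptions made for illustration, not rules of this repository. The author and title in the example are placeholders, not a real reference.

```python
import re


def reference_filename(first_author: str, year: int, title: str,
                       multiple_authors: bool = True) -> str:
    """Build a file name of the form 'author et al_year_title.pdf'."""
    # Assumption: drop characters that are not allowed in file names on
    # common operating systems, then collapse repeated whitespace.
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title)
    safe_title = " ".join(safe_title.split())
    # Assumption: single-author papers omit the "et al" part.
    author = f"{first_author} et al" if multiple_authors else first_author
    return f"{author}_{year}_{safe_title}.pdf"


# Placeholder values, for illustration only.
print(reference_filename("Doe", 2020, "An example title"))
# Doe et al_2020_An example title.pdf
```

If you use a helper like this, you can paste the resulting file name directly into the issue where you discuss the paper.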
Remember to switch to the main branch before adding references to this folder. Otherwise it will become difficult to track references scattered across branches. This matters only if you ever create other branches for your analysis code.
I recommend using Zotero for managing your references. It is easy to use, lightweight, free of charge and free of crap. It also has all the goodies to integrate with the word processor of your choice.
If you are designing an experiment, all experiment files should go in the `experiment` folder. Please follow these steps:
- Create a new branch on GitHub named `experiment` if it does not already exist. Typically it should exist.
- Install GitHub Desktop on the computer on which the experiment will live, and clone this repository there.
- Switch to the `experiment` branch.
- Create the experiment directly in the experiment folder of this repository only.
- Every time you update the experiment, commit the changes to git, including a short description of what you changed. You can use GitHub Desktop to do this efficiently.
The `analysis` folder should be used for keeping all your code and Jupyter notebooks.
- This folder must contain a `requirements.txt` file. This file keeps track of all the Python packages and their versions that you use in your analysis.
- Keep this file updated by regularly running `pip freeze > requirements.txt`.
- Create your own branch from the analysis-template branch for all your analysis. Only final scripts and notebooks can be copied to the main branch after they are confirmed to work.
- Add proper `library` and `notebooks` folders to keep the code organized nicely.
- Commit regularly, and certainly whenever you complete an issue, and include a proper commit message and description.
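Since `requirements.txt` is meant to record package versions, a quick check for entries that do not pin an exact version can be useful. The following is a minimal sketch. The helper name `unpinned_requirements` and its skip rules are assumptions made for illustration, and the example file contents are made up.

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Assumption: "name @ location" and "-e ..." entries, which
        # pip freeze can also emit, are accepted and not reported.
        if " @ " in line or line.startswith("-e "):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Made-up file contents, for illustration only.
example = """numpy==1.26.4
pandas
# plotting
matplotlib>=3.8
"""
print(unpinned_requirements(example))  # ['pandas', 'matplotlib>=3.8']
```

To check the real file, pass in its contents, for example `Path("analysis/requirements.txt").read_text()` after importing `Path` from `pathlib`. A file written by `pip freeze` should normally produce an empty list.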
All data should live in the `data` folder. There are two sub-folders by default: `raw` and `processed`. As the names suggest, raw data should go in the `raw` folder. All the results of your processing should go in the `processed` folder. As needed, add other structure to keep the data well organized.
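One way to keep the raw and processed folders in step is to derive each output path from its input path, as in the sketch below. The function name `processed_path`, the `.csv` default suffix, and the example file names are assumptions made for illustration. Only the `data`, `raw` and `processed` folder names come from this repository.

```python
from pathlib import Path


def processed_path(raw_file: Path, data_root: Path, suffix: str = ".csv") -> Path:
    """Map a file under data/raw to the matching location under data/processed."""
    # relative_to raises ValueError if raw_file is not inside data/raw,
    # which guards against processing files from the wrong place.
    relative = raw_file.relative_to(data_root / "raw")
    return (data_root / "processed" / relative).with_suffix(suffix)


# Hypothetical file names, for illustration only.
data_root = Path("data")
raw_file = data_root / "raw" / "session1" / "subject01.txt"
print(processed_path(raw_file, data_root).as_posix())
# data/processed/session1/subject01.csv
```

Mirroring the sub-folder structure of `raw` inside `processed` makes it easy to trace every processed file back to the raw file it came from.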
As we use this system more, we will figure out how to handle large datasets. The readme will be updated accordingly.