Setting up environment

The information here is intended to make it easier for people to set up Jupyter Lab. To keep the notebooks in a neat state, please restart the kernel and run all cells prior to committing any changes.

Environments

When working on a data science project, it is very helpful to have control over your computational environment, e.g. the version of Python you are using and the versions of the packages. The following methods are worth keeping in mind.
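To see what your current environment actually contains, a small script along these lines can help. It is only a sketch: the helper name package_versions is made up for illustration, and the package list is taken from the install commands later in this document.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # First field of sys.version is the interpreter version string.
    print("python", sys.version.split()[0])
    # Package names taken from the install list in this document.
    course_packages = ["jupyterlab", "numpy", "scipy", "matplotlib", "pandas", "statsmodels"]
    for name, ver in package_versions(course_packages).items():
        print(name, ver if ver is not None else "not installed")
```

Running this inside and outside an environment is a quick way to confirm that the two really differ.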

Google Colab

Probably the simplest way to run the notebooks for these tutorials is Google’s Colab. This is a browser-based notebook environment which only requires a Google account and a browser installed on your machine. Once you open Colab, click the GitHub tab and paste in the URL of this repository: https://github.com/aezarebski/aas-extended-examples. This should then give you a list of the detected notebooks to select from, as shown in the figure below.

Figure: Colab screenshot (./.resources/colab-screenshot-2022-10-12.png)

Anaconda

Anaconda is a distribution of Python intended for data science. The standard individual version includes a wide variety of packages and applications, including Jupyter Lab and all of the packages used in this course. Therefore, an easy way to get set up for this course is simply to install Python through Anaconda.

The second key feature of Anaconda is the conda package manager. You may have run across pip package management in the past, where you install new packages using pip install packageName. Conda enhances this behaviour by ensuring consistency between newly installed and existing packages, and is run using conda install packageName. When possible, this approach is preferred to pip because you do not have the risk of breaking something you’ve already installed. However, conda does not have the same coverage as pip, and is sometimes less up to date. Conda can also be substantially slower if you have a large number of packages in your environment. If conda has issues, you can always fall back to pip, keeping in mind the risks.

Virtual environment

Setting up and using the virtual environment

If you are comfortable with the command line and already have Python installed, you can use the built-in venv module to set up a virtual environment. Just follow the steps below in your terminal. We provide a requirements.txt file which lets you install all of the necessary packages in one command, as shown below.

# create and activate the virtual environment
python3 -m venv venv
source venv/bin/activate
# upgrade to the latest version of pip
pip install -U pip
# install required packages
pip install -r requirements.txt

Once you have a virtual environment working with jupyterlab installed, starting Jupyter Lab is as simple as

jupyter lab

and pointing your browser to the address printed out. When you are finished working with this virtual environment, deactivate it by running deactivate in your terminal.
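If you are unsure whether the virtual environment is currently active, you can ask Python itself. The following is a minimal sketch that assumes the environment was created with the built-in venv module as above; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

After running deactivate, the same script run with the system Python should report that no virtual environment is active.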

Building your own virtual environment

If you want to understand how we built this requirements.txt file, see the following instructions. You don’t need to use this unless you want to tweak the setup for your own projects.

# create and activate the virtual environment
python3 -m venv venv
source venv/bin/activate
# upgrade to the latest version of pip
pip install -U pip
# install required packages
pip install jupyterlab
pip install numpy
pip install scipy
pip install matplotlib
pip install pandas
pip install statsmodels

Making the requirements.txt file is easy: from a session with your environment activated, run the command pip freeze > requirements.txt. The pip freeze command prints the packages installed in the current environment along with their versions, and > requirements.txt redirects that output into a file of that name.
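The pinned lines that pip freeze writes have the form name==version. As a rough illustration of that format, the sketch below reads such lines into a dictionary. The function name parse_pinned is hypothetical, the version numbers in the example are arbitrary placeholders, and lines in other forms (for example direct URL references) are simply skipped.

```python
def parse_pinned(text):
    """Parse 'name==version' lines, as printed by pip freeze, into a dict."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Lines without an exact pin are ignored in this sketch.
        if "==" not in line:
            continue
        name, _, ver = line.partition("==")
        pins[name.strip().lower()] = ver.strip()
    return pins


# Illustrative input only; these version numbers are placeholders.
example = """\
# example output
numpy==1.0.0
pandas==2.0.0
"""
print(parse_pinned(example))
```

Comparing the dictionaries from two requirements.txt files is one simple way to see which package versions changed between them.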

Nix shell

If you do not have Jupyter Lab set up on your machine, the following option offers a simple way to set it up, provided you are happy to install Nix. The shell.nix file describes a package for running the notebooks in this repository. To activate the notebook server, run the following command.

nix-shell --command "jupyter lab"

The shell.nix file is based on the instructions provided by the jupyterWith repository from Tweag.

The repository

If you have cloned this repository and are working through the notebooks as part of the Applied Analytical Statistics course, it might be useful to download GitHub Desktop to assist in keeping your notebooks up to date with the versions online.

Colophon

Jupytext is an extension that allows you to pair a Jupyter notebook with a plain text version of it. If you have this set up as described in the installation guide and the notebooks are paired properly, then you can work on either version and Jupyter Lab will keep them in sync. This supports both Python code and basic LaTeX (presumably via MathJax), so it should be sufficient for most writing.

Why should you care about this? Because having a sensible plain text version of these notebooks will make editing and version control much easier, and opens the possibility of automating the creation of “questions” notebooks from the “answers” notebooks. Plus we get all the nice benefits of things like spell checking!

Note: It does not appear that the vanilla pandoc method generates sensible IPython notebooks.

There is a script ./resources/check-questions-text.py that looks at the lines in a file (the plain text question files) and reports if there are any lines that might correspond to answers having snuck in. We should probably run this prior to committing code each time to make sure the notebooks are in decent shape. In the future, this could also be a way to do other automated checks on the notebooks.
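To give a feel for what such a check might look like, here is a minimal sketch. It is not the script in this repository: the criteria the real script uses are not described here, so the pattern below (any line mentioning "answer") and the function name suspicious_lines are assumptions for illustration only.

```python
import re

# Hypothetical marker: the real script's criteria are not described in this
# document, so this pattern is an assumption for illustration only.
ANSWER_PATTERN = re.compile(r"\banswers?\b", re.IGNORECASE)


def suspicious_lines(text, pattern=ANSWER_PATTERN):
    """Return (line_number, line) pairs for lines matching the pattern."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# A made-up question file with one line that looks like an answer.
sample = "# Question 1\nx = 1\n# Answer: x + 1 is 2\n"
for number, line in suspicious_lines(sample):
    print(f"line {number}: {line}")
```

A check like this could be run on each plain text question file before committing, with the pattern adjusted to however answers are actually marked in the notebooks.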