The information here is intended to make it easier for people to set up Jupyter Lab. To keep the notebooks in a neat state, please restart the kernel and run all cells prior to committing any changes.
When working on a data science project, it is very helpful to have control over your computational environment, e.g. the version of Python you are using and the versions of the packages installed. The following methods are worth keeping in mind.
Probably the simplest way to run the notebooks for these tutorials is Google’s Colab. This is a browser-based notebook environment which requires only a web browser on your machine and a Google account. Once you open Colab, click the GitHub tab and paste in the URL of this repository: https://github.com/aezarebski/aas-extended-examples. This should then give you a list of the detected notebooks to select from, as shown in the figure below.
Anaconda is a distribution of Python intended for data science. The standard individual edition includes a wide variety of packages and applications, including Jupyter Lab and all of the packages used in this course. Therefore, an easy way to get set up for this course is simply to install Python through Anaconda.
The second key feature of Anaconda is the conda package manager. You may have come across pip package management in the past, where you install new packages using pip install packageName. Conda improves on this by checking that newly installed packages are consistent with the packages already in your environment, and is run using conda install packageName. When possible, this approach is preferred to pip because it reduces the risk of breaking something you’ve already installed. However, conda does not have the same package coverage as pip, and its packages are sometimes less up to date. Conda can also be substantially slower if you have a large number of packages in your environment. If conda has issues, you can always fall back to pip, keeping these risks in mind.
If you are comfortable with the command line and already have Python installed, you can use Python’s built-in venv module to set up a virtual environment. This is simple: just follow the steps below in your terminal. We made a requirements.txt file which lets you install all of the necessary packages in one command, as shown below.
# create and activate the virtual environment
python3 -m venv venv
source venv/bin/activate
# upgrade to the latest version of pip
pip install -U pip
# install required packages
pip install -r requirements.txt
Once you have a working virtual environment with the jupyterlab package installed, starting Jupyter Lab is as simple as running
jupyter lab
and pointing your browser to the address it prints. When you have finished working in this virtual environment, deactivate it by running deactivate in your terminal.
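A common source of confusion is a notebook kernel that is not using the environment you expect. As a minimal sketch (the function name in_virtual_environment is our own, not part of any library), you can check from Python, e.g. in a notebook cell, whether the running interpreter belongs to a virtual environment: inside a venv, sys.prefix points at the environment directory while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if this interpreter is running inside a venv-style environment.

    In an environment created with `python3 -m venv`, sys.prefix is the
    environment directory, while sys.base_prefix is the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("Not in a virtual environment; did you run source venv/bin/activate?")
```

In a notebook this reports on the kernel's interpreter, which is not necessarily the Python your terminal uses. It only detects venv-style environments: a conda environment is a separate Python installation, so the check returns False there.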
If you want to understand how we built this requirements.txt file, see the following instructions. You don’t need to use them unless you want to tweak the setup for your own projects.
# create and activate the virtual environment
python3 -m venv venv
source venv/bin/activate
# upgrade to the latest version of pip
pip install -U pip
# install required packages
pip install jupyterlab
pip install numpy
pip install scipy
pip install matplotlib
pip install pandas
pip install statsmodels
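Before freezing the environment, it can help to confirm which versions of the packages above were actually installed. A minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); installed_versions and COURSE_PACKAGES are hypothetical names chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# The packages installed by the pip commands above.
COURSE_PACKAGES = [
    "jupyterlab",
    "numpy",
    "scipy",
    "matplotlib",
    "pandas",
    "statsmodels",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(COURSE_PACKAGES).items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```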
Making the requirements.txt file is easy: from a session with your environment activated, run the command pip freeze > requirements.txt. The pip freeze command prints the packages installed in the current environment, typically as name==version lines pinned to their exact versions, and > requirements.txt redirects that output into a file of that name.
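Because pip freeze typically writes one name==version line per package, the resulting file is easy to inspect from Python, for example to compare the versions in two environments. The helper parse_pins below is a hypothetical sketch, not part of pip: it only understands exact == pins and skips blank lines, comments, and other requirement forms such as editable (-e) or direct URL installs.

```python
from pathlib import Path


def parse_pins(text):
    """Return {package: version} for lines of the form 'name==version'.

    Blank lines and other requirement forms (for example editable '-e'
    installs or 'name @ URL' lines) are skipped; a trailing comment or an
    environment marker after ';' is dropped before parsing.
    """
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and any environment marker after ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if line.startswith("-") or "==" not in line:
            continue
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


if __name__ == "__main__":
    # Print the pins from requirements.txt in the current directory, if present.
    path = Path("requirements.txt")
    if path.exists():
        for name, ver in sorted(parse_pins(path.read_text()).items()):
            print(name, ver)
```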
If you do not have Jupyter Lab set up on your machine, the following option offers a simple way to set it up, provided you are happy to install Nix. The shell.nix file describes an environment for running the notebooks in this repository. To start the notebook server, run the following command.
nix-shell --command "jupyter lab"
The shell.nix file is based on the instructions provided by the jupyterWith repository from Tweag.
If you have cloned this repository and are working through the notebooks as part of the Applied Analytical Statistics course, you might find GitHub Desktop useful for keeping your notebooks up to date with the versions online.
Jupytext is an extension that allows you to pair a Jupyter notebook with a plain text version of it. If you have it set up as described in the installation guide and the notebooks are paired properly, you can work on either version and Jupyter Lab will keep the two in sync. This supports both Python code and basic LaTeX (presumably via MathJax), so it should be sufficient for most writing.
Why should you care about this? Because having a sensible plain text version of these notebooks makes editing and version control much easier, and opens up the possibility of automatically creating the “questions” notebooks from the “answers” notebooks. Plus we get all the nice benefits of things like spell checking!
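To give a sense of what the plain text side of a pair can look like, here is a tiny notebook written in Jupytext’s “percent” format, one of several text formats Jupytext supports; we are not claiming it is the format used by the notebooks in this repository. Code cells begin with # %%, and Markdown cells begin with # %% [markdown], with their text (including any LaTeX) kept in comments. Files written by Jupytext usually also start with a metadata header, which is omitted here.

```python
# %% [markdown]
# # Sample mean
# We estimate the mean $\mu$ of a sample $x_1, \dots, x_n$ using
# $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

# %%
from statistics import mean

sample = [2.0, 4.0, 6.0, 8.0]
x_bar = mean(sample)
print(x_bar)
```

Because this is an ordinary Python script, it diffs cleanly under version control and can also be run directly with python.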
Note: the vanilla pandoc method does not appear to generate sensible IPython notebooks.
There is a script ./resources/check-questions-text.py that looks at the lines in a file (one of the plain text question files) and reports any lines that might correspond to answers having snuck in. We should probably run it each time before committing code to make sure the notebooks are in decent shape. In the future, this could also be a way to run other automated checks on the notebooks.
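The following is not the repository’s ./resources/check-questions-text.py; it is only a hypothetical sketch of how a check of this kind might work. It flags any line matching a list of suspect patterns, and the patterns in SUSPECT_PATTERNS are invented for illustration rather than taken from the real script.

```python
import re
import sys
from pathlib import Path

# Hypothetical patterns that might indicate an answer has leaked into a
# question file; the real script may use entirely different rules.
SUSPECT_PATTERNS = [
    re.compile(r"\banswer\b", re.IGNORECASE),
    re.compile(r"\bsolution\b", re.IGNORECASE),
]


def suspicious_lines(lines):
    """Yield (line_number, line) pairs for lines matching any suspect pattern."""
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            yield number, line.rstrip("\n")


def main(paths):
    """Report suspect lines in each file; return 1 if any were found, else 0."""
    found = False
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for number, line in suspicious_lines(handle):
                print(f"{path}:{number}: {line}")
                found = True
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only run the check when file paths are given on the command line.
    sys.exit(main(sys.argv[1:]))
```

Returning a non-zero exit status when something is flagged would let a check like this be wired into a pre-commit hook, in line with running it before each commit.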