To automatically generate the API documentation from the docstrings, run
cd doc; sphinx-quickstart
cd ..; sphinx-apidoc -f -o doc mclearn
and the documentation can then be built with
cd doc; make html
During development, if we want changes to the package to take effect immediately without reinstalling, provide the develop argument when first installing the package:
python setup.py develop
In the Jupyter notebooks, the autoreload extension will then automatically re-import all packages every time a code cell is run:
%load_ext autoreload
%autoreload 2
Some algorithms are fairly computationally expensive, especially on the bigger datasets. One option to speed up experiments is to rent a spot instance on Amazon EC2. To start, log into the AWS Management Console and go to EC2. Then choose a region: US West (Oregon) tends to have the lowest spot prices. Create a spot request and select the Ubuntu Server 14.04 LTS (HVM) AMI. Pick an appropriate instance type, for example c3.8xlarge. Set the security group rules to
- SSH: port 22, source My IP
- HTTPS: port 443, source My IP
- Custom TCP: port 8888, source My IP
and create and download a new key pair (the .pem private key file).
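The same request can also be scripted with the AWS CLI, assuming it has been installed and configured with our credentials; the bid price is only an example, and the AMI ID, key name and security group ID below are placeholders to be filled in:
aws ec2 request-spot-instances \
    --region us-west-2 \
    --spot-price "0.50" \
    --instance-count 1 \
    --type one-time \
    --launch-specification '{
        "ImageId": "ami-<ubuntu-14.04-hvm>",
        "InstanceType": "c3.8xlarge",
        "KeyName": "<key-name>",
        "SecurityGroupIds": ["sg-<id>"]
    }'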
Next, SSH into the instance:
ssh -i <path/to/key.pem> ubuntu@<DNS>
Once logged in, start by updating the system
sudo apt-get update
sudo apt-get dist-upgrade -y
sudo apt-get install -y git fio gcc
and install Anaconda Python:
wget https://repo.continuum.io/archive/Anaconda3-5.0.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.0.1-Linux-x86_64.sh
source ~/.bashrc
conda update -y conda anaconda
conda install -y joblib
rm Anaconda3-5.0.0.1-Linux-x86_64.sh
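To confirm that the Anaconda interpreter is now the default (assuming we let the installer prepend it to the PATH):
which python
python --version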
To install the bleeding-edge version of mclearn
for development:
git clone https://github.com/chengsoonong/mclass-sky.git
cd mclass-sky; python setup.py develop
Create a Jupyter notebook configuration file:
jupyter notebook --generate-config
Prepare a hashed password:
ipython
In [1]: from notebook.auth import passwd
In [2]: passwd()
In [3]: exit
Tell the notebook to communicate over HTTPS by setting the certfile option to a self-signed certificate, which we generate first:
mkdir certificates
openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout certificates/jupyter.key -out certificates/jupyter.pem
Open the config file
vim .jupyter/jupyter_notebook_config.py
and add the following lines, replacing the example hash with the one generated above:
# Set options for certfile, ip, password, and toggle off browser auto-opening
c.NotebookApp.certfile = u'/home/ubuntu/certificates/jupyter.pem'
c.NotebookApp.keyfile = u'/home/ubuntu/certificates/jupyter.key'
# Set ip to '*' to bind on all interfaces (ips) for the public server
c.NotebookApp.ip = '*'
c.NotebookApp.password = u'sha1:<hash...>'
c.NotebookApp.open_browser = False
# It is a good idea to set a known, fixed port for server access
c.NotebookApp.port = 8888
Finally, start the notebook server with
jupyter notebook
The notebook can then be reached from a local browser at https://<DNS>:8888 (expect a warning about the self-signed certificate).
Create a project folder:
sudo mkdir ~/projects
sudo chmod 777 ~/projects
Back in the EC2 Management Console, select the running instance. Choose Actions - Image - Create Image. This will create our own customised AMI, which we can use as a starting point for future instances (without needing to set up everything from scratch again).
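The same image can also be created from the command line, again assuming the AWS CLI is configured; the instance ID and image name are placeholders:
aws ec2 create-image --instance-id <instance-id> --name "mclass-sky-base" --description "Ubuntu 14.04 with Anaconda and mclearn"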
Amazon EBS provides persistent block-level storage volumes for use with EC2 instances and is useful when working with large datasets. Create a new encrypted EBS volume and attach it to a running instance. Name it /dev/xvdf. Inside the instance, use lsblk to view the disk devices and their mount points. For new volumes, create an ext4 file system (this will format the volume and delete any existing data!):
sudo mkfs -t ext4 /dev/xvdf
sudo file -s /dev/xvdf
To mount it
sudo mount /dev/xvdf ~/projects
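If the volume should come back automatically after a reboot, an entry can be appended to /etc/fstab; this is a sketch that assumes the device name and mount point used above (the nofail option keeps the instance booting even if the volume is not attached):
echo '/dev/xvdf /home/ubuntu/projects ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab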
To transfer, for example, the data folder from the local disk to EC2:
scp -r -i <path/to/key.pem> data/. ubuntu@<DNS>:projects/mclass-sky/projects/peerjcs16/data/
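For large transfers that may be interrupted, rsync over the same key is an alternative worth considering; the paths below are the same placeholders as above:
rsync -avz -e "ssh -i <path/to/key.pem>" data/ ubuntu@<DNS>:projects/mclass-sky/projects/peerjcs16/data/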
To initialise a volume, say /dev/xvda, that is restored from a snapshot (for faster reads):
sudo fio --filename=/dev/xvda --rw=randread --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize