
New tutorials and polish (#23)
- Add / update new tutorials formatted as Jupyter notebooks
- Add "scripts to rule them all", using the `em-core` scripts as a template
- Updated and defined the S3 client
- Added persistent system environment variable instructions
- README Updates
- Replaced static links to original repo with dynamic links for this repo
- Improved organization
- Various markdown linting

Co-authored-by: Ryan Earley <[email protected]>
Co-authored-by: Nadia Dimitrova <[email protected]>
Co-authored-by: imgbot[bot] <31301654+imgbot[bot]@users.noreply.github.com>
4 people authored Sep 30, 2020
1 parent 67ea474 commit 77da4ae
Showing 52 changed files with 3,765 additions and 80 deletions.
72 changes: 47 additions & 25 deletions README.md
# LADI Tutorials

Tutorials for the Low Altitude Disaster Imagery (LADI) dataset. This tutorial was originally forked from a [Penn State Learning Factory](https://www.lf.psu.edu/) capstone project.

- [LADI Tutorials](#ladi-tutorials)
  - [Point of Contact](#point-of-contact)
  - [Initial Setup](#initial-setup)
    - [Persistent System Environment Variables](#persistent-system-environment-variables)
    - [Scripts](#scripts)
  - [Tutorials - Accessing the Dataset](#tutorials---accessing-the-dataset)
  - [Tutorials - Metadata Analysis](#tutorials---metadata-analysis)
  - [Tutorials - Machine Learning](#tutorials---machine-learning)
  - [Distribution Statement](#distribution-statement)

## Point of Contact

We encourage the use of [GitHub Issues](https://guides.github.com/features/issues/), but when email is required, please contact the administrators at [[email protected]](mailto:[email protected]). As the public safety and computer vision communities adopt the dataset, a separate mailing list for development may be created.

## Initial Setup

This section specifies the run order and requirements for the initial setup of the repository. Other repositories in this organization may rely on this setup being completed.

### Persistent System Environment Variables

Immediately after cloning this repository, [create a persistent system environment variable](https://superuser.com/q/284342/44051) titled `LADI_DIR_TUTORIAL` whose value is the full path to this repository's root directory.

On unix there are many ways to do this; here is an example using [`/etc/profile.d`](https://unix.stackexchange.com/a/117473). Create a new file `ladi-env.sh` using `sudo vi /etc/profile.d/ladi-env.sh` and add the command to set the variable:

```bash
export LADI_DIR_TUTORIAL=/path/to/ladi-tutorial
```

You can confirm that `LADI_DIR_TUTORIAL` was set on unix by inspecting the output of `env`.

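Tutorial code can then resolve paths against this variable at runtime. Here is a minimal Python sketch; the `data` subdirectory is an assumption based on this repository's layout, not a requirement.

```python
import os
from pathlib import Path

# Resolve the repository root from the persistent environment variable.
ladi_dir = os.environ.get("LADI_DIR_TUTORIAL")
if ladi_dir is None:
    raise EnvironmentError("LADI_DIR_TUTORIAL is not set; see the instructions above.")

# Example: point at the data directory populated by script/setup.sh (assumed layout).
data_dir = Path(ladi_dir) / "data"
print(f"Reading LADI tutorial data from: {data_dir}")
```
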
[PyTorch Data Loading](./Tutorials/Pytorch_Data_Load.md)
### Scripts

This documentation is about loading LADI dataset in PyTorch framework including examples of writing custom `Dataset`, `Transforms` and `Dataloader`.
This is a set of boilerplate scripts describing the [normalized script pattern that GitHub uses in its projects](https://github.blog/2015-06-30-scripts-to-rule-them-all/). The [GitHub Scripts To Rule Them All](https://github.com/github/scripts-to-rule-them-all) was used as a template. Refer to the [script directory README](./script/README.md) for more details.

You will need to run these scripts in this order to download package dependencies and download all of the necessary data to get you started.

## Tutorials - Accessing the Dataset

A set of tutorials focused on installing AWS tools and configuring the AWS environment to download the LADI dataset and load it in Python, locally and remotely. There is also a short tutorial on how to clean and validate the data.

- [Getting Started](./tutorials/Get_Started.md)
- [Clean and Validate LADI Dataset](./tutorials/Clean_Validate.md)

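The Getting Started tutorial covers the AWS setup in detail. As a preview, here is a minimal boto3 sketch for browsing a public bucket anonymously; the bucket name and prefix are illustrative assumptions, so confirm the actual values in the tutorial.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned (anonymous) client for a public bucket; no AWS credentials required.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Bucket and prefix below are illustrative -- see the Getting Started tutorial
# for the actual LADI bucket layout.
response = s3.list_objects_v2(Bucket="ladi", Prefix="Images/", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download a single object to a local file:
# s3.download_file("ladi", "Images/example.jpg", "example.jpg")
```
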
## Tutorials - Metadata Analysis

These tutorials are Jupyter (Python 3) notebooks that demonstrate how to perform geospatial analysis by enriching the LADI metadata with third-party GIS information. One tutorial counts the images taken within each administrative boundary (e.g. US states) and assigns each state a color based on that count. The other filters images by a specific annotation and performs various geospatial measurements on the resulting subset.

- [ISO-3166-2 Administrative Boundaries](./tutorials/Geospatial-Hurricane-Analysis.ipynb)
- [Geospatial Hurricane Analysis](./tutorials/Geospatial-Hurricane-Analysis.ipynb)

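The core point-in-polygon workflow behind these notebooks can be sketched with geopandas. This is a minimal illustration rather than the notebooks' actual code; the metadata and shapefile names are placeholder assumptions (the state boundaries correspond to the Census TIGER/Line download described under `data/Census-State`).

```python
import pandas as pd
import geopandas as gpd

# Placeholder file names -- script/setup.sh downloads the actual metadata and boundaries.
meta = pd.read_csv("ladi_metadata.csv")  # assumed to contain image lat/lon columns
images = gpd.GeoDataFrame(
    meta,
    geometry=gpd.points_from_xy(meta["gps_lon"], meta["gps_lat"]),
    crs="EPSG:4326",
)

# TIGER/Line state boundaries (see data/Census-State); reproject to match the points.
states = gpd.read_file("tl_2017_us_state.shp").to_crs(images.crs)

# Point-in-polygon spatial join, then count images per state.
# Note: op= is the geopandas 0.6 keyword; newer releases call it predicate=.
joined = gpd.sjoin(images, states, how="inner", op="within")
print(joined.groupby("NAME").size().sort_values(ascending=False).head())
```
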
## Tutorials - Machine Learning

These tutorials focus on training and testing a classifier, either with a Convolutional Neural Network (CNN) built from scratch or with pre-trained ResNet and AlexNet models.

- [PyTorch Data Loading](./tutorials/Pytorch_Data_Load.md)
- [Train and Test A Classifier](./tutorials/Train_Test_Classifier.md)
- [Fine Tuning Torchvision Models](./tutorials/Fine_Tune_Torchvision_Models.md)

The data loading tutorial includes examples of writing custom `Dataset`, `Transforms`, and `DataLoader` classes for loading the LADI dataset in the PyTorch framework.

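For orientation, a custom PyTorch `Dataset` generally follows the skeleton below. This is an illustrative sketch with assumed CSV column names, not the tutorial's exact code.

```python
import os
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class LadiDataset(Dataset):
    """Minimal custom Dataset sketch; the CSV columns below are assumptions."""

    def __init__(self, csv_file, root_dir, transform=None):
        self.frame = pd.read_csv(csv_file)  # assumed: one row per image (img_path, label)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        image = Image.open(os.path.join(self.root_dir, row["img_path"])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, torch.tensor(row["label"], dtype=torch.long)

# Usage sketch:
# loader = DataLoader(LadiDataset("labels.csv", "images/"), batch_size=32, shuffle=True)
```
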
## Distribution Statement

[BSD 3-Clause License](LICENSE)
7 changes: 0 additions & 7 deletions Tutorials/Model Script/README.md

This file was deleted.

19 changes: 19 additions & 0 deletions data/Census-AHS/README.md
# Census-AHS

The Census Bureau's American Housing Survey (AHS) provides information about the physical costs and conditions of homes, the characteristics of the people living in those homes, and characteristics relevant for disaster response, covering more than 60,000 Americans.

## Download Instructions

### Script (Recommended)

[`script/setup.sh`](../../script/setup.sh) is used to set up the project in an initial state. It will download and extract the data.

### Manual

Although not recommended, the data can be downloaded manually:

1. Go to the US Census national public use file page: [Census](https://www.census.gov/programs-surveys/ahs/data/2017/ahs-2017-public-use-file--puf-/ahs-2017-national-public-use-file--puf-.html)
2. Download the AHS 2017 National PUF v3.0 CSV (or the latest version)

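Once extracted, the PUF is a plain CSV that can be loaded with pandas. A minimal sketch follows; the file name is an assumption, so substitute whatever the download extracts.

```python
import pandas as pd

# File name is an assumption -- substitute the CSV extracted from the PUF zip.
ahs = pd.read_csv("ahs2017n.csv", low_memory=False)  # wide survey table
print(ahs.shape)
print(ahs.columns[:10].tolist())
```
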
## Distribution Statement

[BSD 3-Clause License](https://github.com/LADI-Dataset/ladi-tutorial/blob/master/LICENSE)
20 changes: 20 additions & 0 deletions data/Census-CBSA/README.md
# Census-CBSA

Census-CBSA defines US Metropolitan and Micropolitan statistical areas. This Census data is provided as a geospatial-enabled file format.

## Download Instructions

### Script (Recommended)

[`script/setup.sh`](../../script/setup.sh) is used to set up the project in an initial state. It will download and extract the data.

### Manual

Although not recommended, the data can be downloaded manually:

1. Go to the US Census Bureau open data website: [US-CBSA](https://catalog.data.gov/dataset/tiger-line-shapefile-2019-nation-u-s-current-metropolitan-statistical-area-micropolitan-statist)
2. Download the latest version of the shapefile zip file

## Distribution Statement

[BSD 3-Clause License](https://github.com/LADI-Dataset/ladi-tutorial/blob/master/LICENSE)
21 changes: 21 additions & 0 deletions data/Census-State/README.md
# Census-State

Census-State defines statistical areas for all US states. This Census data is provided as a geospatial-enabled file format.


## Download Instructions

### Script (Recommended)

[`script/setup.sh`](../../script/setup.sh) is used to set up the project in an initial state. It will download and extract the data.

### Manual

Although not recommended, the data can be downloaded manually:

1. Go to the US Census Bureau open data website: [US-State](https://catalog.data.gov/dataset/tiger-line-shapefile-2017-nation-u-s-current-state-and-equivalent-national)
2. Download the latest version of the shapefile zip file

## Distribution Statement

[BSD 3-Clause License](https://github.com/LADI-Dataset/ladi-tutorial/blob/master/LICENSE)
16 changes: 16 additions & 0 deletions data/FAA-Airports/README.md
# FAA Airports

An airport is an area on land or water intended to be used, wholly or in part, for the arrival, departure, and surface movement of aircraft and helicopters. This airport data is provided in a vector geospatial-enabled file format.

## Download Instructions

### Script (Recommended)

[`script/setup.sh`](../../script/setup.sh) is used to set up the project in an initial state. It will download and extract the data.

### Manual

Although not recommended, the data can be downloaded manually:

1. Go to the FAA open data website: [Airports](https://ais-faa.opendata.arcgis.com/datasets/e747ab91a11045e8b3f8a3efd093d3b5_0)
2. Select Shapefile from the Download drop-down
17 changes: 17 additions & 0 deletions data/Natural-Earth/README.md
# Natural-Earth

The Natural Earth file is a comprehensive map of the world and its administrative boundaries (states, FIPS areas, countries). This map is provided as a geospatial-enabled file format.

## Download Instructions

### Script (Recommended)

[`script/setup.sh`](../../script/setup.sh) is used to set up the project in an initial state. It will download and extract the data.

### Manual

Although not recommended, the data can be downloaded manually:

1. Go to the Natural Earth download page: [Natural-Earth](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/)
2. Download states and provinces version 4.1.0 (or the latest version)

## Distribution Statement

[BSD 3-Clause License](https://github.com/LADI-Dataset/ladi-tutorial/blob/master/LICENSE)
File renamed without changes.
4 changes: 2 additions & 2 deletions Tutorials/README.md → data/metadata/README.md
# Data

Default directory for sample data used by tutorials.

## Distribution Statement

18 files renamed without changes.
18 changes: 18 additions & 0 deletions requirements.txt
boto3 == 1.9.66
contextily == 1.0.0
contextlib2 == 0.6.0.post1
fiona == 1.8.13
jupyter >= 1.0.0
jupyter-client >= 6.1.6
jupyter-console >= 6.1.0
jupyter-core >= 4.6.3
jupyterlab >= 2.2.5
jupyterlab-server >= 1.2.0
geopandas == 0.6.1
geopy == 2.0.0
matplotlib == 3.3.1
numpy == 1.19.1
pandas == 1.1.0
path == 15.0.0
pyproj == 2.6.1.post1
shapely == 1.7.0
156 changes: 156 additions & 0 deletions script/README.md
# Scripts

This is a set of boilerplate scripts describing the [normalized script pattern that GitHub uses in its projects](https://github.blog/2015-06-30-scripts-to-rule-them-all/). The [GitHub Scripts To Rule Them All](https://github.com/github/scripts-to-rule-them-all) repository was used as a template. They were tested using Ubuntu 18.04.3 LTS on Windows 10.

- [Scripts](#scripts)
  - [`LADI_DIR_TUTORIAL` and Execution](#ladi_dir_tutorial-and-execution)
  - [Dependencies](#dependencies)
    - [Linux Shell](#linux-shell)
    - [Proxy and Internet Access](#proxy-and-internet-access)
    - [Superuser Access](#superuser-access)
  - [The Scripts](#the-scripts)
    - [script/bootstrap](#scriptbootstrap)
      - [Packages](#packages)
    - [script/setup](#scriptsetup)
      - [Data](#data)

## `LADI_DIR_TUTORIAL` and Execution

These scripts assume that `LADI_DIR_TUTORIAL` has been set. Refer to the repository root [README](../README.md) for instructions.

## Dependencies

### Linux Shell

The scripts need to be run in a Linux shell. For Windows 10 users, you can use [Ubuntu on Windows](https://tutorials.ubuntu.com/tutorial/tutorial-ubuntu-on-windows#0). For Windows users specifically, the system drive and other connected drives are exposed in the `/mnt/` directory; for example, you can access the Windows C: drive via `cd /mnt/c`.

If you modify these scripts, please follow the [convention guide](https://github.com/LADI-Dataset/ladi-overview/blob/master/CONTRIBUTING.md#convention-guide), which specifies an end-of-line character of `LF (\n)`. If the end-of-line characters are changed to `CRLF (\r\n)`, the shell will choke on the carriage returns with an error such as `$'\r': command not found`.

### Proxy and Internet Access

The scripts will download data using [`curl`](https://curl.haxx.se/docs/manpage.html) and [`wget`](https://manpages.ubuntu.com/manpages/trusty/man1/wget.1.html), which, depending on your security policy, may require a proxy.

The scripts assume that the `http_proxy` and `https_proxy` Linux environment variables have been set:

```bash
export http_proxy=proxy.mycompany:port
export https_proxy=proxy.mycompany:port
```

You may also need to [configure git to use a proxy](https://stackoverflow.com/q/16067534). This information is stored in `.gitconfig`, for example:

```git
[http]
proxy = http://proxy.mycompany:port
[https]
proxy = http://proxy.mycompany:port
```

### Superuser Access

Depending on your security policy, you may need to run some scripts as a superuser or another user. These scripts have been tested using [`sudo`](https://manpages.ubuntu.com/manpages/disco/en/man8/sudo.8.html). Depending on how you set up the system variable `LADI_DIR_TUTORIAL`, you may need to call [sudo with the `-E` flag](https://stackoverflow.com/a/8633575/363829), which preserves the environment.

If running without administrator or sudo access, try running these scripts using `bash` directly, for example:

```bash
bash ./setup.sh
```

## The Scripts

Each of these scripts is responsible for a unit of work. This way they can be called from other scripts.

This not only cleans up a lot of duplicated effort, it also means contributors can do the things they need to do without extensive knowledge of how the project works. Lowering friction like this is key to faster and happier contributions.

The following is a list of scripts and their primary responsibilities.

### script/bootstrap

[`script/bootstrap`][bootstrap] is used solely for fulfilling dependencies of the project, such as packages, software versions, and git submodules. The goal is to make sure all required dependencies are installed. This script should be run before [`script/setup`][setup].

#### Packages

Using [`apt`](https://help.ubuntu.com/lts/serverguide/apt.html), the following Linux packages are installed:

| Package | Use |
| :------ | :-- |
| `unzip` | extracting zip archives |

The LADI team has not knowingly modified any of these packages. Any modifications to these packages shall comply with their respective licenses and are outside the scope of this repository.

### script/setup

[`script/setup`][setup] is used to set up a project in an initial state. This is typically run after an initial clone or to reset the project back to its initial state. This is also useful for ensuring that your bootstrapping actually works well.

#### Data

Commonly used datasets are downloaded by [`script/setup`][setup]. Refer to the [data directory README](../data/README.md) for more details.

<!-- NOT YET IMPLEMENTED BUT COMMENTED FOR FUTURE REFERENCE
### script/update
[`script/update`][update] is used to update the project after a fresh pull.
If you have not worked on the project for a while, running [`script/update`][update] after
a pull will ensure that everything inside the project is up to date and ready to work.
Typically, [`script/bootstrap`][bootstrap] is run inside this script. This is also a good
opportunity to run database migrations or any other things required to get the
state of the app into shape for the current version that is checked out.
### script/server
[`script/server`][server] is used to start the application.
For a web application, this might start up any extra processes that the
application requires to run in addition to itself.
[`script/update`][update] should be called ahead of any application booting to ensure that
the application is up to date and can run appropriately.
### script/test
[`script/test`][test] is used to run the test suite of the application.
A good pattern to support is having an optional argument that is a file path.
This allows you to support running single tests.
Linting (i.e. rubocop, jshint, pmd, etc.) can also be considered a form of testing. These tend to run faster than tests, so put them towards the beginning of a [`script/test`][test] so it fails faster if there's a linting problem.
[`script/test`][test] should be called from [`script/cibuild`][cibuild], so it should handle
setting up the application appropriately based on the environment. For example,
if called in a development environment, it should probably call [`script/update`][update]
to always ensure that the application is up to date. If called from
[`script/cibuild`][cibuild], it should probably reset the application to a clean state.
### script/cibuild
[`script/cibuild`][cibuild] is used for your continuous integration server.
This script is typically only called from your CI server.
You should set up any specific things for your environment here before your tests
are run. Your test are run simply by calling [`script/test`][test].
### script/console
[`script/console`][console] is used to open a console for your application.
A good pattern to support is having an optional argument that is an environment
name, so you can connect to that environment's console.
You should configure and run anything that needs to happen to open a console for
the requested environment.
-->

<!-- Relative Links -->
[bootstrap]: bootstrap.sh
[setup]: setup.sh
[update]: update.sh
[server]: server.sh
[test]: test.sh
[cibuild]: cibuild.sh
[console]: console.sh
