This proof-of-concept system assesses vessels within a risk framework that considers multiple factors to estimate the likelihood that a vessel has been engaging in illegal, unreported, and unregulated (IUU) fishing. The framework combines automatic identification system (AIS) tracking data with satellite imagery, and draws on several indicators, including the likelihood that a vessel has previously fished in a marine protected area (MPA) or exclusive economic zone (EEZ), and the intermittency of the vessel's AIS signal.
Vessel risk indicators may be considered individually, weighted according to the user's interests, or combined into a unified vessel risk score. This information is displayed in a front-end web application, which gives governments, NGOs, retailers, and enforcement agencies the information needed to distinguish responsible, legitimate vessels from those engaged in IUU fishing. For example, a retailer could check the risk scores of vessels that supply its tuna, or an enforcement agency could use the scores to decide which areas to patrol.
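The combination step can be thought of as a weighted average of normalised indicator values. A minimal sketch of this idea follows; the indicator names, example values, and weights are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Hypothetical per-vessel indicators on a common 0-1 scale;
# column names, values, and weights are illustrative only.
indicators = pd.DataFrame({
    "mmsi": [111111111, 222222222],
    "mpa_visits_score": [0.8, 0.1],
    "eez_fishing_score": [0.6, 0.2],
    "ais_intermittency_score": [0.9, 0.3],
})

weights = {"mpa_visits_score": 0.4,
           "eez_fishing_score": 0.3,
           "ais_intermittency_score": 0.3}

# Unified risk score: weighted sum of the individual indicators.
cols = list(weights)
indicators["risk_score"] = indicators[cols].mul(pd.Series(weights)).sum(axis=1)
print(indicators[["mmsi", "risk_score"]])
```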
Running this pipeline requires a PostgreSQL database, Anaconda Python 3.4, and R (3.4.1). Pre-processing, feature generation, and modelling were performed in Python, with risk indicators created in PostgreSQL, and the web application built in R Shiny. A separate pipeline that runs the intersection between AIS tracking data and satellite imagery can be viewed here. Instructions to run the R Shiny app are here.
Before running the pipeline, execute the following commands:
- Create database credential files: auth/db_credentials (see auth/db_credentials_example), auth/database_alchemy.ini (see auth/database_alchemy.dummy), and auth/database_psycopg2.ini (see auth/database_psycopg2.dummy); a sketch of reading these credentials follows this list
- Define environment variables: source environment_variables
- Create the conda environment for the pipeline: conda env create -f envs/development.yml
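The exact contents of the credential files are not reproduced here; as a rough sketch, assuming database_psycopg2.ini follows a standard INI layout with a [postgresql] section (an assumption, check auth/database_psycopg2.dummy for the actual expected layout), the file could be read like this:

```python
# Sketch: reading a psycopg2-style INI credentials file.
# The section and key names below are assumptions, not the repository's
# documented format.
from configparser import ConfigParser

import psycopg2

parser = ConfigParser()
parser.read("auth/database_psycopg2.ini")
params = dict(parser.items("postgresql"))  # e.g. host, port, dbname, user, password

conn = psycopg2.connect(**params)
```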
This will download shapefiles of coastlines and locations of ports, to be used for vessel distance calculations in the preprocessing and feature generation steps.
python src/features/ais_distance_calculations.py
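As a simplified illustration of what these distance features involve (not the script's actual implementation), the distance from an AIS position to the nearest port could be computed from the downloaded port locations like this; the file path and use of the haversine formula are assumptions:

```python
import geopandas as gpd
import numpy as np

# Load port locations (point geometries); the path is a placeholder.
ports = gpd.read_file("data/ports.shp")
port_lats = np.radians(ports.geometry.y.values)
port_lons = np.radians(ports.geometry.x.values)

def distance_to_nearest_port_km(lat, lon):
    """Great-circle (haversine) distance from an AIS position to the nearest port."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    dlat = port_lats - lat1
    dlon = port_lons - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(port_lats) * np.sin(dlon / 2) ** 2
    return (2 * 6371.0 * np.arcsin(np.sqrt(a))).min()  # Earth radius ~6371 km

print(distance_to_nearest_port_km(48.4, -4.5))
```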
This PostgreSQL script removes rows with null values, invalid Unix timestamps, and coordinates outside the valid range, first for positional and then for static data.
psql -f ./sql_scripts/ais_data_cleaning.sql
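The specific rules live in the SQL script; as a rough Python analogue of the same filtering logic (the thresholds and column names are illustrative assumptions, not copied from ais_data_cleaning.sql):

```python
import pandas as pd

# Illustrative equivalent of the SQL cleaning step; column names and
# bounds are assumptions, not the script's actual values.
def clean_positional(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["mmsi", "timestamp", "longitude", "latitude"])
    df = df[df["timestamp"] > 0]                    # drop invalid Unix timestamps
    df = df[df["longitude"].between(-180, 180)]     # drop out-of-range coordinates
    df = df[df["latitude"].between(-90, 90)]
    return df
```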
psql -c 'CREATE SCHEMA IF NOT EXISTS ais_is_fishing_model;'
This script removes duplicate data and null values and generates additional features, including distance to shore and distance to port for each vessel at each time point, and whether it is nighttime or daytime.
python src/models/is_fishing/preprocess_data.py
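As one example of the feature generation, a day/night flag can be derived by comparing each timestamp against local sunrise and sunset at the vessel's position. A minimal sketch using the astral package follows; the actual script may use a different method:

```python
from datetime import datetime, timezone

from astral import Observer
from astral.sun import sun

def is_night(lat: float, lon: float, when: datetime) -> bool:
    """Rough day/night flag: True if the (UTC) timestamp falls outside the
    sunrise-sunset window at the given position. Illustrative only."""
    s = sun(Observer(latitude=lat, longitude=lon), date=when.date())
    return not (s["sunrise"] <= when <= s["sunset"])

print(is_night(48.4, -4.5, datetime(2017, 7, 1, 23, 30, tzinfo=timezone.utc)))
```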
This uses labelled training data to train a model that predicts whether a vessel is fishing at each time point. A random forest with 450 trees is used, and the trained model is saved in the models directory.
python src/models/is_fishing/train_is_fishing.py
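The training step corresponds roughly to the sketch below; n_estimators=450 matches the description above, but the feature columns and file paths are placeholders rather than the script's actual configuration:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Sketch of the training step; columns and paths are placeholders.
labelled = pd.read_csv("data/labelled_fishing_points.csv")
features = ["speed", "course", "distance_to_shore", "distance_to_port", "is_night"]

model = RandomForestClassifier(n_estimators=450, n_jobs=-1, random_state=0)
model.fit(labelled[features], labelled["is_fishing"])

joblib.dump(model, "models/is_fishing_random_forest.pkl")
```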
This code reads from the PostgreSQL database in chunks and predicts, for each vessel at each time point, the probability that it is fishing.
python src/models/is_fishing/predict_is_fishing.py
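Chunked scoring avoids loading the full AIS table into memory; a sketch using pandas.read_sql with a SQLAlchemy engine follows, where the connection string, table, and column names are assumptions rather than the pipeline's actual values:

```python
import joblib
import pandas as pd
from sqlalchemy import create_engine

# Sketch of chunked prediction; connection string, table, and columns
# are placeholders, not the pipeline's actual configuration.
engine = create_engine("postgresql://user:password@host:5432/dbname")
model = joblib.load("models/is_fishing_random_forest.pkl")
features = ["speed", "course", "distance_to_shore", "distance_to_port", "is_night"]

for chunk in pd.read_sql("SELECT * FROM ais_positions", engine, chunksize=100_000):
    chunk["p_fishing"] = model.predict_proba(chunk[features])[:, 1]
    chunk[["mmsi", "timestamp", "p_fishing"]].to_sql(
        "is_fishing_predictions", engine, if_exists="append", index=False)
```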
This creates a count of the number of available rows in both AIS static and positional data for each MMSI.
psql -f ./sql_scripts/create_unique_vessel_register.sql
Create a score of the number of times a vessel was in marine protected areas over a given time period
First, running bash ./sql_scripts/get_wdpa.sh will download a shapefile of the World Database on Protected Areas (WDPA), then create the schema and upload the data to a PostgreSQL instance. Second, using the uploaded table, the marine_protected_areas_within.sql script creates a per-vessel score accounting for each vessel's presence in MPAs.
psql -f ./sql_scripts/marine_protected_areas_within.sql
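The MPA indicator amounts to a point-in-polygon count per vessel; a simplified geopandas analogue of the SQL step is sketched below, where the file paths and column names are illustrative assumptions:

```python
import geopandas as gpd

# Simplified analogue of marine_protected_areas_within.sql using geopandas;
# file paths and column names are illustrative.
mpas = gpd.read_file("data/wdpa_marine_protected_areas.shp")
positions = gpd.read_file("data/ais_positions.geojson")  # points with an "mmsi" column

# Spatial join: keep AIS points falling inside an MPA polygon,
# then count matches per vessel.
inside = gpd.sjoin(positions, mpas, how="inner", predicate="within")
mpa_visit_counts = inside.groupby("mmsi").size().rename("mpa_point_count")
print(mpa_visit_counts.head())
```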
Based on the existing tables, this script creates aggregate per-MMSI vessel indicators:
psql -f ./sql_scripts/component_generator.sql
- World Economic Forum (https://www.weforum.org/)
- IBM (https://www.ibm.com/)
- Digital Globe, Inc. (http://www.digitalglobe.com/)
- Planet Labs, Inc. (https://www.planet.com/)
- Spire Global, Inc. (https://spire.com/)
This project was conducted as part of the Data Science for Social Good (DSSG) Europe 2017 fellowship; further details of the twelve-week summer fellowship can be found here: https://dssg.uchicago.edu/europe/
Data science fellows: Iván Higuera Mendieta, Shubham Tomar, and William Grimes
Project manager: Paul van der Boor
Technical mentor: Jane Zanzig
The authors would like to thank Euro Beinat and Nishan Degnarain for having the vision to pursue a data science project for the detection of illegal, unreported, and unregulated fishing vessels. Further, our weekly calls with Nishan Degnarain and Steven Adler were instrumental in guiding this project to success.
We also extend our thanks to the following for their input and helpful discussions: Dan Hammer, Gregory Stone, Kristina Boerder, Kyle Brazil, Nathan Miller, and Paul Woods.
This project is licensed under the MIT License; see the LICENSE.md file for details.