The RESCU schema is now deprecated. Please refer to the `vds` schema.
- Use `vds.counts_15min` instead of `rescu.volumes_15min`.
- Identify RESCU sensors in the new schema using `vds.detector_inventory WHERE det_type = 'RESCU Detectors'`.
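A hedged example of the replacement query (the join key `vds_id` and exact column names are assumptions; confirm them against the `vds` schema documentation):

```sql
-- Pull 15-minute counts for the detectors formerly in the rescu schema.
SELECT c.*
FROM vds.counts_15min c
JOIN vds.detector_inventory di USING (vds_id)  -- join key is an assumption
WHERE di.det_type = 'RESCU Detectors';
```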
Road Emergency Services Communication Unit (RESCU) data tracks traffic volume on expressways using loop detectors.
- Keywords
- Description
- Schema
- What do we use this for
- Date of data collection
- Data Ownership
- Methodological Information
- Data Quality / Describe any quality-assurance procedures performed on the data
  - 1- Are there known data gaps/incomplete data?
  - 2- What are the gaps?
  - 3- How are data gaps/incomplete data addressed?
  - 4- Who is responsible for addressing data gaps/incomplete data?
  - 5- Are there data quality assessment processes for the data?
  - 6- How often are data quality assessment processes for the data undertaken?
- Data Maintenance
- How the data are loaded
RESCU, traffic, volume, long_term, Data and Analytics
The City's Road Emergency Services Communication Unit (RESCU) tracks and manages traffic volume on expressways and some arterial roads using various technologies. Within D&A, only the loop detector portion of the broader RESCU system is used. General information can be found here.
- Data is stored in the `rescu` schema.
- The main tables are listed below. Please note that `rescu.volumes_15min` and `rescu.detector_inventory` are the tables that should be used for querying.

`rescu.detector_inventory` table:

This table contains details of RESCU vehicle detector stations (VDS) in the City. Its origin has been lost to time and it is not being automatically updated, so details may be out of date.
`arterycode` can be used to join the data with `traffic.artery_data` (see the sketch after the table below).
column_name | data_type | sample | Description |
---|---|---|---|
detector_id | text | DW0040DEL | You can quickly tell the direction via the second last letter. |
number_of_lanes | smallint | 3 | |
latitude | numeric | 43.635944 | |
longitude | numeric | -79.401186 | |
det_group | text | LAKE | "ALLEN" (Allen Road), "FGG/LAKE" (Gardiner Expressway and Lakeshore ramps), "FGG" (Gardiner Expressway), "DVP" (Don Valley Parkway), "LAKE" (Lakeshore) |
road_class | text | Major Arterial | |
primary_road | text | Lake Shore Blvd W | |
direction | character varying | E | |
offset_distance | integer | ||
offset_direction | text | W of | |
cross_road | text | BATHURST STREET | |
district | text | Toronto and East York | |
ward | text | Trinity-Spadina (20) | |
vds_type | text | inductive_loop | |
total_loops | smallint | 6 | |
sequence_number | text | LAKE173 | |
data_range_low | integer | 20000 | |
data_range_high | integer | 30000 | |
historical_count | integer | 9700 | |
arterycode | integer | 826 | Reference table for Artery Codes (internal reference for intersections and segments) |
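As a sketch of that join (assuming `traffic.artery_data` is also keyed on `arterycode`; verify the key and columns before use):

```sql
-- Attach artery metadata to each RESCU detector via arterycode.
SELECT
    di.detector_id,
    di.primary_road,
    di.direction,
    ad.*
FROM rescu.detector_inventory di
JOIN traffic.artery_data ad USING (arterycode);
```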
`rescu.raw_20sec` table: This table includes the raw number of counts recorded by the detector during each 20-second interval. This table only has data for 7 days, the last being 2020-04-08.
column_name | data_type | sample |
---|---|---|
datetime_bin | timestamp without time zone | 2020-04-08 00:00:00 |
detector_id | text | DE0010DEG |
lane_no | integer | 1 |
volume | integer | 0 |
occupancy | numeric | 0.0 |
speed | numeric | 0.0 |
uid | integer | 1 |
`rescu.raw_15min` table: This table includes raw counts aggregated over 15-minute intervals. It is used to insert rows into `rescu.volumes_15min` via `create-trigger-function-populate_volumes_15min.sql`.

`rescu.raw_15min`
column_name | data_type | sample | Description |
---|---|---|---|
dt | date | 2020-01-13 | |
raw_info | text | 0000 - 0015 de0010der -1 | Format "hhmm - hhmm detector volume". If volume is -1 that means the detector is down and the row is not inserted into volumes_15min. |
raw_uid | integer | 1 |
`rescu.volumes_15min` table: This table includes processed 15-minute counts.

`rescu.volumes_15min`
column_name | data_type | sample |
---|---|---|
detector_id | text | DE0010DER |
datetime_bin | timestamp without time zone | 2019-01-02 00:00:00 |
volume_15min | integer | 29 |
arterycode | integer | 3272 |
volume_uid | integer | 5234941 |
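For reference, a typical query against this table aggregates the 15-minute bins into daily totals per detector (the date range below is purely illustrative):

```sql
-- Daily volume per detector for one week in January 2019.
SELECT
    detector_id,
    datetime_bin::date AS dt,
    SUM(volume_15min) AS daily_volume
FROM rescu.volumes_15min
WHERE datetime_bin >= '2019-01-07' AND datetime_bin < '2019-01-14'
GROUP BY detector_id, datetime_bin::date
ORDER BY detector_id, dt;
```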
Used within Data & Analytics, as fulfilled by data specialists as part of data requests. Despite data quality concerns (usually due to malfunctioning detectors), RESCU data are the only source of volume counts for highways within Toronto's jurisdiction: the Allen Expressway (as of May 2023, there has been no Allen Expressway data since 2021-10), the Gardiner Expressway, and the Don Valley Parkway.
Data specialists and research analysts. Used for data requests (mostly staff asking for volumes on highways).
- Data gaps make the data unreliable for data requests.
- In 2022 and 2023 (as of May), we have identified significant network-wide outages spanning days to weeks, meaning data availability is very sparse.
- There are rare opportunities to repair RESCU detectors, for example: https://www.toronto.ca/services-payments/streets-parking-transportation/road-maintenance/bridges-and-expressways/expressways/gardiner-expressway/gardiner-expressway-maintenance-program/
No. The data could be published, but at this point it is not, due to data quality concerns.
Though data from RESCU detectors has been available since January 2017 and is updated daily, there are many gaps in the data, and anyone seeking to use this dataset should check data availability and quality for their required dates and locations.
The Data Operations Team. When this pipeline gets upgraded there will be a TDS who owns it.
Active Traffic Management (ATM). ITS Central is seeing increasing use, as field servers are no longer working and ITS Central is their replacement.
Loop detectors embedded in the road surface, and radar detectors (which function the same way as loop detectors) placed on the roadside.
Raw data is fed into Oracle every 20 seconds. These volume counts are aggregated to 15-minute bins and stored in the `rescu` schema of the Postgres bigdata database via an Airflow pipeline on a daily basis. You can read the Python script here.
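Conceptually, the aggregation reduces to something like the following sketch (shown against `rescu.raw_20sec` for illustration only; the real pipeline reads from the Oracle feed, and the exact logic lives in the script above):

```sql
-- Conceptual sketch only: sum 20-second counts into 15-minute bins.
SELECT
    detector_id,
    to_timestamp(floor(extract(epoch FROM datetime_bin) / 900) * 900)
        AT TIME ZONE 'UTC' AS datetime_bin,  -- floor to the 15-minute boundary
    SUM(volume) AS volume_15min
FROM rescu.raw_20sec
GROUP BY 1, 2;
```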
Raw data is recorded in 20-second increments. Data handled by D&A has been pre-aggregated to 15-minute increments through a legacy process developed by ATM. Data is matched to arterycode (directional). The vehicle detector stations have coordinates.
Raw data is stored in ITS Central and ATM's Oracle database. Within ITS Central it is stored in Postgres database but we are unsure as to how much data is being stored in the system.
Unsure—particularly now that there are two streams (one in ITS Central and one in Oracle).
Pulling data from databases into Postgres: Windows Task Scheduler is responsible for pulling the data from `\\tssrv7`.
Refer to [How the data are loaded](#how-the-data-are-loaded) below for more information.
Yes, there are many data gaps.
Missing volumes due to detector issues. Data reports sent to D&A out of Oracle contain specific lane-level data. This may help isolate detectors that are down and help in closing gaps and validating data.
Currently, within D&A and for the purpose of data requests, the staff fulfilling a request tell the requester that data is not available.
Gaps are handled / addressed using a variety of strategies, depending on the intended use of the data and the nature of the gap. D&A can be contacted for gaps handled by them.
There is a daily check run in Airflow (`check_rescu.py`) to see if a threshold of data points is met: if there are fewer than 7,000 rows, an alert is raised (which, as of 2023-05-10, happens almost daily). This D&A QA process runs daily and counts the number of rows containing data, with 7,000 rows as the threshold.
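That check boils down to a row count along these lines (a sketch of the threshold logic, not the exact Airflow code):

```sql
-- Count yesterday's processed rows; fewer than 7,000 raises an alert.
SELECT COUNT(*) AS volume_num
FROM rescu.volumes_15min
WHERE datetime_bin >= current_date - 1
  AND datetime_bin < current_date;
```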
There have also been analyses completed to check which non-ramp detectors were recording valid data in 2021 using the following methodology:
- Count the daily bins per detector, and filter out detector-days with fewer than 96 15-minute bins in a 24-hour period (since they must be missing data)
- Calculate daily volume counts for the valid detectors
- Calculate the median weekday and weekend daily volume count per detector
- Group the median weekday daily volumes by corridor and graph them
- Visually determine a minimum threshold based on the graphs.
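A hedged SQL sketch of steps 1 to 3 (the actual code is in the date_evaluation folder referenced below; the year filter is illustrative):

```sql
-- Daily volumes for detector-days with a full 96 bins, then the
-- median weekday/weekend daily volume per detector.
WITH daily AS (
    SELECT
        detector_id,
        datetime_bin::date AS dt,
        COUNT(*) AS num_bins,
        SUM(volume_15min) AS daily_volume
    FROM rescu.volumes_15min
    WHERE datetime_bin >= '2021-01-01' AND datetime_bin < '2022-01-01'
    GROUP BY detector_id, datetime_bin::date
)
SELECT
    detector_id,
    extract(isodow FROM dt) >= 6 AS is_weekend,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY daily_volume) AS median_daily_volume
FROM daily
WHERE num_bins = 96  -- keep only complete detector-days
GROUP BY detector_id, extract(isodow FROM dt) >= 6;
```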
The 2021 minimum thresholds were as follows:
- Allen Expressway - Weekday: 4000 per lane
- Allen Expressway - Weekend: 3000 per lane
- Don Valley Parkway - Weekday: 15000 per lane
- Don Valley Parkway - Weekend: 10000 per lane
- Gardiner Expressway - Weekday: 10000 per lane
- Gardiner Expressway - Weekend: 10000 per lane
- Lakeshore Boulevard - Weekday: 2000 per lane
- Lakeshore Boulevard - Weekend: 2000 per lane
RESCU data were then extracted for the detectors and dates that met these thresholds.
The code used to complete these checks can be found in the date_evaluation folder.
1- Who is responsible for the status of data functionality and the overall maintenance of the data collection?
- Hardware: Traffic Plant / Installation and Maintenance (TPIM) is responsible.
- Oracle DB Server: The systems solution integrator in Active Traffic Management.
- ITS Central: Data Operations; and, eventually, ATM.
This script is located on the terminal server and takes in three variables: `start_date`, `end_date` (exclusive), and `path`. It is a job that runs on a daily basis in the morning and imports the latest 15-minute volume file from a particular drive into our RDS. By default, `start_date` and `end_date` are the beginning and the end of the day before (today - 1 day and 12:00 today, respectively). However, the user can also specify the date range from which the data should be pulled, so that this process can be used for other applications too. This script is automated to run daily on the terminal server to ingest a day's worth of data collected the day before. The steps are as follows:
- The script reads information from a .rpt file and then inserts the data into a table named `rescu.raw_15min`. The table has the following information:
raw_uid | dt | raw_info |
---|---|---|
12852 | 2020-01-13 | 1700 - 1715 de0010deg 633 |
12853 | 2020-01-13 | 1700 - 1715 de0010der 62 |
- There is also a trigger function named `rescu.insert_rescu_volumes()` which processes the newly added data from `rescu.raw_15min` and inserts the processed data into the table `rescu.volumes_15min`. All information from `raw_info` in the raw table is processed into the 3 columns `detector_id`, `datetime_bin` and `volume_15min`, whereas `arterycode` is taken from the table `rescu.detector_inventory` by matching on `detector_id`. The table `rescu.volumes_15min` has the following information:
volume_uid | detector_id | datetime_bin | volume_15min | arterycode |
---|---|---|---|---|
9333459 | DE0010DEG | 2020-01-13 17:00:00 | 749 | 23984 |
9333460 | DE0010DER | 2020-01-13 17:00:00 | 80 | 3272 |
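For orientation, the trigger's core logic is presumably along these lines (a sketch inferred from the `raw_info` format and the parsing query further below, not the actual function body in `create-trigger-function-populate_volumes_15min.sql`):

```sql
-- Hypothetical sketch of rescu.insert_rescu_volumes(); verify against the
-- real function before relying on the details.
CREATE OR REPLACE FUNCTION rescu.insert_rescu_volumes()
RETURNS trigger AS $$
BEGIN
    INSERT INTO rescu.volumes_15min (detector_id, datetime_bin, volume_15min, arterycode)
    SELECT
        upper(trim(substring(NEW.raw_info, 15, 12))),
        NEW.dt + left(NEW.raw_info, 6)::time,
        nullif(trim(substring(NEW.raw_info, 27, 10)), '')::int,
        di.arterycode
    FROM rescu.detector_inventory di
    WHERE upper(di.detector_id) = upper(trim(substring(NEW.raw_info, 15, 12)))
      AND nullif(trim(substring(NEW.raw_info, 27, 10)), '')::int >= 0;  -- -1 means the detector was down; skip
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```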
Since the terminal server does not have an internet connection, we will not get notified if the process fails. Therefore, we created an Airflow task to do that job for us. There is a DAG named `check_rescu` which runs at 6 am every day and checks the number of rows inserted for the day before in both tables `rescu.raw_15min` and `rescu.volumes_15min`. If the number of rows is 0, OR if the number of rows in the raw table is less than that in the volumes table, OR if the total number of rows in the volumes table is less than 7,000 (the average number of rows per day is about 20,000), a Slack notification will be sent to notify the team. The line that does exactly that is shown below.
# Raise an exception (which triggers the Slack alert) when there is no raw
# data, fewer raw rows than processed rows, or an unusually low daily total.
if raw_num == 0 or raw_num < volume_num or volume_num < 7000:
    raise Exception('There is a PROBLEM here. There is no raw data OR raw_data is less than volume_15min OR volumes_15min is less than 7000 which is way too low')
When the Slack message is sent, we can run the following check to find out exactly what is wrong with the data pipeline. The Airflow DAG only shows us the number of rows in the raw and volumes tables, so the reason for the failure may still be unclear. This query can be used to get a better picture of what is happening with that day of data.
SELECT
TRIM(SUBSTRING(raw_info, 15, 12)) AS detector_id,
dt + LEFT(raw_info,6)::time AS datetime_bin,
nullif(TRIM(SUBSTRING(raw_info, 27, 10)),'')::int AS volume_15min
FROM rescu.raw_15min
WHERE dt = '2020-09-03'::date --change to the date that you would like to investigate
AND nullif(TRIM(SUBSTRING(raw_info, 27, 10)),'')::int < 0
If the column `volume_15min` is `-1`, that means there is something wrong with the data at the source end, and we have to notify the person in charge, as this is not something that we can fix.