TEPs I Code Structure

Chenchong Charles Zhu edited this page Feb 14, 2020 · 1 revision

All the executable code is in KCOUNT/codes (not sure why, maybe this is an oversight). When the Estimate AADTs button is pushed in App2.mlapp, main_combined_2.m is called.

TEPs-I Flow diagram

Main Script

  • KCOUNT/codes/main_combined_2.m - function main_combined_2, which sequentially calls PRTCS, KCOUNT, and LocalSVR. Uses pos_neg_sum.m in Emission to combine all outputs into a single AADT file, and also calls toronto_sim.m to generate vehicle speeds using an ANN if the user chooses to.

PRTCS/codes/

  • accum_data.m: combines shortest_distance.zip data with AADTs and landuse data from AADT_Landuse_pop_lane_speed2_<YEAR>_.xlsx.
  • data_prep_kridging.m: combines AADT and landuse data prepared by data_prepar_locals with shortest network distances from shortest_distance.zip. Returns the set of output files used for regression kriging by KCOUNT.
  • data_prepar_locals.m: combines AADT and raw landuse data together into AADT_Landuse_pop_lane_speed2_<YEAR>_.xlsx (intermediate file for kriging) and AADT_Landuse_pop_lane_speed3_<YEAR>_.xlsx (for LocalSVR).
  • DoMPTC.m: nearly identical to DoMSTTC.m, except that it estimates the MADT and AADT of a permanent count station using the nearest four other permanent count stations. This is done for validation purposes.
  • DoMSTTC.m: for a given short term count station, finds the five nearest permanent count stations (Euclidean distance) using nearestneighbour.m. For each permanent station, gets day-to-month conversion coefficients and calculates day-to-year conversion coefficients from the same day of week (if possible) and closest year as the short term data. If the closest year is not the same year, inflates any relevant stats using growth rates calculated in PTCYEAR.m. Finally, estimates MADT and AADT for the short-term station using inflated conversion coefficients. Returns estimates for all five nearest stations.
  • main_DoM_new_2012.m: main script for PRTCS. Calls STTC_estimate3 to read 15-minute count data and determine the day-to-month (DoM) conversion factors and AADT for permanent count stations. Calls PTCYEAR (which calls PTCWEEK if there's less than a year of data) to estimate growth rates. Estimates AADT for short term stations using DoMSTTC, while validating results using DoMPTC and validation. Postprocesses estimated AADTs (lines 197-266; not sure how it works yet). Finally, writes data out in preparation for KCOUNT's regression kriging using data_prepar_locals and either short_krig or data_prep_kridging (depending on whether intermediate data is already stored).
  • nearestneighbour.m: for each point in P, determines the closest point in X using Delaunay triangulation.
  • PTCWEEK.m: uses a linear best fit to determine the weekly rate of growth in traffic counts. Currently only called as a backup method in PTCYEAR.m.
  • PTCYEAR.m: for multi-year counts, estimates the rate of growth in traffic counts under the assumption of an exponential growth rate (see World Bank methodology).
  • KCOUNT/codes/SEL_ratio.m: for a given short-term station and candidate nearby permanent count station, returns the means of relevant averaged daily traffic values. Tries to pick the same day of week and closest year (day of week is an equality test, so it's the stricter criterion), but if the same day of week doesn't exist, uses the closest year and any day of the week.
  • STTC_estimate3.m: reads in 15-minute traffic count data (PRTCS\negative\15min_counts_<YEAR>.zip). For each file (observations of one direction of a centreline segment over 1 year), embeds the data within a 35040 x 5 matrix M (96 fifteen-minute bins in one day x 365 days = 35040). The columns of M are year, month, 15-minute count, link ID, a temporary ID which is 1 for non-permanent stations and 0 for permanent ones, and day of the year (the link and temporary IDs are the same for each M, and most 15-minute count elements are NaNs due to missing data). Then takes the monthly sum and the daily sum averaged over all days of the month with data from M (inserting either 0 or NaN if a month has no data). A station is a permanent site if it a) has data for every month, b) has data for at least 75% of all 35040 bins in the year, c) doesn't have "re" in the filename, and d) is not part of the test_id_negative or test_id_positive arrays. For permanent sites, the script also calculates the monthly averaged daily total count, and divides this by the monthly averaged daily total for each day of the week (i.e. the average of all Mondays, all Tuesdays, etc.) to get a set of 7 Day-to-Month (DoM) conversion factors. The AADT is also calculated for permanent count stations.
  • short_krig.m: short version of data_prep_kridging when cached results already exist.
  • validation.m: validation script for permanent count station AADT/MADT estimates made using DoMPTC.m.
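
The DoM factor calculation described for STTC_estimate3.m can be sketched as follows. This is a minimal Python illustration, not the MATLAB code itself: the function name dom_factors, the input layout (a mapping from date to daily total) and the synthetic counts are all assumptions made for the example. It only shows the ratio "monthly averaged daily total divided by the average for one day of the week".

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def dom_factors(daily_totals):
    """Return {(month, weekday): factor} from a {date: daily total} mapping.

    factor = (mean daily total over the month)
             / (mean daily total over that weekday within the month)
    Weekday follows Python's convention: 0 = Monday, ..., 6 = Sunday.
    """
    by_month = defaultdict(list)
    by_month_weekday = defaultdict(list)
    for day, total in daily_totals.items():
        by_month[day.month].append(total)
        by_month_weekday[(day.month, day.weekday())].append(total)

    factors = {}
    for (month, weekday), totals in by_month_weekday.items():
        madt = mean(by_month[month])  # monthly averaged daily total
        factors[(month, weekday)] = madt / mean(totals)
    return factors


# Synthetic January 2012 (hypothetical data): 1000 vehicles on weekdays,
# 600 on Saturdays and Sundays.
start = date(2012, 1, 1)
daily = {}
for offset in range(31):
    day = start + timedelta(days=offset)
    daily[day] = 1000 if day.weekday() < 5 else 600

factors = dom_factors(daily)

# Multiplying a single day's count by its factor recovers the monthly average.
monday_estimate = 1000 * factors[(1, 0)]
sunday_estimate = 600 * factors[(1, 6)]
```

With these factors, a short-term count taken on any one day can be scaled to a monthly average, which is the role the conversion coefficients play in DoMSTTC.m.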

Outputs

  • output_PRTCS<YEAR><POS_OR_NEG>:

    • all_AADT_<YEAR>.txt: station ID, AADT (station ID is centreline ID)
    • Perm_AADT_<YEAR>.txt: station ID, AADT of permanent stations
    • Temp_AADT_<YEAR>.txt: station ID, best fit (minimum MSE) AADT, best fit day-to-year coefficient D_ij, year from TTC, calibration factor, growth rate, upstream permanent site ID, upstream permanent site AADT of temporary stations. This may actually have all the station IDs, not just temporary ones.
    • Temp_Dij_<YEAR>.txt: station ID and day-to-year coefficients D_ij of temporary stations
    • validation_<YEAR>.txt: real and best fit AADTs of permanent count stations
  • output_for_local<YEAR><POS_OR_NEG>:

    • AADT_Landuse_pop_lane_speed3_<YEAR>.csv: temporary station ID and associated land use data (AADT, population within 300 m buffer, lane number, speed limit, employment land use, commercial land use).
  • output_for_kridging<YEAR><POS_OR_NEG>:

    • data_for_fit<YEAR>.txt: land use data for all unique destination centrelines from resmat<YEAR>.txt that also have known AADTs > 2000 counts/year.
    • data_for_pred<YEAR>.txt: land use data for all unique destination centrelines from resmat<YEAR>.txt.
    • resmat<YEAR>.txt: centreline-to-centreline network distances, and destination centreline land use, population, speed limit, etc. properties, for all centreline IDs included in shortest_path.zip. These are around 90% major/minor arterials or collectors.
    • ids_pred_<YEAR>.txt: centreline IDs from data_for_pred<YEAR>.txt.
    • ids_obs_<YEAR>.txt: centreline IDs from data_for_fit<YEAR>.txt.

KCOUNT/codes/

  • BCtrans.m: performs exponentiation of V by lam, unless lam = 0 in which case take the natural log.
  • categorieskr_general.m: performs an OLS regression using fitlm on independent (non-spatial) variables. Currently fed speed limit, population, lane number, commercial, employment, government, industry, residential, and distance in km (not sure what this is) by main_2_2012_min.
  • CressieHawks.m: Cressie-Hawkins estimator for variance matrix.
  • iter_for_bins.m: one loop of an iterative process to determine optimal variogram coefficients. Uses CressieHawks for initial values, variogramfit2 to fit the variogram, and variogram to determine values.
  • invBCtrans.m: performs inverse exponentiation of V by lam, unless lam = 0 in which case take the exponential. Inverse of BCtrans.m.
  • main_2_2012_min.m: main script for KCOUNT. Uses preProcDist_new to determine distance between "observed" IDs (those with AADTs) and "prediction" IDs (these are saved to disk). Determines maximum and minimum counts for each road type (not sure how this works). Performs OLS on data from KCOUNT/RMsma_2km_<POS/NEG>/data_for_fit<YEAR>.txt (road segments with AADT > 2000) and KCOUNT/RMsma_2km_<POS/NEG>/data_for_pred<YEAR>.txt (all (valid) road segments) using categorieskr_general.m. Uses residuals from that fit to get estimators for Feasible Generalized Least Squares (see also this). Uses FGLS estimators in an iterative variogram determination scheme (each loop runs iter_for_bins.m), then passes optimal values to a final
  • preProcDist_new.m: reads in "pred" and "obs" road IDs (copied from output_for_kridging), then loops through all "obs" IDs to determine network distances between them and all "pred" values (with zero-distance links removed, since those are repeats). Saves these as distdatak3<YEAR>.mat and distdatap3<YEAR>.mat binaries in KCOUNT/RMsma_2km_<POS/NEG>.
  • preProcDist_sub.m: subloop of preProcDist_new.m. For the centreline segment of a given station with AADTs, finds all rows in KCOUNT/RMsma_2km_<POS/NEG>/distance_short<YEAR>.csv whose starting point (first column) is the segment and whose ending point (second column) is one of the stations we'd like to predict.
  • variogram.m: returns variogram for various model types.
  • variogramfit2.m: fit a theoretical variogram to an experimental one. From Wolfgang Schwanghart (BSD3).
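
The BCtrans.m / invBCtrans.m pair can be sketched as below. This is a Python illustration that follows the descriptions above literally (plain exponentiation by lam, natural log when lam = 0); the function names bc_trans and inv_bc_trans and the sample values are assumptions for the example. Note that the textbook Box-Cox transform is (V^lam - 1) / lam, so the MATLAB code may differ from this sketch by that shift and scale.

```python
import math


def bc_trans(values, lam):
    """Raise each value to the power lam, or take the natural log if lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [v ** lam for v in values]


def inv_bc_trans(values, lam):
    """Undo bc_trans: take the 1/lam power, or the exponential if lam == 0."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [v ** (1.0 / lam) for v in values]


# Hypothetical AADT values; inputs must be positive for the log and roots.
aadt = [2500.0, 12000.0, 48000.0]

# Round trip for a few values of lam: transform, then invert.
round_trips = {
    lam: inv_bc_trans(bc_trans(aadt, lam), lam) for lam in (0, 0.25, 0.5)
}
```

Each entry of round_trips should match aadt up to floating-point error, which is a quick check that the two functions are inverses of each other.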

Outputs

  • outputs<POS_OR_NEG><YEAR>:
    • beta: kriging fit coefficients.
    • Kcounts_<KRIDGING_METHOD>_simulated<YEAR>.txt: kriging outputs for roads that also have observed AADTs.
    • Kcounts_<KRIDGING_METHOD>_predicted<YEAR>.txt: kriging outputs for all roads, regardless of whether they have observed AADTs.
    • weights: kriging weights (Cdp' * Cdd).
    • Xdummy: independent variables for roads that have observed AADTs.
    • Xpdummy: independent variables for all roads.
    • Y: dependent variables for roads that have observed AADTs.

LocalSVR/codes/

  • build_infile_SVR.m: control script that generates local_SVR_2011.R. Calls data_make_local.m to map weighted grid AADTs to local road IDs, reading in Major_roads_count_length_<POS_OR_NEG>\input_grid_aadt.csv, input_localrd_aadt.csv and AADT_Landuse_pop_lane_speed3_<YEAR>.csv in the process. Reads in main_inputs_<POS_OR_NEG>\data_300m_fit_<YEAR>.csv (training), data_300m_pred_<YEAR>.csv (validation), and all_<YEAR>_pred.csv (prediction), appends weighted information from data_make_local, raw AADTs from AADT_Landuse_pop_lane_speed3_<YEAR>.csv, road lengths from Major_AADT_roadlength.xlsx, and populations from predictors_300m_sample.xlsx, then writes them back out as ..._1.csv files. Runs script using Rscript <FULL_PATH>/local_SVR_<YEAR>.R. Finally moves R script's outputs to output_<YEAR>_<POS_OR_NEG> folder.
  • csvwrite_with_headers.m: dumps CSV files and includes column labels.
  • data_make_local.m: reads in Major_roads_count_length_<POS_OR_NEG>\input_grid_aadt.csv, input_localrd_aadt.csv and AADT_Landuse_pop_lane_speed3_<YEAR>.csv, and maps weighted grid AADTs to local road IDs (called by build_infile_SVR.m).
  • local_SVR_<YEAR>.R: auto-generated from local_SVR_2011.R.bak to read in files prepared by build_infile_SVR.m. First performs a Box-Cox transformation to normalize data, then performs an SVM regression on the fit data. Fit residuals are then calculated, and predictions made for pred data. Two SVRs are performed - a naive one and one with manually-tuned hyperparameters.

Outputs

  • outputs_<YEAR>_<POS_OR_NEG>/mydata_<YEAR>: centreline ID and AADT estimate

Emission/codes/

  • pos_neg_sum.m: attempts to find KCOUNT/outputs<POS_OR_NEG><YEAR>/Kcounts_spherical_predicted<YEAR>.txt and LocalSVR/outputs_<YEAR>_<POS_OR_NEG>\mydata_<YEAR>.txt, and raises an error if both directions are not found for each. Also reads in centreline directions from Emission/inputs/direction3.csv. Then, loops through values (KCOUNT output first, then LSVR), and stores AADT values for each centreline link in both directions, as well as the "total AADT", the sum from both directions. In cases where only one direction exists in direction3.csv, stores a NaN in place of an estimate for the other direction. Finally, combines local and major outputs, and writes all non-NaN values out to files. Does some other stuff to prepare for velocity estimation, but this is currently beyond the scope of our work.
  • LocalSVR/codes/replace_obs_local.m: replace LocalSVR values with corresponding ones from PRTCS\output_for_local<YEAR><POS_OR_NEG>\AADT_Landuse_pop_lane_speed3_<YEAR>.csv.
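
The direction-combining step of pos_neg_sum.m can be sketched roughly as follows. This is a Python illustration under stated assumptions: the function name combine_directions, the dictionary inputs and the sample IDs are hypothetical, and the choice to compute the total from whichever directions are present (rather than propagating NaN) is an assumption, since the description above does not say how the total is handled when one direction is missing.

```python
import math


def combine_directions(pos, neg):
    """Combine per-direction AADTs into (positive, negative, total) per link.

    pos, neg: {centreline_id: AADT} for each direction.
    A direction with no estimate is stored as NaN.
    Assumption: the total sums only the directions that are present.
    """
    combined = {}
    for link_id in sorted(set(pos) | set(neg)):
        p = pos.get(link_id, math.nan)
        n = neg.get(link_id, math.nan)
        present = [v for v in (p, n) if not math.isnan(v)]
        combined[link_id] = (p, n, sum(present))
    return combined


# Hypothetical centreline IDs and AADT estimates.
positive = {101: 1200.0, 102: 800.0}
negative = {101: 1100.0, 103: 450.0}

combined = combine_directions(positive, negative)

# Keep only non-NaN directional values when writing rows out.
rows = [
    (link_id, direction, value)
    for link_id, (p, n, _total) in combined.items()
    for direction, value in (("pos", p), ("neg", n))
    if not math.isnan(value)
]
```

Here link 101 has both directions, while 102 and 103 each carry a NaN for the missing direction and contribute a single row to rows.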

Outputs

  • Emission/inputs/id_rem_for_pos.csv: Major road (KCOUNT) centreline IDs where positive directions don't exist in direction3.csv, and so should be dropped.
  • Emission/inputs/id_rem_for_neg.csv: Major road (KCOUNT) centreline IDs where negative directions don't exist in direction3.csv, and so should be dropped.
  • Emission/inputs/id_rem_for_pos_loc.csv: local road (LSVR) centreline IDs where positive directions don't exist in direction3.csv, and so should be dropped.
  • Emission/inputs/id_rem_for_neg_loc.csv: local road (LSVR) centreline IDs where negative directions don't exist in direction3.csv, and so should be dropped.
  • Emission/inputs/Total_aadt.csv: centreline ID and total AADT from both major and local roads.
  • Emission/outputs/aadts_two_directions.csv: centreline ID and AADT in either direction from both major and local roads.
  • aadt_output_files/final_aadt_<YEAR>.csv: centreline ID, AADT and (non-symmetric) AADT confidence interval in either direction from both major and local roads, ordered by ID, then direction.
  • input_for_toronto_sim4_<POS_OR_NEG>.csv: centreline ID, road classification (ID and string label) and AADT for vehicle speeds estimator.