TEPs I Code Structure
Chenchong Charles Zhu edited this page Feb 14, 2020 · 1 revision
All the executable code is in KCOUNT/codes (not sure why, maybe this is an oversight). When Estimate AADTs is pressed in App2.mlapp, main_combined_2.m is called.
- KCOUNT/codes/main_combined_2.m
  - function main_combined_2, which sequentially calls PRTCS, KCOUNT, and LocalSVR. Uses pos_neg_sum.m in Emission to combine all outputs into a single AADT file, and also calls toronto_sim.m to generate vehicle speeds using an ANN if the user chooses to.
- accum_data.m: combines shortest_distance.zip data with AADTs and land use data from AADT_Landuse_pop_lane_speed2_<YEAR>_.xlsx.
- data_prep_kridging.m: combines AADT and land use data prepared by data_prepar_locals with shortest network distances from shortest_distance.zip. Returns the set of output files used for regression kriging by KCOUNT.
- data_prepar_locals.m: combines AADT and raw land use data into AADT_Landuse_pop_lane_speed2_<YEAR>_.xlsx (intermediate file for kriging) and AADT_Landuse_pop_lane_speed3_<YEAR>_.xlsx (for LocalSVR).
- DoMPTC.m: nearly identical to DoMSTTC.m, except that it estimates the MADT and AADT of a permanent count station using the nearest four other permanent count stations. This is done for validation purposes.
- DoMSTTC.m: for a given short-term count station, finds the five nearest permanent count stations (Euclidean distance) using nearestneighbour.m. For each permanent station, gets day-to-month conversion coefficients and calculates day-to-year conversion coefficients from the same day of week (if possible) and closest year as the short-term data. If the closest year is not the same year, inflates any relevant stats using growth rates calculated in PTCYEAR.m. Finally, estimates MADT and AADT for the short-term station using the inflated conversion coefficients. Returns estimates for all five nearest stations.
- main_DoM_new_2012.m: main script for PRTCS. Calls STTC_estimate3 to get 15-minute count data and determine the day-to-month (DoM) conversion factor and AADT for permanent count stations. Calls PTCYEAR (which calls PTCWEEK if there's less than a year of data) to estimate growth rates. Estimates AADT for short-term stations using DoMSTTC, while validating results using DoMPTC and validation. Postprocesses estimated AADTs (lines 197-266; not sure how it works yet). Finally, writes data out in preparation for KCOUNT's regression kriging using data_prepar_locals and either short_krig or data_prep_kridging (depending on whether intermediate data is already stored).
- nearestneighbour.m: for each point in P, determines the closest point in X using Delaunay triangulation.
- PTCWEEK.m: uses a linear best fit to determine the weekly rate of growth in traffic counts. Currently only called as a backup method in PTCYEAR.m.
- PTCYEAR.m: for multi-year counts, estimates the rate of growth in traffic counts under the assumption of an exponential growth rate (see World Bank methodology).
- KCOUNT/codes/SEL_ratio.m: for a given short-term station and candidate nearby permanent count station, returns the means of relevant averaged daily traffic values. Tries to pick the same day of week and closest year (day of week is an equality, so it's the stricter criterion), but if the same day of week doesn't exist, uses the closest year and any day of the week.
- STTC_estimate3.m: reads in 15-minute traffic count data (PRTCS\negative\15min_counts_<YEAR>.zip). For each file (observations of one direction of a centreline segment over 1 year), embeds the data within a 35040 x 5 matrix M (96 fifteen-minute bins in one day x 365 days = 35040). The columns of M are year, month, 15-minute count, link ID, a temporary ID which is 1 for non-permanent stations and 0 for permanent ones, and day of the year (the link and temporary IDs are the same for each M, and most 15-minute count elements are NaNs due to missing data). Then takes the monthly sum and the daily sum averaged over all days of the month with data from M (inserting either 0 or NaN if a month has no data). If a station (a) has data for every month, (b) has data for at least 75% of all 35040 bins in the year, (c) doesn't have "re" in the filename, and (d) is not part of the test_id_negative or test_id_positive arrays, it is a permanent site, and the script also calculates the monthly averaged daily total count, and divides this by the monthly averaged daily total for each day of the week (i.e. the average of all Mondays, all Tuesdays, etc.) to get a set of 7 Day-to-Month (DoM) conversion factors. The AADT is also calculated for permanent count stations.
- short_krig.m: short version of data_prep_kridging, used when cached results already exist.
- validation.m: validation script for permanent count station AADT/MADT estimates made using DoMPTC.m.
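The bin count, the 75% coverage test and the Day-to-Month factors described for STTC_estimate3.m can be illustrated with a small sketch. This is Python rather than the project's MATLAB, and it is only a reading of the description above, not a port of the script: the function names, the use of None for missing data, and the weekday indexing (0-6) are all assumptions made for the example.

```python
from statistics import mean

BINS_PER_DAY = 96                     # 24 hours x 4 fifteen-minute bins
BINS_PER_YEAR = BINS_PER_DAY * 365    # 35040, matching the row count of M


def has_enough_data(bins, min_fraction=0.75):
    """True if at least min_fraction of the yearly bins hold a count.

    bins: one entry per fifteen-minute bin, None where data is missing
    (an assumption; the MATLAB script is described as using NaN).
    """
    filled = sum(1 for b in bins if b is not None)
    return len(bins) == BINS_PER_YEAR and filled >= min_fraction * BINS_PER_YEAR


def dom_factors(daily_totals, weekdays):
    """Day-to-Month factors for one month of daily totals.

    daily_totals: daily counts for the month (None where a day has no data).
    weekdays: weekday index (0-6) of each day, same length as daily_totals.
    Returns {weekday: monthly average daily total / average total on that weekday}.
    """
    observed = [(w, t) for w, t in zip(weekdays, daily_totals) if t is not None]
    madt = mean(t for _, t in observed)          # monthly averaged daily total
    factors = {}
    for day in sorted({w for w, _ in observed}):
        day_mean = mean(t for w, t in observed if w == day)
        factors[day] = madt / day_mean
    return factors
```

For example, with two weeks of data where days 0-4 each carry 1000 vehicles and days 5-6 carry 500, `dom_factors` returns factors below 1 for days 0-4 and above 1 for days 5-6, since the quieter days must be scaled up to reach the monthly average.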
- output_PRTCS<YEAR><POS_OR_NEG>:
  - all_AADT_<YEAR>.txt: station ID, AADT (station ID is centreline ID)
  - Perm_AADT_<YEAR>.txt: station ID, AADT of permanent stations
  - Temp_AADT_<YEAR>.txt: station ID, best fit (minimum MSE) AADT, best fit day-to-year coefficient D_ij, year from TTC, calibration factor, growth rate, upstream permanent site ID, upstream permanent site AADT of temporary stations. This may actually have all the station IDs, not just temporary ones.
  - Temp_Dij_<YEAR>.txt: station ID and day-to-year coefficients D_ij of temporary stations
  - validation_<YEAR>.txt: real and best fit AADTs of permanent count stations
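The growth rate column in Temp_AADT_<YEAR>.txt comes from the exponential-growth assumption described for PTCYEAR.m, and DoMSTTC.m is described as using it to inflate statistics taken from a different year. A minimal Python sketch of that idea follows. It assumes the simplest two-point form of exponential growth; the actual fitting procedure in PTCYEAR.m is not documented on this page, and both function names are hypothetical.

```python
import math


def exponential_growth_rate(count_start, count_end, years):
    """Annual rate r such that count_end = count_start * exp(r * years)."""
    if count_start <= 0 or count_end <= 0 or years <= 0:
        raise ValueError("counts and years must be positive")
    return math.log(count_end / count_start) / years


def inflate(value, rate, years):
    """Carry a statistic forward by `years` (or back, if negative) at `rate`."""
    return value * math.exp(rate * years)
```

With counts of 10000 and 12100 two years apart, inflating 10000 by one year at the resulting rate gives about 11000, i.e. 10% growth per year.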
- output_for_local<YEAR><POS_OR_NEG>:
  - AADT_Landuse_pop_lane_speed3_<YEAR>.csv: temporary station ID and associated land use data (AADT, population within 300 m buffer, lane number, speed limit, employment land use, commercial land use).
- output_for_kridging<YEAR><POS_OR_NEG>:
  - data_for_fit<YEAR>.txt: land use data for all unique destination centrelines from resmat<YEAR>.txt.
  - data_for_pred<YEAR>.txt: land use data for all unique destination centrelines from resmat<YEAR>.txt that also have known AADTs > 2000 counts/year.
  - resmat<YEAR>.txt: centreline-to-centreline network distances, and destination centreline land use, population, speed limit, etc. properties, for all centreline IDs included in shortest_path.zip. Around 90% of these are major/minor arterials or collectors.
  - ids_pred_<YEAR>.txt: centreline IDs from data_for_pred<YEAR>.txt.
  - ids_obs_<YEAR>.txt: centreline IDs from data_for_fit<YEAR>.txt.
- BCtrans.m: raises V to the power lam, unless lam = 0, in which case it takes the natural log.
- categorieskr_general.m: performs an OLS fit using fitlm on independent (non-spatial) variables. Currently fed speed limit, population, lane number, commercial, employment, government, industry, residential, and distance in km (not sure what this is) by main_2_2012_min.
- CressieHawks.m: Cressie-Hawkins estimator for variance matrix.
- iter_for_bins.m: one loop of an iterative process to determine optimal variogram coefficients. Uses CressieHawks for initial values, variogramfit2 to fit the variogram, and variogram to determine values.
- invBCtrans.m: performs the inverse exponentiation of V by lam, unless lam = 0, in which case it takes the exponential. Inverse of BCtrans.m.
- main_2_2012_min.m: main script for KCOUNT. Uses preProcDist_new to determine the distance between "observed" IDs (those with AADTs) and "prediction" IDs (these are saved to disk). Determines maximum and minimum counts for each road type (not sure how this works). Performs OLS on data from KCOUNT/RMsma_2km_<POS/NEG>/data_for_fit<YEAR>.txt (road segments with AADT > 2000) and KCOUNT/RMsma_2km_<POS/NEG>/data_for_pred<YEAR>.txt (all (valid) road segments) using categorieskr_general.m. Uses residuals from that fit to get estimators for Feasible Generalized Least Squares (see also this). Uses FGLS estimators in an iterative variogram determination scheme (each loop runs iter_for_bins.m), then passes optimal values to a final
- preProcDist_new.m: reads in "pred" and "obs" road IDs (copied from output_for_kridging), then loops through all "obs" IDs to determine network distances between them and all "pred" values (with zero-distance links removed, since those are repeats). Saves these as distdatak3<YEAR>.mat and distdatap3<YEAR>.mat binaries in KCOUNT/RMsma_2km_<POS/NEG>.
- preProcDist_sub.m: subloop of preProcDist_new.m. For the centreline segment of a given station with AADTs, finds all rows whose starting point (first column in KCOUNT/RMsma_2km_<POS/NEG>/distance_short<YEAR>.csv, respectively) is the segment and ending point (second column) is one of the stations we'd like to predict.
- variogram.m: returns the variogram for various model types.
- variogramfit2.m: fits a theoretical variogram to an experimental one. From Wolfgang Schwanghart (BSD3).
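For orientation, the Cressie-Hawkins estimator named above is usually written as a robust semivariance computed per distance bin from the square roots of absolute differences between paired values. The sketch below shows that textbook form in Python. It has not been checked against CressieHawks.m, which may organise the calculation differently (for instance over residuals and a full distance matrix), and the function name and input layout are assumptions for the example.

```python
def cressie_hawkins_semivariance(pairs):
    """Robust semivariance for one lag bin.

    pairs: list of (z_i, z_j) values whose separation falls in the bin.
    Implements 2*gamma = (mean |z_i - z_j|^0.5)^4 / (0.457 + 0.494 / N).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair in the bin")
    mean_root = sum(abs(a - b) ** 0.5 for a, b in pairs) / n
    two_gamma = mean_root ** 4 / (0.457 + 0.494 / n)
    return two_gamma / 2.0
```

Working with the square root of the differences, rather than their squares, is what makes the estimator less sensitive to a few outlying count pairs than the classical variogram estimator.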
- outputs<POS_OR_NEG><YEAR>:
  - beta: kriging fit coefficients.
  - Kcounts_<KRIDGING_METHOD>_simulated<YEAR>.txt: kriging outputs for roads that also have observed AADTs.
  - Kcounts_<KRIDGING_METHOD>_predicted<YEAR>.txt: kriging outputs for all roads, regardless of whether they have observed AADTs.
  - weights: kriging weights (Cdp' * Cdd).
  - Xdummy: independent variables for roads that have observed AADTs.
  - Xpdummy: independent variables for all roads.
  - Y: dependent variables for roads that have observed AADTs.
- build_infile_SVR.m: control script that generates local_SVR_2011.R. Calls data_make_local.m to map weighted grid AADTs to local road IDs, reading in Major_roads_count_length_<POS_OR_NEG>\input_grid_aadt.csv, input_localrd_aadt.csv and AADT_Landuse_pop_lane_speed3_<YEAR>.csv in the process. Reads in main_inputs_<POS_OR_NEG>\data_300m_fit_<YEAR>.csv (training), data_300m_pred_<YEAR>.csv (validation), and all_<YEAR>_pred.csv (prediction), appends weighted information from data_make_local, raw AADTs from AADT_Landuse_pop_lane_speed3_<YEAR>.csv, road lengths from Major_AADT_roadlength.xlsx, and populations from predictors_300m_sample.xlsx, then writes them back out as ..._1.csv files. Runs the script using Rscript <FULL_PATH>/local_SVR_<YEAR>.R. Finally, moves the R script's outputs to the output_<YEAR>_<POS_OR_NEG> folder.
- csvwrite_with_headers.m: dumps CSV files and includes column labels.
- data_make_local.m: reads in Major_roads_count_length_<POS_OR_NEG>\input_grid_aadt.csv, input_localrd_aadt.csv, AADT_Landuse_pop_lane_speed3_<YEAR>.csv
- local_SVR_<YEAR>.R: auto-generated from local_SVR_2011.R.bak to read in files prepared by build_infile_SVR.m. First performs a Box-Cox transformation to normalize the data, then performs an SVM regression on the fit data. Fit residuals are then calculated, and predictions made for the pred data. Two SVRs are performed: a naive one and one with manually tuned hyperparameters.
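The Box-Cox step mentioned for local_SVR_<YEAR>.R can be sketched as follows. This is the textbook one-parameter Box-Cox transform and its inverse, written in Python for illustration. Whether the R script uses exactly this form is not stated on this page (BCtrans.m in KCOUNT is described as a plain power instead), and the function names are hypothetical.

```python
import math


def box_cox(values, lam):
    """Textbook Box-Cox: (v**lam - 1) / lam, or ln(v) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inv_box_cox(values, lam):
    """Inverse of box_cox, used to bring predictions back to AADT units."""
    if lam == 0:
        return [math.exp(y) for y in values]
    return [(lam * y + 1.0) ** (1.0 / lam) for y in values]
```

In a workflow like the one described, the regression would be fitted on `box_cox(aadt, lam)` and its predictions passed through `inv_box_cox` before being written out.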
- outputs_<YEAR>_<POS_OR_NEG>/mydata_<YEAR>: centreline ID and AADT estimate
- pos_neg_sum.m: attempts to find KCOUNT/outputs<POS_OR_NEG><YEAR>/Kcounts_spherical_predicted<YEAR>.txt and LocalSVR/outputs_<YEAR>_<POS_OR_NEG>\mydata_<YEAR>.txt, and raises an error if both directions are not found for each. Also reads in centreline directions from Emission/inputs/direction3.csv. Then loops through the values (KCOUNT output first, then LSVR), and stores AADT values for each centreline link in both directions, as well as the "total AADT", the sum from both directions. In cases where only one direction exists in direction3.csv, stores a NaN in place of an estimate for the other direction. Finally, combines local and major outputs, and writes all non-NaN values out to files. Does some other stuff to prepare for velocity estimation, but this is currently beyond the scope of our work.
- LocalSVR/codes/replace_obs_local.m: replaces LocalSVR values with the corresponding ones from PRTCS\output_for_local<YEAR><POS_OR_NEG>\AADT_Landuse_pop_lane_speed3_<YEAR>.csv.
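The direction-combining step described for pos_neg_sum.m can be sketched in Python. The data layout (dictionaries keyed by centreline ID, direction labels "pos" and "neg") and the function name are assumptions for the example. The choice to sum only the available directions when one is NaN is also an assumption, since the page does not say how the total is formed in that case.

```python
import math


def combine_directions(pos, neg, directions):
    """Merge per-direction AADT estimates into (pos, neg, total) per link.

    pos, neg: {centreline_id: AADT estimate} for each direction.
    directions: {centreline_id: set of directions that exist, 'pos'/'neg'}.
    A direction that does not exist for a link is stored as NaN.
    """
    combined = {}
    for cid, dirs in directions.items():
        p = pos.get(cid, math.nan) if "pos" in dirs else math.nan
        n = neg.get(cid, math.nan) if "neg" in dirs else math.nan
        available = [v for v in (p, n) if not math.isnan(v)]
        # Assumption: the total ignores a missing direction rather than becoming NaN.
        total = sum(available) if available else math.nan
        combined[cid] = (p, n, total)
    return combined
```

A link listed with only a positive direction keeps its positive estimate, gets NaN for the negative one, and (under the assumption above) a total equal to the positive estimate.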
- Emission/inputs/id_rem_for_pos.csv: major road (KCOUNT) centreline IDs where positive directions don't exist in direction3.csv, and so should be dropped.
- Emission/inputs/id_rem_for_neg.csv: major road (KCOUNT) centreline IDs where negative directions don't exist in direction3.csv, and so should be dropped.
- Emission/inputs/id_rem_for_pos_loc.csv: local road (LSVR) centreline IDs where positive directions don't exist in direction3.csv, and so should be dropped.
- Emission/inputs/id_rem_for_neg_loc.csv: local road (LSVR) centreline IDs where negative directions don't exist in direction3.csv, and so should be dropped.
- Emission/inputs/Total_aadt.csv: centreline ID and total AADT from both major and local roads.
- Emission/outputs/aadts_two_directions.csv: centreline ID and AADT in either direction from both major and local roads.
- aadt_output_files/final_aadt_<YEAR>.csv: centreline ID, AADT and (non-symmetric) AADT confidence interval in either direction from both major and local roads, ordered by ID, then direction.
- input_for_toronto_sim4_<POS_OR_NEG>.csv: centreline ID, road classification (ID and string label) and AADT for the vehicle speeds estimator.