Skip to content

Pipeline for automated data cleaning and splicing from timestamped multi-sensor data, with automated sensor-wise plotting, and saving all generated figures as PNGs into a graphs folder

Notifications You must be signed in to change notification settings

hohilwik/automated-data-cleaning-and-splicing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

About

This is for automation for data cleaning and splicing of data streams coming from a multi-sensor array. I use init.c to set up the directory structure, ls_init.c to set up ls.txt that is used to get information about the files. Then data_pre_proc.c goes through all the files placed inside data_raw, cleans it and places it in data_cleaned, then it creates subdirectories inside data_spliced for each file in data_cleaned, and outputs the spliced data to files named sensor1.txt, sensor2.txt, etc.

How to use

  • Run startup.sh with "./start.sh", this will compile the required files and create their executables
  • Run ./init to initialize the directory structure
  • Place your raw data inside data_raw
  • Run ./ls_init, this is required for the functions further used and will cause a segmentation fault if not used(because the functions will not know which files to look for)
  • Run ./proc and the data will get cleaned, spliced, and placed in appropriate directories
  • Sensor count is automatically detected and taken care of for each file
  • Use the MATLAB script graphing.m to get graphs of your data, this script will give you file-wise plots, with different subplots for each sensor, and can easily be modified to do other things, as seen in graphing_conditional.m
  • After processing, place data_raw, data_cleaned, data_spliced, graphs and ls.txt into some folder of your choice(for eg, processed_data_01), and you can process more data by following the previous steps again
  • NOTE: the format that this works for is comma-separated data with timestamps at the start, for example
2022-05-23 01:38:56.362883,133.0,20.0,77.0,105.0,35.0
2022-05-23 01:38:56.379001,133.0,24.0,77.0,102.0,42.0
2022-05-23 01:38:56.403324,133.0,24.0,77.0,102.0,42.0
2022-05-23 01:38:56.425821,133.0,24.0,77.0,102.0,42.0
2022-05-23 01:38:56.443677,134.0,29.0,77.0,101.0,43.0
2022-05-23 01:38:56.459578,134.0,29.0,77.0,101.0,43.0
2022-05-23 01:38:56.474837,134.0,29.0,77.0,101.0,43.0
2022-05-23 01:38:56.488113,136.0,29.0,77.0,99.0,50.0
  • To make it work with your data format, have your string format be:
<string1><space><string2>,<val>,<val>

where string1 and string2 don't contain any spaces, and the values also do not have any spaces between them. Keep in mind that the last value does not have a comma after it, though that can be easily changed if you know C.

Sample output of processing

Processing [baddeadlift2.txt]
Sensor count-> 5
Cleaning Data...
Splicing Data...
Processing [baddeadlift3.txt]
Sensor count-> 5
Cleaning Data...
Splicing Data...

About

Pipeline for automated data cleaning and splicing from timestamped multi-sensor data, with automated sensor-wise plotting, and saving all generated figures as PNGs into a graphs folder

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published