Merge pull request #80 from aremazeilles/experiment_data

Datafile set definition
aremazeilles · Dec 2, 2020 · c61c0e8
2 parents 3cb0a97 + 69a5ecd
commit c61c0e8
Showing 3 changed files with 144 additions and 34 deletions.
diff --git a/modules/ROOT/pages/data_format.adoc b/modules/ROOT/pages/data_format.adoc
@@ -51,25 +51,18 @@ The activity can be repeated as follows:
   It can be a configuration of the testbed or environmental settings (augment the slope angle), a configuration of the robotic system (support level of the exoskeleton), or an indication to the user (redo the operation with eyes closed).
   The parameter being changed must be mentioned as a <<template.adoc#table_controlled_variables, controlled variable>>, and must be listed in the <<Testbed configuration file, testbed configuration file>>.
   For each condition, we can have a set of runs.
-* to involved several **subjects**: this is particularly relevant for experiments involving humans.
+* to involve several **subjects**: this is particularly relevant for experiments involving humans.
   In that case, we can consider having each subject execute the same protocol, i.e. potentially involving several conditions, and for each condition several repetitions or runs.
 
 === Filename format, and generalities
 
-Generally speaking pre-processed data should follow the following pattern:
+Generally speaking, pre-processed data should follow this contextualized pattern:
 
 ```
 subject_X_cond_Y_run_Z_[type].csv
 ```
-where:
 
-* `[type]` is a string related to the type of information stored in the file.
-  This is the root name of the file.
-* X, Y, Z are integer respectively associated the to the number of subjects involved, the number of different conditions being tested, and the number of repetitions per condition.
-* The `subject` / `cond` / `run` namespaces are only used when it makes sense.
-  An experiment involving a unique subject would have have pre-processed files following the pattern: `cond_Y_run_Z_[type].csv`.
-  An experiment with various subject but unique condition would follow the pattern `subject_X_run_Z_[type].csv`.
-  At the extreme, an experiment involving a unique subject (or an humanoid), with single condition and no repetition will be described by files with pattern: `[type].csv`.
+The rationale behind this filename format is described in <<experiment_data.adoc, the Experimental data page>>.
 
 Unless specified otherwise, all recorded data files will have a name following this pattern.
 Each data file is now described, focusing on the type of information stored (extensions subject, cond and run are skipped for now).
@@ -84,8 +77,8 @@ Each data file is now described, focusing on the type of information stored (ext
 | grf | csv | Ground reaction Forces
 | emg | csv | Electromyography
 | wr_N_config_M_segments (unclear)| yaml | Segments descriptions files
-| subject_N_anthropometry | yaml | Anthropometry description
-| humanoid_N_anthropometry | yaml | Humanoid description
+| subject_N_info | yaml | subject description (metric and/or other)
+| humanoid_N_info | yaml | Humanoid description
 | gaitEvents | yaml | Gait Events
 | subject_N_run_R_testbedLabel | yaml | testbed specifics
 |================
@@ -156,7 +149,7 @@ In that case two options are proposed:
 
 WARNING: The protocol should indicate the appropriate option to use.
 
-**Option 1: one file per device**
+__Option 1: one file per device__
 
 The two files will share the same structure (based on the information stored in it), but will only differ by their name:
 
@@ -176,7 +169,7 @@ Both files will contain data following the regular wrench file pattern, i.e.:
 | ... | ... | ...
 |=================
 
-**Option 2: one file gathering the two devices**
+__Option 2: one file gathering the two devices__
 
 A single file is provided, and uses the generic format `subject_N_run_R_wrench.csv`.
 The file content is a concatenation of the two readings, with the labels adjusted to distinguish the two devices:
@@ -207,7 +200,7 @@ From this file, the number of joints, its labels and the degrees of freedom can
 
 **Number of files**: all necessary files to describe the complete robotic structure.
 
-**Name of the file**: The main urdf file which includes the rest of urdf files should be named as `humanoid_N_anthropometry`, where `N` is the humanoid number.
+**Name of the file**: The main urdf file which includes the rest of urdf files should be named as `robot_N_info`, where `N` is the humanoid number.
 
 **File format**: `.urdf`.
 The use of `.urdf` files also has shortcomings such as the lack of friction (important for e.g. walking steeper slope angles).
@@ -228,26 +221,29 @@ Moreover, it allows to use a wide variety of simulators commonly used in robotic
 This file shall contain the dimensions and inertial properties of each segment of the worn robot with respect to the reference system of the human body segment connected to it.
 This is needed to enable dynamic simulators to model the human-WR system.
 
-=== Human anthropometric measures file
+=== Human information file
 
-**Description**: This file shall contain all the anthropometric measurements of the human body segment, as detailed in the <<model.adoc#sec_hbs, model document>>.
+**Description**: This file shall contain all information related to the subject, such as the anthropometric measurements of the human body segment (as detailed in the <<model.adoc#sec_hbs, model document>>), gender, age, etc.
 
-**Name of the file**: subject_N_anthropometry, where N = subject’s number.
+
+**Name of the file**: `subject_N_info`, where N = subject’s number.
 Use appropriate leading zeros for R and N to ensure proper ordering of files.
 
-**File format**: .yaml
+**File format**: `.yaml`
 
-**File structure**: Set of lines containing key: value where the key is provided in the <<model.adoc#table_body_segment, body segment table>>.
+**File structure**: Set of lines containing key: value.
+For anthropometric measures, the keys should be the ones presented in <<model.adoc#table_body_segment, body segment table>>.
+In any case, the entries provided should follow the protocol requirement.
 
-**Units**: Meters
+**Units**: Various
 
 === Humanoid anthropometric measures file
 
 **Description**: This file shall contain all the anthropometric measurements from the humanoid robot mapped to the above proposed human segments (see Table 2 and Figure 3).
 
-**Name of the file**: humanoid_N_anthropometry, where N = humanoid’s identifier.
+**Name of the file**: `robot_N_info`, where N = humanoid’s identifier.
 
-**File format**: .yaml
+**File format**: `.yaml`
 
 **File structure**: Set of lines containing key: value where the key must contain the corresponding robot segment name.
 
@@ -262,7 +258,7 @@ It can also contain subject behavior constraints set by the experimenter (ask th
 
 **File format**: .yaml
 
-**File name**: `testbed.yaml`.
+**File name**: `condition.yaml`
 
 **File structure**: Set of lines containing key: values.
 Where each key is one testbed-related data.
@@ -297,6 +293,7 @@ All controlled variables, as defined in the <<template.adoc#table_controlled_var
 **File format**: These files are not supposed to be processed automatically by the EUROBENCH Benchmarking routines, so that a specific format is not defined.
 Data can be provided as the device drivers provide them (e.g. `c3d`, `rosbag`, `.txt`, `.csv`, ...). However, a description of the file content and acquisition frequency should be provided (like `Readme.md` or `Readme.txt`) to help the user open and understand these files.
 
+[[sec:pre_processed_data]]
 == Pre-Processed Data Files
 
 This set of files should contain all the data processed from the raw data and needed for running a specific benchmarking routine.
@@ -773,14 +770,14 @@ These are the files that they have produced to be compatible with the EUROBENCH
 ** `raw_data.txt`
 ** `subject_{01-02}_run_{01-03}_imu_raw.cappa`
 * Anthropometric Files
-** `subject_01_anthropometry.yaml`
-** `subject_02_anthropometry.yaml`
+** `subject_01_info.yaml`
+** `subject_02_info.yaml`
 * Electromyography Files
 ** `subject_{01-02}_run_{01-03}_emg.csv`
 * Gait Events Files
 ** `subject_{01-03}_run_{01-03}_gaitEvents.csv`
 * Testbed configuration related data file
-** `testbed.yaml`
+** `condition.yaml`
 
 There is only a unique testbed configuration file, as all runs are repetitions of the same experimental conditions.
 
@@ -794,14 +791,14 @@ The experiment was then repeated changing the support level of the exoskeleton
 ** `raw_data.txt`
 ** `cond_{01-02}_run_{01-03}_markers_raw.cappa`
 * Anthropometric Files
-** `anthropometry.yaml`
+** `info.yaml`
 * Gait Events Files
 ** `cond_{01-02}_run_{01-03}_gaitEvents.csv`
 * Testbed related data file
-** `cond_{01-02}_testbed.yaml`
+** `condition_{01-02}.yaml`
 
 The label `subject` is discarded as a unique subject is considered.
-The level of exoskeleton support is specified through a variable in `cond_01_testbed.yaml` and `cond_02_testbed.yaml` files.
+The level of exoskeleton support is specified through a variable in `condition_01.yaml` and `condition_02.yaml` files.
 
 === Example 3
 
@@ -811,12 +808,12 @@ The Laboratory HumanoidLab has done a study on the new walking pattern generator
 ** `rosbag_{01-02}.bag` (containing /tf topic)
 ** `humanoid_markers_raw_{01-02}.cappa`
 * .urdf File
-** `humanoid.urdf`
+** `robot_info.urdf`
 * Gait Events Files
 ** `run_01_gaitEvents.csv`
 ** `run_02_gaitEvents.csv`
 * Testbed related data file
-** `testbed.yaml`
+** `condition.yaml`
 
 === Example 4
 
@@ -833,9 +830,9 @@ An instrumented chair was used, which is collecting a set of measures, in a form
 * Chair sensors data
 ** `subject_{01-02}_cond_{01-02}_run_{01-05}_platformData.csv`
 * Testbed related data file
-** `cond_{01-02}_testbed.yaml`
+** `condition_{01-02}.yaml`
 
-The eyes status (open/closed) is set through a parameter in files `cond_01_testbed.yaml` and `cond_02_testbed.yaml`.
+The eyes status (open/closed) is set through a parameter in files `condition_01.yaml` and `condition_02.yaml`.
 
 == References
 

diff --git a/modules/ROOT/pages/experiment_data.adoc b/modules/ROOT/pages/experiment_data.adoc
@@ -0,0 +1,112 @@
+= Experimental data
+:imagesdir: ../images
+:sectnums:
+:sectnumlevels: 4
+:experimental:
+:keywords: AsciiDoc
+:source-highlighter: highlight.js
+:icons: font
+
+## Introduction
+
+After the execution of an experiment, the experimenter should have collected a set of information:
+
+* raw data, as provided by the sensors used in the experiment;
+* pre-processed data, according to the format mentioned in <<data_format.adoc, the Eurobench data format>>;
+* a report document, providing informal description of the test conducted.
+
+We focus here on the structure and organization of the pre-processed data, as they are the input information that will be used to compute _automatically_ the Performance Indicator metrics.
+
+## Datafile naming
+
+Although each centre may have a well-defined data organization, we are required to define a _Eurobench-compatible_ structure, to make sure the automatic scoring mechanism can handle the data submitted for benchmarking.
+
+The structure proposed only constrains the **pre-processed data**, as the other ones (raw data, report document) are not used within the benchmarking process.
+
+Generally speaking pre-processed data should follow the following pattern:
+
+```
+subject_X_cond_Y_run_Z_[type].csv
+```
+
+This contextualized format provides the following information:
+
+* `[type]` is a string related to the type of information stored in the file.
+  This is the root name of the file.
+  All possible types are described in <<data_format.adoc#sec:pre_processed_data, the Eurobench data format>>.
+* `X`, `Y`, `Z` are integers identifying, respectively, the subject, the condition being tested, and the repetition (run) of that condition.
+* The `subject` / `cond` / `run` namespaces are only used when it makes sense:
+** An experiment involving a unique subject (or only a humanoid) would have pre-processed files following the pattern: `cond_Y_run_Z_[type].csv`.
+** An experiment with several subjects but a single condition would follow the pattern `subject_X_run_Z_[type].csv`.
+** An experiment in which there is a single repetition of a trial would have data files following the pattern: `subject_X_cond_Y_[type].csv`.
+** At the extreme, an experiment involving a unique subject (or a humanoid), with a single condition and no repetition, will be described by files with the pattern: `[type].csv`.
+
+Unless specified otherwise, all data files submitted for benchmarking will have a name following this pattern.
+
+Aside from the pre-processed data files, we foresee the following additional required files:
+
+* files containing information (anthropometric and/or other aspects) about the subject involved: `subject_i_info.yaml`.
+* files containing information on the robotic device used: `robot_info.yaml` (`urdf` format may be used, the protocol should state it).
+* files containing the settings of the controlled variables: `condition_i.yaml`.
+
+These files can be repeated using the following patterns:
+
+* user information: if a single human subject is involved, the data file can be named `subject_info.yaml`.
+  If `N` subjects are involved, then we should provide `N` files following the pattern: `subject_i_info.yaml`, where `i` (from 1 to `N`) identifies the subject.
+* robot device information: we assume only one robotic device is used per experiment, so that the file can be directly named `robot_info.yaml`.
+  Any change of the control setting of the robot should be mentioned in the condition file.
+* Condition setting file: we identified three cases:
+** if no parameter is set, no file is required.
+** if the condition setting contains information specific to the user (like an adjustment of a pushing force that may differ per subject), then we should get files named `subject_i_condition_j.yaml`, where `j` is an integer identifying one of the `Y` condition settings being considered.
+** if the condition settings are common to all subjects, then we should get only `Y` different files, named `condition_j.yaml`.
+
+This model gives us a _contextualized_ filename approach: based on the filename, we can deduce the full context of a data file: its content, the subject involved, the condition setting used, and the repetition id.
+A contextualized filename is unique in the complete experiment.
+
+
+### Illustration
+
+**Case 1**: An experiment is conducted with 4 human subjects and a robotic device.
+The experiment involves 3 condition settings (for instance, the same testbed (1) without the robotic device, (2) with the robotic device totally passive, and (3) with the robotic device in active mode). These 3 settings are common across subjects.
+Each setting has been tested and repeated 4 times, except for the second subject, who could only perform two repetitions.
+The files expected are the following:
+
+* `subject_{1,2,3,4}_info.yaml`
+* `robot_info.urdf`
+* `condition_{1,2,3}.yaml`
+* `subject_{1,3,4}_cond_{1,2,3}_run_{1,2,3,4}_[type].csv`
+* `subject_2_cond_{1,2,3}_run_{1,2}_[type].csv`
+
+In the last two groups of files, the `[type]` string should refer to one of the pre-processed formats described in X.
+
+**Case 2**: Another experiment involves 2 subjects and a robotic device.
+The experiment contains 3 different condition settings.
+One of the configuration parameters has to be adjusted to the subject involved.
+Each condition is only repeated once.
+
+* `subject_{1,2}_info.yaml`
+* `robot_info.urdf`
+* `subject_{1,2}_condition_{1,2,3}.yaml`
+* `subject_{1,2}_cond_{1,2,3}_[type].csv`
+
+**Case 3**: An experiment compares two different parameter settings of a humanoid.
+The robot has been _asked_ to repeat each configuration three times.
+
+* `robot_info.urdf`
+* `condition_{1,2}.yaml`
+* `cond_{1,2}_run_{1,2,3}_[type].csv`
+
+## Data file organization
+
+Given the previous filename pattern, the simplest file organization is to place all the data files into a single folder.
+
+For personal convenience, the experimenter may prefer to organize the data into folders.
+We envision two possibilities:
+
+* The **contextualized filename pattern is maintained across folders**: this means that by copying all files in folders into a single folder, and keeping their name, we would get all data still organized following the previous filename pattern.
+This means the folder organization does not affect the naming of the file, and the possibility of understanding the context of a file given its name.
+
+* The **filename is simplified according to the folder organisation**.
+  In that case, the contextualized filename could be obtained using the folder hierarchy.
+
+We propose focusing on the first approach for now.
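
The contextualized filename pattern introduced in `experiment_data.adoc` can be illustrated with a small parser. The following is only a minimal sketch in Python, not part of the Eurobench tooling: the function name `parse_datafile_name` and the regular expression are assumptions made for illustration, and the sketch assumes that `[type]` starts with a letter and that the optional `subject` / `cond` / `run` namespaces always appear in that order.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not defined by the commit):
# optional subject/cond/run namespaces, in that order, then the [type] root name.
_DATAFILE_PATTERN = re.compile(
    r"^(?:subject_(?P<subject>\d+)_)?"
    r"(?:cond_(?P<cond>\d+)_)?"
    r"(?:run_(?P<run>\d+)_)?"
    r"(?P<type>[A-Za-z][A-Za-z0-9_]*)\.csv$"
)


def parse_datafile_name(filename: str) -> Optional[dict]:
    """Return the context encoded in a pre-processed data filename.

    Namespaces absent from the filename are reported as None.
    Returns None when the name does not follow the pattern at all.
    """
    match = _DATAFILE_PATTERN.match(filename)
    if match is None:
        return None
    context = {"type": match.group("type")}
    for key in ("subject", "cond", "run"):
        value = match.group(key)
        # Leading zeros (e.g. "01") are accepted and converted to integers.
        context[key] = int(value) if value is not None else None
    return context


if __name__ == "__main__":
    for name in ("subject_1_cond_2_run_3_emg.csv", "cond_01_run_02_gaitEvents.csv", "grf.csv"):
        print(name, "->", parse_datafile_name(name))
```

With the first organization option (filenames kept intact across folders), such a parser could be applied to every file found under the experiment folder regardless of where it sits, since the full context is carried by the name alone.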
diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc
@@ -12,6 +12,7 @@ This document gathers documentation on software aspects related to the http://eu
 * <<data_format.adoc#Eurobench Data Format, Eurobench Data Format>>
 * <<template.adoc#Eurobench template, Eurobench protocol templates>>
 * <<pi_spec.adoc#Performance Indicator Specification, Performance Indicator Algorithm specification>>
+* <<experiment_data.adoc#Experimental data, Organisation of experimental dataset>>
 
 == Modification Instructions