
Commit

Update docs
AlexDelitzas committed Apr 18, 2024
1 parent a851c16 commit 90fdc70
Showing 4 changed files with 40 additions and 8 deletions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

Binary file modified sitemap.xml.gz
Binary file not shown.
2 changes: 1 addition & 1 deletion track_1/index.html
@@ -1007,7 +1007,7 @@ <h2 id="submission-instructions"><strong>Submission Instructions</strong></h2>
<p>Given the open-vocabulary query, the participants are asked to segment the object instances that best fit the query. The expected result is object instance masks and a confidence score for each mask. </p>
<p>We ask the participants to upload their results as a single <code>.zip</code> file, which, when unzipped, must contain the prediction files in its root. There must not be any additional files or folders in the archive except those specified below.</p>
<p>Results must be provided as a text file for each scene. Each text file should contain a line for each instance, containing the relative path to a binary mask of the instance, and the confidence of the prediction. The result text files must be named according to the corresponding scan, as <code>{SCENE_ID}.txt</code> with the corresponding scene ID. Predicted <code>.txt</code> files listing the instances of each scan must live in the root of the unzipped submission. Predicted instance mask files must live in a subdirectory of the unzipped submission. For instance, a submission should look like:</p>
<pre><code>submission_opensun3d
<pre><code>submission_opensun3d_track1
|__ {SCENE_ID_1}.txt
|__ {SCENE_ID_2}.txt
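<p>As a sketch of how such a per-scene result file can be produced, the short Python snippet below writes one <code>{SCENE_ID}.txt</code> from a list of already-saved mask paths and confidences; the helper name and the list-of-tuples interface are our own illustration, not part of the official toolkit.</p>
<pre><code>def write_scene_results(scene_id, instances, out_dir):
    """Write {SCENE_ID}.txt with one line per predicted instance.

    `instances` is a list of (relative_mask_path, confidence) tuples,
    where each relative path points to a binary mask file saved inside
    the submission directory.
    """
    lines = [f"{rel_path} {conf:.4f}" for rel_path, conf in instances]
    with open(f"{out_dir}/{scene_id}.txt", "w") as f:
        f.write("\n".join(lines) + "\n")
</code></pre>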
44 changes: 38 additions & 6 deletions track_2/index.html
@@ -951,7 +951,7 @@ <h3 id="challenge-phases">Challenge Phases</h3>
</li>
</ul>
<h3 id="data-organization-and-format">Data organization and format</h3>
<p>We represent each scene with a visit_id (6-digit number) and each video sequence with a video_id (8-digit number). Each scene has on average three video sequences recorded with a 2020 iPad Pro.</p>
<p>We represent each scene with a visit_id (6-digit number) and each video sequence with a video_id (8-digit number). For each scene, we provide a high-resolution point cloud generated by combining multiple Faro laser scans of the scene. Additionally, each scene is accompanied by on average three video sequences recorded with a 2020 iPad Pro.</p>
<pre><code>PATH/TO/DATA/DIR/{dev or test}/
├── {visit_id}/
| ├── {visit_id}.ply # combined Faro laser scan with 5mm resolution
@@ -983,8 +983,8 @@ <h3 id="data-organization-and-format">Data organization and format</h3>
.
</code></pre>
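<p>As a quick sanity check after downloading, the laser scan can be loaded into a point array, for example with <code>open3d</code>; the use of open3d here is our own choice for illustration and is not required by the challenge toolkit.</p>
<pre><code>import numpy as np
import open3d as o3d

visit_id = "123456"
laser_scan_path = f"PATH/TO/DATA/DIR/dev/{visit_id}/{visit_id}.ply"

pcd = o3d.io.read_point_cloud(laser_scan_path)   # combined Faro laser scan
points = np.asarray(pcd.points)                  # (N, 3) vertex coordinates
print(points.shape)                              # N vertices, in file order
</code></pre>
<p>The vertex order read here is the order of the vertices in the <code>.ply</code> file, which is the order that predictions are expected to follow (see the submission instructions below).</p>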
<h3 id="annotations-format">Annotations format</h3>
<p>Annotations are organized in two separate files and follow this format:</p>
<p><em>descriptions.json</em></p>
<p>We provide GT annotations for the scenes in the development set. The annotations are organized in two separate files and follow this format:</p>
<p><em><a href="https://github.com/OpenSun3D/cvpr24-challenge/blob/main/challenge_track_2/benchmark_data/descriptions_dev.json">descriptions_dev.json</a></em></p>
<pre><code>[
{
&quot;desc_id&quot;: unique id of the description,
@@ -997,7 +997,7 @@ <h3 id="annotations-format">Annotations format</h3>
...
]
</code></pre>
<p><em>annotations.json</em></p>
<p><em><a href="https://github.com/OpenSun3D/cvpr24-challenge/blob/main/challenge_track_2/benchmark_data/annotations_dev.json">annotations_dev.json</a></em></p>
<pre><code>[
{
&quot;annot_id&quot;: unique id of the annotation,
@@ -1007,7 +1007,7 @@ <h3 id="annotations-format">Annotations format</h3>
...
]
</code></pre>
<p>The file <em>descriptions.json</em> contains the language task descriptions and links them to the corresponding functional interactive element instances. The file <em>annotations.json</em> contains the functional interactive element annotations, i.e., the mask indices of a single functional interactive element instance in the original laser scan. </p>
<p>The file <em>descriptions_dev.json</em> contains the language task descriptions and links them to the corresponding functional interactive element instances. The file <em>annotations_dev.json</em> contains the functional interactive element annotations, i.e., the mask indices of a single functional interactive element instance in the original laser scan. </p>
<blockquote>
<p>&#128221; We <em>highlight</em> that a single language task description can correspond to one or multiple functional interactive element instances.</p>
</blockquote>
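<p>A minimal sketch of how the two files can be joined is shown below. The key that links a description to its annotations and the key that stores the mask indices are truncated in the snippets above; the names <code>annot_id</code> (a list of linked annotation ids per description) and <code>indices</code> (vertex indices per annotation) used here are assumptions for illustration and should be checked against the released files.</p>
<pre><code>import json

with open("descriptions_dev.json") as f:
    descriptions = json.load(f)
with open("annotations_dev.json") as f:
    annotations = json.load(f)

# Index annotations by their unique id.
annot_by_id = {a["annot_id"]: a for a in annotations}

for desc in descriptions:
    # A single task description may reference one or several instances.
    for annot_id in desc["annot_id"]:                     # assumed key
        mask_indices = annot_by_id[annot_id]["indices"]   # assumed key
        print(desc["desc_id"], annot_id, len(mask_indices))
</code></pre>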
@@ -1156,7 +1156,39 @@ <h2 id="example-code">Example code</h2>
</code></pre>
<p>where the <code>wide</code> RGB frames are used for coloring, the extraneous points will be cropped from the laser scan, and the output will be stored.</p>
<h2 id="submission-instructions">Submission Instructions</h2>
<p>Coming soon.</p>
<p>Given the open-vocabulary language task description, the participants are asked to segment the functional interactive element instances that an agent needs to interact with to successfully accomplish the task. The expected result is functional interactive element masks and a confidence score for each mask. </p>
<p>We ask the participants to upload their results as a single <code>.zip</code> file, which, when unzipped, must contain the prediction files in its root. There must not be any additional files or folders in the archive except those specified below.</p>
<p>Results must be provided as a text file for each pair of laser scan and language description. Each text file should contain one line per instance, giving the relative path to a binary mask of the instance and the confidence of the prediction. The result text files must be named after the corresponding laser scan (<code>visit_id</code>) and language description (<code>desc_id</code>), as <code>{visit_id}_{desc_id}.txt</code>. Predicted <code>.txt</code> files listing the instances of each scan must live in the root of the unzipped submission. Predicted instance mask files must live in a subdirectory named <code>predicted_masks/</code> of the unzipped submission. For example, a submission should look like the following:</p>
<pre><code>submission_opensun3d_track2
|__ {visit_id_1}_{desc_id_1}.txt
|__ {visit_id_2}_{desc_id_2}.txt
|__ {visit_id_N}_{desc_id_N}.txt
|__ predicted_masks/
|__ {visit_id_1}_{desc_id_1}_000.txt
|__ {visit_id_1}_{desc_id_1}_001.txt
</code></pre>
<p>for all N available pairs of (laser scan, language description).</p>
<p>Each prediction file for a scene should contain a list of instances, where an instance is described by (1) the relative path to the predicted mask file and (2) a float confidence score. If your method does not produce confidence scores, you can use 1.0 as the confidence score for all masks. Each line in the prediction file should correspond to one instance, with the two values above separated by a space; consequently, the filenames in the prediction files must not contain spaces.
Each predicted instance mask file should provide a mask over the vertices of the provided laser scan, i.e. <code>{visit_id}_laser_scan.ply</code>, following the original order of the vertices in this file.
Each instance mask file should contain one line per point, with each line containing an integer value; non-zero values indicate that the corresponding point is part of the instance. For example, consider a scene identified by visit_id <code>123456</code>, with a language description input identified by desc_id <code>5baea371-b33b-4076-92b1-587a709e6c65</code>. In this case, the submission files could look like:</p>
<p><code>123456_5baea371-b33b-4076-92b1-587a709e6c65.txt</code></p>
<pre><code>predicted_masks/123456_5baea371-b33b-4076-92b1-587a709e6c65_000.txt 0.7234
predicted_masks/123456_5baea371-b33b-4076-92b1-587a709e6c65_001.txt 0.9038
</code></pre>
<p>and <code>predicted_masks/123456_5baea371-b33b-4076-92b1-587a709e6c65_000.txt</code> could look like:</p>
<pre><code>0
0
1
1
0
</code></pre>
<blockquote>
<p>&#128221; <strong>IMPORTANT NOTE</strong>: The prediction files must adhere to the vertex ordering of the original laser scan point cloud <code>{visit_id}_laser_scan.ply</code>. If your pipeline alters this vertex ordering (e.g., through cropping the laser scan using the <code>crop_mask</code> data asset), ensure that the model predictions are re-ordered to match the original vertex ordering before generating the prediction files.</p>
</blockquote>
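<p>To make the expected layout concrete, here is a minimal, unofficial sketch that writes one submission entry from predicted per-vertex masks; the function name and the numpy-based mask representation are our own assumptions, and the masks are assumed to already follow the vertex order of <code>{visit_id}_laser_scan.ply</code>.</p>
<pre><code>import os
import numpy as np

def write_submission_entry(root, visit_id, desc_id, masks, scores):
    """Write {visit_id}_{desc_id}.txt plus one mask file per instance.

    `masks` is a list of boolean numpy arrays of length N (the number of
    vertices in the original laser scan, in the original vertex order) and
    `scores` is the matching list of float confidences.
    """
    mask_dir = os.path.join(root, "predicted_masks")
    os.makedirs(mask_dir, exist_ok=True)

    lines = []
    for i, (mask, score) in enumerate(zip(masks, scores)):
        rel_path = f"predicted_masks/{visit_id}_{desc_id}_{i:03d}.txt"
        # One integer per vertex; non-zero marks membership in the instance.
        np.savetxt(os.path.join(root, rel_path), mask.astype(int), fmt="%d")
        lines.append(f"{rel_path} {score:.4f}")

    with open(os.path.join(root, f"{visit_id}_{desc_id}.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")
</code></pre>
<p>Packing the contents of <code>root</code> into a single <code>.zip</code> then yields an archive with the prediction files in its root, as required above.</p>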
<h2 id="evaluation-guidelines">Evaluation Guidelines</h2>
<p>In order to evaluate the results on the scenes of the dev set, we provide <a href="https://github.com/OpenSun3D/cvpr24-challenge/blob/main/challenge_track_2/benchmark_eval/eval_utils/eval_script_inst.py">evaluation functions</a> as well as an example <a href="https://github.com/OpenSun3D/cvpr24-challenge/blob/main/challenge_track_2/benchmark_eval/demo_eval.py">evaluation script</a>. We follow the standard evaluation for 3D instance segmentation, and compute Average Precision (AP) scores. The evaluation script computes the AP scores for each language task description and then averages the scores over all language task descriptions in the set. </p>
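<p>For intuition, the central quantity behind the AP computation is the intersection-over-union (IoU) between a predicted per-vertex mask and a ground-truth mask. The helper below is only an illustrative sketch of that computation, not a substitute for the linked evaluation script.</p>
<pre><code>import numpy as np

def mask_iou(pred_mask, gt_mask):
    """IoU between two boolean per-vertex masks of equal length."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0
</code></pre>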
<p>You can run the example evaluation script as:</p>

