Fix mismatch in full pipeline outputs (#75)
The full knowledge pipeline had `question` and `response` as output
columns, while the skills pipelines used `question` and `answer`.

`generate_data.py` currently expects `response` instead of `answer`.
Rather than handling both, standardize on `response`, since it is
already the more common name; several prompt filenames include
"response", for example.

Signed-off-by: Russell Bryant <[email protected]>
russellb authored Jul 3, 2024
1 parent bcb7974 commit 6251693
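
For context, a minimal hypothetical sketch of the mismatch this commit
resolves. The column names `question`, `answer`, and `response` come from
the commit message; the sample data and the `read_response` helper are
illustrative only, not the project's real code.

# Hypothetical sketch of the column mismatch; not the project's real code.

# The skills pipelines emitted samples keyed by "answer":
skills_sample = {"question": "What is 2 + 2?", "answer": "4"}

# ...while generate_data.py reads the "response" column, which the full
# knowledge pipeline already produced:
def read_response(sample: dict) -> str:
    return sample["response"]  # raises KeyError for skills_sample above

# Standardizing every pipeline on "response" removes the mismatch:
fixed_sample = {"question": "What is 2 + 2?", "response": "4"}
assert read_response(fixed_sample) == "4"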
Showing 3 changed files with 5 additions and 5 deletions.
@@ -33,7 +33,7 @@ generation: |
 [End of Question]
 [Start of Answer]
-{answer}
+{response}
 [End of Answer]
 Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the answer on a scale of 1 to 3 as mentioned above.
@@ -43,12 +43,12 @@ generation: |
 [End of Question]
 [Start of Answer]
-{answer}
+{response}
 [End of Answer]
 * Return the evaluation between [Start of Evaluation] and [End of Evaluation] tags.
 * Return the score between [Start of Score] and [End of Score] tags.
 start_tags: ["[Start of Evaluation]", "[Start of Score]"]
-end_tags: ["[End of Evaluation]", "[End of Score]"]
+end_tags: ["[End of Evaluation]", "[End of Score]"]
src/instructlab/sdg/default_flows.py (2 additions, 2 deletions)
@@ -336,7 +336,7 @@ def get_flow(self) -> list:
                 "client": self.client,
                 "model_id": self.model_id,
                 "model_prompt": _get_model_prompt(self.model_family),
-                "output_cols": ["answer"],
+                "output_cols": ["response"],
                 "batch_kwargs": {
                     "num_procs": 8,
                     "batched": self.batched,
@@ -471,7 +471,7 @@ def get_flow(self) -> list:
                 "client": self.client,
                 "model_id": self.model_id,
                 "model_prompt": _get_model_prompt(self.model_family),
-                "output_cols": ["answer"],
+                "output_cols": ["response"],
                 "batch_kwargs": {
                     "num_procs": 8,
                     "batched": self.batched,
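
As far as the diff shows, `output_cols` in these block configurations
names the columns each block adds to the generated dataset; switching
both skills flows from `answer` to `response` brings them in line with
the full knowledge pipeline and with the column `generate_data.py` reads.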
