Introduce v3 schema version to support new knowledge format #38
russellb added a commit to russellb/instructlab-schema that referenced this issue on Jul 17, 2024
Closes instructlab#38

v3 includes some backwards-incompatible changes to the knowledge schema format. Here is a diff against v2.

The changes are:

- Q&A pairs now have an associated context blob from the knowledge document.
- There is a new "document_outline" field.

```diff
--- src/instructlab/schema/v2/knowledge.json 2024-07-17 12:56:37
+++ src/instructlab/schema/v3/knowledge.json 2024-07-17 13:14:56
@@ -8,7 +8,8 @@
     "domain",
     "task_description",
     "seed_examples",
-    "document"
+    "document",
+    "document_outline"
   ],
   "unevaluatedProperties": false,
   "properties": {
@@ -44,20 +45,37 @@
       "items": {
         "type": "object",
         "required": [
-          "question",
-          "answer"
+          "context",
+          "questions_and_answers"
         ],
         "unevaluatedProperties": false,
         "properties": {
-          "question": {
-            "description": "A question used for synthetic data generation.",
+          "context": {
+            "description": "A context used for synthetic data generation.",
             "type": "string",
             "minLength": 1
           },
-          "answer": {
-            "description": "The desired response for the question.",
-            "type": "string",
-            "minLength": 1
+          "questions_and_answers": {
+            "type": "array",
+            "items": {
+              "type": "object",
+              "required": [
+                "question",
+                "answer"
+              ],
+              "properties": {
+                "question": {
+                  "description": "A question used for synthetic data generation.",
+                  "type": "string",
+                  "minLength": 1
+                },
+                "answer": {
+                  "description": "The desired response for the question.",
+                  "type": "string",
+                  "minLength": 1
+                }
+              }
+            }
           }
         }
       }
@@ -104,6 +122,11 @@
           }
         }
       }
+    },
+    "document_outline": {
+      "description": "An outline of the document.",
+      "type": "string",
+      "minLength": 1
     }
   }
 }
```

Signed-off-by: Russell Bryant <[email protected]>
Merged
russellb added a commit to russellb/instructlab-schema that referenced this issue on Jul 17, 2024
Closes instructlab#38

v3 includes some backwards-incompatible changes to the knowledge schema format. Here is a diff against v2.

The changes are:

- Q&A pairs now have an associated context blob from the knowledge document.
- There is a new "document_outline" field.

```diff
--- src/instructlab/schema/v2/knowledge.json 2024-07-17 12:56:37
+++ src/instructlab/schema/v3/knowledge.json 2024-07-17 13:14:56
@@ -8,7 +8,8 @@
     "domain",
     "task_description",
     "seed_examples",
-    "document"
+    "document",
+    "document_outline"
   ],
   "unevaluatedProperties": false,
   "properties": {
@@ -44,20 +45,37 @@
       "items": {
         "type": "object",
         "required": [
-          "question",
-          "answer"
+          "context",
+          "questions_and_answers"
         ],
         "unevaluatedProperties": false,
         "properties": {
-          "question": {
-            "description": "A question used for synthetic data generation.",
+          "context": {
+            "description": "A context used for synthetic data generation.",
             "type": "string",
             "minLength": 1
           },
-          "answer": {
-            "description": "The desired response for the question.",
-            "type": "string",
-            "minLength": 1
+          "questions_and_answers": {
+            "type": "array",
+            "minItems": 3,
+            "uniqueItems": true,
+            "items": {
+              "type": "object",
+              "required": [
+                "question",
+                "answer"
+              ],
+              "properties": {
+                "question": {
+                  "description": "A question used for synthetic data generation.",
+                  "type": "string",
+                  "minLength": 1
+                },
+                "answer": {
+                  "description": "The desired response for the question.",
+                  "type": "string",
+                  "minLength": 1
+                }
+              }
+            }
           }
         }
       }
@@ -104,6 +122,11 @@
           }
         }
       }
+    },
+    "document_outline": {
+      "description": "An outline of the document.",
+      "type": "string",
+      "minLength": 1
     }
   }
 }
```

Signed-off-by: Russell Bryant <[email protected]>
russellb added a commit to russellb/instructlab-schema that referenced this issue on Jul 17, 2024
Closes instructlab#38

v3 includes some backwards-incompatible changes to the knowledge schema format. Here is a diff against v2.

The changes are:

- Q&A pairs now have an associated context blob from the knowledge document.
- There is a new `document_outline` field.
- `task_description` has been dropped.

```diff
--- src/instructlab/schema/v2/knowledge.json 2024-07-17 12:56:37
+++ src/instructlab/schema/v3/knowledge.json 2024-07-17 15:38:30
@@ -6,9 +6,9 @@
   "required": [
     "created_by",
     "domain",
-    "task_description",
     "seed_examples",
-    "document"
+    "document",
+    "document_outline"
   ],
   "unevaluatedProperties": false,
   "properties": {
@@ -27,15 +27,6 @@
         "Pop culture"
       ]
     },
-    "task_description": {
-      "description": "A description of the task which is used in prompts to the teacher model during synthetic data generation. The description should be detailed and prescriptive to improve the teacher model's responses.",
-      "type": "string",
-      "minLength": 1,
-      "examples": [
-        "To teach a language model about softball history",
-        "To teach a language model about tabby cats"
-      ]
-    },
     "seed_examples": {
       "description": "An array of seed examples for synthetic data generation.",
       "type": "array",
@@ -44,20 +35,39 @@
       "items": {
         "type": "object",
         "required": [
-          "question",
-          "answer"
+          "context",
+          "questions_and_answers"
         ],
         "unevaluatedProperties": false,
         "properties": {
-          "question": {
-            "description": "A question used for synthetic data generation.",
+          "context": {
+            "description": "Context from the document associated with this set of sample q&a pairs.",
             "type": "string",
             "minLength": 1
           },
-          "answer": {
-            "description": "The desired response for the question.",
-            "type": "string",
-            "minLength": 1
+          "questions_and_answers": {
+            "type": "array",
+            "minItems": 3,
+            "uniqueItems": true,
+            "items": {
+              "type": "object",
+              "required": [
+                "question",
+                "answer"
+              ],
+              "properties": {
+                "question": {
+                  "description": "A question used for synthetic data generation.",
+                  "type": "string",
+                  "minLength": 1
+                },
+                "answer": {
+                  "description": "The desired response for the question.",
+                  "type": "string",
+                  "minLength": 1
+                }
+              }
+            }
           }
         }
       }
@@ -104,6 +114,11 @@
           }
         }
       }
+    },
+    "document_outline": {
+      "description": "An outline of the document.",
+      "type": "string",
+      "minLength": 1
     }
   }
 }
```

Signed-off-by: Russell Bryant <[email protected]>
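To make the new per-item rules for `seed_examples` concrete, here is a minimal, stdlib-only Python sketch that hand-checks one seed example against the constraints in the final diff of the commit message above. Under those constraints, `context` and `questions_and_answers` are required, no other keys are allowed, strings must be non-empty, and `questions_and_answers` needs at least three unique entries. The function `check_seed_example` is hypothetical and not part of instructlab-schema. It approximates the JSON Schema rules by hand; real validation should run against the published schema with a JSON Schema library.

```python
import json
from typing import Any

# Hypothetical, hand-written approximation of the v3 `seed_examples` item
# rules shown in the diff. It is NOT the instructlab-schema validator.
REQUIRED_KEYS = {"context", "questions_and_answers"}


def _is_non_empty_str(value: Any) -> bool:
    # Mirrors "type": "string" with "minLength": 1.
    return isinstance(value, str) and len(value) >= 1


def check_seed_example(example: Any) -> list[str]:
    """Return a list of problems; an empty list means the example passes."""
    if not isinstance(example, dict):
        return ["seed example must be an object"]

    errors: list[str] = []
    keys = set(example)
    for key in sorted(REQUIRED_KEYS - keys):
        errors.append(f"missing required key: {key}")
    # "unevaluatedProperties": false, so only the two v3 keys are allowed.
    for key in sorted(keys - REQUIRED_KEYS):
        errors.append(f"unexpected key: {key}")

    if "context" in example and not _is_non_empty_str(example["context"]):
        errors.append("context must be a non-empty string")

    qna = example.get("questions_and_answers")
    if qna is None:
        return errors
    if not isinstance(qna, list):
        errors.append("questions_and_answers must be an array")
        return errors
    # "minItems": 3
    if len(qna) < 3:
        errors.append("questions_and_answers needs at least 3 items")
    # "uniqueItems": true, approximated by comparing canonical JSON text.
    canonical = [json.dumps(item, sort_keys=True) for item in qna]
    if len(set(canonical)) != len(canonical):
        errors.append("questions_and_answers items must be unique")

    for i, pair in enumerate(qna):
        if not isinstance(pair, dict):
            errors.append(f"questions_and_answers[{i}] must be an object")
            continue
        # Each pair needs non-empty "question" and "answer" strings. The diff
        # sets no unevaluatedProperties on pairs, so extra keys are not flagged.
        for field in ("question", "answer"):
            if not _is_non_empty_str(pair.get(field)):
                errors.append(f"questions_and_answers[{i}].{field} must be a non-empty string")
    return errors


if __name__ == "__main__":
    v2_style = {"question": "What is X?", "answer": "X is Y."}
    for problem in check_seed_example(v2_style):
        print(problem)
```

Passing a v2-style `{"question": ..., "answer": ...}` object reports both new keys as missing and both old keys as unexpected. This is the backwards incompatibility the commit message describes: a v2 seed example is never a valid v3 seed example.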
This is part of: instructlab/sdg#160
This is an example of the new format: https://github.com/instructlab/taxonomy/blob/7729fcd62ca68e36225a98a954e702734cc09ae1/knowledge/science/anatomy/tonsils/qna.yaml
The key changes are:

- A `document_outline` field has been added.
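Because the new format is not backwards compatible, existing v2 knowledge files have to be restructured. The Python sketch below shows that reshaping on already-parsed data (plain dicts, for example the output of a YAML loader). It groups the flat v2 question/answer pairs under caller-supplied `context` excerpts, drops `task_description` as the final commit does, and adds a caller-supplied `document_outline`. The function `migrate_v2_to_v3` and its `groups` parameter are hypothetical and not an InstructLab tool. The excerpts and the outline have to be written from the source document by a person; the sketch only moves existing data into the v3 layout and copies other top-level keys unchanged.

```python
import copy
from typing import Any


def migrate_v2_to_v3(
    doc: dict[str, Any],
    groups: list[tuple[str, list[int]]],
    document_outline: str,
) -> dict[str, Any]:
    """Hypothetical helper: reshape a parsed v2 knowledge document into the v3 layout.

    `groups` pairs each document excerpt (the new `context`) with the indices
    of the v2 seed examples that belong to it.
    """
    old_pairs = doc["seed_examples"]

    # Every v2 question/answer pair must land under exactly one context.
    assigned = sorted(i for _, indices in groups for i in indices)
    if assigned != list(range(len(old_pairs))):
        raise ValueError("each v2 seed example must be assigned to exactly one context")
    # The final v3 diff requires at least three pairs per context ("minItems": 3).
    if any(len(indices) < 3 for _, indices in groups):
        raise ValueError("each context needs at least three question/answer pairs")

    # Copy every other top-level key; drop the removed field and the reshaped one.
    new_doc = {
        key: copy.deepcopy(value)
        for key, value in doc.items()
        if key not in ("task_description", "seed_examples")
    }
    new_doc["seed_examples"] = [
        {
            "context": context,
            "questions_and_answers": [
                {"question": old_pairs[i]["question"], "answer": old_pairs[i]["answer"]}
                for i in indices
            ],
        }
        for context, indices in groups
    ]
    new_doc["document_outline"] = document_outline
    return new_doc
```

The function takes explicit index groups instead of guessing which excerpt each question came from. That pairing is exactly the information v2 files lack, so the sketch makes it a required input rather than inventing it.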