Quick fixes to make large scale testing work #3
Add batch support for vllm
Signed-off-by: shiv <[email protected]>
The max_tokens value in llmblock.py was updated from 12000 to 4096 to optimize the performance of the LLM server. Signed-off-by: shiv <[email protected]>
…tom error in case of empty dataset in middle of a pipeline Signed-off-by: shiv <[email protected]>
Signed-off-by: Aakanksha Duggal <[email protected]>
lgtm, thanks @shivchander 🚢
Attaching the output for your reference.
ilab data generate --endpoint-url http://localhost:3000/v1 --model mistralai/Mixtral-8x7B-Instruct-v0.1 --num-instructions 1 --api-key EMPTY --model-family mixtral --pipeline full
INFO 2024-07-10 15:44:44,893 utils.py:161: _init_num_threads NumExpr defaulting to 10 threads.
INFO 2024-07-10 15:44:45,005 config.py:58: <module> PyTorch version 2.3.1 available.
Generating synthetic data using 'mistralai/Mixtral-8x7B-Instruct-v0.1' model, taxonomy:'taxonomy' against http://localhost:3000/v1 server
INFO 2024-07-10 15:44:47,123 generate_data.py:259: generate_data Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
INFO 2024-07-10 15:44:47,323 llmblock.py:32: server_supports_batched LLM server supports batched inputs: True
INFO 2024-07-10 15:44:47,323 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:47,323 pipeline.py:47: generate Running block: gen_questions
INFO 2024-07-10 15:44:47,323 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response'],
num_rows: 6
})
INFO 2024-07-10 15:44:47,997 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 6
})
INFO 2024-07-10 15:44:47,997 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:47,999 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:47,999 pipeline.py:47: generate Running block: eval_questions
INFO 2024-07-10 15:44:47,999 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 6
})
INFO 2024-07-10 15:44:49,427 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 6
})
INFO 2024-07-10 15:44:49,428 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:49,428 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:49,428 pipeline.py:47: generate Running block: filter_questions
INFO 2024-07-10 15:44:49,428 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 6
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1237.14 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 4524.60 examples/s]
INFO 2024-07-10 15:44:49,446 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 6
})
INFO 2024-07-10 15:44:49,446 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:49,448 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:49,448 pipeline.py:47: generate Running block: gen_responses
INFO 2024-07-10 15:44:49,448 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 6
})
INFO 2024-07-10 15:44:50,243 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 6
})
INFO 2024-07-10 15:44:50,243 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:50,245 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:50,245 pipeline.py:47: generate Running block: evaluate_qa_pair
INFO 2024-07-10 15:44:50,245 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 6
})
INFO 2024-07-10 15:44:52,704 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 6
})
INFO 2024-07-10 15:44:52,704 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:52,704 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:52,704 pipeline.py:47: generate Running block: filter_qa_pair
INFO 2024-07-10 15:44:52,704 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 6
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1936.73 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 3881.22 examples/s]
INFO 2024-07-10 15:44:52,713 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 6
})
INFO 2024-07-10 15:44:52,713 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:52,713 generate_data.py:286: generate_data Generated 1 samples
INFO 2024-07-10 15:44:52,718 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:52,718 pipeline.py:47: generate Running block: gen_contexts
INFO 2024-07-10 15:44:52,718 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response'],
num_rows: 5
})
INFO 2024-07-10 15:44:56,598 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context'],
num_rows: 5
})
INFO 2024-07-10 15:44:56,598 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:44:56,600 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:44:56,600 pipeline.py:47: generate Running block: gen_grounded_questions
INFO 2024-07-10 15:44:56,600 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context'],
num_rows: 5
})
INFO 2024-07-10 15:45:01,204 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'num_samples', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:01,204 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:01,207 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:01,207 pipeline.py:47: generate Running block: eval_grounded_questions
INFO 2024-07-10 15:45:01,207 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'num_samples', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:04,379 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 12
})
INFO 2024-07-10 15:45:04,380 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:04,380 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:04,380 pipeline.py:47: generate Running block: filter_grounded_questions
INFO 2024-07-10 15:45:04,380 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 12
})
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 3076.13 examples/s]
Filter: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 8264.64 examples/s]
INFO 2024-07-10 15:45:04,389 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question'],
num_rows: 10
})
INFO 2024-07-10 15:45:04,389 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:04,390 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:04,390 pipeline.py:47: generate Running block: gen_grounded_responses
INFO 2024-07-10 15:45:04,390 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question'],
num_rows: 10
})
INFO 2024-07-10 15:45:05,908 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response'],
num_rows: 10
})
INFO 2024-07-10 15:45:05,908 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:05,911 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:05,911 pipeline.py:47: generate Running block: evaluate_grounded_qa_pair
INFO 2024-07-10 15:45:05,911 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response'],
num_rows: 10
})
INFO 2024-07-10 15:45:07,548 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response', 'evaluation', 'score'],
num_rows: 10
})
INFO 2024-07-10 15:45:07,548 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:07,548 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:07,548 pipeline.py:47: generate Running block: filter_grounded_qa_pair
INFO 2024-07-10 15:45:07,548 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response', 'evaluation', 'score'],
num_rows: 10
})
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2562.82 examples/s]
Filter: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 6235.03 examples/s]
INFO 2024-07-10 15:45:07,557 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response', 'evaluation', 'score'],
num_rows: 10
})
INFO 2024-07-10 15:45:07,557 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:07,557 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:07,557 pipeline.py:47: generate Running block: combine_question_and_context
INFO 2024-07-10 15:45:07,557 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response', 'evaluation', 'score'],
num_rows: 10
})
Map (num_proc=8): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 88.16 examples/s]
INFO 2024-07-10 15:45:07,703 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context', 'question', 'response', 'evaluation', 'score'],
num_rows: 10
})
INFO 2024-07-10 15:45:07,703 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:07,703 generate_data.py:286: generate_data Generated 2 samples
INFO 2024-07-10 15:45:07,706 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:07,706 pipeline.py:47: generate Running block: gen_questions
INFO 2024-07-10 15:45:07,706 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response'],
num_rows: 5
})
INFO 2024-07-10 15:45:10,233 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:10,233 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:10,235 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:10,235 pipeline.py:47: generate Running block: eval_questions
INFO 2024-07-10 15:45:10,235 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:11,642 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 5
})
INFO 2024-07-10 15:45:11,642 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:11,642 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:11,642 pipeline.py:47: generate Running block: filter_questions
INFO 2024-07-10 15:45:11,642 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 5
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1514.63 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 3644.05 examples/s]
INFO 2024-07-10 15:45:11,651 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 3
})
INFO 2024-07-10 15:45:11,651 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:11,652 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:11,652 pipeline.py:47: generate Running block: gen_responses
INFO 2024-07-10 15:45:11,652 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 3
})
INFO 2024-07-10 15:45:14,099 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 2
})
INFO 2024-07-10 15:45:14,099 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:14,102 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:14,102 pipeline.py:47: generate Running block: evaluate_qa_pair
INFO 2024-07-10 15:45:14,102 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 2
})
INFO 2024-07-10 15:45:15,430 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 2
})
INFO 2024-07-10 15:45:15,430 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:15,430 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:15,430 pipeline.py:47: generate Running block: filter_qa_pair
INFO 2024-07-10 15:45:15,430 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 2
})
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 710.24 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1515.01 examples/s]
INFO 2024-07-10 15:45:15,437 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 2
})
INFO 2024-07-10 15:45:15,438 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:15,438 generate_data.py:286: generate_data Generated 3 samples
INFO 2024-07-10 15:45:15,442 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:15,442 pipeline.py:47: generate Running block: gen_questions
INFO 2024-07-10 15:45:15,442 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response'],
num_rows: 12
})
INFO 2024-07-10 15:45:16,458 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:16,458 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:16,460 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:16,460 pipeline.py:47: generate Running block: eval_questions
INFO 2024-07-10 15:45:16,460 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:17,872 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 12
})
INFO 2024-07-10 15:45:17,872 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:17,872 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:17,872 pipeline.py:47: generate Running block: filter_questions
INFO 2024-07-10 15:45:17,872 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 12
})
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 3766.78 examples/s]
Filter: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 9126.32 examples/s]
INFO 2024-07-10 15:45:17,880 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:17,880 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:17,881 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:17,881 pipeline.py:47: generate Running block: gen_responses
INFO 2024-07-10 15:45:17,881 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 12
})
INFO 2024-07-10 15:45:19,117 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 11
})
INFO 2024-07-10 15:45:19,118 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:19,120 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:19,120 pipeline.py:47: generate Running block: evaluate_qa_pair
INFO 2024-07-10 15:45:19,120 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 11
})
INFO 2024-07-10 15:45:21,885 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 11
})
INFO 2024-07-10 15:45:21,885 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:21,885 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:21,885 pipeline.py:47: generate Running block: filter_qa_pair
INFO 2024-07-10 15:45:21,885 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 11
})
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 3498.70 examples/s]
Filter: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 8639.95 examples/s]
INFO 2024-07-10 15:45:21,893 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 11
})
INFO 2024-07-10 15:45:21,893 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:21,893 generate_data.py:286: generate_data Generated 4 samples
INFO 2024-07-10 15:45:21,897 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:21,897 pipeline.py:47: generate Running block: gen_questions
INFO 2024-07-10 15:45:21,897 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response'],
num_rows: 5
})
INFO 2024-07-10 15:45:22,707 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:22,707 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:22,709 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:22,709 pipeline.py:47: generate Running block: eval_questions
INFO 2024-07-10 15:45:22,709 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:23,933 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 5
})
INFO 2024-07-10 15:45:23,933 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:23,933 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:23,933 pipeline.py:47: generate Running block: filter_questions
INFO 2024-07-10 15:45:23,933 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 5
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1738.07 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 3550.88 examples/s]
INFO 2024-07-10 15:45:23,941 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:23,941 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:23,943 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:23,943 pipeline.py:47: generate Running block: gen_responses
INFO 2024-07-10 15:45:23,943 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:25,208 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 5
})
INFO 2024-07-10 15:45:25,208 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:25,211 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:25,211 pipeline.py:47: generate Running block: evaluate_qa_pair
INFO 2024-07-10 15:45:25,211 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 5
})
INFO 2024-07-10 15:45:27,003 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 5
})
INFO 2024-07-10 15:45:27,003 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:27,003 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:27,003 pipeline.py:47: generate Running block: filter_qa_pair
INFO 2024-07-10 15:45:27,003 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 5
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1560.50 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 3725.62 examples/s]
INFO 2024-07-10 15:45:27,011 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 5
})
INFO 2024-07-10 15:45:27,012 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:27,012 generate_data.py:286: generate_data Generated 5 samples
INFO 2024-07-10 15:45:27,015 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:27,015 pipeline.py:47: generate Running block: gen_questions
INFO 2024-07-10 15:45:27,015 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response'],
num_rows: 5
})
INFO 2024-07-10 15:45:28,646 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:28,646 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:28,647 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:28,647 pipeline.py:47: generate Running block: eval_questions
INFO 2024-07-10 15:45:28,647 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
num_rows: 5
})
INFO 2024-07-10 15:45:32,433 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 6
})
INFO 2024-07-10 15:45:32,433 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:32,433 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:32,434 pipeline.py:47: generate Running block: filter_questions
INFO 2024-07-10 15:45:32,434 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question', 'evaluation', 'score'],
num_rows: 6
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1835.05 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 4146.62 examples/s]
INFO 2024-07-10 15:45:32,442 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 4
})
INFO 2024-07-10 15:45:32,442 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:32,443 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:32,444 pipeline.py:47: generate Running block: gen_responses
INFO 2024-07-10 15:45:32,444 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question'],
num_rows: 4
})
INFO 2024-07-10 15:45:36,834 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 4
})
INFO 2024-07-10 15:45:36,835 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:36,837 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:36,837 pipeline.py:47: generate Running block: evaluate_qa_pair
INFO 2024-07-10 15:45:36,837 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 4
})
INFO 2024-07-10 15:45:38,571 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 4
})
INFO 2024-07-10 15:45:38,571 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:38,571 pipeline.py:45: generate ------------------------------------
INFO 2024-07-10 15:45:38,571 pipeline.py:47: generate Running block: filter_qa_pair
INFO 2024-07-10 15:45:38,571 pipeline.py:48: generate Input dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
num_rows: 4
})
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1243.86 examples/s]
Filter: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 2898.62 examples/s]
INFO 2024-07-10 15:45:38,579 pipeline.py:62: generate Output dataset: Dataset({
features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
num_rows: 4
})
INFO 2024-07-10 15:45:38,579 pipeline.py:63: generate ------------------------------------
INFO 2024-07-10 15:45:38,579 generate_data.py:286: generate_data Generated 6 samples
INFO 2024-07-10 15:45:38,585 generate_data.py:304: generate_data Generation took 53.42s
@@ -64,7 +64,7 @@ def __init__(
         self.defaults = {
             "model": self.model,
             "temperature": 0,
-            "max_tokens": 12000,
+            "max_tokens": 4096,
            dataset = block.generate(dataset, **gen_kwargs)

            if len(dataset) == 0:
                raise EmptyDatasetError(f"Pipeline stopped: Empty dataset after running block: {block_config['block_name']}")
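For reference, a minimal sketch of how this early-exit check behaves, assuming blocks are plain callables over a list of dict rows. The `run_pipeline` helper, the stand-in blocks, and the `EmptyDatasetError` definition here are hypothetical and only illustrate the idea; they are not the code in this PR.

```python
class EmptyDatasetError(Exception):
    """Raised when a block leaves no rows for later blocks to process."""


def run_pipeline(blocks, dataset):
    """Run (name, fn) blocks in order, stopping as soon as one returns no rows."""
    for block_name, block_fn in blocks:
        dataset = block_fn(dataset)
        if len(dataset) == 0:
            raise EmptyDatasetError(
                f"Pipeline stopped: Empty dataset after running block: {block_name}"
            )
    return dataset


# Hypothetical stand-in blocks operating on a list of dict rows.
def gen_questions(rows):
    return [{**row, "question": f"Q about {row['seed_question']}"} for row in rows]


def drop_everything(rows):
    return []


seed = [{"seed_question": "tides"}, {"seed_question": "moons"}]

# A pipeline whose blocks all return rows runs to completion.
ok = run_pipeline([("gen_questions", gen_questions)], seed)

# A block that filters out every row stops the pipeline with a named error.
try:
    run_pipeline(
        [("gen_questions", gen_questions), ("filter_questions", drop_everything)],
        seed,
    )
    error_message = None
except EmptyDatasetError as exc:
    error_message = str(exc)
```

Failing at the first empty block names the block that dropped every row, rather than letting a later block fail on missing input.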
        if self.operation == operator.contains:
            samples = samples.filter(
                lambda x: self.operation(self.value, x[self.column_name]),
                num_proc=self.num_procs,
            )
Do you have an example you can send me? I'm trying to write a test case for this, and contains works how I expected.
I'm also not clear what behavior you're going for by still running the operation with the parameters reversed right after this. Maybe that wasn't on purpose?
In any case, I think if we're looking at the same flows where it's used, we can nail it down quickly. Feel free to ping me privately on slack with details if needed.
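To help pin down the argument-order question, here is a small standalone sketch of the two orderings. The only established fact it relies on is that `operator.contains(a, b)` evaluates `b in a`. The sample rows, the `score` column, and the filter values are made up for illustration.

```python
import operator

# operator.contains(a, b) evaluates `b in a`: the container comes first.
allowed = ["1", "2"]  # hypothetical filter value
rows = [{"score": "1"}, {"score": "3"}, {"score": "2"}]


def keep_if_value_contains_column(row):
    # Mirrors operation(self.value, x[column_name]):
    # is the column entry a member of the filter value?
    return operator.contains(allowed, row["score"])


def keep_if_column_contains_value(row, value="1"):
    # Reversed argument order: does the column entry contain the value?
    return operator.contains(row["score"], value)


kept_membership = [r["score"] for r in rows if keep_if_value_contains_column(r)]
kept_reversed = [r["score"] for r in rows if keep_if_column_contains_value(r)]
```

With these inputs, the first ordering keeps the rows whose score is in the allowed list, while the reversed ordering keeps only rows whose score string contains the value, so the two are not interchangeable.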
No description provided.