Block Name In Errors #155

gabe-l-hart · 2024-07-16T17:59:00Z

Description

This PR adds try/except handling around the core loop in Pipeline. In the handler clause, it creates a wrapper exception, attempting to match the type of the internal error, with the block_name and block_type included.

Closes: #128

russellb

One suggestion: localize the try/except handling to only the parts that call block-specific code.

Also, do you have an example of what an error looked like before and after the change?

russellb · 2024-07-16T19:25:40Z

src/instructlab/sdg/pipeline.py

+                block = block_type(self.ctx, self, block_name, **block_config)
+
+                logger.info("Running block: %s", block_name)
+                logger.info(dataset)
+
+                dataset = block.generate(dataset)


It looks like these lines capture the block-specific code, so you could localize your addition to this part, I think.

Good point! Do you think it's worth containing the config parsing as well? It seems like the use of direct indexing (vs .get) for "name", "type", and "config" implies that these are all required fields, but I don't see anywhere that they're validated (though that might happen elsewhere that I haven't found yet?). If it's not done elsewhere, it may be worth doing that validation in the __init__ to avoid getting part-way through the expensive pipeline operation before discovering configuration errors.

We now have some config validation, at least for the configs we have in the tree. See make validate-pipelines. It uses a jsonschema definition.

Ok, cool, makes sense!
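As a rough illustration of the up-front check discussed in this thread, here is a minimal sketch that verifies the three directly indexed fields ("name", "type", "config") before any block runs. The function name, the constant, and the error wording are hypothetical and not taken from the PR; the project's actual validation uses a jsonschema definition via make validate-pipelines.

```python
# Fields the pipeline loop indexes directly, per the discussion above.
REQUIRED_BLOCK_KEYS = ("name", "type", "config")


def validate_block_entries(chained_blocks):
    """Raise ValueError naming every block entry that lacks a required key.

    Hypothetical helper: intended to be called once, e.g. from __init__,
    so configuration errors surface before the expensive pipeline run.
    """
    problems = []
    for index, entry in enumerate(chained_blocks):
        missing = [key for key in REQUIRED_BLOCK_KEYS if key not in entry]
        if missing:
            # Fall back to the position when the entry has no name.
            label = entry.get("name", f"<block #{index}>")
            problems.append(f"{label}: missing {', '.join(missing)}")
    if problems:
        raise ValueError("Invalid pipeline definition: " + "; ".join(problems))
```

Collecting all problems before raising means a single run reports every malformed entry rather than only the first.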

gabe-l-hart · 2024-07-16T21:04:08Z

Thanks for the suggestion! I've localized the error handling to just the generate call.

> Also, do you have an example of what an error looked like before and after the change?

I don't have a concrete example of before/after, but this uses raise ... from ... syntax. The wrapper exception attempts to match the type of the exception raised in the block, with the message "BLOCK ERROR [{block_type class name}/{block_name}]: {child exc message}". The original exception is available via the wrapper's __cause__ attribute.
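A minimal sketch of the wrapping described in this comment, under stated assumptions: the helper names wrap_block_error and run_block are hypothetical, and the RuntimeError fallback for exception types that cannot be built from a single message is a guess at what "attempt to match" might mean, not the PR's actual code. Only the message format and the use of raise ... from ... come from the comment.

```python
def wrap_block_error(block, block_name, err):
    """Build a wrapper of the same type as err when possible (hypothetical helper)."""
    message = f"BLOCK ERROR [{type(block).__name__}/{block_name}]: {err}"
    try:
        # Works for exception types that accept a single message argument.
        return type(err)(message)
    except Exception:
        # Assumed fallback for types with other constructor signatures.
        return RuntimeError(message)


def run_block(block, block_name, dataset):
    """Call block.generate() and attach block context to any failure."""
    try:
        return block.generate(dataset)
    except Exception as err:
        # "from err" stores the original exception on wrapper.__cause__.
        raise wrap_block_error(block, block_name, err) from err
```

With this shape, a caller that already catches, say, ValueError keeps working, while the message now names the block and the original error stays reachable through __cause__.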

markmc · 2024-07-17T11:19:38Z

> Also, do you have an example of what an error looked like before and after the change?

Here's what an exception from LLMBlock.generate() looks like:

INFO 2024-07-17 07:08:14,120 pipeline.py:77: generate Running block: gen_questions
INFO 2024-07-17 07:08:14,120 pipeline.py:78: generate Dataset({
    features: ['task_description', 'seed_question', 'seed_response'],
    num_rows: 5
})
Traceback (most recent call last):
...
  File "/home/markmc/sdg/src/instructlab/sdg/sdg.py", line 20, in generate
    dataset = pipeline.generate(dataset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/markmc/sdg/src/instructlab/sdg/pipeline.py", line 80, in generate
    dataset = block.generate(dataset)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/markmc/sdg/src/instructlab/sdg/llmblock.py", line 181, in generate
    raise Exception("TEST")
Exception: TEST

This isn't actually so problematic, since we log which block is being executed and the user has that context when debugging the error.

The example I gave in #128 is failing to load a config file when instantiating a block: an error with the pipeline definition, but also one that can't be detected by the schema.

markmc · 2024-07-17T15:10:45Z

src/instructlab/sdg/pipeline.py

+    """
+
+    def __init__(self, block: Block, exception: Exception):
+        self.block = block


nit: underscore prefix this if you want just .block_name and .block_type to be used?

I left that one "public" thinking that some might like to be able to actually muck with the block itself when handling the exception.
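To make the trade-off in this thread concrete, here is a sketch of a single wrapper exception that keeps the block itself public alongside the block_name and block_type accessors. The class name follows the later commit title (PipelineBlockError); the message format, the stored exception attribute, and the assumption that a block exposes its name as .block_name are illustrative guesses rather than the merged code.

```python
class PipelineBlockError(Exception):
    """Failure raised while instantiating or running a pipeline block (sketch)."""

    def __init__(self, block, exception):
        self.block = block          # left public so handlers can inspect the block
        self.exception = exception  # the original error, kept for callers
        super().__init__(
            f"{type(self).__name__}({self.block_type}/{self.block_name}): {exception}"
        )

    @property
    def block_name(self):
        # Assumes the block stores its name as .block_name; None otherwise.
        return getattr(self.block, "block_name", None)

    @property
    def block_type(self):
        # Class name of the block that failed.
        return type(self.block).__name__
```

Renaming the attribute to _block, as the nit suggests, would steer callers toward the two read-only accessors; leaving it public lets a handler work with the failing block directly.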

markmc

Thanks, @gabe-l-hart !

mergify bot added the testing label Jul 16, 2024

russellb reviewed Jul 16, 2024

gabe-l-hart added 2 commits July 16, 2024 13:31

BlockNameInErrors: Add exception handling in Pipeline with block name/type wrapping

1cfc8af

instructlab#128 Signed-off-by: Gabe Goodhart <[email protected]>

BlockNameInErrors: Limit try/except to the generate call

10955fe

instructlab#128 Signed-off-by: Gabe Goodhart <[email protected]>

gabe-l-hart force-pushed the BlockNameInErrors-128 branch from 46f3e0d to 10955fe on July 16, 2024 19:40

markmc mentioned this pull request Jul 17, 2024

Error handling - identify which block in which pipeline triggers an exception #128

Closed

markmc reviewed Jul 17, 2024

BlockNameInErrors: Refactor to use a single BlockGenerationError exception

95b2ed9

instructlab#128 Signed-off-by: Gabe Goodhart <[email protected]>

markmc reviewed Jul 17, 2024

gabe-l-hart added 2 commits July 17, 2024 11:29

BlockNameInErrors: Capture block instantiation errors as well as generation

aa9fb2a

instructlab#128 Signed-off-by: Gabe Goodhart <[email protected]>

BlockNameInErrors: Expose PipelineBlockError at the top of the library

ceb5a82

instructlab#128 Signed-off-by: Gabe Goodhart <[email protected]>

markmc approved these changes Jul 18, 2024

hickeyma approved these changes Jul 18, 2024

markmc merged commit 263372b into instructlab:main Jul 18, 2024
11 checks passed
