Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPIKE] Test validate dataset classes in catalog using LSP #159

Merged
merged 9 commits into from
Jan 24, 2025

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented Nov 25, 2024

Related to: #5

This is a spike to test using the lamguage server protocol from pygls to add validation for dataset classes.

How to Use the Feature

  1. Install or Update the Kedro VSCode Extension
  2. Open Your Kedro Project in VSCode
  3. The extension will automatically initialize and start scanning catalog files.
  4. View Diagnostics in the Problems Pane it will display diagnostics for any issues found in your catalog files, even if the files are not open.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB SajidAlamQB self-assigned this Nov 25, 2024
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is ready for review, but I left some comments. Using diagnostic is the right direction.

Something to considered:

  • The current feature works on opened file, would it makes sense to work on closed file as well? i.e. when I start with a new project, I want to quickly identify all the missing dependencies.
  • Once this is done, we need to deprecated the old static YAML schema validation in package.json

image

  • Bonus: Is there a way to improve the error message to show what's the missing dependencies?

This is looking good already, you may want to add some documentation or screenshot at the end to show how to use this feature. This is definitely a very useful feature!

Final word: as a next step, is it possible to make this functions standalone, so it can be used as a CLI command? This way the VSCode extension can be a consumer of this function, but also keep it opened so it can be used as a CLI command or extend with other tool easily.

bundled/tool/lsp_server.py Show resolved Hide resolved
@noklam noklam linked an issue Nov 27, 2024 that may be closed by this pull request
bundled/tool/lsp_server.py Outdated Show resolved Hide resolved
Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

This is looking good already, you may want to add some documentation or screenshot at the end to show how to use this feature. This is definitely a very useful feature!

Hey @noklam, thanks for the feedback! I'm currently working on extending the feature to validate closed files as well. I agree that it would make sense to quickly identify all missing dependencies when starting a new project. I'll also add documentation and include screenshots to demonstrate how to use this new feature.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

I've implemented validation of closed files but there is a bit of a performance aspect to this. For projects with many files it may take some time to load them all.

@SajidAlamQB SajidAlamQB marked this pull request as ready for review November 29, 2024 12:29
@noklam
Copy link
Contributor

noklam commented Nov 29, 2024

Haven't read the code as I am on my phone. It only need to read the relevant files so it shouldn't need to read the entire project. It can make use of the configuration that it only scan the base /local or the selected env

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB SajidAlamQB requested a review from noklam December 5, 2024 14:09
@astrojuanlu
Copy link
Member

I'd love to do some QA, @SajidAlamQB do you think a .vsix can be generated from this PR?

@SajidAlamQB
Copy link
Contributor Author

I'd love to do some QA, @SajidAlamQB do you think a .vsix can be generated from this PR?

Absolutely please find it attached here.

kedro.vsix.zip

@astrojuanlu
Copy link
Member

Tested with a spaceflights project. It works, although the LSP doesn't immediately pick up changes in the environment. For example, starting from here:

image

(kedro-datasets and pandas not present in the environment)

I then do uv pip install kedro-datasets pandas and it didn't immediately update. Maybe it has to do with this error I'm observing in the Output tab:

2024-12-16 14:54:35.847 [info] Failed to handle request 12 textDocument/definition DefinitionParams(text_document=TextDocumentIdentifier(uri='file:///Users/juan_cano/Projects/QuantumBlackLabs/kedro-spaceflights-mlflow/conf/base/catalog.yml'), position=45:3, work_done_token=None, partial_result_token=None)
Traceback (most recent call last):
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/libs/pygls/protocol/json_rpc.py", line 266, in _handle_request
    self._execute_request(msg_id, handler, params)
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/libs/pygls/protocol/json_rpc.py", line 188, in _execute_request
    self._send_response(msg_id, handler(params))
                                ^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 366, in definition
    result = _query_catalog(document, word)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 333, in _query_catalog
    catalog_paths = _get_conf_paths(server, "catalog")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 235, in _get_conf_paths
    config_loader: OmegaConfigLoader = server.config_loader
                                       ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'KedroLanguageServer' object has no attribute 'config_loader'

Also, something that just noticed (and I think it's happening with the current version too): somehow the extension is trying to parse pipeline.py as YAML

image

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @astrojuanlu said, the extension is parsing more files than it needed like pipeline.py.

image

In addition, it seems that it is still validating the entries starts with _

I am not sure why but when I rerun this I don't see these error again and it works fine.

@noklam
Copy link
Contributor

noklam commented Jan 2, 2025

I create a PR, it can be merged either with this PR or we can have a separate one to deprecate the old validation with Redhat's YAML extension.

#164

@noklam noklam self-requested a review January 9, 2025 15:08
@SajidAlamQB
Copy link
Contributor Author

I then do uv pip install kedro-datasets pandas and it didn't immediately update.

I've added periodic revalidation this should resolve this, also it should now check live while entering catalog entries if they are importable.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

Also, something that just noticed (and I think it's happening with the current version too): somehow the extension is trying to parse pipeline.py as YAML.

Added check in watchers for only catalog files now, this shouldn't happen anymore.

Signed-off-by: Sajid Alam <[email protected]>
@noklam noklam requested a review from astrojuanlu January 17, 2025 12:10
@noklam
Copy link
Contributor

noklam commented Jan 17, 2025

tagging @astrojuanlu in case you want to do another round of QA

@SajidAlamQB
Copy link
Contributor Author

SajidAlamQB commented Jan 20, 2025

Please find the updates vsix file here: (note just rename the extension to kedro.vsix, github doesn't allow .vsix)

kedro.vsix.zip

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bit of QA, looks like it works!

@rashidakanchwala
Copy link

Extension had some other issue for me in the debugging mode but the above issue worked fine.

I tested a scenario where kedro-datasets was not installed. This resulted in a red error line and error messages in the terminal. The issue was resolved after installing kedro-datasets.

@astrojuanlu
Copy link
Member

This resulted in a red error line and error messages in the terminal.

How did those look like?

@SajidAlamQB SajidAlamQB merged commit 8f019d2 into main Jan 24, 2025
2 checks passed
@SajidAlamQB SajidAlamQB deleted the spike/implement-LSP-for-catalog branch January 24, 2025 13:39
@jitu5 jitu5 mentioned this pull request Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate dataset classes in catalog using LSP
5 participants