Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Sanity Checking #116

Open
3 tasks done
lu-pl opened this issue Oct 25, 2024 · 1 comment · May be fixed by #129 or #204
Open
3 tasks done

Query Sanity Checking #116

lu-pl opened this issue Oct 25, 2024 · 1 comment · May be fixed by #129 or #204
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@lu-pl
Copy link
Contributor

lu-pl commented Oct 25, 2024

rdfproxy should run certain checks on queries to fail as fast as possible in case of inapplicable queries to avoid inadvertent side effects or simply crashing later in the program flow.

  • Utilize SPARQL query parser

It might be advisable to parse incoming queries with an external parsing tool which raise ParsingExceptions early.

Note that the rdflib SPARQL parser is not fully reliable for this, see #2960. rdflib.plugins.sparql.parser.parseQuery can however catch SPARQL syntax errors.

  • Allow only SELECT queries for `SPARQLModelAdapter

rdfproxy is designed to handle SELECT queries, passing anything but a SELECT query to a SPARQLModelAdapter will certainly crash (somewhere) but still might have inadvertent side effects.

So passing anything but a SELECT query should crash the program as fast as possible.

A SELECT query predicate can be defined using the rdflib SPARQL parser plugin:

from rdflib.plugins.sparql.parser import parseQuery

def _is_select_query(query: str) -> bool:
    _, query_type = parseQuery(query)
    return query_type.name == "SelectQuery"

Another option would be to use SPARQLWrapper internals for determining the query type (see SPARQLWrapper.Wrapper._parseQueryType l597) or to utilize the compiled regex patternSPARQLWrapper.pattern (see SPARQLWrapper.SPARQLWrapper l267)

I prefer the rdflib solution.

  • Disallow Solution Modifiers

See #126.

@lu-pl lu-pl added the enhancement New feature or request label Oct 25, 2024
@lu-pl lu-pl self-assigned this Oct 25, 2024
@lu-pl lu-pl changed the title Allow only SELECT queries for SPARQLModelAdapter Query Sanity Checking Oct 29, 2024
lu-pl added a commit that referenced this issue Oct 31, 2024
@lu-pl lu-pl linked a pull request Oct 31, 2024 that will close this issue
@lu-pl lu-pl added this to the Query checking milestone Nov 7, 2024
@lu-pl
Copy link
Contributor Author

lu-pl commented Nov 25, 2024

Note: Currently, it is not entirely clear to me, where exactly this logic should run.

Options are e.g.

  • SPARQLModelAdapter init

This would ensure that the query gets checked as soon as possible. Also model sanity checking according to #108 could run in the initializer, so query and model checking would both happen in one place and as soon as possible.

class SPARQLModelAdapter(Generic[_TModelInstance]):
    # ... 
    def __init__(
        self, target: str | SPARQLWrapper, query: str, model: type[_TModelInstance]
    ) -> None:
        self._query = get_checked_query(query)
        self._model = get_checked_model(model)
    # ...

This would encapsulate all query logic in a single class and would also communicate clearly, that QueryConstructor is designed to handle sanitized and checked queries only.

@kevinstadler kevinstadler added the documentation Improvements or additions to documentation label Dec 18, 2024
@kevinstadler kevinstadler removed this from the Query checking milestone Dec 18, 2024
lu-pl added a commit that referenced this issue Jan 23, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries; also a run_query_check=False flag is
added to the QueryConstructor class so that QueryConstructor could
also be used as a standalone class with query checking enabled.

Closes #116.
@lu-pl lu-pl linked a pull request Jan 23, 2025 that will close this issue
lu-pl added a commit that referenced this issue Jan 23, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises.
This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries; also a run_query_check=False flag is
added to the QueryConstructor class so that QueryConstructor could
also be used as a standalone class with query checking enabled.

Closes #116.
lu-pl added a commit that referenced this issue Jan 24, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises.
This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116.
lu-pl added a commit that referenced this issue Jan 24, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises.
This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
2 participants