Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle or disallow Solution Modifiers for SPARQL inputs #126

Open
lu-pl opened this issue Oct 30, 2024 · 7 comments · May be fixed by #129
Open

Handle or disallow Solution Modifiers for SPARQL inputs #126

lu-pl opened this issue Oct 30, 2024 · 7 comments · May be fixed by #129
Assignees
Labels
enhancement New feature or request

Comments

@lu-pl
Copy link
Contributor

lu-pl commented Oct 30, 2024

rdfproxy.SPARQLModelAdapter implements pagination by dynamically modifying the supplied SPARQL query.

For ungrouped models, OFFSET and LIMIT modifiers are injected, for grouped models a subquery (based on the initial query) is generated and injected into the initial query.

E.g. for

select ?parent ?child ?name
where {
    values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
    }
}

and a group_by definition in the respective model specifying "parent" as the grouping value, the following query gets generated:

select ?parent ?child ?name
where {
    values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
    }

  {
    select distinct ?parent
    where {
      values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
      }
    }
    order by ?parent
    limit 2
    offset 0
  }
}

See #114.

This raises the question though, how to handle input queries that themselves define solution modifiers.
rdfproxy currently simply does not handle that case, passing a query with solution modifiers is undefined behavior and will almost certainly lead to unintended effects. So the case urgently MUST be handled.

The easiest and probably best solution to that problem is to simply disallow solution modifiers for SPARQL inputs.

First of all, I cannot really think of a reason to use SPARQL queries with solution modifiers in rdfproxy. LIMIT and OFFSET are handled by rdfproxy pagination and GROUP BY is handled by rdfproxy's mapping and grouping mechanism.

Secondly, proper handling of that case would come with significant overhead and difficulties. Apart from the lack of proper SPARQL query analysis facilities for Python, one would also need to think about and decide, what cases of solution modifiers in an inbound query could even be meaningful in the context of rdfproxy.

@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 30, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not for cleanly extracting possible solution modifiers.

@lu-pl lu-pl self-assigned this Oct 30, 2024
@lu-pl lu-pl mentioned this issue Oct 30, 2024
3 tasks
@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 30, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not to cleanly extract possible solution modifiers.

Note: Handling of nested queries with solution modifiers is much more complicated and very likely not even necessary.

lu-pl added a commit that referenced this issue Oct 31, 2024
@lu-pl lu-pl linked a pull request Oct 31, 2024 that will close this issue
@lu-pl
Copy link
Contributor Author

lu-pl commented Nov 1, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not for cleanly extracting possible solution modifiers.

Consider (?<=})\s+([^}]*)\Z.

The combination of positive lookbehind and absolute anchor is able to match the solution modifier of the outermost SPARQL clause regardless of wether multiline mode is active. Also the group captures the solution modifier more cleanly.

@lu-pl lu-pl added this to the Query checking milestone Nov 7, 2024
@lu-pl lu-pl added the enhancement New feature or request label Nov 7, 2024
@kevinstadler kevinstadler removed this from the Query checking milestone Dec 18, 2024
@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 21, 2025

Actually, there is a very trivial solution for allowing incoming queries with solution modifiers, i.e. simply providing a very thin SPARQL wrapper in case a solution modifier is detected.

E.g. a query with a solution modifier

select ?s
where {
    ?s ?p ?o .
}
limit 10

can be thinly wrapped like

select ?s
where {
  {
    select ?s
    where {
      ?s ?p ?o .
    }
    limit 10
  }
}
limit 2
offset 0

LIMIT/OFFSET in that case represent RDFProxy query modifications, which could then be applicable without problems.

I think a warning should be emitted if a query with solution modifiers is detected though, there simply is no/rarely need for passing queries with solution modifiers to RDFProxy.

@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Actually, there is a very trivial solution for allowing incoming queries with solution modifiers, i.e. simply providing a very thin SPARQL wrapper in case a solution modifier is detected.

E.g. a query with a solution modifier

select ?s
where {
?s ?p ?o .
}
limit 10

can be thinly wrapped like

select ?s
where {
{
select ?s
where {
?s ?p ?o .
}
limit 10
}
}
limit 2
offset 0

LIMIT/OFFSET in that case represent RDFProxy query modifications, which could then be applicable without problems.

I think a warning should be emitted if a query with solution modifiers is detected though, there simply is no/rarely need for passing queries with solution modifiers to RDFProxy.

This solution is only applicable for ungrouped models, grouped models significantly more complicated and I am currently having trouble even imagining a potential sane use case for allowing solution modifiers for grouped models.

@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Actually, I am in favor of generally disallowing solution modifiers. All that can be done with solution modifiers can be done with RDFProxy parameters and elaborately nesting and wrapping queries if solution modifiers are detected very likely would lead to unexpected results for users.

@lu-pl lu-pl changed the title Disallow Solution Modifiers for SPARQL inputs Handle or disallow Solution Modifiers for SPARQL inputs Jan 24, 2025
@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Given a grouped model, for a simple query with solution modifier like

select ?x ?y
where {
  values (?x ?y) {
    (1 2)
    (1 3)
    (2 4)
    (3 5)
  }
}
limit 2

the following steps are necessary for handling the query:

  1. thinly wrap the query
  2. construct the subquery based on the initial query with the solution modifiers removed
  3. inject this subquery into the wrapped query

So for the query above, the following query would need to get generated:

select ?x ?y
where {
  {
    select ?x ?y
    where {
      values (?x ?y) {
	(1 2)
	(1 3)
	(2 4)
	(3 5)
      }
    }
    limit 2
  }
  # injected
  {
    select distinct ?x
    where {
      values (?x ?y) {
	(1 2)
	(1 3)
	(2 4)
	(3 5)
      }
    }
    order by ?x limit 2 offset 0
  }
}

Again, I wonder if generally disallowing outer solution modifiers would not be the better option here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants