Implementation of EXISTS and NOT EXISTS #1703

Merged: 40 commits, Feb 15, 2025

@joka921 joka921 commented Jan 7, 2025

This implements EXISTS via a new ExistsJoin operation. Specifically, for each operation involving an expression (for example, a BIND or a FILTER), an ExistsJoin operation is added for each occurrence of EXISTS in that expression. The ExistsJoin adds an additional column (with a unique variable name) that contains the result of the EXISTS and is used to evaluate the expression. This can be seen in the "Analysis" tree when executing a query. The current implementation has the following limitations, which will be addressed in future PRs:

  1. The ExistsJoin operation is not yet lazy.
  2. The argument of an EXISTS is currently handled completely independently of the rest of the query (except for FROM and FROM NAMED clauses, which are inherited from the outer query); it is an ongoing debate whether that is correct when the EXISTS contains FILTERs that refer to variables from outside the EXISTS.
  3. When there are UNDEF values in the join columns, the ExistsJoin uses our generic zipper join, which is not particularly efficient.
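
As a rough illustration of the idea described above (not QLever's actual code), the following sketch appends a boolean column to a left table. For each left row, the column records whether some row of the right table has the same value in the join column, and a FILTER can then be evaluated on that column. The names `Table`, `addExistsColumn`, and `filterByExistsColumn` are hypothetical, and the sketch assumes a single join column, no UNDEF values, and fully materialized (non-lazy) tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified table type: a vector of rows of integer IDs.
using Row = std::vector<std::int64_t>;
using Table = std::vector<Row>;

// Return a copy of `left` with one additional column at the end. The new
// entry is 1 if `right` contains a row whose value in column `rightCol`
// equals the value of the left row in column `leftCol`, and 0 otherwise.
// Assumption: exactly one join column and no UNDEF values.
Table addExistsColumn(const Table& left, std::size_t leftCol,
                      const Table& right, std::size_t rightCol) {
  // Collect all join values that occur on the right side.
  std::unordered_set<std::int64_t> rightValues;
  for (const Row& row : right) {
    assert(rightCol < row.size());
    rightValues.insert(row[rightCol]);
  }
  Table result;
  result.reserve(left.size());
  for (const Row& row : left) {
    assert(leftCol < row.size());
    Row extended = row;
    extended.push_back(rightValues.count(row[leftCol]) > 0 ? 1 : 0);
    result.push_back(std::move(extended));
  }
  return result;
}

// Evaluate a FILTER on the last column of `withExists`: keep the rows where
// the EXISTS column is true (`negate == false`, i.e. FILTER EXISTS) or false
// (`negate == true`, i.e. FILTER NOT EXISTS). The helper column is dropped
// from the returned rows.
Table filterByExistsColumn(const Table& withExists, bool negate) {
  Table result;
  for (const Row& row : withExists) {
    assert(!row.empty());
    const bool exists = row.back() != 0;
    if (exists != negate) {
      result.emplace_back(row.begin(), row.end() - 1);
    }
  }
  return result;
}
```

Unlike a regular join, this sketch emits every left row exactly once, no matter how many right rows match, which is why the result can be stored in a single extra column.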

joka921 and others added 15 commits October 8, 2024 14:59
Signed-off-by: Johannes Kalmbach <[email protected]>

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 92.76018% with 16 lines in your changes missing coverage. Please review.

Project coverage is 90.12%. Comparing base (6349abd) to head (092e0d9).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/MultiColumnJoin.cpp | 14.28% | 4 Missing and 2 partials ⚠️ |
| src/util/JoinAlgorithms/FindUndefRanges.h | 77.27% | 0 Missing and 5 partials ⚠️ |
| src/engine/ExistsJoin.cpp | 96.36% | 2 Missing and 2 partials ⚠️ |
| src/engine/ExistsJoin.h | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1703      +/-   ##
==========================================
+ Coverage   90.07%   90.12%   +0.04%     
==========================================
  Files         396      399       +3     
  Lines       38021    38199     +178     
  Branches     4266     4281      +15     
==========================================
+ Hits        34247    34426     +179     
+ Misses       2486     2473      -13     
- Partials     1288     1300      +12     


joka921 added 10 commits January 8, 2025 10:55
As a next step, I want to write some comments.

The only things still missing are some corner-case tests and maybe a cleanup of the parsing of the active dataset clauses.

@hannahbast hannahbast changed the title An initial implementation of EXISTS An initial implementation of EXISTS Jan 10, 2025
@hannahbast hannahbast marked this pull request as ready for review February 5, 2025 00:48

@hannahbast hannahbast left a comment

This looks great and I have made a thorough pass now and committed some changes. A few more questions and then this is ready to merge:

  1. In the code for ExistsExpression::getCacheKey you mention that it can happen that the ExistsJoin is not yet set up because the query planning is not yet finished, in which case you set a random cache key. For which scenario is this relevant? In particular, when does it ever happen that query processing starts before query planning is completed?

  2. In SparqlAntlrParserTest.cpp, you define a selectABarFooMatcher, which can take FROM and FROM NAMED clauses as arguments. But then you only use it with the default arguments.

  3. At the end of ExistsJoinTest.cpp, there are two TODOs of yours suggesting a few more tests. How about adding them?

  4. Is it hard to use ad_utility::callFixedSize for the ExistsJoin or should that be a separate PR?

  5. Can you check the one SonarCloud issue that is not a TODO?

  6. Are you fine with the patch coverage reported by CodeCov?
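
The fallback asked about in question 1 can be pictured roughly as follows. This is only a sketch of the general technique (returning an unpredictable key so that a result is never served from a cache by accident), not the code from this PR. The class `ExistsExpressionSketch`, the method `setSubtreeCacheKey`, and the key format are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <random>
#include <string>
#include <utility>

// Hypothetical stand-in for an expression whose cache key depends on a
// subtree that is attached only later, during query planning.
class ExistsExpressionSketch {
 public:
  // Called once planning has produced the subtree (assumed non-empty key).
  void setSubtreeCacheKey(std::string key) {
    assert(!key.empty());
    subtreeKey_ = std::move(key);
  }

  // Deterministic key if the subtree is known. Otherwise a random key, so
  // that two calls without a subtree will (almost surely) never collide and
  // therefore never reuse a cached result. Note: the shared generator makes
  // this sketch unsuitable for concurrent calls without extra locking.
  std::string getCacheKey() const {
    if (subtreeKey_.has_value()) {
      return "EXISTS(" + *subtreeKey_ + ")";
    }
    static std::mt19937_64 generator{std::random_device{}()};
    return "EXISTS-uninitialized-" + std::to_string(generator());
  }

 private:
  std::optional<std::string> subtreeKey_;
};
```

The design choice illustrated here is to trade cache hits for safety: an expression without a known subtree can never be mistaken for another one, at the cost of never being found in the cache.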

@hannahbast hannahbast changed the title from "An initial implementation of EXISTS" to "Initial implementation of EXISTS and NOT EXISTS" on Feb 5, 2025
Hannah Bast and others added 8 commits February 5, 2025 03:39
# Conflicts:
#	.github/workflows/sparql-conformance.yml
# Conflicts:
#	src/engine/CMakeLists.txt
#	src/engine/sparqlExpressions/ExistsExpression.h
#	test/engine/ExistsJoinTest.cpp

@hannahbast hannahbast left a comment

@joka921 This looks great now, thanks a lot. I tried it on a few queries on Wikidata and they worked fine. I have committed some minor comment improvements and then this should be almost ready to merge, provided that all the tests run through.

What I don't understand is the addition of https://github.com/ad-freiburg/qlever/blob/87ba7cad5b1124b4095beb9fa31b040e85a7de17/.github/workflows/upload-sparql-conformance.yml . In the current master, we have https://github.com/ad-freiburg/qlever/blob/master/.github/workflows/sparql-conformance-uploader.yml, which looks a bit different though. Please clarify.

@hannahbast hannahbast changed the title from "Initial implementation of EXISTS and NOT EXISTS" to "Implementation of EXISTS and NOT EXISTS" on Feb 14, 2025
@sparql-conformance

Conformance check passed ✅

Test Status Changes 📊

| Number of Tests | Previous Status | Current Status |
|---|---|---|
| 15 | Failed | Passed |

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=092e0d91d22d3e541253dfcce728fd6cd5d5065e&prev=6349abdcf85d66c231dd4527c987d05b21177cf7


@hannahbast hannahbast left a comment

I reverted the involuntary changes in .github/workflows (regarding the SPARQL conformance test) and fixed another small merge bug. Everything looks great and I will happily merge this now!

PS: The failing macOS check has nothing to do with this PR in particular (the package installation step fails).

@hannahbast hannahbast merged commit a9f6895 into ad-freiburg:master Feb 15, 2025
23 of 24 checks passed