Feature/query #44

hohonuuli · 2024-10-16T15:14:52Z

This branch adds support for ad-hoc queries against the annotations view in the database. This feature is needed to support the upcoming vars query web ui.

The endpoints for these new features will be under v1/query

Add distinct as a flag. Default should be true.
Add strict(?) as a flag. Default is false. When false adds the observation_uuid and index_recorded_timestamp columns and sorts by those.
Add orderby param. This should be ignored when strict is true (verify that this is the behavior we want)
Add equals operator
Rename like to contains and retain current behavior
Add like that takes a sql like string (e.g. http%)
Add integration tests for sql server and postgres

hohonuuli · 2024-10-17T00:12:04Z

There are new query endpoints, see http://portal.shore.mbari.org:8100/docs/#/Query.

Goals

These endpoints allow app developers and some users to fetch annotations or annotation data in a flexible manner. The primary use case is the new VARS web query. A secondary use would be to replace the SQL used in vars-gridview. Non-goals include specialized queries for reporting.

Notes

These endpoints operate against the annotations view in VARS. That view joins tables from the M3_ANNOTATIONS and M3_VIDEO_ASSETS databases into a single unified view. When working with data from this view, it's important to be aware of how the table joins affect the data returned. Typically, you will have to do some munging of the rows in your app, depending on what you're querying for. The observation_uuid column is your friend: you can use it to combine related rows (notably anything to do with associations or images, which are essentially a one-to-many join with observations). I used this same method with all the versions of the VARS query through the years and it works well.
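As a rough illustration of that munging, here is a minimal Python sketch that groups rows by observation_uuid. It assumes the rows have already been parsed into dicts keyed by column name. The `group_by_observation` helper and the sample rows are made up for illustration and are not part of the service.

```python
def group_by_observation(rows):
    """Collapse joined rows into one record per observation_uuid.

    The concept is taken from the first row seen for each observation;
    association columns (link_name, link_value) are gathered into a list.
    """
    grouped = {}
    for row in rows:
        key = row["observation_uuid"]
        record = grouped.setdefault(
            key, {"concept": row.get("concept"), "associations": []}
        )
        if row.get("link_name") is not None:
            record["associations"].append((row["link_name"], row.get("link_value")))
    return grouped


# Made-up rows: one observation with two associations, one with none.
rows = [
    {"observation_uuid": "a1", "concept": "Nanomia", "link_name": "bounding box", "link_value": "box-1"},
    {"observation_uuid": "a1", "concept": "Nanomia", "link_name": "comment", "link_value": "blurry"},
    {"observation_uuid": "b2", "concept": "Nanomia bijuga", "link_name": None, "link_value": None},
]
observations = group_by_observation(rows)
```

Including observation_uuid in the returned columns is what makes this kind of regrouping possible.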

The query endpoint is relatively dynamic, so we can add or remove columns from the annotations view as we see fit. Note that the observation_uuid, imaged_moment_uuid, and index_recorded_timestamp columns are hard-coded into the codebase at the moment. This is required to do sensible things like giving stable sort keys and allowing for returns of related concepts/associations when doing a query.

`/v1/query/columns`

This endpoint allows your code to know what it can query for.

This returns information about each column in the annotation view. Most users only care about the columnName, but columnType may be useful too. Example from http://portal.shore.mbari.org:8100/v1/query/columns edited for brevity:

[
  {
    "columnName": "imaged_moment_uuid",
    "columnType": "uniqueidentifier",
    "columnSize": 36,
    "columnLabel": "imaged_moment_uuid",
    "columnClassName": "java.lang.String"
  }
]
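One way a client might use this response is to check a select list before sending a query. The sketch below is only an illustration: it parses the sample shown above rather than calling the service, and the helper names `column_names` and `unknown_columns` are hypothetical.

```python
import json

# The sample response shown above, as a string.
SAMPLE_COLUMNS = """
[
  {
    "columnName": "imaged_moment_uuid",
    "columnType": "uniqueidentifier",
    "columnSize": 36,
    "columnLabel": "imaged_moment_uuid",
    "columnClassName": "java.lang.String"
  }
]
"""


def column_names(columns_json: str) -> list[str]:
    """Extract the columnName of every entry in a /v1/query/columns response."""
    return [entry["columnName"] for entry in json.loads(columns_json)]


def unknown_columns(select: list[str], known: list[str]) -> list[str]:
    """Return the requested columns that the view does not report."""
    return [name for name in select if name not in known]


known = column_names(SAMPLE_COLUMNS)
missing = unknown_columns(["imaged_moment_uuid", "not_a_column"], known)
```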

`/v1/query/count`

Takes the same JSON body used in /v1/query/run (except you don't need to include select) and returns a count of matching rows. Note that this is NOT QUITE RIGHT yet. run includes DISTINCT so count will likely overestimate the number of rows returned.

`/v1/query/run`

Runs a query using a POST / JSON request and returns the DISTINCT result, ordered by time, as tab-delimited data.

Here's an example body below, it's a bit like if SQL and JSON had a baby. Important things ...

select - specifies the columns to return
where - constraints. Can be one of the operators below
- between - can be two elements of numbers or dates (as ISO8601)
- contains - translates to LIKE '%word%'
- equals - Matches a string
- in - Same as SQL IN. The value is an array of strings: ["foo", "bar", "etc"]
- isnull - can be true or false.
- like - User has to supply the %
- max - Becomes column <= max
- min - Becomes column >= min
- minmax - A number between the provided values. Takes an array of numbers [1, 100]
concurrentObservations - When true runs your query and also returns any other annotations occurring on the same frames that were returned by your query. If true it overrides strict and strict will be treated as false regardless of what you set it to.
relatedAssociations - When true, and you're constraining by some association field, will run your query but also return all other associations on observations in your query. Useful for things like searching for bounding box but also getting any other associations, not just the bounding box ones. If true it overrides strict and strict will be treated as false regardless of what you set it to.
limit - the max number of rows to return
offset - The starting row to return. When used with limit can page through the data.
distinct - Applies distinct to the query, the default is false
strict - When false, queries are modified so that the observation_uuid and index_recorded_timestamp columns are included in the returned rows. The default is true
orderby - takes an array of column names to be used for sorting. The default is by index_recorded_timestamp
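To make the operator list concrete, here is a rough Python sketch of how each `where` entry could map to a parameterized SQL predicate. It is an illustration of the descriptions above, not the service's actual implementation. The `to_predicate` name, the `?` placeholder style, and treating between/minmax as inclusive are all assumptions.

```python
def to_predicate(constraint: dict) -> tuple[str, list]:
    """Translate one `where` entry into (sql_fragment, parameters).

    Column names are interpolated directly here; real code would first
    validate them against the columns reported by /v1/query/columns.
    """
    col = constraint["column"]
    if "between" in constraint:
        low, high = constraint["between"]
        return f"{col} BETWEEN ? AND ?", [low, high]
    if "contains" in constraint:
        return f"{col} LIKE ?", [f"%{constraint['contains']}%"]
    if "equals" in constraint:
        return f"{col} = ?", [constraint["equals"]]
    if "in" in constraint:
        values = list(constraint["in"])
        marks = ", ".join("?" for _ in values)
        return f"{col} IN ({marks})", values
    if "isnull" in constraint:
        return (f"{col} IS NULL" if constraint["isnull"] else f"{col} IS NOT NULL"), []
    if "like" in constraint:
        # The caller supplies the % wildcards, e.g. "http%".
        return f"{col} LIKE ?", [constraint["like"]]
    if "minmax" in constraint:
        low, high = constraint["minmax"]
        return f"{col} BETWEEN ? AND ?", [low, high]
    if "max" in constraint:
        return f"{col} <= ?", [constraint["max"]]
    if "min" in constraint:
        return f"{col} >= ?", [constraint["min"]]
    raise ValueError(f"No known operator in constraint: {constraint}")


sql, params = to_predicate({"column": "video_uri", "like": "http%"})
```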

This query returns all Nanomia (and Nanomia bijuga) annotations that have a localization, a valid recorded timestamp, and an image, and that are on a video file. Since concurrentObservations is true, it will also return all other bounding box annotations on the same frames (i.e. ones that are not Nanomia).

{
  "select": [
    "concept",
    "index_recorded_timestamp",
    "video_sequence_name",
    "video_uri",
    "image_url",
    "link_value"
  ],
  "where": [
    {
      "column": "concept",
      "in":["Nanomia", "Nanomia bijuga"]
    },
    {
      "column": "index_recorded_timestamp",
      "isnull": false
    },
    {
      "column": "image_url",
      "isnull": false
    },
    {
      "column": "link_name",
      "in": ["bounding box"]
    },
    {
      "column": "video_uri",
      "like": "http%"   
    }
  ],
  "limit": 5000,
  "offset": 0,
  "concurrentObservations": true,
  "relatedAssociations": false
}

You can run this from the command line with:

curl -X 'POST' \
  'http://portal.shore.mbari.org:8100/v1/query/run' \
  -H 'Content-Type: application/json' \
  -d '{
  "select": [
    "concept",
    "index_recorded_timestamp",
    "video_sequence_name",
    "video_uri",
    "image_url",
    "link_value"
  ],
  "where": [
    {
      "column": "concept",
      "in":["Nanomia", "Nanomia bijuga"]
    },
    {
      "column": "index_recorded_timestamp",
      "isnull": false
    },
    {
      "column": "image_url",
      "isnull": false
    },
    {
      "column": "link_name",
      "in": ["bounding box"]
    },
    {
      "column": "video_uri",
      "like": "http%"   
    }
  ],
  "limit": 5000,
  "offset": 0,
  "concurrentObservations": true,
  "relatedAssociations": false
}
'
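For app code, a rough Python equivalent of the curl call might look like the sketch below, using only the standard library. The helper names are hypothetical, the query is a shortened version of the one above, and `parse_tsv` assumes the first line of the tab-delimited response is a header row, which hasn't been confirmed here. The sample response text is made up.

```python
import csv
import io
import json
import urllib.request


def build_run_request(base_url: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/query/run."""
    return urllib.request.Request(
        url=f"{base_url}/v1/query/run",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_tsv(text: str) -> list[dict]:
    """Parse tab-delimited text into dicts, assuming a header row comes first."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


query = {
    "select": ["concept", "index_recorded_timestamp"],
    "where": [{"column": "concept", "in": ["Nanomia", "Nanomia bijuga"]}],
    "limit": 5000,
    "offset": 0,
}
request = build_run_request("http://portal.shore.mbari.org:8100", query)

# Sending it would look like this (left commented out so the sketch runs offline):
# with urllib.request.urlopen(request) as response:
#     rows = parse_tsv(response.read().decode("utf-8"))

# Made-up response text, only to show the parsing step.
sample = "concept\tindex_recorded_timestamp\nNanomia\t2001-01-01T00:00:00Z\n"
rows = parse_tsv(sample)
```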

@lonnylundsten @kevinsbarnard @NancyJS I would really appreciate any feedback so that this endpoint addresses current SQL use cases. (Kevin, especially for apps). Please be mindful it's not meant to address ALL needs for SQL (Lonny, that's especially true for reporting). Also, none of the operator names are set in stone, so we can tweak them if there's consensus (e.g. min -> gt). It's relatively easy to add other operators too if needed.

This is currently deployed starting with release 1.2.0 and is now running internally at MBARI. I'm waiting for feedback before I start writing any apps against it.

lonnylundsten · 2024-10-17T21:00:25Z

@hohonuuli @kevinsbarnard

If I want to constrain a query by date, how can I do that using this API?

Is it something like this -- this doesn't work?
{"column": "index_recorded_timestamp", "between": "1996-01-01 and 2002-01-01"}

lonnylundsten · 2024-10-17T21:21:59Z

@hohonuuli If I update the query so there is no limit (i.e., fetch all the data) when I do a big query (i.e., Nanomia bijuga) I get a time out error. Is that expected or should I be able to get all the data?

I may have crashed VARS....

hohonuuli · 2024-10-17T21:23:55Z

@lonnylundsten

With great power comes great responsibility
Don't do that.

That unbounded query will return a little over a half-million rows ('cause table joins). The service converts that to 1. an in memory data structure which is 2. converted to a String to be written back to the client. I'm pretty sure I haven't configured the service with enough memory to handle that query.

hohonuuli · 2024-10-17T21:25:28Z

The proper order to fetch large sets is:

1. Use the count endpoint to get an estimate of the number of rows.
2. Page through the data using limit and offset to read in smaller chunks. I don't know what the max rows is and honestly, it depends on how busy the server is. Start with 5000 as the upper bound.
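The paging step could be sketched in Python roughly as follows. This is a minimal illustration, not tested against the service: `fetch_pages` is a hypothetical helper, and `run_query` stands in for whatever function posts a body to /v1/query/run and returns the parsed rows. The fake below only simulates it with a list.

```python
from typing import Callable, Iterator


def fetch_pages(
    run_query: Callable[[dict], list], query: dict, page_size: int = 5000
) -> Iterator[list]:
    """Yield pages of rows by advancing offset until a short or empty page."""
    offset = 0
    while True:
        page = run_query({**query, "limit": page_size, "offset": offset})
        if not page:
            break
        yield page
        if len(page) < page_size:
            break  # a short page means there is nothing left to read
        offset += page_size


# Stand-in for the real endpoint: 12 fake rows served in slices.
fake_rows = list(range(12))


def fake_run(body: dict) -> list:
    return fake_rows[body["offset"]: body["offset"] + body["limit"]]


pages = list(fetch_pages(fake_run, {"select": ["concept"]}, page_size=5))
```

Stopping on a short page assumes each response contains at most limit rows; if options such as concurrentObservations change that, the stop condition would need adjusting.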

hohonuuli · 2024-10-17T21:28:04Z

> If I want to constrain a query by date, how can I do that using this API?

It's supposed to be the following, but it might be broken ATM as I'm changing the API.

{
    "column": "index_recorded_timestamp",
    "between": [
        "1996-01-01T00:00:00Z",
        "2002-01-01T00:00:00Z"
    ]
}
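If you're building that constraint from code, a small Python sketch like this one produces the same shape from datetime objects. The helper names are hypothetical, and formatting to whole seconds with a trailing Z is an assumption based on the example above.

```python
from datetime import datetime, timezone


def iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO8601 in UTC, e.g. 1996-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        raise ValueError("Expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def between_dates(column: str, start: datetime, end: datetime) -> dict:
    """Build a `between` constraint whose value is a two-element array."""
    return {"column": column, "between": [iso8601_utc(start), iso8601_utc(end)]}


constraint = between_dates(
    "index_recorded_timestamp",
    datetime(1996, 1, 1, tzinfo=timezone.utc),
    datetime(2002, 1, 1, tzinfo=timezone.utc),
)
```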

hohonuuli · 2024-10-21T23:14:24Z

@lonnylundsten @kevinsbarnard I've updated the query params that /v1/query/run accepts. The docs above reflect the new parameters. Here are new examples:

Get localizations of Nanomia that are missing an image, but for which one could be fetched using beholder.

curl -X 'POST' \
  'http://m3.shore.mbari.org/anno/v1/query/run' \
  -H 'Content-Type: application/json' \
  -d '{
  "select": [
    "concept",
    "index_recorded_timestamp",
    "video_sequence_name",
    "video_uri",
    "image_url",
    "link_value"
  ],
  "where": [
    {
      "column": "concept",
      "in":["Nanomia", "Nanomia bijuga"]
    },
    {
      "column": "index_elapsed_time_millis",
      "isnull": false
    },
    {
      "column": "image_url",
      "isnull": true
    },
    {
      "column": "link_name",
      "in": ["bounding box"]
    },
    {
      "column": "video_uri",
      "like": "http%"   
    }
  ],
  "distinct": true,
  "limit": 5000,
  "offset": 0,
  "concurrentObservations": true,
  "relatedAssociations": false
}
'

Get all concept names used in annotations

curl -X 'POST' \
  'http://m3.shore.mbari.org/anno/v1/query/run' \
  -H 'Content-Type: application/json' \
  -d '{
  "select": [
    "concept"
  ],
  "where": [
    {
      "column": "concept",
      "isnull": false
    }
  ],
  "strict": true,
  "orderby": ["concept"],
  "distinct": true
}
'

These changes have been deployed internally as v1.2.1

hohonuuli · 2024-10-22T22:52:52Z

Released as v1.2.2

hohonuuli added 5 commits October 13, 2024 22:16

Working on a query implementation

6efa349

Constraints parser is working

5904ede

fixed driver

f040342

check point

af9dee1

Seems to be working. Tweaking things ...

a18ebf4

hohonuuli added enhancement priority labels Oct 16, 2024

hohonuuli self-assigned this Oct 16, 2024

ready for intial evaluation

b274976

hohonuuli requested review from lonnylundsten and kevinsbarnard October 17, 2024 00:18

hohonuuli marked this pull request as draft October 17, 2024 16:09

hohonuuli added 4 commits October 17, 2024 10:25

Need to add test for query endpoints

737c440

reformated with updated scalafmt

411eeb9

Applied scalafix rules

70c710d

check point

ceb1658

mbari-org deleted a comment from lonnylundsten Oct 17, 2024

hohonuuli added 2 commits October 20, 2024 18:25

Fixed bug in select

d60d1fc

added connection pool

5b5aeb9

hohonuuli added 3 commits October 22, 2024 13:38

Added QueryServiceSuite

2f04825

All tests pass

1a02308

ran scalafix + scalafmt

d9e40b1

hohonuuli marked this pull request as ready for review October 22, 2024 22:42

hohonuuli merged commit ae39e2c into master Oct 22, 2024
2 checks passed

hohonuuli deleted the feature/query branch October 22, 2024 22:52
