query resource #14

ErikSundvall · 2016-06-23T14:54:21Z

There are several issues to discuss regarding how to specify the Query resource, including but not limited to:

Format of result set (see https://openehr.atlassian.net/wiki/display/spec/AQL+Result+Set+work+area )
Support for query formalisms other than AQL?
Storing and executing stored queries (see previous discussion at https://openehr.atlassian.net/wiki/display/spec/openEHR+REST+APIs )
Long queries do not fit within the length limit of GET-requests, so there must be a way to POST queries too.
...

bostjanl · 2016-06-23T15:08:23Z

In the API spec there are POST as well as GET query calls.

wolandscat · 2016-06-23T15:28:20Z

I would expect that a POST would make sense for the logical operation of 'register query' which just returns a query handle / id. Then some later request - a GET? - does an 'execute query' with that id.

In a more sophisticated system, a query could be registered and then used to generate push results every x time, or event driven to a named receiver.

Queries that are registered in this way need to somehow time-out and disappear over time, else the query service will fill up pretty fast...

bostjanl · 2016-06-23T15:30:25Z

We need POST because GET is limited by the URL length limitation. There are some really long queries out there.

ErikSundvall · 2016-06-27T14:58:27Z

A "POST to register/store Query" approach was described in the LiU EEE work, see the section "Querying" (and its subsections) a bit into the chapter "Implementation" http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-57#Sec13

We used a SHA-1 hash to give every differently formulated query an ID, and stored every* new (not previously seen/hashed) Query in its original form (as a help to log what has been queried for). That does not take a lot of space and does not need to be done inside the DBMS "stored procedure" mechanism. Instead admins (or statistics + config rules) can decide which commonly occurring queries to convert to real DBMS-native stored procedures/queries for improved efficiency.

Also, re-translating from AQL to native can be avoided for repeated queries with identical hash, if you in your implementation also choose to store the translated Query in some form. The translated but seldom used ones could of course be purged after a while if space is a problem.

(In implementations where the standard http-server-log is used as a major part of the system log the original queries need to be kept, since the content of a POST won't be logged.)

I thought the store-query-and-redirect approach was a good way to adhere to REST design patterns, and the only approach needed, but from the openEHR wiki REST discussion I gathered that some developers might see it as complicated and thus a simpler non-storing approach was wanted. When we get to the week for discussing /query in our timetable we could revisit this and discuss the pros and cons of the different approaches.

Those interested in Java code implementing Query storage might want to look in...
https://github.com/LiU-IMT/EEE/tree/master/src/main/java/se/liu/imt/mi/eee/db
...and its subdirectories to see an implementation for a specific database.

The variable-cleaning+ordering and SHA-hashing etc can be found in...
https://github.com/LiU-IMT/EEE/blob/master/src/main/java/se/liu/imt/mi/eee/ehr/res/Query.java
Pretty simple stuff to implement.
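
To illustrate the ordering + hashing idea, here is a minimal sketch. It is not the EEE code linked above: the class name `QueryHasher`, the `key=value` line format, and the trimming rule are assumptions made for the example. Only the use of SHA-1 over an alphabetically ordered form of the stored variables comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class QueryHasher {

    /** Serialises the stored (static) variables in alphabetical key order. */
    static String canonicalForm(Map<String, String> staticVariables) {
        StringBuilder sb = new StringBuilder();
        // TreeMap sorts by key, so posting order does not influence the hash.
        for (Map.Entry<String, String> e : new TreeMap<>(staticVariables).entrySet()) {
            // Assumed format; a real implementation must escape '=' and newlines.
            sb.append(e.getKey()).append('=').append(e.getValue().trim()).append('\n');
        }
        return sb.toString();
    }

    /** Returns the SHA-1 digest of the text as 40 lowercase hex characters. */
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-1 not available", ex);
        }
    }

    /** The query ID is the hash of the canonical form. */
    static String queryId(Map<String, String> staticVariables) {
        return sha1Hex(canonicalForm(staticVariables));
    }

    public static void main(String[] args) {
        Map<String, String> posted = new LinkedHashMap<>();
        posted.put("aql", "SELECT c FROM EHR e CONTAINS COMPOSITION c");
        posted.put("limit", "10");
        System.out.println(queryId(posted));
    }
}
```

Because the keys are sorted before hashing, two POSTs that carry the same variables in a different order end up with the same ID and are stored only once.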

Regarding space:
Sending long queries via GET will store the entire query string in the standard http-log every time, even when the same query is repeated thousands of times. So a POST->store-once->redirect approach might be less space-consuming for many use cases. Zipping logs will of course reduce that space in the long run, but storing unique POSTed queries neatly (and optionally storing their usage statistics) also has other benefits.

*) "Debug"-marked queries were not stored, they were just translated to native Query language and the translated Query returned instead of executed towards patient data.

wolandscat · 2016-06-27T15:17:57Z

@ErikSundvall I like this general approach.
That last point about space is also a good one.

bjornna · 2016-06-30T11:34:24Z

We have implemented a POST that behaves like a GET call, to make long queries work. Search servers such as Apache Solr and Elasticsearch do the same. I guess this is a pragmatic way to handle this use case.

Related to the space and long queries:
We have also implemented a stored query interface. This has the normal CRUD operations and you may use a stored query identifier as input to the query. Such a stored query may have parameters that should be serialized as a key/value structure.

bjornna · 2016-06-30T11:37:46Z

One important challenge we are facing is to be able to reuse the same query within different scopes. One implementation of this could be the parameter function. We didn't find that flexible enough. That's why we implemented the concept of query scope. By using query scope you may use the same AQL to query for the latest blood pressure of a patient, and you may supply the scope (episode of care, folder, etc.) as an external parameter.

I think this concept is something that should be added to the openEHR service specification.

wolandscat · 2016-06-30T13:22:46Z

@bjornna that's a nice idea; can it be extended to make the same query run for 'this patient 1234' or over a population of patients?

bjornna · 2016-06-30T13:26:59Z

Related to the GET/POST topic - I think the query resource is something different from an ordinary REST resource. It is by definition a read-only resource. You are not able to change the state of the system (the EHR) by running a query.
Given this, there would be no problem using both GET and POST.

This is of course not true for other resources, which by nature are CRUD-oriented.

We have two resources to work with stored queries and virtual archetype definitions. They are CRUD-based and follow the REST verb pattern. The identifiers of these resources may be used in the query resource.

ErikSundvall · 2016-06-30T14:04:37Z

The hash mechanism combined with a (redirectable) shortcut/name/alias mechanism will achieve the same thing as CRUD, except that the U (update) and D (delete) of a named query will be a logical change of what the shortcut/name points to, rather than a physical change/deletion of a stored query. The old query can still be inspected by accessing it via the stored/hash URL, which is good for logging/auditing purposes.
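
A rough sketch of that alias-over-hash idea, using an in-memory map rather than any real storage. The class `QueryAliases` and its method names are hypothetical, and the hashes are treated as opaque strings supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory store, for illustration only.
class QueryAliases {
    // hash -> query text; entries are never overwritten or removed.
    private final Map<String, String> queriesByHash = new HashMap<>();
    // alias -> hash; changing this map is the "logical" update or delete.
    private final Map<String, String> hashByAlias = new HashMap<>();

    /** Stores a query under its hash unless that hash is already known. */
    void store(String hash, String queryText) {
        queriesByHash.putIfAbsent(hash, queryText);
    }

    /** Creates or repoints an alias (the logical "update"). */
    void point(String alias, String hash) {
        if (!queriesByHash.containsKey(hash)) {
            throw new IllegalArgumentException("Unknown query hash: " + hash);
        }
        hashByAlias.put(alias, hash);
    }

    /** Removes the alias only (the logical "delete"); the stored query stays. */
    void unpoint(String alias) {
        hashByAlias.remove(alias);
    }

    Optional<String> byAlias(String alias) {
        return Optional.ofNullable(hashByAlias.get(alias)).map(queriesByHash::get);
    }

    Optional<String> byHash(String hash) {
        return Optional.ofNullable(queriesByHash.get(hash));
    }

    public static void main(String[] args) {
        QueryAliases store = new QueryAliases();
        store.store("hash-1", "first version of the query");
        store.store("hash-2", "second version of the query");
        store.point("latest-bp", "hash-1");
        store.point("latest-bp", "hash-2"); // "update": the alias now points elsewhere
        System.out.println(store.byAlias("latest-bp").orElse("(no such alias)"));
        System.out.println(store.byHash("hash-1").orElse("(no such hash)"));
    }
}
```

After the repoint, the first version is no longer reachable by name but can still be fetched by its hash, which is the auditing property described above.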

Parameterized stored queries can be fed with parameters via the GET call to the hashed (or named/aliased) URL of the stored query. That is what we did in the above-mentioned Java implementation, see the excerpt from
https://github.com/LiU-IMT/EEE/blob/master/src/main/java/se/liu/imt/mi/eee/ehr/res/Query.java below

        // Create string (currently compact JSON) with keys and values sorted alphabetically, allows multiple values with same name         
        Iterator<String> it = postedQueryAsForm.getNames().iterator();

        while (it.hasNext()) {
            String name = (String) it.next();
            String[] valueArray = postedQueryAsForm.getValuesArray(name);
            // TODO: Separate static & dynamic variables,
            if (name.startsWith("_")) {
                // POST-ed variable names starting with _ (underscore) will not be stored, 
                // but instead sent on as dynamic parameters in URI after removing first underscore in the variable name
                for (int i = 0; i < valueArray.length; i++) {
                    uriGetQueryAsForm.add(name.substring(1), valueArray[i]);
                    // System.out.println("Querysource.handleFormPost() adding entry: "+name.substring(1) +" = "+ valueArray[i]);
                }                   
            } else {
                // All other posted (static) variables get stored in the query map
                if (valueArray.length != 1) {
                    throw new ResourceException(Status.CLIENT_ERROR_BAD_REQUEST, "Keys for static (stored) query variables need to be unique (duplicates are not allowed). Dynamic variables prepended with _ (underscore) will instead be passed on as a URI query and do not need to be unique.");                    
                }
                sortedMap.put(name, valueArray[0]);
                cleanedStaticForm.add(name, valueArray[0]);             
            }
        } // end of while loop over the posted variable names

I don't know if that is the best approach or not (or if something other than _ should be used as a prefix), but for a developer/user writing ad-hoc queries, e.g. in a big form field, the result will be the same whether or not the variable is prefixed with an underscore. A developer who, on the other hand, understands the hashing/storing and knows they'll want some dynamic parameters can prefix those variables with underscores and then, in later repeated calls, just use the hashed GET URL with dynamic parameters, bypassing the POST/store mechanism.

Please have a look at the section called "Benefits from storing and redirecting POSTed queries"
in http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-57

(While in that paper you can also search for "bookmark" to read about a "bookmark" service in LiU EEE, that could use autogenerated (or manually assigned) names. It would work for many query-use-cases too, but I guess some developers might find that general bookmark-approach overkill or complicated if the thing wanted is only named queries. Also, it might hide "GET"-parameters for named queries unless we specify that any query-parameters added to the bookmark URL should be passed along to the target, in this case an AQL query.)

The "scope"-concept that you write about @bjornna looks interesting, do you have more information and a list of the parameters/restrictions that you have found useful?

bjornna · 2016-06-30T17:21:23Z

@erik - you have a working server at http://arenaehr.dips.no:9000/api-doc to test some queries

From the API doc you find the following template for the payload in the request. There are several parameters:

You may restrict the query to a set of Compositions, a set of EHRs, or a set of tags. Tags are actually key/value structures on a Composition. They could be whatever; what we use most is period of care and episode of care.
You may also partition the query by a tag. This makes it convenient to get the latest temperature for the last episode for a given patient or, more commonly, the latest temperature for all patients currently at the hospital (a set of episodes).

{
  "aql": "string",
  "compositionUids": [
    "string"
  ],
  "ehrIds": [
    "string"
  ],
  "tagScope": {
    "tags": [
      {
        "values": [
          "string"
        ],
        "tag": "string"
      }
    ]
  },
  "partitionBy": {
    "tag": "string",
    "limit": 0
  }
}

bjornna · 2016-06-30T17:23:20Z

@thomas - I guess the partition question was answered by the comment above. Let me know if I missed something.

wolandscat · 2017-08-19T13:02:48Z

Re-reading everything above, it seems to me we should (largely following Erik):

avoid filling up the http-log with repeats of long queries
use POST to register a query and execute via a GET that supplies params as needed
use hashes created from the query templates to id the stored queries
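
To make the three points concrete, here is a minimal in-memory sketch of register-then-execute. The class `QueryRegistry`, its method names, and the `$name` placeholder binding by plain text replacement are all assumptions for illustration. A real service would pass the parameters to its query engine rather than splice them into the text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, for illustration only.
class QueryRegistry {
    private final Map<String, String> templatesById = new HashMap<>();

    /** POST step: store the template once and return its hash as the id. */
    String register(String queryTemplate) {
        String template = queryTemplate.trim();
        String id = sha1Hex(template);
        templatesById.putIfAbsent(id, template);
        return id;
    }

    /** GET step: look up the template by id and bind the supplied parameters. */
    String resolve(String id, Map<String, String> params) {
        String template = templatesById.get(id);
        if (template == null) {
            throw new IllegalArgumentException("Unknown query id: " + id);
        }
        String bound = template;
        // Naive literal replacement; it does no escaping and would mishandle
        // parameter names that are prefixes of one another.
        for (Map.Entry<String, String> p : params.entrySet()) {
            bound = bound.replace("$" + p.getKey(), p.getValue());
        }
        return bound;
    }

    int size() {
        return templatesById.size();
    }

    static String sha1Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-1 not available", ex);
        }
    }

    public static void main(String[] args) {
        QueryRegistry registry = new QueryRegistry();
        String id = registry.register(
                "SELECT c FROM EHR e[ehr_id/value=$ehrId] CONTAINS COMPOSITION c");
        // Only the short id and the parameters travel in later GET requests.
        System.out.println(id);
        System.out.println(registry.resolve(id, Collections.singletonMap("ehrId", "'1234'")));
    }
}
```

Registering the same template twice yields the same id and a single stored entry, so the long query text appears once in storage and only the id plus parameters would appear in the http-log.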

(The current API POST method says 'Execute an AQL query', but this seems wrong to me.)

In this scheme there is no 'one-shot' query execution approach, unless we provide such via another GET, which appears to be what is in the current API. If there is to be one-shot query execute + return results, what are the semantics? Do these queries get registered as well? Are parameters treated separately in the same manner for queries registered via POST and executed via GET?

wolandscat · 2017-08-19T13:06:05Z

Another idea, following from the hashing concept described by Erik: it would seem obvious to support some kind of 'query set', that would typically be used to populate a whole form. Query sets could be identified by hashes and stored in the same way as single queries. Doing a GET on a query set would get a table of QueryResults, keyed by individual query hashes.
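
One possible shape for this, as a sketch only: the class `QuerySets`, the choice to derive the set id from the sorted member hashes, and the `runQuery` callback are assumptions, not anything the current API defines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
class QuerySets {

    /** Set id = SHA-1 over the sorted, de-duplicated member hashes (assumed rule). */
    static String querySetId(Collection<String> memberHashes) {
        return sha1Hex(String.join("\n", new TreeSet<>(memberHashes)));
    }

    /** Runs every member query and returns the results keyed by member hash. */
    static <R> Map<String, R> execute(Collection<String> memberHashes,
                                      Function<String, R> runQuery) {
        Map<String, R> results = new LinkedHashMap<>();
        for (String hash : memberHashes) {
            results.put(hash, runQuery.apply(hash));
        }
        return results;
    }

    static String sha1Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-1 not available", ex);
        }
    }

    public static void main(String[] args) {
        // Placeholder member hashes; the lambda stands in for real query execution.
        Collection<String> members = Arrays.asList("hash-of-query-a", "hash-of-query-b");
        System.out.println(querySetId(members));
        System.out.println(execute(members, hash -> "rows for " + hash));
    }
}
```

Sorting the member hashes before hashing means the same group of queries always gets the same set id. Whether member order should instead be significant is an open design question.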

Crazy idea?

bjornna · 2017-08-19T18:57:47Z

No - not crazy at all. I think we have implemented almost this feature with VAQM. Clients may run several queries in a batch. A use case is, e.g., a ward list with a query for each distinct column.

In the query result we provide a correlation identifier to match the results.

Let's look into this when we discuss the Query endpoint.

ppazos · 2017-08-19T19:14:21Z

I agree, query sets are useful, let's say to fill a clinical dashboard / patient summary GUI at once, without the need of executing individual queries. This specific use case would be defined in a profile like "get data to display" (there is an IHE profile for something like this http://wiki.ihe.net/index.php/Retrieve_Information_for_Display), and implemented with those query sets. My 2c.


ErikSundvall added the discussion label Jun 23, 2016

bostjanl added this to the 1.1.0 milestone Nov 5, 2018
