Getting there. TODO's indicate a few loose ends to tie
RasonJ committed May 31, 2024
1 parent 6626bb1 commit 736bf69
Showing 7 changed files with 76 additions and 70 deletions.
2 changes: 1 addition & 1 deletion _search-plugins/ubi/data-structures.md
@@ -18,7 +18,7 @@ _Optionally_:
- `getSessionId()`
- `getPageId()`

Other sample implementations can be found [here](#TODO-clients-link).
<!-- Not needed with this page: Other sample implementations can be found [here](#TODO-clients-link). -->

```js
/*********************************************************************************************
2 changes: 2 additions & 0 deletions _search-plugins/ubi/documentation.md
@@ -13,6 +13,8 @@ This *repository* contains the OpenSearch plugin for the User Behavior Insights
facilitates persisting client-side events (e.g. item clicks, scroll depth) and OpenSearch queries for the purpose of analyzing the data
to improve search relevance and user experience.

[Link to repo plugin's documentation](https://github.com/o19s/opensearch-ubi)

## Quick start

We need a Quick Start!!!
11 changes: 4 additions & 7 deletions _search-plugins/ubi/index.md
@@ -20,29 +20,26 @@ It is a causal system, linking a user's query to all subsequent user interaction

* A machine-readable [schema](https://github.com/o19s/ubi) that facilitates interoperability of the UBI specification.
* An OpenSearch [plugin](https://github.com/o19s/opensearch-ubi) that facilitates the storage of client-side events and queries.
* A client-side JavaScript library reference implementation that shows how to capture events and send those events to the OpenSearch UBI plugin.

TODO: link a client implementation [here](#TODO-clients-link)
{: .warn }
* A client-side JavaScript [ example ]({{site.url}}{{site.baseurl}}/search-plugins/ubi/data-structures/) reference implementation that shows how to capture events and send those events to the OpenSearch UBI plugin.

<!-- vale off -->

| Explanation & Reference | Description
| :--------- | :------- |
| [UBI Request/Response Specification](https://github.com/o19s/ubi/) <br/> **References UBI Draft Specification X.Y.Z** | Schema standard for making UBI requests and responses |
| [UBI OpenSearch Schema Documentation]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/) | Documentation on the individual Query and Event stores for OpenSearch |
| [`query_id` Data Flow]({{site.url}}{{site.baseurl}}/search-plugins/ubi/query_id/) | How the `query_id` ties the search to results and user events |
| `query_id` Data Flow <!-- ({{site.url}}{{site.baseurl}}/search-plugins/ubi/query_id/) --> | To remove? |


| Tutorials & How-to Guides | Description
| :--------- | :------- |
| [UBI Plugin Admin]({{site.url}}{{site.baseurl}}/search-plugins/ubi/documentation/) | How to install and use the UBI Plugin |
| [ JavaScript client structures ]({{site.url}}{{site.baseurl}}/search-plugins/ubi/data-structures/) | Sample JavaScript structures for populating the Event store |
| [UBI SQL queries ]({{site.url}}{{site.baseurl}}/search-plugins/ubi/sql-queries/) | How to write analytic queries for UBI data in SQL |
| [UBI Dashboard]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-dashboard-tutorial/) | Teaches you how to build an OpenSearch dashboard with UBI data |
| [UBI Dashboard Tutorial]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-dashboard-tutorial/) | Teaches you how to build an OpenSearch dashboard with UBI data |
| ... | teaches how to do something |

<!-- vale on -->
{: .tip }
Documentation adapted using concepts from [Diátaxis](https://diataxis.fr/)
{: .tip }

6 changes: 6 additions & 0 deletions _search-plugins/ubi/query_id.md
@@ -1,10 +1,16 @@
{% comment %}
# ***********
TODO: decide whether this page is necessary since schemas.md contains most of the same information
in easier terms
---
layout: default
title: UBI data flow
parent: User behavior insights
has_children: false
nav_order: 7
---
# ***********
{% endcomment %}

# Basic UBI flow
**Executive Summary**: Once a user performs a search, that search is tied to a `query_id`. Any subsequent user events, until the next search, are logged and indexed under that search's `query_id`. If the user finds something of interest, that item's identifier (`object_id` or `key_value`) is logged in the event store with the `query_id`.
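
The flow summarized above can be sketched in a few lines of JavaScript. This is a minimal in-memory illustration, not the plugin's API: `createTracker`, `search`, `logEvent`, and the `query_text` field are hypothetical names, and the ID format is invented for the example.

```js
// Illustrative only: a tiny in-memory stand-in for the UBI query and event stores.
// createTracker, search, and logEvent are hypothetical names, not part of UBI.
function createTracker(clientId) {
  let counter = 0;
  let currentQueryId = null;
  const queries = [];
  const events = [];

  return {
    // Each new search starts a new query_id.
    search(queryText, objectIds) {
      counter += 1;
      currentQueryId = `${clientId}-q${counter}`; // made-up ID format
      queries.push({
        query_id: currentQueryId,
        client_id: clientId,
        query_text: queryText, // hypothetical field name
        query_response_objects_ids: objectIds,
      });
      return currentQueryId;
    },
    // Every later event is logged under the most recent query_id.
    logEvent(actionName, objectId) {
      events.push({
        query_id: currentQueryId,
        client_id: clientId,
        action_name: actionName,
        event_attributes: { object: { object_id: objectId } },
      });
    },
    queries,
    events,
  };
}

const tracker = createTracker("client-1");
const firstQuery = tracker.search("red t-shirt", ["sku-1", "sku-2"]);
tracker.logEvent("click", "sku-2");
const secondQuery = tracker.search("blue jeans", ["sku-9"]);
tracker.logEvent("add_to_cart", "sku-9");
```

In this sketch the `click` event carries the first search's `query_id` and the `add_to_cart` event carries the second's, which is the link the summary describes.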
110 changes: 52 additions & 58 deletions _search-plugins/ubi/schemas.md
@@ -21,12 +21,6 @@ UBI is not functional unless the links between the following fields are consiste

To summarize: the `query_id` signals the beginning of a `client_id`'s *Search Journey* every time a user queries the search index, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with.

{% comment %}
### *************************
# TODO: rework this section with new parameter passing framework
### *************************
{% endcomment %}

## Important UBI roles
- **Search Client**: in charge of searching, and then receiving *objects* from some document index in OpenSearch.
(1, 2, *5* and 7, in following sections)
@@ -134,8 +128,8 @@ Since UBI manages the **UBI Queries** store, the developer should never have to
- `query_id` (events and queries)
&ensp; A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times generates a different `query_id` each time.

- `client_id` (events)
&ensp; A user/client ID provided by the client application
- `client_id` (events and queries)
&ensp; A user/client ID provided by the client application

- `query_response_objects_ids` (queries)
&ensp; This is an array of the `object_id`'s. This *could* be the same id as the `_id` but is meant to be the externally valid id of document/item/product.
@@ -145,85 +139,85 @@ Since UBI manages the **UBI Queries** store, the developer should never have to
### 2) **UBI events**
This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information.
Since this schema is dynamic, the developer can add any new fields and structures (such as *user* information, *geo-location* information) at index time that are not in the current **UBI Events** [schema](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources/events-mapping.json):

<p id="application"> </p>

- `application`
<p id="application">

&ensp; (size 100) - name of the application tracking UBI events (e.g. `amazon-shop`, `ABC-microservice`)

<p id="action_name"> </p>

- `action_name`
<p id="action_name">

&ensp; (size 100) - any name you want to call your event such as `click`, `watch`, `purchase`, and `add_to_cart`, but one could map these to any common *JavaScript* events, or debugging events.
_TODO: How to formalize? A list of standard ones and then custom ones._
&ensp; _TODO: How to formalize? A list of standard ones and then custom ones._

<p id="query_id"> </p>

- `query_id`
<p id="query_id">
&ensp; (size 100) - ID for some query.
&ensp;Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**.

&ensp; (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**.

The `client_id` must be consistent in both the **UBI Queries** and **UBI Events** stores.
<p id="client_id"> </p>

- `client_id`
&ensp; A user/client ID provided by the client application
&ensp;The `client_id` must be consistent in both the **UBI Queries** and **UBI Events** indexes.

- `timestamp`:
&ensp; UTC-based, UNIX epoch time.

- `message_type`

&ensp; (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything else useful such as `QUERY` or `CONVERSION`.
Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_
&ensp; Can be used to group `action_name` together in logical bins.

- `message`

&ensp; (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect an informational or debug type text for this field, but with a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching on.

`event_attributes` has dynamic mapping, meaning if events are indexed with many custom fields, the index could bloat quickly with many new fields.
{: .warning}

- `event_attributes`'s structure that describes any important context about the event. It has two primary structures, `position` and `object`, and is extensible: any other relevant custom information about the event can be stored, such as timing information or individual user or session information.
- `event_attributes`'s
&ensp;structure that describes any important context about the event. It has two primary structures, `position` and `object`, and is extensible: any other relevant custom information about the event can be stored, such as timing information or individual user or session information.

&ensp; Since this has a dynamic mapping, the index _could_ become bloated with many new fields
{: .warning}

The two primary structures in the `event_attributes`:
- **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, ....
- **`event_attributes.position`**
&ensp; structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, ....


- `event_attributes.position.ordinal`

&ensp; tracks the *n*th item within a list that a user could select, click (i.e. selecting the 3rd element could be event{`onClick, results[4]`})
&ensp; tracks the *n*th item within a list that a user could select, click (i.e. selecting the 3rd element could be event{`onClick, results[4]`})


- `event_attributes.position.{x,y}`

&ensp; tracks x and y values, that the client defines
- `event_attributes.position.{x,y}`
&ensp; tracks x and y values, that the client defines

- `event_attributes.position.page_depth`

&ensp; tracks page depth of results
- `event_attributes.position.page_depth`
&ensp; tracks page depth of results

- `event_attributes.position.scroll_depth`

&ensp; tracks scroll depth of page results
- `event_attributes.position.scroll_depth`
&ensp; tracks scroll depth of page results

- `event_attributes.position.trail`

&ensp; text field for tracking the path/trail that a user took to get to this location
- `event_attributes.position.trail`
&ensp; text field for tracking the path/trail that a user took to get to this location

<p id="object_id">

- **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post).
The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object:

- `event_attributes.object.internal_id` is a unique id that OpenSearch can use internally to index the object, similar to the `_id` field in the indexes.
- `event_attributes.object.object_id`
&ensp; is the id that a user could look up and find the object instance within the **document corpus**. Examples include: `ssn`, `isbn`, `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`.
Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id`

- `event_attributes.object.object_id_field`

&ensp; indicates the type/class of object _and_ the ID field of the search index.

- `event_attributes.object.description`

&ensp; optional description of the object

<p id="object_id">

- `event_attributes.object.object_detail`

&ensp; optional text for further data object details

- *extensible fields*: any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use-case.
{: .warning}
- `event_attributes.object.object_id`
&ensp; is the id that a user could look up and find the object instance within the **document corpus**. Examples include: `ssn`, `isbn`, `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`.
Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id`
</p>

- `event_attributes.object.object_id_field`
&ensp; indicates the type/class of object _and_ the ID field of the search index.

- `event_attributes.object.description`
&ensp; optional description of the object

- `event_attributes.object.object_detail`
&ensp; optional text for further data object details

- *extensible fields*:
&ensp;any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use-case.
{: .warning}
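
Putting the fields above together, an event document might be assembled as follows. This is a sketch under assumptions: `buildUbiEvent` is a hypothetical helper, the sample values are invented, and the timestamp is assumed to be epoch milliseconds because the schema text only says UNIX epoch time.

```js
// Hypothetical helper: assembles one UBI event using the field names listed above.
function buildUbiEvent(fields) {
  return {
    application: fields.application,
    action_name: fields.actionName,
    query_id: fields.queryId,
    client_id: fields.clientId,
    // Assumption: epoch milliseconds; the schema text only says "UNIX epoch time".
    timestamp: fields.timestamp ?? Date.now(),
    message_type: fields.messageType ?? "INFO",
    message: fields.message,
    event_attributes: {
      position: { ordinal: fields.ordinal },
      object: {
        object_id: fields.objectId,
        object_id_field: fields.objectIdField,
      },
    },
  };
}

// Invented sample values, for illustration only.
const sampleEvent = buildUbiEvent({
  application: "ABC-microservice",
  actionName: "add_to_cart",
  queryId: "1065c70f-d46a-442f-8ce4-0b5e7a71a892",
  clientId: "client-1",
  objectId: "9780262033848", // an externally valid id, here an ISBN
  objectIdField: "isbn",
  ordinal: 3,
  message: "added item to cart",
  timestamp: 1717113600000, // fixed value so the example is reproducible
});
console.log(JSON.stringify(sampleEvent, null, 2));
```

Keeping `object_id` and `query_id` at fixed paths is what lets analytic queries join events back to the queries that produced them. Any extra keys added under `event_attributes` would expand the dynamic mapping, as the warning above notes.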
8 changes: 4 additions & 4 deletions _search-plugins/ubi/sql-queries.md
@@ -16,7 +16,7 @@ Although it's trivial on the server side to find queries with no results, we can
```sql
select
count(0)
from .ubi_log_queries
from ubi_queries
where query_response_objects_ids is null
order by user_id
```
@@ -38,7 +38,7 @@ Both client and server-side queries should return the same number.
```sql
select
message, count(0) Total
from .ubi_log_events
from ubi_events
where
action_name='on_search'
group by message
@@ -76,7 +76,7 @@ To make a pie chart like widget on all the most common events:
```sql
select
action_name, count(0) Total
from .ubi_log_events
from ubi_events
group by action_name
order by Total desc
```
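
The aggregation in the query above can also be sketched client-side over an array of event objects, for example when checking a small export before building the widget. `countByAction` and the sample data are hypothetical and only mirror the `group by` / `order by` logic.

```js
// Hypothetical helper: counts events per action_name, most frequent first.
function countByAction(events) {
  const totals = new Map();
  for (const event of events) {
    totals.set(event.action_name, (totals.get(event.action_name) || 0) + 1);
  }
  return [...totals.entries()]
    .map(([action_name, Total]) => ({ action_name, Total }))
    .sort((a, b) => b.Total - a.Total);
}

// Invented sample events, for illustration only.
const sampleEvents = [
  { action_name: "on_search" },
  { action_name: "click" },
  { action_name: "on_search" },
  { action_name: "logout" },
  { action_name: "click" },
  { action_name: "on_search" },
];
const actionTotals = countByAction(sampleEvents);
console.log(actionTotals);
```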
@@ -119,7 +119,7 @@ logout|408
Find a search in the query log:
```sql
select *
from .ubi_log_queries
from ubi_queries
where query_id ='1065c70f-d46a-442f-8ce4-0b5e7a71a892'
order by timestamp
```
7 changes: 7 additions & 0 deletions _search-plugins/ubi/ubi-dashboard-tutorial.md
@@ -6,6 +6,13 @@ has_children: false
nav_order: 7
---

{% comment %}
# **************
TODO: update images for new indices
# **************
{% endcomment %}


# Build an analytic dashboard for UBI
Whether you've been collecting user events and queries for a while, or [you uploaded some sample events](https://github.com/o19s/chorus-opensearch-edition/blob/main/katas/003_import_preexisting_event_data.md), now you're ready to visualize them in the dashboard using User Behavior Insights (UBI).


