Skip to content
This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

Solve for multiple 'first sessions' with the same start_tstamp per user #10

Open
bill-warner opened this issue Sep 7, 2021 · 0 comments

Comments

@bill-warner
Copy link

Issue

In users_this_run we join together users_aggregates and users_sessions_this_run on start_tstamp. By joining on start_tstamp we attempt to pull info from the first session per user:

  FROM {{.scratch_schema}}.users_aggregates{{.entropy}} AS b

  INNER JOIN {{.scratch_schema}}.users_sessions_this_run{{.entropy}} AS a
    ON a.domain_userid = b.domain_userid
    AND a.start_tstamp = b.start_tstamp

There are rare cases however where a user can have multiple sessions with the same start_tstamp, which also happens to be their first session. This can result in duplicate domain_userids in the users_this_run table.

Proposed Fix

  • In 03-users-sessions-this-run step, join in start_tstamp from users_userids_this_run giving user_start_tstamp.
  • In 04-users-aggs step, add first_domain_sessionid. Use some arbitary selection if multiple first sessions with same start_tstamp i.e. max(case when start_tstamp = user_start_tstamp then domain_sessionid end) as first_domain_sessionid
  • In 06-users step, join a to b using first_domain_sessionid = domain_session_id, rather than start_tstamp to get first session details.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant