Querying
This page explores all of the various facets of Google Analytics queries: metrics, dimensions, filters, segments, sorting, date ranges and report granularity. (Python API reference documentation is also available.) But before we can query, we need to figure out which profile we want to request data from.
Once you've authenticated, you'll have access to one or more accounts. Each account can have multiple web properties (each web property has its own tracking code). Finally, each web property can have one or more profiles, also known as views in some places in the Google Analytics GUI.
You can browse through your accounts, webproperties and profiles in the Google Analytics web interface, but you can also explore your account in a Python REPL:
```python
>>> accounts = ga.authenticate()
>>> accounts
[<googleanalytics.account.Account object: debrouwere.org (12933299)>,
...]
>>> account = accounts['debrouwere.org']
>>> account.webproperties
[<googleanalytics.account.WebProperty object: http://debrouwere.org (UA-12933299-1)>,
...]
>>> webproperty = account.webproperties['http://debrouwere.org']
>>> webproperty.profiles
[<googleanalytics.account.Profile object: debrouwere.org (26206906)>]
```
The default profile is available under `WebProperty#profile` (as opposed to `WebProperty#profiles`). When working with the API, you will usually want to work with a profile that has no automatic filters applied to it, or as few as possible. Then, just add whatever filters you'd like to your query.
It is also possible to navigate to a profile during authentication:
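```python
import googleanalytics as ga

# navigate straight to a specific profile
profile = ga.authenticate(
    account='debrouwere.org',
    webproperty='http://debrouwere.org',
    profile='debrouwere.org',
)

# omit `profile` to get the web property's default profile
profile = ga.authenticate(
    account='debrouwere.org',
    webproperty='http://debrouwere.org',
)
```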
If you don't specify a profile, the default profile will be used. (Note that there's always a default profile, but no similar concept of a default account or webproperty exists. You will always have to specify an account and a web property.)
Metrics and dimensions can be specified using the internal ID, the slug or the (case-insensitive) human-readable name. These all work:
type | example |
---|---|
id | ga:goalCompletionsAll |
slug | goalCompletionsAll |
python slug | goal_completions_all |
case-insensitive slug | goalcompletionsall |
human-readable name | Goal Completions |
case-insensitive name | GOAL completions |
```python
assert profile.core.metrics['pageviews'] == profile.core.metrics['ga:pageviews']
```
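To make the equivalences in the table concrete, here is a minimal sketch of how such lookups could work: every column is indexed under a normalized form of both its ID and its human-readable name. The `normalize` and `build_index` helpers and the tiny two-column catalog are hypothetical illustrations, not the library's actual implementation.

```python
import re

def normalize(name):
    """Reduce an ID, slug or human-readable name to a comparison key (hypothetical)."""
    name = re.sub(r'^(ga|rt):', '', name)      # drop the API prefix
    return re.sub(r'[\s_]', '', name).lower()  # ignore spaces, underscores and case

def build_index(columns):
    """Map normalized IDs and names to canonical IDs (hypothetical)."""
    index = {}
    for column_id, human_name in columns:
        index[normalize(column_id)] = column_id
        index[normalize(human_name)] = column_id
    return index

# a tiny, illustrative catalog of (ID, human-readable name) pairs
index = build_index([
    ('ga:goalCompletionsAll', 'Goal Completions'),
    ('ga:pageviews', 'Pageviews'),
])

for spelling in ['ga:goalCompletionsAll', 'goalCompletionsAll', 'goal_completions_all',
                 'goalcompletionsall', 'Goal Completions', 'GOAL completions']:
    print(spelling, '->', index[normalize(spelling)])
```

Indexing both the ID and the display name is what lets `Goal Completions` resolve even though it does not share a slug with `goalCompletionsAll`.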
If you're not quite sure about the exact name of the metric or dimension you're interested in, take a look at the Dimensions and Metrics Reference or the Query Explorer.
To see which metrics and dimensions are available to you in both the Core and the Real-Time API from Python:
```python
print(profile.core.metrics)
print(profile.realtime.metrics)
print(profile.core.dimensions)
print(profile.realtime.dimensions)
```

From the command line:

```shell
googleanalytics columns
# find metrics, dimensions and segments containing `session` or `page`
googleanalytics columns session
googleanalytics columns page
```
Or just take a guess:

```python
>>> profile.core.metrics['goal completion']
KeyError: 'Cannot find goal completion among the available type. Did you mean: <googleanalytics.columns.Core object: Metric, Goal Completions (ga:goalCompletionsAll)>, <googleanalytics.columns.Core object: Metric, Goal 1 Completions (ga:goal1Completions)>, ...'
```
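Suggestions like the ones in that error message can be produced with fuzzy string matching. The sketch below uses the standard library's `difflib.get_close_matches` on lowercased human-readable names; this is a stand-in technique chosen for illustration (the library may match differently), and `suggest` and the short list of names are hypothetical.

```python
import difflib

# illustrative human-readable column names (a real catalog is much larger)
names = ['Goal Completions', 'Goal 1 Completions', 'Pageviews', 'Sessions']

def suggest(guess, names, n=3):
    """Return up to n known names that closely resemble `guess` (hypothetical)."""
    lowered = {name.lower(): name for name in names}
    # matches are ordered from most to least similar
    matches = difflib.get_close_matches(guess.lower(), list(lowered), n=n, cutoff=0.6)
    return [lowered[match] for match in matches]

print(suggest('goal completion', names))
```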
Note: it might be a smart idea to adopt a naming convention throughout your code. `snake_case` is probably the most idiomatic, e.g. prefer `event_category` over `eventCategory` and `ga:eventCategory`. (In a REPL session, though, use whatever you like!)
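If you adopt `snake_case`, a small helper can translate the camelCase IDs you find in Google's documentation. This `to_snake_case` function is a hypothetical convenience, not part of the library, and it ignores edge cases such as runs of consecutive capitals.

```python
import re

def to_snake_case(column_id):
    """Convert e.g. 'ga:eventCategory' to 'event_category' (hypothetical helper)."""
    name = column_id.split(':', 1)[-1]            # drop a 'ga:' or 'rt:' prefix
    name = re.sub(r'(?<!^)(?=[A-Z])', '_', name)  # insert '_' before each capital
    return name.lower()

print(to_snake_case('ga:eventCategory'))
print(to_snake_case('goalCompletionsAll'))
```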
Date ranges can be specified with Python date objects or date strings.
```python
from datetime import date

query.range('2015-01-01', '2015-01-31')
query.range(date(2015, 1, 1), date(2015, 1, 31))
```

```shell
googleanalytics query --start 2015-01-01 --stop 2015-01-31
```
You can specify an explicit start and stop date, or just a single date together with the number of days or months to count forward or backward from that point.

```python
query.range('2015-01-01', days=31)
query.range('2015-01-31', days=-31)
query.range(date(2015, 1, 1), months=1)
```

If you specify only a number of days or months, the date range will end yesterday.

```python
query.range(days=-7)
```

If you specify only a start date, we will query for just that day: the end date is assumed to be the same as the start date.

```python
query.range('2015-01-01')
```
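The rules above can be summarized in a small function that turns the different argument combinations into an explicit `(start, stop)` pair. This is a sketch under assumptions: `resolve_range` is hypothetical, it treats day counts as inclusive of the anchor date (so 31 days from 1 January ends on 31 January), and it is not the library's own code.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def resolve_range(start=None, stop=None, days=0, months=0, today=None):
    """Return an explicit (start, stop) pair (hypothetical).

    ASSUMPTION: day and month counts include the anchor date itself.
    """
    today = today or date.today()
    if isinstance(start, str):
        start = date.fromisoformat(start)
    if isinstance(stop, str):
        stop = date.fromisoformat(stop)
    if start is None:
        # only a count given: end yesterday and count backwards from there
        start = today - timedelta(days=1)
        days, months = -abs(days), -abs(months)
    if stop is None:
        if days:
            step = 1 if days > 0 else -1
            stop = start + timedelta(days=days - step)
        elif months:
            shifted = add_months(start, months)
            stop = shifted - timedelta(days=1) if months > 0 else shifted + timedelta(days=1)
        else:
            stop = start  # a single date means just that day
    return min(start, stop), max(start, stop)

print(resolve_range('2015-01-01', days=31))
print(resolve_range('2015-01-31', days=-31))
print(resolve_range(date(2015, 1, 1), months=1))
print(resolve_range(days=-7, today=date(2015, 2, 1)))
print(resolve_range('2015-01-01'))
```

Passing `today` explicitly keeps the "ends yesterday" behaviour testable; without it the function uses the current date.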
By default, Google will return just one big total for each metric. If you'd like hourly, daily, weekly, monthly or yearly results you need to add some sort of time dimension. The easiest way to do this is through eponymous convenience methods:
```python
query.hourly('2015-01-01', '2015-01-01')
query.yearly('2010-01-01', '2015-12-31')
query.total('2010-01-01', '2015-12-31')
```

```shell
googleanalytics query --interval daily
googleanalytics query --interval monthly
```
These methods work just like a regular `Query#range` specification, but they add the appropriate time dimension so that you get back a separate result for each hour, day, week or whatever granularity you asked for.

If you'd like to make it explicit that you're asking for just one big rollup, `Query#total` is synonymous with `Query#range`.
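Conceptually, each convenience method is a date range plus one extra time dimension. The sketch below shows that idea on a plain dictionary. The mapping uses real Core Reporting API dimensions (`ga:dateHour`, `ga:date`, `ga:yearWeek`, `ga:yearMonth`, `ga:year`), but which ones the library actually picks is an assumption, and `with_interval` is hypothetical.

```python
# ASSUMED mapping from interval names to Core Reporting API time dimensions
TIME_DIMENSIONS = {
    'hourly': 'ga:dateHour',
    'daily': 'ga:date',
    'weekly': 'ga:yearWeek',
    'monthly': 'ga:yearMonth',
    'yearly': 'ga:year',
    'total': None,  # no time dimension: one big rollup
}

def with_interval(raw_query, interval, start, stop):
    """Return a copy of a raw query dict with a date range and time dimension (hypothetical)."""
    query = dict(raw_query, start_date=start, end_date=stop)
    dimension = TIME_DIMENSIONS[interval]
    if dimension:
        # prepend the time dimension to any dimensions already requested
        query['dimensions'] = [dimension] + list(raw_query.get('dimensions', []))
    return query

base = {'metrics': ['ga:pageviews']}
print(with_interval(base, 'yearly', '2010-01-01', '2015-12-31'))
print(with_interval(base, 'total', '2010-01-01', '2015-12-31'))
```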
Note: depending on the size of your audience, Google Analytics data can be a couple of hours behind. In addition, today's data never covers a full 24 hours, so avoid unfair comparisons between today and previous days. When scheduling programmatic roll-ups, don't run them at midnight; 6 AM is a better bet.
Note: monthly intervals are generally not recommended, as not all months have the same number of days.
```python
# return a top 10
query.sort('pageviews', descending=True).limit(10)
# return the next 10 (similar to how LIMIT works in SQL)
query.limit(10, 10)
```
Note: Google Analytics uses 1-indexed rows. The first row is not row 0 but row 1.
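Because rows are 1-indexed, paging through results means converting a 0-based page number into Google's `start-index`. A minimal sketch: `page_params` is a hypothetical helper, and its keys follow the Python client's spelling (`start_index`, `max_results`) of the API's `start-index` and `max-results` parameters.

```python
def page_params(page, page_size):
    """Raw query parameters for a 0-based page of results (hypothetical helper).

    Google Analytics numbers rows from 1, so page 0 starts at row 1,
    page 1 at row page_size + 1, and so on.
    """
    return {'start_index': page * page_size + 1, 'max_results': page_size}

for page in range(3):
    print(page, page_params(page, 10))
```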
Filters are applied at the event level (each individual pageview) whereas segments are applied later in the querying process and help you limit the data to only a certain kind of user or visit.
```python
# limit pageviews count to just a part of your site
query.filter(pagepathlevel1='/stories/')
# don't include traffic to the about page
query.filter(pagepath__ne='/about')
# return only information for mobile users
query \
    .metrics('pageviews', 'session duration') \
    .segment('mobile traffic')
```

```shell
googleanalytics query --filter pagepathlevel1=/stories/
googleanalytics query --filter pagepath__ne=/about
googleanalytics query "pageviews,session duration" --segment "mobile traffic"
```
Filter operators for metrics:

operator | description | example |
---|---|---|
eq | Equals | `query.filter(time_on_page=10)` or `query.filter(time_on_page__eq=10)` |
neq | Does not equal | `query.filter(time_on_page__neq=10)` |
gt | Greater than | `query.filter(time_on_page__gt=10)` |
lt | Less than | `query.filter(time_on_page__lt=10)` |
gte | Greater than or equal to | `query.filter(time_on_page__gte=10)` |
lte | Less than or equal to | `query.filter(time_on_page__lte=10)` |

Filter operators for dimensions:

operator | description | example |
---|---|---|
eq | Exact match | |
neq | Does not match | |
contains | Contains substring | |
ncontains | Does not contain substring | |
re | Contains a match for the regular expression | |
nre | Does not contain a match for the regular expression | |

Segment operators:

operator | description | example |
---|---|---|
eq | Equal to or exact match | |
neq | Not equal to or not an exact match | |
lt | Less than | |
lte | Less than or equal to | |
gt | Greater than | |
gte | Greater than or equal to | |
between | Value is within the given range | |
any | Value is one of the listed values | |
contains | Contains substring | |
ncontains | Does not contain substring | |
re | Contains a match for the regular expression | |
nre | Does not contain a match for the regular expression | |
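In the Core Reporting API itself, filters are strings such as `ga:pagePath!=/about`, with `;` meaning AND. The sketch below translates keyword arguments like `pagepath__ne` into that syntax. The operator symbols are the API's own; `to_filter_expression`, the hand-written slug-to-ID dict, and accepting both `ne` and `neq` spellings are assumptions for illustration. Escaping of reserved characters and the segment-only `between`/`any` operators are left out.

```python
# Core Reporting API (v3) filter operators, keyed by the suffixes used above
OPERATORS = {
    'eq': '==', 'neq': '!=', 'ne': '!=',
    'gt': '>', 'lt': '<', 'gte': '>=', 'lte': '<=',
    'contains': '=@', 'ncontains': '!@',
    're': '=~', 'nre': '!~',
}

def to_filter_expression(column_ids, **filters):
    """Build a GA filter string from keyword filters (hypothetical helper).

    `column_ids` maps slugs without underscores to API IDs;
    several filters are ANDed together with ';'.
    """
    parts = []
    for key, value in filters.items():
        name, _, op = key.partition('__')  # 'pagepath__ne' -> ('pagepath', 'ne')
        column = column_ids[name.replace('_', '')]
        parts.append(column + OPERATORS[op or 'eq'] + str(value))
    return ';'.join(parts)

ids = {'pagepath': 'ga:pagePath', 'pagepathlevel1': 'ga:pagePathLevel1', 'timeonpage': 'ga:timeOnPage'}
print(to_filter_expression(ids, pagepathlevel1='/stories/'))
print(to_filter_expression(ids, pagepath__ne='/about', time_on_page__gt=10))
```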
For queries that should run faster, you may specify a lower precision, and for those that need to be more precise, a higher precision:
```python
# faster queries
query.precision(0)
query.precision('FASTER')
# queries with the default level of precision (usually what you want)
query
query.precision(1)
query.precision('DEFAULT')
# queries that are more precise
query.precision(2)
query.precision('HIGHER_PRECISION')
```

```shell
googleanalytics query --precision 2
```
Unless you are absolutely sure you want or need this, don't bother setting a precision. The default precision is usually plenty fast, and (usually only marginally) higher precision is almost never worth the huge increase in query time.
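The numeric and named precision forms are interchangeable. One way to normalize them is a lookup into the API's `samplingLevel` values (`FASTER`, `DEFAULT`, `HIGHER_PRECISION`); `sampling_level` is a hypothetical helper, not part of the library.

```python
SAMPLING_LEVELS = ['FASTER', 'DEFAULT', 'HIGHER_PRECISION']

def sampling_level(precision=1):
    """Map 0/1/2 or a level name to a samplingLevel value (hypothetical helper)."""
    if isinstance(precision, int):
        if not 0 <= precision < len(SAMPLING_LEVELS):
            raise ValueError('precision must be 0, 1 or 2, got {!r}'.format(precision))
        return SAMPLING_LEVELS[precision]
    level = precision.upper()
    if level not in SAMPLING_LEVELS:
        raise ValueError('unknown precision: {!r}'.format(precision))
    return level

print(sampling_level(0), sampling_level(), sampling_level('higher_precision'))
```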
In some cases, it can be useful to construct a query directly, without resorting to the convenience methods on the `Query` object. Lower-level access is provided through the `query.set` method: you can pass `set` either a key and a value, a dictionary of key-value pairs, or keyword arguments. These will then be added to the raw query dictionary.
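```python
# wrapping the chain in parentheses allows comments between the calls
# (a backslash continuation followed by a comment line is a SyntaxError)
query = (
    profile.core.query
    # keyword arguments
    .set(metrics=['ga:pageviews'])
    .set(dimensions=['ga:yearMonth'])
    # key and value
    .set('start_date', '2014-07-01')
    # dictionary of keys and values
    .set({'end_date': '2014-07-05'})
)
```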
You can always check what the raw query is going to be with the `build` method on queries. You can also access the raw query as well as raw report data in `query.raw` and `report.raw` respectively.
```python
from pprint import pprint

print(query.build())
pprint(query.raw)

report = query.get()
pprint(report.raw)
```
If you'd like to use only the simplified OAuth2 functionality, that's possible too, using Google's `service` interface on the `Account` object.
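```python
import googleanalytics as ga

accounts = ga.authenticate()
raw_query = {
    'ids': 'ga:26206906',
    # the API expects comma-separated strings rather than lists
    'metrics': 'ga:pageviews',
    'dimensions': 'ga:yearMonth',
    'start_date': '2014-07-01',
    'end_date': '2014-07-05',
}
# Google's client takes query parameters as keyword arguments
accounts[0].service.data().ga().get(**raw_query).execute()
```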
You'll find more information about this interface in Google's own Analytics documentation for Python.
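Because Google's client takes plain keyword arguments with comma-separated strings, a dictionary written with lists (as in the `query.set` example earlier) needs flattening before it can be passed along. A minimal sketch; `to_api_kwargs` is a hypothetical helper, and its result would be passed as `service.data().ga().get(**kwargs)`.

```python
def to_api_kwargs(raw_query):
    """Flatten list values into the comma-separated strings the API expects (hypothetical)."""
    kwargs = {}
    for key, value in raw_query.items():
        if isinstance(value, (list, tuple)):
            value = ','.join(value)
        kwargs[key] = value
    return kwargs

raw_query = {
    'ids': 'ga:26206906',
    'metrics': ['ga:pageviews', 'ga:sessions'],
    'dimensions': ['ga:yearMonth'],
    'start_date': '2014-07-01',
    'end_date': '2014-07-05',
}
print(to_api_kwargs(raw_query))
```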
The Real Time Reporting API is currently in closed beta. However, you can request access by filling out a short form and will generally be granted access to the API within 24 hours.
The Real Time API is very similar to the Core API:
```python
import googleanalytics

accounts = googleanalytics.authenticate(identity='me')
profile = accounts[0].webproperties[0].profiles[0]
# Core API
profile.core.query('pageviews').daily('3daysAgo').values
# Real Time API
profile.realtime.query('pageviews', 'minutes ago').values
```

```shell
googleanalytics query pageviews --realtime
```
The only caveat is that not all of the metrics and dimensions you're used to from the Core API are supported. Take a look at the Real Time Reporting API reference documentation to find out more, or browse all available columns interactively through `Profile#realtime.metrics` and `Profile#realtime.dimensions` in Python.
You can also describe many queries in a YAML file such as `query.yml` and run them all at once:
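```python
import googleanalytics as ga
import yaml

# yaml.safe_load avoids the Loader argument that yaml.load requires in PyYAML 6+
with open('./examples/query.yml') as f:
    blueprint = ga.Blueprint(yaml.safe_load(f))

profile = ga.authenticate(**blueprint.scope)
queries = blueprint.queries(profile)
```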