Querying
This page explores the various facets of Google Analytics queries: metrics, dimensions, filters, segments, sorting, date ranges and report granularity. (Python API reference documentation is also available.) But before we can query, we need to figure out which profile to request data from.
Once you've authenticated, you'll have access to one or more accounts. Each account can have multiple web properties (each web property has its own tracking code). Finally, each web property can have one or more profiles, also known as views in some places in the Google Analytics GUI.
You can browse through your accounts, web properties and profiles in the Google Analytics web interface, but you can also explore them in a Python REPL:
>>> accounts = ga.authenticate()
>>> accounts
[<googleanalytics.account.Account object: debrouwere.org (12933299)>,
...]
>>> account = accounts['debrouwere.org']
>>> account.webproperties
[<googleanalytics.account.WebProperty object: http://debrouwere.org (UA-12933299-1)>,
...]
>>> webproperty = account.webproperties['http://debrouwere.org']
>>> webproperty.profiles
[<googleanalytics.account.Profile object: debrouwere.org (26206906)>]
The default profile is available under WebProperty#profile (vs. WebProperty#profiles). When working with the API, you will usually want to work with a profile that has no automatic filters applied to it, or as few as possible. Then, just add whatever filters you'd like to your query.
It is also possible to navigate to a profile during authentication:
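# navigate all the way down to a specific profile
profile = ga.authenticate(
    account='debrouwere.org',
    webproperty='http://debrouwere.org',
    profile='debrouwere.org',
)
# leave out the profile to get the web property's default profile
profile = ga.authenticate(
    account='debrouwere.org',
    webproperty='http://debrouwere.org',
)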
If you don't specify a profile, the default profile will be used. (Note that there's always a default profile, but no similar concept of a default account or webproperty exists. You will always have to specify an account and a web property.)
Metrics and dimensions can be specified using the internal ID, the slug or the (case-insensitive) human-readable name. These all work:
type | example |
---|---|
id | ga:goalCompletionsAll |
slug | goalCompletionsAll |
case-insensitive slug | goalcompletionsall |
human-readable name | Goal Completions |
case-insensitive name | GOAL completions |
assert profile.core.metrics['pageviews'] == profile.core.metrics['ga:pageviews']
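To make the lookup rules in the table concrete, here is a minimal sketch of how such a forgiving lookup could work. It is not the library's actual implementation: `COLUMNS`, `normalize`, `build_index` and `find` are hypothetical names, and the column list is a tiny hand-written sample.

```python
# Hypothetical sample of columns: (internal id, human-readable name).
COLUMNS = [
    ('ga:goalCompletionsAll', 'Goal Completions'),
    ('ga:pageviews', 'Pageviews'),
]


def normalize(key):
    """Lowercase the key and drop the 'ga:' prefix, if any."""
    key = key.strip().lower()
    return key[3:] if key.startswith('ga:') else key


def build_index(columns):
    """Map every accepted spelling of a column to its internal id."""
    index = {}
    for column_id, name in columns:
        index[normalize(column_id)] = column_id  # covers id and slug
        index[normalize(name)] = column_id       # covers the readable name
    return index


INDEX = build_index(COLUMNS)


def find(key):
    """Look up a column by id, slug or name, ignoring case."""
    return INDEX[normalize(key)]


for spelling in ('ga:goalCompletionsAll', 'goalCompletionsAll',
                 'goalcompletionsall', 'Goal Completions', 'GOAL completions'):
    print(spelling, '->', find(spelling))
```

Normalizing both the stored keys and the lookup key is what makes all five spellings from the table land on the same entry.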
If you're not quite sure about the exact name of the metric or dimension you're interested in, take a look at the Dimensions and Metrics reference or the Query Explorer.
To see which metrics and dimensions are available to you in both the Core and the Real-Time API from Python:
print(profile.core.metrics)
print(profile.realtime.metrics)
print(profile.core.dimensions)
print(profile.realtime.dimensions)
Or just take a guess:
>>> metrics['goal completion']
KeyError: 'Cannot find goal completion among the available type. Did you mean: <googleanalytics.columns.Core object: Metric, Goal Completions (ga:goalCompletionsAll)>, <googleanalytics.columns.Core object: Metric, Goal 1 Completions (ga:goal1Completions)>, ...'
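How the library comes up with its suggestions is not covered here, but a "did you mean" helper of this kind can be sketched with the standard library's `difflib`. The `NAMES` list and the `suggest` function are hypothetical and only serve as illustration.

```python
import difflib

# Hypothetical sample of human-readable column names.
NAMES = ['Goal Completions', 'Goal 1 Completions', 'Pageviews', 'Sessions']


def suggest(query, names=NAMES, n=3):
    """Return up to n names that closely resemble the query, ignoring case."""
    lowered = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=0.6)
    return [lowered[match] for match in matches]


print(suggest('goal completion'))
```

`get_close_matches` ranks candidates by similarity and drops anything below the cutoff, so unrelated names such as Pageviews do not show up among the suggestions.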
Date ranges can be specified with Python date objects or date strings.
from datetime import date
query.range('2015-01-01', '2015-01-31')
query.range(date(2015, 1, 1), date(2015, 1, 31))
You can specify an explicit start and stop date, or just a single date together with the number of days or months to count forward or backwards from that point.
query.range('2015-01-01', days=31)
query.range('2015-01-31', days=-31)
query.range(date(2015, 1, 1), months=1)
If you specify only a number of days or months, the date range will end yesterday.
query.range(days=-7)
If you specify only a start date, we will query for just that day: the end date is assumed to be the same as the start date.
query.range('2015-01-01')
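The rules above can be summarized in a small standalone function. This is a sketch of the described behavior, not the library's code: `resolve_range`, `parse` and `add_months` are hypothetical helpers, and it assumes that `days=31` means a range of 31 days including the start date, as the examples suggest.

```python
import calendar
from datetime import date, timedelta


def parse(value):
    """Accept a date object or an ISO 'YYYY-MM-DD' string."""
    if value is None or isinstance(value, date):
        return value
    return date.fromisoformat(value)


def add_months(day, months):
    """Shift a date by whole months, clamping to the end of the month."""
    index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last))


def resolve_range(start=None, stop=None, days=None, months=None, today=None):
    """Turn the documented argument combinations into (start, stop)."""
    start, stop = parse(start), parse(stop)
    if start and stop:
        return start, stop
    # Without a start date, the range is anchored on yesterday.
    yesterday = (today or date.today()) - timedelta(days=1)
    anchor = start or yesterday
    if days:
        # Assumption: the anchor itself counts as one of the days.
        step = days - 1 if days > 0 else days + 1
        other = anchor + timedelta(days=step)
    elif months:
        one_day = timedelta(days=1)
        shifted = add_months(anchor, months)
        other = shifted - one_day if months > 0 else shifted + one_day
    else:
        other = anchor  # a single day
    return tuple(sorted((anchor, other)))


print(resolve_range('2015-01-01', days=31))
print(resolve_range(days=-7, today=date(2015, 2, 1)))
```

Only the combinations shown on this page are handled; anything else (for example a positive offset without a start date) would need its own rule.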
By default, Google will return just one big total for each metric. If you'd like hourly, daily, weekly, monthly or yearly results you need to add some sort of time dimension. The easiest way to do this is through eponymous convenience methods:
query.hourly('2015-01-01', '2015-01-01')
query.yearly('2010-01-01', '2015-12-31')
These methods work just like a regular Query#range specification, but they add the appropriate time dimension so that you get back a separate result for each hour, day, week... whatever the granularity you asked for.
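As a rough illustration of what "adding the appropriate time dimension" means, the sketch below appends a time dimension to a raw query dictionary. The dimension names are real Core Reporting API dimensions, but which one the library picks for each granularity is an assumption here, and `TIME_DIMENSIONS` and `with_granularity` are hypothetical names.

```python
# Assumed mapping from granularity to a Core Reporting API time dimension.
TIME_DIMENSIONS = {
    'hourly': 'ga:dateHour',
    'daily': 'ga:date',
    'weekly': 'ga:yearWeek',
    'monthly': 'ga:yearMonth',
    'yearly': 'ga:year',
}


def with_granularity(raw_query, granularity):
    """Return a copy of the raw query with the time dimension added."""
    dimension = TIME_DIMENSIONS[granularity]
    dimensions = list(raw_query.get('dimensions', []))
    if dimension not in dimensions:
        dimensions.append(dimension)
    return {**raw_query, 'dimensions': dimensions}


total = {'metrics': ['ga:pageviews']}
print(with_granularity(total, 'daily'))
```

Without a time dimension the API returns one total per metric; with it, every distinct value of the dimension gets its own row.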
Note: depending on the size of your audience, Google Analytics data can be a couple of hours behind. In addition, today's data never covers a full 24 hours, so avoid comparing today directly with previous days. When doing programmatic roll-ups, don't schedule them for midnight; 6 AM is a better bet.
# return a top 10
query.sort('pageviews', descending=True).limit(10)
# return the next 10 (similar to how LIMIT works in SQL)
query.limit(10, 10)
Note: Google Analytics uses 1-indexed rows. The first row is not row 0 but row 1.
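Because rows are 1-indexed, paging arithmetic is easy to get off by one. The sketch below computes the first row and row count for a given page; `page_window` is a hypothetical helper, and the result corresponds to the `start-index` and `max-results` parameters of the Core Reporting API rather than to any particular method of this library.

```python
def page_window(page, size=10):
    """Return (start_index, max_results) for a 1-indexed page of rows."""
    if page < 1 or size < 1:
        raise ValueError('page and size must both be at least 1')
    start_index = (page - 1) * size + 1  # row 1 is the first row
    return start_index, size


for page in (1, 2, 3):
    print(page, page_window(page))
```

With a page size of 10, the second page starts at row 11, not at row 10.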
Filters are applied at the event level (each individual pageview) whereas segments are applied later in the querying process and help you limit the data to only a certain kind of user or visit.
# limit pageviews count to just a part of your site
query.filter(pagepathlevel1='/stories')
# don't include traffic to the about page
query.filter(pagepath__ne='/about')
# return only information for mobile users
query \
    .metrics('pageviews', 'session duration') \
    .segment('mobile traffic')
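Under the hood, keyword filters like these have to end up as a Core Reporting API filter expression such as `ga:pagePath!=/about`. The sketch below shows one way that translation could look. It is an illustration only: apart from `ne`, which appears above, the operator suffixes are assumptions, the column table is a two-entry sample, and escaping of special characters in values is left out.

```python
# Suffix -> filter operator. Only 'ne' is taken from the examples above;
# the other suffixes are assumed for the sake of illustration.
OPERATORS = {'eq': '==', 'ne': '!=', 'contains': '=@', 're': '=~'}

# Hypothetical sample of lowercase names and their dimension ids.
COLUMNS = {
    'pagepath': 'ga:pagePath',
    'pagepathlevel1': 'ga:pagePathLevel1',
}


def to_filter_expression(**conditions):
    """Translate name__operator=value keywords into a filter string."""
    clauses = []
    for key, value in conditions.items():
        name, _, suffix = key.partition('__')
        operator = OPERATORS[suffix or 'eq']  # no suffix means equality
        clauses.append(f'{COLUMNS[name.lower()]}{operator}{value}')
    return ';'.join(clauses)  # ';' combines clauses with AND


print(to_filter_expression(pagepathlevel1='/stories'))
print(to_filter_expression(pagepath__ne='/about'))
```

In the API's filter syntax a semicolon means AND and a comma means OR, which is why multiple keyword conditions are joined with `;` here.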
For queries that should run faster, you may specify a lower precision, and for those that need to be more precise, a higher precision:
# faster queries
query.range('2014-01-01', '2014-01-31', precision=0)
query.range('2014-01-01', '2014-01-31', precision='FASTER')
# queries with the default level of precision (usually what you want)
query.range('2014-01-01', '2014-01-31')
query.range('2014-01-01', '2014-01-31', precision=1)
query.range('2014-01-01', '2014-01-31', precision='DEFAULT')
# queries that are more precise
query.range('2014-01-01', '2014-01-31', precision=2)
query.range('2014-01-01', '2014-01-31', precision='HIGHER_PRECISION')
Unless you are absolutely sure you want or need this, don't bother setting a precision. The default precision is usually plenty fast, and the (usually marginal) gain from a higher precision is almost never worth the large increase in query time.
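The numeric and string forms of the precision shown above are interchangeable. A small sketch of how both could be normalized to the API's `samplingLevel` values follows; `sampling_level` is a hypothetical helper, not part of the library.

```python
# Index 0, 1 and 2 match the numeric precisions used above.
SAMPLING_LEVELS = ('FASTER', 'DEFAULT', 'HIGHER_PRECISION')


def sampling_level(precision=1):
    """Normalize a numeric or string precision to a samplingLevel value."""
    if isinstance(precision, int):
        if not 0 <= precision < len(SAMPLING_LEVELS):
            raise ValueError(f'precision must be 0, 1 or 2, not {precision}')
        return SAMPLING_LEVELS[precision]
    level = str(precision).upper()
    if level not in SAMPLING_LEVELS:
        raise ValueError(f'unknown precision: {precision}')
    return level


print(sampling_level(0), sampling_level(), sampling_level('higher_precision'))
```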
In some cases, it can be useful to construct a query directly, without resorting to the convenience methods on the Query object. Lower-level access is provided through the query.set method: you can pass it a key and a value, a dictionary of key-value pairs, or keyword arguments. These will then be added to the raw query dictionary.
query = profile.core.query() \
    .set(metrics=['ga:pageviews']) \
    .set(dimensions=['ga:yearMonth']) \
    .set('start_date', '2014-07-01') \
    .set({'end_date': '2014-07-05'})
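To show how a single method can accept all three calling styles, here is a minimal standalone sketch. `RawQuery` is a hypothetical stand-in, not the library's Query class, and returning a new object from every call is an assumption suggested by the chaining in the example.

```python
class RawQuery:
    """Tiny stand-in for a query that only tracks its raw dictionary."""

    def __init__(self, raw=None):
        self.raw = dict(raw or {})

    def set(self, key=None, value=None, **kwargs):
        """Accept a key and value, a dictionary, or keyword arguments."""
        if isinstance(key, dict):
            updates = dict(key)
        elif key is not None:
            updates = {key: value}
        else:
            updates = {}
        updates.update(kwargs)
        # Return a new object so that calls can be chained safely.
        return RawQuery({**self.raw, **updates})


query = RawQuery() \
    .set(metrics=['ga:pageviews']) \
    .set('start_date', '2014-07-01') \
    .set({'end_date': '2014-07-05'})
print(query.raw)
```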
You can always check what the raw query is going to be with the build method on queries. You can also access the raw query as well as raw report data in query.raw and report.raw respectively.
print(query.build())
from pprint import pprint
pprint(query.raw)
report = query.get()
pprint(report.raw)
If you'd like to just use the simplified OAuth2 functionality, that's possible too, using Google's service interface on the Account object.
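accounts = ga.authenticate()
# Google's API expects metrics and dimensions as comma-separated strings
raw_query = {
    'ids': 'ga:26206906',
    'metrics': 'ga:pageviews',
    'dimensions': 'ga:yearMonth',
    'start_date': '2014-07-01',
    'end_date': '2014-07-05',
}
# Google's client only takes keyword arguments, so unpack the dictionary
accounts[0].service.data().ga().get(**raw_query).execute()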
You'll find more information about this interface in Google's own Analytics documentation for Python.
The Real Time Reporting API is currently in closed beta. However, you can request access by filling out a short form and will generally be granted access to the API within 24 hours.
The Real Time API is very similar to the Core API:
import googleanalytics
accounts = googleanalytics.authenticate(identity='me')
profile = accounts[0].webproperties[0].profiles[0]
# Core API
profile.core.query('pageviews').daily('3daysAgo').values
# Real Time API
profile.realtime.query('pageviews', 'minutes ago').values
The only caveat is that not all of the metrics and dimensions you're used to from the Core are supported. Take a look at the Real Time Reporting API reference documentation to find out more, or check out all available columns interactively through Profile#realtime.metrics and Profile#realtime.dimensions in Python.