Sample Queries

Table of Contents

Intro

Here are some sample queries to get you started.

All of these queries can be run against the Akamai mPulse RUM dataset.

NOTE: It is recommended you always use a DATE filter to minimize query size and cost.

Unique Dimension Values

To understand the unique values available for each dimension, you can run a DISTINCT query:

SELECT  DISTINCT DEVICEMODEL
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = '2022-09-01'
ORDER BY DEVICEMODEL ASC

Dimension Popularity

You can calculate the frequency of Dimension values by doing a GROUP BY and calculating the SUM(BEACONS) count for that date.

Ensure you're not just calculating row COUNT(*) for each group, as each row is weighted differently (by BEACONS count).

For example, this shows the counts of each Device Type in the mPulse dataset for 2022-09-01.

(NOTE: Data is sampled so these counts should only be used for relative percentages).

SELECT  DEVICETYPE,
SUM(BEACONS) AS BEACONCOUNT
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = '2022-09-01'
GROUP BY DEVICETYPE
ORDER BY BEACONCOUNT DESC

Dimension Popularity (as a percentage of total)

You can calculate the relative popularity of Dimension values by doing a GROUP BY and calculating the SUM(BEACONS) count for that date, compared to the overall SUM(BEACONS) for all rows.

For example, this shows the relative popularity of Device Types in the mPulse dataset for 2022-09-01:

SELECT  DEVICETYPE,
SUM(BEACONS) AS BEACONCOUNT,
(
SUM(BEACONS) /
(SELECT SUM(BEACONS)
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = '2022-09-01')
) AS BEACONPCT
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = '2022-09-01'
GROUP BY DEVICETYPE
ORDER BY BEACONCOUNT DESC

Page Load Time by Country

Using the PERCENTILE_APPROX function, you can calculate the approximate percentile (e.g. median) for a set of data.

NOTE: When using the PERCENTILE_APPROX function, it may run out of memory when run in BigQuery. It will often do this if you're executing a query over all of the data for one day (e.g. no filters have been applied).

The work around is shown below, to limit the dataset via WHERE filters, or the following sample which uses a subquery to reduce memory pressure.

SELECT  COUNTRY,
SUM(BEACONS) AS BEACONCOUNT,
`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
ARRAY_AGG(PLTHISTOGRAM),
[0.50],
100,
false) as MEDIAN
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = "2022-09-01"
AND BEACONTYPE = 'page view'
AND USERAGENTFAMILY = 'Chrome'
GROUP BY COUNTRY
ORDER BY SUM(BEACONS) DESC

Page Load Time by Country (using a Subquery)

Using the PERCENTILE_APPROX function, you can calculate the approximate percentile (e.g. median) for a set of data.

If you are querying an entire day's worth of data, the PERCENTILE_APPROX may run out of memory in BigQuery.

One workaround is to pre-aggregate the data further by using the COMBINE_HISTOGRAMS function against a column with high cardinality (e.g. COUNTRY or DEVICEMODEL), before issuing the final query against the dimensions you care about.

SELECT  PROTOCOL,
SUM(BEACONS) AS BEACONS,
`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
ARRAY_AGG(PLTHISTOGRAM),
[0.0, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0],
100,
false) AS PERCENTILES
FROM (
SELECT COUNTRY,
PROTOCOL,
SUM(BEACONS) AS BEACONS,
`akamai-mpulse-rumarchive.rumarchive.COMBINE_HISTOGRAMS`(ARRAY_AGG(PLTHISTOGRAM)) AS PLTHISTOGRAM
FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE DATE = '2022-09-01'
GROUP BY COUNTRY, PROTOCOL
) AS subquery
GROUP BY PROTOCOL
ORDER BY COUNT(*) DESC