Sample Queries
Table of Contents
Intro
Here are some sample queries to get you started.
All of these queries can be run against the Akamai mPulse RUM dataset.
NOTE: It is recommended you always use a DATE filter to minimize query size and cost.
Unique Dimension Values
To understand the unique values available for each dimension, you can run a DISTINCT query:
SELECT  DISTINCT DEVICEMODEL
FROM    `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE   DATE = '2022-09-01'
ORDER BY DEVICEMODEL ASCDimension Popularity
You can calculate the frequency of Dimension values by doing a GROUP BY and calculating the SUM(BEACONS) count for that date.
Ensure you're not just calculating row COUNT(*) for each group, as each row is weighted differently (by BEACONS count).
For example, this shows the counts of each Device Type in the mPulse dataset for 2022-09-01.
(NOTE: Data is sampled so these counts should only be used for relative percentages).
SELECT  DEVICETYPE,
        SUM(BEACONS) AS BEACONCOUNT
FROM    `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE   DATE = '2022-09-01'
GROUP BY DEVICETYPE
ORDER BY BEACONCOUNT DESCDimension Popularity (as a percentage of total)
You can calculate the relative popularity of Dimension values by doing a GROUP BY and calculating the SUM(BEACONS) count for that date, compared to the overall SUM(BEACONS) for all rows.
For example, this shows the relative popularity of Device Types in the mPulse dataset for 2022-09-01:
SELECT  DEVICETYPE,
        SUM(BEACONS) AS BEACONCOUNT,
        (
          SUM(BEACONS) /
            (SELECT SUM(BEACONS)
             FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
             WHERE DATE = '2022-09-01')
        ) AS BEACONPCT
FROM    `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE   DATE = '2022-09-01'
GROUP BY DEVICETYPE
ORDER BY BEACONCOUNT DESCPage Load Time by Country
Using the PERCENTILE_APPROX function, you can calculate the approximate percentile (e.g. median) for a set of data.
NOTE: When using the PERCENTILE_APPROX function, it may run out of memory when run in BigQuery.  It will often do this if you're executing a query over all of the data for one day (e.g. no filters have been applied).
The work around is shown below, to limit the dataset via WHERE filters, or the following sample which uses a subquery to reduce memory pressure.
SELECT  COUNTRY,
        SUM(BEACONS) AS BEACONCOUNT,
        `akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
          ARRAY_AGG(PLTHISTOGRAM),
          [0.50],
          100,
          false) as MEDIAN
FROM    `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
WHERE   DATE = "2022-09-01"
    AND BEACONTYPE = 'page view'
    AND USERAGENTFAMILY = 'Chrome'
GROUP BY COUNTRY
ORDER BY SUM(BEACONS) DESCPage Load Time by Country (using a Subquery)
Using the PERCENTILE_APPROX function, you can calculate the approximate percentile (e.g. median) for a set of data.
If you are querying an entire day's worth of data, the PERCENTILE_APPROX may run out of memory in BigQuery.
One workaround is to pre-aggregate the data further by using the COMBINE_HISTOGRAMS function against a column with high cardinality (e.g. COUNTRY or DEVICEMODEL), before issuing the final query against the dimensions you care about.
SELECT  PROTOCOL,
        SUM(BEACONS) AS BEACONS,
        `akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
          ARRAY_AGG(PLTHISTOGRAM),
          [0.0, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0],
          100,
          false) AS PERCENTILES
FROM    (
    SELECT  COUNTRY,
            PROTOCOL,
            SUM(BEACONS) AS BEACONS,
            `akamai-mpulse-rumarchive.rumarchive.COMBINE_HISTOGRAMS`(ARRAY_AGG(PLTHISTOGRAM)) AS PLTHISTOGRAM
    FROM    `akamai-mpulse-rumarchive.rumarchive.rumarchive_page_loads`
    WHERE   DATE = '2022-09-01'
    GROUP BY COUNTRY, PROTOCOL
) AS subquery
GROUP BY PROTOCOL
ORDER BY COUNT(*) DESC