Web Traffic Analysis helps to gain insights about visitors and their behavior on our websites to deliver better results. BigQuery and Google Analytics provide you the information needed to improve the website, and make it the best it can be.
The following subdomains are included:
analytics.ga360_session_xfin order to filter by session_date, as there it is currently not available on hit level data (this is WIP in this issue)
Google Analytics stores unique identifiers such as client id, visitor id that requires ORANGE classification
CONCAT(visitor_id, CAST(visit_start_time AS STRING)
COUNT(DISTINCT visitor_id). Google Analytics takes all rows into account while calculating users (unlike sessions)
TOTAL_NEW_VISITS IS NOT NULL
HIT_TYPE = 'PAGE'
HIT_TYPE = 'PAGE'. The results for all pages might have approx. -0.2% difference compared to Google Analytics UI.
session_date: every report including date dimension should be based on
visitor_id: used to create session id and to calculate users
total_visits: the number of sessions. This value is
1for sessions with interaction events. The value is
nullif there are no interaction events in the session. Google Analytics shows interactive sessions only, this should be considered while querying nbr of sessions
hit_type: type of the hit that can be
page_path: URL path of the page, e.g.: /free-trial/
host_name: this is used to filter by the different subdomains (about.gitlab.com, docs.gitlab.com, etc)
The below SQL queries define the calculation of each main metric that is matching with Google Analytics UI results (All Data - Raw No Filters view)
Nbr of sessions, users and new users by day in Sept 2020
SELECT DATE_TRUNC('day',session_date)::date AS visit_date, COUNT(DISTINCT IFF(total_visits=1, CONCAT(visitor_id, CAST(visit_start_time AS STRING)), NULL)) AS interactive_Sessions, COUNT(DISTINCT visitor_id) AS users, COUNT(DISTINCT IFF(total_new_visits IS NOT NULL, visitor_id, NULL)) AS new_users FROM LEGACY.GA360_SESSION_XF WHERE DATE_TRUNC('day',session_date)::date >= '09/01/2020' AND DATE_TRUNC('day',session_date)::date <= '09/30/2020' GROUP BY 1 ORDER BY 1
Nbr of total and trial pageviews by day from Feb 2020
-- this subquery is needed in order to get the session_date for each unique session_id and link it to the hit level table WITH ga_unique_sessions AS ( SELECT DISTINCT(CONCAT(visitor_id, CAST(visit_start_time AS STRING))) AS session_id, SESSION_DATE FROM LEGACY.GA360_SESSION_XF -- total and trial pageview hits ), ga_pageview_hits AS ( SELECT CONCAT(visitor_id, CAST(visit_start_time AS STRING)) AS session_id, page_path, CASE WHEN page_path = '/free-trial/' THEN COUNT(1) ELSE 0 END AS trial_pageviews, COUNT(1) AS total_pageviews FROM LEGACY.GA360_SESSION_HIT WHERE HIT_TYPE = 'PAGE' GROUP BY 1,2 ), final AS( SELECT SESSION_DATE AS visit_date, SUM(trial_pageviews) AS trial_pageviews, SUM(total_pageviews) AS total_pageviews, COUNT(DISTINCT ga_pageview_hits.session_id,page_path ) AS unique_pageviews FROM ga_pageview_hits LEFT JOIN ga_unique_sessions ON ga_pageview_hits.session_id=ga_unique_sessions.session_id WHERE DATE_TRUNC('day',SESSION_DATE)::date >= '2/1/2020' GROUP BY 1 ) SELECT * FROM final
Google Analytics 360/ BigQuery data is based on
All Data - Raw No Filters view that is using
(GTM-08:00) Los Angeles Time zone. Google Analytics 360/BigQuery data is available from
19th Feb 2020 in Sisense. These are essential to know this while comparing your query results with Google Analytics 360 UI.
Each row in the Google Analytics BigQuery represents a single session. There are a lot of dimensions and metrics on Google Analytics UI, this schema can be a bit overwhelming on database level. The below picture represents 2 sessions that helps to get a better understanding of the BigQuery data structure:
To get more clarity about BigQuery export schema, take a look at the below interactive tree map:
There are 3 major BigQuery tables that are crucial to perform web traffic analysis: