Posted on May 5th, 2020
We are excited to announce the launch our new video series, “Python in a Marketing World,” hosted by Ads Operations Analyst Tanner Riebel. The series provides a jumpstart for marketers interested in learning Python.
As companies continue to emphasize data-driven decision making, martech skills have become increasingly helpful for marketers to add to their skillset. For serious data-wranglers, Python enables marketers to quickly analyze large datasets, create on-the-fly visualizations, and collate data from multiple channels. Being able to do this kind of analysis outside of proprietary analytics packages, and accessing channel data via their APIs, gives marketers the opportunity to spot trends, market changes, roll out predictive models, and more.
While not intended to be comprehensive Python or cloud data analysis courses, the series will provide quick scripts and tips for those marketers expanding their skills and looking for use cases that can be easily implemented.
Episode 1: “Reporting With The Google Analytics API” includes:
- A detailed walkthrough of a script for pulling Google Analytics data
- Example script available for download so you can easily get started pulling your own data
- Additional resources highlighted to help you navigate the Google Analytics API
Watch the video, or follow along with the transcript posted below. Be sure to subscribe to our YouTube channel to catch the next episode, where Tanner will demonstrate how to use Python with the Google AdWords API.
Read Episode 1: “Reporting with the Google Analytics API”
Tanner: Hi everyone. My name is Tanner Riebel. I’m an AdOps analyst at AIMCLEAR, and today I’m gonna be walking you through reporting with the Google Analytics API utilizing Python. As far as the skill level for this video, it’s really for all levels, but it will mainly be focusing on the beginner level. The majority of the time we’ll just be walking through the actual script outlining why things are formatted.
The end goal is really to make this as easily transferable to your systems as possible so you can use this script and get running really quickly. We’ll actually be providing this script for you to just change it to your systems. And while I’m going through it, I’ll highlight things that you’ll need to change for it to work for what you’re running.
Before actually jumping into the script, I wanted to highlight a couple of reasons why you would use this API, or really just any API in general from at least reporting standpoint. So, main reason being just it removes any manual process, especially when you have multiple accounts, like if you’re pulling this data, daily for sure, but even weekly, it can be a huge manual process and take lot of time.
The API allows you to build this script, and actually once it’s built, all you have to do is keep running it or make sure you’re running it and the data will be updated. Once it’s actually built, it basically takes no time to run it after you have it built. So that’s really the main reason is just a time saving.
As far as using it for ad hoc analysis or one-off analysis, that’s not what, at least from a reporting standpoint, that’s not what this is really good for. It’s more for when you’re pulling this consistent data and you’re reporting on it routinely. The language we’ll be using is Python and the text editor I use is Sublime, just for some background. And then also I’ll be providing two output examples within the script: First being CSV, and the second being Google BigQuery, which is what we actually use at AIMCLEAR quite often, and I’ll show you a final output of what it would look like in BigQuery.
So, this is the data that we’re pulling in. This is all the dimensions and metrics that we have, and then a preview of the data, which is just test data, but this is how it would look, final output and BigQuery.
So yeah, with that, let’s just get into the script. I won’t be going line by line here. We’ll really just be reporting or pointing out the features of it, and where you need to change things to make it work for yourself. We’ll start here at line 12, setting up authentication to actually be able to grab the data in your system. First is setting up the credentials and you can see I have it specified here. You want to highlight the file path to the service account, and that will be a json file. If you don’t have a service account created, I’ll pop up the resource and provide it as well of the detailed walkthrough of how to create this.
Basically, the service account is just connecting your Google Analytics account and giving it the access it needs to pull this data. This is really detailed walkthrough steps you need to do to get there. If you don’t have the Google API console set up, then this link will actually provide you a step back further to set that up.
Once you have that service account set up, then we’re just outlining the versions we’re running. We’re also outlining a couple of date variables that we’ll use later throughout the script. And that brings us to our first function in line 21. Throughout the whole script there’ll be three functions.
You’ll see the first two here are really just setting up for the last function, which is our main function. So, these first two functions here, we actually won’t need to change anything in these in order for the script to run. Really the main function is where we actually input different things that we would like to have ran through the script.
So, this analytics_api_query basically builds a query pool based on how the Google Analytics API needs to have it be read. We’re outlining things like start date, end date, metrics, and dimensions, which we’ll also outline in our main function later when we call on this function.
And the next one, analytics_get_report. This is basically just setting up to actually grab the output of data and construct it. So, how you can pull this data basically. Well, they both really pull data. This is really the actual grabbing of the data and putting it someplace is this analytics_get _report.
And that brings us to our main function, which was really the bulk of it. So, I’ll start at the top here and the top line 60, starting out, we’re just creating an empty data frame. This is this base_df is where your finalized data will sit. And then the next we’re outlining our columns, which is actually a different variable. We’re calling on traffic columns, which we have outlined as a list at the bottom. So, this is the actual data we are pulling. The dimensions and metrics that we have outlined here.
The next is just specifying the start date. So, we’re pulling all the way from 2019 and then formatting the data as well as we need it to be. And then, this is really important here and if you’re messing around with the dimensions and metrics that you’re pulling, this is where it’s important to also change here, if you’re pulling more or less data or more or less dimensions and metrics. So, this is formatting it again to how the Google Analytics API needs to read it. And it’s referencing our columns variable. Basically, this logic here is saying the first seven of these columns, which is again highlighted down here.
Our dimensions, and I have each of them highlighted as dimensions here or commented as dimensions. And then same thing for metrics, except now it’s everything after the seventh will be our metrics. So that’s the logic. If you switch, if for example, if you deleted one, then you would need to switch these numbers here to have it aligned to what it needs to be.
Then we’re calling on one of our analytics_get_report function to actually get the results and then putting it into a data frame. Really, lines 75 through 80 is just formatting. You won’t need to change anything here. We’ll just skip that and go to the output. So, I have three different examples of our output.
The first one being PowerShell, so this is just when you’re actually running the script. This will print out the head of the data frame or the data that we’re running. The next one is a CSV output. So, this is what you’ll have to change for it to actually work where we’re outlining the file path right here, which is going to my desktop.
And then the file name, example-export-GA. So, you’ll have to change that. And if you don’t want to run it to a CSV, you can always just comment it out here as well. Last output will be BigQuery. So, I showed you the finalized example of how it looks like in BigQuery.
A couple of things you have to outline to have it actually work and are really important is first, line 90, as specifying the dataset name. It will be the raw data separated by a period, and then the table name. So, I’ll show you where that would, as far as the dataset name, really, that would be this raw data. And it’s also important to create, before you run the script, you need to create the data set, so you need to just create an empty and have it named raw_data or whatever you would like it to be called.
And then the table name you could see, I have it as specified as well. And then also important is I have INSERT_PROJECT_ID here. Going back to BigQuery, when you hover over this, this is the project name, but in the black here you’ll see client-test-that random numbers below it. That’s the project ID and that’s what you need, not the project name.
And then really the only other important part of this BigQuery output is you have a couple options for if the table exists. This is specifying that what we have is going to replace the data. So, if this table exists inside BigQuery, it will replace this data or this table with whatever our script is running.
Another one that’s popular also is append. Which will, if the table exists, it will just append the data of the script that’s running to the bottom of the table. It won’t actually replace it. And, yeah, I think that’s pretty much that. That brings us to the end of the script. We already went over the different traffic columns. A really good resource here and I’ll actually pull up, for dimensions and metrics if you want to pull different things is this URL here and I’ll provide that as well. But it basically gives you an idea of the different names for certain things that you want to be pulled. Cause it’s very specific in the name that you need to call upon. And then there’s also combinations of dimensions and metrics that you can’t pull together.
So, that gives you another idea of how you can pull if you want to pull different things from the API that I have outlined in the script already. And then finishing off, so that’s the main was our last function. And then line 22 is literally just calling on that main script. And you’ll see here in order to call on the main script, you need to outline the Google Analytics view ID for your account.
I believe that outlines everything I wanted to cover. if you want to transfer this to your system, you can just find that there’s only about two or three things that you need to change, mainly towards authentication, to make it work for your system.
And I also wanted to highlight, that’s a base script to get you kind of a jumpstart and get you rolling fast. And, you can provide variations or additions to the script to make it be exactly what you want it to be. And one example I’m going to highlight here is using a loop and all you have to do is change the main function and most of the time for anything that you want to pull.
So, we’re actually specifying in the main function now that we want to pull one, two, three, it looks like seven different reports. These are all different reports that we want to run. And mainly what I mean by different reports is different dimensions and metrics. You can see this loop is running through and pulling all those different reports and we already have devices, columns, highlights. So for the devices report, we want to pull dimensions, metrics from the devices columns, and all these reports you can see have different dimensions and metrics that we want to pull and devices once, as specific for device data that we would like to obtain.
That’s just an example of how you can build upon this script and just make it be what you want it to be. And I believe I have everything highlighted here. Everything that I wanted to highlight. Like I said, we’ll be providing the script so you can use it and get started on your own and, and feel free to reach out if you have any questions.
Thanks for watching.