Google Sheets
Last updated
Last updated
DataLab supports two ways to work with data in Google Sheets.
With the built-in Google Sheets connector and the SQL cell: by far the simplest way, but only supports reading data from Google Sheets, not writing back data to Google Sheets.
With the Google Sheets API and Environment variables: requires more setup, but gives you maximum flexibility.
Choose this option if you only want to read data from a Google Sheet to analyze in Python but are not looking to write the results of your calculations back to a Google Sheet.
When inside a workbook, click View > Databases, click the + icon, select Google Sheets and follow the on-screen instructions. You will be prompted to grant DataLab permission to access files in your Google Drive.
At some point a file picker will appear, where you can select the spreadsheet to access:
You can now query the Google Sheets file you just connected with a SQL cell. Select it in the Select source dropdown in the top left corner of a SQL cell, write a SQL query, and hit run:
By default, the result of your query is now available as a Pandas DataFrame (if you're working in Python) or a data.frame (if you're working in R) to continue your analysis. See SQL cell for more information.
There are several ways to refer to sheets. Suppose you want to query a sheet named Players
:
If you're trying to query a sheet that has spaces in it, you have to use quotes to refer to the sheet name.
You can use Google Sheets syntax to 'zoom in' on a particular selection of your spreadsheet, e.g. SELECT * FROM 'unicorn_companies!B2:C13'
The connector assumes that the first row contains the column names and will infer the Google Sheets schema from that.
Every Google Sheets file you connected corresponds to a 'database' in DataLab. This means:
You can query different sheets inside the same Google Sheets file with one connection.
You have to set up a new database for every Google Sheets file you want to access from inside DataLab.
Every workbook collaborator that wants to query a Google Sheet will need to authenticate themselves with Google; only Google users that have read access to the Google Sheet in question will be able to query the Google Sheet.
At the moment, you can only use the Google Sheets database for reading data from a Google Sheets file. Writing back results to Google Sheets is not (yet) supported.
Choose this option if you want to both read data from a Google Sheet as well as write the result of your calculations in Python back to the same or another Google Sheet.
Before you can run Python code to programmatically access data in Google Sheets, you need to:
Enable the Google Sheets API
Create a Google service account for programmatic access.
Ensure you have a Google spreadsheet with data that the service account can access.
Store the service account credentials in DataLab.
In what follows, all these steps are laid out in detail.
Make sure you’re signed in with your Google account.
Navigate to the Google API Library
Create a new project (if you haven't created one before) by clicking in the dropdown on the navbar.
Search for the “Google Sheets API” and enable it. This can take up to 10 seconds.
A Google service account is a special kind of account that can be used by programs to access Google resources like a spreadsheet. You will use this service account to connect DataLab to Google Sheets.
You only have to set up this Google service account once for every Google account that you want to access Google resources with; you can skip this step the next time.
Follow the steps below to create the service account and generate the necessary credentials:
In the “APIs and services” navbar on the left, go to the “Credentials tab”
Click on “+ CREATE CREDENTIALS” and select “Service Account”
In the first step (service account details), provide a name for the service account, e.g., “gsheet-operator” and click on “Create and continue”
In the second step, select the “Owner” role and click “Continue”
In the third step, don’t change anything and click “Done”
Once back on the Credentials page, click on the service account you just created.
Go to the Keys tab, click “Add Key > Create new key”
Choose “JSON”, then click “Create.” The JSON file with your service account credentials will automatically download to your computer.
You now have a service account and a JSON credentials file! Head over to your Downloads folder or wherever the JSON file was downloaded, open it up, and have a look. It should look something like this:
There’s a client_email
field in there: gsheets-operator@<google-project-name>.iam.gserviceaccount.com. Copy this email to your clipboard; you’ll need it in the next step.
Before you can analyze data in spreadsheets, you need to make sure you have a spreadsheet with data in it. If you don’t have a dataset lying around, you can start from a Google Sheet that we prepared for this tutorial: Open the example spreadsheet and once in the Google Sheet, click “File > Make a copy,” specify a name, and click “Make a copy.” If you already have a spreadsheet with data you want to analyze, just open up that spreadsheet.
Regardless of whether you’re working with a duplicate of the example spreadsheet or your own spreadsheet, you need to give the Google service account that you created in the first step access to the spreadsheet:
Click “Share”
Add the email of the service account that you copied to your clipboard in the previous step as an editor to the spreadsheet (e.g. gsheets-operator@<google-project-name>.iam.gserviceaccount.com)
Click “Send”
Alright, service account: check. Google spreadsheet with the right access: check.
Click this link to create a workbook in your own account that contains all the Python code you need to connect to Google Sheets.
To use the service account credentials JSON in your newly created workbook, you need to store it in DataLab. To do so securely, you can use Environment variables. In your new workbook, open "Environment > Environment variables..." in the menu bar, and click on "Add". Next:
Set Name
to GOOGLE_JSON
Set Value
to the full contents of the service account JSON file that was downloaded. You can do this by opening the JSON file, selecting all, copying it to your clipboard, and then pasting it in the Value field.
Set the “Environment Variable Set Name” to “Google Service Account” (this can be anything, really)
After filling in all fields, click “Create,” “Next,” and finally, “Connect.” Your workbook session will restart, and GOOGLE_JSON
will now be available as an environment variable in your workbook.
If you want to reuse the same services account credentials in another workbook, you don’t need to set up the environment variable again: you can connect the environment variable to your other workspaces as well.
Use the Python code snippets in the workbook that you can create from this link to install the necessary packages, read data from the Google spreadsheet, analyze it and write data back to the Google spreadsheet. All from Python!