Amazon S3

This article covers all the necessary steps to access files on Amazon S3, AWS's Simple Storage Service, from inside DataLab.

Setup

Create an Amazon S3 Bucket

You need an existing Amazon S3 bucket (a bucket is S3's equivalent of a file folder: a top-level container for your files). If you don't have a bucket yet, follow the instructions in this AWS documentation article to create one.

Locate your access key credentials

To access AWS resources programmatically, you need an access key, which consists of an access key ID and a secret access key. The key must have the right permissions for whatever you intend to do with the S3 bucket from inside DataLab.

If you don't have such an access key yet, follow the instructions in this AWS documentation article to create a new one.
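
The "right permissions" are granted through an IAM policy attached to the user the access key belongs to. As a rough sketch, the Python snippet below builds a minimal policy that allows listing one bucket and reading and writing its objects; the bucket name my-example-bucket is a placeholder, and your use case may need different actions. Print it with json.dumps and paste the result into the policy editor in the IAM console.

import json

# Sketch of a minimal policy for one (hypothetical) bucket: list the bucket,
# read and write its objects. Adjust the actions and bucket name to your needs.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-example-bucket",      # the bucket itself
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-example-bucket/*",    # the objects in it
        },
    ],
}

print(json.dumps(policy, indent=2))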

Create a new workbook

Create a new, empty workbook, or click this link to create a workbook in your own account that already contains all the Python code you need to connect to Amazon S3.

Store access key credentials in DataLab

To use the access key credentials in this workbook, you need to store them in DataLab. To do so securely, you can use environment variables.

In your new workbook, open "Environment > Environment variables..." in the menu bar and click "Add". Create a new set with two environment variables:

  • AWS_ACCESS_KEY_ID: Set this to the access key ID you got in the previous step.

  • AWS_SECRET_ACCESS_KEY: Set this to the secret key you got in the previous step.

Give the set a meaningful "Environment Variable Set Name", e.g. "AWS Access Key".

After filling in all fields, click "Create", "Next", and finally "Connect". Your workbook session will restart, and AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will now be available as environment variables in this workbook.
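
As a quick sanity check, you can verify from Python that both variables are visible in the restarted session. The snippet below uses only the standard library and prints whether each variable is set, never its value.

import os

# Confirm that the credentials are available as environment variables
for name in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]:
    print(name, "is set" if os.environ.get(name) else "is MISSING")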

Connect to Amazon S3 with Python

You can now switch to Python to access the files. We'll use boto3 for this: the official Python package to create, configure, and manage AWS services, including Amazon S3. It's installed in DataLab by default, so we only have to import it.

import boto3

To verify that everything is set up correctly, let's list all the objects (files) in a specific S3 bucket. Make sure to change AWS_BUCKET_NAME to the name of a bucket that is available in your AWS account.

AWS_BUCKET_NAME = "datacamp-workspacedemo-workspacedemos3-prod"  # change this to your bucket name

# Create the S3 resource
s3 = boto3.resource('s3')

# Point to the bucket with the specified name
bucket = s3.Bucket(AWS_BUCKET_NAME)

# List all objects (files or folders) in the bucket
[obj.key for obj in bucket.objects.all()]

If everything is set up correctly, this lists the objects in your bucket. Note that you don't need to explicitly fetch and pass the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables; boto3 looks for environment variables with exactly these names and loads them behind the scenes.
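
If you ever need to be explicit about the credentials, for example when juggling multiple AWS accounts, boto3 also accepts them as arguments. A minimal sketch that reads the same environment variables and passes them along:

import os
import boto3

# Equivalent to boto3.resource('s3'), but with the credentials passed explicitly
s3 = boto3.resource(
    's3',
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)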

Going deeper

Besides listing files, boto3 allows you to download and upload files, manage buckets, and more. Consult the boto3 documentation to learn more.
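
As a taste of what that looks like, here is a sketch that downloads an object to the workbook's file system, reads it with pandas, and uploads a result back to the bucket. The bucket name and object keys are made up for the example; replace them with ones that exist in your account.

import boto3
import pandas as pd

s3 = boto3.resource('s3')
bucket = s3.Bucket("my-example-bucket")  # change this to your bucket name

# Download an object to the local file system of the workbook session
bucket.download_file("data/sales.csv", "sales.csv")  # hypothetical object key

# Work with the downloaded file like any local file
df = pd.read_csv("sales.csv")

# Write a result locally and upload it back to the bucket
df.to_csv("sales_clean.csv", index=False)
bucket.upload_file("sales_clean.csv", "data/sales_clean.csv")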
