Importing data from flat files
Last updated
DataLab features a library of curated datasets ready for you to analyze, but you can also work with your own data files, such as CSV, Excel, text, or geospatial files. In this article, we'll assume that the data file you want to work with in DataLab is on your computer. The first step is uploading the file to DataLab; the next step is loading the data file into your notebook session.
Click on "File > show workbook files" in the menu.
To upload the data file, you have several options:
Click on "Add" in the top right of the file browser pane and click "Upload". Select the file from your local filesystem and confirm.
Click on "Browse files" in the dashed rectangular box in the file browser. Select the file from your local filesystem and confirm.
Drag the file from a file browser (Windows) or Finder (Mac) window into the file browser pane.
Now that you have uploaded the file to your workbook, you can load it into your session and start analyzing it.
First, copy the file path to your clipboard: in the workbook file browser, click the icon next to the data file and click "Copy path to clipboard".
Next, add a new code cell to your notebook file and add one of the following code snippets, depending on the file format and the language you're using. Note that you may need to tweak the function call to deal with the specifics of your file (e.g. to skip rows, to specify the column names, etc.). Replace `example.csv` with the file path you copied to your clipboard in the first step.
File Type | Python | R |
---|---|---|
CSV | `import pandas as pd; df = pd.read_csv('example.csv')` | `library(readr); df <- read_csv('example.csv')` |
Excel | `import pandas as pd; df = pd.read_excel('example.xlsx')` | `library(readxl); df <- read_excel('example.xlsx')` |
Finally, run the code cell. The data contained in the file will now be available as a dataframe `df` that you can start analyzing.
Specifically for CSV files, there is a faster way: click the icon next to the file you want to import and select "Load as DataFrame". A new code cell will be added to your notebook with the appropriate Python or R code.
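As mentioned earlier, the function call may need tweaking for the specifics of your file. The sketch below shows one way this could look in Python for a CSV file that starts with two lines of notes and has no header row. The file contents and the column names `id`, `name`, and `age` are made up for illustration, and the example writes its own temporary file so that it runs on its own. In DataLab, you would pass the path you copied from the file browser instead.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents: two note lines before the data, and no header row.
raw = (
    "Exported from a survey tool\n"
    "Generated for illustration\n"
    "1,Alice,34\n"
    "2,Bob,29\n"
)

# Write the sample to a temporary file so this sketch is self-contained.
# In DataLab, use the file path copied from the workbook file browser instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.csv")
    with open(file_path, "w") as f:
        f.write(raw)

    df = pd.read_csv(
        file_path,
        skiprows=2,                   # skip the two note lines at the top
        header=None,                  # the file has no header row
        names=["id", "name", "age"],  # supply the column names ourselves
    )

print(df)
```

`skiprows`, `header`, and `names` are standard `pd.read_csv` parameters. Which of them you need, if any, depends on how your own file is laid out.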