Importing data into OpenRefine
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do I get data into OpenRefine?
Objectives
Successfully import data into OpenRefine
Importing data
If you haven’t already done so, download doaj-article-sample.csv now. It is a CSV file. Make a note of where you save it.
What kinds of data files can I import?
There are several options for getting your data set into OpenRefine. You can upload or import files in a variety of formats including:
- TSV (tab-separated values)
- CSV (comma-separated values)
- Excel
- JSON (JavaScript Object Notation)
- XML
- Google Spreadsheet
Exercise 1: Create your first OpenRefine project (using provided data, doaj-article-sample.csv https://goo.gl/285hFK)
To import the data for the exercises below, run OpenRefine. NOTE: If OpenRefine does not open in a browser window, open your browser and go to http://127.0.0.1:3333/ to reach the OpenRefine interface. Put up a red sticky if you can’t start OpenRefine.
- Create project and open the data file.
- Make sure your imported data is set to UTF-8, that the first row is used as the header, and that OpenRefine doesn’t automatically detect numbers and dates in your values.
- Name and create your project.
Solution
- Locate the file which you have downloaded called doaj-article-sample.csv
- Click Next
The next screen gives you some options to ensure that the data gets imported into OpenRefine correctly. The options vary depending on the type of data you are importing.
In this case you need to:
- Set the Character encoding to UTF-8 (Note: click in the cell and it will trigger a pop-up)
- Ensure the first row is used to create the column headings by selecting Parse 1 line(s) as column headers (this is the default)
- Make sure OpenRefine doesn’t try to automatically detect numbers and dates by leaving Parse cell text into numbers, dates, ... deselected
- Once you are happy, click Create Project >>
This will create the project and open it for you. Projects are saved as you work on them, so there is no need to save copies as you go along.
To open an existing project in OpenRefine, click Open Project in the left-hand menu of the main OpenRefine screen. You will see a list of the existing projects and can click on a project’s name to open it.
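The three import settings chosen in Exercise 1 (UTF-8 encoding, first row as header, no automatic detection of numbers and dates) can be illustrated outside OpenRefine with a short Python sketch. This is only an analogy, not how OpenRefine works internally. The sample data and the column names Title, Date and Citation count are invented for illustration and are not taken from doaj-article-sample.csv.

```python
import csv
import io


def read_csv_as_text(stream):
    """Read CSV rows using the first line as column headers.

    csv.DictReader never converts values, so every cell stays a string,
    much like leaving number and date detection deselected on import.
    """
    reader = csv.DictReader(stream)
    return reader.fieldnames, list(reader)


# Hypothetical sample data, encoded as UTF-8 bytes like a file on disk.
sample_bytes = (
    "Title,Date,Citation count\n"
    "Café culture,2015-01-02,007\n"
).encode("utf-8")

# Decode explicitly as UTF-8, mirroring the Character encoding setting.
stream = io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8", newline="")
headers, rows = read_csv_as_text(stream)

print(headers)
print(rows[0])
```

Because nothing is parsed into a number, the leading zeros in the invented Citation count value are kept, and the accented character in the title survives because the bytes are decoded as UTF-8.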
Exercise 1A: Going Further
- Look at the other options on the Import screen - try changing some of these options and see how that changes the Preview and how the data appears after import.
- Do you have access to JSON or XML data? If so, the first stage of the import process will prompt you to select a ‘record path’, that is, the parts of the file that will form the data rows in the OpenRefine project.
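The idea of a record path can be sketched in Python as follows. This is a rough analogy under stated assumptions, not OpenRefine's implementation: the JSON document, the key name results and the helper rows_at_path are all hypothetical.

```python
import json

# Hypothetical JSON document: metadata at the top level, with the
# records we actually want nested under the "results" key.
doc = json.loads("""
{
  "total": 2,
  "results": [
    {"title": "A", "year": "2015"},
    {"title": "B", "year": "2016"}
  ]
}
""")


def rows_at_path(data, path):
    """Follow a list of keys into the document and return what is there.

    Choosing the path is comparable to picking a record path: it decides
    which part of the file becomes the rows of the project.
    """
    node = data
    for key in path:
        node = node[key]
    return node


rows = rows_at_path(doc, ["results"])
for row in rows:
    print(row["title"], row["year"])
```

Here each object in the results list would become one row, while the top-level total value is left out because it sits outside the chosen path.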
Key Points
Use the ‘Create Project’ option to import data
You can control how data imports using options on the import screen