Notes: Introduction to OpenRefine

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • What is OpenRefine? What can it do?

Objectives
  • Explain what the OpenRefine software does

  • Explain how the OpenRefine software can help work with data files

What is OpenRefine?

OpenRefine is a tool for working with ‘messy’ data. It works best with data in a simple tabular format.


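OpenRefine can help you:

  • Split data up into more granular parts

  • Match local data up to other data sets

  • Enhance a data set with data from other sources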

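Some common scenarios might be:

Where you have a list of dates formatted in different ways and want to change them all to a single common format. For example: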

Data you have    | Desired data
1st January 2014 | 2014-01-01
01/01/2014       | 2014-01-01
Jan 1 2014       | 2014-01-01
2014-01-01       | 2014-01-01
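
To make the date scenario concrete, here is a minimal Python sketch of the same transformation. It illustrates the idea only and is not how OpenRefine works internally. The function name `to_iso_date` and the list of formats are assumptions made for this example. It also assumes that `01/01/2014` is day-first and that month names are in English.

```python
import re
from datetime import datetime

# Assumed formats, chosen to cover the four inputs in the table above.
# "%d/%m/%Y" treats slash-separated dates as day-first (an assumption).
KNOWN_FORMATS = ("%d %B %Y", "%d/%m/%Y", "%b %d %Y", "%Y-%m-%d")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    # Remove ordinal suffixes ("1st", "22nd") so the day parses as a number.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text.strip())
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None


for raw in ["1st January 2014", "01/01/2014", "Jan 1 2014", "2014-01-01"]:
    print(raw, "->", to_iso_date(raw))
```

Values that match none of the listed formats come back as `None`, which makes them easy to find and review by hand.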


Where you have a list of names or terms that differ from each other but refer to the same people, places or concepts. For example:

Data you have | Desired data
London        | London
London]       | London
London,]      | London
london        | London
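
The clean-up in this table can be sketched in a few lines of Python. This is a deliberately naive illustration, not OpenRefine's own method. The helper name `normalise_place` is hypothetical, and the characters it strips are chosen only to cover the variants shown above.

```python
def normalise_place(text):
    """Strip stray spaces, commas and brackets, then apply title case."""
    # str.strip with an argument removes any of these characters from both ends.
    return text.strip(" ,]").title()


variants = ["London", "London]", "London,]", "london"]
for value in variants:
    print(repr(value), "->", normalise_place(value))
```

Title-casing is too blunt for many real names (it would turn "McDonald" into "Mcdonald"), so treat this as a sketch of the goal rather than a general solution.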

Where you have several pieces of data combined in a single column and want to separate them, with one column for each piece. For example:

Address parts in a single field:

‘University of Wales, Llyfrgell Thomas Parry Library, Llanbadarn Fawr, ABERYSTWYTH, Ceredigion, SY23 3AS, United Kingdom’

Address parts in separate fields:

“University of Wales”, “Llyfrgell Thomas Parry Library”, “Llanbadarn Fawr”, “ABERYSTWYTH”, “Ceredigion”, “SY23 3AS”, “United Kingdom”
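
Splitting the address above on its separator can be sketched in Python as follows. The function name `split_field` is an assumption for this example, and the approach only works when the separator never appears inside one of the parts.

```python
def split_field(value, separator=","):
    """Split one combined field into a list of trimmed parts."""
    return [part.strip() for part in value.split(separator)]


address = (
    "University of Wales, Llyfrgell Thomas Parry Library, Llanbadarn Fawr, "
    "ABERYSTWYTH, Ceredigion, SY23 3AS, United Kingdom"
)

# Each element would become its own column in the cleaned data.
for part in split_field(address):
    print(part)
```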


Where you want to add to your data from an external data source:

Data you have                   | Date of Birth from VIAF | Date of Death from VIAF
Braddon, M. E. (Mary Elizabeth) | 1835                    | 1915
Rossetti, William Michael       | 1829                    | 1919
Prest, Thomas Peckett           | 1810                    | 1879

VIAF: Virtual International Authority File
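
Conceptually, this kind of enrichment is a lookup: each local value is matched against an external source and the extra fields are attached. The Python sketch below uses a small in-memory dictionary, `EXTERNAL_DATES`, as a hypothetical stand-in for that source. It holds only the values from the table above and makes no real call to VIAF. The function name `add_dates` is also an assumption for this example.

```python
# Hypothetical stand-in for an external source, using the table's values.
EXTERNAL_DATES = {
    "Braddon, M. E. (Mary Elizabeth)": (1835, 1915),
    "Rossetti, William Michael": (1829, 1919),
    "Prest, Thomas Peckett": (1810, 1879),
}


def add_dates(names, lookup):
    """Attach birth and death years to each name; None where no match exists."""
    rows = []
    for name in names:
        birth, death = lookup.get(name, (None, None))
        rows.append({"name": name, "birth": birth, "death": death})
    return rows


local_names = ["Braddon, M. E. (Mary Elizabeth)", "Prest, Thomas Peckett"]
for row in add_dates(local_names, EXTERNAL_DATES):
    print(row)
```

Exact string matching is the simplest possible rule. Real names often differ slightly between data sets, which is why the clean-up scenarios above usually come first.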

Key Points

  • OpenRefine is ‘a tool for working with messy data’

  • OpenRefine works best with data in a simple tabular format

  • OpenRefine can help you split data up into more granular parts

  • OpenRefine can help you match local data up to other data sets

  • OpenRefine can help you enhance a data set with data from other sources