Note that this site is in currently in version 1.0.0-alpha.   Some functionality may be limited.

5. Side Note on Data Structures: Tidy Data

There are different guidelines to the processing of data, one of which is the Tidy Data format, which follows these rules in structuring data:

  1. Each variable is in a column.
  2. Each observation is a row.
  3. Each value is a cell. Look back at our example of cats to see how they may or may not follow those guidelines. Important note: some data formats allow for more than one dimension of data (like the JSON structure below). How might that complicate the concept of Tidy Data?
{
    "Cats": [
            {
                "Calico": [
                    {
                        "firstName": "Smally",
                        "lastName":"McTiny"
                    },
                    {
                        "firstName": "Kitty",
                        "lastName": "Kitty"
                    }
                ],
                "Tortoiseshell": [
                    {
                        "firstName": "Foots",
                        "lastName":"Smith"
                    },
                    {
                        "firstName": "Tiger",
                        "lastName":"Jaws"
                    }
                ]
            }
        ]
}

While tiny data is a really popular method of structuring and organizing data, it is not the only way to do so. Depending on the type of data you have, it is also not always the best way to structure data.

Challenges for lesson 5

Assignment: Challenge: Tidy Data

  1. Looking at the moSmall.csv dataset, there are a couple of columns with nested information that don’t follow the rules of tidy data. Can you identify at least two of the columns that demonstrates this?
  2. Would you convert moSmall.csv to follow the tidy data format? Can you demonstrate how you would do so?

  1. Artist Role, Artist Display Name, Artist Display Bio, Artist Alpha Sort, Artist Nationality, Artist Begin Date, Artist End Date, or Classification.
  2. I will choose to convert to the tidy data format if I was interested in any of the variables listed above, so that it will be easier to analyse the entries. I will have to unnest the entries by separating the data into different columns. For example, if I am interested in understanding the type of roles that are predominantly held by non-cisgender men, I will unnest the column Artist Role as two columns (e.g. Artist 1 Role, Artist 2 Role) as illustrated in this example:

Comparison of moSmall after tidy format

Questions

Try again!

Tiny data format only allows one value per cell.

(Select one of the following)

Try again!

Do you think you can explain the rules of tidy data structuring?

This question has no answer. It is a reflective question.

Terms Used in Lesson

Can you define the terms below? Hover over each of them to read a preview of the definitions.

Tidy Data

Tidy data are a way of processing and organizing data in to a data structure that follows these rules: Each variable is in a column. Each observation is a row. Each value is a cell.

See term page

Workshop overall progress