Note that this site is currently in version 1.0.0-alpha. Some functionality may be limited.
What is data? What counts as data? These are questions we will explore throughout the workshop.
Data is foundational to nearly all digital projects and often helps us understand and express our ideas and narratives. Hence, in order to do digital work, we should know how data is captured, constructed, and manipulated. In this workshop we will discuss the basics of research data in terms of its material, transformation, and presentation. We will also engage with the ethical dimensions of what it means to work with data, from collection to visualization to representation.
In this workshop, you will learn to:
Become familiar with the specific requirements of “high quality data”
Know the stages of data analysis
Recognize ethical issues around working with different types of data and analysis
Understand the difference between proprietary and open data formats
In this section, we want to introduce some central steps that you want to take before you get started with this workshop. For instance, there are workshop suggestions that you may want to engage with before you start this workshop, some required or recommended software installations, some files from external sources to download, etc.
Sometimes, we ask you to complete a short task on an external website before you start the workshop. This can be because we want you to work on a particular dataset that you download here, or because we want you to sign up for a service. Note that these links will take you out of our website, so we will open them in a new tab for you. Once you are done, you can close the tab and easily return here.
The dataset, moSmall.csv, will be used throughout the challenges in the workshop. To save the file to your local computer, right-click on the “Download the workshop dataset” link and choose Save Link As.... Note: It is important to make sure your file is saved as a .csv file. The original dataset comes from The Metropolitan Museum of Art and is released under Creative Commons Zero.
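If you would like to confirm that the download was saved as a readable .csv file, a minimal Python sketch along the following lines may help. It assumes that moSmall.csv sits in the folder you run the script from, that the file has a header row, and that it is UTF-8 encoded. The function name summarize_csv is a hypothetical helper, not part of the workshop materials.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file with a header row."""
    path = Path(path)
    # A browser sometimes saves the file as .txt or .html; catch that early.
    if path.suffix.lower() != ".csv":
        raise ValueError(f"{path.name} does not have a .csv extension")
    # utf-8-sig reads plain UTF-8 and also drops a byte order mark if one exists.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path.name} is empty")
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Assumption: the dataset was saved in the current working directory.
    target = Path("moSmall.csv")
    if target.exists():
        columns, rows = summarize_csv(target)
        print(f"{len(columns)} columns, {rows} data rows")
        print("First columns:", columns[:5])
    else:
        print("moSmall.csv not found here; check where your browser saved it.")
```

If the script reports an unexpected extension or an empty file, download the dataset again and make sure the file name ends in .csv.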
This is a list of workshops that we suggest you engage with before you get started with this one. They are listed here because they contain some central concepts or tools that you may need before you can digest all the information you will be presented with in this workshop.
This workshop refers to concepts from the Command Line workshop. Some familiarity with the command line is important for anyone who wants to learn how to handle, process, and analyze data.
Why am I learning this? Why does it matter? How will it help my project? Learning new digital skills is an investment of your valuable time, so it is reasonable to ask what you will get out of taking this workshop. The materials below help situate the skills you are about to learn within a larger context of how they are used, by whom, and to what ends.
Digital tools and the skills required to use them are part of our culture and, therefore, never neutral. Digital humanists and social scientists consider the ethical challenges and responsibilities of the tools and methods that they use. The following materials are designed to introduce you to issues you may want to consider as you learn this new skill and decide how to integrate it into your own research and teaching.
Big data projects often require sharing datasets across different individuals and teams. In addition, to ensure that our work is reproducible and accountable, we may also feel inclined to share the data collected. As such, figuring out how to share such data is crucial in the project planning stage.
De-identified information can be reconstructed from piecemeal data found across different sources. When we consider what we are doing with the data we have collected, we also need to think about the possible re-identification of our participants.
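To make the re-identification risk concrete, here is a minimal sketch of how records without names might be linked back to individuals. Everything in it is hypothetical: the records are made up for illustration, and the field names and the link_records helper are assumptions rather than anything used in this workshop. It only shows the general idea of matching on shared attributes such as ZIP code, birth year, and gender.

```python
# Hypothetical, made-up records for illustration only.
deidentified_survey = [
    {"zip": "10016", "birth_year": 1990, "gender": "F", "response": "A"},
    {"zip": "10027", "birth_year": 1985, "gender": "M", "response": "B"},
]
public_directory = [
    {"name": "Person One", "zip": "10016", "birth_year": 1990, "gender": "F"},
    {"name": "Person Two", "zip": "11201", "birth_year": 1985, "gender": "M"},
]

# Attributes that are not names but can single a person out when combined.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link_records(deidentified, directory, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where exactly one directory entry matches on every key."""
    index = {}
    for entry in directory:
        index.setdefault(tuple(entry[k] for k in keys), []).append(entry["name"])
    matches = []
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        # A unique match means the "anonymous" record now points at one person.
        if len(names) == 1:
            matches.append((names[0], record))
    return matches


reidentified = link_records(deidentified_survey, public_directory)
for name, record in reidentified:
    print(name, "->", record["response"])
```

In this toy case the first survey record matches exactly one directory entry, so its response can be attached to a name even though the survey itself never stored one. The second record has no match because the ZIP codes differ.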
Readings before you get started
The readings listed below situate what you are about to learn in cultural contexts, such as a particular humanities or social science field, the information or computer sciences, or popular discourse. The purpose of the readings is to provide a theoretical framework you can use to contextualize how you intend to use the skill or tool introduced in this workshop.
The book Bit by Bit: Social Research in the Digital Age, written by Matthew Salganik, approaches data and social research from a computational social science perspective. He also discusses the idea of “readymade” and “custommade” data alongside ethics.
Ten Simple Rules for Responsible Big Data Research explores some guidelines for addressing complex ethical issues that arise in any research project.
Projects related to Data Literacies
The following are sample projects that use the skill or tool (either implicitly or explicitly) that you are about to learn. Some skills that are foundational may seem not to lead to a specific project goal that you have in mind. You might be surprised to learn that the following projects depend on the skills learned in this workshop.
Data for Public Good is a semester-long collaborative project led by CUNY graduate students. Each semester, a different public-interest dataset is explored to present information that is useful and informative to a public audience.
SAFElab, led by Dr. Desmond U. Patton, uses computational and social work approaches to understand the mechanisms of violence and to work on prevention and intervention in violence that occurs in neighborhoods and on social media.
Datasets related to Data Literacies
See the introduction to what datasets are and what they do in our frontmatter section.
Di is currently a PhD student at CUNY, The Graduate Center (GC). They are also a GC Digital Initiatives Digital Fellow. Broadly, their work is on understanding the relationality between systems of oppression and the individual. They are interested in identities as discourses, and the ties between transnationalism and diasporas. Currently, they are working on several projects that explore identities as discourses, including in alt-right spaces, K-pop/Hallyu, and the experiences of queer and trans Asian (American)s in the US.
As a GC Digital Initiatives Digital Fellow, they are also interested in understanding what ethics is within computational social science, digital humanities, and public humanities projects. They are also invested in bridging the gaps of technology literacy, especially within underserved communities.