9. Data Literacy and Ethics
Throughout the workshop we have been thinking together through some of the potential ethical concerns that might crop up as we proceed with our own projects. As we have discussed, we hope you see that working ethically with data is an ongoing process throughout the lifespan of your project(s) and does not often come with easy answers.
In this final activity, we would like you to consider some of the concerns that might come up in the scenario below and how you might approach them.
You are interested in looking at reactions to the Democratic Party presidential debates across time. You decide to use data from Twitter to analyze the responses. After collecting your data, you learn that it includes information from users who were later banned, as well as some tweets that have since been removed or deleted from the site.
As you work through this activity, you are welcome to do so with your partner, and we highly encourage it! Different perspectives can offer insight into our own gaps and help us think through our decisions. Be prepared to discuss your thoughts and ideas when we “meet” for our sessions.
If You Would Like Some Guiding Questions
- What are some reasons you might have for anonymizing (or not) your data?
- Would your approach differ if the responses were anonymized v. not?
- Would you remove the banned users' data and the deleted tweets from your initially downloaded corpus?
- How might you become aware of the differences between the corpus you downloaded and the most current information?
- Would the number of tweets generated impact your decisions?
- How might the stage your data is in (e.g. “raw” data v. “cleaned” data v. analysed data) affect your choices?
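If it helps to make a couple of these questions concrete, here is a minimal Python sketch of two of the ideas above: checking which tweets in a downloaded corpus are no longer in a more current set, and replacing user identifiers with pseudonyms. Everything in it is an assumption for illustration: the field names (`tweet_id`, `user_id`), the toy data, and the helper names `no_longer_available` and `pseudonymize` are hypothetical and not part of any Twitter tool. Note also that a keyed hash only pseudonymizes the user ID; the tweet text itself may still be searchable, so this is a starting point for discussion, not an answer.

```python
import hashlib
import hmac


def no_longer_available(downloaded_ids, current_ids):
    """Return the IDs found in the downloaded corpus but missing from the current set."""
    return set(downloaded_ids) - set(current_ids)


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a user ID with a keyed hash (HMAC-SHA256), shortened to 16 hex characters.

    The same ID and key always give the same pseudonym, so a user's tweets
    can still be grouped without storing the original identifier.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Hypothetical toy corpus; the field names are assumptions, not a real export format.
corpus = [
    {"tweet_id": "101", "user_id": "user_a", "text": "example tweet one"},
    {"tweet_id": "102", "user_id": "user_b", "text": "example tweet two"},
    {"tweet_id": "103", "user_id": "user_a", "text": "example tweet three"},
]

# Hypothetical list of tweet IDs that are still available at a later date.
still_available = ["101", "103"]

missing = no_longer_available([t["tweet_id"] for t in corpus], still_available)

# Keep the key private and out of anything you publish.
key = b"replace-with-a-private-key"
pseudonymized = [
    {**t, "user_id": pseudonymize(t["user_id"], key)} for t in corpus
]
```

Here `missing` holds the IDs you would need to make a decision about, and `pseudonymized` is a copy of the corpus with user IDs replaced. What you then do with either one is exactly the kind of choice the questions above ask you to think through.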
Some Additional Exploration
- If you were collecting and/or analyzing data on folx in power, such as the data from the Tweets of Congress project, would that change the way you consider your answers to the previous questions?
- Current ethical guidelines from the SAFE Lab at Columbia University call for altering the text of social media posts to render them unsearchable. Why and when would you consider (or not) altering the collected tweets for publication?