komex: Crowd-sourced Text Analysis


COVID-19 pandemic: Please note!
While we intend to hold the in-person courses on site, we stand ready to switch to online delivery if coronavirus regulations do not permit in-person teaching.

Course Structure
09.00-10.30h: Lecture
11.00-12.30h: Lab / group work
13.30-14.30h: Office hour / independent work (reading, exercises)

Teaching takes place in a regular seminar room with space for at most 20 participants under the applicable coronavirus rules.

The five-day in-person course provides you with the skills needed to design and conduct your own crowd-sourced text analysis project. Crowd-sourced text analysis allows for fast, affordable, valid, and reproducible online coding of very large numbers of statements by collecting repeated judgements from multiple (paid) non-expert coders. The course covers the key concepts and ideas of crowd-sourced decision-making as well as quality criteria and common pitfalls. It focuses on two applications that generalize to many contexts (simple and complex categorization tasks, and the identification of positive and negative sentiment) and supports students in developing their own crowd-sourcing projects.
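As a minimal illustration of the aggregation idea described above, repeated judgements from several coders can be combined by majority vote. The data and function below are a hypothetical sketch, not part of the course materials:

```python
from collections import Counter


def majority_vote(judgements):
    """Aggregate repeated coder judgements for one statement by majority vote.

    Returns the winning label and its vote share, which can serve
    as a simple confidence measure for the aggregated coding.
    """
    counts = Counter(judgements)
    label, n = counts.most_common(1)[0]
    return label, n / len(judgements)


# Hypothetical codings of one statement by five non-expert coders
codes = ["economic", "economic", "social", "economic", "social"]
label, share = majority_vote(codes)
print(label, share)  # economic 0.6
```

In practice, projects often weight votes by coder trust scores instead of counting them equally; the plain majority vote here is only the simplest variant.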

Sessions include:

  1. Introduction to crowd-sourced text analysis
  2. How to produce high quality data
  3. Application 1: Text categorization
  4. Application 2: Sentiment analysis
  5. Other applications, pitfalls, ethical concerns

Intended Learning Outcomes
At the end of the week, you will:

  • Be able to design, set up, implement, and evaluate your own online crowd-coding data project.
  • Be familiar with the core concepts underlying crowd-sourcing and the ‘wisdom of the crowd’ paradigm, as well as recent applications to questions in political and social science.
  • Be able to calculate and critically reflect on different agreement, reliability, and trust scores.
  • Be able to craft and/or find suitable gold questions needed before, during, and after the analysis.
  • Be aware of potential pitfalls and limitations of crowd-coding and the best ways to address them.
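As a rough sketch of the kind of agreement measures listed above, average pairwise percent agreement between coders can be computed as follows. The coder names and labels are made up for illustration; the course itself covers more robust measures (e.g., chance-corrected reliability coefficients):

```python
from itertools import combinations


def pairwise_agreement(codings):
    """Average pairwise percent agreement across coders.

    `codings` maps a coder id to a list of labels, one per coded item,
    with all lists in the same item order.
    """
    coders = list(codings.values())
    pairs = list(combinations(coders, 2))
    total = sum(
        sum(a == b for a, b in zip(c1, c2)) / len(c1)
        for c1, c2 in pairs
    )
    return total / len(pairs)


# Three hypothetical coders labelling four statements
codings = {
    "coder1": ["pos", "neg", "pos", "neg"],
    "coder2": ["pos", "neg", "neg", "neg"],
    "coder3": ["pos", "pos", "pos", "neg"],
}
print(round(pairwise_agreement(codings), 2))  # 0.67
```

Note that raw percent agreement ignores chance agreement, which is one reason the course also discusses dedicated reliability scores and gold-question-based trust measures.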

Participants in this course may also take the virtual introduction to R course. While this may help with processing crowd-coding results, no prior R knowledge is needed.

Core Readings
Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295.
Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication: combining a dictionary approach with crowdcoding. Quality & Quantity, 51(6), 2623-2646.
Horn, A. (2019). Can the online crowd match real expert judgments? How task complexity and coder location affect the validity of crowd‐coded data. European Journal of Political Research, 58(1), 236-247.
Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2021). Automated text classification of news articles: A practical guide. Political Analysis, 29(1), 19-42.

Fee: 430 EUR / Early bird 390 EUR
Seminar Room: D 432
14.03.2022 09:00 to 14:30
15.03.2022 09:00 to 14:30
16.03.2022 09:00 to 14:30
17.03.2022 09:00 to 14:30
18.03.2022 09:00 to 14:30
While the course does not require specialist knowledge, basic knowledge of empirical social research is assumed (e.g., what reliability is and how to work with spreadsheet programs such as Excel). Prior experience with R or Stata is helpful but not expected. Participants should bring their laptops, and it is recommended that they bring additional funds (50 EUR) to run their own crowd-coding tasks. Power sockets will be provided.