komex: Crowd-sourced Text Analysis


COVID-19 pandemic: Please note!
As announced, the in-person course will be held on site. Please read the current information about the coronavirus regulations carefully!

Course Structure
09.00-10.30h: Lecture
10.11-12.30h: Lab / group work
13.30-14.30h: Office hour / independent work (reading, exercises)

The regular teaching room has space for a maximum of 20 participants under the current coronavirus rules.

The five-day in-person course provides you with the skills needed to design and conduct your own crowd-sourced text analysis project. Crowd-sourced text analysis allows for fast, affordable, valid, and reproducible online coding of very large numbers of statements by collecting repeated judgements from multiple (paid) non-expert coders. The course covers key concepts and ideas of crowd-sourced decision-making as well as quality criteria and pitfalls. It focuses on two applications (simple and complex categorization tasks, and the identification of positive and negative sentiment) that can be generalized to many contexts, and supports students in developing their own crowd-sourcing projects.

Teaching sessions include:

  1. Introduction to crowd-sourced text analysis
  2. How to produce high quality data
  3. Application 1: Text categorization
  4. Application 2: Sentiment analysis
  5. Other applications, pitfalls, ethical concerns

Intended Learning Outcomes
At the end of the week, you will:

  • Be able to design, set up, implement, and evaluate your own online crowd-coding data project.
  • Be familiar with the core concepts underlying crowd-sourcing and the ‘wisdom of the crowd’ paradigm, as well as the most recent applications to political science and social science questions.
  • Be able to calculate and critically reflect on different agreement, reliability, and trust scores.
  • Be able to craft and/or find suitable gold questions needed before, during, and after the analysis.
  • Be aware of potential pitfalls and limitations of crowd-coding and the best ways to address them.
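The scores mentioned above can be illustrated with a minimal sketch. The example below is purely hypothetical (coder names, labels, and gold answers are invented for illustration): it computes average pairwise percent agreement between coders, aggregates repeated judgements by majority vote, and derives a simple trust score from gold questions with known answers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: three crowd coders label five statements
# as positive (1) or negative (0) sentiment.
codings = {
    "coder_A": [1, 0, 1, 1, 0],
    "coder_B": [1, 0, 0, 1, 0],
    "coder_C": [1, 1, 1, 1, 0],
}

# Gold questions: item index -> known correct label.
gold = {0: 1, 4: 0}

def pairwise_agreement(codings):
    """Average share of items on which each pair of coders agrees."""
    shares = [
        sum(a == b for a, b in zip(x, y)) / len(x)
        for x, y in combinations(codings.values(), 2)
    ]
    return sum(shares) / len(shares)

def majority_vote(codings):
    """Aggregate repeated judgements into one label per item."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*codings.values())]

def trust_score(labels, gold):
    """Share of gold questions a coder answered correctly."""
    return sum(labels[i] == answer for i, answer in gold.items()) / len(gold)

print(round(pairwise_agreement(codings), 2))       # 0.73
print(majority_vote(codings))                      # [1, 0, 1, 1, 0]
print(trust_score(codings["coder_B"], gold))       # 1.0
```

In practice, crowd-sourcing platforms report similar statistics automatically, and more refined reliability measures (e.g., chance-corrected coefficients) would replace raw percent agreement.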

Participants in this course may also attend the virtual Introduction to R course. While this may help with processing crowd-coding results, no prior R knowledge is needed.

Core Readings
Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295.
Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication: combining a dictionary approach with crowdcoding. Quality & Quantity, 51(6), 2623-2646.
Horn, A. (2019). Can the online crowd match real expert judgments? How task complexity and coder location affect the validity of crowd‐coded data. European Journal of Political Research, 58(1), 236-247.
Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2021). Automated text classification of news articles: A practical guide. Political Analysis, 29(1), 19-42.

Bildungszeit (can be claimed by employees in Baden-Württemberg) 
The requirements of the Baden-Württemberg Bildungszeitgesetz (Educational Leave Act) are met.
Fee
430 EUR / Early bird 390 EUR
ECTS Credits 
Seminar Room 
D 432
14.03.2022 09:00 to 14:30
15.03.2022 09:00 to 14:30
16.03.2022 09:00 to 14:30
17.03.2022 09:00 to 14:30
18.03.2022 09:00 to 14:30
While the course does not require specific knowledge, basic knowledge of empirical social research is assumed (e.g., what reliability is and how to work with spreadsheet programs such as Excel). Prior experience with R or Stata is helpful, but not expected. Participants should bring their laptop, and it is recommended that they bring some additional funds (50 EUR) to run their own crowd-coding tasks. Power outlets will be provided.