Microcredential komex: Crowdsourced Text Analysis


Learn a fast, cheap, replicable and scalable approach to gather data that fits your research question, not vice versa, in a way that also allows you to code even subtle and implicit differences in meaning.

What Is This Course About?
This three-day in-person course provides you with the skills needed to design and conduct a crowdsourced data-gathering project. Crowdsourced text analysis allows for fast, affordable, valid, and reproducible online coding of very large numbers of statements by collecting repeated judgements from multiple (paid) non-expert coders. The course covers key concepts and ideas of crowdsourced decision-making as well as quality criteria and pitfalls. You will learn how to carry out simple and complex categorization tasks and how to identify positive and negative sentiment (for instance with regard to actors, processes, institutions, or arguments). What you learn can be generalized to many research contexts and will allow you to develop and execute your own crowdsourcing project(s).
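To give a flavour of what "collecting repeated judgements" means in practice, here is a minimal Python sketch that combines several coders' labels per statement by simple majority vote. The data layout, the function name `aggregate_by_majority`, and the tie rule are illustrative assumptions, not the procedure taught in the course; real projects often weight coders or use other aggregation schemes.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(judgements):
    """Return the most frequent label per statement.

    `judgements` is an iterable of (statement_id, coder_id, label) tuples
    (a hypothetical layout chosen for this sketch). Ties between the two
    most frequent labels are left unresolved and reported as None.
    """
    labels_by_statement = defaultdict(list)
    for statement_id, _coder_id, label in judgements:
        labels_by_statement[statement_id].append(label)

    aggregated = {}
    for statement_id, labels in labels_by_statement.items():
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            aggregated[statement_id] = None  # tie: needs more judgements
        else:
            aggregated[statement_id] = ranked[0][0]
    return aggregated


# Made-up example data: three judgements for s1, two for s2.
judgements = [
    ("s1", "c1", "positive"),
    ("s1", "c2", "positive"),
    ("s1", "c3", "negative"),
    ("s2", "c1", "negative"),
    ("s2", "c2", "positive"),
]
aggregated = aggregate_by_majority(judgements)
```

In this toy data, statement `s1` has a clear majority, while `s2` is tied and would need additional judgements.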

Learning Goals
After this course you will:

  • Be able to build your own data, even if other methods are not able to capture the relevant nuances.
  • Be able to design, set up, implement, and evaluate your own online crowd-coding data project.
  • Be familiar with the core concepts underlying crowdsourcing and the ‘wisdom of the crowd’ paradigm, as well as the most recent applications to social science questions.
  • Be able to calculate and critically reflect on different agreement, reliability, and trust scores.
  • Be able to craft and/or find suitable gold questions needed before, during, and after the analysis.
  • Be aware of potential pitfalls and limitations of crowd-coding and the best ways to address them.
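
As a rough illustration of the scores and gold questions mentioned in the learning goals, the following Python sketch computes a per-coder trust score and a simple pairwise agreement rate. It assumes that a "trust score" is the share of gold questions a coder answered correctly and that agreement is plain percent agreement; crowdsourcing platforms and the course itself may define these measures differently, and all names and data here are hypothetical.

```python
def trust_scores(gold_answers, coder_answers):
    """Share of gold questions each coder answered correctly.

    gold_answers:  {question_id: correct_label}
    coder_answers: {coder_id: {question_id: label}}
    Coders who saw no gold question get no score.
    """
    scores = {}
    for coder_id, answers in coder_answers.items():
        graded = [q for q in answers if q in gold_answers]
        if not graded:
            continue
        correct = sum(answers[q] == gold_answers[q] for q in graded)
        scores[coder_id] = correct / len(graded)
    return scores


def percent_agreement(answers_a, answers_b):
    """Share of commonly coded items on which two coders chose the same label."""
    shared = set(answers_a) & set(answers_b)
    if not shared:
        return None
    return sum(answers_a[q] == answers_b[q] for q in shared) / len(shared)


# Made-up example: two gold questions (g1, g2) and one regular item (s1).
gold = {"g1": "positive", "g2": "negative"}
coders = {
    "c1": {"g1": "positive", "g2": "negative", "s1": "positive"},
    "c2": {"g1": "positive", "g2": "positive", "s1": "negative"},
}
scores = trust_scores(gold, coders)
agreement = percent_agreement(coders["c1"], coders["c2"])
```

Percent agreement does not correct for chance agreement, which is one reason to compare it critically with other reliability measures.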

Assignments for the Course
All the assignments will be carried out with the help of the instructors:

  • Finding or devising a coding task that speaks to your research question
  • Setting up coding instructions
  • Preparing questions to select and evaluate crowd-coders
  • Uploading the crowd-coding job and collecting the results


  • 28.02.2024: 14:00-17:00h – Teaching / 17:00h – Office hours
  • 29.02.2024: 09:00-12:00h & 14:00-17:00h – Teaching
  • 01.03.2024: 09:00-12:00h – Teaching / 13:00h – Office hours
  • 29.02.2024: Course dinner

Recommended Readings for the Course

  • Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016).
    Crowd-sourced text analysis: Reproducible and agile production of political data.
    American Political Science Review, 110(2), 278-295.
  • Horn, A. (2019). Can the online crowd match real expert judgments? How task complexity and coder location affect the validity of crowd‐coded data. European Journal of Political Research, 58(1), 236-247.
  • Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication:
    combining a dictionary approach with crowdcoding. Quality & Quantity, 51(6),

Who Are Your Instructors?
Alexander Horn is Head of the Emmy Noether Research Group "Varieties of Egalitarianism", University of Konstanz/Cluster "The Politics of Inequality", Germany. Previously, he served as Assistant Professor at Aarhus University and John F. Kennedy Memorial Fellow at the Center for European Studies at Harvard. His methods expertise is in crowdsourcing, crowd-coding, and text analysis. His publications on online crowd-coding, content validity, and the measurement of political text include "Peeping at the corpus – What is really going on behind the equality and welfare items of the Manifesto project?" and "Can the online crowd match real expert judgments? How task complexity and coder location affect the validity of crowd-coded data".
X @_Alex_Horn

Sergio E. Zanotto is a Ph.D. candidate in Linguistics and an Independent Doctoral Fellow at the Cluster of Excellence "The Politics of Inequality", University of Konstanz. His research focuses on the automatic analysis of Italian political discourse. His expertise lies in rule-based corpus linguistics and natural language processing techniques for text analysis. He is proficient in several programming languages, including R and Python. His projects deal with building data resources for training language models and evaluating language features for studying the language of politics.
X @sergio_zanotto

Bildungszeit (educational leave; can be claimed by employees in Baden-Württemberg)
The requirements of the Baden-Württemberg Educational Leave Act (Bildungszeitgesetz) are met
270 EUR / Early bird 220 EUR / Please note: you will gain access to our learning management system, Moodle, only after you have paid your course fee
ECTS Credits 
Contact for Questions 
28.02.2024 (All day)
29.02.2024 (All day)
01.03.2024 (All day)
3 study days
This course presumes basic knowledge of quantitative research design and methods, as is usually obtained through a social science MA degree.