Microcredential komex: Social Data Science with Python


A crash course on how to use Python and textual web content for social data science: from data collection to analysis.

What Is This Course About?
Large-scale data from web and social media platforms, combined with computational methods, has been described as a revolution for the quantitative social sciences. This course serves as an entry point to this new methodology by focusing on web and social media data collection and analysis with Python. The course will provide an in-depth exploration of data collection via Application Programming Interfaces (APIs) and web scraping for platforms like Wikipedia, Google Trends, Reddit, and YouTube. We will then cover how participants can preprocess, visualize, and analyze this data using basic machine learning and text mining.

Learning Goals

  • Get an idea of the data science workflow.
  • Get an overview of some important libraries for data analysis.
  • Gain a basic background in finding, getting, and wrangling data from the web, as well as in data visualization, machine learning, and text mining.

Assignments for the Course
We will have two types of assignments:

  • Daily in-class exercises to be solved individually or in groups
  • Group projects in which participants conceptualize a research project and collect and analyze social data, followed by individual written reports


Schedule

  • 10:00-12:15h: Teaching.
  • 13:30-15:45h: Teaching.
  • 16:00-17:00h: Office hour.

Recommended Readings for the Course

  • Li, F., Zhou, Y., & Cai, T. (2021). Trails of data: Three cases for collecting web information for social science research. Social Science Computer Review, 39(5), 922-942. doi:10.1177/0894439319886019
  • Nyhuis, D. (2021). Application programming interfaces and web data for social research. In Handbook of Computational Social Science, Volume 2. Routledge. doi:10.4324/9781003025245-4
  • Hovy, D. (2020). Text analysis in Python for social scientists: Discovery and exploration. Cambridge University Press.

Who Are Your Instructors?
David Garcia: David Garcia has been Professor of Social and Behavioural Data Science at the University of Konstanz since 2022 and is a faculty member of the Complexity Science Hub Vienna. He is an expert in Computational Social Science and investigates human behavior through digital traces with methods from complexity science. By analyzing big social data with computational modeling, he aims to understand the impact of online media and social media networks on individuals and society (e.g. inequalities, data privacy). You can find more about his research at http://dgarcia.eu and on X @dgarcia_eu.

Indira Sen: Indira Sen is a postdoc at the Political Science department of the University of Konstanz. Her research is about understanding and characterizing the measurement quality of social science constructs, like political attitudes and abusive content, from digital traces. She works with NLP and measurement theory. You can reach her at @indiiigosky on X or https://indiiigo.github.io/.

Bildungszeit (educational leave; can be claimed by employees in Baden-Württemberg)
The requirements of the Baden-Württemberg Educational Leave Act (Bildungszeitgesetz) are met.
Course fee: 460 EUR / Early bird 390 EUR. Please note: you will gain access to our learning management system Moodle only after you have paid your course fee.
ECTS Credits 
Contact for Questions 
Course dates: 26.02.2024, 27.02.2024, 28.02.2024, 29.02.2024, and 01.03.2024 (5 study days)
This course presumes introductory Python knowledge. Participants who are inexperienced with Python are advised to first attend the short ekomex course “Introduction to Python”.