Abstract: Brentel, I., & Winters, K. (2018, March 1). A Case Study in Large Scale Variable Harmonization. General Online Research 2018 (GOR18), German Society for Online Research. TH Köln, Köln. [Tandem 1]

Relevance & Research Question: One of the NRW-Innovativ projects aims to fill a gap in communication studies by creating a harmonized longitudinal dataset on media use in Germany since 1954, based on the Media-Analysis data. The project's relevance lies in making large-scale media-use data accessible for academic research while meeting high standards of data documentation. The research question is therefore: how can the Media-Analysis data, as big data, be made accessible for academic research in a transparent way?

Methods & Data: This paper presents the theoretical and practical challenges of the project, as well as the use of the digital harmonization software CharmStats. The goal of the harmonization was to create a Scientific Use File that sets high documentation standards with the help of CharmStats, and to continue the harmonization already carried out up to 2009. Using CharmStats, we review the challenges encountered and the solutions developed, as a case study in large-scale data harmonization. The project yields two harmonized datasets, each with more than 1.5 million cases and almost 30,000 variables, covering over 60 years for press media and almost 40 years for radio. This makes the Media-Analysis data the largest dataset on media use in Germany available to academics.

Results: The project aims to make the complex process of harmonizing large-scale data as transparent and replicable as possible. CharmStats helps fulfil the project's goals because it produces syntax for data harmonization together with a documentation report. In the presentation, we will outline the stages involved in reaching the project's goals and answering the research question:

  1. Finding a structure to work with
  2. Setting standards for data documentation with CharmStats
  3. Producing a harmonized dataset
  4. Making the dataset replicable and, moreover, an accessible and sustainable source for academic research through the Library of Online Harmonization (scheduled for release in 2019)

Added Value: The methodological approach of this project can serve as a use case for documenting and harmonizing big data for academic research.