Maryland Population Research Center

New Data Sources in Social Science Research: Things to Know Before Working With Reddit Data: Social media are becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from those with which many social scientists are used to working, so the assumptions often used to plan and manage a project may no longer hold. For example, social media data may be too large to process on a single machine, they come in file formats with which many researchers are unfamiliar, and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time, as much of it is gained through word of mouth and experience. In this article, we attempt to document several challenges and opportunities encountered when working with Reddit, a popular social media site and the self-proclaimed “front page of the Internet.” Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this article is specific to Reddit, researchers may also view it as a list of the types of information one may seek to acquire prior to conducting a project that uses any type of social media data.
Located in MPRC People / Frauke Kreuter, Ph.D. / Frauke Kreuter Publications
Frauke Kreuter featured in The Baltimore Sun on New Data Collection on COVID-19 with Facebook: Faculty at the University of Maryland have been working with Facebook to design a worldwide survey aimed at collecting coronavirus data during the global pandemic.
Tree-based Machine Learning Methods for Survey Research: Predictive modeling methods from the field of machine learning have become a popular tool across various disciplines for exploring and analyzing diverse data. These methods often do not require specific prior knowledge about the functional form of the relationship under study and are able to adapt to complex non-linear and non-additive interrelations between the outcome and its predictors while focusing specifically on prediction performance. This modeling perspective is beginning to be adopted by survey researchers in order to adjust or improve various aspects of data collection and/or survey management. To facilitate this strand of research, this paper (1) provides an introduction to prominent tree-based machine learning methods, (2) reviews and discusses previous and (potential) prospective applications of tree-based supervised learning in survey research, and (3) exemplifies the usage of these techniques in the context of modeling and predicting nonresponse in panel surveys.
Located in MPRC People / Frauke Kreuter, Ph.D. / Frauke Kreuter Publications
Katharine Abraham comments on Misleading Economic Data during COVID-19 in The New York Times: The tools we have to understand what is happening to the economy are becoming distorted or harder to interpret.
Predicting Voting Behavior Using Digital Trace Data: A major concern arising from ubiquitous tracking of individuals’ online activity is that algorithms may be trained to predict sensitive personal information, even for users who do not wish to reveal such information. Although previous research has shown that digital trace data can accurately predict sociodemographic characteristics, little is known about the potential of such data to predict sensitive outcomes. Against this background, we investigate in this article whether we can accurately predict voting behavior, which is considered sensitive personal information in Germany and subject to strict privacy regulations. Using records of web browsing and mobile device usage of about 2,000 online users eligible to vote in the 2017 German federal election combined with survey data from the same individuals, we find that online activities do not predict (self-reported) voting well in this population. These findings add to the debate about users’ limited control over (inaccurate) personal information flows.
Located in MPRC People / Frauke Kreuter, Ph.D. / Frauke Kreuter Publications
Trust and cooperative behavior: Evidence from the realm of data-sharing: Trust is praised by many social scientists as the foundation of functioning social systems owing to its assumed connection to cooperative behavior. The existence of such a link is still subject to debate. In the present study, we first highlight important conceptual issues within this debate. Second, we examine previous evidence, highlighting several issues. Third, we present findings from an original experiment, in which we tried to identify a “real” situation that allowed us to measure both trust and cooperation. People’s expectations and behavior when they decide to share (or not) their data represent such a situation, and we make use of corresponding data. We found that there is no relationship between trust and cooperation. This non-relationship may be rationalized in different ways, which in turn provides important lessons for the study of the trust-behavior nexus beyond the particular situation we study empirically.
Located in MPRC People / Frauke Kreuter, Ph.D. / Frauke Kreuter Publications
Change Through Data: A Data Analytics Training Program for Government Employees: From education to health to criminal justice, government regulation and policy decisions have important effects on social and individual experiences. New data science tools applied to data created by government agencies have the potential to enhance these meaningful decisions. However, certain institutional barriers limit the realization of this potential. First, we need to provide systematic training of government employees in data analytics. Second, we need a careful rethinking of the rules and technical systems that protect data in order to expand access to linked individual-level data across agencies and jurisdictions, while maintaining privacy. Here, we describe a program that has been run for the last three years by the University of Maryland, New York University, and the University of Chicago, with partners such as Ohio State University, Indiana University/Purdue University, Indianapolis, and the University of Missouri. The program, which trains government employees on how to perform applied data analysis with confidential individual-level data generated through administrative processes and includes extensive project-focused work, provides both online and onsite training components. Training takes place in a secure environment. The aim is to help agencies tackle important policy problems by using modern computational and data analysis methods and tools. We have found that this program accelerates the technical and analytical development of public sector employees. As such, it demonstrates the potential value of working with individual-level data across agency and jurisdictional lines. We plan to build on this initial success by creating a larger community of academic institutions, government agencies, and foundations that can work together to increase the capacity of governments to make more efficient and effective decisions.
Located in MPRC People / Frauke Kreuter, Ph.D. / Frauke Kreuter Publications

Search results
