BIGDATA Collaborative Research: IA, Population Reproduction of Poverty at Birth from Surveys, Censuses, and Birth Registrations.

Michael S. Rendall, Principal Investigator, University of Maryland - MPRC; Mark Handcock, Co-Principal Investigator, University of California Los Angeles; Margaret Weden, Co-Principal Investigator, RAND

Project Abstract

The potential for upward mobility from one generation to the next is fundamental to the wellbeing of a democratic society, and to that society’s long-term economic productivity. Poverty at birth is a condition affecting more than 1 in 4 American children. It is a critical barrier to children’s subsequent development and life chances, and therefore to breaking intergenerational cycles of disadvantage. Both mother’s and father’s characteristics contribute to the chance that a child will be born into poverty. It is therefore important to incorporate both the mother and the father into the scientific modeling of the experience of poverty at birth from one generation to the next. Existing data and methods do not allow for this modeling in representative samples of the U.S. population, especially as it evolves into an increasingly multi-ethnic society. Limited data are collected on the characteristics of every birth in the United States. Limited additional data on all individuals and their households are collected every 10 years in the Census. Larger amounts of data per individual and household are collected every year from random samples that are representative of the U.S. population. These samples include the large-scale American Community Survey and numerous medium- and small-scale sample surveys. This research develops and evaluates statistical estimation and simulation methods to combine data from all these sources to answer questions about intergenerational mobility. The research will answer questions about the degree of persistence of poverty at birth from one generation to the next across different race and ethnic groups, and about the roles of education and family formation in creating upward mobility versus persistence of disadvantage. Open-source, user-friendly software for the statistical methods developed in the project will be made available to researchers. The project also develops graduate students’ skills in Big Data methods in statistics and the social sciences.

The project’s goal is to develop a transformative, Big Data approach to exploiting rich “traditional” data sources to build social-scientific theory in a statistically rigorous and empirically comprehensive way. In no single nationally representative data source is poverty at birth observed for the mother-father-child triad. The triad’s poverty statuses at each one’s birth are instead linked in a model that simulates four connected processes: (1) educational progress predicted from the birth conditions of household poverty and parents’ education, race, ethnicity, and immigrant statuses; (2) couple formation and dissolution; (3) couple fertility and unpartnered women’s fertility; and (4) household poverty when their own children are born. The “Big Data” of this project consist of more than 100 million births, combined with census, microcensus, large-scale cross-sectional survey, and medium-scale and smaller-scale longitudinal survey data sources that together include millions of years of exposure to schooling, to partner-matching, and to partnered and (co-residentially) unpartnered births. Unlike in many Big Data applications, the study’s multiple data sources are all either complete enumerations or probability samples of a well-specified population. The project develops combined-survey and combined population-and-survey estimation methods to estimate with enhanced precision the individual behavioral parameters of the process, and develops a simulation-modeling approach to generating inference about causal associations that emerge from the four connected processes, integrating the multiple sources of uncertainty about each component process. Results and methodological advances will be disseminated to the scholarly community through presentations and peer-reviewed journal articles. Additional, broader dissemination will occur through project investigators’ contact with the news media and through other forums for engagement with the broader policy community about the project results and their significance.
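
As a rough illustration of why combining data sources can yield "enhanced precision", the sketch below pools two independent estimates of the same quantity by inverse-variance weighting. This is a textbook technique chosen for illustration only; the abstract does not say which estimators the project uses. The names `Estimate` and `pool_inverse_variance` and all numbers are hypothetical, not project data or results.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float     # point estimate, e.g. a proportion
    variance: float  # sampling variance of that estimate


def pool_inverse_variance(estimates):
    """Pool independent estimates of one quantity, weighting each by 1/variance."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / e.variance for e in estimates]
    total = sum(weights)
    value = sum(w * e.value for w, e in zip(weights, estimates)) / total
    # The pooled variance is the reciprocal of the summed weights, so it is
    # never larger than the variance of the most precise single source.
    return Estimate(value=value, variance=1.0 / total)


# Hypothetical inputs: a large survey (small variance) and a small one.
large_survey = Estimate(value=0.26, variance=0.0001)
small_survey = Estimate(value=0.30, variance=0.0004)

pooled = pool_inverse_variance([large_survey, small_survey])
print(pooled)
```

The pooled estimate sits closer to the more precise source, and its variance is smaller than that of either input, which is the basic sense in which adding a second probability sample improves precision. This simple form assumes the sources are independent and measure the same target quantity.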