This document summarizes the results of survey experiments conducted in Tanzania and Mali to evaluate how different survey design choices, such as questionnaire structure, use of proxy respondents, and definitions of household, can impact measurements of gender outcomes. The experiments found significant differences in reported labor force participation, employment status, income, and other variables depending on the survey design. The implications are that non-random measurement error can increase variance and bias estimates, particularly for discrete variables important for policymaking. Consistency in survey design is important for accurate measurement over time.
Survey Design Choices: Implications for Measuring Gender Outcomes
1. Survey Design Choices: Implications for Measuring Gender Outcomes
Andrew Dillon, IFPRI. April 2010. Tool Pool Presentation.
Based on work with Elena Bardasi (WB), Lori Beaman (Northwestern), Kathleen Beegle (WB), Pieter Serneels (UEA)
2. Motivation
Emerging literature using survey experiments illustrates that survey design matters both to the levels of variables reported from surveys and to the economic relationships estimated with the data.
Gender differences may be particularly important and have motivated much of this work.
Builds on data quality work of the PSID, US Census, World Bank LSMS, ILO.
3. Motivation
Household definitions used in multi-topic household surveys vary between surveys.
Typical definitions involve one or more of the following: a residency requirement; acknowledgement of a household head; consumption from a “common pot” or “common granary”; joint income decisions (agricultural production, pastoralism, small enterprise).
There are ambiguities within each component.
Which components are appropriate may vary by survey topic and context.
5. Motivation
Comparison of labor statistics calculated from the household roster "main activity" question and from the module on hours spent in specific activities. Malawi IHS2 2004/5.
Roster = main activity reported in HH roster; Activities = activities section.
                                          Men                     Women
                                          Roster    Activities    Roster    Activities
Farmer (a)                                47.9      59.3          57.8      64.1
Employee (b)                              15.0      15.8          3.4       3.8
Employee (including casual worker) (c)    15.0      30.2          3.4       12.8
Inactive (d)                              25.8      16.8          33.2      25.0
Note: The reference period for both modules is the last 7 days.
6. Motivation
Possible reasons for the difference in measured outcomes across two surveys:
Sampling (intended or implemented)
Timing of measurement (e.g. seasonality)
Survey instrument: recall period; wording of questions (Campanelli et al., 1989); sequence of questions (Martin and Polivka, 1995); detail of questionnaire (Kalton et al., 1982)
Respondent type: self or proxy (Anker, 1983)
7. Motivation
How much variation over time, across countries and across surveys is due to non-random measurement error?
Stylized facts are important to policy makers and donors (e.g. the MDGs).
Researchers care about bias in the estimation of economic relationships.
Nonrandom measurement error increases the variance of estimates for continuous variables, but does not necessarily bias them. With small sample sizes, statistical power is diminished.
For discrete variables, point estimates are biased. Many outcomes that we care about are discrete, including eligibility requirements, LFP, asset ownership and poverty status.
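The contrast between continuous and discrete outcomes can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: it uses the simplest case of purely random reporting error in the dependent variable, and the regressor, coefficients, noise levels and the function names `ols_slope` and `simulate` are all hypothetical. With a continuous outcome the estimated slope should stay near its true value while becoming noisier; with a binary outcome, misclassification at a symmetric rate q shrinks the slope toward zero by a factor of (1 - 2q).

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=200_000, flip_prob=0.15, seed=0):
    """Compare slopes estimated from true and misreported outcomes (all values hypothetical)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]  # hypothetical binary regressor

    # Continuous outcome with true slope 2.0, then extra zero-mean reporting noise.
    y_cont = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
    y_cont_noisy = [yi + rng.gauss(0, 2) for yi in y_cont]

    # Binary outcome with P(y = 1 | x) = 0.4 + 0.3 * x, so the true slope is 0.3.
    y_bin = [1 if rng.random() < 0.4 + 0.3 * xi else 0 for xi in x]
    # Each report is flipped with probability flip_prob (random misclassification).
    y_bin_noisy = [1 - yi if rng.random() < flip_prob else yi for yi in y_bin]

    return {
        "continuous_true": ols_slope(x, y_cont),
        "continuous_noisy": ols_slope(x, y_cont_noisy),
        "binary_true": ols_slope(x, y_bin),
        "binary_noisy": ols_slope(x, y_bin_noisy),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name}: {slope:.3f}")
```

Under these assumptions the misreported binary slope should settle near 0.3 × (1 − 2 × 0.15) = 0.21, while the misreported continuous slope stays close to 2.0.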
8. Two Survey Experiments Introduce Randomized Variation in Survey Design Choices
Household definitions in Mali, to look at the effect on household composition, assets, consumption expenditures, agricultural production and input use.
Questionnaire structure (inclusion of screening questions) and use of proxy respondents in labor modules in Tanzania.
9. Experimental design: Labor Modules
Sample size: 1,344 households from SHWALITA, in 7 districts throughout Tanzania.
Satisfactory degree of “randomness” across groups.
                                Treatment 1: Questionnaire design
Treatment 2: Respondent type    Detailed (with screening questions)    Short
Self-reporting                  Detailed, Self-reporting               Short, Self-reporting
Proxy                           Detailed, Proxy                        Short, Proxy
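A two-by-two design like this one can be sketched in a few lines. The slides do not describe the actual randomization procedure, so the unit of assignment, the balanced round-robin allocation, the seed and the names `ARMS` and `assign_arms` are all assumptions made for illustration.

```python
import random
from collections import Counter

# The four arms formed by crossing questionnaire design with respondent type.
ARMS = [
    ("detailed", "self"),
    ("short", "self"),
    ("detailed", "proxy"),
    ("short", "proxy"),
]


def assign_arms(unit_ids, seed=0):
    """Shuffle the units, then deal them round-robin into the four arms."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    return {uid: ARMS[i % len(ARMS)] for i, uid in enumerate(shuffled)}


# Hypothetical household identifiers 0..1343, matching the reported sample size.
assignment = assign_arms(range(1344))
counts = Counter(assignment.values())

if __name__ == "__main__":
    for arm, n in counts.items():
        print(arm, n)
```

Dealing shuffled units round-robin keeps the arms the same size, which is one simple way to get balanced groups; other schemes (for example, stratifying by district) would be equally plausible.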
10. Experimental Design: Household Definitions
Roughly 250 HHs per definition.
Definition keywords (rows: Common Food Sharing; columns: Common Agricultural Activities/Income Sharing)
                                  No Ag/Income Sharing keywords    Ag/Income Sharing keywords
No Food Sharing keywords          Definition 1                     Definition 3
Food Sharing keywords included    Definition 2                     Definition 4
12. Results: LFP and hours worked
                                 A.                             B.                             C.
                                 Short    Detailed  Diff        Proxy    Self-rep  Diff        Short Proxy  Other    Diff
Labor force participation (%)
  Men                            82.4     83.0      -0.6        74.3     87.3      -13.0***    74.1         84.6     -10.5***
  Women                          69.9     77.0      -7.2***     68.4     76.5      -8.1***     64.6         75.6     -11.0***
Weekly hours last week
  Men                            30.0     27.7      2.3**       24.5     31.3      -6.9***     25.1         29.7     -4.6***
  Women                          22.3     23.0      -0.8        20.3     24.2      -4.2***     19.4         23.4     -4.0**
Daily earnings (Tshillings)
  Men                            541      662       -121        637      580       57          471          628      -157
  Women                          198      384       -187**      271      306       -35         80           342      -262**
Notes: *** indicates statistically significant mean differences at 1%, ** at 5%, * at 10%. Samples for weekly hours and daily earnings are conditional on any wage work in the last 7 days (they exclude zeros).
13. Differences in Sectoral Distribution by Gender
                      Men                               Women
A. Short or Detailed
                      Short     Detailed  Diff          Short     Detailed  Diff
Main activity^
  Agriculture         58.6      59.0      -0.4          60.1      65.7      -5.6**
  Other sectors       23.8      24.0      -0.2          9.6       11.4      -1.8
  Domestic Duties     7.9       2.2       5.7***        18.8      2.4       16.4***
  No work             9.7       14.8      -5.1***       11.3      20.5      -9.2***
  N                   723       688                     750       784
B. Proxy or Self-report
                      Proxy     Self-rep  Diff          Proxy     Self-rep  Diff
Main activity^
  Agriculture         53.8      61.6      -7.8***       59.8      64.8      -5.1**
  Other sectors       20.5      25.7      -5.2**        8.5       11.6      -3.1*
  Domestic Duties     7.8       3.6       4.1***        13.5      8.7       4.8***
  No work             17.9      9.0       8.9***        18.1      14.8      3.2**
  N                   502       909                     564       970
C. Short proxy or not
                      Short,    Other     Diff          Short,    Other     Diff
                      Proxy                             Proxy
Main activity^
  Agriculture         55.4      59.6      -4.2          56.8      64.4      -7.5***
  Other sectors       18.7      25.0      -6.3**        7.4       11.2      -3.8**
  Domestic Duties     11.6      3.7       7.8***        23.5      7.4       16.0***
  No work             14.3      11.7      2.6           11.9      16.9      -5.0**
  N                   251       1,160                   285       1,249
Notes: Other sectors are specifically listed on the questionnaire and include mining/quarrying, manufacturing/processing, gas/water/electricity, construction, transport, trading, personal services, education/health, public administration, and other. *** indicates statistically significant mean differences at 1%, ** at 5%, * at 10%. ^ Within group, the percentages sum to 100.
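To get a feel for how a difference in reported rates translates into a significance level, the sketch below runs a textbook two-proportion z-test on men's labor force participation under proxy versus self-report (74.3% versus 87.3%). It assumes the group sizes reported with the sectoral distribution (502 proxy, 909 self-report) also apply to the participation figures, and it ignores clustering and any regression adjustment, so it is only an approximation and not the test behind the reported stars. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value


# Men's LFP: proxy 74.3% (assumed N = 502) vs self-report 87.3% (assumed N = 909).
diff, z, p_value = two_proportion_z(0.743, 502, 0.873, 909)

if __name__ == "__main__":
    print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.2g}")
```

A design-aware analysis would typically use standard errors clustered at the household or village level, which would widen the interval relative to this simple calculation.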
14. Labor statistics by proxy-subject characteristics: difference in sex
                                 Mean      Proxy-subject gender interactions
                                 (1)       M-F (2)   F-F (3)   Diff.       F-M (4)   M-M (5)   Diff.
Labor force participation (%)    71.2      73.1      60.7      12.4***     77.7      65.2      12.5***
Weekly hours last week           22.1      22.0      16.7      5.4***      26.5      18.9      7.6***
Daily earnings (Tshillings)      444       307       213       94          740       360       382
N                                1,066     350       214                   367       135
Notes: *** indicates statistical significance at 1%, ** at 5%, and * at 10%. Labor statistics are disaggregated by proxy-subject gender interactions (M-F indicates a male proxy who reports on a female subject, and so on). The t-test conducted is between M-F and F-F in Columns (2) and (3), and between F-M and M-M in Columns (4) and (5). The smaller sample size in this table is due to restricting the sample to proxy responses only.
15. Regression results for labor experiments
Columns: (1) Short v. Detailed; (2) Proxy v. Self-report; (3) Short proxy v. others.
Labor force participation: (1) Lower (women); (2) -; (3) Lower
Working hours: (1) -; (2) Lower; (3) Lower
Income: (1) Lower; (2) -; (3) Lower
Activity distribution: (1) More domestic duties, less ‘no work’, less agric and other sectors (women); (2) More domestic duties, more ‘no work’, less agric and other sectors; (3) More domestic duties, less ‘no work’ (women), less agric (women) and other sectors
Employment status: (1) Less paid employee (men), more self-empl (men), more unpaid family worker (women); (2) Less paid employee (men), less self-employed, more unpaid family worker; (3) Less paid employee, less self-employed, more unpaid family worker
17. Regression results on household composition
Table 3a: HH Size
                                                               Total HH Size:            Age of      Number of      Number of
                                                               Resident for last 6 mo    HH Head     Married Men    Married Women
Def 2: Common Food, Dwelling, Authority                        0.780                     2.13 *      0.212          0.225
                                                               (0.506)                   (1.20)      (0.136)        (0.157)
Def 3: Common Agriculture, Dwelling, Authority                 1.060 **                  2.19 *      0.258 *        0.378 **
                                                               (0.507)                   (1.20)      (0.136)        (0.158)
Def 4: Common Agriculture; Common Food, Dwelling, Authority    0.715                     2.93 **     0.262 *        0.300 *
                                                               (0.507)                   (1.20)      (0.136)        (0.158)
N                                                              1021                      1021        1021           1021
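One way to read a table like this: if the regression contains only indicator variables for Definitions 2 to 4, with Definition 1 as the omitted category, each coefficient equals the difference between that group's mean and the Definition 1 mean. The slides do not list the specification, so the absence of controls is an assumption here, the household sizes below are made-up toy numbers, and `definition_effects` is a hypothetical helper. The standard error shown is the unequal-variance form for a difference in means, which need not match the one used in the table.

```python
import math
from statistics import mean, variance


def definition_effects(sizes_by_def, baseline=1):
    """Mean difference (and its standard error) of each group relative to the baseline."""
    base = sizes_by_def[baseline]
    effects = {}
    for definition, sizes in sizes_by_def.items():
        if definition == baseline:
            continue
        diff = mean(sizes) - mean(base)
        se = math.sqrt(variance(sizes) / len(sizes) + variance(base) / len(base))
        effects[definition] = (diff, se)
    return effects


# Toy household sizes per definition (hypothetical, for illustration only).
toy = {1: [4, 5, 6], 2: [6, 7, 8], 3: [5, 7, 9], 4: [5, 6, 7]}
effects = definition_effects(toy)

if __name__ == "__main__":
    for definition, (diff, se) in sorted(effects.items()):
        print(f"Def {definition}: diff = {diff:.3f}, se = {se:.3f}")
```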
19. Summary of regression results from hh experiments
Columns: (1) Adding consumption requirements; (2) Adding common ag or income generating requirements; (3) Adding both.
HH size: (1) -; (2) Increased household size; (3) -
HH Comp: (1) Increased number of men and women, increased number of people 16-60; (2) Increased number of men and women, increased number of people 16-60; (3) Increased number of men and women, increased number of people 16-60
Assets: (1) Greater farm assets and livestock holdings; (2) Greater farm, nonfarm assets and livestock holdings; (3) -
Consumption: (1) Greater expenditure, greater quantities measured; (2) Greater quantities measured; (3) -
20. Conclusions from HH experiments
It is somewhat surprising that adding definitional “restrictions” increases measured household size and alters household composition.
Keywords may trigger respondents to include or remember a different set of people.
These effects are slightly larger for women.
Since food budget shares, especially grain budget shares, are high (60-80% in Mali), the implications for poverty statistics are large.
Consistency over time in the definitions used in national surveys and in evaluations (baseline and endline studies) is paramount for accurate measurement of policy-relevant variables.
21. Conclusions from Labor Experiments
Proxy design has large and statistically significant deviations for LFP, hours, earnings, and occupational choice statistics. Is the reduced precision in statistics worth the reduced cost of implementation?
Questionnaire design effects depend on the variable of interest.
Short+self-report produces estimates of LFP and sector of work that are the closest to detailed+self-report for men, but not women.
Gender effects smaller than expected on LFP and hours?
Women’s activities are highly sensitive to screening questions.
23. Actual Definitions from HH experiment
Definition 1: A household is composed of the group of people living in the same dwelling space and acknowledge the authority of a man or woman who is the head of household.
Definition 2: A household is composed of the group of people living in the same dwelling space who eat meals together and acknowledge the authority of a man or woman who is the head of household.
24. Actual Definitions from HH experiment
Definition 3: A household is composed of the group of people living in the same dwelling space who have at least one common plot together or one income generating activity together (for example, herding, business or fishing) and acknowledge the authority of a man or woman who is the head of household.
Definition 4: A household is composed of the group of people living in the same dwelling space who eat meals together and have at least one common plot together or one income generating activity together (for example, herding, business or fishing) and acknowledge the authority of a man or woman who is the head of household.
25. Papers
Anker, R. (1983) “Female Labour Force Participation in Developing Countries: A Critique of Current Definitions and Data Collection Methods.” International Labour Review 122(6):709-724.
Bardasi, Elena & Beegle, Kathleen & Dillon, Andrew & Serneels, Pieter (2010). “Do labor statistics depend on how and to whom the questions are asked? Results from a survey experiment in Tanzania,” Policy Research Working Paper Series 5192, The World Bank.
Bardasi, Elena & Beegle, Kathleen & Dillon, Andrew & Serneels, Pieter (2010). “Explaining Variation in Child Labor Statistics,” under review.
Bardasi, Elena & Beegle, Kathleen & Dillon, Andrew & Serneels, Pieter (2010). “Returns to Education,” working paper.
Beaman, Lori and Dillon, Andrew (2010). “Do household definitions matter in survey design? Results from a randomized survey experiment in Mali,” under review.
Christiaensen, L. and J. Hoddinott (2001). “Comparing Village Characteristics Derived from Rapid Appraisals and household surveys: A Tale of Northern Mali.” Journal of Development Studies, vol. 37 (3), pp. 1-20.
Campanelli, P., J. M. Rothgeb, and E. A. Martin (1989) The Role of Respondent Comprehension and Interviewer Knowledge in CPS Labor Force Classification. American Statistical Association Proceedings (Survey Research Methods Section).
Charmes, J. (1998) Women Working in the Informal Sector in Africa: New Methods and New Data. Paris: Scientific Research Institute for Development and Co-operation.
de Mel, S., D. McKenzie, and C. Woodruff (2007) “Measuring Microenterprise Profits: Don't Ask How the Sausage is Made.” Journal of Development Economics 88(1):19-31.
Dixon-Mueller, R., and R. Anker (1988) Assessing Women’s Economic Contributions to Development. Training in Population, Human Resources and Development Planning Paper number 6, Geneva: International Labour Office.
26. Papers
Dillon, Andrew (2009). “Measuring child labor: Comparisons between hours data and subjective measures,” Research in Labor Economics, forthcoming.
Glewwe, P. and M. Grosh (eds) (2000) Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study. Oxford University Press (for the World Bank).
Guarcello, L., I. Kovrova, S. Lyon, M. Manacorda, and F.C. Rosati (2009) “Towards Consistency in Child Labour Measurement: Assessing the Comparability of Estimates Generated by Different Survey Instruments.” Understanding Children's Work Project Draft Working Paper.
Guyer, J. (1981). “Household and Community in African Studies.” African Studies Review, 24 (2/3), pp. 87-128.
Hausman, J. A. (2001) “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left.” Journal of Economic Perspectives 15 (4):57-67.
Hausman, J.A., J. Abrevaya, and F.M. Scott-Morton (1998) “Misclassification of the Dependent Variable in a Discrete Response Setting.” Journal of Econometrics 87: 239-269.
Hill, D. H. (1987) “Response Errors in Labor Surveys: Comparisons of Self and Proxy Reports in the Survey of Income and Program Participation (SIPP).” In Proceedings of the Bureau of Census, Third Annual Research Conference.
Hill, P. (1986). Development Economics on Trial: The Anthropological Case for Prosecution. Cambridge University Press, Great Britain.
Hosegood, V. and I.M. Timaeus (2005). “Household Composition and Dynamics in KwaZulu Natal, South Africa: Mirroring Social Reality in Longitudinal Data Collection.” Pp. 58-77 in African Households: Censuses and Surveys, edited by E. Van de Walle. Armonk, NY: M.E. Sharpe.
27. Papers
Hyslop, D. R. and G. W. Imbens (2001) “Bias from Classical and Other Forms of Measurement Error.” Journal of Business & Economic Statistics 19(4): 475-481.
Kalton, G., and H. Schuman (1982) “The Effect of the Question on Survey Responses: A Review.” Journal of the Royal Statistical Society 145(1):42-57.
Leone, T., E. Coast and S. Randall (2009) “Did you sleep here last night? The impact of the household definition in sample surveys: a Tanzanian case study.” Presentation at British Society for Population Studies Annual Conference, Brighton, 9-11 September 2009.
Martin, E. and A. E. Polivka (1995) “Diagnostics for Redesigning Survey Questionnaires: Measuring Work in the Current Population Survey.” Public Opinion Quarterly 59:547-567.
Martin, J. and B. Butcher (1982) “The Quality of Proxy Information: Some Results from a Large Scale Study.” The Statistician 31:293-319.
Mata Greenwood, A. (2000) Incorporating Gender Issues in Labour Statistics, Geneva: International Labour Office, Bureau of Statistics.
Mathiowetz, N. A. and R. M. Groves (1985) “The Effects of Respondent Rules on Health Survey Reports.” American Journal of Public Health 75(6):639-644.
Moore, J. (1988) “Self/Proxy Response Status and Survey Response Quality: A Review of the Literature.” Journal of Official Statistics 4:155-172.
Podsakoff P.M., S.B. MacKenzie, J-Y Lee, and N.P Podsakoff (2003) “Common Method Biases in Behavioral Research: A Critical Review of the Literature and Recommended Remedies.” Journal of Applied Psychology 88(5):879-903.
Poterba, J. M. and L. H. Summers (1986) “Reporting Errors and Labor Market Dynamics.” Econometrica 54(6):1319-1338.
Tversky A., and D. Kahneman (1981) “The Framing of Decisions and the Psychology of Choice.” Science 211(4481):453-458.
Udry, Christopher (1996) “Gender, Agricultural Production, and the Theory of the Household,” Journal of Political Economy, 104(5), pp. 1010-46.
Editor's Notes
Where does this difference come from?
It may be differences in sampling: the strategy followed, the way sampling was implemented, selection bias.
It may be timing (time of the year, seasonality): Skoufias presents empirical evidence for India showing that
It may have to do with the survey instrument itself:
Recall period: the US Bureau of labor stats changed its longitudinal panel survey frequency from one year to two years and decided to first do an experiment; they found that it had some effects.
Wording of the questions: to find out, Campanelli et al. report on a respondent debriefing study carried out by the US Bureau of stats. Respondents were asked to classify hypothetical situations in terms of their own understanding of labor force concepts like work, job, business, etc. For example, 38% of respondents included non-work activities under the work classification, so wording is important.
Sequence of the questions: following this, Martin and Polivka carried out an experiment to see what the effect of questionnaire wording and sequencing on employment stats was, and found that it had an effect. Those in family business (or farm), casual employment and work compensated by other than pay had the biggest differences. Probing questions were evaluated as useful to detect what they said was underreporting.
Detail of the questionnaire: Kalton et al. provide an overview of factors that may have an effect, and they identify questionnaire detail as one of them.
It may have to do with the respondent: does the respondent answer for him or herself or for someone else (i.e. by proxy)?
If these kinds of biases are important in the US, we expect them to be even more important in developing countries.