Pesquisa:Inquérito crack Brasil/Consistência e acompanhamento

Source: Wikiversidade

Consistency

This page describes some checks that we can develop in order to identify potential problems with the quality of the data as they are collected. Below, there is one section on general things we can look at, and then one section for each part of the survey.

General topics

  • check that the GPS data all appear to be well-formed and in approximately the right place
  • check that we have the information necessary to identify the interviewer (or the Palm handheld the interview came from)
  • check that the dates are well-formed and make sense
  • look at the frequency of 8888 (don't know) and make sure it is not being over-used
  • look at the missingness across all variables (we could come up with a list of variables that should never be missing and see how often they are)
  • compare scale-up estimates from different interviewers within each city and see whether any patterns make sense. This is hard, because interviewers are responsible for different neighborhoods, etc., but we can also do it for known populations whose distribution we expect to be quite homogeneous (like women < 20 who gave birth in the past year)
  • come up with a crude way to hash the interview responses (to a subset of variables) as a way to detect duplicates. Since we receive the data in periodic installments, there is a danger that some administrative slip-up will duplicate observations.
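
The duplicate check in the last item could be sketched roughly as follows. This is only an illustration: the field names in KEY_FIELDS are hypothetical placeholders rather than the survey's real variable names, the sample records are made up, and SHA-256 is just one convenient choice of hash.

```python
import hashlib

# Hypothetical subset of variables used to fingerprint an interview.
KEY_FIELDS = ["interviewer_id", "start_time", "gps_lat", "gps_lon", "q1"]

def fingerprint(record, fields=KEY_FIELDS):
    """Return a SHA-256 digest of the chosen fields of one interview."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(record.get(f, "")) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def find_duplicates(records, fields=KEY_FIELDS):
    """Return lists of record indices that share the same fingerprint."""
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(fingerprint(rec, fields), []).append(i)
    return [idx for idx in groups.values() if len(idx) > 1]

# Made-up records: the third one repeats the first, as could happen if the
# same installment of data were loaded twice.
records = [
    {"interviewer_id": 7, "start_time": "2012-03-01T10:00",
     "gps_lat": -22.9, "gps_lon": -43.2, "q1": 3},
    {"interviewer_id": 8, "start_time": "2012-03-01T11:30",
     "gps_lat": -22.8, "gps_lon": -43.1, "q1": 1},
    {"interviewer_id": 7, "start_time": "2012-03-01T10:00",
     "gps_lat": -22.9, "gps_lon": -43.2, "q1": 3},
]
print(find_duplicates(records))
```

Hashing only a subset of variables means two genuinely different interviews could collide if the subset is too small, so any flagged group should be inspected by hand before records are dropped.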

Household roster

For the household roster, one of the main things we want to examine is whether the selection of the individual within the household seems to be working. This means:

  • check that the number of entries in the household table is the same as the number of reported members of the household
  • check that the person selected to respond is the eligible household member who has the next birthday
  • check that the first entry in the roster has, as the address field, the census block number
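
A minimal sketch of the first two roster checks, under several assumptions: the record layout (reported_size, interview_date, members, eligible, selected, birth_date) is hypothetical, a birthday falling on the interview date is counted as the "next" birthday, and a 29 February birthday is treated as 1 March in non-leap years. The example household is made up.

```python
from datetime import date

def days_until_birthday(birth, today):
    """Days from `today` to the next birthday (0 if the birthday is today)."""
    def birthday_in(year):
        try:
            return birth.replace(year=year)
        except ValueError:  # 29 February in a non-leap year
            return date(year, 3, 1)
    nxt = birthday_in(today.year)
    if nxt < today:
        nxt = birthday_in(today.year + 1)
    return (nxt - today).days

def check_roster(household):
    """Return a list of problems found in one household roster."""
    problems = []
    members = household["members"]
    if len(members) != household["reported_size"]:
        problems.append("roster length differs from reported household size")
    selected = [m for m in members if m["selected"]]
    if len(selected) != 1:
        problems.append("expected exactly one selected respondent")
        return problems
    today = household["interview_date"]
    eligible = [m for m in members if m["eligible"]]
    # Ties (same birthday) resolve to the first roster entry, so tied
    # households may be flagged and need a manual look.
    expected = min(eligible,
                   key=lambda m: days_until_birthday(m["birth_date"], today),
                   default=None)
    if selected[0] is not expected:
        problems.append("selected respondent does not have the next birthday")
    return problems

household = {
    "reported_size": 3,
    "interview_date": date(2012, 5, 10),
    "members": [
        {"birth_date": date(1980, 6, 1), "eligible": True, "selected": False},
        {"birth_date": date(1975, 5, 20), "eligible": True, "selected": True},
        {"birth_date": date(2005, 5, 12), "eligible": False, "selected": False},
    ],
}
print(check_roster(household))
```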

Sociodemographic section

  • check that the length of time residing in the municipality is <= respondent age
  • check that the respondent's sex is recorded by the interviewer

Scale-up section

  • a couple of the questions have sub-questions whose answers cannot exceed the answer to the parent question; for example, q27 >= q28 >= q29 and also q37 >= q38 >= q39
  • look at possible heaping in the responses (though this would not necessarily indicate a problem with the interview). For example, people may report 0, 5, 10 much more frequently than other numbers
  • for the known populations whose totals we have, look at responses against totals (maybe by interviewer?)
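
The first two checks above might look something like this. It is a sketch under assumptions: answers are stored as integers keyed by question name, 8888 is the "don't know" code mentioned in the general topics, and the heaping measure (share of positive answers that are multiples of 5, with zeros left out because many true answers are zero) is just one crude choice.

```python
DONT_KNOW = 8888  # "don't know" code used in this survey

def violates_nesting(record, chain):
    """True if the answers along `chain` ever increase (e.g. q28 > q27)."""
    values = [record.get(q) for q in chain]
    if DONT_KNOW in values or None in values:
        return False  # cannot be checked; count these cases separately
    return any(a < b for a, b in zip(values, values[1:]))

def heaping_share(values, base=5):
    """Share of positive, known answers that are multiples of `base`."""
    usable = [v for v in values if v not in (0, DONT_KNOW, None)]
    if not usable:
        return 0.0
    return sum(v % base == 0 for v in usable) / len(usable)

# Made-up answers for illustration.
record = {"q27": 5, "q28": 3, "q29": 4}
print(violates_nesting(record, ["q27", "q28", "q29"]))
print(heaping_share([0, 5, 10, 3, 8888, 20]))
```

If answers were free of heaping, roughly one in five positive answers would be a multiple of 5, so a share far above 0.2 for an interviewer or a question is worth a closer look.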

Age-sex section

  • check that q1-q5 are consistent
  • q6 == number of entries in the sibling table + 1
  • q7 == number of older siblings according to the sibling table
  • sibling table: check that it is complete and that the skip patterns are followed correctly
  • check that the number of ages listed matches the reported total for men and for women
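
The two equalities involving the sibling table can be written down directly. In this sketch the sibling table is assumed to be a list of entries, each with a hypothetical boolean field is_older; how "older" is actually recorded in the data is not stated on this page.

```python
def check_siblings(respondent, siblings):
    """Compare q6 and q7 with the sibling table; return a list of problems."""
    problems = []
    if respondent["q6"] != len(siblings) + 1:
        problems.append("q6 != number of sibling entries + 1")
    older = sum(1 for s in siblings if s["is_older"])
    if respondent["q7"] != older:
        problems.append("q7 != number of older siblings in table")
    return problems

# Made-up example: q6 agrees with the table, q7 does not.
respondent = {"q6": 3, "q7": 2}
siblings = [{"is_older": True}, {"is_older": False}]
print(check_siblings(respondent, siblings))
```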

Drug use section

  • q3/5/6 - check skip patterns

Interviewer/final section

(Nothing obvious to check here, I don't think. We should add to this part if we think of something.)

Main estimates

  • network size
  • hidden population size
  • estimated vs actual population size for known groups
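
This page does not say which estimator is used for these quantities. As an illustration only, the sketch below assumes the basic known-population scale-up estimator (network size = N × reported known contacts / total size of the known groups; hidden population = N × reported hidden contacts / total network size), ignores survey weights, and back-estimates each known group by leaving it out of the network-size calculation. All numbers are made up.

```python
def network_sizes(known_counts, known_totals, population):
    """Estimated network size of each respondent.

    known_counts: one list per respondent with the number of people they
    report knowing in each known group; known_totals: size of each group.
    """
    total = sum(known_totals)
    return [population * sum(row) / total for row in known_counts]

def population_size(group_counts, degrees, population):
    """Scale-up estimate of a group's size from reports and network sizes."""
    return population * sum(group_counts) / sum(degrees)

def back_estimates(known_counts, known_totals, population):
    """Estimate each known group using only the OTHER known groups."""
    estimates = []
    for j in range(len(known_totals)):
        rest_counts = [row[:j] + row[j + 1:] for row in known_counts]
        rest_totals = known_totals[:j] + known_totals[j + 1:]
        degrees = network_sizes(rest_counts, rest_totals, population)
        reports = [row[j] for row in known_counts]
        estimates.append(population_size(reports, degrees, population))
    return estimates

# Two respondents, two known groups, total population of 1000.
known_counts = [[1, 3], [0, 1]]
known_totals = [10, 40]
degrees = network_sizes(known_counts, known_totals, 1000)
print(degrees)
print(population_size([2, 0], degrees, 1000))
print(back_estimates(known_counts, known_totals, 1000))
```

Comparing the back-estimates with known_totals gives the "estimated vs actual" comparison; large, systematic gaps for a group suggest reporting problems for that group.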

Process data (paradata)

  • length of interviews
  • response rate (we need to be sure that we have the data we need to estimate response rate)
  • number of interviews per interviewer per day
  • are the timestamps within an interview always increasing? In other words, is it possible that interviewers are going through the survey out of order?
  • ensure that no interviews are between midnight and 7am
  • check to see whether or not a suspicious (how to define?) number of interviews come from the same or almost exactly the same location (using gps readings)
  • check that interview times from the same interviewer do not overlap. Overlap should not be possible for the interviewer, but may be introduced during data processing. (Neilane wrote in an e-mail that some interviews from the same interviewer had "the same time".)
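
Three of the timing checks above can be sketched as follows. The data layout is an assumption: timestamps are Python datetime objects, each interview is an (id, start, end) tuple, equal consecutive timestamps are tolerated (the clock resolution may be coarse), and only the start time is tested against the midnight-to-7am window. The example times are made up.

```python
from datetime import datetime

def timestamps_in_order(stamps):
    """True if the timestamps within one interview never go backwards."""
    return all(a <= b for a, b in zip(stamps, stamps[1:]))

def starts_at_night(start, first_hour=0, last_hour=7):
    """True if the interview started between midnight and 7am."""
    return first_hour <= start.hour < last_hour

def overlapping_interviews(interviews):
    """Pairs of interview ids, for ONE interviewer, whose times overlap."""
    ordered = sorted(interviews, key=lambda item: item[1])
    pairs = []
    for i, (id_a, _, end_a) in enumerate(ordered):
        for id_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so nothing later overlaps id_a
            pairs.append((id_a, id_b))
    return pairs

day = (2012, 3, 1)
interviews = [
    ("A", datetime(*day, 10, 0), datetime(*day, 10, 40)),
    ("B", datetime(*day, 10, 30), datetime(*day, 11, 0)),
    ("C", datetime(*day, 11, 0), datetime(*day, 11, 30)),
]
print(overlapping_interviews(interviews))
print(starts_at_night(datetime(*day, 3, 15)))
```

Two interviews with identical start times are reported as overlapping, which covers the "same time" case mentioned above.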

Follow-up

Problems with interpretation of broken questionnaire

Because of a major error in the PDA programming, some interviewers are answering a few questions themselves instead of asking the interviewee. The best way to detect this is to find out who has been recording the same answers to those questions across different interviews.

  • For each interviewer, make a histogram of the answers to the affected questions
    • Perhaps a simple lines-and-dots plot of the sequence of answers would be better than a histogram, because we would be able to see whether an interviewer's behavior changed at some point, and a flat line would still indicate mistaken behavior.
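
As a numeric complement to the plots, one could summarize for each interviewer how dominant the most common answer is and how long the longest unbroken run of identical answers is. This is a sketch, not a replacement for looking at the plots: the field names (interviewer_id, q_affected) are hypothetical, the rows are assumed to be in interview order, and the data are made up.

```python
from collections import Counter, defaultdict
from itertools import groupby

def answers_by_interviewer(rows, question):
    """Collect each interviewer's answers to `question`, in row order."""
    answers = defaultdict(list)
    for row in rows:
        answers[row["interviewer_id"]].append(row[question])
    return dict(answers)

def modal_share(values):
    """Share of answers equal to the most common answer (1.0 = flat line)."""
    return Counter(values).most_common(1)[0][1] / len(values)

def longest_run(values):
    """Length of the longest stretch of identical consecutive answers."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)

rows = [
    {"interviewer_id": 1, "q_affected": 2},
    {"interviewer_id": 2, "q_affected": 1},
    {"interviewer_id": 1, "q_affected": 2},
    {"interviewer_id": 2, "q_affected": 3},
    {"interviewer_id": 1, "q_affected": 2},
    {"interviewer_id": 2, "q_affected": 3},
    {"interviewer_id": 1, "q_affected": 2},
    {"interviewer_id": 2, "q_affected": 5},
]
for who, values in answers_by_interviewer(rows, "q_affected").items():
    print(who, modal_share(values), longest_run(values))
```

A long run that starts partway through an interviewer's sequence corresponds to the change in behavior that the lines-and-dots plot is meant to reveal.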