Open questions

Outdated

This is outdated. Franciela Pedroso.

Haha, I don't know whether this was stray spam or someone involved in the project, but it's true. It is outdated! :P --Solstag (discussão) 08h06min de 27 de Abril de 2012 (UTC)

Rethinking some subpopulations

Blind/Deaf people as known populations?

The long form of the census asks about difficulties with seeing (Q6.14) and difficulties with hearing (Q6.15). I believe that one possible response is 'cannot see/hear at all'. If we consider those people to be blind / deaf, is it possible that we could use these as known populations? I'm not sure what the purpose of these census questions is, or whether I am reading them exactly right. What does everyone think? --Dfeehan 19h13min de 13 de Julho de 2011 (UTC)

Retired by disability count looks too high

Here are the graphs Dennis made of the subpopulation sizes by population and by city (graphs were created about July 1, 2011):

--Msalganik 20h27min de 6 de Julho de 2011 (UTC)

I think we should drop 'aposentados por invalidez' based on this from Ale: "Dimitri's spreadsheet notes that "apenas por UF desagregam por tipo de benefício". That means they only distinguish the kind of benefit received from social security within states, not cities. So the data he is presenting there is for all kinds of social security benefits, which makes sense with the numbers, but doesn't make sense for our research." --Msalganik 20h09min de 6 de Julho de 2011 (UTC)

Perhaps we could look further into the data to confirm what Dimitri said, since more recent data might be available at the city level. But for now I can only agree. --Solstag 20h36min de 6 de Julho de 2011 (UTC)

Bolsa família count looks too high

The website pointed to in the population count spreadsheet contains the correct numbers (families receiving the benefit), but the spreadsheet instead lists the total number of families registered with the government. This is good news in a way, because the correct numbers are readily available; they just weren't in our spreadsheet. --Solstag 20h36min de 6 de Julho de 2011 (UTC)

For Bolsa Familia, if we have the right numbers, can you add them to the spreadsheet and send it back to us? That way Dennis can plot them and see if they are too big. --Msalganik 20h23min de 6 de Julho de 2011 (UTC)
Yes, I'm on it --Solstag 20h36min de 6 de Julho de 2011 (UTC)
Also, questions 6.56 - 6.59 of the long form of the census ask about various social programs, and one of the categories is Bolsa Familia and PETI lumped together. Could this be an alternate source of totals? If we can eventually get microdata from the census, this could be an advantage because we would be able to get the age-sex profiles of the populations. --Dfeehan 19h10min de 13 de Julho de 2011 (UTC)

Public middle school students count a bit high

Although the count is a bit high, for this population we have microdata including gender and age, which seemed to be of interest given McCormick's paper. We also need to update the spreadsheet now that 2010 data has been released. --Solstag 20h36min de 6 de Julho de 2011 (UTC)

Also, it looks like the long-form (the sample/amostra version) of the census has information on this population. In fact, perhaps we can use the census numbers to create a more specific question that would result in a smaller subgroup? Someone who understands Brazil's educational system might be able to figure out how to do this. I'm referring to questions 6.27 - 6.32 of the long-form questionnaire. --Dfeehan 18h59min de 13 de Julho de 2011 (UTC)

Population counts depending on the CENSUS

It looks like we are expecting to get the following from the 2010 census:

  • international emigration (short form)
  • indigenous people (short form)
  • foreigners (long form)
  • IURD religion (long form)
  • widowed people (long form)
  • people with 4 or more children (long form)
  • Religion (all evangelicals combined) (long form)

But it looks like we don't get the microdata from the census until 2012 (possibly the end of 2012). Is that going to cause problems?

--Msalganik 20h23min de 6 de Julho de 2011 (UTC)

It seems that some city level information gets released much earlier, but we would have to check with IBGE for those specific groups. For example, there is already city level information available on the general population count plus urban/rural, age and gender profiles. --Solstag 21h03min de 6 de Julho de 2011 (UTC)
Also, I've edited the list above to indicate which groups come from the entire census (short form) and which only come from the sample that is administered the long form questionnaire. However, I haven't been able to find anything online describing how large the long-form sample is; does anyone happen to know?--Dfeehan 19h06min de 13 de Julho de 2011 (UTC)

For the groups that we get from the 2010 Census can we use the micro-data to get the age-sex profiles that Dennis wants? --Msalganik 20h23min de 6 de Julho de 2011 (UTC)

For those specific groups we would probably need the microdata in order to get the age and sex profiles. --Solstag 21h03min de 6 de Julho de 2011 (UTC)

Are we sure that the things that we think are on the census are actually on the census? More generally, can we get a copy of the census questionnaire? Maybe that will give us ideas for other groups? --Msalganik 20h23min de 6 de Julho de 2011 (UTC)

I'll try to find a copy of the questionnaire. Meanwhile, that IBGE website for city level data has a lot of information from sources other than the CENSUS which can inspire new subpopulations! --Solstag 21h03min de 6 de Julho de 2011 (UTC)
Found it! Questionnaires for the 2010 Census. --Solstag 21h05min de 6 de Julho de 2011 (UTC)

It looks like these two groups come from the 2000 Census:

  • Married men (Homens casados)
  • Married women (Mulheres casadas)

Do we want to change to the 2010 census for this information?

--Msalganik 20h23min de 6 de Julho de 2011 (UTC)

Sure, but that's not available yet. --Solstag 21h03min de 6 de Julho de 2011 (UTC)
But weren't we asking about men and women married in the last year? (Just married, with no time restriction, would presumably be very large groups.) The long-form of the census asks about marriage status, but I don't see a question that would let us know when the marriages happened. So perhaps the totals for these populations weren't going to come from the census, but instead from civil records or something? --Dfeehan 19h06min de 13 de Julho de 2011 (UTC)
Dimitri's spreadsheet says "Homens casados Censo 2000 - preciso de acesso" and "Mulheres casadas Censo 2000 - preciso de acesso". Google Translate says that is married men and married women, and I assume that Censo 2000 is the 2000 Census. However, I checked the Curitiba paper, and it gives the source for men (and women) married in the last 12 months as "Brazilian National Institute of Geography and Statistics, Municipalities Demographic Information Database (2009) [IBGE/MUNIC]". That sounds like it might be a good source of known populations. Ale or Dennis, do you know that database? --Msalganik 01h52min de 14 de Julho de 2011 (UTC)
I don't know that database, but I googled it and there's a page about it on IBGE's site here. It appears to be a database of information gathered from all of Brazil's cities more or less each year. The most recent survey whose questionnaire is online is the one for 2009; I looked through it, and most of the things they ask about won't be useful as known populations. There were two possible exceptions: they ask about the number of doctors, nurses, etc. who work for the city's family health program (Q10.1.1 - Q10.1.4), and they ask about the number of people in the guarda municipal (Q12.1 - Q12.3). I'm not sure if either of those would be likely to be useful; perhaps someone who knows more about family health programs, or who knows what the guarda municipal is, would have an opinion? Also, the 2009 questionnaire doesn't ask about marriages at all, as far as I can tell, and the 2000 questionnaire is not up on the site. --Dfeehan 16h01min de 18 de Julho de 2011 (UTC)
I still haven't been able to find any info on marriages in the past year in what I think is the MUNIC database. However, I randomly poked through IBGE's FTP site and found what looks like a link to civil registration data, including marriages here. The only problem is that I don't know how complete or up-to-date it is. That is, I don't know if this is all marriages for 2009, or if this is just whatever data IBGE has been able to collate from the different municipalities. If we can convince ourselves that these are complete and accurate, then it looks like there are also data on divorces and separations, which could be another potential known population (if it's not too small). So the question is, how do we ascertain whether or not those files are complete and up-to-date? --Dfeehan 19h04min de 19 de Julho de 2011 (UTC)

Resolved issues

Include demographic questions about crack users

    • Best would be to list crack users, but that might be sensitive
    • First ask general questions about age, sex
    • Then, ask to list each crack user, in a very careful way

We introduced aggregate gender distinction with NSUM-XCRACK18PLUSMALE. We list them by name and get more demographic information in the Questionário Sexo-idade Mortalidade questionnaire. --Solstag 11h38min de 6 de Fevereiro de 2012 (UTC)

Sample crack users only 18+

    • People are uncomfortable talking about crack users under 18 (abduction)?
    • More directly maps into a population comparable to other studies
  • Sample network only 18+
    • Best to compare with 18+ crack users, more direct and less sensitive to distortions
    • Hard because data on some populations don't make that distinction

To address this, we ended up adding some new questions, like NSUM-XCRACK18PLUS, as a nested restriction upon the broader question. So, in the end, we get both. That said, doing this doesn't really avoid the unease about discussing users under 18. --Solstag 11h38min de 6 de Fevereiro de 2012 (UTC)
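
Because NSUM-XCRACK18PLUS is described as a nested restriction on a broader question, a respondent's answer to it should never exceed their answer to the broader one. A minimal sketch of that consistency check follows. The code `NSUM-XCRACK` for the broader question and the sample answers are assumptions made for illustration; only the restricted code appears in this discussion.

```python
# "NSUM-XCRACK" is an ASSUMED name for the broader question; the
# discussion only names the restricted code NSUM-XCRACK18PLUS.
BROAD = "NSUM-XCRACK"
NESTED = "NSUM-XCRACK18PLUS"


def nested_violations(responses, broad=BROAD, nested=NESTED):
    """Return indices of respondents whose nested count exceeds the broad count."""
    bad = []
    for i, answers in enumerate(responses):
        # Only compare when the respondent answered both questions.
        if broad in answers and nested in answers:
            if answers[nested] > answers[broad]:
                bad.append(i)
    return bad


# Hypothetical answers: question code -> number of people known.
responses = [
    {BROAD: 5, NESTED: 3},
    {BROAD: 2, NESTED: 4},  # inconsistent: more 18+ users than users overall
    {BROAD: 0, NESTED: 0},
]

violations = nested_violations(responses)
print(violations)
```

Flagged records could then be reviewed or re-asked; what to do with them is a fieldwork decision this sketch does not settle.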

Relationship between TLS and ScaleUp

  • TLS maps crack scenes, public places
    • TLS as a secondary measure, not comparable to ScaleUp.
  • In the ScaleUp:
    • Distinguish between private and public use.
    • Frequency of each use makes it more complicated.

We ended up adding some new questions, like NSUM-XCRACK18PLUSPUBLIC, to address this. --Solstag 11h38min de 6 de Fevereiro de 2012 (UTC)

Municipality

Do you think the questionnaire makes it clear:

  • that the questions are restricted to the municipality
  • which municipality is meant

Looking at alternatives for wording the questions, the current one:

1) "Quantas pessoas você conhece que residem em [nome do município]..." ("How many people do you know who reside in [name of municipality]...")

May be misleading in complex municipalities or peripheral areas.

Dimitri mentioned the possibility:

2) "Quantas pessoas você conhece que residem neste município..." ("How many people do you know who reside in this municipality...")

That seems better to me, but it doesn't make clear which municipality.

Combining them would give:

3) "Quantas pessoas você conhece que residem neste município, [nome do município], ..." ("How many people do you know who reside in this municipality, [name of municipality], ...")

But that could get too long and repetitive.

What do you think? Is there another way out?

Perhaps version (2) is enough, provided we add a line to the introduction making clear which municipality we are talking about.

But then how would the introduction read?

--Solstag 04h12min de 27 de Janeiro de 2011 (UTC)

Ok, I decided to go with the more laborious route (2) and took the opportunity to go over all the questionnaires to check the consistency of the phrasing and of the municipality identification. Cheers! And closed. --Solstag 04h53min de 27 de Janeiro de 2011 (UTC)

Robbery

Matt and I were wondering if it might be interesting to also ask how many people the respondents know who were robbed or mugged; so, something like

How many people do you know who live in [city] and who were mugged in the last 12 months?

(That may not be the best phrasing.) Our understanding is that this is something that might be of interest in Brazil, especially since it might be difficult to get reliable statistics about this sort of thing in other ways. We would also be able to then look at how the responses to this question relate to other things we're measuring, for example acquaintance with people who use crack and sociodemographic characteristics like class. What do you think? [Dennis 05 Jan 2011]

We think robbery is too complex a category to fit into a simple question. We would need a whole different questionnaire to get meaningful answers for that. --Solstag 13h15min de 26 de Janeiro de 2011 (UTC)

"Vivem" or "residem"

Do we prefer "vivem" (live) or "residem" (reside) at the start of all the scale-up questions? I think it makes no difference and we can leave "vivem", but it seemed like a valid question. --Solstag 19h51min de 21 de Dezembro de 2010 (UTC)

Let's switch to "residem". --Solstag 13h16min de 26 de Janeiro de 2011 (UTC)

Q 25. migration

For question 25, do you really know how many people have left the city (out-migration)? How do you know this? --Msalganik 22h01min de 17 de Dezembro de 2010 (UTC)

In any census (not only the Brazilian one) you may find two different questions on migration: 1) the so-called "last movement" question, which asks when and where the respondent lived before moving into his/her current household (city); 2) the so-called "assigned date" question, which identifies the migratory status (i.e., where the respondent was living) on a specified date (usually the census sets this at 5 years before the census reference date). So it is true that we never get the "real" number of migrants, because we have stock information mixed with isolated flow information (like what we get from question 1 when considering only the last year, for instance), and also because these migration questions are often asked only of a sample from the census. However, this time the Brazilian census asked all households the question about "Brazilians currently living abroad" (so we have direct information about international emigrants, and also direct information about foreigners, because all households also report their nationality/place of birth). Last but not least, if we tabulate all the information from questions 1 and 2 in a matrix with all cities (in Brazil, 5549) we can easily get in- and out-migration for each city. Sure, this is an estimate, or a proxy if you want, but a very good one. [Dimitri 21DEC2010]
As this has been answered, closing question. --Solstag 13h54min de 26 de Janeiro de 2011 (UTC)
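
The tabulation Dimitri describes, an origin-by-destination matrix from which each city's in- and out-migration is read off, can be sketched as below. This is only an illustration under assumptions: the three city labels and all flow counts are made up, and the census tabulation itself may be organised differently.

```python
def migration_totals(flows):
    """Compute (in_migrants, out_migrants) per city from an origin-destination table.

    flows[origin][destination] is the number of people who lived in `origin`
    at the earlier date and live in `destination` now. People who stayed
    (origin == destination) are not counted as migrants.
    """
    cities = set(flows)
    for dests in flows.values():
        cities.update(dests)
    totals = {city: [0, 0] for city in cities}
    for origin, dests in flows.items():
        for dest, n in dests.items():
            if origin == dest:
                continue
            totals[origin][1] += n  # left `origin`
            totals[dest][0] += n    # arrived in `dest`
    return {city: tuple(pair) for city, pair in totals.items()}


# Hypothetical flows between three cities (not real census data).
flows = {
    "A": {"A": 900, "B": 30, "C": 10},
    "B": {"A": 20, "B": 500},
    "C": {"A": 5, "B": 15, "C": 300},
}

totals = migration_totals(flows)
for city in sorted(totals):
    print(city, totals[city])
```

In matrix terms, out-migration is a row sum and in-migration a column sum, both excluding the diagonal.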

Proposal for a new question on interrupted use

Since the "know" window is 24 months and the use window is 6 months, it would be interesting to ask how many people one knows who were users but stopped using in the last 6 months. Would some variation of this be even more interesting? --Solstag 18h46min de 17 de Dezembro de 2010 (UTC)

Not to be implemented. We're asking about time evolution in the sibling questionnaire instead. --Solstag 13h54min de 26 de Janeiro de 2011 (UTC)

Police and military

Police looks bad because they have different categories and we risk having inconsistent data between cities. Perhaps all strictly military could work (army+navy+air force). --Solstag 16h05min de 13 de Dezembro de 2010 (UTC)

Armed forces are too few, so we're going with armed forces plus police, both military and civilian. --Solstag 15h29min de 17 de Dezembro de 2010 (UTC)
On second thought, if we are putting police back, why are we adding armed forces? Isn't police large enough? We don't want to make the subpopulation more complex. --Solstag 18h43min de 17 de Dezembro de 2010 (UTC)
I think we don't want to make the subpopulation more complex, and this is exactly the case for "police". Unfortunately we have a lot of variance here: I recall that some cities may have "policia municipal" in addition, and this is a further source of "noise". In my opinion we should not use this subpopulation. [Dimitri 21DEC2010]
Ok, agreed. Let's take this off the map then. --Solstag 18h49min de 22 de Dezembro de 2010 (UTC)

Q 35. four children or more

For the question about people with more than 4 children (Q35), can we specify that this refers to mothers? For example, what if I know a family with 5 children? Would I answer 2 (the mother and father) or 7 (mother and father and kids)? Just restricting to the mother seems clearest. --Msalganik 21h56min de 17 de Dezembro de 2010 (UTC)

Yes, this is a very good idea. "how many women that are mother of at least 4 children do you know?". "Quantas mulheres voce conhece que residem em [Nome da Cidade] e tem pelo menos 4 filhos?" [Dimitri 21DEC2010]
I've changed Q 35. to "Quantas mulheres você conhece que vivem em [nome do municipio] e tem quatro filhos ou mais?". Closing this issue. --Solstag 19h49min de 21 de Dezembro de 2010 (UTC)

Churches

This is a follow up from the resolved "large subpopulations" discussion. We have been thinking that church communities are not so good because, even though they might be within the size range, they have a super high degree of selectivity. Neilane's observation is that the individual experience will be that either you're in the church and your personal network has way over the acceptable proportion, or you're not in church and your personal network has less than the acceptable proportion. This is especially true for Brazilian evangelical churches. So, in practice, the population is outside the range, and in a weird way. For this reason it is not a suitable subpopulation for a ScaleUp survey. But we thought it would be interesting to hear from Matt about this, so what do you think?! --Solstag 17h23min de 13 de Dezembro de 2010 (UTC)

I think Neilane sounds right. It is not the size of the population that matters as much as the issue of estimating vs counting. For example, if you ask me how many professors I know, that number would be too big for me to count, so I would estimate it and that could lead to error. For big populations (like men) a question will lead to lots of estimating and is clearly bad. However, if there is a group that is unequally distributed (like people in specific churches) then that will also run into the estimation problem. I think it makes sense to drop the group if you think many people will not be able to answer accurately because the size is too big. --Msalganik 19h31min de 17 de Dezembro de 2010 (UTC)

Q 19. Bolsa Familia

Is the Bolsa Familia program very popular in some cities? Remember that if the size of the group gets too large then people don't seem to be able to answer accurately. For example, anything over 5% seems to be too big. --Msalganik

We also need to know whether to ask this question about people, families or mothers. That depends on the kind of data we get from the government. --Solstag 02h12min de 8 de Dezembro de 2010 (UTC)

Ok, we're asking about mothers. And we're asking regardless of potential size issues because it's too interesting a subpopulation. Choosing mothers also helps reduce the size. --Solstag 18h43min de 17 de Dezembro de 2010 (UTC)
OK, sounds good to me as long as that is clear in the question. --Msalganik 19h48min de 17 de Dezembro de 2010 (UTC)

Large subpopulations

What we saw in Curitiba was that the largest subpopulation, students in public middle school, was the one most under-estimated when we compare the scale-up estimated size to the true size. This suggests that we might not want to ask about a group as large as students in public middle school. Note that students in private middle school were estimated pretty accurately, so it seems that subpopulation size, not middle school, was the problem. Here's a plot which shows the error in the size estimate for the 20 known populations (http://dl.dropbox.com/u/1437602/error_knowngroups.pdf). The x-axis is the estimated population prevalence minus the true population prevalence. -Msalganik 00h45min de 6 de Dezembro de 2010 (UTC)
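
For readers who want to reproduce this kind of check, here is a minimal sketch of the error measure on the plot's x-axis (estimated prevalence minus true prevalence) for each known population. It assumes the standard basic scale-up estimator with a leave-one-out step, which may not be exactly what was used for the Curitiba plot; the group names, sizes and respondent reports are invented.

```python
def scale_up_prevalence(reports, known_sizes, target, total_population):
    """Basic scale-up estimate of the prevalence of `target`.

    reports: list of dicts, group -> number of people the respondent knows in it.
    known_sizes: dict, group -> true size; `target` is left out of the degree step.
    """
    others = [g for g in known_sizes if g != target]
    known_total = sum(known_sizes[g] for g in others)
    # Estimated personal network size (degree) of each respondent.
    degrees = [
        sum(r[g] for g in others) / known_total * total_population
        for r in reports
    ]
    return sum(r[target] for r in reports) / sum(degrees)


# Hypothetical data: three known groups in a population of 1000.
total_population = 1000
known_sizes = {"a": 10, "b": 20, "c": 50}
reports = [
    {"a": 1, "b": 2, "c": 2},
    {"a": 0, "b": 1, "c": 1},
]

# Error = estimated prevalence minus true prevalence, one group held out at a time.
errors = {
    g: scale_up_prevalence(reports, known_sizes, g, total_population)
    - known_sizes[g] / total_population
    for g in known_sizes
}
for g in sorted(errors):
    print(g, round(errors[g], 4))
```

In this made-up example the largest group, "c", gets a negative error, i.e. it is under-estimated, which is the pattern described for public middle school students.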

Hmm, this could be related to the size of the subpopulation and also to the problem of telling apart kids in public versus private middle schools; however, the estimate for private schools is ok. It is perhaps more likely to be an issue with students in what you call public middle schools in Curitiba coming from outside the city. So, since we must ask about people you know "who live in Curitiba", we have a problem. I'm not completely sure if this is allowed, but I think so, and I think it might be very common that people from nearby cities go to school in Curitiba, since it is known to have many satellite cities that are not as wealthy. --Solstag 02h10min de 6 de Dezembro de 2010 (UTC)

Yes, we should avoid all subpopulations that exceed 5%, and in addition there are some subpopulations for which there are no accurate statistics at all. So, we have to think about some questions:

  • Q5: definitely out. This is a big problem because public health statistics for car accidents are not accurate for the majority of those 27 cities;
  • Q7: definitely out. This subpopulation is systematically above the threshold of 5% for almost all 27 cities;
  • Q9: we should think about it. Only one city showed this subpopulation above the threshold. Maybe we could keep it;
  • Q11 and Q12: definitely out. Curitiba is an exception and for the majority of other Brazilian cities there aren't accurate statistics about higher education at universities (students and professors);
  • Q13,Q14,Q15,Q24: hopefully I'll get good statistics for them, but we should keep them under observation (I should have a final answer by next Monday, 12th);
  • Q19: maybe this is gonna stay out. I need a little more time to get the updated statistics. It seems to have a lot of variation among the 27 cities;
Summing up, so far
  • we have "for sure" the next subpopulation questions: Q1,Q2,Q3,Q4,Q6,Q8,Q10,Q16,Q17,Q18,Q20,Q21,Q22,Q23 = 14 subpopulations "for sure";
  • we have on "stand by" but potentially good questions: Q9,Q13,Q14,Q15,Q24 = 5 subpopulations that depend on data accuracy;
  • we have a "maybe" subpopulation: Q19 = 1 subpopulation still depending on data accuracy AND variation around the 5% threshold;

But we also have new subpopulations which can be added to the questionnaire, since we'll have the data directly from Brazil Census 2010, and also from updated administrative data: N1(teachers = Q34), N2(people with neoplasia), N3(international emigrants = Q25), N4(foreign residents=Q36), N5(amerindians/indigenas), N6(all protestants), N7(only IURD - i.e., evangelists), N8(widows = Q33), N9(people with 4 children or more = Q35) = 9 new subpopulations.

N1 and N2 might be added right now (these are good data and below the 5% threshold). N3,N4 and N5 will probably show a bigger variation among the 27 cities and some might be above 5%. N6 is highly likely to be above 5% for many cities. I have a good expectation for N7, N8 and N9. Finally, I believe that we could use N1,N2,N7,N8 and N9 because it is almost sure we'll have good estimates and below the 5% threshold = 5 new subpopulations. Still, we should try keeping N3,N4 and N5 because we will be able to check them directly against the Census data.

So, we have: 14 subpopulations "for sure"; 5 subpopulations on "stand by"; 5 new good subpopulations; 3 subpopulations "for checking"; and 1 subpopulation "maybe" in; [Dimitri 07DEC2010]
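
The 5% screening used throughout this comment can be sketched as follows: for one subpopulation, list the cities where its share of the city population exceeds the threshold. The city names and counts below are hypothetical placeholders, not figures from the spreadsheet.

```python
THRESHOLD = 0.05  # the 5% ceiling discussed above


def cities_over_threshold(group_counts, city_populations, threshold=THRESHOLD):
    """Return the cities where the subpopulation's prevalence exceeds the threshold."""
    return sorted(
        city
        for city, count in group_counts.items()
        if count / city_populations[city] > threshold
    )


# Hypothetical numbers for illustration only.
city_populations = {"City A": 100_000, "City B": 250_000, "City C": 50_000}
group_counts = {"City A": 4_000, "City B": 15_000, "City C": 2_000}

flagged = cities_over_threshold(group_counts, city_populations)
print(flagged)
```

A group flagged in almost every city would be "definitely out" in the terms used above, while one flagged in a single city is a judgment call.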

Hi Dimitri,
Neilane and I talked and we like N1, N8 and N9 and have already added them to the questionnaire. But we have doubts:
  • How do we ask about teachers? Does the data you have include university teachers?
  • About evangelical people, or any church in general, isn't it a problem that they all know each other, since they go to church together? They might be under 5%, but for each individual their impression could be of over 50%!
  • We're also not sure about neoplasia. Do people know when someone else has a tumor? Neoplasia includes benign tumors, something a lot of people have but very few talk about.
--Solstag 02h10min de 8 de Dezembro de 2010 (UTC)
Resolutions from my chat with Dimitri and Chico today: we improved the teachers question. We will ask Matt about the problem with churches. We won't use the neoplasia question for the reasons above and some issues with how the information is collected. We will include foreign residents. We won't use amerindians because of the subjectivity of their designation. --Solstag 16h05min de 13 de Dezembro de 2010 (UTC)

Q 25. Migration

Please weigh in on the wording. I think it has to be someone who left this municipality and went abroad, is that it? And not someone who once lived in the city and has now left the country (possibly having lived in other cities before leaving), right? --Neilane

Dimitri said it's fine! Resolved. --Solstag 15h37min de 13 de Dezembro de 2010 (UTC)

Q 32. Abortion

Should the question be generic, or specifically about induced (not spontaneous) abortion? What are we trying to study? --Solstag 02h20min de 8 de Dezembro de 2010 (UTC) and Neilane

Dimitri says that induced abortion will cause confusion and that it is better to simplify the question. Resolved. --Solstag 15h32min de 13 de Dezembro de 2010 (UTC)

Q 12.

Last time we found that the size data for this population was not accurate. Is there any reason to think that this size data is more accurate now?

From an email exchange between me and Dimitri on Nov 3, 2009:

"Matt: 2) Is the number of professors correct given the number of students in federal universities? It seems like there is 1 professor for every 2.5 students. Maybe I should move to Brazil :)"

"Dimitri: You shouldn't think we have such distribution here! I didn't pay attention on it but I think I'll have to check it for double counting. It is possible that Ministry of Education double counted professors if they only got the numbers (and not real names or identification) of professors in each school. I think it didn't happen for teachers in municipal and state schools anyway."


All questions regarding universities have been removed. See discussion for "Q 11." --Solstag 01h38min de 8 de Dezembro de 2010 (UTC)

Q 11.

Will this stay restricted to federal universities only? Or are we going to broaden it to public ones, or to public and private ones?? --Neilane

I think it makes sense to switch to public ones, since people potentially confuse them. Ideally we might move to universities in general, but I believe it would be hard to get good numbers for the private ones. Dimitri, what do you think? --Solstag 06h29min de 23 de Novembro de 2010 (UTC)
I think we should remove all the questions about universities. The statistical information varies a lot between municipalities and the most recent data we have is very old (2004). Curitiba was an exception because I obtained information from the educational statistics department of the state of PR. Not every state has a registry system as good as theirs. [Dimitri 07DEZ2010]

Q 30. and Q. 31

Is this the definition used by the Ministry of Health? I would be surprised if there is not something more concrete. For example, this has no time frame.


We asked a specialist, Adriana Pinho, and changed the wording to be less technical, but there is no time frame for either sex workers or men who have sex with men, because it is not as meaningful as it is for drug users: MSM status hardly ever changes and SW status is usually not known in such terms. --Solstag and Neilane