Enumeration and automatic household sampling selection


I am working on a survey where the idea is to do a full enumeration (listing) of all households within an enumeration area (a village) and then, based on a few criteria, automatically select a sample of households to be interviewed for the actual survey. The aim is to avoid a second visit, in contrast to the traditional sample selection in the office. Is there such functionality?

Kind regards

The question is probably too generic and lacks important details.

The way I see it, it depends a lot on the criteria to define the sample. If these criteria need to consider all of the listed households (e.g. “select the 10 households with the most children”) then you would have to get the complete listing data from the server to determine the sample.
So, at the end of the listing the assignments for the actual survey could immediately be created by headquarters and would be available for the fieldworkers while they are still in the area.

If the criteria are per household (e.g. "select this household if there are any children under 5 years"), then the sample survey could be integrated in the listing questionnaire and automatically enabled.

As @sergiy said, more details are needed to answer this question, especially about the criteria for the selection of the sample.

Thanks for your reply. Let’s say a village has 90 households, and I know this number from other sources. I want to list all 90, but only measure the parcels of households that report engaging in agricultural production, and of those only every nth (let’s say n=10). How could one go about it in SuSo?

In a village of 100 households (the number of households is known per village), let’s say I ask all households what activities they engage in. For those engaged in agricultural production, I want to enable further questions, such as area harvested, for only a fixed number (n) of households, let’s say 12, spread systematically across the total number of households. I hope my goal is clearer. Thanks

Not really. Imagine you are the interviewer. You are in household nr. 1. They are engaged in ag production. Do you proceed with the questions into the area harvested or not? You can’t answer. The answer depends on the characteristics of other households that you haven’t interviewed yet (and randomness). That’s why you have to complete all listing first with eligibility information, then sample, then go into the main survey with the sampled households.

Since you have to go to HH 2, you leave HH 1. Since there is a chance that HH 1 is selected, you will have a second visit to HH 1.

Note that knowing the total number of HH in the area doesn’t help. Since you don’t know their eligibility variables, there could be all, some, or none eligible.

If you imagined it somehow differently, please elaborate.

What survey? Country? Company?

Tercio, what would be the problem to generate the actual survey assignments from the exported listing data? If that process is well prepared (maybe through a little program which generates the assignments with preloading) it should take no time at all to have the actual survey assignments ready for synchronization.
I see only a problem if there is no internet connection to synchronize the tablets.
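
To illustrate what such a "little program" could look like, here is a minimal Python sketch. It is only an assumption about the workflow, not a Survey Solutions feature: the column names (`household_id`, `head_name`, `engaged_in_agriculture`) are hypothetical, and the tab-delimited text it produces would still have to be adapted to the preloading format of your actual questionnaire.

```python
import csv
import io
import random

def sample_listing(listing_rows, n_sample, seed):
    """Draw a simple random sample of eligible households from one village.

    listing_rows: list of dicts read from the exported listing data.
    The column name "engaged_in_agriculture" is hypothetical.
    """
    eligible = [r for r in listing_rows if r["engaged_in_agriculture"] == "1"]
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(eligible, min(n_sample, len(eligible)))

def write_preloading(rows, columns):
    """Return tab-delimited text with one line per sampled household."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Example with made-up listing data: 90 households, every third one agricultural.
listing = [
    {"household_id": str(i), "head_name": f"HH {i}",
     "engaged_in_agriculture": "1" if i % 3 == 0 else "0"}
    for i in range(1, 91)
]
sampled = sample_listing(listing, n_sample=12, seed=2024)
preload_text = write_preloading(sampled, ["household_id", "head_name"])
```

The fixed seed is a design choice: it lets the office document and repeat the selection for a given village if the draw is ever questioned.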

Indeed, completing the listing and then sampling households for the main survey from it is by far the best option and the proposed solution, but the main issues are financial constraints and the lack of internet connectivity in most villages. So this is just an alternative we want to test to see if it works. For all villages, we have the estimated total number of households as well as the estimated total number of agricultural households (from the agricultural census). For example, let’s say there are 120 households engaged in agriculture, and I intend to have a minimum of 12 and a maximum of 16 randomly selected cases answer certain sections of the questionnaire (this is just an example; the sampling team will define the numbers). My question is basically how to establish an upper and lower limit on the number of randomly selected households, based on the known estimated number of agricultural households. @sergiy, I write from Angola; we are testing this for a post-seeding survey with the Ministry of Agriculture.

Sorry for all the confusion I am causing. To summarize, this would be what I am looking for:

What is the syntax to generate a random value of either 0 or 1, where the chance of returning 1 is (let’s say) 10% (thus the chance of returning 0 is 90%)? This value needs to stay fixed regardless of interview state and every time the same interview is opened (it should not change when the interview is reopened or rejected/approved).

This will do.


You can define a Boolean variable R with the following syntax:


and it will have value true approx. 10% of the time, and that value will not change between the edit sessions or different users opening that interview - all the way from interview creation to data export, just like you described.
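
To make the idea of a draw that stays fixed for an interview concrete, here is a small Python sketch of the general technique. It is not the Survey Solutions syntax and says nothing about how the software implements it internally; the function names and the use of a hash of an interview identifier are assumptions made purely for illustration.

```python
import hashlib

def stable_uniform(interview_id: str) -> float:
    """Map an interview identifier to a number in [0, 1).

    The same identifier always gives the same number, so the value does
    not change when the interview is reopened, rejected, or approved.
    """
    digest = hashlib.sha256(interview_id.encode("utf-8")).digest()
    # Keep 53 bits so the result is an exact float strictly below 1.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / 2**53

def is_selected(interview_id: str, rate: float = 0.10) -> bool:
    """True for roughly `rate` of all identifiers, fixed per identifier."""
    return stable_uniform(interview_id) < rate

# Hypothetical interview identifiers, for illustration only.
ids = [f"interview-{i}" for i in range(10000)]
share_selected = sum(is_selected(i) for i in ids) / len(ids)
```

Each identifier is selected independently of the others, which is why the share of selected interviews in any one area is only approximately 10%.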

But, based on what you’ve described earlier, this may not be what you want. Because the draws are random, you can end up with all 0’s or all 1’s in an enumeration area (though the probability of this is not high), or with a considerable shift away from your planned 10% (with a non-trivial probability).
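
These probabilities can be worked out directly. The Python sketch below assumes that each household is selected independently with probability 10% and treats the 120 agricultural households from the earlier example as an exact count (in reality it is an estimate). Under those assumptions the number selected follows a binomial distribution.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k selections out of n independent draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_between(lo: int, hi: int, n: int, p: float) -> float:
    """Probability that the number of selections is in lo..hi inclusive."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

n, p = 120, 0.10  # numbers taken from the example in this thread
p_none = binom_pmf(0, n, p)               # nobody selected in the village
p_in_range = prob_between(12, 16, n, p)   # lands inside the 12 to 16 target

print(f"P(no household selected) = {p_none:.2e}")
print(f"P(12 <= selected <= 16)  = {p_in_range:.3f}")
```

Whatever falls outside `p_in_range` is the share of villages where independent draws would miss the planned minimum or maximum.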

A simulation of 10% sampling in 100 EAs, each with 200 households, shows that you may get something like this (results will vary with the random draws). Here n is the number of households selected in an EA:

          n |      Freq.     Percent        Cum.
          9 |          1        1.00        1.00
         12 |          1        1.00        2.00
         13 |          4        4.00        6.00
         14 |          5        5.00       11.00
         15 |          3        3.00       14.00
         16 |          6        6.00       20.00
         17 |         10       10.00       30.00
         18 |          8        8.00       38.00
         19 |         12       12.00       50.00
         20 |          8        8.00       58.00
         21 |         11       11.00       69.00
         22 |         10       10.00       79.00
         23 |          8        8.00       87.00
         24 |          5        5.00       92.00
         25 |          2        2.00       94.00
         26 |          2        2.00       96.00
         28 |          3        3.00       99.00
         29 |          1        1.00      100.00
      Total |        100      100.00
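
A simulation of this kind can be sketched in a few lines of Python. This is not the code that produced the table above; the seed and function name are arbitrary choices for illustration, so the frequencies will differ from those shown.

```python
import random
from collections import Counter

def simulate_counts(n_areas=100, households=200, rate=0.10, seed=12345):
    """Number of selected households in each simulated enumeration area."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    counts = []
    for _ in range(n_areas):
        # Each household is selected independently with probability `rate`.
        selected = sum(1 for _ in range(households) if rng.random() < rate)
        counts.append(selected)
    return counts

counts = simulate_counts()
freq = Counter(counts)

print("  n | Freq.")
for n_selected in sorted(freq):
    print(f"{n_selected:>3} | {freq[n_selected]:>5}")
print(f"Total | {sum(freq.values())}")
```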

It is up to you to decide whether this is going to work for you.

Best regards to the colleagues in Angola!



Thanks Sergiy, this is clear and we will run a few tests. Now considering the chance of having all 0s or all 1s, would it be possible to make the selection systematic in an infinite assignment? For instance, I estimate 120 agricultural households and I want only 12, so I would have k=120/12, k=10. Randomly select the first agricultural household (interview), and then select every kth… obviously this would be outside the scope of one interview (I understand there would be a problem with deleted interviews)… every kth interview would have a boolean for selection==true.

If you have one assignment for multiple interviews in a village, you cannot determine “every kth interview”.
If you had one assignment per household, you could do the random selection at the assignment level. But since this is a listing survey, you will probably have a single assignment for all the households you find. So this will not work.
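
For reference, the systematic selection itself is simple once the listing is complete and the agricultural households are in a fixed order, for example in a script run on the exported listing data. The Python sketch below is an illustration under that assumption, not something that runs inside an interview; it uses a whole-number interval k (rounded down) and 0-based positions in the ordered list.

```python
import random

def systematic_sample(n_units, n_sample, seed=None):
    """Positions (0-based) chosen by systematic sampling with a random start.

    n_units:  number of eligible households in the ordered listing
    n_sample: number of households to select
    """
    if n_units <= 0 or n_sample <= 0:
        return []
    if n_sample >= n_units:
        return list(range(n_units))  # take everyone
    rng = random.Random(seed)
    k = n_units // n_sample      # sampling interval, e.g. 120 // 12 = 10
    start = rng.randrange(k)     # random start among the first k units
    return [start + i * k for i in range(n_sample)]

# The example from this thread: 120 agricultural households, 12 to select.
positions = systematic_sample(120, 12, seed=7)
```

Unlike independent 10% draws, this always returns exactly 12 households when at least 12 are eligible, which is why it needs the full list first.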
