List Question or Question from Roster group

Good morning SuSo Team,

We have a question of the type “List Question or Question from Roster group”, in which the interviewer checks the household members who meet certain criteria.

Let’s say this question is named s9q20. The downloaded data then contain the variables s9q20__0, s9q20__1, and so on up to s9q20__n, where n reflects the maximum possible household roster size.

When looking at the data downloaded from the server, we interpret a value such as s9q20__0==6 as meaning that the person in position 6 of the roster meets the criteria, and similarly for s9q20__1==7 and so on.

The issue is that, for some households, we observe values greater than the roster size. For example, we observe s9q20__0==8 and s9q20__1==9, and yet there are only 6 people in the household.

These cases are easy to detect: whenever a value in the s9q20__# variables exceeds the number of people in the household, we can flag it. But we are worried that the same problem could also occur where s9q20__# reports a number below the household size, in which case the distortion would go unnoticed.

Thanks in advance,


Hello Daniel,

The confusion stems from the difference between the code and the index, which are two different things.

What you see in the roster is the code of the person, which could be, for example:

  • 2 John
  • 4 Mary

The same information will be recorded in the HH file as:
member__0="John" and member__1="Mary"

The code is assigned when the corresponding item is created and stays with that item for as long as the item exists. Unlike the code, the index depends on the other items. For example, at some point the list above could have been:

  • 0 Peter
  • 1 Andrew
  • 2 John
  • 3 Jessica
  • 4 Mary

But then 3 persons were deleted (for whatever practical reason, e.g. the interviewer realized they were not members of the household). The remaining persons shift left across the member__* variables of the HH file, so the leftmost position holds the person with the smallest code, but their codes are not renumbered (!). Hence the code is not equal to the index (!).

It follows that the code can be any value, and can even exceed the roster capacity (if there were many deletions). What you can be sure of is that, in this case, a person with a smaller code precedes a person with a larger code (that is, has a smaller index). (This differs for, e.g., multi-select questions, where the order of the options is maintained, but that is a separate matter.)
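Given this, comparing the s9q20__# values with the household size is not a reliable check: a code larger than the household size can be perfectly valid, while a small code could still be wrong. A more robust check is whether every selected code exists as a member__id in the roster of the same interview. The Stata sketch below illustrates this on hypothetical toy data (the variables `name` and `slot` and the tempfile `roster` are made up for illustration), and it assumes that unused s9q20__# slots are exported as ordinary missing values (.); adjust the `drop` line if your export uses a different missing code.

```stata
* Hypothetical toy roster: one household whose remaining codes are 2 and 4
* after earlier deletions (as in the John/Mary example above).
clear
input str4 interview__id byte member__id str5 name
"hh01" 2 "John"
"hh01" 4 "Mary"
end
tempfile roster
save `roster'

* Hypothetical household-level file: s9q20 stores member codes, not positions.
clear
input str4 interview__id s9q20__0 s9q20__1 s9q20__2
"hh01" 4 . .
end

* One row per s9q20 slot; slot holds the position, s9q20__ the member code.
reshape long s9q20__, i(interview__id) j(slot)
* Assumption: unused slots are plain missing values.
drop if missing(s9q20__)
rename s9q20__ member__id

* Every selected code must match a roster row of the same interview.
merge m:1 interview__id member__id using `roster', keep(master match)
assert _merge == 3
```

Here code 4 is selected even though the household has only 2 members, and the check passes because 4 is a valid member__id (Mary). A code with no matching roster row would leave `_merge == 1` and make the `assert` fail.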

Depending on the analysis, you may need the code in some cases and the index in others. Use the roster file to see the code. If you need the index, in Stata you can do something like this in the roster file:
bysort interview__id (member__id): generate int member__index = _n - 1
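To see the line above at work, here is a small Stata sketch on a hypothetical roster that reproduces the John/Mary example (the `name` variable and the interview id "hh01" are made up). The rows are deliberately entered out of order: `bysort` groups by interview__id and sorts by member__id within each group, so `_n - 1` yields a 0-based index that follows the code order.

```stata
* Hypothetical toy roster: codes 2 (John) and 4 (Mary) remain after
* codes 0, 1 and 3 were deleted; rows entered out of order on purpose.
clear
input str4 interview__id byte member__id str5 name
"hh01" 4 "Mary"
"hh01" 2 "John"
end

* Group by interview, sort by code within it; the index restarts at 0 per interview.
bysort interview__id (member__id): generate int member__index = _n - 1
list interview__id member__id member__index name
```

John (code 2) gets index 0 and Mary (code 4) gets index 1, matching member__0="John" and member__1="Mary" in the HH file.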

Best, Sergiy



Crystal clear, many thanks,