Roster ID variable should inherit the value labels of its trigger (multi-select) question

Suppose you have a multi-select question whichVeggies:

Which vegetables did your household consume in the past 7 days?

Green peppers...3
Other veggies.....4

Yes/No mode [X]

Suppose furthermore that this multi-select question triggers a roster named vegetables.

Currently, when the data are exported, the roster ID variable vegetables__id contains numeric values without value labels.

To construct labels, one needs to extract information from the .do file associated with the main data file. The label values come from the names of the multi-select variables (e.g., 1, 2, 3, 4 from whichVeggies__1, whichVeggies__2, etc.). The label text comes from the variable labels of those same variables (e.g., the text that appears after : in the label Which vegetables did your household consume in the past 7 days?:Onions). One then has to construct a value label from these pairs and apply it to the roster ID variable.
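For anyone stuck doing this by hand in the meantime, here is a rough Python sketch of the workaround described above. It is only an illustration: it assumes the variable labels have already been read from the .do file into a dict, that Onions is code 1, and that the answer text is whatever follows the last colon in the variable label. The function names and the label name vegetables_lbl are made up for this example and are not part of Survey Solutions.

```python
import re


def build_value_labels(variable_labels, question):
    """Return {answer code: answer text} for one multi-select question.

    variable_labels: dict mapping exported variable names to their
    variable labels (assumed to be already parsed from the .do file).
    """
    pattern = re.compile(rf"^{re.escape(question)}__(\d+)$")
    labels = {}
    for var_name, var_label in variable_labels.items():
        match = pattern.match(var_name)
        if match is None:
            continue  # not one of this question's answer columns
        code = int(match.group(1))
        # Assumption: the answer text follows the last ":" in the label.
        labels[code] = var_label.rpartition(":")[2].strip()
    return dict(sorted(labels.items()))


def stata_commands(labels, roster_id_var, label_name):
    """Build Stata commands that define and attach the value label."""
    pairs = " ".join(f'{code} "{text}"' for code, text in labels.items())
    return [
        f"label define {label_name} {pairs}",
        f"label values {roster_id_var} {label_name}",
    ]


question_text = "Which vegetables did your household consume in the past 7 days?"
sample = {
    "interview__id": "Interview ID",
    "whichVeggies__1": f"{question_text}:Onions",
    "whichVeggies__3": f"{question_text}:Green peppers",
    "whichVeggies__4": f"{question_text}:Other veggies",
}

labels = build_value_labels(sample, "whichVeggies")
for line in stata_commands(labels, "vegetables__id", "vegetables_lbl"):
    print(line)
```

The printed lines can be pasted into a .do file that runs after the roster data are loaded. Note that the last-colon rule breaks if an answer option itself contains a colon, which is one more reason to have the export do this automatically.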

With a new feature, Survey Solutions data export could automatically construct a value label from the answer options defined for whichVeggies and apply it to vegetables__id.

Having a value label for the roster ID variable is useful for quick review of the data and for labeling plots derived from the data (e.g., checking whether there are outliers in the unit price for each food item in the food consumption module).


I also reported this a while ago, issue #11264

It’s a little painful that all the rosters are represented slightly differently in the data export, especially when coding a piece of software to use the rosters - I don’t want to have to care which type of question spawned the roster.

From what I can see, it’s something like the following:

Multi select has the ID but no label in the ID column and no additional column with the variable name/value.

Variable Length List rosters have the ID in the ID column, but no label. However, it has an additional column filled with the variable value for each list item.

Fixed list rosters have the ID and a label in the ID column, but no additional column.

I have no numeric roster questions so I am unaware how this is represented.


Scott, thanks for broadening the discussion.

You’re right that it would be helpful to have rosters, to the extent possible, export the same form of data regardless of their trigger.

The ideal, by that logic, would be that all roster files have the same form: say, a machine-readable ID column plus a human-readable title column (e.g., roster__id == 1, roster__title == “Scott”; roster__id == 2, roster__title == “Arthur”).

But I’d like to suggest two different templates for roster files. For fixed and multi-select rosters, let’s have a single column that is labelled. The label for each row is already defined in the Designer, and the row numbers are the same for every observation in the data set (e.g., if table == 1, chair == 2, etc., those labels are valid for every interview with the questionnaire). For numeric and list rosters, let’s have a column for the row number and a column for the row text. For the list roster, this information is available elsewhere (i.e., in the i-th element of the list), but it is more convenient to have in the roster. For the numeric roster, the roster text question is (optionally) chosen at design time, and (if that option is exercised) will be in the roster anyway.

Does this sound reasonable? Other views or suggestions?


I would be happy with this implementation.

Especially since, although list rosters do have the information elsewhere, there is no direct mapping from roster_id x to the base_question_x column: you have to run through the roster to work out that it’s the i-th element and work backwards to the base_question_i-th column.

Since the list roster row text is also data rather than just a label, I think it makes sense to (as you suggest) have this information in its own column.
And it will be very nice for the multi select rosters to have a label on their id column values. :slight_smile: