Designer Roster Limit - Belize Census Questionnaire

Good afternoon Support Team,

Thank you for your continuous support and development of Survey Solutions. As we prepare for the re-instatement of our 2022 Housing and Population Census, the SIB is thankful to have Survey Solutions as its primary data collection tool.

I bring to you today a concern about one of the Survey Solutions questionnaire limits. We are currently exceeding the roster limit of 400 questions and require the addition of a few more items.

To understand why we have a large number of questions in a single roster, kindly refer to this image. You will notice that in our Census, the base questionnaire represents one building. Each building, then, might have an arbitrary number of Units, which is our base roster. Units, then, contain the actual questionnaires, both for Households and Individuals. As you can see, the number of questions in our Census Questionnaires (barely) exceeds the design limits. However, we need to include a few more key questions required by government and other stakeholders.

Note that for other questionnaires we have tried strategies such as decoupling rosters with the same trigger question which has the effect of having one logical roster divided across more than one subsection. However, for this Census questionnaire, the hierarchy does not lend itself to that. The Building contains Units, a Unit contains Households (with the Household roster trigger question INSIDE the Unit roster). Thus we cannot move the Household roster (the largest questionnaire, having 95% of the questions) outside of the Unit roster (the base roster) because it would be at a higher level than its parent. We get a designer error.

We therefore ask you for your feedback on the matter. We know that there are very good reasons for the imposed limits, both in terms of device/tablet performance and usability/UX. However, we are uncertain as to how to proceed with the necessary questions we must add. If possible, we would look forward to an expansion of this limit in one of the upcoming software updates. However, if you have any other recommendation kindly let us know.

Best regards,

Gian Aguilar
Systems Development and Data Processing
Statistical Institute of Belize

Hello Gian,

I have been working on a Building and Population Census (actually, still am) for another country.
We started with a questionnaire architecture like yours (one interview per building).

We didn’t run into the problem of exceeding the question limit, but considered it too dangerous to have a single interview for all the housing units (dwellings) in a building. The exact number of housing units is determined by the enumerator on arrival. What if he counts 85 units in the building and later discovers that there are actually only 82? He adjusts his count and the interview drops the excess units. If he makes a mistake he can lose one or more completed households (hours or days of work lost). Not to mention the risk of losing a week’s work if he loses or destroys his tablet, because he can only synchronize after completing the whole building.

So we changed the architecture to one interview per housing unit and included a question:

“Have you already completed the building section?”

This way he will not be asked the building questions all over again.

With this structure you can make the Building Section a separate top level section which is enabled by the above question. In your case you would gain the question count occupied by the building section for use in the Housing Unit/Individuals section.

Could this work for you?

Klaus

Cannot agree with @klaus more. I’m also advising several other countries for their census preparation and the same goes - be careful in deciding on the unit of your observation.

The technical side is relatively ‘easy’ - leaving aside the meaning of the levels, there is no real problem in relaxing the 400 limit and most likely we will implement this change. But from the point of view of survey questionnaire design and, more importantly, data flow management and control, I’d have a hard time justifying such a complex multi-level approach.

To try and take it to an absurd extreme - what about having an interview for an enumeration area? i.e. multiple buildings → multiple dwellings → … or city, or the whole country? One root interview for the census, roster of cities → roster of EAs → roster of buildings → … you get my point.
Of course currently we do not allow this much in the system, but imagine we relaxed the limitation, would you want to have such a flow?

Whatever is part of one interview can only be worked on by one interviewer at a time, travels (complete, reject) as one unit and is reviewed/exported/analyzed as such. But if you’re able to lower your unit of observation to a dwelling, then more than one interviewer can be working in a building, which could be beneficial for large urban settings…

There is the approach outlined by @klaus to still have a building section, which is asked only once. Another option would be a two-stage dwelling listing/main interview approach: you have a separate questionnaire for buildings, fieldworkers list all the dwellings in them with corresponding questions and information, and synchronize up. Then you have a step (automated, using the API) that does whatever validations are needed for the data and creates dwelling-specific assignments for the main interview. This allows a more granular view of and control over each interviewer’s progress and, as I said above, if need be, lets you reassign some of the dwellings to another member of the team in case the original interviewer needs help.
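
The validation-and-assignment step could look roughly like the sketch below. It is only an illustration under stated assumptions: the listing row layout, the variable names (`building_id`, `unit_no`, `unit_type`), the questionnaire identifier and the payload field names are all made up for the example, and no request is actually sent. Check the API documentation of your own server for the real assignment endpoint and schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListedUnit:
    """One row of a (hypothetical) exported building-listing roster."""
    building_id: str
    unit_no: str
    unit_type: str      # e.g. "dwelling", "business", "vacant"
    enumerator: str


def validate(units):
    """Reject duplicate (building, unit) pairs and keep only dwellings."""
    seen, valid, problems = set(), [], []
    for u in units:
        key = (u.building_id, u.unit_no)
        if key in seen:
            problems.append(f"duplicate unit {u.unit_no} in building {u.building_id}")
            continue
        seen.add(key)
        if u.unit_type == "dwelling":
            valid.append(u)
    return valid, problems


def to_assignment(unit, questionnaire_id):
    """Build one dwelling-specific assignment payload (field names assumed)."""
    return {
        "Responsible": unit.enumerator,
        "Quantity": 1,
        "QuestionnaireId": questionnaire_id,
        "IdentifyingData": [
            {"Variable": "building_id", "Answer": unit.building_id},
            {"Variable": "unit_no", "Answer": unit.unit_no},
        ],
    }


listing = [
    ListedUnit("B001", "1", "dwelling", "enum_07"),
    ListedUnit("B001", "2", "business", "enum_07"),
    ListedUnit("B001", "1", "dwelling", "enum_07"),   # duplicate entry
    ListedUnit("B002", "1", "dwelling", "enum_12"),
]

valid_units, problems = validate(listing)
# "census_main$1" is a placeholder identifier, not a real questionnaire.
assignments = [to_assignment(u, "census_main$1") for u in valid_units]
```

Each payload in `assignments` would then be posted to the server by whatever script runs after the daily synchronization, and `problems` would go into a report for the supervisors.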

Hi @zurab1 and @klaus,

Thank you very much for your input. These are certainly some ideas worth considering and discussing with the Belize Census Team. A few comments on my part before that.

**For the method described by @klaus**,

What if he counts 85 units in the building and later discovers that there are actually only 82? He adjusts his count and the interview drops the excess units.

We make our trigger questions of List type instead of numeric to avoid this situation. Enumerators enter unit numbers sequentially into the List-type trigger to create the units as they go. If they make a mistake they can delete or adjust a unit individually. We have other checks in place to ensure proper formatting of the numbering.
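
As a toy model of the difference (this is not how Survey Solutions stores rosters internally; the function and variable names are invented for illustration): with a numeric trigger the rows are identified by position, so lowering the count removes the trailing rows, while with a list trigger each row is tied to the entry typed by the enumerator and can be removed on its own.

```python
def resize_numeric_roster(rows, new_count):
    """Numeric trigger: the roster keeps only the first `new_count` rows."""
    return rows[:new_count]


def remove_list_item(rows, unit_label):
    """List trigger: only the row whose list entry was deleted goes away."""
    return {label: data for label, data in rows.items() if label != unit_label}


# Numeric trigger: 85 units, then the count is corrected to 82.
numeric_rows = [f"unit {i} data" for i in range(1, 86)]
after_numeric = resize_numeric_roster(numeric_rows, 82)   # units 83 to 85 are dropped

# List trigger: 85 labelled units, then one wrong entry is deleted.
list_rows = {f"U{i:02d}": f"unit {i} data" for i in range(1, 86)}
after_list = remove_list_item(list_rows, "U40")           # only U40 is dropped
```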

Not to mention the risk of losing a week’s work if he loses or destroys his tablet, because he can only synchronize after completing the whole building.

Indeed this is always a risk. However, does partial synchronization solve this issue? I have not tested it personally but I read the documentation on the release notes. Interviewers usually have internet access at least one time per day to synchronize.

With this structure you can make the Building Section a separate top level section which is enabled by the above question.

I am not sure if this would solve the issue, given that our rosters begin at the Unit level, the Building questions are already at the top level outside the roster. Then, if the Unit is of type “Dwelling” then the household questionnaire is enabled. The household questionnaire has 95% of the questions, the other 5% is in Unit level, the parent of the Household. So the “nestedness” of the questionnaire is not the biggest issue, as far as I can see.

For what it’s worth, we did employ more or less the methodology you describe at some point in the past, which was a slightly more literal translation of the Paper method. Nevertheless, there is some food for thought in your response that I will be discussing with the team.

**For the method described by @zurab1**,

To try and take it to an absurd extreme - what about having an interview for an enumeration area? i.e. multiple buildings → multiple dwellings → … or city, or the whole country? One root interview for the census, roster of cities → roster of EAs → roster of buildings → … you get my point.

Yes indeed. That would be bad. In our case, we had decided that the “Building” was the best basepoint at which to start enumeration. We have a GIS team who has digitized most buildings in the country at this point and we have a mapping/navigation tool that helps interviewers navigate to each building, at which point they would switch over to SS and open the corresponding questionnaire (we are aware of the Map capabilities of SS but our custom app has a few extra features that are used). Thus, we would prefer to keep the SS base questionnaire at the Building level.

The API approach you mentioned is very intriguing. It certainly has piqued my interest. However, a couple things come to mind immediately.

This implies some sort of latency period between the synchronization of the buildings/units and the creation of the “Dwelling-specific” assignments. Although Belize is small, we do not have internet in many rural areas and Interviewers would only have access once a day (we have verified the once-a-day metric). This would mean either waiting for the Dwelling-specific assignments to be created after enumerating buildings in a particular area, which is not ideal; or enumerating all buildings and units first, waiting for the Household assignments (which are the Dwelling-specific assignments) and then doing these. This would represent a fundamental change to the canvassing and enumerating approach in Belize and I do not believe it is something that can be discussed, tested and implemented before training begins in January.

Again, though, this discussion has certainly been thought-provoking as to alternative ways to approach our issue. However, I am still not sure either approach would be satisfactory. In reality, I believe it is about 10-15 extra items which we cannot currently fit.

Looking forward to more discussion.

Best,
Gian

EDIT
Likewise, I would like to mention that we did not exceed the limit until recently, when some stakeholders (government ministries) reached out requesting the inclusion of a few extra questions. Prior to this recent event, we were JUST at the limit but not exceeding it. Hence not reaching out earlier about the matter.

Hi @giansib,

Here are some additional comments to your answers:

We make our trigger questions of List type instead of numeric to avoid this situation.

You still have the possibility to delete list entries and lose the data. Won’t happen? I bet you it will!

does partial synchronization solve this issue?

You will have a huge increase in data traffic, especially in your case where a single interview may contain a large number of households. You would transmit 90% of unnecessary data on each synchronization (because it will already be on the server by then).

the Building questions are already at the top level outside the roster

Outside the roster or outside the section? Remember the 400 questions limit is per section.
Does your top level show more than one section? One as the parent of the housing units > households and one for the building questions? In this case you already have the structure I suggested. It doesn’t show in your diagram though.

Regarding automation through API calls:

We have 2500 enumerators for ~320,000 households. So we expect around 10,000 completed interviews per day. We decided it was impossible to verify and reject/approve 10,000 interviews every day, therefore I automated the procedures to the maximum. Automated assignment generation with preloading admin. areas and building identification, geographical clustering of these assignments (into weekly batches), automated daily verification with automatic rejection/approval, etc. All with daily and weekly report generation for supervisors and headquarters.

The only thing headquarters had to do manually was to select and upload new clusters for each enumerator from time to time and look at the reports. Supervisors were only in the field (no server interaction required).
In June we did a field test with 8 enumerators for 1100 buildings (2000+ dwellings) which showed the automation working flawlessly. It also revealed many enumerator failures which would be impossible to detect without automated daily verification (enumerators not fetching interviews, not sending any completed interviews, reporting 10 times more refusals than the average, etc.).
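
To give an idea of what such a daily check can look like, here is a minimal sketch. It is not the program used in that census: the per-enumerator summary, the field names and the threshold are assumptions chosen for the example, and the refusal check compares each enumerator against the average of the others.

```python
from statistics import mean

# Hypothetical daily summary per enumerator, e.g. derived from exported data.
daily = {
    "enum_01": {"fetched": 12, "completed": 9, "refused": 1},
    "enum_02": {"fetched": 10, "completed": 8, "refused": 1},
    "enum_03": {"fetched": 0, "completed": 0, "refused": 0},
    "enum_04": {"fetched": 11, "completed": 3, "refused": 8},
}


def flag_enumerators(summary, refusal_factor=10):
    """Return {enumerator: [reasons]} for everyone who needs a closer look."""
    flags = {}
    for name, counts in summary.items():
        reasons = []
        if counts["fetched"] == 0:
            reasons.append("no interviews fetched")
        if counts["completed"] == 0:
            reasons.append("no completed interviews sent")
        # Compare refusals with the average of all *other* enumerators.
        others = [c["refused"] for n, c in summary.items() if n != name]
        if others and mean(others) > 0:
            if counts["refused"] >= refusal_factor * mean(others):
                reasons.append("refusals far above average")
        if reasons:
            flags[name] = reasons
    return flags


flags = flag_enumerators(daily)
```

The resulting `flags` dictionary is the kind of thing that would feed the daily report for supervisors and headquarters.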

You may have less enumerators and households, but automation will still save you a lot of headache…

Best,

Klaus

Thanks again for taking time out to respond, @klaus

You still have the possibility to delete list entries and lose the data. Won’t happen? I bet you it will!

Yes I never meant to imply that it would never happen but that the risk of chunk-deleting multiple entries was greatly reduced.

It is worth mentioning that we have done two “Household Listings” in the past year, where about 30 buildings and 50 thousand units were enumerated using a similar general architecture, but with far fewer items of course.

You will have a huge increase in data traffic

Very good point and something important to consider. We would have to pilot the method and monitor the data usage.

Outside the roster or outside the section?

Both. In my diagram I was trying to convey Hierarchy, not that components all were in the same section. Building is the base questionnaire and has some top-level questions, including the trigger question for Units.

Units is in another section and contains only one thing: the Unit roster, which contains the Unit-level items, and the triggers for Household and Individuals. If you are willing, I am willing to share the questionnaire via Designer with you to take a closer look.

You may have less enumerators and households, but automation will still save you a lot of headache…

We will have around 900 enumerators and around 125,000 households. However, my main concern with the API was not the convenience or usefulness but the 1. change in canvassing/enumeration method and 2. the latency period; unless I am missing something. Would the flow of the assignments not be:

Enumerate Building >> Synchronize Building >> [[API CALL CREATES HOUSEHOLD ASSIGNMENTS]] >> Synchronize to receive Households >> Enumerate Households

So, in between the enumeration of buildings and their corresponding households, there is the need for internet synchronization. This would mean (I believe) enumerating all buildings first and then households? This would imply canvassing/enumerating the same areas twice. And in some cases, waiting until the end of the day (in our rural areas) to gain internet access to synchronize buildings and receive their household assignments. From a groundwork perspective, this does not seem efficient for our context. And again, it would represent a big shift in how canvassing is done in Belize and would require testing and piloting, for which there is not enough time.

If there’s something I’m missing about the API flow, let me know.

Best,
Gian

Hi @giansib,

This is what @zurab1’s suggestion implies. In our census we do the following:

  1. We do a Building Listing to identify residential and non-residential buildings that have persons living there (e.g. the janitor of a school/hospital), including an estimate of the number of housing units (to help scheduling the workload).
    You can skip this step if you have up-to-date cartography data. You probably should include a second assignment for “New Buildings” in this case.

  2. A program then processes the exported data from the listing and creates building assignments with unlimited interviews (one per housing unit) through the API.

You could do only 2) if you feel you already have a complete list of buildings, but include a “New Buildings” assignment.

In our case (performing a listing survey before) yes. We give 2 EAs to each enumerator. The week before the census they do the building listing there (2 to 3 days) and produce the data for creating the census assignments and preparing weekly batches (we don’t want to overcrowd the tablets with more assignments than necessary).
The same enumerator who listed the buildings will then enumerate the households in these buildings for the census.

With this setup we require a synchronization once a week (to upload the next batch), but encourage daily synchronizations to avoid the possibility of losing interviews and to enable us to monitor the progress (mainly detect enumerators not producing interviews because they are ill, have deserted, etc.).

We are using the SuSo map dashboard, both for the listing and for the census. If enumerators open the map dashboard every morning before leaving, the background maps are kept in the tablet’s cache and they can switch to Airplane mode to save battery.

I have few projects at the moment, so, yes, I could take a look at your questionnaire.

Best,

Klaus

This is a very interesting (and long) exchange. I will give a couple of clarifications:

  • synchronization steps, both full and partial, are ‘smart’ in the sense that only the new information is uploaded, not all of it. For example, take a rejected interview with 5000 answers. The interviewer downloads all 5000 (because after completion the tablet copy is removed), but if only 3 answers were edited, then after completion only the data on those 3 answers will be sent up.
    Similarly, for partial sync, on every upload only the incremental part is sent up. There is still some overhead though: when a completed interview is sent with the full sync, some optimization is done to cut out intermediate events (questions were modified multiple times, enabled/valid events were fired as a result) and only the final state at the moment of completion is retained. But in the case of partial sync, all ‘raw’ data that are present at the moment of the sync are sent up…

  • Indeed, changes in the dataflow will affect the workflow of enumerators and other teams, so this must be a joint decision. However, many times we’ve seen that only one side of the cost is considered - there are options of paying enumerators for extra time when asking them to do the listing first; there is also the option of buying cellular internet, renting satellite modems etc. Of course this will put extra costs on the budget.
    But similarly, cleaning data mistakes, supervision, recovery of lost records, missed dwellings etc are also not free, so ideally, all these issues together should be considered when deciding on the final flow.

  • and the last part: as I wrote earlier, from the software point of view the 400-item limit is somewhat arbitrary and comes from technical considerations that can be revised, and we are going to increase the limit.
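
The overhead mentioned in the first point, where a full sync of a completed interview sends only the final state while a partial sync sends every raw event, can be pictured with a toy model. The event format and function below are invented for the illustration and do not reflect the actual internal event store.

```python
def compact(events):
    """Keep only the last recorded value for each question."""
    final = {}
    for question, value in events:
        final[question] = value   # later events overwrite earlier ones
    return final


# Raw event log on the tablet: two answers were corrected along the way.
raw_events = [
    ("q1_name", "Jon"),
    ("q2_age", 34),
    ("q1_name", "John"),
    ("q2_age", 43),
    ("q3_sex", "M"),
]

partial_sync_payload = list(raw_events)     # every raw event goes up
full_sync_payload = compact(raw_events)     # only the final state goes up
```

In this example the partial sync carries five events and the full sync three answers, which is the extra traffic to weigh against the benefit of having the data on the server earlier.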

@klaus and @zurab,

Very insightful and useful feedback and much appreciated. The experiences and tips you are sharing will be very useful going forward. I know the Belize Census team will be grateful for the feedback and the knowledge shared based on past experiences.

We plan to do the Listing and the Census at the same time, in the same SuSo questionnaire, using the architecture I described previously and displayed in the image. We have done two recent building listings that covered around 60% of the country, so our expectation is that a large part of the cartography is up to date; however, another full listing will be done (the other Listings were for other survey purposes such as LFS, especially as we collected contact numbers for CATI surveys).

Doing a listing of the buildings and units (dwellings) first and subsequently conducting the household-level interviews does sound like it has its advantages, and I believe this is how it was done in Belize in the PAPI era. While I am inclined to agree with the advantages discussed, I would have to discuss it with the Census team.

Thanks @zurab, for clarifying the Smart synchronization. This will certainly allow us to have enumerators sync on a daily basis and have a more real-time idea of the progress of enumeration.

And again, thank you and the team for listening and for potentially expanding the 400-item limit.

Best,
Gian Aguilar

Hi @giansib ,

I had a quick look at your questionnaire and I have the following observations:

  1. There are duplicate questions for the building materials (floor, roof, etc.) in the Census and the Unit sections. How can a unit have different roof material than the building?

  2. In the Unit section (the one that has reached the question limit) you have several places where many similar questions can be combined into a roster. For example instead of asking the quantity of each asset in a separate question you could reduce that to a fixed-items roster with a single question. Make it a flat roster so enumerators don’t have to enter and exit a subsection for each asset.
    The same situation goes for agricultural activities which could also be combined into a multi-select question if you drop the DK option (who doesn’t know if he exercises a certain activity?).
    Same for the Disability & Health section (4.1 - 4.9, 4.12a - g) and Internet Access.
    This should save you at least 20 - 30 questions.

Hi @klaus, appreciate the time taken to review our questionnaire.

  1. I will have to confirm as to why there are Building and Unit questions for building wall and roof material, (flooring might vary) but I might suspect it is indeed a duplicate. These are 2-4 questions however. EDIT: After a brief chat with subject matter, I was reminded that the Building Characteristic questions at the building-level are recorded via observation and are always collected. On the other hand, the unit or household-specific building characteristics are done via a combination of observation and verbal confirmation, with the assumption that the household respondent can confirm the observations. I do see, however, the possibility of merging these together, with the only caveat being navigation.

  2. The items are not in multi-select questions mostly because of the need for the DK/NS option. It would not be up to me to determine whether the DK/NS option is reasonable. What I can say is that adding a DK/NS option to the Yes/No multi-select questions would be a very welcome addition for us. Likewise, there is some data-processing convenience in having certain items as ‘flat’ variables within the main file without having to merge and flatten in post. Thus having them as individual questions instead of a roster is the better option, especially if there is only one question to ask per item. A roster starts providing advantages when there are multiple questions to ask per item. However, we can consider it if the team agrees.
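
For reference, the “merge and flatten in post” step I am referring to would be something like the sketch below. The file layout and the column names (`interview_id`, `item`, `quantity`) are assumptions for the example and not the actual export format.

```python
def flatten_roster(main_rows, roster_rows, value_field, prefix):
    """Merge a long roster file into the main file, one column per item."""
    wide = {row["interview_id"]: dict(row) for row in main_rows}
    for r in roster_rows:
        column = f"{prefix}_{r['item']}"
        wide[r["interview_id"]][column] = r[value_field]
    return list(wide.values())


# Main file: one row per interview.
main_file = [
    {"interview_id": "A1", "hh_size": 4},
    {"interview_id": "A2", "hh_size": 2},
]

# Asset roster file: one row per interview and asset.
asset_roster = [
    {"interview_id": "A1", "item": "radio", "quantity": 1},
    {"interview_id": "A1", "item": "tv", "quantity": 2},
    {"interview_id": "A2", "item": "radio", "quantity": 0},
    {"interview_id": "A2", "item": "tv", "quantity": 1},
]

flat = flatten_roster(main_file, asset_roster, "quantity", "asset")
```

It is not a lot of code, but it is one more step to maintain for every roster that would otherwise be a set of plain variables in the main file.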

@zurab Thanks again for mentioning, in your last point, that the limit will be increased. Is there a tentative date for when to expect this, which I can pass on to the Institute? Even a rough estimate would be helpful.