What a wonderful discussion, all!
@tschoel above has posted the problem setting:
“So the typical case that we collect data on land size but in different regions we have a different measurement unit for the land size so we have one calculated variable to convert all other measurement units into hectare, during data collection we would like to use this calculated variable to see whether we are still in the right range of land size (what we know from previous years or secondary data) or we have maximum numbers that might be an outlier or a mistake in the data. Checking this kind of questions are not really feasible, especially when one works on a global level.”
Here several tasks are crammed together:
- Conversion of area to a standard unit (normalization);
- Use of the normalized value against benchmark boundaries in validation;
- Use of previous years’ data in validation;
- Making sure the supervisor is aware of the problem/error.
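The first two tasks boil down to a small amount of logic. Here is a minimal Python sketch of the normalization step (Designer expressions use a C#-like syntax, so this is only an illustration of the logic; the unit names are made up, while the conversion factors are the standard ones):

```python
# Standard conversion factors to hectares (unit names are illustrative).
TO_HECTARES = {
    "hectare": 1.0,
    "are": 0.01,           # 1 are = 100 m^2 = 0.01 ha
    "acre": 0.404686,      # 1 acre is approximately 0.404686 ha
    "square_meter": 0.0001,
}

def normalize_area(value, unit):
    """Convert an area measured in `unit` to hectares."""
    return value * TO_HECTARES[unit]

print(normalize_area(2.5, "acre"))  # approximately 1.0117 ha
```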
Refer to the “PUBLIC EXAMPLE Range Check Demo”.
Here is what it does:
- allows the area of the plot to be entered in hectares, ares, acres or square meters;
- normalizes the entered value to hectares and displays the normalized value to the enumerator;
- retrieves benchmark bounds (in hectares), which vary by region (specified in the lookup table);
- displays the error when the entered value is outside of these bounds;
- shows a message on the dashboard regarding the status of the land plot size:
  - not recorded;
  - entered, but can’t be validated;
  - validated and deemed invalid;
  - validated and deemed valid.
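The bounds lookup and the status message amount to logic like the following Python sketch (the regions and bounds here are invented for illustration; in the demo they come from the lookup table):

```python
# Hypothetical benchmark bounds per region, in hectares, playing the
# role of the lookup table in the Designer questionnaire.
BOUNDS = {
    "North": (0.1, 20.0),
    "South": (0.05, 35.0),
}

def plot_status(area_ha, region):
    """Return the dashboard status string for a normalized plot size."""
    if area_ha is None:
        return "not recorded"
    if region not in BOUNDS:
        return "entered, but can't be validated"
    lo, hi = BOUNDS[region]
    if lo <= area_ha <= hi:
        return "validated and deemed valid"
    return "validated and deemed invalid"
```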
Here is the appearance for the supervisor and interviewer:
Notice the different messages on the status in the identifying information.
We can further restrict the list of interviews by question (such as the region question) or by variable (such as our msg variable) if we want to narrow it down to a particular area or status.
I am glad that the problem formulation specifically mentioned “what we know from previous years or secondary data”. This is what allowed me to precalculate the bounds and include them as a reference for every region in the lookup table. This is the typical approach, and I am glad that @kv700032 mentioned this practice.
I would have had difficulty if “we have maximum numbers that might be an outlier” needed to be solved using the live data that flows into the system (you can expect that the first incoming interview would then be treated as an outlier, right?).
So the problem setting (at least the way I am reading it) is disconnected from the original question in this thread, which was about producing statistical reports that utilize data from multiple interviews. Anything that you program in the Designer happens within a single interview, so it can’t be a solution to that. The functionality that works across multiple interviews is the built-in reports, which are limited and currently do not support calculated variables and some question types.
For the moment, downloading the data via the API will allow you to quickly tabulate a particular variable and notify the corresponding coordinators of the problems by sending them a report of this kind:
| Region | Correct Land |
|--------|--------------|
| North  | 95.0 %       |
| South  | 93.4 %       |
| East   | 97.8 %       |
| West   | 92.6 %       |
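Once the data is downloaded, such a report takes only a few lines. A sketch in Python, assuming each interview row carries a region and a flag for whether the land size passed validation (both field names are hypothetical):

```python
from collections import defaultdict

def correct_land_report(interviews):
    """Percentage of interviews with valid land size, per region.

    `interviews` is an iterable of dicts with hypothetical keys
    'region' and 'land_valid'.
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for row in interviews:
        totals[row["region"]] += 1
        valid[row["region"]] += bool(row["land_valid"])
    return {r: 100.0 * valid[r] / totals[r] for r in totals}

data = [
    {"region": "North", "land_valid": True},
    {"region": "North", "land_valid": False},
    {"region": "South", "land_valid": True},
]
print(correct_land_report(data))  # {'North': 50.0, 'South': 100.0}
```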
Having such a report for a large survey is hardly sufficient on its own: the coordinator (HQ or supervisor) will still have to attend to individual interviews to determine what the nature of the problem really is. I am glad that both @ashwinikalantri and @klaus find it straightforward to download the whole dataset from Survey Solutions and apply external validations and report building.
If the data from the current round must be used, determining the outliers through external validation is the recommended approach given the current functionality.
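For that external step, any standard outlier rule can be applied once the full round’s data is in hand. For example, a simple interquartile-range fence in Python (the 1.5 multiplier is a common convention, not anything prescribed by Survey Solutions):

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside the Q1 - k*IQR .. Q3 + k*IQR fences."""
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # rough quartiles by index
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

areas = [1.2, 0.8, 1.5, 2.0, 1.1, 0.9, 40.0]  # hectares
print(iqr_outliers(areas))  # [40.0]
```

Note that this only works with a reasonable amount of accumulated data; applied live, the first incoming interviews would be flagged spuriously, as discussed above.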