Show variables in Survey Statistics Report

Hi team,

It would be great to be able to access variables (even just simple long integers) in the Survey Statistics report in Headquarters. Example of how this would be useful: a simple variable holding the number of household members - supervisors could view its mean to check whether any interviewers in their team are under-enumerating the household roster.

Thanks!

+1, and to add a use case: we have a situation where a questionnaire collects a large number of parameters for the calculation of a gross margin (this is the same as in "Export fails with extremely large number in double typed calculated variable"). Now, we can do the calculation in a variable, but it currently appears to be impossible to use that calculated value in survey supervision to identify expected ranges, outliers, etc.

Please elaborate. Suppose the value was calculated as X for a particular completed interview. What do you do (or want to do) with X after that?

Sure. Here is a description of a concrete use case from a colleague who is involved in the actual data collection work:

“So the typical case that we collect data on land size but in different regions we have a different measurement unit for the land size so we have one calculated variable to convert all other measurement units into hectare, during data collection we would like to use this calculated variable to see whether we are still in the right range of land size (what we know from previous years or secondary data) or we have maximum numbers that might be an outlier or a mistake in the data. Checking this kind of questions are not really feasible, especially when one works on a global level.”

For gross margins, it would be much the same thing.


@kv700032 , @ashwinikalantri , @klaus would you like to try this case?

Okay, I'll try. To make any meaningful suggestion, I would need a more detailed idea of when and how this variable (normalized land size) should be used.
If you have previous or secondary data, you could use it during the interview in a validation condition, comparing the reported area to the "typical" area. For this you would have to preload such comparison data. You could then generate an error for outliers.

I usually verify this kind of data during daily monitoring, exporting the recent interviews and running a program to compare the reported data to the expected range, especially to find out whether outliers are associated with certain enumerators.
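
As a rough illustration of such a monitoring program, here is a minimal Python sketch. It assumes the tabular export and a file of expected ranges have already been downloaded; the file and column names (region, land_area_ha, min_ha, max_ha, interviewer) are placeholders for a specific questionnaire, not Survey Solutions defaults.

```python
# Minimal sketch of a daily outlier check on exported interview data.
# File and column names below are assumptions about one questionnaire.
import pandas as pd

interviews = pd.read_csv("households.tab", sep="\t")      # tabular export
bounds = pd.read_csv("expected_ranges.tab", sep="\t")     # region, min_ha, max_ha

merged = interviews.merge(bounds, on="region", how="left")
merged["out_of_range"] = (
    (merged["land_area_ha"] < merged["min_ha"]) |
    (merged["land_area_ha"] > merged["max_ha"])
)

# Share of out-of-range plots per enumerator, worst first
report = (merged.groupby("interviewer")["out_of_range"]
                .mean()
                .sort_values(ascending=False))
print(report.to_string())
```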

(I remember, specifically for land size, that respondents (and enumerators) would typically calculate a 50 m x 50 m plot as 50 m². So we changed the area question into two questions asking for length and width and had the tablet calculate the area.)


I will try it too. At ENIGH we did something similar to normalize product prices.

But we used a lookup table to check whether the value calculated by the variable fell between the minimum and maximum values defined in that table.

In this case, I understand that we will collect the length and width of the land and then convert the area to hectares.


Hi - to add to this conversation: as well as being able to view variables in the Survey Statistics page, it would also be great to be able to view questions that are based on cascading combo boxes. In a survey right now, the fieldwork provider is given quotas based on districts (cascading from province). Instead of being able to quickly view this in Survey Statistics, I need to export a data set and run a table in SPSS or Stata to give them this update.

Thanks!

From what I understand, @tschoel suggests using a validation based on the new calculated variable (land in hectares). This could be helpful in some scenarios. A workaround could be adding a static text that only shows up if the calculated variable is not in range.

As @klaus suggests, at HQ level, downloading the data and checking for outliers is straightforward.

It would be great to see the scope of SS expanding from a data collection platform to a complete survey management platform. We use dashboards created in Google Data Studio and R Shiny to monitor our data and survey progress. Our scripts run every 6 hours to download and wrangle the data and update these dashboards. But these are extremely specific solutions that need to be redone for every survey.

The Survey Statistics section could expand to include data monitoring (outliers, means, cross-tabs, charts, etc.) and allow the creation of dashboards for the same. Maybe add the option to include some existing reports in these dashboards. This could be a one-stop solution for supervisors and administrators to look at the data and monitor progress.

I would also like API-level access to the survey variables, so that instead of downloading the entire data set every time we update our dashboards, we could just request the specific variables.


The Survey Statistics section shows interview aggregates (count, average, etc.) by team or enumerator.
Anything more specific, like identifying interviews with certain variables outside some range or combining conditions on several variables, would require saving these queries somewhere; otherwise one would have to key in the complex conditions all over again every time one wants to examine them.

I think such complex situations are better handled in a dedicated program, especially as they are very survey-specific as @ashwinikalantri mentions.
A GraphQL query retrieving only specific variables, as suggested, would make a lot of sense in this context.
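
For what it could look like, here is a rough sketch of calling the Headquarters GraphQL endpoint from Python. The query itself is only illustrative: the actual schema (entity and field names, filter arguments) depends on the server version, so the names used below (interviews, nodes, key, status, responsibleName) and the credentials are assumptions to be checked against the server's GraphQL playground.

```python
# Rough illustration of pulling a few fields through the HQ GraphQL endpoint
# instead of downloading a full export. Field names in the query and the
# credentials are placeholders, not a verified Survey Solutions schema.
import requests

HQ = "https://hq.example.org"          # assumed Headquarters URL
QUERY = """
query {
  interviews(workspace: "primary", take: 100) {
    nodes { key status responsibleName }
  }
}
"""

resp = requests.post(f"{HQ}/graphql",
                     json={"query": QUERY},
                     auth=("api_user", "api_password"))   # assumed API user
resp.raise_for_status()
print(resp.json())
```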

What a wonderful discussion, all!

@tschoel above has posted the problem setting:

“So the typical case that we collect data on land size but in different regions we have a different measurement unit for the land size so we have one calculated variable to convert all other measurement units into hectare, during data collection we would like to use this calculated variable to see whether we are still in the right range of land size (what we know from previous years or secondary data) or we have maximum numbers that might be an outlier or a mistake in the data. Checking this kind of questions are not really feasible, especially when one works on a global level.”

Here several tasks are crammed together:

  1. Conversion of area to a standard unit (normalization);
  2. Use of the normalized value against benchmark boundaries in validation;
  3. Use of previous years’ data in validation;
  4. Making sure the supervisor is aware of the problem/error.

Refer to the “PUBLIC EXAMPLE Range Check Demo”.

Here is what it does:

  • allows the area of the plot to be entered in hectares, ares, acres or square meters;
  • normalizes the entered value to hectares and
  • displays the normalized value to the enumerator;
  • retrieves benchmark bounds (in hectares), which vary by region (specified in the lookup table);
  • displays the error when the entered value is outside of these bounds;
  • shows a message in the dashboard regarding the status of the land plot size:
    • not recorded;
    • entered, but can’t be validated;
    • validated and deemed invalid;
    • validated and deemed valid.
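
Outside the Designer, the normalize-and-check logic from the list above boils down to a few lines. A minimal Python sketch (the conversion factors are standard; the per-region bounds and the wording of the statuses are illustrative, not copied from the example questionnaire):

```python
# Sketch of the normalize-then-check logic. Bounds are illustrative only.
TO_HECTARES = {
    "hectare": 1.0,
    "are": 0.01,          # 1 are = 100 m^2
    "acre": 0.404686,     # 1 acre ~ 4,046.86 m^2
    "sq_meter": 0.0001,
}

BOUNDS_HA = {"North": (0.1, 12.0), "South": (0.05, 8.0)}   # stand-in lookup table

def plot_status(value, unit, region):
    """Normalize the entered area to hectares and classify it, mirroring the
    dashboard messages (not recorded / can't be validated / invalid / valid)."""
    if value is None:
        return "not recorded"
    if unit not in TO_HECTARES or region not in BOUNDS_HA:
        return "entered, but can't be validated"
    ha = value * TO_HECTARES[unit]
    lo, hi = BOUNDS_HA[region]
    return "validated and deemed valid" if lo <= ha <= hi else "validated and deemed invalid"

print(plot_status(3, "acre", "North"))   # ~1.21 ha -> valid
```

In the questionnaire itself the same steps are done with a calculated variable (the normalization), a lookup table (the bounds) and a validation condition (the check).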

Here is the appearance for the supervisor and interviewer:

Notice the different messages on the status in the identifying information.

We can further restrict by question (such as region variable) or by variable (such as our msg variable) if we want to narrow our list of interviews to a particular area or status.


I am glad that the problem formulation mentioned specifically

what we know from previous years or secondary data

This is what allowed me to precalculate the bounds and include them as reference for every region in the lookup table. This is the typical approach, and I am glad that @kv700032 mentioned this practice.
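
For reference, a minimal sketch of how such bounds could be precalculated from previous-year data and written out as a tab-delimited lookup table. The input file, the column names and the 5th/95th-percentile rule are assumptions; the sketch also assumes the region code is already a positive integer that can serve as the lookup table's rowcode.

```python
# Sketch: derive per-region min/max bounds from last year's plot areas and
# save them as a tab-delimited lookup table (rowcode + numeric columns).
# File/column names and the percentile rule are assumptions.
import pandas as pd

prev = pd.read_csv("last_year_plots.csv")            # columns: region_code, area_ha

bounds = (prev.groupby("region_code")["area_ha"]
              .quantile([0.05, 0.95])
              .unstack())
bounds.columns = ["min_ha", "max_ha"]
bounds.index.name = "rowcode"                        # region code doubles as rowcode

bounds.reset_index().to_csv("area_bounds.tab", sep="\t", index=False)
```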

I would have had a difficulty if

we have maximum numbers that might be an outlier

needed to be solved using the live data that flows into the system (you can expect that the first incoming interview would then be treated as an outlier, right?).

So the problem setting (at least the way I am reading it) is disconnected from the original question in this thread, which is about producing statistical reports that utilize data from multiple interviews. Anything that you program in the Designer happens within a single interview, so it can't be a solution to that. The functionality that goes across multiple interviews is the built-in reports, which are limited and currently do not support calculated variables and some question types.

For the moment, downloading the data via an API will allow you to quickly tabulate a particular variable and notify the corresponding coordinators of the problems by sending them a report of this kind:

Region     Correct Land
-------------------------
North       95.0 %
South       93.4 %
East        97.8 %
West        92.6 %
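
Producing that report from an exported file is a one-liner once a per-interview in-range flag exists (for example, the out_of_range flag from the monitoring sketch earlier in this thread); file and column names are again assumptions.

```python
# Sketch of the per-region "correct land" share shown above.
import pandas as pd

flagged = pd.read_csv("households_checked.tab", sep="\t")   # has region, out_of_range
correct_land = (1 - flagged.groupby("region")["out_of_range"].mean()).mul(100)
print(correct_land.round(1).to_string())    # e.g.  North    95.0
```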

For a large survey, having such a report is hardly sufficient by itself. The coordinator (HQ or supervisor) will still have to attend to individual interviews to determine what the nature of the problem really is. I am glad that both @ashwinikalantri and @klaus find it straightforward to download the whole dataset from Survey Solutions and apply external validations and report building.

Determining outliers through external validation is the recommended approach within the current functionality if data from the current round must be used.

Well, I mostly agree, and I understand that Survey Solutions does not intend to handle complex analyses of collected data. In fact, I was mildly surprised to find that such a thing as the Survey Statistics report even exists.

Finding outliers is always related to averages and percentiles in a set of interviews, so this is, in fact, related to the original question. Admittedly, it would be more elegant to solve these things in an external script, but at least in the contexts where we use Survey Solutions it is regularly much harder to assign sufficient developer resources (you have to find any in the first place) than supervisors and the like. So it would indeed help to give the latter the ability to check those averages and percentiles, so that they can look out for outliers when reviewing interviews.

In any case, that functionality is already there with the Survey Statistics report. It is just artificially limited to exclude calculated variables. I must admit that I am having a bit of a hard time understanding both that report's presence and this limitation.

Oh, and: thanks for all the tips. I am learning a lot here. Only the PUBLIC EXAMPLE you linked appears to be inaccessible (the Designer says: "You don't have permission to edit this questionnaire").

Must be a glitch.
Could you please log out of the Designer, then sign in and try again?

I agree with @lachb. It would be interesting to be able to display in the Survey Statistics report categorical questions whose display mode is combo box or cascading combo box, as long as the number of categories does not exceed some threshold X.
It could also be interesting to cross numerical questions with categorical questions.

@ashwinikalantri, you're absolutely right: an API that allows obtaining the values calculated by a variable could, in the future, facilitate the use of a dedicated external program to check for outliers, as mentioned by @klaus.

Finally, with the current capabilities of Survey Solutions, the solution proposed by @sergiy fits the approach described by @tschoel. The example developed by Sergiy lets you view the interviews that present an abnormal value and, above all, warns the interviewer in real time about the error; if the interviewer ignores it, the supervisor or HQ can monitor it.


@tschoel the questionnaire is read-only. You may want to make a copy to edit it.

Designer > Public Questionnaires > Search for “PUBLIC EXAMPLE Range Check Demo” > Actions > Copy