Include all answer valid/invalid events in paradata

Use case: The paradata currently include all events related to answers being provided (e.g., initially set, changed, deleted) but only the last event related to an answer being valid/invalid. In other words, the paradata show whether the last answer given is valid/invalid, but not whether earlier answers were valid/invalid.

One potential use of paradata might be to investigate whether interviewers change answers in response to validation errors, and whether those changes suggest correcting an accidental error (e.g., changing 5000, which contains an extra 0, to 500) or editing data so that it fulfills a validation condition (e.g., entering 5000, seeing an error, changing to 4000, seeing an error, … , and changing to a final answer that satisfies the validation condition but may be inaccurate data).

But the current paradata do not make this easy. While an analyst could, in principle, reconstruct from the paradata whether each answer change resulted in a valid or invalid answer, the task would be substantially easier if the valid/invalid state of each answer change were recorded in the paradata.
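To make the reconstruction problem concrete, here is a minimal Python sketch that re-evaluates a validation rule for every answer change and counts how many entered answers failed it. The variable name `expenditure`, the rule `v <= 1000`, and all helper names are hypothetical. The sketch also assumes each rule depends only on its own question's answer; rules that reference other answers would require replaying the full interview state, which is one reason reconstruction can be costly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnswerChange:
    variable: str  # hypothetical: variable name of the question answered
    value: float   # the answer entered at this step


def tag_validity(
    changes: List[AnswerChange],
    rules: Dict[str, Callable[[float], bool]],
) -> List[bool]:
    """Re-evaluate each answer change against its question's validation rule.

    Questions without a rule are treated as always valid.
    """
    return [rules.get(c.variable, lambda v: True)(c.value) for c in changes]


def invalid_attempts(changes: List[AnswerChange], validity: List[bool]) -> Dict[str, int]:
    """Count, per variable, how many entered answers failed validation."""
    counts: Dict[str, int] = {}
    for change, ok in zip(changes, validity):
        if not ok:
            counts[change.variable] = counts.get(change.variable, 0) + 1
    return counts


# Hypothetical rule: expenditure must not exceed 1000.
rules = {"expenditure": lambda v: v <= 1000}

# The two answer histories described above: a typo fix vs. repeated edits.
typo_fix = [AnswerChange("expenditure", 5000), AnswerChange("expenditure", 500)]
fiddling = [AnswerChange("expenditure", v) for v in (5000, 4000, 3000, 1000)]

print(tag_validity(typo_fix, rules))  # [False, True]
print(tag_validity(fiddling, rules))  # [False, False, False, True]
print(invalid_attempts(fiddling, tag_validity(fiddling, rules)))  # {'expenditure': 3}
```

Both histories end with a valid answer, so only the per-change validity distinguishes a single correction from a series of edits made to satisfy the rule.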

Potential changes:

  • Record these answer valid/invalid events in the data sent from the tablet to the server. This could result in large data files that might be difficult to send over poor networks.
  • Reconstruct the answer valid/invalid events on the server based on data received. This would be a computationally costly exercise that may not be a good use of system resources.

Thanks Arthur. I would like to second this feature request. Being able to track error messages flagged during the interview would be a very useful feature for QA purposes.

Use cases:

  1. There is increasing concern that the validation rules added to questionnaires might help interviewers fake individual answers, sections, or entire questionnaires. Because the data from such interviews are consistent with the validation rules, we currently have no means to identify and monitor this. Some users have started arguing in favour of not using validation checks in interviews for that very reason. Records of the error messages flagged during the interviewing process would give us a means to identify and monitor this type of behaviour.

  2. For QA work on any survey, it is important to check that all questions and sections function and that interviewers understand them correctly. Interviewers tend to “solve” errors by fiddling with the answer until the error message disappears, and they provide limited feedback; as data users, we see only the output (the data) and not the process, so we cannot identify which questions are difficult, misunderstood, or not functioning. If the paradata recorded each error and warning that fired, together with the variable name and ideally the validation number, one could relatively easily monitor errors by interviewer and over time, identifying interviewers who did not understand something so they can be retrained, as well as questions that do not function or validations that fire wrongly. Currently there is a trade-off between using validations and monitoring errors over time for targeted feedback. This feature would remove that trade-off.

  3. Since measures such as the count of errors and warnings flagged during an interview are survey-independent, there is potential to develop one or more standard, off-the-shelf QA indicators that can be used in every Survey Solutions survey to assess interview quality. These indicators, as well as the usefulness of in-questionnaire validation checks, could be tested in experiments. This work could also produce publications demonstrating the benefits of the Survey Solutions QA system.

  4. Paradata in their current format are almost too large to be useful. They contain many passive events that make up 3/4 of the paradata and make export and download times prohibitively long, yet are not needed at all from a survey-quality point of view (I spent quite some time trying to get something useful out of them). I end up using paradata only sporadically because processing them takes too long. If this were faster, paradata could be integrated more comprehensively into the QA systems that some Survey Solutions users are building.
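As an illustration of use cases 2 and 3, the following Python sketch computes two simple indicators, assuming paradata contained one row per fired error or warning: the average number of fired messages per interview for each interviewer, and how often each variable fired. The event name `ValidationFired`, the row layout, and the interviewer, interview, and variable names are all hypothetical; current paradata contain no such event.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set


class Row(NamedTuple):
    interviewer: str
    interview_id: str
    event: str
    variable: str = ""  # variable the message fired on, if any
    kind: str = ""      # "error" or "warning" (hypothetical proposed field)


FIRED = "ValidationFired"  # hypothetical event name, not in current paradata


def fired_per_interview(rows: List[Row]) -> Dict[str, float]:
    """Average number of fired errors/warnings per interview, by interviewer."""
    fired: Counter = Counter()
    interviews: Dict[str, Set[str]] = {}
    for r in rows:
        interviews.setdefault(r.interviewer, set()).add(r.interview_id)
        if r.event == FIRED:
            fired[r.interviewer] += 1
    return {who: fired[who] / len(ids) for who, ids in interviews.items()}


def fired_by_variable(rows: List[Row]) -> Counter:
    """How often each variable fired a message, across all interviewers."""
    return Counter(r.variable for r in rows if r.event == FIRED)


rows = [
    Row("int01", "A", "AnswerSet", "expenditure"),
    Row("int01", "A", FIRED, "expenditure", "error"),
    Row("int01", "B", "AnswerSet", "age"),
    Row("int02", "C", FIRED, "expenditure", "error"),
    Row("int02", "C", FIRED, "age", "warning"),
]
print(fired_per_interview(rows))               # {'int01': 0.5, 'int02': 2.0}
print(fired_by_variable(rows).most_common(1))  # [('expenditure', 2)]
```

Neither indicator depends on the questionnaire's content, which is what would make it reusable across surveys.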

Changes required to have such functionality:

  1. To shorten the paradata and make them more usable, the following events could be dropped: VariableDisabled, VariableEnabled, QuestionDeclaredValid, and QuestionDeclaredInvalid (at the end of an interview). They have no recognizable use in survey QA monitoring, and they slow down export and download. If they are needed for internal purposes, they could instead be dropped at paradata export. I am happy to go into more detail about any of them if that helps.
  2. Add an entry to the paradata every time a warning or error message fires, with the variable name and, if possible, an indication of whether it is a warning or an error, plus the validation number (the one displayed to the interviewer, showing which validation rule fired).
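A minimal Python sketch of the two changes above, assuming a simplified event record: passive events are filtered out by name at export time, and a fired validation message is stored as its own event carrying the variable name, severity, and validation number. The record fields and the `ValidationFired` event name are hypothetical, not the actual Survey Solutions export schema; because the new event has its own name, it survives the filter.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Event names taken from the request above.
PASSIVE_EVENTS = frozenset({
    "VariableDisabled",
    "VariableEnabled",
    "QuestionDeclaredValid",
    "QuestionDeclaredInvalid",
})


@dataclass(frozen=True)
class ParadataEvent:
    interview_id: str
    order: int          # position of the event within the interview
    event: str
    variable: str = ""
    # Hypothetical proposed fields for a fired validation message:
    severity: str = ""          # "error" or "warning"
    validation_index: int = 0   # rule number as shown to the interviewer


def export_filter(events: Iterable[ParadataEvent]) -> Iterator[ParadataEvent]:
    """Drop passive events at export time; keep everything else in order."""
    return (e for e in events if e.event not in PASSIVE_EVENTS)


events = [
    ParadataEvent("A", 1, "AnswerSet", "expenditure"),
    ParadataEvent("A", 2, "VariableEnabled", "total"),
    ParadataEvent("A", 3, "ValidationFired", "expenditure", "error", 1),
    ParadataEvent("A", 4, "QuestionDeclaredInvalid", "expenditure"),
]
print([e.event for e in export_filter(events)])  # ['AnswerSet', 'ValidationFired']
```

Filtering at export rather than on the tablet would leave any internal use of the passive events untouched, as suggested in point 1.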