Paradata redesign


(Andreas Kutka) #1

Paradata redesign: I am working with the paradata for GHS at the moment, trying to identify a couple of things. I have a few suggestions for the paradata redesign and am happy to join any discussion on the paradata design.

The big picture: as a survey user I am primarily interested in the interviewer activities and the history of a file, an answer, or a roster row. Being able to extract stories from that would be quite useful, and there is potential to make this easier and to identify more indicators. I understand the development team may have different needs, but maybe this could be filtered somewhere around the export of the data.

The indicators I want to build (and am building) are speed in answers per minute, periods of fast interviewing, the number of interviewing intervals, answers changed after rejection, and answers set outside working hours. I think some of them might become quite generic quality indicators in the long run, as they should not differ too much from survey to survey (once you know your benchmarks as a survey agency).
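To make the indicators a bit more concrete, here is a minimal Python sketch of three of them: answers set outside working hours, interviewing intervals, and answers per minute. The row layout, the action name `AnswerSet`, the 7 am to 7 pm window and the 15 minute gap that separates two intervals are all assumptions for illustration, not the actual export format.

```python
from datetime import datetime

# Hypothetical, simplified paradata rows: (interview_id, action, timestamp).
# The real export has more columns and possibly different names.
rows = [
    ("int-1", "AnswerSet", "2016-03-01T06:55:00"),
    ("int-1", "AnswerSet", "2016-03-01T09:00:00"),
    ("int-1", "AnswerSet", "2016-03-01T09:00:30"),
    ("int-1", "AnswerSet", "2016-03-01T09:02:00"),
    ("int-1", "AnswerSet", "2016-03-01T19:30:00"),
]


def answer_times(rows):
    """Return the sorted timestamps of all answer-set actions."""
    return sorted(
        datetime.fromisoformat(ts) for _, action, ts in rows if action == "AnswerSet"
    )


def answers_outside_hours(times, start_hour=7, end_hour=19):
    """Count answers set before start_hour or at/after end_hour."""
    return sum(1 for t in times if not (start_hour <= t.hour < end_hour))


def split_into_intervals(times, max_gap_minutes=15):
    """Split sorted timestamps into interviewing intervals at long gaps."""
    intervals = []
    for t in times:
        gap_ok = (
            intervals
            and (t - intervals[-1][-1]).total_seconds() <= max_gap_minutes * 60
        )
        if gap_ok:
            intervals[-1].append(t)
        else:
            intervals.append([t])
    return intervals


def answers_per_minute(interval):
    """Speed within one interval; None when the interval has no duration."""
    minutes = (interval[-1] - interval[0]).total_seconds() / 60
    return len(interval) / minutes if minutes > 0 else None


times = answer_times(rows)
intervals = split_into_intervals(times)
print(answers_outside_hours(times), len(intervals))
print([answers_per_minute(i) for i in intervals])
```

The gap threshold and the working hours are left as parameters because each survey agency would have to set them against its own benchmarks. Pre-filled questions would need to be excluded before counting, which this sketch does not attempt.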

Now the details:

  1. I don’t really need the question declared valid, opened, closed etc. actions. They are passive and make the data huge, both for your transfer and for me to use (I wrote a weird Stata program to rewrite the exported file and filter it because I cannot insheet it).

  2. The declared invalid action is an interesting one, as it would show how many errors an interviewer gets flagged for on the way to delivering a clean data file. It all links back to consistent vs. true data, and I think it might be quite useful for telling a few stories on data quality.

  3. Depending on the length, exporting one file per interview might be easier. The compilation now takes ages, and the resulting text files push text editors, Excel and Stata to their limits. Anyone who wants a single file can write a little applet to append all the individual files or loop over them (I found that easier in the past).

  4. It would be great to distinguish between tablet actions and interviewer actions, maybe in an extra column, and/or to have an indication of whether questions are pre-filled. E.g. I tried counting the answers set before 7 am and after 7 pm, and had to find a way to identify and exclude the pre-filled questions.

  5. It would be nice to be able to see whether answers have been changed, and ideally from what answer to what answer. I can identify the changes by counting the rows per variable and roster, but I need a clear pattern that filters out list and multi-select questions. There are two purposes here: first, a high number of changed answers may be indicative of problems, and second, after a rejection it would be nice to see what interviewers have changed, to check whether they have cleaned in a good way. In the long run, this information might be useful for a post-rejection “what has changed summary” when files are resubmitted. In the paradata it could be a separate Action QuestionAnswerChanged with an indication from what answer to what answer. For lists and multi-select questions one could ignore added options and only count dropped options, or ignore them altogether.

  6. There are missing cells, e.g. the responsible for Action Resumed.

  7. Resumed and Paused often appear too many times and sometimes have duplicates in the data.

  8. Action Answer removed does not have the variable name, only the identifier.

  9. All datetimes should be in the same time zone. It is confusing that the answer to a datetime question is in tablet time while the action time is in UTC (also, I cannot currently work out whether this is actually correct for the tablets from Nigeria). I would do everything in UTC, tablet actions and server actions.

  10. Sequence: I am not sure the rows are always in sequence. I have no example to back this up; it might be related to 9.
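As a rough illustration of points 1 and 5, the following Python sketch drops the passive actions and then lists answer changes with their old and new values. The event layout and the action names (`AnswerSet`, `QuestionDeclaredValid`, `QuestionOpened`, `QuestionClosed`) are hypothetical stand-ins, and list and multi-select questions are simply excluded by variable name, which is only one of the options mentioned above.

```python
# Hypothetical names for the passive actions from point 1.
PASSIVE_ACTIONS = {"QuestionDeclaredValid", "QuestionOpened", "QuestionClosed"}

# Hypothetical, simplified events in file order:
# (interview_id, variable, roster_row, action, answer)
events = [
    ("int-1", "age", "1", "AnswerSet", "34"),
    ("int-1", "age", "1", "QuestionDeclaredValid", ""),
    ("int-1", "sex", "1", "AnswerSet", "male"),
    ("int-1", "age", "1", "AnswerSet", "43"),
    ("int-1", "sex", "1", "AnswerSet", "male"),
]


def drop_passive(events):
    """Keep only the events that are not passive actions."""
    return [e for e in events if e[3] not in PASSIVE_ACTIONS]


def answer_changes(events, exclude_variables=frozenset()):
    """Return ((interview, variable, row), old, new) for each changed answer.

    Variables in exclude_variables (e.g. list and multi-select questions)
    are skipped. Re-entering the same answer is not counted as a change.
    """
    last_answer = {}
    changes = []
    for interview_id, variable, roster_row, action, answer in events:
        if action != "AnswerSet" or variable in exclude_variables:
            continue
        key = (interview_id, variable, roster_row)
        if key in last_answer and last_answer[key] != answer:
            changes.append((key, last_answer[key], answer))
        last_answer[key] = answer
    return changes


print(len(drop_passive(events)))
print(answer_changes(events))
```

This relies on the rows being in true chronological order, so the concerns in points 9 and 10 would affect it directly. Restricting the events to those after a rejection would give the post-rejection view described in point 5.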

This ticket was moved to this forum from the old feature request board.