Correspondence between exported paradata and microdata

User Andreas has sent me the following question:

Is it correct to assume that in the paradata, taking the last answer set within each combination of interview__id-variable_name-roster … will give me the answer as exported in the microdata?

The answer is generally no, though it may often be the case.

While paradata records every new value specified by the interviewer/respondent and even the values set during preloading, events of other types may affect the value as contained in the exported data.

I can see at least the following example cases:

  1. A question was answered, and the section containing it was subsequently disabled by an enabling condition. The last AnswerSet event may list a value entered by the user, but the exported data will contain a missing value (specifically, the one denoting missingness due to questionnaire logic).

  2. A question was answered for a roster instance, and that roster item was subsequently deleted and reinstated (for example, when the numeric roster trigger question was decremented and then incremented). Survey Solutions purges the answers recorded for the roster item when the roster is trimmed, and the new instance is created with unanswered questions. Simply retaining the last AnswerSet value would miss this important change and yield incorrect values (again, non-missing values instead of missing ones, but in this case missing because of the interviewer's action, not the questionnaire logic).
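To make the discrepancy concrete, here is a minimal Python sketch of the naive "last event wins" reconstruction and a check of it against the exported values. It is an illustration under assumptions, not a reader for the actual export files: the Event record, its field names, the sample values and the helper functions are all hypothetical, and only the event names AnswerSet and AnswerRemoved come from the discussion above.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    # Hypothetical, simplified paradata record (not the real export layout).
    interview_id: str
    order: int            # position of the event within the interview
    event: str            # "AnswerSet" or "AnswerRemoved"
    variable: str
    roster: str           # roster address as text, "" for non-roster questions
    value: Optional[str]  # None when the event carries no value


def last_recorded_answers(events):
    """Naive reconstruction: the last AnswerSet/AnswerRemoved per key wins."""
    latest = {}
    for e in sorted(events, key=lambda e: (e.interview_id, e.order)):
        key = (e.interview_id, e.variable, e.roster)
        if e.event == "AnswerSet":
            latest[key] = e.value
        elif e.event == "AnswerRemoved":
            latest[key] = None
    return latest


def disagreements(latest, microdata):
    """Keys where the naive value differs from the exported one (None = missing)."""
    return {k: (v, microdata.get(k))
            for k, v in latest.items() if v != microdata.get(k)}


events = [
    Event("i1", 1, "AnswerSet", "age", "", "34"),
    Event("i1", 2, "AnswerSet", "age", "", "35"),
    Event("i1", 3, "AnswerSet", "crop", "1", "maize"),
    Event("i1", 4, "AnswerSet", "name", "", "Ann"),
    Event("i1", 5, "AnswerRemoved", "name", "", None),
]

# Hypothetical exported values: "crop" sits in a section that was disabled
# after it was answered, so the export holds a missing value for it.
microdata = {
    ("i1", "age", ""): "35",
    ("i1", "crop", "1"): None,
    ("i1", "name", ""): None,
}

latest = last_recorded_answers(events)
mismatches = disagreements(latest, microdata)
print(mismatches)
```

With this sample, only the "crop" key is reported: the naive reconstruction still holds the entered value while the export holds a missing one, which is the situation described in case 1.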

If a particular question is not covered by any enabling condition and is not part of any roster, then, I believe, there is no other reason for the value recorded in the microdata to differ from the value recorded in the last AnswerSet or AnswerRemoved paradata event (whichever comes last). This assumes, of course, time consistency: the two exports (of paradata and microdata) must be close enough in time that no additional events can take place between them. If that can’t be guaranteed (e.g. the server is being actively used during the data export process), one should export the microdata first, then the paradata, and subsequently trim the paradata to keep only the events that occurred before the microdata was requested.
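The trimming step could be sketched in Python as follows. This is only an illustration: the dictionary keys (including "timestamp_utc"), the ISO-style timestamp format and the trim_paradata helper are assumptions made for the example, so adjust them to whatever your paradata export actually contains.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse an ISO-style timestamp and mark it as UTC (assumed format)."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def trim_paradata(events, microdata_requested_at):
    """Keep only the events that occurred before the microdata was requested."""
    return [e for e in events
            if parse_utc(e["timestamp_utc"]) < microdata_requested_at]


# Hypothetical events; the third one happened after the microdata request.
paradata = [
    {"order": 1, "event": "AnswerSet", "timestamp_utc": "2024-03-01T10:00:00"},
    {"order": 2, "event": "AnswerSet", "timestamp_utc": "2024-03-01T10:05:00"},
    {"order": 3, "event": "AnswerRemoved", "timestamp_utc": "2024-03-01T10:20:00"},
]

# Moment at which the microdata export was requested (noted by the analyst).
requested_at = datetime(2024, 3, 1, 10, 15, tzinfo=timezone.utc)

trimmed = trim_paradata(paradata, requested_at)
print([e["order"] for e in trimmed])
```

The comparison is done on timezone-aware values so that the cutoff and the event times are on the same clock; if your timestamps are recorded in local time, convert them to UTC first.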

For additional information, see the description of the Survey Solutions paradata format here:

Hope this helps.

Best, Sergiy