Export Individual Paradata File for an Interview

Hi Survey Solutions team

I would like to propose a new option for exporting paradata: the ability to export the paradata file for a single interview or for a list of interviews.
I am proposing it because my current paradata file is so large that I cannot open it with a simple text editor such as Notepad.

The file is too big. I was able to open it with SPSS, but I had to wait a long time for all the rows to load in the SPSS panel, and when I then tried to filter by a specific interview_Id I had to wait a long time again.

My conclusion is that the paradata file generated for large surveys (large in terms of both the duration of fieldwork and the size of the questionnaire) becomes unmanageable because of the amount of information it contains. For this reason, I think it would be a great idea to be able to generate a paradata file for a single interview or for a list of interviews.

Thank you for your help and for always making Survey Solutions a better tool.

Hello Kevin,

  1. Notepad is really not the appropriate tool for working with this kind of file. It is neither intended nor optimized for working with large files.

  2. Statistical packages handle the task better. But they all need to digest the text data into their internal structures (by reading the file and parsing its contents) before the data can be manipulated. This can mean multiple reads of the file and analysis of every line on the go. Different packages may do this with a different workload. For example, as of 2008 Stata was doing three full passes over the file:
    https://www.stata.com/statalist/archive/2008-02/msg00935.html

  3. While there is nothing specifically difficult about the paradata files, their sheer size is a common problem, since the statistical packages (Stata, SPSS, R, etc.; not sure about SAS, there are some SAS users here, perhaps they can step in) tend to load the whole file into memory. Furthermore, the memory footprint of the data imported from such a file is usually larger than the file itself, due to string storage rules and additional overhead, which is hard to quantify in general (as a rule of thumb, consider doubling the size of the data).

  4. A different strategy would be to import the data into a database application and query the specific records from there.

  5. On the specific records: somebody has to extract them and spend time on the processing, either the Survey Solutions server or the next program in the chain, the consumer of those data. For the moment we have opted for the latter, as paradata has more value as a whole than as separate records, and the user has the freedom to pick and choose what is needed for the analysis.
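
To illustrate the "consumer extracts the records" option, here is a minimal Python sketch that streams the export line by line and copies only the rows for selected interviews into a smaller file, so the whole file is never held in memory. It is an illustration, not part of Survey Solutions. It assumes the export is a tab-delimited text file with a header row and that the interview identifier column is named interview__id; the function name, file names and defaults are hypothetical, so adjust them to match your export.

```python
def filter_paradata(src_path, dst_path, interview_ids,
                    id_column="interview__id", delimiter="\t"):
    """Copy the header and the rows of the selected interviews to dst_path.

    Assumptions (check against your own export): the file is delimited
    text with a header row, and id_column holds the interview identifier.
    Returns the number of data rows written.
    """
    wanted = set(interview_ids)  # set membership checks are fast
    kept = 0
    # newline="" keeps the original line endings untouched;
    # utf-8-sig silently drops a byte-order mark if one is present.
    with open(src_path, encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        header = src.readline()
        columns = header.rstrip("\r\n").split(delimiter)
        id_pos = columns.index(id_column)  # raises ValueError if absent
        dst.write(header)
        # Iterating over the file object reads one line at a time.
        for line in src:
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) > id_pos and fields[id_pos] in wanted:
                dst.write(line)
                kept += 1
    return kept
```

A call such as filter_paradata("paradata.tab", "subset.tab", ids), where ids is a list of interview identifiers, still makes one full pass over the large file, but the resulting subset is small enough to open in a text editor or a statistical package.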

Best, Sergiy

Thank you Sergiy.

I certainly agree with what you have mentioned.

However, I think that when an HQ user exports the paradata file, it is usually in order to review some specific data in it. It would therefore be useful to have an alternative process that allows this review to be carried out in less time. Loading and converting the whole paradata file row by row in statistical software is time-consuming, and importing the data into a database application and then querying the specific records would also take some time. For that reason, I think exporting the paradata file for individual interviews could be a feasible option.

Again, thank you very much for the comprehensive explanation.