No. of answered questions - interview__diagnostics file

Proposal

Related to this (already implemented) suggestion, it could be useful to many users to include the number of ‘Answered Questions’ in the diagnostics service file. This might be of particular interest for use cases in which the export service needs to be used anyway. One would not need to call the /api/v1/interviews/{id}/stats endpoint in a separate workstream/exercise, which might also reduce the system resources needed (?).

The variable itself could be used for data quality monitoring purposes, e.g. to calculate the ‘speed’ of interviewers (# of answered questions divided by duration).
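
To make the ‘speed’ idea concrete, here is a minimal Python sketch. It assumes the number of answered questions and the interview duration in seconds are already available per interview; the function name and the sample values are made up for illustration.

```python
def answers_per_minute(answered: int, duration_seconds: float) -> float:
    """Return the number of answered questions per minute of interview time."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return answered / (duration_seconds / 60.0)


# Hypothetical interviews: (interview id, answered questions, duration in seconds)
interviews = [("a1", 120, 3600), ("b2", 90, 900)]

# Speed per interview; unusually high values could be flagged for review.
speeds = {iid: answers_per_minute(n, secs) for iid, n, secs in interviews}
```

What counts as a suspicious speed depends on the questionnaire, so any threshold would have to be chosen per survey.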

Implication/disadvantage

interview__diagnostics file gets another column, resulting in larger export files

Current workarounds

  • Call the /api/v1/interviews/{id}/stats endpoint to get value ‘Answered’
  • Work through all survey data files to identify answered questions (see missing values).
    Attention: One column/variable does not imply one question (e.g. multi-select question)
  • Analyse paradata and identify the number of ‘Answer Set’ by variable.
    Attention: ‘Answer Set’ does not equal ‘Answered Questions’, as an answer can be set multiple times for each question.
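
The paradata workaround and its caveat can be sketched in Python as follows. This is only an illustration: the tab-separated layout, the column names (interview__id, event, parameters), the event name AnswerSet and the "variable||value||roster" layout of the parameters field are assumptions about the paradata export, and the sample rows are invented.

```python
import csv
import io
from collections import Counter

# Made-up paradata excerpt (assumed layout, see the note above).
SAMPLE = (
    "interview__id\tevent\tparameters\n"
    "a1\tAnswerSet\tage||34||\n"
    "a1\tAnswerSet\tage||35||\n"
    "a1\tAnswerSet\tname||Ana||\n"
    "a1\tCommentSet\tname||check spelling||\n"
)


def count_answer_sets(handle):
    """Return {interview id: (AnswerSet events, distinct variable/roster pairs)}."""
    events = Counter()
    variables = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["event"] != "AnswerSet":
            continue  # ignore comments and other event types
        iid = row["interview__id"]
        parts = row["parameters"].split("||")
        variable = parts[0]
        roster = parts[2] if len(parts) > 2 else ""
        events[iid] += 1
        variables.setdefault(iid, set()).add((variable, roster))
    return {iid: (events[iid], len(variables[iid])) for iid in events}


counts = count_answer_sets(io.StringIO(SAMPLE))
```

In the sample, the variable age is set twice, so the interview has three ‘Answer Set’ events but only two distinct variables. Even the distinct count is only an approximation of ‘Answered Questions’, since it ignores answers that were removed later.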

Interested to hear other users’ (& SuSo developer teams’) thoughts!

Indeed, the pros/cons of including a new column in the export file(s) are as you describe. I could find a way to defend and criticize the approach at the same time…

I’d try to separate two use cases (what came to my mind, please do share yours): during the survey monitoring; and post-survey analysis.
For the first case, I personally like/prefer the api approach - don’t need to read/export all the data, only monitor incoming interviews and check/validate/react.
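
A rough Python sketch of that API-based monitoring, under stated assumptions: the base URL already contains the workspace, the API user authenticates with HTTP Basic auth, and the JSON response carries the value under the key "Answered". The helper names are made up for illustration.

```python
import base64
import json
import urllib.request


def stats_url(base_url: str, interview_id: str) -> str:
    """Build the stats URL for one interview (base_url is assumed to include the workspace)."""
    return f"{base_url.rstrip('/')}/api/v1/interviews/{interview_id}/stats"


def http_get_json(url: str, user: str, password: str):
    """GET a URL with Basic auth (an assumption) and parse the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def answered_count(base_url, interview_id, user, password, get_json=http_get_json):
    """Return the 'Answered' value from the stats response (key name assumed)."""
    stats = get_json(stats_url(base_url, interview_id), user, password)
    return stats["Answered"]
```

The get_json parameter is there only so the function can be tried out with a stub instead of a live server.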

For the second case, I’d think that my analysis would involve some data cleaning and transformation of original raw files. So, there is a chance that pre-calculated statistics would go out of sync with my ‘current’ data, so I’d rather have procedures to calculate/actualize stats that I need based on live data.
This does involve ‘reinventing the wheel’ and possibly not arriving at the exact same results, but again, since presumably I’m working with modified data, being true to the calculations at the data-collection stage may not be as important.

I would vote for whichever approach is easier/cheaper to implement.

While I might have a slight preference for including an additional column in interview__diagnostics, I also see the API as a better means to provide users with extended interview metadata beyond the core needs provided in interview__diagnostics.

Just my personal view. I would love to hear what others think.

I also see the two use cases you describe.

Based on my experience & anecdotal evidence, its main use would be during survey monitoring.

I also prefer the API approach. However, looking back at the past 2 years, I almost always had to use the export service, as we were closely or extensively monitoring survey data that was not accessible through API endpoints/GraphQL (see related discussion). So for these use cases, I could have avoided additional API requests, as almost all info from the /api/v1/interviews/{id}/stats endpoint can be found in the system-generated files, except the # of answers set.

Maybe another advantage: Less experienced users who cannot or do not use API approaches could get access to this information?

Would be nice to hear from others whether they a) faced the same situation / make use of “Answers Set” and b) found it of any actual use at some point to identify data quality issues / interviewer behavior @klaus @ashwinikalantri @kv700032

Well, I have different opinions based on my experiences using Survey Solutions.

If I think like my typical users, economists who administer the survey as HQ users, they prefer to have columns in the exported file with the number of questions available, the number of questions answered, the number of unanswered questions and the number of questions with errors. Also, if possible, they would ask me whether there is a report with that information in Survey Solutions.

Now, if I think as a developer, I would like to have an API that allows me to know all that information by interviewer, maybe by section and by questionnaire, because the developer is more interested in creating automatic processes to inspect the information when our interviewers are synchronizing their tablets and start building reports with that collected data.

The most important experience I can describe was when my users supervised our most relevant survey (ENIGH :honduras: applied for approximately 70 days). They opened the interviews and first looked at the color of the sections of the questionnaire: if everything was green, they continued with another interview; if there were sections in red or blue, they spent more time reviewing. Then they exported the data and focused on reviewing the answers to the most important questions. Finally, they reviewed some reports about the performance of the interviewers (time, speed and number of completed interviews).