API question answer

Currently the API cannot modify any answers in an interview.

This limits the validations that can be done to just leaving API comments.

The suggestion is to permit the API to modify answers in the interview, either by allocating a separate (new) question scope, or by permitting the API to change any question (e.g. letting the API answer a supervisor question).

A few disparate reactions:

If this is available via the API, would we want it available via the GUI as well? The GUI request has come up in the past.

However this feature is implemented, where would we store the audit trail of who changed what and when? Would this be put in the paradata?

Also, would we want to require a comment about why the answer was changed? Here I’m thinking of something like a commit message in Git that explains why changes were made.

Like the idea!

I am afraid this would turn the feature into supervisors typing in the data without interviews being conducted. The feature request calls for the API to have access to supervisor questions, not for supervisors to have edit permissions for everything. Otherwise, who would be the user of that hypothetical GUI?

AnswerChanged events may be recorded in the paradata, attributed to a user with role=API and a username corresponding to the API user's name.
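If such events do land in the paradata with role=API, auditing the API's changes reduces to filtering the paradata export. A minimal sketch in Python, assuming a tab-delimited export with `event`, `role`, and `responsible` columns; the column names and the event name follow this post and may differ in an actual export:

```python
import csv
import io

def api_answer_changes(paradata_tsv, event="AnswerChanged"):
    """Return paradata rows for answer changes made through the API.

    Assumes a tab-delimited paradata export with at least 'event',
    'role', and 'responsible' columns; the default event name follows
    the post above and may differ in a real Survey Solutions export.
    """
    reader = csv.DictReader(io.StringIO(paradata_tsv), delimiter="\t")
    return [row for row in reader
            if row["event"] == event and row["role"] == "API"]
```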

The idea is to mechanically insert a piece of information into the interview based on the values already entered, information that is not available in the field. The comment would probably correspond to the purpose of the API; for example, an apiDriversLicenseProvider might populate the driver's license numbers for each known resident. So I don't see this at the level of data points (answers), but such an API already exists if needed:
https://demo.mysurvey.solutions/apidocs/index#!/Interviews/Interviews_CommentByVariable
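For reference, calling that operation amounts to a single authenticated POST. A hedged sketch that only builds the request, with the route inferred from the operation name in the apidocs link above; verify the exact path and body shape against your own server's /apidocs page before sending anything:

```python
def build_comment_request(base_url, interview_id, variable, comment):
    """Build URL and payload for the Interviews_CommentByVariable call.

    The route below is inferred from the apidocs operation name linked
    above; the exact path and JSON body may differ between server
    versions, so treat both as assumptions to verify.
    """
    url = (f"{base_url}/api/v1/interviews/{interview_id}"
           f"/comment-by-variable/{variable}")
    return url, {"comment": comment}
```

Sending it is then one POST with any HTTP client, authenticated as an API-role user.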

The original idea, I now understand, is to have an API endpoint for the action of answering supervisor questions. That seems reasonable since, I believe, all other supervisor actions have an associated API endpoint.

My idea is a separate one: to allow both supervisors and HQ to change answers to questions. The rationale is to take account of a common pattern in surveys: a “supervisor” (i.e., Supervisor/HQ) receives an interview from the field, reviews it, has questions, calls the interviewer, discusses the issue, and agrees with the interviewer on changes to make. With other systems, this is possible. CSPro allows for this “secondary editing”. Working with raw data, one can make these changes in a script. With Survey Solutions, this process involves the “supervisor” rejecting the interview to the interviewer, having the interviewer make the edit and resend the interview, and asking the “supervisor” to review the interview again and approve/reject it. This adds a transaction to the process.

I want to contribute one more use case for an API endpoint that modifies interviews: coding.

This is in the context of classification codes such as ISIC (industry), ISCED (education), etc. We are currently exploring automated coding options with the help of other statistical bodies such as Statistics Canada, where we would capture the text descriptions from the respondent, export and extract these descriptions, code them via automated software, and THEN write the codes back to Survey Solutions.

Currently, the process that we do is a manual one:

  • Interviewer captures text description
  • Supervisor has access to a supervisor-level question where they will enter the correct code based on the description.

The idea is that the coding would now be done externally and written to the correct variable in the correct questionnaire, all automatically via the API.
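The intended flow can be sketched end to end. Here `autocode` stands in for the external coding software and `write_code` for the requested (not yet existing) write-back API call; both names are placeholders, not real APIs:

```python
def code_and_write_back(interviews, autocode, write_code):
    """Route each captured description through an automated coder.

    `autocode` is a stand-in for the external coding software (e.g.
    G-Code) and returns a code string or None; `write_code` is a
    stand-in for the requested, not-yet-existing API call that writes
    the code back to the supervisor-scope question.
    """
    uncoded = []
    for interview_id, description in interviews:
        code = autocode(description)
        if code is None:
            uncoded.append(interview_id)    # left for human coders
        else:
            write_code(interview_id, code)  # hypothetical write-back
    return uncoded
```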

Looking forward to feedback!

Which software do you use to encode the ISIC and ISCED codes?

How are the assigned codes used in the interviewing process?

I second this. To me, coding is by far the most important use case for supervisor questions.
I have also made some attempts to automate coding (in post-processing), but abandoned them because most of the surveys I am involved in use languages other than English, and the rules for automation are very language-specific.

We are attempting to incorporate the G-Code coding software, a Statistics Canada product, for the coding.

Coding happens alongside fieldwork to ensure that interviewers are capturing clear, comprehensive descriptions that will help the “coders” (supervisors) determine and assign the correct code. If descriptions are unsatisfactory, an interview is rejected and the interviewer is asked to provide a better one. This is why it is important for coding to happen during the data collection process.

However, coding is a very involved process and requires additional human resources. What we are trying to do now is to extract interview data as it is submitted (no issues here), run the descriptions through G-Code for automated coding, WRITE BACK the successful codes to Survey Solutions (not currently possible), and then let the coders handle the cases that the software could not code successfully. In our context, we cannot do coding post-collection, which is why this is a very important use case.

Thanks @klaus for seconding.

I would rephrase that as:

If descriptions are unsatisfactory, an interview is rejected and the interviewer is asked to provide a better one. This is why it is important for rejection to happen during the data collection process.

If the description is not sufficient to deduce the appropriate code, then you reject it to the interviewer (action available in API) leaving an explanatory comment (action available in API).

The second part of your explanation contradicts the first. The first paragraph says that interviews with unsuccessful coding are rejected to the interviewers. The second paragraph indicates that they are passed to human operators (coders). So I don’t understand how this protocol is going to work. If, perhaps, and I am guessing here, the rule is that when the text is sufficiently long and there have been at least 3 attempts to code it with software, a coder must be engaged, then you can reject the interview to a different person, and the coder is in fact another interviewer. This can also be done with the API.
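The guessed protocol can be made concrete as a small routing rule. The thresholds below are purely illustrative, mirroring the “sufficiently long text, at least 3 attempts” guess; both reject and reassign actions are available through the existing API:

```python
def route(description, code, attempts, min_len=15, max_attempts=3):
    """Decide what to do with an interview after an automated coding pass.

    Thresholds (min_len, max_attempts) are illustrative stand-ins for
    the protocol guessed above: short descriptions go back to the
    interviewer, repeated software failures go to a human coder.
    """
    if code is not None:
        return "accept"
    if len(description) < min_len:
        return "reject_to_interviewer"   # ask for a better description
    if attempts >= max_attempts:
        return "assign_to_coder"         # reassign to a coding expert
    return "retry_autocoding"
```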

Our colleagues from Honduras (@kv700032 ) can step in, as they have used coders in one of the surveys.

Of course the success of the above depends on how this G-Code coder works (and I don’t see any information about it online; where is the manual?). Specifically, if it is not learning and produces the same verdict from the same input, then it is fine. But if it evolves somehow with the data that it sees, then it is a different issue, since it may be successful today while the survey is running, but fail later, after the survey is completed. Perhaps the colleagues from StatCan (@lhunter) could elaborate.

I assume the answer to my second question is “Not used”, meaning that the code assigned in the above process is not used in any validation condition, enabling condition, filtering condition or calculation. Somehow you’ve both skipped this, but this is in fact what matters here.

Hi Sergiy. I’m sorry if what I said was unclear.

If the description is not sufficient to deduce the appropriate code, then you reject it to the interviewer (action available in API) leaving an explanatory comment (action available in API).

The coding process still happens during collection, not just rejection. Rejection is a step to ensure quality. The quality checks and coding go hand in hand.

The second part of your explanation contradicts the first one. The first paragraph says that the interviews with unsuccessful coding are rejected to the interviewers. The second paragraph indicates that they are passed to human operators (coders).

No, the human operators first verify whether the description is satisfactory enough to assign a code. If it is not, they reject and await a better description. If it is, they manually assign a code. Human operators do all the checks and all the coding; interviewers capture descriptions. Keep in mind that CURRENTLY we have not deployed the software and these are all manual tasks. The hope is to get software involved so that manual coding is reduced to only those cases the software cannot handle at a satisfactory level. Manual coding and automated coding would happen simultaneously. Coders would also verify a subset of the software-coded cases, and the most convenient place to do this verification is also Survey Solutions.

Once again, the feature of allowing an API call to modify responses (perhaps just Supervisor-scope at first) would allow this process to take place conveniently, as coders would be able to use one piece of software (Survey Solutions) to continue doing all of their work. The workload would just be reduced.

The specifics of G-Code should not be relevant to this discussion. It could just as well have been any other software. The onus is on us to feed survey data into G-Code (which has an API as well), and this step is not an issue. The crux of the matter is taking the output of some process and writing it back to Survey Solutions. I provided the practical example of coding software and explained our current process, which StatCan and @lhunter are also aware of. As it relates to G-Code + Survey Solutions, there are workarounds that center on the rejection/reassignment mechanism, which @lhunter has previously suggested to us, but the suggested mechanism of direct writing via the API may be more practical.

Given that I saw that this feature had already been suggested, my intention was to provide another use-case.

Hi @giansib,

In Honduras we carried out the ENIGH in 2020; due to the Covid-19 pandemic, fieldwork lasted approximately 60-70 days.

In our survey we had to code the Classification of Individual Consumption by Purpose (CCIF), the National Classifier of Occupations of Honduras (CNOH) and the International Standard Industrial Classification (ISIC). In our questionnaire design, we first had a text-type question to collect the detail reported by the informant, then we had a single-selection question to select the corresponding code. Both questions were answered by our interviewers.

Then we had a process similar to the one you propose.
The survey was carried out entirely in the field for 10 days, with intermediate synchronizations for the Supervisor-Interviewer review.
At the end of the 10 days of conducting the interview in a home, our HQ assigned the interview to a user called “Critic-Coder”, who had an interviewer role in Survey Solutions, which allowed them to modify the responses obtained in the field.
This process was not optimal: it created a bottleneck, and the review of the interviews by the “Critic-Coder” became slower and slower. For this reason, and supported by the opinion of external international consultants, the process carried out in Survey Solutions, which consisted of reviewing/modifying the answers assigned in the code questions, was abandoned.

In the end, it was decided to obtain a preliminary code assigned by the interviewer in the field; after the interview was completed, the codes that the interviewer had misassigned were corrected through a data-cleaning process.

@giansib if you manage to develop an efficient process that does not create a bottleneck, I would recommend using Survey Solutions, either by reviewing in real time (using the comments API) or by employing an interviewer role (assigned to an expert who can assign the correct codes according to the descriptions obtained). My recommendation is based on the fact that once the correct coding of the interview is completed, all the details are stored and the changes made can be traced, whether in the comments file, the stop file, the status history of the interview, or the user logs.

Hi @kv700032,

Thank you very much for your insight and input based on your experiences. Indeed we are trying to create an efficient process that would reduce the number of Expert Coders needed, which was the reason for attempting to integrate the software. In our case, given that Belize is a very small country, Interviewers are able to synchronize every day, sometimes multiple times per day, which helps in easing the bottleneck a bit since there is a steady flow of completed interviews.

Our current plan is to explore automated coding by setting up an export process that feeds into G-Code and saves the successful cases into an external file, which will later be merged with the final exported survey data. For whatever G-Code cannot code, the ‘Interview Keys’ will be extracted and the interviews programmatically re-assigned to a designated Supervisor account, which the Expert will have access to, so they know which cases need manual coding. However, the “Comments API” suggestion you gave is also a very interesting line to explore. It will likely still require our “full” set of Experts, but it is an interesting proposal nonetheless.
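The planned split between software-coded cases and cases for the Experts can be sketched as below. The (interview_key, code) pair representation is an assumption about the intermediate file, since the G-Code output format is not described in this thread:

```python
def split_gcode_results(results):
    """Split coding output into coded cases and keys needing manual work.

    `results` is assumed to be a list of (interview_key, code) pairs,
    where code is None when the software could not assign one -- an
    assumed intermediate representation, not G-Code's actual format.
    """
    coded = {key: code for key, code in results if code is not None}
    to_reassign = [key for key, code in results if code is None]
    return coded, to_reassign
```

The `to_reassign` keys are the ones to hand to the interview-reassignment API call; the `coded` mapping gets merged with the final export.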

Thanks again for your insight! If we do anything interesting I will let you know.

@giansib,

  1. what is the command line to execute G-Code to encode the occupation “drives a bus every day”?
  2. how does G-Code return the numeric value of the code that it assigned to this occupation?
  3. how does it signal that it failed to determine a proper code?

Not sure how this is relevant, because we’re not trying to have G-Code communicate with Survey Solutions directly. We have intermediate R scripts and processes that take care of data export, processing, and diagnosis with the help of the Survey Solutions APIs. The final missing link, if possible, and raised only because I saw the request already existed, is being able to write back to Survey Solutions. What we write back to an interview could be the result of any abstract process; the current use case is coding. If you’re still interested in G-Code, I can send you the user manual. Your input has been appreciated, however, and we do plan to use the currently available APIs to integrate our processes. The API provides great functionality that we continue to leverage and incorporate to make our processes more efficient. We truly appreciate the feature.

For the process you describe (programmatically matching and rejecting if no match is found), an API for answering a supervisor question would be the ideal solution, because then you could (manually or programmatically) distinguish the successful matches from the missing ones at any point in the survey.
My doubts concern the capability of G-Code to find correct matches for arbitrary free-text descriptions (something also addressed by Sergiy’s bus-driver comment).

I have done text matching for free-text meal descriptions in a HBS with a 98% success rate. But in this case the task was to find a uniform representation, not to match against a classification.
So the program can classify all of:

folha de mandioqua
folga de mandioca
follha mandica
folhas de mandiocs
. . . (many more variations!)

to the correctly spelt “folhas de mandioca” (manioc leaves).
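A normalization like this can be approximated with standard fuzzy matching. A minimal sketch using Python’s difflib, where the 0.6 similarity cutoff is an assumption, not the threshold used in the HBS work described above:

```python
from difflib import get_close_matches

def normalize(description, canonical, cutoff=0.6):
    """Map a free-text (possibly misspelt) description to its canonical
    spelling, or return None if nothing is close enough for a human
    (or stricter rule) to take over."""
    matches = get_close_matches(description.lower(), canonical,
                                n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Illustrative canonical list; a real HBS would have many more entries.
canonical = ["folhas de mandioca", "arroz", "feijão"]
```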

But in order to classify a job description like “makes doors and windows” as

71 Building and Related Trades Workers (excluding Electricians)
711 Building Frame and Related Trades Workers
7115 Carpenters and Joiners

would require a large number of rules, for example, associating “doors and windows” with “carpenter”.
Does G-Code have these rules already incorporated? Or is it a framework allowing a user to specify such rules?
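The kind of rule described here can be sketched as a keyword lookup. The rule table below is illustrative; only the 7115 “Carpenters and Joiners” association comes from the example above, and a real classifier would need a very large table of such rules:

```python
def classify(description, rules):
    """Map a job description to a classification code via keyword rules.

    Returns the code of the first rule whose keyword occurs in the
    description, or None when no rule fires (i.e. a human coder is
    needed). The rule table is the part that would have to be large.
    """
    text = description.lower()
    for keyword, code in rules.items():
        if keyword in text:
            return code
    return None

# Illustrative rules: 7115 = Carpenters and Joiners (from the example above)
rules = {"doors and windows": "7115", "carpenter": "7115"}
```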

The whole subject is of course extremely interesting, and this discussion makes me want to have a go at automatic classification again. But as I mentioned before, it would be difficult to port it to different languages, especially the ones I myself don’t know :frowning:

I think it would be of interest to the community, not just me personally, so I’d appreciate a link to the website where the G-Code program and the manual can be found.

One way or another you get the data out of Survey Solutions. This is a known process described in API documentation for Survey Solutions or the corresponding API client for R, Stata, Python, etc. What is the “missing link” is the private knowledge of the G-Code syntax that is not described (or I could not locate) in any of the publicly accessible sources.

@giansib I agree with you, you probably haven’t noticed that it was my feature request, at the top of this thread.

Hi Sergiy,

Yes, I did notice it was your request, which is why I decided to support and add to it. It was the first thing I noticed.

One way or another you get the data out of Survey Solutions. This is a known process described in API documentation for Survey Solutions or the corresponding API client for R, Stata, Python, etc. What is the “missing link” is the private knowledge of the G-Code syntax that is not described (or I could not locate) in any of the publicly accessible sources.

Yes, we use the API extensively and have been for a long time. We have no problems at all in using the API. We use it for exporting, for programmatically reading data, for populating a custom dashboard. Etc.

The simple request was to provide an API option to WRITE to a Survey Solutions question. I am not sure if there’s some mutual misunderstanding here, as we have both repeated the same points over and over.

Our issue is not getting data into G-Code, or recovering data from G-Code. G-Code produces text data that we can manipulate programmatically however we want. We would want to take this text and write it directly to a question. Given that this is not possible (I have not checked this thread since July, so correct me if it now is), we will write the text/code to a comment instead, since the Comments API allows for this.

Apologies for not answering until now. Things are very hectic with Census being reinstated in Belize.