Errors with GraphQL interviews endpoint

Recently, I’ve been experiencing errors with the GraphQL interviews endpoint. Sometimes, requests are satisfied without problem. Other times, requests fail with errors that look to come from Survey Solutions’ resolver.

Here is a screenshot from 3 requests on the same server (version 21.09.9 (build 31228)) within minutes of one another.

The first request fails and seems to cite Survey Solutions internals. The second request seems to be a generic error like this. The third request is satisfied without issue.

To give a sense of the code behind the function in the screenshots, my R wrapper function does the following, in order:

  1. Fetches the count of all interviews. See request here.
  2. Makes a request for a block of 100 interviews, paging iteratively through interviews until the end of all interviews. See request made here.
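For readers who don't want to open the linked requests, the two steps above can be sketched roughly as follows. This is a Python illustration, not my actual R wrapper: `fetch_count` and `fetch_page` are hypothetical stand-ins for the two GraphQL requests, and the fake in-memory data replaces a real server so the paging logic can be run on its own.

```python
from typing import Callable, Dict, List


def fetch_all_interviews(
    fetch_count: Callable[[], int],
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 100,
) -> List[Dict]:
    """Fetch the total count, then page through interviews in blocks."""
    total = fetch_count()          # step 1: count of all interviews
    interviews: List[Dict] = []
    skip = 0
    while skip < total:            # step 2: request blocks until the end
        page = fetch_page(skip, page_size)
        if not page:               # stop rather than loop forever on an empty block
            break
        interviews.extend(page)
        skip += len(page)
    return interviews


# Hypothetical stand-in for the server: 250 fake interviews held in memory.
_fake_interviews = [{"id": i} for i in range(250)]

all_interviews = fetch_all_interviews(
    fetch_count=lambda: len(_fake_interviews),
    fetch_page=lambda skip, take: _fake_interviews[skip:skip + take],
)
print(len(all_interviews))
```

With 250 fake interviews and a block size of 100, the sketch makes one count request and three page requests, so a wrapper like this hits the endpoint several times in quick succession.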

Note: I’ve not yet tried this on all recent versions of Survey Solutions. But I have noticed issues on these versions:

  • 22.02.5 (build 32369)
  • 21.09.9 (build 31228)

The problem may be with my code, but I’m perplexed about why the errors seem to be issued from Survey Solutions/.NET.

Is this an IOPS issue?

Likely not an IOPS issue if you are getting an error on the very first call. And if the screenshot shows not the first call, then the key may be in what was run before it, which we don’t see.

The same or a similar query is run to page through the interviews in the HQ. If that is working fine, then it is something else. On the other hand, if you are getting errors like (even if not exactly like) those shown in this thread:

then this could eliminate your code from the suspects list.

Hope this helps, Sergiy

Many, many thanks, @sergiy, for the very helpful response.

The issue I’m facing does indeed look like some of what’s being described in the linked thread. The error messages I’m seeing in the second request in the screenshot above are the same as those described in the thread. The error arises episodically, with no discernible pattern or frequency, just as described in the thread.

But my symptoms do differ from those described in the other thread in a few ways. First, I can’t seem to reproduce the errors shown here on the Interviews tab of the GUI. Put another way, I’ve never seen GraphQL error messages in the GUI. That said, my most frequent interface is the API. I may not be using the GUI enough to notice any errors, if there are some. Second, I can’t seem to reproduce the error described here in Banana Cake Pop. That said, I have the same caveat as with the GUI. Third, the error messages that surface in the first request in my screenshot above do not appear in the linked thread, from what I could quickly see.

And there is one difference between my issue and what appears in the thread. I’m deploying Survey Solutions on a Windows Server 2019 instance on AWS that sits behind a load balancer rather than via a Docker container.

Tomorrow, I hope to share server logs that show this issue in greater technical detail.

Never mind, I reproduced the error in Banana Cake Pop. Several times, I made the kitchen-sink request for interviews. Between each satisfied request, I waited at least 30 seconds. After around 10 attempts, I finally got the error below. (Note: I replicated this behavior several times. The error-laden response came at random intervals.)
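In case it helps anyone trying to reproduce this, the manual procedure above could be automated along these lines. This is only a sketch under assumptions: `send_request` is a hypothetical function that sends the interviews query and returns the parsed JSON response, and I am assuming failures show up as a top-level `errors` list in the response body, as is conventional for GraphQL. The demo uses a fake sender and a no-op sleep so it runs instantly.

```python
import time
from typing import Callable, Dict, Optional


def repeat_until_error(
    send_request: Callable[[], Dict],
    max_attempts: int = 50,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Send the same request repeatedly; report the first attempt that errors."""
    for attempt in range(1, max_attempts + 1):
        response = send_request()
        # Assumption: GraphQL failures arrive as a top-level "errors" list.
        errors = response.get("errors") or []
        if errors:
            return {"attempt": attempt, "errors": errors}
        if attempt < max_attempts:
            sleep(wait_seconds)    # pause between satisfied requests
    return None                    # no error seen within max_attempts


# Hypothetical fake sender: succeeds twice, then returns an error payload.
_calls = {"n": 0}


def _fake_send() -> Dict:
    _calls["n"] += 1
    if _calls["n"] == 3:
        return {"errors": [{"message": "Unexpected Execution Error"}]}
    return {"data": {"interviews": {"nodes": []}}}


outcome = repeat_until_error(_fake_send, max_attempts=10, sleep=lambda _: None)
print(outcome)
```

Recording the attempt number and a timestamp for each failure would make it easier to line the errors up with the server logs.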

@sergiy , would this be enough for a GitHub issue? Put another way, is there anything more I can provide?

I’m not sure I have exact steps to reproduce, except to keep trying until an error arises… :frowning: