Data Export Error -- only the most recently generated file is downloaded

This is referring to our local server.

When I request a download, either in the browser (HQ) or via the API, the server only sends the most recently generated data, not the data I requested. For example, if I generate FormPeople v3 and then generate FormHouse v2, downloading FormPeople v3 returns the data for FormHouse v2.

The .zip file name indicates FormPeople, but the data inside is FormHouse. I think this may be some kind of configuration or caching issue, but I'm not sure.

This is a known issue; it was fixed in version 20.08 of Survey Solutions.
Please download the latest version from the portal site https://mysurvey.solutions/Download and update your local installation.

Thank you, Vitalii.
I think this is our issue. We need to upgrade from version 20.07.
Fred

Hi, I am facing a similar issue while trying to export data via API v1.

My problem is the following:

I am able to retrieve the list of imported questionnaires:
{my server}/api/v1/questionnaires

I am also able to retrieve information about the interviews of a given questionnaire:
{my server}/api/v1/questionnaires/{QuestionnaireId}/{version}/interviews

However, I get a 404 error when I attempt to start the export file creation:
{my server}/api/v1/export/stata/{QuestionnaireIdentity}/start
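For concreteness, this is how I assemble these URLs in R (base R only; the server, questionnaire GUID and version below are placeholders, and the `{guid}${version}` shape of QuestionnaireIdentity is my understanding of the format):

```r
# placeholders -- substitute your own server and questionnaire identity
server <- "https://myserver.example"
qid    <- "11111111111111111111111111111111"  # questionnaire GUID without dashes
ver    <- 3

url_list       <- sprintf("%s/api/v1/questionnaires", server)
url_interviews <- sprintf("%s/api/v1/questionnaires/%s/%s/interviews", server, qid, ver)
url_start      <- sprintf("%s/api/v1/export/stata/%s$%s/start", server, qid, ver)
# each is then called with httr, e.g. GET(url_list, authenticate(user, password))
```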

I know that error 404 means that the required questionnaire was not found. However, I can access the interviews for that questionnaire. Has the v1 approach to exporting data been deprecated?

*I have used the same code to export in the past (most recently in November 2020) and it worked fine, which is why I am wondering whether the v1 API has been deprecated.

Any help would be very much appreciated :slight_smile:

Andres

@andresarau, you don't specify which version of Survey Solutions you are using, and you don't reveal the server name, but see here:

Hi sergiy,

server name: http://www.pulpodata.solutions
version:v20.12.0.2527

I see from your answer that I should migrate your code to use v2 API.

I’ll do that.

Thanks

Andres

  1. I just quoted another person’s answer (Andrii’s). The additional information that you’ve provided confirms that this advice fits your situation.
  2. Migrate not my code, but your code (the code that calls the API).
  3. Note also that the following is not precise:

404 means the requested resource was not found. In this case:

  • server was found, proceed further
  • server/api was found, proceed further
  • server/api/v1 was found, proceed further, but
  • server/api/v1/export was not found, hence the 404 that you received; the chain never reached the questionnaire ID. You would get the same response with https://demo.mysurvey.solutions/api/v1/foobar

So the return codes shown, e.g., in the Swagger reference should be interpreted with the caution that they apply only when the right API procedure was called; otherwise their meaning may be somewhat different.
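In other words, a 404 on its own does not tell you which segment of the chain failed. A tiny illustrative helper (hypothetical; the route list is just an example subset, not the full API) makes the distinction explicit:

```r
# known route prefixes (example subset, not exhaustive)
known_routes <- c("/api/v1/questionnaires", "/api/v2/export")

# classify a 404: was the route itself missing, or just the resource?
explain_404 <- function(path, known = known_routes) {
  if (!any(startsWith(path, known)))
    "404: the API route itself was not found -- check the path/version first"
  else
    "404: the route exists; the requested resource was not found"
}
```

With this, a path like `/api/v1/export/stata/.../start` is classified as a missing route, which matches the situation described above.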

Thanks a lot, Sergiy!

I confirm that the V2 API is working just fine! I am now able to export the data.

For R users: this is how I export the data from R by calling the v2 API:

```r
library(httr)

# First define the URL
apiGenerate <- sprintf("%s/api/v2/export/", sserver)  # sserver is my server

# Then start the export file creation
response_generate <- POST(apiGenerate,
                          authenticate(ssuser, sspassword),  # these are my SuSo API credentials
                          body = list(ExportType = "STATA",
                                      QuestionnaireId = quid,  # this is the questionnaire ID
                                      InterviewStatus = interview_status),
                          encode = "json")

# Get the links of the files generated on the server
Myjson <- tempfile(fileext = ".json")

links_processes <- GET(apiGenerate,
                       authenticate(ssuser, sspassword),
                       query = list(exportType = ex_format,
                                    questionnaireIdentity = quid,
                                    interviewStatus = interview_status),
                       write_disk(Myjson, overwrite = TRUE))

# Finally, use the _export process ID_ to download the file:
# /api/v2/export/{id}/file
```
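A minimal sketch of that last step (the server and the export process id below are placeholders; in practice the id comes from the listing obtained above):

```r
sserver <- "https://myserver.example"  # placeholder server
job_id  <- 42                          # placeholder export process id

apiFile <- sprintf("%s/api/v2/export/%s/file", sserver, job_id)
# response_export <- GET(apiFile, authenticate(ssuser, sspassword),
#                        write_disk("export.zip", overwrite = TRUE))
```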

I hope this is useful to someone.

Andres


@andresarau, I am not sure I fully understand the R code that you've posted, but here is a rough chart of what is expected.


I don't see a delay line in your code, so I think you are making the mistake of hoping that you will be the only user of the server and that it will produce the output immediately. Both are unlikely. It is much more likely that you post a job request and then have to wait, periodically polling the status.

@sergiy, you're right, I did not include the delay line. The reason is that my example was intended to be informative to other R users. However, I have included a 5-second delay in the function that exports the data in my local workflow.

The diagram you shared is very useful and clear. Thanks

@andresarau,

I equally don't see that you are using the JobId to verify its readiness status. Since you are not using response_generate, you probably pick 'some' export (perhaps one satisfying the format, questionnaire and status), but not the one that was generated as a result of your POST request.

Again, this could be due to my lack of experience with R, or the posted code may be incomplete. If anyone more proficient with R can look at or run the code posted above to confirm, please step in.

@sergiy ,

See the full code below:

1. Define the API URL

```r
library(httr)

apiGenerate <- sprintf("%s/api/v2/export/", sserver)
```

2. Generate the file on the server

```r
response_generate <- POST(apiGenerate,
                          authenticate(ssuser, sspassword),
                          body = list(ExportType = ex_format,
                                      QuestionnaireId = quid,
                                      InterviewStatus = interview_status),
                          encode = "json")

# wait 5 seconds
Sys.sleep(5)
```

3. Get the links and JobIDs generated in the system

```r
# get the links of all files generated in the system
Myjson <- tempfile(fileext = ".json")

response_link <- GET(apiGenerate,
                     authenticate(ssuser, sspassword),
                     query = list(exportType = ex_format,
                                  questionnaireIdentity = quid,
                                  interviewStatus = interview_status),
                     write_disk(Myjson, overwrite = TRUE))
```

3.1 Arrange the links by date so that the latest generated file is exported (I am still working on improving this part, but this is where I verify that I am getting the latest file. I know that I could work with the JobId, but I am still working on that code.)

```r
library(dplyr)
library(stringr)

# arrange the links by date
response_json_link <- jsonlite::fromJSON(Myjson) %>%
  mutate(CompleteDate = str_replace(CompleteDate, "T", " "),
         CompleteDate = str_remove(CompleteDate, "\\.[^.]*$"),
         CompleteDate = lubridate::ymd_hms(CompleteDate)) %>%
  arrange(desc(CompleteDate))

# get the latest link
apiExport <- response_json_link$Links$Download[1]
```

4. Export the data

```r
response_export <- GET(apiExport, authenticate(ssuser, sspassword),
                       user_agent("andres.arau@outlook.com"))
```

Remember that this is work in progress and that some work remains to improve the approach to testing that the job has been completed. The strategy for verifying that I am getting the expected JobId can also be done much more effectively; I am working on it.

The inputs of other R users would be great to learn how to make this process more efficient.

Hi Andres!

Some thoughts/ideas on your points raised:

  1. STRATEGY TO VERIFY THAT I AM GETTING THE EXPECTED JOBID
    It's straightforward if you want to get the JobId from a new export request you have just posted (Step 2 in your code above): the JobId/URL is stored in the headers of the response.
    Assuming a status code of 201 for your response_generate:

    library(httr) # which you have probably used already
    exportjobid_url <- headers(response_generate)$location

    This way you can skip step 3 for this task. In your current setup you might also run into the problem that other users post new requests after you ran response_generate: you would download their export and not yours.

  2. TEST THAT THE JOB HAS BEEN COMPLETED
    Once you have the JobId/URL you can use it to query if this particular export process is already completed. Basic syntax:

    repeat {
         Sys.sleep(5)  # if you expect large datasets, it is probably best to increase this
         ## Get detailed information about the export process. See API docs: GET /api/v2/export/{id}
         check_ready <- GET(exportjobid_url, authenticate(ssuser, sspassword))
         # Access the info
         content <- content(check_ready)
         if (content$ExportStatus == "XYZ") {
         ## Check for the different export statuses. E.g. on "Fail" you want to break the repeat
         }
         if (content$ExportStatus == "Completed") {
               # Get the download link, which is exportjobid_url + "/file"
               download_link <- content$Links$Download
               # Download the export file
               response_export <- GET(download_link, authenticate(ssuser, sspassword),
                                      user_agent("andres.arau@outlook.com"))
               # Double-check whether a redirect is necessary (for servers hosted by the SurSol team it is often necessary, afaik)
               if (response_export$url != download_link) response_export <- GET(response_export$url)
               # ... save your downloaded file to your hard drive ...
               break
         }
    }
    

None of the code above checks the status codes of the request responses, which I would recommend doing.

In addition, you can consider getting the list of all existing export processes (which you do at Step 3) BEFORE you try to generate a file on the server (Step 2):

Once you have obtained this list, filter it based on the body/specifications of your intended export request (export type/format, questionnaire id, interview status).
If there is any such process, take the timestamp of the latest request and check whether this export process postdates the latest change to any interview of your respective questionnaire (see the API docs: /api/v1/questionnaires/{id}/{version}/interviews, response field "LastEntryDate").

If the latest change to an interview happened before the latest export process, simply take the JobId of that export process: you avoid a new export process altogether. This might only be worthwhile if multiple users export the data and you expect infrequent changes to interviews (or few new interviews).
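A rough sketch of that check in R (the data frame shape and column names are assumptions based on this thread, and the list is presumed to be already filtered to the right export type, questionnaire and status):

```r
# return the JobId of the newest completed export that postdates the last
# interview change, or NA when a fresh export is needed
pick_reusable_job <- function(processes, last_entry_date) {
  ok <- processes[processes$ExportStatus == "Completed" &
                  processes$CompleteDate > last_entry_date, , drop = FALSE]
  if (nrow(ok) == 0) return(NA_integer_)
  ok$JobId[which.max(ok$CompleteDate)]
}

# toy example of the listing
procs <- data.frame(
  JobId        = c(11L, 12L),
  ExportStatus = c("Completed", "Completed"),
  CompleteDate = as.POSIXct(c("2021-02-01 10:00:00", "2021-02-03 09:00:00"), tz = "UTC")
)
```

If `pick_reusable_job` returns NA, post a new export request; otherwise reuse the returned JobId's download link.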

Hope this is of some help. Happy to hear other approaches/ideas from you and other R users.