Cleaning the Data Export tab (deleting previously generated export sets)

One of our projects has been running for more than a year now and we have over 20 questionnaires, so currently there are over 7000 previously generated export sets in the Data Export tab, created automatically when downloading data via API calls. Does this affect functionality or speed? Is there a way to clear old exported files from the tab? Could you advise whether it is recommended to do so?
Thank you!

The protocol for the v2 API export dictates that generation and downloading of export files are two separate actions. Thus you can only delete the generated export files if no consumer is still expecting to find them for download. This requires access to the file system. In practice, files older than about a week are rarely useful in an active survey.
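To make the two-step protocol concrete, here is a minimal sketch in R using httr, assuming the v2 endpoints under /api/v2/export and basic authentication with an API user; treat the exact body and response field names as assumptions to verify against your server’s API documentation.

```r
# Minimal sketch of the two-step v2 export protocol: generate, poll, download.
# Endpoint paths and body/response field names should be verified against
# your server's API documentation.
library(httr)

server <- "https://demo.mysurvey.solutions"        # hypothetical server
auth   <- authenticate("api_user", "api_password") # API user credentials

# Step 1: ask the server to generate an export file
start <- POST(
  paste0(server, "/api/v2/export"),
  auth,
  body = list(
    ExportType      = "Tabular",
    QuestionnaireId = "12345678-1234-1234-1234-123456789012$1"  # guid$version
  ),
  encode = "json"
)
job_id <- content(start)$JobId

# Step 2: wait until generation is complete, then download the file
repeat {
  job <- content(GET(paste0(server, "/api/v2/export/", job_id), auth))
  if (identical(job$ExportStatus, "Completed")) break
  Sys.sleep(5)
}
GET(
  paste0(server, "/api/v2/export/", job_id, "/file"),
  auth,
  write_disk("export.zip", overwrite = TRUE)
)
```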

Note that the export files should be automatically erased when the corresponding survey is deleted from the server.

If your survey ended more than a year ago, what is the point of keeping it on the server? If it is still active, perhaps it is round 2? Etc.

Dear Sergiy, thank you for your reply. You are right, the survey is still ongoing, although we are using new questionnaires for the current round. As long as the export files affect neither functionality nor performance, we could keep them. In any case, where are these generated export files stored?

Hi @sergiy, everyone

My question is similar to that of the OP. We currently have multiple teams running price surveys in multiple countries (though in the context of the Survey Solutions platform, all of them together are considered a single survey). Each country faces different challenges in the field, and planned survey work can change day to day, or sometimes even faster.

It’s necessary for us to monitor the evolution of the surveys in near real time, so I’m currently running a data export every hour to track progress. Unfortunately, the creation of so many export files eats away disk space very quickly, which eventually causes the export service to fail. Our server is managed by a third party, and at the moment the only way I can see to free up disk space is to ask them to remove/archive old export files.

I’ve tried looking into exporting only interviews that have been completed or created since the last export, but I don’t see an easy way to do so with the package I’m using (the wrapper by Arthur Shaw). I’ve also seen that I can export to the cloud, but I’d have to go through our IT department for permission to access the OneDrive API service and then request that the required changes are implemented server-side as well.

A much simpler solution would be if I could clean or remove previously generated export files from the data export page, but I don’t see any options there. Is this simply not possible without direct access to the file system?

Hello Rens,

This issue comes up periodically, both in forum conversations (see, e.g., Old export files) and in direct communications.

Exported data is shared across all HQ/admin users, so that an export generated by one of them is available to all. But deletion is a whole other story: a file deleted by one user would disappear for everyone, so it is not allowed from the interface.

You can set up a script that will do that automatically; see, e.g., scripts shared on third-party sites or elsewhere on the internet, or the sketch below.
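For illustration, a minimal cleanup sketch in R; the export directory is a hypothetical placeholder path, and the one-week cutoff follows the rule of thumb mentioned above.

```r
# Minimal sketch: remove generated export archives older than 7 days.
# The directory below is a hypothetical placeholder; locate the actual
# export folder on your own server before running anything like this.
export_dir <- "C:/SurveySolutions/export"  # placeholder path

zips     <- list.files(export_dir, pattern = "\\.zip$", full.names = TRUE)
age_days <- as.numeric(difftime(Sys.time(), file.mtime(zips), units = "days"))

file.remove(zips[age_days > 7])
```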

From the security point of view, it should not be difficult to confirm that such a script is not going to do any harm, since it does not involve any software other than what you have already installed.

Export can be restricted to interviews changed after a certain date: the server provides the necessary API endpoint and parameters, though the authors of individual API adapters may choose to implement only part of that functionality. AFAIK Arthur’s package is available in source form, so it should be technically possible to modify it (but check the license first).

You can also opt for requesting individual interviews’ data directly through the API, bypassing data export altogether.
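As a sketch of that route, the request below queries the headquarters GraphQL endpoint for recently updated interviews; the filter and field names are assumptions to check against your server’s GraphQL schema.

```r
# Minimal sketch: pull interviews updated in the last hour through the
# headquarters GraphQL endpoint, bypassing the export machinery.
# Filter and field names are assumptions -- check your server's schema.
library(httr)

server <- "https://demo.mysurvey.solutions"  # hypothetical server
since  <- format(Sys.time() - 3600, "%Y-%m-%dT%H:%M:%S", tz = "UTC")

query <- '
query ($since: DateTime) {
  interviews(where: { updateDateUtc: { gte: $since } }) {
    nodes { id status updateDateUtc }
  }
}'

resp <- POST(
  paste0(server, "/graphql"),
  authenticate("api_user", "api_password"),
  body   = list(query = query, variables = list(since = since)),
  encode = "json"
)
str(content(resp))
```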

Thanks @sergiy ,

I think the option to request data for individual interviews directly through the API could work well for our monitoring requirements, especially if I can do this incrementally by identifying interviews that have been changed or are new since a previous run.

But there are periods when we can have significant numbers of new interviews coming in. Say I were to pull about 100 new interviews every hour or so, what kind of stress would this put on the server vis-à-vis requesting the generation of exports? Is this even something I should be worried about?

Rens

I’d say the workload on the server should be roughly equivalent to a supervisor opening an interview the same number of times.

If you are only updating once per hour, exporting should be more straightforward.

@renshendriks, via the API, one can export interviews whose status changed between start and end dates, From and To, respectively.

With the R wrappers I developed, there are parameters from and to to provide these (optional) dates. See start_export() and download_matching(), for example. If either or both don’t work, please post a GitHub issue on the relevant repository (i.e., susoapi or susoflows).
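For instance, an hourly incremental pull could look like the sketch below; start_export() and its from/to parameters are named above, while the remaining parameter names and values are assumptions to verify against the susoapi documentation.

```r
# Sketch of an incremental pull: export only interviews whose status
# changed in the last hour. Parameter names other than from/to are
# assumptions to verify against the susoapi documentation.
library(susoapi)

now <- Sys.time()
job_id <- start_export(
  qnr_id      = "12345678-1234-1234-1234-123456789012$1",  # hypothetical id
  export_type = "STATA",
  from        = format(now - 3600, "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  to          = format(now,        "%Y-%m-%dT%H:%M:%S", tz = "UTC")
)
```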

Thanks Arthur,

Before I start building/testing anything, can I ask real quick: do the from/to fields support only dates, or actual date-time values? I see in the documentation on GitHub that they take UTC date-time inputs, but is the time part actually implemented? If so, that would make the implementation on my part a lot easier.

Thanks!

Hello Rens,

The Survey Solutions server should support date-time values. If not, it is a bug and must be reported.

For Arthur’s package the reference says:

from     Start date for interviews to export. date-time, UTC
to       End date for interviews to export. date-time, UTC

so it should also pass the time component.

Best, Sergiy

Good question. In the spirit of being a thin wrapper, {susoapi} just passes the arguments on to the body of the API request, as you can see here.
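For what it’s worth, the relevant slice of the request body would then look roughly like this, assuming the From/To field names of the v2 export API:

```r
# Sketch: how from/to would travel in the export request body.
# Field names follow the v2 export API; verify against your server's docs.
body <- list(
  ExportType      = "Tabular",
  QuestionnaireId = "12345678-1234-1234-1234-123456789012$1",
  From            = "2023-05-01T09:00:00",  # date-time, UTC
  To              = "2023-05-01T10:00:00"   # date-time, UTC
)
# A thin wrapper essentially just serializes this, e.g.:
# jsonlite::toJSON(body, auto_unbox = TRUE)
```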

That being said, let me do some testing and get back to you. If you happen to do the same for a quick case or two, kindly let me know the outcome.