New APIs that provide information about the exported files in tab format

Hello Survey Solutions team,

I would like to know if it is possible to have a new API that returns the name of the files created when you want to export data in tab format and the names and types of the variables present in that file.
For example:
FileName: Personal_Information: [{VariableName: Name, VariableType: String},
{VariableName: Age, VariableType: Numeric}]

This API could be useful when creating tables in the database to store the information of the exported tabular files.
Another option could be to include the type of the variable in an existing API.

Thanks SuSo.

There is a way to obtain most of this already–albeit not always before exporting the data. But it might be nice to have all of this info immediately available in a convenient format.

Before exporting

  • File name. While I don’t immediately see an automated way, I do see a manual one. Go to Survey Setup>Questionnaires. Right-click on the target questionnaire and select Upload assignments. Click on the DOWNLOAD .ZIP TEMPLATE FOR THIS QUESTIONNAIRE link in the right-hand column. The file names in this template will be the file names exported.
  • Variable names. After undertaking the steps above, one can find the variables in each file as their column names.
  • Variable types. First, download the JSON representation of the target questionnaire. This can be done via this endpoint: GET ​/api​/v1​/questionnaires​/{id}​/{version}​/document. Next, with this file in hand, one can see the SuSo type of each question. From there, one can know/infer/guess the export type.

After exporting

  • File name.
    • Option 1. Collect names of files with .tab extension extracted from the export .zip archive
    • Option 2: Parse the export__readme.txt file to obtain file names.
  • Variable names.
    • Option 1: Extract from each exported file
    • Option 2: Parse export__readme.txt. This file contains a comma-separated list of variables in each file.
  • Variable types.
    • Option 1: See process for “before exporting” above
    • Option 2: Let the function used for ingesting the files tell you the type. Your mileage may vary, depending on the values found in the files.
1 Like

I agree with you @arthurshaw2002 :+1:

Thank you for introducing us and explaining this process to us, previously we did most of the things explained in this process, but we noticed that they are not automatic and need human intervention, for this reason we think it might be a good idea to have an API that offers information in a format convenient and easy to use when you export in tabular format.

Also, if you have multiple choice questions, the variable name appears in the exported tab format files with two underscores plus code that is different to use the variable name present in the file export__readme.txt

Perhaps it could be a good suggestion to have an API that presents specific information about the structure designed in the questionnaire (questions, type of answer, list of options in the questions (for categorical questions)).

Thanks SuSo.

Why do you need the table structure before? and what exactly is ‘before’? we’ve been talking about adding this feature to the designer so that you get the structure as soon as you have your questionnaire, but I think for all practical purposes export file is enough - since we’re talking about automating things, the database schema will / should be created automatically, right? so you could generate export file as soon as you start your survey and even if there are no interviews collected yet, you will get those files with all the column headers in there.

@arthurshaw2002 already responded on the variable types, questionnaire document does contain all this info.

I’ve played with this a little and even have a (definitely not polished or final) working prototype, if you work with python you can take a look at https://github.com/vavalomi/surveysolutions_utils for some ideas - using export file structure to form the table/variable names, and by parsing the document fine-tuning the variable types.
Great thing of using sqlalchemy orm is that there is a layer of abstraction in front of the actual database engine - the same code should work to export to sqlite file, as well as postgresql, oracle, etc…

There is an interesting project that also does csv to sqlite conversion with dynamic schema generation (https://github.com/simonw/csvs-to-sqlite) in their case they just use the exported csv (tab) files and convert variables into either numeric or string variables, with optional conversion of datetime fields as well.