When doing a simple survey, many users lack the expertise to programmatically run through the *.do file to replace the values with their respective labels.
Even in the most basic case, single-select questions produce numeric codes in the data export.

This means extra work for the user before the data can go into whatever analysis program they are using: they must either run through the *.do file programmatically, or copy the values and labels from Designer by hand and run a VLOOKUP in Excel.
The alternative is to manually replace the values with their labels in the resulting analysis output.

Modern data analysis programs have no problem aggregating strings, so it would be great if there were an option to export the data with labels rather than values.

As far as I am aware, do files can only be run in Stata, which lets the user resave the data with labels instead of codes (values) using a single command. See help for outsheet. I am not sure how Designer is involved. Please clarify.

If you don’t have access to either Stata or SPSS, you’re likely to use the Tabular data export.

It’s possible to programmatically parse the *.do file to replace values with labels in the csv.
It’s also possible to use SpssLib or similar to parse the SPSS files and export them as csv with the labels.
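
For anyone who wants to try the *.do route, here is a minimal Python sketch of the idea. It assumes the do file contains `label define <name> <code> "<label>" ...` and `label values <variable> <name>` lines, each on a single line, and that labels contain no embedded double quotes. I have not checked this against every export, so treat the patterns as assumptions. The column names (`wall_material`, `interview_id`) and the helper names are made up for illustration.

```python
import csv
import io
import re

# Assumed shape: label define <name> <code> "<label>" [<code> "<label>" ...]
DEFINE_RE = re.compile(r'^\s*label\s+define\s+(\w+)\s+(.*)$')
# One code/label pair; also tolerates Stata compound quotes `"..."'
PAIR_RE = re.compile(r'(-?\d+)\s+`?"([^"]*)"\'?')
# Assumed shape: label values <variable> <name>
VALUES_RE = re.compile(r'^\s*label\s+values\s+(\w+)\s+(\w+)\s*$')


def parse_do_labels(do_text):
    """Return {variable: {code: label}} from the text of a .do file."""
    definitions = {}  # label name -> {code: label}
    assignments = {}  # variable name -> label name
    for line in do_text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            name, rest = match.groups()
            definitions.setdefault(name, {}).update(PAIR_RE.findall(rest))
            continue
        match = VALUES_RE.match(line)
        if match:
            assignments[match.group(1)] = match.group(2)
    return {var: definitions[name]
            for var, name in assignments.items() if name in definitions}


def relabel_tab(tab_text, labels):
    """Replace codes with labels in tab-delimited text; unknown codes are kept."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for var, mapping in labels.items():
            if var in row:
                row[var] = mapping.get(row[var], row[var])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical inputs, only to show the round trip.
do_text = '''
label define wall_material 1 `"Brick"' 2 `"Timber"'
label values wall_material wall_material
'''
tab_text = "interview_id\twall_material\nabc\t1\ndef\t2\n"

labels = parse_do_labels(do_text)
relabelled = relabel_tab(tab_text, labels)
print(relabelled)
```

Codes are compared as strings, so a value such as `1.0` in the data would not match a code of `1`. That is one of several edge cases a real script would need to handle.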

However, both of these require a level of technical skill that a surveyor often does not have, so they may be forced to do a VLOOKUP in a spreadsheet program after copying the values and their labels from Designer.

It would be much easier for the user to just be able to export the data in tabular format with labels rather than values in the export.

Neither SPSS nor Stata is free to use.
When you are in an organisation with a very limited budget, buying these programs is simply not an option.

So it would be nice for the data export to consider users without either SPSS or Stata.
It already does so with the DDI and the tabular export; it would just be nice to have this extra option.

As an R user, I agree that it would be more useful for the Tabular export to contain the value labels instead of the values.

Right now, I have to label them manually in my script by getting the value labels from Designer or the do file, or by using a package to read in the Stata file. None of these options is ideal. I agree with @macuata that it would be nice for the Tabular data export to be easier to use for users without SPSS or Stata.

@macuata, are there particular programs you have in mind? Would the ideal input data have categorical variables as strings corresponding to their labels? Do you see any second-best options (e.g., an easily usable dictionary of variable and value labels)?

Like @l2nguyen, I’m increasingly an R user. For R, there are a few paths to the desired end result with the existing export data:

  • Get labels from Stata (or SPSS). This would involve reading the Stata (or SPSS) file into R, and then changing the labelled variables from numeric to factor.
  • Get labels from Stata .do file. This would entail ingesting the .do file as a structured text file. The user could subset to the labels of interest, apply them to variables, and then convert numeric variables to factor.

The first option is much easier, in my opinion, than the second. But, and here I agree with you both, the workarounds may be beyond the coding skills of many users. For that reason, tab export of categorical variables as factors is, in my opinion, the first-best solution.

Tableau is the program I am using to create visualisations from the data I’m collecting. I also generate HTML reports with my own custom program for all the villages/districts/provinces.

But even for a spreadsheet program it would be easier to create charts straight from the tab file if there were an option to export in pretty much the reverse fashion, with the labels in the tab file and the values in a separate file.

Exporting a dictionary/lookup table is an option, but I agree it would be a distant second. The first option lets the data be used immediately by both people with little technical background and R users.
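
To make the lookup-table option concrete, here is a rough Python sketch of how such a table might be used. The `variable,value,label` layout is purely an assumption on my part (no such export exists today), and the `village` and `water_source` columns and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def load_lookup(lookup_text):
    """Read rows of variable,value,label into {variable: {value: label}}."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(lookup_text)):
        lookup.setdefault(row["variable"], {})[row["value"]] = row["label"]
    return lookup


def count_by_label(tab_text, lookup, variable):
    """Tally one categorical column of tab-delimited data by its labels."""
    mapping = lookup.get(variable, {})
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    # Codes with no entry in the lookup are counted under the raw code.
    return Counter(mapping.get(row[variable], row[variable]) for row in reader)


# Hypothetical lookup table and tab export.
lookup_text = "variable,value,label\nwater_source,1,Piped\nwater_source,2,River\n"
tab_text = "village\twater_source\nA\t1\nB\t2\nC\t1\n"

counts = count_by_label(tab_text, load_lookup(lookup_text), "water_source")
print(dict(counts))
```

This works, but it is still a join the user has to write or reproduce by hand in a spreadsheet, which is why it feels like a second-best option to me.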

I expect the vast majority of the people I will be sharing the data with to run very basic analyses, such as how many houses are in the village, what the houses are made of, which villages don’t have fresh water, which village has the fewest toilets per person, etc.
It’d be nice to not have to do quite so much post processing in order for the data to be easily usable.

