PSA on importing Stata data into R

Public service announcement for other SuSo users who work between R and Stata, use SuSo’s Stata files, and ingest them with haven.

Up to and including January 13, 2020, I used the following to ingest SuSo’s Stata data into R:

library(haven)
someFolder <- "C:/my/folder/"
someFile <- "some_file.dta"
myData <- haven::read_stata(paste0(someFolder, someFile), encoding = "UTF-8")

On January 14, I tried the same with newly generated data, and got an error message of this form:

Error in df_parse_dta_file(spec, encoding, cols_skip, n_max, skip, name_repair = .name_repair) : 
  Failed to parse C:/my/folder/some_file.dta: Unable to convert string to the requested encoding (invalid byte sequence).

Looking at the haven documentation, I realized that I didn’t need to specify the encoding at all. See the discussion of the encoding argument in the read_stata help page. When I removed the encoding specification, everything worked fine:

myData <- haven::read_stata(paste0(someFolder, someFile))
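For anyone who wants to guard against this in a script, here is a minimal sketch of the same idea. It has not been tested against SuSo exports. The helper name read_suso_dta and its fallback_encoding argument are made up for illustration; only haven::read_stata and its encoding argument come from haven itself. It reads with haven's default encoding handling first and retries with an explicit encoding only if that fails.

```r
library(haven)

# Hypothetical helper (not part of haven or SuSo):
# 1. read the file with haven's default encoding handling;
# 2. if that errors, report the message and retry with an explicit encoding.
read_suso_dta <- function(path, fallback_encoding = "UTF-8") {
  tryCatch(
    haven::read_stata(path),
    error = function(e) {
      message("Default read failed: ", conditionMessage(e))
      message("Retrying with encoding = \"", fallback_encoding, "\"")
      haven::read_stata(path, encoding = fallback_encoding)
    }
  )
}
```

With the variables defined earlier, the call would be read_suso_dta(paste0(someFolder, someFile)). If both attempts fail, the error from the second attempt is passed through unchanged.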

I’m not sure what changed. The haven package hasn’t been updated since November 2019. The export files I worked with on Jan 13 were generated by SuSo 20.01.0.1320. The export files I worked with on Jan 14 came from the same server, but I’ll have to follow up later with the SuSo version that generated them.

Please recheck with the newest version. Version 20.01.0 had a Stata export-related bug, which was fixed immediately in version 20.01.1.

Dear Sergiy,

Great, I’ve just downloaded the Stata export file. Now all files contain variable names and their labels, so I no longer have to import the .tab file into a .dta file to get variable labels. Thanks so much! ^^