Text encoding from the API

Dear all,

I’ve downloaded a set of data via the API with HQ credentials. The zipping and unzipping of all the files is fine; however, the encoding for the languages is a bit funny. For example, all of my ñ come out as Ã±. I’ve tried setting the encoding to “latin1” when reading the data table in R, but it doesn’t help.

Has anyone else had this problem?

Alicia

Alicia, could you say a bit more about what you downloaded and what R commands you’re using?

My guess, without any further information, is that you should use encoding = "UTF-8" when ingesting your data into R.
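By way of illustration (a minimal, self-contained sketch with made-up sample data, not your actual export), this is the classic symptom of a UTF-8 file being decoded as Latin-1, and how declaring UTF-8 avoids it:

```r
# Write a small UTF-8 tab-delimited file with accented characters.
tf <- tempfile(fileext = ".tab")
con <- file(tf, open = "w", encoding = "UTF-8")
writeLines(c("name\tcrop", "José\tplátano"), con)
close(con)

# Decoding the UTF-8 bytes as Latin-1 mangles the accents
# (each ñ/é/á becomes a two-character sequence such as Ã±/Ã©/Ã¡).
bad <- read.table(tf, sep = "\t", header = TRUE, fileEncoding = "latin1")

# Declaring UTF-8 reads them back intact.
good <- read.table(tf, sep = "\t", header = TRUE, encoding = "UTF-8")
good$crop  # "plátano", accents preserved
```

Note the two different arguments: `fileEncoding` re-encodes the file while reading it, whereas `encoding` declares what encoding the strings already are.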

For the data, are you reading in tab, Stata, or SPSS data? For the problems you’re describing, does this concern string variables, labels, or both?

For R, what command(s) are you using?


Here’s the code for getting the data from the API:

apiGetData <- function(headquarters_address = "https://XXX.mysurvey.solutions",
                       export_type = "tabular",
                       questionnaire_identity = "34a9cb83-dfb3-4f14-aa4e-d78932729a17$6",
                       username = "XXXX",
                       password = "XXXX"
){

sysinf <- as.list(Sys.info())
print(questionnaire_identity)
if(is.null(questionnaire_identity)){
  return("Questionnaire Does Not Exist")
}

#build query to start generation of the export file on the server
action <- "start"
query <- sprintf("%s/api/v1/export/%s/%s/%s", 
                 headquarters_address, export_type, questionnaire_identity, action)

data <- POST(query, authenticate(username, password)) #Note to reader, pw changed

#build query to download the generated export file
action <- ""
query <- sprintf("%s/api/v1/export/%s/%s/%s", 
                 headquarters_address, export_type, questionnaire_identity, action)

#Use the GET function in the httr package
data <- httr::GET(query, authenticate(username, password))


zipfilename <- paste0(unlist(strsplit(questionnaire_identity ,"-"))[1],"_","SuSo.zip")
#create a fresh temporary directory for storing and unzipping the file
td <- tempfile("SuSo")
dir.create(td)

#open connection to write contents
filecon <- file(file.path(td,"SuSo.zip"), "wb") 

#write data contents to the temporary file
writeBin(data$content, filecon) 


#close the connection
close(filecon)


#unzip zip file (file.path builds the path portably on Windows and Unix)
zipF <- file.path(td, "SuSo.zip")


unzip(zipF, exdir = td)

#read in data files and store as elements in a list
data.files <- list.files(td, 
                         pattern = "\\.tab$")

names.files <- data.files

data.files <- file.path(td, data.files)



#return a named list of the full file paths
x <- as.list(data.files)
names(x) <- names.files

x
}

and the reading of the tables in the unzipped files is this:
line2 <- unlist(strsplit(readLines(rosters[1], n = 1), "\t")) ## since there is a quirk in the SuSo downloads that adds an extra column
Listing <- read.table(rosters[1], fill = TRUE, skip = 1, header = FALSE, sep = "\t", stringsAsFactors = TRUE, encoding = "latin1")
names(Listing) <- line2

which gives me a table where there is an issue with all of the accents and tildes:

  PHL_FV_Loc_State PHL_FV_Loc_Dist PHL_FV_Loc_SubDist PHL_FV_Loc_Village PHL_FV_HoH        PHL_FV_Measure_YN PHL_FV_Crop_Selected
1 Jalisco          Cihuatlan       Cihuatlan          Cihuatlan          José Frías Castro 2                 plátano
2 Jalisco          Cihuatlán       Cihuatlán          NA

You’re right, encoding = "UTF-8" fixes it. For some reason I was thinking that this was the default…
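For anyone who finds this later, here is a sketch of the corrected read, using a hypothetical sample file in place of rosters[1] (which comes from the unzipped export), and with stringsAsFactors = FALSE so the strings are easy to inspect:

```r
# Hypothetical stand-in for rosters[1] from the unzipped export.
tab <- tempfile(fileext = ".tab")
con <- file(tab, open = "w", encoding = "UTF-8")
writeLines(c("PHL_FV_HoH\tPHL_FV_Crop_Selected",
             "José Frías Castro\tplátano"), con)
close(con)

# Same pattern as above, with encoding = "UTF-8" instead of "latin1".
line2 <- unlist(strsplit(readLines(tab, n = 1, encoding = "UTF-8"), "\t"))
Listing <- read.table(tab, fill = TRUE, skip = 1, header = FALSE,
                      sep = "\t", stringsAsFactors = FALSE,
                      encoding = "UTF-8")
names(Listing) <- line2
Listing$PHL_FV_Crop_Selected  # "plátano", accents intact
```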
