Hello,
Is it possible to export all versions of a questionnaire in one go? That is, to get a single export that contains the data from every version, instead of downloading the data version by version.
Normally, when you introduce a new version the older ones become obsolete and the work continues on the new version. So one doesn’t need to keep on downloading the old data.
If this is not the case for you, please provide details.
This suggests that all of the old data and the new data can be found in the new version.
Hello @sergiy, I have realized that I cannot download the entire data as one set, as it is divided by version.
However, I want to download all of the data as one single data set.
If you want to download all of the data, you’ll need to:
- Export and download data for each version of the questionnaire
- Combine data from all versions into a single set of data files
While I agree that it would be nice if things were more convenient, I believe that the functionality is this way, among other reasons, because Survey Solutions’ export service would not know how to handle conflicts that might exist between different versions of the questionnaire (e.g., variable/value labels are different, variables have a different storage format, etc.). This type of data management task needs to be undertaken by someone who is aware of these differences and can decide on how to handle them.
Thank you @arthurshaw2002, but this is very inconvenient for those of us coming from other CAPI environments like ODK and CSPro, which allow you to download everything as one data set, even when there is an update.
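As a rough illustration of the kind of conflict meant here, the sketch below lists variables that exist in two versions but are stored as different types. It uses base R only; the function name `find_type_conflicts` and the example data are made up for illustration and are not part of Survey Solutions.

```r
# Hypothetical helper: list the variables shared by two versions whose
# storage class differs (e.g., integer in one version, character in the other).
find_type_conflicts <- function(d1, d2) {
  shared <- intersect(names(d1), names(d2))
  class1 <- vapply(d1[shared], function(x) class(x)[1], character(1))
  class2 <- vapply(d2[shared], function(x) class(x)[1], character(1))
  differs <- class1 != class2
  data.frame(
    variable = shared[differs],
    class_v1 = unname(class1[differs]),
    class_v2 = unname(class2[differs]),
    stringsAsFactors = FALSE
  )
}

# Made-up example data: `age` is an integer in version 1 and text in version 2.
v1 <- data.frame(interview__id = c("a", "b"), age = c(31L, 45L), stringsAsFactors = FALSE)
v2 <- data.frame(interview__id = "c", age = "52", stringsAsFactors = FALSE)
conflicts <- find_type_conflicts(v1, v2)
print(conflicts)
```

A check like this only flags the problem. Deciding which storage type is the right one still needs someone who knows the questionnaire.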
Whatever the CAPI program, one would need to have some script to handle these data management operations. From hazy memory, I seem to recall one would have a CSPro script for these actions in CSPro. For Survey Solutions, this script would exist outside of Survey Solutions.
For simple cases, where there aren’t any conflicts, one would simply need to write a script that iterates over folders where data can be found and does a few things:
- Loads a file into memory
- Appends its contents to a same-named file
- Saves the combined file to another location
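The steps above could be sketched in base R roughly as follows. This is a minimal sketch, not a tested tool. It assumes that each version's export has been unzipped into its own sub-folder of one directory, that the files are tab-delimited, and that same-named files have identical columns across versions. The function name `combine_versions` and the folder layout are assumptions made for illustration.

```r
# Assumed layout: export_dir/<one folder per version>/<same-named .tab files>.
# out_dir should sit outside export_dir so it is not mistaken for a version folder.
combine_versions <- function(export_dir, out_dir, pattern = "\\.tab$") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  version_dirs <- list.dirs(export_dir, recursive = FALSE)
  file_names <- unique(unlist(lapply(version_dirs, list.files, pattern = pattern)))
  for (file_name in file_names) {
    paths <- file.path(version_dirs, file_name)
    paths <- paths[file.exists(paths)]
    # load each version's copy of the file into memory
    pieces <- lapply(paths, read.delim, stringsAsFactors = FALSE)
    # append; rbind() fails if column names differ (the "no conflicts" case only)
    combined <- do.call(rbind, pieces)
    # save the combined file to another location
    write.table(combined, file.path(out_dir, file_name),
                sep = "\t", row.names = FALSE, quote = FALSE)
  }
  invisible(file_names)
}

# Small demonstration with made-up data in a temporary directory.
export_dir <- file.path(tempdir(), "export_demo")
out_dir <- file.path(tempdir(), "combined_demo")
for (v in c("v1", "v2")) {
  dir.create(file.path(export_dir, v), recursive = TRUE, showWarnings = FALSE)
}
write.table(data.frame(id = 1:2, x = c(10, 20)), file.path(export_dir, "v1", "hh.tab"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 3L, x = 30), file.path(export_dir, "v2", "hh.tab"),
            sep = "\t", row.names = FALSE, quote = FALSE)
combine_versions(export_dir, out_dir)
combined <- read.delim(file.path(out_dir, "hh.tab"))
```

If variables were added or removed between versions, the plain `rbind()` step will stop with an error, which is arguably the safer behaviour for a first pass.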
If this is the point of friction, perhaps the community can create some tools to address these needs. (For my own part, I have some scripts for this in R, but haven’t bothered creating a function or package to share with others (or document for my future self))
What statistical language(s) are you typically using (e.g., Python, R, Stata, etc.)?
For less simple cases, one needs tools to address common problems. For the ODK world, the DIME unit in the World Bank has developed iecodebook. While I’m not an ODK or iecodebook user, I nevertheless understand that this is meant to address many of the data harmonization problems I described above.
Thanks Arthur. I get it.
I have a function in R to concatenate datasets from different versions with differing variables (additions and deletions from one version to the other).
Sure, here it is:
merge.data.frames <- function(d1, d2, verbose = FALSE) {
  # if one data frame has no rows => return the other one
  if (nrow(d1) == 0) return(d2)
  if (nrow(d2) == 0) return(d1)

  d1.names <- names(d1)
  d2.names <- names(d2)

  # columns in d1 but not in d2
  d2.add <- setdiff(d1.names, d2.names)
  if (verbose && length(d2.add) > 0) {
    print("*********** NOT IN D2 *******************************************************")
    print(d2.add)
  }

  # columns in d2 but not in d1
  d1.add <- setdiff(d2.names, d1.names)
  if (verbose && length(d1.add) > 0) {
    print("*********** NOT IN D1 *******************************************************")
    print(d1.add)
  }

  # add blank (NA) columns to d2
  for (col in d2.add) {
    d2[[col]] <- NA
  }

  # add blank (NA) columns to d1
  for (col in d1.add) {
    d1[[col]] <- NA
  }

  # rbind() matches data frame columns by name, so column order may differ
  return(rbind(d1, d2))
}
If you have to combine more than 2 versions, you have to apply merge.data.frames() several times.
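One possible way to do the repeated application is base R's `Reduce()`, which folds a two-argument function over a list. The sketch below is only an illustration: `bind_with_na` is a shortened stand-in written for this example so that the block runs on its own, and the three small data frames are made up.

```r
# Stand-in for the function above: pad missing columns with NA, then row-bind.
bind_with_na <- function(d1, d2) {
  if (nrow(d1) == 0) return(d2)
  if (nrow(d2) == 0) return(d1)
  for (col in setdiff(names(d2), names(d1))) d1[[col]] <- NA
  for (col in setdiff(names(d1), names(d2))) d2[[col]] <- NA
  rbind(d1, d2)  # columns are matched by name
}

# Made-up data: variable `a` is dropped in version 2, variable `b` is added.
v1 <- data.frame(id = 1:2, a = c("x", "y"), stringsAsFactors = FALSE)
v2 <- data.frame(id = 3L, b = 2.5)
v3 <- data.frame(id = 4L, a = "z", b = 1, stringsAsFactors = FALSE)

# Reduce() calls bind_with_na(bind_with_na(v1, v2), v3).
all_versions <- Reduce(bind_with_na, list(v1, v2, v3))
print(all_versions)
```

The same pattern should work with any number of versions collected in a list.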
Klaus