Aggregating multiple versions of the survey data (in SPSS)

Repeat the following steps for every file (roster) produced by the Survey Solutions system. Throughout this process, do not overwrite, modify, or otherwise alter the original files exported from Survey Solutions. Protect them from accidental modification by setting the read-only flag and retaining backup copies. Save temporary files to a scratch folder and the resulting files to a final folder.

The whole aggregation process should be implemented as a script that can be run in SPSS (or Stata, or another package), taking the original exported data as input and producing consolidated output in the designated folder.

  1. For each question/variable: if the variable names have changed, rename them consistently. If the question text has changed, decide whether the change was cosmetic (such as a typo correction) or substantial (e.g. the reference period changed from 7 days to 2 weeks), and decide on the preferred final variable label. Consult the original questionnaires used for each version during data collection.
    To rename variables, use the SPSS command RENAME VARIABLES:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_rename_variables.html
    To assign or change a variable label, use the SPSS command VARIABLE LABELS:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_variable_labels.html

To rename multiple variables, list each old=new pair in its own parentheses:

RENAME VARIABLES (X=HIREDDATE) (Y=FIREDDATE).
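
The agreed final labels can be assigned in the same pass. A minimal sketch, reusing the variable names from the example above; the label texts are made up for illustration and should be replaced with the wording you settled on:

```spss
* Assign the agreed final labels after renaming (label texts are illustrative).
VARIABLE LABELS
  HIREDDATE 'Date the employee was hired'
  /FIREDDATE 'Date the employee was dismissed'.
```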

  2. For categorical variables: inspect the value codes used in every file and make sure they are the same. If the codes differ, the variables must either be recoded consistently or kept as separate variables.

  3. For string variables: inspect the length (storage type) of each string variable in every file and make sure they are the same. SPSS can append cases only if every string variable has the same length in all files; Stata does not have this limitation and expands the storage type to the widest. As the SPSS documentation puts it: “If a string variable has the same name but different formats (for example, A24 and A16) in different input files, the command is not executed.” In SPSS, select the widest storage across all versions of the data. If your version of SPSS supports the utility described here:
    https://www-01.ibm.com/support/docview.wss?uid=swg21972488
    consider using it to automate the alignment of string variable widths.

  4. Use the SPSS ALTER TYPE command to adjust the storage types of string variables:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_alter_type.html
    for example

ALTER TYPE name (A20).

  5. To append the harmonized files:
    use the GET command of SPSS to open the first file:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_get.html
    then use the ADD FILES command of SPSS to append the subsequent files:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_add_files_overview.html
    After all the files have been appended, use the SAVE command of SPSS to save the resulting aggregated file:
    https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_save.html
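
Put together, the steps above might look roughly like the following script for one roster collected under two questionnaire versions. This is only a sketch: the folder paths, the file name members.sav, the variable sex and its codes, and the flag names from_v1 and from_v2 are all assumptions made for illustration. The originals are only read, never written.

```spss
* Hypothetical paths and names: adjust them to your own export folders.
* Harmonize version 2 so that it matches version 1, then save to scratch.
GET FILE='C:\survey\original\v2\members.sav'.
RENAME VARIABLES (X=HIREDDATE) (Y=FIREDDATE).
* Assumed example: version 2 used code 0 where version 1 used code 2.
RECODE sex (0=2).
ALTER TYPE name (A20).
SAVE OUTFILE='C:\survey\scratch\members_v2.sav'.

* Give the string variable the same width in version 1.
GET FILE='C:\survey\original\v1\members.sav'.
ALTER TYPE name (A20).
SAVE OUTFILE='C:\survey\scratch\members_v1.sav'.

* Append the files and flag the source of each record with the IN subcommand.
ADD FILES
  /FILE='C:\survey\scratch\members_v1.sav' /IN=from_v1
  /FILE='C:\survey\scratch\members_v2.sav' /IN=from_v2.
SAVE OUTFILE='C:\survey\final\members.sav'.
```

The IN flags are optional, but keeping a record of which version each case came from makes later checks easier. After a RECODE, the value labels of the affected variable may also need to be reviewed.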

In the resulting file:

  • if a new question has been added, earlier versions will have missing values in that question;
  • if a question was deleted, later versions will have missing values in that question.

If a roster was added in a later version, its aggregated file will contain no records from the earlier versions; if a roster was removed, its aggregated file will contain no records from the later versions.
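
One way to confirm this pattern is to tabulate missingness against the version of origin. The sketch below assumes that the aggregated file is open, that it carries a 0/1 version flag named from_v2 (a hypothetical name, for instance one created with the IN subcommand of ADD FILES), and that newq is a hypothetical numeric question that exists only in version 2:

```spss
* MISSING returns 1 for system-missing or user-missing values and 0 otherwise.
COMPUTE newq_missing = MISSING(newq).
* Records from earlier versions are expected to fall in the newq_missing = 1 column.
CROSSTABS /TABLES=from_v2 BY newq_missing.
```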

For more serious changes, such as question type changes:

  1. if some versions contain a number and others a string, promote the answer to the wider type (string);
  2. if some versions record the answer as a single select and others as a multiselect, promote it to a multiselect (the single-select answer becomes the first category in the multiselect).
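
Both promotions can be sketched in SPSS syntax, to be applied to the affected versions before appending. The variable names amount and source, the width A10, the three category codes, and the indicator names source__1 to source__3 are assumptions for illustration; the indicator names should match whatever columns the multiselect versions actually contain.

```spss
* Number to string: convert the numeric versions before appending.
* The width A10 is an assumption and must match the string versions.
ALTER TYPE amount (A10).

* Single select to multiselect: build one 0/1 indicator per category.
* Each indicator is system-missing when the original answer is missing.
COMPUTE source__1 = (source = 1).
COMPUTE source__2 = (source = 2).
COMPUTE source__3 = (source = 3).
* Run the pending transformations before dropping the original variable.
EXECUTE.
DELETE VARIABLES source.
```

After the numeric-to-string conversion, it is worth checking that the resulting text is written the same way as in the versions that collected the answer as a string.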

To avoid the aggregation task altogether, pre-test and pilot the questionnaire, and avoid modifying it during the actual fieldwork.
