As follows from this source file, the value of the interview key is just Random.Next(99999999) formatted to a "##-##-##-##" pattern.
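To make the description above concrete, here is a minimal sketch in Python (not the actual Survey Solutions code, which is C#). The names format_key and new_key are hypothetical, and the zero-padding to eight digits is an assumption based on example keys such as 00-12-94-10:

```python
import random

# Exclusive upper bound, mirroring Random.Next(99999999) in .NET,
# which returns a value from 0 up to (but not including) the argument.
KEY_SPACE = 99_999_999


def format_key(value: int) -> str:
    """Zero-pad to 8 digits and group them in pairs (assumed padding)."""
    digits = f"{value:08d}"
    return "-".join(digits[i:i + 2] for i in range(0, 8, 2))


def new_key(rng: random.Random) -> str:
    """Draw one uniformly distributed key (hypothetical helper)."""
    return format_key(rng.randrange(KEY_SPACE))


rng = random.Random(2024)  # fixed seed only to make the demo repeatable
print([new_key(rng) for _ in range(3)])
print(format_key(129410))  # the small number behind a key starting with 00
```

Nothing in such a key depends on the interview itself: it is a uniform draw that is merely displayed in groups of two digits.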
If anything, I would be more concerned about the interviewers' behavior than about the software's behavior in this case. Since the interviewers see the interview keys and are aware of the selection strategy, they would realize quite quickly that the interview 00-12-94-10 will have a higher chance of being selected than interviews 65-00-13-89 or 99-99-99-99. So if there is anything they can manipulate to their advantage, they will do that (e.g. manipulate the household size, eligibility, etc.).
So despite any good statistical properties of the interview key, its use for this purpose is far from ideal.
The rest of the answer may be unnecessary, but just in case. Early versions of Survey Solutions didn't have a concept of an interview key at all; it was added in a later update, including retrospectively to the interviews collected to date. The strategies for assigning keys differ somewhat (when it comes to resolving key collisions), depending on the origin of the interview (one that existed at the time of the upgrade, or a new one). The differences probably start to matter only when the probability of a key collision is non-trivial (in the absence of any context, let's assume something like 1 million interviews or more).
Given that it is very unlikely that you still retain interviews from a version predating 5.20, that the number of interviews on your server is low, and that the chances of a key collision in an EA are virtually non-existent, I wouldn't be too concerned about the possibility of a bias in the assigned interview key.
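For a rough sense of scale, the collision argument can be put into numbers. This is only a back-of-the-envelope sketch in Python: it assumes keys are independent uniform draws from 99,999,999 possible values (as implied by the Random.Next call mentioned earlier), it ignores whatever the server does to resolve a collision, and the function names are hypothetical:

```python
import math

KEY_SPACE = 99_999_999  # assumed number of distinct keys


def expected_colliding_pairs(n: int, space: int = KEY_SPACE) -> float:
    """Expected number of interview pairs that draw the same key."""
    return n * (n - 1) / (2 * space)


def share_with_collision(n: int, space: int = KEY_SPACE) -> float:
    """Probability that a given interview shares its raw draw with
    at least one of the other n - 1 interviews."""
    # 1 - (1 - 1/space)^(n - 1), computed in a numerically stable way
    return -math.expm1((n - 1) * math.log1p(-1.0 / space))


for n in (601, 10_000, 1_000_000):
    print(f"n={n:>9,}  pairs={expected_colliding_pairs(n):10.4f}  "
          f"share={share_with_collision(n):.6%}")
```

Under these assumptions only a few interviews in a million are affected at 601 interviews, while at 1 million interviews the share is about 1%, which is roughly where any difference in collision handling could begin to be visible.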
Here are a few reassuring stats on the distribution of the digits in the assigned keys (using 601 interview keys, all generated in a modern version of Survey Solutions):
. chitest digit, count sep(0)
observed frequencies of digit; expected frequencies equal
Pearson chi2(9) = 7.6905 Pr = 0.566
likelihood-ratio chi2(9) = 7.5756 Pr = 0.577
+---------------------------------------------------+
| digit   observed   expected   obs - exp   Pearson |
|---------------------------------------------------|
|     0        481    480.800       0.200     0.009 |
|     1        481    480.800       0.200     0.009 |
|     2        528    480.800      47.200     2.153 |
|     3        483    480.800       2.200     0.100 |
|     4        478    480.800      -2.800    -0.128 |
|     5        466    480.800     -14.800    -0.675 |
|     6        497    480.800      16.200     0.739 |
|     7        458    480.800     -22.800    -1.040 |
|     8        460    480.800     -20.800    -0.949 |
|     9        476    480.800      -4.800    -0.219 |
+---------------------------------------------------+
and the Stata code used to obtain it:
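clear all
use "interview__diagnostics.dta"
* strip the dashes so that each key becomes a plain 8-digit string
generate keys=subinstr(interview__key,"-","",.)
* extract and plot each of the 8 digit positions separately
forval i=1/8 {
    generate digit`i'=real(substr(keys,`i',1))
    label variable digit`i' "Digit in position `i'"
    histogram digit`i', d name(g`i') freq xlabel(0(1)9) color(stc1)
}
* pool all positions: one observation per (interview, position)
keep digit*
generate i=_n
reshape long digit, i(i) j(position)
label variable digit "Digit in any position"
histogram digit, d name(gtotal) freq xlabel(0(1)9) color(stc5)
graph combine g1 g2 g3 g4 g5 g6 g7 g8 gtotal, rows(3) scale(0.5)
* chi-squared test of equal digit frequencies
* (chitest is community-contributed; if it is missing, try: ssc install tab_chi)
chitest digit, count sep(0)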
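For readers without Stata, the Pearson statistic can be re-derived from the observed counts in the table above. This is a small Python sketch using only the standard library; chi2_sf_odd_df is a hypothetical helper that relies on the closed-form upper tail of the chi-squared distribution for an odd number of degrees of freedom:

```python
import math

# Observed counts of digits 0..9, copied from the table above
observed = [481, 481, 528, 483, 478, 466, 497, 458, 460, 476]
expected = sum(observed) / len(observed)  # equal frequencies: 4808 / 10

# Pearson chi-squared statistic
pearson_chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable (odd df only)."""
    if df % 2 == 0:
        raise ValueError("this closed form is valid for odd df only")
    total = 0.0
    term = math.sqrt(x)  # x^(1/2), then x^(3/2)/3, x^(5/2)/15, ...
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    tail = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total
    return math.erfc(math.sqrt(x / 2.0)) + tail


p_value = chi2_sf_odd_df(pearson_chi2, df)
print(f"Pearson chi2({df}) = {pearson_chi2:.4f}  Pr = {p_value:.3f}")
```

The result should agree with the Pearson line of the chitest output, i.e. there is no evidence against equal digit frequencies in this sample.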