Hello! I am working with Survey Solutions in its self-enumeration mode. The survey includes two kinds of randomization, but after collecting approximately 200 responses I am not getting the distribution of responses that I expected.
The first randomization selects among different sections of the survey. For this, I used the syntax from the “Public example Random subsection selection”. I computed a long integer variable named circunstancias_section as (int)Math.Floor(rdnumb.Value * 2.00) + 1, and then set the enabling conditions circunstancias_section==1 and circunstancias_section==2 on the two sections I wanted to randomize. By doing this I got 60% of respondents with one section and 40% with the other. Do you think that the deviation from the expected 50%-50% distribution is due to the number of cases?
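To make the mechanism concrete, here is a small stand-alone sketch (my own illustration, not the questionnaire code itself; the rdnumb.Value draw is stood in for by Random.NextDouble()) that applies the same formula to simulated uniform draws and tabulates the resulting split over roughly 200 cases:

    using System;

    // Sketch: apply the assignment formula from the public example,
    // (int)Math.Floor(r * 2.00) + 1, to n simulated uniform draws and report the split.
    class SectionSplit
    {
        static void Main()
        {
            var rng = new Random();
            int n = 200;                       // roughly the number of collected answers
            int section1 = 0;
            for (int i = 0; i < n; i++)
            {
                double r = rng.NextDouble();   // stands in for rdnumb.Value
                int section = (int)Math.Floor(r * 2.00) + 1;
                if (section == 1) section1++;
            }
            Console.WriteLine($"Section 1: {section1}, Section 2: {n - section1}");
        }
    }

Running it a few times gives a feel for how much the split moves around at this sample size.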
The second randomization is for the order of questions. For that, I used the “Public example Randomizing order of questions” syntax. In this case the results are more problematic. For example, the distribution for one of the questions shows that in 13% of the responses it was presented in the first position, 49% in the second, 22% in the third, and 16% in the fourth. This imbalance in the order of appearance is mirrored (as expected) in the other questions. How should I deal with this? Is there any way of randomizing the order of questions that gives a more balanced result?
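For reference on what a balanced outcome should look like, here is a rough stand-alone sketch (my own illustration, not the public example's syntax) using an unbiased Fisher-Yates shuffle; with it, each question lands in each of the four positions close to 25% of the time:

    using System;

    // Sketch: shuffle four questions with Fisher-Yates many times and tabulate
    // where question 0 lands; each position should come out near 25%.
    class OrderCheck
    {
        static void Main()
        {
            var rng = new Random();
            int trials = 100000;
            var counts = new int[4];           // position of question 0 across trials
            for (int t = 0; t < trials; t++)
            {
                int[] order = { 0, 1, 2, 3 };
                for (int i = order.Length - 1; i > 0; i--)   // Fisher-Yates shuffle
                {
                    int j = rng.Next(i + 1);
                    (order[i], order[j]) = (order[j], order[i]);
                }
                counts[Array.IndexOf(order, 0)]++;
            }
            for (int pos = 0; pos < 4; pos++)
                Console.WriteLine($"Position {pos + 1}: {100.0 * counts[pos] / trials:F1}%");
        }
    }

Any ordering scheme whose percentages stay far from 25/25/25/25 over a large number of interviews is doing something other than a uniform shuffle.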
One explanation could be that the distribution is actually close to 50%-50%, but there is an error in how that ratio is calculated (in other words, do you rule out an error in your tabulations?).
When looking at a particular survey I see the following distribution of the random value (figure not reproduced here), which may or may not look random; but 299 of the 601 values are smaller than 0.5, and the ratio of the two groups is 299/302 = 0.99006623, which is very close to the 1.0000 that would signify exact parity between the two choices.
To formally address this question one could do a one-sample test of proportion:
One-sample test of proportion                          Number of obs =      601
------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
           z |   .4975042   .0203952                      .4575304     .537478
------------------------------------------------------------------------------
    p = proportion(z)                                             z =  -0.1224
    H0: p = 0.5

   Ha: p < 0.5               Ha: p != 0.5                  Ha: p > 0.5
Pr(Z < z) = 0.4513      Pr(|Z| > |z|) = 0.9026       Pr(Z > z) = 0.5487
This (I hope you agree) does not reject the null hypothesis that the proportion is equal to 0.5.
You can do the same with your collected data and see whether the “randomness” is well behaved (close to 50/50). If it is, you may have distorted it in some later transformation.
(If I plug your values into the formula here I get 2.8284, which is somewhat larger than the classical 1.96 critical value, though you may plug in your own preferred critical value.)
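As a rough cross-check (a sketch of my own, using the standard one-sample z statistic (pHat - p0) / sqrt(p0 * (1 - p0) / n) rather than Stata), the same numbers can be reproduced directly:

    using System;

    // Sketch: one-sample z test of a proportion against p0 = 0.5.
    class ProportionTest
    {
        static double ZStat(int successes, int n, double p0 = 0.5)
        {
            double pHat = (double)successes / n;
            return (pHat - p0) / Math.Sqrt(p0 * (1 - p0) / n);
        }

        static void Main()
        {
            // 299 of 601 random values below 0.5 (the example above)
            Console.WriteLine(ZStat(299, 601));
            // 60%/40% split over roughly 200 interviews
            Console.WriteLine(ZStat(120, 200));
        }
    }

The first call reproduces the z = -0.1224 from the output above; the second corresponds to a 60/40 split over 200 interviews and gives the 2.8284 mentioned in the previous paragraph.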
I think this unbalanced distribution of random values is due to the low number of cases (200 interviews). With 10,000 interviews it should be much closer to the expected value.
What I did in cases where I needed a precise random distribution was to generate the random numbers beforehand, during survey preparation, and then preload them into the assignments.
For example, for your two random sections, take as many values as you will have assignments, half of them set to “1” and the other half to “2”. Then create a random permutation (rearrangement) of these values. Create a hidden question called random_section and preload it with the values generated before: assignment n gets the n-th value in the list.
In the questionnaire use random_section to select the section to show.
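A minimal sketch of that preparation step (my own illustration; the file name, the count of 200 assignments, and the plain one-value-per-line output are assumptions, and the list would still need to be merged into the preloading file in whatever layout your workflow uses):

    using System;
    using System.IO;
    using System.Linq;

    // Sketch: build a perfectly balanced list of section codes (half 1s, half 2s),
    // shuffle it with Fisher-Yates, and write one value per planned assignment.
    class PreloadRandomSection
    {
        static void Main()
        {
            int assignments = 200;                       // planned number of assignments
            var rng = new Random();
            int[] values = Enumerable.Range(0, assignments)
                                     .Select(i => i < assignments / 2 ? 1 : 2)
                                     .ToArray();
            for (int i = values.Length - 1; i > 0; i--)  // random permutation
            {
                int j = rng.Next(i + 1);
                (values[i], values[j]) = (values[j], values[i]);
            }
            // One line per assignment; assignment n gets values[n - 1] as random_section.
            File.WriteAllLines("random_section.txt",
                               values.Select(v => v.ToString()));
        }
    }

Because exactly half of the preloaded values are 1 and half are 2, the 50/50 split is guaranteed by construction rather than left to chance.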
Klaus's solution is not applicable in my case because I do not know the total number of respondents in advance, but I will keep it in mind for future surveys.
Regarding Sergiy’s answer, I agree with all the comments. I posted this to make sure I was following the correct steps in my syntax and that I was getting the best result I could in my setting.