Variable containing random integer within range

No, URBANC is calculated at the very start of the interview, outside of any roster.

Hi Sergiy,

I just sent you the green checklist report. Let me know if anything is missing and if there is anything I can do to solve this issue. Thanks a lot.
Best,

Sorry for the multiple messages. Could the issue be related to the server version? Our server currently runs 19.08, but a new server version (19.08.1) seems to be available. Please ignore this if it does not make sense.

Dear Colleagues,

Thank you to everyone who contributed descriptions of the symptoms and helped trace the events regarding the above issue.

  1. The problem has been investigated and traced to changes in the external library code that we are using.

  2. Our developers are working on a fix for this issue.

  3. At this time there is nothing that the affected users can do. Yet, if your cloud server time is expiring please submit a request to extend the server. You will need to re-export the data with the hotfixed version (see #5 below).

  4. The data entered by the enumerators is not lost and should reappear (selection restored) for the supervisors/HQ users once the hotfix is applied. Correspondingly, it will also appear in the exported data.

  5. Any questionnaire using randomization will need to be re-exported after the hotfix so that the export reflects the correct state and the correct random value in the sssys_irnd variable.

Best, Sergiy

Hi Sergiy,

This is good news indeed. Thanks for the clarifications. Would it be possible to keep us informed about when the hotfixed version is released? Our data collection is still ongoing, and for now I have told our QC to ignore any issue related to the questions where randomization is involved (simply because they will not be able to know whether it comes from the bug or from an actual interviewer’s mistake). Thanks again.

Best,

Benjamin.

Colleagues!

  1. Version 19.08.2, which is currently being pushed to the servers, will prevent the issue of the random value being evaluated differently on the tablet and on the server from appearing again.

  2. New interviews collected with this version should not suffer from this problem.

  3. Older interviews collected with the previous versions still suffer from the same problem. With 19.08.2 the users can do one of the following:
    a) the supervisor changes the language of the interview, then reverts it to the previously set language (if the questionnaire is multilingual);
    b) the supervisor/HQ answers a supervisor question (if the questionnaire contains a supervisor question);
    c) the supervisor rejects the interview back to the tablet; the interviewer synchronizes and completes it again on the tablet (all data survives, just click the complete button again), then resubmits the completed interview (synchronizes).

This doesn’t yet deliver item #4 of my earlier message (the data reappearing on its own once the hotfix is applied). We are now looking for a fully automatic solution that would replace c) and not involve the enumerators in surveys that have no supervisor questions or multiple languages.

Will keep you posted here.
Best, Sergiy

Ok Sergiy, that is helpful for now, thank you. I am testing your solution c) right now so we do not lose too much time on QC while waiting for the automated fix. But I am not sure I understand the process well: to get the data back to normal, do we need to apply the three steps a) to c) in succession? In our situation, we have no supervisor question, but we do have multiple languages, so the process I plan to implement is:

  1. reject the questionnaire from HQ back to the supervisor;
  2. supervisor receives the rejected case and changes the language of the interview (is this step necessary? interviews were conducted in Khmer -> should the supervisor then switch to English, and then back to Khmer?);
  3. supervisor rejects the case back to interviewer;
  4. interviewer synchronizes, receives the case and only needs to complete it again, and synchronizes once more to send it back to supervisor;
  5. supervisor checks if the data is now correct.

Does that look correct to you?
Thanks,

Benjamin.

Hello Benjamin,

A, B, and C above are alternatives. Any one of them should achieve the same result. A and B are more attractive, since they can be undertaken centrally, e.g. by the supervisors, without having to coordinate with the interviewers and without consuming mobile network traffic. Yet each requires a particular feature (a multilingual questionnaire or a supervisor question), which may not be available, hence C is the universal recipe.

In your case the HQ or supervisor can switch the language for an affected interview (one collected with a version earlier than the hotfixed 19.08.2). That should immediately reveal the data exactly as the interviewer saw it (the only difference being the language). Switching the language back would cancel out the language difference as well and retain the correct state of the interview.

There is hardly any visual evidence of which interviews have already been recovered that way and which haven’t. So you either track them somewhere, or attach a comment, or advise on another method that has worked for you.

For questionnaires with random selection of a respondent for an extended interview, the situation is usually simpler, since you will likely have a large number of unanswered questions, which would be a pretty obvious signal that the interview has not been recovered yet.

I believe the language approach should work regardless of whether you are an HQ or a supervisor, as long as you can open the interview to view its content.

Hi Sergiy,

We tried solution C yesterday and it worked well. But after reading your message this morning, I tried solution B (language switch) from admin, HQ and supervisor accounts and it does not seem to work for any of these account types, not sure why. I’ll stick to solution C then.

It is no problem for me to identify the interviews with random number issues, as I have a variable storing the random value (called “RANDOM”, which stores the random value after it changed) and other variables depending on this random value. In the Stata file, “sssys_irnd” corresponds to the initial random value (assigned to the case when it was created, before the change). Therefore I can easily identify in the exported Stata file which cases had their dependent variables affected by the random value change, reject them back to the interviewers and implement the solution C that you proposed yesterday.

All of our interviewers are using v19.08.2 now, and I hope that this issue will not occur again. I will run regular checks to monitor this (by exporting and checking the data as described above) and will get back to you if we meet any further issues.

Thanks. Best,

Benjamin.

  1. I will check why B may not have worked. Benjamin, please send the name of the server and the interview key where strategy B didn’t work to the support email.

  2. The approach described by Benjamin is valid. After the hotfix the variable sssys_irnd will be exported correctly for all interviews. If the questionnaire contains a variable that preserved the original value returned by Quest.IRnd() that was distorted on the server, the two will differ, and this will point to the need to reject and resubmit that interview. Unfortunately, not all user questionnaires/surveys have such a value stored, hence we are still working out a mechanical procedure that would recalculate all the interviews without involving the interviewers (one that can be executed solely on the server).
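The comparison described above can be sketched as follows. This is a minimal illustration in Python, not part of Survey Solutions: it assumes the exported data has already been loaded into a list of rows, that the questionnaire keeps its own copy of the random value in a variable named RANDOM (as in Benjamin's questionnaire), and that each row carries an interview__key column. The function name and the tolerance are hypothetical.

```python
import math

def find_affected(rows, stored_var="RANDOM", tol=1e-9):
    """Return keys of interviews whose stored random value differs from sssys_irnd."""
    affected = []
    for row in rows:
        stored = row.get(stored_var)
        if stored is None:
            continue  # no stored copy of the random value, nothing to compare
        if not math.isclose(stored, row["sssys_irnd"], rel_tol=0.0, abs_tol=tol):
            affected.append(row["interview__key"])
    return affected

# Made-up example rows; real rows would come from the exported data file.
rows = [
    {"interview__key": "11-11-11-11", "sssys_irnd": 0.4213, "RANDOM": 0.4213},
    {"interview__key": "22-22-22-22", "sssys_irnd": 0.9051, "RANDOM": 0.1377},
]
print(find_affected(rows))
```

Interviews returned by such a check would be the candidates for rejecting and resubmitting. The tolerance may need loosening if the export stores the values with reduced precision.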

Hi Sergiy,

I see that there is a new Interviewer app version available. Does it mean that the above issue will now be automatically fixed? Thanks,

Benjamin.

Dear Benjamin,

When you check the data on your server you should find the issue fixed in all new interviews and all old interviews, except the ones that were rejected and received by the interviewers. The interviewers will find all the entered data in them, and once they resubmit them (they can still correct any other issues as usual) the supervisors will also see the data displayed correctly.

Best, Sergiy

Excellent, thanks a lot!

Status update: The issue should now be fixed in all the surveys on all the cloud servers that are managed by our team: https://*.mysurvey.solutions

Hi, I think there is an issue with this code: the Round() cuts the density of marginal numbers (here 1 and 8) in half (see mini simulation in PDF attached). I think the code should be: (int)Math.Floor((9-1)*Quest.IRnd() + 1)

randomization.pdf (113.1 KB)

Could you please be more specific? What is the issue?

In other words, the code that was provided does not seem to select numbers 1 to 8 with the same probability, but 1 and 8 (the largest and smallest) get half the probability of the other numbers.
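The effect can be reproduced with a small simulation. The sketch below is in Python rather than C#, and it makes two assumptions: the rounding-based expression is taken to have the form Round(7*u + 1), which is not quoted in this thread, and Python's seeded generator stands in for Quest.IRnd(), taken to be uniform on [0, 1). The function names are illustrative only.

```python
import math
import random
from collections import Counter

def draw_round(u):
    # Rounding variant (assumed form): round(7*u + 1) maps [0, 1) onto 1..8
    return round(7 * u + 1)

def draw_floor(u):
    # Floor variant proposed above: floor((9 - 1)*u + 1)
    return math.floor((9 - 1) * u + 1)

def simulate(n=80_000, seed=12345):
    rng = random.Random(seed)  # stand-in for Quest.IRnd(), assumed uniform on [0, 1)
    us = [rng.random() for _ in range(n)]
    return Counter(map(draw_round, us)), Counter(map(draw_floor, us))

rounded, floored = simulate()
for k in range(1, 9):
    print(k, rounded[k], floored[k])
```

With rounding, the values 1 and 8 each collect draws from an interval of width 1/14, while 2 to 7 each get 1/7. With the floor variant every value gets an interval of width 1/8.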

Cool. Where in the above was the same probability requested?

Sergiy, in my understanding someone asking to generate a random number between 1 and 8 in a CAPI context would want them to have equal likelihood.

To reply to your (I think rhetorical) question: where in the above was “a random number where the first and last occur half as often as the others” requested?

Bottom line, I was just flagging this so that others do not repeat a mistake we made…

Best,

This is my understanding too. But people are different. Angkor originally asked:

The solution mentioned by Misha above satisfies that:
1) it results in a number that is random;
2) the number is between 1 and 8;
3) the number is an integer.

Those interested in details about how computers operate with random numbers may refer to e.g. this very nicely written chapter: https://galileo.phys.virginia.edu/compfac/courses/practical-c/02.pdf
(see paragraph 2.7).

Assuming that the original poster indeed wanted equal probabilities for each outcome, the critique of @FMBARBA relies on Quest.IRnd() returning 0.1, 0.2, and any other number with equal probability in his/her simulations. Whether that is true is a different question altogether. Survey Solutions relies on C#'s built-in random number generator (which generates random numbers uniformly distributed on the interval [0;1) ):

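        public static double GetRandomDouble(this Guid id)
        {
            // Seed a new generator with the 32-bit hash code of the interview id
            Random r = new Random(id.GetHashCode());
            // Return the first value produced by that generator, in [0, 1)
            return r.NextDouble();
        }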

Its properties are sufficient to conclude that if you start from a random point determined by the id (seed) and take a million realizations, the obtained sample will pass the uniformity tests (it will look like a uniformly distributed random value).

But in practice, don’t forget that Survey Solutions doesn’t do that. Instead, a million random number generators are created, and each generator is asked for a single random number. To make them produce different results, each one is seeded with the id of its interview. This brings in two potential problems:

  1. whether the GUIDs of the interviews are generated uniformly (I know they appear as text, but think of them as huge hexadecimal numbers)
  2. whether the algorithm that initializes random number generators in C# is somehow treating them unequally.

Both are beyond immediate intuition; they may or may not be true.
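The shape of such a check can be sketched as follows. This is a Python analog only and says nothing about C#'s Random or about real interview GUIDs: the ids are drawn from a seeded generator for reproducibility, the full 128-bit id is used as the seed (the C# code above uses the 32-bit hash code instead), and all names are hypothetical. The point is the procedure: one fresh generator per id, one draw from each, then a look at how the first draws are distributed.

```python
import random
import uuid
from collections import Counter

def first_draw(interview_id):
    # One fresh generator per interview, seeded from the id, asked for one number
    rng = random.Random(interview_id.int)
    return rng.random()

def bin_counts(n=20_000, bins=10, seed=2024):
    id_source = random.Random(seed)  # reproducible stand-in for GUID generation
    counts = Counter()
    for _ in range(n):
        interview_id = uuid.UUID(int=id_source.getrandbits(128))
        counts[int(first_draw(interview_id) * bins)] += 1
    return [counts[b] for b in range(bins)]

counts = bin_counts()
expected = sum(counts) / len(counts)
# Pearson chi-square statistic against equal bin counts (9 degrees of freedom here)
chi2 = sum((c - expected) ** 2 / expected for c in counts)
print(counts, round(chi2, 2))
```

Running the same procedure on the random values actually exported from a large survey, rather than on simulated ids, would be the relevant test for the two questions above.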

In addition, the random realizations here are not independent from the human will. An interviewer may instantiate an interview, discard it, and repeat, until she gets a number sufficiently close to the desired (and you can get an estimate of how many retries she needs for that).
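The estimate mentioned above follows from a geometric distribution. As an illustration, assume each newly created interview receives an independent draw and the interviewer keeps an interview only when the draw falls in a desired set of probability p (both are simplifying assumptions). The number of interviews N she must create then satisfies:

```latex
P(N = n) = (1 - p)^{n-1}\,p, \qquad
E[N] = \frac{1}{p}, \qquad
P(N \le n) = 1 - (1 - p)^{n}
```

For example, if she wants one particular value out of 8 equally likely ones, p = 1/8, so she needs 8 attempts on average, and the chance of succeeding within 8 attempts is 1 - (7/8)^8, or about 0.66.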

So, whether values obtained from Quest.IRnd() in a large sample will follow a uniform distribution, is (imho) not guaranteed. Whether they are seen as such, you can ask other colleagues doing massive data collection, such as the users from Statistics South Africa or Zambia CSO.

In practice I would guide myself by the following rule: if the need for randomness is purely to make the questionnaire look different, such as presenting the options in a different order, then none of the considerations above is of any importance. Good to go with Quest.IRnd(). But if your need for randomness is some sort of designed experiment where rigorous assumptions about distributions are made and must be followed, then the question is not so simple.

So, no, not a rhetorical question.