Linking two interviews without synchronization

In our project, we need to link two surveys to the same respondent without depending on prior synchronization. There are two teams of interviewers: one gathers alphanumeric information (demographic, economic, and educational data on the respondent and their family group), and the other measures plots, geolocates crops, etc. We need to be able to link two questionnaires referring to the same respondent (one with alphanumeric information and the other with geographic information) without synchronizing, since there is no internet signal in the field.

One alternative could be to expose the interview key of the main interview (the number in the format 99-99-99-99) in the questionnaire code as a system variable named @interviewkey (similar to the @rowcode variable), and to use this number as a seed to generate a unique number that can be recorded in the other interview. This would allow the two interviews to be linked ex-post when there is no internet service during data collection.

We have considered generating a random number in the main interview to be recorded manually in the second interview, but we are concerned that such numbers cannot be guaranteed to be unique in 100% of cases, because the universe of interviews is large (350k interviews).
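To make this concern concrete, here is a rough sketch of the standard birthday-problem approximation. It assumes each interview draws its number independently and uniformly at random from a fixed number of possible values. The function name and the digit lengths tried are illustrative only and are not part of the questionnaire software.

```python
import math


def collision_probability(n_items: int, space_size: int) -> float:
    """Approximate chance that at least two of n_items random draws coincide.

    Uses the birthday-problem approximation 1 - exp(-n(n-1) / (2N)),
    assuming draws are independent and uniform over space_size values.
    """
    exponent = -n_items * (n_items - 1) / (2 * space_size)
    # -expm1(x) equals 1 - exp(x) and stays accurate for tiny probabilities.
    return -math.expm1(exponent)


if __name__ == "__main__":
    interviews = 350_000  # size of the universe mentioned above
    for digits in (8, 12, 16):  # hypothetical lengths of the random number
        p = collision_probability(interviews, 10 ** digits)
        print(f"{digits} digits: collision probability ~ {p:.6f}")
```

Under these assumptions, an 8-digit random number is almost certain to repeat somewhere among 350k interviews, while a 16-digit number brings the chance down to roughly 6 in a million, at the cost of a longer code to copy by hand.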

What suggestion can you offer in this regard?

A few clarifying questions:

  • For the survey gathering the alphanumeric information, are unit IDs assigned before fieldwork begins or during fieldwork (e.g., create a unique hhid for each unit)?
  • Are the two surveys done sequentially for each unit (e.g., first, alphanumeric info; then, plot measurement)?

Thank you, Arthur, for your comments on this topic. This gives me the opportunity to provide more information.

The situation where you pre-identify the units occurs when dealing with sample-based surveys. In these scenarios, the survey organizing team decides which respondents should be visited so that the inferences drawn from the study are applicable to larger populations (such as a country, for example). However, when conducting a Census, only the administrative geographic units of the country (states, districts, counties) are pre-identified. That is to say, the individual units to be interviewed are not pre-identified because the intention is to interview all of them. In this scenario, the interviewing team covers the territory entirely.

Regarding your question about the process, this is an agricultural census where agricultural producers are visited by covering the terrain. In the main interview, there is a roster to capture information about the plots they own. For each of these plots, information is collected about its size, ownership status, crops grown, crop varieties, cultivation practices, production obtained, etc. A producer may have more than one plot (some producers have dozens of properties scattered throughout the country), and some of these plots may be located hundreds of kilometers away from where the interview is conducted.

In these situations, the geographic information of the plot (longitude and latitude) is not captured by the same interviewer. Instead, another team in the field mobilizes to locate and geolocate the plots. To link both interviews (the one containing alphanumeric information and the one containing geographic information), a unique code is needed. Using the key from the main interview (such as 99-99-99-99) would be a good alternative, as it is the most random option available. A number derived from the interview key could then be entered as data to link the interviews together.

An example of this implementation could be:

The interviewer of the main interview could share a number with the interviewer in charge of the second interview, who captures the geographic coordinates and measures the plots along their contours. This number would be calculated automatically by the main questionnaire as: the interview key of the main interview + a sequential number for the plot + a check digit.
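A minimal sketch of how such a number could be put together and checked when it is typed into the second interview. It is plain Python and does not use the questionnaire software. It assumes the key has the 8-digit 99-99-99-99 format, that a producer has at most 99 plots, and that the Luhn algorithm is used for the check digit. All function names are hypothetical, and the sketch only shows the arithmetic, not how the key would be read inside a questionnaire.

```python
def luhn_check_digit(digits: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit, starting with the
    # rightmost digit of the payload.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return (10 - total % 10) % 10


def build_link_code(interview_key: str, plot_number: int) -> str:
    """Combine interview key, plot number and check digit into one code."""
    key_digits = interview_key.replace("-", "")
    if len(key_digits) != 8 or not key_digits.isdigit():
        raise ValueError("expected a key in the format 99-99-99-99")
    if not 1 <= plot_number <= 99:  # assumption: at most 99 plots
        raise ValueError("plot number must be between 1 and 99")
    payload = f"{key_digits}{plot_number:02d}"
    return payload + str(luhn_check_digit(payload))


def is_valid_link_code(code: str) -> bool:
    """Check length, digits and check digit of a manually entered code."""
    return (
        len(code) == 11
        and code.isdigit()
        and luhn_check_digit(code[:-1]) == int(code[-1])
    )


if __name__ == "__main__":
    code = build_link_code("12-34-56-78", 3)
    print(code, is_valid_link_code(code))
```

The check digit does not make the code more unique. It only helps catch typing mistakes, since a single mistyped digit makes the validation fail. The scheme also relies on the interview key staying the same after the code has been handed over.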

We are thinking of using the interview key to avoid collisions with other numbers. Using the owner ID, name, and other data is not safe enough in a scenario of 350k interviews.

This doesn’t look like a good option to me, as the interview key may change.