R package httr2 may improve assignment creation timings considerably

httr2 is the successor to the httr package, which has been the Swiss Army knife for HTTP requests in R over the past years.

It is now out as a stable release. There are still many httr2 features left to test, and I will provide feedback on them as I transition the SurveySolutionsAPI package to it over the next months. However, there is one significant feature I wanted to share with you right away: the ability to run the assignment creation process in parallel with a single function. Since several users have asked us how to make assignment creation faster, in particular for very large surveys or censuses, I thought this may be a very useful feature for others too.

Below is a simplified implementation of the original suso_createASS function from the SurveySolutionsAPI package, rewritten with httr2. The speed improvement over a standard sequential approach seems to be massive, even if you previously used parallel processing or httr::handle.

Have a look for yourself and test it on your system. So far I have only tested it with up to 10K assignments, which took 209 seconds in total, but I would be curious to see the results for 100K or more.
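For a rough sense of scale, the 10K run works out to about 48 assignments per second. The snippet below only does that arithmetic; `projected_minutes()` is a hypothetical helper, and its projection to 100K assumes the throughput stays constant, which may not hold under real server load.

```r
# Numbers from the test run reported above: 10,000 assignments in 209 seconds
n_assignments <- 10000
elapsed_sec <- 209

# Throughput in assignments per second
rate_per_sec <- n_assignments / elapsed_sec
round(rate_per_sec, 1)
# [1] 47.8

# Hypothetical helper: projected runtime in minutes for n assignments,
# ASSUMING the throughput stays constant (it may not on a busy server)
projected_minutes <- function(n, rate) n / rate / 60
round(projected_minutes(100000, rate_per_sec), 1)
# [1] 34.8
```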

The default number of concurrent connections is 6, but it can be changed by passing a pool created with curl::new_pool(host_con = 100, total_con = 100) to the pool parameter of req_perform_parallel(). For my tests I used 100 concurrent connections. Experimenting with this value may improve performance further, but it may also create problems for your server, so be careful!
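If you want to try different connection limits systematically, a small timing loop can help. This is a minimal sketch: `benchmark_pool_sizes()` is a hypothetical helper, not part of httr2, curl or SurveySolutionsAPI, and it assumes you pass a function `run(pool)` that performs your requests, for example via `httr2::req_perform_parallel(requests, pool = pool, on_error = "continue")`. Every run against a real server creates assignments, so use a test workspace or questionnaire.

```r
# Hypothetical helper (not part of httr2, curl or SurveySolutionsAPI):
# time the same workload under several connection limits.
# `run` is any function that takes a curl pool and performs the requests.
benchmark_pool_sizes <- function(run, sizes = c(6, 25, 50, 100)) {
  seconds <- vapply(sizes, function(n) {
    # Same limit per host and in total, as with
    # curl::new_pool(host_con = 100, total_con = 100)
    pool <- curl::new_pool(total_con = n, host_con = n)
    system.time(run(pool))[["elapsed"]]
  }, numeric(1))
  data.frame(connections = sizes, seconds = seconds)
}

# Usage sketch, assuming `requests` is a list of httr2 requests such as the
# one built inside suso_createASS_httr2(); each run really POSTs them:
# benchmark_pool_sizes(function(pool) {
#   httr2::req_perform_parallel(requests, pool = pool, on_error = "continue")
# })
```

Creating a fresh pool per size keeps the runs independent of each other, so connections opened under one limit are not reused under the next.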

df: data.frame or data.table with the assignment data; must contain the identifying variables as columns, plus the columns ResponsibleName and Quantity
server: Survey Solutions server address
apiUser: Survey Solutions API user
apiPass: Survey Solutions API password
workspace: server workspace; defaults to primary if not provided
token: Survey Solutions server token; not used in this simplified version, which authenticates with apiUser and apiPass
QUID: the questionnaire ID
version: the questionnaire version

suso_createASS_httr2 <- function(df = NULL,
                                 server = SurveySolutionsAPI::suso_get_api_key("susoServer"),
                                 apiUser = SurveySolutionsAPI::suso_get_api_key("susoUser"),
                                 apiPass = SurveySolutionsAPI::suso_get_api_key("susoPass"),
                                 workspace = NULL,
                                 token = NULL,
                                 QUID = NULL,
                                 version = NULL) {
  library(httr2)
  library(jsonlite)
  library(data.table)
  
  # A. Check that all inputs are provided, except for workspace which can be NULL
  if (is.null(df) || is.null(server) || is.null(apiUser) || is.null(apiPass) ||
      is.null(QUID) || is.null(version)) {
    stop("Not all inputs provided")
  }
  
  # B. Transform to data.table if necessary
  if (!is.data.table(df)) {
    df <- data.table::as.data.table(df)
  }
  # C. Copy data.table to avoid modifying the caller's object by reference
  df <- data.table::copy(df)
  
  # D. Default workspace
  workspace <- SurveySolutionsAPI:::.ws_default(ws = workspace)
  
  # E. Base URL and path
  base_url <- server
  path <- file.path(workspace, "api", "v1", "assignments")
  
  # F. Questionnaire ID in the form <QUID>$<version>
  quid <- paste0(QUID, "$", version)
  
  # G. Split off responsible and quantity; the remaining columns are the identifying variables
  respname <- df$ResponsibleName
  quant <- df$Quantity
  df[, c("ResponsibleName", "Quantity") := NULL]
  df <- as.data.frame(df)
  
  # H. Request
  # H.1. Function to generate one request per row
  genrequests <- function(i) {
    js_ch <- list(
      Responsible = unbox(respname[i]),
      Quantity = unbox(quant[i]),
      QuestionnaireId = unbox(quid),
      # One entry per identifying variable; answers are sent as character strings
      IdentifyingData = data.frame(Variable = names(df),
                                   Identity = rep("", length(names(df))),
                                   Answer = vapply(df[i, , drop = FALSE], as.character,
                                                   character(1), USE.NAMES = FALSE))
    )
    
    # Native pipe |> (R >= 4.1); %>% would require magrittr to be attached.
    # req_body_json() already implies POST; req_method() just makes it explicit.
    req <- request(base_url) |>
      req_url_path(path) |>
      req_auth_basic(apiUser, apiPass) |>
      req_body_json(js_ch) |>
      req_method("POST")
    return(req)
  }
  # H.2. Execute request generation
  requests <- lapply(seq_len(nrow(df)), genrequests)
  
  # H.3. Perform requests in parallel (up to 100 concurrent connections)
  responses <- httr2::req_perform_parallel(
    requests,
    pool = curl::new_pool(host_con = 100, total_con = 100),
    on_error = "continue"
  )
  
  # I. Response
  # I.1. Get failed responses (error objects) and warn about them
  failed <- httr2::resps_failures(responses)
  if (length(failed) > 0) {
    warning(length(failed), " of ", length(responses), " requests failed")
  }
  # I.2. Get successful responses
  responses <- httr2::resps_successes(responses)
  # I.2.1. Stop if there are no successful responses
  if (length(responses) == 0) {
    stop("No successful responses")
  }
  # I.3. Create data.table from successful responses
  # I.3.1. Function to transform a response into a one-row data.frame
  transformresponse <- function(resp) {
    # i. Parse JSON body
    respfull <- resp |>
      resp_body_json(simplifyVector = TRUE, flatten = TRUE)
    
    # ii. Get identifying data and transform to wide format.
    # as.matrix() is read column by column, which matches the order of
    # new_col_names (Identity1, Identity2, ..., Variable1, Variable2, ...)
    resp <- data.frame(respfull$Assignment$IdentifyingData)
    reshaped_data <- as.vector(as.matrix(resp))
    new_col_names <- paste0(rep(names(resp), each = nrow(resp)), seq_len(nrow(resp)))
    resp <- setNames(data.frame(matrix(reshaped_data, nrow = 1)), new_col_names)
    # iii. Add the remaining assignment fields
    nodf <- names(respfull$Assignment)[!grepl("IdentifyingData", names(respfull$Assignment))]
    for (x in nodf) {
      resp[[x]] <- respfull$Assignment[[x]]
    }
    return(resp)
  }
  status_list <- lapply(responses, transformresponse)
  # I.3.2. Bind into a single data.table
  status_list <- data.table::rbindlist(status_list, fill = TRUE)
  # Keep the failed requests available for inspection via attr(x, "failed")
  data.table::setattr(status_list, "failed", failed)
  
  return(status_list)
}
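To see what actually goes over the wire, the request body for a single row and the wide reshaping of the returned identifying data can both be tried offline. A minimal sketch, assuming a hypothetical input with two identifying variables (`region`, `hh_id`) and a placeholder questionnaire ID; `build_body()` and `to_wide()` are hypothetical helpers in the spirit of `genrequests()` and `transformresponse()`. The body is serialized with `jsonlite::toJSON(auto_unbox = TRUE)`, which should be close to what `req_body_json()` sends, since it also uses jsonlite with auto-unboxing.

```r
library(jsonlite)

# Hypothetical input: two identifying variables plus the required
# ResponsibleName and Quantity columns
assignments <- data.frame(
  region = c("North", "South"),
  hh_id = c(101, 102),
  ResponsibleName = c("int01", "int02"),
  Quantity = c(1, 1)
)

# Hypothetical helper: JSON body for row i, in the spirit of genrequests()
build_body <- function(df, i, quid) {
  ident <- df[, setdiff(names(df), c("ResponsibleName", "Quantity")), drop = FALSE]
  list(
    Responsible = df$ResponsibleName[i],
    Quantity = df$Quantity[i],
    QuestionnaireId = quid,
    # data.frames are serialized row-wise, i.e. as an array of objects
    IdentifyingData = data.frame(
      Variable = names(ident),
      Identity = "",
      Answer = vapply(ident[i, , drop = FALSE], as.character, character(1),
                      USE.NAMES = FALSE)
    )
  )
}
# "questionnaireid$1" is a placeholder for paste0(QUID, "$", version)
body_json <- toJSON(build_body(assignments, 1, "questionnaireid$1"), auto_unbox = TRUE)
body_json

# Hypothetical helper: long IdentifyingData (one row per variable) to one wide row.
# as.matrix() is read column by column, which matches the name order
# Identity1, Identity2, Variable1, Variable2, Answer1, Answer2.
to_wide <- function(ident) {
  values <- as.vector(as.matrix(ident))
  col_names <- paste0(rep(names(ident), each = nrow(ident)), seq_len(nrow(ident)))
  as.data.frame(matrix(values, nrow = 1, dimnames = list(NULL, col_names)))
}

# Mock of the IdentifyingData part of a response
ident_resp <- data.frame(Identity = c("", ""),
                         Variable = c("region", "hh_id"),
                         Answer = c("North", "101"))
to_wide(ident_resp)
```

Converting answers with `as.character()` column by column keeps factor labels and numbers readable, instead of relying on `unlist()` coercion across mixed column types.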

This is just a very quick and dirty implementation, but I hope it inspires others to test it themselves. If you do, please share your experience, as well as any suggestions to increase performance even further. And if you test any other httr2 functions, please let us know whether you consider them useful for other Survey Solutions/R users.


The httr2 package is now used in the new SurveySolutionsAPIv2 package (GitHub: michael-cw/SurveySolutionsAPIv2, "A comprehensive set of R functions to access the Survey Solutions REST/GraphQL API (httr2 based version)"). Please continue the discussion in this topic: Beta-release of the new SurveySolutionsAPIv2 (httr2) based package.