Function for fitting several sequential sampling confidence models in parallel

This function is a wrapper of the function fitRTConf (see there for more information). It calls that function for every combination of model and participant/subject given in models and data, respectively. Also see ddynaViTE, d2DSD, dDDConf, and dRM for more information about the parameters.

Usage

fitRTConfModels(data, models = c("dynaViTE", "2DSD", "PCRMt"),
  nRatings = NULL, fixed = list(sym_thetas = FALSE), restr_tau = Inf,
  grid_search = TRUE, opts = list(), optim_method = "bobyqa",
  logging = FALSE, precision = 3, parallel = TRUE, n.cores = NULL, ...)

Arguments

data

a data.frame where each row is one trial, containing the following variables (column names can be changed by passing additional arguments of the form condition="contrast"):

condition (not necessary; for different levels of stimulus quality, will be transformed to a factor),
rating (discrete confidence judgments, should be given as integer vector; otherwise will be transformed to integer),
rt (giving the reaction times for the decision task),
any two of the following (see Details for more information about the accepted formats):
- stimulus (encoding the stimulus category in a binary choice task),
- response (encoding the decision response),
- correct (encoding whether the decision was correct; values in 0, 1)
sbj, alternatively subject or participant (giving the subject ID; the models given in the second argument are fitted for each subject individually. If logging = TRUE, the ID is also used in the names of files saved with interim results and in logging messages. The output data frame reuses the name of the input column, i.e. the output contains a subject column if the input contains subject instead of sbj).

models

character vector with the models to be fit. Possible elements are "dynWEV", "2DSD", "IRM", "PCRM", "IRMt", and "PCRMt", as well as "dynaViTE", which is part of the default (see Usage).

nRatings

integer. Number of rating categories. If NULL, the maximum of rating and length(unique(rating)) is used. This argument is especially important for data sets in which not all rating categories occur. If given, rating has to be given as a factor or integer.

fixed

list. List of parameter-value pairs for parameters that should not be fitted (see Details).

restr_tau

numerical or Inf or "simult_conf". Used for 2DSD and dynWEV only. Upper bound for tau. Fits will be in the interval (0, restr_tau). If FALSE, tau will be unbounded. For "simult_conf", see the documentation of d2DSD and ddynaViTE.

grid_search

logical. If FALSE, the grid search before the optimization algorithm is omitted. The fitting is then started with a mean parameter set from the default grid. (Default: TRUE)

opts

list. A list for more control options in the optimization routines (depending on the optim_method). See details for more information.

optim_method

character. Determines which optimization function is used for the parameter estimation. Either "bobyqa" (default), "L-BFGS-B" or "Nelder-Mead". "bobyqa" uses a box-constrained optimization with quadratic interpolation (see bobyqa for more information). "L-BFGS-B" is also a box-constrained optimization. For "Nelder-Mead", a transfinite function rescaling is used (i.e. the constrained arguments are suitably transformed to the whole real line).
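The idea of a transfinite rescaling can be illustrated in base R. The sketch below is a generic illustration and not the package's internal code: the toy objective, the bounds, and the helper names to_real and to_bounded are assumptions made for this example only.

```r
# Toy objective with box constraints: a in (0, Inf), z in (0, 1).
# Its minimum lies at a = 2, z = 0.3 (hypothetical values).
objective <- function(par) {
  (par[[1]] - 2)^2 + (par[[2]] - 0.3)^2
}

# Map the constrained parameters to the whole real line ...
to_real <- function(par) c(a = log(par[[1]]), z = qlogis(par[[2]]))
# ... and back to the constrained space.
to_bounded <- function(x) c(a = exp(x[[1]]), z = plogis(x[[2]]))

start <- c(a = 1, z = 0.5)
fit <- optim(
  par = to_real(start),
  fn = function(x) objective(to_bounded(x)),  # optimize on the real line
  method = "Nelder-Mead",
  control = list(reltol = 1e-10, maxit = 2000)
)

# Transform the unconstrained optimum back to the original scale.
estimate <- to_bounded(fit$par)
print(round(estimate, 2))
```

Because the optimizer only ever sees unconstrained values, every candidate it proposes maps back to a valid parameter set.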

logging

logical. If TRUE, a folder 'autosave/fitmodel' is created and messages about the process are written to a logging file and to the console (depending on OS). Additionally, intermediate results are saved in a .RData file with the participant/subject ID in the name.

precision

numeric. Precision of calculation for the density functions (see ddynaViTE and dPCRM for more information).

parallel

"models", "single", "both" or FALSE. If FALSE, no parallelization is used in the fitting process. If "models", the fitting process is parallelized over participants and models (i.e. over the calls to the fitting function). If "single", parallelization is used within each fitting process (over the initial grid search and the optimization processes for different start points, but see fitRTConf). If "both", parallelization is done hierarchically. For a small number of models and participants, "single" or "both" is preferable. Otherwise, you may use "models".

n.cores

integer vector or NULL. If parallel is "models" or "single", a single integer for the number of cores used for parallelization is required. If parallel is "both", two values are required: the first for the number of parallel model-participant combinations and the second for the number of parallel processes within each fitting procedure (this may be chosen to match the nAttempts value in the opts argument). If NULL (default), the number of available cores minus 1 is used. If NULL and parallel is "both", the cores will be used for model-participant parallelization only.

...

Possibility of giving alternative variable names in data frame (in the form condition = "SOA", or response="pressedKey").

Value

Returns a data frame with one row for each model-participant combination and columns for the fitted parameters as well as additional information about the fit: negLogLik (for the final parameters), k (number of parameters), N (number of data rows), BIC, AICc, and AIC.
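As a reading aid for the fit columns, the information criteria can be recomputed from negLogLik, k and N with their textbook definitions. This is a sketch under the assumption that the output follows these standard formulas, which is not confirmed here; the numbers are made up for illustration.

```r
# Hypothetical fit summary; values chosen only for illustration.
negLogLik <- 1234.5  # negative log-likelihood at the final parameters
k <- 12              # number of fitted parameters
N <- 800             # number of data rows (trials)

# Textbook definitions of the three information criteria.
AIC  <- 2 * negLogLik + 2 * k
BIC  <- 2 * negLogLik + k * log(N)
AICc <- AIC + (2 * k * (k + 1)) / (N - k - 1)

print(c(AIC = AIC, BIC = BIC, AICc = AICc))
```

For all three criteria, smaller values indicate a better trade-off between fit and model complexity.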

Details

The fitting starts with a grid search through an initial grid. The best nAttempts parameter sets are then used as starting points for an optimization with the algorithm chosen in the argument optim_method. The Nelder-Mead algorithm uses the R function optim. The optimization routine is restarted nRestarts times, each time starting from the best parameters of the previous run.
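The overall procedure (grid search, selection of the best starting points, repeated optimization) can be sketched in base R. This is a simplified illustration on a toy objective, not the package's implementation: neg_loglik, the grid, the bounds, and the names n_attempts and n_restarts are assumptions standing in for the real likelihood and for the nAttempts and nRestarts options.

```r
# Toy negative log-likelihood with two parameters (hypothetical).
neg_loglik <- function(par) (par[1] - 1.5)^2 + 2 * (par[2] - 0.4)^2

n_attempts <- 3  # plays the role of nAttempts
n_restarts <- 2  # plays the role of nRestarts

# Step 1: evaluate the objective on a coarse initial grid.
grid <- expand.grid(a = seq(0.5, 3, by = 0.5), z = seq(0.1, 0.9, by = 0.2))
grid$value <- apply(grid[, c("a", "z")], 1, neg_loglik)

# Step 2: keep the best n_attempts grid points as starting values.
starts <- grid[order(grid$value)[seq_len(n_attempts)], c("a", "z")]

# Step 3: optimize from each start; every further run begins at the
# optimum found by the previous run.
fits <- lapply(seq_len(nrow(starts)), function(i) {
  par <- unlist(starts[i, ])
  for (r in seq_len(n_restarts)) {
    res <- optim(par, neg_loglik, method = "L-BFGS-B",
                 lower = c(0.01, 0.01), upper = c(10, 0.99))
    par <- res$par
  }
  res
})

# Step 4: keep the attempt with the lowest objective value.
values <- vapply(fits, function(f) f$value, numeric(1))
best <- fits[[which.min(values)]]
print(round(best$par, 3))
```

Starting from several grid points guards against local minima, and the restarts give the optimizer a chance to continue after stopping early.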

stimulus, response and correct. Two of these columns must be given in data. If all three are given, correct will have no effect (and will not be checked!). stimulus can always be given in numerical format with values -1 and 1. response can always be given as a character vector with "lower" and "upper" as values. correct must always be given as a 0-1-vector. If stimulus is given together with response and they both do not match the above format, they need to have the same values (or levels, if they are factors). If only one of stimulus and response is given in any other format together with correct, the unique values will be sorted increasingly, and the first value will be encoded as "lower"/-1 and the second as "upper"/+1.
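The sorting convention described above can be mimicked in base R. The sketch below only mirrors the documented rule on made-up data; the data frame trials and the helper encode are hypothetical and not part of the package.

```r
# Hypothetical trial data with stimulus and response in a free format.
trials <- data.frame(
  stimulus = c("left", "right", "right", "left"),
  response = c("left", "left", "right", "right")
)

# Sort the unique values; the first is coded -1, the second +1.
levels_sorted <- sort(unique(trials$stimulus))
encode <- function(x) ifelse(x == levels_sorted[1], -1, 1)

trials$stimulus_num <- encode(trials$stimulus)
trials$response_num <- encode(trials$response)

# A trial is correct when the response matches the stimulus category.
trials$correct <- as.integer(trials$stimulus_num == trials$response_num)
print(trials)
```

Any two of the three columns determine the third, which is why only two of them have to be supplied.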

fixed. Parameters that should not be fitted but kept constant. These will be dropped from the initial grid search but will be present in the output, so that the result contains all parameters needed for prediction. This includes the possibility of symmetric confidence thresholds for both alternatives (sym_thetas=logical). Other examples are z=.5, sv=0, st0=0, sz=0. For race models, setting a='b' (or vice versa) leads to identical upper bounds on the decision processes, which is the equivalent of z=.5 in a diffusion process.

opts. A list with numerical values. Possible options are listed below (together with the optimization method they are used for).

nAttempts (all): number of best performing initial parameter sets used for optimization; default: 5
nRestarts (all): number of successive optim routines for each of the starting parameter sets; default: 5
maxfun ('bobyqa'): maximum number of function evaluations; default: 5000
maxit ('Nelder-Mead' and 'L-BFGS-B'): maximum number of iterations; default: 2000
reltol ('Nelder-Mead'): relative tolerance; default: 1e-6
factr ('L-BFGS-B'): tolerance in terms of reduction factor of the objective; default: 1e-10
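A possible opts list for the default optimizer could be assembled as follows. The values are illustrative rather than recommendations, and the variable n_cores is a name introduced only for this sketch; the commented call uses only arguments documented on this page.

```r
# Control options for optim_method = "bobyqa"; values are illustrative.
opts <- list(
  nAttempts = 4,  # optimize from the 4 best grid points
  nRestarts = 2,  # run the optimizer twice per starting point
  maxfun = 2000   # cap on function evaluations (bobyqa only)
)

# With parallel = "both", the second n.cores entry may match nAttempts.
n_cores <- c(2, opts$nAttempts)
str(opts)

# The call would then look like this (not run here, fitting is slow):
# fitRTConfModels(data, models = "dynWEV", opts = opts,
#                 optim_method = "bobyqa", parallel = "both",
#                 n.cores = n_cores)
```

Options that do not apply to the chosen optim_method (for example reltol with "bobyqa") are listed with their method in the overview above and can simply be left out.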

References

Hellmann, S., Zehetleitner, M., & Rausch, M. (2023). Simultaneous modeling of choice, confidence and response time in visual perception. Psychological Review. Epub ahead of print, 2023 Mar 13. doi: 10.1037/rev0000411. PMID: 36913292.

Author

Sebastian Hellmann, sebastian.hellmann@ku.de

Examples

# 1. Generate data from two artificial participants
# Get random drift direction (i.e. stimulus category) and
# stimulus discriminability (two steps: hard, easy)
stimulus <- sample(c(-1, 1), 400, replace=TRUE)
discriminability <- sample(c(1, 2), 400, replace=TRUE)

# generate data for participant 1
data <- rdynaViTE(400, a=2, v=stimulus*discriminability*0.5,
             t0=0.2, z=0.5, sz=0.1, sv=0.1, st0=0,  tau=4, s=1, w=0.3)
# discretize confidence ratings (only 2 steps: unsure vs. sure)
data$rating <- as.numeric(cut(data$conf, breaks = c(-Inf, 1, Inf), include.lowest = TRUE))
data$participant <- 1
data$stimulus <- stimulus
data$discriminability <- discriminability
# generate data for participant 2
data2 <- rdynaViTE(400, a=2.5, v=stimulus*discriminability*0.7,
             t0=0.1, z=0.7, sz=0, sv=0.2, st0=0,  tau=2, s=1, w=0.5)
# discretize the confidence of participant 2 (data2$conf, not data$conf)
data2$rating <- as.numeric(cut(data2$conf, breaks = c(-Inf, 0.3, Inf), include.lowest = TRUE))
data2$participant <- 2
data2$stimulus <- stimulus
data2$discriminability <- discriminability

# bind data from participants
data <- rbind(data, data2)
data <- data[data$response!=0, ] # drop unfinished decision processes
data <- data[,-3] # drop conf measure (unobservable variable)
head(data)
#>     rt response rating participant stimulus discriminability
#> 1 1.31        1      1           1        1                1
#> 2 1.84       -1      1           1       -1                1
#> 3 5.16       -1      2           1        1                1
#> 4 1.20       -1      2           1       -1                1
#> 5 0.46        1      2           1        1                2
#> 6 1.77        1      2           1       -1                1


# 2. Use fitting function
if (FALSE) { # \dontrun{
  # Fitting takes very long to run and uses multiple (6) cores with this
  # call:
  fitRTConfModels(data, models=c("dynWEV", "PCRM"), nRatings = 2,
                logging=FALSE, parallel="both",
                n.cores = c(2,3), # fit two participant-model combinations in parallel
                condition="discriminability")# tell which column is "condition"
} # }