A transformer model
setTransformer(
  numBlocks = 3,
  dimToken = 192,
  dimOut = 1,
  numHeads = 8,
  attDropout = 0.2,
  ffnDropout = 0.1,
  resDropout = 0,
  dimHidden = 256,
  dimHiddenRatio = NULL,
  estimatorSettings = setEstimator(
    weightDecay = 1e-06,
    batchSize = 1024,
    epochs = 10,
    seed = NULL
  ),
  hyperParamSearch = "random",
  randomSample = 1,
  randomSampleSeed = NULL
)
`numBlocks`: number of transformer blocks
`dimToken`: dimension of each token (embedding size)
`dimOut`: dimension of the output, usually 1 for binary problems
`numHeads`: number of attention heads
`attDropout`: dropout to use in the attention layers
`ffnDropout`: dropout to use in the feedforward block
`resDropout`: dropout to use in the residual connections
`dimHidden`: dimension of the feedforward block
`dimHiddenRatio`: dimension of the feedforward block as a ratio of `dimToken` (embedding size)
`estimatorSettings`: estimator settings created with `setEstimator`
`hyperParamSearch`: what kind of hyperparameter search to do, default "random"
`randomSample`: how many samples to use in the hyperparameter search if random
`randomSampleSeed`: random seed used to sample hyperparameter combinations
Based on the FT-Transformer from Gorishniy et al., "Revisiting Deep Learning Models for Tabular Data" (https://arxiv.org/abs/2106.11959).
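A minimal usage sketch follows; it mirrors the defaults in the usage block above, fixing the estimator seed so results are reproducible. The package name (DeepPatientLevelPrediction) and the downstream `runPlp()` call are assumptions from context, not stated on this page.

library(DeepPatientLevelPrediction)

# Configure the transformer with the documented defaults; dimToken (192)
# stays divisible by numHeads (8) so each head gets an equal share of the
# embedding dimension.
modelSettings <- setTransformer(
  numBlocks = 3,
  dimToken = 192,
  dimOut = 1,
  numHeads = 8,
  attDropout = 0.2,
  ffnDropout = 0.1,
  resDropout = 0,
  dimHidden = 256,    # fixed feedforward size; dimHiddenRatio instead scales it off dimToken
  estimatorSettings = setEstimator(
    weightDecay = 1e-06,
    batchSize = 1024,
    epochs = 10,
    seed = 42         # assumption: a fixed seed in place of the NULL default
  ),
  hyperParamSearch = "random",
  randomSample = 1
)

# The settings object would then be handed to the prediction pipeline, e.g.
# PatientLevelPrediction::runPlp(..., modelSettings = modelSettings)  # assumed workflow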