Create the settings for a transformer model

setTransformer(
  numBlocks = 3,
  dimToken = 96,
  dimOut = 1,
  numHeads = 8,
  attDropout = 0.25,
  ffnDropout = 0.25,
  resDropout = 0,
  dimHidden = 512,
  dimHiddenRatio = NULL,
  estimatorSettings = setEstimator(weightDecay = 1e-06, batchSize = 1024, epochs = 10,
    seed = NULL),
  hyperParamSearch = "random",
  randomSample = 1,
  randomSampleSeed = NULL
)

Arguments

numBlocks

number of transformer blocks

dimToken

dimension of each token (embedding size)

dimOut

dimension of the output, usually 1 for binary classification problems
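With dimOut = 1 the network produces a single logit per patient. That is enough for a binary outcome, because the predicted probability can be recovered with the logistic sigmoid. Binary cross-entropy is the typical training loss for this setup, but the loss actually used is set elsewhere, not by this argument. The base R sketch below illustrates this. The helper names are hypothetical and are not part of the package.

```r
# Hypothetical helpers illustrating a single-logit (dimOut = 1) binary output.

# Logistic sigmoid: maps a real-valued logit to a probability in (0, 1).
sigmoid <- function(logit) 1 / (1 + exp(-logit))

# Binary cross-entropy computed directly from logits, using the
# numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
bceWithLogits <- function(logit, label) {
  mean(pmax(logit, 0) - logit * label + log1p(exp(-abs(logit))))
}

logits <- c(-2, 0, 3)   # one output value per patient
labels <- c(0, 0, 1)    # observed binary outcomes
probs <- sigmoid(logits)
loss <- bceWithLogits(logits, labels)
```

Working from logits avoids computing log(0) when a probability saturates at 0 or 1.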

numHeads

number of attention heads
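In standard multi-head attention (for example PyTorch's `nn.MultiheadAttention`), the token embedding is split evenly across the heads. For that to work, dimToken must be divisible by numHeads. With the defaults, each head works on 96 / 8 = 12 dimensions. This page does not say whether setTransformer checks the constraint itself. The hypothetical helper below simply makes it explicit.

```r
# Hypothetical helper: per-head dimension in multi-head attention.
# Assumes the standard layout where dimToken is split evenly across heads.
headDim <- function(dimToken, numHeads) {
  if (dimToken %% numHeads != 0) {
    stop(sprintf("dimToken (%s) must be divisible by numHeads (%s)",
                 dimToken, numHeads))
  }
  dimToken %/% numHeads
}

headDim(96, 8)   # default settings, returns 12
```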

attDropout

dropout rate applied to the attention weights

ffnDropout

dropout rate used inside the feedforward block

resDropout

dropout rate applied in the residual connections
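The three dropout rates act at different points in each transformer block. In the FT-Transformer reference design:

- attDropout is applied to the attention weights.
- ffnDropout is applied to the hidden activations inside the feedforward block.
- resDropout is applied to the output of each sublayer just before it is added back to the residual stream.

The base R sketch below shows only the resDropout placement, using inverted dropout and a generic sublayer. The function names, the pre-normalization layout and the placeholder sublayer are all assumptions for illustration. None of it is the package's code.

```r
# Inverted dropout: zero each element with probability `rate` during training
# and rescale survivors by 1 / (1 - rate) so the expected value is unchanged.
# Assumes 0 <= rate < 1.
dropout <- function(x, rate, training = TRUE) {
  if (!training || rate == 0) return(x)
  keep <- runif(length(x)) >= rate
  x * keep / (1 - rate)
}

# Simple layer normalization of one token vector (no learned scale or shift).
layerNorm <- function(x, eps = 1e-5) {
  (x - mean(x)) / sqrt(mean((x - mean(x))^2) + eps)
}

# Hypothetical pre-norm residual step: x + dropout(sublayer(norm(x)), resDropout).
residualStep <- function(x, sublayer, resDropout, training = TRUE) {
  x + dropout(sublayer(layerNorm(x)), resDropout, training)
}

set.seed(42)
token <- rnorm(96)                 # one token of size dimToken = 96
halve <- function(h) 0.5 * h       # placeholder standing in for attention or FFN
out <- residualStep(token, halve, resDropout = 0)   # resDropout = 0 is the default
```

With resDropout = 0, or when not training, the dropout call is the identity. The sublayer output is then added back unchanged.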

dimHidden

dimension of the hidden layer of the feedforward block

dimHiddenRatio

dimension of the feedforward block given as a ratio of dimToken (embedding size), as an alternative to specifying dimHidden directly
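A ratio expresses the feedforward width relative to the token size instead of as an absolute number. The hypothetical helper below illustrates this under two assumptions:

- The width is computed as round(dimToken * dimHiddenRatio) when dimHiddenRatio is set.
- dimHidden is used otherwise.

The exact rounding and precedence inside setTransformer are not documented here, so treat both as assumptions.

```r
# Hypothetical helper: resolve the feedforward hidden width.
# Assumes dimHiddenRatio (when not NULL) takes precedence over dimHidden.
resolveDimHidden <- function(dimToken, dimHidden = 512, dimHiddenRatio = NULL) {
  if (is.null(dimHiddenRatio)) {
    return(dimHidden)
  }
  as.integer(round(dimToken * dimHiddenRatio))
}

resolveDimHidden(dimToken = 96)                        # returns 512
resolveDimHidden(dimToken = 96, dimHiddenRatio = 4/3)  # returns 128
```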

estimatorSettings

estimator settings (for example weight decay, batch size, number of epochs and seed), created with `setEstimator`

hyperParamSearch

type of hyperparameter search to perform; default 'random'

randomSample

number of hyperparameter combinations to sample when hyperParamSearch is 'random'

randomSampleSeed

random seed used when sampling hyperparameter combinations
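With hyperParamSearch = 'random', only a subset of all hyperparameter combinations is evaluated. The sketch below illustrates this under three assumptions:

- The full grid is the Cartesian product of the candidate values.
- randomSample rows are drawn without replacement.
- randomSampleSeed only makes this draw reproducible.

sampleGrid is a hypothetical name, not a package function.

```r
# Hypothetical sketch of random hyperparameter search over a Cartesian grid.
# Note: set.seed() changes the global RNG state of the R session.
sampleGrid <- function(paramValues, randomSample, randomSampleSeed = NULL) {
  grid <- expand.grid(paramValues, KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)
  n <- min(randomSample, nrow(grid))   # never draw more rows than exist
  if (!is.null(randomSampleSeed)) set.seed(randomSampleSeed)
  grid[sample(nrow(grid), n), , drop = FALSE]
}

candidates <- list(numBlocks = c(2, 3, 4), dimToken = c(64, 96), numHeads = c(4, 8))
picked <- sampleGrid(candidates, randomSample = 3, randomSampleSeed = 123)
```

Here the full grid has 3 x 2 x 2 = 12 combinations, of which 3 are evaluated. Repeating the call with the same seed selects the same rows.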

Details

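Based on the transformer model described in Gorishniy et al., "Revisiting Deep Learning Models for Tabular Data" (https://arxiv.org/abs/2106.11959).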
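In the referenced paper (the FT-Transformer), the model works in three steps:

- Each input feature is turned into its own token of size dimToken.
- A learned [CLS] token is appended to these tokens.
- After the transformer blocks, the prediction is read from the [CLS] token.

The base R sketch below shows only the tokenization step, for binary (0/1) covariates. It uses a per-feature weight vector and bias. The random weights and the function name are illustrative assumptions, not the package's implementation.

```r
# Hypothetical sketch of the FT-Transformer feature tokenizer idea:
# each feature j becomes a token T_j = b_j + x_j * W_j of length dimToken.
tokenize <- function(x, W, b, cls) {
  # x: numeric vector of p features; W, b: p x dimToken matrices;
  # cls: numeric vector of length dimToken
  tokens <- b + x * W    # x recycles down each column, scaling row j by x[j]
  rbind(tokens, cls)     # append the [CLS] token as the last row
}

set.seed(1)
p <- 5
dimToken <- 96
W <- matrix(rnorm(p * dimToken), p, dimToken)   # illustrative random weights
b <- matrix(rnorm(p * dimToken), p, dimToken)
cls <- rnorm(dimToken)
x <- c(1, 0, 0, 1, 1)                           # binary covariates for one patient
tokens <- tokenize(x, W, b, cls)
dim(tokens)                                     # 6 96
```

A covariate equal to 0 contributes only its bias row, so every feature still yields a token of the same size.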