A transformer model
setTransformer(
  numBlocks = 3,
  dimToken = 192,
  dimOut = 1,
  numHeads = 8,
  attDropout = 0.2,
  ffnDropout = 0.1,
  resDropout = 0,
  dimHidden = 256,
  dimHiddenRatio = NULL,
  estimatorSettings = setEstimator(
    weightDecay = 1e-06,
    batchSize = 1024,
    epochs = 10,
    seed = NULL
  ),
  hyperParamSearch = "random",
  randomSample = 1,
  randomSampleSeed = NULL
)
`numBlocks`: number of transformer blocks
`dimToken`: dimension of each token (embedding size)
`dimOut`: dimension of the output, usually 1 for binary problems
`numHeads`: number of attention heads
`attDropout`: dropout to use in the attention layers
`ffnDropout`: dropout to use in the feedforward block
`resDropout`: dropout to use in the residual connections
`dimHidden`: dimension of the feedforward block
`dimHiddenRatio`: dimension of the feedforward block as a ratio of `dimToken` (embedding size)
`estimatorSettings`: estimator settings created with `setEstimator`
`hyperParamSearch`: what kind of hyperparameter search to do, default "random"
`randomSample`: how many samples to use in the hyperparameter search if random
`randomSampleSeed`: random seed used to sample hyperparameter combinations
Based on the FT-Transformer from Gorishniy et al., "Revisiting Deep Learning Models for Tabular Data" (https://arxiv.org/abs/2106.11959).
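A minimal usage sketch follows; it mirrors the defaults in the usage block above, fixing the estimator seed so results are reproducible. The package name (DeepPatientLevelPrediction) and the downstream `runPlp()` call are assumptions from context, not stated on this page.

library(DeepPatientLevelPrediction)

# Configure the transformer with the documented defaults; dimToken (192)
# stays divisible by numHeads (8) so each head gets an equal share of the
# embedding dimension.
modelSettings <- setTransformer(
  numBlocks = 3,
  dimToken = 192,
  dimOut = 1,
  numHeads = 8,
  attDropout = 0.2,
  ffnDropout = 0.1,
  resDropout = 0,
  dimHidden = 256,    # fixed feedforward size; dimHiddenRatio instead scales it off dimToken
  estimatorSettings = setEstimator(
    weightDecay = 1e-06,
    batchSize = 1024,
    epochs = 10,
    seed = 42         # assumption: a fixed seed in place of the NULL default
  ),
  hyperParamSearch = "random",
  randomSample = 1
)

# The settings object would then be handed to the prediction pipeline, e.g.
# PatientLevelPrediction::runPlp(..., modelSettings = modelSettings)  # assumed workflow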