Configuration Reference

LinT (single-modality) is configured with `LinTConfig`, and LinMulT (multimodal) with `LinMulTConfig`. Each output head is described by a `HeadConfig`.

```python
from linmult import LinT, LinTConfig, LinMulT, LinMulTConfig, HeadConfig

# Single-modality (LinT)
cfg = LinTConfig(
    input_feature_dim=25,
    heads=[HeadConfig(type="simple", output_dim=3, name="emotion")],
)
model = LinT(cfg)

# Multimodal (LinMulT)
cfg = LinMulTConfig(
    input_feature_dim=[25, 35],
    heads=[HeadConfig(type="simple", output_dim=3, name="sentiment")],
)
model = LinMulT(cfg)

# From a dict (e.g. loaded from YAML)
cfg = LinTConfig.from_dict({"input_feature_dim": 25, ...})
cfg = LinMulTConfig.from_dict({"input_feature_dim": [25, 35], ...})

# From a YAML file
cfg = LinTConfig.from_yaml("lint.yaml")
cfg = LinMulTConfig.from_yaml("linmult.yaml")
```
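The dict passed to `from_dict` mirrors the YAML schema shown in the examples below. A minimal sketch, using only keys that appear in this reference (the `heads` entries carry the same fields as `HeadConfig`):

```python
# Sketch of a complete config dict for LinTConfig.from_dict;
# keys are taken from the YAML examples in this reference.
config = {
    "input_feature_dim": 25,
    "heads": [{"name": "emotion", "type": "simple", "output_dim": 3}],
    "d_model": 40,
    "num_heads": 8,
    "attention_type": "linear",
}
# cfg = LinTConfig.from_dict(config)  # requires the linmult package
```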

Attention variants

| `attention_type` | Algorithm | Complexity |
|---|---|---|
| `linear` | Linear attention (Katharopoulos et al., ICML 2020) | O(N·D²) |
| `performer` | FAVOR+ (Choromanski et al., ICLR 2021) | O(N·r·D) |
| `flash` | Gated Attention Unit (Hua et al., ICML 2022) | O(N·s) |
| `bigbird` | BigBird sparse attention | O(N·√N) |
| `softmax` | Scaled dot-product attention | O(N²) |
| `mha` | `nn.MultiheadAttention` | O(N²) |
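To see where the O(N·D²) bound for `linear` comes from: applying a kernel feature map φ to queries and keys lets the matrix product be reassociated so the N×N attention matrix is never formed. A minimal NumPy sketch using the elu(x)+1 feature map from Katharopoulos et al. (this is an illustration of the algorithm, not LinMulT's implementation):

```python
import numpy as np

def phi(x):
    # elu(x) + 1 feature map (Katharopoulos et al., 2020); always positive
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
N, D = 256, 16                        # sequence length, head dimension
Q, K, V = (rng.standard_normal((N, D)) for _ in range(3))

qp, kp = phi(Q), phi(K)

# Linear order: phi(Q) @ (phi(K)^T V) -- costs O(N * D^2)
kv = kp.T @ V                         # (D, D) summary of keys and values
z = qp @ kp.sum(axis=0)               # (N,) per-row normalisation
out_linear = (qp @ kv) / z[:, None]

# Quadratic order, for comparison: (phi(Q) phi(K)^T) @ V -- costs O(N^2 * D)
attn = qp @ kp.T                      # the (N, N) matrix linear attention avoids
out_quadratic = (attn @ V) / attn.sum(axis=1, keepdims=True)
```

Both orderings compute the same result (up to floating-point error); only the associativity, and hence the complexity, differs.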


Example YAML (LinMulT)

```yaml
input_feature_dim: [25, 41, 768]

heads:
  - name: valence
    type: sequence_aggregation
    output_dim: 7
    norm: bn
    pooling: gap
  - name: arousal
    type: sequence
    output_dim: 2

d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear

time_dim_reducer: gap

add_module_unimodal_sat: false
add_module_multimodal_signal: true
tam_aligner: amp
tam_time_dim: 300
mms_num_layers: 6

dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```

Example YAML (LinT)

```yaml
input_feature_dim: 25

heads:
  - name: emotion
    type: sequence_aggregation
    output_dim: 7

d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear

time_dim_reducer: attentionpool

dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```