Configuration Reference

LinT (single-modality) is configured with `LinTConfig`, and LinMulT (multimodal) with `LinMulTConfig`. Each output head is described by a `HeadConfig`.

```python
from linmult import LinT, LinTConfig, LinMulT, LinMulTConfig, HeadConfig

# Single-modality (LinT)
cfg = LinTConfig(
    input_feature_dim=25,
    heads=[HeadConfig(type="simple", output_dim=3, name="emotion")],
)
model = LinT(cfg)

# Multimodal (LinMulT)
cfg = LinMulTConfig(
    input_feature_dim=[25, 35],
    heads=[HeadConfig(type="simple", output_dim=3, name="sentiment")],
)
model = LinMulT(cfg)

# From a dict (e.g. loaded from YAML)
cfg = LinTConfig.from_dict({"input_feature_dim": 25, ...})
cfg = LinMulTConfig.from_dict({"input_feature_dim": [25, 35], ...})

# From a YAML file
cfg = LinTConfig.from_yaml("lint.yaml")
cfg = LinMulTConfig.from_yaml("linmult.yaml")
```
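The dict passed to `from_dict` mirrors the YAML schema shown in the examples below. A minimal sketch, using only keys that appear in this reference (the `heads` entries carry the same fields as `HeadConfig`):

```python
# Sketch of a complete config dict for LinTConfig.from_dict;
# keys are taken from the YAML examples in this reference.
config = {
    "input_feature_dim": 25,
    "heads": [{"name": "emotion", "type": "simple", "output_dim": 3}],
    "d_model": 40,
    "num_heads": 8,
    "attention_type": "linear",
}
# cfg = LinTConfig.from_dict(config)  # requires the linmult package
```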

Attention variants

| `attention_type` | Algorithm | Complexity |
|---|---|---|
| `linear` | Linear attention (Katharopoulos et al., ICML 2020) | O(N·D²) |
| `performer` | FAVOR+ (Choromanski et al., ICLR 2021) | O(N·r·D) |
| `flash` | Gated Attention Unit (Hua et al., ICML 2022) | O(N·s) |
| `bigbird` | BigBird sparse attention | O(N·√N) |
| `softmax` | Scaled dot-product attention | O(N²) |
| `mha` | `nn.MultiheadAttention` | O(N²) |
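To see where the O(N·D²) bound for `linear` comes from: applying a kernel feature map φ to queries and keys lets the matrix product be reassociated so the N×N attention matrix is never formed. A minimal NumPy sketch using the elu(x)+1 feature map from Katharopoulos et al. (this is an illustration of the algorithm, not LinMulT's implementation):

```python
import numpy as np

def phi(x):
    # elu(x) + 1 feature map (Katharopoulos et al., 2020); always positive
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
N, D = 256, 16                        # sequence length, head dimension
Q, K, V = (rng.standard_normal((N, D)) for _ in range(3))

qp, kp = phi(Q), phi(K)

# Linear order: phi(Q) @ (phi(K)^T V) -- costs O(N * D^2)
kv = kp.T @ V                         # (D, D) summary of keys and values
z = qp @ kp.sum(axis=0)               # (N,) per-row normalisation
out_linear = (qp @ kv) / z[:, None]

# Quadratic order, for comparison: (phi(Q) phi(K)^T) @ V -- costs O(N^2 * D)
attn = qp @ kp.T                      # the (N, N) matrix linear attention avoids
out_quadratic = (attn @ V) / attn.sum(axis=1, keepdims=True)
```

Both orderings compute the same result (up to floating-point error); only the associativity, and hence the complexity, differs.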


Example YAML (LinMulT)

```yaml
input_feature_dim: [25, 41, 768]

heads:
  - name: valence
    type: sequence_aggregation
    output_dim: 7
    norm: bn
    pooling: gap
  - name: arousal
    type: sequence
    output_dim: 2

d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear

time_dim_reducer: gap

add_module_unimodal_sat: false
add_module_multimodal_signal: true
tam_aligner: amp
tam_time_dim: 300
mms_num_layers: 6

dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```

Example YAML (LinT)

```yaml
input_feature_dim: 25

heads:
  - name: emotion
    type: sequence_aggregation
    output_dim: 7

d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear

time_dim_reducer: attentionpool

dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```