Configuration Reference¶
`LinT` is configured with a `LinTConfig` for single-modality models, and `LinMulT` with a `LinMulTConfig` for multimodal models. Each output head is described by a `HeadConfig`.
```python
from linmult import LinT, LinTConfig, LinMulT, LinMulTConfig, HeadConfig

# Single-modality (LinT)
cfg = LinTConfig(
    input_feature_dim=25,
    heads=[HeadConfig(type="simple", output_dim=3, name="emotion")],
)
model = LinT(cfg)

# Multimodal (LinMulT)
cfg = LinMulTConfig(
    input_feature_dim=[25, 35],
    heads=[HeadConfig(type="simple", output_dim=3, name="sentiment")],
)
model = LinMulT(cfg)

# From a dict (e.g. loaded from YAML)
cfg = LinTConfig.from_dict({"input_feature_dim": 25, ...})
cfg = LinMulTConfig.from_dict({"input_feature_dim": [25, 35], ...})

# From a YAML file
cfg = LinTConfig.from_yaml("lint.yaml")
cfg = LinMulTConfig.from_yaml("linmult.yaml")
```
Attention variants¶

| Algorithm | Complexity |
|---|---|
| Linear attention (Katharopoulos et al., ICML 2020) | O(N·D²) |
| FAVOR+ (Choromanski et al., ICLR 2021) | O(N·r·D) |
| Gated Attention Unit (Hua et al., ICML 2022) | O(N·s) |
| BigBird sparse attention | O(N·√N) |
| Scaled dot-product attention | O(N²) |
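To see where the O(N·D²) of linear attention comes from: replacing softmax with a kernel feature map φ makes attention a chain of matrix products, and associativity lets φ(K)ᵀV (a D×D matrix) be computed once instead of the N×N matrix φ(Q)φ(K)ᵀ. A minimal NumPy sketch of this equivalence, using the elu(x)+1 feature map from Katharopoulos et al. (illustrative only, not the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 128, 16                      # sequence length, head dimension
Q = rng.standard_normal((N, D))
K = rng.standard_normal((N, D))
V = rng.standard_normal((N, D))

def phi(x):
    # elu(x) + 1 feature map: positive, so rows can be normalized
    return np.where(x > 0, x + 1.0, np.exp(x))

# Quadratic form: O(N^2 * D), materializes the N x N attention matrix
A = phi(Q) @ phi(K).T
quadratic = (A / A.sum(axis=1, keepdims=True)) @ V

# Linear form: O(N * D^2), computes K^T V and the normalizer first
KV = phi(K).T @ V                   # (D, D)
Z = phi(K).sum(axis=0)              # (D,)
linear = (phi(Q) @ KV) / (phi(Q) @ Z)[:, None]

assert np.allclose(quadratic, linear)
```

Both forms produce the same output; only the grouping of the products, and hence the cost in N, differs.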
Example YAML (LinMulT)¶

```yaml
input_feature_dim: [25, 41, 768]
heads:
  - name: valence
    type: sequence_aggregation
    output_dim: 7
    norm: bn
    pooling: gap
  - name: arousal
    type: sequence
    output_dim: 2
d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear
time_dim_reducer: gap
add_module_unimodal_sat: false
add_module_multimodal_signal: true
tam_aligner: amp
tam_time_dim: 300
mms_num_layers: 6
dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```
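A YAML file like the one above deserializes to a plain dict with the same keys, which the `from_dict` loaders shown earlier can consume. As an illustration in plain Python (the `validate` helper is hypothetical, not part of linmult; the dict mirrors a subset of the example above):

```python
# Dict equivalent of (part of) the LinMulT YAML example; this is the shape
# yaml.safe_load would produce and a from_dict-style loader would consume.
cfg_dict = {
    "input_feature_dim": [25, 41, 768],
    "heads": [
        {"name": "valence", "type": "sequence_aggregation", "output_dim": 7},
        {"name": "arousal", "type": "sequence", "output_dim": 2},
    ],
    "d_model": 40,
    "num_heads": 8,
}

def validate(cfg):
    # Hypothetical sanity check before building the model config:
    # LinMulT expects one feature dimension per modality, and every
    # head needs at least a name, a type, and an output dimension.
    assert isinstance(cfg["input_feature_dim"], list), "one dim per modality"
    for head in cfg["heads"]:
        missing = {"name", "type", "output_dim"} - head.keys()
        assert not missing, f"head is missing keys: {missing}"
    return True

assert validate(cfg_dict)
```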
Example YAML (LinT)¶

```yaml
input_feature_dim: 25
heads:
  - name: emotion
    type: sequence_aggregation
    output_dim: 7
d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: linear
time_dim_reducer: attentionpool
dropout_input: 0.0
dropout_pe: 0.0
dropout_ffn: 0.1
```