linmult.core.config

Typed configuration dataclasses for LinT and LinMulT.

Classes

HeadConfig

Configuration for one output head.

LinTConfig

Configuration for LinT (unimodal linear-complexity transformer).

LinMulTConfig

Configuration for LinMulT (multimodal linear-complexity transformer).

Module Contents

class linmult.core.config.HeadConfig[source]

Configuration for one output head.

Parameters:
  • type (str) – Head type. One of "sequence_aggregation", "sequence", "vector", "simple", "upsample", "downsample".

  • output_dim (int) – Output feature dimensionality.

  • name (str) – Head name used as key in the output dict. Defaults to "" (resolved to the head class name at construction time).

  • norm (str) – Normalisation type for heads that use it. One of "bn", "in". Defaults to "bn".

  • pooling (str | None) – Pooling strategy. One of "gap", "gmp", "attentionpool", or None (no pooling, e.g. for SimpleHead without temporal reduction). Defaults to None (preserve sequence).

  • hidden_dim (int) – Hidden projection size. Defaults to 256.

  • dropout (float) – Dropout probability used inside the head. Defaults to 0.1.

  • input_time_dim (int | None) – Source time dimension for UpsampleHead / DownsampleHead. Defaults to None.

  • output_time_dim (int | None) – Target time dimension for UpsampleHead / DownsampleHead. Defaults to None.

classmethod from_dict(d: dict[str, Any]) → HeadConfig[source]

Construct from a plain dict, ignoring unknown keys.

Parameters:

d (dict) – Dictionary of head configuration values.

Returns:

A new HeadConfig instance.

Return type:

HeadConfig
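The "ignoring unknown keys" behaviour means extra entries in the dict are silently dropped rather than raising. A minimal sketch of that pattern, using a hypothetical stand-in dataclass (`Head` below is illustrative, not the library class):

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Head:
    # Field names and defaults mirror the documented HeadConfig parameters.
    type: str = "simple"
    output_dim: int = 1
    name: str = ""
    hidden_dim: int = 256
    dropout: float = 0.1

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Head":
        # Keep only keys that match declared dataclass fields;
        # everything else is ignored instead of raising TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in known})


cfg = Head.from_dict({"type": "vector", "output_dim": 7, "unknown_key": 123})
print(cfg.type, cfg.output_dim, cfg.hidden_dim)  # vector 7 256
```

Because unspecified fields fall back to their dataclass defaults, partial dicts (e.g. loaded from YAML) construct valid configs.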

class linmult.core.config.LinTConfig[source]

Configuration for LinT (unimodal linear-complexity transformer).

Required

Parameters:

input_feature_dim (int) – Input feature dimensionality.

Identity

Parameters:

name (str) – Model name shown in repr. Defaults to "".

Core

Parameters:
  • d_model (int) – Internal embedding dimension. Defaults to 40.

  • num_heads (int) – Number of attention heads. Defaults to 8.

  • cmt_num_layers (int) – Self-attention encoder depth. Defaults to 6.

Attention

Parameters:
  • attention_type (str) – Attention mechanism. One of "linear" (default), "performer", "flash", "softmax", "bigbird", "mha".

  • flash_query_key_dim (int | None) – Scoring dimension for "flash" (GAU). Defaults to None (computed as max(d_model // 2, 16)).

  • performer_num_random_features (int | None) – Random feature count for "performer". Defaults to None (computed as max(head_dim * 4, 32)).

  • bigbird_block_size (int) – Local block size for "bigbird". Defaults to 64.

  • bigbird_num_global_tokens (int) – Global tokens for "bigbird". Defaults to 16.

  • bigbird_num_random_tokens (int) – Random tokens for "bigbird". Defaults to 10.

Dropout

Parameters:
  • dropout_input (float) – Dropout on input before projection. Defaults to 0.0.

  • dropout_output (float) – FFN-fusion output dropout. Defaults to 0.0.

  • dropout_pe (float) – Dropout after positional encoding. Defaults to 0.0.

  • dropout_ffn (float) – Dropout in transformer FFN. Defaults to 0.1.

  • dropout_attention (float) – Attention-weight dropout. Defaults to 0.0.

TRM

Parameters:

time_dim_reducer (str | None) – Collapse (B, T, F) → (B, F) before heads. One of "attentionpool", "gap", "gmp", "last", or None (no reduction). Defaults to None.

Optional modules

Parameters:

add_module_ffn_fusion (bool) – FFN + residual block after the encoder. Defaults to False.

Heads

Parameters:

heads (list[HeadConfig | dict]) – Output head configurations. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

Special handling

Parameters:

special_handling (dict[str, Any]) – Modality-specific input handling (e.g. weighted-sum of transformer layers). Defaults to {}.

__post_init__() → None[source]

Coerce head dicts to HeadConfig instances.

classmethod from_dict(d: dict[str, Any]) → LinTConfig[source]

Construct from a plain dict (e.g. loaded from YAML), ignoring unknown keys.

Parameters:

d (dict) – Dictionary of configuration values.

Returns:

A new LinTConfig instance.

Return type:

LinTConfig

classmethod from_yaml(path: str | pathlib.Path) → LinTConfig[source]

Load a LinTConfig from a YAML file.

Parameters:

path (str | Path) – Path to the YAML configuration file.

Returns:

A new LinTConfig instance.

Return type:

LinTConfig
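A minimal YAML file for from_yaml might look like the fragment below. Only input_feature_dim is required; all other values are illustrative and fall back to the documented defaults when omitted:

```yaml
name: "lint_example"
input_feature_dim: 128      # required
d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: "linear"
time_dim_reducer: "gap"
heads:
  - type: "simple"          # plain dicts are coerced to HeadConfig
    output_dim: 2
    pooling: "gap"
```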

build_attention_config() → linmult.core.attention.AttentionConfig[source]

Build an AttentionConfig from this config.

Returns:

Attention configuration ready for use in model construction.

Return type:

AttentionConfig

class linmult.core.config.LinMulTConfig[source]

Configuration for LinMulT (multimodal linear-complexity transformer).

Required

Parameters:

input_feature_dim (list[int]) – Input feature dimensionality per modality. Must have at least 2 entries.

Identity

Parameters:

name (str) – Model name shown in repr. Defaults to "".

Core

Parameters:
  • d_model (int) – Internal embedding dimension. Defaults to 40.

  • num_heads (int) – Number of attention heads. Defaults to 8.

  • cmt_num_layers (int) – Cross-modal transformer (CMT) encoder depth. Defaults to 6.

  • branch_sat_num_layers (int) – Per-branch self-attention encoder depth. Defaults to 6.

Attention

Parameters:
  • attention_type (str) – Attention mechanism. One of "linear" (default), "performer", "flash", "softmax", "bigbird", "mha".

  • flash_query_key_dim (int | None) – Scoring dimension for "flash" (GAU). Defaults to None (computed as max(d_model // 2, 16)).

  • performer_num_random_features (int | None) – Random feature count for "performer". Defaults to None (computed as max(head_dim * 4, 32)).

  • bigbird_block_size (int) – Local block size for "bigbird". Defaults to 64.

  • bigbird_num_global_tokens (int) – Global tokens for "bigbird". Defaults to 16.

  • bigbird_num_random_tokens (int) – Random tokens for "bigbird". Defaults to 10.

Dropout

Parameters:
  • dropout_input (float) – Dropout on input before projection. Defaults to 0.0.

  • dropout_output (float) – FFN-fusion output dropout. Defaults to 0.0.

  • dropout_pe (float) – Dropout after positional encoding. Defaults to 0.0.

  • dropout_ffn (float) – Dropout in transformer FFN. Defaults to 0.1.

  • dropout_attention (float) – Attention-weight dropout. Defaults to 0.0.

  • dropout_tam (float) – Dropout inside the TAM projector. Defaults to 0.1.

Unimodal self-attention (optional)

Parameters:
  • add_module_unimodal_sat (bool) – Per-modality self-attention transformer (SAT) before cross-modal layers. Defaults to False.

  • unimodal_sat_num_layers (int) – Unimodal SAT encoder depth. Defaults to 6.

Multimodal signal via TAM (optional)

Parameters:
  • add_module_multimodal_signal (bool) – Prepend a TAM-fused cross-modal summary to each branch. Requires tam_time_dim. Defaults to False.

  • mms_num_layers (int) – Encoder depth inside the MMS TAM. Defaults to 6.

  • tam_aligner (str | None) – Temporal alignment strategy. One of "aap", "amp", "padding". Required when either TAM module is enabled. Defaults to None.

  • tam_time_dim (int | None) – Target time dimension after TAM alignment. Required when either TAM module is enabled. Defaults to None.

TRM

Parameters:

time_dim_reducer (str | None) – Collapse (B, T, F) → (B, F) before heads. One of "attentionpool", "gap", "gmp", "last", or None (no reduction). Defaults to None.

Fusion (optional)

Parameters:
  • add_module_tam_fusion (bool) – TAM-based fusion after cross-modal branches. Requires tam_time_dim. Defaults to False.

  • fusion_num_layers (int) – Encoder depth inside the TAM fusion module. Defaults to 6.

  • add_module_sat_fusion (bool) – Self-attention transformer on the fused representation. Defaults to False.

  • fusion_sat_num_layers (int) – Fusion SAT encoder depth. Defaults to 6.

  • add_module_ffn_fusion (bool) – FFN + residual block after fusion. Defaults to False.

Heads

Parameters:
  • heads (list[HeadConfig | dict]) – Output head configurations. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

  • auxiliary_heads (list[HeadConfig | dict]) – Per-branch auxiliary head configs. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

Special handling

Parameters:

special_handling (dict[str, Any]) – Modality-specific input handling (e.g. weighted-sum of transformer layers). Defaults to {}.

__post_init__() → None[source]

Coerce head dicts and validate TAM-dependent options.
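The TAM-dependent validation above means that enabling either TAM module without the alignment options should fail at construction time rather than deep inside the forward pass. A sketch of that check, with field names mirroring the documented parameters (the exact error message is illustrative, not the library's):

```python
def validate_tam(add_module_multimodal_signal: bool,
                 add_module_tam_fusion: bool,
                 tam_aligner: "str | None",
                 tam_time_dim: "int | None") -> None:
    """Raise if a TAM module is enabled without its required options."""
    tam_enabled = add_module_multimodal_signal or add_module_tam_fusion
    if tam_enabled and (tam_aligner is None or tam_time_dim is None):
        raise ValueError(
            "tam_aligner and tam_time_dim are required "
            "when a TAM module is enabled"
        )


# Fine: no TAM module requested, alignment options may stay None.
validate_tam(False, False, None, None)

# Fails fast: TAM fusion enabled but tam_aligner/tam_time_dim missing.
try:
    validate_tam(False, True, None, None)
except ValueError as e:
    print(e)
```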

classmethod from_dict(d: dict[str, Any]) → LinMulTConfig[source]

Construct from a plain dict (e.g. loaded from YAML), ignoring unknown keys.

Parameters:

d (dict) – Dictionary of configuration values.

Returns:

A new LinMulTConfig instance.

Return type:

LinMulTConfig

classmethod from_yaml(path: str | pathlib.Path) → LinMulTConfig[source]

Load a LinMulTConfig from a YAML file.

Parameters:

path (str | Path) – Path to the YAML configuration file.

Returns:

A new LinMulTConfig instance.

Return type:

LinMulTConfig
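A YAML file for a two-modality LinMulT with TAM fusion enabled might look like the fragment below. Values are illustrative; note that input_feature_dim is a list with one entry per modality (at least 2), and that enabling a TAM module makes tam_aligner and tam_time_dim mandatory:

```yaml
name: "linmult_example"
input_feature_dim: [128, 64]   # required, one entry per modality
d_model: 40
num_heads: 8
cmt_num_layers: 6
branch_sat_num_layers: 6
attention_type: "linear"
add_module_tam_fusion: true
tam_aligner: "aap"             # required: a TAM module is enabled
tam_time_dim: 100              # required: a TAM module is enabled
time_dim_reducer: "attentionpool"
heads:
  - type: "simple"             # plain dicts are coerced to HeadConfig
    output_dim: 3
    pooling: "gap"
```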

build_attention_config() → linmult.core.attention.AttentionConfig[source]

Build an AttentionConfig from this config.

Returns:

Attention configuration ready for use in model construction.

Return type:

AttentionConfig