linmult.core.config

Typed configuration dataclasses for LinT and LinMulT.

Classes

HeadConfig

Configuration for one output head.

LinTConfig

Configuration for LinT (unimodal linear-complexity transformer).

LinMulTConfig

Configuration for LinMulT (multimodal linear-complexity transformer).

Module Contents

class linmult.core.config.HeadConfig[source]

Configuration for one output head.

Parameters:
  • type (str) – Head type. One of "sequence_aggregation", "sequence", "vector", "simple", "upsample", "downsample".

  • output_dim (int) – Output feature dimensionality.

  • name (str) – Head name used as key in the output dict. Defaults to "" (resolved to the head class name at construction time).

  • norm (str) – Normalisation type for heads that use it. One of "bn", "in". Defaults to "bn".

  • pooling (str | None) – Pooling strategy. One of "gap", "gmp", "attentionpool", or None (no pooling, e.g. for SimpleHead without temporal reduction). Defaults to None (preserve sequence).

  • hidden_dim (int) – Hidden projection size. Defaults to 256.

  • dropout (float) – Dropout probability used inside the head. Defaults to 0.1.

  • input_time_dim (int | None) – Source time dimension for UpsampleHead / DownsampleHead. Defaults to None.

  • output_time_dim (int | None) – Target time dimension for UpsampleHead / DownsampleHead. Defaults to None.

classmethod from_dict(d: dict[str, Any]) → HeadConfig[source]

Construct from a plain dict, ignoring unknown keys.

Parameters:

d (dict) – Dictionary of head configuration values.

Returns:

A new HeadConfig instance.

Return type:

HeadConfig
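The "ignoring unknown keys" behaviour means extra entries in the dict are silently dropped rather than raising. A minimal sketch of that pattern, using a hypothetical stand-in dataclass (`Head` below is illustrative, not the library class):

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Head:
    # Field names and defaults mirror the documented HeadConfig parameters.
    type: str = "simple"
    output_dim: int = 1
    name: str = ""
    hidden_dim: int = 256
    dropout: float = 0.1

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Head":
        # Keep only keys that match declared dataclass fields;
        # everything else is ignored instead of raising TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in known})


cfg = Head.from_dict({"type": "vector", "output_dim": 7, "unknown_key": 123})
print(cfg.type, cfg.output_dim, cfg.hidden_dim)  # vector 7 256
```

Because unspecified fields fall back to their dataclass defaults, partial dicts (e.g. loaded from YAML) construct valid configs.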

class linmult.core.config.LinTConfig[source]

Configuration for LinT (unimodal linear-complexity transformer).

Required

Parameters:

input_feature_dim (int) – Input feature dimensionality.

Identity

Parameters:

name (str) – Model name shown in repr. Defaults to "".

Core

Parameters:
  • d_model (int) – Internal embedding dimension. Defaults to 40.

  • num_heads (int) – Number of attention heads. Defaults to 8.

  • cmt_num_layers (int) – Self-attention encoder depth. Defaults to 6.

Attention

Parameters:
  • attention_type (str) – Attention mechanism. One of "linear" (default), "performer", "flash", "softmax", "bigbird", "mha".

  • flash_query_key_dim (int | None) – Scoring dimension for "flash" (GAU). Defaults to None (computed as max(d_model // 2, 16)).

  • performer_num_random_features (int | None) – Random feature count for "performer". Defaults to None (computed as max(head_dim * 4, 32)).

  • bigbird_block_size (int) – Local block size for "bigbird". Defaults to 64.

  • bigbird_num_global_tokens (int) – Global tokens for "bigbird". Defaults to 16.

  • bigbird_num_random_tokens (int) – Random tokens for "bigbird". Defaults to 10.

Dropout

Parameters:
  • dropout_input (float) – Dropout on input before projection. Defaults to 0.0.

  • dropout_output (float) – FFN-fusion output dropout. Defaults to 0.0.

  • dropout_pe (float) – Dropout after positional encoding. Defaults to 0.0.

  • dropout_ffn (float) – Dropout in transformer FFN. Defaults to 0.1.

  • dropout_attention (float) – Attention-weight dropout. Defaults to 0.0.

TRM

Parameters:

time_dim_reducer (str | None) – Collapse (B, T, F) → (B, F) before heads. One of "attentionpool", "gap", "gmp", "last", or None (no reduction). Defaults to None.

Optional modules

Parameters:

add_module_ffn_fusion (bool) – FFN + residual block after the encoder. Defaults to False.

Heads

Parameters:

heads (list[HeadConfig | dict]) – Output head configurations. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

Special handling

Parameters:

special_handling (dict[str, Any]) – Modality-specific input handling (e.g. weighted-sum of transformer layers). Defaults to {}.

__post_init__() → None[source]

Coerce head dicts to HeadConfig instances.

classmethod from_dict(d: dict[str, Any]) → LinTConfig[source]

Construct from a plain dict (e.g. loaded from YAML), ignoring unknown keys.

Parameters:

d (dict) – Dictionary of configuration values.

Returns:

A new LinTConfig instance.

Return type:

LinTConfig

classmethod from_yaml(path: str | pathlib.Path) → LinTConfig[source]

Load a LinTConfig from a YAML file.

Parameters:

path (str | Path) – Path to the YAML configuration file.

Returns:

A new LinTConfig instance.

Return type:

LinTConfig
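A minimal YAML file for from_yaml might look like the fragment below. Only input_feature_dim is required; all other values are illustrative and fall back to the documented defaults when omitted:

```yaml
name: "lint_example"
input_feature_dim: 128      # required
d_model: 40
num_heads: 8
cmt_num_layers: 6
attention_type: "linear"
time_dim_reducer: "gap"
heads:
  - type: "simple"          # plain dicts are coerced to HeadConfig
    output_dim: 2
    pooling: "gap"
```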

build_attention_config() → linmult.core.attention.AttentionConfig[source]

Build an AttentionConfig from this config.

Returns:

Attention configuration ready for use in model construction.

Return type:

AttentionConfig

class linmult.core.config.LinMulTConfig[source]

Configuration for LinMulT (multimodal linear-complexity transformer).

Required

Parameters:

input_feature_dim (list[int]) – Input feature dimensionality per modality. Must have at least 2 entries.

Identity

Parameters:

name (str) – Model name shown in repr. Defaults to "".

Core

Parameters:
  • d_model (int) – Internal embedding dimension. Defaults to 40.

  • num_heads (int) – Number of attention heads. Defaults to 8.

  • cmt_num_layers (int) – Cross-modal transformer (CMT) encoder depth. Defaults to 6.

  • branch_sat_num_layers (int) – Per-branch self-attention encoder depth. Defaults to 6.

Attention

Parameters:
  • attention_type (str) – Attention mechanism. One of "linear" (default), "performer", "flash", "softmax", "bigbird", "mha".

  • flash_query_key_dim (int | None) – Scoring dimension for "flash" (GAU). Defaults to None (computed as max(d_model // 2, 16)).

  • performer_num_random_features (int | None) – Random feature count for "performer". Defaults to None (computed as max(head_dim * 4, 32)).

  • bigbird_block_size (int) – Local block size for "bigbird". Defaults to 64.

  • bigbird_num_global_tokens (int) – Global tokens for "bigbird". Defaults to 16.

  • bigbird_num_random_tokens (int) – Random tokens for "bigbird". Defaults to 10.

Dropout

Parameters:
  • dropout_input (float) – Dropout on input before projection. Defaults to 0.0.

  • dropout_output (float) – FFN-fusion output dropout. Defaults to 0.0.

  • dropout_pe (float) – Dropout after positional encoding. Defaults to 0.0.

  • dropout_ffn (float) – Dropout in transformer FFN. Defaults to 0.1.

  • dropout_attention (float) – Attention-weight dropout. Defaults to 0.0.

  • dropout_tam (float) – Dropout inside the TAM projector. Defaults to 0.1.

Unimodal self-attention (optional)

Parameters:
  • add_module_unimodal_sat (bool) – Per-modality self-attention transformer (SAT) before cross-modal layers. Defaults to False.

  • unimodal_sat_num_layers (int) – Unimodal SAT encoder depth. Defaults to 6.

Multimodal signal via TAM (optional)

Parameters:
  • add_module_multimodal_signal (bool) – Prepend a TAM-fused cross-modal summary to each branch. Requires tam_time_dim. Defaults to False.

  • mms_num_layers (int) – Encoder depth inside the MMS TAM. Defaults to 6.

  • tam_aligner (str | None) – Temporal alignment strategy. One of "aap", "amp", "padding". Required when either TAM module is enabled. Defaults to None.

  • tam_time_dim (int | None) – Target time dimension after TAM alignment. Required when either TAM module is enabled. Defaults to None.

TRM

Parameters:

time_dim_reducer (str | None) – Collapse (B, T, F) → (B, F) before heads. One of "attentionpool", "gap", "gmp", "last", or None (no reduction). Defaults to None.

Fusion (optional)

Parameters:
  • add_module_tam_fusion (bool) – TAM-based fusion after cross-modal branches. Requires tam_time_dim. Defaults to False.

  • fusion_num_layers (int) – Encoder depth inside the TAM fusion module. Defaults to 6.

  • add_module_sat_fusion (bool) – Self-attention transformer on the fused representation. Defaults to False.

  • fusion_sat_num_layers (int) – Fusion SAT encoder depth. Defaults to 6.

  • add_module_ffn_fusion (bool) – FFN + residual block after fusion. Defaults to False.

Heads

Parameters:
  • heads (list[HeadConfig | dict]) – Output head configurations. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

  • auxiliary_heads (list[HeadConfig | dict]) – Per-branch auxiliary head configs. Plain dicts are automatically coerced to HeadConfig. Defaults to [].

Special handling

Parameters:

special_handling (dict[str, Any]) – Modality-specific input handling (e.g. weighted-sum of transformer layers). Defaults to {}.

__post_init__() → None[source]

Coerce head dicts and validate TAM-dependent options.
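The TAM-dependent validation above means that enabling either TAM module without the alignment options should fail at construction time rather than deep inside the forward pass. A sketch of that check, with field names mirroring the documented parameters (the exact error message is illustrative, not the library's):

```python
def validate_tam(add_module_multimodal_signal: bool,
                 add_module_tam_fusion: bool,
                 tam_aligner: "str | None",
                 tam_time_dim: "int | None") -> None:
    """Raise if a TAM module is enabled without its required options."""
    tam_enabled = add_module_multimodal_signal or add_module_tam_fusion
    if tam_enabled and (tam_aligner is None or tam_time_dim is None):
        raise ValueError(
            "tam_aligner and tam_time_dim are required "
            "when a TAM module is enabled"
        )


# Fine: no TAM module requested, alignment options may stay None.
validate_tam(False, False, None, None)

# Fails fast: TAM fusion enabled but tam_aligner/tam_time_dim missing.
try:
    validate_tam(False, True, None, None)
except ValueError as e:
    print(e)
```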

classmethod from_dict(d: dict[str, Any]) → LinMulTConfig[source]

Construct from a plain dict (e.g. loaded from YAML), ignoring unknown keys.

Parameters:

d (dict) – Dictionary of configuration values.

Returns:

A new LinMulTConfig instance.

Return type:

LinMulTConfig

classmethod from_yaml(path: str | pathlib.Path) → LinMulTConfig[source]

Load a LinMulTConfig from a YAML file.

Parameters:

path (str | Path) – Path to the YAML configuration file.

Returns:

A new LinMulTConfig instance.

Return type:

LinMulTConfig
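A YAML file for a two-modality LinMulT with TAM fusion enabled might look like the fragment below. Values are illustrative; note that input_feature_dim is a list with one entry per modality (at least 2), and that enabling a TAM module makes tam_aligner and tam_time_dim mandatory:

```yaml
name: "linmult_example"
input_feature_dim: [128, 64]   # required, one entry per modality
d_model: 40
num_heads: 8
cmt_num_layers: 6
branch_sat_num_layers: 6
attention_type: "linear"
add_module_tam_fusion: true
tam_aligner: "aap"             # required: a TAM module is enabled
tam_time_dim: 100              # required: a TAM module is enabled
time_dim_reducer: "attentionpool"
heads:
  - type: "simple"             # plain dicts are coerced to HeadConfig
    output_dim: 3
    pooling: "gap"
```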

build_attention_config() → linmult.core.attention.AttentionConfig[source]

Build an AttentionConfig from this config.

Returns:

Attention configuration ready for use in model construction.

Return type:

AttentionConfig