# LinMulT

General-purpose Multimodal Transformer with Linear-Complexity Attention.

Handles variable-length inputs across any number of modalities, supports missing-modality scenarios, and offers six attention variants from O(N²) softmax to O(N·s) gated linear attention — all behind a single config key.
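The O(N·s) claim rests on a standard linear-attention trick that can be illustrated without the library: once the softmax is removed (or replaced by a kernel feature map), attention can be computed as Q(KᵀV) instead of (QKᵀ)V, so the N×N score matrix is never materialized. A minimal pure-Python sketch of that associativity, not LinMulT's implementation:

```python
# Associativity trick behind linear attention: without softmax,
# (Q K^T) V == Q (K^T V), so the small (d x d) product K^T V can be
# formed first, avoiding the N x N attention matrix entirely.
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

random.seed(0)
N, d = 6, 3  # sequence length and feature dim, kept tiny for illustration
Q = [[random.random() for _ in range(d)] for _ in range(N)]
K = [[random.random() for _ in range(d)] for _ in range(N)]
V = [[random.random() for _ in range(d)] for _ in range(N)]

quadratic = matmul(matmul(Q, transpose(K)), V)  # builds an N x N matrix
linear = matmul(Q, matmul(transpose(K), V))     # builds only a d x d matrix

assert all(abs(x - y) < 1e-9
           for r1, r2 in zip(quadratic, linear)
           for x, y in zip(r1, r2))
```

Real linear-attention variants apply a feature map φ to Q and K so that φ(Q)(φ(K)ᵀV) approximates softmax attention, and gated variants add a learned multiplicative gate; the reordering above is what makes them O(N·s) rather than O(N²).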

## Installation

```bash
pip install linmult
```

## Quick Start

Single modality (LinT):

```python
import torch
from linmult import LinT, LinTConfig

x = torch.rand(8, 1500, 25)  # (batch, time, features)
model = LinT(LinTConfig.from_dict({
    "input_feature_dim": 25,
    "heads": [{"name": "out", "type": "simple", "output_dim": 5}],
    "time_dim_reducer": "attentionpool",
}))
result = model(x)  # {"out": (8, 5)}
```
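The `time_dim_reducer` collapses the time axis before each head, turning `(batch, time, features)` into `(batch, features)`. `"attentionpool"` suggests learned attention pooling: each time step is scored against a query vector, and the output is the softmax-weighted average over time. A rough stdlib sketch of that idea (illustrative only, not the library's code):

```python
# Sketch of attention pooling: score each time step with a query vector,
# softmax the scores over time, and return the weighted sum of the steps.
import math
import random

def attention_pool(seq, query):
    """Collapse a (time, features) sequence to a (features,) vector."""
    scores = [sum(q * x for q, x in zip(query, step)) for step in seq]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]       # non-negative, sum to 1
    d = len(seq[0])
    return [sum(w * step[j] for w, step in zip(weights, seq)) for j in range(d)]

random.seed(0)
T, d = 5, 4
seq = [[random.random() for _ in range(d)] for _ in range(T)]
query = [random.random() for _ in range(d)]
pooled = attention_pool(seq, query)  # length-d vector; time axis removed
```

Because the weights sum to 1, each pooled feature is a convex combination of that feature's values over time, unlike `"gap"` (global average pooling), which weights every time step equally.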

Multiple modalities (LinMulT):

```python
import torch
from linmult import LinMulT, LinMulTConfig

x1, x2 = torch.rand(8, 1500, 25), torch.rand(8, 450, 35)
model = LinMulT(LinMulTConfig.from_dict({
    "input_feature_dim": [25, 35],
    "heads": [{"name": "sentiment", "type": "simple", "output_dim": 3}],
    "time_dim_reducer": "gap",
}))
result = model([x1, x2])  # {"sentiment": (8, 3)}
```
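The attention variant is selected through the same config dict. The key name below is an assumption, not confirmed API (check the library's documentation for the exact key and value strings), but the pattern would look like:

```python
# Hypothetical config fragment: "attention_type" and its value strings are
# assumptions for illustration; consult the LinMulT docs for the real key.
config = LinMulTConfig.from_dict({
    "input_feature_dim": [25, 35],
    "attention_type": "linear",  # e.g. full softmax vs. a linear-complexity variant
    "heads": [{"name": "sentiment", "type": "simple", "output_dim": 3}],
    "time_dim_reducer": "gap",
})
```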