where the classification term is the cross‑entropy loss for the primary classification task, the MSE term encourages similar gating patterns for correlated modalities, and Θ denotes all trainable parameters. The hyper‑parameters are set to λ_cls = 1.0, λ_att = 0.1, and λ_reg = 5 × 10⁻⁴.
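To make the weighting concrete, the following is a minimal sketch of how the three terms might be combined. It assumes the objective is a weighted sum, that the MSE is taken between the gating vectors of two correlated modalities, and that the regularizer is a squared L2 penalty on Θ; the function names (`cross_entropy`, `mse`, `l2_penalty`, `total_loss`) are hypothetical and chosen only for illustration.

```python
import math

# Loss weights as stated in the text.
LAMBDA_CLS = 1.0
LAMBDA_ATT = 0.1
LAMBDA_REG = 5e-4


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def mse(gate_a, gate_b):
    """Mean squared error between two equal-length gating vectors."""
    return sum((a - b) ** 2 for a, b in zip(gate_a, gate_b)) / len(gate_a)


def l2_penalty(params):
    """Sum of squared parameters (assumed form of the regularizer on Theta)."""
    return sum(p * p for p in params)


def total_loss(probs, label, gate_a, gate_b, params):
    """Weighted sum of the three terms (assumed form of the objective)."""
    return (LAMBDA_CLS * cross_entropy(probs, label)
            + LAMBDA_ATT * mse(gate_a, gate_b)
            + LAMBDA_REG * l2_penalty(params))
```

With these weights the classification term dominates: a gating mismatch contributes only one tenth of its MSE, and the parameter penalty is scaled by 5 × 10⁻⁴.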
The pooled vectors p₁,…,p_K are concatenated and fed to the classification head. By allowing multiple “pools,” ATP can capture both short‑term actions and long‑range context.
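The concatenation step can be sketched as follows. The text does not specify the form of the classification head, so this example assumes a single linear layer; `concat_pools` and `linear_head` are hypothetical names, and the weights are illustrative values rather than learned parameters.

```python
def concat_pools(pooled):
    """Concatenate the K pooled vectors p_1..p_K into one flat feature vector."""
    return [x for p in pooled for x in p]


def linear_head(features, weights, bias):
    """Assumed linear classification head: returns one logit per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


# K = 2 pools, each of dimension 2 (illustrative values only).
pooled = [[1.0, 2.0], [3.0, 4.0]]
features = concat_pools(pooled)          # length K * d = 4

# Two classes; each weight row spans the full concatenated vector.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.5]
logits = linear_head(features, weights, bias)
```

Because each pool occupies its own slice of the concatenated vector, the head can weight short‑term and long‑range pools independently.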