
AP-OOD: Attention Pooling for Out-of-Distribution Detection

Claus Hofmann1, Christian Huber2, Bernhard Lehner2,
Daniel Klotz3, Sepp Hochreiter1, Werner Zellinger4

1 Institute for Machine Learning, JKU LIT SAL IWS Lab, Johannes Kepler University, Linz, Austria
2 Silicon Austria Labs, JKU LIT SAL IWS Lab, Linz, Austria
3 Interdisciplinary Transformation University Austria, Linz, Austria
4 ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria

arXiv · License: MIT


This repository contains a generic implementation of our paper "AP-OOD: Attention Pooling for Out-of-Distribution Detection", accepted at ICLR 2026. The paper is available here. Instructions for reproducing the experiments can be found in the Experiments section.

Abstract

Out-of-distribution (OOD) detection, which maps high-dimensional data into a scalar OOD score, is critical for the reliable deployment of machine learning models. A key challenge in recent research is how to effectively leverage and aggregate token embeddings from language models to obtain the OOD score. In this work, we propose AP-OOD, a novel OOD detection method for natural language that goes beyond simple average-based aggregation by exploiting token-level information. AP-OOD is a semi-supervised approach that flexibly interpolates between unsupervised and supervised settings, enabling the use of limited auxiliary outlier data. Empirically, AP-OOD sets a new state of the art in OOD detection for text: in the unsupervised setting, it reduces the FPR95 (false positive rate at 95% true positives) from 27.84% to 4.67% on XSUM summarization, and from 77.08% to 70.37% on WMT15 En–Fr translation.

Figure: Mean Pooling ❌ vs. Attention Pooling ✅
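The idea behind the comparison above can be sketched in a few lines of plain PyTorch. This is a hypothetical, minimal version for intuition only; the actual APOOD module (see Quickstart) learns multiple queries and heads:

```python
import torch

torch.manual_seed(0)

# Toy batch: 4 sequences, 16 tokens, 8 features (shapes are illustrative only)
tokens = torch.randn(4, 16, 8)

# Mean pooling: every token contributes equally to the sequence representation
mean_pooled = tokens.mean(dim=1)  # (4, 8)

# Attention pooling: a learned query weights tokens by relevance
# (single hypothetical query; APOOD learns several queries and heads)
query = torch.randn(8)
scores = tokens @ query                 # (4, 16) dot-product similarity
weights = torch.softmax(scores, dim=1)  # normalize over the token axis
attn_pooled = (weights.unsqueeze(-1) * tokens).sum(dim=1)  # (4, 8)

print(mean_pooled.shape, attn_pooled.shape)
```

Both poolings map a variable-length token sequence to a single vector; attention pooling simply lets informative tokens dominate the aggregate instead of averaging them away.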
Installation

Install AP-OOD via pip:

pip install git+https://github.com/ml-jku/ap-ood.git

Quickstart

This is a guide to get you started with AP-OOD. For examples on specific data sets, see Examples. AP-OOD is implemented as a PyTorch module. You can use it as follows:

import torch
from ap_ood import APOOD

feature_dim = 1024

model = APOOD(
    feature_dim=feature_dim,
    n_heads=128,
    n_queries=2,
    beta=1.,
    similarity='dot',
)

# Batch of 512 sequences, each with 512 tokens and 1024 features
tokens = torch.randn(512, 512, feature_dim)
mask = torch.ones([512, 512])

d = model(tokens, mask)

AP-OOD can be trained like any PyTorch model (the Adam optimizer is recommended):

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for tokens, mask in dataloader:
    optimizer.zero_grad()

    # Forward pass
    d = model(tokens, mask)

    # Compute loss
    loss = torch.mean(d, dim=0)

    # Backward pass and optimize
    loss.backward()
    optimizer.step()

After AP-OOD has been trained, use model.partial_fit_mean to fit the mean using the mini-batch attention pooling process:

for tokens, mask in dataloader:
   model.partial_fit_mean(tokens, mask)

To get the distances of a batch of sequences, just pass the sequences to the model. Don't forget to call model.eval() before running inference!

model.eval()
with torch.no_grad():
    d = model(tokens, mask)
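The distances returned by the model can be turned into the FPR95 metric quoted in the abstract. The helper below is a generic sketch (not part of the package), assuming lower distances mean more in-distribution:

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: false positive rate at the threshold where 95% of
    in-distribution samples are correctly accepted.
    Convention here: lower score = more in-distribution."""
    # Threshold that accepts 95% of ID samples
    threshold = np.quantile(id_scores, 0.95)
    # OOD samples scoring below the threshold are false positives
    return float(np.mean(np.asarray(ood_scores) < threshold))

# Toy example with well-separated score distributions
rng = np.random.default_rng(0)
id_scores = rng.normal(0.0, 1.0, 1000)
ood_scores = rng.normal(6.0, 1.0, 1000)
print(fpr_at_95_tpr(id_scores, ood_scores))  # close to 0 for separated scores
```

In practice you would collect `d` over the ID test set and each OOD test set, then pass the two arrays to such a helper.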

Examples

Notebook       Topic        Description
📓 Example 1   MIL / MUSK   OOD detection for multiple instance learning on the MUSK data set
📓 Example 2   NLP / XSUM   OOD detection for language modeling: Text summarization

Example 1: OOD detection for multiple instance learning (MIL) on the MUSK data set

This example demonstrates the OOD detection capabilities of AP-OOD on the MUSK dataset:

Dataset Description

This dataset describes a set of 92 molecules of which 47 are judged by human experts to be musks and the remaining 45 molecules are judged to be non-musks. The goal is to learn to predict whether new molecules will be musks or non-musks. However, the 166 features that describe these molecules depend upon the exact shape, or conformation, of the molecule. Because bonds can rotate, a single molecule can adopt many different shapes. To generate this data set, the low-energy conformations of the molecules were generated and then filtered to remove highly similar conformations. This left 476 conformations. Then, a feature vector was extracted that describes each conformation.

This many-to-one relationship between feature vectors and molecules is called the "multiple instance problem". When learning a classifier for this data, the classifier should classify a molecule as "musk" if ANY of its conformations is classified as a musk. A molecule should be classified as "non-musk" if NONE of its conformations is classified as a musk.
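The bag-labeling rule above can be written down directly. `bag_label` is a hypothetical helper for illustration, not part of the package:

```python
# A bag is "musk" if ANY of its instances is classified as musk,
# and "non-musk" only if NONE of them is. In MUSK, a bag is a molecule
# and each instance is one of its conformations.
def bag_label(instance_predictions):
    """Aggregate binary instance predictions into a bag label (max rule)."""
    return int(any(instance_predictions))

print(bag_label([0, 0, 1]))  # → 1: one musk conformation makes the molecule musk
print(bag_label([0, 0, 0]))  # → 0: no musk conformation, molecule is non-musk
```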

Example 2: OOD detection for language modeling: Text summarization on the XSUM data set

This example demonstrates the OOD detection capabilities of AP-OOD on the XSUM dataset using the Pegasus-XSUM model:

Dataset Description

The XSUM dataset is a summarization dataset that consists of BBC articles and their corresponding summaries. The dataset is widely used for evaluating text summarization models.

Experiments

The experiments are located in a separate package (ap_ood_experiments). To run the experiments, we recommend setting up a Python environment with Anaconda:

Installation

  • The experimental code works best with Anaconda (download here). To install the experimental library and all dependencies, run the following commands:
  conda env create -f experiments/environment.yml
  conda activate ap-ood
  pip install -e ./experiments

Weights and Biases

  • AP-OOD supports logging with Weights and Biases (W&B). By default, W&B logs all metrics in anonymous mode. Note that runs logged in anonymous mode are deleted after 7 days. To keep the logs, you need to create a W&B account. Once created, log in to your account via the command line.

Data Sets

To run the experiments, you need the following data sets. We follow the benchmark of Ren et al. (2023).

The locations of the data sets and other environment variables are managed via a .env file: copy the .env.examples file located in the root directory of the repository, name the new file .env, and customize it so that it contains the paths to the data sets on your machine.
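If you need to load such a file yourself (e.g. outside the experiment scripts), a minimal stdlib sketch is enough, assuming plain KEY=VALUE lines with `#` starting a comment:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: reads KEY=VALUE lines into os.environ.
    Sketch only -- no quoting or interpolation, '#' starts a comment."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway file; /data/paracrawl is a placeholder path
with open("demo.env", "w") as f:
    f.write("# data set locations\nPARACRAWL_ROOT=/data/paracrawl\n")
load_env("demo.env")
print(os.environ["PARACRAWL_ROOT"])
```

The experiment scripts handle this for you; this is only for ad-hoc use of the same .env file.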

In-Distribution Data Sets

  • XSUM: Automatically downloaded from HuggingFace
  • WMT15 En--Fr: Automatically downloaded from HuggingFace

Auxiliary Outlier Data Set

  • C4: Automatically downloaded from HuggingFace
  • ParaCrawlv9: Download it from the link (format bilingual-moses), extract it, and set the environment variable PARACRAWL_ROOT to the location of the extracted file.

Out-of-Distribution Test Data Sets

The OOD test data for the summarization task consists of:

  • CNN/Daily Mail: Automatically downloaded from HuggingFace
  • Lil-Lab Newsroom: The dataset is managed via HuggingFace, but the data must be downloaded manually. Download the data set and set the environment variable NEWSROOM_ROOT to the location of the extracted files.
  • Reddit-TIFU: Automatically downloaded from HuggingFace
  • Samsum: Automatically downloaded from HuggingFace

The OOD test data for the translation task consists of the following data sets. For the Opus data sets, create a new directory for the data set and set the environment variable OPUS_ROOT to the location of the directory.

  • Newstest14 Download the development sets from the link and set the environment variable WMT_DEV_ROOT to the location of the extracted files.
  • Newsdiscussdev2015 Download the development sets from the link and set the environment variable WMT_DEV_ROOT to the location of the extracted files.
  • Newsdiscusstest2015 Download the test sets from the link and set the environment variable WMT_TEST_ROOT to the location of the extracted files.
  • Opus-Law Download the data set (format bilingual-moses) from the link and place it in OPUS_ROOT in the subdirectory law.
  • Opus-Medical Download the data set (format bilingual-moses) from the link and place it in OPUS_ROOT in the subdirectory medical.
  • Opus-Koran Download the data set (format bilingual-moses) from the link and place it in OPUS_ROOT in the subdirectory Koran.
  • Opus-IT Download the data set (format bilingual-moses) from the link and place it in OPUS_ROOT in the subdirectory it.
  • Opus-Subtitles Download the data set (format bilingual-moses) from the link and place it in OPUS_ROOT in the subdirectory subtitles.

How to Run

Summarization

  1. Set the environment variable EMBEDDING_ROOT to the location where you want to store the language model embeddings.
  2. To create the input and output embeddings for text summarization, run the command
    python -m ap_ood_experiments.create_embeddings -cn summarization-pegasus-xsum --multirun embedding_type=INPUT,OUTPUT
    
  3. To run the unsupervised method on the input and output, run
    python -m ap_ood_experiments.run_methods -cn summarization-pegasus-xsum-input method=ap-ood
    python -m ap_ood_experiments.run_methods -cn summarization-pegasus-xsum-output method=ap-ood
    
  4. To run the supervised method on the input and output, run
    python -m ap_ood_experiments.run_methods -cn summarization-pegasus-xsum-input method=ap-ood-oe
    python -m ap_ood_experiments.run_methods -cn summarization-pegasus-xsum-output method=ap-ood-oe
    

Translation

  1. Set the environment variable WMT_MODEL_CHECKPOINT to the location where you want to store the model checkpoints.
  2. Set the environment variable EMBEDDING_ROOT to the location where you want to store the language model embeddings.
  3. Train the translation model
    python -m ap_ood_experiments.transformer.train_wmt
    
  4. To create the input and output embeddings for translation, run the command
    python -m ap_ood_experiments.create_embeddings -cn translation-transformer-wmt --multirun embedding_type=INPUT,OUTPUT
    
  5. To run the unsupervised method on the input and output, run
    python -m ap_ood_experiments.run_methods -cn translation-transformer-wmt-input method=ap-ood
    python -m ap_ood_experiments.run_methods -cn translation-transformer-wmt-output method=ap-ood
    
  6. To run the supervised method on the input and output, run
    python -m ap_ood_experiments.run_methods -cn translation-transformer-wmt-input method=ap-ood-oe
    python -m ap_ood_experiments.run_methods -cn translation-transformer-wmt-output method=ap-ood-oe
    

📚 Citation

If you found this repository helpful, consider giving it a ⭐ and citing our paper:

@inproceedings{hofmann2026apood,
  title={{AP}-{OOD}: Attention Pooling for Out-of-Distribution Detection},
  author={Claus Hofmann and Christian Huber and Bernhard Lehner and Daniel Klotz and Sepp Hochreiter and Werner Zellinger},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=fEYonozhKk}
}
