vLLM BART Model Plugin

This plugin adds support for BART (Bidirectional and Auto-Regressive Transformers) models to vLLM's inference engine.

Overview

BART is an encoder-decoder model that is particularly effective for sequence-to-sequence tasks like summarization, translation, and text generation. This plugin integrates the BART model implementation with vLLM's plugin system, allowing you to use BART models with vLLM's optimized inference engine.

Installation

Prerequisites

This plugin requires uv for package management. If you don't have it installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

From Source

  1. Clone this repository:
git clone <repository-url>
cd bart-plugin
  2. Install the plugin in development mode:
uv pip install -e .

Or install directly:

uv pip install .

Verify Installation

After installation, vLLM should automatically discover and load the BART plugin. You can verify this by checking the vLLM logs when initializing a model, or by running the bundled verification script:

python verify_plugin.py
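
If you prefer a programmatic check, the following sketch (assuming the plugin registers the BartForConditionalGeneration architecture, as described under Plugin Architecture below) lists the discovered plugin entry points and confirms the architecture is known to vLLM's ModelRegistry:

# Sketch of a manual verification, independent of verify_plugin.py
from importlib.metadata import entry_points

import vllm_bart_plugin
from vllm import ModelRegistry

# The plugin should appear under the vllm.plugins entry-point group.
print([ep.name for ep in entry_points(group="vllm.plugins")])

# Normally vLLM invokes this automatically when it loads plugins.
vllm_bart_plugin.register_bart_model()
print("BartForConditionalGeneration" in ModelRegistry.get_supported_archs())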

Usage

Basic Usage

Run python example_usage.py, or use the snippet below:

import vllm_bart_plugin
from vllm import LLM, SamplingParams
model_name = "facebook/bart-large-cnn"

llm = LLM(
    model=model_name,
    max_model_len=1024,
    gpu_memory_utilization=0.5,
    dtype="float16",
)
params = SamplingParams(temperature=0.0, max_tokens=20)
outputs = llm.generate(
    [
        {  
            "encoder_prompt": {
                "prompt": "",
                "multi_modal_data": {
                    "text": "The president of the United States is",
                },
            },
            "decoder_prompt": "<s>Donald",
        },
        {
            "encoder_prompt": {
                "prompt": "",
                "multi_modal_data": {
                    "text": "<s>",
                },
            },
            "decoder_prompt": "<s>The capital of France is",
        },
    ],
    sampling_params=params,
)
for o in outputs:
    generated_text = o.outputs[0].text
    print("output:", generated_text)

Plugin Architecture

This plugin follows vLLM's plugin system architecture:

  1. Entry Point: Registered via setuptools entry_points in setup.py
  2. Registration Function: register_bart_model() in __init__.py registers the model with vLLM's ModelRegistry
  3. Model Implementation: The BART model class in bart.py implements vLLM's model interfaces
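
As a rough sketch (the module path and class name below are assumptions based on the project layout, not the plugin's actual code), the registration function maps the HF architecture name to the plugin's model class:

# vllm_bart_plugin/__init__.py (illustrative sketch only)
def register_bart_model():
    from vllm import ModelRegistry

    # Map the HF architecture name to this plugin's implementation.
    # The "module:Class" path below is an assumed example.
    if "BartForConditionalGeneration" not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(
            "BartForConditionalGeneration",
            "vllm_bart_plugin.bart:BartForConditionalGeneration",
        )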

Plugin Discovery

vLLM automatically discovers plugins using Python's entry point mechanism. The plugin is registered under the vllm.plugins group and is loaded when vLLM initializes.
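
As a sketch, the corresponding entry point declaration in setup.py can look like the following; the entry name bart matches the VLLM_PLUGINS example further below, but the exact name used by this package is an assumption:

# setup.py (illustrative sketch only)
from setuptools import find_packages, setup

setup(
    name="vllm-bart-plugin",
    packages=find_packages(),
    entry_points={
        "vllm.plugins": [
            # vLLM calls this function when loading plugins.
            "bart = vllm_bart_plugin:register_bart_model",
        ],
    },
)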

Model Features

The BART implementation includes:

  • Encoder-Decoder Architecture: Full support for BART's encoder-decoder structure
  • Cross-Attention: Proper implementation of cross-attention between encoder and decoder
  • Multi-Modal Support: Integration with vLLM's multimodal processing pipeline
  • Quantization Support: Compatible with vLLM's quantization features
  • Tensor Parallelism: Support for distributed inference
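
For example, tensor-parallel inference is requested the same way as for any other vLLM model; this minimal sketch assumes two GPUs are available:

import vllm_bart_plugin  # imported explicitly, as in the Basic Usage example
from vllm import LLM

# Shard the BART weights across two GPUs (assumes 2 GPUs are available).
llm = LLM(
    model="facebook/bart-large-cnn",
    tensor_parallel_size=2,
    dtype="float16",
)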

Supported Models

This plugin should work with any BART-based model from HuggingFace, including:

  • facebook/bart-large-cnn
  • facebook/bart-large
  • Other BART variants and fine-tuned models

Evaluation

To evaluate the model on CNN/DailyMail summarization:

# Install evaluation dependencies
uv pip install datasets rouge-score

# Quick test
python scripts/eval_cnn_dailymail.py --num-samples 10

# Full evaluation on test set
python scripts/eval_cnn_dailymail.py --split test

See scripts/eval_cnn_dailymail.py for more options and reference ROUGE scores.

TODO

  • Support MBartForConditionalGeneration

Environment Variables

You can control plugin loading with the VLLM_PLUGINS environment variable:

# Load all plugins (default)
export VLLM_PLUGINS=all

# Load only specific plugins
export VLLM_PLUGINS=bart

# Disable all plugins
export VLLM_PLUGINS=none

VLLM_BART_ENCODER_MAX_SEQ_PADDING

Enables a small optimization that batches encoder forward passes: all encoder inputs in a batch are padded to the maximum encoder sequence length, the encoder runs once, and the outputs are sliced back to per-item lengths (see the sketch after the notes below).

  • Default: disabled
  • Enable:
export VLLM_BART_ENCODER_MAX_SEQ_PADDING=1

Notes:

  • Requires pad_token_id to be set in the HF config. If it is missing, the plugin will log a warning and keep the optimization disabled.
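
A minimal sketch of the idea (illustrative only; encoder and pad_token_id stand in for the plugin's internals):

import torch

def padded_encoder_forward(encoder, input_ids_list, pad_token_id):
    """Pad all encoder inputs to the batch max length, run the encoder once,
    then slice the outputs back to each item's true length."""
    lengths = [ids.size(0) for ids in input_ids_list]
    max_len = max(lengths)

    # Right-pad every sequence to max_len with pad_token_id and build the mask.
    batch = torch.full((len(input_ids_list), max_len), pad_token_id, dtype=torch.long)
    mask = torch.zeros(len(input_ids_list), max_len, dtype=torch.bool)
    for i, ids in enumerate(input_ids_list):
        batch[i, : ids.size(0)] = ids
        mask[i, : ids.size(0)] = True

    # One encoder call for the whole padded batch; padded positions are masked out.
    hidden = encoder(input_ids=batch, attention_mask=mask)  # [B, max_len, H]

    # Slice outputs back to per-item lengths.
    return [hidden[i, :n] for i, n in enumerate(lengths)]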

Development

Project Structure

bart-plugin/
├── vllm_bart_plugin/
│   ├── __init__.py          # Plugin registration
│   └── bart.py              # BART model implementation
├── setup.py                 # Package configuration and entry points
├── README.md                # This file
└── LICENSE                  # License file

Running Tests

# Install development dependencies
uv pip install -e .

# Install pre-commit hooks
pre-commit install

# Run fast tests
pytest tests/

Troubleshooting

Plugin Not Loading

If the plugin isn't being discovered:

  1. Verify installation: uv pip list | grep vllm-bart-plugin
  2. Check entry points: python -c "from importlib.metadata import entry_points; print(list(entry_points(group='vllm.plugins')))"
  3. Enable verbose logging: Set VLLM_LOGGING_LEVEL=DEBUG
  4. Run the verification script: python verify_plugin.py

Model Not Found

If vLLM doesn't recognize the BART model:

  1. Ensure the plugin loaded successfully (check the logs); you should see a line like:
[2025-12-19 14:32:11] INFO __init__.py:33: Successfully registered BART model with vLLM
  2. Verify the model architecture name matches: BartForConditionalGeneration
  3. Try explicitly setting trust_remote_code=False

Import Errors

Make sure all dependencies are installed:

uv pip install vllm torch transformers
