
2025-04-28-llm-knowledge-distil-157/blog/llm-knowledge-distil/ #223

@utterances-bot

Description


On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL | ICLR Blogposts 2025

In this blog post, we delve into knowledge distillation techniques for Large Language Models (LLMs), with a particular focus on using Kullback-Leibler (KL) Divergence as the optimization objective. Knowledge distillation is a powerful tool to reduce model size while maintaining comparable performance, making it especially useful in scenarios with constrained computational or serving resources. We specifically explore the nuances of Forward KL divergence and Reverse KL divergence, examining their roles in the distillation process. By comparing these two approaches, we aim to uncover their behaviours, strengths, and practical applications in LLM distillation.

https://d2jud02ci9yv69.cloudfront.net/2025-04-28-llm-knowledge-distil-157/blog/llm-knowledge-distil/
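For quick reference while reading the post, here is a minimal PyTorch sketch of the two objectives it contrasts: distilling a student against a teacher with forward KL versus reverse KL. The function names, the use of raw logits, and the toy shapes are illustrative assumptions, not the blog's actual implementation.

```python
import torch
import torch.nn.functional as F

def forward_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    """KL(p_teacher || q_student): the expectation is under the teacher, so the
    student is pushed to put mass everywhere the teacher does (mode-covering)."""
    p = F.softmax(teacher_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    return (p * (log_p - log_q)).sum(dim=-1).mean()

def reverse_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    """KL(q_student || p_teacher): the expectation is under the student, so the
    student tends to concentrate on the teacher's dominant modes (mode-seeking)."""
    q = F.softmax(student_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    return (q * (log_q - log_p)).sum(dim=-1).mean()

# Toy example: next-token distributions at 4 positions over a 32-token vocabulary.
teacher_logits = torch.randn(4, 32)
student_logits = torch.randn(4, 32, requires_grad=True)
print(forward_kl(teacher_logits, student_logits).item(),
      reverse_kl(teacher_logits, student_logits).item())
```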
