
Conversation

@lkk12014402
Contributor

Description

Update the QAT example and API documentation.

Signed-off-by: lkk <[email protected]>

Copilot AI left a comment


Pull request overview

This PR updates the QAT (Quantization-Aware Training) documentation and examples by removing an outdated quantization script and updating the README with corrected instructions.

Key Changes:

  • Removed the standalone quantize_autoround.py script
  • Updated README documentation to reference the centralized auto_round example instead

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| quantize_autoround.py | Removed outdated standalone quantization script |
| README.md | Updated Step 2 instructions with corrected command and reference to centralized auto_round example |


##### Step 2:

Save the model directly to a get post training quantization model with using [auto-round](https://github.com/intel/auto-round).
Save the model directly to a get post training quantization model with following this example [auto_round

Copilot AI Dec 16, 2025


The phrase 'to a get post training' contains grammatical errors. It should be 'to get a post-training' or 'to get post-training'.

Suggested change
Save the model directly to a get post training quantization model with following this example [auto_round
Save the model directly to get a post-training quantization model by following this example [auto_round
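For readers unfamiliar with what this step produces: post-training quantization maps float weights to low-bit integers plus a scale factor. The toy round-to-nearest sketch below illustrates only that basic round trip; it is not auto-round's actual algorithm (auto-round additionally optimizes the rounding decisions), and all names here are illustrative.

```python
# Toy illustration of post-training weight quantization (round-to-nearest).
# NOT auto-round's algorithm; it only shows the quantize/dequantize round
# trip that a post-training quantization step performs on each weight group.

def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization of a list of floats."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.97, -0.08]
q, scale = quantize_rtn(weights, bits=4)            # q = [1, -4, 7, -1]
restored = dequantize(q, scale)                     # each within scale/2 of original
```

The reconstruction error of round-to-nearest is bounded by half the scale per weight, which is exactly the slack that smarter rounding schemes such as auto-round try to spend more effectively.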

@lkk12014402 lkk12014402 added this to the 3.7 milestone Dec 16, 2025

```
python quantize_autoround.py
CUDA_VISIBLE_DEVICES=0 python ../auto_round/quantize.py \
```

@yiliu30 yiliu30 requested a review from xin3he December 17, 2025 00:48

This section walks through an end-to-end example based on the provided code and examples in:

`examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm_qat/`
Contributor

It would be better to add a link.


`requirements.txt` includes (among others):

- `auto-round==0.8.0`
Contributor

use 0.9.3?

```
--model vllm \
--model_args pretrained=./llama3.1-finetuned-qat,\
tensor_parallel_size=1,data_parallel_size=1,\
gpu_memory_utilization=0.3,max_model_len=32768,enforce_eager=True \
```
Contributor

gpu_memory_utilization is quite low, and enforce_eager causes poor performance.
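If those flags are revisited along the lines of this comment, the invocation might look like the sketch below. The `lm_eval --model vllm` form and the `--tasks` flag are assumptions about the surrounding command (only the `--model_args` fragment is quoted above), and the values are illustrative, not taken from the PR.

```shell
# Illustrative revision: raise the GPU memory fraction and drop
# enforce_eager=True so vLLM can use CUDA graphs. Tune
# gpu_memory_utilization to the GPU actually available.
lm_eval --model vllm \
  --model_args pretrained=./llama3.1-finetuned-qat,\
tensor_parallel_size=1,data_parallel_size=1,\
gpu_memory_utilization=0.8,max_model_len=32768 \
  --tasks gsm8k
```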

```
eval_size: int = 0
```

4. **QuantizationArguments**
Contributor

Duplicate of the QuantizationArguments at L144?
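The quoted `eval_size` field suggests these argument groups are plain dataclasses, the usual pattern with transformers' `HfArgumentParser`. A minimal self-contained sketch; every name except `eval_size` and `QuantizationArguments` is an assumption, not taken from this PR:

```python
from dataclasses import dataclass

# Hypothetical sketch of the argument groups discussed above; only
# eval_size appears in the quoted snippet, the rest is illustrative.

@dataclass
class DataArguments:
    dataset_name: str = "NeelNanda/pile-10k"  # illustrative default
    eval_size: int = 0                        # 0 means: no held-out eval split

@dataclass
class QuantizationArguments:
    bits: int = 4
    group_size: int = 128

args = DataArguments()
```

Keeping each group defined exactly once avoids the duplication the comment above points out, since two identically named dataclasses in one module would silently shadow each other.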

