Encountered difficulties in reproducing the results on TruthfulQA

Hi, thanks for sharing your work! I’m having some trouble reproducing the results and would appreciate your help.
For the TruthfulQA Multiple-Choice task, I got these results when evaluating LLMs + TruthX, MC1: 0.5018, MC2: 0.7050, MC3: 0.3905. These are different from the results in the paper (54.22%, 73.90%, 44.37%).
How can I adjust to match the results from the paper? Thanks! 



For your reference, I’ve attached the relevant scripts and code snippets:
We used the model file from https://huggingface.co/ICTNLP/Llama-2-7b-chat-TruthX which does not implement two-fold validation. As a result, we made a few minor adjustments to the scripts and llm.py file, as detailed below:

1. Script content

- Based on scripts/truthfulqa.mc.truthx.sh:

export CUDA_VISIBLE_DEVICES=0

ROOT=path_to_truthx_dir
EXP_ROOT=$ROOT/results
model_path=path_to_llm # e.g. Llama-2-7b-chat-hf

truthx_model1=truthx_models/Llama-2-7b-chat-hf/truthx_model.fold1.pt
truthx_model2=truthx_models/Llama-2-7b-chat-hf/truthx_model.fold2.pt

strength=4.5
layers=10

python3  $ROOT/scripts/truthfulqa_mc_truthx.py \
    --model-path $model_path \
    --truthx-model $truthx_model1 \
    --truthx-model2 $truthx_model2 \
    --two-fold True \
    --data-yaml data/truthfulqa_data_fold1.yaml \
    --edit-strength $strength --top-layers $layers \
    --fewshot-prompting True \
    --output-dir $EXP_ROOT/truthfulqa_mc_truthx/llama-2-7b-chat.truthx

- Our Modifications:

export CUDA_VISIBLE_DEVICES=6

ROOT=.
EXP_ROOT=$ROOT/results
model_path="/app/model_download/Llama-2-7b-chat-hf"
truthx_model1=/app/baseline/TruthX/truthx_models/Llama-2-7b-chat-hf/truthx_model.pt

strength=4.5
layers=10

python3  $ROOT/scripts/truthfulqa_mc_truthx.py \
    --model-path /app/model_download/Llama-2-7b-chat-hf \
    --truthx-model $truthx_model1 \
    --edit-strength $strength --top-layers $layers \
    --fewshot-prompting True \
    --output-dir $EXP_ROOT/truthfulqa_mc_truthx/llama-2-7b-chat.truthx

2. In the llm.py file, we added the get_lprobs_with_ae function.

- Based on get_lprobs_with_ae_2fold function:

outputs, past_key_values, hidden_states = self.model(
                input_ids,
                output_hidden_states=True,
                truthx_model=(
                    self.truthx if idx not in self.fold1_data else self.truthx2
                ),
            ).values()

- Our Modifications:

outputs, past_key_values, hidden_states = self.model(
                input_ids,
                output_hidden_states=True,
                truthx_model=self.truthx,
            ).values()

- The rest of the code is the same as get_lprobs_with_ae_2fold function.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encountered difficulties in reproducing the results on TruthfulQA #5

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Encountered difficulties in reproducing the results on TruthfulQA #5

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions