Compare jcommonsense qa prompts with question first vs last #113

Open
kumapo wants to merge 6 commits into Stability-AI:jp-stable from kumapo:compare-question-first-vs-last

kumapo commented Nov 4, 2023

As reported by this article, JCommonsenseQA prompts that put the question last result in better performance.
As the results in the following table show, I reproduced the performance jump for most models by changing only the position of the question in the prompt.

Currently, however, the v0.3 and v0.6 prompts put the question last, while the others put it first.
To ensure a fair model comparison, all prompts should place the question in the same position.

What do you think about adding prompt versions that put the question last, or updating the current prompts to put the question last?
If I have missed any experiment, please let me know.

| Model | Question-first accuracy (prompt version) | Question-last accuracy (prompt version) |
| --- | --- | --- |
| japanese-stablelm-base-alpha-7b | 0.5728 (v0.2.1) | 0.7954 (v0.2.2) |
| open-calm-3b | 0.3128 (v0.2.1) | 0.7453 (v0.2.2) |
| ELYZA-japanese-Llama-2-7b | 0.7516 (v0.2.1) | 0.7730 (v0.2.2) |
| llama2-7b-chat | 0.5952 (v0.3.2) | 0.5559 (v0.3) |
| japanese-stablelm-instruct-alpha-7b | 0.5898 (v0.3.2) | 0.8222 (v0.3) |
| rinna-japanese-gpt-neox-3.6b-instruction-ppo | 0.4406 (v0.4) | 0.5934 (v0.4.2) |
| rinna-bilingual-gpt-neox-4b-instruction-ppo | 0.4879 (v0.5) | 0.5237 (v0.5.2) |
| llama2-7b-chat | 0.6667 (v0.6.2) | 0.613 (v0.6) |
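
To make the difference between the two orderings concrete, here is a minimal sketch of a prompt builder that changes only where the question sits. The function name `build_prompt`, the `Question:` / `Choices:` / `Answer:` labels, and the sample item are assumptions for illustration; they are not the actual templates of any prompt version in the harness.

```python
from typing import List


def build_prompt(question: str, choices: List[str], question_last: bool) -> str:
    """Build a multiple-choice prompt with the question before or after the choices.

    Hypothetical layout for illustration only; the real prompt versions
    use their own wording and separators.
    """
    question_part = f"Question: {question}"
    choices_part = "Choices:\n" + "\n".join(f"- {c}" for c in choices)
    # The only thing that changes between the two variants is the order of the parts.
    if question_last:
        parts = [choices_part, question_part]
    else:
        parts = [question_part, choices_part]
    return "\n".join(parts + ["Answer:"])


# Made-up sample item, not taken from the dataset.
question = "Where do you borrow books?"
choices = ["library", "bakery", "station"]

question_first_prompt = build_prompt(question, choices, question_last=False)
question_last_prompt = build_prompt(question, choices, question_last=True)

print(question_first_prompt)
print("---")
print(question_last_prompt)
```

With the question last, the question text sits directly before the answer slot, which is the only difference between the paired prompt versions compared in the table.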

kumapo marked this pull request as ready for review November 5, 2023 09:00
kumapo requested a review from jon-tow as a code owner November 5, 2023 09:00
kumapo changed the title from "Compare question first vs last" to "Compare jcommonsense qa prompts with question first vs last" Nov 5, 2023