@chiral-carbon
Adds:

  • download instructions for the dataset in docs/Evaluation.md
  • script to convert HallusionBench.json to LLaVA format for inference
  • custom model_vqa_hallusionbench.py script for running inference and generating results
  • script to convert the result file back to .json so it can be evaluated with the official evaluation scripts from the HallusionBench GitHub repo
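
The two conversion steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the scripts added in this PR: the function names are hypothetical, and the key names (`question` and `filename` on the benchmark side, `question_id`, `text`, `image` on the LLaVA side, and `model_prediction` for the merged output) are assumptions that should be checked against the actual files.

```python
import json


def to_llava_questions(entries):
    """Turn HallusionBench-style entries into LLaVA-style question records.

    Key names are assumptions, not confirmed against the real files.
    """
    questions = []
    for idx, entry in enumerate(entries):
        record = {
            "question_id": idx,  # position in the source file, used to join answers back
            "text": entry["question"],
        }
        # Some entries may be text-only; only attach an image when one is given.
        if entry.get("filename"):
            record["image"] = entry["filename"]
        questions.append(record)
    return questions


def merge_predictions(entries, answers):
    """Copy each entry and attach the matching answer under "model_prediction"."""
    by_id = {a["question_id"]: a["text"] for a in answers}
    merged = []
    for idx, entry in enumerate(entries):
        out = dict(entry)  # shallow copy so the input list is left untouched
        out["model_prediction"] = by_id.get(idx, "")  # empty string if no answer was produced
        merged.append(out)
    return merged


def write_jsonl(records, path):
    """Write one JSON object per line (the question/answer file layout assumed here)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a file with one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Joining on the entry's position keeps the round trip independent of whatever ID fields the benchmark file carries, at the cost of requiring the same input file for both steps.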

The pipeline runs end to end, but the custom VQA script does not yet generate responses correctly.


TODO:

  • Investigate and fix model_vqa_hallusionbench.py: it fails to generate responses correctly, so every input defaults to model prediction category "2" and accuracy is 0
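
One possible first step for this investigation is to check whether the generated answers are empty or all identical before they reach the evaluation scripts. The sketch below is only a diagnostic aid under assumptions: the function name is hypothetical, and it assumes predictions are stored under a `model_prediction` key.

```python
from collections import Counter


def summarize_predictions(entries, key="model_prediction"):
    """Count empty and repeated predictions to spot a degenerate generation step."""
    # Treat missing, None, and whitespace-only predictions as empty.
    texts = [(entry.get(key) or "").strip() for entry in entries]
    counts = Counter(texts)
    return {
        "total": len(texts),
        "empty": counts.get("", 0),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }
```

If `empty` equals `total`, or `distinct` is 1, the problem most likely lies in generation or decoding rather than in the conversion or evaluation steps.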
