Hello, thank you for sharing your code. I ran into some issues while running the inference code. The video I tested with is 'v_0A6fEUxdDMk.mp4' from the test set of the VATEX dataset, which shows a chef making sushi.
First, I downloaded the pre-trained parameters from luoruipu1/Valley2-7b and ran run_valley_llamma_v2.py. The user prompt was "<video> Describe the video concisely", and the answer I got was "['10. Can you describe the scene in the video']".
Then I used the same parameters to run run_valley.py, and the answer I got was "['10.']".
I'm not sure why this is happening. I haven't modified any code. Could it be that I used the wrong parameters or the wrong prompt format?
Next, I downloaded the parameters from Zhaoziwang/chinese_valley7b_v1 instead and ran run_valley.py with them. With the user prompt "请描述这个视频\n<video>" ("Please describe this video", followed by the video token), the result was an empty string. When I moved the video token to the front, "<video>请描述这个视频\n", the result was repeated garbage characters.
How can I run the Valley model correctly?