Hi Team,
Thank you for open-sourcing the codebase. I would like to ask whether I can use the released checkpoint to test on VQA benchmarks whose inputs consist only of an image prompt and a text prompt (question + choices), without any additional metadata such as steering, historical trajectory, speed, or acceleration.
If the input is just a plain image and a text prompt, can I adapt your inference pipeline code to run with the released checkpoint?
Thank you in advance for your answer.