Project page: https://pixelarena.reify.ing/project
Web viewer for the results: https://pixelarena.reify.ing/
Setup project:
- Clone the repository: `git clone https://github.com/ifsheldon/mllm-semantic-segmentation.git`
- (Optional) Set up submodules: `git submodule update --init --recursive`
- Install uv: https://docs.astral.sh/uv/getting-started/installation/
- Run `uv sync` to install dependencies.
- Run `uv run poe setup-frontend` to install frontend dependencies.
- (Optional) Install oxen: https://docs.oxen.ai/getting-started/install
- (Optional) Run `oxen clone https://hub.oxen.ai/ifsheldon/mllm-segmentation-data` to get all results.
  - Remember to run `ln -s mllm-segmentation-data/results results` if you need to run the frontend.
Run frontend: `uv run poe run-frontend`
A random subset (500 images) of the CelebAMask-HQ dataset is used for evaluation.
- Images: `eval-set/celeb/images`, 512x512
- Images (150): `eval-set/celeb/images-150`, 512x512, a 150-image subset of the above
- Reference masks: `eval-set/celeb/masks-512`, 512x512
- Upscaled reference masks: `eval-set/celeb/masks-1024`, 1024x1024
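For illustration, here is a minimal Python sketch that pairs each evaluation image with its reference mask under the layout above; the matching file stems and `.png` mask extension are assumptions, not something this repo guarantees.

```python
from pathlib import Path

from PIL import Image

IMAGES = Path("eval-set/celeb/images")
MASKS = Path("eval-set/celeb/masks-512")

# Pair each evaluation image with its reference mask by file stem
# (matching stems and .png masks are assumptions).
for image_path in sorted(IMAGES.glob("*")):
    mask_path = MASKS / f"{image_path.stem}.png"
    if not mask_path.exists():
        continue
    image = Image.open(image_path)
    mask = Image.open(mask_path)
    assert image.size == (512, 512) and mask.size == (512, 512)
```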
Test results should be saved in the `results/celeb` directory. For Gemini- and GPT-generated masks, the naming convention is `<id>.mask.[0-4].{raw.jpeg, raw.png, pred.png}`: `[0-4]` is the attempt index (5 attempts in total), `raw.{jpeg, png}` is the colorful mask image generated by Gemini/GPT, and `pred.png` is the P-mode PNG converted from the colorful raw image.
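As a hedged sketch (not code from this repo), the convention can be consumed like this in Python; the helper name and Pillow usage are illustrative assumptions:

```python
from pathlib import Path

from PIL import Image

RESULTS = Path("results/celeb")
NUM_ATTEMPTS = 5  # attempt indices 0-4, per the naming convention

def load_pred_masks(image_id: str) -> list[Image.Image]:
    """Load every available predicted mask for one image id (hypothetical helper)."""
    masks = []
    for attempt in range(NUM_ATTEMPTS):
        pred_path = RESULTS / f"{image_id}.mask.{attempt}.pred.png"
        if pred_path.exists():
            mask = Image.open(pred_path)
            assert mask.mode == "P"  # pred.png is the converted P-mode PNG
            masks.append(mask)
    return masks
```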
A random subset (150 images) of the COCO dataset is used for evaluation.
- Images (150): `eval-set/coco/images-150`, 512x512
- Reference masks: `eval-set/coco/masks-1024`, 1024x1024
Test results should be saved in the `results/coco` directory, following the same naming convention as the CelebAMask-HQ results above.
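The evaluation code itself is not shown here, but as a sketch of how a predicted mask could be scored against a reference mask, assuming both are P-mode images of the same size that share label indices, a per-class IoU might look like this:

```python
import numpy as np
from PIL import Image

def per_class_iou(pred: Image.Image, ref: Image.Image) -> dict[int, float]:
    """Per-label IoU over palette indices; assumes equal size and shared labels."""
    p = np.asarray(pred)
    r = np.asarray(ref)
    ious = {}
    for label in np.union1d(np.unique(p), np.unique(r)):
        inter = np.logical_and(p == label, r == label).sum()
        union = np.logical_or(p == label, r == label).sum()
        ious[int(label)] = float(inter / union) if union else 0.0
    return ious
```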
Results are tracked by oxen, a version control system for large datasets.