Merged

fix #32

30 changes: 0 additions & 30 deletions src/opentau/datasets/grounding/pixmo.py
@@ -15,36 +15,6 @@

This module provides the PIXMO (Pixel-level Manipulation) dataset implementation
for training vision-language models on part localization and object grounding tasks.

The dataset contains images with point annotations for object parts, enabling models
to learn fine-grained spatial understanding.

The dataset is loaded from HuggingFace (allenai/pixmo-points) and includes
automatic retry logic for handling image download failures. Point coordinates
are normalized to a 255x255 grid and formatted as JSON strings in the postfix.

Classes:
    PixmoDataset: Dataset class that loads and formats PIXMO data for part
        localization tasks.

Functions:
    _pil_from_url: Download and decode an image from URL with retry logic.
    _get_post_fix: Convert point coordinates to normalized grid format and
        format as JSON string.
    _img_to_normalized_tensor: Convert PIL Image to normalized torch tensor.

Constants:
    IMG_SIZE: Target image size (224x224).
    POINT_GRID: Grid size for point normalization (255x255).
    MAX_RETRIES: Maximum HTTP retry attempts.
    HTTP_TIMEOUT: HTTP request timeout in seconds.

Example:
    Use PIXMO dataset in training::

        >>> from opentau.configs.default import DatasetConfig
        >>> cfg = DatasetConfig(grounding="pixmo")
        >>> dataset = make_dataset(cfg, train_cfg)
"""

import json
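
The docstring removed above documents `_get_post_fix` as mapping point annotations onto a 255x255 grid and serializing them as a JSON string in the postfix. A minimal sketch of that behavior, assuming points arrive as pixel-space `(x, y)` tuples alongside the source image dimensions; the argument names are hypothetical and the actual helper in pixmo.py may take different inputs:

```python
import json


def _get_post_fix(points, img_width, img_height, grid=255):
    # Hypothetical signature; the real pixmo.py helper may differ.
    # Scale each pixel-space (x, y) point onto a grid x grid lattice and
    # serialize the result as a JSON string for use as the answer postfix.
    normalized = [
        {"x": round(x / img_width * grid), "y": round(y / img_height * grid)}
        for x, y in points
    ]
    return json.dumps(normalized)
```

For a 224x224 input (the documented IMG_SIZE), `_get_post_fix([(112, 56)], 224, 224)` would return `'[{"x": 128, "y": 64}]'` under this sketch.
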
31 changes: 0 additions & 31 deletions src/opentau/datasets/grounding/vsr.py
@@ -17,37 +17,6 @@
models on visual spatial reasoning tasks. The dataset contains images with
statements about spatial relationships, and models must determine whether each
statement is true or false based on the image content.

The dataset is loaded from HuggingFace (cambridgeltl/vsr_random) and includes
automatic retry logic for handling image download failures. Statements are
formatted as grounding tasks with true/false labels.

Key Features:
    * Spatial reasoning: Tests understanding of spatial relationships between
      objects in images.
    * Binary classification: Simple true/false format for clear learning signal.
    * Robust loading: Automatic retry with random sampling for failed image
      downloads.

Classes:
    VSRDataset: Dataset class that loads and formats VSR data for true/false
        spatial reasoning tasks.

Functions:
    _pil_from_url: Download and decode an image from URL with retry logic.
    _img_to_normalized_tensor: Convert PIL Image to normalized torch tensor
        with channel-first format and [0, 1] normalization.

Constants:
    MAX_RETRIES: Maximum HTTP retry attempts.
    HTTP_TIMEOUT: HTTP request timeout in seconds.

Example:
    Use VSR dataset in training::

        >>> from opentau.configs.default import DatasetConfig
        >>> cfg = DatasetConfig(grounding="vsr")
        >>> dataset = make_dataset(cfg, train_cfg)
"""

import logging
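
The removed VSR docstring describes `_pil_from_url` (download with retry) and `_img_to_normalized_tensor` (channel-first tensor with values in [0, 1]). A rough sketch of both helpers under those descriptions; the constants reuse the documented names `MAX_RETRIES` and `HTTP_TIMEOUT` but their values here are placeholders, and per the docstring the real loader additionally falls back to randomly sampling another example rather than only retrying the same URL:

```python
import io

import numpy as np
import requests
import torch
from PIL import Image

MAX_RETRIES = 3     # placeholder value; the constant in vsr.py may differ
HTTP_TIMEOUT = 10   # placeholder value, in seconds


def _pil_from_url(url: str) -> Image.Image:
    # Download and decode an image, retrying transient download/decode failures.
    last_err = None
    for _ in range(MAX_RETRIES):
        try:
            resp = requests.get(url, timeout=HTTP_TIMEOUT)
            resp.raise_for_status()
            return Image.open(io.BytesIO(resp.content)).convert("RGB")
        except Exception as err:
            last_err = err
    raise RuntimeError(f"failed to fetch image from {url}") from last_err


def _img_to_normalized_tensor(img: Image.Image) -> torch.Tensor:
    # (H, W, C) uint8 image -> (C, H, W) float32 tensor with values in [0, 1].
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1)
```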