Merged
6 changes: 3 additions & 3 deletions docs/source/model.rst
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ This is the documentation for the supported models in OpenTau.
pi05
----
- Pi05 is a state-of-the-art vision-language-action flow model for general robot control. It supports both autoregressive discrete actions and flow matching continuous actions.
- More details can be found in the `paper <https://www.pi.website/download/pi05.pdf>`_.
- More details can be found in the `pi05 paper <https://www.pi.website/download/pi05.pdf>`_.
- See the implementation in `src/opentau/policies/pi05/modeling_pi05.py`.
- A checkpoint of the model finetuned on the LIBERO dataset is available on Hugging Face: `TensorAuto/tPi05-Libero <https://huggingface.co/TensorAuto/tPi05-Libero>`_
- Disclaimer: Our implementation doesn't support sub-task prediction yet, as mentioned in the paper.
@@ -15,13 +15,13 @@ pi05
pi0
----
- Pi0 is a vision-language-action flow model that only supports flow matching continuous actions.
- More details can be found in the `paper <https://www.pi.website/download/pi0.pdf>`_.
- More details can be found in the `pi0 paper <https://www.pi.website/download/pi0.pdf>`_.
- See the implementation in `src/opentau/policies/pi0/modeling_pi0.py`.
- This model can be changed to pi0-star by setting the `advantage_always_on` flag to `on`/`use` in the config file.
- A checkpoint of the model finetuned on the LIBERO dataset is available on Hugging Face: `TensorAuto/tPi0-Libero <https://huggingface.co/TensorAuto/tPi0-Libero>`_

value
-----
- The value model is a vision-language model used to predict the value of the current state. It is used to train VLA policies with the RECAP framework.
- More details can be found in the `paper <https://www.pi.website/download/pistar06.pdf>`_.
- More details can be found in the `pi*06 paper <https://www.pi.website/download/pistar06.pdf>`_.
- See the implementation in `src/opentau/policies/value/modeling_value.py`.
16 changes: 7 additions & 9 deletions docs/source/tutorials/datasets.rst
@@ -4,12 +4,12 @@ Datasets
.. note::
Make sure you have followed the :doc:`/installation` guide before proceeding.

Building a dataset mixture
Building a dataset mixture
--------------------------

You can define a dataset mixture in your configuration file using the ``dataset_mixture`` key. Here is an example:

.. code-block:: json
.. code-block:: javascript

{
"dataset_mixture": {
@@ -30,21 +30,21 @@ You can define a dataset mixture in your configuration file using the ``dataset_
...
}
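The mixing behavior implied by this config can be sketched in plain Python. This is an illustrative sketch only, assuming each dataset entry carries a sampling weight and that sampling is proportional to those weights; the dataset names and the helper function here are hypothetical, and ``WeightedDatasetMixture``'s actual strategy may differ.

```python
import random


def sample_dataset_names(weights, n, seed=0):
    """Illustrative only: draw dataset names in proportion to mixture weights."""
    rng = random.Random(seed)
    names = list(weights)
    probs = list(weights.values())
    # Each draw picks one dataset, so over many draws the counts
    # approach the configured proportions.
    return [rng.choices(names, weights=probs, k=1)[0] for _ in range(n)]


# Hypothetical mixture: 70% of samples from one dataset, 30% from another.
draws = sample_dataset_names({"libero": 0.7, "bridge": 0.3}, 1000)
counts = {name: draws.count(name) for name in set(draws)}
print(counts)
```

With a large enough number of draws, the observed counts track the configured weights, which is the balancing effect a dataset mixture is meant to provide.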

For each new dataset, you must add an entry to ``opentau/datasets/standard_data_format_mapping.py`` to map the dataset features to the Standard Data Format (see the :ref:`Standard Data Format section <concepts/standard-data-format>` in the Concepts documentation).
For each new dataset, you must add an entry to ``opentau/datasets/standard_data_format_mapping.py`` to map the dataset features to the Standard Data Format.
Alternatively, you can provide a custom mapping in the dataset config using the ``data_features_name_mapping`` and ``loss_type_mapping`` keys.
For example:

.. code-block:: json
.. code-block:: javascript

{
"dataset_mixture": {
"datasets": [
{
"repo_id": "physical-intelligence/libero"
"repo_id": "physical-intelligence/libero",
"data_features_name_mapping": {
"camera0": "observation.images.exterior_image_1_left",
"camera1": "observation.images.exterior_image_2_left",
}
"camera1": "observation.images.exterior_image_2_left"
},
"loss_type_mapping": "MSE"
},
{
@@ -73,5 +73,3 @@ Each training config should contain a dataset mixture definition. To evaluate th
--num_workers=10

This will output a token count for each language key in the dataset mixture, and save it to ``outputs/stats/token_count.json``.
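The aggregation this command performs can be sketched as follows. This is a minimal illustration, assuming a simple whitespace tokenizer and invented sample data; the real script uses the model's tokenizer and the actual language keys of the mixture.

```python
import json


def count_tokens(samples):
    """Illustrative only: tally tokens per language key.

    samples is an iterable of (language_key, text) pairs; the real
    tokenizer is model-specific, not str.split().
    """
    counts = {}
    for key, text in samples:
        counts[key] = counts.get(key, 0) + len(text.split())
    return counts


# Hypothetical language keys and prompts.
samples = [
    ("task", "pick up the red block"),
    ("task", "close the drawer"),
    ("caption", "a robot arm"),
]
print(json.dumps(count_tokens(samples)))
```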


71 changes: 71 additions & 0 deletions src/opentau/datasets/__init__.py
@@ -11,3 +11,74 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Dataset management and processing utilities for robot learning and vision-language tasks.

This module provides a comprehensive toolkit for loading, creating, managing, and
processing datasets for training vision-language-action (VLA) models. It supports
both robot learning datasets (with actions and states) and vision-language
grounding datasets (for multimodal understanding tasks).

The module is organized into several key components:

- **Core Datasets**: LeRobotDataset for robot learning data with support for
temporal alignment, multi-modal data, and version compatibility.
- **Grounding Datasets**: Vision-language datasets (CLEVR, COCO-QA, PIXMO, VSR)
for training visual understanding without robot actions.
- **Dataset Mixtures**: WeightedDatasetMixture for combining multiple datasets
with controlled sampling proportions.
- **Data Processing**: Utilities for statistics computation, image/video
handling, transforms, and format standardization.
- **Factory Functions**: High-level functions for creating datasets and mixtures
from configuration objects.

Key Features:

- **HuggingFace Integration**: Seamless loading from HuggingFace Hub with
automatic version checking and backward compatibility.
- **Temporal Alignment**: Delta timestamps enable sampling features at
different time offsets with optional Gaussian noise for data augmentation.
- **Multi-modal Support**: Handles images, videos, state vectors, actions,
and text prompts with automatic format conversion.
- **Weighted Sampling**: Combine heterogeneous datasets with configurable
sampling weights for balanced training.
- **Standard Data Format**: Unified data format across all datasets for
consistent model input/output interfaces.
- **Statistics Management**: Automatic computation and aggregation of dataset
statistics for normalization.
- **Video Handling**: Multiple video backends (torchcodec, pyav, video_reader)
for efficient frame extraction and encoding.
- **Asynchronous I/O**: High-performance image writing for real-time data
recording without blocking.

Main Modules:

- **lerobot_dataset**: Core dataset implementation for robot learning data.
- **grounding**: Vision-language grounding datasets (CLEVR, COCO-QA, PIXMO, VSR).
- **dataset_mixture**: Weighted combination of multiple datasets.
- **factory**: Factory functions for creating datasets from configurations.
- **utils**: Utility functions for I/O, metadata management, and validation.
- **compute_stats**: Statistics computation and aggregation utilities.
- **transforms**: Image transformation pipelines for data augmentation.
- **video_utils**: Video encoding, decoding, and metadata extraction.
- **image_writer**: Asynchronous image writing for high-frequency recording.
- **sampler**: Episode-aware sampling with boundary frame filtering.
- **standard_data_format_mapping**: Feature name and loss type mappings.

Example:
Create a dataset mixture from configuration:

>>> from opentau.datasets.factory import make_dataset_mixture
>>> mixture = make_dataset_mixture(train_cfg)
>>> dataloader = mixture.get_dataloader()

Load a single dataset:

>>> from opentau.datasets.factory import make_dataset
>>> dataset = make_dataset(dataset_cfg, train_cfg)

Access grounding datasets:

>>> from opentau import available_grounding_datasets
>>> print(list(available_grounding_datasets.keys()))
['clevr', 'cocoqa', 'dummy', 'pixmo', 'vsr']
"""
26 changes: 16 additions & 10 deletions src/opentau/datasets/compute_stats.py
@@ -42,16 +42,22 @@
weighted variance across multiple statistics.

Functions:
estimate_num_samples: Heuristic to estimate optimal number of samples
based on dataset size.
sample_indices: Generate evenly spaced sample indices from a dataset.
auto_downsample_height_width: Automatically downsample large images.
sample_images: Load and downsample a subset of images from file paths.
get_feature_stats: Compute statistical measures for an array.
compute_episode_stats: Compute statistics for a single episode.
aggregate_feature_stats: Aggregate statistics for a feature across
multiple episodes.
aggregate_stats: Aggregate statistics from multiple episodes/datasets.
estimate_num_samples
Heuristic to estimate optimal number of samples based on dataset size.
sample_indices
Generate evenly spaced sample indices from a dataset.
auto_downsample_height_width
Automatically downsample large images.
sample_images
Load and downsample a subset of images from file paths.
get_feature_stats
Compute statistical measures for an array.
compute_episode_stats
Compute statistics for a single episode.
aggregate_feature_stats
Aggregate statistics for a feature across multiple episodes.
aggregate_stats
Aggregate statistics from multiple episodes/datasets.
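The weighted aggregation these functions describe can be sketched with the law of total variance. This is an illustrative sketch only, assuming per-episode statistics arrive as (mean, variance, count) triples; it is not the actual signature of ``aggregate_stats``.

```python
def aggregate_means_vars(stats):
    """Illustrative only: pool per-episode (mean, var, count) triples.

    Combines within-episode variance with the spread of episode means,
    weighting each episode by its sample count.
    """
    total = sum(n for _, _, n in stats)
    mean = sum(m * n for m, _, n in stats) / total
    # Law of total variance: within-episode variance plus the squared
    # deviation of each episode mean from the pooled mean.
    var = sum(n * (v + (m - mean) ** 2) for m, v, n in stats) / total
    return mean, var
```

Pooling this way matters because simply averaging per-episode variances would ignore how far the episode means sit from each other, underestimating the overall spread.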

Example:
Compute statistics for a single episode:
2 changes: 2 additions & 0 deletions src/opentau/datasets/factory.py
@@ -101,8 +101,10 @@ def resolve_delta_timestamps(

Returns:
A 2-tuple containing:

- At index 0, a 4-tuple containing delta timestamps mean, std, lower, and upper bounds for each group.
- At index 1, a dictionary mapping feature names to their corresponding group and index.

The delta timestamps and group mapping should follow the structure expected by LeRobotDataset.
"""
group = "input_group"
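The documented return shape can be illustrated with invented values. Everything below is hypothetical (the group names, the numbers, and the number of groups are made up for illustration); only the nesting structure follows the docstring above.

```python
# Hypothetical illustration of the documented 2-tuple (all values invented).
# Index 0: a 4-tuple of per-group delta-timestamp mean, std, lower, upper.
delta_timestamps = (
    (0.0, 0.0),    # means, one entry per group
    (0.01, 0.0),   # stds
    (-0.1, 0.0),   # lower bounds
    (0.1, 0.5),    # upper bounds
)
# Index 1: feature name -> (group name, index within that group).
feature_groups = {
    "observation.state": ("input_group", 0),
    "action": ("action_group", 1),
}

group, index = feature_groups["action"]
print(group, index)
```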
23 changes: 7 additions & 16 deletions src/opentau/datasets/grounding/pixmo.py
@@ -1,4 +1,3 @@

# Copyright 2026 Tensor Auto Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -12,27 +11,18 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Datasets for Image-Text Point Set grounding tasks.
"""Datasets for Image-Text Point Set grounding tasks.

This module provides the PIXMO (Pixel-level Manipulation) dataset implementation
for training vision-language models on part localization and object grounding
tasks. The dataset contains images with point annotations for object parts,
enabling models to learn fine-grained spatial understanding.
for training vision-language models on part localization and object grounding tasks.

The dataset contains images with point annotations for object parts, enabling models
to learn fine-grained spatial understanding.

The dataset is loaded from HuggingFace (allenai/pixmo-points) and includes
automatic retry logic for handling image download failures. Point coordinates
are normalized to a 255x255 grid and formatted as JSON strings in the postfix.

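The grid normalization described above can be sketched as follows. This is an illustrative sketch only, assuming linear scaling and rounding; the helper name and the exact rounding and JSON layout used by PixmoDataset are assumptions.

```python
import json


def points_to_grid(points, width, height, grid=255):
    """Illustrative only: map pixel (x, y) points from a width x height
    image onto a normalized grid x grid coordinate space."""
    return [
        {"x": round(x / width * grid), "y": round(y / height * grid)}
        for x, y in points
    ]


# A point at the center of a 640x480 image lands near the grid center.
postfix = json.dumps(points_to_grid([(320, 240)], width=640, height=480))
print(postfix)
```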
Key Features:
- Point set grounding: Provides pixel-level point annotations for object
parts with labels.
- Robust loading: Automatic retry with random sampling for failed image
downloads.
- Grid normalization: Converts pixel coordinates to normalized grid space
for consistent representation.

Classes:
PixmoDataset: Dataset class that loads and formats PIXMO data for part
localization tasks.
@@ -50,7 +40,8 @@
HTTP_TIMEOUT: HTTP request timeout in seconds.

Example:
Use PIXMO dataset in training:
Use PIXMO dataset in training::

>>> from opentau.configs.default import DatasetConfig
>>> cfg = DatasetConfig(grounding="pixmo")
>>> dataset = make_dataset(cfg, train_cfg)
11 changes: 5 additions & 6 deletions src/opentau/datasets/grounding/vsr.py
@@ -1,4 +1,3 @@

# Copyright 2026 Tensor Auto Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -12,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""VSR (Visual Spatial Reasoning) dataset for true/false statement grounding.

This module provides the VSR dataset implementation for training vision-language
@@ -25,10 +23,10 @@
formatted as grounding tasks with true/false labels.

Key Features:
- Spatial reasoning: Tests understanding of spatial relationships between
* Spatial reasoning: Tests understanding of spatial relationships between
objects in images.
- Binary classification: Simple true/false format for clear learning signal.
- Robust loading: Automatic retry with random sampling for failed image
* Binary classification: Simple true/false format for clear learning signal.
* Robust loading: Automatic retry with random sampling for failed image
downloads.

Classes:
@@ -45,7 +43,8 @@
HTTP_TIMEOUT: HTTP request timeout in seconds.

Example:
Use VSR dataset in training:
Use VSR dataset in training::

>>> from opentau.configs.default import DatasetConfig
>>> cfg = DatasetConfig(grounding="vsr")
>>> dataset = make_dataset(cfg, train_cfg)
25 changes: 15 additions & 10 deletions src/opentau/datasets/image_writer.py
@@ -21,6 +21,7 @@
robots and recording data at high frame rates without blocking the main process.

The module supports two execution models:

1. Threading mode (num_processes=0): Creates a pool of worker threads
for concurrent image writing within a single process.
2. Multiprocessing mode (num_processes>0): Creates multiple processes,
@@ -39,18 +40,22 @@
even when exceptions occur.

Classes:
AsyncImageWriter: Main class for asynchronous image writing with
configurable threading or multiprocessing backends.

AsyncImageWriter
Main class for asynchronous image writing with configurable threading
or multiprocessing backends.

Functions:
image_array_to_pil_image: Convert numpy array to PIL Image with format
and type conversion.
write_image: Write an image (numpy array or PIL Image) to disk.
worker_thread_loop: Worker thread loop for processing image write queue.
worker_process: Worker process that manages multiple threads for image
writing.
safe_stop_image_writer: Decorator to safely stop image writer on
exceptions.
image_array_to_pil_image
Convert numpy array to PIL Image with format and type conversion.
write_image
Write an image (numpy array or PIL Image) to disk.
worker_thread_loop
Worker thread loop for processing image write queue.
worker_process
Worker process that manages multiple threads for image writing.
safe_stop_image_writer
Decorator to safely stop image writer on exceptions.

Example:
Create an async image writer with threading:
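The threading execution model described in this docstring can be sketched with the standard library alone. This is a minimal sketch, assuming a single worker thread, raw byte payloads instead of PIL image encoding, and a None sentinel for shutdown; AsyncImageWriter's real API and pool management differ.

```python
import queue
import tempfile
import threading
from pathlib import Path


def worker(q):
    # Illustrative only: drain (path, data) jobs until a None sentinel arrives.
    while True:
        job = q.get()
        if job is None:
            break
        path, data = job
        Path(path).write_bytes(data)
        q.task_done()


q = queue.Queue()
t = threading.Thread(target=worker, args=(q,), daemon=True)
t.start()

out = Path(tempfile.mkdtemp()) / "frame_000.bin"
q.put((out, b"fake image bytes"))  # enqueue without blocking the caller
q.join()                           # wait until pending writes complete
q.put(None)                        # signal shutdown
t.join()
```

Because the producer only enqueues, a control loop recording at high frequency never waits on disk I/O; the queue absorbs bursts and the worker drains them in the background.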