110 changes: 110 additions & 0 deletions docs/remote_simulation_workflow.md
@@ -0,0 +1,110 @@
# Remote Simulation & Training Workflow

This guide outlines how to develop, train, and test AlohaMini by running the heavy simulation on a remote cloud GPU (e.g., Lambda Labs, AWS) while controlling everything from your local laptop (e.g., a MacBook).

## Architecture

* **Cloud Server (The "Lab")**: Runs NVIDIA Isaac Sim. Handles physics, rendering, and training.
* **Local Machine (The "Mission Control")**: Runs the Dashboard and Teleoperation scripts. Connects to the cloud via SSH.
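
All traffic flows over two ZMQ sockets tunneled through SSH; the ports below match the simulation script used throughout this guide:

```
Local laptop (dashboard, teleop)               Cloud GPU (Isaac Sim)
  localhost:5555 ──── commands ───► SSH tunnel ───►  :5555 (Cmd)
  localhost:5556 ◄─ observations ── SSH tunnel ◄───  :5556 (Obs)
```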

## Prerequisites

1. **Cloud Instance**: A server with an NVIDIA GPU suitable for Isaac Sim (e.g., an A10, A100, or RTX 3090/4090).
* Recommended: Lambda Labs or Brev.dev (Ubuntu 20.04/22.04).
* Must have **NVIDIA Drivers** and **Isaac Sim** installed (or use the Isaac Sim Docker container).
2. **Local Machine**: Your laptop (Mac/Windows/Linux).
3. **SSH Access**: You must be able to SSH into the cloud instance.

## Setup

### 1. Cloud Server Setup
1. SSH into your cloud instance.
2. Clone this repository:
```bash
git clone https://github.com/blankey1337/AlohaMini.git
cd AlohaMini
```
3. Ensure you are using the Python environment that has access to Isaac Sim (often via `./python.sh` in the Isaac Sim installation folder).

### 2. Local Machine Setup
1. Clone this repository locally.
2. Install dependencies:
```bash
pip install -r software/requirements.txt
```

## The Workflow

### Phase 1: Data Collection

1. **Start the Simulation (Cloud)**
Run the simulation environment script. It listens for commands on port 5555 and publishes observations on port 5556.
```bash
# On Cloud
isaac_sim_python software/examples/alohamini/isaac_sim/isaac_alohamini_env.py
```

2. **Establish Connection (Local)**
Forward the ZMQ ports from the cloud to your localhost.
```bash
# On Local Mac
ssh -L 5555:localhost:5555 -L 5556:localhost:5556 ubuntu@<CLOUD_IP>
```

3. **Launch Dashboard (Local)**
Start the web dashboard to see what the robot sees.
```bash
# On Local Mac
python software/dashboard/app.py
```
Open `http://localhost:5001` in your browser.

4. **Teleoperate & Record**
* Use the Dashboard to see the camera feed.
* Run the teleop script in another terminal to control the robot with your keyboard:
```bash
python software/examples/alohamini/standalone_teleop.py --ip 127.0.0.1
```
* **To Record**: Click the **"Start Recording"** button on the Dashboard.
* Perform the task (e.g., pick up the object).
* Click **"Stop Recording"**.
* Repeat 50-100 times. The data is saved to `AlohaMini/data_sim/` on the **Cloud Server** (see the example episode layout just after this list).
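
Each episode lands in its own timestamped folder of per-frame JSON (state + action) plus JPEG images, written by the `DatasetRecorder` in `isaac_alohamini_env.py`. The episode timestamp and camera name below are illustrative:

```
data_sim/
└── episode_20240101_120000/
    ├── frame_000000.json        # state + commanded action
    ├── frame_000001.json
    └── images/
        ├── head_top_000000.jpg
        └── head_top_000001.jpg
```

To sanity-check a recording before training, here is a minimal loader sketch assuming that layout (`load_episode` is a hypothetical helper, not part of the repo):

```python
import glob
import json
import os

import cv2

def load_episode(episode_dir):
    """Read back one recorded episode: per-frame JSON plus matching JPEGs."""
    frames = []
    for path in sorted(glob.glob(os.path.join(episode_dir, "frame_*.json"))):
        with open(path) as f:
            frame = json.load(f)
        idx = os.path.basename(path)[len("frame_"):-len(".json")]
        # Images are saved as images/<camera>_<frame_idx>.jpg by DatasetRecorder
        frame["images"] = {
            os.path.basename(p)[:-len(f"_{idx}.jpg")]: cv2.imread(p)
            for p in glob.glob(os.path.join(episode_dir, "images", f"*_{idx}.jpg"))
        }
        frames.append(frame)
    return frames
```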

### Phase 2: Training

You train the model directly on the Cloud GPU where the data lives.

1. **Stop the Simulation** (to free up GPU VRAM).
2. **Run Training**:
Use the LeRobot training script (or your custom training script) pointing to the generated dataset.
```bash
# On Cloud
python software/src/lerobot/scripts/train.py \
--dataset data_sim \
--policy act \
--batch_size 8 \
--num_epochs 1000
```
*Note: Exact training command depends on the LeRobot configuration.*

3. **Output**: This produces a model file (e.g., `outputs/train/policy.safetensors`).

### Phase 3: Evaluation

Test the trained model in the simulator to see if it works.

1. **Restart Simulation (Cloud)**:
```bash
isaac_sim_python software/examples/alohamini/isaac_sim/isaac_alohamini_env.py
```
2. **Run Inference Node (Cloud or Local)**:
You need a script that loads the model and closes the loop (reads obs -> runs model -> sends action); a sketch follows this list.
* *Coming Soon: `eval_sim.py`, which loads the safetensors file and drives the ZMQ robot.*

3. **Watch (Local)**:
Use the Dashboard to watch the robot perform the task autonomously.
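
Until `eval_sim.py` exists, here is a minimal sketch of such a closed-loop node. It assumes observations arrive as single-part JSON on port 5556 and that the simulator accepts JSON joint-target commands on port 5555 (the same convention the dashboard uses); `policy` is a placeholder for whatever loads and runs your trained model:

```python
import json

import zmq

def run_eval(policy, ip="127.0.0.1", obs_port=5556, cmd_port=5555):
    """Closed-loop evaluation: read obs -> run model -> send action."""
    ctx = zmq.Context()

    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.CONFLATE, 1)   # keep only the newest observation
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect(f"tcp://{ip}:{obs_port}")

    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.CONFLATE, 1)
    push.connect(f"tcp://{ip}:{cmd_port}")

    while True:
        obs = json.loads(sub.recv_string())   # observation dict from the sim
        action = policy(obs)                  # e.g. {"left_joint1.pos": 0.12, ...}
        push.send_string(json.dumps(action))
```

Run it wherever the model weights live: on the cloud against `127.0.0.1`, or locally through the same SSH tunnel used for teleoperation.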

## Troubleshooting

* **Laggy Video**: ZMQ over SSH tunneling is usually fast enough for 640x480 streams; if it lags, check the bandwidth and latency of your connection to the cloud server.
* **"Address already in use"**: Ensure no other python scripts are using ports 5555/5556 on either machine.
42 changes: 39 additions & 3 deletions software/dashboard/app.py
@@ -5,23 +5,32 @@
import zmq
import cv2
import numpy as np
from flask import Flask, render_template, Response, jsonify
from flask import Flask, render_template, Response, jsonify, request

app = Flask(__name__)

# Global state
latest_observation = {}
lock = threading.Lock()
connected = False
recording = False
cmd_socket = None

def zmq_worker(ip='127.0.0.1', port=5556):
global latest_observation, connected
def zmq_worker(ip='127.0.0.1', port=5556, cmd_port=5555):
global latest_observation, connected, cmd_socket
context = zmq.Context()

# Sub Socket (CONFLATE must be set before connect for it to take effect)
socket = context.socket(zmq.SUB)
socket.setsockopt(zmq.CONFLATE, 1)
socket.setsockopt(zmq.SUBSCRIBE, b"")
socket.connect(f"tcp://{ip}:{port}")

# Cmd Socket (Push to Sim)
cmd_socket = context.socket(zmq.PUSH)
cmd_socket.setsockopt(zmq.CONFLATE, 1)
cmd_socket.connect(f"tcp://{ip}:{cmd_port}")

print(f"Connecting to ZMQ Stream at {ip}:{port}...")

while True:
@@ -93,6 +102,33 @@ def video_feed(camera_name):
return Response(generate_frames(camera_name),
mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/api/command', methods=['POST'])
def send_command():
global cmd_socket
if not request.json or 'command' not in request.json:
return jsonify({'error': 'No command provided'}), 400

cmd = request.json['command']
print(f"Received command: {cmd}")

# Example handling
if cmd == 'reset_sim':
# Send reset command (Isaac Sim needs to handle this logic)
# For now, we can just zero out velocities or send a special flag
if cmd_socket:
cmd_socket.send_string(json.dumps({"reset": True}))

elif cmd == 'start_recording':
# Trigger recording logic
if cmd_socket:
cmd_socket.send_string(json.dumps({"start_recording": True}))

elif cmd == 'stop_recording':
if cmd_socket:
cmd_socket.send_string(json.dumps({"stop_recording": True}))

return jsonify({'status': 'ok'})

@app.route('/api/status')
def get_status():
with lock:
64 changes: 63 additions & 1 deletion software/dashboard/templates/index.html
@@ -11,14 +11,27 @@
.camera-box img { max-width: 100%; height: auto; display: block; }
.camera-title { text-align: center; font-size: 0.9em; margin-bottom: 5px; }
.status-panel { flex: 1; background: #333; padding: 15px; border-radius: 5px; min-width: 300px; }
.controls-panel { width: 100%; background: #333; padding: 15px; border-radius: 5px; margin-top: 20px; }
table { width: 100%; border-collapse: collapse; }
td, th { padding: 5px; border-bottom: 1px solid #444; font-size: 0.9em; }
th { text-align: left; color: #aaa; }
.value { font-family: monospace; color: #4f4; }
.btn { padding: 10px 20px; font-size: 16px; margin-right: 10px; cursor: pointer; background: #555; color: white; border: none; border-radius: 4px; }
.btn:hover { background: #777; }
.btn-record { background: #c00; }
.btn-record:hover { background: #e00; }
.status-indicator { display: inline-block; width: 10px; height: 10px; border-radius: 50%; margin-right: 5px; }
.status-active { background: #0f0; }
.status-inactive { background: #555; }
</style>
</head>
<body>
<h1>AlohaMini Dashboard</h1>
<div style="display:flex; justify-content:space-between; align-items:center;">
<h1>AlohaMini Dashboard</h1>
<div>
<span id="connectionStatus" class="status-indicator status-inactive"></span> <span id="connectionText">Disconnected</span>
</div>
</div>

<div class="container">
<div class="camera-grid" id="cameraGrid">
@@ -33,14 +46,63 @@ <h3>Robot State</h3>
</div>
</div>

<div class="controls-panel">
<h3>Simulation Controls</h3>
<p>Control the remote simulation directly from here.</p>
<button class="btn" onclick="sendCommand('reset_sim')">Reset Simulation</button>
<button class="btn btn-record" id="recordBtn" onclick="toggleRecording()">Start Recording</button>
<span id="recordingStatus" style="margin-left: 10px; color: #aaa;"></span>
</div>

<script>
const knownCameras = ['head_top', 'head_back', 'head_front', 'wrist_left', 'wrist_right', 'cam_high', 'cam_low', 'cam_left_wrist', 'cam_right_wrist'];
const activeCameras = new Set();

let isRecording = false;

function toggleRecording() {
isRecording = !isRecording;
const btn = document.getElementById('recordBtn');
const status = document.getElementById('recordingStatus');

if (isRecording) {
btn.textContent = "Stop Recording";
btn.style.background = "#555";
status.textContent = "Recording in progress...";
sendCommand('start_recording');
} else {
btn.textContent = "Start Recording";
btn.style.background = "#c00";
status.textContent = "Saved.";
sendCommand('stop_recording');
setTimeout(() => status.textContent = "", 3000);
}
}

function sendCommand(cmd) {
console.log("Sending command:", cmd);
fetch('/api/command', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ command: cmd })
});
}

function updateStatus() {
fetch('/api/status')
.then(response => response.json())
.then(data => {
// Update connection indicator
const connInd = document.getElementById('connectionStatus');
const connText = document.getElementById('connectionText');
if (data.connected) {
connInd.className = 'status-indicator status-active';
connText.textContent = "Connected";
} else {
connInd.className = 'status-indicator status-inactive';
connText.textContent = "Disconnected";
}

const table = document.getElementById('statusTable');
let rows = '';

83 changes: 83 additions & 0 deletions software/examples/alohamini/isaac_sim/isaac_alohamini_env.py
@@ -1,7 +1,10 @@
import base64
import json
import os
import random
import sys
import time
from datetime import datetime

import cv2
import numpy as np
@@ -40,6 +43,54 @@
# Locate URDF
URDF_PATH = os.path.join(repo_root, "software/src/lerobot/robots/alohamini/alohamini.urdf")

class DatasetRecorder:
def __init__(self, root_dir="data"):
self.root_dir = root_dir
self.is_recording = False
self.current_episode_dir = None
self.frame_idx = 0
self.episode_idx = 0

if not os.path.exists(root_dir):
os.makedirs(root_dir)

def start_recording(self):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
self.current_episode_dir = os.path.join(self.root_dir, f"episode_{timestamp}")
os.makedirs(self.current_episode_dir)
os.makedirs(os.path.join(self.current_episode_dir, "images"))
self.frame_idx = 0
self.is_recording = True
print(f"Started recording to {self.current_episode_dir}")

def stop_recording(self):
if self.is_recording:
print(f"Stopped recording. Saved {self.frame_idx} frames.")
self.is_recording = False
self.current_episode_dir = None

def save_frame(self, obs, action):
if not self.is_recording:
return

# Save JSON data (state + action)
data = {
"timestamp": time.time(),
"observation": {k: v for k, v in obs.items() if not isinstance(v, np.ndarray)},
"action": action
}

with open(os.path.join(self.current_episode_dir, f"frame_{self.frame_idx:06d}.json"), "w") as f:
json.dump(data, f)

# Save Images
for k, v in obs.items():
if isinstance(v, np.ndarray): # Image
img_path = os.path.join(self.current_episode_dir, "images", f"{k}_{self.frame_idx:06d}.jpg")
cv2.imwrite(img_path, v)

self.frame_idx += 1

class IsaacAlohaMini:
def __init__(self, world, urdf_path):
self.world = world
@@ -188,6 +239,8 @@ def main():

print(f"Isaac Sim AlohaMini running. Ports: OBS={PORT_OBS}, CMD={PORT_CMD}")

recorder = DatasetRecorder(root_dir="data_sim")

while simulation_app.is_running():
world.step(render=True)

@@ -203,7 +256,23 @@ def main():
joint_cmds = {}
vx, vy, vth = 0, 0, 0

# Check for system commands
if "start_recording" in cmd:
recorder.start_recording()
continue
if "stop_recording" in cmd:
recorder.stop_recording()
continue

for k, v in cmd.items():
if k == "reset" and v is True:
# Reset
print("Resetting robot...")
world.reset()
joint_cmds = {name: 0.0 for name in aloha.dof_names}
aloha.set_joint_positions(joint_cmds)
continue

if k.endswith(".pos"):
joint_name = k.replace(".pos", "")
joint_cmds[joint_name] = v
@@ -226,6 +295,20 @@

# 2. Get Obs & Publish
obs = aloha.get_observations()

# Save frame if recording
# (Pass current command as action label for now, though it's imperfect as it's the *commanded* not *measured* action)
if recorder.is_recording:
# Reconstruct action dict from parsed values
# This is a simplification; ideally we log exactly what we sent
action_log = {
"x.vel": vx,
"y.vel": vy,
"theta.vel": vth,
# Add arm joint targets if we had them easily accessible here
}
recorder.save_frame(obs, action_log)

encoded_obs = {}

# Process images
Expand Down