3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -14,7 +14,7 @@ repos:
"--exclude=__init__.py",
]
- repo: https://github.com/PyCQA/flake8
rev: 4.0.1
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/PyCQA/isort
@@ -30,6 +30,7 @@ repos:
hooks:
- id: codespell
args: ['--ignore-words-list=ro']
exclude: '\.ipynb$'
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.1.0
hooks:
41 changes: 21 additions & 20 deletions README.md
@@ -33,9 +33,10 @@ The toolbox supports the most comprehensive 6 datasets \& benchmarks and 10+ pop

The toolbox supports InternData-N1, the most advanced high-quality navigation dataset, which includes 3k+ scenes and 830k VLN samples covering diverse embodiments and scenes, as well as InternVLA-N1, the first dual-system navigation foundation model, which delivers leading performance on all benchmarks and zero-shot generalization in the real world.

## 🔥 News
- [2025/10] Add a simple [inference-only demo](scripts/eval/inference_only_demo.ipynb) of InternVLA-N1.
- [2025/10] InternVLA-N1 [technique report](https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf) is released. Please check our [homepage](https://internrobotics.github.io/internvla-n1.github.io/).
- [2025/09] Real-world deployment code of InternVLA-N1 is released.
- [2025/07] We are hosting 🏆IROS 2025 Grand Challenge, stay tuned at [official website](https://internrobotics.shlab.org.cn/challenge/2025/).
- [2025/07] InternNav v0.1.1 released.

@@ -144,38 +145,38 @@ Please refer to the [documentation](https://internrobotics.github.io/user_guide/
#### VLN-CE Task
| Model | Dataset/Benchmark | NE | OS | SR | SPL | Download |
| ------ | ----------------- | -- | -- | --------- | -- | --------- |
| `InternVLA-N1 (S2)` | R2R | 4.89 | 60.6 | 55.4 | 52.1 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-S2) |
| `InternVLA-N1` | R2R | **4.83** | **63.3** | **58.2** | **54.0** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
| `InternVLA-N1 (S2)` | RxR | 6.67 | 56.5 | 48.6 | 42.6 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-S2) |
| `InternVLA-N1` | RxR | **5.91** | **60.8** | **53.5** | **46.1** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
| `InternVLA-N1-Preview (S2)` | R2R | 5.09 | 60.9 | 53.7 | 49.7 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview-S2) |
| `InternVLA-N1-Preview` | R2R | **4.76** | **63.4** | **56.7** | **52.6** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
| `InternVLA-N1-Preview (S2)` | RxR | 6.39 | 60.1 | 50.5 | 43.3 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview-S2) |
| `InternVLA-N1-Preview` | RxR | **5.65** | **63.2** | **53.5** | **45.7** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |

#### VLN-PE Task
| Model | Dataset/Benchmark | NE | OS | SR | SPL | Download |
| ------ | ----------------- | -- | -- | -- | --- | --- |
| `Seq2Seq` | Flash | 8.27 | 43.0 | 15.7 | 9.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `CMA` | Flash | 7.52 | 45.0 | 24.4 | 18.2 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `RDP` | Flash | 6.98 | 42.5 | 24.9 | 17.5 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `InternVLA-N1-Preview` | Flash | **4.21** | **68.0** | **59.8** | **54.0** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
| `InternVLA-N1` | Flash | **4.13** | **67.6** | **60.4** | **54.9** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
| `Seq2Seq` | Physical | 7.88 | 28.1 | 15.1 | 10.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `CMA` | Physical | 7.26 | 31.4 | 22.1 | 18.6 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `RDP` | Physical | 6.72 | 36.9 | 25.2 | 17.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
| `InternVLA-N1-Preview` | Physical | **5.31** | **49.0** | **42.6** | **35.8** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
| `InternVLA-N1` | Physical | **4.73** | **56.7** | **50.6** | **43.3** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |

#### Visual Navigation Task - PointGoal Navigation
| Model | Dataset/Benchmark | SR | SPL | Download |
| ------ | ----------------- | -- | -- | --------- |
| `iPlanner` | ClutteredEnv | 84.8 | 83.6 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
| `ViPlanner` | ClutteredEnv | 72.4 | 72.3 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
| `InternVLA-N1 (S1)` | ClutteredEnv | **89.8** | **87.7** | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
| `iPlanner` | InternScenes | 48.8 | 46.7 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
| `ViPlanner` | InternScenes | 54.3 | 52.5 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
| `InternVLA-N1 (S1)` | InternScenes | **65.7** | **60.7** | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |



@@ -243,7 +244,7 @@ If you use the specific pretrained models and benchmarks, please kindly cite the

## 📄 License

InternNav's codes are [MIT licensed](LICENSE).
The open-sourced InternData-N1 data are under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License </a><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a>.
Other datasets, such as VLN-CE, retain their own distribution licenses.

Binary file added assets/realworld_sample_data.tar.gz
Binary file not shown.
13 changes: 10 additions & 3 deletions internnav/agent/internvla_n1_agent_realworld.py
@@ -27,6 +27,7 @@ class InternVLAN1AsyncAgent:
def __init__(self, args):
self.device = torch.device(args.device)
self.save_dir = "test_data/" + datetime.now().strftime("%Y%m%d_%H%M%S")
print(f"args.model_path{args.model_path}")
self.model = InternVLAN1ForCausalLM.from_pretrained(
args.model_path,
torch_dtype=torch.bfloat16,
@@ -42,6 +43,7 @@ def __init__(self, args):
self.resize_w = args.resize_w
self.resize_h = args.resize_h
self.num_history = args.num_history
self.PLAN_STEP_GAP = args.plan_step_gap

prompt = "You are an autonomous navigation assistant. Your task is to <instruction>. Where should you go next to stay on track? Please output the next waypoint's coordinates in the image. Please output STOP when you have successfully completed the task."
answer = ""
Expand Down Expand Up @@ -91,6 +93,12 @@ def reset(self):
self.llm_output = ""
self.past_key_values = None

self.output_action = None
self.output_latent = None
self.output_pixel = None
self.pixel_goal_rgb = None
self.pixel_goal_depth = None

self.save_dir = "test_data/" + datetime.now().strftime("%Y%m%d_%H%M%S")
os.makedirs(self.save_dir, exist_ok=True)

@@ -118,9 +126,8 @@ def trajectory_tovw(self, trajectory, kp=1.0):

def step(self, rgb, depth, pose, instruction, intrinsic, look_down=False):
dual_sys_output = S2Output()
PLAN_STEP_GAP = 8
no_output_flag = self.output_action is None and self.output_latent is None
if (self.episode_idx - self.last_s2_idx > PLAN_STEP_GAP) or look_down or no_output_flag:
if (self.episode_idx - self.last_s2_idx > self.PLAN_STEP_GAP) or look_down or no_output_flag:
self.output_action, self.output_latent, self.output_pixel = self.step_s2(
rgb, depth, pose, instruction, intrinsic, look_down
)
@@ -152,7 +159,7 @@ def step(self, rgb, depth, pose, instruction, intrinsic, look_down=False):
)
trajectories = self.step_s1(self.output_latent, rgbs, depths)

dual_sys_output.output_action = traj_to_actions(trajectories)
dual_sys_output.output_trajectory = traj_to_actions(trajectories, use_discrate_action=False)

return dual_sys_output

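The diff above replaces the hard-coded `PLAN_STEP_GAP = 8` in `step()` with a `plan_step_gap` argument and caches S2 outputs across steps (now also cleared in `reset()`). A minimal sketch of that gating logic, assuming a hypothetical `Args` container in place of the real argument parser:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Args:
    plan_step_gap: int = 8  # hypothetical default, mirroring the old hard-coded value

class CadenceSketch:
    """Runs the slow S2 planner only every `plan_step_gap` steps."""

    def __init__(self, args: Args):
        self.PLAN_STEP_GAP = args.plan_step_gap
        self.episode_idx = 0   # current step index
        self.last_s2_idx = 0   # step index of the last S2 call
        self.output_action: Optional[object] = None
        self.output_latent: Optional[object] = None

    def should_run_s2(self, look_down: bool = False) -> bool:
        # Same three triggers as the diff: the gap has elapsed, a forced
        # look-down replan, or no cached S2 output yet.
        no_output_flag = self.output_action is None and self.output_latent is None
        return (
            self.episode_idx - self.last_s2_idx > self.PLAN_STEP_GAP
            or look_down
            or no_output_flag
        )
```

Between S2 calls the fast S1 policy keeps consuming the cached `output_latent`, which is why `reset()` now clears those fields explicitly.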
42 changes: 27 additions & 15 deletions internnav/model/utils/vln_utils.py
@@ -1,10 +1,10 @@
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np
import torch
from PIL import Image
from torch import Tensor


def open_image(image_or_image_path):
@@ -15,9 +15,11 @@ def open_image(image_or_image_path):
else:
raise ValueError("Unsupported input type!")


def split_and_clean(text):
# Split by <image> while preserving the delimiter
import re

parts = re.split(r'(<image>)', text)
results = []
for part in parts:
@@ -30,10 +32,11 @@ def split_and_clean(text):
results.append(clean_part)
return results


def chunk_token(dp_actions):
out_list = []
out_list_read = []

for i in range(len(dp_actions)):
xyyaw = dp_actions[i]
x = xyyaw[0]
@@ -56,7 +59,8 @@

return out_list

def traj_to_actions(dp_actions):

def traj_to_actions(dp_actions, use_discrate_action=True):
def reconstruct_xy_from_delta(delta_xyt):
"""
Input:
@@ -84,7 +88,7 @@ def trajectory_to_discrete_actions_close_to_goal(trajectory, step_size=0.25, tur
turn_angle_rad = np.deg2rad(turn_angle_deg)
traj = trajectory
goal = trajectory[-1]

def normalize_angle(angle):
return (angle + np.pi) % (2 * np.pi) - np.pi

@@ -120,13 +124,17 @@ def normalize_angle(angle):
pos = next_pos

return actions

# unnormalize
dp_actions[:, :, :2] /= 4.0
all_trajectory = reconstruct_xy_from_delta(dp_actions.float().cpu().numpy())
trajectory = np.mean(all_trajectory, axis=0)
actions = trajectory_to_discrete_actions_close_to_goal(trajectory)
return actions
if use_discrate_action:
actions = trajectory_to_discrete_actions_close_to_goal(trajectory)
return actions
else:
return trajectory
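
With the new `use_discrate_action` flag, the same helper can return either discrete actions (the previous behavior) or the raw averaged trajectory, which the real-world agent now stores in `output_trajectory`. A hedged usage sketch; the `(batch, horizon, 3)` shape is an assumption inferred from the `[:, :, :2]` indexing above, and the random inputs are illustrative only:

```python
import torch

from internnav.model.utils.vln_utils import traj_to_actions

# Hypothetical diffusion-policy output: (batch, horizon, 3) deltas (dx, dy, dyaw).
dp_actions = torch.randn(4, 8, 3)

# traj_to_actions unnormalizes in place, so clone before calling twice.
discrete = traj_to_actions(dp_actions.clone())  # default: discrete action sequence
trajectory = traj_to_actions(dp_actions.clone(), use_discrate_action=False)  # continuous waypoints
```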


@dataclass
class S2Input:
@@ -138,29 +146,33 @@ class S2Input:
look_down: Optional[bool] = False
should_infer: Optional[bool] = False


@dataclass
class S2Output:
idx: Optional[int] = -1
is_infering: Optional[bool] = False
output_action: Optional[np.ndarray] = None
output_trajectory: Optional[np.ndarray] = None
output_pixel: Optional[np.ndarray] = None
output_latent: Optional[torch.Tensor] = None
rgb_memory: Optional[np.ndarray] = None  # records the RGB of the pixel-goal frame
depth_memory: Optional[np.ndarray] = None  # records the depth of the pixel-goal frame

def validate(self):
"""确保output_action、output_pixel和output_latent中只有一个为非None"""
outputs = [self.output_action, self.output_pixel, self.output_latent]
non_none_count = sum(1 for x in outputs if x is not None)
return non_none_count > 0 and self.idx >= 0



@dataclass
class S1Input:
pixel_goal: Optional[np.ndarray] = None
latent: Optional[np.ndarray] = None
rgb: Optional[np.ndarray] = None
depth: Optional[np.ndarray] = None


@dataclass
class S1Output:
# idx: Optional[int] = None
@@ -171,7 +183,6 @@ class S1Output:
vis_image: Optional[np.ndarray] = None



def image_resize(
img: Tensor,
size: Tuple[int, int],
@@ -241,6 +252,7 @@

return float(rho), float(theta)
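
The body of the polar-goal helper above is collapsed in this diff; only its `return float(rho), float(theta)` tail is visible. A self-contained sketch of the standard rho/theta computation it appears to implement, under an assumed counter-clockwise heading convention:

```python
import numpy as np

# Robot at the origin facing +x; goal one meter to the side (hypothetical values).
curr_pos = np.array([0.0, 0.0])
curr_heading = 0.0
curr_goal = np.array([0.0, 1.0])

delta = curr_goal - curr_pos
rho = float(np.linalg.norm(delta))  # distance to goal
# Relative heading, wrapped to [-pi, pi).
theta = float((np.arctan2(delta[1], delta[0]) - curr_heading + np.pi) % (2 * np.pi) - np.pi)
print(rho, theta)  # 1.0, pi/2 under this assumed convention
```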


def get_rotation_matrix(angle: float, ndims: int = 2) -> np.ndarray:
"""Returns a 2x2 or 3x3 rotation matrix for a given angle; if 3x3, the z-axis is
rotated."""
@@ -260,4 +272,4 @@ def get_rotation_matrix(angle: float, ndims: int = 2) -> np.ndarray:
]
)
else:
raise ValueError("ndims must be 2 or 3")
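
The 2x2 branch of `get_rotation_matrix` is collapsed above; per the docstring it returns a rotation matrix for the given angle. As a reference, a standard counter-clockwise 2D rotation (an assumption about that branch's convention) behaves like this minimal sketch:

```python
import numpy as np

def rotation_matrix_2d(angle: float) -> np.ndarray:
    # Standard counter-clockwise 2D rotation matrix.
    return np.array([
        [np.cos(angle), -np.sin(angle)],
        [np.sin(angle), np.cos(angle)],
    ])

# Rotating the x-axis by +90 degrees yields the y-axis.
v = rotation_matrix_2d(np.pi / 2) @ np.array([1.0, 0.0])
assert np.allclose(v, [0.0, 1.0])
```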