diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..e52a5ff --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +**/__pycache__/ +.DS_Store +lamaria_data/ +demo/ +demo_outputs/ diff --git a/README.md b/README.md index 3f1b51f..478209a 100644 --- a/README.md +++ b/README.md @@ -1 +1,259 @@ -# lamaria \ No newline at end of file +

+# The LaMAria Dataset
+### Benchmarking Egocentric Visual-Inertial SLAM at City Scale
+
+Anusha Krishnan\*, Shaohui Liu\*, Paul-Edouard Sarlin\*, Oscar Gentilhomme,
+David Caruso, Maurizio Monge, Richard Newcombe, Jakob Engel, Marc Pollefeys
+
+\*Equal contribution
+
+Website | Paper
+
+We present **LaMAria**, an egocentric, city-scale benchmark for **visual-inertial SLAM**, featuring
+~ **22 hours / 70 km** of trajectories with survey-grade control points providing **centimeter-accurate ground truth**.
+
+Using **LaMAria**, you can:
+- Evaluate SLAM systems under real-world egocentric conditions: low light, moving platforms, exposure changes, and time-varying sensor calibrations.
+- Benchmark against highly accurate ground truth.
+
+This dataset offers 23 training sequences and 63 test sequences.
+

+![Overview of LaMAria](assets/teaser_final.png)
+
+*Figure 1: Overview of the LaMAria dataset and benchmark.*

+To learn more about the dataset, please refer to our main dataset website or our paper.
+
+## Table of Contents
+- [Installation](#installation)
+- [Downloading the Dataset](#downloading-the-dataset)
+- [Evaluation](#evaluation)
+  - [Evaluation w.r.t. Control Points](#evaluation-wrt-control-points)
+  - [Evaluation w.r.t. Pseudo-GT](#evaluation-wrt-pseudo-gt)
+  - [EVO Evaluation w.r.t. MPS](#evo-evaluation-wrt-mps)
+- [Converting VRS to ASL/ROSbag format](#converting-vrs-to-aslrosbag-format)
+- [Example Visual-Inertial Optimization](#example-visual-inertial-optimization)
+- [BibTeX Citation](#bibtex-citation)
+
+## Installation
+
+Create an environment:
+```bash
+python3 -m venv lamaria_env
+source lamaria_env/bin/activate
+```
+
+Install the package:
+```bash
+git clone git@github.com:cvg/lamaria.git
+cd lamaria
+python -m pip install -r requirements.txt
+python -m pip install -e .
+```
+
+## Downloading the Dataset
+Our dataset is fully hosted via the archive.
+
+We provide a downloading script [here](tools/download_lamaria.py):
+```bash
+python -m tools.download_lamaria --help
+```
+
+The script can be used to download:
+- Specific sequences or entire sets (training/test).
+- Specific types:
+  - Raw - Downloads the raw `.vrs` files and the Aria calibration file.
+  - ASL - Downloads the ASL folder and the pinhole calibration file.
+  - ROSbag - Downloads the ROS1 bag and the pinhole calibration file.
+  - All - Downloads both the raw and the pinhole data.
+
+Ground-truth files are automatically downloaded for the training sequences.
+
+### Data sizes
+- Raw data (vrs + aria_calibrations): ~890G (training: ~117G + test: ~773G)
+- ASL folder + pinhole_calibrations: ~1.1T (training: ~170G + test: ~884G)
+- ROSbag + pinhole_calibrations: ~1.5T (training: ~247G + test: ~1.3T)
+
+### Some example commands
+
+To download the raw data of a specific sequence (e.g., `R_01_easy`):
+```bash
+python -m tools.download_lamaria --output_dir ./lamaria_data \
+    --sequences R_01_easy --type raw
+```
+To download three custom sequences in ROSbag format:
+```bash
+python -m tools.download_lamaria --output_dir ./lamaria_data \
+    --sequences R_01_easy R_02_easy R_03_easy --type rosbag
+```
+To download three custom sequences in ASL format:
+```bash
+python -m tools.download_lamaria --output_dir ./lamaria_data \
+    --sequences R_01_easy R_02_easy R_03_easy --type asl
+```
+To download all training sequences in both raw and pinhole formats:
+```bash
+python -m tools.download_lamaria --output_dir ./lamaria_data \
+    --set training --type all
+```
+
+The downloaded raw data is stored in the following way:
+```
+output_dir/
+├── training/
+│   ├── R_01_easy/
+│   │   ├── aria_calibrations/
+│   │   │   └── R_01_easy.json
+│   │   ├── ground_truth/
+│   │   │   ├── pGT/
+│   │   │   │   └── R_01_easy.txt
+│   │   │   └── control_points/
+│   │   │       └── ...  # if the sequence has CPs
+│   │   └── raw_data/
+│   │       └── R_01_easy.vrs
+│   └── ...
+└── test/  # no ground truth
+    └── sequence_1_1
+```
+
+For more information about the training and test sequences, refer to the dataset details. To learn more about the various data formats, calibration files, and ground truths, visit our documentation.
+
+## Evaluation
+Our training and test sequences are categorized into varying challenges. To evaluate your SLAM results on our data, we provide two options:
+1. **Evaluation via the website**: Upload your results on our website to get evaluation results. **All sequences can be submitted via our website.** Results on test sequences are displayed on the public [leaderboard](https://www.lamaria.ethz.ch/leaderboard).
+2. **Standalone evaluation**: Run the evaluation scripts locally using the provided `lamaria` package. These scripts can be run only on the training sequences, since ground truth is required.
+
+### Input format
+The input pose estimate file must be a text file where each line corresponds to a timestamped pose in the following format:
+```
+timestamp tx ty tz qx qy qz qw
+```
+The timestamp must be in nanoseconds.
+
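+For reference, such a file can be parsed with a few lines of Python. This is a minimal sketch with a hypothetical `load_estimate` helper (the `lamaria` package ships its own loader, `Trajectory.load_from_file`):
+```python
+import numpy as np
+
+def load_estimate(path):
+    """Parse lines of the form 'timestamp tx ty tz qx qy qz qw'."""
+    timestamps, poses = [], []
+    with open(path) as f:
+        for line in f:
+            fields = line.split()
+            if len(fields) != 8:
+                continue  # skip empty or malformed lines
+            timestamps.append(int(fields[0]))  # nanoseconds
+            poses.append([float(v) for v in fields[1:]])  # tx ty tz qx qy qz qw
+    return np.asarray(timestamps, dtype=np.int64), np.asarray(poses)
+```
+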
+### Download demo data
+We provide a small script `quickstart.sh` that downloads demo data from the archive. The standalone evaluations and the example visual-inertial optimization can be run on the downloaded data.
+
+```bash
+chmod +x quickstart.sh
+./quickstart.sh
+```
+
+The first sequences of the controlled experimental set (R_01_easy) and of the additional set (sequence_1_19) are stored in the `demo/` folder.
+
+### Evaluation w.r.t. Control Points
+Three sequences of the controlled experimental set, as well as the additional-set sequences, can be evaluated w.r.t. control points. This script computes the score and the control point recall based on the alignment of the estimated trajectory to the control points.
+
+To perform the evaluation on the downloaded demo data:
+```bash
+python -m evaluate_wrt_control_points --estimate demo/estimate/sequence_1_19.txt \
+    --cp_json_file demo/sequence_1_19/ground_truth/control_points/sequence_1_19.json \
+    --device_calibration_json demo/sequence_1_19/aria_calibrations/sequence_1_19.json \
+    --corresponding_sensor imu \
+    --output_path demo_outputs/eval_cp
+```
+
+*This command evaluates the provided estimate w.r.t. control points and stores the results in `demo_outputs/eval_cp`. The `--corresponding_sensor` flag indicates which sensor the poses are expressed in (e.g., `imu` or `cam0`).*
+
+To learn more about the control points and the sparse evaluation, refer to Sections 4.1 and 4.2 of our paper.
+
+### Evaluation w.r.t. Pseudo-GT
+This script evaluates the estimated trajectory w.r.t. the pseudo-dense ground truth from our ground-truthing pipeline. It requires the alignment obtained from the sparse evaluation (w.r.t. control points). The script computes the pose recall @ 1m and @ 5m after aligning the estimated trajectory to the pseudo-ground truth.
+
+To perform the evaluation on the downloaded demo data:
+```bash
+python -m evaluate_wrt_pgt --estimate demo/estimate/sequence_1_19.txt \
+    --gt_estimate demo/sequence_1_19/ground_truth/pGT/sequence_1_19.txt \
+    --sparse_eval_result demo_outputs/eval_cp/sparse_eval_result.npy
+```
+
+### EVO Evaluation w.r.t. MPS
+This script evaluates the estimated trajectory w.r.t. the pseudo-dense ground truth from Machine Perception Services (MPS). It computes the Absolute Trajectory Error (ATE) RMSE between the two trajectories.
+
+To perform the evaluation on the downloaded demo data:
+```bash
+python -m evaluate_wrt_mps --estimate demo/estimate/R_01_easy.txt \
+    --gt_estimate demo/R_01_easy/ground_truth/pGT/R_01_easy.txt
+```
+
+This method is used to evaluate results on the controlled experimental set, where the ground-truth estimate file is built directly from the MPS estimated trajectory.
+
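+Under the hood, this evaluation relies on the [evo](https://github.com/MichaelGrupp/evo) package. A minimal sketch of the same computation, assuming `est_traj` and `gt_traj` are evo `PoseTrajectory3D` objects with nanosecond timestamps (built as in `lamaria/eval/evo_evaluation.py`):
+```python
+from evo.core import metrics, sync
+
+# Associate poses by timestamp (1 ms tolerance), align with Umeyama + scale,
+# then compute the translational APE (= ATE) RMSE.
+gt_sync, est_sync = sync.associate_trajectories(gt_traj, est_traj, max_diff=1e6)
+est_sync.align(gt_sync, correct_scale=True)
+ape = metrics.APE(metrics.PoseRelation.translation_part)
+ape.process_data((gt_sync, est_sync))
+print("ATE RMSE [m]:", ape.get_all_statistics()["rmse"])
+```
+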
+## Converting VRS to ASL/ROSbag format
+
+We provide some utility scripts in the `tools/` folder to convert between the different data formats used in our dataset.
+
+<details>
+<summary>Click to expand commands</summary>
+
+1. Converting from `.vrs` to the ASL folder format:
+```bash
+python -m tools.vrs_to_asl_folder --vrs_file path/to/sequence.vrs --output_asl_folder path/to/output_asl_folder
+```
+
+2. Converting from an ASL folder to a ROS1 bag:
+```bash
+python -m tools.asl_folder_to_rosbag --input_asl_folder path/to/asl_folder --output_rosbag path/to/output.bag
+```
+
+3. Undistorting ASL folder images to the PINHOLE camera model:
+```bash
+python -m tools.undistort_asl_folder --calibration_file path/to/calibration.json --asl_folder path/to/asl_folder --output_asl_folder path/to/output_undistorted_asl_folder
+```
+*Note: The undistortion script requires COLMAP to be installed from source. Please follow the instructions [here](https://colmap.github.io/install.html) to install the library.*
+
+</details>
+
+## Example Visual-Inertial Optimization
+
+As a byproduct of our ground-truthing pipeline, we provide an example visual-inertial optimization pipeline built within the `lamaria` package. This pipeline takes as input a pose estimate file and a `.vrs` recording, performs keyframing and triangulation, and optimizes the trajectory using visual and inertial residuals.
+
+The input pose estimate file must follow the same format as described in the [input format](#input-format) section above.
+
+### Additional Installation
+To extract images from a `.vrs` file, you need to install the [VRS Command Line Tools](https://facebookresearch.github.io/vrs/docs/VrsCliTool/). Please follow the instructions [here](https://github.com/facebookresearch/vrs?tab=readme-ov-file#instructions-macos-and-ubuntu-and-container) to install the library from source.
+
+### Running the example
+To run an example visual-inertial optimization on the demo data downloaded following [download demo data](#download-demo-data) (the initial estimate file in the demo comes from a run of [OpenVINS](https://docs.openvins.com/)):
+```bash
+python -m example_vi_optimization --output demo_outputs/vi_optim \
+    --vrs demo/sequence_1_19/raw_data/sequence_1_19.vrs \
+    --estimate demo/estimate/sequence_1_19.txt
+```
+
+## BibTeX Citation
+```bibtex
+@InProceedings{Krishnan_2025_ICCV,
+    author    = {Krishnan, Anusha and
+                 Liu, Shaohui and
+                 Sarlin, Paul-Edouard and
+                 Gentilhomme, Oscar and
+                 Caruso, David and
+                 Monge, Maurizio and
+                 Newcombe, Richard and
+                 Engel, Jakob and
+                 Pollefeys, Marc},
+    title     = {Benchmarking Egocentric Visual-Inertial SLAM at City Scale},
+    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+    year      = {2025}
+}
+```
diff --git a/assets/iccv_logo.png b/assets/iccv_logo.png
new file mode 100644
index 0000000..d47a3b0
Binary files /dev/null and b/assets/iccv_logo.png differ
diff --git a/assets/teaser_final.png b/assets/teaser_final.png
new file mode 100644
index 0000000..00ba238
Binary files /dev/null and b/assets/teaser_final.png differ
diff --git a/defaults.yaml b/defaults.yaml
new file mode 100644
index 0000000..82772ff
--- /dev/null
+++ b/defaults.yaml
@@ -0,0 +1,45 @@
+# Default configuration for example_vi_optimization.py
+keyframing:
+  max_rotation: 20.0  # degrees
+  max_distance: 1.0   # meters
+  max_elapsed: 1e9    # 1 second in ns
+
+triangulation:
+  feature_conf: aliked-n16
+  matcher_conf: aliked+lightglue
+  retrieval_conf: netvlad
+  num_retrieval_matches: 5
+
+  # relaxed triangulation
+  # https://github.com/colmap/colmap/tree/3.12.6/src/colmap/sfm/incremental_triangulator.h
+  merge_max_reproj_error: 15.0
+  complete_max_reproj_error: 15.0
+  min_angle: 1.0
+
+  # https://github.com/colmap/colmap/tree/3.12.6/src/colmap/sfm/incremental_mapper.h
+  filter_max_reproj_error: 15.0
+  filter_min_tri_angle: 1.0
+
+optimization:
+  general:
+    use_callback: true
+    max_num_iterations: 10
+    normalize_reconstruction: false
+    minimizer_progress_to_stdout: true
+    update_state_every_iteration: true
+
+  cam:
+    feature_std: 1.0
+    optimize_focal_length: false
+    optimize_principal_point: false
+    optimize_extra_params: false
+    optimize_cam_from_rig: false
+
+  imu:
+    optimize_scale: false
+    optimize_gravity: false
+    optimize_imu_from_rig: false
+    optimize_bias: true
+    gyro_infl: 1.0
+    acc_infl: 1.0
+    integration_noise_density: 0.05
diff --git a/evaluate_wrt_control_points.py b/evaluate_wrt_control_points.py
new file mode 100644
index 0000000..212da74
--- /dev/null
+++ b/evaluate_wrt_control_points.py
@@
-0,0 +1,132 @@ +import argparse +from pathlib import Path + +from lamaria import logger +from lamaria.eval.sparse_evaluation import evaluate_wrt_control_points +from lamaria.structs.control_point import ( + load_cp_json, + run_control_point_triangulation, +) +from lamaria.structs.trajectory import Trajectory +from lamaria.utils.aria import initialize_reconstruction_from_calibration_file +from lamaria.utils.metrics import ( + calculate_control_point_recall, + calculate_control_point_score, +) + + +def run( + estimate: Path, + cp_json_file: Path, + device_calibration_json: Path, + output_path: Path, + corresponding_sensor: str = "imu", +) -> bool: + """Run sparse evaluation for sequences that observe control points. + + Args: + estimate (Path): Path to the pose estimate file. + cp_json_file (Path): Path to the sparse ground-truth JSON file + containing control point data. + device_calibration_json (Path): Path to the device calibration JSON. + output_path (Path): Directory where intermediate data and + evaluation results will be saved. + corresponding_sensor (str): The reference sensor to use + ("imu" or "cam0"). + + Returns: + bool: True if the evaluation was successful, False otherwise. + + Notes: + Expected estimate file format (space-separated columns): + ``` + timestamp tx ty tz qx qy qz qw + ``` + """ + traj = Trajectory.load_from_file( + estimate, invert_poses=False, corresponding_sensor=corresponding_sensor + ) + if not traj.is_loaded(): + logger.error("Estimate could not be loaded") + return False + + init_reconstruction = initialize_reconstruction_from_calibration_file( + device_calibration_json + ) + control_points, timestamp_to_images = load_cp_json(cp_json_file) + + reconstruction = traj.add_estimate_poses_to_reconstruction( + init_reconstruction, timestamp_to_images + ) + + run_control_point_triangulation( + reconstruction, + control_points, + ) + + result = evaluate_wrt_control_points( + reconstruction, + control_points, + ) + + if result is None: + logger.error("Sparse evaluation failed.") + return False + + result_path = output_path / "sparse_eval_result.npy" + result.save_as_npy(result_path) + + score = calculate_control_point_score(result) + recall = calculate_control_point_recall(result) + + logger.info(f"CP Score: {score:.4f}") + logger.info(f"CP Recall @ 1m: {recall:.4f}") + return True + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Run baseline sparse evaluation" + ) + parser.add_argument( + "--estimate", + type=Path, + required=True, + help="Path to the pose estimate file produced by a specific method", + ) + parser.add_argument( + "--cp_json_file", + type=Path, + required=True, + help="Path to the sparse GT JSON file", + ) + parser.add_argument( + "--device_calibration_json", + type=Path, + required=True, + help="Path to the Aria device calibration JSON file, " + "found on the Lamaria website", + ) + parser.add_argument( + "--output_path", + type=Path, + required=True, + help="Path to the output folder where intermediate data " + "and evaluation results will be saved", + ) + parser.add_argument( + "--corresponding_sensor", + type=str, + default="imu", + choices=["imu", "cam0"], + help="The sensor in which the poses are expressed.", + ) + args = parser.parse_args() + + _ = run( + args.estimate, + args.cp_json_file, + args.device_calibration_json, + args.output_path, + args.corresponding_sensor, + ) diff --git a/evaluate_wrt_mps.py b/evaluate_wrt_mps.py new file mode 100644 index 0000000..427b594 --- /dev/null +++ 
b/evaluate_wrt_mps.py
@@ -0,0 +1,67 @@
+import argparse
+from pathlib import Path
+
+from lamaria import logger
+from lamaria.eval.evo_evaluation import evaluate_wrt_mps
+from lamaria.structs.trajectory import Trajectory
+
+
+def run(
+    estimate: Path,
+    gt_estimate: Path,
+) -> bool:
+    """Evaluate an estimated trajectory with respect to the MPS pGT.
+
+    No control points are involved in this evaluation; the evo
+    package is used under the hood.
+
+    Args:
+        estimate (Path): Path to the pose estimate file.
+        gt_estimate (Path): Path to the pGT pose estimate file.
+
+    Returns:
+        bool: True if the evaluation was successful, False otherwise.
+    """
+
+    est_traj = Trajectory.load_from_file(estimate, invert_poses=False)
+    if not est_traj.is_loaded():
+        logger.error("Estimate could not be loaded")
+        return False
+
+    gt_traj = Trajectory.load_from_file(gt_estimate, invert_poses=False)
+    if not gt_traj.is_loaded():
+        logger.error("Ground-truth estimate could not be loaded")
+        return False
+
+    result = evaluate_wrt_mps(est_traj, gt_traj)
+
+    if result is None:
+        logger.error("Evaluation w.r.t. MPS failed")
+        return False
+
+    logger.info(f"ATE RMSE: {result:.4f} m")
+    return True
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Evaluate an estimated trajectory with respect to MPS pGT."
+    )
+    parser.add_argument(
+        "--estimate",
+        type=Path,
+        required=True,
+        help="Path to the pose estimate file.",
+    )
+    parser.add_argument(
+        "--gt_estimate",
+        type=Path,
+        required=True,
+        help="Path to the MPS pGT file.",
+    )
+    args = parser.parse_args()
+
+    run(
+        args.estimate,
+        args.gt_estimate,
+    )
diff --git a/evaluate_wrt_pgt.py b/evaluate_wrt_pgt.py
new file mode 100644
index 0000000..db82216
--- /dev/null
+++ b/evaluate_wrt_pgt.py
@@ -0,0 +1,94 @@
+import argparse
+from pathlib import Path
+
+import pycolmap
+
+from lamaria import logger
+from lamaria.eval.pgt_evaluation import evaluate_wrt_pgt
+from lamaria.structs.sparse_eval import SparseEvalResult
+from lamaria.structs.trajectory import Trajectory
+from lamaria.utils.metrics import calculate_pose_recall
+
+
+def run(
+    estimate: Path,
+    gt_estimate: Path,
+    sparse_eval_result: Path,
+) -> bool:
+    """Evaluate an estimated trajectory with respect to a pseudo
+    ground-truth trajectory that observes control points.
+
+    Args:
+        estimate (Path): Path to the pose estimate file.
+        gt_estimate (Path): Path to the pGT pose estimate file.
+        sparse_eval_result (Path): Path to the .npy file containing
+            SparseEvalResult data. This file is generated by
+            `evaluate_wrt_control_points` and is needed to
+            transform the estimated trajectory to the pGT.
+
+    Returns:
+        bool: True if the evaluation was successful, False otherwise.
+ """ + + est_traj = Trajectory.load_from_file(estimate, invert_poses=False) + if not est_traj.is_loaded(): + logger.error("Estimate could not be loaded") + return False + + gt_traj = Trajectory.load_from_file(gt_estimate, invert_poses=False) + if not gt_traj.is_loaded(): + logger.error("pGT Estimate could not be loaded") + return False + + result = SparseEvalResult.load_from_npy(sparse_eval_result) + if result is None: + logger.error("SparseEvalResult could not be loaded") + return False + + sim3d = result.alignment + if not isinstance(sim3d, pycolmap.Sim3d): + logger.error("No valid Sim3d found in SparseEvalResult") + return False + + error = evaluate_wrt_pgt(est_traj, gt_traj, sim3d) + + if error is None: + logger.error("pGT evaluation failed.") + return False + + for threshold in [1.0, 5.0]: + pose_recall = calculate_pose_recall(error, len(gt_traj), threshold) + logger.info(f"Pose Recall @ {threshold}m: {pose_recall:.4f}") + + return True + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Evaluate an estimated trajectory with respect to pGT." + " Requires a SparseEvalResult .npy file." + ) + parser.add_argument( + "--estimate", + type=Path, + required=True, + help="Path to the pose estimate file.", + ) + parser.add_argument( + "--gt_estimate", + type=Path, + required=True, + help="Path to the pGT pose estimate file.", + ) + parser.add_argument( + "--sparse_eval_result", + type=Path, + required=True, + help="Path to the .npy file containing SparseEvalResult data." + "Result of `evaluate_wrt_control_points`.", + ) + args = parser.parse_args() + _ = run( + args.estimate, + args.gt_estimate, + args.sparse_eval_result, + ) diff --git a/example_vi_optimization.py b/example_vi_optimization.py new file mode 100644 index 0000000..1b9572d --- /dev/null +++ b/example_vi_optimization.py @@ -0,0 +1,220 @@ +import argparse +import shutil +from pathlib import Path + +import pycolmap + +from lamaria.config.options import ( + KeyframeSelectorOptions, + TriangulatorOptions, + VIOptimizerOptions, +) +from lamaria.config.pipeline import PipelineOptions +from lamaria.pipeline.estimate_to_timed_reconstruction import ( + convert_estimate_into_timed_reconstruction, +) +from lamaria.pipeline.keyframe_selection import KeyframeSelector +from lamaria.pipeline.optim.session import SingleSeqSession +from lamaria.pipeline.optim.vi_optimization import VIOptimizer +from lamaria.pipeline.triangulation import run as triangulate +from lamaria.structs.timed_reconstruction import TimedReconstruction +from lamaria.structs.trajectory import Trajectory +from lamaria.structs.vi_reconstruction import VIReconstruction +from lamaria.utils.aria import ( + extract_images_with_timestamps_from_vrs, + get_imu_data_from_vrs_file, + initialize_reconstruction_from_vrs_file, +) + + +def run_estimate_to_timed_recon( + vrs: Path, + images_path: Path, + estimate: Path, +) -> TimedReconstruction: + """Function to convert a general input + estimate file to a TimedReconstruction. 
+ """ + traj = Trajectory.load_from_file(estimate) + init_recon = initialize_reconstruction_from_vrs_file(vrs) + timestamps_to_images = extract_images_with_timestamps_from_vrs( + vrs, images_path + ) + timed_recon = convert_estimate_into_timed_reconstruction( + init_recon, traj, timestamps_to_images + ) + return timed_recon + + +def run_keyframe_selection( + options: KeyframeSelectorOptions, + input_recon: TimedReconstruction, + images_path: Path, + keyframes_path: Path, +) -> TimedReconstruction: + kf_vi_recon = KeyframeSelector.run( + options, + input_recon, + images_path, + keyframes_path, + ) + return kf_vi_recon + + +def run_triangulation( + options: TriangulatorOptions, + reference_model_path: Path, + keyframes_path: Path, + triangulation_path: Path, +) -> pycolmap.Reconstruction: + triangulated_model_path = triangulate( + options, + reference_model_path, + keyframes_path, + triangulation_path, + ) + return pycolmap.Reconstruction(triangulated_model_path) + + +def run_optimization( + vi_options: VIOptimizerOptions, + triangulator_options: TriangulatorOptions, + recon: VIReconstruction, + database_path: Path, +) -> VIReconstruction: + session = SingleSeqSession( + vi_options.imu, + recon, + ) + + optimized_recon = VIOptimizer.optimize( + vi_options, triangulator_options, session, database_path + ) + + optim_vi_recon = VIReconstruction( + reconstruction=optimized_recon, + timestamps=recon.timestamps, + imu_measurements=recon.imu_measurements, + ) + return optim_vi_recon + + +def run_pipeline( + options: PipelineOptions, + vrs: Path, + output_path: Path, + estimate: Path, +): + if not output_path.exists(): + output_path.mkdir(parents=True, exist_ok=True) + recon = None + + # Estimate to Lamaria Reconstruction + image_path = output_path / "images" + init_recon_path = output_path / "initial_recon" + if init_recon_path.exists(): + recon = TimedReconstruction.read(init_recon_path) + else: + recon = run_estimate_to_timed_recon( + vrs, + output_path / "images", + estimate, + ) + + # Keyframe Selection + keyframe_path = output_path / "keyframes" + keyframed_recon_path = output_path / "keyframed_recon" + if keyframed_recon_path.exists(): + recon = TimedReconstruction.read(keyframed_recon_path) + else: + recon = run_keyframe_selection( + options.keyframing_options, + recon, + image_path, + keyframe_path, + ) + recon.write(output_path / "keyframed_recon") + + # Triangulation + triangulation_path = output_path / "triangulated" + tri_model_path = triangulation_path / "model" + database_path = tri_model_path / "database.db" + if tri_model_path.exists(): + recon = TimedReconstruction.read(tri_model_path) + else: + pycolmap_recon = run_triangulation( + options.triangulator_options, + keyframed_recon_path, + keyframe_path, + triangulation_path, + ) + recon = TimedReconstruction( + reconstruction=pycolmap_recon, timestamps=recon.timestamps + ) + recon.write(tri_model_path) + + # Visual-Inertial Optimization + optim_model_path = output_path / "optim_recon" + imu_measurements = None + if optim_model_path.exists(): + imu_measurements = VIReconstruction.read( + optim_model_path + ).imu_measurements + shutil.rmtree(optim_model_path) + optim_model_path.mkdir(parents=True, exist_ok=True) + # Load IMU data + if imu_measurements is None: + imu_measurements = get_imu_data_from_vrs_file( + vrs, + ) + recon = VIReconstruction( + reconstruction=recon.reconstruction, + timestamps=recon.timestamps, + imu_measurements=imu_measurements, + ) + recon_optimized = run_optimization( + options.vi_optimizer_options, + 
options.triangulator_options, + recon, + database_path, + ) + recon_optimized.write(output_path / "optim_recon") + return recon_optimized + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Run full pipeline.") + parser.add_argument( + "--config", + type=str, + default="./defaults.yaml", + help="Path to the configuration YAML file.", + ) + parser.add_argument( + "--vrs", + type=str, + required=True, + help="Path to the input VRS file.", + ) + parser.add_argument( + "--output", + type=str, + required=True, + help="Path to the output directory.", + ) + parser.add_argument( + "--estimate", + type=str, + required=True, + help="Path to the input estimate file.", + ) + args = parser.parse_args() + + options = PipelineOptions() + options.load(args.config) + run_pipeline( + options, + Path(args.vrs), + Path(args.output), + Path(args.estimate), + ) diff --git a/lamaria/__init__.py b/lamaria/__init__.py new file mode 100644 index 0000000..9a89983 --- /dev/null +++ b/lamaria/__init__.py @@ -0,0 +1,13 @@ +import logging + +formatter = logging.Formatter( + fmt="[%(asctime)s %(name)s %(levelname)s] %(message)s", + datefmt="%Y/%m/%d %H:%M:%S", +) +handler = logging.StreamHandler() +handler.setFormatter(formatter) + +logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) +logger.addHandler(handler) +logger.propagate = False diff --git a/lamaria/config/options.py b/lamaria/config/options.py new file mode 100644 index 0000000..3015b67 --- /dev/null +++ b/lamaria/config/options.py @@ -0,0 +1,116 @@ +from __future__ import annotations + +from dataclasses import dataclass, field, replace + +from omegaconf import OmegaConf, open_dict + + +def _structured_merge_to_obj(cls, section) -> object: + """ + Merge a YAML section onto a structured + config made from the dataclass `cls`, + then return a dataclass instance. 
+ """ + base = OmegaConf.structured(cls) + merged = OmegaConf.merge(base, section or {}) + return OmegaConf.to_object(merged) + + +# Keyframing options +@dataclass(slots=True) +class KeyframeSelectorOptions: + max_rotation: float = 20.0 # degrees + max_distance: float = 1.0 # meters + max_elapsed: int = int(1e9) # 1 second in ns + + @classmethod + def load(cls, cfg: OmegaConf | None = None) -> KeyframeSelectorOptions: + if cfg is None: + return cls() + + cfg = OmegaConf.create(cfg) + with open_dict(cfg): + if "max_elapsed" in cfg and isinstance(cfg.max_elapsed, float): + cfg.max_elapsed = int(cfg.max_elapsed) + + obj: KeyframeSelectorOptions = _structured_merge_to_obj(cls, cfg) + return obj + + +# Triangulation options +@dataclass(slots=True) +class TriangulatorOptions: + feature_conf: str = "aliked-n16" + matcher_conf: str = "aliked+lightglue" + retrieval_conf: str = "netvlad" + num_retrieval_matches: int = 5 + + # colmap defaults + merge_max_reproj_error: float = 4.0 + complete_max_reproj_error: float = 4.0 + min_angle: float = 1.5 + + filter_max_reproj_error: float = 4.0 + filter_min_tri_angle: float = 1.5 + + @classmethod + def load(cls, cfg: OmegaConf | None = None) -> TriangulatorOptions: + if cfg is None: + return cls() + + return _structured_merge_to_obj(cls, cfg) + + +# Optimization options +@dataclass(slots=True) +class OptCamOptions: + feature_std: float = 1.0 # in pixels + optimize_focal_length: bool = False + optimize_principal_point: bool = False + optimize_extra_params: bool = False + optimize_cam_from_rig: bool = False + + +@dataclass(slots=True) +class OptIMUOptions: + gyro_infl: float = 1.0 + acc_infl: float = 1.0 + integration_noise_density: float = 0.05 + + optimize_scale: bool = False + optimize_gravity: bool = False + optimize_imu_from_rig: bool = False + optimize_bias: bool = False + + +@dataclass(slots=True) +class OptOptions: + use_callback: bool = True + max_num_iterations: int = 10 + normalize_reconstruction: bool = False + minimizer_progress_to_stdout: bool = True + update_state_every_iteration: bool = True + + +@dataclass(slots=True) +class VIOptimizerOptions: + cam: OptCamOptions = field(default_factory=OptCamOptions) + imu: OptIMUOptions = field(default_factory=OptIMUOptions) + optim: OptOptions = field(default_factory=OptOptions) + + @classmethod + def load(cls, cfg: OmegaConf | None = None) -> VIOptimizerOptions: + if cfg is None: + return cls() + + base = cls() + cam = _structured_merge_to_obj(OptCamOptions, cfg.cam) + imu = _structured_merge_to_obj(OptIMUOptions, cfg.imu) + optim = _structured_merge_to_obj(OptOptions, cfg.general) + + return replace( + base, + cam=cam, + imu=imu, + optim=optim, + ) diff --git a/lamaria/config/pipeline.py b/lamaria/config/pipeline.py new file mode 100644 index 0000000..e979d3e --- /dev/null +++ b/lamaria/config/pipeline.py @@ -0,0 +1,60 @@ +from collections.abc import Sequence +from dataclasses import dataclass, field +from pathlib import Path + +from omegaconf import DictConfig, OmegaConf + +from .options import ( + KeyframeSelectorOptions, + TriangulatorOptions, + VIOptimizerOptions, +) + + +@dataclass +class PipelineOptions: + _keyframing_options: KeyframeSelectorOptions = field( + default_factory=KeyframeSelectorOptions + ) + _triangulator_options: TriangulatorOptions = field( + default_factory=TriangulatorOptions + ) + _vi_optimizer_options: VIOptimizerOptions = field( + default_factory=VIOptimizerOptions + ) + + def load( + self, + yaml: Path | str, + cli_overrides: Sequence[str] | None = None, + ) -> None: + """Load 
configuration from a YAML file and apply any overrides."""
+        cfg = OmegaConf.load(str(yaml))
+        if cli_overrides:
+            cfg = OmegaConf.merge(
+                cfg, OmegaConf.from_dotlist(list(cli_overrides))
+            )
+        OmegaConf.resolve(cfg)
+
+        self._update_from_cfg(cfg)
+
+    def _update_from_cfg(self, cfg: DictConfig) -> None:
+        """Update object attributes from a config."""
+        self._keyframing_options = KeyframeSelectorOptions.load(cfg.keyframing)
+        self._triangulator_options = TriangulatorOptions.load(cfg.triangulation)
+        self._vi_optimizer_options = VIOptimizerOptions.load(cfg.optimization)
+
+    # Properties for keyframing
+    @property
+    def keyframing_options(self) -> KeyframeSelectorOptions:
+        return self._keyframing_options
+
+    # Properties for triangulation
+    @property
+    def triangulator_options(self) -> TriangulatorOptions:
+        return self._triangulator_options
+
+    # Properties for visual-inertial optimization
+    @property
+    def vi_optimizer_options(self) -> VIOptimizerOptions:
+        return self._vi_optimizer_options
diff --git a/lamaria/eval/evo_evaluation.py b/lamaria/eval/evo_evaluation.py
new file mode 100644
index 0000000..124da18
--- /dev/null
+++ b/lamaria/eval/evo_evaluation.py
@@ -0,0 +1,113 @@
+import numpy as np
+from evo.core import metrics, sync
+from evo.core.trajectory import PoseTrajectory3D
+
+from .. import logger
+from ..structs.trajectory import Trajectory
+
+
+def valid_estimate(
+    est_pose_traj: PoseTrajectory3D,
+    gt_pose_traj: PoseTrajectory3D,
+    min_duration_ratio: float = 0.5,
+) -> bool:
+    """Check that the estimated trajectory covers at least
+    `min_duration_ratio` of the ground-truth trajectory duration.
+
+    Args:
+        est_pose_traj (PoseTrajectory3D): Estimated trajectory.
+        gt_pose_traj (PoseTrajectory3D): Ground-truth trajectory.
+        min_duration_ratio (float): Minimum allowed ratio between the
+            durations of the estimate and the ground truth.
+
+    Returns:
+        bool: True if valid, False otherwise.
+    """
+
+    est_timestamps = est_pose_traj.timestamps
+    gt_timestamps = gt_pose_traj.timestamps
+
+    # Durations in seconds (timestamps are in ns)
+    gt_length = (gt_timestamps[-1] - gt_timestamps[0]) / 1e9
+    est_length = (est_timestamps[-1] - est_timestamps[0]) / 1e9
+
+    return est_length >= min_duration_ratio * gt_length
+
+
+def convert_trajectory_to_evo_posetrajectory(
+    traj: Trajectory,
+) -> PoseTrajectory3D:
+    """Convert a Trajectory to an evo PoseTrajectory3D.
+
+    Args:
+        traj (Trajectory): The trajectory to convert.
+
+    Returns:
+        PoseTrajectory3D: The converted evo trajectory.
+    """
+    assert traj.is_loaded(), "Trajectory is not loaded"
+    if not traj.invert_poses:
+        poses = traj.poses
+    else:
+        # stored in COLMAP format (i.e., rig_from_world)
+        poses = [pose.inverse() for pose in traj.poses]
+
+    timestamps = traj.timestamps
+    assert len(poses) == len(timestamps), "Poses and timestamps length mismatch"
+
+    tvecs = np.array([pose.translation for pose in poses])
+    quats = np.array([pose.rotation.quat for pose in poses])
+    quats_wxyz = quats[:, [3, 0, 1, 2]]
+
+    evo_traj = PoseTrajectory3D(
+        positions_xyz=tvecs,
+        orientations_quat_wxyz=quats_wxyz,
+        timestamps=np.array(timestamps, dtype=np.float64),
+    )
+    return evo_traj
+
+
+def evaluate_wrt_mps(
+    est_traj: Trajectory,
+    gt_traj: Trajectory,
+) -> float | None:
+    """
+    Evaluate an estimated trajectory with respect to the MPS pGT.
+    Alignment is performed with Umeyama via the evo package.
+
+    Important:
+        This function assumes that invert_poses=False for both trajectories.
+
+    Args:
+        est_traj (Trajectory): Estimated trajectory.
+        gt_traj (Trajectory): Pseudo ground-truth trajectory.
+
+    Returns:
+        float | None: The ATE RMSE in meters, or None if the estimate
+            is too short or the alignment failed.
+
+    Estimate file format (space-separated columns):
+    ```
+    timestamp(ns) tx ty tz qx qy qz qw
+    ```
+    """
+
+    est_pose_traj = convert_trajectory_to_evo_posetrajectory(est_traj)
+    gt_pose_traj = convert_trajectory_to_evo_posetrajectory(gt_traj)
+
+    if not valid_estimate(est_pose_traj, gt_pose_traj):
+        logger.error("Estimated trajectory is too short compared to pGT.")
+        return None
+
+    gt_pose_traj_sync, est_pose_traj_sync = sync.associate_trajectories(
+        gt_pose_traj,
+        est_pose_traj,
+        max_diff=1e6,  # 1 ms in ns
+    )
+
+    try:
+        est_pose_traj_sync.align(gt_pose_traj_sync, correct_scale=True)
+    except Exception as e:
+        logger.error(f"Alignment failed: {e}")
+        return None
+
+    pose_relation = metrics.PoseRelation.translation_part
+    trajectories = (gt_pose_traj_sync, est_pose_traj_sync)
+    ate_metric = metrics.APE(pose_relation)
+    ate_metric.process_data(trajectories)
+
+    ate_stats = ate_metric.get_all_statistics()
+
+    return ate_stats["rmse"]
diff --git a/lamaria/eval/pgt_evaluation.py b/lamaria/eval/pgt_evaluation.py
new file mode 100644
index 0000000..d997e61
--- /dev/null
+++ b/lamaria/eval/pgt_evaluation.py
@@ -0,0 +1,37 @@
+import numpy as np
+import pycolmap
+
+from .. import logger
+from ..structs.trajectory import Trajectory, associate_trajectories
+
+
+def evaluate_wrt_pgt(
+    est_traj: Trajectory,
+    gt_traj: Trajectory,
+    sim3d: pycolmap.Sim3d,
+) -> None | np.ndarray:
+    """
+    Evaluate an estimated trajectory with respect to a pseudo
+    ground-truth trajectory that observes control points.
+
+    Important:
+        This function assumes that invert_poses=False for both trajectories.
+
+    Args:
+        est_traj (Trajectory): Estimated trajectory.
+        gt_traj (Trajectory): Pseudo ground-truth trajectory.
+        sim3d (pycolmap.Sim3d): Similarity transformation from the
+            estimated trajectory to the control points, obtained from
+            the sparse evaluation.
+
+    Returns:
+        None | np.ndarray: Per-pose horizontal position errors, or None
+            if the trajectory association failed.
+    """
+
+    est_traj.transform(sim3d)
+    est_traj, gt_traj = associate_trajectories(est_traj, gt_traj)
+    if est_traj is None or gt_traj is None:
+        logger.error("Trajectory association failed.")
+        return None
+
+    # Horizontal (xy) position error per associated pose
+    E = est_traj.positions - gt_traj.positions
+    error = np.array([np.linalg.norm(e[:2]) for e in E])
+
+    logger.info("pGT evaluation completed successfully!")
+    return error
diff --git a/lamaria/eval/sparse_evaluation.py b/lamaria/eval/sparse_evaluation.py
new file mode 100644
index 0000000..d74089d
--- /dev/null
+++ b/lamaria/eval/sparse_evaluation.py
@@ -0,0 +1,90 @@
+import pyceres
+import pycolmap
+
+from .. import logger
+from ..structs.control_point import (
+    ControlPoints,
+    get_cps_for_initial_alignment,
+)
+from ..structs.sparse_eval import (
+    SparseEvalResult,
+    SparseEvalVariables,
+    get_problem_for_sparse_alignment,
+)
+
+
+def estimate_initial_alignment_from_control_points(
+    control_points: ControlPoints,
+) -> pycolmap.Sim3d | None:
+    """Estimate a robust Sim3d from control points.
+
+    Args:
+        control_points (ControlPoints): Control points dictionary.
+
+    Returns:
+        sim3d (pycolmap.Sim3d | None): The estimated Sim3d, or None if
+            the estimation failed.
+ """ + triangulated_cp_alignment, topo_cp_alignment = ( + get_cps_for_initial_alignment(control_points) + ) + + if triangulated_cp_alignment is None or topo_cp_alignment is None: + logger.error("Not enough control points for initial alignment") + return None + + ret = pycolmap.estimate_sim3d_robust( + triangulated_cp_alignment, topo_cp_alignment, {"max_error": 5} + ) + + if ret is None: + return None + + robust_sim3d = ret["tgt_from_src"] + + return robust_sim3d + + +def evaluate_wrt_control_points( + reconstruction: pycolmap.Reconstruction, + control_points: ControlPoints, +) -> SparseEvalResult | None: + """ + Evaluate the trajectory with respect to control points. + + Args: + reconstruction (pycolmap.Reconstruction): Reconstruction object + which contains the estimated poses. + control_points (ControlPoints): Control points dictionary. + output_path (Path): Directory where results will be saved. + + Returns: + sparse_npy_path (Path): Path to the saved SparseEvalResult .npy file. + """ + + robust_sim3d = estimate_initial_alignment_from_control_points( + control_points + ) + + if robust_sim3d is None: + logger.error("Robust Sim3d estimation failed") + return None + + variables = SparseEvalVariables.create_from_inputs( + control_points, + robust_sim3d, + ) + + problem, solver_options, summary = get_problem_for_sparse_alignment( + reconstruction, variables + ) + pyceres.solve(solver_options, problem, summary) + print(summary.BriefReport()) + + variables.update_sim3d_scale() + + result = SparseEvalResult.from_variables( + variables, + ) + + logger.info("Sparse evaluation completed successfully!") + return result diff --git a/lamaria/pipeline/estimate_to_timed_reconstruction.py b/lamaria/pipeline/estimate_to_timed_reconstruction.py new file mode 100644 index 0000000..449ea36 --- /dev/null +++ b/lamaria/pipeline/estimate_to_timed_reconstruction.py @@ -0,0 +1,113 @@ +from bisect import bisect_left +from copy import deepcopy +from pathlib import Path + +import pycolmap + +from ..structs.timed_reconstruction import TimedReconstruction +from ..structs.trajectory import ( + Trajectory, +) +from ..utils.types import InitReconstruction + + +def _image_names_from_folder( + folder: Path, wrt_to: Path, ext: str = ".jpg" +) -> list[Path]: + if not folder.is_dir(): + return [] + images = sorted(n for n in folder.iterdir() if n.suffix == ext) + images = [n.relative_to(wrt_to) for n in images] + return images + + +def _match_estimate_ts_to_images( + timestamps_to_images: dict[int, tuple[Path, Path]], + est_timestamps: list[int], + max_diff: int = 1000000, # 1 ms +) -> tuple[list[tuple[Path, Path]], list[int]]: + left_ts = list(timestamps_to_images.keys()) + images = list(timestamps_to_images.values()) + assert len(images) == len(left_ts), ( + "Number of images and left timestamps must be equal" + ) + + order = sorted(range(len(left_ts)), key=lambda i: left_ts[i]) + left_ts = [left_ts[i] for i in order] + images = [images[i] for i in order] + + matched_images: list[tuple[Path, Path]] = [] + matched_timestamps: list[int] = [] + + # estimate timestamps will be in nanoseconds like vrs timestamps + for est in est_timestamps: + idx = bisect_left(left_ts, est) + + cand_idxs = [] + if idx > 0: + cand_idxs.append(idx - 1) + if idx < len(left_ts): + cand_idxs.append(idx) + + if not cand_idxs: + continue + + best = min(cand_idxs, key=lambda j: abs(left_ts[j] - est)) + if (max_diff is not None) and (abs(left_ts[best] - est) > max_diff): + continue + + matched_images.append(images[best]) + 
matched_timestamps.append(left_ts[best]) + return dict(zip(matched_timestamps, matched_images)) + + +def convert_estimate_into_timed_reconstruction( + init_reconstruction: InitReconstruction, + estimate: Trajectory, + timestamps_to_images: dict[int, tuple[Path, Path]], +) -> TimedReconstruction: + """ + Populate a TimedReconstruction from a trajectory + """ + est_timestamps_to_images = _match_estimate_ts_to_images( + timestamps_to_images, estimate.timestamps + ) + + assert estimate.corresponding_sensor == "imu" + timestamps = list(est_timestamps_to_images.keys()) + assert len(estimate.poses) == len(timestamps), ( + "The length of traj.poses and timestamps should be equal" + ) + + recon = deepcopy(init_reconstruction) + image_id = 1 + frame_id_to_timestamp = dict() + for frame_id, (pose, timestamp) in enumerate( + zip(estimate.poses, timestamps) + ): + frame = pycolmap.Frame() + frame.rig_id = 1 + frame.frame_id = frame_id + frame.rig_from_world = pose # as it corresponds to imu + + image_names = timestamps_to_images[timestamp] + images_to_add = [] + for cam_id, img_path in [(2, image_names[0]), (3, image_names[1])]: + im = pycolmap.Image( + str(img_path), + pycolmap.Point2DList(), + cam_id, + image_id, + ) + im.frame_id = frame.frame_id + frame.add_data_id(im.data_id) + images_to_add.append(im) + image_id += 1 + recon.add_frame(frame) + for im in images_to_add: + recon.add_image(im) + frame_id_to_timestamp[frame_id] = timestamp + + return TimedReconstruction( + reconstruction=recon, timestamps=frame_id_to_timestamp + ) diff --git a/lamaria/pipeline/keyframe_selection.py b/lamaria/pipeline/keyframe_selection.py new file mode 100644 index 0000000..2cced7c --- /dev/null +++ b/lamaria/pipeline/keyframe_selection.py @@ -0,0 +1,266 @@ +from __future__ import annotations + +import shutil +from copy import deepcopy +from pathlib import Path + +import numpy as np +import pycolmap + +from ..config.options import KeyframeSelectorOptions +from ..structs.timed_reconstruction import TimedReconstruction +from ..utils.aria import get_magnitude_from_transform + + +class KeyframeSelector: + """Class to perform keyframe selection on a TimedReconstruction object.""" + + def __init__( + self, + options: KeyframeSelectorOptions, + data: TimedReconstruction, + ): + self.options = options + self.init_data = data + self.init_recons = data.reconstruction # pycolmap.Reconstruction + self.timestamps = data.timestamps # frame id to timestamp mapping + + self.keyframed_data = TimedReconstruction() + self.keyframe_frame_ids: dict[int, int] = {} + + @staticmethod + def run( + options: KeyframeSelectorOptions, + data: TimedReconstruction, + images_path: Path, + keyframes_path: Path, + ) -> TimedReconstruction: + """Entry point to run keyframing and + copy keyframes into keyframe directory. 
+ """ + + selector = KeyframeSelector(options, data) + kf_recon = selector.run_keyframing() + selector.copy_images_to_keyframes_dir(images_path, keyframes_path) + + return kf_recon + + def _select_keyframes(self): + self.keyframe_frame_ids: dict[int, int] = {} + dr_dt = np.array([0.0, 0.0]) + dts = 0.0 + + init_frame_ids = sorted(self.init_recons.frames.keys()) + new_frame_id = 1 + + for i, (prev, curr) in enumerate( + zip(init_frame_ids[:-1], init_frame_ids[1:]) + ): + if i == 0: + self.keyframe_frame_ids[new_frame_id] = prev + new_frame_id += 1 + continue + + current_rig_from_world = self.init_recons.frames[ + curr + ].rig_from_world + previous_rig_from_world = self.init_recons.frames[ + prev + ].rig_from_world + current_rig_from_previous_rig = ( + current_rig_from_world * previous_rig_from_world.inverse() + ) + + dr_dt += np.array( + get_magnitude_from_transform(current_rig_from_previous_rig) + ) + dts += self.timestamps[curr] - self.timestamps[prev] + + if ( + dr_dt[0] > self.options.max_rotation + or dr_dt[1] > self.options.max_distance + or dts > self.options.max_elapsed + ): + self.keyframe_frame_ids[new_frame_id] = curr + new_frame_id += 1 + dr_dt = np.array([0.0, 0.0]) + dts = 0.0 + + def _build_device_keyframed_reconstruction(self): + old_rig = self.init_recons.rigs[1] + new_rig = pycolmap.Rig(rig_id=1) + camera_id = 1 + cam_map = {} # new to old + + ref_sensor_id = old_rig.ref_sensor_id.id + cam_map[ref_sensor_id] = camera_id + new_ref_sensor = self._clone_camera(ref_sensor_id, camera_id) + camera_id += 1 + self.keyframed_data.reconstruction.add_camera(new_ref_sensor) + new_rig.add_ref_sensor(new_ref_sensor.sensor_id) + + for sensor, sensor_from_rig in old_rig.non_ref_sensors.items(): + new_sensor = self._clone_camera(sensor.id, camera_id) + cam_map[sensor.id] = camera_id + camera_id += 1 + new_sensor_from_rig = deepcopy(sensor_from_rig) + self.keyframed_data.reconstruction.add_camera(new_sensor) + new_rig.add_sensor(new_sensor.sensor_id, new_sensor_from_rig) + + self.keyframed_data.reconstruction.add_rig(new_rig) + + image_id = 1 + for new_fid, frame_id in self.keyframe_frame_ids.items(): + old_frame = self.init_recons.frames[frame_id] + + new_frame = pycolmap.Frame() + new_frame.rig_id = 1 + new_frame.frame_id = new_fid + new_frame.rig_from_world = deepcopy(old_frame.rig_from_world) + + old_image_ids = sorted([d.id for d in old_frame.data_ids]) + + images_to_add = [] + for old_image_id in old_image_ids: + old_image = self.init_recons.images[old_image_id] + new_cam_id = cam_map[old_image.camera_id] + + new_image = pycolmap.Image( + old_image.name, + pycolmap.Point2DList(), + new_cam_id, + image_id, + ) + + new_image.frame_id = new_frame.frame_id + new_frame.add_data_id(new_image.data_id) + images_to_add.append(new_image) + image_id += 1 + + self.keyframed_data.reconstruction.add_frame(new_frame) + for img in images_to_add: + self.keyframed_data.reconstruction.add_image(img) + + def _clone_camera( + self, old_camera_id: int, new_camera_id: int + ) -> pycolmap.Camera: + old_camera = self.init_recons.cameras[old_camera_id] + new_camera = pycolmap.Camera( + model=old_camera.model, + width=old_camera.width, + height=old_camera.height, + params=old_camera.params, + camera_id=new_camera_id, + ) + return new_camera + + def _build_online_keyframed_reconstruction(self): + camera_id = 1 + image_id = 1 + + for new_fid, frame_id in self.keyframe_frame_ids.items(): + old_frame = self.init_recons.frames[frame_id] + old_rig_id = old_frame.rig_id + old_rig = self.init_recons.rigs[old_rig_id] + 
+            cam_id_map = {}
+
+            # IMU cosplaying as a dummy camera
+            old_ref_sensor_id = old_rig.ref_sensor_id.id
+            new_ref_sensor = self._clone_camera(old_ref_sensor_id, camera_id)
+            self.keyframed_data.reconstruction.add_camera(new_ref_sensor)
+            cam_id_map[old_ref_sensor_id] = camera_id
+            camera_id += 1
+
+            new_rig = pycolmap.Rig(rig_id=new_fid)
+            new_rig.add_ref_sensor(new_ref_sensor.sensor_id)
+
+            for old_sensor, sensor_from_rig in old_rig.non_ref_sensors.items():
+                old_sensor_id = old_sensor.id
+
+                if old_sensor_id not in cam_id_map:
+                    new_sensor = self._clone_camera(old_sensor_id, camera_id)
+                    self.keyframed_data.reconstruction.add_camera(new_sensor)
+                    cam_id_map[old_sensor_id] = camera_id
+                    camera_id += 1
+
+                new_sensor_from_rig = deepcopy(sensor_from_rig)
+                new_rig.add_sensor(new_sensor.sensor_id, new_sensor_from_rig)
+
+            self.keyframed_data.reconstruction.add_rig(new_rig)
+
+            new_frame = pycolmap.Frame()
+            new_frame.rig_id = new_rig.rig_id
+            new_frame.frame_id = new_fid
+            new_frame.rig_from_world = deepcopy(old_frame.rig_from_world)
+
+            old_image_ids = sorted([d.id for d in old_frame.data_ids])
+            images_to_add = []
+            for old_image_id in old_image_ids:
+                old_image = self.init_recons.images[old_image_id]
+                mapped_cam_id = cam_id_map[old_image.camera_id]
+
+                new_image = pycolmap.Image(
+                    old_image.name,
+                    pycolmap.Point2DList(),
+                    mapped_cam_id,
+                    image_id,
+                )
+                new_image.frame_id = new_frame.frame_id
+                new_frame.add_data_id(new_image.data_id)
+                images_to_add.append(new_image)
+                image_id += 1
+
+            self.keyframed_data.reconstruction.add_frame(new_frame)
+            for img in images_to_add:
+                self.keyframed_data.reconstruction.add_image(img)
+
+    def run_keyframing(self) -> TimedReconstruction:
+        """Function to run keyframing on a lamaria reconstruction."""
+        self._select_keyframes()
+        if len(self.init_recons.rigs.keys()) == 1:  # device rig has been added
+            self._build_device_keyframed_reconstruction()
+        else:
+            self._build_online_keyframed_reconstruction()
+
+        # set timestamps keys with new frame ids
+        self.keyframed_data.timestamps = {
+            new_fid: self.timestamps[old_fid]
+            for new_fid, old_fid in self.keyframe_frame_ids.items()
+        }
+        return self.keyframed_data
+
+    def copy_images_to_keyframes_dir(
+        self,
+        images_path: Path,
+        output: Path,
+    ) -> Path:
+        """Copy images corresponding to
+        keyframes to a separate directory.
+        Images are expected to be in `images/left`
+        and `images/right` subdirectories.
+        Check: `extract_images_from_vrs` in
+        `lamaria/utils/aria.py` for more details.
+        """
+        if not self.keyframe_frame_ids:
+            raise ValueError(
+                "Keyframes not selected yet. Run `run_keyframing` first."
+            )
+
+        if output.exists() and any(output.iterdir()):
+            shutil.rmtree(output)
+
+        output.mkdir(parents=True, exist_ok=True)
+
+        for _, frame_id in self.keyframe_frame_ids.items():
+            frame = self.init_recons.frames[frame_id]
+            for data_id in frame.data_ids:
+                image = self.init_recons.images[data_id.id]
+
+                # Image names start with the Aria stream id:
+                # 1201-1 = SLAM left, 1201-2 = SLAM right.
+                subdir = "left" if image.name.startswith("1201-1") else "right"
+                src_path = images_path / subdir / image.name
+                dst_path = output / image.name
+
+                shutil.copy2(src_path, dst_path)
+
+        return output
diff --git a/lamaria/pipeline/optim/callback.py b/lamaria/pipeline/optim/callback.py
new file mode 100644
index 0000000..e888560
--- /dev/null
+++ b/lamaria/pipeline/optim/callback.py
@@ -0,0 +1,49 @@
+from copy import deepcopy
+
+import numpy as np
+import pyceres
+import pycolmap
+
+from ...
import logger + + +class RefinementCallback(pyceres.IterationCallback): + def __init__( + self, + poses: list[pycolmap.Rigid3d], + min_pose_change: tuple[float, float] = (0.001, 0.000001), + min_iterations: int = 2, + ): + pyceres.IterationCallback.__init__(self) + self.poses = poses + self.poses_previous = deepcopy(self.poses) + self.min_pose_change = min_pose_change + self.min_iterations = min_iterations + self.pose_changes = [] + + def __call__(self, summary: pyceres.IterationSummary): + if not summary.step_is_successful: + return pyceres.CallbackReturnType.SOLVER_CONTINUE + diff = [] + for pose_prev, pose in zip(self.poses_previous, self.poses): + pose_rel = pose_prev * pose.inverse() + q_rel, t_rel = pose_rel.rotation.quat, pose_rel.translation + dr = np.rad2deg( + np.abs(2 * np.arctan2(np.linalg.norm(q_rel[:-1]), q_rel[-1])) + ) + dt = np.linalg.norm(t_rel) + diff.append((dr, dt)) + diff = np.array(diff) + self.poses_previous = deepcopy(self.poses) + med, q99, max_ = np.quantile(diff, [0.5, 0.99, 1.0], axis=0) + logger.info( + f"{summary.iteration:d} Pose update: " + f"med/q99/max dR={med[0]:.3f}/{q99[0]:.3f}/{max_[0]:.3f} deg, " + f"dt={med[1] * 1e2:.3f}/{q99[1] * 1e2:.3f}/{max_[1] * 1e2:.3f} cm" + ) + self.pose_changes.append((med, q99, max_)) + if summary.iteration >= self.min_iterations and np.all( + q99 <= self.min_pose_change + ): + return pyceres.CallbackReturnType.SOLVER_TERMINATE_SUCCESSFULLY + return pyceres.CallbackReturnType.SOLVER_CONTINUE diff --git a/lamaria/pipeline/optim/imu.py b/lamaria/pipeline/optim/imu.py new file mode 100644 index 0000000..d87c04f --- /dev/null +++ b/lamaria/pipeline/optim/imu.py @@ -0,0 +1,109 @@ +import numpy as np +import pycolmap +from tqdm import tqdm + +from ...config.options import OptIMUOptions +from ...structs.vi_reconstruction import VIReconstruction + + +def preintegrate_imu_measurements( + options: OptIMUOptions, + data: VIReconstruction, +) -> dict[int, pycolmap.PreintegratedImuMeasurement]: + preintegrated_measurements = {} + + imu_measurements = data.imu_measurements + + colmap_imu_opts = pycolmap.ImuPreintegrationOptions() + colmap_imu_opts.integration_noise_density = ( + options.integration_noise_density + ) + imu_calib = load_imu_calibration(options.gyro_infl, options.acc_infl) + + frame_ids = sorted(data.reconstruction.frames.keys()) + timestamps = [data.timestamps[fid] for fid in frame_ids] + + assert len(frame_ids) >= 2, ( + "Need at least two frames to compute preinteg measurements" + ) + ts_sec = np.asarray(timestamps, dtype=np.float64) / 1e9 + + for k in tqdm( + range(len(frame_ids) - 1), # skip the last frame + desc="Loading preintegrated measurements", + ): + i = frame_ids[k] + t1, t2 = ts_sec[k], ts_sec[k + 1] + ms = imu_measurements.get_measurements_contain_edge(t1, t2) + if len(ms) == 0: + continue + integrated_m = pycolmap.PreintegratedImuMeasurement( + colmap_imu_opts, + imu_calib, + t1, + t2, + ) + integrated_m.add_measurements(ms) + preintegrated_measurements[i] = integrated_m + + return preintegrated_measurements + + +def initialize_imu_states( + data: VIReconstruction, +) -> dict[int, pycolmap.ImuState]: + imu_states = {} + + frame_ids = sorted(data.reconstruction.frames.keys()) + timestamps = [data.timestamps[fid] for fid in frame_ids] + + assert len(frame_ids) >= 2, "Need at least two frames to compute imu states" + + ts_sec = np.asarray(timestamps, dtype=np.float64) / 1e9 + + for k in tqdm(range(len(frame_ids) - 1), desc="Loading IMU states"): + i = frame_ids[k] + j = frame_ids[k + 1] + + t1, t2 = ts_sec[k], 
ts_sec[k + 1] + dt = t2 - t1 + if dt == 0: + raise ValueError(f"Zero dt between frames {i} and {j}") + + pi = data.reconstruction.frames[i].rig_from_world.inverse().translation + pj = data.reconstruction.frames[j].rig_from_world.inverse().translation + + vel = (pj - pi) / dt + imu_states[i] = pycolmap.ImuState() + imu_states[i].set_velocity(vel) + + imu_states[frame_ids[-1]] = pycolmap.ImuState() + vel = imu_states[frame_ids[-2]].velocity + imu_states[frame_ids[-1]].set_velocity(vel) + + return imu_states + + +def load_imu_calibration( + gyro_infl: float = 1.0, acc_infl: float = 1.0 +) -> pycolmap.ImuCalibration: + """Load Aria IMU calibration parameters. + Here: https://facebookresearch.github.io/projectaria_tools/docs/tech_insights""" + + imu_calib = pycolmap.ImuCalibration() + imu_calib.gravity_magnitude = 9.80600 + imu_calib.acc_noise_density = ( + 0.8e-4 * imu_calib.gravity_magnitude * acc_infl + ) + imu_calib.gyro_noise_density = 1e-2 * (np.pi / 180.0) * gyro_infl + imu_calib.acc_bias_random_walk_sigma = ( + 3.5e-5 * imu_calib.gravity_magnitude * np.sqrt(353) * acc_infl + ) # accel_right BW = 353 + imu_calib.gyro_bias_random_walk_sigma = ( + 1.3e-3 * (np.pi / 180.0) * np.sqrt(116) * gyro_infl + ) # gyro_right BW = 116 + imu_calib.acc_saturation_max = 8.0 * imu_calib.gravity_magnitude + imu_calib.gyro_saturation_max = 1000.0 * (np.pi / 180.0) + imu_calib.imu_rate = 1000.0 + + return imu_calib diff --git a/lamaria/pipeline/optim/iterative_global_ba.py b/lamaria/pipeline/optim/iterative_global_ba.py new file mode 100644 index 0000000..d6523db --- /dev/null +++ b/lamaria/pipeline/optim/iterative_global_ba.py @@ -0,0 +1,242 @@ +import pyceres +import pycolmap +from pycolmap import logging + +from ... import logger +from ...config.options import VIOptimizerOptions +from .callback import RefinementCallback +from .residual import add_imu_residuals_to_problem +from .session import SingleSeqSession + + +class VIBundleAdjuster: + """Visual-Inertial Bundle Adjuster class that + adds visual and IMU residuals to the + optimization problem.""" + + def __init__(self, session: SingleSeqSession): + self.session = session + + @staticmethod + def run( + vi_options: VIOptimizerOptions, + ba_options: pycolmap.BundleAdjustmentOptions, + ba_config: pycolmap.BundleAdjustmentConfig, + mapper: pycolmap.IncrementalMapper, + session: SingleSeqSession, + ): + """Entry point for running VI bundle adjustment.""" + vi_ba = VIBundleAdjuster(session) + vi_ba.solve( + vi_options, + ba_options, + ba_config, + mapper, + ) + + def solve( + self, + vi_options: VIOptimizerOptions, + ba_options: pycolmap.BundleAdjustmentOptions, + ba_config: pycolmap.BundleAdjustmentConfig, + mapper: pycolmap.IncrementalMapper, + ): + """Solves the VI bundle adjustment problem.""" + bundle_adjuster = pycolmap.create_default_bundle_adjuster( + ba_options, + ba_config, + mapper.reconstruction, + ) + + problem = bundle_adjuster.problem + + solver_options = ba_options.create_solver_options( + ba_config, bundle_adjuster.problem + ) + pyceres_solver_options = pyceres.SolverOptions(solver_options) + + problem = add_imu_residuals_to_problem( + vi_options.imu, + self.session, + problem, + ) + + # Setup callback if needed + if vi_options.optim.use_callback: + callback = self._init_callback(mapper) + pyceres_solver_options.callbacks = [callback] + + pyceres_solver_options.minimizer_progress_to_stdout = ( + vi_options.optim.minimizer_progress_to_stdout + ) + pyceres_solver_options.update_state_every_iteration = ( + 
vi_options.optim.update_state_every_iteration
+        )
+        pyceres_solver_options.max_num_iterations = (
+            vi_options.optim.max_num_iterations
+        )
+
+        # Solve
+        summary = pyceres.SolverSummary()
+        pyceres.solve(pyceres_solver_options, bundle_adjuster.problem, summary)
+        logger.info(summary.BriefReport())
+
+    def _init_callback(self, mapper: pycolmap.IncrementalMapper):
+        """Initialize the refinement callback to check pose changes."""
+        frame_ids = sorted(mapper.reconstruction.frames.keys())
+        poses = list(
+            mapper.reconstruction.frames[frame_id].rig_from_world
+            for frame_id in frame_ids
+        )
+
+        callback = RefinementCallback(poses)
+        return callback
+
+
+class GlobalBundleAdjustment:
+    """Global bundle adjustment with visual-inertial constraints."""
+
+    def __init__(self, session: SingleSeqSession):
+        self.session = session
+
+    @staticmethod
+    def run(
+        vi_options: VIOptimizerOptions,
+        pipeline_options: pycolmap.IncrementalPipelineOptions,
+        mapper: pycolmap.IncrementalMapper,
+        session: SingleSeqSession,
+    ):
+        """Entry point for running global bundle adjustment."""
+        gba = GlobalBundleAdjustment(session)
+        gba.adjust(
+            vi_options,
+            pipeline_options,
+            mapper,
+        )
+
+    def adjust(
+        self,
+        vi_options: VIOptimizerOptions,
+        pipeline_options: pycolmap.IncrementalPipelineOptions,
+        mapper: pycolmap.IncrementalMapper,
+    ):
+        assert mapper.reconstruction is not None
+
+        reg_image_ids = mapper.reconstruction.reg_image_ids()
+        if len(reg_image_ids) < 2:
+            logger.warning(
+                "At least two images must be "
+                "registered for global bundle-adjustment"
+            )
+            return
+
+        ba_options = self._configure_ba_options(
+            pipeline_options, len(reg_image_ids)
+        )
+
+        # Avoid negative depths
+        mapper.observation_manager.filter_observations_with_negative_depth()
+
+        # Setting up bundle adjustment configuration
+        ba_config = pycolmap.BundleAdjustmentConfig()
+        for frame_id in mapper.reconstruction.reg_frame_ids():
+            frame = mapper.reconstruction.frame(frame_id)
+            for data_id in frame.data_ids:
+                if data_id.sensor_id.type != pycolmap.SensorType.CAMERA:
+                    continue
+                ba_config.add_image(data_id.id)
+
+        # Run bundle adjustment
+        VIBundleAdjuster.run(
+            vi_options,
+            ba_options,
+            ba_config,
+            mapper,
+            self.session,
+        )
+
+    def _configure_ba_options(self, pipeline_options, num_reg_images):
+        """Configure bundle adjustment options based on
+        number of registered images and pipeline options.
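+        With fewer than 10 registered images, the solver tolerances are
+        tightened and the iteration budget increased to fully polish the
+        early model.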
+ """ + ba_options = pipeline_options.get_global_bundle_adjustment() + ba_options.print_summary = True + ba_options.refine_rig_from_world = True # Refine rig poses + + # Use stricter convergence criteria for first registered images + if num_reg_images < 10: + ba_options.solver_options.function_tolerance /= 10 + ba_options.solver_options.gradient_tolerance /= 10 + ba_options.solver_options.parameter_tolerance /= 10 + ba_options.solver_options.max_num_iterations *= 2 + ba_options.solver_options.max_linear_solver_iterations = 200 + + return ba_options + + +class IterativeRefinement: + """Iterative global refinement + through repeated global bundle adjustments.""" + + def __init__(self, session: SingleSeqSession): + self.session = session + + @staticmethod + def run( + vi_options: VIOptimizerOptions, + pipeline_options: pycolmap.IncrementalPipelineOptions, + mapper: pycolmap.IncrementalMapper, + session: SingleSeqSession, + ): + """Entry point for running iterative refinement""" + iter_refiner = IterativeRefinement(session) + iter_refiner.refine( + vi_options, + pipeline_options, + mapper, + ) + + def refine( + self, + vi_options: VIOptimizerOptions, + pipeline_options: pycolmap.IncrementalPipelineOptions, + mapper: pycolmap.IncrementalMapper, + ): + # Configure triangulation options + tri_options = pipeline_options.get_triangulation() + mapper.complete_and_merge_tracks(tri_options) + num_retri_obs = mapper.retriangulate(tri_options) + logging.verbose( + 1, + f"=> Retriangulated observations: {num_retri_obs}", + ) + + # Configure mapper options + mapper_options = pipeline_options.get_mapper() + + # Iterative refinement + for _ in range(pipeline_options.ba_global_max_refinements): + num_observations = mapper.reconstruction.compute_num_observations() + + GlobalBundleAdjustment.run( + vi_options, + pipeline_options, + mapper, + self.session, + ) + + if vi_options.optim.normalize_reconstruction: + mapper.reconstruction.normalize() + + num_changed_observations = mapper.complete_and_merge_tracks( + tri_options + ) + num_changed_observations += mapper.filter_points(mapper_options) + changed = ( + num_changed_observations / num_observations + if num_observations > 0 + else 0 + ) + + logging.verbose(1, f"=> Changed observations: {changed:.6f}") + if changed < pipeline_options.ba_global_max_refinement_change: + break diff --git a/lamaria/pipeline/optim/residual.py b/lamaria/pipeline/optim/residual.py new file mode 100644 index 0000000..7b11c5e --- /dev/null +++ b/lamaria/pipeline/optim/residual.py @@ -0,0 +1,85 @@ +import numpy as np +import pyceres +import pycolmap +from tqdm import tqdm + +from ...config.options import OptIMUOptions +from .session import SingleSeqSession + + +def add_imu_residuals_to_problem( + imu_options: OptIMUOptions, + session: SingleSeqSession, + problem, +): + loss = pyceres.TrivialLoss() + + frame_ids = sorted(session.data.reconstruction.frames.keys()) + + for k in tqdm( + range(len(session.preintegrated_imu_measurements)), + desc="Adding IMU residuals", + ): + i = frame_ids[k] + j = frame_ids[k + 1] + frame_i = session.data.reconstruction.frames[i] + frame_j = session.data.reconstruction.frames[j] + i_from_world = frame_i.rig_from_world + j_from_world = frame_j.rig_from_world + + integrated_m = session.preintegrated_imu_measurements[i] + + problem.add_residual_block( + pycolmap.PreintegratedImuMeasurementCost(integrated_m), + loss, + [ + session.imu_from_rig.rotation.quat, + session.imu_from_rig.translation, + session.log_scale, + session.gravity, + 
i_from_world.rotation.quat, + i_from_world.translation, + session.imu_states[i].data, + j_from_world.rotation.quat, + j_from_world.translation, + session.imu_states[j].data, + ], + ) + + problem = setup_manifolds_and_constraints( + imu_options, + session, + problem, + ) + + return problem + + +def setup_manifolds_and_constraints( + imu_options: OptIMUOptions, + session: SingleSeqSession, + problem, +): + """Setup manifolds and parameter constraints""" + problem.set_manifold(session.gravity, pyceres.SphereManifold(3)) + problem.set_manifold( + session.imu_from_rig.rotation.quat, pyceres.EigenQuaternionManifold() + ) + + # Apply optimization constraints based on configuration + if not imu_options.optimize_scale: + problem.set_parameter_block_constant(session.log_scale) + if not imu_options.optimize_gravity: + problem.set_parameter_block_constant(session.gravity) + if not imu_options.optimize_imu_from_rig: + problem.set_parameter_block_constant(session.imu_from_rig.rotation.quat) + problem.set_parameter_block_constant(session.imu_from_rig.translation) + if not imu_options.optimize_bias: + constant_idxs = np.arange(3, 9) + for frame_id in session.imu_states: + problem.set_manifold( + session.imu_states[frame_id].data, + pyceres.SubsetManifold(9, constant_idxs), + ) + + return problem diff --git a/lamaria/pipeline/optim/session.py b/lamaria/pipeline/optim/session.py new file mode 100644 index 0000000..7a7b04b --- /dev/null +++ b/lamaria/pipeline/optim/session.py @@ -0,0 +1,30 @@ +from __future__ import annotations + +import numpy as np +import pycolmap + +from ...config.options import OptIMUOptions +from ...structs.vi_reconstruction import VIReconstruction +from .imu import ( + initialize_imu_states, + preintegrate_imu_measurements, +) + + +class SingleSeqSession: + def __init__( + self, + imu_options: OptIMUOptions, + data: VIReconstruction, + ): + self.data = data + self._init_imu_data(imu_options) + + def _init_imu_data(self, imu_options): + self.preintegrated_imu_measurements = preintegrate_imu_measurements( + imu_options, self.data + ) + self.imu_states = initialize_imu_states(self.data) + self.imu_from_rig = pycolmap.Rigid3d() + self.gravity = np.array([0.0, 0.0, -1.0]) + self.log_scale = np.array([0.0]) diff --git a/lamaria/pipeline/optim/vi_optimization.py b/lamaria/pipeline/optim/vi_optimization.py new file mode 100644 index 0000000..fad05ee --- /dev/null +++ b/lamaria/pipeline/optim/vi_optimization.py @@ -0,0 +1,98 @@ +from pathlib import Path + +import pycolmap + +from ...config.options import TriangulatorOptions, VIOptimizerOptions +from .iterative_global_ba import IterativeRefinement +from .session import SingleSeqSession + + +class VIOptimizer: + """Main Visual-Inertial Optimizer Class""" + + def __init__( + self, + vi_options: VIOptimizerOptions, + triangulator_options: TriangulatorOptions, + session: SingleSeqSession, + ): + self.vi_options = vi_options + self.triangulator_options = triangulator_options + self.session = session + + @staticmethod + def optimize( + vi_options: VIOptimizerOptions, + triangulator_options: TriangulatorOptions, + session: SingleSeqSession, + database_path: Path, + ) -> pycolmap.Reconstruction: + """Entry point for running full VI optimization""" + optimizer = VIOptimizer(vi_options, triangulator_options, session) + return optimizer.process(database_path) + + def process(self, database_path: Path) -> pycolmap.Reconstruction: + """Optimize the reconstruction using VI optimization""" + + if not database_path.exists(): + raise FileNotFoundError(f"Database 
not found: {database_path}") + + if not self.session.data.reconstruction.reg_image_ids(): + raise ValueError("Reconstruction has no registered images.") + + mapper = self._setup_incremental_mapper(database_path) + pipeline_options = self._get_incremental_pipeline_options() + + IterativeRefinement.run( + self.vi_options, pipeline_options, mapper, self.session + ) + + return mapper.reconstruction + + def _setup_incremental_mapper(self, database_path: Path): + """Setup the COLMAP incremental mapper""" + database = pycolmap.Database.open(database_path) + image_names = [ + self.session.data.reconstruction.images[image_id].name + for image_id in self.session.data.reconstruction.reg_image_ids() + ] + database_cache = pycolmap.DatabaseCache.create( + database, 15, False, set(image_names) + ) + mapper = pycolmap.IncrementalMapper(database_cache) + mapper.begin_reconstruction(self.session.data.reconstruction) + return mapper + + def _get_incremental_pipeline_options(self): + """Get incremental pipeline options for COLMAP bundle adjustment""" + pipeline_options = pycolmap.IncrementalPipelineOptions() + pipeline_options.fix_existing_frames = False + + if not self.vi_options.cam.optimize_cam_from_rig: + pipeline_options.ba_refine_sensor_from_rig = False + if not self.vi_options.cam.optimize_focal_length: + pipeline_options.ba_refine_focal_length = False + if not self.vi_options.cam.optimize_principal_point: + pipeline_options.ba_refine_principal_point = False + if not self.vi_options.cam.optimize_extra_params: + pipeline_options.ba_refine_extra_params = False + + # Setting up triangulation and mapper options + pipeline_options.triangulation.merge_max_reproj_error = ( + self.triangulator_options.merge_max_reproj_error + ) + pipeline_options.triangulation.complete_max_reproj_error = ( + self.triangulator_options.complete_max_reproj_error + ) + pipeline_options.triangulation.min_angle = ( + self.triangulator_options.min_angle + ) + + pipeline_options.mapper.filter_max_reproj_error = ( + self.triangulator_options.filter_max_reproj_error + ) + pipeline_options.mapper.filter_min_tri_angle = ( + self.triangulator_options.filter_min_tri_angle + ) + + return pipeline_options diff --git a/lamaria/pipeline/triangulation.py b/lamaria/pipeline/triangulation.py new file mode 100644 index 0000000..46ee1ea --- /dev/null +++ b/lamaria/pipeline/triangulation.py @@ -0,0 +1,150 @@ +from __future__ import annotations + +from collections import defaultdict +from pathlib import Path + +import pycolmap +from hloc import ( + extract_features, + match_features, + pairs_from_retrieval, + triangulation, +) + +from .. 
import logger +from ..config.options import TriangulatorOptions + + +def get_colmap_triangulation_options( + options: TriangulatorOptions, +) -> pycolmap.IncrementalPipelineOptions: + colmap_options = pycolmap.IncrementalPipelineOptions() + colmap_options.triangulation.merge_max_reproj_error = ( + options.merge_max_reproj_error + ) + colmap_options.triangulation.complete_max_reproj_error = ( + options.complete_max_reproj_error + ) + colmap_options.triangulation.min_angle = options.min_angle + + colmap_options.mapper.filter_max_reproj_error = ( + options.filter_max_reproj_error + ) + colmap_options.mapper.filter_min_tri_angle = options.filter_min_tri_angle + + return colmap_options + + +def pairs_from_frames(recon: pycolmap.Reconstruction): + frame_pairs = set() + by_index = defaultdict(list) + + for fid in sorted(recon.frames.keys()): + fr = recon.frames[fid] + img_ids = sorted([d.id for d in fr.data_ids]) + names = [recon.images[i].name for i in img_ids] + + for i in range(len(names)): + for j in range(i + 1, len(names)): + frame_pairs.add((names[i], names[j])) + frame_pairs.add((names[j], names[i])) + + for k, n in enumerate(names): + by_index[k].append(n) + + adj_pairs = set() + for _, seq in by_index.items(): + for a, b in zip(seq[:-1], seq[1:]): + adj_pairs.add((a, b)) + + return frame_pairs, adj_pairs + + +def postprocess_pairs_with_reconstruction( + sfm_pairs_file: Path, reconstruction: pycolmap.Reconstruction | Path +): + recon = ( + reconstruction + if isinstance(reconstruction, pycolmap.Reconstruction) + else pycolmap.Reconstruction(reconstruction.as_posix()) + ) + + frame_pairs, adj_pairs = pairs_from_frames(recon) + + existing = set() + with open(sfm_pairs_file) as f: + for line in f: + a, b = line.strip().split() + existing.add((a, b)) + + existing = {p for p in existing if p not in frame_pairs} + existing |= adj_pairs + + with open(sfm_pairs_file, "w") as f: + for a, b in sorted(existing): + f.write(f"{a} {b}\n") + + +def run( + options: TriangulatorOptions, + reference_model: Path, # reconstruction path + keyframes_path: Path, + output_path: Path, +) -> Path: + if not keyframes_path.exists(): + raise FileNotFoundError(f"keyframes not found at {keyframes_path}") + + output_path.mkdir(parents=True, exist_ok=True) + hloc_output_path = output_path / "hloc" + pairs_path = hloc_output_path / "pairs.txt" + triangulated_model_path = output_path / "model" + + hloc_output_path.mkdir(parents=True, exist_ok=True) + if not reference_model.exists(): + raise FileNotFoundError( + f"reference_model not found at {reference_model}" + ) + + retrieval_conf = extract_features.confs[options.retrieval_conf] + feature_conf = extract_features.confs[options.feature_conf] + matcher_conf = match_features.confs[options.matcher_conf] + + logger.info( + "HLOC confs: retrieval=%s, features=%s, matcher=%s", + options.retrieval_conf, + options.feature_conf, + options.matcher_conf, + ) + + retrieval_path = extract_features.main( + retrieval_conf, image_dir=keyframes_path, export_dir=hloc_output_path + ) + features_path = extract_features.main( + feature_conf, image_dir=keyframes_path, export_dir=hloc_output_path + ) + + pairs_from_retrieval.main( + retrieval_path, pairs_path, options.num_retrieval_matches + ) + postprocess_pairs_with_reconstruction(pairs_path, reference_model) + + matches_path = match_features.main( + conf=matcher_conf, + pairs=pairs_path, + features=feature_conf["output"], + export_dir=hloc_output_path, + ) + + colmap_opts = get_colmap_triangulation_options(options) + + _ = 
triangulation.main(
+        sfm_dir=triangulated_model_path,
+        reference_model=reference_model,
+        image_dir=keyframes_path,
+        pairs=pairs_path,
+        features=features_path,
+        matches=matches_path,
+        mapper_options=colmap_opts,
+    )
+
+    return triangulated_model_path
diff --git a/lamaria/structs/control_point.py b/lamaria/structs/control_point.py
new file mode 100644
index 0000000..c9bbc98
--- /dev/null
+++ b/lamaria/structs/control_point.py
@@ -0,0 +1,271 @@
+import json
+from dataclasses import dataclass, field
+from pathlib import Path
+
+import numpy as np
+import pycolmap
+from tqdm import tqdm
+
+from .. import logger
+from ..utils.constants import (
+    CUSTOM_ORIGIN_COORDINATES,
+)
+from ..utils.timestamps import get_timestamp_to_images
+
+ControlPoints = dict[int, "ControlPoint"]
+
+
+@dataclass(slots=True)
+class ControlPointSummary:
+    name: str  # geo_id
+    topo: np.ndarray
+    covariance: np.ndarray
+    triangulated: np.ndarray | None = None
+
+    def is_triangulated(self) -> bool:
+        return self.triangulated is not None
+
+
+@dataclass(slots=True)
+class ControlPoint:
+    name: str  # geo_id
+    topo: np.ndarray
+    covariance: np.ndarray
+    image_name_to_detection: dict[str, np.ndarray] = field(
+        default_factory=dict
+    )  # image_name to [x, y] detection
+
+    triangulated: np.ndarray | None = None  # None if triangulation fails
+    inlier_ratio: float = 0.0
+    # inlier list of (image_id, [x, y]) tuples
+    inlier_detections: list[tuple[int, np.ndarray]] = field(
+        default_factory=list
+    )
+
+    @staticmethod
+    def from_measurement(
+        name: str,
+        measurement_xyz: list[float | None],
+        unc_xyz: list[float | None],
+        origin_xyz: tuple[float, float, float],
+    ) -> "ControlPoint":
+        m = list(measurement_xyz)
+        u = list(unc_xyz)
+        if m[2] is None:
+            assert u[2] is None, "If z is None, its uncertainty must be None."
+            m[2] = origin_xyz[2]
+            u[2] = 1e9  # very uncertain height
+
+        # translate the measurement to keep numerical values small
+        topo = np.asarray(m, dtype=np.float64) - np.asarray(
+            origin_xyz, dtype=np.float64
+        )
+        cov = np.diag(np.square(np.asarray(u, dtype=np.float64)))
+
+        return ControlPoint(name=name, topo=topo, covariance=cov)
+
+    def has_height(self) -> bool:
+        return bool(self.topo[2] != 0)
+
+    def is_triangulated(self) -> bool:
+        return self.triangulated is not None
+
+    def summary(self) -> ControlPointSummary:
+        return ControlPointSummary(
+            name=self.name,
+            triangulated=self.triangulated.tolist()
+            if self.triangulated is not None
+            else None,
+            topo=self.topo.tolist(),
+            covariance=self.covariance.tolist(),
+        )
+
+
+def load_cp_json(
+    cp_json_file: Path,
+    skip_detections: bool = False,
+) -> tuple[ControlPoints, dict]:
+    """Load control points and timestamp→images mapping from a JSON file.
+
+    Args:
+        cp_json_file (Path): Path to the sparse ground-truth JSON file.
+        skip_detections (bool): If True, skip per-image detections and only
+            load control point measurement data.
+
+    Returns:
+        Tuple:
+            - control_points: Parsed control points collection.
+            - timestamp_to_images: Mapping from timestamps (ns) to image names.
+    """
+    with open(cp_json_file) as file:
+        cp_data = json.load(file)
+
+    control_points = construct_control_points(cp_data, skip_detections)
+    timestamp_to_images = get_timestamp_to_images(cp_data)
+
+    return control_points, timestamp_to_images
+
+
+def construct_control_points(
+    cp_data: dict,
+    skip_detections: bool = False,
+) -> ControlPoints:
+    """
+    Construct ControlPoints dict from JSON data.
+
+    Args:
+        cp_data (dict): Loaded JSON data containing control point information.
+ skip_detections (bool): If True, skip per-image detections and only + load control point measurement data. + + Returns: + control_points (ControlPoints): Control points dictionary. + """ + + control_points: ControlPoints = {} + for geo_id, data in cp_data["control_points"].items(): + tag_ids = data["tag_id"] + measurement = data["measurement"] + unc = data["uncertainty"] + images_observing_cp = data["image_names"] + + for tag_id in tag_ids: + control_points[tag_id] = ControlPoint.from_measurement( + geo_id, + measurement, + unc, + CUSTOM_ORIGIN_COORDINATES, + ) + + if not skip_detections: + for image_name in images_observing_cp: + detection = cp_data["images"][image_name]["detection"] + control_points[tag_id].image_name_to_detection[ + image_name + ] = np.array(detection) + + return control_points + + +def transform_points(points, r, t, scale): + transformed_points = [] + for point in points: + transformed_point = scale * r.apply(point) + t + transformed_points.append(transformed_point) + + return transformed_points + + +def transform_triangulated_control_points( + control_points: ControlPoints, r, t, scale +) -> ControlPoints: + """Apply similarity transform to triangulated control points in place.""" + for cp in control_points.values(): + if cp.triangulated is None: + continue + cp.triangulated = scale * r.apply(cp.triangulated) + t + return control_points + + +def run_control_point_triangulation( + reconstruction: pycolmap.Reconstruction, + control_points: ControlPoints, +) -> None: + """ + Triangulate control points and add to control_points dict. + Updates `control_points` in place to add: + + - ``triangulated``: np.ndarray(3,) or None if triangulation fails + - ``inlier_ratio``: float + - ``inlier_detections``: list of observations used for triangulation + Args: + reconstruction (pycolmap.Reconstruction): The COLMAP reconstruction. + control_points (ControlPoints): The control points dictionary. 
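+    Note:
+        Triangulation uses pycolmap.estimate_triangulation with RANSAC
+        (max reprojection error 4 px); control points observed by no
+        registered image are left untriangulated.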
+ """ + + image_names_to_ids = { + image.name: image_id + for image_id, image in reconstruction.images.items() + } + + for _, cp in tqdm( + control_points.items(), desc="Triangulating control points" + ): + geo_id = cp.name + images_observing_cp = list(cp.image_name_to_detection.keys()) + + pixel_points = [] + cam_from_worlds = [] + cameras = [] + + image_ids_and_centers = [] + + for image_name in images_observing_cp: + if image_name not in image_names_to_ids: + continue + id = image_names_to_ids[image_name] + image = reconstruction.images[id] + detection = cp.image_name_to_detection[image_name] + pixel_points.append(detection) + cam_from_worlds.append(image.cam_from_world()) + cameras.append(reconstruction.cameras[image.camera_id]) + image_ids_and_centers.append((id, detection)) + + # HANDLING THE CASE WHERE NO IMAGES OBSERVE THE CONTROL POINT + if pixel_points == []: + cp.triangulated = None + cp.inlier_ratio = 0.0 + cp.inlier_detections = [] + continue + + pixel_points = np.array(pixel_points) + try: + output = pycolmap.estimate_triangulation( + pixel_points, + cam_from_worlds, + cameras, + options={ + "residual_type": "REPROJECTION_ERROR", + "ransac": {"max_error": 4.0}, + }, + ) + except Exception as e: + print(f"Error in triangulating control point {geo_id}: {e}") + output = None + + if output is None: + cp.triangulated = None + cp.inlier_ratio = 0.0 + cp.inlier_detections = [] + continue + + cp.triangulated = output["xyz"] + inliers = output["inliers"] + cp.inlier_ratio = np.sum(inliers) / len(inliers) + image_ids_and_centers = [ + image_ids_and_centers[i] for i in range(len(inliers)) if inliers[i] + ] + cp.inlier_detections = image_ids_and_centers + + +def get_cps_for_initial_alignment(control_points: ControlPoints): + """Get control points with z != 0 for initial alignment""" + triangulated_cp_alignment = [] + topo_cp_alignment = [] + for _, cp in control_points.items(): + if not cp.is_triangulated(): + continue + + if cp.has_height(): + triangulated_cp_alignment.append(cp.triangulated) + topo_cp_alignment.append(cp.topo) + + if len(topo_cp_alignment) < 3: + logger.error( + "Not enough control points with height for initial alignment. " + "At least 3 control points with z != 0 are required." + ) + return None, None + + triangulated_cp_alignment = np.array(triangulated_cp_alignment) + topo_cp_alignment = np.array(topo_cp_alignment) + + return triangulated_cp_alignment, topo_cp_alignment diff --git a/lamaria/structs/sparse_eval.py b/lamaria/structs/sparse_eval.py new file mode 100644 index 0000000..90c4c6b --- /dev/null +++ b/lamaria/structs/sparse_eval.py @@ -0,0 +1,189 @@ +import copy +from dataclasses import asdict, dataclass, field +from pathlib import Path + +import numpy as np +import pyceres +import pycolmap +import pycolmap.cost_functions + +from .. 
import logger
+from .control_point import ControlPoint, ControlPointSummary
+
+
+@dataclass(slots=True)
+class SparseEvalVariables:
+    """Container for sparse evaluation optimization variables."""
+
+    control_points: dict[int, "ControlPoint"]  # tag_id to ControlPoint
+    sim3d: pycolmap.Sim3d
+    log_scale: np.ndarray = field(
+        default_factory=lambda: np.array(0.0, dtype=np.float64)
+    )
+
+    @classmethod
+    def create_from_inputs(
+        cls,
+        control_points: dict,
+        sim3d: pycolmap.Sim3d,
+    ) -> "SparseEvalVariables":
+        scale = copy.deepcopy(sim3d.scale)
+        v = cls(
+            control_points=copy.deepcopy(control_points),
+            sim3d=copy.deepcopy(sim3d),
+            log_scale=np.array(np.log(scale), dtype=np.float64),
+        )
+
+        return v
+
+    def update_sim3d_scale(self) -> None:
+        """Propagate optimized log_scale back into sim3d.scale."""
+        log_scale = copy.deepcopy(self.log_scale)
+        self.sim3d.scale = np.exp(log_scale)
+
+    def get_cp_summary(self) -> dict[int, ControlPointSummary]:
+        """Get a brief summary of control points."""
+        summary: dict[int, ControlPointSummary] = {}
+        for tag_id, cp in self.control_points.items():
+            summary[tag_id] = cp.summary()
+
+        return summary
+
+
+def get_problem_for_sparse_alignment(
+    reconstruction: pycolmap.Reconstruction,
+    variables: SparseEvalVariables,
+) -> tuple:
+    """Create a Ceres problem for sparse alignment.
+    Args:
+        reconstruction (pycolmap.Reconstruction): The COLMAP reconstruction.
+        variables (SparseEvalVariables): The sparse evaluation variables.
+
+    Returns:
+        problem (pyceres.Problem): The Ceres problem.
+        solver_options (pyceres.SolverOptions): The solver options.
+        summary (pyceres.SolverSummary): The solver summary.
+    """
+    problem = pyceres.Problem()
+    problem = add_residuals_for_sparse_eval(
+        problem,
+        reconstruction,
+        variables,
+    )
+    solver_options = pyceres.SolverOptions()
+    solver_options.minimizer_progress_to_stdout = True
+    summary = pyceres.SolverSummary()
+
+    return problem, solver_options, summary
+
+
+def add_residuals_for_sparse_eval(
+    problem,
+    reconstruction: pycolmap.Reconstruction,
+    variables: SparseEvalVariables,
+) -> pyceres.Problem:
+    """Add alignment residuals to the Ceres problem.
+
+    The residual blocks consist of:
+    - a ReprojErrorCost for each inlier observation of each control point
+    - a Point3DAlignmentCost for each triangulated control point
+    Args:
+        problem (pyceres.Problem): The Ceres problem.
+        reconstruction (pycolmap.Reconstruction): The COLMAP reconstruction.
+        variables (SparseEvalVariables): The sparse evaluation variables.
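+
+    Returns:
+        problem (pyceres.Problem): The problem with all residual blocks and
+            manifolds attached.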
+ """ + if variables.control_points is None or variables.sim3d is None: + return problem + + loss = pyceres.TrivialLoss() + + for tag_id, cp in variables.control_points.items(): + tri = cp.triangulated + if tri is None: + logger.info(f"Control point {tag_id} not triangulated") + continue + + point2d_cov = np.eye(2) + + obs = cp.inlier_detections + for image_id, point2d in obs: + image = reconstruction.images[image_id] + pose = image.cam_from_world() + camera = reconstruction.cameras[image.camera_id] + + point2d = np.asarray(point2d, dtype=np.float64).reshape(2, 1) + cost = pycolmap.cost_functions.ReprojErrorCost( + camera.model, + point2d_cov, + point2d, + pose, + ) + problem.add_residual_block(cost, loss, [tri, camera.params]) + problem.set_parameter_block_constant(camera.params) + + cost = pycolmap.cost_functions.Point3DAlignmentCost( + cp.covariance, + cp.topo, + use_log_scale=True, + ) + problem.add_residual_block( + cost, + loss, + [ + cp.triangulated, + variables.sim3d.rotation.quat, + variables.sim3d.translation, + variables.log_scale, + ], + ) + + problem.set_manifold( + variables.sim3d.rotation.quat, + pyceres.EigenQuaternionManifold(), + ) + + logger.info("Added Point3dAlignmentCost and ReprojErrorCost costs") + + return problem + + +@dataclass(slots=True) +class SparseEvalResult: + """Container for sparse evaluation results.""" + + alignment: pycolmap.Sim3d + cp_summary: dict[int, ControlPointSummary] + + @staticmethod + def from_variables( + variables: SparseEvalVariables, + ) -> "SparseEvalResult": + alignment = copy.deepcopy(variables.sim3d) + cp_summary = variables.get_cp_summary() + + return SparseEvalResult( + alignment=alignment, + cp_summary=cp_summary, + ) + + @classmethod + def load_from_npy(cls, path: Path) -> "SparseEvalResult": + if not path.exists(): + logger.error(f"Result file not found: {path}") + return None + + data = np.load(path, allow_pickle=True).item() + + try: + cp_summary = data["cp_summary"] + alignment = data["alignment"] + except KeyError as e: + logger.error(f"Missing key in data: {e}") + return None + + return cls(cp_summary=cp_summary, alignment=alignment) + + def save_as_npy(self, path: Path) -> None: + path.parent.mkdir(parents=True, exist_ok=True) # just in case + np.save(path, asdict(self)) diff --git a/lamaria/structs/timed_reconstruction.py b/lamaria/structs/timed_reconstruction.py new file mode 100644 index 0000000..af8b809 --- /dev/null +++ b/lamaria/structs/timed_reconstruction.py @@ -0,0 +1,53 @@ +from dataclasses import dataclass, field +from pathlib import Path + +import numpy as np +import pycolmap + + +@dataclass +class TimedReconstruction: + reconstruction: pycolmap.Reconstruction = field( + default_factory=pycolmap.Reconstruction + ) + timestamps: dict[int, int] = field(default_factory=dict) + + @classmethod + def read(cls, input_folder: Path) -> "TimedReconstruction": + """Load reconstruction and timestamps from disk.""" + assert input_folder.exists(), ( + f"Input folder {input_folder} does not exist" + ) + + reconstruction = pycolmap.Reconstruction(input_folder) + + ts_path = input_folder / "timestamps.txt" + assert ts_path.exists(), f"Timestamps file {ts_path} does not exist" + timestamps: dict[int, int] = {} + with open(ts_path) as f: + for line in f: + if line.startswith("#"): + continue + frame_id, ts = line.strip().split() + timestamps[int(frame_id)] = int(ts) + + return cls(reconstruction=reconstruction, timestamps=timestamps) + + def write(self, output_folder: Path) -> None: + """Write reconstruction and timestamps to 
disk.""" + output_folder.mkdir(parents=True, exist_ok=True) + self.reconstruction.write(output_folder.as_posix()) + + ts_path = output_folder / "timestamps.txt" + frame_ids = sorted(self.timestamps.keys()) + + # sanity check + recon_frame_ids = np.array(sorted(self.reconstruction.frames.keys())) + assert np.array_equal(np.array(frame_ids), recon_frame_ids), ( + "Frame IDs in reconstruction and timestamps do not match" + ) + + with open(ts_path, "w") as f: + f.write("# FrameID Timestamp(ns)\n") + for frame_id in frame_ids: + f.write(f"{frame_id} {self.timestamps[frame_id]}\n") diff --git a/lamaria/structs/trajectory.py b/lamaria/structs/trajectory.py new file mode 100644 index 0000000..2339fdd --- /dev/null +++ b/lamaria/structs/trajectory.py @@ -0,0 +1,347 @@ +from dataclasses import dataclass, field +from decimal import ROUND_HALF_UP, Decimal +from pathlib import Path + +import numpy as np +import pycolmap +from tqdm import tqdm + +from .. import logger +from ..utils.constants import ( + LEFT_CAMERA_STREAM_LABEL, + RIGHT_CAMERA_STREAM_LABEL, +) +from ..utils.timestamps import ( + find_closest_timestamp, + matching_time_indices, +) + + +def _round_ns(x: str | int | float) -> int: + # works for "5120...274.999939", "5.12e11", or ints + if isinstance(x, int): + return x + s = str(x) + return int(Decimal(s).to_integral_value(rounding=ROUND_HALF_UP)) + + +@dataclass(slots=True) +class Trajectory: + """ + Loads and stores traj data from 'estimate' text file with rows: + ts t_x t_y t_z q_x q_y q_z q_w + Blank lines and lines starting with '#' are ignored. + + By default, poses are calculated as rig_from_world + (i.e., inverse of world_from_rig) to satisfy COLMAP format. + + Attributes: + invert_poses (bool): Whether to invert poses to + rig_from_world format. + corresponding_sensor (str): The reference sensor to in which + the trajectory is represented ("imu" or "cam0"). + """ + + invert_poses: bool = True + corresponding_sensor: str = "imu" + _timestamps: list[int] = field(default_factory=list[int]) + _poses: list[pycolmap.Rigid3d] = field( + default_factory=list[pycolmap.Rigid3d] + ) + + @classmethod + def load_from_file( + cls, + path: str | Path, + invert_poses: bool = True, + corresponding_sensor: str = "imu", + ) -> "Trajectory": + """Parse the file, validate format, populate timestamps & poses.""" + self = cls() + self.clear() + path = Path(path) + self.invert_poses = invert_poses + self.corresponding_sensor = corresponding_sensor + + if not path.exists(): + raise FileNotFoundError(f"Estimate file not found: {path}") + + with open(path) as f: + lines = f.readlines() + + state = self._parse(lines) + if not state: + raise RuntimeError("Failed to parse estimate file.") + + return self + + def add_estimate_poses_to_reconstruction( + self, + reconstruction: pycolmap.Reconstruction, + timestamp_to_images: dict, + ) -> pycolmap.Reconstruction: + """ + Adds estimate poses as frames to input reconstruction. + """ + self._ensure_loaded() + reconstruction = self._add_images_to_reconstruction( + reconstruction, timestamp_to_images + ) + + return reconstruction + + def filter_from_indices( + self, + ids: np.ndarray, + ) -> None: + """ + From evo package. + Edits the trajectory based on the provided indices. 
+ """ + self._ensure_loaded() + self._timestamps = [self._timestamps[i] for i in ids] + self._poses = [self._poses[i] for i in ids] + + def transform(self, tgt_from_src: pycolmap.Sim3d) -> None: + """Apply a similarity transformation to all poses in the trajectory.""" + self._ensure_loaded() + _tmp_poses = ( + self._poses.copy() + if self.invert_poses + else [p.inverse() for p in self._poses] + ) + + # tmp_poses are in rig_from_world format + for i, p in enumerate(_tmp_poses): + rig_from_new_world = ( + pycolmap.Sim3d(1, p.rotation, p.translation) + * tgt_from_src.inverse() + ) + new_p = pycolmap.Rigid3d( + rig_from_new_world.rotation, + rig_from_new_world.translation * tgt_from_src.scale, + ) + self._poses[i] = new_p if self.invert_poses else new_p.inverse() + + @property + def timestamps(self) -> list[int]: + self._ensure_loaded() + return self._timestamps + + @property + def poses(self) -> list[pycolmap.Rigid3d]: + self._ensure_loaded() + return self._poses + + @property + def positions(self) -> np.ndarray: + """Returns Nx3 numpy array of positions.""" + self._ensure_loaded() + if not self.invert_poses: + # poses are in world_from_rig format + return np.array([p.translation for p in self._poses]) + else: + return np.array([p.inverse().translation for p in self._poses]) + + @property + def orientations(self) -> np.ndarray: + """Returns Nx4 numpy array of quaternions (x, y, z, w).""" + self._ensure_loaded() + if not self.invert_poses: + # poses are in world_from_rig format + return np.array([p.rotation.quat for p in self._poses]) + else: + # poses are in rig_from_world format + return np.array([p.inverse().rotation.quat for p in self._poses]) + + def as_tuples(self) -> list[tuple[int, pycolmap.Rigid3d]]: + """Return a list of (timestamp, pose) tuples.""" + self._ensure_loaded() + return list(zip(self._timestamps, self._poses, strict=True)) + + def as_dict(self) -> dict[int, pycolmap.Rigid3d]: + """Return a dict mapping timestamp to pose.""" + self._ensure_loaded() + return dict(zip(self._timestamps, self._poses, strict=True)) + + def __len__(self) -> int: + return len(self._timestamps) + + def is_loaded(self) -> bool: + return len(self._timestamps) > 0 + + def clear(self) -> None: + self._timestamps.clear() + self._poses.clear() + + def _ensure_loaded(self) -> None: + if not self._timestamps or not self._poses: + raise RuntimeError( + "Estimate not loaded. Call load_from_file() first." 
+            )
+
+    # Parses the file lines, populating self._timestamps and self._poses
+    def _parse(self, lines: list[str]) -> bool:
+        ts_list: list[int] = []
+        pose_list: list[pycolmap.Rigid3d] = []
+        exists_lines = False
+        for lineno, line in enumerate(lines, start=1):
+            line = line.strip()
+            if not line or line.startswith("#"):
+                continue
+
+            exists_lines = True
+            parts = line.split()
+            if len(parts) != 8:
+                logger.error(
+                    f"Line {lineno}: expected 8 values, got {len(parts)}"
+                )
+                return False
+
+            try:
+                ts = _round_ns(parts[0])
+                tvec = np.array(
+                    [float(parts[1]), float(parts[2]), float(parts[3])]
+                )
+                qvec = np.array(
+                    [
+                        float(parts[4]),
+                        float(parts[5]),
+                        float(parts[6]),
+                        float(parts[7]),
+                    ]
+                )
+
+            except ValueError as e:
+                logger.error(f"Line {lineno}: invalid number format: {e}")
+                return False
+
+            world_from_rig = pycolmap.Rigid3d(pycolmap.Rotation3d(qvec), tvec)
+            pose = (
+                world_from_rig.inverse()
+                if self.invert_poses
+                else world_from_rig
+            )
+
+            ts_list.append(ts)
+            pose_list.append(pose)
+
+        if not exists_lines:
+            logger.error("No valid lines found in the estimate file.")
+            return False
+
+        self._timestamps = ts_list
+        self._poses = pose_list
+
+        return True
+
+    def _add_images_to_reconstruction(
+        self,
+        reconstruction: pycolmap.Reconstruction,
+        timestamp_to_images: dict,
+    ) -> pycolmap.Reconstruction:
+        pose_data = self.as_tuples()
+
+        image_id = 1
+        # imu is the rig in this reconstruction
+        rig = reconstruction.rig(rig_id=1)
+
+        if self.corresponding_sensor == "imu":
+            transform = pycolmap.Rigid3d()
+        else:
+            # left camera poses are provided
+            # sensor_from_rig == cam0_from_imu
+            transform = rig.sensor_from_rig(sensor_id=2)
+
+        for i, (timestamp, pose) in tqdm(
+            enumerate(pose_data),
+            total=len(pose_data),
+            desc="Adding images to reconstruction",
+        ):
+            frame = pycolmap.Frame()
+            frame.rig_id = rig.rig_id
+            frame.frame_id = i + 1
+
+            if self.invert_poses:
+                # poses are in imu/cam_from_world format
+                # if imu: imu_from_world
+                # if cam0: cam0_from_world
+                T_world_rig = pose.inverse() * transform
+            else:
+                # poses are in world_from_cam/imu format
+                T_world_rig = pose * transform
+
+            frame.rig_from_world = T_world_rig.inverse()
+
+            images_to_add = []
+
+            for label, camera_id in [
+                (LEFT_CAMERA_STREAM_LABEL, 2),
+                (RIGHT_CAMERA_STREAM_LABEL, 3),
+            ]:
+                source_timestamps = timestamp_to_images[label]["sorted_keys"]
+                # allow offsets up to 1 ms (1e6 ns)
+                closest_timestamp = find_closest_timestamp(
+                    source_timestamps, timestamp, max_diff=1e6
+                )
+                if closest_timestamp is None:
+                    raise ValueError(
+                        f"No {label} image within 1 ms of pose "
+                        f"timestamp {timestamp}"
+                    )
+
+                image_name = timestamp_to_images[label][closest_timestamp]
+
+                im = pycolmap.Image(
+                    image_name,
+                    pycolmap.Point2DList(),
+                    camera_id,
+                    image_id,
+                )
+                im.frame_id = frame.frame_id
+                frame.add_data_id(im.data_id)
+
+                images_to_add.append(im)
+                image_id += 1
+
+            reconstruction.add_frame(frame)
+            for im in images_to_add:
+                reconstruction.add_image(im)
+
+        return reconstruction
+
+
+def associate_trajectories(
+    traj1: Trajectory,
+    traj2: Trajectory,
+) -> tuple[Trajectory, Trajectory]:
+    """
+    From evo package.
+    Associate two trajectories based on their timestamps.
+    Args:
+        traj1 (Trajectory): First trajectory.
+        traj2 (Trajectory): Second trajectory.
+    Returns:
+        tuple[Trajectory, Trajectory]: Associated trajectories.
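+        Returns (None, None) if either trajectory is not loaded or if no
+        timestamps match within the association tolerance.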
+ """ + if not traj1.is_loaded() or not traj2.is_loaded(): + logger.error("Trajectories must be loaded before association.") + return None, None + + first_longer = len(traj1) >= len(traj2) + longer_traj = traj1 if first_longer else traj2 + shorter_traj = traj2 if first_longer else traj1 + + short_idx, long_idx = matching_time_indices( + np.array(shorter_traj.timestamps), + np.array(longer_traj.timestamps), + ) + num_matches = len(long_idx) + if num_matches == 0: + logger.error("No matching timestamps found between trajectories.") + return None, None + + longer_traj.filter_from_indices(long_idx) + shorter_traj.filter_from_indices(short_idx) + + traj1 = traj1 if first_longer else traj2 + traj2 = traj2 if first_longer else traj1 + + return traj1, traj2 diff --git a/lamaria/structs/vi_reconstruction.py b/lamaria/structs/vi_reconstruction.py new file mode 100644 index 0000000..3d23cae --- /dev/null +++ b/lamaria/structs/vi_reconstruction.py @@ -0,0 +1,41 @@ +from dataclasses import dataclass +from pathlib import Path + +import numpy as np +import pycolmap + +from .timed_reconstruction import TimedReconstruction + + +@dataclass +class VIReconstruction(TimedReconstruction): + """Visual–Inertial reconstruction: adds IMU measurements.""" + + imu_measurements: pycolmap.ImuMeasurements | None = None + + @classmethod + def read(cls, input_folder: Path) -> "VIReconstruction": + # Load base data first + base = super().read(input_folder) + + # Load IMU + rectified_imu_data_npy = input_folder / "rectified_imu_data.npy" + assert rectified_imu_data_npy.exists(), ( + f"Rectified IMU data file {rectified_imu_data_npy} does not exist" + ) + rectified_imu_data = np.load(rectified_imu_data_npy, allow_pickle=True) + imu_measurements = pycolmap.ImuMeasurements(rectified_imu_data.tolist()) + + return cls( + reconstruction=base.reconstruction, + timestamps=base.timestamps, + imu_measurements=imu_measurements, + ) + + def write(self, output_folder: Path) -> None: + # Write base data first + super().write(output_folder) + + # Write IMU data + rectified_imu_data_npy = output_folder / "rectified_imu_data.npy" + np.save(rectified_imu_data_npy, self.imu_measurements.data) diff --git a/lamaria/utils/aria.py b/lamaria/utils/aria.py new file mode 100644 index 0000000..0eb2494 --- /dev/null +++ b/lamaria/utils/aria.py @@ -0,0 +1,610 @@ +import json +import shutil +import subprocess +from pathlib import Path + +import numpy as np +import pycolmap +from projectaria_tools.core import data_provider, mps +from projectaria_tools.core.calibration import CameraCalibration, ImuCalibration +from projectaria_tools.core.mps.utils import get_nearest_pose +from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions +from projectaria_tools.core.sophus import SE3 +from projectaria_tools.core.stream_id import StreamId +from scipy.spatial.transform import Rotation +from tqdm import tqdm + +from .. import logger +from .constants import ( + ARIA_CAMERAS, + LEFT_CAMERA_STREAM_ID, + RGB_CAMERA_STREAM_ID, + RIGHT_CAMERA_STREAM_ID, + RIGHT_IMU_STREAM_ID, +) +from .timestamps import find_closest_timestamp +from .types import InitReconstruction + +# ----- Reconstruction functions ----- # + + +def initialize_reconstruction_from_calibration_file( + calibration_file: Path, +) -> InitReconstruction: + """Initialize a COLMAP reconstruction from Aria calibration + json file found on website: https://lamaria.ethz.ch/slam_datasets + Adds a dummy camera as an IMU along with two cameras. 
+ + Args: + calibration_file (Path): + Path to the Aria calibration json file + Returns: + InitReconstruction: The initialized COLMAP reconstruction + """ + reconstruction = pycolmap.Reconstruction() + + imu = pycolmap.Camera(camera_id=1, model="SIMPLE_PINHOLE", params=[0, 0, 0]) + reconstruction.add_camera(imu) + + rig = pycolmap.Rig(rig_id=1) + ref_sensor = pycolmap.sensor_t( + id=1, # imu is the rig + type=pycolmap.SensorType.CAMERA, + ) + rig.add_ref_sensor(ref_sensor) + + for i, (key, _) in enumerate(ARIA_CAMERAS): + cam = camera_colmap_from_json( + calibration_file=calibration_file, + camera_label=key, + ) + cam.camera_id = i + 2 # start from 2 since 1 is imu + reconstruction.add_camera(cam) + + sensor = pycolmap.sensor_t(id=i + 2, type=pycolmap.SensorType.CAMERA) + rig_from_sensor = get_t_imu_camera_from_calibration_file( + calibration_file=calibration_file, + camera_label=key, + ) + sensor_from_rig = rig_from_sensor.inverse() + rig.add_sensor(sensor, sensor_from_rig) + + reconstruction.add_rig(rig) + + return reconstruction + + +# ----- Camera functions ----- # + + +def initialize_reconstruction_with_cameras( + calibration_file: Path, +) -> InitReconstruction: + """Initialize a COLMAP reconstruction from Aria calibration + json file found on website: https://lamaria.ethz.ch/slam_datasets + Adds only the cameras without any dummy IMU. + + Args: + calibration_file (Path): + Path to the Aria calibration json file + Returns: + InitReconstruction: The initialized COLMAP reconstruction + """ + reconstruction = pycolmap.Reconstruction() + + for i, (key, _) in enumerate(ARIA_CAMERAS): + cam = camera_colmap_from_json( + calibration_file=calibration_file, + camera_label=key, + ) + cam.camera_id = i + 1 + reconstruction.add_camera(cam) + + rig = pycolmap.Rig(rig_id=1) + ref_sensor = pycolmap.sensor_t( + id=1, # left camera is the rig + type=pycolmap.SensorType.CAMERA, + ) + rig.add_ref_sensor(ref_sensor) + + sensor1 = pycolmap.sensor_t(id=2, type=pycolmap.SensorType.CAMERA) + sensor_from_rig = get_t_cam_a_cam_b_from_calibration_file( + calibration_file=calibration_file, + camera_a_label="cam1", # right + camera_b_label="cam0", # left + ) + rig.add_sensor(sensor1, sensor_from_rig) + + reconstruction.add_rig(rig) + + return reconstruction + + +def get_camera_params_for_colmap( + camera_calibration: CameraCalibration, + camera_model: str, +) -> list[float]: + """ + Convert Aria CameraCalibration to COLMAP camera parameters. 
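+    The Aria FISHEYE624 projection vector is unpacked and reordered to match
+    the parameter layout of the requested COLMAP model.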
+    Supported models: OPENCV_FISHEYE, FULL_OPENCV, RAD_TAN_THIN_PRISM_FISHEYE
+    Args:
+        camera_calibration (CameraCalibration):
+            The projectaria_tools CameraCalibration object
+        camera_model (str): The COLMAP camera model to use
+    Returns:
+        list[float]: The camera parameters in COLMAP format
+    """
+    # params = [f_u {f_v} c_u c_v [k_0: k_{numK-1}]
+    #           {p_0 p_1} {s_0 s_1 s_2 s_3}]
+    # projection_params is a 15 length vector,
+    # starting with focal length, pp, extra coeffs
+    camera_params = camera_calibration.get_projection_params()
+    f_x, f_y, c_x, c_y = (
+        camera_params[0],
+        camera_params[0],
+        camera_params[1],
+        camera_params[2],
+    )
+
+    p2, p1 = camera_params[-5], camera_params[-6]
+
+    k1, k2, k3, k4, k5, k6 = camera_params[3:9]
+
+    # FULL_OPENCV model format:
+    # fx, fy, cx, cy, k1, k2, p1, p2, k3, k4, k5, k6
+
+    # OPENCV_FISHEYE model format:
+    # fx, fy, cx, cy, k1, k2, k3, k4
+
+    if camera_model == "OPENCV_FISHEYE":
+        params = [f_x, f_y, c_x, c_y, k1, k2, k3, k4]
+    elif camera_model == "FULL_OPENCV":
+        params = [f_x, f_y, c_x, c_y, k1, k2, p1, p2, k3, k4, k5, k6]
+    elif camera_model == "RAD_TAN_THIN_PRISM_FISHEYE":
+        aria_fisheye_params = camera_params
+        focal_length = aria_fisheye_params[0]
+        aria_fisheye_params = np.insert(aria_fisheye_params, 0, focal_length)
+        params = aria_fisheye_params
+    else:
+        raise ValueError(f"Unsupported COLMAP camera model {camera_model}")
+
+    return params
+
+
+def camera_colmap_from_calib(calib: CameraCalibration) -> pycolmap.Camera:
+    """Loads pycolmap camera from Aria CameraCalibration object"""
+    if calib.get_model_name().name != "FISHEYE624":
+        raise ValueError(
+            f"Unsupported Aria model {calib.get_model_name().name}"
+        )
+    model = "RAD_TAN_THIN_PRISM_FISHEYE"
+    params = get_camera_params_for_colmap(calib, model)
+    width, height = calib.get_image_size()
+    return pycolmap.Camera(
+        model=model,
+        width=width,
+        height=height,
+        params=params,
+    )
+
+
+def camera_colmap_from_json(
+    calibration_file: Path,  # open sourced aria calibration files
+    camera_label: str,
+) -> pycolmap.Camera:
+    """Loads pycolmap camera from Aria calibration json file found on website:
+    https://lamaria.ethz.ch/slam_datasets"""
+
+    calib = json.load(calibration_file.open("r"))
+    camera_data = calib[camera_label]
+    if camera_data["model"] != "RAD_TAN_THIN_PRISM_FISHEYE":
+        raise ValueError(f"Unsupported Aria model {camera_data['model']}")
+
+    model = "RAD_TAN_THIN_PRISM_FISHEYE"
+    params = camera_data["params"]
+    width = camera_data["resolution"]["width"]
+    height = camera_data["resolution"]["height"]
+    return pycolmap.Camera(
+        model=model,
+        width=width,
+        height=height,
+        params=params,
+    )
+
+
+# ----- Transformation functions ----- #
+
+
+def get_closed_loop_data_from_mps(
+    mps_path: Path,
+) -> list[mps.ClosedLoopTrajectoryPose]:
+    """Get closed loop trajectory data from MPS folder."""
+    closed_loop_traj_file = mps_path / "slam" / "closed_loop_trajectory.csv"
+    data = mps.read_closed_loop_trajectory(closed_loop_traj_file.as_posix())
+    return data
+
+
+def get_mps_poses_for_timestamps(
+    trajectory_data: list[mps.ClosedLoopTrajectoryPose],
+    timestamps: list[int],
+) -> list[SE3]:
+    """Get T_world_device for a list of device timestamps
+    in nanoseconds using MPS trajectory data.
+    Returns None for timestamps where
+    no trajectory data is found.
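+    If trajectory_data is empty, an empty list is returned.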
+ """ + poses = [] + if trajectory_data: + for ts in timestamps: + pose_information = get_nearest_pose(trajectory_data, ts) + if pose_information: + t_world_device = pose_information.transform_world_device + poses.append(t_world_device) + else: + poses.append(None) + + return poses + + +def get_t_rig_world_for_device_time_ns( + trajectory_data: list[mps.ClosedLoopTrajectoryPose], + device_time_ns: int, + imu_calibration: ImuCalibration, +): + """Get T_rig_world (rig is right IMU sensor) for a given device time + in nanoseconds using MPS trajectory data and IMU calibration. + Returns None if no trajectory data is found for the given timestamp. + """ + if trajectory_data: + pose_information = get_nearest_pose(trajectory_data, device_time_ns) + if pose_information: + t_world_device = pose_information.transform_world_device + t_device_imu = imu_calibration.get_transform_device_imu() + assert t_device_imu is not None, ( + f"No t_device_imu found for device time {device_time_ns}, \ + check imu calibration" + ) + + t_world_imu = t_world_device @ t_device_imu + t_imu_world = t_world_imu.inverse() + + return t_imu_world + else: + return None + else: + raise ValueError("No trajectory data found") + + +def get_t_imu_camera( + imu_calib: ImuCalibration, + camera_calib: CameraCalibration, +) -> pycolmap.Rigid3d: + """Get T_imu_camera from Aria calibrations. + Returns pycolmap.Rigid3d transform. + """ + + t_device_cam = camera_calib.get_transform_device_camera() + t_device_imu = imu_calib.get_transform_device_imu() + t_imu_device = t_device_imu.inverse() + + t_imu_cam = t_imu_device @ t_device_cam + + colmap_t_imu_cam = rigid3d_from_transform(t_imu_cam) + + return colmap_t_imu_cam + + +def rigid3d_from_transform(transform: SE3) -> pycolmap.Rigid3d: + """Converts projectaria_tools Rigid3d to pycolmap Rigid3d + + Note: to_quat() returns in wxyz format, but pycolmap.Rotation3d + expects xyzw format.""" + + # https://github.com/facebookresearch/projectaria_tools/blob/867105e65cadbe777db355a407d90533c71d2d06/core/python/sophus/SO3PyBind.h#L161 + + qvec = transform.rotation().to_quat()[0] + tvec = transform.translation()[0] + qvec = np.roll( + qvec, -1 + ) # change from w,x,y,z to x,y,z,w for pycolmap format + q = np.array(qvec) + t = np.array(tvec) + + return pycolmap.Rigid3d(pycolmap.Rotation3d(q), t) + + +def get_magnitude_from_transform( + transform: pycolmap.Rigid3d, +) -> tuple[float, float]: + """Returns rotation (in degrees) and + translation (in meters) magnitudes + from a Rigid3d transform + """ + translation = transform.translation + quat_xyzw = transform.rotation.quat + rotation = Rotation.from_quat(quat_xyzw) + dr = np.rad2deg(rotation.magnitude()) + dt = np.linalg.norm(translation) + + return dr, dt + + +def get_t_cam_a_cam_b_from_calibration_file( + calibration_file: Path, + camera_a_label: str, + camera_b_label: str, +) -> pycolmap.Rigid3d: + """Get T_cam_a_cam_b from calibration json file (found on website: https://lamaria.ethz.ch/slam_datasets) + for given camera labels.""" + + calib = json.load(calibration_file.open("r")) + + # body is right imu of Aria, therefore T_b_s is T_imu_camera + t_body_cam_a = calib[camera_a_label]["T_b_s"] + t_body_cam_b = calib[camera_b_label]["T_b_s"] + + t_imu_cam_a = pycolmap.Rigid3d( + pycolmap.Rotation3d(t_body_cam_a["qvec"]), + t_body_cam_a["tvec"], + ) + t_imu_cam_b = pycolmap.Rigid3d( + pycolmap.Rotation3d(t_body_cam_b["qvec"]), + t_body_cam_b["tvec"], + ) + + t_camera_a_camera_b = t_imu_cam_a.inverse() * t_imu_cam_b + + return t_camera_a_camera_b + + 
+def get_t_imu_camera_from_calibration_file( + calibration_file: Path, + camera_label: str, +) -> pycolmap.Rigid3d: + """Get T_imu_camera from calibration json file (found on website: https://lamaria.ethz.ch/slam_datasets) + for a given camera label.""" + + calib = json.load(calibration_file.open("r")) + + # body is right imu of Aria, therefore T_b_s is T_imu_camera + t_body_cam = calib[camera_label]["T_b_s"] + t_imu_camera = pycolmap.Rigid3d( + pycolmap.Rotation3d(t_body_cam["qvec"]), + t_body_cam["tvec"], + ) + + return t_imu_camera + + +# ----- IMU functions ----- # + + +def get_online_params_for_imu_from_mps( + online_calibs_file: Path, stream_label: str, num_error: float = 1e6 +): + online_calibs = mps.read_online_calibration(online_calibs_file.as_posix()) + online_imu_calibs = {} + num_error = int(num_error) + for calib in tqdm( + online_calibs, desc="Reading online IMU calibration data" + ): + for imuCalib in calib.imu_calibs: + if imuCalib.get_label() == stream_label: + # calib timestamp in microseconds + # convert to nanoseconds and then quantize to milliseconds + timestamp = int(calib.tracking_timestamp.total_seconds() * 1e9) + quantized_timestamp = timestamp // num_error + online_imu_calibs[quantized_timestamp] = imuCalib + + return online_imu_calibs + + +def get_imu_data_from_vrs( + vrs_provider: data_provider.VrsDataProvider, + mps_folder: Path | None = None, +) -> pycolmap.ImuMeasurements: + """Get rectified IMU data from VRS file. + If mps_folder is provided, use online calibration data + from MPS folder. Otherwise, use device calibration from VRS file.""" + imu_timestamps = sorted( + vrs_provider.get_timestamps_ns( + StreamId(RIGHT_IMU_STREAM_ID), TimeDomain.DEVICE_TIME + ) + ) + imu_stream_label = vrs_provider.get_label_from_stream_id( + StreamId(RIGHT_IMU_STREAM_ID) + ) + + if mps_folder is not None: + online_calibs_file = mps_folder / "slam" / "online_calibration.jsonl" + online_imu_calibs = get_online_params_for_imu_from_mps( + online_calibs_file, imu_stream_label + ) + acceptable_diff_ms = 1 # 1 milliseconds + calib_timestamps = sorted(online_imu_calibs.keys()) + else: + device_calib = vrs_provider.get_device_calibration() + calibration = device_calib.get_imu_calib(imu_stream_label) + + # The bottleneck of the code below is vrs_provider.get_imu_data_by_time_ns. + # This can be very fast when a getter for multiple timestamps is available. 
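+    # Each loop iteration below performs one CLOSEST-time lookup against the
+    # VRS provider, i.e. O(N) individual queries for N IMU samples.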
+    ms = []
+    for timestamp in tqdm(imu_timestamps, desc="Loading rect IMU data"):
+        if mps_folder is not None:
+            quantized_timestamp = timestamp // int(1e6)
+            closest_ts = find_closest_timestamp(
+                calib_timestamps, quantized_timestamp, acceptable_diff_ms
+            )
+
+            if closest_ts not in online_imu_calibs:
+                raise ValueError(
+                    f"No calibration found for timestamp {timestamp}"
+                )
+
+            calibration = online_imu_calibs[closest_ts]
+
+        imu_data = vrs_provider.get_imu_data_by_time_ns(
+            StreamId(RIGHT_IMU_STREAM_ID),
+            timestamp,
+            TimeDomain.DEVICE_TIME,
+            TimeQueryOptions.CLOSEST,
+        )
+        if imu_data.accel_valid and imu_data.gyro_valid:
+            rectified_acc = calibration.raw_to_rectified_accel(
+                imu_data.accel_msec2
+            )
+            rectified_gyro = calibration.raw_to_rectified_gyro(
+                imu_data.gyro_radsec
+            )
+            ts = float(timestamp) / 1e9  # convert to seconds
+            ms.append(
+                pycolmap.ImuMeasurement(ts, rectified_acc, rectified_gyro)
+            )
+
+    return pycolmap.ImuMeasurements(ms)
+
+
+def get_imu_data_from_vrs_file(
+    vrs_file: Path,
+    mps_folder: Path | None = None,
+) -> pycolmap.ImuMeasurements:
+    vrs_provider = data_provider.create_vrs_data_provider(vrs_file.as_posix())
+    return get_imu_data_from_vrs(vrs_provider, mps_folder=mps_folder)
+
+
+# ----- VRS utils ----- #
+
+
+# TODO: merge this with initialize_reconstruction_from_calibration_file
+def initialize_reconstruction_from_vrs_file(
+    vrs_file: Path,
+) -> InitReconstruction:
+    """
+    Return a pycolmap.Reconstruction with only cameras and rigs populated.
+    """
+    # Initialize VRS data provider
+    vrs_provider = data_provider.create_vrs_data_provider(vrs_file.as_posix())
+
+    # build reconstruction
+    recon = pycolmap.Reconstruction()
+    device_calibration = vrs_provider.get_device_calibration()
+    imu_stream_label = vrs_provider.get_label_from_stream_id(
+        StreamId(RIGHT_IMU_STREAM_ID)
+    )
+    imu_calib = device_calibration.get_imu_calib(imu_stream_label)
+
+    rig = pycolmap.Rig(rig_id=1)
+
+    # DUMMY CAMERA FOR IMU, IMU ID is 1
+    imu = pycolmap.Camera(camera_id=1, model="SIMPLE_PINHOLE", params=[0, 0, 0])
+    recon.add_camera(imu)
+    rig.add_ref_sensor(imu.sensor_id)
+
+    for cam_id, sid in [
+        (2, StreamId(LEFT_CAMERA_STREAM_ID)),
+        (3, StreamId(RIGHT_CAMERA_STREAM_ID)),
+    ]:
+        stream_label = vrs_provider.get_label_from_stream_id(sid)
+        camera_calib = device_calibration.get_camera_calib(stream_label)
+        cam = camera_colmap_from_calib(camera_calib)
+        cam.camera_id = cam_id
+
+        t_imu_camera = get_t_imu_camera(
+            imu_calib,
+            camera_calib,
+        )
+        sensor_from_rig = t_imu_camera.inverse()
+
+        recon.add_camera(cam)
+        rig.add_sensor(cam.sensor_id, sensor_from_rig)
+
+    recon.add_rig(rig)
+    return recon
+
+
+def extract_images_from_vrs(
+    vrs_file: Path,
+    image_folder: Path,
+    left_subfolder_name="left",
+    right_subfolder_name="right",
+    rgb_subfolder_name="rgb",
+    verbose: bool = False,
+    extract_rgb: bool = False,
+    extract_left: bool = True,
+    extract_right: bool = True,
+):
+    for camera, stream_id in [
+        (left_subfolder_name, LEFT_CAMERA_STREAM_ID),
+        (right_subfolder_name, RIGHT_CAMERA_STREAM_ID),
+        (rgb_subfolder_name, RGB_CAMERA_STREAM_ID),
+    ]:
+        if camera == rgb_subfolder_name and not extract_rgb:
+            continue
+
+        if camera == left_subfolder_name and not extract_left:
+            continue
+
+        if camera == right_subfolder_name and not extract_right:
+            continue
+
+        # clear any previous extraction before writing new images
+        output_dir = image_folder / camera
+        if output_dir.exists():
+            shutil.rmtree(output_dir)
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        logger.info(
+            "Extracting images for camera %s in VRS %s", camera, vrs_file
+        )
+        cmd = f"vrs extract-images 
{vrs_file} --to {output_dir} + {stream_id}" + stdout = None if verbose else subprocess.PIPE + out = subprocess.run( + cmd, shell=True, stderr=subprocess.STDOUT, stdout=stdout + ) + if out.returncode: + msg = f"Command '{cmd}' returned {out.returncode}." + if out.stdout: + msg += "\n" + out.stdout.decode("utf-8") + raise subprocess.SubprocessError(msg) + logger.info("Done!") + + +def _image_names_from_folder( + folder: Path, wrt_to: Path, ext: str = ".jpg" +) -> list[Path]: + if not folder.is_dir(): + return [] + images = sorted(n for n in folder.iterdir() if n.suffix == ext) + images = [n.relative_to(wrt_to) for n in images] + return images + + +def extract_images_with_timestamps_from_vrs( + vrs_file, images_path +) -> dict[int, tuple[Path, Path]]: + """ + Return timestamps -> image names + """ + + # Initialize VRS data provider + vrs_provider = data_provider.create_vrs_data_provider(vrs_file.as_posix()) + left_ts = sorted( + vrs_provider.get_timestamps_ns( + StreamId(LEFT_CAMERA_STREAM_ID), TimeDomain.DEVICE_TIME + ) + ) + + # Extract images from VRS file + extract_images_from_vrs( + vrs_file=vrs_file, + image_folder=images_path, + ) + + # Get images + left_img_dir = images_path / "left" + right_img_dir = images_path / "right" + left_images = _image_names_from_folder(left_img_dir, left_img_dir) + right_images = _image_names_from_folder(right_img_dir, right_img_dir) + images = list(zip(left_images, right_images)) + + # Create a map + assert len(left_ts) == len(images), ( + "timestamps should have the same length as images" + ) + return dict(zip(left_ts, images)) diff --git a/lamaria/utils/constants.py b/lamaria/utils/constants.py new file mode 100644 index 0000000..516c942 --- /dev/null +++ b/lamaria/utils/constants.py @@ -0,0 +1,19 @@ +# ASL folder name, stream label for Aria cameras +ARIA_CAMERAS: list = [ + ("cam0", "camera-slam-left"), + ("cam1", "camera-slam-right"), +] + +# Aria camera constants +LEFT_CAMERA_STREAM_ID = "1201-1" +RIGHT_CAMERA_STREAM_ID = "1201-2" +RGB_CAMERA_STREAM_ID = "214-1" +LEFT_CAMERA_STREAM_LABEL = "camera-slam-left" +RIGHT_CAMERA_STREAM_LABEL = "camera-slam-right" + +# Right Aria IMU constants +RIGHT_IMU_STREAM_ID = "1202-1" +RIGHT_IMU_STREAM_LABEL = "imu-right" + +# Custom origin coordinates (LV95 / LN02) for translating large CP coordinates +CUSTOM_ORIGIN_COORDINATES = (2683594.4120000005, 1247727.7470000014, 417.307) diff --git a/lamaria/utils/metrics.py b/lamaria/utils/metrics.py new file mode 100644 index 0000000..5714b2a --- /dev/null +++ b/lamaria/utils/metrics.py @@ -0,0 +1,104 @@ +import numpy as np +import pycolmap +from scipy.interpolate import interp1d + +from .. 
import logger +from ..structs.sparse_eval import SparseEvalResult + + +def piecewise_linear_scoring(): + """ + Error Score + 0 - 0.05 m 20 + 0.05 - 0.20 m 18 + 0.20 - 0.50 m 15 + 0.50 - 1.00 m 12 + 1.00 - 2.00 m 8 + 2.00 - 5.00 m 4 + 5.00 - 10.00 m 0 + > 10.00 m 0 + """ + x_vals = [0.00, 0.05, 0.20, 0.50, 1.00, 2.00, 5.00, 10.00] + y_vals = [20, 20, 18, 15, 12, 8, 4, 0] + + scoring = interp1d( + x_vals, + y_vals, + kind="linear", + fill_value=(y_vals[0], 0), # left=20, right=0 + bounds_error=False, + ) + return scoring + + +def calculate_control_point_score(result: SparseEvalResult) -> float: + """Calculate CP score from SparseEvalResult object.""" + error_2d = calculate_error(result) + if any(e < 0 for e in error_2d): + logger.error("Negative errors found, norm error cannot be negative.") + return 0.0 + + errors = np.nan_to_num(error_2d, nan=1e6) # Large error for NaNs + scoring = piecewise_linear_scoring() + scores = scoring(errors) + score_sum = np.sum(scores) + max_score = 20 * len(errors) + + S_j = (score_sum / max_score) * 100 + return S_j + + +def calculate_control_point_recall( + result: SparseEvalResult, + threshold: float = 1.0, # meters +) -> float: + """Calculate CP recall from SparseEvalResult object.""" + error_2d = calculate_error(result) + if len(error_2d) == 0: + return 0.0 + + if any(e < 0 for e in error_2d): + logger.error("Negative errors found, norm error cannot be negative.") + return 0.0 + + errors = np.nan_to_num(error_2d, nan=1e6) # Large error for NaNs + inliers = np.sum(errors <= threshold) + recall = inliers / len(errors) * 100 + return recall + + +def calculate_error(result: SparseEvalResult) -> np.ndarray: + """Calculate 2D errors from SparseEvalResult""" + sim3d = result.alignment # sim3d cannot be None here + if not isinstance(sim3d, pycolmap.Sim3d): + logger.error("No valid Sim3d found in SparseEvalResult") + return np.array([]) + + error_2d = [] + for _, cp in result.cp_summary.items(): + if not cp.is_triangulated(): + error_2d.append(np.nan) + continue + + transformed_point = sim3d * cp.triangulated + e = np.linalg.norm(transformed_point[:2] - cp.topo[:2]) + error_2d.append(e) + + assert len(error_2d) == len(result.cp_summary), "Error length mismatch" + + return np.array(error_2d) + + +def calculate_pose_recall( + error: np.ndarray, + num_poses_gt: int, + threshold: float = 1.0, # meters +): + """Calculate the percentage of poses within a certain error threshold.""" + if len(error) == 0 or num_poses_gt == 0: + return 0.0 + + num_poses_within_threshold = np.sum(error <= threshold) + + pose_recall = (num_poses_within_threshold / num_poses_gt) * 100 + return pose_recall diff --git a/lamaria/utils/timestamps.py b/lamaria/utils/timestamps.py new file mode 100644 index 0000000..2dbeb58 --- /dev/null +++ b/lamaria/utils/timestamps.py @@ -0,0 +1,102 @@ +from bisect import bisect_left + +import numpy as np + +from .constants import ( + LEFT_CAMERA_STREAM_LABEL, + RIGHT_CAMERA_STREAM_LABEL, +) + + +def matching_time_indices( + stamps_1: np.ndarray, + stamps_2: np.ndarray, + max_diff: float = 1e6, # 1 ms in ns +) -> tuple[list, list]: + """ + From evo package. + Searches for the best matching timestamps of two lists of timestamps + and returns the list indices of the best matches. 
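
    Example (illustrative sketch, not part of the original evo docstring;
    with max_diff=2 only stamps 0 and 20 find a close enough partner):
        >>> import numpy as np
        >>> matching_time_indices(np.array([0, 10, 20]), np.array([1, 19]), 2)
        ([0, 2], [0, 1])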
+ """ + matching_indices_1 = [] + matching_indices_2 = [] + for index_1, stamp_1 in enumerate(stamps_1): + diffs = np.abs(stamps_2 - stamp_1) + index_2 = int(np.argmin(diffs)) + if diffs[index_2] <= max_diff: + matching_indices_1.append(index_1) + matching_indices_2.append(index_2) + return matching_indices_1, matching_indices_2 + + +def get_timestamp_to_images(data: dict): + """Loads timestamps to images mapping from JSON data. + Args: + data (dict): The JSON data containing timestamps information. + Returns: + dict: A dictionary mapping stream labels to their respective + timestamp-to-image mappings and sorted timestamp keys. + """ + processed_ts_data = {} + for label in [LEFT_CAMERA_STREAM_LABEL, RIGHT_CAMERA_STREAM_LABEL]: + ts_data = data["timestamps"][label] + processed_ts_data[label] = {int(k): v for k, v in ts_data.items()} + processed_ts_data[label]["sorted_keys"] = sorted( + processed_ts_data[label].keys() + ) + + return processed_ts_data + + +def find_closest_timestamp( + timestamps: list, + target_ts: int, + max_diff: float, +) -> int | None: + """Timestamps must be in nano seconds""" + index = bisect_left(timestamps, target_ts) + if index == 0: + return timestamps[0] + if index == len(timestamps): + return timestamps[-1] + before = timestamps[index - 1] + after = timestamps[index] + if abs(target_ts - before) < abs(target_ts - after): + closest = before + else: + closest = after + + if abs(target_ts - closest) > max_diff: + return None + + return closest + + +def get_matched_timestamps( + left_timestamps: list[int], + right_timestamps: list[int], + max_diff: float, +) -> list[tuple[int, int]]: + matched_timestamps = [] + + assert all(isinstance(ts, int) for ts in left_timestamps), ( + "Left timestamps must be integers" + ) + assert all(isinstance(ts, int) for ts in right_timestamps), ( + "Right timestamps must be integers" + ) + + if len(left_timestamps) < len(right_timestamps): + for lts in left_timestamps: + closest_rts = find_closest_timestamp( + right_timestamps, lts, max_diff + ) + if closest_rts is not None: + matched_timestamps.append((lts, closest_rts)) + else: + for rts in right_timestamps: + closest_lts = find_closest_timestamp(left_timestamps, rts, max_diff) + if closest_lts is not None: + matched_timestamps.append((closest_lts, rts)) + + return matched_timestamps diff --git a/lamaria/utils/types.py b/lamaria/utils/types.py new file mode 100644 index 0000000..b1901ff --- /dev/null +++ b/lamaria/utils/types.py @@ -0,0 +1,5 @@ +from typing import TypeAlias + +import pycolmap + +InitReconstruction: TypeAlias = pycolmap.Reconstruction diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..adf6dbb --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,24 @@ +[project] +name = "lamaria" +version = "0.1.0" +requires-python = ">=3.9,<3.15" +description = "LaMAria: Benchmarking Egocentric Visual-Inertial SLAM at City Scale" +readme = { file = "README.md", content-type = "text/markdown" } + +[project.urls] +Homepage = "https://lamaria.ethz.ch" +Repository = "https://github.com/cvg/lamaria" +Issues = "https://github.com/cvg/lamaria/issues" + +[tool.setuptools] +packages = ["lamaria"] + +[tool.setuptools.package-data] +lamaria = [ + "assets/*", + "**/*.yaml", +] + +[build-system] +requires = ["setuptools>=61.0", "wheel"] +build-backend = "setuptools.build_meta" \ No newline at end of file diff --git a/quickstart.sh b/quickstart.sh new file mode 100755 index 0000000..5321af8 --- /dev/null +++ b/quickstart.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash +set -euo 
pipefail + +URL="https://cvg-data.inf.ethz.ch/lamaria/demo.zip" + +if [ -t 1 ]; then + PROGRESS="--progress=bar:force:noscroll" +else + PROGRESS="--progress=dot:giga" +fi + +echo "[quickstart] downloading demo.zip (~6 GB)…" +wget -c $PROGRESS -O demo.zip "$URL" + +echo "[quickstart] extracting…" +unzip -q -o demo.zip +rm demo.zip + +echo "[quickstart] done. demo is in ./demo" \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index a32dc68..c100451 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,18 @@ -projectaria_tools +# commit hash: 33c134e633ddc8193f4b547d905bf1df5799df63 +--find-links https://cvg-data.inf.ethz.ch/lamaria/packages/pyceres/ + +# https://github.com/colmap/colmap/pull/2625 +# commit hash: 2633d1c4c3d4ae0f17a462faa3e334d9345a9c2b +--find-links https://cvg-data.inf.ethz.ch/lamaria/packages/pycolmap/ + +projectaria-tools[all] omegaconf +evo==1.31.1 +bagpy==0.5 +cv-bridge scipy +ruff==0.12.7 +pyceres==2.6 ; python_version >= "3.9" and python_version < "3.15" +pycolmap==3.13.0.dev0 ; python_version >= "3.9" and python_version < "3.15" -e git+https://github.com/cvg/Hierarchical-Localization.git@1252817#egg=hloc diff --git a/tools/asl_folder_to_rosbag.py b/tools/asl_folder_to_rosbag.py new file mode 100644 index 0000000..5abd907 --- /dev/null +++ b/tools/asl_folder_to_rosbag.py @@ -0,0 +1,179 @@ +from __future__ import annotations + +import argparse +import csv +from dataclasses import dataclass +from pathlib import Path + +import cv2 +import rosbag +import rospy +from cv_bridge import CvBridge +from sensor_msgs.msg import Imu +from tqdm import tqdm + + +@dataclass(frozen=True) +class CamSample: + ts_ns: int + path: Path + topic: str + frame_id: str + + +@dataclass(frozen=True) +class ImuSample: + ts_ns: int + gyro: tuple[float, float, float] + accel: tuple[float, float, float] + topic: str + frame_id: str + + +def _is_int(s: str) -> bool: + try: + int(s) + return True + except Exception: + return False + + +def _read_cam_csv( + csv_path: Path, images_dir: Path, topic: str, frame_id: str +) -> list[CamSample]: + samples: list[CamSample] = [] + with csv_path.open("r", newline="") as f: + reader = csv.reader(f) + for row in reader: + if not row: + continue + if not _is_int(row[0]): + continue + ts_ns = int(row[0]) + img_rel = row[1] + samples.append( + CamSample(ts_ns, images_dir / img_rel, topic, frame_id) + ) + + return samples + + +def _read_imu_csv(csv_path: Path, topic: str, frame_id: str) -> list[ImuSample]: + samples: list[ImuSample] = [] + with csv_path.open("r", newline="") as f: + reader = csv.reader(f) + for row in reader: + if not row: + continue + if not _is_int(row[0]): + continue + ts_ns = int(row[0]) + wx, wy, wz = (float(x) for x in row[1:4]) + ax, ay, az = (float(x) for x in row[4:7]) + samples.append( + ImuSample(ts_ns, (wx, wy, wz), (ax, ay, az), topic, frame_id) + ) + + return samples + + +def asl_to_rosbag( + input_asl_folder: Path, + output_rosbag: Path, + cam0_topic: str = "/cam0/image_raw", + cam1_topic: str = "/cam1/image_raw", + imu_topic: str = "/imu0", + cam0_frame: str = "cam0", + cam1_frame: str = "cam1", + imu_frame: str = "imu0", +) -> None: + aria = input_asl_folder / "aria" + cam0_dir, cam1_dir, imu_dir = aria / "cam0", aria / "cam1", aria / "imu0" + + if not cam0_dir.exists(): + raise FileNotFoundError(f"Missing camera folder: {cam0_dir}") + if not imu_dir.exists(): + raise FileNotFoundError(f"Missing IMU folder: {imu_dir}") + + cam0 = _read_cam_csv( + cam0_dir / "data.csv", cam0_dir / "data", 
cam0_topic, cam0_frame + ) + cam1 = [] + if cam1_dir.exists() and (cam1_dir / "data.csv").exists(): + cam1 = _read_cam_csv( + cam1_dir / "data.csv", cam1_dir / "data", cam1_topic, cam1_frame + ) + imu = _read_imu_csv(imu_dir / "data.csv", imu_topic, imu_frame) + + if not cam0: + raise RuntimeError("cam0 data is empty.") + if not imu: + raise RuntimeError("IMU data is empty.") + + bag_timeline = [] + bag_timeline += [(s.ts_ns, "cam", s) for s in cam0] + bag_timeline += [(s.ts_ns, "cam", s) for s in cam1] + bag_timeline += [(s.ts_ns, "imu", s) for s in imu] + bag_timeline.sort(key=lambda x: x[0]) + + output_rosbag.parent.mkdir(parents=True, exist_ok=True) + bridge = CvBridge() + bag = rosbag.Bag(str(output_rosbag), "w") + + print( + f"[asl_to_rosbag] Writing {len(bag_timeline):,} " + f"messages -> {output_rosbag}" + ) + + for ts_ns, kind, sample in tqdm( + bag_timeline, desc="Writing bag", unit="msg" + ): + stamp = rospy.Time.from_sec(ts_ns * 1e-9) + + if kind == "cam": + cv_image = cv2.imread(str(sample.path), cv2.IMREAD_GRAYSCALE) + if cv_image is None: + print(f"Warning: could not load image at path {sample.path}") + continue + img_msg = bridge.cv2_to_imgmsg(cv_image, encoding="mono8") + img_msg.header.stamp = stamp + img_msg.header.frame_id = sample.frame_id + bag.write(sample.topic, img_msg, t=stamp) + + else: + imu_msg = Imu() + imu_msg.header.stamp = stamp + imu_msg.header.frame_id = sample.frame_id + wx, wy, wz = sample.gyro + ax, ay, az = sample.accel + ( + imu_msg.angular_velocity.x, + imu_msg.angular_velocity.y, + imu_msg.angular_velocity.z, + ) = wx, wy, wz + ( + imu_msg.linear_acceleration.x, + imu_msg.linear_acceleration.y, + imu_msg.linear_acceleration.z, + ) = ax, ay, az + + imu_msg.orientation_covariance[0] = -1 + imu_msg.angular_velocity_covariance[0] = -1 + imu_msg.linear_acceleration_covariance[0] = -1 + + bag.write(sample.topic, imu_msg, t=stamp) + + bag.close() + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument( + "--input_asl_folder", type=Path, required=True, help="Input ASL folder" + ) + parser.add_argument( + "--output_rosbag", type=Path, required=True, help="Output rosbag file" + ) + + args = parser.parse_args() + asl_to_rosbag(args.input_asl_folder, args.output_rosbag) diff --git a/tools/download_lamaria.py b/tools/download_lamaria.py new file mode 100644 index 0000000..ffeb895 --- /dev/null +++ b/tools/download_lamaria.py @@ -0,0 +1,328 @@ +from __future__ import annotations + +import argparse +import re +import sys +from collections.abc import Iterable +from pathlib import Path + +import requests +from bs4 import BeautifulSoup +from tqdm import tqdm + +BASE_URL_DEFAULT = "https://cvg-data.inf.ethz.ch/lamaria/" + +FOLDERS = [ + "raw_data", + "aria_calibrations", + "asl_folder", + "pinhole_calibrations", + "rosbag", + "ground_truth", +] +DEFAULT_SUBFOLDERS = ["training/", "test/"] +GT_SUBFOLDERS = ["pseudo_dense/", "sparse/"] + +PAYLOADS = { + "raw": ["raw_data", "aria_calibrations"], + "asl": ["asl_folder", "pinhole_calibrations"], + "rosbag": ["rosbag", "pinhole_calibrations"], + "pinhole": ["asl_folder", "pinhole_calibrations", "rosbag"], + "all": [ + "raw_data", + "aria_calibrations", + "asl_folder", + "pinhole_calibrations", + "rosbag", + ], +} + + +def fetch_index(full_url: str) -> list[str]: + """Return list of hrefs on an Apache index page (files and folders).""" + r = requests.get(full_url, timeout=30) + r.raise_for_status() + soup = BeautifulSoup(r.text, "html.parser") + hrefs = [] + for a in soup.find_all("a", 
href=True): + href = a["href"] + # skip parent directory, sort/query links, and anchors + if href == "../": + continue + if href.startswith("?") or "?" in href or href.startswith("#"): + continue + hrefs.append(href) + return hrefs + + +def build_catalog(base_url: str) -> dict[str, list[tuple[str, str]]]: + catalog = {} + for folder in FOLDERS: + entries = [] + subfolders = ( + GT_SUBFOLDERS if folder == "ground_truth" else DEFAULT_SUBFOLDERS + ) + for sub in subfolders: + url = base_url.rstrip("/") + "/" + folder + "/" + sub + try: + items = fetch_index(url) + except Exception as e: + print(f"[warn] Could not read {url}: {e}", file=sys.stderr) + items = [] + for it in items: + if it.endswith("/"): + continue + entries.append((sub, it)) + catalog[folder] = entries + return catalog + + +def names_from_listing(files: Iterable[str]) -> list[str]: + names = set() + pattern = r"\.(zip|tar|tar\.gz|tgz|7z|bag|vrs|json|txt|csv|tar\.xz)$" + for f in files: + f = f.rstrip("/") + name = re.split(pattern, f, maxsplit=1, flags=re.IGNORECASE)[0] + names.add(name) + return sorted(names) + + +def derive_splits( + catalog: dict[str, list[tuple[str, str]]], +) -> tuple[list[str], list[str]]: + raw_entries = catalog.get("raw_data", []) + train_files = [fname for (sub, fname) in raw_entries if sub == "training/"] + test_files = [fname for (sub, fname) in raw_entries if sub == "test/"] + train = sorted(names_from_listing(train_files)) + test = sorted(names_from_listing(test_files)) + return train, test + + +def raw_split_map(catalog: dict[str, list[tuple[str, str]]]) -> dict[str, str]: + mapping: dict[str, str] = {} + for sub, fname in catalog.get("raw_data", []): + if sub not in ("training/", "test/"): + continue + if fname.startswith("?") or "?" in fname: + continue + base = names_from_listing([fname])[0] + if base in mapping and mapping[base] == "training/": + continue + mapping[base] = sub + return mapping + + +def pick_files_for_sequence( + catalog: dict[str, list[tuple[str, str]]], + sequence: str, + folders: list[str], + split: str | None, +) -> list[tuple[str, str, str]]: + matches: list[tuple[str, str, str]] = [] + for folder in folders: + for sub, fname in catalog.get(folder, []): + if folder != "ground_truth" and split is not None and sub != split: + continue + if fname.startswith(sequence): + matches.append((folder, sub, fname)) + return matches + + +def ensure_dir(p: Path) -> None: + p.mkdir(parents=True, exist_ok=True) + + +def human_size(n: int) -> str: + for unit in ["B", "KB", "MB", "GB", "TB"]: + if n < 1024: + return f"{n:.1f}{unit}" + n /= 1024 + return f"{n:.1f}PB" + + +def download_file( + url: str, dest: Path, session: requests.Session, chunk_size: int = 1 << 20 +) -> None: + headers = {} + existing = dest.stat().st_size if dest.exists() else 0 + + head = session.head(url, allow_redirects=True, timeout=30) + head.raise_for_status() + total = ( + int(head.headers.get("Content-Length", "0")) + if head.headers.get("Content-Length") + else None + ) + + if total is not None and existing > 0 and existing < total: + headers["Range"] = f"bytes={existing}-" + elif total is not None and existing == total: + print(f"[skip] {dest} (already complete, {human_size(total)})") + return + + with session.get(url, headers=headers, stream=True, timeout=60) as r: + r.raise_for_status() + mode = "ab" if "Range" in headers else "wb" + total_to_show = None + if total is not None: + total_to_show = total - existing if "Range" in headers else total + + with ( + open(dest, mode) as f, + tqdm( + total=total_to_show, 
+ unit="B", + unit_scale=True, + unit_divisor=1024, + initial=0, + desc=dest.name, + leave=False, + ) as pbar, + ): + for chunk in r.iter_content(chunk_size=chunk_size): + if chunk: + f.write(chunk) + pbar.update(len(chunk)) + + if total is not None and dest.stat().st_size != total: + print( + f"[warn] Size mismatch for {dest} " + f"(expected {total}, got {dest.stat().st_size})" + ) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="LaMAria dataset downloader") + + group = parser.add_mutually_exclusive_group(required=False) + parser.add_argument( + "--output_dir", + required=True, + type=str, + help="Root output folder", + ) + group.add_argument( + "--set", + choices=["training", "test"], + help="Download all sequences from this split", + ) + group.add_argument( + "--sequences", + nargs="+", + help="Explicit sequence names (e.g. R_01_easy sequence_3_17). " + "If provided, --set is not needed.", + ) + parser.add_argument( + "--type", + choices=["raw", "asl", "rosbag", "pinhole", "all"], + required=True, + help="Type of data to fetch: " + "raw = raw_data + aria_calibrations; " + "asl = asl_folder + pinhole_calibrations; " + "rosbag = rosbag + pinhole_calibrations; " + "pinhole = asl_folder + pinhole_calibrations + rosbag;" + "all = everything", + ) + + args = parser.parse_args() + root = Path(args.output_dir) + ensure_dir(root) + + print("[info] Building catalog from server…") + catalog = build_catalog(BASE_URL_DEFAULT) + train_names, test_names = derive_splits(catalog) + split_lookup = raw_split_map(catalog) # seq -> 'training/' or 'test/' + + if args.sequences: + target_sequences = args.sequences + global_split = None + elif args.set == "training": + target_sequences = train_names + global_split = "training/" + elif args.set == "test": + target_sequences = test_names + global_split = "test/" + else: + # choose a sensible default if nothing specified + print("[info] No --set or --sequences provided; nothing to do.") + sys.exit(0) + + # To fetch + folders = PAYLOADS[args.type].copy() + plan = [] + + created_split_dirs = set() + + for seq in target_sequences: + if global_split is not None: + seq_split = global_split + else: + seq_split = split_lookup.get(seq) + if seq_split is None: + raise ValueError( + f"Could not determine split for sequence: {seq}" + ) + + if seq_split in ("training/", "test/"): + split_folder = "training" if seq_split == "training/" else "test" + else: + split_folder = "unknown" + + seq_root = root / split_folder / seq + if split_folder not in created_split_dirs: + ensure_dir(root / split_folder) + created_split_dirs.add(split_folder) + ensure_dir(seq_root) + + selected = pick_files_for_sequence( + catalog, + seq, + folders, + split=(seq_split if seq_split in ("training/", "test/") else None), + ) + + if seq_split == "training/": + selected += pick_files_for_sequence( + catalog, seq, ["ground_truth"], split=None + ) + + if not selected: + print(f"[warn] No files found for: {seq}") + continue + + for folder, sub, fname in selected: + url = BASE_URL_DEFAULT + folder + "/" + sub + fname + if folder == "ground_truth": + if sub.rstrip("/") == "pseudo_dense": + sub_dest = "pGT" + elif sub.rstrip("/") == "sparse": + sub_dest = "control_points" + else: + raise ValueError("Unsupported folder.") + dest_dir = seq_root / folder / sub_dest + else: + dest_dir = seq_root / folder + + ensure_dir(dest_dir) + dest = dest_dir / fname + plan.append((folder, sub, fname, url, dest)) + + if not plan: + print("[warn] Nothing to download with the given filters.") + 
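        # An empty plan usually means the requested sequences or --type
        # payload matched nothing in the server catalog. Hypothetical
        # example of a filter that yields an empty plan:
        #   python -m tools.download_lamaria --output_dir ./lamaria_data \
        #       --sequences NOT_A_SEQUENCE --type raw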
        sys.exit(0)

    print("\n[download] Starting downloads.")
    with requests.Session() as sess:
        for _, _, _, url, dest in plan:
            print(f"[get] {url}")
            try:
                download_file(url, dest, sess)
            except Exception as e:
                print(
                    f"[error] Failed: {url} -> {dest}\n  {e}",
                    file=sys.stderr,
                )

    print(
        "\n[done] All downloads attempted. Files stored under:", root.resolve()
    )
diff --git a/tools/undistort_asl_folder.py b/tools/undistort_asl_folder.py
new file mode 100644
index 0000000..047f781
--- /dev/null
+++ b/tools/undistort_asl_folder.py
@@ -0,0 +1,180 @@
import argparse
import copy
import json
import shutil
import subprocess
import tempfile
from pathlib import Path

import pycolmap
from tqdm import tqdm

from lamaria.utils.aria import (
    initialize_reconstruction_with_cameras,
)
from lamaria.utils.constants import (
    ARIA_CAMERAS,
)


def undistort_reconstruction(
    image_path: Path,
    rec_path: Path,
    output_path: Path,
    ratio_blank_pixels: float = 0.2,
    verbose: bool = False,
) -> dict[int, pycolmap.Camera]:
    with tempfile.TemporaryDirectory() as tmp_path:
        tmp_path = Path(tmp_path)

        # Run COLMAP undistortion
        print(
            f"Undistorting the images to {tmp_path}, this may take a while..."
        )
        cmd = f"colmap image_undistorter --image_path {image_path} \
            --input_path {rec_path} --output_path {tmp_path} \
            --blank_pixels {ratio_blank_pixels}"
        stdout = None if verbose else subprocess.PIPE
        # No check=True here: the error path below raises a more informative
        # message that includes the captured output (with check=True it
        # would be dead code).
        out = subprocess.run(
            cmd, shell=True, stderr=subprocess.STDOUT, stdout=stdout
        )
        if out.returncode:
            msg = f"Command '{cmd}' returned {out.returncode}."
            if out.stdout:
                msg += "\n" + out.stdout.decode("utf-8")
            raise subprocess.SubprocessError(msg)
        print("Done undistorting!")

        # Copy the cameras
        rec = pycolmap.Reconstruction(tmp_path / "sparse")
        cameras = {i: copy.copy(c) for i, c in rec.cameras.items()}

        if output_path.exists():
            # any() is needed here: iterdir() returns a generator, which is
            # always truthy, even for an empty directory.
            if not output_path.is_dir() or any(output_path.iterdir()):
                raise ValueError(
                    f"Output path {output_path} is not an empty directory."
                )
            output_path.rmdir()
        shutil.move(tmp_path / "images", output_path)

    return cameras


def write_cameras_json(cameras: dict[int | str, pycolmap.Camera], path: Path):
    camera_dicts = {}
    for key, c in cameras.items():
        d = c.todict()
        d["model"] = c.model.name
        d["params"] = c.params.tolist()
        camera_dicts[key] = d
    path.write_text(json.dumps(camera_dicts))


def undistort_asl(
    calibration_file: Path, asl_path: Path, output_asl_path: Path, **kwargs
):
    if output_asl_path.exists():
        raise ValueError(f"{output_asl_path=} already exists.")
    if not asl_path.exists():
        raise FileNotFoundError(f"{asl_path=}")
    if not calibration_file.exists():
        raise FileNotFoundError(f"{calibration_file=}")

    recon = initialize_reconstruction_with_cameras(
        calibration_file=calibration_file,
    )
    # Create a dummy COLMAP reconstruction. 
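    # The reconstruction only has to be structurally valid for COLMAP's
    # image_undistorter: every frame below gets an identity rig_from_world
    # pose and pairs one left and one right image, so only the camera
    # models actually influence the undistortion result.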
+ image_ext = ".png" + image_id = 1 + rig = recon.rig(rig_id=1) + colmap_images = {} + for i, (key, _) in enumerate(ARIA_CAMERAS): + cam = recon.cameras[i + 1] + image_names = sorted( + p.relative_to(asl_path) + for p in (asl_path / "aria" / key).glob(f"**/*{image_ext}") + ) + colmap_images[key] = [] + for n in image_names: + im = pycolmap.Image( + n, + pycolmap.Point2DList(), + cam.camera_id, + image_id, + ) + image_id += 1 + colmap_images[key].append(im) + + zipped_images = list(zip(*[colmap_images[key] for key, _ in ARIA_CAMERAS])) + + for j, (left_im, right_im) in enumerate( + tqdm( + zipped_images, + total=len(zipped_images), + desc="Adding images to reconstruction", + ) + ): + frame = pycolmap.Frame() + frame.rig_id = rig.rig_id + frame.frame_id = j + 1 + frame.rig_from_world = pycolmap.Rigid3d() + left_im.frame_id = frame.frame_id + right_im.frame_id = frame.frame_id + frame.add_data_id(left_im.data_id) + frame.add_data_id(right_im.data_id) + recon.add_frame(frame) + recon.add_image(left_im) + recon.add_image(right_im) + + with tempfile.TemporaryDirectory() as temp_rec_path: + recon.write(temp_rec_path) + cameras_undist = undistort_reconstruction( + asl_path, temp_rec_path, output_asl_path, **kwargs + ) + + # Copy the undistorted cameras in json. + cameras_undist = { + key: cameras_undist[i + 1] for i, (key, _) in enumerate(ARIA_CAMERAS) + } + write_cameras_json( + cameras_undist, output_asl_path / "aria" / "cameras.json" + ) + + # Copy the other files + for p in asl_path.glob("**/*"): + if p.is_dir() or p.suffix == image_ext: + continue + p_out = output_asl_path / p.relative_to(asl_path) + p_out.parent.mkdir(exist_ok=True, parents=True) + shutil.copyfile(p, p_out) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument( + "--calibration_file", + type=Path, + required=True, + help="Path to Aria calibration json file", + ) + parser.add_argument( + "--asl_folder", + type=Path, + required=True, + help="Path to input Aria distorted ASL folder", + ) + parser.add_argument( + "--output_asl_folder", + type=Path, + required=True, + help="Path to output undistorted ASL folder", + ) + parser.add_argument("--ratio_blank_pixels", type=float, default=0.2) + args = parser.parse_args() + + undistort_asl( + args.calibration_file, + args.asl_folder, + args.output_asl_folder, + ratio_blank_pixels=args.ratio_blank_pixels, + ) diff --git a/tools/vrs_to_asl_folder.py b/tools/vrs_to_asl_folder.py new file mode 100644 index 0000000..6a104f3 --- /dev/null +++ b/tools/vrs_to_asl_folder.py @@ -0,0 +1,289 @@ +import argparse +import csv +import os +from pathlib import Path + +from projectaria_tools.core import data_provider +from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions +from projectaria_tools.core.stream_id import StreamId +from tqdm import tqdm + +from lamaria import logger +from lamaria.utils.aria import extract_images_from_vrs +from lamaria.utils.constants import ( + LEFT_CAMERA_STREAM_ID, + RIGHT_CAMERA_STREAM_ID, + RIGHT_IMU_STREAM_ID, +) +from lamaria.utils.timestamps import ( + get_matched_timestamps, +) + + +def remove_images_when_slam_drops( + image_folder: Path, + left_timestamps: list[int], + right_timestamps: list[int], + matched_timestamps: list[tuple[int, int]], + left_subfolder_name="cam0/data", + right_subfolder_name="cam1/data", +): + left_image_folder = os.path.join(image_folder, left_subfolder_name) + right_image_folder = os.path.join(image_folder, right_subfolder_name) + + assert os.path.exists(left_image_folder) and 
os.path.exists(
        right_image_folder
    )

    original_left_images = sorted(os.listdir(left_image_folder))
    original_right_images = sorted(os.listdir(right_image_folder))

    assert len(original_left_images) == len(left_timestamps)
    assert len(original_right_images) == len(right_timestamps)

    left_camera_mapping = {
        ts: img for ts, img in zip(left_timestamps, original_left_images)
    }
    right_camera_mapping = {
        ts: img for ts, img in zip(right_timestamps, original_right_images)
    }

    matched_left_ts = [left_ts for left_ts, _ in matched_timestamps]
    # Sets make the membership tests below O(1) instead of O(n).
    matched_left_set = set(matched_left_ts)
    matched_right_set = {right_ts for _, right_ts in matched_timestamps}

    if len(left_timestamps) != len(matched_timestamps):
        for ts, img in left_camera_mapping.items():
            if ts not in matched_left_set:
                print(f"Removing {img}")
                os.remove(os.path.join(left_image_folder, img))
    elif len(right_timestamps) != len(matched_timestamps):
        for ts, img in right_camera_mapping.items():
            if ts not in matched_right_set:
                print(f"Removing {img}")
                os.remove(os.path.join(right_image_folder, img))
    else:
        raise ValueError(
            "No images to remove: both streams already match the "
            "matched timestamps."
        )

    return matched_left_ts


def rename_images_in_folder(
    aria_folder: Path,
    image_timestamps,
    left_subfolder_name="cam0/data",
    right_subfolder_name="cam1/data",
    image_extension=".jpg",
) -> list[int]:
    for subfolder in [left_subfolder_name, right_subfolder_name]:
        subfolder_path = aria_folder / subfolder

        # is_dir() is False both when the path is missing and when it is
        # not a directory, so one check covers both failure modes.
        if not subfolder_path.is_dir():
            raise ValueError(
                f"{subfolder_path} does not exist or is not a directory"
            )

        original_images = sorted(
            [
                f
                for f in os.listdir(subfolder_path)
                if f.endswith(image_extension)
            ]
        )
        if len(original_images) == 0:
            raise ValueError(f"No images found in {subfolder_path}")

        if len(original_images) != len(image_timestamps):
            raise ValueError(
                f"Number of images {len(original_images)} \
                does not match number of timestamps \
                {len(image_timestamps)} in {subfolder_path}"
            )

        for ts, img in zip(image_timestamps, original_images):
            old_image_path = subfolder_path / img
            new_image_path = subfolder_path / f"{ts}{image_extension}"
            os.rename(old_image_path, new_image_path)

    return image_timestamps


def write_image_timestamps_to_txt(image_timestamps: list, txt_file: Path):
    with open(txt_file, "w") as f:
        for timestamp in image_timestamps:
            f.write(f"{timestamp}\n")


def write_image_csv(image_timestamps, cam_folder):
    images = os.listdir(os.path.join(cam_folder, "data"))
    assert len(images) > 0

    images = sorted(images, key=lambda img: int(img.split(".")[0]))

    assert len(images) == len(image_timestamps)
    for ts, img in zip(image_timestamps, images):
        assert int(img.split(".")[0]) == ts, f"{img} != {ts}"

    data_csv = os.path.join(cam_folder, "data.csv")
    if os.path.exists(data_csv):
        os.remove(data_csv)

    with open(data_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for timestamp, image in zip(image_timestamps, images):
            row = [timestamp, image]
            writer.writerow(row)


def write_imu_data_to_csv(vrs_provider, csv_file):
    imu_timestamps = vrs_provider.get_timestamps_ns(
        StreamId(RIGHT_IMU_STREAM_ID), TimeDomain.DEVICE_TIME
    )

    # Resume support: read the last timestamp already written, if any.
    last_timestamp = None
    if os.path.exists(csv_file):
        with open(csv_file) as f:
            last_row = None
            for last_row in csv.reader(f):
                pass
            if last_row is not None:
                last_timestamp = int(last_row[0])

    if last_timestamp is not None:
        imu_timestamps = [ts for ts in imu_timestamps if ts > last_timestamp]

    if not imu_timestamps:
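        # Resume case: every IMU sample is already present in the CSV.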
logger.info(f"No new IMU data to write to {csv_file}") + return + + with open(csv_file, "a", newline="") as f: + writer = csv.writer(f) + for timestamp in tqdm(imu_timestamps, desc="Appending IMU data to CSV"): + imu_data = vrs_provider.get_imu_data_by_time_ns( + StreamId(RIGHT_IMU_STREAM_ID), + timestamp, + TimeDomain.DEVICE_TIME, + TimeQueryOptions.CLOSEST, + ) + if imu_data.accel_valid and imu_data.gyro_valid: + accel = imu_data.accel_msec2 + gyro = imu_data.gyro_radsec + + row = [ + timestamp, + gyro[0], + gyro[1], + gyro[2], + accel[0], + accel[1], + accel[2], + ] + writer.writerow(row) + + +def form_aria_asl_folder( + vrs_file: Path, output_asl_folder: Path, has_slam_drops=False +): + if output_asl_folder.exists(): + raise ValueError(f"{output_asl_folder=} already exists.") + + aria_folder = output_asl_folder / "aria" + aria_folder.mkdir(parents=True, exist_ok=True) + + dataset_name = vrs_file.stem + vrs_provider = data_provider.create_vrs_data_provider(vrs_file.as_posix()) + + # Get all image timestamps (in ns) + image_timestamps = vrs_provider.get_timestamps_ns( + StreamId(LEFT_CAMERA_STREAM_ID), TimeDomain.DEVICE_TIME + ) + assert len(image_timestamps) > 0, "No timestamps found" + + right_image_timestamps = None + matched_timestamps = None + if has_slam_drops: + right_image_timestamps = vrs_provider.get_timestamps_ns( + StreamId(RIGHT_CAMERA_STREAM_ID), TimeDomain.DEVICE_TIME + ) + assert len(right_image_timestamps) > 0, ( + "No right camera image timestamps found" + ) + assert len(right_image_timestamps) != len(image_timestamps), ( + "Left and right camera image timestamps are the same" + ) + matched_timestamps = get_matched_timestamps( + left_timestamps=image_timestamps, + right_timestamps=right_image_timestamps, + max_diff=1e6, # 1 ms in nanoseconds + ) + + assert len(matched_timestamps) > 0, "No matched timestamps found" + + extract_images_from_vrs( + vrs_file=vrs_file, + image_folder=aria_folder, + left_subfolder_name="cam0/data", + right_subfolder_name="cam1/data", + ) + + if has_slam_drops: + assert ( + right_image_timestamps is not None + and matched_timestamps is not None + ) + image_timestamps = remove_images_when_slam_drops( + aria_folder, + image_timestamps, + right_image_timestamps, + matched_timestamps, + ) + + image_timestamps = rename_images_in_folder( + aria_folder, + image_timestamps, + ) + + imu_folder = output_asl_folder / "aria" / "imu0" + imu_folder.mkdir(parents=True, exist_ok=True) + imu_csv = imu_folder / "data.csv" + + write_imu_data_to_csv( + vrs_provider, + imu_csv, + ) + + write_image_timestamps_to_txt( + image_timestamps, + aria_folder / f"{dataset_name}.txt", + ) + write_image_csv(image_timestamps, aria_folder / "cam0") + write_image_csv(image_timestamps, aria_folder / "cam1") + + +if __name__ == "__main__": + args = argparse.ArgumentParser() + + args.add_argument( + "--vrs_file", + type=Path, + required=True, + help="Path to input VRS file", + ) + args.add_argument( + "--output_asl_folder", + type=Path, + required=True, + help="Path to output ASL folder", + ) + args.add_argument( + "--has_slam_drops", + action="store_true", + help="Whether the VRS file has dropped SLAM frames", + ) + + args = args.parse_args() + + form_aria_asl_folder( + args.vrs_file, + args.output_asl_folder, + has_slam_drops=args.has_slam_drops, + )
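
For reference, here is a minimal sketch of how the two conversion tools above compose end to end. The paths are placeholders for a downloaded training sequence; `form_aria_asl_folder` and `asl_to_rosbag` are the functions defined in this diff, and the second step assumes the ROS1 Python stack (`rosbag`, `rospy`, `cv_bridge`) is importable.

```python
from pathlib import Path

from tools.asl_folder_to_rosbag import asl_to_rosbag
from tools.vrs_to_asl_folder import form_aria_asl_folder

# Placeholder paths, assuming the layout created by tools/download_lamaria.py
# for the R_01_easy training sequence.
vrs_file = Path("lamaria_data/training/R_01_easy/raw_data/R_01_easy.vrs")
asl_folder = Path("lamaria_data/training/R_01_easy/asl")
rosbag_file = Path("lamaria_data/training/R_01_easy/R_01_easy.bag")

# 1. VRS -> ASL folder (images under aria/cam{0,1}/data, IMU in aria/imu0).
form_aria_asl_folder(vrs_file, asl_folder, has_slam_drops=False)

# 2. ASL folder -> ROS1 bag with /cam0/image_raw, /cam1/image_raw, /imu0.
asl_to_rosbag(asl_folder, rosbag_file)
```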