ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras

Why this mattered

ORB-SLAM2 mattered because it turned feature-based visual SLAM from a set of specialized research demonstrations into a reusable, real-time system spanning the main camera regimes used in robotics: monocular, stereo, and RGB-D. Earlier systems often excelled under a single sensor assumption or required substantial engineering to adapt to another; ORB-SLAM2 unified tracking, local mapping, loop closing, relocalization, and map reuse around ORB features and bundle-adjustment-based optimization. That made accurate long-term visual localization practical on ordinary CPUs: stereo and RGB-D inputs resolve metric scale, while monocular mode, which recovers the map only up to an unknown scale factor, keeps the system usable with a single camera.

The shift it brought was as much infrastructural as algorithmic. By releasing a complete open-source implementation that worked “out of the box” across public benchmarks and real environments, Mur-Artal and Tardós gave robotics, AR/VR, autonomous driving, and drone researchers a common baseline that was both strong and inspectable. After ORB-SLAM2, new SLAM papers were increasingly expected to compare against a robust public system rather than against fragmented prototypes, which raised the empirical standard for the field.

The paper also helped define the bridge from classical geometric SLAM to later hybrid systems. Subsequent breakthroughs in visual-inertial SLAM, semantic SLAM, dense mapping, neural scene representations, and learned feature pipelines often treated ORB-SLAM2 as either a baseline, a front-end/back-end template, or a component to extend. Its influence came from showing that careful engineering of geometric primitives, place recognition, keyframe selection, and optimization could deliver reliable real-time mapping before deep learning became central to perception pipelines.

Abstract

We present ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities. The system works in real time on standard central processing units in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end, based on bundle adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches with map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.
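
The abstract's link between stereo observations and metric scale can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair with a known horizontal baseline; the function names, intrinsics, and example numbers below are hypothetical and are not taken from the ORB-SLAM2 source code. A stereo keypoint is represented here as (u_left, v_left, u_right), and depth follows from the disparity between the two horizontal coordinates.

```python
def project_stereo(point, fx, fy, cx, cy, baseline):
    """Project a 3D point (left-camera frame, metres) to (u_left, v_left, u_right).

    Assumes a rectified pinhole stereo rig; all parameter names are illustrative.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    u_left = fx * x / z + cx
    v_left = fy * y / z + cy
    # The right camera sits `baseline` metres along +x, so the point shifts left.
    u_right = fx * (x - baseline) / z + cx
    return u_left, v_left, u_right


def depth_from_disparity(u_left, u_right, fx, baseline):
    """Recover metric depth from the horizontal disparity of a stereo keypoint."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return fx * baseline / disparity


# Hypothetical intrinsics (pixels) and baseline (metres).
FX, FY, CX, CY, BASELINE = 500.0, 500.0, 320.0, 240.0, 0.1

near = (0.5, -0.2, 4.0)
far = tuple(2.0 * c for c in near)  # same viewing ray, twice as far away

obs_near = project_stereo(near, FX, FY, CX, CY, BASELINE)
obs_far = project_stereo(far, FX, FY, CX, CY, BASELINE)

# The left-image pixel is identical for both points (monocular scale ambiguity),
# but the right-image coordinate differs, which is what fixes the scale.
depth_near = depth_from_disparity(obs_near[0], obs_near[2], FX, BASELINE)
depth_far = depth_from_disparity(obs_far[0], obs_far[2], FX, BASELINE)
```

In this sketch the two points project to the same left-image pixel, so a single camera cannot tell them apart by scale, while their disparities differ and yield distinct metric depths. An RGB-D sensor provides the depth measurement directly instead of through disparity.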
