
VSLAM and Navigation

This section covers Visual Simultaneous Localization and Mapping (VSLAM) for robot navigation. VSLAM enables robots to build maps of their environment while simultaneously determining their position within that map using visual sensors.

Introduction to VSLAM

Visual SLAM (VSLAM) is a technology that allows robots to understand their environment using visual sensors like cameras. Unlike traditional SLAM that might use LiDAR, VSLAM uses visual features from cameras to create maps and determine the robot's position.

Key Components of VSLAM

  1. Feature Detection: Identifying distinctive visual elements
  2. Feature Matching: Associating features across different views
  3. Pose Estimation: Determining the robot's position and orientation
  4. Map Building: Creating a representation of the environment
  5. Loop Closure: Correcting accumulated errors when revisiting locations

Setting up VSLAM with ORB-SLAM

Here's how to set up a basic ORB-feature visual odometry node — a simplified illustration of the front end of an ORB-SLAM-style system:

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import cv2
import numpy as np

class VSLAMNode(Node):
    """Minimal visual-odometry node for teaching purposes.

    Tracks ORB features between consecutive camera frames on
    ``/camera/image_raw`` and accumulates a pose estimate in
    ``self.current_pose`` (a 4x4 homogeneous transform). This is a
    simplified illustration of a VSLAM front end, not a full SLAM system.
    """

    def __init__(self):
        super().__init__('vslam_node')

        # Bridge for converting ROS Image messages to OpenCV arrays.
        self.bridge = CvBridge()

        # Camera image subscription (queue depth 10).
        self.image_sub = self.create_subscription(
            Image,
            '/camera/image_raw',
            self.image_callback,
            10
        )

        # ORB detector plus brute-force Hamming matcher. crossCheck=True
        # makes .match() return only mutual-best (one-to-one) pairs.
        self.orb = cv2.ORB_create(nfeatures=1000)
        self.bf_matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

        # Tracking state carried over from the previous frame.
        self.previous_frame = None
        self.previous_keypoints = None
        self.previous_descriptors = None
        self.current_pose = np.eye(4)  # 4x4 identity: start at the origin
        self.map_points = []

        self.get_logger().info('VSLAM node initialized')

    def image_callback(self, msg):
        """Process one camera frame: detect, match, and update the pose."""
        # Convert the ROS image to an OpenCV BGR array.
        cv_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')

        # Detect ORB keypoints and descriptors in the current frame.
        keypoints, descriptors = self.orb.detectAndCompute(cv_image, None)

        if self.previous_frame is not None and descriptors is not None:
            # Match features between the previous and current frames.
            matches = self.bf_matcher.match(
                self.previous_descriptors,
                descriptors
            )

            # Best (lowest Hamming distance) matches first.
            matches = sorted(matches, key=lambda x: x.distance)

            # A homography needs >= 4 correspondences; require a
            # comfortable margin before attempting estimation.
            if len(matches) >= 10:
                src_points = np.float32(
                    [self.previous_keypoints[m.queryIdx].pt for m in matches]
                ).reshape(-1, 1, 2)
                dst_points = np.float32(
                    [keypoints[m.trainIdx].pt for m in matches]
                ).reshape(-1, 1, 2)

                # Robustly estimate the inter-frame homography (3x3).
                transformation, mask = cv2.findHomography(
                    src_points, dst_points, cv2.RANSAC, 5.0
                )

                if transformation is not None:
                    self.update_pose(transformation)

        # Store current-frame data for the next iteration.
        self.previous_frame = cv_image
        self.previous_keypoints = keypoints
        self.previous_descriptors = descriptors

    def update_pose(self, transformation):
        """Accumulate the homography's translation into the 4x4 pose.

        Simplified visual odometry: only the homography's pixel-space
        translation column is applied. Real systems decompose the
        homography (or use the essential matrix) to recover metric
        rotation and translation.

        :param transformation: 3x3 homography from ``cv2.findHomography``.
        """
        dt = np.eye(4)
        # BUG FIX: the original copied transformation[:2, :3] into
        # dt[:2, :3], overwriting the rotation block with a mix of
        # rotation and translation terms (while claiming to use only
        # translation). The homography's translation is its last column;
        # place it in the pose increment's translation column instead.
        dt[:2, 3] = transformation[:2, 2]

        self.current_pose = np.dot(self.current_pose, dt)

        self.get_logger().info(f'Updated pose: {self.current_pose[:2, 3]}')

def main(args=None):
    """Entry point: create the VSLAM node and spin until interrupted."""
    rclpy.init(args=args)
    node = VSLAMNode()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        node.get_logger().info('Shutting down VSLAM node')
    finally:
        # Always release node resources and shut down the rclpy context,
        # even if spin() raised.
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Using NVIDIA Isaac ROS for VSLAM

NVIDIA Isaac ROS provides optimized VSLAM capabilities. Here's how to set it up:

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, CameraInfo
from geometry_msgs.msg import PoseStamped
from nav_msgs.msg import Odometry
import numpy as np

class IsaacVSLAMNode(Node):
    """Skeleton node showing where an Isaac ROS VSLAM pipeline plugs in.

    Subscribes to camera images and calibration, simulates a slowly
    drifting pose estimate, and republishes it as Odometry on
    ``/visual_odom``.
    """

    def __init__(self):
        super().__init__('isaac_vslam_node')

        # Outbound topics: odometry and raw pose.
        self.odom_pub = self.create_publisher(Odometry, '/visual_odom', 10)
        self.pose_pub = self.create_publisher(PoseStamped, '/visual_pose', 10)

        # Inbound topics: images drive the (simulated) VSLAM update;
        # CameraInfo supplies the intrinsics.
        self.image_sub = self.create_subscription(
            Image,
            '/camera/image_raw',
            self.image_callback,
            10
        )
        self.info_sub = self.create_subscription(
            CameraInfo,
            '/camera/camera_info',
            self.camera_info_callback,
            10
        )

        # Calibration and pose state.
        self.camera_matrix = None
        self.distortion_coeffs = None
        self.current_pose = np.eye(4)  # 4x4 homogeneous transform
        self.frame_count = 0

        self.get_logger().info('Isaac VSLAM node initialized')

    def camera_info_callback(self, msg):
        """Cache the camera intrinsics and distortion coefficients."""
        self.camera_matrix = np.array(msg.k).reshape(3, 3)
        self.distortion_coeffs = np.array(msg.d)

    def image_callback(self, msg):
        """Run the (placeholder) VSLAM step, then publish the estimate."""
        # A production system would hand the frame to Isaac's optimized
        # pipeline here.
        self.process_vslam_frame(msg)
        self.publish_odometry()

    def process_vslam_frame(self, image_msg):
        """Simulate one VSLAM iteration.

        Placeholder for Isaac ROS integration: every 10th frame the pose
        is nudged by a fixed offset to mimic forward motion.
        """
        self.frame_count += 1

        if self.frame_count % 10 != 0:
            return

        # Fixed per-update offsets (meters) standing in for real odometry.
        self.current_pose[0, 3] += 0.01
        self.current_pose[1, 3] += 0.005

    def publish_odometry(self):
        """Publish the current pose as Odometry in the map frame."""
        msg = Odometry()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.header.frame_id = 'map'
        msg.child_frame_id = 'base_link'

        # Translation comes straight from the homogeneous transform.
        position = msg.pose.pose.position
        position.x = float(self.current_pose[0, 3])
        position.y = float(self.current_pose[1, 3])
        position.z = float(self.current_pose[2, 3])

        # Orientation left as identity; a full implementation would
        # convert the pose's rotation block to a quaternion.
        msg.pose.pose.orientation.w = 1.0

        self.odom_pub.publish(msg)

def main(args=None):
    """Entry point: spin the Isaac VSLAM node until interrupted."""
    rclpy.init(args=args)
    vslam = IsaacVSLAMNode()

    try:
        rclpy.spin(vslam)
    except KeyboardInterrupt:
        vslam.get_logger().info('Shutting down Isaac VSLAM node')
    finally:
        # Tear down the node and the rclpy context unconditionally.
        vslam.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Map Building and Localization

Creating and maintaining a map is essential for navigation:

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import PointStamped, PoseWithCovarianceStamped
from visualization_msgs.msg import MarkerArray, Marker
import numpy as np
import cv2
from collections import deque

class VSLAMMapper(Node):
    """Feature-based mapping node.

    Tracks ORB features across frames with a ratio-tested knn matcher,
    accumulates camera pose from essential-matrix decomposition, stores
    periodic keyframes, and publishes landmark markers for RViz.

    NOTE: ``self.landmarks`` is never populated in this simplified
    example, so the published MarkerArray stays empty until
    triangulation is added.
    """

    def __init__(self):
        super().__init__('vslam_mapper')

        # Publishers
        self.map_pub = self.create_publisher(MarkerArray, '/vslam_map', 10)
        self.landmark_pub = self.create_publisher(Marker, '/landmarks', 10)

        # Subscribers
        self.image_sub = self.create_subscription(
            Image,
            '/camera/image_raw',
            self.image_callback,
            10
        )

        # BUG FIX: image_callback dereferences self.bridge, but the
        # original never constructed it (AttributeError on the first
        # frame). Imported locally because this snippet's import header
        # omits cv_bridge.
        from cv_bridge import CvBridge
        self.bridge = CvBridge()

        # BUG FIX: update_pose_from_matches reads self.camera_matrix,
        # which the original never initialized. It stays None until
        # calibration is supplied (e.g. from a CameraInfo topic); pose
        # updates are skipped until then.
        self.camera_matrix = None

        # Feature pipeline: ORB + brute-force Hamming matcher.
        # crossCheck must be off because knnMatch (needed for Lowe's
        # ratio test) is incompatible with cross-checking.
        self.orb = cv2.ORB_create(nfeatures=2000)
        self.bf_matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)

        # Map data
        self.keyframes = deque(maxlen=100)  # bounded: oldest keyframes drop off
        self.landmarks = []  # 3D map points (unpopulated in this example)
        self.current_pose = np.eye(4)  # 4x4 homogeneous camera pose

        # Frame-to-frame tracking state
        self.previous_descriptors = None
        self.previous_keypoints = None
        self.frame_id = 0

        self.get_logger().info('VSLAM Mapper initialized')

    def image_callback(self, msg):
        """Process one image: match features, update pose, manage keyframes."""
        # Convert ROS image to an OpenCV BGR array.
        cv_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')

        # Detect keypoints and descriptors.
        keypoints, descriptors = self.orb.detectAndCompute(cv_image, None)

        if descriptors is not None:
            if self.previous_descriptors is not None:
                # Two nearest neighbors per descriptor, for the ratio test.
                matches = self.bf_matcher.knnMatch(
                    self.previous_descriptors, descriptors, k=2
                )

                # Lowe's ratio test: keep a match only when it is clearly
                # better than the second-best candidate.
                good_matches = []
                for match_pair in matches:
                    if len(match_pair) == 2:
                        m, n = match_pair
                        if m.distance < 0.75 * n.distance:
                            good_matches.append(m)

                # Essential-matrix estimation needs enough inliers.
                if len(good_matches) >= 10:
                    self.update_pose_from_matches(good_matches, keypoints)

            # Periodically promote the current frame to a keyframe.
            if self.should_add_keyframe(descriptors):
                self.add_keyframe(cv_image, keypoints, descriptors)

            # Carry current-frame data into the next iteration.
            self.previous_descriptors = descriptors
            self.previous_keypoints = keypoints
            self.frame_id += 1

        # Publish the (possibly unchanged) map visualization.
        self.publish_map()

    def should_add_keyframe(self, descriptors):
        """Return True when the current frame should become a keyframe.

        Simple heuristic: always take the first frame, then every 30th.
        The ``descriptors`` argument is kept for future similarity-based
        heuristics.
        """
        if len(self.keyframes) == 0:
            return True

        return self.frame_id % 30 == 0

    def add_keyframe(self, image, keypoints, descriptors):
        """Store the current frame (with a pose snapshot) as a keyframe."""
        keyframe = {
            'image': image,
            'keypoints': keypoints,
            'descriptors': descriptors,
            # Copy: current_pose keeps mutating after this snapshot.
            'pose': self.current_pose.copy(),
            'frame_id': self.frame_id
        }

        self.keyframes.append(keyframe)

    def update_pose_from_matches(self, matches, current_keypoints):
        """Estimate frame-to-frame motion from matches and accumulate it.

        :param matches: good (ratio-tested) DMatch objects, query = previous
            frame, train = current frame.
        :param current_keypoints: keypoints of the current frame.
        """
        # Geometry is impossible without calibration (see __init__).
        if self.camera_matrix is None:
            self.get_logger().warn('No camera matrix yet; skipping pose update')
            return

        # Extract matched pixel coordinates.
        prev_pts = np.float32(
            [self.previous_keypoints[m.queryIdx].pt for m in matches]
        ).reshape(-1, 1, 2)
        curr_pts = np.float32(
            [current_keypoints[m.trainIdx].pt for m in matches]
        ).reshape(-1, 1, 2)

        # The five-point algorithm needs at least 5 correspondences.
        if len(prev_pts) >= 5:
            E, mask = cv2.findEssentialMat(
                curr_pts, prev_pts,
                cameraMatrix=self.camera_matrix,
                method=cv2.RANSAC,
                prob=0.999,
                threshold=1.0
            )

            if E is not None:
                # Recover relative rotation and (unit-scale) translation.
                _, R, t, _ = cv2.recoverPose(E, curr_pts, prev_pts, self.camera_matrix)

                # Compose a 4x4 increment and accumulate it.
                transformation = np.eye(4)
                transformation[:3, :3] = R
                transformation[:3, 3] = t.flatten()

                self.current_pose = np.dot(self.current_pose, transformation)

    def publish_map(self):
        """Publish the current landmarks as red sphere markers in RViz."""
        marker_array = MarkerArray()

        for i, landmark in enumerate(self.landmarks):
            marker = Marker()
            marker.header.frame_id = "map"
            marker.header.stamp = self.get_clock().now().to_msg()
            marker.ns = "landmarks"
            marker.id = i
            marker.type = Marker.SPHERE
            marker.action = Marker.ADD

            marker.pose.position.x = landmark[0]
            marker.pose.position.y = landmark[1]
            marker.pose.position.z = landmark[2]

            marker.pose.orientation.w = 1.0
            # 5 cm spheres, opaque red.
            marker.scale.x = 0.05
            marker.scale.y = 0.05
            marker.scale.z = 0.05
            marker.color.a = 1.0
            marker.color.r = 1.0
            marker.color.g = 0.0
            marker.color.b = 0.0

            marker_array.markers.append(marker)

        self.map_pub.publish(marker_array)

def main(args=None):
    """Entry point: run the VSLAM mapper until interrupted."""
    rclpy.init(args=args)
    node = VSLAMMapper()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        node.get_logger().info('Shutting down VSLAM mapper')
    finally:
        # Clean shutdown regardless of how spin() exited.
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Integration with Navigation Stack

Integrating VSLAM with the ROS navigation stack:

import rclpy
from rclpy.node import Node
from nav_msgs.msg import OccupancyGrid, Path
from geometry_msgs.msg import PoseStamped, PointStamped
from sensor_msgs.msg import Image
from tf2_ros import TransformBroadcaster
from tf2_geometry_msgs import do_transform_pose
import tf2_ros
import tf2_geometry_msgs
import numpy as np
import threading

class VSLAMNavigator(Node):
    """Bridges VSLAM pose estimates to a toy navigation loop.

    Listens for goals and visual pose updates, publishes a straight-line
    path toward the active goal at 10 Hz, and declares arrival within
    0.5 m.
    """

    def __init__(self):
        super().__init__('vslam_navigator')

        # Goal and pose inputs.
        self.goal_sub = self.create_subscription(
            PoseStamped, '/move_base_simple/goal', self.goal_callback, 10
        )
        self.vslam_pose_sub = self.create_subscription(
            PoseStamped, '/visual_pose', self.vslam_pose_callback, 10
        )

        # Navigation outputs.
        self.nav_goal_pub = self.create_publisher(PoseStamped, '/goal_pose', 10)
        self.path_pub = self.create_publisher(Path, '/vslam_path', 10)

        # Transform broadcaster (available for map->robot TF publishing).
        self.tf_broadcaster = TransformBroadcaster(self)

        # Navigation state.
        self.current_pose = None
        self.goal_pose = None
        self.navigation_active = False

        # 10 Hz navigation tick.
        self.nav_timer = self.create_timer(0.1, self.navigation_loop)

        self.get_logger().info('VSLAM Navigator initialized')

    def vslam_pose_callback(self, msg):
        """Record the latest VSLAM pose estimate."""
        self.current_pose = msg

    def goal_callback(self, msg):
        """Accept a new navigation goal and activate the loop."""
        self.goal_pose = msg
        self.navigation_active = True
        self.get_logger().info(f'Navigation goal received: {msg.pose.position.x}, {msg.pose.position.y}')

    def navigation_loop(self):
        """Periodic tick: publish a path and test for goal arrival."""
        ready = (
            self.navigation_active
            and self.current_pose is not None
            and self.goal_pose is not None
        )
        if not ready:
            return

        path = self.calculate_path_to_goal()
        if path:
            self.path_pub.publish(path)

        # Within half a meter counts as arrived.
        if self.calculate_distance_to_goal() < 0.5:
            self.get_logger().info('Goal reached!')
            self.navigation_active = False

    def calculate_path_to_goal(self):
        """Build a straight-line Path from the current pose to the goal.

        Returns None when either endpoint is unknown. A real system
        would run a planner here instead of linear interpolation.
        """
        if self.current_pose is None or self.goal_pose is None:
            return None

        path_msg = Path()
        path_msg.header.frame_id = "map"
        path_msg.header.stamp = self.get_clock().now().to_msg()

        start = self.current_pose.pose.position
        end = self.goal_pose.pose.position

        # Heading is constant along a straight segment, so compute the
        # yaw quaternion components once, outside the loop.
        heading = np.arctan2(end.y - start.y, end.x - start.x)
        quat_z = np.sin(heading / 2.0)
        quat_w = np.cos(heading / 2.0)

        steps = 10
        for step in range(steps + 1):
            fraction = step / steps

            waypoint = PoseStamped()
            waypoint.header.frame_id = "map"
            waypoint.header.stamp = self.get_clock().now().to_msg()

            # Linear interpolation between the endpoints.
            waypoint.pose.position.x = start.x + fraction * (end.x - start.x)
            waypoint.pose.position.y = start.y + fraction * (end.y - start.y)
            waypoint.pose.position.z = start.z + fraction * (end.z - start.z)

            # Yaw-only orientation encoded as a quaternion.
            waypoint.pose.orientation.z = quat_z
            waypoint.pose.orientation.w = quat_w

            path_msg.poses.append(waypoint)

        return path_msg

    def calculate_distance_to_goal(self):
        """Euclidean distance from the current pose to the goal, or inf."""
        if self.current_pose is None or self.goal_pose is None:
            return float('inf')

        here = self.current_pose.pose.position
        there = self.goal_pose.pose.position

        offset = np.array([
            there.x - here.x,
            there.y - here.y,
            there.z - here.z,
        ])
        return np.sqrt(np.dot(offset, offset))

def main(args=None):
    """Entry point: run the VSLAM navigator until interrupted."""
    rclpy.init(args=args)
    nav = VSLAMNavigator()

    try:
        rclpy.spin(nav)
    except KeyboardInterrupt:
        nav.get_logger().info('Shutting down VSLAM navigator')
    finally:
        # Release the node and shut down rclpy in all cases.
        nav.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Loop Closure and Map Optimization

Advanced VSLAM systems implement loop closure to correct drift:

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseWithCovarianceStamped
from std_msgs.msg import Bool
import numpy as np
import cv2
from sklearn.cluster import DBSCAN
from collections import deque
import gtsam # Georgia Tech Smoothing and Mapping library

class VSLAMOptimizer(Node):
    """Pose-graph maintenance node: collects poses, promotes keyframes,
    detects candidate loop closures by proximity, and (on confirmation)
    optimizes the trajectory with GTSAM.
    """

    def __init__(self):
        super().__init__('vslam_optimizer')

        # Subscribers
        self.vslam_pose_sub = self.create_subscription(
            PoseWithCovarianceStamped, '/amcl_pose', self.pose_callback, 10
        )
        self.loop_closure_sub = self.create_subscription(
            Bool, '/loop_closure_detected', self.loop_closure_callback, 10
        )

        # Publishers
        self.optimized_pose_pub = self.create_publisher(
            PoseWithCovarianceStamped, '/optimized_pose', 10
        )

        # Loop closure detection
        self.pose_history = deque(maxlen=1000)  # recent poses (bounded)
        self.keyframe_poses = deque(maxlen=100)  # keyframe poses (bounded)
        self.loop_closure_threshold = 2.0  # meters

        # BUG FIX: monotonic pose counter for keyframe cadence. The
        # original tested len(self.pose_history) % 50, but once the
        # bounded deque fills, its length is pinned at 1000, so every
        # subsequent pose was promoted to a keyframe.
        self.pose_count = 0

        # GTSAM optimizer state
        self.graph = gtsam.NonlinearFactorGraph()
        self.initial_estimate = gtsam.Values()
        self.current_key = 0

        self.get_logger().info('VSLAM Optimizer initialized')

    def pose_callback(self, msg):
        """Record a pose update and screen it for loop-closure candidates."""
        # Snapshot position, orientation (x, y, z, w), and stamp.
        pose_data = {
            'position': np.array([
                msg.pose.pose.position.x,
                msg.pose.pose.position.y,
                msg.pose.pose.position.z
            ]),
            'orientation': np.array([
                msg.pose.pose.orientation.x,
                msg.pose.pose.orientation.y,
                msg.pose.pose.orientation.z,
                msg.pose.pose.orientation.w
            ]),
            'timestamp': msg.header.stamp.sec + msg.header.stamp.nanosec * 1e-9
        }

        self.pose_history.append(pose_data)
        self.pose_count += 1

        # Promote every 50th pose to a keyframe (counter-based; see
        # __init__ for why len(pose_history) cannot be used here).
        if self.pose_count % 50 == 0:
            self.keyframe_poses.append(pose_data)
            self.add_pose_to_optimizer(pose_data)

        # Check for potential loop closures
        self.check_for_loop_closure(pose_data)

    def check_for_loop_closure(self, current_pose):
        """Flag a candidate loop closure when the current pose is near a
        past keyframe.

        The most recent keyframe is excluded: it may have just been
        created from this very pose, which would trivially "close" a
        zero-length loop (the original fired a false positive every 50
        poses because of this).
        """
        candidates = list(self.keyframe_poses)[:-1]
        for i, kf_pose in enumerate(candidates):
            distance = np.linalg.norm(
                current_pose['position'] - kf_pose['position']
            )

            if distance < self.loop_closure_threshold:
                self.get_logger().info(f'Potential loop closure detected at keyframes: {len(self.keyframe_poses)-1} and {i}')
                self.handle_loop_closure(len(self.keyframe_poses)-1, i, current_pose, kf_pose)
                break

    def handle_loop_closure(self, current_idx, previous_idx, current_pose, previous_pose):
        """Handle a detected loop closure candidate.

        A real implementation would add a between-pose constraint to the
        factor graph and re-optimize the trajectory.
        """
        self.get_logger().info(f'Adding loop closure constraint between poses {current_idx} and {previous_idx}')

    def add_pose_to_optimizer(self, pose_data):
        """Insert the pose into the GTSAM initial estimate as a 2D pose."""
        pose_key = gtsam.symbol('x', self.current_key)

        # BUG FIX: planar yaw from quaternion (x, y, z, w). The original
        # denominator used z**2 + w**2, which is always 1 for a unit
        # planar quaternion (so 1 - 2*(z^2 + w^2) == -1, giving a wrong
        # yaw). The standard formula is
        #   yaw = atan2(2*(w*z + x*y), 1 - 2*(y^2 + z^2)).
        qx, qy, qz, qw = pose_data['orientation']
        yaw = np.arctan2(
            2.0 * (qw * qz + qx * qy),
            1.0 - 2.0 * (qy ** 2 + qz ** 2)
        )

        pose_2d = gtsam.Pose2(
            pose_data['position'][0],
            pose_data['position'][1],
            yaw
        )

        # Add initial estimate
        self.initial_estimate.insert(pose_key, pose_2d)

        # Odometry factor between consecutive poses would be added here
        # (omitted in this simplified example).
        if self.current_key > 0:
            prev_key = gtsam.symbol('x', self.current_key - 1)
            pass

        self.current_key += 1

    def loop_closure_callback(self, msg):
        """Re-optimize the trajectory when a loop closure is confirmed."""
        if msg.data:
            self.optimize_trajectory()
            self.get_logger().info('Trajectory optimized after loop closure')

    def optimize_trajectory(self):
        """Optimize the full trajectory with Levenberg-Marquardt (GTSAM)."""
        params = gtsam.LevenbergMarquardtParams()
        params.setVerbosity("ERROR")
        params.setMaxIterations(100)

        optimizer = gtsam.LevenbergMarquardtOptimizer(self.graph, self.initial_estimate, params)

        # The result becomes the initial estimate for the next round.
        result = optimizer.optimize()
        self.initial_estimate = result

def main(args=None):
    """Entry point: run the VSLAM optimizer until interrupted."""
    rclpy.init(args=args)
    opt = VSLAMOptimizer()

    try:
        rclpy.spin(opt)
    except KeyboardInterrupt:
        opt.get_logger().info('Shutting down VSLAM optimizer')
    finally:
        # Guaranteed teardown of the node and rclpy context.
        opt.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Best Practices for VSLAM

  1. Feature Rich Environments: Ensure the environment has sufficient visual features
  2. Lighting Consistency: Maintain consistent lighting conditions for better tracking
  3. Motion Constraints: Avoid rapid motion that can cause feature tracking failure
  4. Initialization: Properly initialize the system before autonomous navigation
  5. Scale Recovery: Use additional sensors for scale recovery in monocular systems
  6. Computational Efficiency: Optimize algorithms for real-time performance

Hands-on Exercise

Create a complete VSLAM system that:

  1. Processes camera images to extract visual features
  2. Estimates robot motion using visual odometry
  3. Builds a map of the environment
  4. Integrates with the ROS navigation stack
  5. Implements basic loop closure detection

This exercise will help you understand the complete VSLAM pipeline for robotic navigation.