SIFT (Scale-Invariant Feature Transform) is a computer vision algorithm for detecting and describing local features in images. Developed by David Lowe in 1999, SIFT has become a fundamental tool for various applications due to its robustness and versatility.
Use Cases of SIFT
👀 Object Recognition
SIFT is widely used for detecting and identifying specific objects within complex scenes. Its ability to extract distinctive features that are invariant to scale, rotation, and illumination changes makes it effective for recognizing objects across different viewpoints and conditions.
🪡 Image Stitching
One of SIFT’s prominent applications is in image stitching. The algorithm helps find corresponding points between overlapping images, allowing for seamless alignment and blending to create panoramas.
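For instance, a minimal sketch using OpenCV's high-level Stitcher (the filenames left.jpg and right.jpg are hypothetical; internally the stitcher performs this kind of feature matching, though not necessarily with SIFT):

import cv2

# Two overlapping photos to merge (hypothetical filenames)
left = cv2.imread('left.jpg')
right = cv2.imread('right.jpg')

# The Stitcher detects and matches local features, estimates the alignment
# between the images, and blends them into a single panorama
stitcher = cv2.Stitcher.create()
status, pano = stitcher.stitch([left, right])

if status == cv2.Stitcher_OK:
    cv2.imwrite('panorama.jpg', pano)
else:
    print(f'Stitching failed with status code {status}')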
🗿 3D Reconstruction
SIFT is used in 3D modeling and reconstruction tasks. By identifying matching points between images taken from different angles, it enables triangulation of 3D point positions, allowing scenes or objects to be reconstructed in 3D space.
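As an illustrative sketch only (the projection matrices and matched coordinates below are made-up placeholders, not taken from the example further down), cv2.triangulatePoints recovers 3D positions once the two camera projection matrices and the matched 2D points are known:

import numpy as np
import cv2

# Hypothetical 3x4 projection matrices for two calibrated cameras
# (first camera at the origin, second camera translated along x)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))]).astype(np.float32)
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]).astype(np.float32)

# Matched image coordinates in each view as 2 x N arrays (row 0 = x, row 1 = y),
# e.g. obtained from SIFT keypoint matching
pts1 = np.float32([[0.10, 0.40],
                   [0.25, 0.30]])
pts2 = np.float32([[0.05, 0.32],
                   [0.25, 0.30]])

# Triangulate: returns 4 x N points in homogeneous coordinates
points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
points_3d = (points_4d[:3] / points_4d[3]).T  # N x 3 Euclidean coordinates
print(points_3d)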
🤖 Robot Navigation and Mapping
In robotics, SIFT is employed for navigation and mapping purposes. Autonomous robots can use SIFT features to localize themselves within an environment and build maps of their surroundings.
Minimal Example
This example demonstrates the use of SIFT for feature detection and matching between two images using NumPy and OpenCV. It includes steps for keypoint detection, descriptor computation, matching, filtering good matches, and applying RANSAC to find a homography matrix. The results are visualized by drawing the matches between the two images.
Input images
👇 image_1.jpg
👇 image_2.jpg
👇 image_3.jpg
👇 image_4.jpg
👇 image_5.jpg
Setup and feature detection
import numpy as np
import cv2
import sys
# Print Python and OpenCV versions for reference
print(f"Python version: {sys.version}")
print(f"OpenCV version: {cv2.__version__}")
# Load an image
# Repeat with 5 different input images to detect keypoints
img1 = cv2.imread('data/image_1.jpg', cv2.IMREAD_GRAYSCALE) # input image
# Initialize SIFT detector
sift = cv2.SIFT_create()  # in OpenCV >= 4.4; older builds used cv2.xfeatures2d.SIFT_create()
# Detect keypoints and compute descriptors for both images
kp1, des1 = sift.detectAndCompute(img1, None)
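The keypoint visualizations shown in the next section can be generated with cv2.drawKeypoints; a small sketch (the output filename is arbitrary):

# Report the number of detected keypoints and visualize them
print(f'Detected {len(kp1)} keypoints')

# Draw keypoints with their scale and orientation overlaid on the image
img1_kp = cv2.drawKeypoints(img1, kp1, None,
                            flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('keypoints_image_1.jpg', img1_kp)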
Detected features
👇 image_1.jpg (3238 keypoints)
👇 image_2.jpg (2830 keypoints)
👇 image_3.jpg (1721 keypoints)
👇 image_4.jpg (1612 keypoints)
👇 image_5.jpg (1660 keypoints)
Descriptor Matching
# Minimum number of good matches required
MIN_MATCH_COUNT = 4
# Load the query and train images in grayscale
img1 = cv2.imread('data/image_1.jpg', cv2.IMREAD_GRAYSCALE) # queryImage
img2 = cv2.imread('data/image_2.jpg', cv2.IMREAD_GRAYSCALE) # trainImage
# Initialize SIFT detector
sift = cv2.SIFT_create()  # in OpenCV >= 4.4; older builds used cv2.xfeatures2d.SIFT_create()
# Detect keypoints and compute descriptors for both images
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
# Initialize brute-force matcher and perform k-nearest neighbor matching
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
# Print total number of matches before RANSAC
print(f'Matches before RANSAC: {len(matches)}')
# Apply Lowe's ratio test to filter good matches
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
# Print number of good matches
print(f'Number of "good" matches: {len(good)}')
# Sort good matches based on distance
good = sorted(good, key=lambda val: val.distance)
# Check if there are enough good matches
if len(good) > MIN_MATCH_COUNT:
    # Extract coordinates of matched keypoints
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Find homography matrix using RANSAC (reprojection threshold of 5.0 pixels)
    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    matchesMask = mask.ravel().tolist()
    # Print homography matrix
    print("Homography matrix:")
    print(M)
    # Corners of the query image, usable for projecting its outline into the train image
    h, w = img1.shape
    pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
else:
    print(f"Not enough matches are found - {len(good)}/{MIN_MATCH_COUNT}")
    matchesMask = None
# Set drawing parameters for matches
draw_params = dict(
    matchColor=(0, 255, 0),   # Draw matches in green
    singlePointColor=None,
    matchesMask=matchesMask,  # Draw only inliers
    flags=2
)
# Print number of matches after applying homography (RANSAC)
if matchesMask:
    print(f'Number of matches after RANSAC: {matchesMask.count(1)}')
# Draw matches on the images
img_matches = cv2.drawMatches(img1, kp1, img2, kp2, good, None, **draw_params)
cv2.imshow('Matches', img_matches)
# Wait for a key press and close all windows
cv2.waitKey(0)
cv2.destroyAllWindows()
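As an optional extension (a sketch reusing M, pts, img2, and matchesMask from the code above), the estimated homography can project the query image's corners into the train image with cv2.perspectiveTransform:

if matchesMask:
    # Map the query image's corners into the train image using the homography M,
    # then draw the resulting outline
    dst = cv2.perspectiveTransform(pts, M)
    img2_outline = cv2.polylines(img2.copy(), [np.int32(dst)], True, 255, 3, cv2.LINE_AA)
    cv2.imshow('Projected outline', img2_outline)
    cv2.waitKey(0)
    cv2.destroyAllWindows()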
Top matches before RANSAC
A "good" match is one that passes Lowe's ratio test: the best match m is kept only if its distance is less than 0.7 times the distance of the second-best match n (m.distance < 0.7 * n.distance).
👇 Between image_1.jpg and image_3.jpg (409 good matches)
👇 Between image_1.jpg and image_4.jpg (235 good matches)
👇 Between image_1.jpg and image_5.jpg (559 good matches)
👇 Between image_2.jpg and image_3.jpg (199 good matches)
👇 Between image_2.jpg and image_4.jpg (213 good matches)
👇 Between image_2.jpg and image_5.jpg (48 good matches)
Top matches after RANSAC
RANSAC (Random Sample Consensus) is an iterative algorithm used to estimate parameters of a mathematical model from a set of observed data that contains outliers.
These results are non-deterministic due to random sampling used in RANSAC.
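To illustrate the RANSAC idea on its own, here is a small self-contained sketch (pure NumPy, fitting a line to synthetic points with outliers; the data and thresholds are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: points on the line y = 2x + 1, with the first 20 corrupted as outliers
x = rng.uniform(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, 100)
y[:20] += rng.uniform(-20, 20, 20)

best_inliers = 0
best_model = None
for _ in range(200):                            # RANSAC iterations
    i, j = rng.choice(100, 2, replace=False)    # minimal sample: 2 points define a line
    if x[i] == x[j]:
        continue
    a = (y[j] - y[i]) / (x[j] - x[i])           # slope of the candidate model
    b = y[i] - a * x[i]                         # intercept of the candidate model
    inliers = np.abs(y - (a * x + b)) < 0.5     # points consistent with the model
    if inliers.sum() > best_inliers:
        best_inliers, best_model = inliers.sum(), (a, b)

print(f'Best model y = {best_model[0]:.2f}x + {best_model[1]:.2f} '
      f'with {best_inliers} inliers out of 100')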
👇 Between image_1.jpg and image_3.jpg (471 good matches)
👇 Between image_1.jpg and image_4.jpg (56 good matches)
👇 Between image_1.jpg and image_5.jpg (620 good matches)
👇 Between image_2.jpg and image_3.jpg (80 good matches)
👇 Between image_2.jpg and image_4.jpg (262 good matches)
👇 Between image_2.jpg and image_5.jpg (36 good matches)
Analysis of Results
How well does the method work?
The method works well. The results show good matches between the object image and the images containing multiple objects. Surprisingly, the objects were still detected reliably even with obstacles partially blocking the view.
Does it work equally well on the different examples?
Both yes and no. The number of matches varies widely with the position and intensity distribution of the objects. Objects that are partially blocked from view yield far fewer keypoints and matches. The second textbook (“Computer Vision and Pattern Recognition”, image_2.jpg) generally produces fewer matches because of the smaller font used for its title and its large areas of dark gray.
However, the perspective transformations applied through the homography matrices estimated with cv2.findHomography() align the objects well across the different images. Therefore, I would conclude that the method works well, although not equally across all examples.
Current Status of SIFT
While SIFT (Scale-Invariant Feature Transform) is still a powerful and widely used algorithm in computer vision, some newer detectors and descriptors have emerged that offer better performance in certain areas, most notably speed (a brief ORB sketch follows the list):
- SURF (Speeded Up Robust Features)
- ORB (Oriented FAST and Rotated BRIEF)
- AKAZE (Accelerated-KAZE)
- BRISK (Binary Robust Invariant Scalable Keypoints)
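As a brief sketch of how one of these alternatives drops into the same pipeline (ORB is included in the main OpenCV module; its binary descriptors are matched with the Hamming norm instead of the default L2 norm used for SIFT):

import cv2

img1 = cv2.imread('data/image_1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('data/image_2.jpg', cv2.IMREAD_GRAYSCALE)

# ORB detector: FAST keypoints with rotated BRIEF binary descriptors
orb = cv2.ORB_create(nfeatures=3000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with the Hamming distance;
# crossCheck=True keeps only mutually best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(f'ORB matches: {len(matches)}')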