Part A: Image Warping and Mosaicing

Shooting the Pictures

Below are the images I shot. Each pair was taken from the same position with the camera rotated between shots, so the transforms between them are projective and the pairs can be combined into image mosaics. I chose scenes at the School of Law, Haas, and South Hall.

[Image pair: School of Law - Left | School of Law - Right]
[Image pair: Haas - Left | Haas - Right]
[Image pair: South Hall - Left | South Hall - Right]

Recovering Homographies

We can represent the projective transformation between each left and right image with a homography matrix \(H\), writing the transformation as \(Hp = p'\), where \(p\) is the homogeneous coordinate of a point in the left image and \(p'\) is its corresponding point in the right image. $$ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} $$ The projective transformation is as follows: $$ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} wx_1' \\ wy_1' \\ w \end{bmatrix} $$ Using our pairs of corresponding points, we can recover the homography \(H\) by solving the following system using least squares: each correspondence contributes two rows to a matrix \(P\) on the left and two entries to a vector \(q\) on the right. Since \(H\) has 8 degrees of freedom, we need at least 4 point correspondences. $$ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'x_1 & -x_1'y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'x_1 & -y_1'y_1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \end{bmatrix} $$
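As a minimal sketch of assembling and solving this system (assuming NumPy, with pts_src and pts_dst as (N, 2) arrays of corresponding points; solve_homography is my own name, not the project's):

    import numpy as np

    def solve_homography(pts_src, pts_dst):
        # Build the stacked least-squares system P h = q from the correspondences
        A, b = [], []
        for (x, y), (xp, yp) in zip(pts_src, pts_dst):
            A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
            A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        # Append the fixed entry H[2,2] = 1 and reshape into the 3x3 homography
        return np.append(h, 1.0).reshape(3, 3)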

Warp the Images

After recovering the homography, \(H\), we want to be able to warp images with it. To do this we take the corners of the image we would like to warp and apply \(H\) to get the transformed coordinates in the warped image. We do this as follows: $$ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} wx_1' \\ wy_1' \\ w \end{bmatrix} $$ $$ warped\ pixel\ location = \begin{bmatrix} \frac{wx_1'}{w} \\ \frac{wy_1'}{w} \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \end{bmatrix} $$ Afterwards, we create a bounding box in the warped image that contains these transformed corners. We use inverse warping to determine where each pixel of the bounding box lands in the original image and use interpolation to set these pixels in our warped image.
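A sketch of this inverse-warping step, assuming NumPy and SciPy, a 3-channel image, and that offset is the top-left (x, y) corner of the bounding box computed from the warped corners (the function name and conventions are my own):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def inverse_warp(img, H, out_h, out_w, offset=(0, 0)):
        # Map every canvas pixel back through H^-1 to find its source location
        H_inv = np.linalg.inv(H)
        ys, xs = np.indices((out_h, out_w))
        coords = np.stack([xs.ravel() + offset[0],
                           ys.ravel() + offset[1],
                           np.ones(xs.size)])
        src = H_inv @ coords
        src_x, src_y = src[0] / src[2], src[1] / src[2]   # dehomogenize
        out = np.zeros((out_h, out_w, img.shape[2]))
        for c in range(img.shape[2]):
            # Bilinear interpolation (order=1); points outside the source stay 0
            out[..., c] = map_coordinates(img[..., c].astype(float),
                                          [src_y, src_x],
                                          order=1).reshape(out_h, out_w)
        return out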

Image Rectification

To test whether our warp works, let's try to "rectify" some images by selecting a planar surface in an image and projecting it onto a flat, frontal-facing plane. Here are the results below:

[Image pair: Physics Building: Original | Physics Building: Warped]
[Image pair: Hearst Mining: Original | Hearst Mining: Warped]

We see the surfaces we chose have been warped to be frontal-facing.

Image Mosaics

After we project the left image into the right image's space, we need to blend the two images together. We can do this using two-band "Laplacian stack" blending. First, we create a mask for each projected image on a blending canvas large enough to encompass both images. After we create these masks, we apply a distance transform to find the distance from each pixel in the mask to its nearest edge, then normalize these distance values to the range 0 to 1 for both images.
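A sketch of the distance-transform step on a boolean mask, using scipy.ndimage.distance_transform_edt (the function name here is my own):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def normalized_distance_map(mask):
        # Euclidean distance from each in-mask pixel to the nearest mask edge
        dist = distance_transform_edt(mask)
        # Scale to [0, 1] so the two images' maps are comparable
        return dist / dist.max() if dist.max() > 0 else dist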

Here are the results of this step on the School of Law left and right images:

[Image pair: School of Law: Left Distance Transform | School of Law: Right Distance Transform]

Now we apply a low-pass filter and a high-pass filter to the left and right images. We blend the low-pass and high-pass pairs separately and sum the results to create our blended image. To blend the low-pass filtered images we take a weighted linear combination of the left and right images using the distance transforms computed above. To blend the high-pass filtered images we simply select, at each pixel, the value from whichever image has the greater distance value.
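A sketch of this two-band blend, assuming float RGB images in [0, 1] aligned on the same canvas, with d1 and d2 the normalized distance maps from the previous step; the blur sigma is a free parameter:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def two_band_blend(im1, im2, d1, d2, sigma=2.0):
        # Low-pass band: Gaussian blur each image (no blur across channels)
        low1 = gaussian_filter(im1, sigma=(sigma, sigma, 0))
        low2 = gaussian_filter(im2, sigma=(sigma, sigma, 0))
        # High-pass band: whatever the blur removed
        high1, high2 = im1 - low1, im2 - low2
        # Low band: distance-weighted linear combination
        w1 = (d1 / np.maximum(d1 + d2, 1e-8))[..., None]
        low = w1 * low1 + (1 - w1) * low2
        # High band: winner-take-all by distance value
        high = np.where((d1 > d2)[..., None], high1, high2)
        return np.clip(low + high, 0, 1)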

Here are the results of the mosaics using the images chosen from the Shooting the Pictures section:

[Image: School of Law Mosaic]
[Image: Haas Mosaic]
[Image: South Hall Mosaic]

Part B: Autostitching

Corner Detection

For autostitching, we first want to create a set of interest points to use. To create this initial set we can use the Harris Interest Point Detector, which gives us a matrix of "corner strengths" over the image along with a set of interest points (its local maxima) to use.
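One way to sketch this step is with scikit-image (the project's own Harris implementation may differ):

    from skimage.color import rgb2gray
    from skimage.feature import corner_harris, corner_peaks

    def harris_points(img, min_distance=2):
        # Corner-strength matrix over the grayscale image
        h = corner_harris(rgb2gray(img))
        # Local maxima of the response become the initial interest points (row, col)
        coords = corner_peaks(h, min_distance=min_distance)
        return h, coords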

Applying the detector below we get the following set of interest points for a sample image:

[Image pair: Haas - Left: Original | Haas - Left: Harris]

Adaptive Non-Maximal Suppression (ANMS)

We see that the Harris interest points are clustered together rather than evenly spaced throughout the image. We can run adaptive non-maximal suppression to select interest points that have a strong corner response and are spread throughout the image. For each interest point we compute a suppression radius as follows: $$ r_i = \min_j |\mathbf{x}_i - \mathbf{x}_j|, \text{ s.t. } f(\mathbf{x}_i) < c_{\text{robust}}f(\mathbf{x}_j), \mathbf{x}_j \in \mathcal{I} $$ We then select the top \(k\) points with the largest radii to make up our set of evenly spread out interest points.
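A brute-force O(N^2) sketch of this rule, assuming coords is an (N, 2) array of interest points and strengths holds the corner strength \(f(\mathbf{x}_i)\) of each point (names are my own):

    import numpy as np
    from scipy.spatial.distance import cdist

    def anms(coords, strengths, k=500, c_robust=0.9):
        d = cdist(coords, coords)                 # pairwise distances |x_i - x_j|
        # x_j "dominates" x_i when f(x_i) < c_robust * f(x_j)
        dominated = strengths[:, None] < c_robust * strengths[None, :]
        d = np.where(dominated, d, np.inf)        # only dominating points count
        radii = d.min(axis=1)                     # r_i = distance to nearest dominator
        # Keep the k points with the largest suppression radii
        return coords[np.argsort(radii)[::-1][:k]]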

Setting k=500, here is the comparison of the Harris interest points and the points after running ANMS:

[Image pair: Haas - Left: Harris | Haas - Left: ANMS]

Extracting Feature Descriptors

Once we have our finalized interest points, we want to extract a feature for each point. To do this we take a 40 x 40 window around each point and downsample it with a scaling factor of 5 to create an 8 x 8 patch. We bias/gain-normalize the patch to have zero mean and unit variance, then flatten it into a vector, giving us a feature descriptor for each interest point.
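A sketch of the extraction, assuming a grayscale float image and points at least 20 pixels from the border; blurring before the strided downsample is my addition to avoid aliasing:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def extract_descriptor(gray, r, c):
        # Blur so the strided 5x downsampling below doesn't alias
        blurred = gaussian_filter(gray, sigma=2)
        patch = blurred[r - 20:r + 20, c - 20:c + 20]          # 40x40 window
        small = patch[::5, ::5]                                 # downsample to 8x8
        small = (small - small.mean()) / (small.std() + 1e-8)   # bias/gain normalize
        return small.ravel()                                    # flatten to a 64-vector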

Below we have an example of a normalized patch before it is flattened:

[Image: Feature Descriptor Patch]

Feature Matching

With the features extracted, we need to match features across our images to form correspondences. We first compute the distances between the two sets of feature descriptors for our two images. Then we iterate through the first set of descriptors and find each descriptor's first and second nearest neighbors in the second set. To reduce the number of outliers we use Lowe's trick and compute the ratio of the first nearest neighbor's distance to the second nearest neighbor's distance. If the ratio is below some threshold, \(t\), we count the feature in the first descriptor set as a match with its nearest neighbor in the second descriptor set.
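A sketch of this ratio test, assuming desc1 and desc2 are (N, 64) arrays of flattened descriptors; the threshold t is a free parameter:

    import numpy as np
    from scipy.spatial.distance import cdist

    def match_features(desc1, desc2, t=0.5):
        d = cdist(desc1, desc2)            # all pairwise descriptor distances
        nn = np.argsort(d, axis=1)         # per row: neighbor indices by distance
        matches = []
        for i, row in enumerate(nn):
            first, second = d[i, row[0]], d[i, row[1]]
            # Lowe's trick: keep only matches whose 1-NN clearly beats the 2-NN
            if first / (second + 1e-8) < t:
                matches.append((i, row[0]))
        return matches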

Here are the feature matches for the School of Law using \(t = 0.1\):

[Image: School of Law: Feature Matches]

RANSAC

Even with Lowe's trick we are left with some incorrect feature matches. When we compute the homography using least squares with these outliers, they skew the computed homography, creating invalid stitching and warps. To reduce the number of outliers we apply RANSAC, which produces a set of inliers from which we compute our homography.

The steps are as follows:

  1. Select 4 feature matches at random
  2. Compute the homography H (exact)
  3. Compute inliers where \(dist(p_i', Hp_i) < \epsilon \)
  4. Repeat steps 1-3 until we hit our max iterations
  5. Keep the largest set of inliers found in our iterations
  6. Recompute the homography H using least-squares on the largest set of inliers
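A sketch of this loop, reusing the solve_homography function from the Part A sketch above; n_iters and eps are free parameters:

    import numpy as np

    def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
        best_inliers = np.array([], dtype=int)
        p1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous left points
        for _ in range(n_iters):
            idx = np.random.choice(len(pts1), 4, replace=False)
            H = solve_homography(pts1[idx], pts2[idx])    # exact fit on 4 matches
            proj = (H @ p1.T).T
            proj = proj[:, :2] / proj[:, 2:3]             # dehomogenize
            err = np.linalg.norm(proj - pts2, axis=1)     # dist(p_i', H p_i)
            inliers = np.flatnonzero(err < eps)
            if len(inliers) > len(best_inliers):          # keep the largest inlier set
                best_inliers = inliers
        # Final least-squares fit on all inliers
        return solve_homography(pts1[best_inliers], pts2[best_inliers])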

Here are our feature matches after running RANSAC on the School of Law images:

[Image: School of Law: RANSAC]

Results

Here are the results comparing the hand-stitched mosaics versus the auto-stitched mosaics using feature matching and RANSAC:

[Image pair: School of Law: Manual | School of Law: Auto]
[Image pair: Haas: Manual | Haas: Auto]
[Image pair: South Hall: Manual | South Hall: Auto]

What I Learned

What I found particularly interesting and cool was Lowe's trick and RANSAC as tools to mitigate incorrect correspondences and automatically choose the right ones for computing the homography.