Augmented reality without glasses using projection mapping (planar scene)

source code:

In this project, we use a RaspberryPi-Projector (RPP) kit developed in a previous post to project new information onto the scene. The projection follows the shape of the scene's 3D surface, thus augmenting the real surface with computer-generated graphics without AR glasses. To recover the shape of the scene and the location of the projector with respect to it, we need a camera. By observing the image of a known projected pattern, we can obtain geometric constraints on the structure of the scene and estimate its 3D shape.

To simplify this first mapping project, we will suppose that the scene is a plane. The generalization to complex shapes will be done in another post. Using the planar constraint, it is simple to establish the mapping between the pixels in the camera and the pixels in the projector. In this special case, the transformation induced by the plane is known as a homography; it is linear in homogeneous coordinates.

Mechanical configuration

For the concept to work, the position of the camera must remain fixed relative to the projector for the duration of the experiment. For this purpose, we designed a rigid support for the RPP and the camera in FreeCAD and 3D printed it. We used PLA filament for the print, as we do not need the structure to maintain its position for a very long period. The result is shown in the next figure.

Optical configuration

For the optical configuration, we need to decide on the focal length of the camera lens. We need to image a scene of approximately 300mm at a distance of 400mm. The Pi HQ camera used in this setup integrates a Sony IMX477, a 12MP rolling-shutter CMOS sensor with a resolution of 4056x3040 and 1.55um pixels. Thus, we can find the required focal length f with simple trigonometry, using similar triangles, as shown on the left below. The large triangle formed by the scene and the distance to the scene is similar to the smaller triangle formed by the image plane, whose dimension is known, and the unknown focal length. Solving for f gives a focal length of about 8.1mm (f = 3.042*400/150 = 8.11mm).

Thus, a lens with a focal length of approx. 8mm will result in an appropriate FoV (Field of View). So we ordered the Arducam 8mm CS-mount lens, compatible with the 1/2.3" format of the CMOS sensor. It is important for the lens to match the sensor format, to avoid dark corners or incomplete images.
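The similar-triangle computation above can be written as a few lines of Python. This is only a sketch using the numbers quoted in the text (the 3.042mm half image-plane dimension is taken as given from the post); the variable names are mine.

```python
import math

# Similar triangles: sensor_half / f = scene_half / distance
sensor_half_mm = 3.042   # half image-plane dimension, value used in the post
scene_half_mm = 150.0    # half of the ~300 mm scene
distance_mm = 400.0      # camera-to-scene distance

focal_mm = sensor_half_mm * distance_mm / scene_half_mm
fov_deg = 2.0 * math.degrees(math.atan(scene_half_mm / distance_mm))

print(f"focal length: {focal_mm:.2f} mm")   # 8.11 mm
print(f"full field of view: {fov_deg:.1f} deg")
```

Changing scene_half_mm or distance_mm gives a quick way to see which off-the-shelf focal length is closest for another working distance.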

For projectors, the field of view is specified by the throw ratio rather than the focal length. It is defined as the ratio of the projection distance to the projected image width. For example, the evm2000 has a nominal throw ratio of 1.6 (an image 1m wide at a distance of 1.6m). However, the ratio can be different when working at close range, since the focus adjustment changes the magnification of the optics. We measured a value of 2.2 at 400mm (400mm distance and 180mm image width), as shown in the next pictures:

The projector is 400mm from the wall

The projected image is very small, having a width of 180mm

With a close-up lens, the image widens to 250mm, which is more practical. However, it is dimmer because a larger total area is illuminated, and the budget lens shows strong reflections.

This results in a very small image at 400mm. Since we can't move the projector further away (space restrictions), we need to find a way to decrease the throw ratio. Fortunately, there is a way: a type of lens, the close-up lens, can be added to an optical system to change its focal length and magnification. On the web, these lenses are popular for smartphones, to take wide-angle and macro photos.

In our case, we need to change the magnification, so we used a low-cost 0.63x wide-angle adapter lens for smartphones. We had to do some MacGyvering to attach the close-up lens. This recovers the original throw ratio of 1.6. Of course, as in all optical design, there is a drawback. Here, the scene illuminance (the light received per unit area of the scene surface) is reduced because the light is distributed over a larger surface. There are also reflections in the lens that reduce the efficiency of the system. Still, this leaves sufficient intensity. We tried other magnification factors, including a fisheye lens that gave a very wide image, but its distortion and intensity were not acceptable.
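As a quick check of the numbers above, the throw ratio is just distance divided by image width. The small sketch below uses only the measurements quoted in the post (400mm distance, 180mm and 250mm image widths); the function name is mine.

```python
def throw_ratio(distance_mm, image_width_mm):
    """Throw ratio = projection distance / projected image width."""
    return distance_mm / image_width_mm

bare = throw_ratio(400, 180)          # projector alone at 400 mm
with_adapter = throw_ratio(400, 250)  # with the wide-angle adapter

print(f"bare projector: {bare:.2f}")        # 2.22
print(f"with adapter:   {with_adapter:.2f}")  # 1.60
print(f"image is {250 / 180:.2f}x wider with the adapter")
```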

The surface illuminated with the nominal throw ratio of 1.6 is too small

The close-up lens brings the throw ratio to 1.6, resulting in an acceptable illuminated surface

The low-cost smartphone lens kit that was used in the experiment

With the setup complete, we acquired an image with the pi, shown in the next figure. The FoV is as anticipated:

Image taken at 400mm from the wall by the 12MP Pi HQ camera fitted with the 8mm lens. The image shows the desktop projected by the dlp2000 through the close-up lens.


The homography gives a linear mapping (in homogeneous coordinates) between the camera and the projector. So, when pointing a laser at the plane, it is possible to detect the point in the camera image, map it to the projector image, and trace a path with the projector that follows the laser. Since the mapping is linear, it does not directly model the distortion of the camera and projector lenses. Depending on the quality of the mapping, we might need to estimate that distortion.

Software - Projection

To project an image, we write directly to the framebuffer of the pi. This avoids the desktop and the frame around windows, and the image is always full screen. In our case, the projector is at fb0, with dimension 640x360. So we need to create a python numpy array mapped to the framebuffer. The image format of the framebuffer is BGR565 (or is it RGB565?); in this format, a pixel is coded on 16 bits (red 5 bits, green 6 bits, blue 5 bits). I created a mapping using np.memmap, as follows:
mappedBuffer = np.memmap('/dev/fb0', dtype=np.uint8, mode='w+', shape=(360, 640, 2 ))

When we need to draw the image using opencv, all that is needed is to create the image and convert it to BGR565 before writing it to the buffer with mappedBuffer[:] = img. For example, this draws a white cross with a frame:

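img = np.zeros([360, 640, 3], np.uint8)  # 3 channels (BGR): cvtColor below needs a colour image
cv2.rectangle(img, (0, 0), (639, 359), (255, 255, 255), 1)  # white frame
cv2.drawMarker(img, (320, 180), (255, 255, 255), markerType=cv2.MARKER_CROSS, markerSize=100, thickness=1)  # white cross
img_ = cv2.cvtColor(img, cv2.COLOR_BGR2BGR565)  # 2 bytes per pixel, shape (360, 640, 2)
mappedBuffer[:] = img_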
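To make the 16-bit layout concrete, here is a small numpy sketch that packs a BGR image by hand. It assumes red sits in the 5 most significant bits and blue in the 5 least significant bits; whether the framebuffer really uses that order should be checked on the device, for example by projecting a pure red image. The function name pack_565 is mine.

```python
import numpy as np

def pack_565(img_bgr):
    """Pack an (H, W, 3) uint8 BGR image into (H, W) uint16 values.

    Assumed bit layout (most to least significant): RRRRR GGGGGG BBBBB.
    """
    b = img_bgr[..., 0].astype(np.uint16)
    g = img_bgr[..., 1].astype(np.uint16)
    r = img_bgr[..., 2].astype(np.uint16)
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

# One row of four BGR pixels: white, red, green, blue
pixels = np.array([[[255, 255, 255], [0, 0, 255], [0, 255, 0], [255, 0, 0]]],
                  dtype=np.uint8)
packed = pack_565(pixels)
print([hex(int(v)) for v in packed[0]])  # ['0xffff', '0xf800', '0x7e0', '0x1f']
```

On a little-endian machine such as the pi, packed.view(np.uint8).reshape(h, w, 2) gives a two-bytes-per-pixel array with the same shape as the memory-mapped buffer.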

Finding the homography mapping 

To obtain the mapping between the projector and camera images, we need at least 4 corresponding points. We create an image containing a single corner, project it at different positions, and detect it in the camera image using the ASN detector implemented in a previous blog post (Detecting image corner with ASN). The method is slow, since only one corner is projected at a time, but the correspondence is direct.

Projected image (640x360)

Image captured by the Pi HQ camera

Corners detected by ASN. We will need to filter out the corners that are not of interest, to keep only the one detected at the center.

To simplify the image analysis, the first image to be projected is completely white and the second is completely black. By finding the pixels that are significantly brighter in the white image than in the black image, we can find the portion of the camera image covered by the projector.
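A minimal sketch of this white/black comparison, assuming grayscale camera images stored as uint8 numpy arrays. The function name and the min_gain threshold are hypothetical and would need tuning on the real setup.

```python
import numpy as np

def projector_mask(white_img, black_img, min_gain=30):
    """Boolean mask of camera pixels that get clearly brighter under white projection.

    min_gain is a hypothetical threshold in gray levels.
    """
    # Work in a signed type so the subtraction cannot wrap around
    diff = white_img.astype(np.int16) - black_img.astype(np.int16)
    return diff > min_gain

# Synthetic example: ambient level 10, projector raises a rectangle to 200
black = np.full((240, 320), 10, np.uint8)
white = black.copy()
white[60:200, 40:280] = 200

mask = projector_mask(white, black)
print("pixels covered by the projector:", int(mask.sum()))
```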

To have some redundancy in the homography estimation, we project a grid of 18 corners at different positions. To find the relevant corner in the image, we search for a corner that is inside the projected portion. To avoid corners on the boundary, we ensure that all the pixels inside a small radius centered on the corner are part of the portion belonging to the projector. To speed things up, we changed the camera resolution to 320x240. This is more than enough for this project and allows a much better framerate. This creates two lists of points, camera_points and projector_points. The homography between the points is a 3x3 matrix that can be computed with opencv.

We could also estimate the homography without opencv, since the problem is linear. Each pair of corresponding points (p0, p1) satisfies the equation:

[p1]x H p0^t = 0

where [p]x is the skew-symmetric matrix of p, and the exponent "t" means transpose. Each correspondence gives 3 rows (2 of them independent) that are linear in the 9 entries of H. Simply stack them into a 3n x 9 matrix and do an SVD decomposition. The solution H is the right singular vector associated with the smallest singular value, reshaped to 3x3. In all this, the points are in homogeneous coordinates, meaning a 1 is added as 3rd value, p=[x,y,1]. Once you have solved for H, that's it: H is the mapping between the camera and the projector. If pc is a point in the camera, the corresponding point pp in the projector is given as

pp = H pc^t
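The linear estimation described above can be sketched in numpy as follows. This is an illustration rather than the code used in the project: the function names are mine, the points are synthetic, and H_true reuses the rounded matrix reported in this post only to generate test data. The mapping function also divides by the third homogeneous coordinate.

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix [p]x such that [p]x q = p x q (cross product)."""
    x, y, z = p
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def estimate_homography(p0, p1):
    """Estimate H such that p1 ~ H p0, from (n, 2) arrays with n >= 4."""
    blocks = []
    for (x0, y0), (x1, y1) in zip(p0, p1):
        a = np.array([[x0, y0, 1.0]])       # p0 as a 1x3 row
        b = np.array([x1, y1, 1.0])         # p1 in homogeneous coordinates
        blocks.append(np.kron(skew(b), a))  # 3x9 block of [p1]x H p0 = 0
    A = np.vstack(blocks)                   # (3n, 9), H flattened row by row
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)                # smallest singular value
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (n, 2) points through H and divide by the third coordinate."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

H_true = np.array([[2.702e+00, -4.147e-03, -5.328e+01],
                   [-1.441e-02, 2.614e+00, -1.778e+02],
                   [9.592e-05, -2.595e-06, 1.000e+00]])
pts_cam = np.array([[x, y] for x in (50, 80, 110, 140, 170, 200)
                    for y in (100, 130, 160)], dtype=float)
pts_proj = apply_homography(H_true, pts_cam)

H_est = estimate_homography(pts_cam, pts_proj)
print(np.abs(apply_homography(H_est, pts_cam) - pts_proj).max())
```

With noisy measurements, normalizing the point coordinates before building the matrix usually improves the conditioning of the SVD.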

Be sure to normalize pp by its 3rd coordinate to go back to non-homogeneous coordinates; in python, p /= p[2]. We can check the result by comparing the image points transferred to the projector image with the points that were actually projected. I will show the coordinates I obtained with this particular setup, at its particular position w.r.t. the plane, just to give an idea of the type of values to expect. The coordinates and the homography become invalid as soon as the setup changes. In this example, I was using a camera resolution of 320x240 and a projector resolution of 640x360.


camera_points (corners detected in the 320x240 camera image)

[[ 49.909  98.843]
 [ 49.654 129.373]
 [ 49.845 159.842]
 [ 79.386  99.298]
 [ 79.284 130.237]
 [ 79.466 161.02 ]
 [109.459  99.735]
 [109.48  130.938]
 [109.68  161.92 ]
 [139.966 100.008]
 [140.022 131.382]
 [140.12  162.423]
 [170.498 100.11 ]
 [170.568 131.348]
 [170.546 162.397]
 [200.755 100.051]
 [200.872 131.097]
 [200.789 161.899]]

H@camera_points (camera points transferred to the 640x360 projector image)

[[ 80.783  79.422]
 [ 79.979 158.874]
 [ 80.371 238.167]
 [159.613  79.959]
 [159.226 160.243]
 [159.595 240.127]
 [239.582  80.431]
 [239.529 161.167]
 [239.951 241.335]
 [320.241  80.471]
 [320.286 161.413]
 [320.442 241.505]
 [400.499  80.067]
 [400.59  160.427]
 [400.435 240.314]
 [479.583  79.261]
 [479.801 158.895]
 [479.495 237.923]]

projector_points (corners drawn in the projector image)

[[ 80  80]
 [ 80 160]
 [ 80 240]
 [160  80]
 [160 160]
 [160 240]
 [240  80]
 [240 160]
 [240 240]
 [320  80]
 [320 160]
 [320 240]
 [400  80]
 [400 160]
 [400 240]
 [480  80]
 [480 160]
 [480 240]]

H@camera_points - projector_points

[[ 0.783 -0.578]
 [-0.021 -1.126]
 [ 0.371 -1.833]
 [-0.387 -0.041]
 [-0.774  0.243]
 [-0.405  0.127]
 [-0.418  0.431]
 [-0.471  1.167]
 [-0.049  1.335]
 [ 0.241  0.471]
 [ 0.286  1.413]
 [ 0.442  1.505]
 [ 0.499  0.067]
 [ 0.59   0.427]
 [ 0.435  0.314]
 [-0.417 -0.739]
 [-0.199 -1.105]
 [-0.505 -2.077]]

Quiver plot of the error H@camera_points - projector_points

H= [[ 2.702e+00 -4.147e-03 -5.328e+01]
 [-1.441e-02  2.614e+00 -1.778e+02]
 [ 9.592e-05 -2.595e-06  1.000e+00]]
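As a sanity check, the residuals can be recomputed from the numbers listed above. The sketch below uses the rounded H as printed, so the errors differ slightly (by roughly a tenth of a pixel) from the listed ones, which were computed with the full-precision matrix.

```python
import numpy as np

# Corners detected in the 320x240 camera image (values listed above)
camera_points = np.array([
    [49.909, 98.843], [49.654, 129.373], [49.845, 159.842],
    [79.386, 99.298], [79.284, 130.237], [79.466, 161.020],
    [109.459, 99.735], [109.480, 130.938], [109.680, 161.920],
    [139.966, 100.008], [140.022, 131.382], [140.120, 162.423],
    [170.498, 100.110], [170.568, 131.348], [170.546, 162.397],
    [200.755, 100.051], [200.872, 131.097], [200.789, 161.899],
])
# Grid drawn in the 640x360 projector image, same ordering as the camera points
projector_points = np.array([[x, y] for x in range(80, 481, 80)
                             for y in (80, 160, 240)], dtype=float)
# Homography as printed above (rounded to 4 significant digits)
H = np.array([[2.702e+00, -4.147e-03, -5.328e+01],
              [-1.441e-02, 2.614e+00, -1.778e+02],
              [9.592e-05, -2.595e-06, 1.000e+00]])

mapped_h = np.hstack([camera_points, np.ones((len(camera_points), 1))]) @ H.T
mapped = mapped_h[:, :2] / mapped_h[:, 2:3]   # normalize by the 3rd coordinate
error = mapped - projector_points

print("max |error| (px):", np.abs(error).max())
print("RMS error (px):", np.sqrt((error ** 2).sum(axis=1).mean()))
```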

There is significant error remaining, close to 2 pixels. This is due to lens distortion. It can be modeled with radial and tangential distortion terms estimated by non-linear optimisation. Typically I use the Levenberg-Marquardt algorithm, which is implemented in scipy, to estimate the parameters of this model. However, we will ignore it for now.

Real time Mapping

This section shows how to do the mapping between the camera and the projector in "real time", and draw the line that follows the pointer. It's not really real time, it's more "interactive time". The first step is to create a fast acquisition thread. The camera runs continuously using the python picamera2 package. We created a class, derived from threading.Thread, that is in charge of the camera and returns images on demand. I found it to be faster than calling the picamera2 capture_array() in a loop in the main thread.

The camera exposure time is set to 0.5ms. At this speed, we only see the laser pointer in the camera image, and not the overlaid projection. This is possible because the laser is orders of magnitude brighter than the projector. A typical image is shown in the next figure:
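A minimal sketch of such an acquisition thread is shown below. It is not the class used in the project: the names are mine, and the capture function is passed in as a parameter so the example runs without a camera. On the pi, the assumption is that a started Picamera2 object's capture_array method would be passed instead of fake_capture.

```python
import threading
import time

class FrameGrabber(threading.Thread):
    """Background thread that keeps only the most recent frame."""

    def __init__(self, capture_fn):
        super().__init__(daemon=True)
        self._capture_fn = capture_fn          # e.g. picam2.capture_array (assumption)
        self._frame_lock = threading.Lock()
        self._latest_frame = None
        self._frame_id = 0
        self._keep_running = threading.Event()
        self._keep_running.set()

    def run(self):
        while self._keep_running.is_set():
            frame = self._capture_fn()         # blocks until a frame is available
            with self._frame_lock:
                self._latest_frame = frame
                self._frame_id += 1

    def latest(self):
        """Return (frame counter, most recent frame) without waiting for the camera."""
        with self._frame_lock:
            return self._frame_id, self._latest_frame

    def stop(self):
        self._keep_running.clear()
        self.join(timeout=1.0)

def fake_capture():
    """Stand-in for the camera: pretends a frame takes 1 ms to arrive."""
    time.sleep(0.001)
    return "frame"

grabber = FrameGrabber(fake_capture)
grabber.start()
time.sleep(0.05)
frame_id, frame = grabber.latest()
grabber.stop()
print("frames grabbed so far:", frame_id)
```

The frame counter returned by latest() can double as the timestamp attached to each detected point.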

Acquisition of the laser pointer by the camera with exposure time of 0.5ms

Zoom of the laser point

The detection is fairly simple, since the only pixels above 0 are the pixels belonging to the laser spot. I used the FAST detector of opencv to find the dot. It's fast enough for this application and runs in under 5ms on a 320x240 image on a raspberry pi 4. When a point is captured, it is mapped to the projector image and added to a list of coordinates. A function is then called to draw line segments between the points.

To avoid connecting points where the pointer stopped and started again, we added a timestamp to each coordinate. When the gap between the timestamps of two consecutive points is too big, let's say 5 frames, we don't draw the segment. Furthermore, to avoid redrawing the same segments more than once, the timestamp is also used as a flag stating that the points are already connected in the image.

To add some cool colors to the drawing (this was the first suggestion from young users), each segment is drawn with a random RGB color generated with randint of numpy. Finally, when the laser points at the top left corner of the projector image, we clear the list of points and the image. This made the program much friendlier to use.
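The detection and mapping step might look like the sketch below. Note that it swaps in a simpler technique than the one used in the post: an intensity-weighted centroid of the non-zero pixels instead of the opencv FAST detector, so that the example needs only numpy. The function names and the threshold are mine.

```python
import numpy as np

def detect_spot(gray, threshold=0):
    """Intensity-weighted centroid (x, y) of pixels above threshold, or None."""
    ys, xs = np.nonzero(gray > threshold)
    if xs.size == 0:
        return None                      # no laser spot in this frame
    weights = gray[ys, xs].astype(np.float64)
    total = weights.sum()
    return float((xs * weights).sum() / total), float((ys * weights).sum() / total)

def camera_to_projector(H, x, y):
    """Map a camera pixel to the projector image with the homography H."""
    p = H @ np.array([x, y, 1.0])
    p /= p[2]                            # back to non-homogeneous coordinates
    return float(p[0]), float(p[1])

# Synthetic short-exposure frame: black everywhere except a 5x5 laser spot
frame = np.zeros((240, 320), np.uint8)
frame[118:123, 158:163] = 255

spot = detect_spot(frame)
print("spot in camera image:", spot)     # (160.0, 120.0)
```

This assumes a single spot per frame; with several bright blobs, a connected-component step would be needed first.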