
1 Mosaicing Video Sequences

2001


Mosaicing Video Sequences

Arnon Netzer, Craig Gotsman
Computer Science Dept., Technion - Israel Institute of Technology, Haifa 32000, Israel

Abstract

With the advent of cheap, but relatively low-resolution, video sensors, the importance of automatically mosaicing many small video images to one large image has increased. This is due mainly to the many useful applications that may be based on such technology, such as hand-held mobile scanning, multi-resolution imaging, panoramic spreads and video compression. The process of creating a mosaiced image consists of first finding the geometric registration of each small image from the sequence to a global image plane, and then combining all the registered images into one smooth, pleasing image. We review two existing methods for finding the registration: sequential and canvas mosaicing, and discuss the advantages and disadvantages of each of those algorithms. After suggesting two methods to improve the reliability and the accuracy of the basic sequential algorithm, we present our algorithm, which combines and enhances the sequential and canvas methods. We demonstrate how our algorithm overcomes most of the pitfalls of each of the two simpler algorithms. Finally, we present a novel approach to combining the registered images utilizing both geometrical and content information in order to improve the visual quality of the mosaic.

Keywords: Image mosaicing, image registration.

Contact Author: Craig Gotsman, Computer Science Dept., Technion - Israel Institute of Technology, Haifa 32000, Israel. Email: gotsman@cs.technion.ac.il Phone: +972-4-8294336 Fax: +972-4-8294353

1. Introduction

With the advent of cheap, but relatively low-resolution, video sensors, the importance of automatically mosaicing many small video images to one large image has increased.
This is due mainly to the many useful applications that may be based on such technology, such as hand-held mobile scanning [1], multi-resolution imaging [2,3,17], creating high resolution stills from videos [4], panoramic spreads [5,6,13] and video compression [7].

In the general case, mosaicing two arbitrary images of a three dimensional scene is not simple. Green and Heckbert [8] showed how to do this when the camera movement is given. Several works explore finding this 3D camera movement from the images themselves [14,15]. Jaillon and Montavert [16] show how to register when knowledge of the three dimensional structure of the scene is available.

Since a true 3D solution is difficult, in many cases a 2D registration between pixels is sought. This may be found by optical flow methods [18,23], but these do not produce registrations accurate enough for video mosaicing. Aiger and Cohen [19] used an iterative algorithm to improve the registration accuracy of the optical flow solution. Such solutions are common in three dimensional imaging for medical applications, where the quality of the source images is poor to begin with and the accuracy of the registration is less important. Herman and Peleg [9] suggested registering each image in the sequence to its predecessor without reconstructing depths or transforming the images to the same plane. This can in some cases produce a pleasing result, but usually results in a distorted image of the scene.

In the cases where the registration between the images can be described as one global 2D projective transformation, the problem can be solved much more accurately. Among those cases are all the scenes consisting of planar objects (e.g. documents, blackboards, satellite pictures etc.) and panoramic spreads. Mann [10] showed how to mosaic two images taken from a planar scene using a transformation with eight parameters.

Most works to date deal with the mosaicing of a small number of large images.
In this paper we consider the case of a large number of small images, a scenario in which accumulated error may be significant, and an arbitrary camera trajectory may result in overlaps between arbitrary images of the sequence.

There are two basic approaches for generalizing a two-image registration algorithm to a longer sequence of images. One is sequential mosaicing [11], in which each image is registered to the one before it in the sequence. Those registrations are then accumulated to produce a transformation between each image and the global image plane. The second method, called canvas mosaicing [12], registers each image directly to the canvas - the “big” image which is gradually being constructed from the smaller images on the global image plane.

The sequential method encounters several difficulties. If a single image in the sequence is corrupted, or lacks sufficient information for a successful registration, the sequence is broken. In this case, the best result possible is an estimate based on previous trends. Furthermore, if a single registration is inaccurate, the error is propagated through the rest of the sequence. This problem is amplified when the family of allowed transformations is rich. Other problems arise from the accuracy of the registration algorithm. Since registration involves solving a non-linear optimization problem, even sophisticated algorithms such as the Levenberg-Marquardt procedure [22] are susceptible to the classical local minima and plateau pitfalls. Moreover, when the image sequence is the result of a simple sensor translation, numerical errors may disguise it as a more complex transformation (e.g. scaling), confusing the algorithm.

The advantage of the canvas method is that each new image is registered in a way that produces the best “global” fit, thus resulting in an image which is smoothest to the eye.
However, since errors may have accumulated in the registration of previous images, it might become impossible, after a while, to properly register a new image which overlaps a region previously imaged to the canvas. This is because that canvas area may no longer be a clean consistent image, but rather a mosaic of many (possibly incorrectly) registered images. Encountering such a scenario can mislead the algorithm totally.

Experience shows that neither of the two mosaicing methods works very well in practice. In order to devise a more robust algorithm, we enhance each of the methods and then combine the two, thus minimizing their respective difficulties and capitalizing on their individual strengths. We then suggest ways for combining the registered images into a “big” image in a visually pleasing manner.

The rest of this paper is organized as follows: Section 2 describes the basic registration procedure for two images. Section 3 describes the problems encountered by these simple methods when extending them to image sequences, and our solutions to these problems. In Section 4 we propose a mosaicing method based on the image content to minimize seams. We conclude in Section 5.

2. Registering Two Images

When registering two images $I$ and $I'$ taken from a scene containing planar objects, the registration may be approximated well by a 3 x 3 projective transform matrix:

$$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{pmatrix} \begin{pmatrix} x \\ y \\ w \end{pmatrix}$$

Since the projective transform is unique up to a scaling factor, the matrix can be reduced to an eight parameter transformation:

$$x'(x_i, y_i) = \frac{m_0 x_i + m_1 y_i + m_2}{m_6 x_i + m_7 y_i + 1}, \qquad y'(x_i, y_i) = \frac{m_3 x_i + m_4 y_i + m_5}{m_6 x_i + m_7 y_i + 1} \tag{2.1}$$

The transformation parameter vector $m$ is found by minimizing the intensity error function between the two images:

$$E(I, I') = \sum_i e_i^2 = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 \tag{2.2}$$

where $x'_i, y'_i$ are as in (2.1).
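As an illustration of (2.1) and (2.2), the following minimal sketch evaluates the eight-parameter transform and the intensity error for a given parameter vector. The function names, the use of NumPy, the nearest-neighbour sampling and the restriction of the sum to pixels that land inside $I'$ are all assumptions made for brevity; the paper does not specify these details.

```python
import numpy as np

def warp_points(m, x, y):
    """Apply the eight-parameter transform (2.1) to coordinates x, y."""
    m0, m1, m2, m3, m4, m5, m6, m7 = m
    denom = m6 * x + m7 * y + 1.0
    return (m0 * x + m1 * y + m2) / denom, (m3 * x + m4 * y + m5) / denom

def intensity_error(img, img_prime, m):
    """Sum of squared intensity differences (2.2).

    Assumption: nearest-neighbour sampling, and only pixels of img whose
    transformed position falls inside img_prime contribute to the sum.
    Returns (error, number of contributing pixels).
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xp, yp = warp_points(m, xs.astype(float), ys.astype(float))
    xi = np.rint(xp).astype(int)
    yi = np.rint(yp).astype(int)
    hp, wp = img_prime.shape
    valid = (xi >= 0) & (xi < wp) & (yi >= 0) & (yi < hp)
    diff = img_prime[yi[valid], xi[valid]].astype(float) - img[valid].astype(float)
    return float(np.sum(diff ** 2)), int(valid.sum())
```

For a pure translation of three pixels along x, the parameter vector would be `[1, 0, 3, 0, 1, 0, 0, 0]`; a practical implementation would probably interpolate intensities rather than round coordinates.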
Minimizing (2.2) is based on the assumption that the minimum is achieved when the images are best registered [18]. The error function can be minimized using the Levenberg-Marquardt method [22]. To use this method, derivatives of $e_i$ are calculated for each of the projective matrix parameters $m_0 \ldots m_7$. From these derivatives a weighted gradient vector $b$ is calculated, as well as an approximation of the Hessian matrix $A$. In order to avoid calculating second order derivatives, the product of first order derivatives is used as an approximation of $A$. At each iteration, the projective matrix parameters are updated by $\Delta m = (A + \lambda I)^{-1} b$, with $\lambda$ decreasing near the minimum, giving more weight to $A$. The advantage of the Levenberg-Marquardt method is its ability to advance in the gradient direction when it is far from the minimum, and to take the curvature into account when approaching the minimum.

Like all non-linear minimization techniques, this method converges to a local minimum, hence it must be initialized with a “reasonable” initial guess. It is common to use a transformation consisting of translation only as this initial guess ($m_0 = m_4 = 1$, and of the remaining parameters only $m_2$ and $m_5$ are non-zero), implicitly assuming that the translation parameters in the transformation matrix are significantly larger than the others. This translation may be found efficiently using the multi-pyramid method [23].

3. Registering an Image Sequence

This section elaborates on how the basic two-image registration method is extended to handle a long image sequence.

3.1. The “Sequential” Algorithm

In the sequential extension of the basic two-image registration procedure, each image $I_i$ is registered to its predecessor $I_{i-1}$ only. Denote this transformation by $T_i$. These are accumulated to produce the transformation of each image to the canvas: $TC_0 = I$ (the identity) and $TC_i = TC_{i-1} * T_i$ for $i \ge 1$ (see Figure 3.1).

Figure 3.1: The sequential algorithm: Each image $I_i$ is registered to its predecessor only.
The registrations are accumulated to produce the transformation of each image to the canvas, $TC_i = TC_{i-1} * T_i$.

The drawback of this method is that when an inaccuracy is introduced during the registration process, this inaccuracy or “noise” affects not only the current registration, but all the following ones (see Figure 3.4(a)). Furthermore, if one registration fails due to a “bad” image, or an image lacks sufficient information for a successful registration, the sequence is broken. In this case, the best result that can be hoped for is an estimate based on previous trends.

3.2. The “Window” Algorithm

To improve the sequential algorithm, we propose to broaden the base of the registration. Since the source images originate in a video sequence, in all probability each image overlaps more than one predecessor. At any given time $i$, consider a window of $2n+1$ images, where $n$ is the history “depth”. The window is centered on image $I_i$, so the window contains images $I_{i-n} \ldots I_i \ldots I_{i+n}$. When treating image $I_i$, the registrations of $I_{i-n} \ldots I_{i-1}$ to the canvas have already been determined. For each new image $I_{i+n}$ entering the window, $n$ registrations $T_{i+n}^j$ are calculated to the images $I_j$, $j = i \ldots i+n-1$ (see Figure 3.2).

Figure 3.2: The window algorithm: A window of $2n+1$ images for $n = 3$. For the images $I_{i-n} \ldots I_{i-1}$ (the dotted squares) the registration has already been determined. For each new image $I_{i+n}$ entering the window, $n$ registrations are calculated.

In addition, for each such transformation an error measure $E$ is calculated - the average of the squares of the intensity differences between the two images:

$$E_i^j = \frac{1}{m} \sum_{k=1}^{m} \left[ I_i(x_k, y_k) - I_j\left(T_i^j(x_k, y_k)\right) \right]^2 \tag{3.1}$$

where $k$ iterates through all the $m$ pixels in the overlap of the two images, and the transformed coordinates are calculated using (2.1). Each registration with an error measure $E$ greater than a given threshold is considered to be a failure.

Once all the registrations for the image $I_{i+n}$ against each of the images $I_i \ldots I_{i+n-1}$ have been computed, transformations to the canvas $TC_i \ldots TC_{i+n}$ are calculated. Each of these is calculated by weighing all the relevant registrations, namely, all the registrations whose $E$ is under a fixed threshold (see Figure 3.3):

$$TC_i = \frac{\displaystyle\sum_{j=i-n}^{i+n} TC_j * T_i^j \,/\, E_i^j}{\displaystyle\sum_{j=i-n}^{i+n} 1 / E_i^j} \tag{3.2}$$

Note that $T_i^j = (T_j^i)^{-1}$, namely that a transformation to the future is the inverse of the transformation to the past of the same images, hence there is no need to calculate it again. At this point we have all the possible relevant registrations for the image $I_i$, and a new image entering the window will not affect the transformation $TC_i$. The image $I_i$ may then be merged into the canvas and the window advanced by one image.

Figure 3.3: When $n = 3$ the relevant registrations for the image $I_{i+1}$ are three transformations to the past and two to the future.

3.2.1. Experimental Results

The window algorithm proved to be especially effective in its ability to “skip” over “bad” images. When one image in the sequence cannot be registered at all, this image is not merged into the canvas, and the next image relies on a registration deeper into the history for calculating its transformation. This leads to an overall improvement in the accuracy of the mosaic (see Figure 3.4).

Figure 3.4: The window mosaicing algorithm applied to a sequence of 200 images (70x100 pixels each). The order of the sequence is from the top-left clockwise. Towards the end of the sequence the images return to an area in the canvas containing earlier information. (a) 1-image window. Notice the error accumulated during the mosaicing process. (b) 3-image window. The improvement in the registration accuracy relative to (a) is evident.
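The weighting in (3.2) can be sketched as follows. This is only an illustration under stated assumptions: transformations are 3 x 3 NumPy matrices, each candidate product is normalized so that its bottom-right entry is 1 before averaging (the paper does not say how the scale ambiguity is handled), and the function name, the `eps` guard and the argument layout are hypothetical.

```python
import numpy as np

def combine_registrations(candidates, threshold, eps=1e-12):
    """Inverse-error weighted average of candidate canvas transforms, as in (3.2).

    candidates: iterable of (TC_j, T_ij, E_ij) where TC_j maps image j to the
    canvas, T_ij is the registration of image i to image j, and E_ij is the
    error measure (3.1). Registrations with E_ij above the threshold are
    treated as failures and ignored. Returns None if every one failed.
    """
    numerator = np.zeros((3, 3))
    denominator = 0.0
    for tc_j, t_ij, err in candidates:
        if err > threshold:
            continue  # failed registration, skip it
        m = tc_j @ t_ij
        m = m / m[2, 2]  # assumption: fix the projective scale before averaging
        weight = 1.0 / max(err, eps)
        numerator += weight * m
        denominator += weight
    if denominator == 0.0:
        return None
    return numerator / denominator
```

Returning `None` when all registrations fail corresponds to the case described above in which a “bad” image is simply not merged into the canvas.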
Experiments with windows of different sizes showed a significant improvement in the overall registration accuracy as $n$ was increased from 1 to 3, and almost no improvement at all when increased beyond 5. This supports the theory that part of the inaccuracy in the algorithm is white noise introduced by numerical errors, so when averaged it converges rapidly to zero. Hence the significant gain for small values of $n$ and the marginal gain for larger ones.

The drawback of the window algorithm is the increase in time complexity. The computation time is $n$ times that of the basic sequential algorithm.

3.3. The “Multi-Stage” Algorithm

One of the problems encountered when using a non-linear optimization procedure is the plateau problem. In the vicinity of the minimum the derivatives become small, and inaccuracies may be introduced. Since the projective transformation has eight parameters, each of them can have an error of $\Delta m_i$ introduced to it due to inaccuracies in the minimization procedure. This error may not be significant when mosaicing two images, but it may become crucial when mosaicing long sequences of images. Furthermore, the long-term consequence of error is different for each parameter. While a small error in translation stays constant regardless of the number of registrations following it (it may even average to zero, assuming such errors have the characteristics of white noise), an error in rotation is amplified by the number of images following (see Figure 3.5).

Another related problem is the “disguise” problem. Sometimes two images may be registered in more than one way. Consider an image containing a horizontal line starting at the left end and continuing for one hundred pixels, and another image containing the same line, only 150 pixels in length. There is no way to know if the “correct” registration between the two is a translation of fifty pixels to the left, or a scaling along the x axis (see Figure 3.6).
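The different long-term behaviour of translation and rotation errors can be reproduced numerically with the sequential accumulation $TC_i = TC_{i-1} * T_i$. The sketch below is an illustrative experiment, not taken from the paper: the five-pixel step, the one-pixel and one-degree errors, the sequence length and all function names are assumptions. A single erroneous registration is injected at the first step, and the displacement of each image origin from its correct canvas position is measured.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def rotation(degrees):
    a = np.deg2rad(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def accumulate(pairwise):
    """Sequential accumulation: TC_0 = identity, TC_i = TC_{i-1} * T_i."""
    tc = [np.eye(3)]
    for t in pairwise:
        tc.append(tc[-1] @ t)
    return tc

def origin_drift(true_pairwise, noisy_pairwise):
    """Distance between the correct and the erroneous canvas position of each image origin."""
    origin = np.array([0.0, 0.0, 1.0])
    return [float(np.linalg.norm((a @ origin)[:2] - (b @ origin)[:2]))
            for a, b in zip(accumulate(true_pairwise), accumulate(noisy_pairwise))]

steps = 20
true_motion = [translation(5, 0)] * steps
# One bad registration at the first step, all later registrations exact.
with_translation_error = [translation(6, 0)] + true_motion[1:]
with_rotation_error = [translation(5, 0) @ rotation(1.0)] + true_motion[1:]

drift_translation = origin_drift(true_motion, with_translation_error)
drift_rotation = origin_drift(true_motion, with_rotation_error)
```

In this setup `drift_translation` stays at one pixel for every image after the faulty one, while `drift_rotation` grows in proportion to the number of images that follow.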
The problems described above can result in a situation in which, even though the input image sequence is the result of a simple sensor translation, numerical errors and disguise issues mislead the algorithm into yielding a more complex transformation. This results in an erroneous output even for simple inputs.

To deal with these problems, we suggest a multi-stage algorithm for computing the transformation. Categorize the possible transformations into five families with decreasing priorities: Translation, Rigid, Similarity, Affine and Projective.

Figure 3.5: The effect of an error in transformation parameter space. (a) Translation parameter error. The error is propagated to all the following images. The distance between each image and its correct position remains constant. (b) Rotation parameter error. The error is propagated to all the following images. The distance between each image and its correct position increases with time.

Figure 3.6: The “disguise” problem. There is no way of knowing whether the “correct” registration transformation between (a) and (b) is translation or scaling along the x axis.

Given an image $I$ to be registered to an image $I'$, five transformations are calculated, each under the constraints of its respective family. For each of these transformations, an error measure $E_{Translation}$, $E_{Rigid}$, $E_{Similarity}$, $E_{Affine}$ and $E_{Projective}$ is calculated using (3.1). $E_{final}$ is calculated as in (3.3), and the transformation corresponding to $E_{final}$ is chosen.

$$E_{final} = \min(\min(\min(\min(E_{Translation},\ E_{Rigid} * c),\ E_{Similarity} * c),\ E_{Affine} * c),\ E_{Projective} * c) \tag{3.3}$$

Since $c$ is a constant larger than one, a higher level family will be chosen only if it yields a significant improvement in the error measure.

3.3.1. Experimental Results

The multi-stage algorithm showed vast improvement in the cases when the correct transformation belonged to a simple family such as translation only (see Figure 3.7).
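The selection rule (3.3) can be written as a short routine. The sketch below follows the formula literally as printed, which amounts to a single minimum in which every family other than translation is penalized by the factor $c$; the function name, the dictionary layout, the default value of `c` and the tie-breaking toward the simpler family are assumptions for illustration.

```python
FAMILIES = ("translation", "rigid", "similarity", "affine", "projective")

def choose_family(errors, c=1.07):
    """Pick the transformation family according to (3.3).

    errors: dict mapping each family name to its error measure (3.1).
    c: penalty factor larger than one (the default is an assumed value).
    Returns (family, penalized error). Ties go to the simpler family.
    """
    best_family = FAMILIES[0]
    best_score = errors[best_family]
    for family in FAMILIES[1:]:
        score = errors[family] * c
        if score < best_score:
            best_family, best_score = family, score
    return best_family, best_score
```

For example, with errors of 100 for translation and 95 for projective, the penalized projective score is 101.65 and translation is kept; only a markedly lower projective error would switch the choice.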
The algorithm nevertheless retained the ability to handle transformations from more complex families (see Figure 3.8).

Figure 3.7: The multi-stage algorithm applied to a sequence of fifty 70x100 pixel images, where the actual transformation between each two successive images is approximately a five pixel translation. (a) Single-stage algorithm results. Note the shear that has crept in. (b) Multi-stage algorithm results.

Figure 3.8: The multi-stage algorithm applied to a “general” input - a sequence of forty 70x100 pixel images, where the actual transformation between each two successive images contains translation, rotation, shear and perspective elements.

The weakness of this algorithm is in the cases where there is no significant difference between the error measures of transformations from different families, but the correct transformation is indeed from a higher family. In these cases, it is very important to choose the right factor $c$. If $c$ is too big, it will not permit choosing from a higher family even when it is necessary. On the other hand, too small a factor will not filter out the noise. Experimenting with different values of $c$ showed best results when $1.05 < c < 1.10$.

As before, the drawback of this algorithm is the increase in complexity, which is basically multiplied by the number of families used.

3.4. The “Combined” Algorithm

As mentioned above, there are two approaches to generalizing the basic two-image registration procedure to a sequence of images. One is sequential mosaicing, in which each image is registered to its predecessor in the sequence. The other is the canvas method, which registers each image directly to the canvas.

Consider the case in which the image sequence creates a closed loop, and a new image $I_i$ is to be registered to a place in the canvas where the image $I_{i-k}$ was mapped to in the past (see Figure 3.9).

Figure 3.9: The image $I_i$ should be registered to a place in the canvas where the image $I_{i-k}$ was mapped in the past.

In sequential mosaicing, the algorithm continues to use only the information from the near history. The information from the image $I_{i-k}$ will not be taken into account. On the other hand, canvas mosaicing might incur a deadlock if the information from the near history and that on the canvas conflict. In this case there might be no transformation for $I_i$ consistent with both $I_{i-1}$ and $I_{i-k}$.

Our “combined” algorithm combines sequential and canvas mosaicing, utilizing their respective advantages and overcoming their disadvantages. First we calculate for each new image $I_i$ the transformation to the canvas $TC_i$ using the sequential algorithm. Based on this transformation an error measure $EC_i$ is calculated, much as in (3.1), only here the error is calculated between the image and the canvas:

$$EC = \sum_i \left[ I(x_i, y_i) - C(x'_i, y'_i) \right]^2 \tag{3.4}$$

where $i$ traverses all the pixels in $I$, $x'_i, y'_i$ are calculated as in Eq. (2.1) and $C$ is the canvas. The transformation $TC_i$ is then used as the initial guess for finding a registration $\widetilde{TC}_i$ directly to the canvas. An error measure $\widetilde{EC}_i$ is calculated for $\widetilde{TC}_i$ too. If this shows significant improvement over $TC_i$ ($\widetilde{EC}_i * c < EC_i$, where $c$ is a factor greater than 1) it is adopted. If not, $TC_i$ is adopted.

This method, however, does not take into account the fact that in video sequences the overlap between consecutive images is relatively large, thus most of the information in the canvas where $I_i$ is to be registered comes from its recent history. Hence, the combined algorithm will show little improvement over the sequential algorithm. This problem may be rectified by the following: At any given time $i$, two instances of the canvas are considered, one at time $i$, denoted by $C_i$, the other at time $i-n$, denoted by $C_{i-n}$.
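The acceptance test between the sequential and the canvas registration can be sketched as below. The two callables are hypothetical placeholders: `refine_to_canvas` stands for a registration against the canvas started from the sequential estimate, and `canvas_error` stands for the error measure (3.4); which canvas instance each of them consults is left to the caller. The default value of `c` is likewise an assumption.

```python
def combined_step(tc_sequential, refine_to_canvas, canvas_error, c=1.1):
    """Choose between the sequential and the canvas registration of one image.

    tc_sequential: transformation to the canvas from the sequential algorithm.
    refine_to_canvas: hypothetical callable, initial guess -> refined transformation.
    canvas_error: hypothetical callable, transformation -> error as in (3.4).
    c: factor greater than 1; the canvas result must beat the sequential
       error by this margin to be adopted.
    Returns (adopted transformation, its error).
    """
    e_sequential = canvas_error(tc_sequential)
    tc_canvas = refine_to_canvas(tc_sequential)
    e_canvas = canvas_error(tc_canvas)
    if e_canvas * c < e_sequential:
        return tc_canvas, e_canvas
    return tc_sequential, e_sequential
```

Passing the registration and error routines in as arguments keeps the decision rule separate from the choice of optimizer and of canvas snapshot.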
While the error measurements are calculated with respect to $C_i$, the canvas registration $\widetilde{TC}_i$ is calculated with respect to $C_{i-n}$, ensuring the utilization of “older” information.

3.4.1. Experimental Results

The combined algorithm uses the parameter $c$ to decide whether $TC_i$ or $\widetilde{TC}_i$ should be used. A large $c$ will ensure that the combined algorithm will not “harm” the sequential result. On the other hand, too large a $c$ will not allow using the canvas information. We found that useful values for $c$ are $1.0 < c < 1.15$. The parameter $n$ depends on the characteristics of the image sequences, and should be chosen in a manner that will leave $C_{i-n}$ with the relevant information. We found that values of $n$ between 5 and 25 produce good results (see Figure 3.10). Note that $n$ can be changed dynamically based on the transformation being calculated.

Figure 3.10: The combined algorithm applied to a sequence of fifty 70x100 pixel images, where the actual transformation between each two successive images is approximately a five pixel translation. The image stream was taken starting from the bottom right corner, moving left, up to the top left corner, and then moving right to the top right corner. (a) Sequential algorithm results. (b) Combined algorithm results.

4. Placing the Seams in Low Activity Regions

Our final contribution was triggered by the observation that registration errors in the mosaic are much more visible when they occur in regions containing significant image activity. The same error, occurring in a low activity region, may be practically invisible. It is not easy to devise a general algorithm that “tucks” registration errors into low activity regions. However, in some cases, a simple utilization of this principle may yield significant improvements. Such a case is the mosaicing of video-scanned text. In this case, all that is needed is to find the gap between the text lines, and to place the seams within that gap.
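One simple way to locate the gaps between text lines is a row-wise activity profile. The sketch below is not the authors' procedure, which the paper does not detail; it assumes a grayscale image with horizontal text lines, measures activity as the sum of absolute horizontal intensity differences in each row, and returns the middle row of every quiet run as a candidate seam position. The function names and the relative threshold are assumptions.

```python
import numpy as np

def row_activity(img):
    """Sum of absolute horizontal intensity differences in each row."""
    return np.abs(np.diff(img.astype(float), axis=1)).sum(axis=1)

def gap_rows(img, rel_threshold=0.05):
    """Return the middle row of every run of low-activity rows.

    A row is 'quiet' when its activity is at most rel_threshold times the
    largest row activity in the image (an assumed, tunable criterion).
    """
    activity = row_activity(img)
    limit = rel_threshold * activity.max()
    quiet = activity <= limit
    seams = []
    start = None
    for r, is_quiet in enumerate(quiet):
        if is_quiet and start is None:
            start = r                          # a quiet run begins
        elif not is_quiet and start is not None:
            seams.append((start + r - 1) // 2)  # middle of the finished run
            start = None
    if start is not None:                       # run reaching the last row
        seams.append((start + len(quiet) - 1) // 2)
    return seams
```

Skewed or curved text lines would need a more careful treatment than this row profile provides.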
Our experimental results show that the success rate of a typical OCR algorithm increased from approximately 85% to 99% when presented with an input generated by this improved mosaicing procedure (see Fig. 4.1).

5. Conclusion

The importance of video image mosaicing will continue to increase as cheaper and smaller sensors become available. In order for this technology to be really useful, it must yield good robust results, preferably at real-time rates. While the first requirement depends on algorithmic quality, the second will be addressed somewhat by the expected rapid increase in computing power over the next few years. It seems, therefore, that superior, but somewhat slow, algorithms are to be preferred over fast inferior ones. This is the reason we propose quality algorithms, even at the price of them being somewhat complex.

Our algorithms register based on intensity information. Better results might possibly be obtained if the individual color components are considered.

6. References

[1] Toshiba, VideoBrush Corporation, http://www.videobrush.com/
[2] M. Elad and A. Feuer, “Super-resolution restoration of continuous image sequence - adaptive filtering approach”, IEEE Trans. Image Proc., December 1995.
[3] M. Irani and S. Peleg, “Improving resolution by image registration”, GMIP(53), pp. 231-239, May 1991.
[4] S. Mann and R. Picard, “Constructing high quality stills from video”, IEEE Trans. Image Proc., pp. 13-16, November 1994.
[5] E. Chen, “QuickTime VR - An image-based approach to virtual environment navigation”, Proc. of SIGGRAPH, pp. 29-38, August 1995.
[6] L. McMillan and G. Bishop, “Plenoptic modeling: An image based rendering system”, Proc. of SIGGRAPH, pp. 39-46, August 1995.
[7] M. Irani, P. Anandan and S. Hsu, “Mosaic based representations of video sequences and their applications”, Proc. of IEEE ICCV, pp. 605-611, 1995.
[8] N. Green and P. Heckbert.
“Creating raster omnimax images from multiple perspective views using the elliptical weighted average filter”, IEEE CG&A, pp. 21-27, June 1986.
[9] S. Peleg and J. Herman, “Panoramic mosaic by manifold projection”, Proc. of CVPR, pp. 338-343, June 1997.
[10] S. Mann, “Composing multiple pictures of the same scene: Generalized large displacement 8-parameters motion”, IS&T, Cambridge, May 1993.
[11] R. Szeliski, “Video mosaics for virtual environments”, IEEE CG&A 13, pp. 22-30, 1996.
[12] US Patent No. 5649032, “System for automatically aligning images to form a mosaic image”, David Sarnoff Research Center, Inc., Princeton, NJ.
[13] A. Krishnan and A. Ahuja, “Panoramic image acquisition”, Proc. of IEEE CVPR, pp. 379-384, June 1996.
[14] M. Gleicher and A. Witkin, “Through-the-lens camera control”, Proc. of SIGGRAPH, pp. 331-340, July 1992.
[15] M. Irani, B. Rousso and S. Peleg, “Recovery of ego-motion using image stabilization”, Proc. of CVPR-94, pp. 39-45, June 1994.
[16] P. Jaillon and A. Montavert, “Image mosaicing applied to three-dimensional surfaces”, Proc. of IEEE CVPR, pp. 253-257, October 1994.
[17] P.J. Burt and E.H. Adelson, “A multiresolution spline with application to image mosaics”, ACM Transactions on Graphics 2(4):217-236, 1983.
[18] D.C. Barber, “Registration of low resolution medical images”, Phys. Med. Biol. 27(3), pp. 87-96, 1992.
[19] D. Aiger and D. Cohen, “Mosaicing ultrasonic volumes for visual simulation”, Tel Aviv University, Computer Science Dept. technical report.
[20] J.D. Foley, A. Van Dam, S.K. Feiner and J.F. Hughes, “Computer graphics: principles and practice”, Addison-Wesley, Reading, MA, 2nd Edition, 1990.
[21] P.S. Heckbert, “Fundamentals of texture mapping and image warping”, Masters Thesis, Dept. of EECS, UCB, Technical Report No. UCB/CSD 89/516, June 1989.
[22] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, “Numerical Recipes in C: The Art of Scientific Computing”, Cambridge University Press, Cambridge, England, second edition, 1992.
[23] L.H. Quam, “Hierarchical warp stereo”, in Image Understanding Workshop, pp. 149-155, December 1984.

Figure 4.1: Sequence of approximately 1000 images at resolution of 70x100 pixels. (a) Registration using the multi-stage algorithm with a 1-image window. The seams of the canvas are very evident in the text lines. Applying OCR to this resulted in recognition of only 85% of the characters. (b) Registration such that the seams are placed between the lines. Applying OCR to this resulted in recognition of 98% of the characters.