Open Research Online
The Open University’s repository of research publications
and other research outputs
A Bezier curve-based generic shape encoder
Journal Article
How to cite:
Sohel, Ferdous A.; Karmakar, Gour C.; Dooley, Laurence S. and Bennamoun, M. (2010). A Bezier curve-based generic shape encoder. IET Image Processing, 4(2), pp. 92–102.
For guidance on citations see FAQs.
© 2010 IEEE
Version: Accepted Manuscript
Link(s) to article on publisher’s website:
http://dx.doi.org/doi:10.1049/iet-ipr.2008.0128
Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult
the policies page.
oro.open.ac.uk
A Bezier Curve-Based Generic Shape Encoder
Ferdous A. Sohel1*, Gour C. Karmakar2, Laurence S. Dooley3, and Mohammed Bennamoun1
1 School of Computer Science and Software Engineering, The University of Western Australia, WA 6009, Australia
2 Gippsland School of Information Technology, Monash University, Churchill, Victoria 3842, Australia
3 Department of Communication and Systems, The Open University, Milton Keynes, MK7 6AA, United Kingdom
Abstract
Existing Bezier curve based shape description techniques primarily focus upon determining a set of pertinent Control
Points (CP) to represent a particular shape contour. While many different approaches have been proposed, none
adequately considers domain-specific information about the shape contour, such as its gradualness and sharpness, in the
CP generation process, which can potentially result in large distortions in the object's shape representation. This
paper introduces a novel Bezier Curve-based Generic Shape Encoder (BCGSE) that partitions an object contour into
contiguous segments based upon its cornerity, before generating the CP for each segment using relevant shape
curvature information. In addition, while CP encoding has generally been ignored, BCGSE embeds an efficient
vertex-based encoding strategy exploiting the latent equidistance between consecutive CP. A nonlinear optimisation
technique is also presented to enable the encoder to automatically adapt to bit-rate constraints. The performance of
the BCGSE framework has been rigorously tested on a variety of diverse arbitrary shapes from both a distortion and
requisite bit-rate perspective, with qualitative and quantitative results corroborating its superiority over existing
shape descriptors.
Keywords: Bezier curve, shape coding, bit-rate, nonlinear optimisation, distortion.
1. Introduction
* Corresponding author – E-mail: Ferdous.Sohel@csse.uwa.edu.au, Ferdous.Sohel@ieee.org, Tel.: +61 8 6488 2796, Fax: +61 8
6488 1089. Postal address: CSSE, UWA, 35 Stirling HWY, Crawley, WA 6009, Australia.
Other authors’ e-mails: Gour.karmakar@infotech.monash.edu.au, l.s.dooley@open.ac.uk, and m.bennamoun@csse.uwa.edu.au
Bezier Curves (BC) were independently developed by P. de Casteljau and P. E. Bézier, and while their origin can be
traced to the design of car body shapes in the automotive industry, their contemporary applications encompass many
diverse disciplines. In particular, their robustness in curve and surface representation means they pervade many fields
of multimedia technology including shape description of characters [1], [2] and objects [3], active shape lip
modelling [4], shape error concealment for MPEG-4 objects [5] and surface mapping [6]. The classical BC is defined
by a set of control points (CP) with the number and orientation of these points governing the overall size and shape
of the curve. In shape encoding applications, the distance between a shape contour and the approximating curve
(distortion) crucially depends upon the generated CP, so efficacious CP computation is vital for any BC-based shape
coder. Furthermore, the use of a single BC to represent a complex shape is computationally very expensive as a high
order BC will be mandated, with the corresponding CP calculations also incurring a substantial computational cost.
To reduce this overhead, composite BC [7] have been used to represent more complex shapes, whereby the entire
shape is sub-divided into segments, with each individually represented by a BC.
Cinque et al. [3] introduced a Shape Description using Cubic polynomial Bezier curves (SDCB) technique
which divides the shape-boundary into an a priori number of segments, each comprising the same number of
boundary points, with the CP evenly distributed over the entire boundary irrespective of its complexity. As both the
segment division and CP generation processes consider only the number of shape points, this approach is
independent of boundary complexity which can lead to large distortions, even when a high number of relatively short
segments are used. Moreover, as the segments are of equal length in terms of boundary points, the strategy does not
consider aspects of a shape such as its cornerity and branches. Composite BC (CBC) has successfully been applied
in the Automatic outline Capture of Arabic Fonts (ACAF) [1] and Shape Description for Chinese calligraphy
Characters (SDCC) [2] algorithms to respectively describe Arabic and Chinese character outlines, as well as in
Active Shape Lip Modelling (ASLM) [4] applications to represent different lip formations. In all these algorithms
however, while the shape is divided into segments by considering the cornerity at the shape points, more localised
shape information within each segment is not subsequently considered during CP generation. The two end-points of
each segment are respectively chosen as the start and end CP, with intermediate points calculated in a variety of
ways. For instance in both SDCC and ASLM, the intermediate CP locations are derived from the two tangents at the
segment endpoints using either the intersection of the tangents [2] or a computationally intensive trial-and-error
approach [4]. The CP on the tangents at the segment ends however, does not guarantee efficient shape approximation
as they are not necessarily representative of the entire segment. In contrast, ACAF [1] adopts a distortion
minimisation approach, though this is computationally expensive and usually dictates a further subdivision of a
segment in order to obtain low distortion.
Within the MPEG-4 standardisation process, the vertex-based Operational-Rate-Distortion (ORD) optimal
shape coding framework has been developed using parametric B-splines [8] and [9], which has been further extended
in [10]. While these techniques provide rate-distortion (RD) optimisation flexibility, as evidenced in Section 4 they
are also computationally very expensive, due to using an exhaustive shortest-path search method in a directed acyclic graph containing a large number of vertices.
One recurring feature in all the aforementioned algorithms is that the CP may not of necessity reside on the
shape contour, a consequence of which is an increase in the descriptor length. As the CP describe a shape, their
efficient encoding will considerably aid in reducing the descriptor length and thereby the resulting communications
cost. While SDCB [3] adopts a parametric approach to CP encoding, the techniques delineated in ACAF, SDCC and
ASLM, conspicuously do not comment upon their respective CP encoding strategy. The parametric descriptor used
in SDCB generates four CP for each segment and specifically comprises: i) the absolute coordinates of the first and
fourth CP, ii) the directional angle and magnitude distance of the second CP from the first CP, and iii) the directional
angle and magnitude distance of the third CP from the fourth CP. Both the angle and distance parameters are
encoded using floating point notation which renders this approach unsuitable for low-bit rate video applications such
as, video streaming over the Internet and mobile video transmission for hand-held devices, where innate bandwidth
limitations demand highly efficient bit minimisation techniques.
This paper presents a novel Bezier Curve-based Generic Shape Encoder† (BCGSE) that reduces both the
distortion and shape descriptor length by applying an efficient CP generation strategy incorporating shape
information, concomitant with an improved vertex coding scheme. Unlike [1], [2] and [4], CP generation takes
cognisance of the cornerity of a contour in segmenting a shape boundary, with localised information about the
curvature of the segment being employed to obtain the CP. In the ensuing coding phase, an innovative extension to
the Object-Adaptive Vertex Encoding (OAVE) technique [13] is proposed. The original OAVE encodes a set of
vertices by adapting the representation to the dynamic range of the relative locations of the vertices before utilising
an octant-based representation for each individual vertex.
(† Preliminary ideas involved in this research were introduced in [11] and [12].)
The new enhancement, E-OAVE, entails exploiting the
inherent cyclic relationship between consecutive CP to further improve the coding efficiency of OAVE, with the
corollary of combining the new CP generation scheme with better bit-rate coding being superior RD performance. In
addition, to adaptively accommodate bit-rate constraints for BCGSE, a nonlinear optimisation has been utilised to
sustain admissible bit-rates. The performance of the BCGSE framework has been extensively tested and analysed on
a variety of different object shapes, with both quantitative and qualitative results consistently confirming its
superiority compared with existing shape descriptor methods.
The remainder of this paper is organised as follows: Section 2 provides a short overview of Bezier curve
theory, while Section 3 presents the new BCGSE shape descriptor framework including the new CP generation and
efficient vertex coding strategies. Experimental results are analysed in Section 4 to endorse the improved RD
performance of the BCGSE model, while some general conclusions are provided in Section 5.
2. The Classical Bezier Curves
The Bernstein form of an $n$th-order BC with a CP set $V_j = \{v_{j,0}, v_{j,1}, \ldots, v_{j,n}\}$ is defined by:

$$v_j(t) = \sum_{k=0}^{n} v_{j,k} B_k^n(t), \quad 0 \le t \le 1 \qquad (1)$$

where $B_k^n(t) = \binom{n}{k}(1-t)^{n-k} t^k$ are the Bernstein polynomials, with $\binom{n}{k}$ being the combination function, $t$ is the weight of subdivision which determines the number of points on the BC, and subscript $j$ represents the $j$th curve segment within the series of CBC that defines the shape.
The attraction of BC, as members of a family of parametric curves, is that even with only a few CP at the
encoder, an arbitrary number of curve points can be generated at the decoder by controlling the steps in $t$, with a
greater number of generated curve points leading to a smoother reconstruction. This parametric curve family includes
both Hermite curves and splines, with BC chosen as the shape descriptor because they are computationally efficient and
straightforward to calculate and, unlike B-splines, require no coordination at the segment end-points. Another
cogent reason for choosing BC is that they are affine invariant [14], so they can be effectively used in searching any
affine-transformed shape, for example in multimedia retrieval applications. In this paper, cubic BC are used
to define each shape segment, as lower order curves such as quadratic BC are less smooth. Whilst higher degree
curves are preferable for shape approximation, they require more CP to be calculated and as a consequence the curve
generation overhead is computationally greater. Moreover, encoding large numbers of CP inevitably mandates a
higher bit-rate. The functional form of the cubic BC is represented as:
$$v_j(t) = (1-t)^3 v_{j,0} + 3t(1-t)^2 v_{j,1} + 3t^2(1-t) v_{j,2} + t^3 v_{j,3} \qquad (2)$$
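As a concrete illustration of (2), a decoder can sample the cubic segment at any resolution simply by stepping $t$. The sketch below is a minimal plain-Python rendering; the CP values and function names are illustrative, not from the paper.

```python
def cubic_bezier(cp, t):
    """Evaluate the cubic Bezier curve of equation (2) at parameter t.

    cp is a sequence of four control points (x, y); 0 <= t <= 1.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = cp
    # Bernstein basis weights for n = 3.
    b0 = (1 - t) ** 3
    b1 = 3 * t * (1 - t) ** 2
    b2 = 3 * t ** 2 * (1 - t)
    b3 = t ** 3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)

# Illustrative CP set; the decoder controls smoothness via the step in t.
cps = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
curve = [cubic_bezier(cps, k / 20) for k in range(21)]
```

Note how the curve interpolates only the first and last CP ($t = 0$ and $t = 1$), while the two intermediate CP shape the segment.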
3. Shape Encoding Strategy
The BCGSE framework comprises two constituent components, namely the calculation of the CP for each segment,
and secondly an efficient CP encoding strategy based on a new extension to the OAVE algorithm [13]. A set of
strategies has also been proposed to accommodate stringent bit-rate limits. These will now be considered in turn.
Determining the Control Points
In the first step of CP determination, the starting point of the contour needs to be determined; in this paper, the
highest-curvature point is selected. There are two main reasons for this choice: i) it is highly
probable that the highest-curvature point will eventually be a segment end [15], and ii) such a starting point affords the
desirable characteristic of being affine invariant, which is not always the case with other popular approaches, such as
taking the first point on a raster-scan basis (from left to right, top to bottom).
The next step in the CP generation involves sub-dividing a shape into segments at its corner points, where
the number of segments may be either defined a priori or dynamically. The cornerity of a boundary point is
characterised by the maximum arc-chord deflection at that point, so any point possessing a local cornerity maximum
is in fact a corner [16]. Different approaches for detecting the corner points of a shape have been analysed and
compared in [17], with the generic conclusion that the Beus-Tiu algorithm [18] provides the best perceptual results;
consequently, this technique is employed in the new BCGSE shape descriptor framework. CP are selected
from a set of points obtained from the boundary points and curvature-related shape information concerning the
boundary, with the concept of significant and supplementary points being introduced so contour portions with more
rapidly changing shape features like sharp edges and corners, are given greater priority over flatter portions.
Significant points are the minimal set of shape points that can regenerate the original shape without any
distortion, i.e., boundary points where there is a change in shape direction are significant points. This crucially
means that shape information is integrated into the CP generation process and that consecutive significant points will
not necessarily be separated by 1 pel as is the case with shape points, so the larger the distance between consecutive
significant points, the greater their influence upon the shape approximation. In these circumstances, a shape
descriptor based solely on significant points can produce a higher distortion because influential significant points
may be excluded from being CP. To reduce the likelihood of losing such influential significant points as CP,
supplementary points are inserted equidistantly between the significant points.
If the combination of significant and supplementary points is collectively referred to as the candidate boundary
points (CBP) used in the CP calculations, then a higher number of supplementary points implies that the CBP tend towards the
original shape points, while if there are insufficient supplementary points, some significant points may not be
adequately represented. To balance these extrema, the average distance between consecutive significant points over
the entire shape is used to govern the judicious insertion of supplementary points. This procedure is mathematically
formalised as follows:
Let the shape segments be represented as $S_i = \{s_{i,0}, s_{i,1}, \ldots, s_{i,|S_i|-1}\}$, $1 \le i \le N$, where $N$ is the number of segments, $|S_i|$ is the number of shape points in the $i$th segment and, for a closed shape, $s_{i,|S_i|-1} = s_{(i+1)\%N,0}$, where $\%$ is the modulus operator, so the set $S_i$ forms an enclosed contour. If the significant points of the $i$th segment are denoted by $Sig_i$, then $Sig_i = \{sig_{i,0}, sig_{i,1}, \ldots, sig_{i,|Sig_i|-1}\} \subseteq S_i$, where $|Sig_i|$ is the cardinality of the set $Sig_i$. If $d(sig_{i,k-1}, sig_{i,k})$ denotes the Euclidean distance between two consecutive significant points $sig_{i,k-1}$ and $sig_{i,k}$ in the $i$th segment, the average distance between consecutive significant points over the entire shape is

$$d_{avg} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|Sig_i| - 1} \sum_{k=1}^{|Sig_i|-1} d(sig_{i,k-1}, sig_{i,k}).$$

If $d(sig_{i,k-1}, sig_{i,k}) > d_{avg}$, then supplementary points are inserted between $sig_{i,k-1}$ and $sig_{i,k}$, with the first ($sp_1$) being placed a distance $d_{avg}$ from $sig_{i,k-1}$ and, provided $d(sp_1, sig_{i,k}) > d_{avg}$, a further supplementary point is placed a distance $d_{avg}$ from $sp_1$. This process is repeated until $d(sp_l, sig_{i,k}) \le d_{avg}$, where $sp_l$ is the last supplementary point. Now let the CBP to be used to calculate the CP for each segment be defined as $B_i = \{b_{i,0}, b_{i,1}, \ldots, b_{i,|B_i|-1}\}$, $1 \le i \le N$. The CP of a cubic BC for the $i$th segment can then be determined by equal sampling, which as will be shown in the next section, reduces the overall bit-rate requirement:

$$v_{i,0} = b_{i,0}; \quad v_{i,1} = b_{i,\lceil |B_i|/3 \rceil}; \quad v_{i,2} = b_{i,\lceil 2|B_i|/3 \rceil}; \quad v_{i,3} = b_{i,|B_i|-1} \qquad (3)$$
It is therefore only necessary to encode those particular CP from which the approximated shape can be decoded,
and in order to achieve better efficiency, these CP are normally differentially encoded.
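The supplementary-point insertion and the equal sampling of (3) can be sketched as below (plain Python). Two assumptions are made explicit here: supplementary points are placed along the straight chord between consecutive significant points (one possible reading of "inserted equidistantly"), and the sampling indices in (3) are taken as ceilings; the helper names are illustrative.

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def insert_supplementary(sig, d_avg):
    """Build one segment's CBP: between consecutive significant points whose
    separation exceeds d_avg, insert points every d_avg along the chord
    (an assumption) until the remaining gap to the next significant point
    is at most d_avg."""
    cbp = []
    for a, b in zip(sig, sig[1:]):
        cbp.append(a)
        d = dist(a, b)
        if d > d_avg:
            # Number of supplementary points until the residual gap <= d_avg.
            n_sup = math.ceil(d / d_avg) - 1
            ux, uy = (b[0] - a[0]) / d, (b[1] - a[1]) / d
            cbp += [(a[0] + k * d_avg * ux, a[1] + k * d_avg * uy)
                    for k in range(1, n_sup + 1)]
    cbp.append(sig[-1])
    return cbp

def control_points(cbp):
    """Equal sampling of the four cubic-BC CP from the CBP, as in (3)."""
    n = len(cbp)
    return [cbp[0], cbp[math.ceil(n / 3)], cbp[math.ceil(2 * n / 3)], cbp[-1]]
```

For example, two significant points 5 pel apart with $d_{avg} = 2$ gain two supplementary points, and a 10-point CBP yields CP at indices 0, 4, 7 and 9.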
Control Point Encoding
For encoding purposes, an Enhanced Object-Adaptive Vertex Encoding (E-OAVE) method is proposed that both
differentially encodes the CP and utilises the inherent equidistance between the consecutive CP obtained from (3).
The main reason for choosing OAVE is that apart from efficiently exploiting the coordinate information, it crucially
enables the inherent regularity in the CP distances to be seamlessly embedded, thereby facilitating further bit-rate
savings. OAVE comprises two major component blocks: i) object-level relative location dynamic range adaptation
and ii) vertex-based encoding. Observing the equidistance between the consecutive CP, OAVE is enhanced in the
first block by taking advantage of the fact that since the CP are approximately equidistant, the dynamic range will
be lower. The overall bit-rate is then reduced by the second component block. The complete E-OAVE algorithm will
now be delineated. If C = {c0 , c1, L , cL −1} is the ordered set of vertices to be encoded, then the various steps involved
in the new enhanced object-level relative location dynamic range adaptation process are given in Algorithm 1.
Table 1: Relative-location dynamic ranges and their indicator symbols

Dynamic Range Indicator   Relative-location dynamic range
0                         -1 ≤ x, y ≤ 1
1                         -3 ≤ x, y ≤ 3
2                         -7 ≤ x, y ≤ 7
3                         -15 ≤ x, y ≤ 15
4                         -31 ≤ x, y ≤ 31
5                         -63 ≤ x, y ≤ 63
6                         -127 ≤ x, y ≤ 127
7                         -255 ≤ x, y ≤ 255
Algorithm 1: Object-level relative location dynamic range adaptation.
1. Calculate the relative addresses $R$ of the vertices: $R_i = (R_{i,x}, R_{i,y}) = (c_{i,x} - c_{i-1,x}, c_{i,y} - c_{i-1,y})$, $1 \le i \le L-1$.
2. Determine $X_{min} = \min_{1 \le i \le L-1}\{|R_{i,x}|\}$ and $Y_{min} = \min_{1 \le i \le L-1}\{|R_{i,y}|\}$, where $\min$ is the minimum and $|\cdot|$ is the absolute value.
3. Obtain $\Delta x_{max} = \max\{\Delta x_i\} = \max_{1 \le i \le L-1}\{|R_{i,x}| - X_{min}\}$ and $\Delta y_{max} = \max\{\Delta y_i\} = \max_{1 \le i \le L-1}\{|R_{i,y}| - Y_{min}\}$.
4. Select two indicators $ind_x$ and $ind_y$ from Table 1, which correspond to the smallest dynamic range that includes $\Delta x_{max}$ and $\Delta y_{max}$; also select two indicators $ind1_x$ and $ind1_y$ corresponding to $X_{min}$ and $Y_{min}$.
5. Encode the indicators $ind_x$, $ind_y$, $ind1_x$ and $ind1_y$ using a 3-bit fixed length code (FLC).
Since the CP are sampled at equal distances in (3), the deviations of the $R_{i,x}$ components with respect to $X_{min}$ (and
likewise of $R_{i,y}$ with respect to $Y_{min}$) are not very high, so both $\Delta x_{max}$ and $\Delta y_{max}$ in Step 3 will be small, as will the
ensuing bit requirement. If conversely the deviations between consecutive CP are high, the coding will incur at most
6 extra bits (for an additional pair of dynamic range indicators) compared with the original OAVE algorithm. It
needs to be highlighted however, that given the CP calculation scheme in (3), pragmatically the likelihood of this
occurrence is extremely low (almost zero).
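Algorithm 1 can be sketched directly from its five steps (plain Python; the function and variable names are illustrative, not from the paper). The indicator lookup encodes the observation that Table 1's indicator $i$ covers magnitudes up to $2^{i+1}-1$.

```python
def range_indicator(v):
    """Smallest Table 1 indicator covering magnitude v.

    Indicator i covers the dynamic range -(2**(i+1) - 1) <= x, y <= 2**(i+1) - 1.
    """
    ind = 0
    while (1 << (ind + 1)) - 1 < v:
        ind += 1
    return ind

def dynamic_range_adaptation(cps):
    """Algorithm 1: object-level relative-location dynamic range adaptation."""
    # Step 1: relative addresses of consecutive CP.
    rel = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(cps, cps[1:])]
    # Step 2: minima of the absolute components.
    x_min = min(abs(rx) for rx, _ in rel)
    y_min = min(abs(ry) for _, ry in rel)
    # Step 3: maximum deviations from the minima.
    dx_max = max(abs(rx) - x_min for rx, _ in rel)
    dy_max = max(abs(ry) - y_min for _, ry in rel)
    # Step 4: dynamic range indicators (each coded with a 3-bit FLC in Step 5).
    return {
        "R": rel, "Xmin": x_min, "Ymin": y_min,
        "ind_x": range_indicator(dx_max), "ind_y": range_indicator(dy_max),
        "ind1_x": range_indicator(x_min), "ind1_y": range_indicator(y_min),
    }

# Worked example from Section 3: yields Xmin = 17, Ymin = 7,
# ind_x = ind_y = 1, ind1_x = 4 and ind1_y = 2.
params = dynamic_range_adaptation([(60, 51), (80, 60), (62, 52), (45, 45), (62, 37)])
```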
The next stage in the BCGSE model is vertex coding, where with the exception of the first CP ($c_0$), whose absolute
address is directly encoded, all other CP are differentially encoded in accordance with Algorithm 2.
Figure 1: Octants in the Cartesian coordinates
Algorithm 2: Encoding the relative address $R_i$ for each vertex.
1. Determine the octant number of $R_i$ according to Figure 1.
2. Encode each octant number using a 3-bit FLC.
3. Determine the major component from the octant number and encode its corresponding $\Delta x_i$ or $\Delta y_i$ value using $ind_x + 1$ or $ind_y + 1$ bits, depending upon the coordinate.
4. For the minor component, encode $\Delta x_i$ or $\Delta y_i$ using $\min\{ind_x + 1, \lceil \log_2(|R_{i,x}| - X_{min} + 1) \rceil\}$ or $\min\{ind_y + 1, \lceil \log_2(|R_{i,y}| - Y_{min} + 1) \rceil\}$ bits, respectively.
5. Add up all the required bits, $R_{total}$.
To appreciate the overall coding efficiency achieved by the new E-OAVE algorithm, consider the following
example. Let $C = \{(60,51), (80,60), (62,52), (45,45), (62,37)\}$, so $R = \{(20,9), (-18,-8), (-17,-7), (17,-8)\}$. This means
$X_{min} = 17$ and $Y_{min} = 7$, so $ind1_x = 4$ and $ind1_y = 2$. Also $\Delta x = \{3,1,0,0\}$ and $\Delta y = \{2,1,0,1\}$ from Step 3 of
Algorithm 1. Hence $\Delta x_{max} = 3$ and $\Delta y_{max} = 2$, with $ind_x = 1$ and $ind_y = 1$. As all initial parameters must be coded,
to encode $R_1$, E-OAVE requires 3 bits for the octant number, and 2 bits each for $\Delta x_1$ and $\Delta y_1$, giving a total of 7 bits.
In contrast, for the original OAVE technique [13]: $X_{max} = 20$ and $Y_{max} = 9$, so $ind_x = 4$ and $ind_y = 3$; thus encoding
$R_1$ incurs 3 bits for the octant, 5 bits for the major component and a further 4 bits for the minor component,
i.e., a total of 12 bits. This represents a saving of 5 bits (over 40%) for every CP, though it needs to be emphasised that setting the
initial parameters in OAVE incurs only 6 bits (for $ind_x$, $ind_y$), while it takes 12 bits for E-OAVE (for $ind_x$, $ind_y$,
$ind1_x$ and $ind1_y$) and, in the worst case, a further 14 bits to encode $X_{min}$ and $Y_{min}$, so the overhead for parameter
initialisation can be up to a maximum of 26 bits. A pragmatic interpretation of this impost is that provided the
number of CP is greater than four, E-OAVE will guarantee a lower bit-rate requirement, since four CP merely
translates to a single cubic BC segment.
The main conclusion from the above analysis is that whenever E-OAVE is applied in preference to OAVE,
increasing the number of curve segments will always lead to more efficient coding. The key difference between the
OAVE and E-OAVE approaches is best visualised in the symbolic example in Figure 2. While in the original OAVE,
the window-of-interest to encode the next CP is bounded by $X_{max}$ and $Y_{max}$, in E-OAVE the area bounded by
$X_{min}$ and $Y_{min}$ is excluded because the next CP cannot lie within this region, so the overall window size is reduced and
coding efficiency improved.
(a) Original OAVE
(b) Enhanced OAVE
Figure 2: Illustration of the difference between the original OAVE and E-OAVE techniques
As will be evinced in Section 4, this new encoding strategy provides notably superior results compared with
the dynamic fixed length coding (DFLC) strategy in [11] and [12] because DFLC only utilises the periodicity in the
CP intervals. In contrast, E-OAVE exploits both the coordinate level information from the octant basis and object
level information from the dynamic range indicators, conjointly with the CP distance regularity. In certain CP coding
scenarios, the resulting bit-rate may exceed the admissible bit-rate whereupon it is important to take remedial action
to maintain the permissible bit-rate. The next section presents a nonlinear optimisation solution for determining the
number of segments able to be accommodated for a prescribed bit-rate.
Nonlinear optimisation technique to accommodate admissible bit-rates
Without loss of generality, the overall bit-rate requirement Rtotal is a non-decreasing function of N, since as N
increases, so does the number of CPs, with the corollary that in such situations, Rtotal will increase. Since the octant
number together with the major and minor components of each CP has to be encoded, for an admissible bit-rate
constraint $R_{max}$, pragmatically $N$ cannot be made arbitrarily large. Conversely, a smaller $N$ implies less efficient
utilisation of the available bit-rate and a propensity for large distortion, so an optimum value of $N$ must be
determined. The relationship between $N$ and the bit-rate requirement is not of necessity linear, because for larger $N$,
the distance between consecutive CP is smaller. Consequently, while the CP number linearly increases with N, the
corresponding bit-rate requirement will not be linear, which precludes N being determined by a linear search
method. A nonlinear optimization strategy is therefore proposed, with the problem formulated as follows:
maximise $N'$, where $N'$ is the candidate value of $N$
subject to: $R_{total}(N') \le R_{max}$,
$N' \in \mathbb{Z}^+$, where $\mathbb{Z}^+$ is the set of positive integers.   (4)
The solution to (4) is the optimal value of N for BCGSE, which can be obtained by various search techniques
including, the bisection method [19], convex hull search [20] and Bezier search [20]. For simplicity, the bisection
method is adopted in this framework.
Figure 3: Example showing the bisection method for determining optimal N while maintaining Rmax .
Figure 3 illustrates the principle behind the bisection method to obtain the optimal $N$, i.e., the maximum $N'$, for an
admissible bit-rate $R_{max}$. Two initial values of $N'$ are chosen such that $N_1$ results in $R_{total}$ being lower
than $R_{max}$ whilst $N_2$ gives an $R_{total}$ greater than $R_{max}$. By exploiting the non-decreasing property of $R_{total}$, the
optimal value of $N$ must then lie between $N_1$ and $N_2$. The bisection midpoint thus becomes $N_m = (N_1 + N_2)/2$, with $R_{total}$
being recalculated so that if $R_{total}(N_m) \ge R_{max}$ then $N_2 = N_m$, otherwise $N_1 = N_m$. Repeating this procedure generates
ever tighter bounds on the optimal $N$, so if at some point $N_1 \ge N_2$, then $N_2$ is the maximum value of $N'$, i.e., the
optimal $N$.
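Under the stated assumptions ($R_{total}$ non-decreasing, $N_1$ feasible, $N_2$ infeasible), the bisection search of (4) can be sketched as below (plain Python; the function name is illustrative). A strict `>` test is used here so that a candidate meeting $R_{max}$ exactly remains admissible.

```python
def optimal_segments(r_total, r_max, n1, n2):
    """Bisection search for the largest N' with r_total(N') <= r_max, as in (4).

    r_total must be non-decreasing over the integers, with
    r_total(n1) <= r_max < r_total(n2).
    """
    while n2 - n1 > 1:
        nm = (n1 + n2) // 2
        if r_total(nm) > r_max:
            n2 = nm          # nm infeasible: tighten the upper bound
        else:
            n1 = nm          # nm feasible: raise the lower bound
    return n1
```

For instance, with a toy non-decreasing cost of 40 bits per segment and $R_{max} = 200$, the search converges on $N = 5$.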
The choice of the two initial values of $N'$ controls both the convergence and computational cost of
determining the optimal $N$, with the important constraint $R_{total}(N_1) \le R_{total}(N) \le R_{max} \le R_{total}(N_2)$ always being
upheld. Ideally the initial values of $N_1$ and $N_2$ would be 1 and $\infty$ respectively, though $R_{max}$ imposes inherent
bounds: at least 5 bits are required to encode a CP, comprising 3 bits for the octant number and 1 bit
each for the major and minor components. Furthermore, as each additional cubic curve segment requires 3 CP, if the
initial parameters are coded by $r'$ bits, the maximum value of $N_2$ will be $\lfloor (R_{max} - r')/(3 \cdot 5) \rfloor = \lfloor (R_{max} - r')/15 \rfloor$, where
$\lfloor \cdot \rfloor$ is the flooring operator. With regard to $N_1$, a maximum of 19 bits is mandated to encode any CP, comprising 3 bits for the
octant number and 8 bits each for the major and minor components. Therefore, if $R_{max}$ is to be fully
utilised, the minimum $N_1$ value will be $\lfloor (R_{max} - r')/(3 \cdot 19) \rfloor = \lfloor (R_{max} - r')/57 \rfloor$. In summary, since $N_2/N_1 \approx 4$, the optimal value of $N$
can be determined in two iterations of the bisection method, so highlighting the computational efficiency of the
BCGSE non-linear optimisation technique in automatically adapting to bit-rate constraints.
Decoding shape information
From a decoder perspective, since the differential and parametric representation of the encoded shape information
and its periodicity are dynamically determined, and by implication also the delimiter of each parameter, the decoder
will be able to correctly parse these parameters from the encoded bit stream to achieve shape reconstruction.
Computational complexity analysis
The three constituent components of the total computational time for these algorithms are: the division of the
boundary contour into segments, CP calculation for each segment, and CP encoding. As the new BCGSE model
applies a similar type of cornerity detection to that used in [1], [2] and [4], the first phase always takes the same
computational time. For CP calculation, BCGSE, [1] and [4] take $O(S_{max})$, where $S_{max}$ is the maximum number
of boundary points for a segment, while the SDCC algorithm [2] requires $O(r \cdot S_{max})$, where $r$ is the number of
iterations necessary to ensure convergence and also involves the computationally intensive process of calculating
both a chord-length and the Bernstein function for each value of $t$. SDCB [3], conversely, is computationally
efficient because it simply divides the shape into segments and generates the CP based on the number of boundary
points; however, since the segment division is made arbitrarily without any perceptual consideration, large distortions
can ensue, as will be witnessed in the next section. The computational expenditure of the nonlinear optimisation
solution for determining segment numbers in the BCGSE framework increases commensurately with the number of
bisection iterations, with the overhead incurred to calculate the optimal $N$ being $\lceil \log_2(N_2 - N_1) \rceil$ iterations, where
$\lceil \cdot \rceil$ is the ceiling operator. Given the proposed initial values of $N_1$ and $N_2$, only two iterations are required to obtain
the optimal value of N .
Apart from these BC-based shape coding techniques, for completeness the experimental results of BCGSE will
be compared with the vertex-based ORD optimal shape coding framework [8], [9] in the next section, so it is worth
highlighting that the overall computational complexity of vertex-based algorithms is in fact $O(N_B^3)$, where $N_B$ is the
total number of contour points on a shape.
4. Results and Analysis
The widely-adopted [20] shape distortion measurement metrics Dmax and Dms are used for the peak and Mean-Square
(MS) distortions respectively. These are formally expressed as:
$$D_{max} = \max_{1 \le i \le N} \; \max_{0 \le j \le |S_i|-1} d'(s_{i,j}, v_i) \qquad (5)$$

$$D_{ms} = \frac{1}{M} \sum_{1 \le i \le N} \; \sum_{0 \le j \le |S_i|-1} d'^2(s_{i,j}, v_i) \qquad (6)$$

where $d'(s_{i,j}, v_i)$ is the minimum Euclidean distance of the $j$th shape point $s_{i,j}$ of the $i$th segment from the
corresponding BC approximation $v_i$, and $M$ is the number of contour points, with $D_{max}$ and $D_{ms}$ both being
measured by the accurate distortion measurement technique in [21]. The performance of BCGSE has been
rigorously tested upon a number of popular object shapes used extensively in the literature [1], [3] and [4]. In
addition, the MPEG-4 $D_n$ metric was also used to validate that the new BCGSE model provides superior performance to
existing techniques for any accepted metric. $D_n$ represents shape distortion by the ratio of the number of erroneous
pixels in the approximating shape to the total number of pixels in the original shape [8]. It is formally defined as:
$$D_n = \frac{\text{number of pixels mismatched in the approximated shape}}{\text{number of pixels in the original shape}} \qquad (7)$$

and is normally expressed as a percentage.
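For concreteness, (5) and (6) can be computed as below (plain Python; function names are illustrative). Note that the paper evaluates $d'$ with the accurate distortion measurement technique of [21], whereas this sketch simply approximates each BC by a dense set of sampled points.

```python
import math

def d_prime(p, curve):
    """Minimum Euclidean distance from shape point p to a sampled curve."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in curve)

def distortions(segments, curves):
    """Peak distortion Dmax, eq. (5), and mean-square distortion Dms, eq. (6).

    segments[i] holds the shape points of segment i; curves[i] holds the
    sampled BC approximation of that segment.
    """
    dists = [d_prime(p, c) for seg, c in zip(segments, curves) for p in seg]
    d_max = max(dists)                              # eq. (5)
    d_ms = sum(d * d for d in dists) / len(dists)   # eq. (6), M = len(dists)
    return d_max, d_ms
```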
A series of experiments were performed upon the various test shapes. The subjective results are presented in
Figure 4, while the corresponding numerical distortion and bit-rate results are summarised in Tables 2 and 3
respectively. Figure 4(a) shows a comparison of the decoded shapes using the SDCB and BCGSE algorithms applied
to the Lip object [4], which is characterised by repetitious vertices and loops, using 5 segments. It is visually
apparent that the BCGSE shape approximation is very close to the original, while SDCB generates a structurally very
different shape. This is numerically confirmed by the corresponding distortion results in Table 2 which reveal SDCB
produced a Dmax, Dms and Dn of 10.2 pel, 5.4 pel2 and 2.55% respectively, in contrast to only 1.45 pel, 0.89 pel2 and
0.61% for BCGSE, so endorsing the underlying strategy of considering both cornerity and loops of a shape. Table 2
also corroborates that BCGSE consistently produced the lowest distortion of all the methods analysed (including
those which considered curvature information when sub-dividing the shape into segments), so vindicating the
integration of shape information within the CP generation process.
Figure 4: Experimental results for various selected test shapes: a) Lip; b)-d) Fish; and e)-f) Arabic characters
Table 2: Distortion values for various shape representations

                     Fish [3]                 Arabic character [1]     Lip [4]
Algorithm    Dmax    Dms     Dn       Dmax    Dms     Dn       Dmax    Dms     Dn
             (pel)   (pel2)  (%)      (pel)   (pel2)  (%)      (pel)   (pel2)  (%)
BCGSE        3.0     2.7     0.85     1.12    0.80    0.61     1.45    0.89    0.61
SDCB         6.0     4.3     1.81     2.1     1.35    0.95     10.2    5.4     2.55
ACAF         4.0     3.2     1.46     1.3     0.95    0.65     1.65    1.05    0.65
SDCC         3.9     3.46    1.45     1.35    1.2     0.95     1.7     1.2     0.71
ASLM         6.0     6.55    1.84     1.4     1.4     0.66     1.8     1.5     0.73
The next series of experiments were performed upon the Fish shape [3] which is characterised by having
some sharp as well as more gradual contour portions. The comparative results for SDCB, ACAF, ASLM and SDCC
for 7 segments are presented in Figures 4(b)-(d) and Table 2. BCGSE once again produced the lowest peak, MS and
Dn distortion values of 3.0 pel, 2.7 pel2 and 0.85% respectively, outperforming all other BC-based shape descriptors,
with a similar observation being made for the Arabic character [1] shape in Figures 4(e)-(f).
Table 3a: Bit-rate requirements in shape representation, where N is the number of segments. The values in
parentheses indicate the generated peak distortion (Dmax) in pel.

Shape      Algorithm         N=5          N=6          N=7          N=8
Fish [3]   BCGSE             126 (3.85)   166 (3.0)    194 (3.0)    234 (2.5)
           SDCB [3]          244 (6.0)    292 (6.0)    340 (5.0)    388 (4.0)
           DFLC [11], [12]   193 (3.85)   230 (3.0)    267 (3.0)    304 (2.5)
           OAVE*             195 (3.85)   228 (3.0)    267 (3.0)    280 (2.5)
Lip [4]    BCGSE             123 (1.45)   159 (1.42)   191 (1.0)    215 (1.0)
           SDCB [3]          244 (10.2)   292 (9.5)    340 (8.0)    388 (7.0)
           DFLC [11], [12]   193 (1.5)    230 (1.42)   267 (1.0)    304 (1.0)
           OAVE*             170 (1.5)    190 (1.42)   217 (1.0)    235 (1.0)

* CP encoded using the original OAVE technique [13].
Table 3b: Average bit-rate requirements in shape representation, where N is the number of segments. The values in
the parentheses indicate the generated peak distortion (Dmax) in pel.

Shape               Algorithm         N=5          N=10         N=15         N=20
MissAmerica (qcif)  BCGSE             205 (8.0)    355 (5.0)    525 (2.0)    515 (1.0)
                    SDCB [3]          480 (10)     940 (6.0)    1140 (2.4)   1920 (1.0)
                    DFLC [11], [12]   302 (8.0)    401 (5.0)    555 (2.0)    550 (1.0)
                    OAVE*             242 (8.0)    375 (5.0)    530 (2.0)    520 (1.0)
Akiyo (qcif)        BCGSE             200 (8.0)    352 (5.0)    520 (2.0)    512 (1.0)
                    SDCB [3]          480 (10)     940 (6.0)    1140 (2.4)   1920 (1.0)
                    DFLC [11], [12]   300 (8.0)    400 (5.0)    550 (2.0)    547 (1.0)
                    OAVE*             241 (8.0)    370 (5.0)    527 (2.0)    518 (1.0)
Stefan (sif)        BCGSE             265 (10.0)   435 (8.0)    540 (4.0)    585 (2.0)
                    SDCB [3]          480 (12)     960 (10.0)   1140 (6.0)   1920 (3.0)
                    DFLC [11], [12]   355 (10.0)   510 (8.0)    610 (4.0)    650 (2.0)
                    OAVE*             280 (10.0)   458 (8.0)    585 (4.0)    605 (2.0)

* CP encoded using the original OAVE technique [13]. qcif (176×144 pels) and sif (352×240 pels) are the spatial
resolutions of a frame.
To evaluate the impact of the E-OAVE strategy for CP encoding on the shape descriptor length, an
investigation was undertaken analysing the bit-rate requirement for different techniques for various segment
numbers, N . For an equitable comparison, it was assumed that the absolute coordinate values used in each algorithm
required one byte of memory, though as mentioned in Section 1, both the direction and magnitude components in the
SDCB algorithm are usually specified as floating point numbers, and so in practice these would mandate a parameter
storage overhead of at least 4 bytes under the IEEE 754 floating-point standard. Table 3a
evinces that for the Fish object with 7 segments, the BCGSE descriptor length was 194 bits compared with 267 bits
for the original OAVE CP encoding technique, a 28% bit rate improvement. This is directly attributable to the coding
performance of E-OAVE in exploiting the causal regularity in CP distances, with analogous performance
improvements observed for differing N values, as well as for the Lip object. Interestingly for the Fish shape, the
DFLC techniques [11] and [12] also required 267 bits compared with the 340 bits for parametric coding [3], to
corroborate the rationale behind E-OAVE in achieving superior bit rates in comparison to existing CP encoding
methods. It is especially noteworthy in Table 3a, that while BCGSE, DFLC [11], [12] and the original OAVE [13]
algorithm all encoded exactly the same CP set, thereby generating the same distortion and N values for each shape,
BCGSE always incurred the lowest bit-rate. In contrast, SDCB [3] produced a different CP set and by implication,
different distortion and bit-rate values. Table 3b also presents results for the popular standard test shape sequences
Miss America, Akiyo and Stefan. These results also exhibit the same trend that BCGSE always provided superior
results compared to the other techniques.
To provide a comparative analysis of the computational complexities, the overall CPU times required for
CP generation of the various algorithms were determined. Each algorithm was implemented in Matlab 6.1 (The
Mathworks Inc.) and run on a 2.8GHz Pentium-4 processor with 512MB RAM under Windows XP. As none of the existing
algorithms has an explicit CP coding strategy, only the CPU times incurred in the CP generation phase were
compared. The results in Table 4 reveal that BCGSE required less time than both the ACAF and SDCC techniques,
so that, for example, the overall CPU time requirements for the Arabic character were 7.9, 9.01 and 9.03 seconds
respectively. The higher ACAF time is due to every boundary point in a curve segment involving a computationally
expensive chord-length parameterisation and CP calculation, while for SDCC the trial-and-error method employed
for CP generation involves iteratively computing the distortion for all vertices in the segment together with their
tangents. In contrast, BCGSE firstly reduces the number of vertices by determining the significant points and then
using this smaller vertex set instead of the larger set of boundary points. Interestingly Table 4 shows that both SDCB
and ASLM were computationally faster than BCGSE, though it is important to stress that ASLM employs only a
quadratic BC, which necessitates the calculation of just one intermediate CP, whereas BCGSE requires two
intermediate points since it uses a cubic-order BC [7]. The principal drawback in using lower-order curves is that they produce
higher distortions as evidenced by the corresponding results in Table 2. SDCB in fact, only processes the index
number of the vertices on the boundary and so does not consider shape information at all which inevitably leads to
higher distortion values, particularly when the shape contour exhibits sharp variations such as in the Lip object and
Arabic character.
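The difference between the one intermediate CP of a quadratic BC and the two of a cubic BC can be illustrated with the standard de Casteljau evaluation of a Bezier curve [7]; the control points below are illustrative only and are not drawn from any of the test shapes:

```python
def bezier_point(control_points, t):
    # Evaluate a Bezier curve of arbitrary order at parameter t in [0, 1]
    # by repeated linear interpolation (de Casteljau's construction).
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# A quadratic BC (as in ASLM) has one intermediate CP between its endpoints;
# a cubic BC (as in BCGSE) has two, giving it more descriptive freedom.
quadratic = [(0, 0), (2, 4), (4, 0)]
cubic = [(0, 0), (1, 3), (3, 3), (4, 0)]
print(bezier_point(quadratic, 0.5))  # (2.0, 2.0)
print(bezier_point(cubic, 0.5))      # (2.0, 2.25)
```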
Table 4: Required CPU time (in seconds) for various algorithms to generate the CP for different shapes.

Algorithm    Fish [3]    Arabic character [1]    Lip [4]
BCGSE        3.01        7.01                    2.32
SDCB         1.72        4.01                    1.62
ACAF         4.02        9.01                    3.05
SDCC         3.99        9.03                    3.12
ASLM*        2.51        6.05                    2.25

* ASLM uses a quadratic BC, while all other algorithms use a cubic BC.
For completeness a further set of experiments were conducted comparing the proposed BCGSE model with the
classical vertex-based ORD optimal shape coding framework [9], [10] using B-splines. Figures 5(a) and (b)
respectively plot the computational-time (for coding only) vs. distortion and bit-rate vs. distortion (RD) results upon
the Fish shape.
(a) Distortion vs coding time
(b) Distortion vs bit-rate
Figure 5: Comparative performance of the new BCGSE model and the vertex-based ORD optimal shape coding
framework for the Fish shape
Figure 5(a) reveals that while the coding time for BCGSE remains approximately constant for increasing
admissible distortion, confirming the complexity analysis in Section 3, the vertex-based optimal ORD framework
incurs significantly higher computational and power overheads at larger distortions due to the exhaustive search
method used to locate the minimum bit-rate path within the admissible distortion bound from a large-weighted
directed-acyclic-graph. This large discrepancy in coding performance more than offsets the modest bit-rate savings
of the ORD framework, whose slightly better overall RD performance is evident in Figure 5(b).
Table 5: Adaptive admissible bit-rate results: the optimal number of segments N obtained at different prescribed bit
rates for various shape sequences. The values in parentheses represent the respective utilised bit-rates.

                      Admissible bit-rate Rmax (bits)
Shape                 230           440
MissAmerica (qcif)    N=5 (205)     N=13 (438)
Akiyo (qcif)          N=6 (228)     N=13 (435)
Stefan (sif)          N=4 (229)     N=10 (435)
Finally, a series of experiments were conducted to appraise the performance of the nonlinear optimisation
technique for adaptive bit-rate constraint, with the corresponding results summarised in Table 5 for the test shape
sequences used earlier in Table 3b. These reveal that with an admissible bit-rate Rmax = 230 bits for MissAmerica, a
maximum of 5 segments is required, utilising 205 bits, while for 440 bits, 13 segments is the optimal
number, correspondingly using 438 bits. These results are notably congruent with the findings summarised in
Table 3b. Similar findings were observed for the other sequences, corroborating the capability of the BCGSE paradigm
to sustain an admissible bit rate by automatically adapting to any bit-rate constraints.
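The adaptation above can be sketched in miniature. The scan below and its monotone rate model are purely illustrative stand-ins: the paper employs a nonlinear optimisation rather than this simple search, and `bitrate_of` is a hypothetical encoder call returning the descriptor length for a given N.

```python
def max_segments_within_budget(bitrate_of, r_max, n_min=2, n_max=50):
    # Illustrative sketch only: select the largest segment count N whose
    # encoded descriptor length stays within the admissible bit-rate r_max.
    best = None
    for n in range(n_min, n_max + 1):
        rate = bitrate_of(n)
        if rate <= r_max:
            best = (n, rate)  # ascending scan, so this is the largest N so far
    return best

# Hypothetical rate model that increases with N, loosely echoing the scale
# of the MissAmerica figures in Table 5 (not the paper's actual encoder).
rates = {n: 40 * n + 5 for n in range(2, 51)}
print(max_segments_within_budget(rates.get, 230))  # (5, 205)
```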
5. Conclusion
While Bezier curves have been applied in many different domains and applications including object shape
description, a critical aspect of their performance is the appropriate selection of the segments and control points (CP).
This paper has presented the Bezier Curve-based Generic Shape Encoder (BCGSE) algorithm, which provides an
innovative strategy for CP calculation, sub-divides a shape into segments by considering domain-specific
shape information, and introduces a new efficient CP coding strategy called Enhanced Object-Adaptive Vertex Encoding. It has
also developed a nonlinear optimisation strategy to enable the encoder to adapt to admissible bit-rate constraints.
Both perceptual and numerical results have conclusively proven the improved performance in terms of both
minimum distortion and bit-rate requirements of the new BCGSE model in comparison with other existing shape
descriptor techniques.
6. Acknowledgements
The authors acknowledge this work has been supported by a Monash University Post Publications Award, ARC
discovery grants (DP0664228 and DP0771294), a University of Western Australia postdoctoral fellowship and a
research development award. The authors especially wish to thank the reviewers for their perceptive comments,
criticisms and overall guidance in significantly improving this paper.
7. References
[1] Sarfraz, M., and Khan, M.A.: ‘Automatic outline capture of Arabic fonts’, Information Sciences, 2002, 140, (3-4), pp. 269-281
[2] Yang, H.-M., Lu, J.-J., and Lee, H.-J.: ‘A Bezier curve-based approach to shape description for Chinese
calligraphy characters’, Proceedings of Sixth International Conference on Document Analysis and
Recognition, 2001, pp. 276-280
[3] Cinque, L., Levialdi, S., and Malizia, A.: ‘Shape description using cubic polynomial Bezier curves’, Pattern
Recognition Letters, 1998,19, (9), pp. 821-828
[4] Shdaifat, I., Grigat, R., and Langmann, D.: ‘Active shape lip modelling’, Proceedings of IEEE International
Conference on Image Processing (ICIP), 2003, pp. 875-878
[5] Soares, L.D., and Pereira, F.,: ‘Spatial shape error concealment for object-based image and video coding’, IEEE
Transactions on Image Processing, 2004, 13, (4), pp.586-599
[6] Zhang, R., and Wang, G., : ‘Some estimates of the height of rational Bernstein-Bezier triangular surfaces’, Proc.
Geometric Modeling and Processing, 2004, pp.79-84
[7] Bartels, R.H., Beatty, J.C., and Barsky, B.A.,: ‘An introduction to splines for use in computer graphics &
geometric modeling’, Morgan Kaufmann Publishers, 1987
[8] Katsaggelos, A.K., Kondi, L.P., Meier, F.W., Ostermann, J., and Schuster, G.M.: ‘MPEG-4 and rate-distortion-based shape-coding techniques’, Proceedings of the IEEE, 1998, 86, (6), pp. 1126-1154
[9] Kondi, L.P., Melnikov, G., and Katsaggelos, A.K.,: ‘Joint optimal object shape estimation and encoding’, IEEE
Transactions on Circuits and Systems for Video Technology, 2004, 14, (4), pp.528-533
[10] Sohel, F.A., Dooley, L.S., and Karmakar, G.C.: ‘New dynamic enhancements to the vertex-based rate-distortion optimal shape coding framework’, IEEE Transactions on Circuits and Systems for Video Technology,
2007, 17, (10), pp. 1408-1413
[11] Sohel, F.A., Karmakar, G.C., and Dooley, L.S.,: ‘A generic shape descriptor using Bezier curves’, Proc. Int.
Conf. Information Technology: New Trends in Image Processing (ITCC), 2005, II, pp. 95-100
[12] Sohel, F.A., Karmakar, G.C., and Dooley, L.S., ‘An improved shape descriptor using Bezier curves’, Proc. First
Int. Conf. Pattern Recognition and Machine Intelligence (PReMI), Lecture Notes on Computer Science,
Springer, 2005, 3776, pp.401-406
[13] O’Connell, K.J.,: ‘Object-adaptive vertex-based shape coding method’, IEEE Transactions on Circuits and
Systems for Video Technology, 1997, 7, (1), pp. 251-255
[14] Hill Jr., F. S.,: ‘Computer Graphics’, Prentice Hall, Englewood Cliffs, 1990
[15] Chetverikov, D. and Szabo, Z.,: ‘A simple and efficient algorithm for detection of high curvature points in
planar curves’, Proc. 10th International Conference, CAIP 2003, Groningen, The Netherlands, Lecture Notes on
Computer Science, Springer, 2003, 2756, pp. 746-753
[16] Phillips, T.-Y., and Rosenfeld, A.,: ‘A method of curve partitioning using arc-chord distance’, Pattern
Recognition Letters, 1987, 5, (4), pp. 285-288
[17] Liu, H.C., and Srinath, M.D.,: ‘Corner detection from chain-code’, Pattern Recognition, 1990, 20, (3), pp. 51-68
[18] Beus, H.L., and Tiu, S.S.H.,: ‘An improved corner detection algorithm based on chain coded plane curves’,
Pattern Recognition, 1987, 20, (3), pp. 291-296
[19] Scarborough, J. B.,: ‘Numerical mathematical analysis’, Baltimore, Johns Hopkins, 1966
[20] Schuster, G.M., and Katsaggelos, A.K.: ‘Rate-distortion based video compression: optimal video frame
compression and object boundary encoding’, Kluwer Academic Publishers, 1997
[21] Sohel, F.A., Dooley, L.S., and Karmakar, G.C.,: ‘Accurate distortion measurement for generic shape coding’,
Pattern Recognition Letters, 2006, 27, (2), pp.133-142