
1 Introduction

Feedforward neural network models have been proposed over recent decades, providing a broad range of solutions to an even vaster set of problems. These problems may require the ability to generalize over large amounts of data from natural or artificial phenomena that demand highly complex nonlinear mappings, for example, pattern classification and regression problems.

These models apply different treatments to the network behaviour, ranging from the learning process to how the input is processed to produce the output. The Least-Mean-Square (LMS) algorithm, proposed by Widrow and Hoff [1], is the foundation of adaptive filtering. For the Perceptron network, the LMS behaves as a low-pass filter, attenuating high-frequency components and passing the low-frequency components; however, the Perceptron is only able to solve linearly separable problems. Algorithms more complex than LMS, based upon Backpropagation [2] and its variations, such as Quickprop [3] and RPROP [4], provide the ability to solve nonlinear problems through the addition of one or more hidden layers. However, depending on the training set size, the training phase can become computationally expensive.

In recent years, neural network models have been proposed for edge detection in images. In [5], a competitive self-organizing neural network is used to classify the types of edges, which are derived from the sum of the magnitudes of the differences from the central pixel in a \(3 \times 3\) mask applied to grayscale images. The output feeds a fuzzy system used to label the pixels as edge and non-edge. A similar approach is taken in [6], where a feedforward neural network is used with n neurons in the input layer, \( 2n + 1 \) neurons in the hidden layer, and a single neuron in the output layer. An error backpropagation algorithm is used as the learning rule. Furthermore, a fuzzy logic system is adopted to improve the generalization capability of the network. In [7], a feedforward neural network that uses the backpropagation algorithm as its learning rule is employed. In that work, the image is first binarized and then passed through the edge detection process. Differently from the previous approaches, masks of size \(2 \times 2\) are used for feature extraction, and pixel classification is divided into two classes: border and non-border.

From the point of view of image processing, edge detection amounts to two-dimensional high-pass filtering. In addition to the neural network models, there are classical methods based on the calculation of the gradient of the luminosity function or on the estimation of masks corresponding to the second-order derivative, as in the case of the Laplacian operator [8]. Among the methods that use the gradient calculation, we can cite the well-known edge operators Sobel, Prewitt [9] and Roberts. Moreover, Canny [10] proposes an approach that seeks to improve how the edge is determined through a paradigm defining that a point marked as edge must be as close as possible to the center of a real edge, so that a single edge produces only one point. Canny uses high-pass filtering with masks estimated from the first-order derivative in the vertical and horizontal directions. Given those vertical and horizontal images, the method then calculates the brightness gradient, generating the magnitude image as output.

In this context, we propose in this paper an RBF-based model that behaves as an adaptive high-pass filter. In addition, we address its usage in image processing with a focus on edge detection. Our proposal is able to estimate a single mask, the weight vector, by treating the problem as linearly separable. To demonstrate its effectiveness and flexibility, we compare our proposal with traditional edge detectors, such as Prewitt, Sobel and Canny.

2 One-Dimensional Approach

Our main goal is a method able to distinguish between two types of features in signal behavior: (i) high-frequency transition behavior and (ii) low-frequency signal. In addition, the method must maximize the first case and attenuate the second one.

For this to be possible, we consider five previously determined paradigms: (i) the filtering performed by the method is adaptive in the sense that it considers the neighbors of the point being processed; (ii) the mask is estimated from a binary linear system, i.e., a hyperplane between high- and low-frequency features must be enough to distinguish them; (iii) the mask should be versatile, allowing different numbers of neighbors of the point being processed; (iv) the method performs a transformation on the input signal f so that f can be distinguished by the mask; (v) the problem is defined by a noisy Gaussian distribution.

In signal processing, convolution gives the response of a system to a signal applied as input. The discrete convolution is given by:

$$\begin{aligned} h(k) = \sum \limits _{n=-\infty }^{\infty }f(n)g(k-n) \end{aligned}$$
(1)

where g is the filter that moves to the right relative to the signal f at each iteration.
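Eq. (1) can be checked directly with NumPy; the step signal and difference mask below are illustrative choices of ours, not values from the paper:

```python
import numpy as np

# Discrete convolution of a signal f with a filter g (Eq. 1).
# np.convolve flips g and slides it across f; mode='same' keeps the
# output length equal to len(f), matching the filtering view used here.
f = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # a step signal
g = np.array([-1.0, 0.0, 1.0])                  # a simple difference mask
h = np.convolve(f, g, mode='same')
print(h)
```

The response is zero in the homogeneous regions and nonzero around the step, which is exactly the high-pass behavior the method seeks.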

In this article, the convolution equation is extended to matrix form, so that f is partitioned into several inputs of the same size as g, the mask that determines the neighborhood around the current point. Consider a signal \(f = (x_1, \ldots , x_m)\) and \(\mathbf{g} = (g_1, \ldots , g_n)\), responsible for performing the adaptive high-pass filtering. Thus, the signal is partitioned into m feature vectors \(\mathbf{f}_i\) so that each \(\mathbf{f}_i\) has size n. Let us take an example with \(m = 4\) and \(n = 5\):

$$\begin{aligned} \mathbf{h} = \mathbf{F}\mathbf{g} = \left[ \begin{array}{ccccc} 0 & 0 & x_1^* & x_2 & x_3\\ 0 & x_1 & x_2^* & x_3 & x_4\\ x_1 & x_2 & x_3^* & x_4 & 0\\ x_2 & x_3 & x_4^* & 0 & 0 \end{array}\right] \left[ \begin{array}{c} g_1\\ g_2\\ g_3\\ g_4\\ g_5 \end{array}\right] \end{aligned}$$
(2)

where \(x_j^*\) is the point being processed. Note that, with a mask of size \(n = 5\), the point \(x_j^*\) has four neighbors, and its value after adaptive filtering is the inner product of \(\mathbf{f}_i\) with \(\mathbf{g}\). By observing Eq. (2), we can define the convolution in matrix form as a linear system, in which \(\mathbf{g}\) acts as an adaptive filter on \(\mathbf{F}\) and each line \(\mathbf{f}_i\) of \(\mathbf{F}\) acts as a feature vector.
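The construction of \(\mathbf{F}\) in Eq. (2) can be sketched as follows; the function name and the zero padding at the borders follow the example matrix above:

```python
import numpy as np

def build_feature_matrix(f, n):
    """Build F: one zero-padded, length-n neighborhood per point (Eq. 2)."""
    m = len(f)
    half = n // 2
    padded = np.concatenate([np.zeros(half), f, np.zeros(half)])
    # Row i is the window centered on f[i]; F @ g then filters every point.
    return np.stack([padded[i:i + n] for i in range(m)])

f = np.array([1.0, 2.0, 3.0, 4.0])
F = build_feature_matrix(f, n=5)
print(F)
```

With \(m = 4\) and \(n = 5\) this reproduces the layout of Eq. (2): the first row is \([0, 0, x_1, x_2, x_3]\) and the last is \([x_2, x_3, x_4, 0, 0]\).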

It is straightforward to realize that the adaptive filter \(\mathbf{g}\) is equivalent to the unknown variables of a linear system, which in the neural networks context corresponds to the weight vector \(\mathbf{w}\), see Eq. (2). Kohonen and Ruohonen [11] proposed OLAM (Optimal Linear Associative Memory), a model that estimates the weight vector \(\mathbf{w}\) through the pseudo-inverse of the feature matrix. Furthermore, OLAM has an extremely fast learning ability for a totally linearly separable problem. The calculation of the weight vector by OLAM is performed through matrix operations, as can be seen in Eq. (3), in which \(\mathbf{X}^{+} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\) is called the pseudo-inverse of the feature matrix \(\mathbf{X}\), \(\mathbf{D}\) is the matrix that contains the label for each line of \(\mathbf{X}\), and \(\mathbf{W}^{*}\) is the matrix of optimal weights.

$$\begin{aligned} \mathbf{W}^{*} = \mathbf{X}^{+}\mathbf{D} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{D} \end{aligned}$$
(3)

Thus, we can conclude that:

$$\begin{aligned} \mathbf{h} = \mathbf{F}\mathbf{W}^{*} = \mathbf{F}[(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{D}] \end{aligned}$$
(4)

where the training patterns \(\mathbf{X}\) and their labels \(\mathbf{D}\) must be determined for training.

As homogeneous regions translate into low-frequency signals in the feature space, we denote them by a vector of length n containing only values equal to 1. Moreover, as we want to attenuate low-frequency signals, we set their label to 0. Once the features of the homogeneous region are defined, it is straightforward to consider the inverse for a high-frequency region. In this case, it is given by the difference of the neighbors in relation to the point being processed. Therefore, we represent this pattern as a vector of length n filled with zeros, but with value 1 at its center. From these statements, we obtain:

$$\begin{aligned} \mathbf{X} = \left[ \begin{array}{ccccccc} 0_{11} & 0_{12} & \ldots & 1_{1\lceil n/2 \rceil } & \ldots & 0_{1(n-1)} & 0_{1n}\\ 1_{21} & 1_{22} & \ldots & 1_{2\lceil n/2 \rceil } & \ldots & 1_{2(n-1)} & 1_{2n} \end{array} \right] \text { and } \mathbf{D} = \left[ \begin{array}{c} 1_{1}\\ 0_{2} \end{array} \right] \end{aligned}$$
(5)

where the first line of \(\mathbf{X}\) and of \(\mathbf{D}\) represents, respectively, the high-frequency feature and the high-frequency label; the second line is the representation of the low-frequency feature and label. In this sense, the high-frequency points will tend to 1 and the low-frequency points will tend to 0.

In fact, we cannot compute the pseudo-inverse of \(\mathbf{X}\), because \(\mathbf{X}^{T}\mathbf{X}\) is a singular matrix. To overcome this issue, we apply a regularization term \(\lambda \), which multiplies the identity matrix \(\mathbf{I}\). Therefore, we obtain the equation

$$\begin{aligned} \mathbf{h} = \mathbf{F}[(\mathbf{X}^{T}\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^{T}\mathbf{D}] \end{aligned}$$
(6)

It is noteworthy that the central feature in the two patterns of \(\mathbf{X}\) has its value equal to 1 on purpose, because these values act as the bias. This approach is extremely important for the development of our proposal in its final state, as will be presented next.
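The estimation of the mask from Eqs. (5) and (6) can be sketched as below; the regularization value \(\lambda\) is an illustrative assumption of ours, since the paper does not fix it:

```python
import numpy as np

def estimate_filter(n, lam=1e-3):
    """Estimate the adaptive high-pass mask g from Eqs. (5) and (6)."""
    c = n // 2                              # 0-based index of the center
    x_high = np.zeros(n); x_high[c] = 1.0   # high-frequency pattern, label 1
    x_low = np.ones(n)                      # homogeneous pattern, label 0
    X = np.stack([x_high, x_low])
    D = np.array([[1.0], [0.0]])
    # Regularized pseudo-inverse: (X^T X + lam*I)^{-1} X^T D  (Eq. 6)
    g = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ D)
    return g.ravel()

g = estimate_filter(5)
print(g)
```

The resulting mask has a large positive center and negative neighbors, i.e., a Laplacian-like high-pass filter: its inner product with the high-frequency pattern is close to 1, while a homogeneous (all-ones) input is attenuated to nearly 0.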

2.1 Signal Transformation

So far we have covered the first three predefined paradigms, which, in a nutshell, define that: (i) the filter is adaptive; (ii) the mask is estimated from a binary linear system; and (iii) the mask is versatile with respect to its size n. However, although our method estimates the filter from a linear system, the edge detection problem is not linear. In this sense, our proposal must take this into consideration. To solve this problem, our proposal applies a transformation to each line \(\mathbf{f}_i\) of \(\mathbf{F}\) so that \(\mathbf{f}_i\) can be solved by \(\mathbf{g}\).

Thus, we consider a Gaussian radial basis function in which the point being processed acts as its mean \(\mu _i = x_j^*\). That said, we can then define our transformation as:

$$\begin{aligned} \varPhi _{ij}(x_j, \mu _i, \sigma ) = \exp \left( {\dfrac{-\Vert x_j-\mu _i\Vert ^2}{2\sigma ^2}} \right) \end{aligned}$$
(7)

where \(\sigma \) is the standard deviation, \(x_j\) is a neighbor of \(x_j^*\), and \(\varPhi _{ij}\) is a kernel function that performs the transformation of the signal through a Gaussian radial basis function. Now, using the \(\mathbf{F}\) matrix of Eq. (2) as an example, we can extend it to:

$$\begin{aligned} \mathbf{F} = \left[ \begin{array}{ccccc} 0 & 0 & \varPhi _{11}^* & \varPhi _{12} & \varPhi _{13}\\ 0 & \varPhi _{21} & \varPhi _{22}^* & \varPhi _{23} & \varPhi _{24}\\ \varPhi _{31} & \varPhi _{32} & \varPhi _{33}^* & \varPhi _{34} & 0\\ \varPhi _{42} & \varPhi _{43} & \varPhi _{44}^* & 0 & 0 \end{array}\right] \end{aligned}$$
(8)

It is noteworthy that the transformation of \(\mathbf{f}_i\) behaves differently for different values of \(\sigma \). When \(\sigma \rightarrow + \infty \), the signal is considered to be homogeneous, regardless of containing high-frequency regions. In contrast, when \(\sigma \rightarrow 0 \), all neighbors different from \(x_j^*\) are mapped to values close to zero. Such behavior results in multiple regions being treated as high-frequency when in fact they are not.
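The effect of the transformation on a single neighborhood can be sketched as follows, using the Gaussian kernel with the point being processed as center (function name and example values are ours):

```python
import numpy as np

def rbf_transform(f_i, sigma):
    """Map a neighborhood f_i through a Gaussian RBF centered on the
    point being processed (Eq. 7). The center maps to exactly 1;
    neighbors decay with their distance from the center value."""
    mu = f_i[len(f_i) // 2]          # x_j^*, the point being processed
    return np.exp(-(f_i - mu) ** 2 / (2.0 * sigma ** 2))

# A homogeneous neighborhood maps to all ones ...
flat = rbf_transform(np.array([0.5, 0.5, 0.5, 0.5, 0.5]), sigma=0.1)
# ... while a sharp transition maps toward the high-frequency pattern.
step = rbf_transform(np.array([0.0, 0.0, 0.5, 1.0, 1.0]), sigma=0.1)
print(flat, step)
```

This is precisely what the training patterns in Eq. (5) encode: the homogeneous case becomes the all-ones vector (label 0) and the transition case approaches the zero vector with a 1 at the center (label 1).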

It is important to realize that this approach is extremely sensitive to noise. Thus, to overcome this situation, it is necessary to apply a low-pass filter to smooth the signal before passing it to our method. However, the low-pass filter must still allow finding the center of the edge and maximizing it. Therefore, we must consider that the center of the edge is formulated in such a way that it can be represented by \(\left[ 0_1, \ldots , 1_{\lceil n/2 \rceil }, \ldots , 0_n \right] \).

Intuitively, the way to maximize the center of the edge under such conditions is to ensure that, in the high-frequency transition, the transition center is at equal distance from the two extremes. Thus, Gaussian filtering becomes mandatory, because it eliminates noise while smoothing the edges.

The 1-D Gaussian filter can be obtained from:

$$\begin{aligned} G(x, \sigma ) = \exp \left( \dfrac{-x^2}{2\sigma ^2} \right) \end{aligned}$$
(9)

Figures 1 and 2 depict examples of the approximation of homogeneous and high-frequency transition regions, respectively. As one can see in Fig. 1, the low-frequency region \(\mathbf{f}_i\) has values close to zero; however, after the signal transformation, the values approximate a vector containing only values equal to 1. According to the \(\mathbf{X}\) and \(\mathbf{D}\) matrices of Eq. (5), the point \(x_j^* \in \mathbf{f}_i\) being processed will be attenuated by \(\mathbf{g}\). In Fig. 2, the signal processed by the Gaussian filter is translated so that the network recognizes \(x_j^* \in \mathbf{f}_i\) as a high-frequency transition and maximizes it.

Fig. 1.

From top to bottom: (a) low frequency input \(\mathbf f _i\), (b) signal smoothing using Gaussian filtering and (c) input transformation by our proposal.

Fig. 2.

From top to bottom: (a) high frequency input \(\mathbf f _i\), (b) signal smoothing using Gaussian filtering and (c) input transformation by our proposal.

Once our proposal is formulated, we observe its behavior with respect to how it is parameterized. Since we perform two operations using Gaussians, the Gaussian low-pass filtering and the parametric transformation, we must define two variance values, where \( \sigma _1 \) represents the variance in Eq. (9) and \( \sigma _2\) represents the variance in Eq. (7). Furthermore, the mask size is defined by n, which should preferably be odd in order to preserve the central point \(x_j^*\) and the same number of neighbors on both sides.
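Putting the pieces of this section together, a minimal end-to-end 1-D sketch follows. The function name, the \(\lambda\) value, and the test signal are our assumptions; the zero padding at the borders follows Eq. (2) and may produce artifacts at the signal ends:

```python
import numpy as np

def detect_edges_1d(f, n=5, sigma1=1.0, sigma2=0.1, lam=1e-3):
    """1-D edge detection sketch: Gaussian smoothing (sigma1, Eq. 9),
    RBF transform of each neighborhood (sigma2, Eq. 7), then the
    mask g estimated from Eqs. (5)-(6)."""
    half = n // 2
    # --- Gaussian smoothing (Eq. 9), normalized ---
    x = np.arange(-half, half + 1)
    G = np.exp(-x**2 / (2.0 * sigma1**2))
    f_s = np.convolve(f, G / G.sum(), mode='same')
    # --- estimate the mask g (Eqs. 5-6) ---
    X = np.stack([np.eye(n)[half], np.ones(n)])
    D = np.array([1.0, 0.0])
    g = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ D)
    # --- transform each neighborhood and filter (Eqs. 2, 7) ---
    padded = np.concatenate([np.zeros(half), f_s, np.zeros(half)])
    h = np.empty(len(f))
    for i in range(len(f)):
        w = padded[i:i + n]
        phi = np.exp(-(w - w[half])**2 / (2.0 * sigma2**2))
        h[i] = phi @ g
    return h

f = np.array([0.0] * 8 + [1.0] * 8)   # a step edge at indices 7-8
h = detect_edges_1d(f)
print(np.round(h, 3))
```

The response is strong around the step and near zero in the homogeneous regions; the zero padding can cause a spurious response at the right end of the signal, which a practical implementation would handle with edge-replicating padding.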

3 Two Dimensional Approach

In the 2D approach, the mask must not be defined only in relation to the horizontal direction, but also considering the vertical and diagonal directions. With that in mind, we now have to deal with a matrix of size \(m \times n\). However, our proposal is quite flexible, because we just have to transform the 2D mask of size \(m \times n\) into a 1D vector of size \(m \cdot n\). This way, our matrix \(\mathbf{X}\) can be defined by:

$$\begin{aligned} \mathbf{X} = \left[ \begin{array}{ccccccc} 0_{11} & 0_{12} & \ldots & 1_{1\lceil (m \cdot n)/2 \rceil } & \ldots & 0_{1(m \cdot n)-1} & 0_{1(m \cdot n)}\\ 1_{21} & 1_{22} & \ldots & 1_{2\lceil (m \cdot n)/2 \rceil } & \ldots & 1_{2(m \cdot n)-1} & 1_{2(m \cdot n)} \end{array} \right] \end{aligned}$$
(10)

thereby, a mask with 8-connected neighbors (\( m = 3 \) and \( n = 3 \)) is turned into a vector of size 9, in which the point currently being processed is located at position 5. Thus, for our proposal to become two-dimensional, this is the only modification to be made. Moreover, the Gaussian filter must also be extended to a two-dimensional form, which can be estimated by:

$$\begin{aligned} G(x, y, \sigma ) = \exp \left( \dfrac{-(x^2 + y^2)}{2\sigma ^2} \right) \end{aligned}$$
(11)
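The 2-D extension can be sketched as below: a sampled, normalized 2-D Gaussian mask (Eq. 11), plus the flattening of an \(m \times n\) patch into a vector of length \(m \cdot n\) (function name and example patch are ours):

```python
import numpy as np

def gaussian_kernel_2d(m, n, sigma):
    """Sampled, normalized 2-D Gaussian mask (Eq. 11)."""
    y = np.arange(m) - m // 2
    x = np.arange(n) - n // 2
    yy, xx = np.meshgrid(y, x, indexing='ij')
    G = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return G / G.sum()

# An m-by-n patch is flattened to a vector of length m*n, so the same
# 1-D training patterns apply (Eq. 10); for a 3x3 mask the central
# pixel ends up at position 5 (1-based), i.e., index 4 of the 9-vector.
patch = np.arange(9.0).reshape(3, 3)
vec = patch.ravel()
G = gaussian_kernel_2d(3, 3, sigma=1.0)
print(vec, G)
```

Row-major flattening keeps the central pixel at the central position of the vector, which is the only property the training patterns of Eq. (10) rely on.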

Regarding the results obtained with the two-dimensional approach, Fig. 3 illustrates the edge detection with different parameters.

Fig. 3.

Edge detection by our proposal using \(m = n = 3\) (b, c, d) and \(m = n = 15\) (f, g, h), and \(\sigma _1 = 1\), with different values for \(\sigma _2=\{0.1,0.05,0.025\}\).

As presented in Fig. 3, only the regions with high-frequency transitions were classified as edge. Furthermore, it is possible to detect lower-frequency regions due to the flexibility of our proposal. Figure 3(c) and (d) depict edge detection results obtained by changing \(\sigma _2\) to 0.05 and 0.025, respectively.

One can see that, as the value of \(\sigma _2\) decreases, the local maxima representing the edges are maximized and more image details are detected. Besides the \(3\times 3\) mask (Fig. 3(b–d)), larger masks can be used, as shown in Fig. 3(f–h). It is noteworthy that the examples use the same mask size for the low-pass filter and the edge detection, which is not necessarily required.

The disadvantage of using very large masks is that, in the smoothing process, the image may lose information and, depending on the application, this information may be critical. The advantage, as can be seen in Fig. 3, is that the local maxima of the edges, i.e., their centers, are more exposed. Such an advantage is handy if we need to find the centers of the edges.

4 Results and Discussion

In this section we briefly compare our proposal to three widespread methods in the image processing literature: (i) Prewitt; (ii) Sobel; and (iii) Canny. We should emphasize that, in relation to these methods, our proposal performs a kind of estimate of the magnitude image. Thus, for a fairer comparison with the Canny method, we only used the calculation of the magnitude image, which is based on two filterings performed with masks estimated from the first-order derivative of the Gaussian in the horizontal and vertical directions; we then compute the gradient from these two images.

In order to improve our evaluation and discussion when comparing our proposal to the state of the art, we employed four statistical measurements: accuracy (ACC), F-Measure (\(F_\beta \)), PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity) [12].
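Three of these measurements can be sketched in a few lines (SSIM is more involved and is usually taken from a library); the function name, the 0.5 threshold, and the toy edge maps are our assumptions:

```python
import numpy as np

def acc_fbeta_psnr(pred, gt, beta=1.0, max_val=1.0):
    """ACC, F-beta, and PSNR between a predicted edge map and a
    ground-truth edge map (arrays in [0, 1]; thresholded at 0.5
    for the classification metrics)."""
    p = pred >= 0.5
    g = gt >= 0.5
    tp = np.sum(p & g); tn = np.sum(~p & ~g)
    fp = np.sum(p & ~g); fn = np.sum(~p & g)
    acc = (tp + tn) / p.size
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    fbeta = (1 + beta**2) * prec * rec / max(beta**2 * prec + rec, 1e-12)
    mse = np.mean((pred - gt) ** 2)
    psnr = 10 * np.log10(max_val**2 / mse) if mse > 0 else np.inf
    return acc, fbeta, psnr

pred = np.array([0.0, 1.0, 1.0, 0.0])   # toy predicted edge map
gt   = np.array([0.0, 1.0, 0.0, 0.0])   # toy ground truth
a, fb, ps = acc_fbeta_psnr(pred, gt)
print(a, fb, ps)
```

Here the ground truth is the binarized output of Prewitt, Sobel, or Canny, following the comparison protocol described above.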

Table 1. Results of our proposed method with different configurations (\( \sigma _1 \) and \( \sigma _2 \)) for the metrics ACC, \(\text {F}_{\beta }\), PSNR and SSIM, adopting Prewitt (P), Sobel (S) and Canny (C) as Ground Truth (GT).
Table 2. Results of our proposed method with different configurations (\( m, n, \sigma _1 \) and \( \sigma _2 \)) for the metrics ACC, \(\text {F}_{\beta }\), PSNR and SSIM, adopting Prewitt (P), Sobel (S) and Canny (C) as Ground Truth (GT).

Tables 1 and 2 show a comparison of our results with the Prewitt, Sobel and Canny methods. The comparison is carried out by adopting the output of each of these methods as ground truth, so the metrics can be computed. The results show the configuration of our proposal for analysis. The parameter values used are \( m = \{3, 9\}, n = \{3, 9\}, \sigma _1 = \{1, 10\} \) and \( \sigma _2 = \{0.025, 0.05, 0.075, 0.1, 0.25\} \). We used an image dataset with 12 images and computed the mean and standard deviation for each metric and each configuration.

Observing Tables 1 and 2, it is possible to notice the flexibility of our proposal against Prewitt, Sobel and Canny. We obtain results that, depending on the initial configuration, may or may not be close to the results of the other methods. In Table 1, where the mask is \(3\times 3\) (\( m = n = 3 \)), regarding SSIM, we highlight a greater similarity to the Prewitt method when \( \sigma _1 = 1 \) and \( \sigma _2 = 0.1 \). With respect to the PSNR metric, the noise is less significant compared to the Sobel method when \( \sigma _1 = 1 \) and \( \sigma _2 = 0.1 \). However, when \( \sigma _2 = 0.25 \), we get closer results for ACC and F\(_{\beta }\), except for the Canny method, which concentrated the best results for ACC and F\(_{\beta }\) when \( \sigma _2 = 0.1 \).

Observing Table 2, where the mask is \(9\times 9\) (\( m = n = 9 \)), we see lower similarity scores with respect to the resulting images. This is an advantage of our proposal, since it imposes no restriction on its magnitude image: in all cases the results are distinct, whether or not the discrepancy is large.

Regarding the algorithm execution time, we stored the time the model took to train, group the data, and process the information. In our simulations we used Matlab version R2014b. For the \(3 \times 3\) mask, our simulations took 2.15 s on average. For the \(9 \times 9\) mask, the average time was around 3.5 s. This is expected, because a \(9 \times 9\) mask results in a feature vector of size 81, against 9 when a \(3 \times 3\) mask is adopted.

What most distinguishes our proposal from gradient-based methods, and what we consider an advantage in its usability, is its flexibility: it can adapt to the needs of the current application, so that feasible results can be generated through parameter tuning. This flexibility comes especially from the variation of \( \sigma _2\) as well as the variation of the mask size. This ability to cover many possible outcomes opens new possibilities for different applications in the imaging context.

5 Conclusion

This paper presented a neural network capable of estimating an adaptive high-pass filter for signal processing, based on a parametric approximation performed in the hidden layer. Moreover, our proposal was employed for detecting edges in images. With its flexibility, one can obtain different results, allowing its use in various kinds of applications. Two key features of its flexibility are: (i) the ability to estimate the training patterns, providing freedom to choose the size of the mask used in the signal filtering, and (ii) the choice of the \(\sigma _2\) value, which may drive the network toward a more restricted parametric regression, or a less restricted one.

As future work, we intend to create an efficient post-processing method able to determine the central pixel of the edges while maintaining the integrity of the edge segments in the images.