1. Introduction
A time series is a set of observations, each one recorded at a specified time. Time series analysis has been an important branch of both stochastic process theory and mathematical statistics. Time series arise in engineering, science, sociology, and economics, and the theory and methods of time series analysis have been extensively developed and have achieved great success in the modeling and prediction of time series [1].
There are several well-known time series models, such as the autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and autoregressive conditional heteroskedasticity (ARCH) models, which have been proposed for forecasting [1]. There is an extensive literature on forecasting with these models. For example, Metghalchi et al. tested moving average technical trading rules on the NASDAQ (National Association of Securities Dealers Automated Quotations) composite index and showed that moving average rules do have predictive power and can discern recurring price patterns for profitable trading [2]. Li et al. presented an intelligent prediction approach for degradation prognostics of rotating machinery based on an asymmetric penalty sparse decomposition algorithm combined with an autoregressive moving average-recursive least squares algorithm (ARMA-RLS) and a wavelet neural network [3].
Note that all of the data in the models mentioned above are real numbers or vectors. However, in the big-data era, various kinds of complex data have arisen in many fields of science and technology. Among them, interval-valued data and, more generally, set-valued data have received great attention in recent years, since they can be viewed, in some sense, as an extension of incomplete, missing, or censored data. Examples include an interval representing a person's salary range, an interval representing a person's blood pressure range, the temperature range on a given day in some city, and data represented by a complex medical image, a symmetric color picture, etc. In system decision-making, we also face perception mixed data, such as linguistic data, whose values are not numbers but words or sentences of some language; some of these can be represented by nearly symmetric fuzzy numbers. We refer to such data as fuzzy data.
Accordingly, stochastic processes with set-valued members have received attention in the recent literature. Li et al. [4] considered fuzzy set-valued Gaussian processes and Brownian motions, extending the classical Gaussian stochastic process to the case where the process elements take values in fuzzy sets, and introduced a new fuzzy Brownian motion for the first time. Bongiorno [5] presented a note on this Brownian motion, pointing out that the fuzzy set-valued Brownian motion of [4] can be handled by an n-dimensional vector-valued Wiener process, since the expectation of the fuzzy set-valued element is a constant. Furthermore, Wang et al. [6] first proposed an interval-valued stationary time series modeling approach, in which an interval-valued p-order autoregressive (AR(p)) model was introduced. Note that they did not consider stochastic processes or time series with linguistic data. These works suggest that time series modeling [1] could be extended to linguistic data (perception mixed data) while retaining the ordinary stochastic correlation between the elements of the time series.
Interval-valued and linguistic-valued data carry more information than real-valued data. For instance, finance and economics are far from free of imprecision and uncertainty. When reducing economic quantities and magnitudes to numbers and mathematical concepts, we have to deal with a wealth of vague terms (confidence, fear, instability, risk, etc.) that are meaningful to us. For example, a set of stocks with small volatility or countries with high unemployment rates are not crisp descriptions, since the words “small” and “high” are vague in meaning, reflecting the observers' judgments of the observed objects based on their own perception. Likewise, investors' expectations of future returns on investments are often stated in linguistic form, such as “very optimum”, “around the values of last year’s return”, “may at least cover the cost”, etc. A typical feature of linguistic data is fuzziness; therefore, fuzzy sets are often recommended for modeling linguistic data. Using a fuzzy set to model linguistic data is meaningful: a fuzzy set is not only easier to use than words in mathematical modeling, but it also captures more of the empirical judgment and emotional reaction of humans than real numbers do.
In the literature, the extension of time series models to linguistic data (fuzzy data) has developed along two lines: parametric methods and nonparametric methods.
When the parametric method is applied, the form of the original time series model is not changed; instead of real-valued data, linguistic data and their arithmetic operations are used. Such work can be found in Wang [7], which primarily proposed a special conceptualized p-order autoregressive model AR(p) (where p is a positive integer) with n-dimensional fuzzy data [8] in the framework of set-valued stochastic processes, using the semi-linear structure of the space of all fuzzy sets and the expectation, variance, and covariance of fuzzy random variables [9] to construct the model. However, the estimation of that model was not addressed. Wang [10] further noted that the earlier autoregressive model had some deficiencies, complemented it with an ARMA model, and proposed an initial application to financial market forecasting. Jung et al. [11] considered a unified approach to the asymptotic behavior of parameter estimation for an AR(1) model of a fuzzy number-valued time series, together with a brief outline of time series modeling with fuzzy number inputs and fuzzy number outputs. An illustrative example of the AR(1) model with fuzzy numbers is the Dow Jones Industrial Average (DJI) index time series [11]. A significant advantage of the parametric methods is that the natural relationships between the elements of the time series are maintained and investigated during modeling.
When the nonparametric method is applied, we not only change the form of the original time series models but also replace the original data with linguistic data (fuzzy data). There are a number of studies on this topic, known as fuzzy time series. For instance, in [12,13], fuzzy time series were first proposed as series whose elements take linguistic or vaguely described values, with the elements linked to each other by fuzzy logical relationships that must be specified subjectively by a human. Various improvements and developments of these fuzzy logical relationship-based fuzzy time series were given in [14,15,16,17] and elsewhere, where more effective forecasting models, such as two-factor high-order fuzzy time series forecasting and deterministic vector long-term forecasting, were proposed. Fuzzy logical relationship-based modeling methods rely largely on intelligent computing, such as fuzzy relational equations and approximate reasoning. Such soft computing methods may capture the fuzzy information in the elements of the time series well; however, they completely ignore the natural stochastic relationships between the elements, which may lead not only to biased forecasts but also to neglect of the mathematical statistical properties of the time series.
Our main interest is in the parametric methods reviewed above for modeling time series with linguistic data. Several fundamental problems, such as parameter (model) estimation and the asymptotic properties of the estimators, remain to be investigated. For instance, parameter estimation has been carried out only for the AR(1) and ARMA(1,1) models with fuzzy data [10,11], and the asymptotic (consistency) properties of the estimators have been obtained only for the AR(1) model with fuzzy data [11]. In this study, building on previous works [7,10,11], we first investigate the asymptotic properties of the estimators for a (1,1)-order autoregressive moving average model, ARMA(1,1), based on linguistic data (fuzzy data), and then use the justified ARMA(1,1) model to forecast the Hang Seng Index (HSI) in a simulation analysis.
This article proceeds as follows. Section 1 discusses related previous work and some open problems. Section 2 introduces the basic concepts of fuzzy sets, arithmetic operations on fuzzy sets, correlation and independence, and the expectation, Fréchet variance, and covariance of fuzzy random variables under the metric proposed by Näther [9]. Section 3 describes the asymptotic properties of a special ARMA model for fuzzy data-valued time series with standardized error terms and presents an extension of classical causality results for ARMA models. Section 4 details an empirical analysis of the proposed model for modeling and predicting the linguistic monthly HSI time series. Section 5 concludes the article.
3. A Fuzzy Set Valued ARMA Model Based on a Standardized Process
Based on the concepts of Fréchet covariance and Fréchet linear correlation for FRVs defined in the previous section, we consider autoregressive models for fuzzy data-valued time series. In a real-world situation, such a process may be perceived as a time-indexed sequence of approximate investment returns. Observers' successive evaluations of stock prices may also form such a time series. Note that an example of an autoregressive sequence of one-dimensional FRVs and the related correlation function was proposed earlier by Feng et al. [26].
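Definition 2. Let $\{X_t, t \in \mathbb{Z}\}$ be a process of FRVs, valued in the space of fuzzy sets introduced in Section 2, with second order under the metric introduced in Section 2. If t denotes the time points, then $\{X_t, t \in \mathbb{Z}\}$ is said to be a fuzzy data-valued time series. The Fréchet covariance function of the process is defined by $C(s, t) = \operatorname{Cov}(X_s, X_t)$, $s, t \in \mathbb{Z}$. The process is said to be wide-sense (weakly) stationary if $E X_t$ and $C(t + h, t)$ are independent of $t$ for every $h \in \mathbb{Z}$, where $\mathbb{Z}$ is the set of all integers.
Note that for a wide-sense stationary fuzzy data-valued time series, the Fréchet covariance function can simply be denoted by $C(h)$, since $C(t + h, t) = C(h, 0)$ for all $t \in \mathbb{Z}$.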
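To make the Fréchet covariance function at lag h concrete, the following is a minimal Python sketch for one-dimensional triangular fuzzy data. It assumes that the metric is the L2 distance between support functions with equal weight 1/2 on the directions u = −1 and u = +1 and Lebesgue measure on α ∈ [0, 1]; this weighting is an assumption standing in for the metric of Section 2, which may be weighted differently. The names `inner`, `sample_mean`, and `sample_autocov` are hypothetical. Because the support function of a triangular number (m, l, r) is linear in (m, l, r), the inner product reduces to a closed-form bilinear expression, and the sample autocovariance (in one common form with 1/N normalization, assumed here) is computed from deviations from the componentwise sample mean.

```python
from typing import List, Tuple

# (core m, left spread l, right spread r) of a 1-D triangular fuzzy number
Tri = Tuple[float, float, float]


def inner(x: Tri, y: Tri) -> float:
    """Assumed L2 inner product of support functions.

    With t = 1 - alpha, s(+1, alpha) = m + r*t and s(-1, alpha) = -m + l*t.
    Both directions get weight 1/2 and t is integrated exactly over [0, 1].
    The result is bilinear in (m, l, r), so it also applies to componentwise
    differences, which represent differences of support functions.
    """
    mx, lx, rx = x
    my, ly, ry = y
    return (mx * my
            + (mx * (ry - ly) + my * (rx - lx)) / 4.0
            + (rx * ry + lx * ly) / 6.0)


def sample_mean(xs: List[Tri]) -> Tri:
    """Minkowski average; for triangular numbers it is the componentwise mean."""
    n = len(xs)
    return (sum(x[0] for x in xs) / n,
            sum(x[1] for x in xs) / n,
            sum(x[2] for x in xs) / n)


def sample_autocov(xs: List[Tri], h: int) -> float:
    """Sample Fréchet autocovariance at lag h (1/N normalization, assumed)."""
    n = len(xs)
    mean = sample_mean(xs)
    dev = [(x[0] - mean[0], x[1] - mean[1], x[2] - mean[2]) for x in xs]
    return sum(inner(dev[t + h], dev[t]) for t in range(n - h)) / n
```

For a constant series every lag gives 0, and for cores alternating 1, 3, 1, 3 with zero spreads, lags 0 and 1 give 1.0 and −0.75.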
Example 1. Consider a process of Gaussian FRVs [27] defined through random vectors that follow an n-dimensional Gaussian distribution with zero mean vector. Its Fréchet covariance function can be computed from real-valued n-dimensional random vectors with a multivariate Gaussian distribution and the classical covariance of random variables. Clearly, a process of Gaussian FRVs is mutually uncorrelated in the sense of the Fréchet correlation if and only if the underlying process of Gaussian random vectors is mutually uncorrelated in the sense of the Fréchet correlation. Note that the Fréchet correlation between two random vectors differs from the conventional correlation of two random vectors in multivariate statistics: the former depends on the Fréchet covariance, whereas the latter depends on the ordinary covariance matrix. Likewise, in this example, the wide-sense stationarity of the process of Gaussian FRVs is equivalent to the wide-sense stationarity of the underlying process of Gaussian random vectors.
In the following, we consider a special error term process, which may help us to propose an applicable ARMA model with fuzzy data in the area of financial data analysis.
Definition 3. ([10]) A process of fuzzy random sets with second order under the metric introduced in Section 2 is said to be a standardized process of FRVs if it satisfies the standardization conditions given in [10]. Obviously, a standardized process of FRVs is wide-sense stationary. A standardized process of fuzzy random sets can sometimes be viewed as a fuzzy counterpart of a white noise process, i.e., as a fuzzy observation of a conventional white noise process: each term of the standardized process can be viewed as a fuzzy observation of the corresponding term of the white noise process, with a prescribed membership value at that term. Note that, in general, this fuzzy observation is not unique, since it depends on the observers’ opinions, and different observers may choose different membership functions.
In the one-dimensional case, we present a standardized process of FRVs based on a real-valued white noise process. However, it is difficult to construct a standardized process of FRVs in the n-dimensional case with n ≥ 2.
Example 2. Let be a white noise process, i.e., . We define a process of FRVs as follows. It is easy to verify that is a standardized process of FRVs.
In the following, we always assume that the standardized process of FRVs can be used for modeling the error term process of a time series model with fuzzy data.
Definition 4. ([10]) A process of FRVs with second order under the metric introduced in Section 2 is said to be a fuzzy set-valued p-order autoregressive (briefly, AR(p) with fuzzy data) process if it is wide-sense stationary and, for any t, satisfies the AR(p) recursion of [10], where the coefficients are real-valued parameters, the error term is a standardized process of FRVs, and p is a natural number.
Definition 5. ([10]) A process of FRVs with second order under the metric introduced in Section 2 is said to be a fuzzy set-valued (p,q)-order autoregressive moving average (briefly, ARMA(p,q) with fuzzy data) process if it is wide-sense stationary and, for any t, satisfies the ARMA(p,q) recursion of [10], where the coefficients are real-valued parameters, the error term is a standardized process of FRVs, and p and q are natural numbers. An ARMA(p,q) process of FRVs is said to be a causal ARMA(p,q) process under this metric if it has a wide-sense stationary solution almost everywhere, i.e., if there exist a series of positive (or negative) numbers and a standardized process of FRVs such that the corresponding linear series in the standardized process converges in probability under the metric and equals the process almost everywhere.
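The recursions in Definitions 4 and 5 combine real coefficients with the addition of fuzzy sets. The following minimal Python sketch shows this arithmetic for one-dimensional triangular fuzzy numbers and iterates an ARMA(1,1)-type recursion, assuming the form X_t = φX_{t−1} ⊕ ε_t ⊕ θε_{t−1} with no constant term; the exact recursion of [10] is not reproduced here, and the names `add`, `scale`, and `arma11_path` are hypothetical. For triangular numbers, addition adds cores and spreads, while multiplication by a negative real number negates the core and swaps the left and right spreads.

```python
from typing import List, Tuple

Tri = Tuple[float, float, float]  # (core m, left spread l, right spread r)


def add(x: Tri, y: Tri) -> Tri:
    """Minkowski (extension-principle) sum: cores add and spreads add."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])


def scale(lam: float, x: Tri) -> Tri:
    """Real scalar times a triangular number; a negative factor swaps the spreads."""
    m, l, r = x
    if lam >= 0:
        return (lam * m, lam * l, lam * r)
    return (lam * m, -lam * r, -lam * l)


def arma11_path(x0: Tri, eps: List[Tri], phi: float, theta: float) -> List[Tri]:
    """Iterate X_t = phi*X_{t-1} (+) eps_t (+) theta*eps_{t-1} for t = 1..len(eps)-1."""
    xs = [x0]
    for t in range(1, len(eps)):
        nxt = add(add(scale(phi, xs[-1]), eps[t]), scale(theta, eps[t - 1]))
        xs.append(nxt)
    return xs
```

For example, starting from (10, 1, 1) with φ = θ = 0.5 and error terms (0, 1, 1), one step gives (5, 2, 2): the spreads accumulate, because fuzzy addition has no inverse that would cancel them.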
Example 3. Let be a wide-sense stationary process of fuzzy random sets with second order, and set . Then, by (5) of Lemma 1, we have ; thus, is wide-sense stationary with fuzzy zero expectation , where is independent of t and not unique.
Lemma 3. ([10]) Let be an AR(1) with fuzzy data, , where is a standardized process of FRVs. Then possesses a wide-sense stationary solution almost everywhere if , and . For the estimation of an AR(1) with fuzzy data based on a sample from a process of FRVs with second order, we have the following.
- (1) If the AR(1) model is causal, then an estimator of the parameter can be , where are the sample-based estimators of the Fréchet covariance , respectively.
- (2) If the AR(1) with fuzzy data is not causal, then we may employ the least squares method proposed in [20] to estimate the parameter.
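Item (1) can be illustrated with a Yule-Walker-type ratio. The minimal Python sketch below assumes that the estimator there is the ratio of the sample Fréchet autocovariances at lags 1 and 0 (the exact formula is not reproduced in this text), and it uses an assumed support-function L2 inner product for one-dimensional triangular fuzzy numbers, with equal weights on the directions u = −1 and u = +1 and Lebesgue measure in α. The names `inner`, `autocov`, and `phi_hat` are hypothetical.

```python
from typing import List, Tuple

Tri = Tuple[float, float, float]  # (core m, left spread l, right spread r)


def inner(x: Tri, y: Tri) -> float:
    """Assumed L2 inner product of 1-D support functions, integrated exactly in alpha."""
    mx, lx, rx = x
    my, ly, ry = y
    return (mx * my
            + (mx * (ry - ly) + my * (rx - lx)) / 4.0
            + (rx * ry + lx * ly) / 6.0)


def autocov(xs: List[Tri], h: int) -> float:
    """Sample Fréchet autocovariance at lag h, using deviations from the componentwise mean."""
    n = len(xs)
    mean = [sum(x[k] for x in xs) / n for k in range(3)]
    dev = [tuple(x[k] - mean[k] for k in range(3)) for x in xs]
    return sum(inner(dev[t + h], dev[t]) for t in range(n - h)) / n


def phi_hat(xs: List[Tri]) -> float:
    """Ratio estimator C(1)/C(0) for a causal AR(1) with fuzzy data (assumed form)."""
    c0 = autocov(xs, 0)
    if c0 == 0.0:
        raise ValueError("series has zero sample variance")
    return autocov(xs, 1) / c0
```

With the 1/N normalization, the Cauchy-Schwarz inequality keeps this ratio within [−1, 1], which matches the causal setting of item (1).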
Now, we apply the least squares estimation method proposed in [20], under the metric considered here, to estimate an ARMA(1,1) model , . Assume that we have observations on the process and that we generate terms of a standardized process, where it is assumed that .
The model can be estimated by minimizing the function on the set . We obtain the least squares estimates of the parameters as Equations (42) and (43), provided that they lie in this set; otherwise, the estimators are not a suitable solution.
If the parameters , then their least squares estimators can be obtained by replacing with , respectively, in the above formulas.
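The least squares idea can be sketched numerically. The following minimal Python sketch is not a reproduction of Equations (42) and (43); it assumes one-dimensional triangular fuzzy data, the support-function L2 inner product with equal weights on u = ±1 and Lebesgue measure in α, a model of the form X_t = aX_{t−1} ⊕ ε_t ⊕ bε_{t−1} with a, b ≥ 0, and generated standardized terms ε_t that are treated as known. Under a, b ≥ 0 the support function of the right-hand side is linear in (a, b), so minimizing the sum of squared distances reduces to 2×2 normal equations. If the solution leaves the region a, b ≥ 0, the sketch returns `None`, in the spirit of the remark that such estimators are not a suitable solution. The names `inner`, `sub`, and `ls_arma11` are hypothetical.

```python
from typing import List, Optional, Tuple

Tri = Tuple[float, float, float]  # (core m, left spread l, right spread r)


def inner(x: Tri, y: Tri) -> float:
    """Assumed L2 inner product of 1-D support functions (equal weights on u = -1, +1)."""
    mx, lx, rx = x
    my, ly, ry = y
    return (mx * my
            + (mx * (ry - ly) + my * (rx - lx)) / 4.0
            + (rx * ry + lx * ly) / 6.0)


def sub(x: Tri, y: Tri) -> Tri:
    """Componentwise difference, representing a difference of support functions."""
    return (x[0] - y[0], x[1] - y[1], x[2] - y[2])


def ls_arma11(xs: List[Tri], eps: List[Tri]) -> Optional[Tuple[float, float]]:
    """Least squares fit of X_t = a*X_{t-1} (+) eps_t (+) b*eps_{t-1} with a, b >= 0."""
    suu = suv = svv = suy = svy = 0.0
    for t in range(1, len(xs)):
        y = sub(xs[t], eps[t])          # target: s_{X_t} - s_{eps_t}
        u, v = xs[t - 1], eps[t - 1]    # regressors: s_{X_{t-1}}, s_{eps_{t-1}}
        suu += inner(u, u)
        suv += inner(u, v)
        svv += inner(v, v)
        suy += inner(u, y)
        svy += inner(v, y)
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        return None                     # regressors (nearly) collinear
    a = (suy * svv - suv * svy) / det   # Cramer's rule for the 2x2 normal equations
    b = (suu * svy - suv * suy) / det
    if a < 0 or b < 0:
        return None                     # outside the region where the model is linear
    return a, b
```

On data generated exactly from the recursion with, say, a = 0.6 and b = 0.3, the sketch recovers these values up to rounding error.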
The asymptotic properties of the least squares estimators for ARMA(1,1) with fuzzy data are as follows.
Theorem 3. Let be an ARMA(1,1) process with fuzzy data, , and suppose that the least squares estimators (42) and (43) exist on under the selected distance, based on a sample . If , and and are uncorrelated for , , and , then the least squares estimators are weakly consistent. In the special case , are consistent.
Proof. From (42), the definition of , and the equality , we have
Replacing with , we obtain . Iterating the above equality, it holds that:
By the assumption, Definition 3, and (34), we have , , , and
Denote the numerator and the denominator of (44) as follows:
From (45), we have and ; thus, we have
It can also be shown that is bounded, since , . By the assumption and Chebyshev’s inequality, it holds that . After computation, it can also be shown that and that is bounded; by the assumption and Chebyshev’s inequality, it holds that
Thus, , i.e., .
Obviously, when , is consistent.
Denote the numerator and the denominator of (43) as follows:
Then, we have , and is bounded; by the assumption and Chebyshev’s inequality, it holds that
Also, we have , and is bounded; by the assumption and Chebyshev’s inequality, it holds that
Thus, , i.e., . ☐
Remark 7. (1) The proposed AR and ARMA models for processes of FRVs extend the autoregressive sequence model proposed by Feng et al. [26]. (2) In the proposed models, the so-called standardized process of FRVs plays an important role, since the causality of the AR(p) and ARMA(p,q) models with fuzzy data is defined in terms of it; however, we have presented an example of a standardized process only in the one-dimensional case. The standardized process is only a special type of error term process.
(3) In the general case, without the second-order restriction on the FRVs, processes of FRVs may not admit standardized error processes, and at most we may write an AR(p) with fuzzy data whose remainder term is only an unexplained remainder process of the addition operation among successive elements of the process; this remainder may no longer be standardized. This general case remains a hard open problem. (4) The metric considered here can also be extended to a general metric, such as ρ, given in the literature [9,25].
4. An Empirical Analysis of the ARMA Models with Fuzzy Data
In this section, we present an empirical analysis of the proposed ARMA model with fuzzy data to demonstrate its effectiveness. To this end, we use the following procedure: Step (1), collect data from a real time series related to the problem of interest; Step (2), generate perception mixed fuzzy data from the real data; Step (3), select and estimate the model based on the fuzzy data; Step (4), make predictions using the estimated model; Step (5), compare the model with other available models.
It is well known that the financial market is a complex, non-stationary, noisy, chaotic, and dynamic system, mainly because a huge amount of information is reflected in it. The main factors include economic conditions, the political situation, traders’ expectations and emotions, catastrophes, and other unexpected events. Stock market data have to be considered within a framework of uncertainty. Therefore, predicting stock market prices and their directions with high accuracy is quite difficult.
We consider the problem of predicting the trends of the monthly HSI by means of ARMA models for linguistic data, where the linguistic data are the perception mixed HSI data.
Step 1
Consider the three time series of monthly HSI close values, low values, and high values from January 2009 to December 2013, shown in Figure 1, where, for simplicity, the original data are divided by 1000. The observations can be expressed simply as finite number series. Since this period contains 60 months, we may let three finite series denote the observations of the close values, low values, and high values from January 2009 to December 2013, respectively, where i is the serial number of the month.
Step 2
Each monthly observation carries very complex information about the random variation of the market and the psychological responses and judgment-based behavior of market participants over a month-long period. To obtain more informative predictions of HSI trends, we suggest using the three values (the close value, low value, and high value) simultaneously. In doing so, the evaluator’s perception ought to be mixed in, and this perception is necessarily vague, since the background information behind the three values is too complicated to make the perception clear. Although predictions can be made with ordinary time series models using a single close value or an average value over the period, such predictions could be much more biased, because a single value carries incomplete information. Therefore, we treat the three values (close value, low value, high value) of each month jointly as linguistic data, i.e., perception mixed financial data, and model them with a simple triangular fuzzy number, i.e., an LR-fuzzy number [9], defined on the fluctuation interval [low value, high value]. According to the expression of an LR-fuzzy number [9], the close, low, and high values of the ith monthly observation form the LR-fuzzy number
$\tilde{X}_i = (m_i, l_i, r_i)_{LR}$, with $m_i = \text{close}_i$, $l_i = \text{close}_i - \text{low}_i$, $r_i = \text{high}_i - \text{close}_i$,
where $m_i$, $l_i$, and $r_i$ denote the core, the left spread, and the right spread of $\tilde{X}_i$, respectively, and its membership function is
$\mu_{\tilde{X}_i}(x) = L\left(\frac{m_i - x}{l_i}\right)$ for $x \le m_i$, $\mu_{\tilde{X}_i}(x) = R\left(\frac{x - m_i}{r_i}\right)$ for $x \ge m_i$,
where L and R denote the shape functions of the LR-fuzzy number. For simplicity, the shape functions are often taken as $L(x) = R(x) = \max(0, 1 - x)$, which yields a triangular fuzzy number. Following this procedure, the linguistic monthly HSI data from January 2009 to December 2013 are obtained and shown in Table 1. (Note that serial numbers 61, 62, ⋯ represent Jan. 2014, Feb. 2014, ⋯, respectively.)
Step 3
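For LR-fuzzy data $\tilde{X}_i = (m_i, l_i, r_i)_{LR}$ with $L(x) = R(x) = \max(0, 1 - x)$, the α-cut is $[\tilde{X}_i]_\alpha = [m_i - l_i(1 - \alpha),\ m_i + r_i(1 - \alpha)]$, $\alpha \in (0, 1]$, since $L^{-1}(\alpha) = R^{-1}(\alpha) = 1 - \alpha$ for these shape functions. Hence, in the one-dimensional case, with directions $u \in \{-1, 1\}$, the support function of $\tilde{X}_i$ is
$s_{\tilde{X}_i}(1, \alpha) = m_i + r_i(1 - \alpha)$, $s_{\tilde{X}_i}(-1, \alpha) = -m_i + l_i(1 - \alpha)$,
and the sample-based Fréchet covariance for the linguistic monthly HSI in Table 1 can be computed from these support functions.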
The wide-sense stationarity of the linguistic monthly HSI time series may be checked approximately through the stationarity of two associated real-valued series. The sample autocorrelation functions of these two series decay geometrically to zero in magnitude, and their sample partial autocorrelation functions are negligible for lags greater than 1. Thus, we may fit an ARMA(1,1) with fuzzy data to the linguistic monthly HSI time series; we prefer ARMA(1,1) because an ARMA model usually performs better than an AR model, although the AR(1) with fuzzy data [11] could also be used here. To estimate the model, a standardized process of FRVs is generated according to Definition 3, based on a generated white noise process, as shown in Table 2.
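The α-cuts and support functions of Step 3, together with the sample autocorrelation used in the stationarity check, can be sketched as follows. This minimal Python sketch assumes the triangular shape functions L(x) = R(x) = max(0, 1 − x), so that L⁻¹(α) = R⁻¹(α) = 1 − α. `sample_acf` is a generic estimator intended to be applied to each of the two associated real-valued series. All names are hypothetical.

```python
from typing import List, Tuple

Tri = Tuple[float, float, float]  # (core m, left spread l, right spread r)


def alpha_cut(num: Tri, alpha: float) -> Tuple[float, float]:
    """Alpha-cut [m - l(1 - alpha), m + r(1 - alpha)] of a triangular number."""
    m, l, r = num
    return (m - l * (1.0 - alpha), m + r * (1.0 - alpha))


def support(num: Tri, u: int, alpha: float) -> float:
    """1-D support function s(u, alpha) = sup{u * x : x in alpha-cut}, u in {-1, +1}."""
    lo, hi = alpha_cut(num, alpha)
    if u == 1:
        return hi
    if u == -1:
        return -lo
    raise ValueError("u must be -1 or +1")


def sample_acf(xs: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations rho(0), ..., rho(max_lag) with 1/N normalization."""
    n = len(xs)
    mean = sum(xs) / n
    d = [x - mean for x in xs]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t + h] * d[t] for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]
```

A geometric decay of `sample_acf` toward zero is the informal signal of stationarity described in the text; a formal check would add confidence bands or a unit-root test.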
For parameter estimation, we assume that this standardized process basically satisfies the conditions of Theorem 3. Applying Equations (42) and (43), the least squares estimators for the ARMA(1,1) model with fuzzy data in Section 3, to the data in Table 1 and Table 2, we obtain the estimated ARMA(1,1) model of the linguistic monthly HSI using Matlab.
Step 4
For simplicity of computation and comparison, we consider only predictions for the first 10 months of 2014. Predicted linguistic monthly HSI values for the 10 months from January 2014 to October 2014 (serial numbers 61 to 70) are obtained using the prediction formula of the estimated model; both the real and the predicted linguistic monthly HSI for these 10 months are shown in Table 3.
Table 3 also gives a direct comparison between the real and the predicted linguistic monthly HSI. The comparison suggests that the forecasting model captures the complex uncertain and imprecise information reasonably well: the linguistic forecasts provide more information than crisp forecasts, so decision makers can consider the best and worst possible situations. On the other hand, the forecasting accuracy of this model could be improved by adjusting the terms of the standardized process.
Step 5
Note that the predicted linguistic monthly HSI in Table 3 gives predictions of the close value, low value, and high value series of the monthly HSI simultaneously. Thus, the real close, low, and high values can each be compared with their predicted counterparts. For instance, the comparison of the close values in Table 4 shows that the predictions for the months numbered 62, 65, 66, 67, 68, and 70 have absolute errors less than 0.632 and relative errors less than 2.74%; the absolute and relative errors of the remaining predictions are listed in Table 4. The low values and high values of the monthly HSI can be compared in the same way.
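The error measures used in Step 5 and Remark 8 can be sketched as below, assuming the usual definitions: absolute error |predicted − real| and relative error |predicted − real| / |real| × 100%, with averages taken over the forecast months. The function name `prediction_errors` is hypothetical, and the numbers in the example are illustrations, not values from Table 4 or Table 9.

```python
from typing import Dict, List


def prediction_errors(real: List[float], pred: List[float]) -> Dict[str, object]:
    """Absolute and relative (%) errors per month, plus their averages."""
    if len(real) != len(pred) or not real:
        raise ValueError("need two non-empty series of equal length")
    abs_err = [abs(p - r) for r, p in zip(real, pred)]
    rel_err = [100.0 * e / abs(r) for e, r in zip(abs_err, real)]
    return {
        "abs": abs_err,
        "rel_pct": rel_err,
        "mean_abs": sum(abs_err) / len(abs_err),
        "mean_rel_pct": sum(rel_err) / len(rel_err),
    }
```

For instance, real values 20.0 and 25.0 with predictions 21.0 and 24.5 give absolute errors 1.0 and 0.5, relative errors 5% and 2%, and averages 0.75 and 3.5%.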
Remark 8. The study of fuzzy set-valued time series modeling is still in its infancy. Only two estimated fuzzy set-valued models, AR(1) and ARMA(1,1) [10,11], are available for model comparison under special conditions. However, for forecasting the linguistic monthly HSI data, the fuzzy set-valued ARMA(1,1) model is evidently better than the fuzzy set-valued AR(1) model. On the other hand, it may not be appropriate to compare fuzzy set-valued time series models directly with classical time series models, since the two kinds of models treat different types of data. In a special case, however, we may compare the close values predicted by the proposed fuzzy set-valued ARMA(1,1) model with those predicted by the ordinary AR(1), AR(2), AR(3), and ARMA(1,1) models through their absolute and relative prediction errors (standard time series techniques verify that these ordinary models can be appropriately applied to the close value series). The comparisons of the AR(1), AR(2), AR(3), and ARMA(1,1) predictions with the real close values are shown in Table 5, Table 6, Table 7, and Table 8, respectively. Finally, Table 9 compares the prediction errors of the fuzzy ARMA(1,1) with those of AR(1), AR(2), AR(3), and ARMA(1,1) for the close values of the monthly HSI. It shows that, on average, our proposed model is more accurate than the four ordinary time series models: the average absolute error of the fuzzy ARMA(1,1), 0.691, is less than the average absolute errors 1.182, 1.194, 1.487, and 1.191 of AR(1), AR(2), AR(3), and ARMA(1,1), respectively. 
Further, the average relative error of the fuzzy ARMA(1,1), 3.03%, is less than the average relative errors 5.03%, 5.08%, 6.29%, and 5.07% of AR(1), AR(2), AR(3), and ARMA(1,1), respectively. The error data in Table 9 also show that for the months numbered 61, 63, 66, 67, 68, and 70, both the absolute and relative errors of the fuzzy ARMA(1,1) are less than those of AR(1), AR(2), AR(3), and ARMA(1,1); thus, for these months our proposed model is more accurate than the four ordinary time series models. For the month numbered 62, the absolute and relative errors of the fuzzy ARMA(1,1) are slightly larger than those of AR(1), AR(2), AR(3), and ARMA(1,1), but the differences are at most 0.086 and 0.34%, respectively; thus, the accuracy of our proposed model is almost the same as that of the four ordinary models. For the month numbered 64, the absolute and relative errors of the fuzzy ARMA(1,1) are larger than those of AR(1), AR(2), AR(3), and ARMA(1,1), with differences of at most 0.454 and 2.059%, respectively; thus, our proposed model is not more accurate than the four ordinary models for this month. For the month numbered 65, the absolute and relative errors of the fuzzy ARMA(1,1) are slightly larger than those of AR(1), AR(2), and ARMA(1,1), with differences of at most 0.048 and 0.207%, respectively, but smaller than those of AR(3); thus, for this month our proposed model is not more accurate than AR(1), AR(2), and ARMA(1,1), but it is more accurate than AR(3). 
For the month numbered 69, the absolute and relative errors of the fuzzy ARMA(1,1) are larger than those of AR(1), AR(2), and ARMA(1,1), with differences of at most 0.547 and 2.39%, respectively, but smaller than those of AR(3); thus, for this month our proposed model is not more accurate than AR(1), AR(2), and ARMA(1,1), but it is more accurate than AR(3). The same comparison can be made for the high values and low values of the monthly HSI.