Cooke, T. E. (1998) - Regression Analysis in Accounting Disclosure Studies.
To cite this article: T. E. Cooke (1998) Regression Analysis in Accounting Disclosure Studies, Accounting and
Business Research, 28:3, 209-224, DOI: 10.1080/00014788.1998.9728910
Accounting and Business Research, Vol. 28, No. 3, pp. 209-224, Summer 1998
properties of ranks and their use in regression analysis, an extension is proposed that provides an alternative
mapping that replaces the data with their normal scores. The normal scores approach retains the advantages of
using ranks but has other beneficial characteristics, particularly in hypothesis testing. Regressions based on untrans-
formed data, on the log odds ratio of the dependent variable, on ranks and regression using normal scores, are
applied to data on the disclosure of information in the annual reports of companies in Japan and Saudi Arabia. It
is found that regression using normal scores has some advantages over ranks that, in part, depend on the structure
of the data. However, the case studies demonstrate that no one procedure is best but that multiple approaches are
helpful to ensure the results are robust across methods.
gers.2 The transformation proposed is achieved by dividing the normal distribution into the number of observations plus one segments on the basis that each segment has equal probability (van der Waerden, 1952, 1953). In effect, the ranks of the data are substituted by scores on the normal distribution and so the normal scores approach may be considered to represent an extension of the rank method.

It should be noted that the various transformations and approaches discussed in this paper are not mutually exclusive. In practice, several approaches can be undertaken to try to ensure that the results are not method-driven but are robust across methods.

The rest of the paper is divided into four sections. In the next section, consideration is given to data examination and transformations. Section 3 considers some of the transformations in the context of disclosure studies. The procedure for Rank Regression is outlined in more detail and the advantages and disadvantages of this approach are reviewed. The use of normal scores is then proposed as an alternative to Rank Regression. Section 4 provides an examination of two small databases on information disclosed in the annual reports of companies in Japan and Saudi Arabia and includes an analysis using standard OLS, Rank Regression, regression using normal scores, and regression using a log odds ratio transformation.3 The two case studies serve to illustrate some of the points made earlier in the paper. Section 5 provides a summary and conclusions.

2. Data examination and transformations

Once data are collected it is important to review them carefully, regardless of the type of analysis that is being proposed. The examination may be undertaken in a number of ways. For example, a histogram of the observed values is one approach in which the values are divided into intervals of equal size and each column shows the number of cases within each interval.4 Such analysis may be extended to identify outliers. The examination provides information on the distribution of observed values.5

Many statistical tests are based on the assumption that the data come from a normal distribution or that a sufficiently large sample is available to appeal to asymptotic normality of the test statistic. For example, in regression analysis an assumption is that the error term is normally distributed. These issues are considered later when a comparison is made between OLS and Rank Regression.

One approach to assessing the normality of the data is the normal probability plot in which the observed values are matched with expected values from the normal distribution. Visual inspection of data can be supported by statistical tests such as standard tests on skewness and kurtosis (Stuart and Ord, 1983),6 the Kolmogorov-Smirnov (K-S) test and its modification by Shapiro-Wilks and Lilliefors.

Transformation of data is useful in regression analysis when the relationship between the dependent and independent variables is inherently non-linear, when the distribution of the errors is not approximately normal, and where there are problems of heteroscedasticity or non-independence of the error terms. Where possible, the determining factor should be based on the underlying theoretical relationship. It should be appreciated that transformations of disclosure measures and independent variables are proxies for underlying constructs and hence, while theory may specify a functional form for the underlying theoretical construct, it is unlikely to hold for empirical proxies. In determining linearity, the important factor is the functional form of the relationship. The parameters must be linear so that the independent variables can be transformed to produce a linear model.

There are other circumstances when transformations may also be considered. For example, Iman and Conover (1979: 500) have argued that ‘the rank transform approach has an obvious advantage when the dependent variable is a monotonic function of the independent variable(s) and this monotonic relationship is non-linear in nature’. However, when non-linear monotonic relationships constitute a problem it is possible and sometimes desirable to undertake transformations of the data other than by using ranks.

Given normality and independence of the error term, the F-statistic can be used since large F-val-

2 It is recognised that the mapping could be on to an alternative distribution, but in the context of explaining disclosure scores in corporate annual reports the normal distribution is required when linear regression is considered an appropriate tool of analysis.
3 I am grateful to Jawaher Al Modahki who allowed me to use some of the raw data she collected on Saudi Arabia as part of her doctoral thesis that she successfully completed in 1996. Rank Regression and use of normal scores in regression did not form part of her doctoral thesis.
4 A modification of this approach is the stem-and-leaf plot, providing additional information to the histogram.
5 An alternative or additional approach is the boxplot which is useful in summarising the distribution of the observed values and identifying outliers.
6 A check of the third and fourth moments against the moments for the normal distribution is a useful approach because ‘...the effect of lack of normality makes itself felt most often through these measures’ (Zaman, 1996:181).
gitimately transformed, where necessary, and used in regression analysis. One transformation is to rank the dependent and independent variables.

The rank transform procedure has been stated by Iman and Conover (1979). Given a dependent variable $y$ with $n$ observations, the observations are placed in order and ranked from 1 to $n$ (from smallest to largest). The procedure is to rank both dependent and independent variables so that, with $R(y_i)$ being the rank assigned to the $i$th smallest value of $Y$, each of the independent variables $X_j$ ($j = 1, \ldots, k$) is replaced with its corresponding ranks 1 to $n$. Tied values are conventionally assigned the mean of the ranks for which they are tied.

The regression is undertaken on the ranks. For example, a bivariate relationship expressing $R(y_i)$ in terms of $R(x_i)$ would result in a regression of the form $R(y_i) = \alpha + \beta R(x_i)$. Given that the OLS regression line passes through the mean point $(\overline{R(x)}, \overline{R(y)}) = ([n+1]/2, [n+1]/2)$, conventional ordinary least squares may be applied, giving the estimated values of the coefficients as $\hat{\alpha} = [(n+1)/2](1 - \hat{\beta})$ and $\hat{\beta} = 1 - 6\sum (R(y_i) - R(x_i))^2 / [n(n^2 - 1)]$. Where no ties exist, Spearman's Rank correlation coefficient (rho) is the Pearson correlation coefficient applied to ranks and can be found as:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \qquad (1)$$

where $d_i = R(y_i) - R(x_i)$. (In the presence of ties this formula is merely approximate.)

For the case of more than one independent variable the multiple regression is found by fitting:

$$R(y_i) = \alpha + \beta_1 R(x_{1i}) + \beta_2 R(x_{2i}) + \cdots + \beta_k R(x_{ki}) + \varepsilon_i \qquad (2)$$

by least squares.

In effect, the Rank Regression specification shown above is an application of standard multiple regression and, as such, the data must fulfil two main conditions for hypothesis testing: normality of the errors and constant variance (homoscedasticity). Appropriate tests to assess the hypothesis that the observations on the dependent variable are normally distributed have been indicated in the previous section.

The homoscedastic assumption may be assessed by visual inspection of the residuals, or by using a specific test such as the Goldfeld-Quandt test, the Breusch-Pagan test or the White test (see Kennedy, 1992: 117-119).8 It follows from these assumptions that the regressors $x_j$ and the disturbance term are statistically independent, implying $\operatorname{cov}(x_j, \varepsilon) = 0$.

Alternatively, the regression model may be specified in terms of the conditional mean of the dependent variable $E(y \mid x_1, \ldots, x_k)$ which is determined conditionally on observations on the independent variables. The regression model assumes:

$$E(Y_i \mid x_{1,i}, x_{2,i}, \ldots, x_{k,i}) = \alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i} \qquad (3)$$

and for homoscedasticity,

$$\operatorname{Var}(Y_i \mid x_{1,i}, \ldots, x_{k,i}) = \sigma^2$$

In order to derive equation (3) the joint distribution of $Y$ and vector $X$ must be known, but to estimate the parameters the exact form of the joint distribution is not required. In order to perform tests of hypotheses about the parameters, the form of the distribution is required. It is commonly assumed that the joint distribution of $Y$ and the independent variables $X_j$ is multivariate normal, though there are other joint distributions which also have the property that the conditional mean of $Y$ given the $X_j$ is linear in $X_j$, for example the Pareto distribution.9

In practice, estimators of the parameters in linear regression, under the assumption of joint normality of $Y$ with the $X$s, have been found to be quite robust even when the normality assumption does not quite hold. In using Rank Regression the requirement of random samples still holds, but the requirement of a normal distribution of $Y$ does not.

3.2. Advantages/disadvantages of Rank Regression in accounting research

In the accounting literature, rank transformations of the residuals and forecasts from a linear regression model were used by Beaver et al. (1979) in assessing the relationship between unexpected earnings and risk-adjusted returns. As a technique it did not become particularly popular, but has been used recently by Cheng et al. (1992) to evaluate the specification of the cross-sectional OLS model that related unexpected earnings to risk-adjusted security returns. Cheng et al. (1992) argued that the rank transformation is invariant to power transformations that preserve order.

In other words, it is not necessary to standardise, log or undertake any power transformation or any monotonic transformation because they result in the same assignment of ranks. Rank transformations are also relatively insensitive to outliers. They found that '.. the use of power or rank transformation of the forecast error variable produces a substantial improvement in $R^2$' (Cheng et al. 1992: 589). The appropriateness of using $R^2$ in such circumstances is considered later in the paper.

8 In a time series context ARCH or GARCH models of heteroscedasticity may be appropriate.
9 Note that in the classical linear model, the experimental values of the independent variables are fixed, whereas in the linear Regression model they are the result of a random process.
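The rank transform procedure and equation (1) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`average_ranks`, `spearman_rho`, `rank_regression`) and the six data points are hypothetical and do not come from the paper. It assigns tied values the mean of their ranks, computes rho from equation (1), and fits the bivariate regression of $R(y_i)$ on $R(x_i)$ by ordinary least squares so that the slope can be compared with rho when there are no ties.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Equation (1): 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((b - a) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_regression(x, y):
    """OLS of R(y) on R(x) for the bivariate case; returns (alpha_hat, beta_hat)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rx) ** 2 for a in rx)
    beta_hat = sxy / sxx
    alpha_hat = mean_ry - beta_hat * mean_rx
    return alpha_hat, beta_hat


# Hypothetical data (not from the paper): y is non-linear in x, with one order swap.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 8.0, 27.0, 64.0, 250.0, 216.0]
alpha_hat, beta_hat = rank_regression(x, y)
rho = spearman_rho(x, y)
```

With no ties the fitted slope coincides with rho and the intercept satisfies $\hat{\alpha} = [(n+1)/2](1 - \hat{\beta})$, as stated in the text.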
In 1993, Lang and Lundholm used rank transformations in their paper on cross-sectional determinants of analysts' ratings of corporate disclosures. They argue that the technique of Rank Regression has value when the theoretical relationship between the dependent and independent variables is not known but monotonic.10 Rank Regression is also useful when the relationship between the dependent and independent variables is not strictly linear and there is no theoretical basis for suggesting a relationship between Y and X.

Wallace et al. (1994) used Rank Regression in their study of the relationship between the comprehensiveness of corporate annual reports and firm characteristics in Spanish listed companies. A sample of 50 companies was selected to produce a disclosure index and 60% of the variability of the indexes was explained by nine independent variables. Wallace et al. (1994: 47) suggest that:

'... there is no theoretically correct way of describing the association between the dependent and the explanatory variables. In such a circumstance, Lang and Lundholm (1993) have suggested the use of rank (OLS) regression as a powerful method of coping with data sets with non-linear but monotonic relations between dependent and independent variables. If a dependent variable changes in just one direction (either up or down) as the explanatory variable increases (i.e. if the relationship between them is monotonic) a higher-ranked independent variable will correspond to a higher-ranked variable, regardless of the precise relation between the two unranked variables.'

A similar approach was adopted in Wallace and Naser (1995). As well as these advantages Rank Regression is conceptually simple, preserves order, and is effective in modelling monotonic relationships where outliers are a serious problem.

A weakness of Rank Regression is that it is difficult to interpret $\beta_i$ as the effect on $Y_i$ of a marginal increase of 1 in $X_i$. For values of $\beta_i$ of -1 or +1, $\beta_i$ does have a definite interpretation and for zero there is, of course, no association. However, within the range -1 to +1, ignoring zero, interpretation is difficult. For example, if in a sample Rank Regression $\beta_i = 0.7$ then an increase of one in the rank of the right-hand-side variable increases the rank of the left-hand-side variable by 0.7, but the ranks of the left-hand-side variable are integers. If the rank of the left-hand-side variable is estimated by the regression to be 8.7 does this imply the value of the left-hand-side variable is close to the value that has rank 9? An additional assumption of (linear) interpolation between the values of observations of ranks 8 and 9 is required in order to proceed to estimate $Y_i$.

In the case of a bivariate Rank Regression, the Spearman correlation coefficient may be tested for significance using an exact test for which tables are published. In the case of multiple regression, one of the main disadvantages of Rank Regression is that of testing the significance of the estimated coefficients.

To undertake statistical tests of a hypothesis there needs to be knowledge of the distribution of the dependent variable, Y, or the joint distribution of Y and the independent variables. The form of the distribution, for samples of moderate size, determines the type of statistical test that can be performed, and without knowledge of the distribution the correct significance levels cannot be determined. Given (joint) normality of Y and X in bivariate regression, the analysis of variance F-test may be used to test the null hypothesis that there is no linear relationship between the two variables. In testing the hypothesis about $\beta$ in the bivariate case the t test of $\beta$ ($H_0$: $\beta = 0$), or equivalently of r, is the same as the F test. In the multivariate case where Y is normally distributed, the F statistic tests the null hypothesis that all the coefficients except the constant term are zero, i.e. $\beta_1 = \beta_2 = \cdots = \beta_k = 0$.11

Since ranks are distribution-free, testing for significance using the F and t-tests is not appropriate. An additional concern with Rank Regression is that the error structure cannot be normal and the mapping of individual observations to ranks is a somewhat arbitrary transformation. Another feature of using ranks is that the data after transformation are ordinal rather than interval and therefore the tests are effectively non-parametric and as such are weaker than parametric tests. This may be important when the sample size is small, a characteristic of many disclosure studies.

Siegel and Castellan (1988) point out that parametric tests are better than non-parametric tests when the assumptions of a parametric statistical model, in terms of the data, are met. This is because the power-efficiency of the parametric test is much greater than for non-parametric tests. In the context of disclosure studies where data collection is often onerous, parametric tests have obvious advantages.

In summary, Rank Regression has a number of advantages as well as some inherent weaknesses.

10 An alternative approach is non-parametric regression.
11 There is a disagreement in the statistical literature about violations of the regression assumptions. For example, Kerlinger and Pedhazur (1973) argue that OLS is 'robust' to violations whereas Bibby (1977) maintains that violations render the technique almost worthless. In practice, the F-test for equality of variances is sensitive to departures from normality, although t-tests are less sensitive.
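The interpretation problem raised above, a fitted rank of 8.7 falling between the observations ranked 8 and 9, can be made concrete with a small sketch of the linear interpolation assumption. The helper name `value_at_rank` and the ten disclosure scores are hypothetical and serve only as an illustration; the paper does not prescribe a particular routine.

```python
import math


def value_at_rank(sorted_values, fitted_rank):
    """Map a fractional fitted rank back to the original scale by linear
    interpolation between the observations holding the neighbouring ranks."""
    n = len(sorted_values)
    if not 1 <= fitted_rank <= n:
        raise ValueError("fitted rank must lie between 1 and n")
    lower = math.floor(fitted_rank)   # rank just below (or equal to) the fitted rank
    upper = min(lower + 1, n)         # rank just above, capped at n
    fraction = fitted_rank - lower
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + fraction * (high_value - low_value)


# Hypothetical disclosure scores, sorted so that position k holds rank k + 1.
scores = [0.10, 0.15, 0.18, 0.22, 0.25, 0.30, 0.34, 0.40, 0.50, 0.62]
estimate = value_at_rank(scores, 8.7)  # lies between the rank-8 and rank-9 values
```

In this illustration the estimate sits 70% of the way from the rank-8 value (0.40) to the rank-9 value (0.50), which is only meaningful if the linear interpolation assumption is accepted.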
Consequently, the use of normal scores is advocated in this paper as an additional approach to the use of ranks. Normal scores effectively extend the rank approach to eliminate some of the weaknesses while retaining the advantages.

3.3. Development of the normal scores approach

An alternative to Rank Regression and other data transformations when non-linearity is a problem is proposed here and is based on normal scores. The transformation proposed is from actual observations to the normal distribution by dividing the distribution into the number of observations plus one regions on the basis that each region has equal probability. This method is referred to as the van der Waerden approach (van der Waerden, 1952, 1953).12 In effect, the ranks are being substituted by scores on the normal distribution and so the normal scores approach may be considered to represent an extension of the rank method. For example, if there are six observations the normal distribution would be divided into seven equally probable parts so that the original values are replaced by normal scores (here -1.0676, -0.5659, -0.1800, 0.1800, 0.5659, 1.0676) rather than the ranks 1, 2, ..., 6.13 The regression analysis would then proceed using the normal scores as the dependent variable. In addition, continuous independent variables may also be transformed to normal scores.

In 1938, Fisher and Yates first suggested the replacement of the original observations in standard normal theory tests. Tables of normal scores were developed for different size samples by David et al. (1968), Harter (1961, 1969) and Owen (1962).14 The proposal suggested here is to utilise knowledge about normal scores to transform regression variables. The main advantage of replacing the ranks by normal scores is that the resulting tests would have exact statistical properties because (a) significance levels can now be determined, (b) the F and t-tests are meaningful and (c) the power of the F and t-tests may be used. In addition, the regression coefficients derived using normal scores are meaningful, whereas $\beta_i$ from Rank Regression is difficult to interpret for most values.

The Appendix shows the derivation of the normal scores measure and its application to test the null hypothesis that all samples are identical.

A further characteristic of normal scores is that the approach offers a means whereby a non-normal dependent variable may be transformed into a normal one and as such offers a further advantage over ranks. A normally distributed dependent variable implies that the errors are also normally distributed by the assumptions of OLS.

The normal scores approach has the same advantages as ranks when there are problems of monotonicity and non-linearity. Normal scores preserve monotonicity in relationships as do ranks, with higher-ranked values of the independent variables being associated with higher-ranked values of the dependent variables (the converse is also true). In addition, when there is non-linearity with data concentration, normal scores disperse that concentration, an advantage also gained when using ranks. Whether normal scores are better than ranks in dealing with problems of monotonicity and non-linearity depends on the structure of the data.

An implication of the use of normal scores is that if an additional case is added to the sample, all the normal scores would need to be recomputed. However, the same will also usually be true of ranks: if an additional case is added to the sample then some of the ranks will change unless the new observation is ranked higher than all existing observations. The probability of such an event is low since the observation would be in the tail of the distribution. Thus, the recomputing disadvantage when new observations are added applies to ranks in many circumstances, such that the disadvantage of normal scores relative to ranks is negligible.

An issue that so far has not been considered is whether the transformation to normal scores should involve only the dependent variable or should relate to both the dependent and independent variables. In most instances it should be the latter, although if the only reason to use the approach is to normalise the dependent variable, then the former might be considered. Changing only the dependent variable implies changing the relationship between the dependent variable and all independent variables.

3.4. Log of the odds ratio

An alternative transformation to those already outlined is the log of the odds ratio. When it is

12 The van der Waerden approach may be summarised as r/(n + 1). Alternatives include Blom = (r - 3/8)/(n + 1/4), Rankit = (r - 1/2)/n, and Tukey = (r - 1/3)/(n + 1/3).
13 These figures were derived using SPSS and represent one approach to deriving normal scores. An alternative approach would be based on expected values, such that in this case the normal distribution would be divided into six parts and the normal score would be taken as the expected value of each part. Thus, the normal score for the first observation may be calculated as follows:
$$u_1 = E(X_1) = \int_{-\infty}^{c} x f(x)\,dx$$
where $f(x)$ is the p.d.f. of $N(\mu, \sigma^2)$ and $P(X < c) = 1/n$. This requires substantial computation but tables are available such as in Lindley and Scott (1984).
14 I am grateful to Dr K. Read who informed me that variations on the suggestion made in this paper, with respect to normal scores, have been used in the medical research literature (e.g. Morgan (1992)).
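The van der Waerden mapping of Section 3.3 and the alternatives listed in footnote 12 can be sketched with the Python standard library. This is an illustrative sketch rather than the SPSS routine used in the paper: the function names and the six input values are hypothetical, and ties are not averaged here (tied observations would simply receive adjacent scores), which is a simplifying assumption.

```python
from statistics import NormalDist


def fractional_rank(r, n, method="van_der_waerden"):
    """Convert rank r (1..n) to a probability using the formulas in footnote 12."""
    if method == "van_der_waerden":
        return r / (n + 1)
    if method == "blom":
        return (r - 3 / 8) / (n + 1 / 4)
    if method == "rankit":
        return (r - 1 / 2) / n
    if method == "tukey":
        return (r - 1 / 3) / (n + 1 / 3)
    raise ValueError(f"unknown method: {method}")


def normal_scores(values, method="van_der_waerden"):
    """Replace each observation by the standard normal quantile of its rank.

    Assumption: no special treatment of ties (tied values get adjacent scores).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for position, index in enumerate(order):
        probability = fractional_rank(position + 1, n, method)
        scores[index] = standard_normal.inv_cdf(probability)
    return scores


# Six hypothetical disclosure scores (not from the paper).
disclosure = [0.41, 0.07, 0.20, 0.33, 0.15, 0.26]
scores = normal_scores(disclosure)
```

Sorted, the six scores agree to about three decimal places with the values quoted in the text for six observations (-1.0676, -0.5659, -0.1800, 0.1800, 0.5659, 1.0676), and the largest observation receives the largest score, so order is preserved as with ranks.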
viation of approximately 0 and 1 respectively (see
Table 1).
16 A variable that takes only positive values cannot be normal since, by definition, a normal variable must lie in the range $-\infty$ to $+\infty$. For a random variable with small variance and relatively large mean, the probability that it falls below zero may be minimal.
17 SPSS provides a significance level up to 0.200 but after that reports >0.200.
18 Where the dependent variable is converted into normal scores necessarily the Kolmogorov-Smirnov statistic will be unity.
Figure 4
Normal plot of the dependent variable (Japan)

Figure 5
Detrended normal plot of the dependent variable (Japan)
Singhvi and Desai, 1971; Spero, 1979; Firth, 1979; Cooke, 1989a). Thus, multiple listed companies are expected to disclose more information than those listed only on the TSE, particularly where the extent of disclosure in foreign stock markets is greater than that in Japan (see for example, Biddle and Saudagaran, 1991).

Industry sector
This variable is included as a dummy in which a distinction is made between manufacturing enterprises and non-manufacturing companies. The expectation based on the literature is that manufacturing companies disclose more information than non-manufacturing enterprises (see Stanga, 1976; Cooke, 1989c), although the direction of the relationship is somewhat uncertain (Wallace et al., 1994).

Borrowing ratio
This ratio measures the proportion of total assets financed by bank borrowings. It has been hypothesised that companies with a higher proportion of their assets financed by bank borrowings will disclose more information in their annual reports to meet some of the needs of their lenders (see Jensen and Meckling, 1976; Myers, 1977; Schipper, 1981; Leftwich et al., 1981; Belkaoui and Kahl, 1978; Malone et al., 1993; and Wallace et al., 1994).
Table 1
Descriptive statistics of the voluntary disclosure indexes - Japan

Data sources:
                                          Untransformed  Log odds ratio     Rank  Normal scores
Mean                                              0.204          -1.461   18.000          0.002
Standard deviation                                0.907           0.602   10.226          0.920
Minimum                                           0.070          -2.590    1.500         -1.732
Maximum                                           0.410          -0.360    35.00          1.915
Skewness                                          0.760           0.183    0.004          0.024
S.E. Skewness                                     0.397           0.398    0.398          0.398
Kurtosis                                          2.404           2.410    1.799          2.478
S.E. Kurtosis                                     0.777           0.778    0.778          0.778
Z test - skewness                                 1.914           0.460    0.010          0.060
Z test - kurtosis                                -0.767           3.098   -1.544         -0.671
Kolmogorov-Smirnov (Lilliefors - signif.)         0.000           0.017   >0.200         >0.200
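The Z test rows of Table 1 appear to be each statistic divided by its standard error, with kurtosis measured relative to the normal value of 3. This reading is an inference from the untransformed column of the table, not a formula stated in the text, and the function names and the 1.96 cut-off (the conventional two-sided 5% critical value of the standard normal) are assumptions made for illustration.

```python
def z_skewness(skewness, se_skewness):
    """Z statistic for skewness: the statistic divided by its standard error."""
    return skewness / se_skewness


def z_kurtosis(kurtosis, se_kurtosis, normal_kurtosis=3.0):
    """Z statistic for kurtosis measured as the departure from the normal value of 3."""
    return (kurtosis - normal_kurtosis) / se_kurtosis


def departs_from_normal(z, critical_value=1.96):
    """Flag a departure at the conventional two-sided 5% level (assumed cut-off)."""
    return abs(z) > critical_value


# Untransformed Japanese data, values taken from Table 1.
z_skew = z_skewness(0.760, 0.397)
z_kurt = z_kurtosis(2.404, 0.777)
```

Rounded to three decimals these give 1.914 and -0.767, the figures reported in the untransformed column; neither exceeds 1.96 in absolute value.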
Table 2
Regression analyses of determinants of disclosure scores by Japanese corporations

Independent variable                    (1)             (2)          (3)             (4)          (5)
Non-manufacturing companies       -0.084687       -0.632783   -13.911085       -1.060822    -1.226670
                                  (-3.354)*       (-4.245)*    (-5.874)*       (-4.628)*    (-5.572)*
Gearing                            0.165240        1.038538     0.324748        1.732437     0.308971
                                   (2.549)*        (2.714)*     (3.168)*        (2.944)*     (2.934)*
Turnover                       3.265241E-08    1.844500E-07     0.322267    2.416326E-07     0.287893
                                   (2.465)*        (2.359)*     (2.304)*         (2.010)     (2.035)'
Multiple listed                    0.059111        0.339115     2.993745        0.476006     0.344652
                                    (1.962)         (1.907)      (0.939)         (1.741)      (1.183)
Constant                           0.156742       -1.710596     9.075517       -0.344931     0.217481
                                   (7.751)*      (-14.328)*      (3.361)        (-1.879)      (1.502)

The upper figures for each variable are coefficients and the lower figures are the t-statistics. The coefficients of the excluded dummy variables are all 1.00000 since they act as benchmarks for the included dummies.
* = significant at the 5% level.
# = $\frac{1}{n}\sum (Y_i - \hat{Y}_i)^2$, see text for further discussion.
Model 1 Regression using untransformed data.
Model 2 Regression using the log odds ratio [ln(x/(1-x))].
Model 3 Regression using ranked data.
Model 4 Regression using a transformed dependent variable to normal scores.
Model 5 Regression using normal scores for the dependent and independent variables.
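The log odds ratio transformation used in Model 2, ln(x/(1-x)), can be sketched as follows. The function name `log_odds` is hypothetical; the check values are the minimum and maximum of the Saudi Arabian disclosure index reported in Table 3, which the sketch reproduces only approximately because the table values are rounded.

```python
import math


def log_odds(x):
    """Log of the odds ratio ln(x / (1 - x)) for a disclosure index strictly between 0 and 1."""
    if not 0 < x < 1:
        raise ValueError("the index must lie strictly between 0 and 1")
    return math.log(x / (1 - x))


# Minimum and maximum untransformed scores from Table 3 (Saudi Arabia).
low = log_odds(0.333)   # Table 3 reports -0.690 for the log odds ratio minimum
high = log_odds(0.953)  # Table 3 reports 3.010 for the log odds ratio maximum
```

An index of exactly 0 or 1 has no finite log odds, so the transformation is only defined for companies that disclose some but not all of the items in the index.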
corporations. The disclosure data on 33 companies is chosen because the dependent variable reveals characteristics of non-normality.

Figure 6 shows the stem-and-leaf plot of the untransformed voluntary disclosure scores. The stem indicates that the disclosure index scores range from 33% to 95%, with the majority of cases having a score of between 80% and 83%. Figure 7 shows the boxplot which reveals a median of 80%, a 25th percentile of 68% and a 75th percentile of 75%. The smallest observed value that is not an outlier is 66% and the largest observed value that is not an outlier is 91%. The extreme scores of 33% and 95% may be considered to be outliers.

Figure 6
Stem-and-leaf plot of the untransformed voluntary disclosure scores (Saudi Arabia)

Frequency   Stem & Leaf
   3.00     Extremes   (0.33), (0.54), (0.54)
   3.00     6 .   668
   3.00     7 *   124
   6.00     7 .   556679
  14.00     8 *   00001111112223
   1.00     8 .   5
   2.00     9 *   01
   1.00     Extremes   (.95)
Stem width: 0.10
Each leaf: 1 case(s)

Figure 7
Boxplot of the untransformed voluntary disclosure scores (Saudi Arabia)

Figures 8 and 9 show the normal plot and detrended normal plot of the dependent variable. The normal plot shows that the observations do not cluster around a straight line and the deviations from a straight line are not randomly distributed around zero. The visual interpretation may be supported by some statistical tests. The descriptive statistics are shown in Table 3. The mean of the untransformed data is 0.766 with a standard deviation of 0.117 and a range from 0.333 to 0.953. Standard tests on skewness and kurtosis reveal problems of skewness and kurtosis of the dependent variable. The non-parametric Lilliefors test is significant at the 5% level, revealing considerable evidence of non-normality. When the log odds ratio of the dependent variable is used the mean is 1.278 with a standard deviation of 0.653. Standard tests on skewness and kurtosis reveal a problem in terms of the latter. Since kurtosis exceeds 3, a smaller proportion of cases falls into the upper tail of the distribution than those of a normal distribution. The Lilliefors test is significant at the 5% level, confirming problems of non-normality.

When the dependent variable is ranked or transformed into normal scores there is no longer any apparent problem of non-normality. Both standard tests on skewness and kurtosis are satisfactory and the Kolmogorov-Smirnov test statistic suggests normality (K-S greater than 5%).19

19 Other transformations in terms of powers, roots and logs were used in an attempt to convert the data to approximately normal. None of these transformations was able to correct for both skewness and kurtosis. Thus, the rank and normal scores approaches have advantages in this instance over other transformations. It should be noted that joint tests of skewness and kurtosis are sometimes thought to be biased (see Doornik and Hendry, 1994).
Figure 8
Normal plot of the dependent variable (Saudi Arabia)

Figure 9
Detrended normal plot of the dependent variable (Saudi Arabia)
Table 3
Descriptive statistics of the disclosure indexes - Saudi Arabia

                                                          Data sources
                                          Untransformed  Log odds ratio     Rank    Normal scores
Mean                                          0.766          1.278         17.000       0.000
Standard deviation                            0.117          0.653          9.669       0.922
Minimum                                       0.333         -0.690          1.000      -1.890
Maximum                                       0.953          3.010         33.000       1.890
Skewness                                     -1.887         -0.411         -0.001      -0.001
S.E. Skewness                                 0.409          0.409          0.409       0.409
Kurtosis                                      8.206          5.870          1.799       2.513
S.E. Kurtosis                                 0.798          0.798          0.798       0.798
Z test - skewness                            -4.619          1.005         -0.002      -0.002
Z test - kurtosis                             6.520          3.596         -1.505      -0.610
Kolmogorov-Smirnov (Lilliefors signif.)       0.000          0.000         >0.200      >0.200
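The Z tests in Table 3 can be checked with a short calculation. The sketch below assumes a sample
of n = 33 (inferred from the maximum rank), assumes the widely used small-sample formulas for the
standard errors of skewness and kurtosis, and assumes kurtosis is reported on the scale where a
normal distribution scores 3, as the text implies. Under those assumptions the standard errors
come out at 0.409 and 0.798, matching the table; the function names are illustrative.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def z_tests(skewness, kurtosis, n, normal_kurtosis=3.0):
    """Return (z for skewness, z for kurtosis), with kurtosis measured so that normal = 3."""
    return skewness / se_skewness(n), (kurtosis - normal_kurtosis) / se_kurtosis(n)


N = 33  # assumed sample size, inferred from the maximum rank in Table 3

# (skewness, kurtosis) pairs copied from Table 3
table3 = {
    "Untransformed": (-1.887, 8.206),
    "Log odds ratio": (-0.411, 5.870),
    "Rank": (-0.001, 1.799),
    "Normal scores": (-0.001, 2.513),
}

print(f"S.E. skewness {se_skewness(N):.3f}, S.E. kurtosis {se_kurtosis(N):.3f}")
for name, (skew, kurt) in table3.items():
    z_skew, z_kurt = z_tests(skew, kurt, N)
    print(f"{name:15s} z(skewness) = {z_skew:7.3f}   z(kurtosis) = {z_kurt:7.3f}")
```

The computed values agree with the tabulated ones to within the rounding of the inputs. An absolute
z value above roughly 1.96 indicates a departure from normality at the 5% level. Because the log
odds skewness is negative, its z statistic computed this way is also negative (about -1.01).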
be significant at the 5% level in the untransformed model, the log odds of the dependent variable
model and when the dependent variable was transformed to normal scores. In the other two models,
ranked transformation and all continuous variables transformed to normal scores, these two
variables were found not to be significant (Table 4). The constant term was found to be significant
in the untransformed model, the log odds of the dependent variable model, and the rank
transformation model. Thus, the coefficients attached to certain variables, and whether they are
significant, depend both on the data and on the type of transformation undertaken.

In the case of Saudi Arabia, the measure of best fit used was again the MSE. The log odds ratio of
the dependent variable had the lowest MSE (0.0119), followed by the normal scores of both the
dependent and independent continuous variables (0.0128), the normal scores of the dependent
variable (0.0283), the MSE based on the ranks with interpolation (0.0381), and finally by the
unadjusted dependent variable (0.1239). The fact that the MSE of the regression based on an
unadjusted dependent variable was substantially different from the others indicates advantages of
other approaches, such as the normal scores method.

In terms of the coefficient of determination, the adjusted R² was found to be very low using
ranked data. Using normal scores for both the dependent and independent variables (Model 5,
Table 4) gives a negative adjusted R² although the coefficient of determination itself was of
course positive. The adjusted R² based on transformation of the dependent variable into normal
scores is highest (0.14626) followed by the untransformed model (0.09932), the log odds ratio
transform of the dependent variable (0.09544), Rank Regression (0.00345), and finally the
regression that uses transforms of both dependent and independent variables to normal scores
(-0.01712).

5. Summary and conclusions

This paper has reviewed some possible transformations that attempt to deal with theoretical
relationships that are not well known or where measures are merely proxies for underlying
constructs. The transformations include Rank Regression in disclosure studies when dealing with
non-linear and linear relationships when such relationships are, by hypothesis, monotonic.
The shortcomings have been identified and an extension based on normal scores proposed. The
normal scores approach, like the rank method, preserves monotonicity and with non-linear
relationships disperses the concentration of data. However, the normal scores method has a number
Table 4
Regression analyses of determinants of disclosure scores by Saudi Arabian corporations

                                                    Model
Independent variable       (1)             (2)             (3)           (4)             (5)
Government investment    8.896046E-04     0.005696       -0.118695      0.012505        0.028172
                         (0.868)         (0.996)         (0.610)       (1.594)         (0.131)
Capital                 -2.41870E-05    -1.36207E-04     -0.275854     -2.29157E-04    -0.225778
                         (-2.194)*       (-2.214)*       (-1.451)      (-2.714)*       (-1.140)
Constant                 0.776554         1.323254       19.671706      0.010300       -9.62725E-04
                         (30.270)*       (9.242)*        (4.868)*      (0.052)         (-0.006)

The upper figures for each variable are coefficients and the lower figures are the t-statistics.
The coefficients of the excluded dummy variables are all 1.00000 since they act as benchmarks for
the included dummies.
* = significant at the 5% level.
# = $\frac{1}{n}\sum (Y_i - \hat{Y}_i)^2$, see text for further discussion.
Model 1 Regression using untransformed data.
Model 2 Regression using the log odds ratio [ln(x/(1-x))].
Model 3 Regression using ranked data.
Model 4 Regression using a transformed dependent variable to normal scores.
Model 5 Regression using normal scores for the dependent and independent variables.
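The two fit measures used to compare the models can be written out directly. The sketch below
assumes the standard definitions: the MSE as the mean of the squared residuals, and the adjusted
R² as 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k regressors. The figures in the
example are hypothetical, since the number of regressors cannot be read off the table as
reproduced here; they only illustrate how a positive R² can coexist with a negative adjusted R².

```python
def mse(actual, fitted):
    """Mean squared error: (1/n) times the sum of squared residuals."""
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    return sum((y - f) ** 2 for y, f in zip(actual, fitted)) / len(actual)


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (constant excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical figures for illustration; k is an assumption, not a value from Table 4.
n, k = 33, 5
for r2 in (0.05, 0.10, 0.25):
    print(f"R2 = {r2:.2f} -> adjusted R2 = {adjusted_r2(r2, n, k):.4f}")

# Hypothetical observed and fitted disclosure scores on the same (untransformed) scale.
print(mse([0.70, 0.80, 0.90], [0.72, 0.79, 0.85]))
```

With these assumed values the adjusted R² is negative whenever R² falls below k/(n - 1). Note also
that MSEs are only comparable across transformations when the fitted values are expressed on a
common scale; that back-transformation step is not shown here.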
duced some differences in terms of significance of variables and magnitude of the coefficients. As
a result, it is possible that biased estimates may occur and substantive errors could go unnoticed
if the data is not analysed appropriately.

The important lesson to be learned from the two case studies is that the success of the
transformations in improving the fit of the model is dependent on the structure of the data. In the
case of the data on Japan, the MSE based on ranked data with interpolation was found to be best. In
the case of the data on companies in Saudi Arabia, it was found that the log odds ratio of the
dependent variable provided the best fit and using normal scores (both models) provided a better
fit than using ranks.

In conclusion, the normal scores approach has theoretical advantages over the use of pure ranks
although when applied to the two case studies there was no overwhelming case for one particular
approach. This emphasises the point that it is important to examine the structure of the data and
the relationships between the dependent and independent variables if errors in interpretation are
to be avoided.

Appendix

Suppose data $X_1, \ldots, X_n$ are ordered from smallest to largest to give the order statistics
$X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$. Following Lehman (1975), the n ordered observations
are mapped to a N(0,1) distribution with density function $\phi(x)$ by taking the expected value
of $Z_{(i)}$ as the Normal Score, as in

$$z_i = E\left[Z_{(i)}\right] \qquad (1)$$

Since the natural estimate in (1) is difficult to compute for a given dataset an alternative Normal
Score proposed by van der Waerden (1952, 1953) is to replace the expectations in (1) by

tutional investors in Australia’. Accounting and Business Research, 11 (Autumn): 259-265.
Beaver, W. H., Clarke, R. and Wright, W. F. (1979). ‘The association between unsystematic security
returns and the magnitude of earnings forecast errors’. Journal of Accounting Research, Autumn.
Belkaoui, A. and Kahl, A. (1978). Corporate Financial Disclosure in Canada. Research Monograph
No. 1 of the Canadian Certified General Accountants Association. Vancouver: Canadian Certified
General Accountants Association.
Bibby, J. (1977). ‘The General Linear Model - A Cautionary Tale’. O’Muircheartaigh, C. A. and
Payne, C. (eds.), The Analysis of Survey Data, 2: Model Fitting. New York: Wiley.
Biddle, G. and Saudagaran, S. (1991). ‘Foreign stock listings: benefits, costs and the accounting
policy dilemma’. Accounting Horizons, September.
Buzby, S. L. (1972). ‘An empirical investigation of the relationship between the extent of
disclosure in corporate annual reports and two company characteristics’. Unpublished doctoral
dissertation, Pennsylvania State University.
Chang, L. and Most, K. S. (1977). ‘Investor uses of financial statements: an empirical study’.
Singapore Accountant.
Cheng, C. S., Hopwood, W. S. and McKeown, J. C. (1992). ‘Non-linearity and specification problems
in unexpected earnings response regression model’. The Accounting Review, 67 (July): 579-598.
Choi, F. D. S. (1973). ‘Financial disclosure and entry to the European capital market’. Journal of
Accounting Research, 11 (Autumn): 159-175.
Cooke, T. E. (1989a). ‘Disclosure in the corporate annual reports of Swedish companies’.
Accounting and Business Research (Spring): 113-124.
Cooke, T. E. (1989b). An empirical study of financial disclosures by Swedish companies. New York:
Garland Publishing.
Cooke, T. E. (1989c). ‘Voluntary corporate disclosure by Swedish companies’. Journal of
International Financial Management and Accounting, 2 (Summer): 171-195.
Cooke, T. E. (1991). ‘An assessment of voluntary disclosure in the annual reports of Japanese
corporations’. The International Journal of Accounting, 26(3): 174-189.
Cooke, T. E. (1992). ‘The impact of size, stock market listing and industry type on disclosure in
the annual reports of Japanese listed corporations’. Accounting and Business Research (Summer):
229-237.
David, F. N., Barton, D. E., Ganeshalingam, S., Harter, H. L., Kim, P. J. and Merrington, M.
(1968). Normal centroids, median and scores for ordinal data. Cambridge: Cambridge University
Press, London.
Doornik, J. A. and Hendry, D. F. (1994). ‘A practical test for univariate and multivariate
normality’. Discussion paper. Oxford: Nuffield College.
Draper, D. (1988). ‘Rank-based robust analysis of linear models’. Statistical Science, May.
Epstein, M. (1975). ‘The usefulness of annual reports to corporate stockholders’. California State
University. Los Angeles: Bureau of Business and Economic Research.
Firth, M. (1979). ‘The impact of size, stock market listing and auditors on voluntary disclosure
in corporate annual reports’. Accounting and Business Research, 9 (Autumn): 273-280.
Fisher, R. A. and Yates, F. (1938). Statistical tables for biological agricultural and medical
research. Edinburgh: Oliver and Boyd.
Fox, J. (1984). Linear statistical models and related methods. New York: John Wiley.
Harter, H. L. (1961). ‘Expected values of normal order statistics’. Biometrika, 48: 151-165.
Harter, H. L. (1969). Order statistics and their use in testing and estimation. Washington DC: US
Government Printing Office.
Hines, R. D. (1982). ‘The usefulness of annual reports: the anomaly between the efficient markets
hypothesis and shareholder surveys’. The Accounting Review (Autumn): 296-309.
Hoaglin, D. C., Mosteller, F. and Tukey, J. W. (1983). Understanding robust and exploratory data
analysis. New York: John
Lindley, D. V. and Scott, W. F. (1984). New Cambridge Elementary Statistical Tables. Cambridge:
Cambridge University Press.
Malone, D., Fries, C. and Jones, T. (1993). ‘An empirical investigation of the extent of corporate
financial disclosure in the oil and gas industry’. Journal of Accounting Auditing and Finance, 8,
new series, (Summer): 249-273.
McCabe, B. P. M. (1989). ‘Misspecification tests in econometrics: based on ranks’. Journal of
Econometrics, 40.
Morgan, B. J. T. (1992). Analysis of quantal response data. London: Chapman and Hall.
Myers, S. C. (1977). ‘Determinants of corporate borrowing’. Journal of Financial Economics, 4
(November).
Nicholls, D. and Ahmed, K. (1995). ‘Disclosure quality in corporate annual reports of
non-financial companies in Bangladesh’. Research in Accounting in Emerging Economies, 3: 149-70.
Owen, D. B. (1962). Handbook of statistical tables. Reading, Massachusetts: Addison-Wesley.
Schipper, K. (1981). ‘Discussion of voluntary corporate disclosure: the case of interim
reporting’. Journal of Accounting Research, supplement.