1. Introduction
Claude Shannon’s seminal work [1,2] has exerted a considerable impact on many fields of science. One of those fields is psychology. Shannon’s precise definitions of such concepts as information, entropy, and channel capacity revolutionized the way psychologists thought about human cognition and how they measured it [3,4,5,6,7,8]. The mind (and later the brain) was construed as a processing device—one that can encode, manipulate, and transmit information [9]. From a historical perspective, information theory served as a major catalyst in launching the cognitive revolution [10]. Consider, for example, Shannon’s notion of a limited-capacity channel. This idea was (and continues to be) instrumental in research on human attention and memory [11,12], as it was originally consequential for the discovery of capacity limitations on short-term memory [6] and for the uncovering of Hick’s law [13], a remarkable regularity by which mean response time (RT) increases linearly with the logarithm of the number of choice-response alternatives. The discovery of Hick’s law and other conceptual advances [14] have turned RT into one of the central methodologies in psychological theorizing and experimentation [15,16,17,18]. The main assumption guiding practitioners of RT methodologies is that observed RTs reflect the duration of internal cognitive processes [19]. The more demanding a cognitive operation is, the longer its completion time [16]. Most researchers rely primarily on mean RT as their dependent variable. However, there is now an emerging consensus among psychologists that RT distributions conceal more information than meets the eye [20,21,22] and that this fine-grained information should be extracted to advance information-processing models [23]. Consequently, recent years have seen a rise in the number of distributional RT measures and models (for a review, see [17]). These approaches aim at capturing broader statistical aspects of RT distributions to provide a richer and more fine-grained data set against which human cognition can be modeled and tested. One of the most theoretically valuable quantities lurking in RT distributions is their entropy [24]. Entropy is closely related to such psychological constructs as processing capacity [11,25], workload [26], and efficiency [27], which are often measured using RT methodologies [28,29].
However, to date, no valid measure of entropy on continuous RT distributions has been developed. The reason for this is that available extensions of discrete entropy measures to continuous distributions (i.e., differential entropy) were fraught with problems and inconsistencies [30]. First, the entropy of discrete distributions is always positive, whereas the differential entropy of continuous distributions can take any value on the extended real line. Second, the entropy of empirical distributions cannot be used to approximate the differential entropy of theoretical continuous distributions. Third, Shannon’s entropy does not take into consideration the content represented by the probabilities, but only their probabilistic values. This is unfortunate, because a precise quantitative characterization of total and instantaneous entropy in RT distributions can provide considerable statistical and theoretical advantages. One notable benefit has to do with measurement. A valid entropy measure would allow comparison across various tasks, irrespective of those tasks’ units of measurement, by adhering to a single quantity of entropy (e.g., bits). Another contribution is in introducing richer theoretical accounts of abstract psychological constructs such as processing capacity [25] and the ability to associate these psychological constructs with their corresponding entropy signatures in the brain [31,32,33]. Fortunately, recent advances in information theory offer a new measure of differential entropy in continuous distributions. It was developed by Rao and colleagues [30,34,35,36] and is known as the cumulative residual entropy (CRE). This measure successfully extends the Shannon entropy to random variables with continuous distributions thanks to three major properties: (a) it is consistently defined in both the continuous and discrete domains, (b) it is always nonnegative, and (c) it can be computed from empirical data and asymptotically converges to the true values [30]. These properties of the CRE make it a viable measure of entropy in continuous distributions in many domains of application [36,37].
The present work takes advantage of these developments and imports the CRE to the field of psychology. Here, I demonstrate how differential entropy measures on empirical and theoretical RT distributions can be constructed. This approach allows for stronger statistical inferences and well-motivated theoretical interpretations of cognitive phenomena. The remainder of the paper is structured as follows. I start by introducing the canonical RT functions deployed in psychology. I then introduce the differential entropy measure—the CRE [30,34,35]—and show how it can be applied to RT distributions. In addition, a novel measure of instantaneous cumulative residual entropy [iCRE(t)] is developed that captures the momentary entropy at time t, and its slope gauges the rate of information gain. Finally, the relations of these novel entropy measures to existing RT functions and psychological concepts such as mental effort and processing capacity are discussed.
2. Basic Tools for Stochastic Modeling of RT
Cognitive phenomena are not deterministic but are rather subject to a certain degree of uncertainty and variability [16]. This idea was succinctly put by Ebbinghaus [38], who referred to “the constant flux and caprice of mental events” (p. 19). Psychologists widely agree that models of human behavior and cognition should incorporate stochastic mechanisms. To capture the stochastic nature of cognitive processes, RTs are treated as a random variable [16,18] and are represented by empirical and theoretical continuous distributions. In general, there are two primary practices for making inferences based on RT distributions. Proponents of the parametric approach fit a theoretical distribution to empirical RT data and use the estimated parameters in building and testing cognitive models [39]. For example, the exGaussian distribution [21,40,41,42] has often been used [43]. This distribution closely mimics many aspects of empirical RT distributions, such as their positively skewed shape with a long right tail. It also provides excellent fits to experimental data. I use this distribution throughout this paper in simulations and demonstrations. An RT in this distribution is the sum of a random value from a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$ and a random value from an exponential distribution with mean $\tau$. The density function is
$$f(t) = \frac{1}{\tau}\exp\!\left(\frac{\sigma^{2}}{2\tau^{2}} - \frac{t-\mu}{\tau}\right)\Phi\!\left(\frac{t-\mu}{\sigma} - \frac{\sigma}{\tau}\right),$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function, with mean $\mu + \tau$ and standard deviation $\sqrt{\sigma^{2} + \tau^{2}}$.
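For illustration, here is a minimal R sketch of the exGaussian density and a matching random-number generator; the helper names dexgauss and rexgauss are illustrative and not part of any package.

```r
# Minimal sketch of the exGaussian distribution (illustrative helper names).
# Density: the Gaussian-plus-exponential convolution given above.
dexgauss <- function(t, mu, sigma, tau) {
  (1 / tau) * exp(sigma^2 / (2 * tau^2) - (t - mu) / tau) *
    pnorm((t - mu) / sigma - sigma / tau)
}

# Random samples: a Gaussian component plus an independent exponential component.
rexgauss <- function(n, mu, sigma, tau) {
  rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / tau)
}
```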
Figure 1 presents the outcome of simulations with the exGaussian distribution, demonstrating how changes in the parameters ($\mu$, $\sigma$, and $\tau$) induce changes in the location, scale, and shape of the distribution. The initial distribution parameters were set to fixed baseline values of $\mu$, $\sigma$, and $\tau$, and in each simulation one of the parameters was increased to demonstrate its influence; see [21,44].
An alternative practice of modeling RT distributions is the nonparametric approach. Proponents of this practice make distribution-free inferences based on moments and cumulants [45,46].
In both the parametric and nonparametric approaches to RT distributions, researchers harness a number of continuous functions from probability theory to model stochastic aspects of information processing [16]. The first tool is the probability density function (PDF). The PDF measured with RT (see Figure 2a) has a single mode and is positively skewed, with a long tail extending to the right [43]. However, the stochastic nature of information processing is not immediately discernible from the PDF. To capture these aspects directly, researchers harness the cumulative distribution function (CDF), $F(T) = P(t \le T)$, where t indicates any value of the continuous random variable of time, and T stands for one specific value. Townsend and Ashby [16] noted that “... it may frequently turn out that the distribution function $F(t)$ is much easier to calculate directly than is the density function $f(t)$. Once the distribution function is known, the density function can easily be derived through Eq. 3.1 [which is the derivative of F(t)]” (p. 25). $F(t)$ gives the probability that the participant has responded by time t (see Figure 2b). This function starts at 0 and levels off at 1, when there is certainty that the participant has responded. A cumulative probability of 0.9 at some time t, for example, means that 90% of responses have terminated by that time. The PDF and the CDF are related such that the CDF is the integrated PDF from 0 to the current t:
$$F(t) = \int_{0}^{t} f(t')\,dt'.$$
Reciprocally, the PDF at t is the derivative of the CDF, $f(t) = \frac{dF(t)}{dt}$.
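As a sketch of this relation, and reusing the illustrative dexgauss() defined above (parameter values here are arbitrary assumptions), the CDF can be obtained by numerical integration of the PDF, and the PDF recovered as the numerical derivative of the CDF:

```r
# CDF as the integrated PDF (numerical integration of the illustrative density).
pexgauss <- function(t, mu, sigma, tau) {
  sapply(t, function(u)
    integrate(dexgauss, lower = 0, upper = u,
              mu = mu, sigma = sigma, tau = tau)$value)
}

# Reciprocally, the PDF is (approximately) the derivative of the CDF.
tt  <- 500; eps <- 0.01                        # arbitrary evaluation point
(pexgauss(tt + eps, 400, 50, 60) -
 pexgauss(tt - eps, 400, 50, 60)) / (2 * eps)  # numerical derivative of F(t)
dexgauss(tt, 400, 50, 60)                      # analytic density; should agree closely
```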
A third function that is often deployed in the RT literature is the survivor function, $\bar F(t)$, which captures the probability that the participant has not responded yet by time t, i.e., $\bar F(t) = P(T > t)$ (see Figure 2c). Note that $\bar F(t)$ and $F(t)$ complement each other:
$$\bar F(t) = 1 - F(t).$$
Another function that plays an important role in RT modeling is the hazard function [47]. This function is well established in longitudinal or distributional (duration, transition, or failure-time) analysis [48]. The survival technique has already been applied in many areas of psychology [47,49,50,51]. Townsend and Ashby [11] were the first to apply it to RT distributions:
$$h(t) = \frac{f(t)}{\bar F(t)},$$
where the numerator is the probability density function, $f(t)$, for task completion times, giving the probability that the task will be completed at some particular time t, and the denominator is the survivor function, $\bar F(t)$, designating the probability that the task will not have been completed by time t. The hazard function is a conditional probability function, providing the likelihood that the process fails in the next instant, given that it has not yet failed (see Figure 2d). A decreasing hazard function means that as the process gets longer, the likelihood that it will terminate in the next instant becomes lower. An increasing hazard function entails that as the process continues, the likelihood that it will terminate in the next instant becomes higher. A constant hazard function suggests that the likelihood that the process will terminate in the next moment is independent of its duration up to that moment [40]. The shape of the hazard function can be used to decide among classes of cognitive models, evaluate models, and generate predictions for RT distributions [18].
In the psychological modeling of response times, the hazard function $h(t)$ has often been interpreted in terms of the energy or efficiency invested by a participant in a task [11,24,45,50,51,52]. The logic behind this is that the hazard function expresses the likelihood of a worker completing the task in the next instant, given that the worker has not yet completed the task. A worker with low capacity has a low likelihood of completing the task in the next instant given that she has not finished it yet, whereas a worker with high capacity has a high likelihood of completing the task in the next moment given that she has not completed it yet. Finally, I introduce the related integrated hazard function H(t) [16]:
$$H(t) = \int_{0}^{t} h(t')\,dt',$$
which takes the integral of the hazard function from 0 to time t. The function starts at 0 and increases monotonically to infinity (see Figure 2e). The integrated hazard function also equals $-\ln \bar F(t)$—a mathematical identity that greatly assists researchers in simplifying computations [11,12,53]. The value of the cumulative hazard function up to time t has been interpreted as the cumulative amount of work done or effort expended in a task by time t [51].
The hazard (and the integrated hazard) function meets the desirable characteristics of any viable measure of dynamic capacity [53]: (a) it is bounded at the lower end by zero and bounded at the upper end by the physical limits of the system, (b) its integral captures the cumulative amount of work accomplished by some time, and (c) the cumulative hazard itself is bounded at the lower end by zero and unbounded at the upper end. These aspects are consistent with extant approaches to capacity in contemporary psychological literature [54]. These approaches often relate to the efficiency by which information is processed, bearing close affinity to the idea of “bandwidth” or energy invested per unit of time. As I show in the next sections, these beneficial statistical aspects also characterize the entropy measures developed here for RT distributions.
The five RT functions presented here are critical for the construction of our novel entropy measures.
Figure 2 presents simulations of these five functions using the exact exGaussian parameters from the previous simulation.
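To make the five functions concrete, the sketch below estimates each of them from a large simulated exGaussian sample, reusing the illustrative helpers defined above; the parameter values are assumptions for demonstration only, not necessarily those used to produce Figure 2.

```r
# Estimating the five RT functions from simulated data (illustrative parameters).
set.seed(1)
rt <- rexgauss(1e5, mu = 400, sigma = 50, tau = 60)

t_grid <- seq(0, 2500, by = 1)                  # time grid in ms
f_t <- dexgauss(t_grid, 400, 50, 60)            # PDF (analytic form)
F_t <- ecdf(rt)(t_grid)                         # CDF, estimated empirically
S_t <- 1 - F_t                                  # survivor function
h_t <- f_t / pmax(S_t, .Machine$double.eps)     # hazard function f(t) / S(t)
H_t <- -log(pmax(S_t, .Machine$double.eps))     # integrated hazard = -ln S(t)
```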
It is important to note that capacity can be measured at different levels of granularity. It can be assessed at the level of the system, as is the case when participants process entire faces composed of individual features (e.g., nose, eyes). The capacity of a system can be potentially affected by the global workload (i.e., the number of featural elements to be processed) [45,51]. The more elements to be processed, the greater the capacity needed to complete the processing. Alternatively, capacity can be assessed at the local level of the individual elements that compose a system [53] (e.g., processing the nose in a face).
Many psychologists treat the processing of tasks/stimuli/elements in terms of a set of (in)dependent channels, along which information accumulation takes place [16,26,54]. Processing along each channel is modeled with the canonical RT functions. Psychologists apply the term “channel” as a metaphor, without actually deploying the standard entropy metric. The metaphor likely gained traction in the heyday of the cognitive revolution and has remained fruitful in psychology ever since. For example, in James Townsend’s seminal work [12,45], serial or parallel processing systems are modeled as a set of corresponding channels, along which accumulation of evidence takes place either one channel after the other (i.e., serial architecture) or simultaneously across all channels (i.e., parallel architecture). The capacity of a single channel is measured by the integrated hazard function, and general formal derivations quantify the joint capacity of the system for serial or parallel channel architectures, contingent on the operative stopping rule adopted by the observer (i.e., self-terminating or exhaustive) [45,55].
3. The Cumulative Residual Entropy (CRE)
Shannon [1,56] proposed entropy as a measure of uncertainty (U) in discrete distributions:
$$U(F) = -\sum_{i} p_{i} \log p_{i},$$
where the $p_i$’s are the probabilities derived from the distribution F. This measure of uncertainty has important properties: (a) it is always positive, (b) it vanishes if and only if the event is certain, (c) entropy is increased by adding an independent component, and (d) entropy is decreased by conditioning. One may think of the following differential entropy as a possible extension:
$$h(X) = -\int f(x) \log f(x)\,dx.$$
However, as noted at the outset, this measure leads to various challenges and inconsistencies. First, whereas the entropy of discrete distributions is positive, this differential entropy of continuous distributions can also be negative. Second, empirical distributions cannot be used to approximate this differential entropy of theoretical continuous distributions. Third, in Shannon entropy, the content represented is not taken into consideration, only the probabilistic values. Fourth, it is only defined for distributions with densities. Fifth, in contrast to discrete entropy, the vanishing of the conditional differential entropy of X given Y does not imply that X is a function of Y. To solve all these problems, Rao and colleagues [30,34] presented the cumulative residual entropy (CRE) of X. The CRE extends the Shannon entropy to random variables with continuous distributions, and it ameliorates all of the noted problems:
$$\mathrm{CRE}(X) = -\int_{0}^{\infty} \bar F(x) \log \bar F(x)\,dx,$$
where $\bar F(x) = P(X > x)$ is the survivor function, which captures the probability that an event has not yet occurred for values of x. The CRE has several marked properties: (1) CRE is consistently defined in both the continuous and discrete domains; (2) CRE is always nonnegative; (3) CRE can be computed from empirical data, and these computations asymptotically converge to the true values; and lastly, (4) the conditional CRE of X given Y is zero if and only if X is a function of Y. All of these properties are supported by the rigorous proofs given in [30]. These authors also proved that the CRE of the empirical distribution almost surely converges to the true CRE. Their proof also provides a way of computing the CRE for empirical distributions.
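A minimal sketch of such an empirical computation in R, assuming a nonnegative sample and trapezoidal integration of $-\bar F(x)\log\bar F(x)$ over the observed range (the function name cre is illustrative):

```r
# Empirical CRE: integrate -S(x) * log(S(x)) over the sample range (trapezoid rule).
cre <- function(x, base = exp(1)) {
  x <- sort(x)
  S <- 1 - ecdf(x)(x)                              # empirical survivor function
  g <- ifelse(S > 0, -S * log(S, base = base), 0)  # integrand, set to 0 where S = 0
  sum(diff(x) * (head(g, -1) + tail(g, -1)) / 2)   # trapezoidal integration
}

# Example: CRE (in nats) of the simulated exGaussian sample from the earlier sketch.
# cre(rt)
```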
Equipped with knowledge of basic RT functions and the differential entropy measure CRE, we are now ready to move on to quantifying the entropy of RT distributions.
5. Simulations of the CRE
Next, I simulated data from parametric distributions [40] (the exGaussian and the exponential). The goal of these simulations was twofold: (a) to demonstrate the CREs of different distributions, and (b) to provide a practical example of how to calculate the integral of the CRE using non-analytic methods afforded by the statistical environment R [58].
Figure 3a–i depicts simulations with the exGaussian distribution. Two hypothetical experimental tasks were assumed. In three simulations, I independently manipulated each of the three exGaussian parameters ($\mu$, $\sigma$, and $\tau$) across the two tasks (distributions). In the first simulation, I manipulated the $\mu$ parameter across the two distributions but kept $\sigma$ and $\tau$ identical. One of the distributions (blue) was given a $\mu$ value of 800, compared to the baseline distribution (pink). Recall that in the exGaussian distribution, the mean and the variance are affected differently by the manipulation of the parameters. The mean is composed of the $\mu$ and $\tau$ parameters [$E(RT) = \mu + \tau$], whereas the variance is composed of the $\sigma$ and $\tau$ parameters [$Var(RT) = \sigma^{2} + \tau^{2}$]. Therefore, changes in $\mu$ selectively influence the mean of the distribution. As can be noted in Figure 3a, the PDF of the fast task, illustrated in pink, is shifted to the left of the slow task (illustrated in blue), but the shapes of the two distributions are identical. This shift in location can also be noted in Figure 3b, where the survivor function of the fast task (pink) is located to the left of the slow task (blue). This entails that the probability that the fast task will have been completed by time t is higher than that of the slow task. Most importantly, Figure 3c depicts the CRE functions [$-\bar F(t)\log \bar F(t)$] for the two tasks. The CREs are similar, but again, the CRE of the easier task is shifted to the left. To calculate the entropy of these CREs, I deployed the AUC (area under the curve) tool, which incorporates a numerical integration algorithm and is available as part of the MESS R package [59]. AUC uses linear or natural spline interpolation between two vectors to compute the area under the curve. In our case, the first vector contains the t values, while the second vector contains the corresponding $-\bar F(t)\log \bar F(t)$ values. The integration was carried out from 0 to ∞ (a practical value of 2500 ms was used as the upper limit of the integral). As a quick visual inspection of Figure 3c may suggest, the cumulative residual entropy of the two distributions was found to be comparable, amounting to approximately 66 bans.
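The AUC-based computation described above can be sketched as follows, reusing the survivor function S_t and the grid t_grid from the earlier sketch; the base-10 logarithm is an assumption chosen to match the “bans” reported here, and MESS::auc() performs the numerical integration.

```r
# CRE via numerical integration of -S(t) * log10(S(t)) with MESS::auc().
library(MESS)

cre_curve <- ifelse(S_t > 0, -S_t * log10(S_t), 0)   # the CRE function of Figure 3c
cre_total <- auc(t_grid, cre_curve, from = 0, to = 2500)
cre_total                                             # total entropy of the task
```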
In a second simulation, I manipulated the $\sigma$ parameter across the two distributions but kept $\mu$ and $\tau$ identical. One of the distributions (blue) was given a $\sigma$ value of 100, compared to the baseline distribution (pink). This manipulation selectively influences the variance of the distribution. The results of these simulations can be seen in Figure 3d–f. The blue PDF distribution has higher variance than the pink distribution, which is also reflected in the corresponding CRE functions. Using the AUC integration tool, I found that the distribution with the higher variance also produced larger total cumulative residual entropy (108 bans) compared to the distribution with the lower variance (66 bans).
In a third simulation, I set the $\tau$ parameter in the blue distribution to 120 (compared to the baseline pink distribution, for which $\tau = 60$). This manipulation affects both the mean and the variance of the distribution, because $E(RT) = \mu + \tau$ and $Var(RT) = \sigma^{2} + \tau^{2}$. The outcome of this simulation can be seen in Figure 3g–i. This manipulation, which doubled the value of $\tau$, resulted in higher variance in the PDF and larger CRE in the blue distribution compared to the pink distribution (151 bans compared to 66 bans).
Another set of simulations was carried out with the exponential distribution. As can be noted in Figure 4a, the task depicted in green is slower than the one illustrated in red (it has a lower rate $\lambda$). The difference in the values of the $\lambda$ parameter also entails that their corresponding PDF, survivor, and CRE functions differ in shape (see Figure 4b,c). Visual inspection of Figure 4c suggests that the areas under the two corresponding CRE functions are not equal. Indeed, the computation of the integral of the CREs using the AUC tool from 0 to ∞ (a practical value of 30 was used as the upper limit of the integral) revealed that the slow task yielded higher cumulative residual entropy (=1.96 bans) than the fast task (=1.07 bans). These values are in accordance with the analytic solution (see Equation (13)): the CRE of the slow task amounts to $1/\lambda_{\mathrm{slow}}$, while the CRE of the fast task amounts to $1/\lambda_{\mathrm{fast}}$, which correspond to the exact parameter values initially used in the simulation.
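For completeness, the analytic result for the exponential distribution (presumably what Equation (13) expresses) follows in one line when the natural logarithm is used. With $\bar F(x) = e^{-\lambda x}$,
$$\mathrm{CRE}(X) = -\int_{0}^{\infty} e^{-\lambda x}\,\ln e^{-\lambda x}\,dx = \lambda\int_{0}^{\infty} x\,e^{-\lambda x}\,dx = \lambda\cdot\frac{1}{\lambda^{2}} = \frac{1}{\lambda}.$$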
The cumulative residual entropy (CRE) of an RT distribution provides a viable measure of the amount of entropy induced by a task. From a psychological point of view, the CRE can be interpreted as reflecting the total amount of energy [11] or effort [25] invested by the participant in accomplishing the task at hand. The more capacity a task demands, the larger the amount of entropy it produces. This interpretation is consistent with extant psychological models of processing capacity and attention [45,46].
One notable aspect of the simulations is the possibility that mean RT and entropy, as measured by the CRE, are two independent aspects of RT distributions. The simulations showed that speed of processing and entropy are not necessarily dependent on each other. We could simulate RT distributions that have different means but identical CREs; conversely, we could generate RT distributions with the same mean RT but different CREs. However, in psychological research, a task that produces, on average, shorter mean RT is considered to be easier to perform and is also expected to yield lower variance [60]. Because our conclusions are based only on simulated data, an open question remains as to whether a fast task/condition is also characterized by lower values of entropy than a slow task/condition. This hypothesis can be tested empirically with data from real experiments.
Another related point that deserves comment concerns the relations between variance and entropy. It is likely that RT distributions with higher variance also have a larger amount of CRE. After all, variance is another measure of uncertainty [61,62]. This intuition was not refuted by the current simulations. However, the entropy metric is preferable to the statistical measure of variance due to its richer theoretical import.
6. Instantaneous CRE
The application of the CRE to the RT distribution gives us the total amount of entropy in an experimental condition/task. This is very useful when comparing performance across conditions/tasks. However, in the stochastic modeling of response times [11], psychologists are also interested in instantaneous aspects of processing. One notable example of the relation between global and instantaneous measures of processing is the pair of the hazard function and the integrated hazard function: the former is the derivative of the latter, giving the amount of energy or effort invested in a task at a given moment in time. In a similar vein, we can use the CRE to gauge the instantaneous entropy at a given point in time t. This can be accomplished by taking the derivative of the CRE at time t:
$$\frac{d}{dt}\,\mathrm{CRE}(t) = -\bar F(t)\log \bar F(t).$$
Integrating from 0 to ∞ will give us back the CRE. However, this quantity is a weighted measure, because the entropy of each event, $-\log \bar F(t)$, is weighted by its probability of occurrence, $\bar F(t)$ [56]. Thus, the unweighted cumulative residual entropy at time t equals
$$\mathrm{iCRE}(t) = -\log \bar F(t),$$
where $\mathrm{iCRE}(t)$ is a function that quantifies the instantaneous cumulative residual entropy at time t. It is a nonnegative, monotonically increasing function ranging from 0 to ∞ (see Figure 5b or Figure 5d). This function meets the desirable mathematical aspects of a capacity measure assessed across the course of a processing event. It is bounded at the lower end by zero and is unbounded at the upper end [50,53].
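A sketch of the empirical iCRE(t) and its slope, reusing the simulated sample rt and grid t_grid from the earlier sketches (the base-10 logarithm is again an assumption made to match the “bans” unit):

```r
# Empirical iCRE(t) = -log S(t), with the slope estimated by a linear fit.
icre <- function(t, sample_rt) {
  S <- 1 - ecdf(sample_rt)(t)
  -log10(pmax(S, .Machine$double.eps))   # guard against log(0) at late times
}

icre_vals <- icre(t_grid, rt)
fit <- lm(icre_vals ~ t_grid)            # slope = rate of information gain (bans/ms)
coef(fit)[2]
```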
Figure 5b presents the $\mathrm{iCRE}(t)$ for two exGaussian distributions. As can be noted, values of the $\mathrm{iCRE}(t)$ increase linearly from zero to infinity as a function of time. The function is related directly to the survivor function and can be interpreted as the amount of information gained from the stimulus or, alternatively, the reduction in uncertainty with time. The iCRE(t) gives the amount of information gained by time t and is measured in units of information (e.g., bits). When the processing of a stimulus has not yet started, the probability that processing has not finished is 1, so the amount of information gained from this stimulus at that moment in time is 0. However, as $\bar F(t)$ decreases from 1 toward 0, more and more information is gained. This assumption about the accumulation of information from the stimulus is a cornerstone of RT diffusion models [63]. Interpreting the $\mathrm{iCRE}(t)$ as a reduction in entropy is also analogous to saying that the certainty about the stimulus/task’s final state of processing is increasing. Thus, when $\bar F(t)$ is equal to 0, $\mathrm{iCRE}(t) \to \infty$, entailing that the information one gains with respect to the hypothesis that processing has finished by time t is very large. This function can also stand for the self-information of a processing channel [56]. When it reaches infinity, or practically a relatively high value, one can safely conclude that processing has ended.
The iCRE(t) inherits the properties of the CRE: (1) it is consistently defined in both the continuous and discrete domains, (2) it is nonnegative, (3) it can be computed from empirical data, and (4) it increases with the addition of another variable (processing channel) if and only if the variables (channels) are independent. Another property of the iCRE(t) is that it increases monotonically as a function of processing time. All of these make it an excellent measure of processing capacity.
8. Relations of the iCRE(t) to Other RT Functions
The $\mathrm{iCRE}(t)$ bears close affinity with two useful RT functions (see Luce’s classic book [18], p. 18). First, the residual entropy function has quantitatively the same value, though with an opposite sign, as the log survivor function:
$$\mathrm{iCRE}(t) = -\log \bar F(t),$$
which is practical for summarizing and plotting temporal aspects of RT data [18]. Second, the $\mathrm{iCRE}(t)$ is mathematically identical to the integrated hazard function, for which the following equivalence has already been introduced:
$$H(t) = -\ln \bar F(t) = \mathrm{iCRE}(t).$$
The instantaneous cumulative residual entropy function and the integrated hazard function are gauged by the same mathematical equation and can be viewed as conveying a similar conceptual definition of processing efficiency or capacity. However, the two indices differ in their units of measurement. The integrated hazard function is measured in units of the to-be-processed elements, whereas the iCRE(t) is measured in universal units of information (e.g., bits). This confers on the iCRE(t) a marked advantage, because it allows researchers to compare performance across different conditions, tasks, or experimental procedures based on a common unit of measurement (e.g., bits). Previous applications of the hazard function to questions of human performance and capacity, in particular, have circumvented the problem of different units of measurement by using the ratio of (cumulative) hazard functions [50]. A ratio gives a pure quantity that is free of the units of measurement. Wenger and colleagues [50,51,65], for example, have harnessed Cox’s regression [66], which relies on the ratio of hazards, to model the psychological construct of processing capacity. Townsend and colleagues [12,45] have developed a capacity coefficient that circumvents the problem by expressing a ratio of cumulative hazard functions. With the iCRE(t) and CRE measures, comparisons across conditions, experiments, and tasks can be made directly.
Recently, Lappin and colleagues [24] have presented the cumulative hazard function B(t). The function measures the cumulative work—detection progress—up to time t. They have also proposed an information-theoretic interpretation of its derivative, the function b(t):
$$b(t) = \frac{dB(t)}{dt},$$
which measures the rate of change in the cumulative hazard function B(t) in units of bits/s. Note that the derivative of the cumulative hazard function is the hazard function itself, which, according to these authors, can be scaled to obtain units of bits. These functions bear close affinity to the measures developed here.
9. Simulations of the iCRE(t)
Figure 5a–f depicts the outcome of simulations with the exGaussian distribution, using the exact same parameter values used earlier. In these simulations, two hypothetical tasks/conditions are modeled that differ (in each simulation) with respect to one of the three exGaussian parameters ($\mu$, $\sigma$, or $\tau$). The results of the first simulation are presented in Figure 5a,b, where the two exGaussians differed only in the $\mu$ parameter, which implies a selective change in the means of the two distributions. Recall that in the exGaussian distribution, $E(RT) = \mu + \tau$, whereas $Var(RT) = \sigma^{2} + \tau^{2}$. The $\mathrm{iCRE}(t)$ of the fast task (plotted in pink) is shifted to the left relative to the slower task (plotted in blue), but the rate of information gain (reduction in entropy) of the two tasks, as indicated by the slopes of their $\mathrm{iCRE}(t)$ functions, is identical (approximately 0.0164 bans/ms).
The results of the second simulation are presented in Figure 5c,d, where the two exGaussians differed only in the value of their $\sigma$ parameter. This manipulation selectively influences the variance of the distribution. The blue PDF distribution, with the higher standard deviation, yielded an iCRE(t) slope that was comparable to that observed in the pink distribution.
The results of the third simulation are depicted in Figure 5e,f, where the two exGaussians differed only in the value of their $\tau$ parameter. This manipulation affects both the mean and the variance of the distribution. The blue distribution (with the larger $\tau$ value) exhibited a markedly slower rate of information gain (0.0084 bans/ms) than the pink distribution (0.0164 bans/ms).
The simulations exemplified the utility of our new entropy measure in uncovering important statistical effects that are not immediately discernible from the PDFs or the CRE functions. Notably, the rate of information gain can be the same across two experimental tasks/conditions even when one condition/task is slower than the other. This was demonstrated in the first simulation. Moreover, while higher variance in the PDF of one condition/task might entail larger values of total entropy (as indicated by the integral of the CRE in the second simulation), it does not necessarily imply a lower rate of information gain. Finally, we noted that when both mean and variance are increased, the information transmission rate can decrease. These observations are based on one type of distribution (the exGaussian), but they suggest intricate relations between the parameters of a distribution and its corresponding iCRE(t) function. These relations are not directly observable from the PDF or even CRE functions.
Lastly, an issue that deserves comment concerns the wiggling of the iCRE(t) line at high values of t. As can be noted in Figure 5b,d,f, the line becomes jerky for long t values. Recall that this function is estimated directly from the survivor function $\bar F(t)$. It is likely that the number of data points at prolonged times is drastically reduced; naturally, at longer times, there are only a few instances that survive. This scarcity of data points can account for the relatively unstable estimation of the iCRE(t) function at these late t values.
10. Discussion
Human behavior is governed by unobserved internal structures (e.g., attention). These are hidden from direct observation, leaving only residual traces in behavior. In this context, one can appreciate the immense utility of response times (RT) in unearthing cognitive structures and in facilitating the development of quantitative models of human cognition and behavior. Most researchers use mean RT as the central dependent variable while ignoring the great deal of information lurking in the entire distribution. One notable property of entire RT distributions is their entropy, which can be of considerable theoretical importance. The concept of entropy has strong conceptual links to psychological notions such as mental energy and efficiency. However, to date, no valid measure of entropy on RT distributions has been developed. This is mainly because available extensions of discrete entropy measures to differential entropy were fraught with problems and inconsistencies. Recent advances in information theory, and in particular the development of the cumulative residual entropy (CRE) [30], have allowed researchers in many areas to begin measuring differential entropy in continuous distributions [36,37]. The present study has taken the first step in applying the CRE to one of psychology’s most important dependent variables—response time (RT) distributions. Here, I have developed a set of novel measures of entropy on RT distributions and uncovered their close formal affinity with existing RT functions.
The entropy measures of RT distributions developed here have great theoretical and practical benefits in assessing the level of energy or effort invested by an observer in a given task. Potential contributions of such measures are numerous. For example, one can compare the amount of entropy exerted by two experimental conditions or by two populations. Another potential contribution is in relating measures of RT entropy to signatures of entropy in the brain [31]. In addition, the slope of the iCRE(t)—a measure of the rate by which information is gained or transmitted from a stimulus—can be particularly useful in estimating the energy needed to perform a task or in modeling internal mechanisms of processing. One area that can benefit from these developments is the study of parallel/serial systems and their overall capacity [67].
Many stochastic models of human judgment and choice take into consideration both response times and accuracy. A notable example is Ratcliff’s diffusion model [68]. Future work on the CRE in psychology should seek to incorporate the dependent variable of accuracy. A methodology for measuring information transmission with RT and accuracy does exist, based on the notion of the speed–accuracy tradeoff (SAT) [64,69,70]—the observed regularity by which the probability of a correct response increases as RTs get longer. For a two-alternative task, the information I transmitted by the stimulus is computed as a function of accuracy P with the formula
$$I = 1 + P\log_{2}P + (1 - P)\log_{2}(1 - P)$$
at a given mean RT, and plotted against it. Typically, I increases monotonically with mean RT. The interpretation of this function is that the observer extracts more information as their responses are prolonged. This is an important measure, but it relies on mean RTs and not on the entire RT distribution. In addition, it is based on the premise that the speed–accuracy trade-off function is always positive—an assumption that may sometimes be violated [71]. The proposed CRE measures avoid these problems, because they are computed on entire distributions and make minimal assumptions about processing.
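As a quick numerical illustration, assuming the two-alternative formula given above (the accuracy value here is arbitrary):

```r
# Information transmitted at accuracy P = 0.9 (two-alternative case).
P <- 0.9
I <- 1 + P * log2(P) + (1 - P) * log2(1 - P)
I   # about 0.53 bits
```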
The present study has developed new measures of entropy on RT distributions, with the main focus being their interpretation as capacity indices. However, it should be highlighted that these entropy measures have a general purpose and can be potentially valuable in other applications where the notions of capacity or energy are not involved. For example, when fitting empirical RT distributions, the difference in entropy between observed and expected distributions can serve as a minimization function for the algorithm searching the parameter space. Another useful application is in the area of model comparison, where one can deploy differences in total entropy to decide between competing models.