Psychophysics Notes
- Panpsychism and monism, including "Nanna, oder über das Seelenleben der Pflanzen" (1848) and "Über die physikalische und philosophische Atomenlehre" (1853)
• Everything in nature has a soul ("beseelt") (= panpsychism)
• He rejects the Cartesian dualism of mind and body
• He defends a monistic view in which the physical and the mental are two aspects of one and the same reality (a single identity)
▪ This is the important core notion of his philosophical view on nature and the starting point of psychophysics: he wanted to study the relation between these two aspects quantitatively
• Monism in the form of panpsychism → everything is psychic, mental
• The starting point of psychophysics is thus monism
▪ This starting point predates the official start of psychology
- He wanted to map the functional relation between the physical and the mental and thus developed psychophysics
• = the exact science of the functional relation between body and mind
- “Elemente der Psychophysik” (1860):
• Start of psychophysics (even before the official start of psychology)
• Milestone in the development of psychology as a science
- Lowest threshold ("floor"): the minimal stimulus intensity (or signal strength) that is required for the stimulus to be perceived
• The absolute threshold is most often this lowest threshold, but there can also be an absolute threshold at the other side of the spectrum (at the high end), so these two concepts aren't the same!
- Different ways to measure this (next lecture)
• Mostly a straightforward detection task
- Ideal situation: a discrete step function
• Discrete step function from nothing at all to perceiving the stimulus
- In reality: often no abrupt transition
- Within the perceptual range of a stimulus dimension, we can examine the differential threshold
(DL) or the just noticeable difference (JND or j.n.d.)
• These two are the same (DL and JND)
- Once again we can use several tasks (next lecture)
- Mostly discrimination tasks
• For absolute threshold we (usually) do a detection task, for difference threshold we
(usually) do a discrimination task (is this the same or different?)
➔ So increments are in proportion to absolute stimulus level: basic notion behind Weber’s law
➔ The bigger the standard stimulus, the more needs to be added to notice a difference
- Basic rule discovered by Ernst Heinrich Weber (1795-1878) in 1834 (“De pulsu, resorptione,
auditu et tactu”)
- Most straightforward formulation: the stimulus intensity must be increased by a constant
fraction of its value in order to obtain a just noticeable difference
- This fraction is usually represented as:
• k = ΔI / I
• I stands for intensity and ΔI for the smallest increment added (increment of intensity level divided by intensity level of the standard stimulus)
• → ΔI is the smallest intensity increment that leads to a JND
- → Weber fraction or Weber constant (= k, which is a proportion, not a fixed number)
- S = I (stimulus intensity)
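- A minimal numerical sketch of Weber's law (illustrative Python; the luminance Weber fraction quoted further below is used as an example):

```python
# Weber's law: the just noticeable difference grows in proportion to the
# intensity of the standard stimulus (delta_I = k * I).
def jnd(intensity, k):
    return k * intensity

k_luminance = 0.016  # Weber fraction for luminance (value quoted later in these notes)
for I in (10, 100, 1000):
    print(f"standard = {I:5d}: JND = {jnd(I, k_luminance):6.2f}, ratio = {jnd(I, k_luminance) / I}")
# The required increment grows with the standard, while the ratio JND / I stays constant at k.
```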
- First graph:
• You have to add more and more for
larger values, you have to add
proportionally more
• Large slope = low sensitivity, low
slope = high sensitivity
- Second graph: the Weber fraction itself
• Within a sensory dimension you always have the same fractional measure
• Constant over the entire stimulus intensity domain (a flat line)
• Again, a larger Weber fraction = low sensitivity, a smaller fraction = high sensitivity
- Usually Weber’s law applies well for the middle part of the stimulus range (flat U-function)
• So applies well at middle level, but at lower end and higher end you get some increase,
so not a real flat function but a little bit of increase at both ends (U-function)
• So there are exceptions to Weber's law at the extremes of the intensity range
- Deviation is often quite major for very low stimulus intensities
- Modifications:
• ΔI = k ( I + Ir ) with Ir a small added value
• ΔI = k·Iⁿ with n not necessarily equal to 1
- Big differences in sensitivity:
• k = .016 for luminance
• k = .33 for sound volume
V. Weber-Fechner law
- Fechner's major contribution was in developing methods (see before; he also wanted a mapping between stimulus intensity and sensations)
- Psychophysical measurement requires a starting point and a measurement unit
• Ruler has a 0 point, and a certain increment (millimetre, centimetre) = measurement unit
- Fechner realized that the absolute threshold (RL, where you begin sensing the stimulus) could
be used to determine the starting point and the JND to determine a measurement unit
- This way he derived from Weber’s law (ΔR/ R = k) the so-called Weber-Fechner law:
• S = k log R
• Sensation = logarithm of stimulus intensity multiplied by k
- Not an obvious graph
• Sensation on the x-axis, stimulus strength on the y-axis
- You need a bit more stimulus before you start having a sensation, so the graph does not start exactly at 0 on the y-axis for 0 on the x-axis
• The starting point is higher than 0 on the stimulus axis
- Rather than adding a constant amount on the sensation axis, you have to multiply by the Weber fraction on the stimulus axis: the steps have to become larger and larger to get equal steps on the sensation axis
• Therefore the graph does not have a linear shape
• We multiply on the y-axis (stimulus) to get equally big additions in sensation (x-axis)
- Strength of sensation = the logarithm of stimulus intensity multiplied by the Weber constant
- Significance: to increase the strength of the sensation (S) as an arithmetic sequence (summed
with a constant) one has to increase the stimulus intensity (R) according to a geometric
sequence (multiplied by a constant)
• Arithmetic = summed with a constant (x-axis)
• Geometric sequence = multiplied (y-axis)
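- A minimal sketch of this arithmetic/geometric relation under S = k·log R (illustrative Python with made-up values):

```python
# Weber-Fechner relation S = k * log(R): equal (arithmetic) steps in sensation
# require geometric (multiplicative) steps in stimulus intensity.
# Values are illustrative, not from an actual experiment.
import math

k = 1.0
R0 = 1.0       # starting stimulus intensity
ratio = 2.0    # multiply the stimulus by this factor at every step

for step in range(5):
    R = R0 * ratio**step    # geometric sequence of stimulus intensities
    S = k * math.log(R)     # Weber-Fechner law
    print(f"R = {R:6.1f}  ->  S = {S:.3f}")
# Sensation increases by the same amount (k * log(ratio)) at every step,
# even though the stimulus increments keep getting bigger.
```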
- Indirect scaling method (see later)
• Threshold measurements are used as an indirect quantitative measurement to come up with a good scale
I. Introduction
- Psychophysics: scientific study of the (quantitative) relations between physical stimuli and
sensations
- Classical psychophysics: threshold measurements as an
indirect scaling method
- Modern psychophysics: signal detection theory and
direct scaling methods (more broadly applicable than
sensory modalities)
- Fechner (1860):
• The absolute threshold (RL) and the just noticeable difference (JND) can be used to
determine the starting point and the measurement unit, respectively (both necessary to
truly measure sensations):
- S = k . log R
• To increase the strength of the sensation (S) as an arithmetic sequence (summed with a constant), one has to increase the stimulus intensity (R) according to a geometric sequence (multiplied by a constant)
- Weber-Fechner Law (based on indirect scaling method):
- Psychometric function:
• A function that describes the relationship between stimulus
intensity and probabilities of observer responses in a
classification (forced-choice) task
1) Main tasks
a. Yes/no tasks:
b. Forced-choice task:
- Advantages:
• By assigning two kinds of stimuli randomly to both intervals or positions, performance
can easily be compared to chance
• Under certain conditions (see SDT) the percentage of correct responses corresponds to
the area under the ROC curve, without the obligation to use the same cumbersome
procedure
- Disadvantages:
• A large number of trials is needed (especially if one wants to know the complete
psychometric function)
c. Identification task:
a. Method of adjustment:
- The easiest and most straightforward way to determine thresholds is to ask the subject to set
the physical stimulus value a certain way:
- For example, measurement of an absolute threshold:
• → Increase the stimulus intensity until the subject detects the stimulus
- It is usually measured by alternating a number of ascending and descending sequences (left)
- Instructions are very important in adjustment tasks, because the subject controls the stimulus
level him/herself (directly or indirectly via experimenter)
- The easiest is to use matching instructions:
• Setting the test stimulus in such a way that it corresponds to a standard or a reference
stimulus (right)
b. Method of limits:
- Variation of the intensity of just one stimulus (e.g., luminance or volume) → after each
stimulus, subjects have to report whether they perceived the stimulus or not
• The threshold will be determined by alternation of a number of ascending and
descending sequences (cf. adjustment method; see Table 1)
• The mean of the stimulus intensity of the last two trials is used as a transition point
within each sequence
• The mean of the transition points within several sequences can be used as a measure of
threshold
➔ So similar to the previous method, but slightly different in the way the threshold is calculated
➔ More trial by trial (before, the stimulus was adjusted more continuously)
- Averaging is used to find the transition point within a sequence
- Do this several times, and then take the average of all of these transition points
c. Method of constant stimuli:
- The experimenter chooses a number of stimulus values around the threshold (for example based on an adaptive procedure) → each of the stimulus values (for example 5 or 7) is presented a fixed number of times (for example 50) in random order
• So present trials of different intensity in random order
• For each of these stimulus values, a frequency can be plotted for a number of response
categories (for example ‘yes’ answers in a yes/no task or one of the two alternative
responses in 2AFC), possibly as a proportion or percentage
• Such a graph is called a psychometric function
- Psychometric function:
• Usually a continuous S-shaped function
• Exact threshold determination is not trivial
- Disadvantage:
• Many trials are necessary and not all data points are
useful
• (solution: adaptive procedure; see after)
3) Function estimation
- Necessary to estimate the true function by fitting a curve through the data points (left)
• Usually in papers you see curves like this (instead of datapoints like above)
- Example (right):
• Results of a standard two-interval forced-choice (2IFC) experiment
• The various stimulus contrasts are illustrated on the abscissa
• Black circles give the proportion of correct responses for each contrast
• The green line is the best fit of a psychometric function, and the calculated contrast
detection threshold (CT) is indicated by the arrow
▪ A contrast difference of about 0.09 is needed for the participant to detect it
▪ You need the fitted curve to read this off, not just the data points
- Theoretical point of view: data are often fitted with a cumulative Gaussian distribution
• Probit analysis is a possibility (Finney, 1971)
- Reason: the internal representation of the stimulus is supposed to have a normal distribution
- The perceived difference between two stimuli is inversely proportional to the overlap between the two normal distributions (Z-scores):
• Good stimulus representation (small σ) → steep slope
• Less good stimulus representation (larger σ) → shallower slope
- Much more variance means the perception is much more uncertain
• The closer to the ideal step function, the more precise you are (and so the more precise the perception) → smaller standard deviation
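- A minimal sketch of such a fit (illustrative Python with made-up data; the use of scipy.optimize.curve_fit and a fixed 50% guess rate are assumptions, not the exact procedure of the example above):

```python
# Fitting a cumulative-Gaussian psychometric function to proportion-correct data
# from a constant-stimuli (2AFC-style) experiment. Data points are made up.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(x, mu, sigma):
    """50% guess rate rising to 100%: cumulative Gaussian with location mu and slope sigma."""
    return 0.5 + 0.5 * norm.cdf(x, loc=mu, scale=sigma)

contrasts = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.12])   # stimulus levels
p_correct = np.array([0.52, 0.60, 0.71, 0.83, 0.93, 0.98])   # observed proportions correct

popt, _ = curve_fit(psychometric, contrasts, p_correct, p0=[0.07, 0.03])
mu_hat, sigma_hat = popt
print(f"midpoint: {mu_hat:.3f}, slope parameter sigma: {sigma_hat:.3f}")
# The 75%-correct threshold is where norm.cdf equals 0.5, i.e. at x = mu_hat:
print(f"75%-correct threshold: {mu_hat:.3f}")
```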
- Point of subjective equality (PSE): The physical magnitude of a stimulus at which it appears
perceptually equal in magnitude to that of another stimulus
• The PSE is the 50% point → usually not situated at the 0-value (the point of physical equality), because of response bias
• There is some response bias in the participant, so the curve will be shifted a bit
• In the figure: green ≈ 360, blue ≈ 290 (estimated)
- Example: at which point does the participant consider the face as angry, depending on the color of the face? (data from 1 participant)
• Red faces they perceived as angry much sooner: they more easily say these are angry
• Blue faces they perceived as fearful much sooner: they more easily say these are fearful
- → So you can shift the PSE with the different colors
- Alternative functions:
• Logistic function (small difference, but faster)
• Weibull function (larger difference)
- The choice of the stimulus levels is very important for threshold determination:
• Points around 20% and 80% of correct answers are the most informative, while the 0%
and 100% points are theoretically redundant
• In practice, it is useful to include the easy conditions
to keep the subjects motivated
• So the ends are not informative, but they keep them
for motivation, the middle part is the important and
informative part
- Disadvantage:
• The threshold converges to 50% correct (probability of a correct answer = probability of a wrong answer), which is too low for some purposes
• It converges to 50% because the simple up-down rule drives the stimulus to the point where correct and incorrect responses are equally likely (they alternate close to that point), so we need a strategy to converge to more than 50%
- Levitt (1971) has developed a general transformation procedure to acquire specific values on
the psychometric curve through an adaptive “staircase” procedure
• Idea: to get a higher performance level, the subject needs to give several (successive)
correct answers before the stimulus level is lowered:
▪ E.g. “two-down, one-up” procedure converges to 70.7%
▪ E.g. “three-down, one-up” procedure converges to 79.4%
- The adaptive staircase procedure is the most frequently used adaptive test procedure because it is the most straightforward (a minimal simulation is sketched below, after the advantages)
- Advantages:
• It is easy to choose the next stimulus level, increment, stop criterion and threshold
(adaptive staircase → PEST)
• No assumptions with respect to the shape of the psychometric function (non-parametric
→ maximum-likelihood)
▪ Only assumption: monotonic link between the stimulus level and the
performance level
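- A minimal simulation of a two-down, one-up staircase (illustrative Python; the simulated observer and all parameter values are assumptions):

```python
# Levitt's transformed "two-down, one-up" staircase, which converges to the
# 70.7%-correct point of the psychometric function.
import random
from statistics import NormalDist

def p_correct(level, threshold=0.5, sigma=0.1):
    """Probability correct for a simulated 2AFC observer (cumulative Gaussian)."""
    return 0.5 + 0.5 * NormalDist(mu=threshold, sigma=sigma).cdf(level)

random.seed(1)
level, step = 1.0, 0.05
consecutive_correct = 0
reversals = []
last_direction = None

for trial in range(200):
    correct = random.random() < p_correct(level)
    if correct:
        consecutive_correct += 1
        if consecutive_correct < 2:      # wait for two correct in a row
            continue
        consecutive_correct = 0
        direction = "down"               # two correct in a row -> make the task harder
        level -= step
    else:
        consecutive_correct = 0
        direction = "up"                 # one error -> make the task easier
        level += step
    if last_direction is not None and direction != last_direction:
        reversals.append(level)          # remember levels where the direction changed
    last_direction = direction

last_revs = reversals[-8:]               # threshold estimate: mean of the last reversals
print("70.7%-correct threshold estimate:", sum(last_revs) / len(last_revs))
```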
2) PEST procedures
3) Maximum-likelihood procedures
- With the previous methods, subsequent changes in stimulus intensity rely on the subject's previous two or three responses
- Main idea of the maximum-likelihood method:
• The stimulus intensity presented at each trial is determined by a statistical estimation of the subject's threshold based on all responses since the beginning of the session
• At each trial, a psychometric function is updated and then serves to select the stimulus intensity for the upcoming trial
- So using that fitted function you choose the intensity of the stimulus on the next trial
• The function is fitted to all the data accumulated on the previous trials
- Maximum-likelihood adaptive procedure:
• Estimation procedure in which the best-fitting model is defined to be that which
maximizes the likelihood function
- Essential:
• Performance up to a given moment is used to estimate as well as possible the entire
psychometric function
• There are assumptions with regard to the shape of the psychometric function
(parametric)
- Likelihood:
• The probability with which a hypothetical observer characterized by assumed model
parameters would reproduce exactly the responses of a human observer
• The likelihood is a function of parameter values
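- A minimal sketch of such a maximum-likelihood (best-PEST-like) procedure (illustrative Python; the observer model, the grid of candidate thresholds, and the fixed slope are assumptions):

```python
# After every response the likelihood of each candidate threshold is updated using all
# responses so far, and the next stimulus is placed at the current ML threshold estimate.
import math
import random
from statistics import NormalDist

SIGMA = 0.1                                    # assumed (fixed) slope of the psychometric function
candidates = [i / 100 for i in range(1, 100)]  # candidate thresholds 0.01 ... 0.99
loglik = {t: 0.0 for t in candidates}

def p_yes(level, threshold):
    """Assumed psychometric function: probability of a 'yes' (or correct) response."""
    p = NormalDist(mu=threshold, sigma=SIGMA).cdf(level)
    return min(max(p, 1e-9), 1 - 1e-9)         # keep the logs finite

def simulated_observer(level, true_threshold=0.4):
    return random.random() < p_yes(level, true_threshold)

random.seed(2)
level = 0.5                                    # first trial chosen by the experimenter
for trial in range(40):
    response = simulated_observer(level)
    for t in candidates:                       # update the likelihood of every candidate threshold
        p = p_yes(level, t)
        loglik[t] += math.log(p if response else 1 - p)
    level = max(candidates, key=loglik.get)    # next stimulus = current ML threshold estimate

print("ML threshold estimate after 40 trials:", level)
```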
- Graph:
• There are no very clear step criteria (the steps are not always the same size)
• It converges more quickly to a threshold
- Best PEST:
• Lower the stimulus intensity after a correct response
• Raise the stimulus intensity after an incorrect response
• Step sizes are not fixed (and tend to decrease over trials)
• 2nd trial: always the highest/lowest value, given the range set by the experimenter
IV. Conclusion
- Important to keep in mind the main procedures and methods, as well as the advantages and
drawbacks of each method
3. The psychometric function: advanced aspects
- Throughout this class we rely on simulated data ( α = 0.5; β = 0.1; γ = 0.5; λ = 0; F is cumulative
Gaussian)
• Bell shape: two different parameters:
▪ The mean (alpha), which is the centre, the location
▪ The width of the distribution (beta), which summarizes how spread out the density is
• The cumulative Gaussian is sigmoidal; alpha locates where the curvature switches sign (the inflection point), and beta says how steep the curve is
▪ If you change the location (alpha), the sigmoid also moves left/right
▪ The curve ranges between 0 and 1
- Substituting the number of correct answers z by the proportion correct y = z/n yields the corresponding formula for the proportion correct instead of the number correct
- We assume an underlying process that gives us the probability of getting a correct response, which is dependent on the number of trials
- However, after an experiment we already know the data y, and we are interested in estimating
p, thus f(y) is not really interesting
• When we do fitting, we already have the data so the concept of probability is not really
appropriate anymore (because you either have observed it or not)
- As said earlier, the probability of a correct response is specified by the psychometric function
Ψ: Ψ(x|α, β, γ, λ) = γ + (1 − γ − λ)F(x; α, β) with parameter vector θ = (α, β, γ, λ)
- The likelihood thus becomes:
➔ In words: a loaded coin flip is assumed to underlie psychophysical responses (loaded since the
probability of heads and tails is not the same)
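- A minimal sketch of this binomial ("loaded coin flip") likelihood (illustrative Python; the data values are made up, the parameters follow the simulated example above):

```python
# Every trial is a Bernoulli trial with success probability given by the psychometric
# function psi(x) = gamma + (1 - gamma - lambda) * F(x; alpha, beta); a block of n trials
# at level x therefore follows a binomial distribution.
import math
from statistics import NormalDist

def psi(x, alpha=0.5, beta=0.1, gamma=0.5, lam=0.0):
    """Psychometric function with guess rate gamma and lapse rate lambda."""
    F = NormalDist(mu=alpha, sigma=beta).cdf(x)    # cumulative Gaussian F(x; alpha, beta)
    return gamma + (1 - gamma - lam) * F

def log_likelihood(levels, n_correct, n_trials, **params):
    """Binomial log-likelihood of the data given the psychometric-function parameters."""
    ll = 0.0
    for x, z, n in zip(levels, n_correct, n_trials):
        p = min(max(psi(x, **params), 1e-9), 1 - 1e-9)
        ll += math.log(math.comb(n, z)) + z * math.log(p) + (n - z) * math.log(1 - p)
    return ll

# Illustrative data: 4 blocks of 40 trials each
levels    = [0.3, 0.45, 0.55, 0.7]
n_correct = [22, 27, 33, 40]
n_trials  = [40, 40, 40, 40]
print(log_likelihood(levels, n_correct, n_trials))
```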
- Expected variability:
• For a contrast value, you can have a lot of variation
- How extreme (unlikely) an observed proportion of 85% correct is depends entirely on the amount of variability to be expected given n and p
• In the left panel the green point is not unexpected, but in the right panel it is very unexpected
- Variation of the binomial distribution:
• Function of n and p
• Likelihood depends on n and p
- The fact that it depends on n and p is because in a binomial,
the variance of the distribution (of correct responses or
proportion) varies with the true probability of success
- In this case:
• Variance peaks at 0.5, expected variability is higher at
0.5 than at 1
• If probability of heads is 1, you basically always observe
heads, there is no variability
• So variance fluctuates with probability correct, variance is a function of n and p
- The model "knows" how much variability to expect given n and p by assuming binomial
variability
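- For reference, the binomial variance the model assumes (a standard result, not written out above): Var(z) = n·p·(1 − p) for the number of correct responses, so Var(y) = p·(1 − p)/n for the proportion correct; this is maximal at p = 0.5 and zero at p = 0 or 1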
- We can use the model to generate a number of data sets using a specific Ψ as a generating
function
- Parameter estimation:
• We choose the values of α, β, γ, λ for which the overall likelihood is maximal = maximum-
likelihood estimation
• Let’s map out the likelihood surface for different combinations of α and β assuming a
specific psychometric function Ψ ( α = .5, β = .1, γ = .5, λ = 0) and a binomial process
underlying the psychophysical responses
- Example:
• Z = log likelihood function
• The estimated values for α and β are: 0.535 and 0.084
• The maximum-likelihood estimator 𝜃̂ of θ is that for
which l(𝜃̂; y) ≥ l(θ; y) holds for all θ
• Deviance: a comparison of the likelihood of the fitted model with the likelihood under θmax (the saturated, perfect model)
- Where θmax denotes the parameter vector of the saturated model without residual error
between empirical data and model prediction
- The reason why deviance is preferred as a goodness-of-fit statistic is that, asymptotically, it is distributed as χ² with K degrees of freedom, where K denotes the number of data points (blocks of trials)
• With pi = Ψ(xi; θ)
- This is a very global measure: it summarizes in a single number what we have just seen
- Per block, take the difference between model prediction and observed data, keep the sign of it (positive or negative), and compare the predicted and observed proportions; if they are close to each other the deviance gets smaller
• We assess the sign (positive or negative), then compare the proportions, and then weigh by the number of trials
• If the number of trials is big, the weight is higher
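- A minimal sketch of the deviance computation (illustrative Python; data and model predictions are made up, and the formula follows the standard treatment of psychometric-function fitting):

```python
# Deviance: twice the difference between the log-likelihood of the saturated model
# (p_i = observed proportion y_i) and that of the fitted model (p_i = psi(x_i; theta)).
import math

def deviance(n_correct, n_trials, p_model):
    dev = 0.0
    for z, n, p in zip(n_correct, n_trials, p_model):
        y = z / n                                    # observed proportion (saturated model)
        # guard the logs when the observed proportion is exactly 0 or 1
        term_hit  = z * math.log(y / p) if z > 0 else 0.0
        term_miss = (n - z) * math.log((1 - y) / (1 - p)) if z < n else 0.0
        dev += 2 * (term_hit + term_miss)
    return dev

n_correct = [22, 27, 33, 40]
n_trials  = [40, 40, 40, 40]
p_model   = [0.55, 0.68, 0.82, 0.97]                 # predictions psi(x_i; theta_hat), made up
print(deviance(n_correct, n_trials, p_model))
```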
- Importance of λ
• In this figure you can see the importance of having a lambda parameter
• If this is non-zero, it will influence quality of estimates if you assume it is 0 in your fitting
procedure
• A data point is no longer at 1 but slightly lower because the person has lapsed; if we keep lambda fixed at 0, you see that the estimation is strongly affected by that one data point
▪ This will result in a different estimated threshold; you will never be able to obtain the true threshold because the model is biased in lambda
• So it will influence the quality of your estimates when you wrongly assume lambda = 0
- If all JND’s imply equal changes in sensation magnitude, then the physical size of the JND must
be inversely related to the slope of the psychophysical function relating sensation magnitude to
stimulus intensity
• Hellman et al. (1987) suggest this is not the case
- The JND is then bigger than the JND under the other, steeper function
3) Thurstonian scaling
- Law of comparative judgment: theoretical model describing internal processes that enable the
observer to make paired comparison judgments
• From these judgments, it is possible to calculate psychological scale values
- Thurstone assumes that stimulus presentation results in a discriminal process that has a value
on a psychological continuum
• The variability in this process is called discriminal dispersion (= spreading)
• The psychological scale value is the mean of the distribution of discriminal processes
- Only indirect measurements can uncover this, by considering the proportions of comparative
judgments between stimuli
- Discrimination of two stimuli results in a discriminal difference → the standard deviation of
discriminal differences is given by:
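- A standard way to write this (the formula itself is not reproduced in these notes): σ(A−B) = √(σA² + σB² − 2·rAB·σA·σB), where σA and σB are the discriminal dispersions of the two stimuli and rAB the correlation between their discriminal processes; in Thurstone's Case V (equal dispersions, zero correlation) this reduces to σ·√2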
4) Multidimensional scaling
- Metric MDS assumes a particular distance metric (e.g., Euclidean, city-block, ...):
- City-block distance frequently applies for dimensions that can be intuitively added (separable dimensions), Euclidean distance for so-called "integral" dimensions
- Nonmetric MDS assumes that only the rank order of distances need to be modelled
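- A minimal sketch of the two distance metrics (illustrative Python with made-up coordinates):

```python
# Two points described on two psychological dimensions; the metric chosen in metric MDS
# determines how differences on the dimensions combine into one distance.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

stim_a, stim_b = (1.0, 2.0), (4.0, 6.0)
print("Euclidean:", euclidean(stim_a, stim_b))    # 5.0 - "integral" dimensions
print("City-block:", city_block(stim_a, stim_b))  # 7.0 - separable dimensions
```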
1) Partition scaling
Equisection:
- Section the psychological continuum into distances that are judged equal
• Distance between categories have to be equal
• People have to categorize the stimuli you give them
• For example: If stimulus A is the minimal and D the maximal stimulus, set B and C such
that distances A-B, B-C, and C-D are all equal
- Plateau (1872) introduced this method by letting painters reproduce
a gray midway between black and white (bisection)
- Simultaneous equisection and progressive equisection (see figure)
- Example: simultaneous equisection for
tones varying in frequency
- Validation procedures are important! Verify that observers are indeed capable of sectioning the psychological continuum (see below)
Category scaling:
2) Magnitude scaling
- Magnitude scaling became of interest when acoustic engineers started to try and specify a good
psychological scale of loudness
- Sound intensity can be converted to the logarithmic decibel scale, and according to Fechner's
law, this should be sufficient to arrive at a quantification of the loudness of sound
• 80 dB does not appear to be twice as loud as 40 dB
• Constructing a proper ratio scale of loudness was necessary
Estimation
- Richardson and Ross (1930) assigned the number 1 to a standard tone, and then let observers
determine numbers associated with sounds varying in intensity
- This yielded a power function rather than a logarithmic function! (R = response)
- S. S. Stevens then proposed that this could be used to replace Fechner's logarithmic law
- Corollary (= consequence): If the power law holds, and Weber's law holds, subjective JNDs should scale with sensation magnitudes (cf. Ekman's law)
- A useful feature of power functions: they become linear if the logarithm is taken on both sides:
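- With R = k·Sⁿ (R the response, S the stimulus intensity; notation following "R = response" above), taking the logarithm on both sides gives log R = log k + n·log S, so plotting log R against log S yields a straight line whose slope is the exponent n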
Production
3) Individual differences
V. Conclusion
- The last section describes a worked out example of trying to answer a psychophysical question
using these methods
5. Signal detection theory
- Early psychophysicists assumed a close correspondence between verbal reports and concurrent
neurological changes in the sensory system caused by stimulation
• They assumed a close correspondence between what people said and what was
happening in sensorial system
- It was assumed that - for well-trained observers - p(yes) was a function of the stimulus and the
biological state of the sensory system
• You have to be an expert in introspecting your sensations, so a well-trained observer
- Core idea: no nonsensory variables influence the proportion of "yes" responses; but we have already seen that there is no clear step function, only a gradual increase (sigmoidal)
- Many nonsensory variables were found to influence p(yes) as well:
• Probability of stimulus occurrence
▪ Even these well trained observers were influenced by how often the stimulus
occurred (when it was always there they say more yes compared to when it is
only there for half of the trials)
• Response consequences
- The threshold concept is not applicable to stimulus detection behavior
- Tanner and Swets (1954) proposed that statistical detection theory might be used to build a
model closely approximating how people actually behave in detection situations
- Green and Swets (1966) describe signal detection theory in detail
- The crucial assumption is that signals are always detected against a certain background level of
activity or noise → the observer must therefore make an observation and try to decide
whether this observation was generated by the signal or by the noise
- Signal detection theory: a theory relating choice behavior to a psychological decision space
• An observer's choices are determined by the distances between distributions in the
space due to different stimuli (sensitivities) and by the manner in which the space is
partitioned to generate the possible responses (response biases)
• Difference between noise and signal distribution (maps to how sensitive you are), and
the way how you cut the space to make your decision (underlying space that depends on
your sensitivity)
- Today: three different paradigms that lie at the core of how signal detection theory is applied
- One-interval design: Participants are presented with a single stimulus and have to classify it in
one of two classes (present/absent, left/right, slow/fast, soft/loud)
- Performance on a task can be decomposed in:
• The extent to which responses mimic the stimuli (sensitivity),
• and the extent to which observers prefer to use one response more than the other (bias)
- Depending on the task, it can be useful to be biased to one or the other response (e.g., tumor
detection by radiologists, eyewitness testimonies, ...)
• Tumor detection by radiologists is a very difficult task: given an X-ray they have to say whether there is a tumor or not, so a basic yes/no paradigm, but they may be inclined to say "yes" more often because failing to detect a tumor is harmful
• Accusing someone of being at a crime scene can be detrimental, so eyewitnesses may be more inclined to say "no"
1) Sensitivity
- Typical yes/no experiment: participants are shown a series of faces they have to remember
• In the test phase, some new ones ("lures") are added and participants have to classify
each face as being "old" or "new"
- → Sensitivity refers to how well participants can discriminate old from new faces
- The table can be summarized by two numbers: hit rate (saying yes when
there is something) and false alarm rate (saying yes when there is nothing)
• H = P ("Yes" |Old)
• F = P ("Yes" |New)
- d’ was merely introduced, what kind of discrimination process does it imply? What is the
internal representation and how do observers arrive at a decision?
- Signal detection model underlying this d’
• There is always some kind of activity, and stimulus will generate additional activity on top
of the noise
• There is considerable overlap, so observers have to partition the space somewhere
▪ As soon as the activity level is strong enough, for example bigger than 0, I will say it is an old stimulus
▪ If the activity is smaller than 0, I will say it is a new stimulus
• This is the decision space
• The criterion can be moved across the decision space; depending on the criterion you will have different hit rates and false-alarm rates
- d’ does not depend on response bias, so a good measure of response bias is also independent
of sensitivity
- A bias measure should also depend on H and F, but by increasing when both increase (and vice versa), as it aims to quantify the tendency to prefer one of the responses
- Where sensitivity relies on a difference between H and F, response bias should rely on a sum
- SDT suggested one sensitivity measure d’, but a variety of bias measures is available
4) Criterion location
- For each distribution, the decision variable has an associated "likelihood" (i.e., the height of the
distribution at that point)
- The likelihood ratio is another possible way to quantify bias:
• β = f(x | Old) / f(x | New)
- Under the standard SDT assumption (equal-variance Gaussian), this simplifies to:
• β = e^(c·d′)
• ln(β) = c·d′
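- A minimal sketch of these measures (illustrative Python; the formulas d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2 are the standard equal-variance ones, not written out above; the H and F values are made up):

```python
# Equal-variance Gaussian SDT measures from a hit rate and a false-alarm rate.
import math
from statistics import NormalDist

z = NormalDist().inv_cdf          # z-transform (inverse of the standard normal CDF)

H, F = 0.82, 0.28                 # hit rate and false-alarm rate (illustrative)
d_prime = z(H) - z(F)             # sensitivity
c = -(z(H) + z(F)) / 2            # criterion location (0 = unbiased)
beta = math.exp(c * d_prime)      # likelihood ratio at the criterion: beta = e^(c d')

print(f"d' = {d_prime:.3f}, c = {c:.3f}, beta = {beta:.3f}, ln(beta) = {c * d_prime:.3f}")
```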
8) Response bias
- In yes-no experiments, participants determine which of two events occurred on a given trial
- In rating experiments, observers can make graded reports about the degree of their experience
by setting multiple criteria simultaneously
• Yes no: present/absent vs. graded reports: strongly absent, present, little absent …
- Example: how is odour memory influenced by the passage of time? (left)
• Continuous way in which ratings are distributed
• We actually ask to partition the space in more than only two sections
- How to calculate sensitivity?
• Option 1: ignore confidence judgments, and collapse responses in two classes, and
calculate d′ as was done previously
• Option 2: calculate H and F for each cell, and tabulate cumulative probabilities, this
implies successively partitioning decision space (i.e., shifting the criterion location)
- This yields the following table (right):
• Based on the (H, F) pairs, one can now calculate a set of d′
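- A minimal sketch of option 2 (illustrative Python; the rating counts are made up, not the odour-memory data):

```python
# Turn rating-scale counts into cumulative hit and false-alarm rates (successively
# shifting the criterion) and compute a d' for each criterion.
from statistics import NormalDist

z = NormalDist().inv_cdf

# Counts per confidence category, from "sure old" (left) to "sure new" (right)
old_counts = [45, 25, 15, 10, 5]     # responses to old stimuli
new_counts = [ 5, 10, 20, 25, 40]    # responses to new (lure) stimuli

def cumulative_rates(counts):
    total, cum, rates = sum(counts), 0, []
    for c in counts[:-1]:            # the last cut would give rate 1.0 (uninformative)
        cum += c
        rates.append(cum / total)
    return rates

for H, F in zip(cumulative_rates(old_counts), cumulative_rates(new_counts)):
    print(f"H = {H:.2f}, F = {F:.2f}, d' = {z(H) - z(F):.2f}")
# Under the equal-variance model these d' values should be roughly constant across
# criteria; if they are not, the zROC slope differs from 1 (see below).
```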
- What if the slope of the zROC curve is not equal to 1? Detectability as quantified by d′ is no
longer constant (i.e., it varies with criterion location)!
- What kind of underlying distribution can cause this type of ROC? (left)
• When the slope is smaller than 1, moving one z-unit on the F axis of the zROC implies
moving less than one z-unit on the H axis
- Principled way to compute d′? (see Chapter)
- There is a difference in the widths of the distributions, so you can no longer compute d′ with the formula we have, because it would give different d′ values for different criteria