INFORMATION THEORY AND THE BRAIN

Information Theory and the Brain deals with a new and expanding area of
neuroscience which provides a framework for understanding neuronal proces-
sing. It is derived from a conference held in Newquay, UK, where a handful of
scientists from around the world met to discuss the topic. This book begins
with an introduction to the basic concepts of information theory and then
illustrates these concepts with examples from research over the last 40 years.
Throughout the book, the contributors highlight current research from four
different areas: (1) biological networks, including a review of information
theory based on models of the retina, understanding the operation of the insect
retina in terms of energy efficiency, and the relationship of image statistics and
image coding; (2) information theory and artificial networks, including inde-
pendent component-based networks and models of the emergence of orienta-
tion and ocular dominance maps; (3) information theory and psychology,
including clarity of speech models, information theory and connectionist mod-
els, and models of information theory and resource allocation; (4) formal
analysis, including chapters on modelling the hippocampus, stochastic reso-
nance, and measuring information density. Each part includes an introduction
and glossary covering basic concepts.
This book will appeal to graduate students and researchers in neuroscience
as well as computer scientists and cognitive scientists. Neuroscientists inter-
ested in any aspect of neural networks or information processing will find this a
very useful addition to the current literature in this rapidly growing field.

Roland Baddeley is Lecturer in the Laboratory of Experimental Psychology at
the University of Sussex.

Peter Hancock is Lecturer in the Department of Psychology at the University
of Stirling.

Peter Foldiak is Lecturer in the School of Psychology at the University of St.
Andrews.
INFORMATION THEORY
AND THE BRAIN

Edited by
ROLAND BADDELEY
University of Sussex

PETER HANCOCK
University of Stirling

PETER FOLDIAK
University of St. Andrews

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo, Delhi

Cambridge University Press


The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521631976

© Cambridge University Press 2000

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2000


This digitally printed version 2008

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data


Information theory and the brain / edited by Roland Baddeley, Peter
Hancock, Peter Foldiak
p. cm.
1. Neural networks (Neurobiology) 2. Neural networks (Computer
science). 3. Information theory in biology. I. Baddeley, Roland,
1965- . II. Hancock, Peter J. B., 1958- . III. Foldiak, Peter,
1963-
QP363.3.I54 1999 98-32172
612.8'2—dc21 CIP

ISBN 978-0-521-63197-6 hardback


ISBN 978-0-521-08786-5 paperback
Contents

List of Contributors page xi

Preface xiii

1 Introductory Information Theory and the Brain 1


ROLAND BADDELEY
1.1 Introduction 1
1.2 What Is Information Theory? 1
1.3 Why Is This Interesting? 4
1.4 Practical Use of Information Theory 5
1.5 Maximising Information Transmission 13
1.6 Potential Problems and Pitfalls 17
1.7 Conclusion 19

Part One: Biological Networks 21

2 Problems and Solutions in Early Visual Processing 25


BRIAN G. BURTON
2.1 Introduction 25
2.2 Adaptations of the Insect Retina 26
2.3 The Nature of the Retinal Image 30
2.4 Theories for the RFs of Retinal Cells 31
2.5 The Importance of Phase and the Argument for Sparse,
Distributed Coding 36
2.6 Discussion 38

3 Coding Efficiency and the Metabolic Cost of Sensory and


Neural Information 41
SIMON B. LAUGHLIN, JOHN C. ANDERSON, DAVID O'CARROLL
AND ROB DE RUYTER VAN STEVENINCK
3.1 Introduction 41

3.2 Why Code Efficiently? 42


3.3 Estimating the Metabolic Cost of Transmitting Information 45
3.4 Transmission Rates and Bit Costs in Different Neural
Components of the Blowfly Retina 48
3.5 The Energetic Cost of Neural Information is Substantial 49
3.6 The Costs of Synaptic Transfer 50
3.7 Bit Costs Scale with Channel Capacity - Single Synapses
Are Cheaper 52
3.8 Graded Potentials Versus Action Potentials 53
3.9 Costs, Molecular Mechanisms, Cellular Systems and
Neural Codes 54
3.10 Investment in Coding Scales with Utility 57
3.11 Phototransduction and the Cost of Seeing 58
3.12 Investment in Vision 59
3.13 Energetics - a Unifying Principle? 60

4 Coding Third-Order Image Structure 62


MITCHELL THOMPSON
4.1 Introduction 62
4.2 Higher-Order Statistics 64
4.3 Data Acquisition 65
4.4 Computing the SCF and Power Spectrum 66
4.5 Computing the TCF and Bispectrum 68
4.6 Spectral Measures and Moments 70
4.7 Channels and Correlations 72
4.8 Conclusions 77

Part Two: Information Theory and Artificial Networks 79

5 Experiments with Low-Entropy Neural Networks 84


GEORGE HARPUR AND RICHARD PRAGER
5.1 Introduction 84
5.2 Entropy in an Information-Processing System 84
5.3 An Unsupervised Neural Network Architecture 86
5.4 Constraints 88
5.5 Linear ICA 93
5.6 Image Coding 95
5.7 Speech Coding 97
5.8 Conclusions 100

6 The Emergence of Dominance Stripes and Orientation Maps


in a Network of Firing Neurons 101
STEPHEN P. LUTTRELL
6.1 Introduction 101

6.2 Theory 102


6.3 Dominance Stripes and Orientation Maps 104
6.4 Simulations 109
6.5 Conclusions 118
Appendix 119

7 Dynamic Changes in Receptive Fields Induced by Cortical


Reorganization 122
GERMAN MATO AND NESTOR PARGA
7.1 Introduction 122
7.2 The Model 124
7.3 Discussion of the Model 127
7.4 Results 130
7.5 Conclusions 137

8 Time to Learn About Objects 139


GUY WALLIS
8.1 Introduction 139
8.2 Neurophysiology 142
8.3 A Neural Network Model 149
8.4 Simulating Fractal Image Learning 153
8.5 Psychophysical Experiments 156
8.6 Discussion 162

9 Principles of Cortical Processing Applied to and Motivated by


Artificial Object Recognition 164
NORBERT KRUGER, MICHAEL POTZSCH AND
GABRIELE PETERS
9.1 Introduction 164
9.2 Object Recognition with Banana Wavelets 166
9.3 Analogies to Visual Processing and Their Functional
Meaning 171
9.4 Conclusion and Outlook 178

10 Performance Measurement Based on Usable Information 180


MARTIN ELLIFFE
10.1 Introduction 181
10.2 Information Theory: Simplistic Application 186
10.3 Information Theory: Binning Strategies 187
10.4 Usable Information: Refinement 191
10.5 Result Comparison 194
10.6 Conclusion 198

Part Three: Information Theory and Psychology 201


11 Modelling Clarity Change in Spontaneous Speech 204
MATTHEW AYLETT
11.1 Introduction 204
11.2 Modelling Clarity Variation 206
11.3 The Model in Detail 207
11.4 Using the Model to Calculate Clarity 213
11.5 Evaluating the Model 215
11.6 Summary of Results 218
11.7 Discussion 220
12 Free Gifts from Connectionist Modelling 221
JOHN A. BULLINARIA
12.1 Introduction 221
12.2 Learning and Developmental Bursts 222
12.3 Regularity, Frequency and Consistency Effects 223
12.4 Modelling Reaction Times 227
12.5 Speed-Accuracy Trade-offs 231
12.6 Reaction Time Priming 232
12.7 Cohort and Left-Right Seriality Effects 234
12.8 Lesion Studies 235
12.9 Discussion and Conclusions 239
13 Information and Resource Allocation 241
JANNE SINKKONEN
13.1 Introduction 241
13.2 Law for Temporal Resource Allocation 242
13.3 Statistical Information and Its Relationships to Resource
Allocation 246
13.4 Utility and Resource Sharing 248
13.5 Biological Validity of the Resource Concept 248
13.6 An MMR Study 249
13.7 Discussion 251
Part Four: Formal Analysis 255
14 Quantitative Analysis of a Schaffer Collateral Model 257
SIMON SCHULTZ, STEFANO PANZERI, EDMUND ROLLS
AND ALESSANDRO TREVES
14.1 Introduction 257
14.2 A Model of the Schaffer Collaterals 259
14.3 Technical Comments 262

14.4 How Graded is Information Representation on the


Schaffer Collaterals? 264
14.5 Non-uniform Convergence 267
14.6 Discussion and Summary 268
Appendix A. Expression from the Replica Evaluation 270
Appendix B. Parameter Values 272

15 A Quantitative Model of Information Processing in CA1 273


CARLO FULVI MARI, STEFANO PANZERI, EDMUND
ROLLS AND ALESSANDRO TREVES
15.1 Introduction 273
15.2 Hippocampal Circuitry 274
15.3 The Model 276
15.4 Statistical-Informational Analysis 280
15.5 Results 281
15.6 Discussion 283
Appendix: Results of the Analytical Evaluation 283

16 Stochastic Resonance and Bursting in a Binary-Threshold


Neuron with Intrinsic Noise 290
PAUL C. BRESSLOFF AND PETER ROPER
16.1 Introduction 290
16.2 The One-Vesicle Model 293
16.3 Neuronal Dynamics 294
16.4 Periodic Modulation and Response 300
16.5 Conclusions 301
Appendix A: The Continuous-Time CK Equation 303
Appendix B: Derivation of the Critical Temperature 303

17 Information and Density and Cortical Magnification Factors 305


M. D. PLUMBLEY
17.1 Introduction 305
17.2 Artificial Neural Feature Maps 306
17.3 Information Theory and Information Density 308
17.4 Properties of Information Density and Information
Distribution 309
17.5 Symmetrical Conditional Entropy 311
17.6 Example: Two Components 312
17.7 Alternative Measures 312
17.8 Continuous Domain 314

17.9 Continuous Example: Gaussian Random Function 314


17.10 Discussion 316
17.11 Conclusions 316
Bibliography 318
Index 341
List of Contributors

John C. Anderson, Department of Zoology, University of Cambridge,
Downing Street, Cambridge CB2 3EJ, United Kingdom
Matthew Aylett, Human Communication Research Centre, University of
Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, United Kingdom
Roland Baddeley, Laboratory of Experimental Psychology, Sussex
University, Brighton, BN1 9QG, United Kingdom
Paul C. Bressloff, Nonlinear and Complex Systems Group, Department of
Mathematical Sciences, Loughborough University, Leics. LE11 3TU,
United Kingdom
John A. Bullinaria, Centre for Speech and Language, Department of
Psychology, Birkbeck College, Malet Street, London WC1E 7HX,
United Kingdom
Brian G. Burton, Department of Zoology, University of Cambridge,
Downing Street, Cambridge CB2 3EJ, United Kingdom
Rob de Ruyter van Steveninck, NEC Research Institute, 4 Independence Way,
Princeton NJ 08540, USA
Martin Elliffe, Department of Experimental Psychology, University of
Oxford, South Parks Road, Oxford OX1 3UD, United Kingdom
Carlo Fulvi Mari, Department of Cognitive Neuroscience, SISSA, Via Beirut
2-4, 34013 Trieste, Italy
George Harpur, Department of Engineering, Cambridge University,
Trumpington Street, Cambridge CB2 1PZ, United Kingdom
Norbert Kruger, Institut für Neuroinformatik, Ruhr-Universität Bochum,
Universitätsstrasse 150, ND 03/71, 44801 Bochum, Germany
Simon B. Laughlin, Department of Zoology, University of Cambridge,
Downing Street, Cambridge CB2 3EJ, United Kingdom


Stephen P. Luttrell, Defence Research Agency, St. Andrews Road, Malvern,
Worcs. WR14 3PS, United Kingdom
German Mato, Departamento de Fisica Teorica C-XI, Universidad
Autonoma de Madrid, 28049 Madrid, Spain
David O'Carroll, NEC Research Institute, 4 Independence Way, Princeton,
NJ 08540, USA
Stefano Panzeri, University of Oxford, Department of Experimental
Psychology, South Parks Road, Oxford OX1 3UD, United Kingdom
Nestor Parga, Departamento de Fisica Teorica C-XI, Universidad Autonoma
de Madrid, 28049 Madrid, Spain
Gabriele Peters, Institut für Neuroinformatik, Ruhr-Universität Bochum,
Universitätsstrasse 150, ND 03/71, 44801 Bochum, Germany
M. D. Plumbley, Division of Engineering, King's College London, Strand,
London WC2R 2LS, United Kingdom
Michael Potzsch, Institut für Neuroinformatik, Ruhr-Universität Bochum,
Universitätsstrasse 150, ND 03/71, 44801 Bochum, Germany
Richard Prager, Department of Engineering, Cambridge University,
Trumpington Street, Cambridge CB2 1PZ, United Kingdom
Edmund Rolls, University of Oxford, Department of Experimental
Psychology, South Parks Road, Oxford OX1 3UD, United Kingdom
Peter Roper, Nonlinear and Complex Systems Group, Department of
Mathematical Sciences, Loughborough University, Leics. LE11 3TU,
United Kingdom
Simon Schultz, University of Oxford, Department of Experimental
Psychology, South Parks Road, Oxford OX1 3UD, United Kingdom
Janne Sinkkonen, Cognitive Brain Research Unit, Department of
Psychology, University of Helsinki, Finland
Mitchell Thompson, Vision Sciences, University of Aston, Aston Triangle,
Birmingham B4 7ET, United Kingdom
Alessandro Treves, Department of Cognitive Neuroscience, SISSA, Via
Beirut 2-4, 34013 Trieste, Italy
Guy Wallis, MPI für biologische Kybernetik, Spemannstr. 38, 72076 Tübingen,
Germany
Preface

This book is the result of a dilemma I had in 1996: I wanted to attend a
conference on information theory, I fancied learning to surf, and my position
meant that it was very difficult to obtain travel funds. To solve all of these
problems in one fell swoop, I decided to organise a cheap conference, in a
place anyone who was interested could surf, and to use as a justification a
conference on information theory. All I can say is that I thoroughly recom-
mend doing this. Organising the conference was a doddle (a couple of web
pages, and a couple of phone calls to the hotel in Newquay). The location
was superb. A grand hotel perched on a headland looking out to sea (and the
film location of the well-known film The Witches). All that and not 100 yards
from the most famous surfing beach in Britain. The conference was friendly,
and the talks were really very good. The whole experience was only marred
by the fact that Jack Scannell was out skilfully surfing the offshore breakers,
whilst I was still wobbling on the inshore surf.
Before the conference I had absolutely no intention of producing a book,
but after going to the conference, getting assurances from the other editors
that they would help, and realising that in fact the talks would make a book
that I would quite like to read, I plunged into it. Unlike the actual conference
organisation, preparing the book has been a lot of work, but I hope the result
is of interest to at least a few people, and that the people who submitted their
chapter promptly are not too annoyed at the length of time the whole thing
took to produce.

Roland Baddeley

Introductory Information Theory and the Brain
ROLAND BADDELEY

1.1 Introduction
Learning and using a new technique always takes time. Even if the question
initially seems very straightforward, inevitably technicalities rudely intrude.
Therefore before a researcher decides to use the methods information theory
provides, it is worth finding out if this set of tools is appropriate for the
task in hand.
In this chapter I will therefore provide only a few important formulae and
no rigorous mathematical proofs (Cover and Thomas (1991) is excellent in
this respect). Neither will I provide simple "how to" recipes (for the psychol-
ogist, even after nearly 40 years, Attneave (1959) is still a good introduction).
Instead, it is hoped to provide a non-mathematical introduction to the basic
concepts and, using examples from the literature, show the kind of questions
information theory can be used to address. If, after reading this and the
following chapters, the reader decides that the methods are inappropriate,
he will have saved time. If, on the other hand, the methods seem potentially
useful, it is hoped that this chapter provides a simplistic overview that will
alleviate the growing pains.

1.2 What Is Information Theory?


Information theory was invented by Claude Shannon and introduced in his
classic book The Mathematical Theory of Communication (Shannon and
Weaver, 1949). What then is information theory? To quote three previous
authors in historical order:

Information Theory and the Brain, edited by Roland Baddeley, Peter Hancock, and Peter Foldiak.
Copyright © 1999 Cambridge University Press. All rights reserved.

The "amount of information" is exactly the same concept that we talked about for
years under the name "variance". [Miller, 1956]
The technical meaning of "information" is not radically different from the everyday
meaning; it is merely more precise. [Attneave, 1959]
The mutual information I(X; Y) is the relative entropy between the joint distribution
and the product distribution p(x)p(y), i.e.,

I(X; Y) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}

[Cover and Thomas, 1991]


Information theory is about measuring things, in particular, how much
measuring one thing tells us about another thing that we did not know
before. The approach information theory makes to measuring information
is to first define a measure of how uncertain we are of the state of the world.
We then measure how less uncertain we are of the state of the world after we
have made some measurement (e.g. observing the output of a neuron; asking
a question; listening to someone speak). The difference between our uncer-
tainty before and the uncertainty after making a measurement we then define
as the amount of information that measurement gives us. As can be seen, this
approach depends critically on our approach to measuring uncertainty, and
for this information theory uses entropy. To make our description more
concrete, the concepts of entropy, and later information, will be illustrated
using a rather artificial scenario: one person has randomly flipped to a page
of this book, and another has to use yes/no questions (I said it was artificial)
to work out some aspect of the page in question (for instance the page
number or the author of the chapter).

Entropy
The first important aspect to quantify is how "uncertain" we are about the
input we have before we measure it. There is much less to communicate
about the page numbers in a two-page pamphlet than in the Encyclopedia
Britannica and, as the measure of this initial uncertainty, entropy measures
how many yes/no questions would be required on average to guess the state
of the world. Given that all pages are equally likely, the number of yes/no
questions required to guess the page flipped to in a two-page pamphlet would
be 1, and hence this would have an entropy (uncertainty) of 1 bit. For a 1024
(2^{10}) page book, 10 yes/no questions are required on average and the entropy
would be 10 bits. For a one-page book, you would not even need to ask a
question, so it would have 0 bits of entropy. As well as the number of
questions required to guess a signal, the entropy also measures the smallest
possible size that the information could be compressed to.

The simplest situation and one encountered in many experiments is where
all possible states of the world are equally likely (in our case, the "page
flipper" flips to all pages with equal probability). In this case no compression
is possible and the entropy (H) is equal to:

H = \og2N (1.1)

where N is the number of possible states of the world, and \log_2 means that
the logarithm is to the base 2.¹ Simply put, the more pages in a book, the
more yes/no questions required to identify the page and the higher the
entropy. But rather than work in a measuring system based on "number of
pages", we work with logarithms. The reason for this is simply that in many
cases we will be dealing with multiple events. If the "page flipper" flips twice,
the number of possible combinations of pages would be N × N (the
numbers of states multiply). If instead we use logarithms, then the entropy of
two-page flips will simply be the sum of the individual entropies (if the
number of states multiply, their logarithms add). Addition is simpler than
multiplication so by working with logs, we make subsequent calculations
much simpler (we also make the numbers much more manageable; an
entropy of 25 bits is more memorable than a system of 33,554,432 states).
When all states of the world are not equally likely, then compression is
possible and fewer questions need (on average) to be asked to identify an
input. People often are biased page flippers, flipping more often to the middle
pages. A clever compression algorithm, or a wise asker of questions can use
this information to take, on average, fewer questions to identify the given
page. One of the main results of information theory is that given knowledge
of the probability of all events, the minimum number of questions on average
required to identify a given event (and smallest that the thing can be com-
pressed) is given by:

H = \sum_x p(x) \log_2 \frac{1}{p(x)} (1.2)

where p(x) is the probability of event x. If all events are equally likely, this
reduces to equation 1.1. In all cases the value of equation 1.2 will always be
equal to (if all states are equally likely), or less than (if the probabilities are
not equal) the entropy as calculated using equation 1.1. This leads us to call a
distribution where all states are equally likely a maximum entropy distribu-
tion, a property we will come back to later in Section 1.5.
¹ Logarithms to the base 2 are often used since this makes the "number of yes/no"
interpretation possible. Sometimes, for mathematical convenience, natural logarithms
are used and the resulting measurements are then expressed in nats. The conversion
is simple, with 1 bit = \log_e(2) nats ≈ 0.69314718 nats.
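
Equations 1.1 and 1.2, and the additivity argument above, can be illustrated in a few lines of Python. This is a minimal sketch, not from the text; the function name is my own:

```python
import math

def entropy_bits(probs):
    """Entropy H = sum_x p(x) log2(1/p(x)) of a discrete distribution
    (equation 1.2), in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Equally likely states: entropy reduces to log2(N) (equation 1.1).
# A 1024-page book needs 10 yes/no questions on average.
print(entropy_bits([1.0 / 1024] * 1024))        # 10.0

# A biased "page flipper": fewer questions needed on average.
print(entropy_bits([0.5, 0.25, 0.125, 0.125]))  # 1.75, less than log2(4) = 2

# Independent flips: the numbers of states multiply, the entropies add.
page = [1.0 / 1024] * 1024
two_flips = [p * q for p in page for q in page]
print(entropy_bits(two_flips))                  # 20.0
```

Note that the non-uniform distribution always comes out at or below the uniform (maximum entropy) value, as the text states.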

Information
So entropy is intuitively a measure of (the logarithm of) the number of states
the world could be in. If, after measuring the world, this uncertainty is
decreased (it can never be increased), then the amount of decrease tells us
how much we have learned. Therefore, the information is defined as the
difference between the uncertainty before and after making a measurement.
Using the probability theory notation of P(X|Y) to indicate the probability
of X given knowledge of Y (conditional on Y), the mutual information
I(X; Y) between a measurement X and the input Y can be defined as:

I(X; Y) = H(X) - H(X|Y) (1.3)
With a bit of mathematical manipulation, we can also get the following
definitions, where H(X, Y) is the entropy of all combinations of inputs and
outputs (the joint distribution):

I(X; Y) = H(X) - H(X|Y)                 (a)
        = H(Y) - H(Y|X)                 (b)          (1.4)
        = H(X) + H(Y) - H(X, Y)         (c)

1.3 Why Is This Interesting?


In the previous section, we have informally defined information but left
unanswered the question of why information theory would be of any use
in studying brain function. A number of reasons have inspired its use includ-
ing:

Information Theory Can Be Used as a Statistical Tool. There are a number of
cases where information-theoretic tools are useful simply for the statistical
description or modelling of data. As a simple measure of association of two
variables, the mutual information or a near relative (Good, 1961; Press et al.,
1992) can be applied to both categorical and continuous signals and produces
a number that is on the same scale for both. While correlation is useful for
continuous variables (and if the variables are Gaussian, will produce very
similar results), it is not directly applicable to categorical data. While χ² is
applicable to categorical data, all continuous data needs to be binned. In
these cases, information theory provides a well founded and general measure
of relatedness.
The use of information theory in statistics also provides a basis for the
tools of (non-linear) regression and prediction. Traditionally regression
methods minimise the sum-squared error. If instead we minimise the
(cross) entropy, this is both general (it can be applied to both categorical
and continuous outputs), and if used as an objective for neural networks,
maximising information (or minimising some related term) can result in
neural network learning algorithms that are much simpler; theoretically
more elegant; and in many cases appear to perform better (Ackley et al.,
1985; Bishop, 1995).

Analysis of Informational Bottlenecks. While many problems are, for theore-
tical and practical reasons, not amenable to analysis using information the-
ory, there are cases where a lot of information has to be communicated but
the nature of the communication itself places strong constraints on transmis-
sion rates. The time-varying membrane potential (a rich informational
source) has to be communicated using only a stream of spikes. A similar
argument applies to synapses, and to retinal ganglion cells communicating
the incoming light pattern to the cortex and beyond. The rate of speech
production places a strong limit on the rate of communication between
two people who at least sometimes think faster than they can speak. Even
though a system may not be best thought of as simply a communication
system, and all information transmitted may not be used, calculating trans-
mitted information places constraints on the relationship between two sys-
tems. Looking at models that maximise information transmission may
provide insight into the operation of such systems (Atick, 1992a; Linsker,
1992; Baddeley et al., 1997).

1.4 Practical Use of Information Theory


The previous section briefly outlined why, in principle, information theory
might be useful. That still leaves the very important practical question of how
one could measure it. Even in the original Shannon and Weaver book
(Shannon and Weaver, 1949), a number of methods were used. To give a
feel for how mutual information and entropy can be estimated, this section
will describe a number of different methods that have been applied to pro-
blems in brain function.

Directly Measuring Discrete Probability Distributions


The most direct and simply understood method of measuring entropy and
mutual information is to directly estimate the appropriate probability dis-
tributions (P(input), /^output) and P(input and output)). This is concep-
tually straightforward and, given enough data, a reasonable method.
One example of an application where this method is applicable was
inspired by the observation that people are very bad at random number
generation. People try and make sequences "more random" than real ran-
dom numbers by avoiding repeats of the same digit; they also, under time
pressure, repeat sequences. This ability to generate random sequences has
therefore been used as a measure of cognitive load (Figure 1.1), where
entropy has been used as the measure of randomness (Baddeley, 1956).
The simplest estimators were based on simple letter probabilities and in
this case it is very possible to directly estimate the distribution (we only
have 26 probabilities to estimate). Unfortunately, methods based on simple
probability estimation will prove unreliable when used to estimate, say, letter
pair probabilities (a statistic that will be sensitive to some order information).
In this case there are 676 (262) probabilities to be estimated, and subjects'
patience would probably be exhausted before enough data had been collected
to reliably estimate them. Note that even when estimating 26 probabilities,
entropy will be systematically underestimated (and information overesti-
mated) if we only have small amounts of data. Fortunately, simple methods
to remove such an "under-sampling bias" have been known for a long time
(Miller, 1955).
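
The direct estimation procedure, together with a standard form of the correction Miller (1955) proposed (often called the Miller-Madow correction: add (m - 1)/(2N ln 2) bits, where m is the number of observed symbols and N the sample size), can be sketched as follows. The function names and letter sequence are my own hypothetical illustration:

```python
import math
from collections import Counter

def plugin_entropy_bits(sequence):
    """Estimate P(symbol) by counting, then apply equation 1.2 directly."""
    n = len(sequence)
    counts = Counter(sequence)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def miller_madow_entropy_bits(sequence):
    """Plug-in estimate plus the Miller (1955) correction for the
    systematic under-sampling bias: add (m - 1) / (2 N ln 2) bits,
    where m is the number of observed symbols and N the sample size."""
    n = len(sequence)
    m = len(set(sequence))
    return plugin_entropy_bits(sequence) + (m - 1) / (2 * n * math.log(2))

# A hypothetical subject-generated "random" letter sequence.
seq = "ahdywshcfyfktwvnljkepuucdqldfpyubferki"
h = miller_madow_entropy_bits(seq)
relative = h / math.log2(26)   # entropy relative to the 26-letter maximum
```

With only 26 single-letter probabilities this works with modest data; for the 676 letter-pair probabilities the plug-in estimate would be badly biased, which is exactly the sampling problem the text describes.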
Of great interest in the 1960s was the measuring of the "capacity" of
various senses. The procedure varied in detail, but was essentially the
same: the subjects were asked to label stimuli (say, tones of different frequen-
cies) with different numbers. The mutual information between the stimuli
and the numbers assigned by the subjects was then calculated with different
numbers of stimuli presented (see Figure 1.2). Given only two stimuli, a
subject would almost never make a mistaken identification, but as the num-
ber of stimuli to be labelled increased, subjects started to make mistakes. By
estimating where the function relating mutual information to the number of

[Figure 1.1: a random letter sequence is used to estimate a letter distribution,
from which the entropy H(X) = \sum_x P(x) \log_2 (1/P(x)) is calculated.]

Figure 1.1. The most straightforward method to calculate entropy or mutual informa-
tion is direct estimation of the probability distributions (after Baddeley, 1956). One case
where this is appropriate is in using the entropy of subjects' random number generation
ability as a measure of cognitive load. The subject is asked to generate random digit
sequences in time with a metronome, either as the only task, or while simultaneously
performing a task such as card sorting. Depending on the difficulty of the other task and
the speed of generation, the "randomness" of the digits will decrease. The simplest way
to estimate entropy is to estimate the probability of different letters. Using this measure
of entropy, redundancy (entropy/maximum entropy) decreases linearly with generation
time, and also with the difficulty of the other task. This has subsequently proved a very
effective measure of cognitive load.
[Figure 1.2: panels A-C show subjects labelling tones spanning 100 Hz to
10,000 Hz as the number of tones increases; panel D plots transmitted
information against input information.]
Figure 1.2. Estimating the "channel capacity" for tone discrimination (after Pollack,
1952, 1953). The subject is presented with a number of tones and asked to assign
numeric labels to them. Given only three tones (A), the subject has almost perfect
performance, but as the number of tones increases (B), performance rapidly deteriorates.
This is not primarily an early sensory constraint, as performance is similar when the
tones are tightly grouped (C). One way to analyse such data is to plot the transmitted
information as a function of the number of input stimuli (D). As can be seen, up until
about 2.5 bits, all the available information is transmitted, but when the input informa-
tion is above 2.5 bits, the excess information is lost. This limited capacity has been found
for many tasks and was of great interest in the 1960s.

input categories asymptotes, an estimate of subjects' channel capacity can be
made. Surprisingly, this number is very small - about 2.5 bits. This capacity
estimate approximately holds for a large number of other judgements: loud-
ness (2.3 bits), tastes (1.9 bits), points on a line (3.25 bits), and this leads to
one of the best titles in psychology - the "seven plus or minus two" of Miller
(1956) refers to this small range (between 2.3 bits (log2 5) and 3.2 bits
(log2 9)).
Again in these tasks, since the number of labels usable by subjects is small,
it is very possible to directly estimate the probability distributions with rea-
sonable amounts of data. If instead subjects were reliably able to label 256
stimuli (8 bits as opposed to 2.5 bits capacity), we would again get into
problems of collecting amounts of data sufficient to specify the distributions,
and methods based on the direct estimation of probability distributions
would require vast amounts of subjects' time.
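For such small stimulus sets the direct estimate is a short computation. The sketch below uses hypothetical stimulus-response count matrices rather than Pollack's actual data:

```python
import numpy as np

def mutual_information_bits(joint):
    """I(X;Y) = sum p(x,y) log2[p(x,y) / (p(x)p(y))] from a joint count
    matrix whose rows are stimuli and columns are responses."""
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0  # skip empty cells, which contribute nothing
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Perfect labelling of 4 stimuli transmits log2(4) = 2 bits...
perfect = np.eye(4) * 25
print(mutual_information_bits(perfect))
# ...while labels assigned at chance transmit nothing.
confused = np.ones((8, 8)) * 2
print(mutual_information_bits(confused))
```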

Continuous Distributions
Given that the data are discrete, and we have enough data, then simply
estimating probability distributions presents few conceptual problems.
Unfortunately if we have continuous variables such as membrane potentials,
or reaction times, then we have a problem. While the entropy of a discrete
probability distribution is finite, the entropy of any continuous variable is
8 Roland Baddeley

infinite. One easy way to see this is that using a single real number between 0
and 1, we could very simply code the entire Encyclopedia Britannica. The first
two digits after the decimal place could represent the first letter; the second
two digits could represent the second letter, and so on. Given no constraint
on accuracy, this means that the entropy of a continuous variable is infinite.
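A toy version of this digit-pair coding (the two-digits-per-letter scheme is arbitrary, and the "real number" is held as a digit string to sidestep floating-point precision):

```python
# Encode each letter as a two-digit pair after the decimal point; a single
# real number of unbounded precision then carries the whole message.
def encode(text):
    return "0." + "".join(f"{ord(c) - ord('A'):02d}" for c in text)

def decode(number_string):
    digits = number_string[2:]  # drop the leading "0."
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(chr(int(p) + ord('A')) for p in pairs)

msg = "HELLO"
print(encode(msg))          # -> 0.0704111114
print(decode(encode(msg)))  # -> HELLO
```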
Before giving up hope, it should be remembered that mutual information
as specified by equation 1.4 is the difference between two entropies. It turns
out that as long as there is some noise in the system (H(X|Y) > 0), then the
difference between these two infinite entropies is finite. This makes the role of
noise vital in any information theory measurement of continuous variables.
One particular case is if both the signal and noise are Gaussian (i.e.
normally) distributed. In this case the mutual information between the signal
(s) and the noise-corrupted version (sn) is simply:

I(s; sn) = 1/2 log2(1 + σ²signal/σ²noise)    (1.5)

where σ²signal is the variance of the signal, and σ²noise is the variance of the noise.
This has the expected characteristics: the larger the signal relative to the noise,
the larger the amount of information transmitted; a doubling of the signal will
result in an approximately 1 bit increase in information transmission; and the
information transmitted will be independent of the unit of measurement.
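A minimal sketch of these properties, assuming the Gaussian-channel form I = 1/2 log2(1 + σ²signal/σ²noise) and reading "doubling of the signal" as a doubling of amplitude (a fourfold increase in variance):

```python
from math import log2

def gaussian_mi_bits(var_signal, var_noise):
    """Equation 1.5: I = 0.5 * log2(1 + var_signal / var_noise), in bits."""
    return 0.5 * log2(1.0 + var_signal / var_noise)

print(gaussian_mi_bits(100.0, 1.0))  # high SNR: ~3.33 bits
# Doubling the signal amplitude quadruples its variance; at high SNR this
# adds approximately one bit:
print(gaussian_mi_bits(400.0, 1.0) - gaussian_mi_bits(100.0, 1.0))
# The result is unchanged by the unit of measurement (scale both variances):
print(gaussian_mi_bits(100.0e6, 1.0e6))
```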
It is important to note that the above expression is only valid when both
the signal and noise are Gaussian. While this is often a reasonable and
testable assumption because of the central limit theorem (basically, the
more things we add, usually the more Gaussian the system becomes), it is
still only an estimate and can underestimate the information (if the signal is
more Gaussian than the noise) or overestimate the information (if the noise is
more Gaussian than the signal).
A second problem concerns correlated signals. Often a signal will have
structure - for instance, it could vary only slowly over time. Alternatively,
we could have multiple measurements. If all these measurements are inde-
pendent, then the situation is simple - the entropies and mutual informations
simply add. If, on the other hand, the variables are correlated across time,
then some method is required to take these correlations into account. In an
extreme case if all the measurements were identical in both signal and noise,
the information from one such measurement would be the same as the com-
bined information from all: it is important to in some way deal with these
effects of correlation.
Perhaps the most common way to deal with this "correlated measure-
ments" problem is to transform the signal to the Fourier domain. This
method is used in a number of papers in this volume and the underlying
logic is described in Figure 1.3.


Figure 1.3. Taking into account correlations in data by transforming to a new repre-
sentation. (A) shows a signal varying slowly as a function of time. Because the voltages
at different time steps are correlated, it is not possible to treat each time step as inde-
pendent and work out the information as the sum of the information values at different
time steps. One way to approach this problem is to transform the signal to a new
representation where all components are now uncorrelated. If the signal is Gaussian,
transforming to a Fourier series representation has this property. Here we represent the
original signal (A) as a sum of sines and cosines of different frequencies (B). While the
individual time measurements are correlated, if the signal is Gaussian, the amounts of
each Fourier component (C) will be uncorrelated. Therefore the mutual information
for the whole signal will simply be the sum of the information values for the individual
frequencies (and these can be calculated using equation 1.5).
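The caption's recipe can be sketched as follows; this is a rough illustration on synthetic data, with single-trial periodograms standing in for the true signal and noise spectra:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# A correlated, slowly varying Gaussian signal (smoothed white noise)
# plus independent Gaussian noise.
signal = np.convolve(rng.standard_normal(n), np.ones(16) / 16, mode="same")
noise = 0.1 * rng.standard_normal(n)

# Signal and noise power in each (now uncorrelated) frequency band.
s_power = np.abs(np.fft.rfft(signal)) ** 2
n_power = np.abs(np.fft.rfft(noise)) ** 2

# Apply equation 1.5 independently at each frequency and sum.
bits = float(0.5 * np.log2(1.0 + s_power / n_power).sum())
print(f"roughly {bits:.0f} bits for the whole signal")
```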

The Fourier transform method always uses the same representation (in
terms of sines and cosines) independent of the data. In some cases, especially
when we do not have that much data, it may be more useful to choose a
representation which still has the uncorrelated property of the Fourier com-
ponents, but is optimised to represent a particular data set. One plausible
candidate for such a method is principal components analysis. Here a new set
of measurements, based on linear transformation of the original data, is used
to describe the data. The first component is the linear combination of the
original measurements that captures the maximum amount of variance. The
second component is formed by a linear combination of the original mea-
surements that captures as much of the variance as possible while being
orthogonal to the first component (and hence independent of the first com-
ponent if the signal is Gaussian). Further components can be constructed in a
similar manner. The main advantage over a Fourier-based representation is

that more of the signal can be described using fewer descriptors and thus less
data is required to estimate the characteristics of the signal and noise.
Methods based on principal-component representations of spike
trains have been applied to calculating the information transmitted by cor-
tical neurons (Richmond and Optican, 1990).
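A minimal numpy sketch of the decorrelation property that motivates this (synthetic correlated measurements, not spike-train data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-D measurements: second channel is mostly a copy of the first.
x1 = rng.standard_normal(1000)
x2 = 0.9 * x1 + 0.1 * rng.standard_normal(1000)
data = np.column_stack([x1, x2])

# Principal components are the eigenvectors of the covariance matrix.
cov = np.cov(data.T)
eigvals, eigvecs = np.linalg.eigh(cov)
components = data @ eigvecs  # project onto the new axes

# The projected components are uncorrelated (off-diagonal covariance ~ 0),
# and the eigenvalues give the variance captured by each component.
print(np.round(np.cov(components.T), 3))
```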
All the above methods rely on an assumption of Gaussian nature of the
signal, and if this is not true and there exist non-linear relationships between
the inputs and outputs, methods based on Fourier analysis or principal
components analysis can only give rather inaccurate estimates. One method
that can be applied in this case is to use a non-linear compression method to
generate a compressed representation before performing the information
estimation (see Figure 1.4).

[Figure 1.4 diagram: (A) a linear autoencoder, n input units -> h coding
units -> n output units; (B) a non-linear version in which layers of c1 and
c2 non-linear units surround the h bottleneck units.]

Figure 1.4. Using non-linear compression techniques for generating compact represen-
tations of data. Linear principal components analysis can be performed using the neural
network shown in (A) where a copy of the input is used as the target output. On
convergence, the weights from the n input units to the h coding units will span the
same space as the first h principal components and, given that the input is Gaussian,
the coding units will be a good representation of the signal. If, on the other hand, there
is non-Gaussian non-linear structure in the signals, this approach may not be optimal.
One possible approach to dealing with such non-linearity is to use a compression-based
algorithm to create a non-linear compressed representation of the signals. This can be
done using the non-linear generalisation of the simple network to allow non-linearities
in processing (shown in (B)). Again the network is trained to recreate its input from its
output, while transmitting the information through a bottleneck, but this time the data
is allowed to be transformed using an arbitrary non-linearity before coding. If there are
significant non-linearities in the data, the representation provided by the bottleneck
units may provide a better representation of the input than a principal-components-
based representation. (After Fotheringhame and Baddeley, 1997.)
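A toy sketch of the bottleneck network in (B) - invented data and arbitrary layer sizes, not the architecture of Fotheringhame and Baddeley:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data with non-linear 1-D structure embedded in 2-D: points on a curve.
t = rng.uniform(-1, 1, (200, 1))
X = np.hstack([t, t ** 2]) + 0.01 * rng.standard_normal((200, 2))

# A small non-linear autoencoder: 2 -> 4 (tanh) -> 1 bottleneck -> 4 -> 2.
sizes = [2, 4, 1, 4, 2]
Ws = [rng.standard_normal((a, b)) * 0.5 for a, b in zip(sizes, sizes[1:])]
bs = [np.zeros(b) for b in sizes[1:]]

def forward(inputs):
    """Return the activations of every layer; the output layer is linear."""
    h, acts = inputs, [inputs]
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = h @ W + b
        h = z if i == len(Ws) - 1 else np.tanh(z)
        acts.append(h)
    return acts

lr, losses = 0.05, []
for step in range(2000):
    acts = forward(X)                 # train to recreate the input
    err = acts[-1] - X
    losses.append(float((err ** 2).mean()))
    grad = 2 * err / X.shape[0]       # backpropagate through the stack
    for i in reversed(range(len(Ws))):
        if i != len(Ws) - 1:
            grad = grad * (1 - acts[i + 1] ** 2)  # tanh derivative
        gW = acts[i].T @ grad
        gb = grad.sum(axis=0)
        grad = grad @ Ws[i].T
        Ws[i] -= lr * gW
        bs[i] -= lr * gb

print(f"reconstruction error: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The single bottleneck unit is forced to find a compact, non-linear summary of the two correlated inputs.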

Estimation Using an "Intelligent" Predictor


Though the direct measurement of the probability distributions is concep-
tually the simplest method, often the dimensionality of the problem renders
this implausible. For instance, if interested in the entropy of English, one
could get better and better approximations by estimating the probability
distribution of letters, letter pairs, letter triplets, and so on. Even for letter
triplets, there are 27^3 = 19,683 possible three-letter combinations whose
probabilities must be estimated: the amount of data required to do this at all accurately is
prohibitive. This is made worse because we know that many of the regula-
rities of English would only be revealed over groups of more than three
letters. One potential solution to this problem is available if we have access
to a good model of the language or predictor. For English, one source of a
predictor of English is a native speaker. Shannon used this to devise an
ingenious method for estimating the entropy of English, as described in
Table 1.1.
Even when we don't have access to such a good predictor as an English
language speaker, it is often simpler to construct (or train) a predictor rather
than to estimate a large number of probabilities. This approach to estimating
mutual information has been applied (Heller et al., 1995) to estimation of the
visual information transmission properties of neurons in both the primary
visual cortex (also called VI; area 17; or striate cortex) and the inferior
temporal cortex (see Figure 1.5). Essentially the spikes generated by neurons
when presented various stimuli were coded in a number of different ways (the

Table 1.1. Estimating the entropy of English using an intelligent predictor (after Shannon,
1951).

T H E R E I S N O R E V E R S E
1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2

O N A M O T O R C Y C L E
1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1

Above is a short passage of text. Underneath each letter is the number of guesses required by a
person to guess that letter based only on knowledge of the previous letters. If the letters were
completely random (maximum entropy and no redundancy), the best predictor would take on
average 27/2 guesses (26 letters and a space) for every letter. If, on the other hand, there is complete
predictability, then a predictor would require only one guess per letter. English is between
these two extremes and, using this method, Shannon estimated an entropy of between 0.6
and 1.6 bits per letter. This contrasts with log2 27 = 4.76 bits if every letter was equally likely and
independent. Technical details can be found in Shannon (1951) and Attneave (1959).
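A crude sketch of the guessing game, with an order-0 frequency-ranked predictor standing in for Shannon's human subject (so the counts are illustrative only; Shannon's bounds come from the statistics of such counts):

```python
from collections import Counter

text = "THERE IS NO REVERSE ON A MOTORCYCLE"

# Rank symbols by overall frequency; the 'predictor' guesses in that order.
ranking = [c for c, _ in Counter(text).most_common()]

def guesses(c):
    """Guess number for character c: 1 for the most frequent symbol, etc."""
    return ranking.index(c) + 1

counts = [guesses(c) for c in text]
print(counts)
# Frequent letters need few guesses; a human predictor using the full
# preceding context would do far better than this order-0 stand-in.
```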

[Figure 1.5 diagram: (A) Walsh patterns; (B) neuron; (C) spike train;
(D) neural network; (E) prediction of input.]
Figure 1.5. Estimating neuronal information transfer rate using a neural network based
predictor (after Heller et al., 1995). A collection of 32 4x4 Walsh patterns (and their
contrast reversed versions) (A) were presented to awake Rhesus Macaque monkeys, and
the spike trains generated by neurons in VI and IT recorded (B and C). Using differ-
ently coded versions of these spike trains as input, a neural network (D) was trained
using the back-propagation algorithm to predict which Walsh pattern was presented.
Intuitively, if the spike train contains a lot of information about the input, then an
accurate prediction is possible, while if there is very little information then the spike
train will not allow accurate prediction of the input. Notice that (1) the calculated
information will be very dependent on the choice (and number) of stimuli, and (2)
even though we are using a predictor, implicitly we are still estimating probability
distributions and hence we require large amounts of data to accurately estimate the
information. Using this method, it was claimed that the neurons only transmitted small
amounts of information (~ 0.5 bits), and that this information was contained not in the
exact timing of the spikes, but in a local "rate".

average firing rate, vectors representing the presence and absence of spikes,
various low-pass-filtered versions of the spike train, etc). These codified spike
trains were used to train a neural network to predict the visual stimulus that
was presented when the neurons generated these spikes. The accuracy of
these predictions, given some assumptions, can again be used to estimate
the mutual information between the visual input and the differently coded
spike trains. For these neurons and stimuli, the information trans-
mission is relatively small (~0.5 bits per second).

Estimation Using Compression


One last method for estimating entropy is based on Shannon's coding theo-
rem, which states that no compression algorithm can, on average, compress
a sequence to a size smaller than its entropy. Therefore, by applying a number
of compression algorithms to the sample sequence of interest, the smallest
compressed representation can be taken as an upper bound on that sequen-
ce's entropy. Methods based on this intuition have been more common in
genetics, where they have been used to ask questions such as whether "coding"
DNA has higher or lower entropy than "non-coding" DNA (Farach et al.,
1995). (The requirements of quick convergence and reasonable computation

[Figure 1.6 diagram: (A) the Bodleian Library declaration ("I hereby
undertake not to remove from the Library, or to mark, deface, or injure
in any way, any volume, document, or other object belonging to it or in
its custody; not to bring into the Library or kindle ..."); (B) estimate
entropies and cross entropies using compression algorithm techniques;
(C) cluster using cross entropies as distances - a subtree including
Basque, Manx (Celtic), English, Dutch, German, Italian and Spanish.]

Figure 1.6. Estimating entropies and cross entropies using compression-based techni-
ques. The declaration of the Bodleian Library (Oxford) has been translated into more
than 50 languages (A). The entropy of these letter sequences can be estimated using the
size of a compressed version of the statement. If the code book derived by the algorithm
for one language is used to code another language, the size of the code book will reflect
the cross entropy (B). Hierarchical minimum distance cluster analysis, using these cross
entropies as distances, can then be applied to this data (a small subset of the resulting
tree is shown (C)). This method can produce an automatic taxonomy of languages, and
has been shown to correspond very closely to those derived using more traditional
linguistic analysis (Juola, P., personal communication).

time mean that only the earliest algorithms simply performed compression,
but the concept behind later algorithms is essentially the same.)
More recently, this compression approach to entropy estimation has been
applied to automatically calculating linguistic taxonomies (Figure 1.6). The
entropy was calculated using a modified compression algorithm based on
Farach et al. (1995). Cross entropy was estimated using the compressed
length when the code book derived for one language was used to compress
another. Though methods based on compression have not been commonly
used in the theoretical neuroscience community (but see Redlich, 1993), they
provide at least interesting possibilities.
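A quick sketch using a general-purpose compressor (zlib here, not the modified Farach et al. algorithm); the concatenation trick below is a rough stand-in for reusing one language's code book on another:

```python
import zlib

def bits_per_char(text):
    """Compressed size gives an upper bound on the entropy per character."""
    return 8 * len(zlib.compress(text.encode(), 9)) / len(text)

def extra_cost(reference, target):
    """Bytes needed for 'target' once the compressor has seen 'reference' -
    a crude proxy for the cross-entropy idea in Figure 1.6."""
    ref = len(zlib.compress(reference.encode(), 9))
    both = len(zlib.compress((reference + target).encode(), 9))
    return both - ref

english = "the quick brown fox jumps over the lazy dog " * 40
print(bits_per_char(english))                      # far below 8-bit ASCII
print(extra_cost(english, "the lazy dog jumps"))   # cheap: familiar material
```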

1.5 Maximising Information Transmission


The previous section was concerned with simply measuring entropy and
information. One proposal that has received a lot of attention recently
is that some cortical systems can be understood in terms of
maximising information transmission (Barlow, 1989). There are a num-
ber of reasons supporting such an information maximisation framework:

Maximising the Richness of a Representation. The richness and flexibility of
the responses to a behaviourally relevant input will be limited by the number
of different states that can be discriminated. As an extreme case, a protozoa
that can only discriminate between bright and dark will have less flexible
navigating behaviour than an insect (or human) that has an accurate repre-
sentation of the grey-level structure of the visual world. Therefore, heuristi-
cally, evolution will favour representations that maximise information trans-
mission, because these will maximise the number of discriminable states of
the world.

As a Heuristic to Identify Underlying Causes in the Input. A second reason is
that maximising information transmission is a reasonable principle for gen-
erating representations of the world. The pressure to compress the world
often forces a new representation in terms of the actual "causes" of the
images (Olshausen and Field, 1996a). A representation of the world in
terms of edges (the result of a number of information maximisation algo-
rithms when applied to natural images, see for instance Chapter 5), may well
be easier to work with than a much larger and redundant representation in
terms of the raw intensities across the image.

To Allow Economies to be Made in Space, Weight and Energy. By having a
representation that is efficient at transmitting information, it may be possible
to economise on some other aspect of the system design. As described in Chapter 3,
an insect eye that transmits information efficiently can be smaller and lighter,
and can consume less energy (both when operating and when being trans-
ported). Such "energetic" arguments can also be applied to, say, the trans-
mission of information from the eye to the brain, where an inefficient
representation would require far more retinal ganglion cells, would take
significantly more space in the brain, and use a significantly larger amount
of energy.

As a Reasonable Formalism for Describing Models. The last reason is more
pragmatic and empirical. The quantities required to work out how efficient a
representation is, and the nature of a representation that maximises informa-
tion transmission, are measurable and mathematically formalisable. When
this is done, and the "optimal" representations compared to the physiologi-
cal and psychophysical measurements, the correspondence between these
optimal representations and those observed empirically is often very close.
This means that even if the information maximisation approach is only
heuristic, it is still useful in summarising data.
How then can one maximise information transmission? Most approaches
can be understood in terms of a combination of three different strategies:
• Maximise the number of effective measurements by making sure that each
measurement tells us about a different thing.
• Maximise the signal whilst minimising the noise.
• Subject to the external constraints placed on the system, maximise the
efficiency of the questions asked.

Maximising the Effective Number of Questions


The simplest method of increasing information transmission is to increase the
number of measurements made: someone asking 50 questions concerning the
page flipped to in a book has more chance of identifying it than someone who
asks one question. Again an eye connected by a large number of retinal
ganglion cells to later areas should send more information than the single
ganglion cell connected to an eyecup of a flatworm.
This insight is simple enough not to rely on information theory, but the
raw number of measurements is not always equivalent to the "effective"
number of measurements. If given two questions to identify a page in the
book - if the first one was "Is it between pages 1 and 10?" then a second of
"Is it between 2 and 11?" would provide remarkably little extra information.
In particular, given no noise, the maximum amount of information can be
transmitted if all measurements are independent of each other.
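The arithmetic behind this is just the logarithm of the number of alternatives (`questions_needed` is an invented helper name):

```python
from math import ceil, log2

def questions_needed(n_pages):
    """Independent halving questions needed to pin down one of n pages."""
    return ceil(log2(n_pages))

# Each well-chosen yes/no question halves the remaining candidates:
print(questions_needed(1000))  # -> 10
# Overlapping questions ("pages 1-10?" then "pages 2-11?") waste most of
# their bit, because the answers are highly correlated.
```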
A similar case occurs in the transmission of information about light enter-
ing the eye. The outputs of two adjacent photoreceptors will often be mea-
suring light coming from the same object and therefore send very correlated
signals. Transmitting information to later stages simply as the output of
photoreceptors would therefore be very inefficient, since we would be sending
the same information multiple times. One simple proposal for transforming
the raw retinal input before transmitting it to later stages is shown in Figure
1.7, and has proved successful in describing a number of facts about early
visual processing (see Chapter 3).

Figure 1.7. Maximising information transmission by minimising redundancy. In most
images, (A) the intensity arriving at two locations close together in the visual field will
often be very similar, since it will often originate from the same object. Sending infor-
mation in this form is therefore very inefficient. One way to improve the efficiency of
transmission is not to send the pixel intensities, but the difference between the intensity
at a location and that predicted from the nearby photoreceptors. This can be achieved
by using a centre surround receptive field as shown in (B). If we transmit this new
representation (C), far less channel capacity is used to send the same amount of infor-
mation. Such an approach seems to give a good account of the early spatial filtering
properties of insect (Srinivasan et al., 1982; van Hateren, 1992b) and human (Atick,
1992b; van Hateren, 1993) visual systems.
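A toy numeric analogue of the caption's argument, with an invented random-walk "scan line" in place of a real image row:

```python
import numpy as np
from collections import Counter
from math import log2

def entropy_bits(values):
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * log2(n / c) for c in counts.values())

rng = np.random.default_rng(3)

# Neighbouring 'pixels' are highly correlated: a slow random walk.
row = np.cumsum(rng.integers(-2, 3, 5000))

raw_bits = entropy_bits((row % 256).tolist())    # send intensities directly
diff_bits = entropy_bits(np.diff(row).tolist())  # send differences instead

print(f"raw: {raw_bits:.2f} bits/pixel, differences: {diff_bits:.2f} bits/pixel")
```

Coding the difference from the previous pixel (the 1-D cousin of a centre-surround prediction) needs far fewer bits per pixel than coding the raw intensities.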

Guarding Against Noise


The above "independent measurement" argument is only true to a point.
Given that the person you ask the question of speaks clearly, then ensuring
that each measurement tells you about a different thing is a reasonable
strategy. Unfortunately, if the person mumbles, has a very strong accent,
or has possibly been drinking too much, we could potentially miss the answer
to our questions. If this happens, then because each question is unrelated to
the others, an incorrect answer cannot be detected by its relationship to other
questions, nor can they be used to correct the mistake. Therefore, in the
presence of noise, some redundancy can be helpful to (1) detect corrupted
information, and (2) help correct any errors. As an example, many non-
native English speakers have great difficulty in hearing the difference between
the numbers 17 and 70. In such a case it actually might be worth asking "is
the page above seventy" as well as "is it above fifty" since this would provide
some guard against confusion of the word seventy. This may also explain the
charming English habit of shouting loudly and slowly to foreigners.
The appropriate amount of redundancy will depend on the amount of
noise: the amount of redundancy should be high when there is a lot of
noise, and low when there is little. Unfortunately this can be difficult to
handle when the amount of noise is different at different times, as in the
retina. Under a bright illuminant, the variations in image intensity (the sig-
nal) will be much larger than the variations due to the random nature of
photon arrival or the unreliability of synapses (the noise). On the other hand,
for very low light conditions this is no longer the case, with the variations due
to the noise now relatively large. If the system was to operate optimally, the
amount of redundancy in the representation should change at different illu-
mination levels. In the primate visual system, the spatial frequency filtering
properties of the "retinal filters" change as a function of light level, consistent
with the retina maximising information transmission at different light levels
(Atick, 1992b).

Making Efficient Measurements


The last way to maximise information transmission is to ensure not only that
all measurements measure different things, and noise is dealt with effectively,
but also that the measurements made are as informative as possible, subject
to the constraints imposed by the physics of the system.
For binary yes/no questions, this is relatively straightforward. Consider
again the problem of guessing a page in the Encyclopedia Britannica. Asking
the question "Is it page number 1?" is generally not a good idea - if you
happen to guess correctly then this will provide a great deal of information
(technically known as surprisal), but for the majority of the time you will

know very little more. The entropy (and hence the maximum amount of
information transmission) is maximal when the uncertainty is maximal,
and this occurs when both alternatives are equally likely. In this case we
want questions where "yes" has the same probability as "no". For instance,
a question such as "Is it in the first or second half of the book?" will generally
tell you more than "Is it page 2?". The entropy as a function of probability is
shown for a yes/no system (binary channel) in Figure 1.8.
When there are more possible signalling states than true and false, the
constraints become much more important. Figure 1.9 shows three of the
simplest cases of constraints and the nature of the outputs (if we have no
noise) that will maximise information transmission. It is interesting to note
that the spike trains of neurons are exponentially distributed as shown in
Figure 1.9(C), consistent with maximal information transmission subject to
an average firing rate constraint (Baddeley et al., 1997).

1.6 Potential Problems and Pitfalls


The last sections were essentially positive. Unfortunately not all things about
information theory are good:

The Huge Data Requirement. Possibly the greatest problem with information
theory is its requirement for vast amounts of data if the results are to tell us
more about the data than about the assumptions used to calculate its value.
As mentioned in Section 1.4, estimating the probability of every three-letter
combination in English would require sufficient data to estimate 19,683 dif-
ferent probabilities. While this may actually be possible given the large num-
ber of books available electronically, to get a better approximation to
English (say, eight-letter combinations), the amount of data required

Figure 1.8. The entropy of a binary random (Bernoulli) variable is a function of its
probability and maximum when its probability is 0.5 (when it has an entropy of 1 bit).
Intuitively, if a measurement is always false (or always true) then we are not uncertain of
its value. If instead it is true as often as not, then the uncertainty, and hence the entropy,
is maximised.
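The function plotted in Figure 1.8 is easy to reproduce:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))   # maximal uncertainty: 1 bit
print(binary_entropy(0.99))  # nearly certain: ~0.08 bits
```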

Figure 1.9. The distribution of neuronal outputs consistent with optimal information
transmission will be determined by the most important constraints operating on that
neuron. First, if a neuron is only constrained by its maximum and minimum output,
then the maximum entropy, and therefore the maximum information that could be
transmitted, will occur when all output states are equally likely (A) (Laughlin, 1981).
Second, a constraint favoured for mathematical convenience is that the power (or
variance) of the output states is constrained. Given this, entropy is maximised for a
Gaussian firing rate distribution (B). Third, if the constraint is on the average firing
rate of a neuron, higher firing rates will be more "costly" than low firing rates, and an
exponential distribution of firing rates would maximise entropy (C). Measurements
from VI and IT cells show that neurons in these areas have exponentially distributed
outputs when presented with natural images (Baddeley et al., 1997), and hence are at
least consistent with maximising information transmission subject to an average rate
constraint.
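The third claim in the caption can be checked numerically; this sketch compares three discrete rate distributions constructed to share the same mean (the distributions are invented for illustration):

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

n = np.arange(0, 200)
mean = 10.0

# Exponential (geometric) rate distribution with the given mean.
q = mean / (1.0 + mean)
geometric = (1 - q) * q ** n

# Two alternatives with the SAME mean: uniform on [0, 2*mean], and a
# distribution concentrated on just two rates.
uniform = np.where(n <= 2 * mean, 1.0 / (2 * mean + 1), 0.0)
two_point = np.zeros_like(geometric)
two_point[0] = two_point[20] = 0.5

for name, p in [("exponential", geometric), ("uniform", uniform),
                ("two-point", two_point)]:
    print(f"{name}: mean={float((n * p).sum()):.1f}, "
          f"entropy={entropy_bits(p):.2f} bits")
```

With the mean fixed, the exponential distribution comes out with the highest entropy, as the maximum-entropy argument predicts.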

becomes completely unrealistic. Problems of this form are almost always
present when applying information theory, and often the only way to pro-
ceed is to make assumptions which are possibly unfounded and often difficult
to test. Assuming true independence (very difficult to verify even with large
data sets), and assuming a Gaussian signal and noise can greatly cut down on
the number of measurements required. However, these assumptions often
remain only assumptions, and any interpretations of the data rest strongly
on them.

Information and Useful Information. Information theory again only mea-
sures whether there are variations in the world that can be reliably discri-
minated. It does not tell us if this distinction is of any interest to the
animal. As an example, most information-maximisation-based models of
low-level vision assume that the informativeness of visual information is
simply based on how much it varies. Even at the simplest level, this is
difficult to maintain as variation due to, say, changes in illumination is
often of less interest than variations due to changes in reflectance, while
the variance due to changes in illumination is almost always greater than
that caused by changes in reflectance. While the simple "variation equals
information" may be a useful starting point, after the mathematics starts it
is potentially easy to forget that it is only a first approximation, and one
can be led astray.

Coding and Decoding. A related problem is that information theory tells us if
the information is present, but does not describe whether, given the compu-
tational properties of real neurons, it would be simple for neurons to extract.
Caution should therefore be expressed when saying that information present
in a signal is information available to later neurons.

Does the Receiver Know About the Input? Information theory makes some
strong assumptions about the system. In particular it assumes that the recei-
ver knows everything about the statistics of the input, and that these statistics
do not change over time (that the system is stationary). This assumption of
stationarity is often particularly unrealistic.

1.7 Conclusion
This chapter has aimed to convey an intuitive feel for the core concepts
of information theory: entropy and information. These concepts themselves
are straightforward, and a number of ways of applying them to calculate
information transmission in real systems were described. Such examples are
intended to guide the reader towards the domains that in the past have
proved amenable to information theoretic techniques. In particular it is
argued that some aspects of cortical computation can be understood in the
context of maximisation of transmitted information. The following chapters
contain a large number of further examples and, in combination with Cover
and Thomas (1991) and Rieke et al. (1997), it is hoped that the reader will
find this book helpful as a starting point in exploring how information theory
can be applied to new problem domains.
PART ONE

Biological Networks

THE FIRST PART concentrates on how information theory can give us
insight into low-level vision, an area that has many characteristics that
make it particularly appropriate for the application of such techniques.
Chapter 2, by Burton, is a historical review of the application of information
theory to understanding the retina and early cortical areas. The rather
impressive matches of a number of models to data are described, together
with the different emphases placed by researchers on dealing with noise,
removing correlations, and having representations that are amenable to
later processing.
Information theory only really works if information transmission is max-
imised subject to some constraint. In Chapter 3 Laughlin et al. explore the
explanatory power of considering one very important constraint: the use of
energy. This is conceptually very neat, since there is a universal biological
unit of currency, the ATP molecule, allowing the costs of various neuronal
transduction processes to be related to other important costs to an insect,
such as the amount of energy required to fly. There are a wealth of ideas here
and the insect is an ideal animal to explore them, given our good knowledge
of physiology, and the relative simplicity of collecting the large amounts of
physiological data required to estimate the statistics required for information
theoretical descriptions.
To apply the concepts of information theory, at a very minimum one
needs a reasonable model of the input statistics. In vision, the de facto
model is based on the fact that the power spectra of natural images have a
structure where the power at a given frequency is proportional to one over
that frequency squared. If the images are stationary and Gaussian, this is all
that needs to be known to fully specify the probability distribution of inputs
to the eye. Much successful work has been based on this simplistic model,
but common sense tells us that natural images are simply not Gaussian.


In Chapter 4, by Thompson, the very successful Fourier-based description
of the statistics of natural images is extended to allow the capturing of
higher-order regularities. The relationship between third-order correlations
between pixels (the expected value of the product of any given three
pixels), and the bispectrum (a Fourier-based measure that is sensitive to
higher-order structure) is described, and this method is used to construct a
model of the statistics of a collection of natural images. The additional
insights provided by this new image model, in particular the "phase" rela-
tionships, are used to explain both psychophysical and electrophysiological
measurements. This is done in terms of generating a representation that has
equal degrees of phase coupling for every channel.

Glossary
ATP Adenosine triphosphate, the basic molecule involved in the Krebs cycle and
therefore involved in most biological metabolic activity. It therefore constitutes
a good biological measure of energy consumption in contrast to a physical
measure such as calories.
Autocorrelation The spatial autocorrelation refers to the expected correlation
across a set of images, of the image intensity of any two pixels as a function
of distance and orientation. Often for convenience, one-dimensional slices are
used to describe how the correlation between two pixels decays as a function
of distance. It can in some cases be most simply calculated using Fourier-
transform-based techniques.
Bispectrum A generalisation of the power spectrum that, as well as capturing
the pairwise correlations between inputs, also captures three-way correlations.
It is therefore useful as a numerical technique for calculating the higher-order
regularities in natural images.
Channels A concept from the psychophysics of vision, where the outputs of a
number of neurally homogeneous mechanisms are grouped together for con-
venience. Particularly influential is the idea that vision can be understood in
terms of a number of independent spatial channels, each conveying informa-
tion about an image at different spatial scales. Not to be confused with the
standard information theory concept of a channel.
Difference of Gaussians (DoG) A simple numerical approximation to the recep-
tive field properties of retinal ganglion cells, and a key filter in a number of
computational approaches to vision. The spatial profile of the filter consists of
the difference between a narrow and high-amplitude Gaussian and a wide and
low-amplitude Gaussian, and has provided a reasonable model for physiolo-
gical data.
Factorial coding The concept that a good representation is one where all fea-
tures are completely independent. Given this, the probability of any combina-
tion of features is given by the probabilities of the individual features multi-
plied together.
Gabor A simple mathematical model of the receptive field properties of simple
cells in V1. Its spatial weighting profile is given by a sinusoid windowed by a
Gaussian, and as a filter minimises the joint uncertainty of the frequency and
spatial information in an image. Again it has had some success as a model of
biological data.
Kurtosis A statistic, based on fourth-order moments, used to test if a distribu-
tion is Gaussian. Essentially it tests if the distribution has longer "tails" than a
Gaussian, in contrast to skew, which measures if a distribution is asymmetric.
As a practical measure with only smallish amounts of data, it can be rather
unstable numerically (Press et al., 1992).
Large monopolar cell (LMC) A large non-spiking cell in the insect retina. These
receive input from the photoreceptors, and then communicate it to later stages.
Minimum entropy coding Possibly a confusing term, describing a representation
where the sum of the individual unit entropies is the minimum required in
order to transmit the desired amount of information.
Phase randomisation A method for demonstrating that the spatial frequency
content of an image is not all that is important in human spatial vision.
Given an image and its Fourier transform, two other images are generated:
one where all the phases are randomised, and one where all the amplitudes are
randomised. Humans often have little difficulty in recognising the amplitude
randomised version, but never recognise the phase randomised version. This is
important, since given the standard image models in terms of the Fourier
spectra, all that is relevant is the amplitude spectra. This means that one should
construct image models that take into account the phase as well.
Phasic response This refers to the response of most neurons to a sustained
stimulus: a temporary increase in activity followed by a decay to some lower
level.
Poisson process A process where the timing of all events is random, and the
distribution of interevent times is exponential. This random model: (1) is a
reasonable null hypothesis for things such as spike times; (2) has been used
as a model for neuronal firing, in particular where the average rate can
change as a function of time (a non-homogeneous Poisson process); (3) would
allow the most information to be transmitted if the exact time of spikes was
what communicated information.
Retinal ganglion cells The neurons that communicate information from the
retina to the lateral geniculate nucleus. Often conceptualised as the "output"
of the eye.
Simple cell Neurons in V1 appear to be reasonably well modelled as linear
filters. In physiology they are often defined as cells that, when presented
with drifting gratings, show more power at the drifting frequency than at zero.

Sparse coding A code where a given input is signalled by the activity of a very
small number of "features" out of a potentially much larger number.
Problems and Solutions in Early Visual Processing
BRIAN G. BURTON

2.1 Introduction
Part of the function of the neuron is communication. Neurons must com-
municate voltage signals to one another through their connections (synapses)
in order to coordinate their control of an animal's behaviour. It is for this
reason that information theory (Shannon and Weaver, 1949) represents a
promising framework in which to study the design of natural neural systems.
Nowhere is this more so than in the early stages of vision, involving the
retina, and in the vertebrate, the lateral geniculate nucleus and the primary
visual cortex. Not only are early visual systems well characterised physiolo-
gically, but we are also able to identify the ultimate "signal" (the visual
image) that is being transmitted and the constraints which are imposed on
its transmission. This allows us to suggest sensible objectives for early vision
which are open to direct testing. For example, in the vertebrate, the optic
nerve may be thought of as a limited-capacity channel. The number of gang-
lion cells projecting axons in the optic nerve is many times less than the
number of photoreceptors on the retina (Sterling, 1990). We might therefore
propose that one goal of retinal processing is to package information as
efficiently as possible so that as little as possible is lost (Barlow, 1961a).
Important to this argument is that we do not assume the retina is making
judgements concerning the relative values of different image components to
higher processing (Atick, 1992b). Information theory is a mathematical the-
ory of communication. It considers the goal of faithful and efficient transmis-
sion of a defined signal within a set of data. The more narrowly we need to
define this signal, the more certain we must be that this definition is correct
for information theory to be of use. Therefore, if we start making a priori

Information Theory and the Brain, edited by Roland Baddeley, Peter Hancock, and Peter Foldiak.
Copyright © 1999 Cambridge University Press. All rights reserved.


assumptions about what features of the image are relevant for the animal's
needs, then we can be less confident in our conclusions. Fortunately, whilst
specialisation may be true of higher visual processing, in many species this is
probably not true for the retina. It is usually assumed that the early visual
system is designed to be flexible and to transmit as much of the image as
possible. This means that we may define two goals for early visual processing,
namely, noise reduction and redundancy reduction. We wish to suppress
noise so that a larger number of discriminable signals may be transmitted
by a single neuron and we wish to remove redundancy so that the full
representational potential of the system is realised.
These objectives are firmly rooted in information theory and we will see
that computational strategies for achieving them predict behaviour which
matches closely to that seen in early vision. I start with an examination of
the fly compound eye as this illustrates well the problems associated with
noise and possible solutions (see also Laughlin et al., Chapter 3 this volume).
It should become clear how noise and redundancy are interrelated. However,
most theoretical work on redundancy has concentrated on the vertebrate
visual system about which there is more contention. Inevitably, the debate
concerns the structure of the input, that is, the statistics of natural images.
This defines the redundancy and therefore the precise information theoretic
criteria that should be adopted in visual processing. It is this issue upon
which I wish to focus, with particular emphasis on spatial redundancy.

2.2 Adaptations of the Insect Retina


A major problem for any visual system is the large range of background
intensities displayed by natural light. Despite having a limited dynamic
range, the photoreceptor must remain sensitive to contrast (deviations
from the mean) at all intensities. Sensory adaptation is the familiar solution
to this problem (Laughlin, 1994) and this may be seen as a simple example
where the visual system adjusts to the contingencies of the environment.
However, associated with changes in background light intensity are changes
in the signal-to-noise ratio (SNR) which are of equal concern. Light is an
inherently noisy phenomenon. Photon incidence rates follow Poisson statis-
tics. That is, over a given area of retina and over a given time interval, the
variance in the number of photons arriving is equal to the mean. If we define
"noise" as the standard deviation in photon count, then the consequence of
this is that the SNR is proportional to the square root of the "signal" (mean).
Low ambient light levels are therefore associated with low SNRs. This is
clearly a problem, since across a communication channel, noise reduces the
certainty that the receiver has about the identity of the source output. It
effectively reduces the number of discriminable signals that the channel
may transmit and thus its capacity. More formally, provided photon inci-
dence rates are high enough that we can use the Gaussian approximation to
the Poisson distribution, we may use Shannon's (Shannon and Weaver, 1949)
equation to define channel capacity. For an analogue neuron (such as a
photoreceptor), subject to Gaussian distributed noise, the SNR affects capa-
city (in bits s⁻¹) as follows:

C = ∫ log₂(1 + S(v)/N(v)) dv        (2.1)

where S(v) and N(v) are the (temporal) power spectral densities of the opti-
mum driving stimulus and the noise respectively.
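Both points, the square-root scaling of SNR with mean photon count and the capacity expression of equation 2.1, can be checked with a short numerical sketch. This is illustrative only: the 1/v² signal spectrum and the noise floors below are assumed values, not measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Photon counts over repeated identical intervals are Poisson distributed:
# the variance equals the mean, so SNR = mean/std = sqrt(mean).
for mean_photons in [10, 100, 10000]:
    counts = rng.poisson(mean_photons, size=200_000)
    snr = counts.mean() / counts.std()
    assert abs(snr - np.sqrt(mean_photons)) / np.sqrt(mean_photons) < 0.05

# Discretised channel capacity for an analogue neuron with Gaussian noise
# (equation 2.1): C = sum over frequency bins of log2(1 + S(v)/N(v)) * dv.
def capacity(signal_psd, noise_psd, dv):
    return float(np.sum(np.log2(1.0 + signal_psd / noise_psd)) * dv)

v = np.linspace(1.0, 100.0, 1000)   # temporal frequencies (Hz), assumed range
dv = v[1] - v[0]
S = 1.0 / v**2                      # assumed 1/v^2 signal spectrum
N_bright = np.full_like(v, 1e-4)    # flat (white) noise floor, bright light
N_dim = np.full_like(v, 1e-2)       # higher noise floor, dim light
assert capacity(S, N_bright, dv) > capacity(S, N_dim, dv) > 0.0
```

As expected, raising the flat noise floor lowers the achievable information rate across the whole band.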
There are a number of ways in which the insect retina may cope with the
problem of input noise. Where the metabolic costs are justified, the length of
photoreceptors and hence the number of phototransduction units may be
increased to maximise quantum catch (Laughlin and McGinness, 1978).
Alternatively, at low light intensities, it may be beneficial to trade off tem-
poral resolving power for SNR to make optimum use of the neuron's limited
dynamic range. It has recently been found that the power spectrum of nat-
ural, time-varying images follows an inverse relationship with temporal fre-
quency (Dong and Atick, 1995a). Because noise power spectra are flat (van
Hateren, 1992a), this means that SNR declines with frequency. There is
therefore no advantage in transmitting high temporal frequencies at low
light intensity when signal cannot be distinguished from noise. Instead, the
retina may safely discard these to improve SNR at low frequencies and
maximise information rate. In the fly, this strategy is exemplified by the
second-order interneurons, the large monopolar cells (LMCs) which become
low-pass temporal filters at low SNR (Laughlin, 1994, rev.).
The problem of noise is not just limited to extrinsic noise.
Phototransduction, for example, is a quantum process and is inherently
noisy. More generally, however, synaptic transmission is a major source of
intrinsic noise and we wish to find ways in which synaptic SNR may be
improved. Based on very few assumptions, Laughlin et al. (1987) proposed
a simple model for the graded synaptic transmission between photoreceptors
and LMCs which predicts that synaptic SNR is directly proportional to
synaptic voltage gain. More precisely, if b describes the sensitivity of trans-
mitter release to presynaptic voltage and determines the maximum voltage
gain achievable across the synapse, then:
SNR = b · ΔR · √T        (2.2)

where ΔR is the change in receptor potential being signalled and T is the
present level of transmitter release (another Poisson process). That is, by
amplifying the receptor signal through b, it is possible to improve synaptic
SNR. However, because LMCs are under the same range constraints as
photoreceptors, such amplification may only be achieved through transient
28 Brian G. Burton

response properties and LMCs are phasic (Figure 2.1a). This is related to
redundancy reduction, since transmitting a signal that is not changing (a
tonic response) would not convey any information yet would use up the cell's
dynamic range. Furthermore, by amplifying the signal at the very earliest
stage of processing, the signal becomes more robust to noise corruption at
later stages. It may also be significant that this amplification occurs before
the generation of spikes (LMCs show graded responses). De Ruyter van
Steveninck and Laughlin (1996b) determined the optimum stimuli for driving
LMCs and found that their information capacity can reach five times that of
spiking neurons (see also Juusola and French, 1997). If signal amplification
were to take place after the first generation of spikes, this would not only be
energetically inefficient but might result in unnecessary loss of information.

Figure 2.1. Responses of fly retinal LMC cells. (a) Response to tonic stimulation. While
photoreceptors show tonic activity in response to a sustained stimulus, LMC interneur-
ons show phasic response. This allows signal amplification and protection against noise.
Note, LMCs may be hyperpolarising because transmission is by electrotonus, not spike
generation. (From Laughlin et al., 1987, with permission from The Royal Society.)
(b) Matched coding. Natural images have a characteristic distribution of contrasts
(top) and corresponding cumulative probability (middle). The LMC synapse matches
its output to this cumulative probability curve to maximise information transmission
(bottom). (From Laughlin, 1987, with permission from Elsevier Science.)

As will be detailed later, there are two types of redundancy that should be
removed from a cell's response. Besides removing temporal correlations, the
cell should also utilise its different response levels with equal frequency. For a
channel with a limited range, a uniform distribution of outputs is the one
with the most entropy and therefore the one that may realise channel capa-
city. This principle too is demonstrated by the photoreceptor-LMC synapse.
Laughlin (1981) measured the relative frequencies of different levels of con-
trast in the fly's natural environment under daylight conditions and com-
pared the resulting histogram with the responses of LMCs to the range of
contrasts recorded. Remarkably, the input-output relationship of the LMC
followed the cumulative distribution observed in natural contrasts, just what
is predicted for entropy maximisation (Figure 2.1b). This behaviour may also
be seen as allowing the fly to discriminate between small changes in contrast
where they are most frequent, since the highest synaptic gain corresponds
with the modal contrast.
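Laughlin's matched-coding result can be sketched numerically: mapping each contrast through the cumulative distribution of contrasts yields a response that uses every output level equally often, the maximum-entropy distribution for a bounded channel. A Gaussian stands in here for the measured contrast histogram; that choice is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# A Gaussian stands in for the measured histogram of natural contrasts
# (an assumption for illustration; the real distribution is not Gaussian).
contrasts = rng.normal(loc=0.0, scale=0.2, size=100_000)

# Entropy-maximising transfer function: map each contrast through the
# empirical cumulative distribution, i.e. to its quantile, so that every
# response level is used with equal frequency.
order = np.argsort(contrasts)
response = np.empty_like(contrasts)
response[order] = np.arange(len(contrasts)) / len(contrasts)

# The output is now uniform on [0, 1): a flat histogram, the maximum-
# entropy distribution for a channel of limited range.
hist, _ = np.histogram(response, bins=10, range=(0.0, 1.0))
assert hist.min() > 0.95 * hist.max()
```

The steepest part of this transfer function falls at the modal contrast, which is exactly the "highest gain where inputs are most frequent" behaviour described above.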
In summary, the work on insects, and in particular, the fly, has shown how
cellular properties may be exquisitely designed to meet information theoretic
criteria of efficiency. Indeed, LMC responses at different light intensities may
be predicted with striking accuracy merely on the assumption that the retina
is attempting to maximise information transmission through a channel of
limited dynamic range (van Hateren, 1992a). This is most clearly demon-
strated by the correspondence between the images recorded in the fly retina
and those predicted by theory (van Hateren, 1992b). In particular, the fly has
illustrated the problems associated with noise. However, it should be pointed
out that the design principles identified in flies may also be seen in the
vertebrate eye (Sterling, 1990). With the exception of ganglion cells (the
output neurons), the vertebrate retina also comprises almost exclusively
non-spiking neurons and one of its main functions appears to be to protect
against noise by eliminating redundant or noisy signal components and
boosting the remainder. For example, phasic retinal interneurons are argu-
ably performing the same function as the LMC and the slower responses of
photoreceptors at low light intensities may be an adaptation to low SNR. In
addition, the well-known centre-surround antagonistic receptive fields (RFs)
of ganglion cells first appear in the receptors themselves (Baylor et al., 1971).
This allows spatially redundant components to be removed (examined in
more detail below) and for the information-carrying elements to be amplified
before noise corruption at the first feed-forward synapse. Finally, there exist
on and off ganglion cells. This not only effectively increases the dynamic
range of the system and allows greater amplification of input signals, but
also provides equally reliable transmission of all contrasts. Because spike
generation is subject to Poisson noise, a single spiking cell which responds
monotonically to contrast will have a low SNR at one end of its input range
where its output is low. In a two-cell system, however, in which distinct cells
respond in opposite directions to increases and decreases in contrast, there is
always one cell type with a high SNR.

2.3 The Nature of the Retinal Image


In the previous section, I gave a brief, qualitative description of the spatial
receptive field properties of vertebrate retinal cells. The centre-surround
antagonism of these is well known. Most are excited by light of one contrast
in the centres of their RFs but are inhibited by contrast of the opposite sign
in the surround. What is the reason for this opponency? Can spatial RF
properties be predicted using information theory? To answer these questions
requires knowledge of the nature of the retinal image. Only by knowing what
the system is working with may we understand what it is doing. We have
already seen, for example, that the receptor-LMC synapse in the fly is
adapted to the distribution of contrast in the image so we might expect
that consideration of other image statistics should be of similar use.
Given the relative simplicity of the retina and its early position in the
visual pathway, it is often assumed that it is aware of only the most general
of image statistics (e.g. Atick and Redlich, 1992). Besides image contrast and
the relationship between mean intensity and noise, a fundamental feature of
natural scenes is that they are translation invariant. This means that, aver-
aged over an ensemble of scenes, there is no part of the image with special
statistics. This is fortunate as it allows us to determine the autocorrelation
function for natural scenes, the degree of correlation between points at dif-
ferent relative positions in the image. For a square image of length and
width, 2a, the autocorrelation function, R(x), is given by:

R(x) = (1/4a²) ⟨∫ I(x′) I(x′ + x) dx′⟩        (2.3)

where I(x) is the light intensity at position x and ⟨·⟩ indicates averaging over
the ensemble of examples. In Fourier terms, this is expressed by the power
spectral density (Bendat and Piersol, 1986). When this is determined, the
relationship between signal power and spatial frequency, f, follows a distinct
1/|f|² law (Burton and Moorhead, 1987; Field, 1987). If T[·] represents the
Fourier transformation, and L(f) = T[I(x)] the Fourier transform of I(x), then

⟨|L(f)|²⟩ ∝ 1/|f|²        (2.4)

That is, as with temporal frequencies, there is less "energy" in the "signal" at
high frequencies and hence a lower SNR. More interestingly, such a relation-
ship signifies scale invariance. That is, the image appears the same at all
magnifications. This probably reflects the fractal geometry of natural forms
and the fact that similar objects may appear at different distances
(Ruderman, 1994). It should be noted, however, that the image sampled by
the retina is not quite the same as that passing through the lens. The retinal
image is also affected by the modulation transfer function of the eye (MTF).
This describes the attenuation (modulation) of different spatial frequencies
by imperfect optics and the intraocular medium. In Fourier space, this may
be described by an exponential (Campbell and Gubisch, 1966), essentially a
low-pass filter which reduces the amplitudes of high spatial frequencies and
compounds the SNR problem there.
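The 1/|f|² power law is easy to reproduce in simulation. The sketch below synthesises an image with a 1/|f| amplitude spectrum from white Gaussian noise and confirms that power at a doubled spatial frequency falls by roughly a factor of four; the image size and test frequencies are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256

# Build a random "natural-like" image with a 1/|f| amplitude spectrum
# (hence 1/|f|^2 power) by shaping white Gaussian noise in Fourier space.
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
f = np.sqrt(fx**2 + fy**2)
f[0, 0] = 1.0                       # avoid division by zero at DC
spectrum = (1.0 / f) * np.fft.fft2(rng.standard_normal((n, n)))
image = np.real(np.fft.ifft2(spectrum))

# Average the power over thin annuli at two radial frequencies; doubling
# |f| should reduce power by roughly a factor of four (the 1/|f|^2 law).
power = np.abs(np.fft.fft2(image))**2
def radial_power(freq, tol=0.01):
    mask = np.abs(f - freq) < tol
    return power[mask].mean()

ratio = radial_power(0.1) / radial_power(0.2)
assert 2.0 < ratio < 8.0            # ~4 expected, allowing sampling noise
```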
Given these basic properties of the retinal image, what explanations have
been offered for the RFs of retinal cells? The more convincing studies are
those that take into account the MTF (Atick and Redlich, 1992; van
Hateren, 1992c). However, most are in agreement that the 1/|f|² relationship
is important. It represents the statistical structure of the environment and
therefore the nature of its redundancy. In particular, it represents correlation.
Nearby points in a natural image are correlated and therefore tend to carry
the same signal. By taking account of this, the receptive fields of retinal
ganglion cells may recode the image into a more compact and efficient
form (Barlow, 1961a) for transmission down the optic nerve.

2.4 Theories for the RFs of Retinal Cells


Perhaps the simplest coding scheme proposed for retinal cells is "collective
coding" (Tsukomoto et al., 1990). In this model, ganglion cell RFs are con-
structed to improve SNR. This is achieved by appropriately combining
receptor outputs under the assumptions that image intensities, but not
noise, are locally correlated and that the autocorrelation function is a nega-
tive exponential. Much as in real ganglion cells, the optimum RF profile is
found to be dome-shaped across the centre (Figure 2.2a,b). It is also found
that the improvement in SNR is a decelerating function of array width
(related to low correlation at distance) but is proportional to the square
root of cone density. These simple relationships allow Tsukomoto et al.
(1990) to speculate about the relationship between SNR and anatomy. It is
well known that ganglion cells have larger RFs (sampling areas) in the per-
iphery than in the fovea but also that peripheral cone density is lower than
foveal cone density. Tsukomoto et al. calculate the SNRs achieved across the
eye using anatomical measurements of these parameters in the cat. This
shows that when images are highly correlated, SNR increases with eccentri-
city, reflecting the pull of increasing sampling area. When images are poorly
correlated, SNR decreases with eccentricity, reflecting the pull of decreasing
cone density. However, for a correlation space constant believed to represent
natural scenes, the SNR is constant across the retina. Thus, it is possible that

Figure 2.2. Comparison between collective coding and predictive coding. (a,b) Collective
coding. (a) The ganglion cell RF is constructed by weighting the inputs from the sur-
rounding m photoreceptors according to their autocorrelation coefficients, r. (b) The
optimum RF profile (O), shown here across the diagonal of the RF, is found to be dome
shaped. This gives greater SNR than either a flat (F) or exponential (E) weighting
function. (From Tsukomoto et al., 1990, with permission from the author.)
(c) Predictive coding. At high SNR (top), the inhibitory surround of a model ganglion
cell is restricted. As SNR is lowered, the surround becomes more diffuse (middle) and
eventually subtracts an unweighted average of local image intensity (bottom). (From
Srinivasan et al., 1982, with permission from the Royal Society.) While collective coding
explains the centre of the RF, predictive coding explains the surround. However, neither
explains both.

although visual acuity drops off with eccentricity, the eye is designed to
obtain equally reliable signals from all parts of the retinal image.
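The square-root improvement of SNR with pool size can be seen in a minimal simulation. This sketch takes the limiting case of a perfectly correlated signal and independent receptor noise; in a real retina the gain saturates with array width because image correlations fall off with distance.

```python
import numpy as np

rng = np.random.default_rng(3)

# A single image intensity (perfectly correlated across the pool) is
# sampled by n receptors, each adding independent noise. Averaging the
# pool improves SNR by sqrt(n), the idealised collective-coding limit.
def pooled_snr(n_receptors, signal=1.0, noise_sd=1.0, trials=200_000):
    noise = rng.standard_normal((trials, n_receptors)) * noise_sd
    pooled = (signal + noise).mean(axis=1)
    return pooled.mean() / pooled.std()

snr1 = pooled_snr(1)
snr16 = pooled_snr(16)
assert 3.0 < snr16 / snr1 < 5.0     # sqrt(16) = 4 expected
```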
The collective coding model is instructive. It shows how natural statistics
and the statistical independence of noise may be used to improve system
performance. However, whilst collective coding provides an appreciation
for the form of the RF across its centre, it does not satisfactorily address
redundancy. Image redundancy is used to suppress noise but, as may be
understood from the fly LMC, if redundancy were specifically targeted,
this would naturally allow amplification and more effective use of dynamic
range. This is discussed by Tsukomoto et al. (1990) and their principle of
photoreceptor convergence is not incorrect, but there are more holistic the-
ories.
The first attempt to explain retinal RFs in terms of redundancy was the
"predictive coding" of Srinivasan et al. (1982). They proposed that the antag-
onistic surround of ganglion cell RFs serves as a prediction of the signal in
the centre. This prediction is based on correlations within natural images and
thus represents knowledge of statistical structure, that is, redundancy. By
subtracting the redundant component from its response, the ganglion cell
need only transmit that which is not predictable. This may then be amplified
to protect against noise injection at later stages.
To be more precise, the synaptic weights, wᵢ, on a ganglion cell are
adjusted to minimise the squared error, E, between the intensity received
at the centre, x₀, and the weighted average of those received in the vicinity, xᵢ:

E = ⟨((x₀ + n₀) − Σᵢ wᵢ (xᵢ + nᵢ))²⟩        (2.5)


where n represents noise. It is not hard to show that the solution involves the
inverse of a noise-modified matrix. When noise is present, this increases
diagonal coefficients and the optimum weights change accordingly. This
shows that at high SNR, the prediction is based on signals in the immediate
vicinity but at low SNR, equivalent to low light levels, it becomes necessary
to average signals over a wider area (Figure 2.2c). Such contingent modifica-
tion of lateral inhibition is a feature of real ganglion cells (e.g. Rodieck and
Stone, 1965) and so it would seem that predictive coding is consistent with
experiment. It is also true that the above objective function results in dec-
orrelation of ganglion cell outputs, a feature which has considerable theore-
tical merits (see later). However, whilst collective coding does not say
anything about the surround of the RF, predictive coding does not say any-
thing about the centre. In this respect, predictive coding is complementary
but not superior to collective coding.
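The noise dependence of the optimum surround can be sketched by solving the predictive coding problem directly. One standard form of the solution to equation 2.5 is w = (C + σ²I)⁻¹ c, where C is the autocorrelation matrix of the surround inputs and c their correlation with the centre; the exponential autocorrelation and all parameter values below are illustrative assumptions, not fitted to data.

```python
import numpy as np

def surround_weights(noise_var, n_surround=8, space_const=2.0):
    # Exponential autocorrelation between surround receptors at
    # distances 1..n from the centre (an assumed, illustrative model).
    pos = np.arange(1, n_surround + 1)
    C = np.exp(-np.abs(pos[:, None] - pos[None, :]) / space_const)
    c = np.exp(-pos / space_const)     # correlation with the centre
    # Optimal predictive weights: w = (C + noise_var * I)^(-1) c
    return np.linalg.solve(C + noise_var * np.eye(n_surround), c)

w_bright = surround_weights(noise_var=0.01)  # high SNR (bright light)
w_dim = surround_weights(noise_var=1.0)      # low SNR (dim light)

# At high SNR the prediction rests on the immediate neighbour; at low
# SNR the weights flatten into a broad average over the surround.
concentration = lambda w: w[0] / np.abs(w).sum()
assert concentration(w_bright) > concentration(w_dim)
```

Raising the noise variance inflates the diagonal of the matrix being inverted, which is exactly the "noise-modified matrix" mentioned above and what broadens the surround in Figure 2.2c.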
More recent theories have explicitly formulated the problem of early
vision in terms of information theory and have met with some success.
Van Hateren's (1992c; 1993) model of the retina for example, is able to
predict several psychophysical laws. The objective of his analysis is to modify
the neural filter to optimise information transmission given the 1/|f|² statistic
of natural images, the lens MTF, the existence of frequency-neutral noise and
the limited dynamic range of the ganglion cell. In both the spatial and tem-
poral domains, he finds that the optimum filter is band-pass at high light
intensities but becomes low-pass at low intensities. This behaviour may be
appreciated with reference to equation 2.1. Generally, it is more important to
have a moderate SNR over a large range of frequencies than a large SNR
over a smaller range, since:
log(1 + a + b) < log(1 + a) + log(1 + b) for a, b > 0        (2.6)
Accordingly, the low signal frequencies which are naturally of high ampli-
tude should be attenuated while the high frequencies should be amplified. It
is not worth amplifying the very high frequencies, however, since they
already have poor SNR before processing. As average SNR (across all fre-
quencies) is reduced with light level, this factor becomes more important and
the filter must bias towards the low frequencies to maintain the highest rate
of information transmission.
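The argument rests on the strict concavity of the logarithm, as expressed by equation 2.6: a fixed signal-to-noise budget conveys more information when spread over several frequency bands than when concentrated in one. A quick numerical check (the SNR values are arbitrary):

```python
import numpy as np

# Equation 2.6: log(1 + a + b) < log(1 + a) + log(1 + b) for a, b > 0,
# because the right-hand side equals log(1 + a + b + a*b).
for a, b in [(0.5, 0.5), (1.0, 3.0), (10.0, 0.1)]:
    assert np.log(1 + a + b) < np.log(1 + a) + np.log(1 + b)

# Two-band illustration: a total SNR of 8, split evenly vs. lumped
# into a single band. Spreading the SNR transmits more bits.
spread = np.log2(1 + 4) + np.log2(1 + 4)   # even split
lumped = np.log2(1 + 8) + np.log2(1 + 0)   # all in one band
assert spread > lumped
```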
By specifying information maximisation as his goal, van Hateren (1992c,
1993) was not placing any a priori importance on redundancy or noise reduc-
tion. Another study based on an information theoretic approach is that of
Atick and Redlich (1992). They propose that one of the purposes of retinal
processing is the reduction of redundancy manifested by statistical depen-
dencies between ganglion cell outputs. Thus, although their model includes
an initial low-pass filter, designed to suppress noise while maintaining the
mutual information between its input and its output, the final stage of pro-
cessing involves decorrelation. There are few free parameters in Atick and
Redlich's model and the predicted retinal filters bear a remarkable resem-
blance to those obtained in psychophysical experiments (Figure 2.3a).
The motivation for Atick and Redlich's model is both experimental and
theoretical. Experimentally, it appears that real ganglion filter kernels flatten
the power spectrum of natural images up to a particular frequency at high

[Figure 2.3, panels (b) and (c): curves plotted on log-log axes (sensitivity values 3 to 300) against spatial frequency (c/deg).]

Figure 2.3. Decorrelation in the retina. (a) Match with psychophysical experiments. For
a certain parameter regime, the predictions of Atick and Redlich (curves) fit very well
with psychophysical data (points). (b) Signal whitening. When the contrast sensitivity
curves (left) obtained from psychophysical experiments at high luminosity are multiplied
by the amplitude spectrum of natural images, the curve becomes flat at low frequencies.
This indicates that ganglion cells are indeed attempting to decorrelate their input. (From
Atick and Redlich, 1992, with permission from MIT Press Journals.)
2.4. Theories for the RFs of Retinal Cells 35

light intensity (Figure 2.3b). Since a flat spectrum would correspond to a
Dirac delta autocorrelation function, this indicates that ganglion cells are
indeed attempting to decorrelate their outputs where SNR permits. The
theoretical arguments for decorrelation are numerous. First and foremost,
decorrelation allows for a compact coding; that is, one in which there is
minimal redundancy across neurons. We may formally define redundancy
here as (Atick and Redlich, 1990):

R = 1 − I(y; x)/C(y)    (2.7)
where I(y; x) is the mutual information between the output of the channel, y,
and the input, x, and C(y) is the capacity, the maximum of I(y; x). Consider
the case when there is no input noise, no dimensionality reduction and input
follows Gaussian statistics. If processing is described by the linear transfor-
mation, A, then the value of C(y) is given by:

C(y) = ½ log( |ARAᵀ + ⟨n_c²⟩I| / |⟨n_c²⟩I| )    (2.8)
where R is the autocorrelation matrix of the input and n_c is channel noise
(note the similarity with equation 2.1). Now, if the output variances, ⟨y_i²⟩
(the diagonal terms of ARAᵀ + ⟨n_c²⟩I), are fixed, then by the inequality
|M| ≤ ∏_i (M)_ii, I(y; x) may only equal the capacity when all the entries of
ARAᵀ, except those on the diagonal, are zero. That is, redundancy is
removed only when the output is decorrelated.
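This argument can be verified with a toy numerical example (not from the text; the three-channel correlation values are hypothetical, and A = R^(−1/2) is one convenient choice of whitening transform):

```python
import numpy as np

# Decorrelation removes redundancy: with a whitening transform
# A = R^(-1/2), the output correlation matrix A R A^T becomes the
# identity, so all off-diagonal entries vanish. Toy 3-channel input.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])   # hypothetical input autocorrelation
w, V = np.linalg.eigh(R)
A = V @ np.diag(w**-0.5) @ V.T    # symmetric inverse square root of R
out = A @ R @ A.T                 # output correlation matrix
off_diag = out - np.diag(np.diag(out))
print(np.abs(off_diag).max())     # ~0: outputs are decorrelated

# For Gaussian outputs, decorrelation also makes the joint entropy
# equal the sum of marginal entropies: the log-determinant equals the
# sum of log variances exactly when the off-diagonals are zero
# (Hadamard's bound; the 2*pi*e constants cancel in the difference).
joint = 0.5 * np.log(np.linalg.det(out))
marginals = 0.5 * np.sum(np.log(np.diag(out)))
print(abs(joint - marginals))     # ~0
```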
Besides this desirable feature, decorrelation also represents the first step
towards achieving a factorial code. In the limit, this would require that the
outputs of neurons were statistically independent regardless of the probabil-
ity distribution of inputs. That is, the probability that a particular level of
activity is observed in a given neuron is independent of the activity observed
at other neurons:

P(y_i | y_1, y_2, …, y_{i−1}, y_{i+1}, …, y_n) = P(y_i) for all i    (2.9a)

P(y_1, y_2, …, y_n) = ∏_i P(y_i)    (2.9b)

In information theoretic terms, this is equivalent to saying that the sum of
entropies at each of the neurons is equal to the entropy of the network as a
whole. Since this sum would be greater than the total entropy if any depen-
dencies existed, factorial coding is often referred to as minimum entropy
coding (Atick, 1992a, 1992b). Barlow (1989) reasons that a minimum entropy
coding is desirable as it allows the probability of obtaining any pattern of
activity to be calculated from a small number of individual prior probabilities
(equation 2.9b). It represents a simple model of the environment from which
CHAPTER VII.
A NIGHT-FESTIVAL IN A HINDU TEMPLE.

The festival of Taypusam is one of the more important among the
many religious festivals of the Hindus, and is celebrated with great
rejoicings on the night of the first full moon in January each year. In
the case of the great temples of Southern India, some of which are
so vast that their enclosures are more than a mile in circumference,
enormous crowds—sometimes 20,000 people or more—will
congregate together to witness the ceremonials, which are
elaborately gorgeous. There are a few Hindu temples of smaller size
in Ceylon, and into one of these I had the good fortune to be
admitted, on the occasion of this year’s festival (1891), and at the
time when the proceedings were about to commence.
It was nine o’clock, the full moon was shining in the sky, and
already the blaring of trumpets and horns could be heard from
within as I stood at the gate seeking admittance. At first this was
positively denied; but my companion, who was a person of some
authority in the temple, soon effected an entrance, and we presently
stood within the precincts. It must be understood that these temples
generally consist of a large oblong enclosure, more or less planted
with palms and other trees, within which stands the sanctuary itself,
with lesser shrines, priests’ dwellings and other buildings grouped
round it. In the present case the enclosure was about one hundred
yards long by sixty or seventy wide, with short grass under foot. In
the centre stood the temple proper—a building without any
pretensions to architectural form, a mere oblong, bounded by a wall
ten or twelve feet high; unbroken by any windows, and rudely
painted in vertical stripes, red and white. At the far end, under trees,
were some low priests’ cottages; and farther on a tank or reservoir,
not very large, with a stone balustrade around it. Coming round to
the front of the temple, which was more ornamented, and where the
main doorway or entrance was, we found there a considerable crowd
assembled. We were in fact just in time to witness the beginning of
the ceremony; for almost immediately a lot of folk came rushing out
through the doorway of the temple in evident excitement; torches
were lighted, consisting of long poles, some surmounted with a
flaming ring of rags dipped in coco-nut oil, others with a small iron
crate in which lumps of broken coco-nut burned merrily. In a few
moments there was a brilliant light; the people arranged themselves
in two lines from the temple door; sounds of music from within got
louder; and a small procession appeared, musicians first, then four
nautch girls, and lastly a small platform supported on the shoulders
of men, on which was the great god Siva.
At first I could not make out what this last-named object was,
but presently distinguished two rude representations of male and
female figures, Siva and his consort Sakti, apparently cut out of one
block, seated, and about three feet high, but so bedone with jewels
and silks that it was difficult to be sure of their anatomy! Over them
was held a big ornamental umbrella, and behind followed the priest.
We joined the procession, and soon arrived at the edge of the
reservoir which I have already mentioned, and on which was floating
a strange kind of ship. It was a raft made of bamboos lashed to
empty barrels, and on it a most florid and brilliant canopy, covered
with cloths of different colors and surmounted by little scarlet
pennants. A flight of steps down to the water occupied the whole of
one side of the tank, the other three sides were surrounded by the
stone balcony, and on these steps and round the balcony the crowd
immediately disposed itself, while the procession went on board.
When the god was properly arranged under his canopy, and the
nautch girls round about him, and when room had been found for
the crew, who with long poles were to propel the vessel, and for as
many musicians as convenient—about a dozen souls in all—a bell
rang, and the priest, a brown-bodied young Brahman with the
sacred thread over his shoulders and a white cloth edged with red
round his loins, made an offering of flame of camphor in a five-
branched lamp. A hush fell upon the crowd, who all held their hands,
palms together, as in the attitude of prayer (but also symbol of the
desire to be joined together and to the god)—some with their arms
high above their heads; a tray was placed on the raft, of coco-nuts
and bananas which the priest opening deposited before the image;
the band burst forth into renewed uproar, and the ship went gyrating
over the water on her queer voyage.

TAMIL MAN.

What a scene! I had now time to look around a little. All round
the little lake, thronging the steps and the sides in the great glare of
the torches, were hundreds of men and boys, barebodied, barehead
and barefoot, but with white loin-cloths—all in a state of great
excitement—not religious so much as spectacular, as at the
commencement of a theatrical performance, myself and companion
about the only persons clothed,—except that in a corner and forming
a pretty mass of color were a few women and girls, of the poorer
class of Tamils, but brightly dressed, with nose-rings and ear-rings
profusely ornamented. On the water, brilliant in scarlet and gold and
blue, was floating the sacred canopy, surrounded by musicians
yelling on their various horns, in the front of which—with the priest
standing between them—sat two little naked boys holding small
torches; while overhead through the leaves of plentiful coco-nut and
banana palms overhanging the tank, in the dim blue sky among
gorgeous cloud-outlines just discernible, shone the goddess of night,
the cause of all this commotion.
Such a blowing up of trumpets in the full moon! For the first
time I gathered some clear idea of what the ancient festivals were
like. Here was a boy blowing two pipes at the same time, exactly as
in the Greek bas-reliefs. There was a man droning a deep bourdon
on a reed instrument, with cheeks puffed into pouches with long-
sustained effort of blowing; to him was attached a shrill flageolet
player—the two together giving much the effect of Highland
bagpipes. Then there were the tomtoms, whose stretched skins
produce quite musical and bell-like though monotonous sounds; and
lastly two old men jingling cymbals and at the same time blowing
their terrible chank-horns or conches. These chanks are much used
in Buddhist and Hindu temples. They are large whorled sea-shells of
the whelk shape, such as sometimes ornament our mantels. The
apex of the spiral is cut away and a mouthpiece cemented in its
place, through which the instrument can be blown like a horn. If
then the fingers be used to partly cover and vary the mouth of the
shell, and at the same time the shell be vibrated to and fro in the air
—what with its natural convolutions and these added complications,
the most ear-rending and diabolically wavy bewildering and hollow
sounds can be produced, such as might surely infect the most
callous worshiper with a proper faith in the supernatural.
The temper of the crowd too helped one to understand the old
religious attitude. It was thoroughly whole-hearted—I cannot think
of any other word. There was no piety—in our sense of the word—or
very little, observable. They were just thoroughly enjoying
themselves—a little excited no doubt by chanks and divine
possibilities generally, but not subdued by awe; talking freely to each
other in low tones, or even indulging occasionally—the younger ones
—in a little bear-fighting; at the same time proud of the spectacle
and the presence of the divinity, heart and soul in the ceremony, and
anxious to lend hands as torch-bearers or image-bearers, or in any
way, to its successful issue. It is this temper which the wise men say
is encouraged and purposely cultivated by the ceremonial institutions
of Hinduism. The temple services are made to cover, as far as may
be, the whole ground of life, and to provide the pleasures of the
theatre, the art-gallery, the music hall and the concert-room in one.
People attracted by these spectacles—which are very numerous and
very varied in character, according to the different feasts—presently
remain to inquire into their meaning. Some like the music, others the
bright colors. Many men come at first merely to witness the dancing
of the nautch girls, but afterwards and insensibly are drawn into
spheres of more spiritual influence. Even the children find plenty to
attract them, and the temple becomes their familiar resort from early
life.
The theory is that all the ceremonies have inner and mystic
meanings—which meanings in due time are declared to those who
are fit—and that thus the temple institutions and ceremonies
constitute a great ladder by which men can rise at last to those inner
truths which lie beyond all formulas and are contained in no creed.
Such is the theory, but like all theories it requires large deductions
before acceptance. That such theory was one of the formative
influences of the Hindu ceremonial, and that the latter embodies
here and there important esoteric truths descending from Vedic
times, I hardly doubt; but on the other hand, time, custom and
neglect, different streams of tradition blending and blurring each
other, reforms and a thousand influences have—as in all such cases
—produced a total concrete result which no one theory can account
for or coordinate.
Such were some of my thoughts as I watched the crowd around
me. They too were not uninterested in watching me. The
appearance of an Englishman under such circumstances was
perhaps a little unusual and scores of black eyes were turned
inquiringly in my direction; but covered as I was by the authority of
my companion no one seemed to resent my presence. A few I
thought looked shocked, but the most seemed rather pleased, as if
proud that a spectacle so brilliant and impressive should be
witnessed by a stranger—besides there were two or three among
the crowd whom I happened to have met before and spoken with,
and whose friendly glances made me feel at home.
Meanwhile the gyrating raft had completed two or three voyages
round the little piece of water. Each time it returned to the shore
fresh offerings were made to the god, the bell was rung again, a
moment of hushed adoration followed, and then with fresh strains of
mystic music a new start for the deep took place. What the inner
signification of these voyages might be I had not and have not the
faintest idea; it is possible even that no one present knew. At the
same time I do not doubt that the drama was originally instituted in
order to commemorate some actual event or to symbolise some
doctrine. On each voyage a hymn was sung or recited. On the first
voyage the Brahman priest declaimed a hymn from the Vedas—a
hymn that may have been written 3,000 years ago—nor was there
anything in the whole scene which appeared to me discordant with
the notion that the clock had been put back 3,000 years (though of
course the actual new departure in the Brahmanical rites which we
call Hinduism does not date back anything like so far as that). On
the second voyage a Tamil hymn was sung by one of the youths
trained in the temples for this purpose; and on the third voyage
another Tamil hymn, with interludes of the most ecstatic
caterwauling from chanks and bagpipes! The remainder of the
voyages I did not witness, as my conductor now took me to visit the
interior of the temple.
That is, as far as it was permissible to penetrate. For the
Brahman priests who regulate these things, with far-sighted policy
make it one of their most stringent rules that the laity shall not have
access beyond a short distance into the temple, and heathen like
myself are of course confined to the mere forecourts. Thus the
people feel more awe and sanctity with regard to the holy place
itself and the priests who fearlessly tread within than they do with
regard to anything else connected with their religion.
Having passed the porch, we found ourselves in a kind of
entrance hall with one or two rows of columns supporting a flat
wooden roof—the walls adorned with the usual rude paintings of
various events in Siva’s earthly career. On the right was a kind of
shrine with a dancing figure of the god in relief—the perpetual dance
of creation; but unlike some of the larger temples, in which there is
often most elaborate and costly stonework, everything here was of
the plainest, and there was hardly anything in the way of sculpture
to be seen. Out of this forecourt opened a succession of chambers
into which one might not enter; but the dwindling lights placed in
each served to show distance after distance. In the extreme
chamber farthest removed from the door, by which alone daylight
enters—the rest of the interior being illumined night and day with
artificial lights—is placed, surrounded by lamps, the most sacred
object, the lingam. This of course was too far off to be discerned—
and indeed it is, except on occasions, kept covered—but it appears
that instead of being a rude image of the male organ (such as is
frequently seen in the outer courts of these temples), the thing is a
certain white stone, blue-veined and of an egg-shape, which is
mysteriously fished up—if the gods so will it—from the depths of the
river Nerbudda, and only thence. It stands in the temple in the
hollow of another oval-shaped object which represents the female
yoni; and the two together, embleming Siva and Sakti, stand for the
sexual energy which pervades creation.
Thus the worship of sex is found to lie at the root of the present
Hinduism, as it does at the root of nearly all the primitive religions of
the world. Yet it would be a mistake to conclude that such worship is
a mere deification of material functions. Whenever it may have been
that the Vedic prophets descending from Northern lands into India
first discovered within themselves that capacity of spiritual ecstasy
which has made them even down to to-day one of the greatest
religious forces in the world, it is certain that they found (as indeed
many of the mediæval Christian seers at a later time also found) that
this ecstasy had a certain similarity to the sexual rapture. In their
hands therefore the rude, phallic worships, which their predecessors
had with true instinct celebrated, came to have a new meaning; and
sex itself, the most important of earthly functions, came to derive an
even greater importance from its relation to the one supreme and
heavenly fact, that of the soul’s union with God.
In the middle line of all Hindu temples, between the lingam and
the door, are placed two other very sacred objects—the couchant
bull Nandi and an upright ornamented pole, the Kampam, or as it is
sometimes called, the flagstaff. In this case the bull was about four
feet in length, carved in one block of stone, which from continual
anointing by pious worshipers had become quite black and lustrous
on the surface. In the great temple at Tanjore there is a bull twenty
feet long cut from a single block of syenite, and similar bull-images
are to be found in great numbers in these temples, and of all sizes
down to a foot in length, and in any accessible situation are sure to
be black and shining with oil. In Tamil the word pasu signifies both
ox—i.e. the domesticated ox—and the soul. Siva is frequently
represented as riding on a bull; and the animal represents the
human soul which has become subject and affiliated to the god. As
to the flagstaff, it was very plain, and appeared to be merely a
wooden pole nine inches or so thick, slightly ornamented, and
painted a dull red color. In the well-known temple at Mádura the
kampam is made of teak plated with gold, and is encircled with
certain rings at intervals, and at the top three horizontal arms
project, with little bell-like tassels hanging from them. This curious
object has, it is said, a physiological meaning, and represents a
nerve which passes up the median line of the body from the genital
organs to the brain (? the great sympathetic). Indeed the whole
disposition of the parts in these temples is supposed (as of course
also in the Christian Churches) to represent the human body, and so
also the universe of which the human body is only the miniature. I
do not feel myself in a position however to judge how far these
correspondences are exact. The inner chambers in this particular
temple were, as far as I could see, very plain and unornamented.
On coming out again into the open space in front of the porch,
my attention was directed to some low buildings which formed the
priests’ quarters. Two priests were attached to the temple, and a
separate cottage was intended for any traveling priest or lay
benefactor who might want accommodation within the precincts.
And now the second act of the sacred drama was commencing.
The god, having performed a sufficient number of excursions on the
tank, was being carried back with ceremony to the space in front of
the porch—where for some time had been standing, on portable
platforms made of poles, three strange animal figures of more than
life-size—a bull, a peacock, and a black creature somewhat
resembling a hog, but I do not know what it was meant for. On the
back of the bull, which was evidently itself in an amatory and excited
mood, Siva and Sakti were placed; on the hog-like animal was
mounted another bejewelled figure—that of Ganésa, Siva’s son; and
on the peacock again the figure of his other son, Soubramánya.
Camphor flame was again offered, and then a lot of stalwart and
enthusiastic worshipers seized the poles, and mounting the
platforms on their shoulders set themselves to form a procession
round the temple on the grassy space between it and the outer wall.
The musicians as usual went first, then came the dancing girls, and
then after an interval of twenty or thirty yards the three animals
abreast of each other on their platforms, and bearing their
respective gods upon their backs. At this point we mingled with the
crowd and were lost among the worshipers. And now again I was
reminded of representations of antique religious processions. The
people, going in front or following behind, or partly filling the space
in front of the gods—though leaving a lane clear in the middle—were
evidently getting elated and excited. They swayed their arms, took
hands or rested them on each other’s bodies, and danced rather
than walked along; sometimes their shouts mixed with the music;
the tall torches swayed to and fro, flaring to the sky and distilling
burning drops on naked backs in a way which did not lessen the
excitement; the smell of hot coco-nut oil mingling with that of
humanity made the air sultry; and the great leaves of bananas and
other palms leaning over and glistening with the double lights of
moon and torch flames gave a weird and tropical beauty to the
scene.[2] In this rampant way the procession moved for a few yards,
the men wrestling and sweating under the weight of the god-
images, which according to orthodox ideas are always made of an
alloy of the five metals known to the ancients—an alloy called
panchaloka—and are certainly immensely heavy; and then it came to
a stop. The bearers rested their poles on strong crutches carried for
the purpose, and while they took breath the turn of the nautch girls
came.

[2] Mrs. Speir, in her Life in Ancient India, p.
374, says that we first hear of Siva worship
about b.c. 300, and that it is described by
Megasthenes as “celebrated in tumultuous
festivals, the worshippers anointing their bodies,
wearing crowns of flowers, and sounding bells
and cymbals. From which,” she adds, “the
Greeks conjectured that Siva worship was
derived from Bacchus or Dionysos, and carried to
the East in the traditionary expedition which
Bacchus made in company with Hercules.”
NAUTCH GIRL.

Most people are sufficiently familiar now-a-days, through
Oriental exhibitions and the like, with the dress and bearing of these
Devadásis, or servants of God. “They sweep the temple,” says the
author of Life in an Indian Village, “ornament the floor with quaint
figures drawn in rice flour, hold the sacred light before the god, fan
him, and dance and sing when required.” “In the village of
Kélambakam,” he continues, “there are two dancing girls,
Kanakambujam and Minakshi. K. is the concubine of a neighboring
Mudelliar, and M. of Appalacharri the Brahman. But their services can
be obtained by others.” I will describe the dress of one of the four
present on this occasion. She had on a dark velveteen tunic with
quite short gold-edged sleeves, the tunic almost concealed from
view by a very handsome scarf or sari such as the Indian women
wear. This sari, made of crimson silk profusely ornamented with gold
thread, was passed over one shoulder, and having been wound twice
or thrice round the waist was made to hang down like a petticoat to
a little below the knee. Below this appeared silk leggings of an
orange color; and heavy silver anklets crowned the naked feet.
Handsome gold bangles were on her arms (silver being usually worn
below the waist and gold above), jewels and bell-shaped pendants in
her nose and ears, and on her head rose-colored flowers pinned
with gold brooches and profusely inwoven with the plaited black hair
that hung down her back. The others with variations in color had
much the same costume.
To describe their faces is difficult. I think I seldom saw any so
inanimately sad. It is part of the teaching of Indian women that they
should never give way to the expression of feeling, or to any kind of
excitement of manner, and this in the case of better types leads to a
remarkable dignity and composure of bearing, such as is
comparatively rare in the West, but in more stolid and ignorant sorts
produces a most apathetic and bovine mien. In the case of these
nautch women circumstances are complicated by the prostitution
which seems to be the inevitable accompaniment of their profession.
One might indeed think that it was distinctly a part of their
profession—as women attached to the service of temples whose
central idea is that of sex—but some of my Hindu friends assure me
that this is not so: that they live where they like, that their dealings
with the other sex are entirely their own affair, and are not regulated
or recognised in any way by the temple authorities, and that it is
only, so to speak, an accident that these girls enter into commercial
relations with men—generally, it is admitted, with the wealthier of
those who attend the services—an accident of course quite likely to
occur, since they are presumably good-looking, and are early forced
into publicity and out of the usual routine of domestic life. All the
same, though doubtless these things are so now, I think it may fairly
be supposed that the sexual services of these nautch girls were at
one time a recognised part of their duty to the temple to which they
were attached. Seeing indeed that so many of the religions of
antiquity are known to have recognised services of this kind, seeing
also that Hinduism did at least incorporate in itself primitive sexual
worships, and seeing that there is no reason to suppose that such
practices involved any slur in primitive times on those concerned in
them—rather the reverse—I think we have at any rate a strong
primâ-facie case. It is curious too that, even to-day, notwithstanding
the obvious drawbacks of their life, these girls are quite recognised
and accepted in Hindu families of high standing and respectability.
When marriages take place they dress the bride, put on her jewels,
and themselves act as bridesmaids; and generally speaking are
much referred to as authorities on dress. Whatever, however, may
have been the truth about the exact duties and position of the
Devadásis in old times, the four figuring away there before their
gods that night seemed to me to present but a melancholy and
effete appearance. They were small and even stunted in size, nor
could it be said that any of them were decently good-looking. The
face of the eldest—it was difficult to judge their age, but she might
have been twenty—was the most expressive, but it was thin and
exceedingly weary; the faces of the others were the faces of children
who had ceased to be children, yet to whom experience had brought
no added capacity.
These four waifs of womanhood, then, when the procession
stopped, wheeled round, and facing the god approached him with
movements which bore the remotest resemblance to a dance.
Stretching out their right hands and right feet together (in itself an
ungraceful movement) they made one step forward and to the right;
then doing the same with left hands and feet made a step in
advance to the left. After repeating this two or three times they
then, having first brought their finger points to their shoulders,
extended their arms forward towards the deity, inclining themselves
at the same time. This also was repeated, and then they moved
back much as they had advanced. After a few similar evolutions,
sometimes accompanied by chanting, they wheeled round again,
and the procession moved forwards a few yards more. Thus we
halted about half a dozen times before we completed the circuit of
the temple, and each time had a similar performance.
On coming round to the porch what might be called the third act
commenced. The platform of the bull and the god Siva was—not
without struggles—lowered to the ground so as to face the porch,
the other two gods being kept in the background; and then the four
girls, going into the temple and bringing forth little oil-lamps, walked
in single file round the image, followed by the musicians also in
single file. These latter had all through the performance kept up an
almost continuous blowing; and their veined knotted faces and
distended cheeks bore witness to the effort, not to mention the state
of our own ears! It must however in justice be said that the drone,
the flageolet, and the trumpets were tuned to the same key-note,
and their combined music alone would not have been bad; but a
chank-shell can no more be tuned than a zebra can be tamed, and
when two of these instruments together, blown by two wiry old men
obdurately swaying their heads, were added to the tumult, it seemed
not impossible that one might go giddy and perhaps become
theopneustos, at any moment.
The show was now evidently culminating. The entry of the
musicians into the temple, where their reverberations were simply
appalling, was the signal for an inrush of the populace. We passed in
with the crowd, and almost immediately Siva, lifted from the bull,
followed borne in state under his parasol. He was placed on a stand
in front of the side shrine in the forecourt already mentioned; and a
curtain being drawn before him, there was a momentary hush and
awe. The priest behind the curtain (whom from our standpoint we
could see) now made the final offerings of fruit, flowers and
sandalwood, and lighted the five-branched camphor lamp for the last
time. This burning of camphor is, like other things in the service,
emblematic. The five lights represent the five senses. As camphor
consumes itself and leaves no residue behind, so should the five
senses, being offered to God, consume themselves and disappear.
When this is done, that happens in the soul which was now figured
in the temple service; for as the last of the camphor burned itself
away the veil was swiftly drawn aside—and there stood the image of
Siva revealed in a blaze of light.
The service was now over. The priest distributed the offerings
among the people; the torches were put out; and in a few minutes I
was walking homeward through the streets and wondering if I was
really in the modern world of the 19th century.
CHAPTER VIII.
A VISIT TO A GÑÁNI.

During my stay in Ceylon I was fortunate enough to make the
acquaintance of one of the esoteric teachers of the ancient religious
mysteries. These Gurus or Adepts are to be found scattered all over
the mainland of India; but they lead a secluded existence, avoiding
the currents of Western civilisation—which are obnoxious to them—
and rarely come into contact with the English or appear on the
surface of ordinary life. They are divided into two great schools, the
Himalayan and South Indian—formed probably, even centuries back,
by the gradual retirement of the adepts into the mountains and
forests of their respective districts before the spread of foreign races
and civilisations over the general continent. The Himalayan school
has carried on the more democratic and progressive Buddhistic
tradition, while the South Indian has kept more to caste, and to the
ancient Brahmanical and later Hindu lines. This separation has led to
divergencies in philosophy, and there are even (so strong is sectional
feeling in all ranges of human activity) slight jealousies between the
adherents of the two schools; but the differences are probably after
all very superficial; in essence their teaching and their work may I
think be said to be the same.
The teacher to whom I allude belongs to the South Indian
school, and was only sojourning for a time in Ceylon. When I first
made his acquaintance he was staying in the precincts of a Hindu
temple. Passing through the garden and the arcade-like porch of the
temple with its rude and grotesque frescoes of the gods—Siva
astride the bull, Sakti, his consort, seated behind him, etc.—we
found ourselves in a side-chamber, where seated on a simple couch,
his bed and day-seat in one, was an elderly man (some seventy
years of age, though he did not look nearly as much as that)
dressed only in a white muslin wrapper wound loosely round his lithe
and even active dark brown form: his head and face shaven a day or
two past, very gentle and spiritual in expression, like the best type of
Roman Catholic priest—a very beautiful full and finely formed mouth,
straight nose and well-formed chin, dark eyes, undoubtedly the eyes
of a seer, dark-rimmed eyelids, and a powerful, prophetic, and withal
childlike manner. He soon lapsed into exposition which he continued
for an hour or two with but few interjections from his auditors.
At a later time he moved into a little cottage where for several
weeks I saw him nearly every day. Every day the same—generally
sitting on his couch, with bare arms and feet, the latter often coiled
under him—only requiring a question to launch off into a long
discourse—fluent, and even rapt, with ready and vivid illustration
and long digressions, but always returning to the point. Though
unfortunately my knowledge of Tamil was so slight that I could not
follow his conversation and had to take advantage of the services of
a friend as interpreter, still it was easy to see what a remarkable
vigor and command of language the fellow had, what power of
concentration on the subject in hand, and what a wealth of
reference—especially citation from ancient authorities—wherewith to
illustrate his discourse.
Everything in the East is different from the West, and so are the
methods of teaching. Teaching in the East is entirely authoritative
and traditional. That is its strong point and also its defect. The pupil
is not expected to ask questions of a sceptical nature or expressive
of doubt; the teacher does not go about to “prove” his thesis to the
pupil, or support it with arguments drawn from the plane of the
pupil’s intelligence; he simply re-delivers to the pupil, in a certain
order and sequence, the doctrines which were delivered to him in his
time, which have been since verified by his own experience, and
which he can illustrate by phrases and metaphors and citations
drawn from the sacred books. He has of course his own way of
presenting the whole, but the body of knowledge which he thus
hands down is purely traditional, and may have come along for
thousands of years with little or no change. Originality plays no part
in the teaching of the Indian Sages. The knowledge which they have
to impart is of a kind in which invention is not required. It purports
to be a knowledge of the original fact of the universe itself—
something behind which no man can go. The West may originate,
the West may present new views of the prime fact—the East only
seeks to give to a man that fact itself, the supreme consciousness,
undifferentiated, the key to all that exists.
The Indian teachers therefore say there are as a rule three
conditions of the attainment of Divine knowledge or gñánam:—(1)
The study of the sacred books, (2) the help of a Guru, and (3) the
verification of the tradition by one’s own experience. Without this
last the others are of course of no use; and the chief aid of the Guru
is directed to the instruction of the pupil in the methods by which he
may attain to personal experience. The sacred books give the
philosophy and some of the experiences of the gñáni or illuminated
person, but they do not, except in scattered hints, give instruction as
to how this illumination is to be obtained. The truth is, it is a
question of evolution; and it would neither be right that such
instruction should be given to everybody, nor indeed possible, since
even in the case of those prepared for it the methods must differ,
according to the idiosyncrasy and character of the pupil.
There are apparently isolated cases in which individuals attain to
Gñánam through their own spontaneous development, and without
instruction from a Guru, but these are rare. As a rule every man who
is received into the body of Adepts receives his initiation through
another Adept who himself received it from a fore-runner, and the
whole constitutes a kind of church or brotherhood with genealogical
branches so to speak—the line of adepts from which a man
descends being imparted to him on his admission into the fraternity.
I need not say that this resembles the methods of the ancient
mysteries and initiations of classic times; and indeed the Indian
teachers claim that the Greek and Egyptian and other Western
schools of arcane lore were merely branches, more or less
degenerate, of their own.
The course of preparation for Gñánam is called yogam, and the
person who is going through this stage is called a yogi—from the
root yog, to join—one who is seeking junction with the universal
spirit. Yogis are common all over India, and exist among all classes
and in various forms. Some emaciate themselves and torture their
bodies, others seek only control over their minds, some retire into
the jungles and mountains, others frequent the cities and exhibit
themselves in the crowded fairs, others again carry on the
avocations of daily life with but little change of outward habit. Some
are humbugs, led on by vanity or greed of gain (for to give to a holy
man is highly meritorious); others are genuine students or
philosophers; some are profoundly imbued with the religious sense,
others by mere distaste for the world. The majority probably take to
a wandering life of the body, some become wandering in mind; a
great many attain to phases of clairvoyance and abnormal power of
some kind or other, and a very few become adepts of a high order.
Anyhow the matter cannot be understood unless it is realised
that this sort of religious retirement is thoroughly accepted and
acknowledged all over India, and excites no surprise or special
remark. Only some five or six years ago the son of the late Rajah of
Tanjore—a man of some forty or fifty years of age, and of course the
chief native personage in that part of India—made up his mind to
become a devotee. He one day told his friends he was going on a
railway journey, sent off his servants and carriages from the palace
to the station, saying he would follow, gave them the slip, and has
never been heard of since! His friends went to the man who was
known to have been acting as his Guru, who simply told them, “You
will never find him.” Supposing the G.O.M. or the Prince of Wales
were to retire like this—how odd it would seem!
To illustrate this subject I may tell the story of Tilleináthan
Swámy, who was the teacher of the Guru whose acquaintance I am
referring to in this chapter. Tilleináthan was a wealthy shipowner of
high family. In 1850 he devoted himself to religious exercises, till
1855, when he became “emancipated.” After his attainment he felt
sick of the world, and so he wound up his affairs, divided all his
goods and money among relations and dependents, and went off
stark naked into the woods. His mother and sisters were grieved and
repeatedly pursued him, offering to surrender all to him if he would
only return. At last he simply refused to answer their importunities,
and they desisted. He appeared in Tanjore after that in ’57, ’59, ’64
and ’72, but has not been seen since. He is supposed to be living
somewhere in the Western Ghauts.
In ’58 or ’59, at the close of the Indian Mutiny, when search was
being made for Nana Sahib, it was reported that the Nana was
hiding himself under the garb (or no garb) of an “ascetic,” and
orders were issued to detain and examine all such people.
Tilleináthan was taken and brought before the sub-magistrate at
Tanjore, who told him the Government orders, and that he must
dress himself properly. At the same time the sub-magistrate, having
a friendly feeling for T. and guessing that he would refuse
obedience, had brought a wealthy merchant with him, whom he had
persuaded to stand bail for Tilleináthan in such emergency. When
however the merchant saw Tilleináthan, he expressed his doubts
about standing bail for him—whereupon T. said, “Quite right, it is no
good your standing bail for me; the English Government itself could
not stand bail for one who creates and destroys Governments. I will
be bail for myself.” The sub-magistrate then let him go.
But on the matter being reported at head-quarters the sub. was
reprimanded, and a force, consisting of an inspector and ten men
(natives of course), was sent to take Tilleináthan. He at first refused
and threatened them, but on the inspector pleading that he would
be dismissed if he returned with empty hands T. consented to come
“in order to save the inspector.” They came into full court—as it
happened—before the collector (Morris), who immediately
reprimanded T. for his mad costume! “It is you that are mad,” said
the latter, “not to know that this is my right costume,”—and he
proceeded to explain the four degrees of Hindu probation and
emancipation. (These are, of course, the four stages of student,
householder, yogi and gñáni. Every one who becomes a gñáni must
pass through the other three stages. Each stage has its appropriate
costume and rules; the yogi wears a yellow garment; the gñáni is
emancipated from clothing, as well as from all other troubles.)
Finally T. again told the collector that he was a fool, and that he
T. would punish him. “What will you do?” said the collector. “If you
don’t do justice I will burn you,” was the reply! At this the mass of
the people in court trembled, believing no doubt implicitly in T.’s
power to fulfil his threat. The collector however told the inspector to
read the Lunacy Act to Tilleináthan, but the inspector’s hand shook
so that he could hardly see the words—till T. said, “Do not be afraid
—I will explain it to you.” He then gave a somewhat detailed account
of the Act, pointed out to the collector that it did not apply to his
own case, and ended by telling him once more that he was a fool.
The collector then let him go!
Afterwards Morris—having been blamed for letting the man go—
and Beauchamp (judge), who had been rather impressed already by
T.’s personality, went together and with an escort to the house in
Tanjore in which Tilleináthan was then staying—with an undefined
intention, apparently, of arresting him. T. then asked them if they
thought he was under their Government—to which the Englishmen
replied that they were not there to argue philosophy but to enforce
the law. T. asked how they would enforce it. “We have cannons and
men behind us,” said Morris. “And I,” said T., “can also bring cannons
and forces greater than yours.” They then left him again, and he was
no more troubled.
This story is a little disappointing in that no miracles come off,
but I tell it as it was told to me by the Guru, and my friend A. having
heard it substantially the same from other and independent
witnesses at Tanjore it may be taken as giving a fairly correct idea of
the kind of thing that occasionally happens. No doubt the collector
would look upon Tilleináthan as a “luny”—and from other stories I
have heard of him (his utter obliviousness of ordinary
conventionalities and proprieties, that he would lie down to sleep in
the middle of the street to the great inconvenience of traffic, that he
would sometimes keep on repeating a single vacant phrase over and
over again for half a day, etc.), such an opinion might, I should say,
fairly be justified. Yet at the same time there is no doubt he was a
very remarkable man, and the deep reverence with which our friend
the Guru spoke of him was obviously not accorded merely to the
abnormal powers which he seems at times to have manifested, but
to the profundity and breadth of his teaching and the personal
grandeur which prevailed through all his eccentricities.
It was a common and apparently instinctive practice with him to
speak of the great operations of Nature, the thunder, the wind, the
shining of the sun, etc., in the first person, “I”—the identification
with, or non-differentiation from the universe (which is the most
important of esoteric doctrines) being in his case complete. So also
the democratic character of his teaching surpassed even our
Western records. He would take a pariah dog—the most scorned of
creatures—and place it round his neck (compare the pictures of
Christ with a lamb in the same attitude), or even let it eat out of one
plate with himself! One day, in Tanjore, when importuned for
instruction by five or six disciples, he rose up and saying, “Follow
me,” went through the streets to the edge of a brook which divided
the pariah village from the town—a line which no Hindu of caste will
ever cross—and stepping over the brook bade them enter the defiled
ground. This ordeal however his followers could not endure, and—
except one—they all left him.
Tilleináthan’s pupil, the teacher of whom I am presently
speaking, is married and has a wife and children. Most of these
“ascetics” think nothing of abandoning their families when the call
comes to them, and of going to the woods perhaps never to be seen
again. He however has not done this, but lives on quietly at home at
Tanjore. Thirty or forty years ago he was a kind of confidential friend
and adviser to the then reigning prince of Tanjore, and was well up
in traditional state-craft and politics; and even only two or three
years ago took quite an active interest in the National Indian
Congress. His own name was Ramaswámy, but he acquired the
name Elúkhanam, “the Grammarian,” on account of his proficiency in
Tamil grammar and philosophy, on which subject he was quite an
authority, even before his initiation.
Tamil is a very remarkable, and indeed complex language—
rivaling the Sanskrit in the latter respect. It belongs to the Dravidian
group, and has few Aryan roots in it except what have been
borrowed from Sanskrit. It contains however an extraordinary
number of philosophical terms, of which some are Sanskrit in their
origin, but many are entirely its own; and like the people it forms a
strange blend of practical qualities with the most inveterate
occultism. “Tamil,” says the author of an article in the Theosophist
for November, ’90, “is one of the oldest languages of India, if not of
the world. Its birth and infancy are enveloped in mythology. As in
the case of Sanskrit, we cannot say when Tamil became a literary
language. The oldest Tamil works extant belong to a time, about
2,000 years ago, of high and cultured refinement in Tamil poetic
literature. All the religious and philosophical poetry of Sanskrit has
become fused into Tamil, which language contains a larger number
of popular treatises in Occultism, Alchemy, etc., than even Sanskrit;
and it is now the only spoken language of India that abounds in
occult treatises on various subjects.” Going on to speak of the Tamil
Adepts, the author of this article says: “The popular belief is that
there were eighteen brotherhoods of Adepts scattered here and
there, in the mountains and forests of the Tamil country, and
presided over by eighteen Sadhoos; and that there was a grand
secret brotherhood composed of the eighteen Sadhoos, holding its
meetings in the hills of the Agasthya Kútam in the Tinnevelly district.
Since the advent of the English and their mountaineering and
deforestation, these occultists have retired far into the interior of the
thick jungles on the mountains; and a large number have, it is
believed, altogether left these parts for more congenial places in the
Himalayan ranges. It is owing to their influence that the Tamil
language has been inundated, as it were, with a vast number of
works on esoteric philosophy. The works of Agasthya Muni[3] alone
would fill a whole library. The chief and only object of these
brotherhoods has been to popularise esoteric truths and bring them
home to the masses. So great and so extensive is their influence
that the Tamil literature is permeated with esoteric truths in all its
ramifications.” In fact the object of this article is to point out the vast
number of proverbs and popular songs, circulating among the Tamils
to-day, which conceal under frivolous guise the most profound
mystic truths. The grammar too—as I suppose was the case in
Sanskrit—is linked to the occult philosophy of the people.

[3] Or those ascribed to him.

To return to the Teacher, besides state-craft and grammar he is well versed in matters of law, and not unfrequently tackles a
question of this kind for the help of his friends; and has some
practical knowledge of medicine, as well as of cookery, which he
considers important in its relation to health (the divine health,
Sukham). It will thus be seen that he is a man of good practical
ability and acquaintance with the world, and not a mere dreamer, as
is too often assumed by Western critics to be the case with all those
who seek the hidden knowledge of the East. In fact it is one of the
remarkable points of the Hindu philosophy that practical knowledge
of life is expressly inculcated as a preliminary stage to initiation. A
man must be a householder before he becomes a yogi; and
familiarity with sexual experience instead of being reprobated, is
rather encouraged, in order that having experienced one may in time
pass beyond it. Indeed it is not unfrequently maintained that the
early marriage of the Hindus is advantageous in this respect, since a
couple married at the age of fifteen or sixteen have by the time they
are forty a grown-up family launched in life, and having circled
worldly experience are then free to dedicate themselves to the work
of “emancipation.”
During his yoga period, which lasted about three years, his wife
was very good to him and assisted him all she could. He was
enjoined by his own teacher to refrain from speech and did so for
about a year and a half, passing most of his time in fixed attitudes of
meditation, and only clapping his hands when he wanted food, etc.
Hardly anything shows more strongly the hold which these religious
ideas have upon the people than the common willingness of the
women to help their husbands in works of this kind, which beside
the sore inconvenience of them, often deprive the family of its very
means of subsistence and leave it dependent on the help of relations
and others. But so it is. It is difficult for a Westerner even to begin to
realise the conditions and inspirations of life in the East.
Refraining from speech is not a necessary condition of initiation,
but it is enjoined in some cases. (There might be a good many cases
among the Westerners where it would be very desirable—with or
without initiation!) “Many practising,” said the Guru one day, “have
not spoken for twelve years—so that when freed they had lost the
power of speech—babbled like babies—and took some time to
recover it. But for two or three years you experience no disability.”
“During my initiation,” he added, “I often wandered about the woods
all night, and many times saw wild beasts, but they never harmed
me—as indeed they cannot harm the initiated.”
At the present time he lives (when at home) a secluded life,
mostly absorbed in trance conditions—his chief external interest no
doubt being the teaching of such people as are led to him, or he is
led to instruct. When however he takes up any practical work he
throws himself into it with that power and concentration which is
peculiar to a “Master,” and which is the natural corollary of the
power of abstraction when healthily used.
Among their own people these Gurus often have small circles of
disciples, who receive the instruction of their master and in return
are ever ready to attend upon his wants. Sometimes such little
parties sit up all night alternately reading the sacred books and
absorbing themselves in meditation. It appears that Elukhanam’s
mother became his pupil and practised according to instructions,
making good progress. One day however she told her son that she
should die that night. “What, are you ill?” he said. “No,” she replied,
“but I feel that I shall die.” Then he asked her what she desired to
be done with her body. “Oh, tie a rope to it and throw it out into the
street,” was her reply—meaning that it did not matter—a very strong
expression, considering caste regulations on the subject. Nothing
more was said, but that night at 3 a.m. as they and some friends
were sitting up (cross-legged on the floor as usual) reading one of
the sacred books, one of those present said, “But your mother does
not move,”—and she was dead.
When in Ceylon our friend was only staying temporarily in a
cottage, with a servant to look after him, and though exceedingly
animated and vigorous as I have described, when once embarked in
exposition—capable of maintaining his discourse for hours with
unflagging concentration—yet the moment such external call upon
his faculties was at an end, the interest that it had excited seemed
to be entirely wiped from his mind; and the latter returned to that
state of interior meditation and absorption in the contemplation of
the world disclosed to the inner sense, which had apparently
become his normal condition.
I was in fact struck, and perhaps a little shocked, by the want of
interest in things and persons around him displayed by the great
man—not that, as I have said, he was not very helpful and
considerate in special cases—but evidently that part of his nature
which held him to the actual world was thinning out; and the
personalities of attendants and of those he might have casual
dealings with, or even the scenes and changes of external nature,
excited in him only the faintest response.
As I have said he seemed to spend the greater part of the
twenty-four hours wrapt in contemplation, and this not in the woods,
but in the interior of his own apartment. As a rule he only took a
brief half-hour’s walk mornings and evenings, just along the road
and back again, and this was the only time he passed out of doors.
Certainly this utter independence of external conditions—the very
small amount of food and exercise and even of sleep that he took,
combined with the great vigor that he was capable of putting forth
on occasion both bodily and mentally, and the perfect control he had
over his faculties—all seemed to suggest the idea of his having
access to some interior source of strength and nourishment. And
indeed the general doctrine that the gñáni can thus attain to
independence and maintain his body from interior sources alone (eat
of the “hidden manna”) is one much cherished by the Hindus, and
which our friend was never tired of insisting on.
Finally, his face, while showing the attributes of the seer, the
externally penetrating quick eye, and the expression of illumination—
the deep mystic light within—showed also the prevailing sentiment
of happiness behind it. Sandósiam, Sandósiam eppótham—“Joy,
always joy”—was his own expression, oft repeated.
Perhaps I have now said enough to show—what of course was
sufficiently evident to me—that, however it may be disguised under
trivial or even in some cases repellent coverings, there is some
reality beneath all these—some body of real experience, of no little
value and importance, which has been attained in India by a portion
at any rate of those who have claimed it, and which has been
handed down now through a vast number of centuries among the
Hindu peoples as their most cherished and precious possession.
CHAPTER IX.
CONSCIOUSNESS WITHOUT THOUGHT.

The question is, What is this experience? or rather—since an experience can really only be known to the person who experiences
it—we may ask, “What is the nature of this experience?” And in
trying to indicate an answer of some kind to this question I feel
considerable diffidence, just for the very reason (for one) already
mentioned—namely that it is so difficult or impossible for one person
to give a true account of an experience which has occurred to
another. If I could give the exact words of the teacher, without any
bias derived either from myself or the interpreting friend, the case
might be different; but that I cannot pretend to do; and if I could,
the old-world scientific forms in which his thoughts were cast would
probably only prove a stumbling-block and a source of confusion,
instead of a help, to the reader. Indeed, even in the case of the
sacred books, where we have a good deal of accessible and
authoritative information, Western critics though for the most part
agreeing that there is some real experience underlying, are sadly at
variance as to what that experience may be.
For these reasons I prefer not to attempt or pretend to give the
exact teaching, unbiassed, of the Indian Gurus, or their experiences;
but only to indicate as far as I can, in my own words, and in modern
thought-forms, what I take to be the direction in which we must look
for this ancient and world-old knowledge which has had so
stupendous an influence in the East, and which indeed is still the
main mark of its difference from the West.
And first let me guard against an error which is likely to arise. It
is very easy to assume, and very frequently assumed, in any case
where a person is credited with the possession of an unusual faculty,
that such person is at once lifted out of our sphere into a
supernatural region, and possesses every faculty of that region. If
for instance he or she is or is supposed to be clairvoyant, it is
assumed that everything is or ought to be known to them; or if the
person has shown what seems a miraculous power at any time or in
any case, it is asked by way of discredit why he or she did not show
a like power at other times or in other cases. Against all such hasty
generalisations it is necessary to guard ourselves. If there is a higher
form of consciousness attainable by man than that which he for the
most part can claim at present, it is probable, nay certain, that it is
evolving and will evolve but slowly, and with many a slip and
hesitant pause by the way. In the far past of man and the animals
consciousness of sensation and consciousness of self have been
successively evolved—each of these mighty growths with
innumerable branches and branchlets continually spreading. At any
point in this vast experience, a new growth, a new form of
consciousness, might well have seemed miraculous. What could be
more marvelous than the first revealment of the sense of sight, what
more inconceivable to those who had not experienced it, and what
more certain than that the first use of this faculty must have been
fraught with delusion and error? Yet there may be an inner vision
which again transcends sight, even as far as sight transcends touch.
It is more than probable that in the hidden births of time there lurks
a consciousness which is not the consciousness of sensation and
which is not the consciousness of self—or at least which includes
and entirely surpasses these—a consciousness in which the contrast
between the ego and the external world, and the distinction
between subject and object, fall away. The part of the world into
which such a consciousness admits us (call it supramundane or
whatever you will) is probably at least as vast and complex as the
part we know, and progress in that region at least equally slow and
tentative and various, laborious, discontinuous, and uncertain. There
is no sudden leap out of the back parlor onto Olympus; and the
routes, when found, from one to the other, are long and bewildering
in their variety.