Informatics: Information Sources
Lecture 2
Information Sources
Introduction
This lecture is concerned with the
sources available to us for the huge
variety of information that could be
useful.
Later lectures will consider how to
process the information and prioritise
it.
To begin with, consider why we are
collecting information in the first place.
Why
We are presumably collecting data in
order to inform a decision that has to
be made by an organisation,
individual or government:
What products to manufacture
What items to market and to whom
Make financial projections
Should we monitor an individual
Should we arrest an individual
And so on.
The issues associated with this are
varied:
There is often no shortage of data
There is often too much
It's rare that the decision is obvious
Some data will contradict other data
Some data is simply incorrect
Some data is being hidden from us
Type of content
The range of data sources is very
varied and so, as a result, is the range
of types of data. A limited number of
examples could include:
Numeric - quantities to calculate
totals and averages
Text - names of people and
objects, significant words
Locations and times - derived from
GPS data
Video - records of events place
people in time and place
Photos - also place people, show
relationships and identify locations
Narrative - opinions and views
Dialogue - deception, grooming
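As a small illustration of the numeric and text cases in the list above, the sketch below computes a total and an average, and counts the significant words in a short piece of text. The figures, the sentence and the stop-word list are all invented for the example; they do not come from any real data set.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented purely for illustration.
sales = [120.0, 80.0, 100.0]
report = "the suspect met the courier near the station"

# Numeric content: totals and averages.
total = sum(sales)
average = mean(sales)

# Text content: count words, ignoring a small hand-picked stop list.
stop_words = {"the", "near"}
words = [w for w in report.lower().split() if w not in stop_words]
word_counts = Counter(words)

print(total, average)
print(word_counts.most_common(3))
```

A real system would need a much fuller stop-word list and some way of recognising names, but the principle of reducing raw content to a few summary values is the same.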
Fidelity of data
It is often thought that digital (binary)
data can be stored and transmitted
without any loss of fidelity
We must remember however that a
lot of data originates either from
sensors (e.g. cameras) or is
compressed to reduce file size
This can lead to a loss in fidelity and
so uncertainty
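The loss of fidelity described above can be sketched with a simple quantiser. This is only an illustration, assuming samples scaled to the range 0 to 1 and a hypothetical helper called quantise; real sensors and compression schemes are far more elaborate, but the effect is similar: once values are rounded to a few levels, the original cannot be recovered exactly.

```python
def quantise(samples, levels):
    """Map each sample in [0.0, 1.0] to the nearest of `levels` evenly spaced values."""
    step = 1.0 / (levels - 1)
    return [round(s / step) * step for s in samples]

# Hypothetical sensor readings, invented for illustration.
original = [0.00, 0.12, 0.37, 0.50, 0.81, 1.00]

# Keep only 5 levels, i.e. steps of 0.25.
coarse = quantise(original, levels=5)

# The error introduced is the uncertainty we now carry in the data.
errors = [abs(a - b) for a, b in zip(original, coarse)]
max_error = max(errors)

print(coarse)
print(max_error)
```

Note that the first two readings, which were different, become the same value after quantising. The error can never exceed half a step, so using more levels reduces the uncertainty at the cost of more storage.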
Example - CCTV
The UK is awash with CCTV cameras
but many of them are of such poor
quality that identifying an individual is
very difficult
This is improving: high-quality HD
cameras are becoming more
affordable and less footage is being
stored on recycled VHS tapes
Audio
People can be recognised from
their voice and words can be
identified from the dialogue
Most speech (phone etc.) is heavily
compressed to save space and this
can compromise the processing
Of course it is possible to
deliberately disguise a voice with
gadgets made for the purpose
Audio - dialogue
It is possible to capture a
conversation and analyse this for a
number of items of interest:
Age of the speaker
Nationality - accents etc.
Angry, stressed, frightened etc.
Expressing a view or opinion
Being ironic etc.
Ethics
Of course it is possible to gather data
that is in the public domain and this is
increasingly useful
Most governments can covertly
monitor their citizens, sometimes
after a legal application
There are of course ethical issues in
gathering data without the knowledge
of the individual (see later lectures)
Official hacking
One important source of data is of
course that obtained by state-sponsored hacking
In this way many nations are turning
to an offensive mode of dealing with
cybercrime
The Flame malware gathers data and has been
described as some 20 times more complex than Stuxnet
Intelligence analysis
Within both the security and
commercial worlds, the analysis of
masses of data to extract meaning
and inform decisions is becoming more
sophisticated
The remainder of this lecture
considers data sources from the
intelligence analysis viewpoint
Literal Sources
In a form suitable for human
communication.
Open Source
Human
Communication
Cyber
Online databases
Overload, how to extract useful information
Commercial
Imagery satellites
Commercial databases
All for a price!
archives
Human - HUMINT
HUMINT focuses on humans and their access
to information (takes time to acquire)
Often best method of dealing with illicit
networks or for finding:
Opponents' plans, trade secrets, certain indicators
And tip-offs
Communication - COMINT
Generally a governmental activity rather than a
private one (for private parties it is illegal)
The interception, processing and reporting of an
opponent's communications
E.g. voice, fax, data comms, internet, any other
deliberate transmission
Collected by aircraft, satellites, ground bases, sea etc.
Insights into plans/intentions (people, organisations,
financial, facilities, budgets, procedures etc.)
Relationships? Classified projects?
Cyber
Collection from an information system or
network (a mix of HUMINT/COMINT/OSINT)
Becoming a rich source of intelligence
E.g. target personnel databases for personal
information and possible recruitment as
HUMINT
Lower risk in obtaining it than with traditional spying
The hacker is on the offensive (and usually
wins); defence is much harder
Does the defender think like the attacker?
Large systems have more vulnerabilities
Gain access, exploit with tools, remove
evidence
Survey possible networks, ping a network
(for vulnerabilities), hack it (install software
backdoors), use backdoors to sustain
collection
Sustained collection uses:
Trojan horses
Worms (entirely concealed)
Rootkits (software to avoid detection)
Keystroke loggers
Nonliteral Sources
Require human interpretation
Imaging - IMINT
Visible Photography
Camera, aircraft, spacecraft
Open source
Ever zoomed into your own house (or someone
else's) using Google Earth?
Photography/video (handheld)
Imaging radar (mounted on craft)
Electro-optical imaging (not good when cloudy)
Radiometry/spectrometry (heat related
emission)
Spectral imaging (combines above two)
Radar
Tracking of targets - satellites,
missiles, ships, aircraft, other
vehicles in combat
E.g. a missile trajectory can be
detected by radar
We are all familiar with (and thankful for)
radar navigation during inclement
weather
Although the quality of radar imagery
does not match optical imagery, it is very good
in poor visual conditions
Geophysical / Nuclear
Collection, processing, exploitation of
environmental disturbances transmitted
through earth, water or atmosphere
E.g. magnetic sensing of vehicles,
submarines
Materiel / Materials
Usually clandestine and HUMINT
Materials
Particulates, trace elements, effluents,
debris
Nuclear, chemical, biological issues
DNA
Materiel
Equipment, apparatus, supplies
Stealing competitors' sample products
Biometrics
Capturing of a person's physical or behavioural
characteristics that identify them
Morse code operators were identified this way in World War II by their "fist" (keying style)
Summary
Categorisation of data sources
Primary, secondary, tertiary
Storage
Databases
Data warehouses
Social media
Readings
R. Clark, Intelligence Analysis
Chapter 6 (for the security part of the lecture)