
What Is Information Retrieval (IR)?

What is information retrieval?
• Gathering information from one or more sources based on
an information need, usually expressed as a query
– Major assumption: the information need can be
specified
– Broad definition of information
• Sources of information
– Other people
– Archived information (libraries, maps, etc.)
– Radio, TV, etc.
– Web
– Nature
Data, information, knowledge
• Data - Facts, observations, or perceptions.
• Information - Subset of data, only including those data that
possess context, relevance, and purpose.
• Knowledge - In a simple view, knowledge sits at the highest
level of a hierarchy, with data at the lowest level and
information in the middle.

• Data refers to bare facts void of context.
– A telephone number.
• Information is data in context.
– A phone book.
• Knowledge is information that facilitates action.
– Recognizing that a phone number belongs to a good client,
who needs to be called once per week to get his orders.

How much information is there?
• Soon most everything will be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, and anomaly detection
are key technologies
Gray - Microsoft
See Mike Lesk:
How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html
See Lyman & Varian:
How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: byte scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers in ascending order: A Book, A Photo, A Movie, All books (words), All Books MultiMedia, Everything Recorded. Small prefixes listed alongside: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli.]
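The prefixes on this slide are the decimal (SI) ones, each a factor of 1000 above the previous. As a small sketch of that scale, the helper names `bytes_for` and `steps_between` below are illustrative and not part of the original slides:

```python
# Decimal (SI) prefixes named on the slide, mapped to their power of ten.
LARGE_PREFIXES = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def bytes_for(prefix: str, count: int = 1) -> int:
    """Return the number of bytes in `count` units of the given prefix."""
    return count * 10 ** LARGE_PREFIXES[prefix.lower()]


def steps_between(smaller: str, larger: str) -> int:
    """Return how many factors of 1000 separate two prefixes."""
    return (LARGE_PREFIXES[larger.lower()] - LARGE_PREFIXES[smaller.lower()]) // 3
```

For example, `steps_between("kilo", "yotta")` counts the seven factors of 1000 between the bottom and the top of the scale.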
Ideal Information Retrieval
• The answer should be:
– what is actually needed (relevant)
  IR is very concerned with relevance
– available when you want it
– available where you want it
– tailored to the user (personalization)
– your information needs anticipated

What is relevance?
• An answer (or answers) that fits your need.

How is IR accomplished?
• Ask someone
• Search
– Search for someone to ask
– Search for needed information
– Use a search engine
• Process of IR - queries or questions

Information to be retrieved
• Tacit vs explicit information
– Tacit: in someone’s mind
– Explicit: written down
• Permanent vs Impermanent information
– Conversation
– Documents (in a general sense)
• Text
• Video
• Files
• Pictures
• Data
• Both
• Assumption: it exists!
The information acquisition process
• Know what you want and where it is, and go get it
• Ask questions of information sources as needed
(queries), a manifestation of SEARCH, and let
them suggest (rank) answers
• Have information sent to you on a regular basis
based on some predetermined information need
• Push/pull models (RSS)
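The difference between pulling answers with a query and having them pushed to you can be sketched in a few lines. This is a toy model under stated assumptions: the `Source` class, its method names, and the substring matching are all hypothetical and only illustrate the two modes, not how RSS itself works.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Source:
    """A toy information source supporting pull (queries) and push (subscriptions)."""
    items: List[str] = field(default_factory=list)
    # Standing information needs, stored as (keyword, inbox) pairs.
    subscriptions: List[Tuple[str, List[str]]] = field(default_factory=list)

    def pull(self, keyword: str) -> List[str]:
        """Pull model: the user asks now and gets the current matches."""
        return [item for item in self.items if keyword.lower() in item.lower()]

    def subscribe(self, keyword: str) -> List[str]:
        """Push model: register a predetermined need and return the inbox that will fill."""
        inbox: List[str] = []
        self.subscriptions.append((keyword, inbox))
        return inbox

    def publish(self, item: str) -> None:
        """Store a new item and push it to every subscriber whose keyword matches."""
        self.items.append(item)
        for keyword, inbox in self.subscriptions:
            if keyword.lower() in item.lower():
                inbox.append(item)
```

With pull, the user decides when to ask; with push, the source decides when to deliver, based on a need the user stated in advance.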

What is SEARCH?
DEFINITIONS FROM THE WEB

• the activity of looking thoroughly in order to find something or someone


• an investigation seeking answers; "a thorough search of the ledgers revealed nothing"; "the
outcome justified the search"
• an operation that determines whether one or more of a set of items has a specified property; "they
wrote a program to do a table lookup"
• the examination of alternative hypotheses; "his search for a move that would avoid checkmate was
unsuccessful"
• try to locate or discover, or try to establish the existence of; "The police are searching for clues";
"They are searching for the missing man in the entire county"
• To request the electronic retrieval of documents based on the presence of specific terms and within
other restrictions established (e.g., subject, date, journal, etc.).
– Search results list: the list of documents retrieved as a result of a submitted search request.
– Settings: the record of the personal details related to an individual user, containing information
such as name, address, e-mail, and display preferences (if available). Settings are used to set up a
personal profile for the user, and are available only on systems that have user/password authentication.
• Intelligently seeking answers to a known or unknown question, often as part
of solving a larger problem (AI, planning, strategy, etc.)

What IR is usually not about
• Not about structured data (databases)
– Why?
– Growth of structured data?
• Retrieval from databases is usually not considered
– Database querying assumes that the data is in a
standardized format
– Transforming all information, news articles, web sites
into a database format is difficult for large data
collections
• INTEGRATED IR
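The contrast between database querying and IR can be illustrated with a minimal sketch. The sample records, the article texts, and the function names `db_lookup` and `text_search` are made up for illustration; the word-overlap scoring is an assumed stand-in for a real ranking function.

```python
# Structured data: every record has the same fields, so a query is an exact test.
records = [
    {"name": "Ada", "city": "London", "phone": "555-0100"},
    {"name": "Bob", "city": "Paris", "phone": "555-0101"},
]


def db_lookup(rows, **conditions):
    """Database-style query: return rows whose fields equal all given values."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


# Unstructured data: free text with no fixed fields, so matching is approximate.
articles = [
    "Ada opened a new office in London last week",
    "Paris hosts a conference on libraries and archives",
]


def text_search(docs, query):
    """IR-style query: rank documents by how many query words they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda p: -p[0]) if score > 0]
```

The database lookup either matches a record or does not, while the text search returns documents ordered by how well they fit the query.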

What an IR system should do
• Store/archive information
• Provide access to that information
• Answer queries with relevant information
• Stay current
• Future list
– Understand the user’s queries
– Understand the user’s need
– Act as an assistant

What is relevance?
• In IR relevance is everything
• Relevant information is that which is suited to your
information need.
• Dependent on
– User
– Space/time
– Group
– Context
• Examples?
How good is the IR system
Measures of performance based on what the system
returns:
• Relevance
• Coverage
• Recency
• Functionality (e.g. query syntax)
• Speed
• Availability
• Usability
• Time/ability to satisfy user requests
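The slide does not say how relevance or coverage would be quantified. One common way, assumed here and not taken from the slides, is precision (the share of returned items that are relevant) and recall (the share of relevant items that were returned). The function name `precision_recall` and the document ids in the usage note are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids against known relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a system returns `d1`, `d2`, `d3`, `d4` and the relevant documents are `d1`, `d3`, `d5`, then two of the four results are relevant and two of the three relevant documents were found.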

How IR systems work
Algorithms implemented in software
• Gathering of information
• Storage of information
• Indexing
• Interaction
• Evaluation
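The slides do not name a specific indexing structure. A common choice is an inverted index, which maps each word to the documents that contain it; the sketch below assumes that structure, and the names `build_index` and `match_all` are illustrative.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match_all(index, query):
    """Return ids of documents that contain every word of the query."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

Building the index once at storage time means a query only touches the entries for its own words, instead of scanning every stored document.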

IR is an Iterative Process

[Figure: Goals, Workspace, Repositories]

[Figure: IR pipeline, built up over three slides. Query side: User’s Information Need, text input, Parse Query. Collection side: Collections, Pre-process, Index. Both sides meet at Rank or Match; the final build adds Query Reformulation.]
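The stages named in the pipeline diagram (Pre-process, Index, Parse Query, Rank or Match, Query Reformulation) can be sketched end to end. Everything below is an assumption for illustration: the stopword list, the term-count scoring, and the reformulation rule (append the top document's most frequent unseen term) are simple stand-ins, not the method the slides describe.

```python
import string
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "in", "and", "to"}  # illustrative, not exhaustive


def preprocess(text):
    """Pre-process step: lowercase, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOPWORDS]


def build_index(collection):
    """Index step: term counts per document id."""
    return {doc_id: Counter(preprocess(text)) for doc_id, text in collection.items()}


def rank(index, query):
    """Rank or Match step: score by summed counts of query terms, dropping zero scores."""
    terms = preprocess(query)  # Parse Query step reuses the same pre-processing
    scores = {d: sum(c[t] for t in terms) for d, c in index.items()}
    return sorted((d for d, s in scores.items() if s > 0), key=lambda d: (-scores[d], d))


def reformulate(index, query, results):
    """Query Reformulation step: append the top document's most common new term."""
    if not results:
        return query
    known = set(preprocess(query))
    for term, _ in index[results[0]].most_common():
        if term not in known:
            return f"{query} {term}"
    return query
```

Calling `rank`, then `reformulate`, then `rank` again on the new query gives one turn of the iterative loop: each round of results can reshape the next query.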