
Information Retrieval 1 Introduction To IR


Information Retrieval : 1

Introduction to IR

Prof Neeraj Bhargava


Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Learning objectives of IR Series
• Introduction: Motivation, Basic concepts, past, present, and future, the retrieval process.
• Modeling: Introduction, A taxonomy of information retrieval models, retrieval: ad hoc and filtering, a formal
characterization of IR models, classic information retrieval, alternative set theoretic models, alternative
algebraic models, alternative probabilistic models, structured text retrieval models, models for browsing.
• Retrieval Evaluation: Introduction, retrieval performance evaluation, reference collections.
• Query Languages: Introduction, keyword-based querying, pattern matching, structural queries, query protocols.
• Query Operations: Introduction, user relevance feedback, automatic local analysis, automatic global analysis.
• Text and multimedia languages and Properties: Introduction, metadata, text, markup languages,
• Indexing and searching: Introduction; inverted files; other indices for text; Boolean queries; sequential
searching; pattern matching; structural queries; compression.
• Searching the Web: Introduction, challenges, characterizing the web, search engines, browsing, meta
searchers, finding the needle in the haystack, searching using hyperlinks.
Architecture of the IR System
Information Retrieval (IR)
• IR deals with the representation, storage, organization
of, and access to information items
• Types of information items: documents, Web pages,
online catalogs, structured records, multimedia objects
• Early goals of the IR area: indexing text and searching
for useful documents in a collection
• Nowadays, research in IR includes:
– Modeling, Web search, text classification, systems
architecture, user interfaces, data visualization, filtering and
languages.
Early Developments
• For more than 5,000 years, man has organized information for
later retrieval and searching
• This has been done by compiling, storing, organizing, and
indexing papyrus, hieroglyphics, and books
• For holding the various items, special purpose buildings called
libraries, or bibliothekes, are used
• The oldest known library was created in Ebla, in the Fertile
Crescent, between 3,000 and 2,500 BC
• Since the volume of information in libraries is always growing,
it is necessary to build specialized data structures for fast
search — the indexes
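As an illustration of such a data structure, the following is a minimal sketch of an inverted index, which maps each term to the documents that contain it. The function name `build_inverted_index` and the three sample documents are hypothetical, chosen only for this example; real systems also normalize terms and store positions or frequencies.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization on whitespace; an assumption made for brevity.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical toy collection.
docs = {
    1: "libraries store books",
    2: "indexes speed up search",
    3: "libraries build indexes",
}
index = build_inverted_index(docs)
print(index["libraries"])  # [1, 3]
print(index["indexes"])    # [2, 3]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is why indexes make search fast as the collection grows.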
Libraries and Digital Libraries
• For centuries indexes have been created manually as sets of
categories, with labels associated with each category
• The advent of modern computers has allowed the construction of
large indexes automatically
• Libraries were among the first institutions to adopt IR systems for
retrieving information
• Initially, such systems consisted of an automation of existing
processes such as card catalog searching
• Increased search functionality was then added
• Ex: subject headings, keywords, query operators
• Nowadays, the focus has been on improved graphical interfaces,
electronic forms, hypertext features
IR at the Center of the Stage
• Until recently, IR was an area of interest restricted mainly to
librarians and information experts
• A single fact changed these perceptions—the introduction of
the Web, which has become the largest repository of
knowledge in human history
• Due to its enormous size, finding useful information on the
Web usually requires running a search
• And searching on the Web is all about IR and its technologies
• Thus, almost overnight, IR has gained a place with other
technologies at the center of the stage
The IR Problem
• The key goal of an IR system is to retrieve all
the items that are relevant to a user query,
while retrieving as few non-relevant items as
possible
• That is, the IR system must rank the
information items according to a degree of
relevance to the user query
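The idea of ranking items by a degree of relevance can be sketched as follows. The scoring rule used here (counting occurrences of query terms) is an assumption made for illustration, not the model used by any particular IR system, and the names `score`, `rank`, and the sample documents are hypothetical.

```python
from collections import Counter

def score(query, text):
    """Count how many times the query terms occur in the text (a toy relevance score)."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())

def rank(query, docs):
    """Return (doc_id, score) pairs with score > 0, highest score first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in docs.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; ties are broken by document id.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical toy collection.
docs = {
    "d1": "formula 1 car racing results",
    "d2": "car maintenance tips",
    "d3": "history of libraries",
}
ranking = rank("car racing", docs)
print(ranking)  # [('d1', 2), ('d2', 1)]
```

Document d3 shares no term with the query and is left out, while d1 is placed ahead of d2 because it matches more of the query.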
The User’s Task
• Consider a user who seeks information on a topic of their
interest
• This user first translates their information need into a query,
which requires specifying the words that compose the query
• In this case, we say that the user is searching or querying for
information of their interest
• Consider now a user who has an interest that is either poorly
defined or inherently broad
• For instance, the user has an interest in car racing and wants to
browse documents on Formula 1
• In this case, we say that the user is browsing or navigating the
documents of the collection
Information × Data Retrieval
• Data retrieval: the task of determining which
documents of a collection contain the keywords in the
user query
• Example of a data retrieval system: relational
databases
• Deals with data that has a well defined structure and
semantics
• A single erroneous object among a thousand retrieved
objects means total failure
• Data retrieval does not solve the problem of retrieving
information about a subject or topic
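The contrast can be made concrete with a minimal sketch of data retrieval as exact keyword containment. The helper names `contains_all_keywords` and `data_retrieve` and the sample documents are hypothetical, introduced only for this example.

```python
def contains_all_keywords(query, text):
    """True only if every query keyword appears as a word in the text."""
    words = set(text.lower().split())
    return all(keyword in words for keyword in query.lower().split())

def data_retrieve(query, docs):
    """Return the ids of documents that contain every keyword, in sorted order."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if contains_all_keywords(query, text))

# Hypothetical toy collection.
docs = {
    "d1": "car racing season review",
    "d2": "formula 1 grand prix highlights",
    "d3": "car repair manual",
}
result = data_retrieve("car racing", docs)
print(result)  # ['d1']
```

Document d2 is about car racing but contains neither keyword, so exact matching misses it. This is the sense in which data retrieval does not solve the problem of retrieving information about a topic.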
Assignments
• Briefly Discuss the Architecture of the IR
System
