This document provides an introduction to information retrieval, covering early developments in libraries and indexing, the key goal of retrieving relevant information while limiting non-relevant results, and how users search by translating information needs into queries or browse collections. It discusses how IR has grown in importance with the rise of the web and digital libraries, and the basic components of an IR system including indexing, querying, and ranking results by relevance.
Information Retrieval : 1
Introduction to IR
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University, Ajmer

Learning objectives of IR Series
• Introduction: Motivation, basic concepts, past, present, and future, the retrieval process.
• Modeling: Introduction, a taxonomy of information retrieval models, retrieval: ad hoc and filtering, a formal characterization of IR models, classic information retrieval, alternative set theoretic models, alternative algebraic models, alternative probabilistic models, structured text retrieval models, models for browsing.
• Retrieval Evaluation: Introduction, retrieval performance evaluation, reference collections.
• Query Languages: Introduction, keyword-based querying, pattern matching, structural queries, query protocols.
• Query Operations: Introduction, user relevance feedback, automatic local analysis, automatic global analysis.
• Text and Multimedia Languages and Properties: Introduction, metadata, text, markup languages.
• Indexing and Searching: Introduction, inverted files, other indices for text, Boolean queries, sequential searching, pattern matching, structural queries, compression.
• Searching the Web: Introduction, challenges, characterizing the web, search engines, browsing, meta searchers, finding the needle in the haystack, searching using hyperlinks.

Architecture of the IR System

Information Retrieval (IR)
• IR deals with the representation, storage, organization of, and access to information items
• Types of information items: documents, Web pages, online catalogs, structured records, multimedia objects
• Early goals of the IR area: indexing text and searching for useful documents in a collection
• Nowadays, research in IR includes:
– Modeling, Web search, text classification, systems architecture, user interfaces, data visualization, filtering, and languages
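
The early goal named above, indexing text and then searching for useful documents in a collection, can be illustrated with a toy inverted index (the "inverted files" listed in the series outline). This is only a minimal sketch: the function names `build_inverted_index` and `search`, the whitespace tokenization, and the three sample documents are assumptions made for illustration, not part of the original slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of ids of documents containing it.

    Assumption: terms are obtained by splitting on whitespace; a real system
    would also handle punctuation, stop words, stemming, and so on.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of documents containing the term (empty list if none)."""
    return index.get(term.lower(), [])


# Hypothetical three-document collection.
docs = {
    1: "indexing text in a collection",
    2: "searching for useful documents",
    3: "searching text and Web pages",
}
index = build_inverted_index(docs)
print(search(index, "searching"))  # [2, 3]
print(search(index, "text"))       # [1, 3]
```

The index is built once, so each lookup avoids rescanning every document, which is the reason indexes are used for fast search.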

Early Developments
• For more than 5,000 years, people have organized information for later retrieval and searching
• This has been done by compiling, storing, organizing, and indexing papyrus, hieroglyphics, and books
• To hold the various items, special-purpose buildings called libraries, or bibliothekes, are used
• The oldest known library was created in Ebla, in the Fertile Crescent, between 3000 and 2500 BC
• Since the volume of information in libraries is always growing, it is necessary to build specialized data structures for fast search: the indexes

Libraries and Digital Libraries
• For centuries, indexes have been created manually as sets of categories, with labels associated with each category
• The advent of modern computers has allowed large indexes to be constructed automatically
• Libraries were among the first institutions to adopt IR systems for retrieving information
• Initially, such systems automated existing processes such as card catalog searching
• Increased search functionality was then added
– Ex: subject headings, keywords, query operators
• Nowadays, the focus is on improved graphical interfaces, electronic forms, and hypertext features

IR at the Center of the Stage
• Until recently, IR was an area of interest restricted mainly to librarians and information experts
• A single fact changed these perceptions: the introduction of the Web, which has become the largest repository of knowledge in human history
• Due to its enormous size, finding useful information on the Web usually requires running a search
• And searching on the Web is all about IR and its technologies
• Thus, almost overnight, IR has gained a place with other technologies at the center of the stage

The IR Problem
• The key goal of an IR system is to retrieve all the items that are relevant to a user query, while retrieving as few non-relevant items as possible
• That is, the IR system must rank the information items according to a degree of relevance to the user query

The User’s Task
• Consider a user who seeks information on a topic of their interest
• This user first translates their information need into a query, which requires specifying the words that compose the query
• In this case, we say that the user is searching or querying for information of their interest
• Consider now a user who has an interest that is either poorly defined or inherently broad
• For instance, the user has an interest in car racing and wants to browse documents on Formula 1
• In this case, we say that the user is browsing or navigating the documents of the collection

Information × Data Retrieval
• Data retrieval: the task of determining which documents of a collection contain the keywords in the user query
• Data retrieval system
– Ex: relational databases
• Deals with data that has a well-defined structure and semantics
• A single erroneous object among a thousand retrieved objects means total failure
• Data retrieval does not solve the problem of retrieving information about a subject or topic

Assignments
• Briefly discuss the architecture of the IR system
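
As a rough illustration of the contrast drawn in the slides between data retrieval (which documents contain the query keywords) and the IR problem (ranking items by a degree of relevance), the sketch below runs the same query both ways. Everything here is an assumption made for illustration: the function names `data_retrieval` and `ranked_retrieval`, the sample documents, and especially the relevance score, which simply counts how many distinct query terms a document contains. It is a toy measure, not a retrieval model from the series.

```python
def data_retrieval(docs, query):
    """Exact match: return ids of documents that contain every query keyword."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    ]


def ranked_retrieval(docs, query):
    """Rank documents by a toy relevance score (assumption: the number of
    distinct query terms that appear in the document)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical collection, loosely following the car racing example.
docs = {
    1: "formula 1 car racing results",
    2: "car maintenance guide",
    3: "history of horse racing",
    4: "relational databases and structured records",
}
print(data_retrieval(docs, "car racing"))    # [1]
print(ranked_retrieval(docs, "car racing"))  # [1, 2, 3]
```

The exact-match version returns only the document containing both keywords, while the ranked version also surfaces partially matching documents, ordered by score, and leaves the judgment of usefulness to the user.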