This document provides the course planner for an Information Retrieval Systems elective course offered at Bharat Institute of Engineering and Technology. The course aims to present concepts in information search and retrieval using models like vector space, Boolean, and query expansion. It discusses implementation and evaluation of retrieval algorithms and facilitates designing search tools for e-commerce websites. The course objectives are demonstrated through assignments, projects, lectures and tutorials assessing students' understanding of retrieval systems and their ability to implement algorithms and design inverted indexes.
INFORMATION RETRIEVAL SYSTEMS
Subject Code: A70533
Regulations: R15 – JNTUH
Class: IV Year B.Tech CSE, I Semester
Department of Computer Science and Engineering
BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Ibrahimpatnam - 501 510, Hyderabad
INFORMATION RETRIEVAL SYSTEMS (A70533) (Elective – 2)
COURSE PLANNER

I. COURSE OVERVIEW:
The main objective of this course is to present the scientific foundations of information search and retrieval. The course explores the fundamental relationship between information retrieval, hypermedia architectures, and semantic models, and deploys and tests several important retrieval models such as vector space, Boolean, and query expansion. It discusses implementation and evaluation issues of algorithms such as clustering, pattern searching, and stemming with advanced data/file structures, indirectly providing a platform for implementing a comprehensive catalogue of information search tools while designing an e-commerce web site.

II. PRE-REQUISITES:
1. Students must have a basic understanding of Database Management Systems.
2. They must be familiar with the different types of algorithms used for searching data.
3. They must have basic knowledge of natural-language resources such as thesauri and synonyms in order to understand textual retrieval, because text is the main data type used in Information Retrieval Systems.

III. COURSE OBJECTIVES:
1. Demonstrate the genesis and diversity of information retrieval situations for text and hypermedia.
2. Describe hands-on experience in storing and retrieving information from the WWW using semantic approaches.
3. Demonstrate the usage of different data/file structures in building computational search engines.
4. Analyze the performance of information retrieval using advanced techniques such as classification, clustering, and filtering over multimedia.
5. Analyze ranked retrieval of a very large number of documents with hyperlinks between them.
6. Demonstrate information visualization technologies such as cognition and perception in Internet or Web search engines.

IV. COURSE OUTCOMES:
S.No. Description (Bloom's taxonomy level)
1. Describe the objectives of information retrieval systems. (Understanding)
2. Describe models like vector-space, probabilistic and language models to identify the similarity of query and document. (Understanding)
3. Implement clustering algorithms like hierarchical agglomerative clustering and the k-means algorithm. (Create)
4. Understand relevance feedback in the vector space model and the probabilistic model. (Understanding)
5. Illustrate how N-grams are used for detection and correction of spelling errors. (Understanding, Knowledge)
6. Understand the method of regression analysis to estimate the probability of relevance. (Understanding)
7. Understand the methods to construct thesauri automatically and manually. (Understanding)
8. Understand natural language systems to build semantic networks for text. (Understanding, Knowledge)
9. Illustrate algorithms used for natural language processing. (Understanding)
10. Understand the measures to evaluate the performance of cross-language information retrieval. (Understanding)
11. Understand query, document and phrase translation. (Understanding, Knowledge)
12. Design the method to build an inverted index. (Create)
V. HOW PROGRAM OUTCOMES ARE ASSESSED:
Program Outcomes, with Level and Proficiency assessed by

PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
    Level: 3. Proficiency assessed by: Assignments, Tutorials.
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
    Level: 3. Proficiency assessed by: Assignments.
PO3 Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
    Level: 2. Proficiency assessed by: Mini Projects.
PO4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
    Level: 2. Proficiency assessed by: Projects.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
    Level: 2. Proficiency assessed by: Mini Projects.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
    Level: 2. Proficiency assessed by: Assignments.
PO7 Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
    Level: --. Proficiency assessed by: --.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
    Level: --. Proficiency assessed by: --.
PO9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
    Level: --. Proficiency assessed by: --.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
    Level: 2. Proficiency assessed by: Assignments.
PO11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in ...
    Level: --. Proficiency assessed by: --.

VI. HOW PROGRAM SPECIFIC OUTCOMES ARE ASSESSED:
Program Specific Outcomes, with Level and Proficiency assessed by

PSO1 Professional Skills: The ability to research, understand and implement computer programs in the areas related to algorithms, system software, multimedia, web design, big data analytics, and networking for efficient analysis and design of computer-based systems of varying complexity.
    Level: 3. Proficiency assessed by: Lectures, Assignments.
PSO2 Problem-Solving Skills: The ability to apply standard practices and strategies in software project development using open-ended programming environments to deliver a quality product for business success.
    Level: 3. Proficiency assessed by: Mini Projects.
PSO3 Successful Career and Entrepreneurship: The ability to employ modern computer languages, environments, and platforms in creating innovative career paths, to be an entrepreneur, and a zest for higher studies.
    Level: 2. Proficiency assessed by: Guest Lectures.

N – None, S – Supportive, H – Highly Related

VII. SYLLABUS:
UNIT – I: Introduction: Retrieval strategies: vector space model, Probabilistic retrieval strategies: Simple term weights, Non binary independence model, Language models.
UNIT – II: Retrieval Utilities: Relevance feedback, clustering, N-grams, Regression analysis, Thesauri.
UNIT – III: Retrieval utilities: Semantic networks, parsing. Cross-Language Information Retrieval: Introduction, Crossing the language barrier.
UNIT – IV: Efficiency: Inverted index, Query processing, Signature files, Duplicate document detection.
UNIT – V: Integrating structured data and text: A historical progression, Information retrieval as a relational application, Semi-structured search using a relational schema. Distributed Information Retrieval: A theoretical model of distributed retrieval, web search.

SUGGESTED BOOKS
Text books:
1. David A. Grossman, Ophir Frieder, Information Retrieval – Algorithms and Heuristics, Springer, 2nd Edition (Distributed by Universal Press), 2004.
Reference books:
1. Gerald J. Kowalski, Mark T. Maybury, Information Storage and Retrieval Systems: Theory and Implementation, Springer, 2004.
2. Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers, 2002.
3. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, An Introduction to Information Retrieval, Cambridge University Press, England, 2009.

VIII. COURSE PLAN:
Each entry lists the lecture number and topic, grouped by week.

UNIT – I (References: T1, R3. Course learning outcome: Understanding information retrieval strategies)
Week 1
  1. Introduction
  2. Retrieval strategies: Introduction
  3. Vector space model with examples
  4. Vector space model
Week 2
  5. Probabilistic retrieval strategies: Introduction
  6. Simple term weights
  7. Non binary independence model
  8. Non binary independence model, Language models
  Mock Test #1

UNIT – II (References: T1, R3. Course learning outcome: Knowledge gathering about retrieval utilities and relevance feedback)
Week 3
  9. Retrieval utilities overview
  10. Introduction
  11. Retrieval utilities overview
  12. Relevance feedback
  Tutorial / Bridge Class #1
Week 4
  13. Relevance feedback
  14. Relevance feedback
  15. Clustering
  16. Clustering cont'd
  Tutorial / Bridge Class #2
Week 5
  17. N-grams
  18. Regression analysis
  19. Regression analysis
  20. Thesauri
  Tutorial / Bridge Class #3

UNIT – III (References: T1, R3. Course learning outcome: Applying and examining case studies)
Week 6
  21. Retrieval utilities
  22. Retrieval utilities cont'd
  23. Case study #1
  24. Case study #2
  Tutorial / Bridge Class #4
Week 7
  25. Semantic networks
  26. Semantic networks cont'd
  27. Case study #1
  29. Case study #2
  Tutorial / Bridge Class #5
Week 8
  30. Parsing
  31. Parsing cont'd
  32. Case study #1
  33. Case study #2
  Tutorial / Bridge Class #6

MID-TERM #1 EXAMINATIONS (WEEK-9)

IX.
MAPPING COURSE OBJECTIVES LEADING TO THE ACHIEVEMENT OF PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES:

CO    PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1   PSO2  PSO3
1     3    --   --   3    --   --   --   --   --   2     --    --    2      --    --
2     --   3    --   --   --   --   --   --   --   --    --    --    --     3     --
3     --   3    3    --   --   --   --   --   --   --    --    --    --     2     --
4     2    --   --   --   --   --   --   --   --   3     --    --    3      --    2
5     --   --   --   3    --   --   --   --   --   --    --    --    --     3     --
6     3    --   --   --   --   2    --   --   --   --    --    3     --     2     --
7     --   --   3    --   --   --   --   --   --   --    --    --    --     3     --
8     --   --   --   3    --   --   --   --   --   3     --    --    --     2     --
9     3    --   --   --   --   --   --   --   --   2     --    3     2      --    --
10    --   3    2    --   --   --   --   --   --   --    --    --    --     3     --
11    3    --   --   --   --   --   --   --   --   3     --    --    3      --    --
12    --   --   3    --   --   --   --   --   --   --    --    --    --     3     --
AVG   1.2  0.8  0.9  0.8  0    0.2  0    0    0    1.08  0     0.5   0.833  1.75  0.167

X. QUESTION BANK
Each question is followed by its Bloom's taxonomy level and course outcome.

UNIT – I
Part – A (Short Answer Questions)
1. Define information retrieval system. (Knowledge; CO 1)
2. Differentiate DBMS from an information retrieval system. (Understand; CO 1)
3. Differentiate browsing vs. searching. (Knowledge; CO 1)
4. Can an information retrieval system be related to a database management system? Explain your answer with a relevant example. (Knowledge; CO 1)
5. Define briefly the terms: 1. Precision 2. Recall (Knowledge; CO 1)
6. List 5 challenges of searching for information on the web. (Knowledge; CO 1)
7. List 3 differences between data retrieval and information retrieval. (Knowledge; CO 1)
Part – B (Long Answer Questions)
1. Explain the differences between Information Retrieval Systems and DBMS. (Apply; CO 1)
2. Explain similarity coefficient and determine the ranking of the following documents. (Knowledge; CO 2)
   Q: gold silver truck
   D1: shipment of gold damaged in a fire
   D2: delivery of silver arrived in a silver truck
   D3: shipment of gold arrived in a truck
3. Explain the concept of simple term weights for the above query and documents. (Understand; CO 2)
4. Explain inverse document frequency. (Evaluate; CO 1)
5. Explain the objectives of IRS. (Understand; CO 1)

UNIT – II
Part – A (Short Answer Questions)
1. Explain the purpose of retrieval utilities. (Knowledge; CO 3)
2. Explain the concept of clustering as a retrieval utility. (Understand; CO 3)
3. Explain how relevance feedback is used to improve the results of a retrieval strategy. (Knowledge; CO 4)
4. Explain the N-gram data structure. (Knowledge; CO 5)
5. Describe regression analysis. (Knowledge; CO 6)
Part – B (Long Answer Questions)
1. Explain relevance feedback in the vector space model. (Understand; CO 3)
2. Explain relevance feedback in the probabilistic model. (Understand; CO 3)
3. Discuss the use of a manually generated thesaurus. (Knowledge; CO 5)
4. Explain the concept of thesauri by constructing a term-term similarity matrix. (Knowledge; CO 3)
5. Explain the approach of regression analysis to estimate the probability of relevance. (Knowledge; CO 3)

UNIT – III
Part – A (Short Answer Questions)
1. Discuss R-distance for calculating the distance between query and document. (Understand; CO 8)
2. Describe how ranking is based on constrained spreading activation. (Knowledge; CO 8)
3. Explain how NLP is used to reduce ambiguity in language. (Knowledge; CO 9)
4. Define cross language information retrieval. (Apply; CO 10)
5. Define query translation. (Understand; CO 11)
Part – B (Long Answer Questions)
1. Explain the concept of semantic networks for automatic relevance ranking. (Create; CO 6)
2. Explain why parsing is an essential feature of an information retrieval system. (Understand; CO 8)
3. Explain three different types of translations. (Apply; CO 9)
4. Discuss the unbalanced and structured query approaches for choosing translations. (Understand; CO 10)
5. Explain syntactic parsing. (Understand; CO 8)

UNIT – IV
Part – A (Short Answer Questions)
1. Explain index pruning. (Knowledge; CO 12)
2. Explain posting list. (Understand; CO 12)
3. Define document file. (Understand; CO 12)
4. Describe index. (Understand; CO 13)
5. Explain I-Match. (Understand; CO 13)
Part – B (Long Answer Questions)
1. Explain methods to reorder documents prior to indexing. (Understand; CO 13)
2. Discuss methods to compress an inverted index. (Knowledge; CO 13)
3. Define efficiency. Explain the inverted index. (Knowledge; CO 13)
4. Explain throughput-optimized compression. (Create; CO 12)
5. Explain various top-down and bottom-up algorithms. (Create; CO 12)
9. Describe the method for finding similar duplicates. (Understand; CO 12)
10. Explain how signature files are used to detect duplicates. (Understand; CO 12)

UNIT – V
Part – A (Short Answer Questions)
1. Define data integrity. (Knowledge; CO 14)
2. Define performance. (Understand; CO 14)
3. Define portability. (Understand; CO 14)
4. Explain the extensions to SQL. (Understand; CO 14)
5. List different types of user-defined operators. (Understand; CO 14)
Part – B (Long Answer Questions)
1. Explain the historical progression. (Create; CO 14)
2. Discuss briefly user-defined operators. (Understand; CO 14)
3. Explain non-first normal form approaches. (Understand; CO 14)
4. Discuss information retrieval as a relational application. (Understand; CO 14)
5. Explain Boolean queries. (Apply; CO 14)

XI. OBJECTIVE QUESTIONS

UNIT-I
1. Which function is primarily used to compensate for errors in spelling of words? [ ]
   A) Fuzzy B) Indexing C) Ranking D) Zoning
2. The _________ system that acts as a user frontend to the RetrievalWare search system allows the user to browse an item in the order of the paragraphs [ ]
   A) OCR B) INQUERY C) DCARS D) NISO
3. The transformation from the received item to the searchable data structure is called [ ]
   A) Ranking B) Indexing C) Term Masking D) None
4. The process of creating term linkages at index creation time is called _______ [ ]
   A) Post-coordination B) Indexing C) Pre-coordination D) None
5. Concept indexing determines a ________ set of concepts based upon a test set of terms and uses them as a basis for indexing all items [ ]
   A) Canonical B) Searching C) Associated D) Relationship

UNIT-II
1. Precision is directly affected by retrieval of non-relevant items and drops to a number close to ____
2. The rank-frequency law of Zipf is ___________
3. The format for proximity is: TERM1 within "m" "units" of TERM2 m___________
4. The _________ process is a pattern recognition process that segments the scanned-in image into sub-regions
5. Under Boolean systems, the status display that is a count of the number of items found by the query is ____________

UNIT-III
1. An ___________________ is a system that is capable of storage, retrieval, and maintenance of information.
2. The success of IRS can be measured by __________
3. The measures associated with IRS are __________ and ______________ (precision and recall)
4. AFB stands for _______________________ (Automatic File Build)
5. The masking is done for a single character in __________________

UNIT-IV
1. Words that share the same written form but have different meanings are known as ______________
2. The process of creating term linkages at index creation time is called ________________
3. The weighted systems are mostly known as ________________________
4. SMIL stands for ______________ (Synchronized Multimedia Integration Language)
5. The process of converting the received item into a searchable data structure is known as ____________________

UNIT-V
1. The structure that deals with the layout of document context is __________________
2. QBIC is abbreviated as __________________
3. OPAC is abbreviated as _____________________
4. A set of digital objects is treated as __________________
5. The system used by DIALOG is ________________________

XII. RELEVANT SYLLABUS FOR GATE: Not applicable
RELEVANT SYLLABUS FOR IES: Not applicable

XIII. WEBSITES
1. Information Storage and Retrieval Systems: Theory and Implementation by Kowalski (UNIT I to UNIT VI)
2. Modern Information Retrieval by Ricardo Baeza-Yates (UNIT VII and UNIT VIII)

XIV. EXPERT DETAILS
1. Dr. S. Viswanadha Raju, Professor of CSE, JNTUHCE, JNT University Hyderabad
2. Dr. A. Govardhan, Professor, Computer Science & Engineering, School of Information Technology, Jawaharlal Nehru Technological University Hyderabad (JNTUH), India
3. Dr. B. Padmaja Rani, Professor & Head, Computer Science & Engineering, JNTUH College of Engineering Hyderabad (Autonomous)

XV. JOURNALS
1. Information Storage and Retrieval Systems: Theory and Implementation by Kowalski (UNIT I to UNIT VI)
2. Modern Information Retrieval by Ricardo Baeza-Yates (UNIT VII and UNIT VIII)
3. International Journal of Multimedia Information Retrieval (IJMIR)
4. International Journal of Information Retrieval Research (IJIRR)

XVI. LIST OF TOPICS FOR STUDENT SEMINARS
1. Hypertext data structures and linkages
2. Stemming algorithms
3. Manual clustering
4. Information visualization
5. Measures of information evaluation
6. Multimedia retrieval systems

XVII. CASE STUDIES / SMALL PROJECTS
1. Presentation on image query processing, i.e. about QBIC
2. Presentation on one of the case studies of an Information Retrieval System
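
WORKED SKETCH FOR THE UNIT – I RANKING QUESTION
The Unit – I long-answer question on the similarity coefficient (query "gold silver truck" against D1, D2 and D3) can be checked with a short program. The sketch below is one possible reading of that exercise, not a prescribed solution: it assumes whitespace tokenization, term weights of the form tf × log10(N/df), and an inner-product similarity coefficient. The planner itself does not fix these choices, and the function name rank_documents is illustrative only.

```python
import math
from collections import Counter


def rank_documents(query, docs):
    """Rank docs against query with tf * log10(N/df) weights and an inner product.

    The weighting scheme and the similarity measure are assumptions made for
    this sketch; other schemes (e.g. cosine normalization) would give
    different scores.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}

    query_tf = Counter(query.lower().split())
    scores = {}
    for name, tokens in tokenized.items():
        doc_tf = Counter(tokens)
        # Inner product of the query and document weight vectors.
        scores[name] = sum(
            (query_tf[t] * idf.get(t, 0.0)) * (doc_tf[t] * idf.get(t, 0.0))
            for t in query_tf
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "D1": "shipment of gold damaged in a fire",
    "D2": "delivery of silver arrived in a silver truck",
    "D3": "shipment of gold arrived in a truck",
}
ranking = rank_documents("gold silver truck", docs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Under these assumptions the program ranks D2 first (about 0.486), then D3 (about 0.062), then D1 (about 0.031). Terms that occur in every document, such as "of" and "in", receive a weight of zero and do not affect the ranking.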