01 Intro
Introduction
Instructor:
Walid Magdy
23-Sep-2020
Lecture Objectives
• Know about the course:
• Topic
• Objectives
• Requirements
• Format
• Logistics
• Note:
• Not much technical content today
• Don’t assume the next lectures will be the same!
Walid Magdy, TTDS 2020/2021
IR is NOT just Web search
What is IR?
Speech - QA
What is IR?
Information Filtering
Recommendation
Social search
What is IR?
Legal search
What is IR?
Cross-Language search
What is IR?
Advertising
Query suggestion / correction
Snippet selection / summarisation
Categorisation (search verticals)
IR ≠ Find
• Sequential
• Exact match
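To make the contrast concrete, here is a minimal Python sketch. It is an illustration only, not the course's method: the toy collection `DOCS`, the function names, and the term-overlap scoring are all assumptions made for this example. `find_exact` behaves like "Find" (a sequential scan for an exact string), while `rank_by_overlap` returns documents ordered by how many query terms they contain.

```python
import re
from collections import Counter

# Hypothetical toy collection (not from the lecture).
DOCS = [
    "Cheap flights to Edinburgh in September",
    "How to find a flight deal: Edinburgh and beyond",
    "Recipes for a quick dinner",
]

def find_exact(pattern, docs):
    """'Find': scan documents one by one, keep those containing the exact string."""
    return [i for i, doc in enumerate(docs) if pattern in doc]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_by_overlap(query, docs):
    """Toy ranking (an assumption): score = occurrences of query terms in the doc."""
    q_terms = set(tokenize(query))
    scored = []
    for i, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in q_terms)
        if score > 0:
            scored.append((score, i))
    # Highest score first; ties broken by document position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]

print(find_exact("edinburgh flights", DOCS))       # exact match finds nothing
print(rank_by_overlap("edinburgh flights", DOCS))  # ranked list of document indices
```

The exact-match scan fails because the query string never appears verbatim, whereas the ranked version still surfaces both Edinburgh documents and orders them by score.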
What is IR?
• IR is finding material of an unstructured nature that
satisfies an information need from within large
collections
• Find → Task
• Unstructured → Nature
• Information need → Target
• Satisfies → Evaluation
Text classification
Prerequisites (1/3)
• Maths requirements:
• Linear algebra: vectors/matrices (addition, multiplication, inverse,
projections, etc.).
• Probability theory: Discrete and continuous univariate random variables.
Bayes rule. Expectation, variance. Univariate Gaussian distribution.
• Calculus: Functions of several variables. Partial differentiation. Multivariate
maxima and minima.
• Special functions: Log, Exp, Ln.
Prerequisites (2/3)
• Programming requirements:
• Python or Perl
• Knowledge of regular expressions
• Shell commands (cat, sort, grep, uniq, sed, ...)
• Data structures and software engineering for course project.
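As a rough self-check of the level expected, the sketch below combines a regular expression with the kind of counting usually done in the shell with `sort | uniq -c | sort -rn`. The function name `term_frequencies`, its `top_n` parameter, and the sample sentence are assumptions made for this illustration; it is not course material.

```python
import re
from collections import Counter

def term_frequencies(text, top_n=3):
    """Return the top_n most frequent lowercase word tokens as (term, count) pairs."""
    # Regular expression: runs of letters, after lowercasing the text.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

sample = "To be, or not to be: that is the question."
print(term_frequencies(sample))
```

If each line of this is easy to follow, the programming prerequisite is probably met for the early labs.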
Prerequisites (3/3)
• Team-work requirement:
• The final course project will be done in groups of 5-6 students.
• Working in a team for the project is a requirement.
• No exceptions will be allowed!
Course Structure
• 20 Lectures:
• 2 lectures → Introduction (today)
• 14 lectures → IR (50% practical lectures)
• 4 lectures → Text Analytics/Classification
• 8-10 Labs:
• Practice what you learn
• No Tutorials
• Some self-reading
• Lots of system implementation
• Few online videos
Course Instructors
+ 1 guest lecture
Lecture Format
• 2 Lectures at a time
• Questions are welcome at any time. Feel free to interrupt.
• 5-10 minute break after L1
• Feel free to go out and come back
• Discuss the 1st lecture with friends
• Questions on L1 are allowed before starting L2
• Brain-teaser maths problem (for fun)
• Some lectures are interactive. Please participate
• Some lectures will include demos (running code)
• 2 tutorial lectures about using tools
Labs
• Online!
• How will it work?
• The relevant lab will be announced with each lecture on Wednesday
• You should implement the lab directly after the lecture
• Any issues → ask on Piazza (tag the question with the lab number)
• Produced output → share on Piazza (publicly)
• Demonstrators → answer questions + validate your output
• DO NOT ask a question before checking whether it has been asked before
• Tuesdays → optional Teams meetings for those who still require support
Assessments
• Coursework 1: 10%
The same as labs 1-3 → Build your first search engine
• Coursework 2: 20%
IR Evaluation, Text classification/analytics
• Group project: 40%
A full running search engine supported by text technologies
• Final Exam: 30%
Group Project
• The largest weight: 40% of the total mark
• Teamwork → groups of 5-6 (you select your group)
• Design a full end-to-end search engine that searches a large
collection of documents with many functionalities.
• Mark:
• 50% on project → the same for all team members
- How complete/effective/fast/nice is your search engine?
• 50% on individual contribution → different for each member
- How useful and how substantial is your contribution? (Mark can be negative!)
Example
• A search engine that retrieves quotes of movies and TV shows.
• Collection size: 77 million movie quotes
• Search options
• Phrase search of quotes
• Movie info search
• Advanced search: movie title, actor/actress, years, keywords
• Query suggestion
• Classifying results by genre
• Demo: http://167.71.139.222/
Timeline
• 2 Semesters (or one?)
[Figure: timeline showing Lectures, Labs and Exam across Semester 1 and Semester 2; labelled weeks: W5, W11, W11]
Logistics (1/2)
• Lectures:
• Live on 2 Wednesdays, 12.00-14.30 (some exceptions might occur)
• Recording will be available
• Handouts to be posted on the day of the lecture
• Course webpage:
• Link: http://www.inf.ed.ac.uk/teaching/courses/tts/
• Handouts, Labs, CW details
• Learn:
• Lecture recordings
• Deadlines
Logistics (2/2)
• Piazza:
• All communication will be there
• Questions about lectures/labs/CW are there
• Feel free to answer each other's questions
• Lab support will be mainly there
• Please share your lab answers there
• Join NOW: link
• Microsoft Teams:
• Live lab support will be there
• Join NOW: link
FAQ
• How will the project be managed? What if one
member does not work?
• I am not that strong in programming; should I take this
course?
• Can I audit the course?
• Anything else?
Next Lecture
• Definitions of IR main concepts
(more introduction)