Introduction of IR Models
IR Models
At the end of this chapter, every student must be able to explain:
– Probabilistic models
1. Boolean model
• Consider a set of five docs and assume that they contain the terms
shown in the table
Doc. Terms
D1 Algorithm, information, retrieval
D2 Retrieval, science
D3 Algorithm, information, science
D4 Pattern, retrieval, science
D5 Science, algorithm
Then :
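As a minimal sketch of how the Boolean model treats this collection, the five documents can be stored as term sets and queried with set operations (AND = intersection, OR = union, NOT = complement). The three queries below are hypothetical examples, not taken from the slides, and the terms are lower-cased for matching.

```python
# The five documents from the table above, with terms lower-cased.
docs = {
    "D1": {"algorithm", "information", "retrieval"},
    "D2": {"retrieval", "science"},
    "D3": {"algorithm", "information", "science"},
    "D4": {"pattern", "retrieval", "science"},
    "D5": {"science", "algorithm"},
}


def build_inverted_index(collection):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, terms in collection.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


index = build_inverted_index(docs)
all_docs = set(docs)

# Hypothetical queries (assumptions for illustration only).
and_result = index["information"] & index["retrieval"]            # information AND retrieval
or_result = index["information"] | index["retrieval"]             # information OR retrieval
not_result = index["science"] & (all_docs - index["algorithm"])   # science AND NOT algorithm

print(sorted(and_result))  # ['D1']
print(sorted(or_result))   # ['D1', 'D2', 'D3', 'D4']
print(sorted(not_result))  # ['D2', 'D4']
```

A document either satisfies the Boolean expression or it does not, so the result is an unranked set.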
3. Term weighting (tf*idf)
Term weighting assigns each term a numerical value that
represents its importance in a document, in order to
improve retrieval effectiveness.
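The idea can be sketched on the five-document collection from the Boolean model section. The variant used here, tf * log10(N / df), is an assumption: the slides do not say which tf or idf formula is intended, and other variants (natural log, smoothed idf, normalized tf) are equally common.

```python
import math

# The five documents from the Boolean model example, as term lists.
docs = {
    "D1": ["algorithm", "information", "retrieval"],
    "D2": ["retrieval", "science"],
    "D3": ["algorithm", "information", "science"],
    "D4": ["pattern", "retrieval", "science"],
    "D5": ["science", "algorithm"],
}


def tf_idf(term, doc_id, collection):
    """Return tf * log10(N / df) for a term in one document (assumed variant)."""
    tf = collection[doc_id].count(term)  # raw term frequency in the document
    df = sum(1 for terms in collection.values() if term in terms)  # document frequency
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log10(len(collection) / df)


# 'pattern' occurs in 1 of 5 documents, 'science' in 4 of 5.
print(round(tf_idf("pattern", "D4", docs), 3))  # 0.699
print(round(tf_idf("science", "D4", docs), 3))  # 0.097
```

A rare term such as 'pattern' receives a higher weight than a common term such as 'science', even though both occur once in D4.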
• Probabilistic IR models are among the oldest, but also among the
best-performing and most widely used IR models
Probability ranking principle
– Rd,q = 0 otherwise
b/ P(R|Y) + P(NR|Y) = 1
P(R|Y) = 1 - P(NR|Y)
P(R|Y) = 1 - 0.3 = 0.7
                                      Relevant   Non-relevant   Total
No. of documents containing term tk   r          n-r            n
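Based on this, the odds that the term occurs in a relevant document, r/(R-r), are divided by the odds that it occurs in a non-relevant document, (n-r)/(N-n-R+r):
$$\frac{r/(R-r)}{(n-r)/(N-n-R+r)}$$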
Now we can calculate the relevance function as:
$$RF(W) = \frac{(r+0.5)(N-n-R+r+0.5)}{(R-r+0.5)(n-r+0.5)}$$
$$RF(W) = \frac{(5+0.5)(20-10-15+5+0.5)}{(15-5+0.5)(10-5+0.5)} = \frac{5.5 \times 0.5}{10.5 \times 5.5} \approx 0.048$$
(the probability that there is a relevant document)
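The arithmetic can be checked with a short sketch. The function name `relevance_weight` is a hypothetical label chosen for this example; the formula and the counts (r = 5, R = 15, n = 10, N = 20) are the ones used in the worked example.

```python
def relevance_weight(r, R, n, N):
    """RF(W) with 0.5 added to each cell of the contingency table.

    r: relevant documents containing the term
    R: relevant documents in total
    n: documents containing the term
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (R - r + 0.5) * (n - r + 0.5)
    return numerator / denominator


rf = relevance_weight(r=5, R=15, n=10, N=20)
print(round(rf, 4))  # 0.0476
```

With these counts the term N - n - R + r is 0, so the 0.5 correction is what keeps the numerator from vanishing.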
BM25
• BM stands for Best Matching
         Relevant   Non-relevant   Total
di = 1   ri         ni - ri        ni
Total    R          N - R          N
Where:
• N = 1,000,000
• No relevance information (r = R = 0)
• ‘information’ occurs in 500,000 documents (n1 = 500,000)
• ‘system’ occurs in 10,000 documents (n2 = 10,000)
BM25(Q,D)=0.353535
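A minimal sketch of a BM25 computation with these collection statistics follows. Several inputs are assumptions, because the slides do not give them: the term frequencies in the document, the document length and average length, the parameters k1 = 1.2 and b = 0.75, the natural logarithm, and the omission of the query-term-frequency factor. For that reason the score it prints is not expected to match the value above.

```python
import math


def rsj_weight(n, N, r=0, R=0):
    """Log of the 0.5-corrected odds ratio from the contingency table."""
    return math.log(
        ((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5))
    )


def bm25(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, N, k1=1.2, b=0.75):
    """Sum the per-term BM25 contributions for one document."""
    # K normalizes term frequency by document length.
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        score += rsj_weight(doc_freq[term], N) * (k1 + 1) * tf / (K + tf)
    return score


N = 1_000_000
doc_freq = {"information": 500_000, "system": 10_000}

# Hypothetical document: these term frequencies and lengths are assumptions.
doc_tf = {"information": 3, "system": 2}
score = bm25(
    query_terms=["information", "system"],
    doc_tf=doc_tf,
    doc_len=100,
    avg_doc_len=100,
    doc_freq=doc_freq,
    N=N,
)
print(score)
```

With no relevance information, a term that occurs in half of the collection ('information') gets a weight of zero, so the score here comes entirely from the rarer term 'system'.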
CHAPTER 4