More than 45% of the pages that we visit on the Web are pages that we have visited before. Browsers support revisits with various tools, including bookmarks, history views and URL auto-completion. However, these tools only support revisits to a small number of frequently and recently visited pages. Several browser plugins and extensions have been proposed to better support the long tail of less frequently visited pages, using recommendation and prediction techniques. In this article, we present a systematic overview of revisitation prediction techniques, dividing them into two main types and several subtypes. We also explain how the individual prediction techniques can be combined into comprehensive revisitation workflows that achieve higher accuracy. We investigate the performance of the most important workflows and provide a statistical analysis of the factors that affect their predictive accuracy. Further, we provide an upper bound for the accuracy of revisitation prediction using an ‘oracle’ that discards non-revisited pages.

As explained in Sect. 4.1.4, this is ensured through normalization.
For more details, see https://developer.mozilla.org/en/The_Places_frecency_algorithm.
A temporal unit is measured in milliseconds and expresses any time interval, ranging from seconds, minutes and hours to days, weeks and months.
SUPRA stands for “SUrfing PRediction FrAmework”. The code is publicly available at http://sourceforge.net/projects/supraproject.
Appendix: Notations and Acronyms
In the following, we summarize the symbols used in this work:
5HR \(\rightarrow \) the 500-requests drift method
AM \(\rightarrow \) an association matrix (order-neutral propagation method)
AR \(\rightarrow \) association rules
CTM \(\rightarrow \) the continuous connectivity transition matrix (propagation method)
DM \(\rightarrow \) the day-model drift method
DTM \(\rightarrow \) the decreasing continuous connectivity transition matrix (propagation method)
ED \(\rightarrow \) the exponential decay ranking method
FR \(\rightarrow \) the Frecency ranking method
HDM \(\rightarrow \) the hybrid day model (ranking method)
HHM \(\rightarrow \) the hybrid hour model (ranking method)
HQM \(\rightarrow \) the hybrid day quarter model (ranking method)
\({\mathbf {I}}_{\mathbf{p}_{\mathbf{i}}}\) \(\rightarrow \) the request indices of a page \(p_i\)
ITM \(\rightarrow \) the increasing continuous connectivity transition matrix (propagation method)
LD \(\rightarrow \) the logarithmic decay ranking method
LRU \(\rightarrow \) the least recently used ranking method
M \(\rightarrow \) the propagation matrix
MFU \(\rightarrow \) the most frequently used ranking method
MM \(\rightarrow \) the month-model drift method
\({\mathbf {P}}\) \(\rightarrow \) a set of Web pages
P \(\rightarrow \) the category of one-step workflows that consist solely of a propagation method
P+D \(\rightarrow \) the category of two-step workflows that combine a propagation method with a drift method
PD \(\rightarrow \) the polynomial decay ranking method
\({\mathbf {R}}\) \(\rightarrow \) a set of page requests corresponding to the navigational activity of a user
R \(\rightarrow \) the category of one-step workflows that consist solely of a ranking method
R+P \(\rightarrow \) the category of two-step workflows that combine a ranking method with a propagation method
R+P+D \(\rightarrow \) the category of three-step workflows that combine a ranking method with a propagation and a drift method
\({\mathbf {S}}\) \(\rightarrow \) a session
STM \(\rightarrow \) the simple connectivity transition matrix (propagation method)
\({\mathbf {T}}_{\mathbf{p}_\mathbf{i}}\) \(\rightarrow \) the request timestamps of a page \(p_i\)
TM \(\rightarrow \) a transition matrix (order-preserving propagation method)
TR \(\rightarrow \) the 1000-requests drift method
WM \(\rightarrow \) the week-model drift method
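
To make the notation concrete, the following is a minimal sketch of how the request indices \({\mathbf {I}}_{\mathbf{p}_{\mathbf{i}}}\) and request timestamps \({\mathbf {T}}_{\mathbf{p}_\mathbf{i}}\) could be derived from a request log \({\mathbf {R}}\), and how two simple orderings in the spirit of MFU and LRU could be computed from them. The input format (a chronological list of `(timestamp_ms, page)` pairs), the function names and the tie-breaking rule are assumptions made for illustration; they are not taken from SUPRA and do not reproduce the exact definitions used in this work.

```python
from collections import defaultdict


def index_requests(requests):
    """Build per-page request indices (I_p) and timestamps (T_p).

    `requests` is a hypothetical chronological list of
    (timestamp_ms, page) pairs standing in for the request set R.
    Request indices start at 1.
    """
    indices = defaultdict(list)
    timestamps = defaultdict(list)
    for i, (ts, page) in enumerate(requests, start=1):
        indices[page].append(i)
        timestamps[page].append(ts)
    return dict(indices), dict(timestamps)


def rank_by_frequency(indices):
    """MFU-style ordering: most requested pages first.

    Ties are broken by the most recent request index (an assumption).
    """
    return sorted(
        indices,
        key=lambda p: (len(indices[p]), indices[p][-1]),
        reverse=True,
    )


def rank_by_recency(indices):
    """LRU-style ordering: most recently requested pages first."""
    return sorted(indices, key=lambda p: indices[p][-1], reverse=True)


# Toy request log; timestamps are in milliseconds.
requests = [(1000, "a"), (2000, "b"), (3000, "a"), (4000, "c")]
indices, timestamps = index_requests(requests)
frequency_ranking = rank_by_frequency(indices)
recency_ranking = rank_by_recency(indices)
```

In this toy log, page `a` is requested twice and therefore leads the frequency-based ordering, while page `c` is the last page requested and leads the recency-based ordering.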
