Department of Computer Science and Engineering, Box 352350 University of Washington, Seattle, WA 98195 (206) 616-1845 Fax: (206) 543-2969
Figure 1: Typical user access logs, these from a computer science web site. Each entry corresponds to a single request
to the server and includes originating machine, time, and URL requested. Note the series of accesses from each of
two users (one from SFSU, one from UMN).
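The per-visitor access series shown in Figure 1 can be recovered from a standard server log with a few lines of code. The sketch below is illustrative only: it assumes Common Log Format entries and treats each originating host as a single visitor (a simplification, since several users may share a host; the host name in the test is made up).

```python
import re
from collections import defaultdict

# One Common Log Format entry: host, identity, user, [time], "GET url ..."
LOG_RE = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET (?P<url>\S+)')

def sessions_by_host(log_lines):
    """Group requested URLs by originating machine, approximating the
    per-user access series shown in Figure 1 (one visitor per host)."""
    visits = defaultdict(list)
    for line in log_lines:
        m = LOG_RE.match(line)
        if m:
            visits[m.group("host")].append(m.group("url"))
    return visits
```

In practice the time field would also be kept, so that long gaps between requests can split one host's accesses into separate visits.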
difficult, but not impossible, questions:

What kinds of generalizations can we draw from user access patterns and what kinds of changes could we make? Suppose we maintain a web site containing information about various automobiles, organized by manufacturer. We observe that visitors who look at the Ford Windstar minivan page also tend to look at the Dodge Caravan and Mazda MPV minivan pages. We might therefore create a new page for minivans, which cuts across the existing manufacturer-based organization and provides a new view of the site.

How do we design a site for adaptivity? We might specifically design parts of the site to be changeable. For example, we might present our users with a "tour guide" (as in [Armstrong et al., 1995]) and have changes to the site be presented as the agent's suggestions. Alternatively, we might annotate our HTML with directives stating where and how changes can be made. Or we may provide semantic information about the entire site, allowing the agent to reason about the relationships between everything, perhaps representing the entire site as a database (see [Fernandez et al., 1997]).

How do we effectively collaborate with a human webmaster to suggest and justify potential adaptations? Suppose the human webmaster is still responsible for the final product. Instead of changing web pages directly, our system might accumulate observations and suggested changes and present them to the webmaster, clearly explaining its observations and justifying the changes it recommends.

How do we move beyond one-shot learning algorithms to web sites that continually improve with experience? Over time, our adaptive web site will accumulate a great deal of data about its users and should be able to use its rich history to continually evolve and improve.

Our department maintains a web site for its introductory computer science course. This site contains schedules, announcements, assignments, and other information important to the hundreds of students who take the course every quarter. Enough information is available that important documents can be hard to find or entirely lost in the clutter. Imagine, however, if the site were able to determine what was important and make that information easiest to find. Important pages would be available from the site's front page. Important links would appear at the top of the page or be highlighted. Timely information would be emphasized, and obsolete information would be quietly moved out of the way.

There are several factors that make this challenge both appropriate and timely for the AI community. First, the growing popularity and complexity of the web underscores the importance of the challenge. Second, virtually all existing web sites are not adaptive, yet data to support the learning process is readily available in web server logs. Clearly, here is an opportunity for AI! Finally, a number of disconnected projects in machine learning [Armstrong et al., 1995], data mining, knowledge representation, plan recognition [Kautz, 1987; Pollack, 1990], and user modeling [Fink et al., 1996] have begun to explore aspects of the problem. Framing the problem explicitly in this paper could help bring these disparate approaches together.

We pose our challenge as a particular task to be accomplished by any means available. Many advances in artificial intelligence, both practical and theoretical, have come about in response to such task-oriented approaches. The quest to build a better chess-playing computer, for example, has led to many advances in search techniques (e.g., [Anantharaman et al., 1990]). The autonomous land vehicle project at CMU [Thorpe, 1990] has resulted in not only a highway-cruising vehicle but also breakthroughs in vision, robotics, and neural networks. The quest to build autonomous software agents has similarly led to both practical and theoretical advances. For example, the Internet Softbot project has yielded both deployed softbots and advances in planning, knowledge representation, and machine learning [Etzioni, 1996].

We believe that the goal of creating self-improving web sites is a similar task: one whose accomplishment will require breakthroughs in different areas of AI. In this paper we discuss possible approaches to this task and how to evaluate the community's progress. In section 2, we present two basic approaches to creating an adaptive
web site. We illustrate both with ongoing research and examples. In section 3, we discuss how to evaluate research on this challenge, discussing practical alternatives as well as open questions. Throughout, we pose Challenge questions intended to suggest research directions and illustrate where the open questions lie.

2 Approaches to adaptive web sites

Sites may be adaptive in two basic ways. First, the site may focus on customization: modifying web pages in real time to suit the needs of individual users. Second, the site may focus on optimization: altering the site itself to make navigation easier for all. We illustrate these two basic approaches with examples drawn from current AI research. Whether we modify our web pages online or offline, we must use information about user access patterns and the structure of our site. Much of this information is available in access logs and in the site's HTML, but this may not be sufficient; we also discuss how to support adaptivity with meta-information: information about page content. Finally, we examine other issues that arise in designing adaptive web sites.

2.1 Customization

Customization is adjusting the site's presentation for an individual user. Customization allows fine-grained improvement, since the interface may be completely tailored to each individual user. One way for a site to respond to particular visitors is to allow manual customization: allowing users to specify display options that are remembered during the entire visit and from one visit to the next. The Microsoft Network (at http://www.msn.com), for example, allows users to create home pages with customized news and information displays. Every time an individual visits her MSN home page, she sees the latest pickings from the site presented according to her customizations.

Path prediction, on the other hand, customizes automatically by attempting to guess where the user wants to go and taking her there more quickly. A path prediction system must answer at least the following questions.

What are we predicting? We may try to predict the user's next step. For example, if we can predict what link on a page a particular user will follow, we might highlight the link or bring it to the top of the page. Alternatively, we may try to predict the user's eventual goal; if we can determine what page at the site a visitor is looking for, we can present it to her immediately.

On what basis do we make predictions? We might use only a particular individual's actions to predict where she will go next. On the other hand, we might generalize from multiple users to gather data more quickly.

What kinds of modifications do we make on the basis of our predictions? We may do as little as highlighting selected links (by making them bold or putting graphics around them, for example) or as much as synthesizing a brand new page that we think the user wants to see.

The WebWatcher [Armstrong et al., 1995] (see http://www.cs.cmu.edu/webwatcher/) learns to predict what links users will follow on a particular page as a function of their specified interests. WebWatcher observes many users over time and attempts to learn, given a user's current page and stated interests, where she will go next. A link that WebWatcher believes you are likely to follow will be highlighted graphically and duplicated at the top of the page. Visitors to a site are asked, in broad terms, what they are looking for. Before they depart, they are asked if they found what they wanted. WebWatcher uses the paths of people who indicated success as examples of successful navigations. If, for example, many people who were looking for "personal home pages" follow the "people" link, then WebWatcher will tend to highlight that link for future visitors with the same goal.

Instead of predicting a user's next action based on the actions of many, we might try to predict the user's ultimate goal based on what she has done so far. Goal recognition [Kautz, 1987; Pollack, 1990] is the problem of identifying, from a series of actions, what an agent is trying to accomplish. Lesh and Etzioni [Lesh and Etzioni, 1995] pose this problem in a domain-independent framework and investigate it empirically in the Unix domain: by watching over a user's shoulder, can we figure out what she is trying to accomplish (and offer to accomplish it for her)? They model user actions as planning operators. Assuming users behave somewhat rationally, they use these actions' precondition/postcondition representation to reason from what a user has done to what she must be trying to do. In the web domain, we observe a visitor's navigation through our site and try to determine what page she is seeking. If we can do this quickly and accurately, we can then offer the desired page immediately.

Challenge: Can we formalize user navigation of the web as a planning process that is amenable to goal recognition? Do user actions on the web carry enough evidence of their purpose?

The AVANTI Project [Fink et al., 1996] (see http://zeus.gmd.de/projects/avanti.html) focuses on dynamic customization based on users' needs and tastes. As with the WebWatcher, AVANTI relies partly on users providing information about themselves when they enter the site. Based on what it knows about the user, AVANTI attempts to predict both the user's eventual goal and her likely next step. AVANTI will prominently present links leading directly to pages it thinks a user will want to see. Additionally, AVANTI will highlight links that accord with the user's interests. AVANTI is illustrated on a hypothetical Louvre Museum web site. For example, when a disabled tourist comes to the site, links regarding disabled access and tourist information are emphasized. AVANTI relies on users providing some information about themselves in an initial dialogue; the site then uses this information to guide its customization throughout the user's exploration of the site. AVANTI also attempts to guess where the user might go based on what she has looked at so far. For example, if our disabled tourist looks at a number of paintings at the site, AVANTI will emphasize paintings links as it continues to serve pages. As with the WebWatcher, we might ask if we can avoid AVANTI's requirement that users explicitly provide information.
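The customization approaches above share a simple statistical core: tally where similar visitors went, and highlight accordingly. The fragment below is a minimal sketch in that spirit, not WebWatcher's or AVANTI's actual algorithm; the class name and interface are ours.

```python
from collections import Counter, defaultdict

class LinkPredictor:
    """Learn which link a visitor is likely to follow, given the page she
    is on and her stated interest (in the spirit of WebWatcher)."""

    def __init__(self):
        # (page, interest) -> Counter of links followed
        self.counts = defaultdict(Counter)

    def observe(self, page, interest, link_followed):
        """Record one step from a visitor who reported a successful visit."""
        self.counts[(page, interest)][link_followed] += 1

    def highlight(self, page, interest, k=3):
        """Return the k links most often followed by similar visitors."""
        return [link for link, _ in self.counts[(page, interest)].most_common(k)]
```

A served page would then graphically emphasize, or duplicate at the top, whatever `highlight` returns for the current visitor.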
2.2 Optimization

Whereas customization focuses on individuals, optimization tries to improve the site as a whole. Instead of making changes for each user, the site learns from all users to make the site easier to use. This approach allows even new users, about whom we know nothing, to benefit from the improvements.

We may view a web site's design as a particular point in the vast space of possible designs. Improving the site, then, corresponds to searching in this space for a "better" design. Assuming we have a way of measuring "better", we may view this as a classical AI search problem. One possible quality metric would be to measure the amount of effort a visitor needs to exert on average in order to find what she is looking for at our site. Effort is defined as a function of the number of links traversed and the difficulty of finding those links. For example, a site whose most popular local page is buried five links away from the front page could be improved by making that page accessible from a readily obvious link on the front page. We can navigate through this space by performing transformations on the site (adding or removing links, rearranging links, creating new web pages, etc.). If we guarantee that each transformation improves the quality of the site, we are performing a hillclimbing search.
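A minimal version of this hillclimbing loop is easy to sketch. The code below is an illustration, not our implemented system: it measures effort as popularity-weighted link distance from the front page (unreachable pages receive an assumed fixed penalty), and greedily keeps only those candidate link additions that lower average effort.

```python
def average_effort(site, front_page, popularity):
    """Average link distance from the front page, weighted by how often
    each page is requested: a crude version of the effort metric."""
    # breadth-first distances from the front page
    dist, frontier = {front_page: 0}, [front_page]
    while frontier:
        nxt = []
        for page in frontier:
            for linked in site.get(page, ()):
                if linked not in dist:
                    dist[linked] = dist[page] + 1
                    nxt.append(linked)
        frontier = nxt
    total = sum(popularity.values())
    # assumed penalty of 10 links for pages not reachable from the front page
    return sum(n * dist.get(p, 10) for p, n in popularity.items()) / total

def hillclimb(site, front_page, popularity, candidate_links):
    """Greedily apply link-addition transformations that reduce effort."""
    best = average_effort(site, front_page, popularity)
    for src, dst in candidate_links:
        site.setdefault(src, set()).add(dst)       # try the transformation
        new = average_effort(site, front_page, popularity)
        if new < best:
            best = new                             # keep it
        else:
            site[src].discard(dst)                 # undo it
    return site, best
```

Because every accepted transformation strictly lowers the metric, the loop is guaranteed to terminate at a local optimum, which is exactly the limitation the Challenge below asks about.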
Challenge: How large is this search space and what is an appropriate search strategy? Can we restructure the space to avoid searching large portions of it?

In [Perkowitz and Etzioni, 1997] we sketch the design of a system with a repertoire of transformations that aim to improve a site's organization; transformations include rearranging and highlighting links as well as synthesizing new pages. Our system learns from common patterns in the user access logs and decides how to transform the site to exploit those patterns and make the site easier to navigate. For example, the web site for our department's introductory computer science course contains a web page for each homework assignment given during the course. After each assignment's due date, a solution set for that assignment is made available. Our system would observe that after an assignment's due date many visitors look at the solution set; in fact, the most recent solution set is one of the most popular pages at the site. This observation would lead the system to promote the solution set by giving it a prominent link on the front page. Promotion (making the link to a page more prominent) is a simple but effective transformation. We have implemented a form of promotion on an existing web site and have found that approximately 10% of our 10,000-15,000 daily page accesses are through automatically generated links; roughly 25% of all visitors click through at least one such link. Of course, we note that promoting a link may be a self-fulfilling prophecy: making a page more prominent may increase its popularity, artificially inflating the site's apparent success at adaptation.

A more ambitious transformation is clustering: synthesizing a brand new web page that contains links to a set of related objects. From available data, the system must infer that a set of pages at the site are related and group them together. This inference might be based on content (e.g., when a number of pages cover the same topic) or on user navigation patterns (e.g., when visitors to one page are particularly likely to visit certain others). As final exams approach, students tend to look at multiple solution sets on each visit. Even though the solution pages are not linked together directly, visitors navigate from one to another (via intervening pages) on their own. This pattern suggests that the solution sets form a meaningful group in our visitors' heads, which does not appear on our web site; solution sets are only linked to from their respective assignment pages. Our system would create a new page with a link to each solution set and make this new page available to visitors to the site. We are currently implementing clustering transformations based on user navigation data.
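A navigation-based clustering transformation of this kind can be approximated with co-occurrence counts. The sketch below is our illustration, not the implementation in progress: pages that appear together in enough visits are connected in a graph, and each connected component becomes a candidate cluster for a new index page. The `min_support` threshold is an assumed parameter.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurring_clusters(visits, min_support=3):
    """Group pages that the same visitors tend to view together.
    `visits` is a list of per-visit page sets."""
    pair_counts = Counter()
    for pages in visits:
        for a, b in combinations(sorted(set(pages)), 2):
            pair_counts[(a, b)] += 1
    # graph over pages whose co-occurrence is frequent enough
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    # connected components = candidate clusters for a synthesized page
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p not in comp:
                comp.add(p)
                seen.add(p)
                stack.extend(graph[p] - comp)
        clusters.append(comp)
    return clusters
```

On the course-site example, the solution-set pages would co-occur in many exam-time visits, fall into one component, and so become the links of a new "solution sets" page.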
2.3 Meta-information

A web site's ability to adapt can be hampered by the limited knowledge about its content and structure provided by HTML. For example, suppose that a page contains a list of links. Is it appropriate to add a new link at the top of the list? The answer depends on the contents of the list: an adaptive site should not add a link to a course's home page to a list of links to faculty home pages; furthermore, if the list is in alphabetical order then a new item can only be added at the appropriate point. Clearly, a site's ability to adapt could be enhanced by providing it with meta-information: information about its content, structure, and organization. In this section, we discuss means of providing an adaptive site with this sort of information.

One way to provide meta-information is to represent the site's content in a formal framework with precisely defined semantics such as a database or a semantic network. This approach is pioneered by the STRUDEL web-site management system [Fernandez et al., 1997], which attempts to separate the information available at a web site from its graphical presentation. Instead of manipulating web sites at the level of pages and links, web sites may be specified using STRUDEL's view-definition language. In addition, web sites may be created and updated by issuing STRUDEL queries. For example, a corporation might create home pages for its employees by merging data from its "manager" and "employee" databases. A page would be created for every person in either database. Furthermore, each manager's page would have links to her employees, and vice-versa.

This approach would facilitate adaptivity because STRUDEL would enable a site to reason about its logical description and detect cases where adaptations would violate the existing logic. Furthermore, an adaptive site could easily transform itself by issuing STRUDEL queries; STRUDEL provides the mechanisms to automatically update the site appropriately. The drawback of the STRUDEL approach is that it requires the site's entire content to be encoded in a set of databases or in wrappers that map web pages and other information sources into STRUDEL. The cost of constructing such wrappers for existing web sites, and particularly for relatively unstructured sites, appears to be high.

A lighter-weight approach is to annotate an existing web site with meta-content tags. In this approach, a formal description of the content coexists with HTML documents. We may choose how much of the site to annotate and how complex our annotations will be. Yet meta-content annotation still facilitates reasoning about the connections between parts of the site and still provides guidance as to where and how to make changes. One approach of this type is Apple's Meta-Content Format (see http://mcf.research.apple.com). MCF is an attempt to establish a standard for meta-content annotation for the web. When a user visits an MCF-enhanced site with an MCF-enabled browser, she can choose to navigate the site in a three-dimensional representation of the site's structure, as determined from the site's MCF annotation. SHOE [Luke et al., 1997] (at http://www.cs.umd.edu/projects/plus/SHOE/) takes a different tack. SHOE is a language for adding simple ontologies to web pages. SHOE adds basic ontological declarations to HTML; a page can refer to a particular ontology and declare classifications for itself and relations to other pages. In their example, a man's home page is annotated with information about him, such as the fact that he is a person, his name, his occupation, and his wife's identity (she has her own home page). SHOE is designed to facilitate the exploration of agents and the workings of search tools, but ontological annotation could also support adaptation.

While lighter-weight than STRUDEL, meta-content tagging also has clear disadvantages. First, because the meta-content annotation is separated from the actual content, it has to be updated manually as the content changes. Second, since the meta-content is attached to existing HTML, it provides no direct support for automatic adaptation; any adaptation must still modify the original HTML.

Each of the approaches described so far requires a fair amount of effort to build and maintain the content descriptions. If we wish only to facilitate adaptation, this effort may be overkill. An alternative that we are actively investigating is to use an extremely lightweight annotation system designed specifically for adaptivity. These annotations would be in the form of directives to the adaptive system telling it where it may (or may not) make changes and what kinds of changes it might make. For example, we might add a list tag to HTML to allow us to describe the elements in a list and how they are ordered. A list might be declared as <list order="unordered">, which tells the system it may reorder the list in any way it chooses. Or a list might be declared <list order="popularity">, in which case the system will draw upon data from access logs to determine how to present the list. A list declared <list order="alphabetical"> or <list order="chronological"> can be modified by additions or deletions so long as its original ordering constraint is preserved.

We present tags of this sort as part of an "Adaptive HTML" language called A-HTML in [Perkowitz and Etzioni, 1997]. Our intention is to extend HTML to a higher level of abstraction, allowing a web designer to describe objects in terms of their time-relevance, organization, and interrelationships. Note that this approach does not require the global establishment of an A-HTML standard; the adaptive site uses a server capable of interpreting A-HTML and translating it into standard HTML at runtime. Only the resulting HTML is served in response to page requests.
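How such a server might translate an A-HTML list into standard HTML can be sketched directly. The function below is hypothetical (A-HTML is only outlined above, and the function name and argument layout are ours): it honors the declared ordering constraint, consulting access-log hit counts when order="popularity".

```python
def render_list(order, items, hits=None):
    """Translate an A-HTML <list> into plain HTML, honoring the declared
    ordering constraint. `items` maps link text to URL; `hits` gives
    access counts per URL from the server log."""
    if order == "popularity":
        # most-requested pages first, per the access logs
        keys = sorted(items, key=lambda k: -(hits or {}).get(items[k], 0))
    elif order == "alphabetical":
        keys = sorted(items)
    else:  # "unordered": the system may choose any order it likes
        keys = list(items)
    entries = ''.join(f'<li><a href="{items[k]}">{k}</a></li>' for k in keys)
    return f'<ul>{entries}</ul>'
```

Only output like this, never the A-HTML itself, would be served in response to page requests.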
2.4 Open Questions

The quest for a self-improving web site raises a number of related questions. An adaptive site will be active twenty-four hours a day, seven days a week. The site will constantly be ingesting and analyzing data, adjusting its concepts and models, and updating its own structure and presentation. Over time, this constant cycle will reflect many hours of experience and refinement. In the past, AI research has focused on single trials and short-lived entities: systems that run their experiments and shut down, to start again the next day with a blank slate. Although such an approach may be applied to the adaptive site challenge, the most intelligent site will surely be one that continually accumulates knowledge about pages, users, content, and itself.

User interface design is difficult enough for human beings to perform well. Yet an adaptive web site will have to take into account all the artistry of good design in its self-improvements. We can limit the scope of the system's ability to change itself, thus ensuring that it cannot do too much harm, but this means we also limit its scope for improvement. On the other hand, giving the system free rein for radical transformation might mean giving it free rein for radical screwup.

Challenge: How do we formalize the concept of good design? How do we limit the potential for harm without overly limiting the potential for good?

We might instead put the AI system in the role of advisor to a human master. Instead of making changes under cover of night, our AI system must now intelligently present suggestions to a human being, complete with explanation and justification.
Such a solution frees us from the problem of changing details without changing design but presents us with a new interface challenge.

Challenge: How does our adaptive web site communicate its suggestions to a webmaster?

3 Evaluation

Although the problem of measuring the quality of a web site design is thorny, we have identified several preliminary approaches. Progress on the design of adaptive web sites will include more sophisticated methods of evaluating a site's usability. We propose a basic metric for how usable a site is: how much effort must a user exert on average in order to find what she wants? As discussed in section 2.2, effort can be defined as a function of the number of links traversed and the difficulty of finding the links on their pages. The standard daily access log may be used to approximately measure user effort.

However, standard log data is not sufficient to know everything about visitor navigation. For example, standard logs do not distinguish between individuals connecting from the same location or record which link a user followed. However, software is available to provide more complete information. WebThreads, for example (see http://www.webthreads.com), allows a site to track an individual user's progress, including both pages visited and links followed. Along with analysis of our site's structure, data from a system like WebThreads is sufficient for us to measure user effort.
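Given per-session traces of this kind, the effort metric can be approximated very simply. The sketch below is an assumption-laden illustration, not WebThreads' API: each session is a list of pages in the order visited, and effort is counted as the number of links followed before first reaching one of an assumed set of target pages.

```python
def session_effort(sessions, target_pages):
    """Estimate average user effort from per-session page traces: the
    number of links a visitor followed before first reaching a page she
    was (apparently) looking for."""
    efforts = []
    for trace in sessions:
        for steps, page in enumerate(trace):
            if page in target_pages:
                efforts.append(steps)
                break
        else:
            efforts.append(len(trace))  # never reached a target page
    return sum(efforts) / len(efforts)
```

Tracking this number before and after a transformation gives a concrete, if crude, test of whether an adaptation actually helped.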
Analysis of our user logs provides much information about how users interact with the site. In addition, we may use controlled tests with subjects. Such tests have the advantage of allowing us to observe users as they interact with the site; we get much more information than is encoded in user access logs. As subjects perform tasks such as finding information, downloading software, or locating documents, we may gather data such as:

- Whether the subject succeeded at the task (or realized it was not solvable).
- How long the subject took to solve the goal.
- How much exploration was required.

Careful observation of test subjects would complement the limited access data we get on all of the site's regular visitors. Of course, we can also rely on intermediate measures such as encouraging users to fill out feedback forms and send e-mail messages.

4 Conclusion

This paper posed the challenge of using AI techniques to radically transform web sites from today's inert collections of HTML pages and hyperlinks to intelligent, evolving entities. Adaptive web sites can make popular pages more accessible, highlight interesting links, connect related pages, and cluster similar documents together. An adaptive web site can perform these self-improvements autonomously or advise a site's webmaster, summarizing access information and making suggestions. The improvements can happen in real time as a visitor is navigating the site, or offline based on observations culled from many visitors.

This paper juxtaposed a number of disconnected projects from knowledge representation, machine learning, and user modeling that are investigating aspects of the problem. We believe that posing the challenge explicitly, in this paper, will help to cross-fertilize existing efforts and alert new researchers to the problem. Success in the next two years will have a broad and highly visible impact on the web and the AI community.

References

[Anantharaman et al., 1990] T. Anantharaman, M. Campbell, and F. Hsu. Singular extensions: adding selectivity to brute-force searching. Artificial Intelligence, 43(1):99-109, 1990.

[Armstrong et al., 1995] R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell. WebWatcher: A learning apprentice for the world wide web. In Working Notes of the AAAI Spring Symposium: Information Gathering from Heterogeneous, Distributed Environments, pp. 6-12, Stanford University, 1995. AAAI Press.

[Etzioni, 1996] O. Etzioni. Moving up the information food chain: softbots as information carnivores. In Proc. 14th Nat. Conf. on AI, 1996.

[Fernandez et al., 1997] M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. System demonstration: STRUDEL, a web-site management system. In ACM SIGMOD Conference on Management of Data, 1997.

[Fink et al., 1996] J. Fink, A. Kobsa, and A. Nill. User-oriented adaptivity and adaptability in the AVANTI project. In Designing for the Web: Empirical Studies. Microsoft Usability Group, Redmond, WA, 1996.

[Kautz, 1987] H. Kautz. A Formal Theory of Plan Recognition. PhD thesis, University of Rochester, 1987.

[Lesh and Etzioni, 1995] N. Lesh and O. Etzioni. A sound and fast goal recognizer. In Proc. 14th Int. Joint Conf. on AI, pp. 1704-1710, 1995.

[Luke et al., 1997] S. Luke, L. Spector, D. Rager, and J. Hendler. Ontology-based web agents. In Proceedings of the First International Conference on Autonomous Agents, 1997.

[Perkowitz and Etzioni, 1997] M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. Technical Report UW-CSE-97-03-01, University of Washington, Department of Computer Science and Engineering, March 1997.

[Pollack, 1990] M. Pollack. Plans as complex mental attitudes. In P. Cohen, J. Morgan, and M. Pollack, eds., Intentions in Communication, pp. 77-101. MIT Press, Cambridge, MA, 1990.

[Thorpe, 1990] C. Thorpe, ed. Vision and Navigation: the Carnegie Mellon Navlab. Kluwer Academic Publishing, Boston, MA, 1990.