Knowledge Engineering
In which we discuss how to pick the right tool for the job, build an intelligent system and turn data into knowledge.
Figure 9.1 The process of knowledge engineering
The process of knowledge engineering is illustrated in Figure 9.1. Knowledge engineering, despite its name, is still more art than engineering, and the real process of developing an intelligent system is not as neat and clean as Figure 9.1 might suggest. Although the phases are shown in sequence, they usually overlap considerably. The process itself is highly iterative, and at any time we may engage in any of the development activities. Let us now examine each phase in more detail.
9.1.1
Problem assessment
During this phase we determine the problem's characteristics, identify the project's participants, specify the project's objectives and determine what resources are needed for building the system. To characterise the problem, we need to determine the problem type, the input and output variables and their interactions, and the form and content of the solution. The first step is to determine the problem type. Typical problems addressed by intelligent systems are illustrated in Table 9.1. They include diagnosis, selection, prediction, classification, clustering, optimisation and control. The problem type influences our choice of tool for building an intelligent system. Suppose, for example, we develop a system to detect faults in an electric circuit and guide the user through the diagnostic process. This problem clearly belongs to diagnosis. Domain knowledge in such problems can often be represented by production rules, and thus a rule-based expert system might be the right candidate for the job. Of course, the choice of a building tool also depends on the form and content of the solution. For example, systems that are built for diagnostic tasks usually need explanation facilities, that is, the means that enable them to justify their solutions. Such facilities are an essential component of any expert system, but are not available in neural networks. On the other hand, a neural network might be a good choice for classification and clustering problems, where the results are often more important than understanding the system's reasoning process.
Table 9.1 Typical problems addressed by intelligent systems
Problem type     Description
Diagnosis        Inferring malfunctions of an object from its behaviour and recommending solutions.
Selection        Recommending the best option from a list of possible alternatives.
Prediction       Predicting the future behaviour of an object from its behaviour in the past.
Classification   Assigning an object to one of the defined classes.
Clustering       Dividing a heterogeneous group of objects into homogeneous subgroups.
Optimisation     Improving the quality of solutions until an optimal one is found.
Control          Governing the behaviour of an object to meet specified requirements in real time.
The next step in the problem assessment is to identify the participants in the project. Two critical participants in any knowledge engineering project are the knowledge engineer (a person capable of designing, building and testing an intelligent system) and the domain expert (a knowledgeable person capable of solving problems in a specific area or domain). Then we specify the project's objectives, such as gaining a competitive edge, improving the quality of decisions, reducing labour costs, and improving the quality of products and services. Finally, we determine what resources are needed for building the system. They normally include computer facilities, development software, knowledge and data sources (human experts, textbooks, manuals, web sites, databases and examples) and, of course, money.
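The circuit-fault diagnosis example above can be made concrete with a minimal sketch of production rules and a very simple explanation facility. It rests on stated assumptions: the `Rule` class, the `forward_chain` function and the rule and fact wording are hypothetical illustrations, not the rules of any real expert system shell or circuit. The "explanation" is only the trace of rules that fired, which is the kind of justification the text notes a neural network cannot provide.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """A production rule: IF all conditions are known facts THEN add the conclusion."""
    name: str
    conditions: Tuple[str, ...]
    conclusion: str


def forward_chain(facts: Set[str], rules: List[Rule]) -> Tuple[Set[str], List[str]]:
    """Fire rules repeatedly until no new fact can be derived.

    Returns all known facts plus a trace of the fired rules, which serves as
    a rudimentary explanation of how each conclusion was reached.
    """
    known = set(facts)
    trace: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and all(c in known for c in rule.conditions):
                known.add(rule.conclusion)
                trace.append(
                    f"{rule.name}: IF {' AND '.join(rule.conditions)} THEN {rule.conclusion}"
                )
                changed = True
    return known, trace


# Hypothetical rules for a lamp circuit; a real system would get them from the domain expert.
RULES = [
    Rule("R1", ("lamp does not light", "supply voltage present"), "fault in lamp circuit"),
    Rule("R2", ("fault in lamp circuit", "fuse is blown"), "replace the fuse"),
    Rule("R3", ("fault in lamp circuit", "fuse is intact"), "check the lamp"),
]

observed = {"lamp does not light", "supply voltage present", "fuse is blown"}
conclusions, explanation = forward_chain(observed, RULES)
for step in explanation:
    print(step)
```

With these observations R1 fires and then R2, so the printed trace leads from the symptoms to the advice "replace the fuse", while R3 never fires because "fuse is intact" is not a known fact.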
9.1.2
Data and knowledge acquisition
During this phase we obtain further understanding of the problem domain by collecting and analysing both data and knowledge, and making key concepts of the system's design more explicit. Data for intelligent systems are often collected from different sources, and thus can be of different types. However, a particular tool for building an intelligent system requires a particular type of data. Some tools deal with continuous variables, while others need to have all variables divided into several ranges, or to be normalised to a single range, say from 0 to 1. Some handle symbolic (textual) data, while others use only numerical data. Some tolerate imprecise and noisy data, while others require only well-defined, clean data. As a result, the data must be transformed, or massaged, into a form suitable for a particular tool. Whichever tool we choose, however, there are three important issues that must be resolved before massaging the data (Berry and Linoff, 1997).
The first issue is incompatible data. Often the data we want to analyse store text in EBCDIC coding and numbers in packed decimal format, while the tools we want to use for building intelligent systems store text in the ASCII code and numbers as integers or as single- or double-precision floating-point values. This issue is normally resolved with data transport tools that automatically produce the code for the required data transformation.
The second issue is inconsistent data. Often the same facts are represented differently in different databases. If these differences are not spotted and resolved in time, we might find ourselves, for example, analysing consumption patterns of carbonated drinks using data that do not include Coca-Cola just because they were stored in a separate database.
The third issue is missing data. Actual data records often contain blank fields. Sometimes we might throw such incomplete records away, but normally we would attempt to infer some useful information from them.
In many cases, we can simply fill the blank fields in with the most common or average values. In other cases, the fact that a particular field has not been filled in might itself provide us with very useful information. For example, in a job application form, a blank field for a business phone number might suggest that an applicant is currently unemployed.
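The two ways of treating blank fields described above, filling them with an average or most common value and keeping the blank as information in its own right, can be sketched in Python as follows. The `handle_blanks` function, the strategy names and the applicant records are hypothetical; blanks are assumed to be stored as `None`, and every field filled by "mean" or "mode" is assumed to have at least one observed value.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List


def handle_blanks(records: List[Dict[str, Any]], strategy: Dict[str, str]) -> List[Dict[str, Any]]:
    """Return new records with blank (None) fields handled per field.

    strategy maps a field name to:
      "mean" - fill blanks with the average of the observed values
      "mode" - fill blanks with the most common observed value
      "flag" - keep the blank but add a '<field>_blank' True/False field,
               because the blank itself may carry information
    """
    fill_values: Dict[str, Any] = {}
    for field, how in strategy.items():
        if how == "flag":
            continue
        # Assumes at least one observed value: mean() raises StatisticsError
        # and indexing most_common() raises IndexError on an empty list.
        observed = [r[field] for r in records if r.get(field) is not None]
        if how == "mean":
            fill_values[field] = mean(observed)
        elif how == "mode":
            fill_values[field] = Counter(observed).most_common(1)[0][0]
        else:
            raise ValueError(f"unknown strategy {how!r} for field {field!r}")

    result = []
    for r in records:
        new = dict(r)  # copy, so the original records stay untouched
        for field, how in strategy.items():
            blank = r.get(field) is None
            if how == "flag":
                new[field + "_blank"] = blank
            elif blank:
                new[field] = fill_values[field]
        result.append(new)
    return result


# Hypothetical job application records.
applicants = [
    {"age": 34, "city": "Hobart", "business_phone": "555-0101"},
    {"age": None, "city": "Hobart", "business_phone": None},
    {"age": 46, "city": None, "business_phone": "555-0102"},
]
cleaned = handle_blanks(applicants, {"age": "mean", "city": "mode", "business_phone": "flag"})
for row in cleaned:
    print(row)
```

Adding a separate `_blank` field rather than overwriting the value lets a later tool treat "no business phone given" as an input of its own, while the average and most-common fills keep otherwise useful records in the data set.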
Our choice of the system-building tool depends on the acquired data. As an example, consider the problem of estimating the market value of a property based on its features. This problem can be handled by both expert system and neural network technologies. Therefore, before deciding which tool to apply, we should investigate the available data. If, for instance, we can obtain recent sale prices for houses throughout the region, we might train a neural network by using examples of previous sales rather than develop an expert system using the knowledge of an experienced appraiser. The task of data acquisition is closely related to the task of knowledge acquisition. In fact, we acquire some knowledge about the problem domain while collecting the data.
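Before examples of previous sales can be used to train a neural network, the data usually have to be massaged. As noted earlier, some tools expect every variable normalised to a single range such as 0 to 1, while others need it divided into a few ranges. A minimal Python sketch of both transformations follows; the floor-area values, range boundaries and labels are hypothetical, and min-max scaling is only one common way of normalising.

```python
from bisect import bisect_left
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretise(values: Sequence[float], upper_bounds: Sequence[float],
               labels: Sequence[str]) -> List[str]:
    """Assign each value a range label.

    upper_bounds must be sorted; value v gets labels[i] where i is the first
    bound with v <= upper_bounds[i], or the last label if v exceeds them all.
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("need exactly one more label than upper bounds")
    return [labels[bisect_left(upper_bounds, v)] for v in values]


# Hypothetical floor areas (square metres) of houses in a sales data set.
floor_areas = [90.0, 120.0, 150.0, 210.0]
print(min_max_normalise(floor_areas))  # [0.0, 0.25, 0.5, 1.0]
print(discretise(floor_areas, [100.0, 180.0], ["small", "medium", "large"]))
```

In practice the minimum and maximum would be taken from the training examples and reused unchanged for new properties, so that a new house is scaled the same way as the sales the network learned from.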
9.1.3
Development of a prototype system
This phase involves creating an intelligent system, or rather a small version of it, and testing it with a number of test cases.
What is a prototype?
A prototype system can be defined as a small version of the final system. It is designed to test how well we understand the problem, or in other words to make sure that the problem-solving strategy, the tool selected for building a system, and the techniques for representing acquired data and knowledge are adequate to the task. It also provides us with an opportunity to persuade the sceptics and, in many cases, to actively engage the domain expert in the system's development. After choosing a tool, massaging the data and representing the acquired knowledge in a form suitable for that tool, we design and then implement a prototype version of the system. Once it is built, we examine (usually together with the domain expert) the prototype's performance by testing it with a variety of test cases. The domain expert takes an active part in testing the system, and as a result becomes more involved in the system's development.
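Testing the prototype with the domain expert can be organised as a small harness in which each test case pairs an input with the answer the expert would give; the disagreements are what the expert and the knowledge engineer then examine together. The sketch below assumes the prototype is any Python callable; `run_test_cases`, the toy `prototype` rule and its test cases are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

TestCase = Tuple[Any, Any]  # (input, answer expected by the domain expert)


def run_test_cases(system: Callable[[Any], Any],
                   cases: Sequence[TestCase]) -> Tuple[float, List[Tuple[Any, Any, Any]]]:
    """Run every case through the system.

    Returns the fraction of cases on which the system agrees with the expert
    and a list of (input, expected, actual) for each disagreement.
    """
    if not cases:
        raise ValueError("at least one test case is required")
    disagreements = []
    for inputs, expected in cases:
        actual = system(inputs)
        if actual != expected:
            disagreements.append((inputs, expected, actual))
    agreement = 1 - len(disagreements) / len(cases)
    return agreement, disagreements


# Hypothetical prototype: labels a house "large" if its floor area exceeds 150 m2.
def prototype(floor_area: float) -> str:
    return "large" if floor_area > 150 else "small"


# Hypothetical cases supplied by the expert.
expert_cases = [(120, "small"), (200, "large"), (150, "large")]
agreement, disagreements = run_test_cases(prototype, expert_cases)
print(f"agreement: {agreement:.2f}")
for inputs, expected, actual in disagreements:
    print(f"input {inputs}: expert says {expected!r}, prototype says {actual!r}")
```

Here the boundary case of exactly 150 m2 exposes a disagreement, which is the kind of finding that prompts the knowledge engineer to revisit how the expert's knowledge was represented.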
9.1.4
Development of a complete system
As soon as the prototype begins functioning satisfactorily, we can assess what is actually involved in developing a full-scale system. We develop a plan, schedule and budget for the complete system, and also clearly define the system's performance criteria. The main work at this phase is often associated with adding data and knowledge to the system. If, for example, we develop a diagnostic system, we might need to provide it with more rules for handling specific cases. If we develop a prediction system, we might need to collect additional historical examples to make predictions more accurate. The next task is to develop the user interface, the means of delivering information to a user. The user interface should make it easy for users to obtain any details they need. Some systems may be required to explain their reasoning process and justify their advice, analysis or conclusion, while others need to present their results in graphical form. The development of an intelligent system is, in fact, an evolutionary process. As the project proceeds and new data and knowledge are collected and added to the system, its capability improves and the prototype gradually evolves into a final system.
9.1.5
Evaluation and revision of the system
Intelligent systems, unlike conventional computer programs, are designed to solve problems that quite often do not have clearly defined right and wrong solutions. To evaluate an intelligent system is, in fact, to ensure that the system performs the intended task to the user's satisfaction. A formal evaluation of the system is normally accomplished with test cases selected by the user. The system's performance is compared against the performance criteria that were agreed upon at the end of the prototyping phase. The evaluation often reveals the system's limitations and weaknesses, so the system is revised and the relevant development phases are repeated.
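Comparing the system's performance against the criteria agreed at the end of prototyping is essentially a checklist. The following is a minimal Python sketch, assuming each criterion can be expressed as a measured number and a threshold; the `Criterion` class, the metric names and the threshold values are hypothetical. An empty result would mean the agreed criteria are met, while any entry points to a limitation that calls for revision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    """A performance criterion agreed with the user."""
    metric: str
    threshold: float
    higher_is_better: bool = True


def unmet_criteria(measured: Dict[str, float], criteria: List[Criterion]) -> List[str]:
    """Describe every criterion that the measured results fail or do not cover."""
    failures = []
    for c in criteria:
        value = measured.get(c.metric)
        if value is None:
            failures.append(f"{c.metric}: not measured")
        elif c.higher_is_better and value < c.threshold:
            failures.append(f"{c.metric}: {value} is below {c.threshold}")
        elif not c.higher_is_better and value > c.threshold:
            failures.append(f"{c.metric}: {value} is above {c.threshold}")
    return failures


# Hypothetical criteria and results from the user's test cases.
criteria = [
    Criterion("agreement_with_expert", 0.9),
    Criterion("mean_response_seconds", 2.0, higher_is_better=False),
]
results = {"agreement_with_expert": 0.85, "mean_response_seconds": 1.2}

for problem in unmet_criteria(results, criteria):
    print(problem)  # agreement_with_expert: 0.85 is below 0.9
```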
9.1.6
Integration and maintenance of the system
This is the final phase in developing the system. It involves integrating the system into the environment where it will operate and establishing an effective maintenance program. By integrating we mean interfacing a new intelligent system with existing systems within an organisation and arranging for technology transfer. We must make sure that the user knows how to use and maintain the system. Intelligent systems are knowledge-based systems, and because knowledge evolves over time, we need to be able to modify the system.
… importantly, adopting new intelligent technologies is becoming problem-driven, rather than curiosity-driven as it often was in the past. Nowadays an organisation addresses its problems with appropriate intelligent tools. In the following sections, we discuss applications of different tools for solving specific problems.