Visual Analytics: Definition, Process and Challenges
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 154–175, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Information overload wastes time and money and costs scientific and industrial
opportunities, because we still lack the ability to deal properly with enormous
data volumes. People in business and in private life alike, whether
decision-makers, analysts, engineers, or emergency response teams, are often
confronted with massive amounts of disparate, conflicting and dynamic
information drawn from multiple heterogeneous sources. We want to exploit,
simply and effectively, the hidden opportunities and knowledge resting in
unexplored data sources.
In many application areas success depends on the right information being
available at the right time. Nowadays, the acquisition of raw data is no longer
the driving problem; the problem is to identify methods and models that can
turn the data into reliable and provable knowledge. Any technology that claims
to overcome the information overload problem has to provide answers to the
following questions:
– Who or what defines the “relevance of information” for a given task?
– How can appropriate procedures in a complex decision making process be
identified?
– How can the resulting information be presented in a decision- or task-oriented
way?
– What kinds of interaction can facilitate problem solving and decision mak-
ing?
With every new “real-life” application, procedures are put to the test, possibly
under circumstances completely different from the ones under which they were
established. Awareness of the problem of how to understand and analyse our data
has increased greatly in the last decade. Even as we implement more powerful
tools for automated data analysis, we still face the problem of understanding
and “analysing our analyses”: fully automated search, filter and analysis only
work reliably for well-defined and well-understood problems. The path from data
to decision is typically quite complex. Although fully automated data
processing methods represent the knowledge of their creators, they lack the
ability to communicate that knowledge. This ability is crucial: if decisions
that emerge from the results of these methods turn out to be wrong, it is
especially important to examine the procedures.
The overarching driving vision of visual analytics is to turn the information
overload into an opportunity: just as information visualization has changed our
view of databases, the goal of Visual Analytics is to make our way of processing
data and information transparent for an analytic discourse. The visualization of
these processes will provide the means of communicating about them, instead
of being left with only the results. Visual Analytics will foster the
constructive evaluation, correction and rapid improvement of our processes and
models and, ultimately, the improvement of our knowledge and our decisions
(see Figure 1).
On a grand scale, visual analytics solutions provide technology that combines
the strengths of human and electronic data processing. Visualization becomes
the medium of a semi-automated analytical process, where humans and machines
cooperate using their respective distinct capabilities for the most effective results.
Fig. 1. Tight integration of visual and automatic data analysis methods with database
technology for a scalable interactive decision support.
The user has to be the ultimate authority in giving the direction of the analysis
along his or her specific task. At the same time, the system has to provide
effective means of interaction to concentrate on this specific task. On top of
that, in many applications different people work along the path from data to
decision. A visual representation will sketch this path and provide a reference
for their collaboration across different tasks and abstraction levels.
The diversity of these tasks cannot be tackled with a single theory. Visual
analytics research is highly interdisciplinary and combines various related
research areas such as visualization, data mining, data management, data fusion,
statistics and cognitive science (among others). Visualization has to
continuously challenge the perception, held in many of the applying sciences,
that visualization is not a scientific discipline in its own right. Even where
there is awareness that scientific analyses and results must be visualized in
one way or another, this often leads to ad hoc solutions by application
scientists, which rarely match the state of the art in interactive visualization
science, much less the full complexity of the problems. In fact, all related
research areas in the context of visual analytics research conduct rigorous,
serious science, each in a vibrant research community. Increasing the awareness
of their work and its implications for visual analytics research clearly emerges
as one main goal of the international visual analytics community (see Figure 2).
Because visual analytics research can be regarded as an integrating discipline,
application-specific research areas should contribute their existing procedures
and models. Emerging from highly application-oriented research, dispersed
research communities have worked on specific solutions using the repertoire and
standards of their specific fields. The requirements of visual analytics
introduce new dependencies between these fields.
Fig. 2. Visual analytics integrates scientific disciplines to improve the division of labor
between human and machine.
Fig. 3. Visual Analytics integrates Scientific and Information Visualization with core
adjacent disciplines: Data management and analysis, spatio-temporal data, and human
perception and cognition. Successful Visual Analytics research also depends on the
availability of appropriate infrastructure and evaluation facilities.
3.1 Visualization
Visualization has emerged as a new research discipline during the last two dec-
ades. It can be broadly classified into Scientific and Information Visualization.
In Scientific Visualization, the data entities to be visualized are typically 3D
geometries or can be understood as scalar, vectorial, or tensorial fields with ex-
plicit references to time and space. A survey of current visualization techniques
can be found in [22,35,23]. Often, 3D scalar fields are visualized by isosurfaces or
semi-transparent point clouds (direct volume rendering) [15]. To this end,
methods based on optical emission or absorption models are used, which visualize
the volume by ray-tracing or projection. In recent years, significant work has
also focused on the visualization of complex 3-dimensional flow data relevant,
e.g., in aerospace engineering [40]. While current research has focused mainly
on the efficiency of visualization techniques to enable interactive exploration,
methods that automatically derive relevant visualization parameters are
increasingly coming into the focus of research. Interaction techniques such as
focus&context [28] are also gaining importance in scientific visualization.
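As an illustration of the emission and absorption idea behind direct volume
rendering, the following minimal sketch composites the samples along a single
viewing ray from front to back. It is not taken from the cited literature; the
function name composite_ray, the (emitted intensity, opacity) sample format and
the early-termination threshold are assumptions chosen for this example.

```python
from typing import Sequence, Tuple


def composite_ray(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Front-to-back compositing of (emitted_intensity, opacity) samples on one ray.

    Returns (accumulated_intensity, accumulated_opacity).
    The sample format and the 0.999 cut-off are illustrative assumptions.
    """
    intensity = 0.0
    opacity = 0.0
    for emitted, alpha in samples:
        # Light emitted here is attenuated by everything already in front of it.
        weight = (1.0 - opacity) * alpha
        intensity += weight * emitted
        opacity += weight
        if opacity >= 0.999:  # early ray termination: the ray is effectively opaque
            break
    return intensity, opacity


# Two half-transparent samples: the second contributes only a quarter of its weight.
print(composite_ray([(1.0, 0.5), (0.5, 0.5)]))  # (0.625, 0.75)
```

Samples behind an opaque one contribute nothing, which is why the loop can stop
early without changing the result noticeably.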
Information Visualization during the last decade has developed methods
for the visualization of abstract data where no explicit spatial references are
given [38,8,24,41]. Typical examples include business data, demographics data,
network graphs and scientific data from, e.g., molecular biology. The data
considered often comprise hundreds of dimensions and have no natural mapping to
display space, which renders standard visualization techniques such as (x, y)
plots, line charts and bar charts ineffective. Therefore, novel visualization
techniques are being developed, e.g., Parallel Coordinates and their numerous
extensions [20], Treemaps [36], and Glyph-based [17] and Pixel-based [25]
visual data representations. Data with inherent network structure may be
visualized using graph-based approaches. In many Visualization application
areas, the typically huge volumes of data require the appropriate use of
automatic data analysis techniques such as clustering or classification as
preprocessing prior to visualization. Research in this direction is just
emerging.
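Clustering as a preprocessing step can be sketched as follows: instead of
drawing every record, one draws a small number of cluster centres together with
the cluster sizes. The sketch uses plain k-means (Lloyd's algorithm) purely as
an example; the text does not prescribe a particular clustering method, and
seeding the centres with the first k points is a simplifying assumption.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain k-means; the first k points seed the centroids (illustrative choice)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical records: two groups of three points each.
records = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centres, assignment = kmeans(records, k=2)
sizes = [assignment.count(c) for c in range(2)]
# A display would now show len(centres) marks, sized by `sizes`, instead of all records.
```

The reduction from records to centres is what makes the subsequent display
readable; the choice of k and of the distance function remains a modelling
decision for the analyst.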
technology. But the availability of heterogeneous data requires not only the
mapping of database schemata but also the cleaning and harmonization of
uncertain and missing data in the volumes of heterogeneous data. Modern
applications require such intelligent data fusion to be feasible in near
real-time and as automatically as possible [32]. New forms of information
sources such as data streams [11], sensor networks [30] or automatic extraction
of information from large document collections (e.g., text, HTML) result in a
difficult data analysis problem, support for which is currently a focus of
database research [43].
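The combination of schema mapping and harmonization of missing values can be
illustrated with a small sketch that merges records from two sources. It is
only a toy example under stated assumptions: the function fuse, the attribute
names and the rule "keep the first source's value and report the conflict" are
hypothetical and do not reproduce the method of [32].

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def fuse(source_a: List[Record], source_b: List[Record],
         schema_map_b: Dict[str, str], key: str) -> Tuple[List[Record], List[tuple]]:
    """Merge two record lists; returns (fused_records, conflicts)."""
    # Step 1 (schema): rename source B's attributes to source A's names.
    renamed_b = [{schema_map_b.get(name, name): value for name, value in rec.items()}
                 for rec in source_b]
    # Step 2 (tuple): records with the same key describe the same entity.
    merged: Dict[object, Record] = {rec[key]: dict(rec) for rec in source_a}
    conflicts = []
    for rec in renamed_b:
        target = merged.setdefault(rec[key], {})
        # Step 3 (value): fill missing values, report contradicting ones.
        for name, value in rec.items():
            current = target.get(name)
            if current is None:
                target[name] = value
            elif value is not None and value != current:
                conflicts.append((rec[key], name, current, value))
    return list(merged.values()), conflicts


# Hypothetical sources with different attribute names.
a = [{"id": 1, "name": "Ada", "city": None}, {"id": 3, "name": "Cy", "city": "Rome"}]
b = [{"ident": 1, "fullname": "Ada", "town": "London"},
     {"ident": 3, "fullname": "Cy", "town": "Roma"},
     {"ident": 2, "fullname": "Bob", "town": "Paris"}]
fused, conflicts = fuse(a, b, {"ident": "id", "fullname": "name", "town": "city"}, "id")
```

Reporting conflicts instead of silently resolving them keeps the analyst in
control of which source to trust.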
The relationship between Data Management, Data Analysis and Visualization
is characterized by the fact that Data Management techniques increasingly
rely on intelligent data analysis techniques, as well as on interaction and
visualization, to arrive at optimal results. On the other hand, modern database
systems provide the input data sources which are to be visually analyzed.
regarding the main data types and user tasks [2] to be supported are highly
desirable for shaping visual analytics research. A common understanding of data
and problem dimensions and structure, and acceptance of evaluation standards,
will make research results more comparable, optimizing research productivity.
There is also an obvious need to build repositories of available analysis and
visualization algorithms, which researchers can build upon in their work without
having to re-implement already proven solutions.
How to assess the value of visualization is a topic of lively debate [42,33]. A
common ground that can be used to position and compare future developments
in the field of data analysis is needed. The current diversification and dispersion
of visual analytics research and development resulted from its focus on specific
application areas. While this approach may suit the requirements of each of
these applications, a more rigorous and overall scientific perspective will lead to
a better understanding of the field and a more effective and efficient development
of innovative methods and techniques.
3.7 Sub-communities
Spatio-Temporal Data: While many different data types exist, one of the
most prominent and ubiquitous data types is data with references to time and
space. The importance of this data type has been recognized by a research
community which formed around spatio-temporal data management and
analysis [14]. Geospatial data research considers data with references in the
real world, coming from, e.g., geographic measurements, GPS position data, and
remote sensing applications. Finding spatial relationships and patterns among
these data is of special interest, requiring the development of appropriate
management, representation and analysis functions. For example, the development
of efficient data structures and the definition of distance and similarity
functions are in the focus of research. Visualization often plays a key role in
the successful analysis of geospatial data [6,26].
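One widely used distance function for positions given as latitude and
longitude is the great-circle (haversine) distance. The text does not name a
specific function, so the following sketch is only one possible choice; the
spherical Earth model and the mean radius of 6371 km are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# A quarter of the equator: roughly 10007.5 km on the assumed sphere.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Such a function can serve directly as the distance measure in clustering or
nearest-neighbour queries over GPS position data.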
In temporal data, the data elements can be regarded as a function of time.
Important analysis tasks here include the identification of patterns (either
linear or periodic), trends and correlations of the data elements over time.
Application-dependent analysis functions and similarity metrics have been
proposed in fields such as finance, science, and engineering. Again,
visualization of time-related data is important to arrive at good analysis
results [1].
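A simple way to look for periodic patterns is to fold the time stamps onto a
cycle and count events per position in that cycle. The sketch below does this
for the weekly and the daily cycle; the function name cyclic_histograms and the
sample time stamps are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def cyclic_histograms(timestamps: Iterable[datetime]) -> Tuple[List[int], List[int]]:
    """Count events per weekday (Monday=0 .. Sunday=6) and per hour of day (0 .. 23)."""
    by_weekday: Counter = Counter()
    by_hour: Counter = Counter()
    for ts in timestamps:
        by_weekday[ts.weekday()] += 1
        by_hour[ts.hour] += 1
    return [by_weekday[d] for d in range(7)], [by_hour[h] for h in range(24)]


# Hypothetical events: two Mondays and one Tuesday in January 2008.
events = [datetime(2008, 1, 7, 8, 30), datetime(2008, 1, 8, 8, 0),
          datetime(2008, 1, 14, 18, 0)]
weekly, daily = cyclic_histograms(events)
```

Peaks in either histogram hint at a periodic pattern, which can then be
inspected visually as a bar chart over the respective cycle.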
The analysis of data with references both in space and in time is a chal-
lenging research topic. Major research challenges include [4]: scale, as it is often
necessary to consider spatio-temporal data at different spatio-temporal scales;
the uncertainty of the data as data are often incomplete, interpolated, collected
at different times, or based upon different assumptions; complexity of geograph-
ical space and time, since in addition to metric properties of space and time
and topological/temporal relations between objects, it is necessary to take into
account the heterogeneity of the space and structure of time; and complexity of
spatial decision making processes, because a decision process may involve
heterogeneous actors with different roles, interests, levels of knowledge of the problem
domain and the territory.
Network and Graph Data: Graphs are flexible and powerful mathematical tools
to model real-life situations. They map naturally to transportation networks
and electric power grids, and they are also used as artifacts to study
complex data such as observed interactions between people, or induced
interactions between various biological entities. Graphs are successful at
turning semantic proximity into topological connectivity, making it possible to
address issues using algorithmics and combinatorial analysis.
Graphs appear as essential modeling and analytical objects, and as effective
visual analytics paradigms. Major research challenges are to produce scalable
analytical methods to identify key components both structurally and visually.
Efforts are needed to design processes capable of dealing with large datasets
while producing readable and usable graphical representations that allow proper
user interaction. Special efforts are required to deal with dynamically changing
networks, in order to assess structural changes at various scales.
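Two elementary structural analyses, finding connected components and the
most highly connected nodes, can be sketched as follows. They stand in for the
much more scalable methods the text calls for; the function names and the
sample edge list are assumptions made for this example.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets built from an edge list."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Breadth-first search from every unvisited node; largest component first."""
    seen: Set[Hashable] = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def highest_degree(graph: Graph, n: int = 1) -> List[Hashable]:
    """The n nodes with the most neighbours (a crude notion of 'key' nodes)."""
    return sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:n]


# Hypothetical network: a small star around "a" plus a separate pair.
g = build_graph([("a", "b"), ("a", "c"), ("a", "d"), ("x", "y")])
parts = connected_components(g)
hubs = highest_degree(g)
```

A display could draw each component separately and emphasise the hub nodes,
which reduces clutter before any layout algorithm is applied.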
Fig. 4. The sense-making loop for Visual Analytics based on the simple model of
visualization by van Wijk [42].
5 Application Challenges
Visual Analytics is a highly application oriented discipline driven by practical
requirements in important domains. Without attempting a complete survey over
all possible application areas, we sketch the potential applicability of Visual
Analytics technology in a few key domains.
In the Engineering domain, Visual Analytics can help to shorten
development time for products, materials, tools and production methods by offering
more effective, intelligent access to the wealth of complex information resulting
from prototype development, experimental test series, customers’ feedback, and
many other performance metrics. One key goal of applied Visual Analytics in
the engineering domain will be the analysis of the complexity of the production
systems in correlation with the achieved output, for an efficient and effective
improvement of the production environments.
Financial Analysis is a prototypical promising application area for Visual
Analytics. Analysts in this domain are confronted with streams of heterogeneous
information from different sources available at high update rates, and of varying
6 Technical Challenges
The primary goal of Visual Analytics is the analysis of vast amounts of data to
identify and visually distill the most valuable and relevant information content.
The visual representation should reveal structural patterns and relevant data
properties for easy perception by the analyst. A number of key requirements
need to be addressed by advanced Visual Analytics solutions. We next outline
important scientific challenges in this context.
Fig. 5. A visual display of a large amount of position records is unreadable and not
suitable for analysis.
Fig. 6. Positions of stops have been extracted from the database. By means of cluster-
ing, frequently visited places have been detected.
Fig. 7. The temporal histograms show the distribution of the stops in the frequently
visited places (Figure 6) with respect to the weekly (left) and daily (right) cycles.
Fig. 8. A result of clustering and summarization of movement data: the routes between
the significant places.
8 Conclusions
The problems addressed by Visual Analytics are generic. Virtually all sciences
and many industries rely on the ability to identify methods and models that
can turn data into reliable and provable knowledge. Ever since the dawn of
modern science, researchers have needed to find methodologies to create new
hypotheses, to compare them with alternative hypotheses, and to validate their
results. In a collaborative environment this process involves a large number of
specialized people, each with a different educational background. The ability
to communicate results to peers will become crucial for scientific discourse.
Currently, no technological approach can claim to give answers to all three
key questions that have been outlined in the first section, regarding the
– relevance of a specific piece of information
– adequacy of data processing methods and validity of results
– acceptability of the presentation of results for a given task
Visual Analytics research does not focus on specific methods to address these
questions with a single “best practice”. Each specific domain contributes a
repertoire of approaches to initiate an interdisciplinary creation of solutions.
Visual Analytics literally maps the connection between different alternative
solutions, leaving the opportunity for the human user to view these options in
the context of the complete knowledge generation process and to discuss these
options with peers on common ground.
References
1. Aigner, W., Miksch, S., Müller, W., Schumann, H., Tominski, C.: Visual meth-
ods for analyzing time-oriented data. IEEE Transactions on Visualization and
Computer Graphics 14(1), 47–60 (2008)
2. Amar, R.A., Eagan, J., Stasko, J.T.: Low-level components of analytic activity in
information visualization. In: INFOVIS, p. 15 (2005)
3. Amiel, M., Melançon, G., Rozenblat, C.: Réseaux multi-niveaux: l’exemple des
échanges aériens mondiaux. M@ppemonde 79(3) (2005)
4. Andrienko, G., Andrienko, N., Jankowski, P., Keim, D., Kraak, M.-J.,
MacEachren, A., Wrobel, S.: Geovisual analytics for spatial decision support:
Setting the research agenda. Special issue of the International Journal of Geo-
graphical Information Science 21(8), 839–857 (2007)
5. Andrienko, G., Andrienko, N., Wrobel, S.: Visual analytics tools for analysis of
movement data. ACM SIGKDD Explorations 9(2) (2007)
6. Andrienko, N., Andrienko, G.: Exploratory Analysis of Spatial and Temporal
Data. Springer, Heidelberg (2005)
7. Auber, D., Chiricota, Y., Jourdan, F., Melançon, G.: Multiscale visualization of
small world networks. In: INFOVIS (2003)
8. Card, S.K., Mackinlay, J., Shneiderman, B.: Readings in Information Visualiza-
tion: Using Vision to Think. Morgan Kaufmann, San Francisco (1999)
9. Ceglar, A., Roddick, J.F., Calder, P.: Guiding knowledge discovery through in-
teractive data mining, pp. 45–87. IGI Publishing, Hershey (2003)
10. Chiricota, Y., Melançon, G.: Visually mining relational data. International Review
on Computers and Software (2005)
11. Das, A., Gehrke, J., Riedewald, M.: Semantic approximation of data stream
joins. IEEE Transactions on Knowledge and Data Engineering 17(1), 44–59 (2005)
12. Dix, A., Finlay, J.E., Abowd, G.D., Beale, R.: Human-Computer Interaction,
3rd edn. Prentice-Hall, Inc., Upper Saddle River (2003)
13. Duda, R., Hart, P., Stork, D.: Pattern Classification. John Wiley and Sons Inc,
Chichester (2000)
14. Dykes, J., MacEachren, A., Kraak, M.-J.: Exploring geovisualization. Elsevier
Science, Amsterdam (2005)
15. Engel, K., Hadwiger, M., Kniss, J.M., Rezk-salama, C., Weiskopf, D.: Real-time
Volume Graphics. A. K. Peters, Ltd., Natick (2006)
16. Ester, M., Sander, J.: Knowledge Discovery in Databases - Techniken und An-
wendungen. Springer, Heidelberg (2000)
17. Forsell, C., Seipel, S., Lind, M.: Simple 3d glyphs for spatial multivariate data.
In: INFOVIS, p. 16 (2005)
18. Han, J., Kamber, M. (eds.): Data Mining: Concepts and Techniques. Morgan
Kaufmann, San Francisco (2000)
19. Hand, D., Mannila, H., Smyth, P. (eds.): Principles of Data Mining. MIT Press,
Cambridge (2001)
20. Inselberg, A., Dimsdale, B.: Parallel Coordinates: A Tool for Visualizing Multi-
variate Relations (chapter 9), pp. 199–233. Plenum Publishing Corporation, New
York (1991)
21. Jacko, J.A., Sears, A.: The Handbook for Human Computer Interaction. Lawrence
Erlbaum & Associates, Mahwah (2003)
22. Johnson, C., Hanson, C. (eds.): Visualization Handbook. Kolam Publishing (2004)
23. Keim, D., Ertl, T.: Scientific visualization (in German). Information
Technology 46(3), 148–153 (2004)
24. Keim, D., Ward, M.: Visual Data Mining Techniques (chapter 11). Springer, New
York (2003)
25. Keim, D.A., Ankerst, M., Kriegel, H.-P.: Recursive pattern: A technique for visu-
alizing very large amounts of data. In: VIS ’95: Proceedings of the 6th conference
on Visualization ’95, Washington, DC, USA, p. 279. IEEE Computer Society
Press, Los Alamitos (1995)
26. Keim, D.A., Panse, C., Sips, M., North, S.C.: Pixel based visual data mining of
geo-spatial data. Computers & Graphics 28(3), 327–344 (2004)
27. Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.J. (eds.): Information Visualiza-
tion. LNCS, vol. 4950. Springer, Heidelberg (2008)
28. Krüger, J., Schneider, J., Westermann, R.: Clearview: An interactive context
preserving hotspot visualization technique. IEEE Transactions on Visualization and
Computer Graphics 12(5), 941–948 (2006)
29. Maimon, O., Rokach, L. (eds.): The Data Mining and Knowledge Discovery Hand-
book. Springer, Heidelberg (2005)
30. Meliou, A., Chu, D., Guestrin, C., Hellerstein, J., Hong, W.: Data gathering tours
in sensor networks. In: IPSN (2006)
31. Mitchell, T.M.: Machine Learning. McGraw-Hill, Berkeley (1997)
32. Naumann, F., Bilke, A., Bleiholder, J., Weis, M.: Data fusion in three steps:
Resolving schema, tuple, and value inconsistencies. IEEE Data Eng. Bull. 29(2),
21–31 (2006)
33. North, C.: Toward measuring visualization insight. IEEE Comput. Graph.
Appl. 26(3), 6–9 (2006)
34. Perner, P. (ed.): Data Mining on Multimedia Data. LNCS, vol. 2558. Springer,
Heidelberg (2002)
35. Schumann, H., Müller, W.: Visualisierung - Grundlagen und allgemeine Metho-
den. Springer, Heidelberg (2000)
36. Shneiderman, B.: Tree visualization with tree-maps: 2-d space-filling approach.
ACM Trans. Graph. 11(1), 92–99 (1992)
37. Shneiderman, B., Plaisant, C.: Designing the User Interface. Addison-Wesley,
Reading (2004)
38. Spence, R.: Information Visualization. ACM Press, New York (2001)
39. Thomas, J.J., Cook, K.A.: Illuminating the Path. IEEE Computer Society Press,
Los Alamitos (2005)
40. Tricoche, X., Scheuermann, G., Hagen, H.: Tensor topology tracking: A visual-
ization method for time-dependent 2d symmetric tensor fields. Comput. Graph.
Forum 20(3) (2001)
41. Unwin, A., Theus, M., Hofmann, H.: Graphics of Large Datasets: Visualizing a
Million (Statistics and Computing). Springer, New York (2006)
42. van Wijk, J.J.: The value of visualization. In: IEEE Visualization, p. 11 (2005)
43. Widom, J.: Trio: A system for integrated management of data, accuracy, and
lineage. In: CIDR, pp. 262–276 (2005)
44. Yi, J.S., Kang, Y.A., Stasko, J.T., Jacko, J.A.: Toward a deeper understanding
of the role of interaction in information visualization. IEEE Trans. Vis. Comput.
Graph. 13(6), 1224–1231 (2007)