Scientific discovery within data streams
January 2005
Pages 66 - 80
Abstract
The term ‘data-stream’ is an increasingly overloaded expression. It often means different things to different people, depending on domain, usage or operation. Harold (2003) draws the following analogy:
“A [stream] analogy might be a queue of people waiting to get on a ride at an amusement park. As people are processed at the front (i.e. get on the roller coaster) more are added at the back of the line. If it's a slow day the roller coaster may catch up with the end of the line and have to wait for people to board. Other days there may always be people in line until the park closes...There's always a definite number of people in line though this number may change from moment to moment as people enter at the back of the line and exit from the front of the line. Although all the people are discrete, you'll sometimes have a family that must be put together in the same car. Thus although the individuals are discrete, they aren't necessarily unrelated.”
For our purposes we define a data-stream as a series of data (e.g. credit card transactions arriving at a clearing office, cellular phone traffic, or environmental data from satellites) arriving in real time. A data-stream has an initiation and a continuous ingest of data, but carries no expectations about the amount, length, or end of the data flow. It does not have a database or repository as an intrinsic part of its definition; from the perspective of data-stream analytics it is a ‘one-look’ opportunity. We call each data element in the stream a token. The complexity of these tokens ranges from simple (e.g. characters in a sentence: “T H I S I S A S T R E A M...”) to extremely complex (e.g. a detailed transaction record). The volume of a data-stream is usually massive, and while each individual token may be rather uninformative, taken as a whole the tokens describe the nature of the changing phenomena over time.
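The ‘one-look’ property in the definition above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the names `token_stream` and `summarize` are hypothetical, and the character tokens reuse the simple example from the abstract. Each token is seen once, nothing is written to a repository, and only running aggregates are kept.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def token_stream(text: str) -> Iterator[str]:
    """Hypothetical source: yields one character token at a time.

    A real stream would have no known end; a short string stands in here.
    """
    for ch in text:
        if not ch.isspace():
            yield ch


def summarize(stream: Iterable[str]) -> Tuple[int, Counter]:
    """Consume the stream in a single pass, keeping only running aggregates.

    Tokens are never stored, so each one is seen exactly once.
    """
    seen = 0
    counts: Counter = Counter()
    for token in stream:
        seen += 1
        counts[token] += 1
    return seen, counts


total, counts = summarize(token_stream("T H I S I S A S T R E A M"))
print(total, counts.most_common(3))
```

Individual tokens carry little information here, but the running counts give a picture of the stream as a whole. The state kept in this sketch grows with the number of distinct tokens, not with the length of the stream.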
Information
Published In
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 01 January 2005
Qualifiers
- Chapter