
Scientific discovery within data streams

Authors:

Andrew J. Cowell,

Sue Havre,

Richard May,

Antonio Sanfilippo

Ambient Intelligence for Scientific Discovery: Foundations, Theories, and Systems

January 2005

Pages 66 - 80

Published: 01 January 2005

Abstract

The term ‘data-stream’ is increasingly overloaded. It often means different things to different people, depending on domain, usage, or operation. Harold (2003) draws the following analogy:

“A [stream] analogy might be a queue of people waiting to get on a ride at an amusement park. As people are processed at the front (i.e. get on the roller coaster) more are added at the back of the line. If it's a slow day the roller coaster may catch up with the end of the line and have to wait for people to board. Other days there may always be people in line until the park closes...There's always a definite number of people in line though this number may change from moment to moment as people enter at the back of the line and exit from the front of the line. Although all the people are discrete, you'll sometimes have a family that must be put together in the same car. Thus although the individuals are discrete, they aren't necessarily unrelated.”

For our purposes, we define a data-stream as a series of data (e.g. credit card transactions arriving at a clearing office, cellular phone traffic, or environmental data from satellites) that arrives in real time: it has an initiation and a continuous ingest of data, but carries no expectations about the amount, length, or end of the data flow. A data-stream does not have a database or repository as an intrinsic part of its definition; from the perspective of data-stream analytics, it is a ‘one-look’ opportunity. We call each data element in the stream a token. The complexity of these tokens ranges from simple (e.g. characters in a sentence: “T H I S I S A S T R E A M...”) to extremely complex (e.g. a detailed transaction record). The volume of data-streams is usually massive, and while each individual token may be rather uninformative, taken as a whole the tokens describe the nature of the changing phenomena over time.
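The ‘one-look’ property can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the names `transaction_stream`, `RunningStats`, and `summarize` are hypothetical, the tokens are synthetic transaction amounts, and the running summary uses Welford's online update as one possible way to describe a stream without storing it.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Summary updated once per token; no token is ever stored."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Welford's online update: one pass, constant memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the tokens seen so far.
        return self.m2 / self.count if self.count else 0.0


def transaction_stream(seed: int = 0) -> Iterator[float]:
    """Hypothetical unbounded source: yields synthetic amounts forever."""
    rng = random.Random(seed)
    while True:
        yield round(rng.uniform(1.0, 500.0), 2)


def summarize(tokens: Iterable[float]) -> RunningStats:
    """Consume each token exactly once and keep only the summary."""
    stats = RunningStats()
    for token in tokens:
        stats.update(token)
    return stats


# The stream has no defined end; here we observe only a finite prefix.
observed = itertools.islice(transaction_stream(), 10_000)
stats = summarize(observed)
print(stats.count, round(stats.mean, 2), round(stats.variance, 2))
```

Because the source is a generator, a token that has been consumed cannot be revisited, which mirrors the absence of a repository in the definition above. Each individual amount says little, while the running mean and variance describe the stream as a whole.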
