Big Data
1 Definition
Growth of and Digitization of Global Information Storage
Capacity[1]
Data sets are growing rapidly in part because they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[5][6]
The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[7] as of 2012, every day 2.5 exabytes (2.5×10^18) of data are generated.
In a 2001 research report[14] and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing big data.[15] In 2012, Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." Gartner's definition of the 3Vs is still widely used, and in agreement with a consensual definition that states that "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value".[16] Additionally, a new V, "Veracity", is added by some organizations to describe it,[17] a revisionism challenged by some industry authorities.[18] The 3Vs have been expanded to other complementary characteristics of big data:[19][20]
The growing maturity of the concept more starkly delineates the difference between big data and Business Intelligence:[23]
• Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends, etc.
• Big data uses inductive statistics and concepts from nonlinear system identification[24] to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density[25] to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors.[24][26]
In a popular tutorial article published in IEEE Access Journal,[27] the authors classified existing definitions of big data into three categories: Attribute Definition, Comparative Definition and Architectural Definition. The authors also presented a big-data technology map that illustrates its key technological evolutions.
2 Characteristics
Big data can be described by the following characteristics:[19][20]
Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. For example, to manage a factory one must consider both visible and invisible issues with various components. Information generation algorithms must detect and address invisible issues such as machine degradation, component wear, etc. on the factory floor.[30][31]
3 Architecture
In 2000, Seisint Inc. (now LexisNexis Group) developed a C++-based distributed file-sharing framework for data storage and query. The system stores and distributes structured, semi-structured, and unstructured data across multiple servers. Users can build queries in a C++ dialect called ECL. ECL uses an "apply schema on read" method to infer the structure of stored data when it is queried, instead of when it is stored. In 2004, LexisNexis acquired Seisint Inc.[32] and in 2008 acquired ChoicePoint, Inc.[33] and their high-speed parallel processing platform. The two platforms were merged into HPCC (or High-Performance Computing Cluster) Systems and in 2011, HPCC was open-sourced under the Apache v2.0 License. Currently, HPCC and Quantcast File System[34] are the only publicly available platforms capable of analyzing multiple exabytes of data.
In 2004, Google published a paper on a process called MapReduce that uses a similar architecture. The MapReduce concept provides a parallel processing model, and an associated implementation was released to process huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework was very successful,[35] so others wanted to replicate the algorithm. Therefore, an implementation of the MapReduce framework was adopted by an Apache open-source project named Hadoop.[36]
MIKE2.0 is an open approach to information management that acknowledges the need for revisions due to big data implications identified in an article titled "Big Data Solution Offering".[37] The methodology addresses handling big data in terms of useful permutations of data sources, complexity in interrelationships, and difficulty in deleting (or modifying) individual records.[38]
Recent studies show that a multiple-layer architecture is one option to address the issues that big data presents. A distributed parallel architecture distributes data across multiple servers; these parallel execution environments can dramatically improve data processing speeds. This type of architecture inserts data into a parallel DBMS, which implements the use of MapReduce and Hadoop frameworks. This type of framework looks to make the processing power transparent to the end user by using a front-end application server.[39]
4 Technologies
A 2011 McKinsey Global Institute report characterizes the main components and ecosystem of big data as follows:[43]
• Techniques for analyzing data, such as A/B testing, machine learning and natural language processing
• Big Data technologies, like business intelligence, cloud computing and databases
• Visualization, such as charts, graphs and other displays of the data
Multidimensional big data can also be represented as tensors, which can be more efficiently handled by tensor-based computation,[44] such as multilinear subspace learning.[45] Additional technologies being applied to big data include massively parallel-processing (MPP) databases, search-based applications, data mining,[46] distributed file systems, distributed databases, cloud-based infrastructure (applications, storage and computing resources) and the Internet.
Some but not all MPP relational databases have the ability to store and manage petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of the large data tables in the RDBMS.[47]
DARPA's Topological Data Analysis program seeks the fundamental structure of massive data sets and in 2008
5 Applications
Big data has increased the demand of information management specialists in that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.[2]
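As a rough illustration of the Map and Reduce steps described in the Architecture section, the following Python sketch counts words across several input splits. It is a single-machine toy, not Hadoop or Google's implementation: the function names (map_step, shuffle, reduce_step, map_reduce), the thread pool standing in for parallel nodes, and the sample sentences are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_step(chunk):
    """Map: turn one input split into (key, value) pairs."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the values emitted by all map tasks by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: combine all values gathered for one key."""
    return key, sum(values)


def map_reduce(lines, n_splits=2):
    # Split the input; each split stands in for the data held by one node.
    splits = [lines[i::n_splits] for i in range(n_splits)]
    # Run the map tasks concurrently (threads stand in for parallel nodes).
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(map_step, splits))
    # Gather the intermediate pairs and reduce them key by key.
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


counts = map_reduce(["big data is big", "data about data"])
print(counts)
```

In a real cluster the splits would live on different servers and the grouping step would move data across the network; here everything stays in memory, which keeps the sketch short but hides the costs that frameworks such as Hadoop are built to manage.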
5.1 Government
initiative is composed of 84 different big data programs spread across six departments.[54]
• Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.[55]
• The United States Federal Government owns six of the ten most powerful supercomputers in the world.[56]
5.3 Manufacturing
5.3.1 Cyber-physical models
a mirrored image of the real machine, able to continuously record and track machine condition during the later utilization stage. Finally, with the increased connectivity offered by cloud computing technology, the coupled model also provides better accessibility of machine condition for factory managers in cases where physical access to actual equipment or machine data is limited.[31]
5.4 Healthcare
Big data analytics has helped healthcare improve by providing personalized medicine and prescriptive analytics,
clinical risk intervention and predictive analytics, waste
and care variability reduction, automated external and
internal reporting of patient data, standardized medical terms and patient registries and fragmented point
solutions.[68]
5.5 Education
A McKinsey Global Institute study found a shortage of 1.5 million highly trained data professionals and managers,[43] and a number of universities,[69] including University of Tennessee and UC Berkeley, have created master's programs to meet this demand. Private bootcamps have also developed programs to meet that demand, including free programs like The Data Incubator or paid programs like General Assembly.[70]
5.6 Media
To understand how the media utilises Big Data, it is first necessary to provide some context into the mechanism used for media process. It has been suggested by Nick Couldry and Joseph Turow that practitioners in Media and Advertising approach big data as many actionable points of information about millions of individuals. The industry appears to be moving away from the traditional approach of using specific media environments such as newspapers, magazines, or television shows, and instead taps into consumers with technologies that reach targeted people at optimal times in optimal locations. The ultimate aim is to serve, or convey, a message or content that is (statistically speaking) in line with the consumer's mindset. For example, publishing environments are increasingly tailoring messages (advertisements) and content (articles) to appeal to consumers, based on information that has been exclusively gleaned through various data-mining activities.[71]
• Targeting of consumers (for advertising by marketers)
• Data-capture
5.6.1 Technology
• As of August 2012, Google was handling roughly 100 billion searches per month.[75]
• Oracle NoSQL Database has been tested to pass the 1M ops/sec mark with 8 shards and proceeded to hit 1.2M ops/sec with 10 shards.[76]
5.7 Private sector
5.7.1 Retail
Retail banking
5.8 Science
The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. There are nearly 600 million collisions per second. After filtering and refraining from recording more than 99.99995%[81] of these streams, there are 100 collisions of interest per second.[82][83][84]
• As a result, only working with less than 0.001% of the sensor stream data, the data flow from all four LHC experiments represents 25 petabytes annual rate before replication (as of 2012). This becomes nearly 200 petabytes after replication.
• If all sensor data were recorded in LHC, the data flow would be extremely hard to work with. The data flow would exceed 150 million petabytes annual rate, or nearly 500 exabytes per day, before replication. To put the number in perspective, this is equivalent to 500 quintillion (5×10^20) bytes per day, almost 200 times more than all the other sources combined in the world.
The Square Kilometre Array is a radio telescope built of thousands of antennas. It is expected to be operational by 2024. Collectively, these antennas are expected to gather 14 exabytes and store one petabyte per day.[85][86] It is considered one of the most ambitious scientific projects ever undertaken.
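The data-reduction figures quoted in this section for the Large Hadron Collider and the Square Kilometre Array can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the text; the variable names, the decimal unit convention (1 petabyte = 10^15 bytes, 1 exabyte = 10^18 bytes) and the 365-day year are assumptions made for the example.

```python
PETABYTE = 10**15  # bytes, decimal convention (assumption)
EXABYTE = 10**18   # bytes, decimal convention (assumption)

# LHC: share of collisions that is discarded by filtering.
collisions_per_second = 600_000_000
kept_per_second = 100
discarded_share = 1 - kept_per_second / collisions_per_second

# LHC: unfiltered sensor stream, converted from a yearly to a daily rate.
unfiltered_pb_per_year = 150_000_000
unfiltered_eb_per_day = unfiltered_pb_per_year * PETABYTE / 365 / EXABYTE

# LHC: recorded data as a share of the unfiltered stream.
recorded_pb_per_year = 25
recorded_share = recorded_pb_per_year / unfiltered_pb_per_year

# SKA: data stored per day as a share of data gathered per day.
ska_stored_share = (1 * PETABYTE) / (14 * EXABYTE)

print(f"LHC collisions discarded: {discarded_share:.7%}")
print(f"LHC unfiltered stream:    {unfiltered_eb_per_day:.0f} EB/day")
print(f"LHC recorded share:       {recorded_share:.7%}")
print(f"SKA stored share:         {ska_stored_share:.5%}")
```

Under these assumptions the unfiltered LHC stream works out to roughly 411 exabytes per day, the same order of magnitude as the "nearly 500 exabytes" quoted, and the recorded share stays well below the 0.001% mentioned in the text.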
• The NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster.[88][89]
• Google's DNAStack compiles and organizes DNA samples of genetic data from around the world to identify diseases and other medical defects. These fast and exact calculations eliminate any 'friction points,' or human errors that could be made by one of the numerous science and biology experts working with the DNA. DNAStack, a part of Google Genomics, allows scientists to use the vast sample of resources from Google's search server to scale social experiments that would usually take years, instantly.
5.9 Sports
Big data can be used to improve training and understanding competitors, using sport sensors. It is also possible to predict winners in a match using big data analytics.[90] Future performance of players could be predicted as well. Thus, players' value and salary is determined by data collected throughout the season.[91]
The movie MoneyBall demonstrates how big data could be used to scout players and also identify undervalued players.[92]
In Formula One races, race cars with hundreds of sensors generate terabytes of data. These sensors collect data points from tire pressure to fuel burn efficiency. Then, this data is transferred to team headquarters in United Kingdom through fiber optic cables that could carry data at the speed of light.[93] Based on the data, engineers and data analysts decide whether adjustments should be made in order to win a race. Besides, using big data, race teams try to predict the time they will finish the race beforehand, based on simulations using data collected over the season.[94]
6 Research activities
The U.S. state of Massachusetts announced the Massachusetts Big Data Initiative in May 2012, which provides funding from the state government and private companies to a variety of research institutions.[102] The Massachusetts Institute of Technology hosts the Intel Science and Technology Center for Big Data in the MIT Computer Science and Artificial Intelligence Laboratory, combining government, corporate, and institutional funding and research efforts.[103]
The European Commission is funding the 2-year-long Big Data Public Private Forum through their Seventh Framework Program to engage companies, academics and other stakeholders in discussing big data issues. The project aims to define a strategy in terms of research and innovation to guide supporting actions from the European Com-
Computational social sciences: Anyone can use Application Programming Interfaces (APIs) provided by Big Data holders, such as Google and Twitter, to do research in the social and behavioral sciences.[108] Often these APIs are provided for free.[108] Tobias Preis et al. used Google Trends data to demonstrate that Internet users from countries with a higher per capita gross domestic product (GDP) are more likely to search for information about the future than information about the past. The findings suggest there may be a link between online behaviour and real-world economic indicators.[109][110][111] The authors of the study examined Google query logs and computed the ratio of the volume of searches for the coming year ('2011') to the volume of searches for the previous year ('2009'), which they call the 'future orientation index'.[112] They compared the future orientation index to the per capita GDP of each country, and found a strong tendency for countries where Google users inquire more about the future to have a higher GDP. The results hint that there may potentially be a relationship between the economic success of a country and the information-seeking behavior of its citizens captured in big data.
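The ratio described above can be sketched in a few lines of Python. The country labels, search volumes and GDP values below are invented purely for illustration and are not the data used by Preis et al.; the helper names future_orientation_index and pearson are likewise assumptions of this example.

```python
from math import sqrt


def future_orientation_index(searches_next_year, searches_prev_year):
    """Ratio of search volume for the coming year to that for the previous year."""
    return searches_next_year / searches_prev_year


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical countries: search volumes for '2011' and '2009', GDP per capita.
countries = {
    "A": {"next": 120, "prev": 100, "gdp": 45000},
    "B": {"next": 90, "prev": 100, "gdp": 20000},
    "C": {"next": 60, "prev": 100, "gdp": 5000},
}

index = {
    name: future_orientation_index(c["next"], c["prev"])
    for name, c in countries.items()
}
gdp = [c["gdp"] for c in countries.values()]
r = pearson(list(index.values()), gdp)

print(index)
print(f"correlation with GDP per capita: {r:.3f}")
```

With real data the correlation would of course be estimated over many countries, and a high value would indicate association only, in line with the cautious wording of the findings.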
Big data sets come with algorithmic challenges that previously did not exist. Hence, there is a need to fundamentally change the processing ways.[123]
6.1 Sampling Big Data
An important research question that can be asked about big data sets is whether you need to look at the full data to draw certain conclusions about the properties of the data or is a sample good enough. The name big data itself contains a term related to size and this is an important characteristic of big data. But Sampling (statistics) enables the selection of right data points from within the larger data set to estimate the characteristics of the whole population. For example, there are about 600 million tweets produced every day. Is it necessary to look at all of them
7 Critique
Critiques of the big data paradigm come in two flavors, those that question the implications of the approach itself, and those that question the way it is currently done.[125]
Much in the same line, it has been pointed out that the decisions based on the analysis of big data are inevitably "informed by the world as it was in the past, or, at best, as it currently is".[63] Fed by a large number of data on past experiences, algorithms can predict future development if the future is similar to the past.[129] If the system's dynamics of the future change (if it is not a stationary process), the past can say little about the future. In order to make predictions in changing environments, it would be necessary to have a thorough understanding of the system's dynamic, which requires theory.[129] As a response to this critique it has been suggested to combine big data approaches with computer simulations, such as agent-based models[63] and Complex Systems. Agent-based models are increasingly getting better in predicting the outcome of social complexities of even unknown future scenarios through computer simulations that are based on a collection of mutually interdependent algorithms.[130][131] In addition, use of multivariate methods that probe for the latent structure of the data, such as factor analysis and cluster analysis, have proven useful as analytic approaches that go well beyond the bi-variate approaches (cross-tabs) typically employed with smaller data sets.
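The stationarity caveat raised in this critique can be illustrated with a small simulation. The sketch below fits a straight line to synthetic "past" data and then scores it on two synthetic futures, one that follows the same rule and one whose dynamics have changed. All numbers, the fixed random seed and the helper names fit_line and mean_abs_error are assumptions chosen for the example, not results from the cited sources.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between the line's predictions and the observations."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Past observations follow y = 2x plus a little noise.
past_x = list(range(100))
past_y = [2.0 * x + rng.gauss(0, 1) for x in past_x]
slope, intercept = fit_line(past_x, past_y)

future_x = list(range(100, 150))
# Future 1: the same rule still holds (stationary process).
same_y = [2.0 * x + rng.gauss(0, 1) for x in future_x]
# Future 2: the dynamics change to y = -x + 300 (non-stationary process).
changed_y = [-1.0 * x + 300 + rng.gauss(0, 1) for x in future_x]

err_same = mean_abs_error(slope, intercept, future_x, same_y)
err_changed = mean_abs_error(slope, intercept, future_x, changed_y)

print(f"error when the future resembles the past: {err_same:.2f}")
print(f"error after the dynamics change:          {err_changed:.2f}")
```

However much past data is added, the fitted line cannot anticipate the second future, which is the point made above about needing theory, or simulation, when the underlying process changes.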
In health and biology, conventional scientific approaches are based on experimentation. For these approaches, the limiting factor is the relevant data that can confirm or refute the initial hypothesis.[132] A new postulate is accepted now in biosciences: the information provided by the data in huge volumes (omics) without prior hypothesis is complementary and sometimes necessary to conventional approaches based on experimentation. In the massive approaches it is the formulation of a relevant hypothesis to explain the data that is the limiting factor. The search logic is reversed and the limits of induction ("Glory of Science and Philosophy scandal", C. D. Broad, 1926) are to be considered.
Privacy advocates are concerned about the threat to privacy represented by increasing storage and integration of personally identifiable information; expert panels have released various policy recommendations to conform practice to expectations of privacy.[133][134][135]
7.2
References
[1] The World's Technological Capacity to Store, Communicate, and Compute Information. MartinHilbert.net. Retrieved 13 April 2016.
[2] Data, data everywhere. The Economist. 25 February
2010. Retrieved 9 December 2012.
[3] Community cleverness required. Nature 455 (7209): 1.
4 September 2008. doi:10.1038/455001a.
[4] Reichman, O.J.; Jones, M.B.; Schildhauer, M.P. (2011). Challenges and Opportunities of Open Data in Ecology. Science 331 (6018): 703–5. doi:10.1126/science.1197962. PMID 21311007.
[5] Hellerstein, Joe (9 November 2008). Parallel Programming in the Age of Big Data. Gigaom Blog.
[6] Segaran, Toby; Hammerbacher, Jeff (2009). Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly Media. p. 257. ISBN 978-0-596-15711-1.
[7] Hilbert, Martin; López, Priscila (2011). The World's Technological Capacity to Store, Communicate, and Compute Information. Science 332 (6025): 60–65. doi:10.1126/science.1200970. PMID 21310967.
[8] IBM What is big data? Bringing big data to the enterprise. www.ibm.com. Retrieved 2013-08-26.
[9] Oracle and FSN, Mastering Big Data: CFO Strategies to
Transform Insight into Opportunity, December 2012
[10] Jacobs, A. (6 July 2009). The Pathologies of Big Data.
ACMQueue.
[11] Magoulas, Roger; Lorica, Ben (February 2009).
Introduction to Big Data. Release 2.0 (Sebastopol CA:
O'Reilly Media) (11).
[12] Snijders, C.; Matzat, U.; Reips, U.-D. (2012). 'Big Data': Big gaps of knowledge in the field of Internet. International Journal of Internet Science 7: 1–5.
[13] Ibrahim; Targio Hashem, Abaker; Yaqoob, Ibrar; Badrul Anuar, Nor; Mokhtar, Salimah; Gani, Abdullah; Ullah Khan, Samee (2015). big data on cloud computing: Review and open research issues. Information Systems 47: 98–115. doi:10.1016/j.is.2014.07.006.
[14] Laney, Douglas. 3D Data Management: Controlling
Data Volume, Velocity and Variety (PDF). Gartner. Retrieved 6 February 2001.
[15] Beyer, Mark. Gartner Says Solving 'Big Data' Challenge
Involves More Than Just Managing Volumes of Data.
Gartner. Archived from the original on 10 July 2011. Retrieved 13 July 2011.
[16] De Mauro, Andrea; Greco, Marco; Grimaldi, Michele (2016). A Formal definition of Big Data based on its essential Features. Library Review 65: 122–135. doi:10.1108/LR-06-2015-0061.
[17] What is Big Data?. Villanova University.
[35] Bertolucci, Jeff. Hadoop: From Experiment To Leading Big Data Platform, Information Week, 2013. Retrieved on 14 November 2013.
[36] Webster, John. MapReduce: Simplified Data Processing on Large Clusters, Search Storage, 2004. Retrieved on 25 March 2013.
[37] Big Data Solution Offering. MIKE2.0. Retrieved 8 December 2013.
[38] Big Data Definition. MIKE2.0. Retrieved 9 March 2013.
[39] Boja, C; Pocovnicu, A; Bătăgan, L. (2012). Distributed Parallel Architecture for Big Data. Informatica Economica 16 (2): 116–127.
[40] Intelligent Maintenance System
[41] http://www.hcltech.com/sites/default/files/solving_key_businesschallenges_with_big_data_lake_0.pdf
[42] Method for testing the fault tolerance of MapReduce frameworks (PDF). Computer Networks. 2015.
[43] Manyika, James; Chui, Michael; Bughin, Jaques; Brown, Brad; Dobbs, Richard; Roxburgh, Charles; Byers, Angela Hung (May 2011). Big Data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute. Retrieved January 16, 2016.
[44] Future Directions in Tensor-Based Computation and Modeling (PDF). May 2009.
[45] Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). A Survey of Multilinear Subspace Learning for Tensor Data (PDF). Pattern Recognition 44 (7): 1540–1551. doi:10.1016/j.patcog.2011.01.004.
[46] Pllana, Sabri; Janciak, Ivan; Brezany, Peter; Wöhrer, Alexander. A Survey of the State of the Art in Data Mining and Integration Query Languages. 2011 International Conference on Network-Based Information Systems (NBIS 2011). IEEE Computer Society. Retrieved 2 April 2016.
[47] Monash, Curt (30 April 2009). eBay's two enormous data warehouses.
Monash, Curt (6 October 2010). eBay followup – Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more.
[48] Resources on how Topological Data Analysis is used to analyze big data. Ayasdi.
[49] CNET News (1 April 2011). Storage area networks need not apply.
[50] How New Analytic Systems will Impact Storage. September 2011.
[51] An Error Occurred Setting Your User Cookie.
[52] Rajpurohit, Anmol (11 July 2014). Interview: Amy Gershkoff, Director of Customer Analytics & Insights, eBay on How to Design Custom In-House BI Tools. KDnuggets. Retrieved 2014-07-14. Dr. Amy Gershkoff: Generally, I find that off-the-shelf business intelligence
[69] Degrees in Big Data: Fad or Fast Track to Career Success. Forbes. Retrieved 2016-02-21.
[70] NY gets new bootcamp for data scientists: It's free, but harder to get into than Harvard. Venture Beat. Retrieved 2016-02-21.
[76] Lamb, Charles. Oracle NoSQL Database Exceeds 1 Million Mixed YCSB Ops/Sec.
[86] Future telescope array drives development of exabyte processing. Ars Technica. Retrieved 15 April 2015.
[94] Frank Bi. How Formula One Teams Are Using Big Data To Get The Inside Edge. www.forbes.com. Retrieved 12 December 2015.
[105] Alan Turing Institute to be set up to research big data. BBC News. 19 March 2014. Retrieved 2014-03-19.
[106] Inspiration day at University of Waterloo, Stratford Campus. betakit.com/. Retrieved 2014-02-28.
[107] Lee, Jay; Lapira, Edzel; Bagheri, Behrad; Kao, Hung-An (2013). Recent Advances and Trends in Predictive Manufacturing Systems in Big Data Environment. Manufacturing Letters 1 (1): 38–41. doi:10.1016/j.mfglet.2013.09.005.
[108] Reips, Ulf-Dietrich; Matzat, Uwe (2014). Mining Big Data using Big Data Services. International Journal of Internet Science 1 (1): 1–8.
[109] Preis, Tobias; Moat, Helen Susannah; Stanley, H. Eugene; Bishop, Steven R. (2012). Quantifying the Advantage of Looking Forward. Scientific Reports 2: 350. doi:10.1038/srep00350. PMC 3320057. PMID 22482034.
[110] Marks, Paul (5 April 2012). Online searches for future linked to economic success. New Scientist. Retrieved 9 April 2012.
[111] Johnston, Casey (6 April 2012). Google Trends reveals clues about the mentality of richer nations. Ars Technica. Retrieved 9 April 2012.
[112] Tobias Preis (24 May 2012). Supplementary Information: The Future Orientation Index is available for download (PDF). Retrieved 2012-05-24.
[113] Philip Ball (26 April 2013). Counting Google searches predicts market movements. Nature. Retrieved 9 August 2013.
[114] Tobias Preis, Helen Susannah Moat and H. Eugene Stanley (2013). Quantifying Trading Behavior in Financial Markets Using Google Trends. Scientific Reports 3: 1684. doi:10.1038/srep01684. PMC 3635219. PMID 23619126.
[115] Nick Bilton (26 April 2013). Google Search Terms Can Predict Stock Market, Study Finds. New York Times. Retrieved 9 August 2013.
[116] Christopher Matthews (26 April 2013). Trouble With Your Investment Portfolio? Google It!. TIME Magazine. Retrieved 9 August 2013.
[117] Philip Ball (26 April 2013). Counting Google searches predicts market movements. Nature. Retrieved 9 August 2013.
[118] Bernhard Warner (25 April 2013). 'Big Data' Researchers Turn to Google to Beat the Markets. Bloomberg Businessweek. Retrieved 9 August 2013.
[119] Hamish McRae (28 April 2013). Hamish McRae: Need a valuable handle on investor sentiment? Google it. The Independent (London). Retrieved 9 August 2013.
[120] Richard Waters (25 April 2013). Google search proves to be new word in stock market prediction. Financial Times. Retrieved 9 August 2013.
[121] David Leinweber (26 April 2013). Big Data Gets Bigger: Now Google Trends Can Predict The Market. Forbes. Retrieved 9 August 2013.
[122] Jason Palmer (25 April 2013). Google searches predict market moves. BBC. Retrieved 9 August 2013.
[123] E. Sejdić, Adapt current tools for use with big data, Nature, vol. 507, no. 7492, pp. 306, Mar. 2014.
[124] Deepan Palguna, Vikas Joshi, Venkatesan Chakaravarthy, Ravi Kothari and L. V. Subramaniam (2015). Analysis of Sampling Algorithms for Twitter. International Joint Conference on Artificial Intelligence.
[125] Kimble, C.; Milolidakis, G. (2015). Big Data and Business Intelligence: Debunking the Myths. Global Business and Organizational Excellence 35 (1): 23–34. doi:10.1002/joe.21642.
[126] Chris Anderson (23 June 2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. WIRED.
[127] Graham M. (9 March 2012). Big data and the end of theory?. The Guardian (London).
[128] Good Data Won't Guarantee Good Decisions. Harvard Business Review. Shah, Shvetank; Horne, Andrew; Capell, Jaime. HBR.org. Retrieved 8 September 2012.
[129] Big Data requires Big Visions for Big Change., Hilbert, M. (2014). London: TEDxUCL, x=independently organized TED talks
[130] Jonathan Rauch (1 April 2002). Seeing Around Corners. The Atlantic.
[131] Epstein, J. M., & Axtell, R. L. (1996). Growing Artificial Societies: Social Science from the Bottom Up. A Bradford Book.
[132] Delort P., Big data in Biosciences, Big Data Paris, 2012
[133] Ohm, Paul. Don't Build a Database of Ruin. Harvard Business Review.
[134] Darwin Bond-Graham, Iron Cagebook – The Logical End of Facebook's Patents, Counterpunch.org, 2013.12.03
[135] Darwin Bond-Graham, Inside the Tech industry's Startup Conference, Counterpunch.org, 2013.09.11
[136] danah boyd (29 April 2010). Privacy and Publicity in the Context of Big Data. WWW 2010 conference. Retrieved 2011-04-18.
[137] Jones, MB; Schildhauer, MP; Reichman, OJ; Bowers, S (2006). The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere (PDF). Annual Review of Ecology, Evolution, and Systematics 37 (1): 519–544. doi:10.1146/annurev.ecolsys.37.091305.110031.
[138] Boyd, D.; Crawford, K. (2012). Critical Questions for Big Data. Information, Communication & Society 15 (5): 662–679. doi:10.1080/1369118X.2012.678878.
[139] Failure to Launch: From Big Data to Big Decisions, Forte Wares.
[140] Gregory Piatetsky (12 August 2014). Interview: Michael Berthold, KNIME Founder, on Research, Creativity, Big Data, and Privacy, Part 2. KDnuggets. Retrieved 2014-08-13.
10 Further reading
11 External links
12
12.1
Big data Source: https://en.wikipedia.org/wiki/Big_data?oldid=722833400 Contributors: William Avery, Heron, Kku, Samw, Andrewman327, Tpbradbury, Ryuch, , Shizhao, Topbanana, Paul W, F3meyer, Sunray, Giftlite, Langec, Erik Carson, Utcursch, Beland,
Jeremykemp, David@scatter.com, Discospinster, Rich Farmbrough, Kdammers, ArnoldReinhold, Narsil, Bender235, Stesmo, Viriditas,
Lenov, Gary, Pinar, Tobych, Miranche, Broeni, Compo, Tomlzz1, Axeman89, Woohookitty, Pol098, BD2412, Qwertyus, Rjwilmsi, Koavf,
ElKevbo, Jehochman, Nihiltres, Luminade, Tedder, DVdm, SteveLoughran, Aeusoes1, Daniel Mietchen, Tony1, Cedar101, Dimensionsix, Katieh5584, Henryyan, McGeddon, Od Mishehu, Gilliam, Ohnoitsjamie, Chris the speller, RDBrown, Pegua, Madman2001, Krexer,
Kuru, Accurizer, Almaz~enwiki, Dl2000, HelloAnnyong, Razi chaudhry, The Letter J, Chris55, Yragha, Sanspeur, Jac16888, Marc W.
Abel, Cydebot, Matrix61312, Quibik, DumbBOT, Malleus Fatuorum, EdJohnston, Nick Number, Cowb0y, Lmusher, Barek, Josephmarty, Kforeman1, Rmyeid, OhanaUnited, Relyk, Wllm, Lvsubram, Magioladitis, Nyq, Tedickey, Steven Walling, Thevoid00, Casieg,
Jim.henderson, Tokyogirl79, MacShimi, McSly, NewEnglandYankee, Lamp90, Asefati, Pchackal, Mgualtieri, VolkovBot, JohnBlackburne, Vishal0soni, Vincent Lextrait, Philip Trueman, Ottb19, Billinghurst, ParallelWolverine, Grinq, Scottywong, Luca Naso, Dawn
Bard, Yintan, Jazzwang, Jojikiba, Eikoku, SPACKlick, CutOTies, Mkbergman, Melcombe, Siskus, PabloStraub, Dilaila, Martarius,
Sfan00 IMG, Faalagorn, Apptrain, Morrisjd1, Grantbow, Mild Bill Hiccup, Ottawahitech, Cirt, Auntof6, Lbertolotti, Gnome de plume,
Resoru, Pablomendes, Saisdur, Vehementlyirish, SchreiberBike, MPH007, Rui Gabriel Correia, Mymallandnews, XLinkBot, Ost316, Benboy00, MystBot, Itadapter, P.r.newman, Addbot, Mortense, Drevicko, Thomas888b, Non-dropframe, AndrewHZ, Tothwolf, Ronhjones,
Moosehadley, MrOllie, Download, Jarble, Arbitrarily0, Luckas-bot, Yobot, Fraggle81, Manivannan pk, Misterlevel, Elx, Jean.julius,
AnomieBOT, Jim1138, Babrodtk, Bluerasberry, Materialscientist, Citation bot, Xqbot, Marko Grobelnik, Melmann, Bgold12, Anna
Frodesiak, Tomwsulcer, Srich32977, Omnipaedista, Smallman12q, Joaquin008, CorporateM, Jugdev, FrescoBot, Jonathanchaitow, I42,
PeterEastern, AtmosNews, B3t, I dream of horses, HRoestBot, Jonesey95, Jandalhandler, Mengxr, Ethansdad, Yzerman123, Lotje, Msalganik, , Sideways713, Stuartzs, Jfmantis, Mean as custard, RjwilmsiBot, Ripchip Bot, Mm479arok, Winchetan, Petermcelwee,
DASHBot, EmausBot, John of Reading, Oliverlyc, Timtempleton, Dewritech, Primefac, Peaceray, Radshashi, Cmlloyd1969, Dcirovic,
K6ka, HiW-Bot, Richard asr, ZroBot, Checkingfax, BobGourley, Josve05a, Xtzou, Chire, Kilopi, Laurawilber, Rcsprinter123, Rick
jens, Palosirkka, Donner60, MainFrame, ChuispastonBot, Sean Quixote, Axelode, Mhiji, Helpsome, ClueBot NG, Behrad3d, Horoporo, Danielg922, Pramanicks, Jj1236, Widr, WikiMSL, Lawsonstu, Fvillanustre, Helpful Pixie Bot, Lowercase sigmabot, BG19bot,
And Adoil Descended, Seppemans123, Jantana, Innocentantic, Northamerica1000, Asplanchna, MusikAnimal, AvocatoBot, Noelwclarke,
Matt tubb, Jordanzhang, Bar David, InfoCmplx, Atlasowa, Cth027, Fylbecatulous, Camberleybates, BattyBot, WH98, DigitalDev, Haroldpolo, Ryguyrg, Untioencolonia, Shirishnetke, Ampersandian, MarkTraceur, ChrisGualtieri, TheJJJunk, Khazar2, Vaibhav017, IjonTichyIjonTichy, Danap611, Saturdayswiki, Mheikkurinen, Seherrell, Mjvaugh2, ChazzI73, Davidogm, Dexbot, Mherradora, Jkofron4, Stevebillings, Indianbusiness, Toopathnd, Jeremy Kolb, Frosty, Jamesx12345, OnTheNet21, BrighterTomorrow, Phamnhatkhanh, Jacoblarsen
net, Epicgenius, DavidKSchneider, Socratesplato9, Anirudhrata, Parasdoshiblog, Edwinboothnyc, JuanCarlosBrandt, Helenellis, MMeTrew, Warrenpd86, Michael.alexander.kaufmann, AuthorAnil, ViaJFK, Gary Simon, Bsc, FCA, FBCS, CITP, Mcio, Joe204, Caraconan, Evaluatorgroup, Hessmike, TJLaher123, Chengying10, IndustrialAutomationGuru, Dabramsdt, Prussonyc, Abhishek1605, Dilaila123, Willymomo, Rzicari, Mandruss, Mingminchi, BigDataGuru1, Sugamsha, Sysp, Azra2013, Paul2520, Dudewhereismybike,
Shahbazali101, SJ Defender, Yeda123, Miakeay, Stamptrader, Accountdp, Morganmissen, JeanneHolm, Fixuture, Yourconnotation, JenniferAndy, Arcamacho, Amgauna, Bigdatavomit, Monkbot, Wikientg, Scottishweather, Textractor, Analytics ireland, Addisnog, Lspin01l,
ForumOxford Online, JanSmicer, Mansoor-siamak, Belasobral, Sightestrp, Jwdang4, Amortias, Wikiauthor22, Femiolajiga, Tttcraig,
Lepro2, Mythnder, DexterToo, Mr P. Kopee, Pablollopis, SVtechie, Deathmuncher19, Smaske, Greystoke1337, Viam Ferream, Loraof, Prateekkeshari, Hmrv83, Vidyasnap, KaraHayes, Iqmc, Lalith269, Helloyoubum, Jakesher, IEditEncyclopedia, Rajsbhatta123, Ragnar Valgeirsson, Vedanga Kumar, Fgtyg78, Gary2015, HelpUsStopSpam, EricVSiegel, Benedge46, Friafternoon, KasparBot, Adzzyman,
Pmaiden, Spetrowski88, JuiAmale, Yasirsid, LGB2015, Diyottainc, Nt8068a, WikilleWi, Preyansh07, Dharnett21, Winterysteppe, It0713,
Pookiegalore, Loki Farrell, Davidosawa, ArguMentor, Richard.Zhang99, Sushant3010, Gaurav 2410, Swimfan93, Sethkylebolton, Kokilasoral, Wikiritammi, Lectorenespaol, Tullika.life and Anonymous: 408
12.2 Images
12.3 Content license