RESEARCH ARTICLE, OPEN ACCESS
Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 6, Issue 3, (Part - 6) March 2016, pp. 67-73
Data Ware House System in Cloud Environment
Mr. Krishna Prasad Bajgai (1), Mr. Amit Kumar Asthana (HOD) (2)
(1) Student (M.Tech-CSE), Subharti Institute of Technology & Engineering
(2) Swami Vivekanand Subharti University, Meerut
ABSTRACT
Virtualization is very important for reducing the cost of data warehouse deployment. It reduces cost as well as the tremendous pressure of managing devices, storage, servers, application models and manpower. At present, the data warehouse is one of the most effective and important concepts for an organization's decision support system. A data warehouse system takes far more time, cost and effort than a database system to develop and deploy in house. For this reason, people now consider cloud computing as a solution instead of implementing their own data warehouse system. This paper discusses how a cloud environment can be established as an alternative to an in-house data warehouse system, and gives some guidance on choosing the better environment for an organization's needs. The organizational data warehouse and EC2 (Elastic Compute Cloud) are compared on parameters such as ROI, security, scalability, robustness of data and system maintenance.
Keywords: EC2, IaaS, SaaS, PaaS, cloud computing, DW.
I. INTRODUCTION
Database technology is well developed and accepted by organizations of every size, from small to large scale, all over the world. Database technology has collected a vast amount of data in organizational storage. It would be a great service to society if this data could be managed well enough to strengthen organizational decision support systems. The concept for managing the large data stores of database systems is called the data warehouse (DW) system [1].
The data warehouse concept has also been implemented in many organizations across the world. It is much easier to deploy a data warehouse in an organization today than it was in earlier days.
Maintaining very large amounts of data, cross-platform data transformation and loading, integration, and data retrieval are the main stages of a data warehouse system. In the early days of the DW concept, people found it difficult to work with large data and with cross-platform data integration [2].
The DW system concept is still growing, but its components are well enough defined to serve as a good decision support system.
Integration of various systems can be done by defining well-structured metadata, and faster retrieval can be achieved with various types of data mining algorithms in a data warehouse system.
Another emerging technology that is much sought after in ICT is cloud computing. Cloud computing is a cluster of scalable and virtual resources such as computers, storage and system software. To use cloud computing, users only need Internet-enabled devices to access the services of a service provider. The service provider supplies all remaining requirements.
Service providers are required to maintain various computers, servers, database software, system software and networking systems. There are mainly three types of services according to the standard:
− Infrastructure as a Service (IaaS)
− Platform as a Service (PaaS)
− Software as a Service (SaaS)
All three types of service are further divided into public and private services.
A DW system mainly serves top management because of its decision-making capability. Organizations are moving toward data warehouse systems either by running their own system or by adopting a cloud computing service such as EC2 [2].
This part of the paper describes the importance of both systems using relevant scenarios. It also briefly explains the architecture of the data warehouse system and of the cloud computing system, using different examples and case studies. It then discusses the functionality of the in-house data warehouse system and the EC2 system and compares them. The conclusion of the paper lists the pros and cons of the data warehouse and EC2, and offers a way for developers and users of ICT to choose the best option for their applications.
II. FUNCTION OF THE SYSTEM
Data from the different departments of an organization are collected and stored together in one place. The large amount of data that comes from heterogeneous sources cannot be managed by a conventional database system [4].
Conventional database systems are built to perform small transactions, known as OLTP (Online Transaction Processing), while DW systems are used for complex analysis that sometimes involves more than two or three dimensions. These systems are known as OLAP (Online Analytical Processing).
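The contrast between the two workloads can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module only for convenience; the `sales` table, its columns and the sample values are assumptions made for this illustration and do not come from the paper.

```python
import sqlite3

# In-memory database standing in for both workloads (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: many small transactions, each touching a single row.
rows = [
    ("North", "Pen", 2015, 100.0),
    ("North", "Book", 2015, 250.0),
    ("South", "Pen", 2015, 80.0),
    ("South", "Pen", 2016, 120.0),
]
for row in rows:
    with conn:  # each insert is committed as its own transaction
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", row)

# OLAP-style work: one analytical query summarising over three dimensions
# (region, product, year) at once.
olap_result = conn.execute(
    "SELECT region, product, year, SUM(amount) FROM sales "
    "GROUP BY region, product, year ORDER BY region, product, year"
).fetchall()

# Rolling the same data up to a single dimension.
total_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
```

The inserts touch one row at a time, while the two SELECT statements read the whole table and aggregate it, which is the kind of access a DW system is tuned for.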
There are three types of computational resources, in terms of software or computational power, that are made available through a computer network (e.g. the Internet) [5]. EC2 refers to computational resources that are available for rent as required. Software provided as a service to customers in the cloud is SaaS (Software as a Service), for example database tools, ERP tools and utility tools. Infrastructure provided as a service to customers in the cloud is called IaaS (Infrastructure as a Service), for example clusters of processing units, file servers and database servers.
Computational resources can also be provided to a customer so that the cloud is used as a platform, for example an operating system or system software. This is known as PaaS (Platform as a Service) [5]. Because we are interested in running a DW system with the cloud as a platform, this paper is mostly concerned with the PaaS service.
(A) Data Warehouse
A DW system is also organization-wide. It collects data from the organization over a period of at least five years. The data collected by a DW system are generated on various operating systems and database systems, in different internal and external formats. At the same time, the metadata describing the data can be quite different by the time the data reach the centralized storage system. For example, data may be generated in hexadecimal format while in the centralized repository they have a character data type. Conversion of data from one system to another must be done precisely in order to maintain consistency and reliability.
Conversion and integration are the two main tasks in loading data into a data warehouse system [5].
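As a small illustration of the conversion and integration tasks, the sketch below converts hexadecimal-encoded source fields to character data and merges them with records from a second source. The field names, the sample values and the ASCII encoding are assumptions chosen for the example; they are not taken from the paper.

```python
def hex_to_text(hex_value: str, encoding: str = "ascii") -> str:
    """Convert a hex-encoded source field to the character type of the repository."""
    return bytes.fromhex(hex_value).decode(encoding)


def integrate(hex_source, text_source):
    """Bring records from both sources into one character-typed collection."""
    unified = [
        {field: hex_to_text(value) for field, value in record.items()}
        for record in hex_source
    ]
    unified.extend(dict(record) for record in text_source)
    return unified


# Hypothetical source A stores its fields as hexadecimal strings.
source_a = [{"customer_id": "43313031", "city": "4D6565727574"}]
# Hypothetical source B already stores character data.
source_b = [{"customer_id": "C102", "city": "Delhi"}]

repository = integrate(source_a, source_b)
```

After the conversion both records share one representation, so later retrieval does not need to know which system produced them.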
It is also necessary to generate metadata for the stored data so that the data warehouse repository can be used properly. Metadata alone is not enough for fast and efficient usage; retrieval algorithms also play a very important role in the system.
Data warehouses are an enterprise's most valuable assets as far as critical business information is concerned, which makes them an appealing target for malicious inside and outside attackers.
DWs are mainly databases that store consolidated historical and current business data for decision support systems [6].
Figure 1 shows the three layers of a DW system. Electronic data extraction is the first layer, transformation according to the data warehouse system's requirements is the second layer, and loading the cleaned data into DW storage is the third layer. The transformation layer also cleanses data that arrive in various formats into the format the system requires.
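The three layers can be sketched as three small functions. This is only a minimal illustration under stated assumptions: the two source systems, their field names and the two date formats are hypothetical, and warehouse storage is represented by a plain Python list.

```python
from datetime import datetime


def extract(sources):
    """Layer 1: pull raw records from every operational source."""
    for source in sources:
        yield from source


def transform(record):
    """Layer 2: cleanse one raw record into the format the warehouse expects."""
    raw_date = record["date"]
    # Assumption: sources use either ISO dates (2016-03-01) or
    # day/month/year dates (01/03/2016).
    fmt = "%Y-%m-%d" if "-" in raw_date else "%d/%m/%Y"
    return {
        "dept": record["dept"].strip().upper(),
        "date": datetime.strptime(raw_date, fmt).date(),
        "amount": float(record["amount"]),
    }


def load(records, warehouse):
    """Layer 3: append the cleaned records to warehouse storage."""
    warehouse.extend(records)
    return warehouse


# Two hypothetical departmental systems with different conventions.
sales_system = [{"dept": " sales ", "date": "2016-03-01", "amount": "120.50"}]
hr_system = [{"dept": "hr", "date": "15/03/2016", "amount": "75"}]

warehouse = load(
    (transform(r) for r in extract([sales_system, hr_system])), []
)
```

Keeping the layers as separate functions mirrors the figure: each source format is handled once in the transformation step, and the loading step only ever sees cleaned records.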
Figure 2 shows the architectural diagram of a DW system. In Figure 2, metadata is one of the key elements for the retrieval of data. The metadata repository is used by retrieval algorithms to dig up, dig down and dig across the internal storage.[7] These algorithms work on the structure of the objects and obtain the internal storage format of the data. Metadata also contains information about the indexes, clusters and other objects created on the basic storage objects.
These objects are used by the retrieval algorithm to calculate the most suitable execution path for a given query. The primary key and integrity constraints of the basic storage objects, recorded in the metadata, help retrieval algorithms find data by using the relationships defined within an object or between two different objects.
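A minimal sketch of how a retrieval routine might consult such metadata is given below. The catalog structure, the table and column names, and the two access paths ("index_scan" and "full_scan") are assumptions made for illustration; the paper does not specify a concrete metadata layout.

```python
# Hypothetical metadata repository: per-table row counts, indexes and keys.
METADATA = {
    "sales": {
        "rows": 1_000_000,
        "primary_key": "sale_id",
        "indexes": {"sale_id", "region"},
        "foreign_keys": {"customer_id": "customers"},
    },
    "customers": {
        "rows": 50_000,
        "primary_key": "customer_id",
        "indexes": {"customer_id"},
        "foreign_keys": {},
    },
}


def choose_access_path(table: str, filter_column: str) -> str:
    """Pick an index scan when the metadata lists an index on the column."""
    meta = METADATA[table]
    if filter_column in meta["indexes"]:
        return "index_scan"
    return "full_scan"


def related_table(table: str, column: str):
    """Follow a foreign-key relationship recorded in the metadata."""
    return METADATA[table]["foreign_keys"].get(column)


print(choose_access_path("sales", "region"))
print(related_table("sales", "customer_id"))
```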
Retrieval algorithms, or data mining algorithms, are another important element of a data warehouse system [7].
Using metadata, these algorithms select a suitable path for large data retrievals. A large amount of data needs very well maintained metadata because most of the summary fields in a DSS report are pre-calculated in the system. This step is also necessary for summary generation. If summary fields are not pre-calculated and preserved in the internal storage, every data retrieval has to repeat the summarization. This is redundant work, and a great deal of time can be saved by pre-calculating all necessary summaries and storing them with the basic storage unit. Metadata should therefore be structured so that the summary fields are easy to access and so that, whenever dependent data changes, all summary fields affected by the change are refreshed automatically [1].
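The idea of keeping summary fields pre-calculated and refreshing them when dependent data changes can be sketched as follows. The class and method names are hypothetical, and a single running total per group stands in for whatever summaries a real DSS report would need.

```python
from collections import defaultdict


class SummaryStore:
    """Keeps a pre-calculated total per group alongside the base rows."""

    def __init__(self):
        self.rows = []                    # basic storage unit
        self.totals = defaultdict(float)  # pre-calculated summary field

    def insert(self, group: str, amount: float) -> None:
        self.rows.append((group, amount))
        # Refresh only the summary affected by the new row.
        self.totals[group] += amount

    def total(self, group: str) -> float:
        # Reads the stored summary instead of re-scanning all rows.
        return self.totals.get(group, 0.0)

    def recomputed_total(self, group: str) -> float:
        # The redundant path: summarize again at query time.
        return sum(amount for g, amount in self.rows if g == group)


store = SummaryStore()
store.insert("north", 100.0)
store.insert("north", 50.0)
store.insert("south", 20.0)
print(store.total("north"))
```

Both `total` and `recomputed_total` return the same value; the difference is that the first reads one stored field while the second scans every row on each query.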
A DW's data often comes from the departmental level, where the store is called a data mart. All data marts work together to build the centralized data warehouse system.
In a DW system, various resources such as servers, storage devices, different kinds of platforms, system software and network connectivity are managed in house by the organization's own people. Here, system software means the database system, optimizer, retrieval algorithms and metadata repository, all of which need to be taken care of by the responsible authorities.
Mr. Krishna Prasad Bajgai. Int. Journal of Engineering Research and Applications, www.ijera.com, ISSN: 2248-9622, Vol. 6, Issue 3, (Part - 6) March 2016, pp.67-73
Fig. 1 Layers of a data warehousing system for collection of data.
Rules for metadata management and for the optimizer's selection of the best path in the retrieval algorithm are written according to the system connectivity and network bandwidth available in the system. The optimizer also calculates the delay for moving data from one storage unit to another and adds it to the data retrieval cost. Such calculations help the retrieval methods and boost the performance of the process. The system is thus tightly bound to its resources and their utilization. [7]
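One way such a retrieval cost could be computed is sketched below: the delay of each hop between storage units is estimated from bandwidth and latency, and the delays are summed per path. The cost model, the path names and all numbers are illustrative assumptions; the paper does not give a formula.

```python
def transfer_delay_seconds(size_mb: float, bandwidth_mbps: float, latency_s: float) -> float:
    """Delay for moving size_mb megabytes over one link."""
    # 1 megabyte = 8 megabits, so size_mb * 8 / bandwidth_mbps is the transfer time.
    return latency_s + (size_mb * 8) / bandwidth_mbps


def path_cost(size_mb: float, links: list) -> float:
    """Sum the delay of every hop between storage units on a path."""
    return sum(transfer_delay_seconds(size_mb, bw, lat) for bw, lat in links)


def best_path(size_mb: float, paths: dict) -> str:
    """Return the name of the path with the lowest total delay."""
    return min(paths, key=lambda name: path_cost(size_mb, paths[name]))


# Hypothetical paths: each link is (bandwidth in Mbit/s, latency in seconds).
paths = {
    "direct": [(100.0, 0.010)],
    "via_mart": [(1000.0, 0.002), (1000.0, 0.002)],
}
print(best_path(500.0, paths))
```

With these assumed figures the two-hop path over faster links is cheaper than the single slow link, which is the kind of trade-off an in-house optimizer can exploit because it knows the network.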
(B) Elastic Cloud Computing:-
In EC2 (Elastic cloud computing), service providers offer different kinds of services on rent. These include IaaS, PaaS and SaaS, as stated above. Users of EC2 do not need to establish a computing environment in house. Implementation of the software, deployment of the system and maintenance of the data warehouse are performed by the service provider. Hardware and software maintenance is also removed from the users' responsibility, which saves not only the cost of the system but also the complexity of tedious tasks, including managing manpower.[8] Fig. 3 shows the different kinds of services provided by cloud computing. Computing resources such as servers, storage and networks are provided as a service through the internet. These resources are available on different operating systems, database platforms or middleware services. Special software services such as document management, metadata management and data retrieval software are also provided.[7]
Service providers are obliged to provide uninterrupted services with high scalability, robustness and security. A consumer needs only to purchase computing power and other services from the provider's list according to their requirements, without worrying about investment in computing resources and manpower. There are several reasons for considering EC2 (Elastic cloud computing) as an alternative to the in-house data warehouse concept and technology. Some of them are given below.
- Scalability :- With cloud computing there is an illusion of infinite computing resources. If a customer wants more resources, he or she can rent them, and the additional capabilities become available almost instantly.
- Speed of deployment :- Full-fledged services offered by cloud providers can reduce deployment time compared to in-house deployment.
- Reliability :- A cloud provider can, theoretically, achieve high reliability. This can be achieved not only by making backups but also by having more resources, for example multiple data centers.
- Elasticity :- In cloud computing a pay-per-use payment model is generally applied, meaning that you pay only for the resources you actually use. This model avoids up-front deployment costs and costs due to over-provisioning.
- Reduced costs :- Costs can be reduced because users can hire services as required. The cost of effort is also reduced because services can be acquired instantly on demand.
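The over-provisioning argument in the list above can be illustrated with a small calculation. The demand profile and the unit price are hypothetical, and the same unit price is assumed for both models purely to isolate the effect of paying for idle capacity.

```python
def fixed_provisioning_cost(hourly_demand, unit_cost_per_hour):
    """Cost when capacity is sized for the peak hour and always paid for."""
    return max(hourly_demand) * unit_cost_per_hour * len(hourly_demand)


def pay_per_use_cost(hourly_demand, unit_cost_per_hour):
    """Cost when only the units actually used in each hour are billed."""
    return sum(hourly_demand) * unit_cost_per_hour


# Hypothetical demand in compute units for six consecutive hours.
demand = [2, 2, 3, 10, 3, 2]
rate = 0.5  # hypothetical price per unit per hour

print(fixed_provisioning_cost(demand, rate))
print(pay_per_use_cost(demand, rate))
```

The gap between the two figures is the cost of capacity that sits idle outside the peak hour; in practice the unit prices of rented and owned capacity differ, so the comparison is only indicative.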
[Figure labels: Source 1, Source 2, Source 3; Transfer process 1 (appears twice); Temp file; Load process 1, Load process 2; DW]
These are the major benefits of EC2 that attract people to rent services rather than build an in-house system for the organization. Sometimes theory and practice may not match, but overall the services are efficient for their quality.
III. EC2 AND DATA WAREHOUSES:-
As described above, developing an in-house data warehouse system in an organization requires a long span of time and a sizeable capital investment in hardware and software. Lots of technical resources and technical manpower are required to build the environment.[2]
Management of huge volumes of data becomes a key issue for the developers and administrators of such a system. The system must work with a good retrieval rate, and data should be summarized along more than two or three dimensions for effective DSS reports. EC2 is expected to provide these services on rent. Here, technical resources are not required to establish the environment, nor do highly skilled people need to be employed for its management. EC2 is more convenient for small and medium scale organizations. Unlike EC2, an organizational data warehouse is built for the organization's in-house data only; although building it is a very complex process, it provides a more consistent, robust and scalable approach to the user. In-house data warehouse storage is intended for the betterment of the organizational DSS, enforcing the organization's standards and reducing redundancy. Consistent standards, structures and secured transactions are important in an in-house data warehouse system. Later, if the organization wants to switch from one platform to another, it can carry out the migration without redundancy in data formats and data values. [2]
In EC2, users may have to wait for the availability of services and sometimes for compatible hardware and software. Migration of data from one system to another is not easy in EC2 because the whole of the data is under the control of the service provider. The user has access to the generated data and can send and retrieve it, but how the data is stored in the system, or what type of internal structure is assigned to it, is unknown to the organization taking the service. Small and medium scale organizations can find these services very suitable. EC2 can reduce capital investment as well as the cost of maintaining devices and other resources, whereas an in-house data warehouse demands very high capital investment and other running costs. According to Darrell M. West, a cost reduction of at least 40% is estimated for EC2 in different cases compared to an in-house data warehouse system [5]. This is very beneficial for organizations that want to cut their capital cost.
A data warehouse has in-house storage allocation, so the optimizer tool can give extra value when deriving the cost of query execution. The data administrator can alter the query according to the storage location in the network, the bandwidth of the established network and the distance between two or more storage units. Users of the data warehouse can have control over the retrieval process. If the centralized data warehouse system is not working, the data marts of the departments are still able to provide service on departmental data.
Table-1 Approximate cost of data warehouse system in organization.
S.No.  Description                                       Approximate estimated cost  Approximate value of given range  Cumulative cost for 5 years
1      Capital cost of H/W                               $20000 to $40000            $30000                            $30000
2      Operational cost of H/W per annum                 $1000 to $2000              $1500                             $7500
3      Capital cost of DBMS                              $10000 to $26000            $18000                            $18000
4      Operational cost of DBMS per annum                $400 to $1200               $800                              $4000
5      Capital cost of ETL tools                         $4000 to $8000              $6000                             $6000
6      Operational cost of ETL tools per annum           $400 to $600                $500                              $25000
7      Capital cost of retrieval and data mining tools   $4000 to $16000             $10000                            $10000
8      Operational cost of retrieval and data mining     $1000 to $5000              $3000                             $15000
Total approximate amount spent for organizational data warehouse system: $69800
The centralized data warehouse system does not necessarily have to be up for transactions and/or DSS reports. These transactions can later be merged into the centralized data warehouse system as and when the transformation batch starts.[3]
In EC2, our own optimizer routines cannot work with the service provider's utilities. The service provider may have its own internal optimizer to optimize the data held in its storage for different organizations. If certain services are down in cloud computing, the entire system is affected. Sometimes users are unable to access even their local data. In EC2, users cannot compute the retrieval time and cost of data processing.
Security is always an issue when working over the internet. In EC2, a VPN is a good solution for securing transactions on the internet, whereas a data warehouse is private within an organization and is secured within the organization's environment.[7]
Implementation of an organizational (in-house) data warehouse and of EC2 differ in their own ways. Complete implementation of an organizational data warehouse can be done within 1-2 years, or in some cases it may take more than two years.
Implementation and deployment are complex processes compared to the other tasks of the system. Unlike an organizational data warehouse system, EC2 can start working within 2-3 months. In very few cases it may take up to 6 months to complete the setup of the tools needed to start producing data for the data warehouse system. [5]
A study of both data warehouse systems from the cost and performance points of view was carried out with more than 50 people from industry. Table-1 shows the capital cost and operational cost of an organizational data warehouse system. An in-house data warehouse system naturally requires a large amount for the deployment of new hardware and software. This is a one-time cost and is considered the capital cost of the data warehouse system, while maintaining all the hardware and software requires an additional cost, which is the operational cost of the data warehouse.[3] Table-1 shows the capital cost and the operational cost for five consecutive years at current rates.
The cost of a data warehouse system on EC2 is shown in Table-2 at current market rates. Service providers charge different rates for their different services, such as file server usage, CPU usage, RAM usage and bandwidth usage.
Table-2 Approximate cost of EC2 data warehouse system.
Description                                      Approximate estimated cost
- Cost of CPU hour usage                         All these utilities are provided to the user at a rate of $150 to $600 per user per month.
- Cost of RAM hour usage
- Cost of H/W based networking usage
- Cost of outgoing bandwidth
- Cost of cloud file storage usage
- Cost of operating system platform usage
- Cost of DBMS and other related product usage   In SaaS, utility rates range from $200 to $500 per user per month.
Table-2 was prepared by considering the range of rates available to users in the market. The mean of each range is then used to calculate the effective cost of an EC2 data warehouse system.
Table-1 shows a set of values according to the current market. The capital cost is spent once at the time of deployment, and the operational cost recurs every year. The last column shows the cumulative cost, over five consecutive years, of the operational cost and the capital cost. The approximate total cost is $115500 for an organizational data warehouse system.
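The five-year figure can be cross-checked with a short calculation from the per-item values in Table-1 (capital cost once, operational cost for each of five years). The sketch below assumes only the "approximate value" column of the table; the variable names are illustrative.

```python
# (capital cost, operational cost per annum) taken from the
# approximate-value column of Table-1.
COMPONENTS = {
    "H/W": (30000, 1500),
    "DBMS": (18000, 800),
    "ETL tools": (6000, 500),
    "Retrieval and data mining tools": (10000, 3000),
}
YEARS = 5

capital_total = sum(capital for capital, _ in COMPONENTS.values())
operational_per_year = sum(op for _, op in COMPONENTS.values())
five_year_total = capital_total + YEARS * operational_per_year

print(capital_total)         # 64000
print(operational_per_year)  # 5800
print(five_year_total)       # 93000
```

Computed this way, the five-year operational cost of ETL tools is $2500 (5 x $500) and the five-year total is $93000. The $69800 printed under Table-1 equals the capital cost plus one year of operation ($64000 + $5800), and the $115500 quoted in the text equals the sum of the cumulative column as printed, which lists $25000 for ETL operation.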
As discussed earlier, EC2 is a service-based architecture, so charges are levied per service. Table-2 shows the service charges. Service providers offer many kinds of packages. These packages start at $150 and go up to $600; the average cost is $375. The given charges are per user per service, for the infrastructure and services of the provider used by the user. Similarly, charges for data warehouse products such as ETL tools, metadata management tools and data mining tools are also included in the cost. Generally these charges start at $200 and go up to $500; the average cost is $350 per user per month.
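The averages quoted above are the midpoints of the two price ranges, and they can be combined into a rough subscription cost. The sketch below assumes that both charges apply to every user every month and add linearly; the number of users and the period are hypothetical inputs, not figures from the study.

```python
def midpoint(low: float, high: float) -> float:
    """Mean of a price range, as used for the averages in the text."""
    return (low + high) / 2


infrastructure_rate = midpoint(150, 600)  # 375.0 per user per month
product_rate = midpoint(200, 500)         # 350.0 per user per month


def ec2_cost(users: int, months: int) -> float:
    """Total subscription cost for a number of users over a period."""
    return users * months * (infrastructure_rate + product_rate)


# Hypothetical scenario: 2 users for 5 years (60 months).
print(ec2_cost(users=2, months=60))
```

Because the charge is per user, the total grows with the number of users, so the comparison with a fixed in-house cost depends strongly on how many users the organization has.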
IV. CONCLUSION
There is no denying that bugs in equipment and technology can make our systems fail. Nevertheless, it is beneficial to know the current trends in technology. Data warehouses and EC2 are both at a growing stage of ICT. One should understand the requirements and budget of one's application.
EC2 works very well in small and medium scale organizations, but users should also be aware of some facts about its working conditions. Unlike EC2, an in-house data warehouse requires a long time span to deploy successfully. There were many unsuccessful data warehouse developments in earlier days. It also demands a large amount of manpower and infrastructure to support it. Finding manpower skilled in current technology and trends is always a risk for any organization. [2]
Currently there is no lack of hardware and software in the market. Various types of tools are available for an application. Prices are coming down, so storage media, network bandwidth and processing speed are available at affordable rates. Connecting to the World Wide Web is easy even from remote places. People are now more concerned about high reliability and security for their data. If services are provided with less risk and high security, they will definitely choose the services at the offered rate. [8]
The best way to choose an appropriate solution for an organization is to define the requirements and fix the time span and budget allocated for the system. If the requirements need to be met within a short period of time and the budget is low, cloud computing is a very good choice. If security and scalability of data are vital, an in-house data warehouse is preferable.
REFERENCES
[1]. Buddhadav B, Shah Neepa (2009). Efficient data access method in a hierarchical way for data warehouse. The National Journal of Computer Science and Technology, vol. I, Issue I, Jan-Jan, SV-ACRID, pp. 26-30
[2]. S. Chaudhuri, U. Dayal; An overview of data warehousing and OLAP technology. In ACM SIGMOD Record, 1997
[3]. Kimball Ralph, Inmon W.H. (1996), The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, Inc.
[4]. N.W. Paton, M.A.T. de Aragao, K. Lee, A.A.A. Fernandes, R. Sakellariou: Optimizing utility in cloud computing through autonomic workload execution. IEEE Data Eng. Bull., 2009
[5]. West, Darrell M., "Saving money through cloud computing", Governance Studies at Brookings, April 07, 2010
[6]. Wyld, David, "Moving to the cloud: An introduction to cloud computing in government", IBM Center for the Business of Government, E-Government Series, 2009
[7]. Helfert, Markus (2001), Managing and Measuring Data Quality in Data Warehousing, World Multiconference on Systemics, Cybernetics and Informatics, 22-25 July 2001, Orlando, Florida, USA
[8]. William McKnight (2000), The CRM-Ready Data Warehouse, DM Review Enterprise Column.
[9]. Alford, Ted and Gwen Morton, "The Economics of Cloud Computing: Addressing the Benefits of Infrastructure in the Cloud," Booz Allen Hamilton, 2009
[10]. Optimus Information, http://www.optimusinfo.com/blog/2011/09/24 data warehousing in the cloud.html
[11]. Amazon Web Services, http://aws.amazon.com/
[12]. IBM (General US web site), http://www.ibm.com/us/en/
[13]. Oracle (General US web site), http://www.oracle.com/us.index.html
[14]. Teradata (General website), http://www.teradata.com
[15]. MySQL, http://www.mysql.it/
[16]. Tomcat, http://www.tomcat.apache.org/
[17]. The global world net association, http://www.globalworldnet.org