CCS341 Data Warehousing Notes Unit I
Subject-Oriented
A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore, data
warehouses typically provide a concise and straightforward view around a particular subject, such as
customer, product, or sales, instead of the organization's ongoing operations as a whole. This is done by
excluding data that are not useful concerning the subject and including all data needed by the users to
understand the subject.
Integrated
Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve files from 3 months, 6
months, 12 months, or even older data from a data warehouse. This contrasts with a transaction
system, where often only the most current file is kept.
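The contrast can be sketched in a few lines of Python. This is a minimal illustration and not part of the original notes: the names operational_store, warehouse_history, record_price, and price_as_of are hypothetical, and the dates and prices are made up.

```python
from datetime import date

# Operational (transactional) store: keeps only the latest value per product.
operational_store = {}

# Warehouse: keeps every value together with the date it became valid.
warehouse_history = []


def record_price(product, price, valid_from):
    """Overwrite the current value in the operational store, append to the warehouse."""
    operational_store[product] = price
    warehouse_history.append(
        {"product": product, "price": price, "valid_from": valid_from}
    )


def price_as_of(product, as_of):
    """Return the most recent warehouse price valid on or before `as_of`, or None."""
    rows = [r for r in warehouse_history
            if r["product"] == product and r["valid_from"] <= as_of]
    if not rows:
        return None
    return max(rows, key=lambda r: r["valid_from"])["price"]


record_price("widget", 10.0, date(2024, 1, 1))
record_price("widget", 12.0, date(2024, 4, 1))
record_price("widget", 11.5, date(2024, 7, 1))

current_price = operational_store["widget"]               # only the latest value survives
price_in_may = price_as_of("widget", date(2024, 5, 15))   # historical lookup in the warehouse
```

The operational store can answer only "what is the price now?", while the warehouse can also answer "what was the price on a given date?".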
Properties of Data Warehouse Architectures
1. Separation: Analytical and transactional processing should be kept apart as much as possible.
2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume that
has to be managed and processed, and the number of user requirements that have to be met,
progressively increase.
5. Administerability: Data warehouse management should not be complicated.
Types of Data Warehouse Architectures
➢ The figure shows that the only layer physically available is the source layer. In this method, data
warehouses are virtual. This means that the data warehouse is implemented as a
multidimensional view of operational data created by specific middleware, or an intermediate
processing layer.
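A virtual warehouse of this kind can be approximated with a plain SQL view. The sketch below uses Python's built-in sqlite3 module as a stand-in for the middleware; the table orders and the view sales_by_product_region are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        product  TEXT,
        region   TEXT,
        amount   REAL
    );
    INSERT INTO orders (product, region, amount) VALUES
        ('widget', 'north', 100.0),
        ('widget', 'south', 50.0),
        ('gadget', 'north', 75.0);

    -- The warehouse is only a view: no data is copied out of the source layer.
    CREATE VIEW sales_by_product_region AS
        SELECT product, region, SUM(amount) AS total_amount
        FROM orders
        GROUP BY product, region;
""")

view_rows = conn.execute(
    "SELECT product, region, total_amount "
    "FROM sales_by_product_region ORDER BY product, region"
).fetchall()

# A new operational row is visible through the view immediately.
conn.execute(
    "INSERT INTO orders (product, region, amount) VALUES ('widget', 'north', 25.0)"
)
widget_north = conn.execute(
    "SELECT total_amount FROM sales_by_product_region "
    "WHERE product = 'widget' AND region = 'north'"
).fetchone()[0]
```

Because every analytical query runs directly against the operational table, this design does not meet the separation property listed earlier.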
The metadata repository stores information that defines DW objects. It includes the following
parameters and information for the middle and the top-tier applications:
1. A description of the DW structure, including the warehouse schema, dimensions, hierarchies, data
mart locations, and contents, etc.
2. Operational metadata, which usually describes the currency level of the stored data, i.e., active,
archived or purged, and warehouse monitoring information, i.e., usage statistics, error reports,
audit, etc.
3. System performance data, which includes indices, used to improve data access and retrieval
performance.
4. Information about the mapping from operational databases, which provides source RDBMSs and
their contents, cleaning and transformation rules, etc.
5. Summarization algorithms, predefined queries, and reports, together with business data, which
includes business terms and definitions, ownership information, etc.
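The five kinds of metadata listed above could be held in a structure like the following. This is only a toy sketch: the class names SourceMapping and MetadataRepository, their fields, and the sample entries are hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class SourceMapping:
    """Mapping from an operational source to a warehouse table (item 4)."""
    source_system: str
    source_table: str
    target_table: str
    transformation_rules: list = field(default_factory=list)


@dataclass
class MetadataRepository:
    """Toy container for the kinds of metadata listed above."""
    schema: dict = field(default_factory=dict)          # item 1: table -> column names
    data_currency: dict = field(default_factory=dict)   # item 2: table -> active/archived/purged
    indices: dict = field(default_factory=dict)         # item 3: table -> indexed columns
    mappings: list = field(default_factory=list)        # item 4: SourceMapping entries
    business_terms: dict = field(default_factory=dict)  # item 5: term -> definition

    def lineage(self, target_table):
        """Return (source_system, source_table) pairs that feed `target_table`."""
        return [(m.source_system, m.source_table)
                for m in self.mappings if m.target_table == target_table]


repo = MetadataRepository()
repo.schema["fact_sales"] = ["date_key", "product_key", "amount"]
repo.data_currency["fact_sales"] = "active"
repo.indices["fact_sales"] = ["date_key"]
repo.mappings.append(
    SourceMapping("crm_db", "orders", "fact_sales", ["trim strings", "convert currency"])
)
repo.business_terms["amount"] = "Net sale value in the reporting currency"

sales_lineage = repo.lineage("fact_sales")
```

A lookup such as lineage shows why the mapping metadata matters: it lets a tool trace a warehouse table back to the operational sources that feed it.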
Load Performance
What Is Snowflake?
Snowflake is a data warehouse built for the cloud. It centralizes data from multiple sources, enabling you
to generate in-depth business insights that power your teams.
At its core, Snowflake is designed to handle structured and semi-structured data from various sources,
allowing organizations to integrate and analyze data from diverse systems seamlessly. Its unique
architecture separates compute and storage, enabling users to scale each independently based on their
specific needs. This elasticity ensures optimal resource allocation and cost-efficiency, as users only pay
for the actual compute and storage utilized.
Snowflake uses a SQL-based query language, making it accessible to data analysts and SQL developers.
Its intuitive interface and user-friendly features allow for efficient data exploration, transformation, and
analysis. Additionally, Snowflake provides robust security and compliance features, ensuring data privacy
and protection.
One of Snowflake’s notable strengths is its ability to handle large-scale, concurrent workloads without
performance degradation. Its auto-scaling capabilities automatically adjust resources based on the
workload demands, eliminating the need for manual tuning and optimization.
Another key advantage of Snowflake is its native integration with popular data processing and analytics
tools, such as Apache Spark, Python, and R. This compatibility enables seamless data integration, data
engineering, and advanced analytics workflows.
What Is Oracle?
Oracle is available as a cloud data warehouse and an on-premises warehouse (available through Oracle
Exadata Cloud Service). This comparison covers Oracle’s cloud service.
Like Snowflake, Oracle provides a centralized location for analytical data activities, making it easier for
businesses to identify trends and patterns in large data sets.
Oracle’s flagship product, Oracle Database, is a robust and highly scalable relational database
management system (RDBMS). It is known for its reliability, performance, and extensive feature set,
making it suitable for handling large-scale enterprise data requirements. Oracle Database supports a wide
range of data types and provides advanced features for data modeling, indexing, and querying.
In addition to its RDBMS, Oracle provides a complete ecosystem of data management tools and
technologies. Oracle Data Warehouse solutions, such as Oracle Exadata and Oracle Autonomous Data
Warehouse, offer high-performance, optimized platforms specifically designed for data warehousing and
analytics workloads.
Oracle’s data warehousing offerings come with a suite of powerful analytics and business intelligence
tools. Oracle Analytics Cloud (OAC) provides comprehensive self-service analytics capabilities, enabling
users to explore and visualize data, build interactive dashboards, and generate actionable insights.
Snowflake and Oracle’s cloud data warehouse adopt a pay-as-you-go model, where you only pay for the
amount of data you consume. This model can work out to be expensive if you have large amounts of data,
but Snowflake might save you more money in the long run. That’s because clusters will stop when you’re
not running any queries (and resume when queries run again).
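The effect of pausing idle clusters can be estimated with a small calculation. The sketch below is a simplified model under stated assumptions, not Snowflake's actual billing rules: the functions billed_seconds and always_on_seconds are hypothetical, the cluster is assumed to resume instantly, and minimum billing increments are ignored.

```python
def billed_seconds(query_intervals, auto_suspend_after):
    """Seconds a cluster runs when it suspends after `auto_suspend_after` idle seconds.

    `query_intervals` is a list of (start, end) times in seconds, sorted by start
    and non-overlapping. The cluster is assumed to resume instantly on a new query.
    """
    total = 0
    for i, (start, end) in enumerate(query_intervals):
        total += end - start  # time spent running the query
        if i + 1 < len(query_intervals):
            gap = query_intervals[i + 1][0] - end
            # Idle time counts until the suspend timer fires or the next query starts.
            total += min(gap, auto_suspend_after)
        else:
            total += auto_suspend_after  # idle tail before the final suspend
    return total


def always_on_seconds(query_intervals):
    """Seconds billed if the cluster never suspends between first start and last end."""
    return query_intervals[-1][1] - query_intervals[0][0]


# Three short queries spread over one hour.
intervals = [(0, 120), (1800, 1860), (3500, 3600)]
with_suspend = billed_seconds(intervals, auto_suspend_after=60)
without_suspend = always_on_seconds(intervals)
```

With sparse queries, most of the hour is idle time, which is where a suspend-and-resume model saves money; with back-to-back queries the two figures converge.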
Ease of Use
Snowflake automatically applies all upgrades, fixes, and security features, reducing your workload.
Oracle, however, typically requires a database administrator of some kind, which can add to the cost of
data warehousing in your organization. Similar problems exist with scaling these warehouses to meet the
needs of your business. Snowflake data warehouse manages partitioning, indexing, and other data
management tasks automatically; Oracle usually requires a database administrator to execute any
scalability-related changes. Consider these differences when comparing Snowflake vs. Oracle.
Features
What about Snowflake vs Oracle features? Oracle lets you build and run machine learning algorithms
inside its warehouse, which can prove valuable for your analytical objectives. Snowflake lacks this
capability, requiring users to invest in a stand-alone machine learning platform to run algorithms. Oracle
also offers support for cursors, which let procedural code process query results one row at a time.
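A cursor lets a program step through a query result one row at a time. The sketch below illustrates the idea with the DB-API cursor from Python's built-in sqlite3 module; it is an analogy only and does not show Oracle's PL/SQL cursor syntax. The table sales and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("widget", 100.0), ("gadget", 75.0), ("widget", 50.0)],
)

cursor = conn.cursor()
cursor.execute("SELECT product, amount FROM sales ORDER BY rowid")

running_totals = {}
row = cursor.fetchone()  # fetch one row at a time, as a cursor loop would
while row is not None:
    product, amount = row
    running_totals[product] = running_totals.get(product, 0.0) + amount
    row = cursor.fetchone()
cursor.close()
```

Row-by-row processing is convenient for procedural logic, although a set-based query (here, a GROUP BY) is usually the better choice when one can express the same result.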
On the flip side, Snowflake comes with an integrated automatic query performance optimization feature
that makes it easy to query data without playing around with too many settings.
Snowflake and Oracle take data security seriously, with features such as data encryption, IP blocklists,
multi-factor authentication, access controls, and adherence to data security standards such as PCI DSS.
Data Governance
Users should be aware of data governance principles when transferring data to Snowflake or Oracle.
Legislation such as GDPR and HIPAA means businesses can incur expensive penalties for incorrectly
moving sensitive information between data sources and a warehouse. Both platforms handle data
governance adequately, with the ability to manage data quality rules and data stewardship workflows.
While Snowflake and Oracle are effective data warehouses for analytics, both have steep learning curves
that many businesses might struggle with. Companies will need coding knowledge (SQL) when
operationalizing data in these warehouses and require a data engineer to ensure a smooth transfer of data
between sources and their warehouse of choice.
Moving data to Snowflake or Oracle typically involves a process called Extract, Transform, Load, or ETL.
That means users have to extract data from a source like a relational database, transactional database,
customer relationship management (CRM) system, enterprise resource planning (ERP) system, or other
data platform. After data extraction, users must transform data into the correct format for analytics before
loading it to Snowflake or Oracle. Another data integration option is Extract, Load, Transform (ELT), where
users extract data and load it to Snowflake or Oracle before transforming that data into a suitable format.
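The difference between the two orderings can be sketched as follows, using an in-memory SQLite database as a stand-in for the warehouse. The sample records and the functions clean, run_etl, and run_elt are hypothetical; a real pipeline would read from an actual source system and load into Snowflake or Oracle.

```python
import sqlite3

# Hypothetical raw records extracted from a source system.
raw_rows = [("  Alice ", "100.50"), ("BOB", "20"), ("carol  ", "7.25")]


def clean(name, amount):
    """Transformation step: normalize the name and cast the amount to float."""
    return name.strip().title(), float(amount)


def run_etl(rows):
    """Extract -> Transform (in Python) -> Load cleaned rows into the warehouse."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE customers (name TEXT, amount REAL)")
    wh.executemany("INSERT INTO customers VALUES (?, ?)",
                   [clean(n, a) for n, a in rows])
    return wh.execute("SELECT name, amount FROM customers ORDER BY name").fetchall()


def run_elt(rows):
    """Extract -> Load raw rows -> Transform inside the warehouse with SQL."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE raw_customers (name TEXT, amount TEXT)")
    wh.executemany("INSERT INTO raw_customers VALUES (?, ?)", rows)
    wh.execute("""
        CREATE TABLE customers AS
        SELECT UPPER(SUBSTR(TRIM(name), 1, 1)) || LOWER(SUBSTR(TRIM(name), 2)) AS name,
               CAST(amount AS REAL) AS amount
        FROM raw_customers
    """)
    return wh.execute("SELECT name, amount FROM customers ORDER BY name").fetchall()


etl_result = run_etl(raw_rows)
elt_result = run_elt(raw_rows)
```

Both paths end with the same cleaned table. In ETL the transformation runs outside the warehouse before loading; in ELT the raw data lands first and the warehouse's own SQL engine does the transformation.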
ETL, ELT, and other data integration methods require a specific skill set because these processes are so
complicated. Using DreamFactory can provide a solution to this problem. It connects data sources to
Snowflake or Oracle through a live, documented, and standardized REST API, offering an alternative to
data warehousing.
Snowflake and Oracle are two prominent players in the data warehousing space, each offering its own
strengths and capabilities. Understanding the key differences between Snowflake and Oracle can help
organizations make informed decisions when choosing a data warehousing solution.
One of the primary differences lies in their architecture. Snowflake is designed as a cloud-native platform,
built from the ground up for the cloud environment. It offers a unique separation of compute and storage,
allowing independent scaling and optimized performance. This architecture enables seamless scalability,
cost-efficiency, and flexibility, making it an attractive choice for organizations operating in the cloud.
On the other hand, Oracle has a long-standing history in the data warehousing market, initially built for
on-premises deployments and later transitioning to the cloud. Oracle provides a comprehensive suite of
tools and solutions, including its flagship Oracle Database, which is widely recognized for its reliability,
scalability, and robust features. Oracle’s offering appeals to organizations with existing Oracle
deployments, as it allows them to leverage their familiarity with Oracle tools, interfaces, and ecosystem.
In terms of performance and scalability, Snowflake excels in its ability to handle large-scale workloads.
Its multi-cluster architecture and auto-scaling capabilities ensure optimal performance even with
concurrent workloads. Additionally, Snowflake’s native support for semi-structured data allows
organizations to work with diverse data types more efficiently.
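Semi-structured data here means records such as JSON documents, whose fields can be nested and can differ from one record to the next. The sketch below shows the general idea of turning such records into columns with plain Python; the flatten helper and the sample events are hypothetical, and this is not how Snowflake implements the feature internally.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; other values are kept as-is."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


raw_events = [
    '{"user": {"id": 1, "name": "Ana"}, "action": "login"}',
    '{"user": {"id": 2}, "action": "purchase", "order": {"total": 42.5}}',
]

rows = [flatten(json.loads(e)) for e in raw_events]
# Records may have different fields; the column set is the union of all keys.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Fields missing from a record become None in the resulting table, which mirrors how a warehouse would return NULL for an absent attribute.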
Oracle, on the other hand, offers powerful optimization capabilities, particularly with its Exadata and
Autonomous Data Warehouse offerings. These platforms are specifically designed to deliver high-
performance data processing, analytics, and query optimization for enterprise-scale workloads.
Data integration and analytics are also key areas of differentiation. Snowflake provides native integration
with various data processing and analytics tools, making it easier for organizations to leverage their
existing analytics ecosystem. On the other hand, Oracle offers a comprehensive ecosystem of data
integration and analytics tools, enabling organizations to tap into a wide range of solutions for their
specific requirements.
When comparing Snowflake and Oracle, two prominent players in the data warehousing landscape,
several factors come into play. Let’s delve into the comparison to help you determine which platform
might be the best fit for your needs.
• Oracle: Oracle has a long-standing reputation for its user-friendly interfaces and robust tools.
Oracle Database, combined with its analytics and business intelligence solutions, offers a
familiar environment for users already experienced with Oracle technologies.
4. Integration and Ecosystem:
• Snowflake: Snowflake provides native integration with popular data processing and analytics
tools, facilitating seamless data integration and workflows. It has a growing ecosystem of
partners and connectors, expanding its compatibility with various third-party systems.
• Oracle: Oracle’s extensive ecosystem offers a wide range of tools, applications, and industry-
specific solutions. With its strong integration capabilities and partnerships, Oracle enables
organizations to connect and consolidate their data across multiple sources effectively.
5. Security and Compliance:
• Snowflake: Snowflake places a strong emphasis on security and compliance. It provides
robust security features, including encryption, access controls, and compliance certifications,
ensuring data protection and regulatory compliance.
• Oracle: Oracle has a long history of prioritizing security and compliance. Its data management
solutions offer advanced security features, auditing capabilities, and data governance controls
to safeguard sensitive information.
Snowflake vs. Oracle: How DreamFactory Can Help
When comparing Snowflake vs. Oracle, note that both providers offer capable data warehouses that
help you operationalize and analyze real-time data in your organization. Snowflake might be easier to use
and work out cheaper because of its ability to pause clusters when not running queries. However, Oracle
comes with support for cursors and built-in machine learning capabilities, helping you program and
generate advanced insights from workloads.
You can also compare Snowflake vs Oracle with other data warehouses such as Amazon (AWS) Redshift,
Microsoft Azure Synapse Analytics, and Google BigQuery. Whatever option you choose, think about how
your business will transfer data to a warehouse.
DreamFactory’s API generation solution can create a Snowflake or Oracle REST API from your data
warehouse credentials. It takes care of the rest by generating OpenAPI documentation and securing your
API with keys.
What is Snowflake?
Snowflake is a cloud-based data warehousing platform known for its modern architecture, scalability, and
performance. It offers a shared data model, separating compute and storage, and provides flexibility, ease
of use, and native integration with various data processing tools.
What is Oracle?
Oracle is a renowned provider of data warehousing and database management systems. It offers a
comprehensive suite of products and services, including Oracle Database, designed for enterprise-scale
data management, analytics, and business intelligence.
Snowflake excels in scalability, allowing independent scaling of compute and storage. It offers a cloud-
native architecture, flexibility, native support for semi-structured data, and strong performance even with
concurrent workloads. It provides an intuitive interface and self-tuning capabilities.