
Data Science


Unit-1

Ques 1. What is data science, and what is its importance for business and
management?
Ans: Data science is the study of data to extract meaningful insights from any
sort of data, allowing businesses and organizations to take the right or
appropriate decisions, such as identifying customer behaviour, preferences,
and trends. Data science involves various processes such as fetching
(getting data from servers, devices, sensors, etc.), cleaning, organizing,
analysis (leveraging predictive analysis for business growth), and
forecasting (based on the analysis). It combines elements from statistics,
mathematics, computer science, and domain-specific knowledge to analyse
and interpret complex data sets.
The purpose of data science is to uncover patterns, trends,
correlations, and other valuable information that can help in
decision-making and problem-solving.
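The stages named above (fetching, cleaning, analysis, forecasting) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the sales figures, the cleaning rule, and the helper names `clean`, `fit_trend`, and `forecast` are all assumptions made up for the example, and a straight-line trend stands in for whatever predictive method a real project would use.

```python
from statistics import mean

# "Fetched" data: hypothetical (month, sales) records. None and negative
# values stand in for missing or faulty readings.
records = [(1, 120.0), (2, None), (3, 135.0), (4, 150.0),
           (5, -1.0), (6, 160.0), (7, 172.0)]


def clean(rows):
    """Cleaning: drop records whose value is missing or impossible."""
    return [(m, v) for m, v in rows if v is not None and v >= 0]


def fit_trend(rows):
    """Analysis: least-squares line v = intercept + slope * month."""
    months = [m for m, _ in rows]
    values = [v for _, v in rows]
    m_bar, v_bar = mean(months), mean(values)
    slope = (sum((m - m_bar) * (v - v_bar) for m, v in rows)
             / sum((m - m_bar) ** 2 for m in months))
    return v_bar - slope * m_bar, slope


def forecast(rows, month):
    """Forecasting: extend the fitted line to a future month."""
    intercept, slope = fit_trend(rows)
    return intercept + slope * month


cleaned = clean(records)
next_value = forecast(cleaned, month=8)
print(f"Kept {len(cleaned)} of {len(records)} records; "
      f"forecast for month 8: {next_value:.1f}")
```

Keeping the month alongside each value means that dropping a bad record does not shift the remaining points along the time axis.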
Data is the backbone of modern business. Good data collection
and analysis help a business to grow, whereas bad data increases
expenses.
Therefore, the importance of data science for business and management
can be summarized in the following key points:
1. Empowering better decision-making:
When businesses use data to drive their decisions, they focus on the
quality of the information that can help them decide better. Decisions
based on data improve outcomes, help businesses and society, and reduce
the wastage of resources, which also provides a monetary benefit. Data
can be used to take business decisions about expansion, sales and
marketing, customer preferences, potential risks, financial challenges,
and quality of service (e.g., suggesting products based on customer
preference).

2. Identifying and focusing on the target audience:
Many companies have different sources for gathering customer data,
such as product surveys, metrics collection, and the purchase of data
from different providers. The collected data (for example, demographic
information, demand, or need) becomes worthless if it is not utilized
properly.

3. Gaining a competitive edge: When data science is applied, organizations
can effectively leverage it to gain a competitive edge in the marketplace.
They can identify untapped opportunities and develop innovative products.
Additionally, this data may be published on websites or kept in permanent
records that staff members can access at any time.

4. Assessing opportunities: By identifying trade-offs that must be
handled and gaps that must be filled, a data science opportunity
assessment makes it possible to quickly determine the most valuable data
science prospects for the business.

5. Build better products: The target market can be reached with a
better product by employing data science in business in one of two
ways: either by customizing a product or service to make it more
individualized or by offering a unique way to use the product or
service.

6. Right employee selection: Through data mining and the processing of
internal applications and resumes, data science may help hiring staff
make decisions more quickly and accurately. HR can sort through all of
the information drawn from different job portals and database providers
to identify applicants who best meet the needs of the company. This
saves time and helps select the best talent.

7. Intrinsic concepts: During promotional drives, new product
development, or content selection, many of the constraints can be
eliminated using data science. Data analytics allows for a
comprehensive view of the clients and provides a better
understanding of what they need and how to meet their needs.

8. Data science and blockchain: Because a blockchain is decentralized,
data scientists can work with its data, and take decisions based on it,
directly from their own devices. The use of decentralized ledgers also
makes it easier to maintain massive amounts of data.

Ques 2. What is the difference between data, information, knowledge, and
wisdom? Discuss with definitions and characteristics.
Ans: The Data-Information-Knowledge-Wisdom (DIKW) pyramid explains
the progression of raw data to valuable insights. It provides a framework
to discuss the level of meaning and utility within data. Each level of the
pyramid builds on lower levels, and to effectively make data-driven
decisions, we need all four levels.
Each step up the pyramid answers questions about the initial data and
adds value to it. The more questions we answer, the higher we move up
the pyramid. In other words, the more we enrich our data with meaning
and context, the more knowledge and insights we get out of it. At the top
of the pyramid, we have turned the knowledge and insights into a
learning experience that guides our actions.

1. Data:
- Definition: Data refers to raw, unorganized facts or symbols that represent
events, measurements, or observations. It lacks context and meaning on its
own.
- Characteristics:
1. Objective: Data is objective and neutral, devoid of interpretation or
analysis.
2. Granularity: Data can be granular, consisting of individual data points,
or aggregated into larger datasets.
3. Quantity: Data can be quantitative (numerical) or qualitative
(descriptive).
Examples: Numbers, symbols, text, images, sounds, etc.

2. Information:
- Definition: Information is processed, organized, and structured data that
provides context, meaning, and relevance. It answers specific questions and
aids decision-making.
- Characteristics:
1. Contextual: Information is contextualized and relevant to a specific
purpose or context.
2. Interpretation: Information involves interpretation and analysis of data
to extract meaning.
3. Actionable: Information is actionable, enabling informed decision-
making or understanding.
Examples: Reports, summaries, charts, graphs, statistics, etc.
3. Knowledge:
- Definition: Knowledge is derived from information through understanding,
experience, and expertise. It represents the assimilation and internalization
of information, leading to insights, skills, and capabilities.
- Characteristics:
1. Personal: Knowledge is often personal and subjective, shaped by
individual experiences, beliefs, and perspectives.
2. Dynamic: Knowledge evolves over time through learning, reflection, and
adaptation.
3. Tacit and Explicit: Knowledge can be tacit (internalized, intuitive) or
explicit (articulated, codified).
4. Transferable: Knowledge can be shared, transferred, and applied across
contexts.
Examples: Expertise, skills, know-how, best practices, theories, etc.
4. Wisdom:
- Definition: Wisdom is the highest level of understanding that goes beyond
knowledge and involves discernment, judgment, and ethical considerations.
It involves the application of knowledge and experience to make sound
decisions in complex situations.
- Characteristics:
1. Reflective: Wisdom involves reflection, introspection, and critical
thinking.
2. Ethical: Wisdom considers moral and ethical principles in decision-
making.
3. Long-term perspective: Wisdom involves considering long-term
consequences and sustainability.
4. Contextual: Wisdom recognizes the context and complexity of situations,
avoiding oversimplification.
Examples: Prudence, foresight, discernment, ethical leadership, etc.
In the context of data science, the DIKW pyramid is a hierarchical model
of four levels of information processing (data, information, knowledge,
and wisdom) that explains how raw data is transformed into actionable
insights. Data is collected, processed, and analysed to generate
information, which is then used to develop knowledge and wisdom. By
leveraging advanced analytics techniques and technologies, organizations
can extract valuable insights from their data and use them to improve
their operations, products, and services.
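The climb from data to wisdom can be illustrated with a small Python sketch. Everything in it is hypothetical: the temperature readings, the 38 °C threshold, and the suggested action are assumptions chosen only to show how each level adds meaning to the one below it.

```python
from statistics import mean

# Data: raw, context-free readings (hypothetical daily temperatures in °C).
data = [31.5, 33.0, 36.2, 38.4, 39.1]

# Information: the same data organized to answer a specific question
# ("how hot was the week, and is it getting hotter?").
information = {
    "average": mean(data),
    "maximum": max(data),
    "rising": all(later > earlier for earlier, later in zip(data, data[1:])),
}

# Knowledge: an experience-based rule that interprets the information.
# The threshold is an illustrative assumption, not an official standard.
HEAT_THRESHOLD_C = 38.0
heatwave_likely = (information["rising"]
                   and information["maximum"] >= HEAT_THRESHOLD_C)

# Wisdom: judgement about what to do. Code can only record the policy
# that people have chosen after weighing consequences.
action = ("reschedule outdoor work to early morning"
          if heatwave_likely else "no change")

print(information)
print("Heatwave likely:", heatwave_likely, "->", action)
```

The first three levels are mechanical and easy to automate. The last step is the one that depends on human judgement, which is why the sketch only encodes a decision rather than deriving it.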
Ques 3. What is Python? Why is it so popular in the data science space? Name a
few editors used for working with Python.
Ans: Python is a high-level, interpreted programming language known for
its simplicity, readability, and versatility. It was created by Guido van
Rossum and first released in 1991. Python emphasizes code
readability and allows programmers to express concepts in fewer lines
of code compared to other languages, making it particularly suitable
for rapid development and prototyping.
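As a small illustration of that brevity, the sketch below counts word frequencies in a sentence using only the standard library. The sample text is made up for the example.

```python
from collections import Counter

# Hypothetical sample text used only for illustration.
text = "data drives decisions and good data drives growth"

# Count how often each word occurs, then keep the two most frequent words.
word_counts = Counter(text.split())
top_two = word_counts.most_common(2)

print(top_two)
```

Splitting the text, counting the words, and ranking them takes two statements, and the code reads close to the plain-English description of the task.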
Python's popularity in the data science space can be attributed to several
factors:
1. Python's syntax is straightforward and easy to learn, even for beginners. Its
readability and simplicity make it accessible to individuals with diverse
backgrounds, including non-programmers and domain experts.

2. Python boasts a vast ecosystem of libraries and frameworks tailored for data
science, machine learning, and scientific computing. Popular libraries such
as NumPy, pandas, matplotlib, scikit-learn, TensorFlow, and PyTorch
provide powerful tools for data manipulation, visualization, and modeling.

3. Python has a large and active community of developers, researchers, and
enthusiasts who contribute to its growth and development. This vibrant
community fosters collaboration, knowledge sharing, and the creation of
new tools and resources.

4. Python integrates well with other programming languages and
technologies, allowing data scientists to leverage existing code and
infrastructure. It can be easily integrated with databases, web frameworks,
and big data processing tools, facilitating end-to-end data analysis
pipelines.
5. Python's versatility enables data scientists to tackle a wide range of tasks,
from data exploration and visualization to advanced machine learning
algorithms. Its scalability allows for handling both small-scale and large-
scale data projects effectively.

Editors commonly used for working with Python are:
- Jupyter Notebook / JupyterLab
- Colaboratory / Google Colab
- Visual Studio Code (VS Code)
- PyCharm
- Spyder

1. Jupyter Notebook is an open-source web application that allows users to
create and share documents containing live code, equations, visualizations,
and narrative text.

2. VS Code is a lightweight and customizable code editor developed by
Microsoft. It supports Python through extensions, which provide features
like IntelliSense, debugging, and syntax highlighting.

3. Colaboratory, also referred to as Google Colab, is a free cloud-based tool
that lets users write, run, and share Python code in a hosted Jupyter
Notebook environment. Its purpose is to make machine learning (ML) and
data science work easier by giving users access to free GPU resources in
a hosted virtual environment.

4. PyCharm is a powerful integrated development environment (IDE)
specifically designed for Python development. It provides advanced features
such as code completion, debugging, version control integration, and
support for web development frameworks.

5. Spyder is an open-source IDE for scientific computing and data analysis. It
features an interactive development environment with support for data
visualization, debugging, and integration with scientific libraries like
NumPy and matplotlib.

Unit-2

Ques 1. What is information technology?
Ans: Information technology (IT) is the field concerned with managing and
processing information for organizations and companies. The term is now
used broadly for any form of digital communication and technology. It
encompasses a broad range of technologies, applications, and practices
that facilitate the management and processing of information in
organizations and society at large. Its key components include:
1. Hardware: This includes physical devices such as computers, servers,
storage devices, networking equipment, and peripherals (e.g., monitors,
printers) that are used to process and store data.
2. Software: Software refers to the programs, applications, and operating
systems that enable users to perform specific tasks on computers and other
digital devices. This includes everything from productivity software (e.g.,
word processors, spreadsheets) to specialized business applications and
system software.
3. Networking: Networking technologies enable communication and data
exchange between computers and other devices within a network. This
includes local area networks (LANs), wide area networks (WANs), the
internet, and various networking protocols and technologies (e.g., Ethernet,
TCP/IP).
4. Data Management: Data management involves the organization, storage,
retrieval, and protection of data assets within an organization. This includes
databases, data warehouses, data analytics tools, and data security
measures to ensure the integrity, confidentiality, and availability of data.
5. Cybersecurity: Cybersecurity focuses on protecting information systems,
networks, and data from unauthorized access, cyberattacks, and data
breaches. This includes measures such as firewalls, encryption, access
controls, and security policies and procedures.
6. Cloud Computing: Cloud computing involves the delivery of computing
services (e.g., servers, storage, databases, software) over the internet on a
pay-as-you-go basis. It enables organizations to access computing resources
on-demand without the need for extensive hardware investments.
7. Mobile Computing: Mobile computing technologies enable users to
access information and applications on mobile devices such as smartphones
and tablets. This includes mobile apps, mobile operating systems, and
mobile-friendly websites.
8. Emerging Technologies: Information technology is constantly evolving,
with new technologies such as artificial intelligence (AI), machine learning,
the Internet of Things (IoT), blockchain, and augmented reality (AR)
reshaping the IT landscape and offering new opportunities for innovation.
Ques 2. What do you understand by cloud infrastructure? Discuss the utility of
the cloud in enhancing business performance.
Ans: Cloud infrastructure refers to the hardware and software components that
are necessary to support cloud computing services. Cloud computing is the
on-demand delivery of IT resources (physical or virtual servers,
data storage, networking capabilities, application development tools,
software, AI-powered analytic tools, etc.) over the Internet with
pay-per-use pricing. This model offers customers greater flexibility and scalability
compared to traditional on-premises infrastructure.
Organizations of every type, size, and industry are using the cloud for a wide
variety of use cases, such as data backup, disaster recovery, email, virtual
desktops, software development and testing, big data analytics, and
customer-facing web applications. For example, healthcare companies are
using the cloud to develop more personalized treatments for patients.
Financial services companies are using the cloud to power real-time fraud
detection and prevention. Video game makers are using the cloud to deliver
online games to millions of players around the world.
The utility of cloud infrastructure in enhancing business performance can
be summarized in several key aspects:
1. Scalability: Cloud infrastructure allows businesses to scale their computing
resources up or down dynamically based on demand. This scalability enables
organizations to handle fluctuations in workload more efficiently, ensuring
optimal performance during peak periods while avoiding unnecessary costs
during periods of low demand.
2. Flexibility: Cloud infrastructure offers flexibility in resource allocation,
allowing businesses to tailor their computing environment to their specific
requirements. Organizations can easily deploy and configure virtual machines,
storage, and networking resources to meet changing business needs without
the constraints of physical infrastructure.
3. Cost Efficiency: Cloud infrastructure follows a pay-as-you-go model, where
businesses only pay for the resources they consume. This cost-effective pricing
structure eliminates the need for upfront capital expenditures on hardware
and reduces operational costs associated with maintenance, upgrades, and
staffing.
4. Accessibility and Collaboration: Cloud infrastructure enables remote access
to computing resources and applications from anywhere with an internet
connection. This accessibility promotes collaboration among geographically
dispersed teams, improves productivity, and enables employees to work more
flexibly.
5. Reliability and Redundancy: Cloud infrastructure providers typically offer
robust data centers with high levels of redundancy and reliability. This
ensures that business-critical applications and data remain available and
accessible, even in the event of hardware failures, natural disasters, or other
disruptions.
6. Security: Cloud infrastructure providers implement stringent security
measures to protect data and infrastructure from unauthorized access,
cyberattacks, and data breaches. These measures include encryption, identity
and access management, intrusion detection, and regular security audits,
providing businesses with peace of mind regarding the safety and integrity of
their data.
7. Innovation and Agility: Cloud infrastructure fosters innovation by providing
access to cutting-edge technologies and services, such as artificial intelligence,
machine learning, big data analytics, and Internet of Things (IoT) platforms.
This enables businesses to experiment with new ideas, develop innovative
products and services, and gain a competitive edge in the marketplace.
Overall, cloud infrastructure offers numerous benefits for businesses seeking
to enhance their performance, agility, and competitiveness in today's digital
economy. By leveraging the scalability, flexibility, cost efficiency, and advanced
capabilities of cloud computing, organizations can accelerate their digital
transformation efforts and achieve greater success in meeting the evolving
needs of customers and stakeholders.
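The scalability and cost-efficiency points above can be made concrete with a little arithmetic. The sketch below compares paying for peak capacity around the clock with paying only for the server-hours actually used. The demand profile and the price per server-hour are hypothetical, and the same price is assumed for both models so that only the provisioning strategy differs; real pricing is more varied.

```python
# Hypothetical hourly demand (servers needed) over one day:
# quiet nights, a midday peak, and a moderate evening.
hourly_demand = [2] * 8 + [6] * 4 + [10] * 4 + [6] * 4 + [2] * 4

# Assumed price of one server-hour, identical for both models.
PRICE_PER_SERVER_HOUR = 0.10

# Fixed capacity: provision for the peak and pay for it every hour.
fixed_cost = max(hourly_demand) * len(hourly_demand) * PRICE_PER_SERVER_HOUR

# Pay-as-you-go: scale with demand and pay only for what is used.
on_demand_cost = sum(hourly_demand) * PRICE_PER_SERVER_HOUR

savings_ratio = 1 - on_demand_cost / fixed_cost

print(f"Fixed capacity: {fixed_cost:.2f}")
print(f"Pay-as-you-go:  {on_demand_cost:.2f}")
print(f"Saving:         {savings_ratio:.0%}")
```

The saving comes entirely from the gap between peak and average demand. A workload that runs near its peak all day would see little benefit from this effect alone.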
Ques 3. What are different types of cloud services and delivery models?
Ans: Cloud computing offers a variety of services and delivery models to meet
the diverse needs of users and organizations. The main types of cloud
services and delivery models include:
1. Infrastructure as a Service (IaaS):
- IaaS provides virtualized computing resources over the internet,
including servers, storage, networking, and virtualization
infrastructure. Users can provision and manage these resources on-
demand, scaling them up or down as needed. Examples of IaaS
providers include Amazon Web Services (AWS) EC2, Microsoft Azure
Virtual Machines, and Google Cloud Compute Engine.
2. Platform as a Service (PaaS):
- PaaS offers a complete development and deployment environment in
the cloud, including hardware, operating systems, middleware,
development tools, and runtime environments. It allows developers
to build, deploy, and manage applications without worrying about
underlying infrastructure complexities. Examples of PaaS offerings
include Google App Engine, Microsoft Azure App Service, and Heroku.
3. Software as a Service (SaaS):
- SaaS delivers software applications over the internet on a
subscription basis, eliminating the need for users to install, maintain,
and update software locally. Applications are hosted and managed by
the service provider, who handles infrastructure, security, and
maintenance tasks. Common examples of SaaS applications include
email services like Gmail, productivity suites like Microsoft Office
365, and customer relationship management (CRM) systems like
Salesforce.
4. Function as a Service (FaaS) / Serverless Computing:
- FaaS allows developers to deploy individual functions or pieces of
code in the cloud without managing the underlying infrastructure.
Providers automatically scale resources to handle incoming requests,
charging users only for the compute resources consumed during
function execution. Examples of FaaS platforms include AWS Lambda,
Azure Functions, and Google Cloud Functions.
5. Storage as a Service (STaaS):
- STaaS offers cloud-based storage solutions where users can store and
access data over the internet. It eliminates the need for maintaining
on-premises storage infrastructure and provides scalability,
durability, and accessibility for storing large volumes of data.
Examples of STaaS providers include AWS S3 (Simple Storage
Service), Azure Blob Storage, and Google Cloud Storage.
6. Database as a Service (DBaaS):
- DBaaS provides managed database solutions in the cloud, offering
users access to scalable, reliable, and fully managed database
instances without the need for infrastructure management. Providers
handle tasks such as provisioning, backups, security, and
performance optimization. Examples of DBaaS offerings include AWS
RDS (Relational Database Service), Azure SQL Database, and Google
Cloud SQL.
These cloud services and delivery models offer varying levels of
abstraction and management responsibilities, allowing organizations to
choose the most suitable option based on their specific requirements,
budget, and expertise. They enable businesses to leverage the benefits of
cloud computing, including scalability, flexibility, cost efficiency, and
innovation, to accelerate digital transformation initiatives and drive
business growth.
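The "varying levels of abstraction and management responsibilities" can be pictured as a line drawn through the application stack: everything below the line is managed by the provider, everything above it by the customer. The sketch below encodes that idea for the three main models. The five-layer stack and the exact cut-off points are a common simplification assumed for this example, not a provider's official definition.

```python
# Layers of a typical application stack, from lowest to highest.
STACK = ["hardware", "virtualization", "operating system",
         "runtime", "application"]

# Highest layer the provider manages under each model (simplified).
PROVIDER_MANAGES_UP_TO = {
    "IaaS": "virtualization",
    "PaaS": "runtime",
    "SaaS": "application",
}


def responsibilities(model):
    """Split the stack into provider-managed and customer-managed layers."""
    cut = STACK.index(PROVIDER_MANAGES_UP_TO[model]) + 1
    return {"provider": STACK[:cut], "customer": STACK[cut:]}


for model in PROVIDER_MANAGES_UP_TO:
    split = responsibilities(model)
    print(f"{model}: customer manages {split['customer'] or 'nothing in the stack'}")
```

Moving from IaaS to SaaS shifts the line upward: the customer gives up control over more layers in exchange for less management work.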
