
Week 2&3


CATEGORIZATION OF BIG DATA TOOLS IN DATA ANALYSIS

Instructor: Vicky (Yanjie) Tao


OUTLINE

• Popular Spreadsheet programs and tools
• Key Statistical programs in the data field
• BI and Visualization tools
• Key Coding Languages
• Popular Cloud based services and tools
POPULAR SPREADSHEET PROGRAMS AND TOOLS

• Microsoft Excel
• Google Sheets
• Zoho Sheet
• LibreOffice
• Apple Numbers

What is a spreadsheet?
WHAT IS A SPREADSHEET?

A spreadsheet is a computer program that can capture, display and manipulate data arranged in
rows and columns.

A single spreadsheet can be used as a worksheet to compile data for a purpose, or multiple
sheets can be combined to create an entire workbook.
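The worksheet/workbook idea above can be sketched in a few lines of Python. This minimal model assumes a workbook is a dictionary of sheets (the sheet names "Q1" and "Q2" and all figures are invented), where each sheet maps cell addresses such as "B2" to values. The helper `column_sum` is a hypothetical stand-in for a formula like =SUM(B2:B3), not an Excel API.

```python
# Hypothetical workbook: sheet name -> {cell address -> value}
workbook = {
    "Q1": {"A1": "Item", "B1": "Cost", "A2": "Rent", "B2": 1200, "A3": "Utilities", "B3": 300},
    "Q2": {"A1": "Item", "B1": "Cost", "A2": "Rent", "B2": 1200, "A3": "Utilities", "B3": 350},
}

def column_sum(sheet, column, first_row, last_row):
    """Rough equivalent of a spreadsheet formula such as =SUM(B2:B3)."""
    return sum(sheet[f"{column}{row}"] for row in range(first_row, last_row + 1))

q1_total = column_sum(workbook["Q1"], "B", 2, 3)
q2_total = column_sum(workbook["Q2"], "B", 2, 3)

# Combining sheets, like the cross-sheet formula =SUM(Q1!B2:B3, Q2!B2:B3)
year_to_date = q1_total + q2_total
print(q1_total, q2_total, year_to_date)  # 1500 1550 3050
```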

WHY SPREADSHEETS?

• Business Data Storage
• Accounting and Calculation Uses
• Budgeting and Spending Help
• Assisting with Data Exports
• Data Sifting and Cleanup
• Generating Reports and Charts
• Business Administrative Tasks

Can you provide more?
MICROSOFT EXCEL

A spreadsheet is a computer application for the computation, organization, analysis and storage of data in tabular form (rows and columns).

Microsoft Excel is a spreadsheet program developed by Microsoft for Windows, macOS, Android, iOS and iPadOS. It features calculation capabilities, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications (VBA).
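A pivot table groups rows by one field, spreads another field across columns, and aggregates a value in each cell. The sketch below imitates a "sum of amount by region and product" pivot in plain Python. The sales records and the `pivot_sum` helper are hypothetical; Excel's own pivot tables are built through its interface, not with code like this.

```python
from collections import defaultdict

# Hypothetical sales records, one per spreadsheet row: (region, product, amount)
rows = [
    ("East", "Widgets", 120),
    ("East", "Gadgets", 80),
    ("West", "Widgets", 200),
    ("East", "Widgets", 30),
    ("West", "Gadgets", 50),
]

def pivot_sum(records):
    """Sum amounts with regions as pivot rows and products as pivot columns."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, amount in records:
        table[region][product] += amount
    return {region: dict(cols) for region, cols in table.items()}

pivot = pivot_sum(rows)
products = sorted({product for _, product, _ in rows})

# Print the pivot as a small tab-separated grid
print("Region", *products, sep="\t")
for region in sorted(pivot):
    print(region, *(pivot[region].get(p, 0) for p in products), sep="\t")
```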
GOOGLE SHEETS

Google Sheets is a spreadsheet application included in the free, web-based Google Docs Editors suite offered by Google.

Google Sheets is available as a web application; as a mobile app for Android, iOS, Microsoft Windows and BlackBerry OS; and as a desktop application on Google's ChromeOS.
ZOHO SHEET

Zoho Sheet is completely web-based, so collaboration is smooth and intuitive, with real-time co-authoring, chat, version history for individual cells, and sharing permissions.

The ability to link external data, such as a CSV file or RSS feed, is a handy feature for businesses. Excel can do this too, but the process is more complicated.
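Linking external data essentially means reading rows from an outside source and turning them into table rows. A minimal Python sketch with the standard `csv` module follows. The CSV text is an invented stand-in for a real file or feed, and `load_linked_data` is a hypothetical helper, not a Zoho Sheet or Excel feature.

```python
import csv
import io

# Stand-in for an external CSV file or feed; in practice this text would be
# read from a file path or URL (hypothetical here).
external_csv = """date,visitors
2024-01-01,120
2024-01-02,95
2024-01-03,143
"""

def load_linked_data(text):
    """Parse CSV text into a list of row dictionaries, converting counts to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"date": row["date"], "visitors": int(row["visitors"])} for row in reader]

data = load_linked_data(external_csv)
print(len(data), sum(row["visitors"] for row in data))  # 3 358
```

Re-running `load_linked_data` on fresh text is the rough equivalent of refreshing a linked range.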
LIBREOFFICE SPREADSHEET

LibreOffice is a free and open-source office productivity software suite, a project of The Document Foundation. LibreOffice’s spreadsheet program is called Calc.

A great deal of thought has been put into helping new users get started. The DataPilot feature (called Pivot Table in current versions) makes it easy to bring in data from a company database, while the Function Wizard makes it easy to build complicated functions.
APPLE NUMBERS

Apple Numbers is a spreadsheet application developed by Apple Inc. as part of the iWork
productivity suite alongside Keynote and Pages. Numbers is available for iOS and macOS High
Sierra or newer. A key factor that makes Numbers one of the best spreadsheet software options
is its clean, modern interface.

Pros: It delivers powerful data security, enables offline collaboration, and offers sophisticated
design for charts and graphs.
Cons: It’s only compatible with Apple products, and it’s less user-friendly than other spreadsheet software options.
STATISTICAL PROGRAMS/TOOLS IN THE DATA FIELD

• R (R Foundation for Statistical Computing)
• SPSS (IBM)
• MATLAB (MathWorks)
• SAS (Statistical Analysis System)
• Microsoft Excel

Why do we need statistical tools?
WHAT & WHY STATISTICAL TOOLS?

Statistical methods and analytical tools help collect and analyze samples of data to identify
patterns and trends. These insights help make predictions that can be useful in making strategic
business decisions.

Statistical analysis tools are also effective at analyzing, describing, summarizing, and comparing data from different organizations in the same industry. Many digital analytical tools automate this process using specialized software and algorithms.

Companies can create better data collection processes, surveys, and tests to make data-based
decisions using statistical tools. These analytical tools also help align different functions of a
business, set realistic goals, and measure progress effectively.
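As a small illustration of describing and summarizing data, the sketch below applies Python's built-in `statistics` module to an invented list of monthly sales figures. Dedicated tools such as R, SPSS or SAS go much further, but these basic summary measures are the same.

```python
import statistics

# Hypothetical monthly sales figures for one store (illustrative data only)
sales = [210, 250, 190, 300, 260, 240]

summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
    "min": min(sales),
    "max": max(sales),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```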
R/RSTUDIO

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques and is highly extensible.

RStudio is an integrated development environment (IDE) for R. It combines the comprehensive, state-of-the-art statistical package R with a friendlier user interface.
SPSS

SPSS (Statistical Package for the Social Sciences) is one of the most widely used statistical software packages in human behavior research.

SPSS makes it easy to compile descriptive statistics, run parametric and non-parametric analyses, and produce graphical depictions of results through its graphical user interface (GUI).

SPSS also lets users create scripts to automate analyses or to carry out more advanced statistical processing.
MATLAB

MATLAB is a multi-paradigm programming language and numeric computing environment developed by MathWorks. It is widely used by engineers and scientists.

MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages.
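In MATLAB, matrix operations are built in: `A * B` multiplies two matrices and `A'` transposes a real matrix. As a rough illustration of what those operators compute, here is a plain-Python sketch with hypothetical helpers `matmul` and `transpose`. It is not how MATLAB implements them, and real numeric work in Python would normally use a library such as NumPy.

```python
def transpose(m):
    """Swap rows and columns (MATLAB: A' for a real matrix)."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows (MATLAB: A * B)."""
    if len(a[0]) != len(b):
        raise ValueError("inner matrix dimensions must agree")
    bt = transpose(b)  # columns of b become rows, so each cell is a row-by-row dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(transpose(A))  # [[1, 3], [2, 4]]
```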
SAS

SAS is a statistical analysis software platform that lets users either work through a GUI or write scripts for more advanced analyses.

It is a premium solution that is widely used in business, healthcare, and human behavior research.

Its coding approach can be a difficult adjustment for those not used to writing scripts.
BI AND VISUALIZATION TOOLS

• Tableau
• Power BI
• Infogram
WHAT IS BI?

Business intelligence (BI) tools collect, process, and analyze large amounts of structured and
unstructured data from both internal and external systems.

The tools can perform functions such as data mining, data visualization, performance
management, analytics, reporting, text mining, predictive analytics, and much more.
WHY BI TOOLS?

BI tools can help your business take smart, agile steps toward accomplishing bigger goals.

• Centralized data
All your data, in one place.
• Self-sufficiency
Data no longer just belongs to your company’s IT team.
• Make predictions
With access to so much data from the past and present, employees can make evidence-based decisions.
• Automatic reports
Instead of entering data manually into Excel spreadsheets or toggling between different tools, you can let many BI tools generate reports automatically.
• Reduced business costs
TABLEAU

Tableau is a powerful data visualization tool and one of the fastest growing in the Business Intelligence industry.

It simplifies raw data into an easily understandable format. Tableau helps present data in a way that professionals at any level of an organization can understand. It also allows non-technical users to create customized dashboards.

• Data Blending
• Real-time analysis
• Collaboration of data
POWER BI

Microsoft Power BI is a data visualization platform used primarily for business intelligence purposes.

Power BI itself is composed of several interrelated applications: Power BI Desktop, Pro, Premium, Mobile, Embedded, and Report Server.

Power BI is also a part of Microsoft’s Power Platform, which includes Power Apps, Power Pages, Power Automate, and Power Virtual Agents.
INFOGRAM

Infogram is a fully featured drag-and-drop visualization tool that allows even non-designers to create effective data visualizations for marketing reports, infographics, social media posts, maps, dashboards, and more.

Finished visualizations can be exported in several formats: .PNG, .JPG, .GIF, .PDF, and .HTML. Interactive visualizations are also possible, which makes them well suited to embedding in websites or apps.

Infogram also offers a WordPress plugin that makes embedding visualizations even easier for WordPress users.
https://youtu.be/Wgg0My-rZnc
KEY CODING LANGUAGES

• Python
• SQL
• R
• Java
• Scala
WHAT ARE PROGRAMMING LANGUAGES?

Programming languages give instructions to computers.

A high-level programming language is typically more user-friendly and easier to read and write than a low-level programming language. High-level languages require a compiler or an interpreter to translate them into machine code.
(Python, JavaScript, Java, PHP, C++, etc.)

Programs written in a low-level programming language typically run much faster. The processor executes machine code directly, without an interpreter, and assembly language maps almost one-to-one onto machine code through an assembler.
(Assembly and machine code)

How About C? (High-level or Low-level?)
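To see what translation looks like for an interpreted high-level language, the sketch below uses Python's standard `dis` module to list the bytecode instructions that CPython compiles a small function into. CPython bytecode is an intermediate form executed by the interpreter, not native machine code. The function `add_tax` is a made-up example, and the exact instruction names vary between Python versions, so no output is shown.

```python
import dis

def add_tax(price, rate):
    """A one-line high-level function (hypothetical example)."""
    return price * (1 + rate)

# Print the lower-level bytecode instructions CPython generated for add_tax
dis.dis(add_tax)

# The same instructions as data: just the operation names
opnames = [ins.opname for ins in dis.get_instructions(add_tax)]
print(opnames)
```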


WHY PROGRAMMING LANGUAGES IN THE DATA FIELD?

• Saving time
• Easily reproducing and sharing the analysis
• Clarifying the steps of the analysis


PYTHON

Python is one of the best programming languages for beginners in data analysis because it is easy to use. Python is an interpreted programming language: programs are run by a Python interpreter.

It is a general-purpose, inherently object-oriented language that also supports other paradigms, such as procedural, functional, and structured programming.
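The same small task, totaling a list of prices, can be written in each of these styles. The sketch below is illustrative only: the prices are invented and kept in integer cents so all three versions return exactly the same number.

```python
from functools import reduce

# Hypothetical item prices in cents (integers avoid floating-point rounding)
prices = [1999, 549, 350, 1200]

# Procedural style: an explicit loop that updates an accumulator step by step
def total_procedural(values):
    total = 0
    for v in values:
        total += v
    return total

# Functional style: fold the list into one value with reduce and a lambda
def total_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0)

# Object-oriented style: the data and the behavior that uses it live in a class
class Cart:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(self.values)

print(total_procedural(prices), total_functional(prices), Cart(prices).total())
# 4098 4098 4098
```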

Its packages also make data processing easy and suit a range of tasks, including deep learning algorithms, natural language processing, and scientific computing.
JAVA

Java, a compiled language, is generally faster than Python. Java source code is compiled to bytecode, which can run on any platform that has a Java Virtual Machine (JVM).
Java is one of the most popular languages for data analysis tasks.

Python or Java: Which is better for machine learning?


Python is often a better option than Java for artificial intelligence (AI), machine learning (ML), and data analysis. Many developers prefer it over Java because it offers accessibility, ease of use, and simplicity.
Java may be faster, but Python is easier to use overall for machine learning.
SCALA

Scala is a programming language with both functional and object-oriented approaches. This multi-paradigm language runs on the JVM, which is one reason many data analysts prefer it, especially those who work with high-volume data sets.

Scala works well with Apache Spark, the cluster computing framework (Spark itself is written largely in Scala). This makes it easy to work with massive collections of data.

Scala compiles to Java bytecode, which lets it interoperate with Java code. It offers a wide variety of features for both data analysts and data scientists.
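Spark spreads work across a cluster by splitting data into partitions, processing each partition independently, and then combining the partial results. In Spark's Scala RDD API this is typically written with operations such as `flatMap`, `map` and `reduceByKey`. The single-machine Python sketch below only imitates that map-then-reduce pattern with the standard library; it does not use Spark, and the partition contents are invented.

```python
from collections import Counter
from functools import reduce

# Hypothetical input split into partitions, as a cluster would split a large file
partitions = [
    ["big data tools", "data analysis"],
    ["spark scala data", "big clusters"],
]

def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge_counts(a, b):
    """Reduce step: combine partial counts from two partitions."""
    return a + b

partial = [map_partition(p) for p in partitions]  # on a cluster, these would run in parallel
totals = reduce(merge_counts, partial, Counter())
print(totals.most_common(2))  # [('data', 3), ('big', 2)]
```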
POPULAR CLOUD BASED SERVICES AND TOOLS

• Amazon Web Services (AWS)
• Microsoft Azure
• Google Cloud
WHAT IS CLOUD/CLOUD SERVICES?

Cloud computing is the on-demand availability of computer system resources, especially data
storage and computing power, without direct active management by the user.

Large clouds often have functions distributed over multiple locations, each of which is a data
center.

There are three main categories of cloud services:

• Software as a Service (SaaS) – Microsoft 365
• Platform as a Service (PaaS) – Microsoft Azure https://youtu.be/1ERdeg8Sfv4
• Infrastructure as a Service (IaaS) – AWS https://youtu.be/4OO77HFcCUs
WHY DO WE NEED THE CLOUD?

• File storage
You can store all types of information in the cloud, including files and email.
(Google Drive, Dropbox, etc.)

• File sharing
The cloud makes it easy to share files with several people at the same time.
(Flickr, iCloud Photos, etc.)

• Back up data
Use the cloud to back up and protect your files.
(Carbonite)
AMAZON WEB SERVICES (AWS)

AWS provides over 200 cloud computing services. The services include compute, storage,
analytics, IoT, AI/ML, and database services.

With over 33% market share in cloud infrastructure services, AWS is the most popular IaaS provider today. It is also popular because it offers virtually unlimited resources and services across the entire cloud spectrum.

Other Infrastructure-as-a-Service examples include DigitalOcean, Microsoft Azure, Alibaba Cloud, Google Compute Engine, Vultr, IBM Cloud, Oracle Cloud Infrastructure, Linode, etc.
MICROSOFT AZURE

Microsoft Azure PaaS is a development and deployment environment that supports everything from simple cloud-based apps to complex, cloud-enabled applications.

From DevOps to IoT to AI, Azure offers an array of trusted elements that can help facilitate building cloud-enabled services or custom apps.

Azure offers five main PaaS service elements: Web Apps, Mobile Apps, Logic Apps, Functions, and WebJobs.
GOOGLE CLOUD

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs
on the same infrastructure that Google uses internally for its end-user products, such as Google
Search, Gmail, Google Drive, and YouTube.

Alongside a set of management tools, it provides a series of modular cloud services including
computing, data storage, data analytics and machine learning.

Google Cloud offerings include IaaS, PaaS, and SaaS.


THE END
