
History of Information & Communication Technologies (ICT)


History of Information &

Communication Technologies (ICT)


Outline
• History of Information and Communication Technologies (ICT)
• History of Computers
• History of Software
• History of the Internet
• Impact of ICT
• Economic Impact
• Social Impact

https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0061
https://www.bcg.com/publications/2012/retail-consumer-products-digitals-disruption
First generation computers
(1940-1956)
• The first computers used vacuum tubes for circuitry and
magnetic drums for memory.
• They were often enormous, taking up entire rooms.
• First generation computers relied on machine language.
• They were very expensive to operate and, in addition to
using a great deal of electricity, generated a lot of heat,
which was often the cause of malfunctions.
• The UNIVAC and ENIAC computers are examples of first-
generation computing devices.
Second generation computers
(1956-1963)
• Transistors replaced vacuum tubes and
ushered in the second generation of
computers.
• Second-generation computers moved from
cryptic binary machine language to symbolic (assembly) languages.
• High-level programming languages were also
being developed at this time, such as early
versions of COBOL and FORTRAN.
• These were also the first computers that
stored their instructions in their memory.
Third generation computers
(1964-1973)
• The development of the integrated circuit was the hallmark of the third
generation of computers.
• Transistors were miniaturized and placed on silicon chips, called
semiconductors.
• Instead of punched cards and printouts, users interacted with third
generation computers through keyboards and monitors and interfaced
with an operating system.
• Allowed the device to run many different applications at one time.
• The 1960s saw the rise of operating systems
– an operating system is a collection of programs that manage peripheral devices and other
resources
– allowed for time-sharing, where users share a computer by swapping jobs in and out
– as computers became affordable to small businesses, specialized programming languages
were developed: Pascal (1971, Wirth), C (1972, Ritchie)
Fourth generation computers
(1973-1985)
• The microprocessor brought the fourth
generation of computers, as thousands of
integrated circuits were built onto a single
silicon chip.
• The Intel 4004 chip, developed in 1971, located
all the components of the computer, from the
central processing unit and memory to
input/output controls, on a single chip.
• Fourth generation computers also saw the
development of GUIs, the mouse and
handheld devices.
Fifth generation computers
(1985 and beyond)
• High-end machines (e.g. servers) can have multiple
CPUs.
• Fifth generation computing devices, based on artificial
intelligence, are still in development, though there are
some applications, such as voice recognition.
• The use of parallel processing and superconductors is
helping to make artificial intelligence a reality.
• The goal of fifth-generation computing is to develop
devices that respond to natural language input and
are capable of learning and self-organization.
Technology Trends – Moore’s Law
Computing power doubles and its price halves roughly every 18
months
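The rule of thumb above can be turned into a quick calculation. This is a minimal sketch that assumes a clean 18-month doubling (and halving) period, as stated on the slide; real hardware trends are less regular, and the function names are illustrative only.

```python
def relative_power(months: float, doubling_period: float = 18.0) -> float:
    """Computing power relative to month 0, doubling every `doubling_period` months."""
    return 2.0 ** (months / doubling_period)


def relative_price(months: float, halving_period: float = 18.0) -> float:
    """Price of a fixed amount of computing relative to month 0."""
    return 0.5 ** (months / halving_period)


# Tabulate growth over a few multiples of the 18-month period.
for years in (0, 3, 6, 9):
    months = years * 12
    print(f"{years} years: power x{relative_power(months):.0f}, "
          f"price x{relative_price(months):.4f}")
```

Under this assumption, three years corresponds to two doubling periods, so power grows fourfold while the price drops to a quarter.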
Price of Computing
Quantum Computing
• Computation with coherent atomic-scale dynamics.
• The behavior of a quantum computer is governed by the laws of quantum
mechanics.
• In quantum systems possibilities count, even if they never happen!
• Each of exponentially many possibilities can be used to perform a
part of a computation at the same time.
Beam Splitter
• Half of the photons leaving the light source arrive at detector A;
• the other half arrive at detector B.
DNA Computing
• DNA computing is utilizing the
property of DNA for massively parallel
computation.
• With an appropriate setup and
enough DNA, one can potentially
solve huge problems by parallel
search.
• Utilizing DNA for this type of
computation can be much faster than
utilizing a conventional computer
• Leonard Adleman proposed that the
makeup of DNA and its multitude of
possible combining nucleotides could
have application in computational
research techniques.
Software
The major types of software
• System software
– Operating systems: schedule computer events, allocate computer resources, monitor events
– Language translators: interpreters, compilers
– Drivers & utilities: routine operations, manage data, manage peripheral devices, manage security
• Application software
– Programming languages: Assembly language, FORTRAN, BASIC, PL/1, PASCAL, C, C++, Java, Python, R, etc.
– MS Office, browsers, enterprise software, etc.
(Diagram labels: Hardware, Users)
Software evolution timeline
• 1960 – COBOL
• 1964 – SABRE reservation system by IBM
• 1969 – UNIX operating system, Customer Information and Control System (CICS, transaction processing system)
• 1970-72 – PASCAL and C programming languages, SAP
• 1976 – CP/M OS for microcomputers
• 1978 – WordStar, 1st word processor
• 1979 – VisiCalc, 1st spreadsheet
• 1981 – MS-DOS for IBM PCs
• 1982 – Lotus 1-2-3 spreadsheet
• 1983 – MS Word, Oracle
• 1984 – MATLAB, a mathematical software package
• 1985 – C++ programming language
• 1987 – virtual reality
• 1988 – PeopleSoft
• 1990 – MS Windows 3.0, Photoshop
Software evolution timeline
• 1991 – Linux OS, Pretty Good Privacy (PGP)
• 1992 – WWW
• 1993 – Windows NT
• 1994 – Yahoo, Amazon
• 1995 – Java programming language, Windows 95
• 1998 – Google
• 2000 – Y2K
• 2001 – BitTorrent, Wikipedia
• 2002 – Firefox
• 2004 – Facebook
• 2005 – Hadoop, YouTube
• 2006 – Twitter
• 2007 – iPhone, iOS
• 2008 – App Store, Google Chrome
• 2010 – Stuxnet virus, iTunes, Instagram
• 2011 – Adobe Creative Cloud
• 2012 – Google Play
• 2014 – Apple Pay, HTML5
• 2015 – Apple Watch
• 2016 – 1st quantum computer
Computer Networking and The
Internet
Computer Network
• A computer network is a set of computers connected to share
resources.
• In a network, computing devices exchange data with each other over
the connections (data links) between nodes. The data links are established
over cable media such as wires or optic cables, or over wireless media such
as Wi-Fi.
• Networks allow different devices to communicate with each
other.

Network of Networks
Computer Network Topology
What Is the Internet?
• A network of networks, joining many
government, university and private
computers together and providing
an infrastructure for the use of E-mail, bulletin
boards, file archives, hypertext documents,
databases and other computational resources

• The vast collection of computer networks
which form and act as a single huge network
for transport of data and messages across
distances which can be anywhere from the
same office to anywhere in the world.
Written by William F. Slater, III 1996
President of the Chicago Chapter of the Internet Society
A Brief Summary of the Evolution of the Internet
[Timeline figure, 1945 to 2020]
• 1945 – Memex conceived
• 1948 – A Mathematical Theory of Communication
• 1958 – Silicon chip
• 1962 – First vast computer network envisioned
• 1964 – Packet switching invented
• 1965 – Hypertext invented
• 1969 – ARPANET
• 1972 – TCP/IP created
• 1984 – Internet named and goes TCP/IP
• 1989 – WWW created
• 1993 – Mosaic created
• 1995 – Age of eCommerce begins
• 2001 – Internet boom & bust
• 2003–2010 – Web 2.0
• 2010–2020 – Web 3.0
Connectivity Speed
Internet/Smartphone enabled Businesses
The future of Information Systems and
technologies: ABC
• Artificial Intelligence:
• AI and machine learning (ML) go hand in hand: vast amounts of data are fed
into an AI engine, and context (meaning) is then applied to the data in order to
understand patterns of behavior that can lead to both good and bad
events
• Big Data:
• Big Data is about collecting, processing and analyzing large amounts of data
from both traditional and digital sources in order to identify trends, patterns
and emerging paradigms to help us make sound decisions.
• Cloud Computing:
• Software has been offered as a service (i.e., cloud-based, where you are
essentially leasing rather than purchasing it) for the past 10 years.
AI based Businesses
Economic Impact of ABC
• McKinsey estimates that AI may deliver an additional economic output of
around US$13 trillion by 2030, increasing global GDP by about 1.2%
annually.
• McKinsey Global Institute estimates that Big Data could generate an
additional $3 trillion in value every year in just seven industries. Of this,
$1.3 trillion would benefit the United States.
• Cloud Computing
• The cloud added approximately $214 billion in value-added to U.S. GDP in 2017.
• The cloud added approximately 2.15 million jobs in 2017.
• In approximately 15 years since 2002, the cloud economy has nearly tripled in size.
• In 2017, wider adoption of cloud services was estimated to add cumulative total revenue
of EUR 449 billion to the EU28 GDP, with significant impact on employment and
business creation.
Converged ABC Economic Impact
Social Impact of ABCs (+ves)
• Technology has helped to bridge a global gap in access to resources
and opportunities
• Unending source of resources -- educational materials, research publications,
patents, and learning systems for people to access from their own homes
• Social interactions -- people often meet friends or dates using apps, from the
convenience and comfort of their own home
• Communication, transportation, and interactions with others
• Social Media – access to news, information, etc
Social Impact of ABCs (-ves)
• Couples: couples spend less and less time actually talking with each other, and
more time glued to their mobile devices or TV sets.
• Teachers and Students: The advent of tablets, apps and computer devices has
seen schools using mobile devices and internet gateways for assignments and
learning
• Parents and Children: Often, tablets raise children more than parents do, while a
parent’s inability to directly engage with their children often results in
disconnected children who have not developed the correct social skills to engage
with others in a healthy manner.
• Co-workers: as technology automates tasks and replaces certain non-technological
systems, personnel often do not have to interact with other workers as much as
before, and instead interact more with computer systems.
• Social Media: fake news, echo chamber
Check out the following (purely optional) links
Computers: History and Development
Personal Computers: History and Development
Computer Museum History Center

The History of Apple Computers


The History of Microsoft
Internet Pioneers: Tim Berners-Lee
Internet Pioneers: Marc Andreessen

Webopedia entry on Programming Languages


Lecture 2-1: Introduction to MIS
Topics of the Day
• What is:
• Information Technology?
• Information Systems?
• Why should I care about Technology?
• IT/IS in Society (Personal)
• IT/IS in Society (Business)
• What is coming next?
Defining Information Technology
• Information Technologies are systems of hardware and software that
capture, process, exchange, store and/or present information using
electrical, magnetic and/or electromagnetic energy.
Information Systems
• An information system is a set of interrelated components that
collect, manipulate, store data and distribute information to users
and provide a feedback mechanism to monitor performance.

• An organized combination of people, hardware, software,
communications networks, and data resources that collects data,
transforms it, and disseminates information.
Media: Live Streaming System (live Sports)

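[Figure: live sports streaming system on Google Cloud, annotated with information system functions]
• Collect: live event recording (RTMP / RTSP)
• Manipulate: streaming server on Compute Engine, recording and encoding module
• Store: segment storage in Cloud Storage
• Distribute: CDN Interconnect and Fastly CDN to browser clients and mobile / tablet clients (streaming player)
• Monitor: clients' analytics engine on BigQuery
• Feedback: decisions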
Generalizing, an Information System looks like:
[Figure: Data → Input → Process → Output → Information]
Other diagram labels: Management (Environment), Decisions, Control, Feedback, IT, IS
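The input, process, output and feedback flow above can be sketched in a few lines. This is a toy illustration only: the class name, the sales scenario, and the daily target are all hypothetical and not part of the lecture material.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SalesInformationSystem:
    """Toy information system: collect, store, process, distribute, plus feedback."""
    daily_target: float                                # hypothetical goal set by management
    stored: List[float] = field(default_factory=list)  # the "store" component

    def collect(self, sale_amount: float) -> None:
        """Input: capture one raw data item and store it."""
        self.stored.append(sale_amount)

    def process(self) -> float:
        """Process: turn stored data into information (here, a running total)."""
        return sum(self.stored)

    def distribute(self) -> str:
        """Output: present the information to users."""
        return f"Total sales so far: {self.process():.2f}"

    def feedback(self) -> str:
        """Feedback: compare the output with the goal so management can react."""
        return "on target" if self.process() >= self.daily_target else "below target"


system = SalesInformationSystem(daily_target=500.0)
for amount in (120.0, 250.0, 90.0):
    system.collect(amount)
print(system.distribute())
print(system.feedback())
```

The feedback method is what distinguishes this from a plain report: it compares the output against a goal, which is the monitoring role described in the definition of an information system.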
Information Concepts
Data: Raw unorganized facts

Information:
• A collection of facts organized in such a way that they have additional
value beyond the value of the facts themselves
• Defining and organizing relationships among data creates information

• The value of information is directly linked to how it helps decision
makers achieve their organizational goals.
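A small sketch may help separate the two concepts. The records below are made-up sample data; the point is only that organizing raw facts (grouping and totalling) yields something a decision maker can act on.

```python
from collections import defaultdict

# Data: raw, unorganized facts (hypothetical sales records: region, amount).
raw_data = [("East", 120.0), ("West", 80.0), ("East", 60.0), ("West", 40.0)]


def total_by_region(records):
    """Organize the facts by defining a relationship: amounts grouped per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


# Information: the organized view has value beyond the individual facts.
information = total_by_region(raw_data)
best_region = max(information, key=information.get)
print(information)
print(f"Strongest region: {best_region}")
```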
Information Concepts
Environment:
• Business - other functional areas
• Computer – hardware, software, other IT

Process:
A set of logically related tasks performed to achieve a defined outcome.

Knowledge:
An awareness and understanding of a set of information and ways that
information can be made useful to support a specific task or reach a
decision
Information System Components
A system is a set of elements or components that interact to
accomplish goals.

Hardware:
Computer Equipment
Software:
Computer Programs
Databases:
An organized collection of facts
Information System Components
Telecommunications:
Electronic transmission of signals for communication
• Networks: Distant electronic communication
• Internet: Interconnected Networks (WAN)
• Intranet: Internal Corporate Network (LAN)
• Extranet: Linked Intranets (MAN)
People:
Developers, users, managers, advisers, etc. of IS
Procedures:
Strategies, policies, methods, and rules.
IT/IS in Society (Personal)
• Personal Communication
• Conversations
• Messaging
• Video Coms
• Social media
• Entertainment
• Web surfing
• Video and audio
• Interactive gaming
• Day-to-Day living
• Shopping
• Remote working
• Electronic banking/ stock market
IT in Society (Business)
• Internal Communication
• Computer network
• Corporate website
• Video teleconferencing
• Messaging
• Electronic Commerce
• Video streaming
• Electronic transactions
• Online sales
• Business operations
• Enterprise Resource Planning, Supply Chain Management, etc.
• AI/ML and Analytics
• Databases, Data Warehouses, etc.
Gartner’s Top Trends for 2020
Wrap up
CLOUD COMPUTING AND GOOGLE CLOUD PLATFORM

Outline

• What are computing challenges?
• Why Cloud Computing?
• What is Cloud Computing anyway?


Introduction to Cloud Computing
Cloud Computing

1. Cloud computing is a model for enabling convenient, on-demand
network access to a shared pool of configurable and
scalable computing resources (e.g., compute, storage,
networks, and security).
2. It provides high level abstraction of computation, storage,
network, and security models as platforms.
3. Resources are rapidly provisioned with minimal management
effort.
4. It has a growing list of essential characteristics, service
models, and deployment models.
Introduction to Cloud Computing
Cloud Computing Service Model
Cloud computing service models refer to any IT services that are
provisioned, delivered and accessible globally over the internet.
• On-Demand Self Service:
– Unilateral provisioning of computing resources
• Heterogeneous Access:
– Resources can be accessed over the Internet from thin or thick clients.
• Resource Pooling:
– Resources pooled for multi-tenant model.
– Dynamic assignment of physical and virtual resources
• Measured Service:
– Metering service optimizes resource usage.
– Predictable computing expenses.
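Measured service amounts to billing for what was actually used. The sketch below assumes a simple quantity-times-unit-price model; the resource names and prices are hypothetical and do not reflect any provider's real rates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UsageRecord:
    resource: str      # hypothetical meter name, e.g. "vm-hours"
    quantity: float    # units consumed in the billing period
    unit_price: float  # hypothetical price per unit


def monthly_bill(records: Iterable[UsageRecord]) -> float:
    """Pay-per-use: the bill is the sum of metered quantity times unit price."""
    return round(sum(r.quantity * r.unit_price for r in records), 2)


usage = [
    UsageRecord("vm-hours", 200, 0.05),
    UsageRecord("gb-stored", 50, 0.02),
]
print(f"Bill for the period: {monthly_bill(usage):.2f}")
```

Because every charge traces back to a meter reading, expenses scale with usage instead of with purchased capacity.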
Introduction to Cloud Computing

Traditional IT vs Cloud Computing (IaaS vs PaaS vs SaaS)


Service Models

• Cloud Infrastructure-as-a-Service (IaaS):


• The capability provided to the consumer is to provision processing, storage, networks,
and other fundamental computing resources.
• The consumer is able to deploy and run arbitrary software, which can include operating
systems and applications.
• The consumer does not manage or control the underlying cloud infrastructure but has
control over operating systems, storage, deployed applications, and possibly limited
control of select networking components (e.g., host firewalls).


Infrastructure-as-a-Service (IaaS)
• Compute: Compute Engine, App Engine, Container Engine, Container Registry, Cloud Functions
• Storage and Databases: Cloud Storage, Cloud Bigtable, Cloud Datastore, Cloud SQL, Persistent Disk
• Networking: Cloud Virtual Network, Cloud Load Balancing, Cloud CDN, Cloud Interconnect, Cloud DNS
• Identity & Security: Cloud IAM, Cloud Resource Manager, Cloud Security Scanner, Cloud Platform Security


Service Models

• Cloud Platform-as-a-Service (PaaS):


• The capability provided to the consumer is to deploy onto the cloud infrastructure
consumer-created or acquired applications created using programming languages and
tools supported by the provider.
• The consumer does not manage or control the underlying cloud infrastructure.
• Consumer has control over the deployed applications and possibly application hosting
environment configurations.


Platform-as-a-Service (PaaS)
• Big Data: BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datalab, Cloud Pub/Sub, Genomics
• Machine Learning: Cloud Machine Learning, Vision API, Speech API, Natural Language API, Translation API, Jobs API

Service Models

• Cloud Software-as-a-Service (SaaS):


• The capability provided to the consumer is to use the provider’s applications running on a
cloud infrastructure.
• The applications are accessible from various client devices such as a web browser (e.g.,
web-based email).
• The consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, storage.
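The three service models differ mainly in which layers the consumer still controls. The mapping below is a simplified sketch based on the descriptions above; the layer names are chosen for illustration, and the optional items the definitions mention (limited networking control under IaaS, hosting configuration under PaaS) are left out.

```python
# Layers of a typical stack, from the top down (illustrative names).
LAYERS = ["applications", "operating system", "storage", "servers", "network"]

# What the consumer controls under each model, per the definitions above.
CONSUMER_CONTROLS = {
    "IaaS": {"applications", "operating system", "storage"},
    "PaaS": {"applications"},
    "SaaS": set(),
}


def managed_by_provider(model: str) -> list:
    """Layers the cloud provider manages under the given service model."""
    return [layer for layer in LAYERS if layer not in CONSUMER_CONTROLS[model]]


for model in ("IaaS", "PaaS", "SaaS"):
    print(f"{model}: provider manages {', '.join(managed_by_provider(model))}")
```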


Software-as-a-Service (SaaS)
• Management Tools: Stackdriver, Monitoring, Logging, Error Reporting, Trace, Debugger, Deployment Manager, Cloud Endpoints, Cloud Console, Cloud Shell, Cloud Mobile App, Billing API, Cloud APIs
• Developer Tools: Cloud SDK, Deployment Manager, Cloud Source Repositories, Cloud Tools for Android Studio, Cloud Tools for IntelliJ, Cloud Tools for PowerShell, Cloud Tools for Visual Studio, Google Plug-in for Eclipse, Cloud Test Lab

Final thoughts on Cloud Computing


• Cloud Computing is running workloads remotely over the internet at a
commercial provider’s data center, also known as the “public cloud” model.
• Cloud Computing is a virtualized pool of resources, from raw compute power to
application functionality, available on demand.
• The public cloud provides capabilities as IaaS, PaaS, or SaaS without the
customers’ needing to invest in new hardware or software.
• Cost optimization is the main reason for adopting cloud computing.
• Public clouds continue to grow at a rapid pace.
• There is a growing trend for enterprises to migrate towards hybrid and multi-cloud.
Google Cloud Platform for Analytics

Introduction to cloud computing and cloud infrastructure


[Figure: Google Cloud Platform for Analytics, three layers]
• User Services: Development, Management, User Experience
• Analytics Services: Big Data, Machine Learning
• Core Infrastructure: Compute, Storage, Network, Security


GCP Core Infrastructure

• Compute: Computing infrastructure in predefined or custom machine sizes to accelerate
your cloud transformation.
• Storage: Globally unified, scalable, and highly durable storage for developers, analysts,
and enterprises.
• Network: A Virtual Private Cloud (VPC) network is a virtual version of a physical network,
implemented inside of Google's production network.
• Security: Identity and Access Management (IAM) enables you to create and manage
permissions for Google Cloud resources. IAM unifies access control for Google Cloud
services into a single system and presents a consistent set of operations.

GCP Analytics Services

Big Data
• Cloud-native and serverless
• Easily scale data and analytics
• Instant and predictive insights from data
• Multi-level security for data
• Unified stream and batch analytics

Machine Learning
• Cloud-native and serverless
• Machine learning resource management
• Data ingestion and collection
• Data processing and storing
• Machine learning training and deployment
• AI Hub

GCP User Services

Development
• Manage resources and applications hosted on Google Cloud
• Write, deploy, and debug cloud-native applications
• Access Google Cloud APIs programmatically
• Build software and standardize CI/CD across all languages
• Create pipelines, automate deployments, and get fast feedback

Management
• Logging
• Monitoring
• Trace
• Debugger
• Profiler

User Experience
• Cloud console
• Cloud Shell
• Resource management
• Data management
• Billing
• Activity stream
• Diagnostics
• Admin

Quick Comparison of three leading cloud service providers

Features | AWS | Azure | GCP
Regions | ~34 geographic locations | 54 regions | ~200 countries
IaaS | EC2 | Virtual Machine | Compute Engine
Object Storage | S3 | Blob Storage | Cloud Storage
PaaS | Elastic Beanstalk | Azure cloud services | Google App Engine
Serverless | AWS Lambda | Azure Functions | GCP Cloud Functions
Dominant feature | Enterprise friendly | Strong security | Pricing



Quick Comparison of three leading cloud service providers for big data analytics

AWS | Azure | GCP
Elastic MapReduce | Synapse Analytics | Dataproc
Redshift, Athena | SQL Data Warehouse | BigQuery
Kinesis | Stream Analytics | Dataflow
AWS ML | Machine Learning Server | Cloud AI
Data Pipelines | Azure Data Factory | Cloud Datalab
QuickSight | Power BI | Data Studio
Tools diversity | Windows based organization | Tools integration and ease of use

Summary

• GCP offers services that are global, scalable, flexible, cost-effective, and
secure
• It allows users to consume IT resources as elastic, utility-like services
• GCP services for analytics include:
– Core Infrastructure: computing, storage, networking, security
– Analytics Services: big data and machine learning
– User Services: development, management, and user experience
Google Cloud Platform

Let’s take a test drive



Create an account
• You should use your personal Gmail account for GCP, i.e. NOT SUID@purdue.edu, because
Purdue University managed email accounts do not support creating a new project.


Create a project

• Step 1: Go to the Manage Resources page in the GCP console. You can also use the search
tool to navigate to any page easily.
• Step 2: Click the Create Project button.
• Step 3: In the New Project window, enter a project name (Example: gcp-sneppets) as
shown in the picture below.
• Step 4: If you would like to add the project to a folder, then enter the folder name in the
Location box. If not, you can skip this step.
• Step 5: Finally, click Create.
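Before typing a name into the New Project window, it can help to check that it would also work as a project ID. The sketch below encodes the ID rules as commonly documented for Google Cloud (6 to 30 characters; lowercase letters, digits and hyphens; starts with a letter; does not end with a hyphen). Treat these rules as an assumption and confirm them against the current GCP documentation; the function name is hypothetical.

```python
import re

# Assumed project ID rules: 6-30 characters, lowercase letters, digits and
# hyphens only, must start with a letter, must not end with a hyphen.
_PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def looks_like_valid_project_id(project_id: str) -> bool:
    """Return True if the string satisfies the assumed project ID rules."""
    return _PROJECT_ID_PATTERN.fullmatch(project_id) is not None


for candidate in ("gcp-sneppets", "GCP", "my-project-"):
    print(candidate, looks_like_valid_project_id(candidate))
```

This only checks the format locally; project IDs must also be globally unique, which only the console can confirm.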

Google Cloud Console


… to be continued
SYSTEMS DEVELOPMENT
Phases, Tools, and Techniques
Outline
• Information Systems working definition
• Modern Information System architectures
• IS Development Methodologies
Information Systems (working definition)
• What is an Information System (IS)?
• What: Information systems are core and/or support structures for meeting
the company’s strategic and operational goals
• How: New systems are built because employees and customers request
them or there are business opportunities to take advantage of.
• Why: New systems are created to obtain a competitive advantage,
operational efficiencies, and intra-, inter-organizational process
coordination
Modern IS Architectures
Enterprise system architecture
Component based architecture
Service based architecture
Analytics based architecture
Enterprise System Architecture

User Interface Business Logic Enterprise Database

3-tier Architecture
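The three tiers can be sketched as three classes, each talking only to the tier directly below it. This is a minimal illustration, not a real enterprise system: the banking scenario, class names, and in-memory storage are all assumptions made for the example.

```python
class EnterpriseDatabase:
    """Data tier: stores records; knows nothing about rules or screens."""

    def __init__(self):
        self._balances = {}

    def get_balance(self, account):
        return self._balances.get(account, 0.0)

    def set_balance(self, account, amount):
        self._balances[account] = amount


class BusinessLogic:
    """Middle tier: enforces business rules and talks to the data tier."""

    def __init__(self, db):
        self._db = db

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        new_balance = self._db.get_balance(account) + amount
        self._db.set_balance(account, new_balance)
        return new_balance


class UserInterface:
    """Presentation tier: parses user input and formats output only."""

    def __init__(self, logic):
        self._logic = logic

    def deposit_command(self, account, amount_text):
        try:
            balance = self._logic.deposit(account, float(amount_text))
        except ValueError as err:
            return f"Error: {err}"
        return f"{account} balance: {balance:.2f}"


ui = UserInterface(BusinessLogic(EnterpriseDatabase()))
print(ui.deposit_command("acct-1", "25"))
print(ui.deposit_command("acct-1", "-5"))
```

Because the user interface never touches the database directly, either outer tier can be replaced (a web page instead of text commands, a real database instead of a dictionary) without changing the business rules.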
Component based systems
Component based architecture
Service based system architecture
Wikipedia Definitions

Back-end: refers to a subordinate program, not directly accessed by the user, which
performs a specialized function on behalf of a main software system.

Web API: A Web API is an application programming interface for either a web
server or a web browser. It is a web development concept, usually limited to
a web application's client-side

User Experience (UX): refers to a person's emotions and attitudes about using a
particular product, system or service. It includes the practical, experiential,
affective, meaningful and valuable aspects of human–computer
interaction and product ownership.
Service oriented architecture
Analytics system architecture
IS Development Methodologies
• Enterprise system development (e.g., MyPurdue)
• Development methods
• Insourcing
• Outsourcing
• Component based systems (e.g., Microsoft Office)
• Rapid application development methodology
• Service based systems (e.g., Facebook, Google)
• Agile methodology
• Analytics system methodology
• Kubeflow
Enterprise System
Development
Insourcing
Outsourcing
INSOURCING
• Systems development life cycle (SDLC) - a structured step-by-step
approach for developing information systems
• 7 distinct phases, each with well-defined activities
• Also called a waterfall methodology, an approach in which each
phase of the SDLC is followed by another, from planning through
implementation
SDLC as a Waterfall Methodology

Feedback
Phase 1: Planning
• Planning phase - create a solid plan for developing
your information system
• Three primary planning activities:
1. Define the system to be developed
• You can’t build every system, so you make choices based on
your organization’s priorities, which may be expressed as
critical success factors
• Critical success factor (CSF) - a factor simply critical to your
organization’s success
Phase 1: Planning
2. Set the project scope
• Project scope - clearly defines the high-level system requirements
• Scope creep - occurs when the scope of the project increases
• Feature creep - occurs when developers add extra features that were not part of the
initial requirements
• Project scope document - a written definition of the project scope, usually no
longer than a paragraph
Phase 1: Planning
3. Develop the project plan including tasks, resources, and timeframes
• Project plan - defines the what, when, and who questions of system development
• Project manager - an individual who is an expert in project planning and management,
defines and develops the project plan and tracks the plan to ensure all key project
milestones are completed on time
• Project milestones - represent key dates for which you need a certain group of activities
performed
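Tracking milestones against the plan, as the project manager does, can be sketched as a simple date comparison. The milestone names and dates below are hypothetical sample data, and the helper is an illustration, not a project management tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Milestone:
    name: str
    due: date
    completed: Optional[date] = None  # None means not finished yet


def late_milestones(milestones: List[Milestone], today: date) -> List[str]:
    """Names of milestones finished after their due date or still open past it."""
    late = []
    for m in milestones:
        finished_late = m.completed is not None and m.completed > m.due
        unfinished_late = m.completed is None and today > m.due
        if finished_late or unfinished_late:
            late.append(m.name)
    return late


plan = [
    Milestone("Scope document signed", date(2024, 1, 15), date(2024, 1, 12)),
    Milestone("Requirements signed off", date(2024, 2, 1), date(2024, 2, 5)),
    Milestone("Design complete", date(2024, 3, 1)),
]
print(late_milestones(plan, today=date(2024, 3, 10)))
```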
Phase 1: Planning

Sample Project Plan


Phase 2: Analysis
• Analysis phase - involves end users and IT specialists working
together to gather, understand, and document the business
requirements for the proposed system
Phase 2: Analysis
• Two primary analysis activities:
1. Gather the business requirements
• Business requirements - the detailed set of knowledge worker requests that the
system must meet in order to be successful
• Business requirements address the “why” and “what” of your development activities
• Joint application development (JAD) - knowledge workers and IT specialists meet,
sometimes for several days, to define or review the business requirements for the
system
Phase 2: Analysis
2. Prioritize the requirements
• Requirements definition document – prioritizes the business requirements and
places them in a formal comprehensive document
• Again, you probably can’t do everything, so prioritizing is important
• Users sign off on this document which clearly sets the scope for the project
Cost of Error by phases

Take time during analysis to get the business requirements correct. If you find
errors, fix them immediately. The cost to fix an error in the early stages of the SDLC
is relatively small. In later stages, the cost is huge.
Phase 3: Design
• Design phase - build a technical blueprint of how the proposed
system will work
• Two primary design activities:
1. Design the technical architecture
• Technical architecture - defines the hardware, software, and telecommunications
equipment required to run the system
Phase 3: Design
2. Design system models
• This includes GUI screens that users will interface with, database designs (see
XLM/C), report formats, software steps, etc

• Starting with design, you take on less of an active participation role
and act more as a “quality control” function, ensuring that the IT
people are designing a system to meet your needs
Phase 4: Development
• Development phase - take all of your detailed design documents from
the design phase and transform them into an actual system
• Two primary development activities:
1. Build the technical architecture
2. Build the database and programs
• Both of these activities are mostly performed by IT specialists
Phase 5: Testing
• Testing phase - verifies that the system works and meets all of the
business requirements defined in the analysis phase
• Two primary testing activities:
1. Write the test conditions
• Test conditions - the detailed steps the system must perform along with the
expected results of each step
Phase 5: Testing
2. Perform the testing of the system
• Unit testing – tests individual units of code
• System testing – verifies that the units of code function correctly when integrated
• Integration testing – verifies that separate systems work together
• User acceptance testing (UAT) – determines if the system satisfies the business
requirements
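A test condition pairs a step with its expected result, which is exactly what a unit test encodes. The sketch below uses Python's standard `unittest` module on a hypothetical `order_total` function; the function and its tax rule are invented for illustration.

```python
import unittest


def order_total(prices, tax_rate):
    """Unit under test (hypothetical): sum of prices plus tax, rounded to cents."""
    if tax_rate < 0:
        raise ValueError("tax rate cannot be negative")
    return round(sum(prices) * (1 + tax_rate), 2)


class OrderTotalTest(unittest.TestCase):
    """Each method is one test condition: a step and its expected result."""

    def test_adds_tax(self):
        self.assertEqual(order_total([10.0, 20.0], 0.10), 33.0)

    def test_rejects_negative_tax(self):
        with self.assertRaises(ValueError):
            order_total([10.0], -0.1)


# Run the two test conditions and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("All unit tests passed:", result.wasSuccessful())
```

System, integration, and user acceptance testing follow the same pattern of stated steps and expected results, but exercise progressively larger parts of the system.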
Phase 6: Implementation
• Implementation phase - distribute the system to all of the knowledge
workers and they begin using the system to perform their everyday
jobs
• Two primary implementation activities
1. Write detailed user documentation
• User documentation - highlights how to use the system
Phase 6: Implementation
2. Provide training for the system users
• Online training - runs over the Internet or off a CD-ROM
• Workshop training - is held in a classroom environment and led by an instructor
Phase 6: Implementation
• Choose the right implementation method
– Parallel implementation – use both the old and new system simultaneously
– Plunge implementation – discard the old system completely and use the
new
– Pilot implementation – start with small groups of people on the new
system and gradually add more users
– Phased implementation – implement the new system in phases
Phase 7: Maintenance
• Maintenance phase - monitor and support the new
system to ensure it continues to meet the business goals
• Two primary maintenance activities:
1. Build a help desk to support the system users
• Help desk - a group of people who respond to knowledge workers’
questions
2. Provide an environment to support system changes
SDLC Summary
Outsourcing
OUTSOURCING
• Outsourcing – the delegation of specified work to a third party for a
specified length of time, at a specified cost, and at a specified level of
service

• The main reasons behind the rapid growth of the outsourcing
industry include the following:
• The Internet
• Globalization
• Global talent pool
• Technology
• Deregulation
Outsourcing Options
• IT outsourcing for software development can take one of four forms:
Outsourcing Process
• Like insourcing, the outsourcing process looks similar to the traditional SDLC
• Big exception here is that you “outsource” most of the work to another company

When outsourcing, you’ll develop two vitally important documents – a request for
proposal and a service level agreement
Outsourcing – RFP
• Request for proposal (RFP) – formal document that describes in
excruciating detail your logical requirements for a proposed system and
invites outsourcing organizations (vendors) to submit bids for its
development
• In outsourcing, you must tell another organization what you want
developed; you do that with an RFP
• Therefore, the RFP must be very detailed and complete
• Some RFPs can take months or even years to develop
Outsourcing – SLA
• Service level agreement (SLA) - formal contractually obligated
agreement between two parties
• In outsourcing, it is the legal agreement between you and the
vendor and specifically identifies what the vendor is going to
do (and by when) and how much you’re going to pay
• Supporting SLA documents – service level specifications and
service level objectives – contain very detailed numbers and
metrics
Outsourcing Advantages & Disadvantages
• Advantages:
• Focus on unique core competencies
• Exploit the intellect of another organization
• Better predict future costs
• Acquire leading-edge technology
• Reduce costs
• Improve performance accountability
• Disadvantages:
• Reduces technical know-how for future innovation
• Reduces degree of control
• Increases vulnerability of your strategic information
• Increases dependency on other organizations
Conclusion
Outsourcing is an important system development methodology
because information systems are growing more complex and it is
becoming increasingly difficult for an enterprise to develop and
maintain systems in-house.
Component Based Systems
Agile Methods:
• Rapid Application Development
• Extreme Programming
• SCRUM
COMPONENT-BASED DEVELOPMENT (CBD)
• Component-based development (CBD) – focuses on building small
self-contained blocks of code (components) that can be reused across
a variety of applications
• CBD focuses on
• Using already-developed components to build systems quickly
• Building new components as needed that can be used in all future systems
• CBD Methodologies
• Rapid application development (RAD)
• Extreme programming (XP)
Rapid Application Development (RAD)
• Rapid application development (RAD) (also called rapid
prototyping) - emphasizes extensive user involvement in the rapid
and evolutionary construction of working prototypes of a system to
accelerate the systems development process
• Prototypes are models of the software components
• The development team continually designs, develops, and tests the
component prototypes until they are finished
Rapid Application Development (RAD)

[Figure: RAD draws on two sources: building new software components and using already-existing software components]
Extreme Programming (XP)
• Extreme programming (XP) - breaks a project into tiny phases, and
developers cannot continue on to the next phase until the current phase is
complete
Service Based Application
Development
Agile Development -- Scrum
Service Oriented System
• What is a service?
• A service provides a discrete business function that
operates on data. Its job is to ensure that the business
functionality is applied consistently, returns predictable
results, and operates within the quality of service
required.
• SOA services become the building blocks that form
business flows
• Services can be reused by other applications
Location based Systems
• Empower decision makers
• Align IT with business operations
• Increase operational efficiencies
• Employ best practice methodology
Service Oriented System
[Figure: a Starbucks app calling a GIS service, a GPS service, and a payment service, each through its API]
Positioning
The user’s location can be obtained in one of two main ways:
GPS
GPS involves the equal distribution of 24 NAVSTAR satellites in six
circular orbital planes that are centered on the Earth and are inclined
at approximately 55° relative to the equator. Land-based receivers use
these satellites to determine their positions. A location-based service
could require that each of its users have a mobile device that contains
a GPS receiver.

E911
The Federal Communications Commission requires wireless carriers to
provide a caller’s telephone number to emergency dispatchers. E911
also requires carriers to provide the location of calls made from
wireless phones.
Geographic Information Systems (GIS)
• System for capturing, storing and analyzing
location data and associated attributes
which are spatially referenced to the earth.
• Tools to provide and administer base-map
data (man-made structures and natural
terrain).
• Point-of-interest data such as the location of
restaurants or cinemas.
• Information about the radio frequency
characteristics of the mobile network, which
allows determination of the user cell site.
Location Management Function
• So far we are able to tell both the position of
the mobile user (by GPS or E911) and the
map data around that position (by GIS).
• LBS applications employ an additional
system to process positioning and GIS data,
called a location management function.
• The location management function acts as
a gateway and a mediator between
positioning equipment and LBS
infrastructure.
Applications
• Location-based information
- personalized information service for restaurants, cinemas, weather etc., e.g.,
FourSquare
• Tracking
- mobile commerce
- fleet applications to streamline distribution
• Emergency services
- relay pinpointed location information to authorities
- recent 3D-responder contract issued to enable altitude determination
Applications
• Location-based notification:
- advertising;
- automatic check-in system at airports.
• Location-based actuation:
- Payment based on proximity (E-ZPass, toll watch);
- Zonal bills for cell-phones (flat-rate at home, special rate
elsewhere).
SCRUM Characteristics
• Self-organizing teams
• Product progresses in a series of two- to four-
week “sprints”
• Requirements are captured as items in a list of
“product backlog”
• No specific engineering practices prescribed
• Uses generative rules to create an agile
environment for delivering products
• One of the “agile processes”
Scrum Approach

[Figure: the sprint cycle]
Sprints
• Scrum projects make progress in a series of “sprints”
• Analogous to Extreme Programming iterations
• Typical duration is 2–4 weeks or a calendar month at
most
• A constant duration leads to a better rhythm
• Product is designed, coded, and tested during the sprint
No changes during a sprint

[Figure: change held outside the sprint]

• Plan sprint durations around how long you can commit to keeping
change out of the sprint
Managing the sprint backlog
• Individuals sign up for work of their own choosing
• Work is never assigned
• Estimated work remaining is updated daily
• Any team member can add, delete or change the sprint backlog
• Work for the sprint emerges
• If work is unclear, define a sprint backlog item with a larger
amount of time and break it down later
• Update work remaining as more becomes known
A sprint backlog
Tasks Mon Tues Wed Thur Fri
Code the user interface 8 4 8
Code the middle tier 16 12 10 4
Test the middle tier 8 16 16 11 8
Write online help 12
Write the foo class 8 8 8 8 8
Add error logging 8 4
Tasks                     Mon  Tues  Wed  Thur  Fri
Code the user interface     8     4    8
Code the middle tier       16    12   10     7
Test the middle tier        8    16   16    11    8
Write online help          12
Total hours remaining      44    32   34    18    8

[Figure: burndown chart of hours remaining (y-axis, 0 to 50) by day (x-axis, Mon to Fri), labeled with the burndown velocity (BDV)]
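The daily totals behind a burndown chart are just column sums over the sprint backlog. The following is a minimal sketch of that arithmetic, not part of the original slides. It assumes that blank cells in the backlog mean zero hours remaining and that the numbers in each row fill the days from Monday onward; the names `backlog` and `remaining_per_day` are illustrative.

```python
# Hours of estimated work remaining per task, one value per day.
# Values follow the sprint backlog table; blank cells are assumed to be 0.
DAYS = ["Mon", "Tues", "Wed", "Thur", "Fri"]

backlog = {
    "Code the user interface": [8, 4, 8, 0, 0],
    "Code the middle tier": [16, 12, 10, 7, 0],
    "Test the middle tier": [8, 16, 16, 11, 8],
    "Write online help": [12, 0, 0, 0, 0],
}


def remaining_per_day(tasks):
    """Sum the remaining hours across all tasks for each day."""
    return [sum(hours[i] for hours in tasks.values()) for i in range(len(DAYS))]


totals = remaining_per_day(backlog)

# Print a rough text version of the burndown: one '#' per two hours.
for day, hours in zip(DAYS, totals):
    print(f"{day:5} {hours:3} {'#' * (hours // 2)}")
```

The totals match the bottom row of the backlog table, which is a quick check that the column assignment assumed here is the intended one.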
A Sprint Burndown Chart
[Figure: hours remaining over the sprint, with lines for actual burndown velocity (ABV), expected burndown velocity (EBV), and revised burndown velocity (RBV)]
Summary
• Current software development processes are too
heavyweight or cumbersome
• Current software development is too rigid
• More active customer involvement needed
• Agile methods focus on:
• Individuals and interactions over processes and tools
• Working software over comprehensive documentation
• Customer collaboration over contract negotiation
• Responding to change over following a plan
DATABASES, DATA LAKES, AND DATA WAREHOUSES
Building Business Intelligence
TOPIC ORGANIZATION
1. Relational Database Model
2. Structured Query Language
3. Data Lakes and Data Warehouses
4. Business Intelligence and Deep Neural
Networks
RELATIONAL DATABASE MODEL
• Database – collection of information that you
organize and access according to the logical
structure of the information
• Relational database – series of logically related two-
dimensional tables or files for storing information
• Relation = table = file
• Most popular database model
Database Characteristics
• Collections of information
• Created with logical structures
• Include logical ties within the information
• Include built-in integrity constraints
Database – Collection of Information
Database – Created with Logical Structures
• Data dictionary – contains the logical structure
for the information in a database
Before you can enter information
into a database, you must define
the data dictionary for all the tables
and their fields. For example,
when you create the Truck table,
you must specify that it will have
three pieces of information and
that Date of Purchase is a field in
Date format.
Database – Logical Ties within the Information
• Primary key – field (or group of fields) that uniquely identifies each
record
• Foreign key – primary key of one file that appears in another file

Customer Number is the primary key for Customer and appears in
Order as a foreign key
Database – Logical Ties within the Information
DBMS Engine
• DBMS engine – accepts logical requests from
other DBMS subsystems, converts them into their
physical equivalents, and accesses the database and
data dictionary on a storage device
• Physical view – how information is physically
arranged, stored, and accessed on a storage
device
• Logical view – how you need to arrange and
access information to meet your needs
Data Definition
• Data definition subsystem – helps you create and maintain the
data dictionary and structure of the files in a database
• The data dictionary helps you define…
• Field names
• Data types (numeric, etc)
• Form (do you need an area code)
• Default value
• Is an entry required, etc
Data Manipulation
• Data manipulation subsystem – helps you
add, change, and delete information in a
database and query it to find valuable
information
• Most often your primary interface
• Includes views, report generators, query-by-
example tools, and structured query language
Structured Query Language
• SQL – standardized fourth-generation query
language found in most DBMSs
• SQL is used along with Python and the R language for
business intelligence
SQL (pronounced as “sequel”)
• What is SQL?
- SQL stands for Structured Query Language
- Lets you access and manipulate databases
- Became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987
• SQL can:
- execute queries
- retrieve data from a database
- insert records in a database
- update records in a database
- delete records from a database
- create new databases
- create new tables in a database
- create views in a database
- ... and much more
Create and Drop
• CREATE DATABASE databasename;
• CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ....
);
• DROP TABLE table_name;

Example:
CREATE DATABASE enrollment;
CREATE TABLE student (
    ID int,
    name string,
    GPA numeric,
    dues bool
);
DROP TABLE student;
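To try CREATE TABLE and DROP TABLE without a database server, one option is Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the setup used in the slides: SQLite has no CREATE DATABASE statement (connecting creates the database), and its type names differ, so TEXT and INTEGER stand in for string and bool. The helper `table_names` is illustrative.

```python
import sqlite3

# An in-memory SQLite database stands in for the "enrollment" database.
conn = sqlite3.connect(":memory:")

# CREATE TABLE with SQLite type names as the closest equivalents.
conn.execute(
    """
    CREATE TABLE student (
        ID   INTEGER,
        name TEXT,
        GPA  NUMERIC,
        dues INTEGER
    )
    """
)


def table_names(connection):
    """Return the names of all user tables in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


tables_after_create = table_names(conn)

# DROP TABLE removes the table and all of its rows.
conn.execute("DROP TABLE student")
tables_after_drop = table_names(conn)

print(tables_after_create, tables_after_drop)
conn.close()
```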


Simple Student Table
You select column(s) (attributes 1 to 4: ID, Name, GPA, Dues).
You filter rows based on a WHERE condition, e.g., WHERE GPA >= 3.0

ID  Name        GPA   Dues
1   Jon Doe     3.5   1
2   Jane Doe    4.0   1
3   Jon Smith   2.2   0
4   Jane Smith  2.9   1
5   Jon Jones   2.75  0
6   Jane Jones  4.0   0


Select Statement
SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1, column2, ... ASC|DESC;

SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;

SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;

SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
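The SELECT forms can be tried against the simple student table with Python's built-in sqlite3 module. This is a minimal sketch rather than the course's own environment; the rows come from the student table slide, while the variable names and the choice of SQLite are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID INTEGER, name TEXT, GPA REAL, dues INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Jon Doe", 3.5, 1),
        (2, "Jane Doe", 4.0, 1),
        (3, "Jon Smith", 2.2, 0),
        (4, "Jane Smith", 2.9, 1),
        (5, "Jon Jones", 2.75, 0),
        (6, "Jane Jones", 4.0, 0),
    ],
)

# WHERE filters rows; ORDER BY sorts the rows that remain.
gpa_at_least_3 = conn.execute(
    "SELECT name, GPA FROM student WHERE GPA >= 3.0 ORDER BY GPA DESC, name ASC"
).fetchall()

# AND requires both conditions to hold.
dues_and_low_gpa = conn.execute(
    "SELECT name FROM student WHERE dues = 1 AND GPA < 3.0"
).fetchall()

# NOT inverts a condition.
no_dues = conn.execute(
    "SELECT name FROM student WHERE NOT (dues = 1) ORDER BY ID"
).fetchall()

print(gpa_at_least_3)
print(dues_and_low_gpa)
print(no_dues)
conn.close()
```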
Basic SQL Commands
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database
• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index
Logical Design of Relational Database Systems
1. Write User Statements/stories

2. Construct E-R diagram

3. Convert E-R diagrams into relational tables.

4. Normalize Tables into 3rd Normal Form

Five Step Logical DB Design
[Figure: mapping from the physical world to the information system, shown as numbered steps]

Step 1, user statement: Students are enrolled in one or more courses. A faculty may teach one or more courses. Each student has an advisor. A faculty works for a department.

Step 3, ERD: [Figure: ERD. Course M:M Student through Enroll; Faculty 1:M Course through Teach; Faculty 1:M Student through Advise; Faculty M:1 Department through Work for; Has_a (1:1) between Department and Faculty. Attribute labels in the figure: CourseNo, CourseName, Classroom, Credit, FacultyID, FacultyName, StudentID, StudentName, StudentAddress, Email, AdvisorID, Grade, DepartmentID, DepartmentName, DepartmentHead, School]

Step 4, relational tables, and step 5, relational tables in 3NF after normalization, list the same five tables:

STUDENT
StudentID StudentName StudentAddress StudentEmail AdvisorID

COURSE
CourseNo CourseName Classroom FacultyID

ENROLL
StudentID CourseNo Semester Grade

DEPARTMENT
DepartmentID DepartmentName DepartHead School

FACULTY
FacultyID FacultyName Email DepartmentID
ERD Recap

Entity
• is a real-world object distinguishable or unique from other objects.
• An entity can be a concrete or physical object like employee, student, faculty, customer etc. Or it could also be conceptual or abstract like transaction, order, course, subjects etc.
• It can be thought of as a noun like student, employee etc.
• It is normally represented by a rectangle shape.
Relationship
• is a way of relating one entity to another. Entities can therefore participate in a relationship.
• It is commonly thought of as a verb connecting the entities or nouns.
• It is normally represented by a diamond shape.
Cardinality
One-to-One (1:1): Department has Dept_Head
One-to-Many (1:M): Department has Programs
Many-to-Many (M:N): Student enrolls Course
Databases – Built-In Integrity Constraints
• Integrity constraints – rules that help ensure the
quality of information
• Data dictionary, for example, defines type of
information – numeric, date, and so on
• Foreign keys – must be found as primary keys in
another file
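The foreign key rule can be seen in action with a small SQLite sketch in Python. This is an illustration under assumptions, not the slides' own DBMS: the table and column names (`customer`, `orders`, `customer_number`) are hypothetical, modeled on the Customer and Order example, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL REFERENCES customer (customer_number)
    )
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Example Customer')")

# Accepted: customer 1 exists as a primary key in the customer table.
conn.execute("INSERT INTO orders VALUES (100, 1)")

# Rejected: customer 99 does not exist, so the integrity constraint fails.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
conn.close()
```

The DBMS, not the application, refuses the second insert, which is what "built-in" means here.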
ERD
[Figure: ERD. Course M:M Student through Enroll; Faculty 1:M Course through Teach; Faculty 1:M Student through Advise; Faculty M:1 Department through Work for; Has_a (1:1) between Department and Faculty. Attribute labels in the figure: CourseNo, CourseName, Classroom, Credit, FacultyID, FacultyName, StudentID, StudentName, StudentAddress, Email, AdvisorID, Grade, DepartmentID, DepartmentName, DepartmentHead, School]
ERD to Relations
STUDENT
StudentID StudentName StudentAddress StudentEmail AdvisorID

COURSE
CourseNo CourseName Classroom FacultyID

ENROLL
StudentID CourseNo Semester Grade

DEPARTMENT
DepartmentID DepartmentName DepartHead School

FACULTY
FacultyID FacultyName Email DepartmentID
Key Concepts
Key attribute: StudentID
Non-key attributes: StudentName, StudentAddress, StudentEmail, AdvisorID

StudentID  StudentName     StudentAddress  StudentEmail  AdvisorID
301        Randall Jones   West Lafayette  R@purdue      202
302        Brady Kalb      West Lafayette  k@purdue      203
303        Stephen Kerber  West Lafayette  e@purdue      204
304        Prateek Khanna  Lafayette       l@purdue      205
305        Kyle Newell     Lafayette       y@purdue      202
306        Ryan Leidigh    Lafayette       d@purdue      203
Data Engineering:
Entity Relationship Diagram (ERD)
ERM and ERD
• Entity-Relationship Data Model (ERM) is a detailed, logical
representation of the data for an organization or for a
business area.
• Expressed in terms of:
• Entities
• Attributes
• Relationships
• Entity-Relationship Diagram (ERD) is a graphical
representation of an Entity-Relationship Model.
ERD
• The purpose of an ERD is to capture the richest possible
understanding of the meaning of data necessary for an information
system or organization.

• ERDs are made from Entities, Attributes, and Relations.


Entity
• What is an Entity?
• Has its own identity that distinguishes it from other entities.
• Examples:
• Person: PROFESSOR, STUDENT
• Place: STORE, UNIVERSITY
• Object: MACHINE, BUILDING
• Event: SALE, REGISTRATION
• Concept: ACCOUNT, COURSE
Attributes
• Each Entity has a set of Attributes
• Attribute is a property or characteristic of an entity that is of interest
to the organization.
• Example:
• STUDENT: Student_ID, Student_Name, Phone_Number, Major
Relationships
• Relationships are associations between one or more entity types.
• Are the “glue” that holds together components of an E-R model.
• The degree of a relationship is the number of entity types that
participate in it.
• There are 3 common relationships:
1. Unary (degree one)
2. Binary (degree two)
3. Ternary (degree three)
Starting an ERD
1. Define the Entities.
2. Define the Relationships.
3. Add attributes to the relationships.
4. Assign Primary and Foreign Keys
5. Add cardinality to the relationships.
Capturing real world requirements

[Figure: mapping from the physical world to the information system]
Mapping = statements to capture the entities, their attributes, relationships, and activities
Example statements: Students are enrolled in one or more courses. A faculty may teach one or more courses. Each student has an advisor. A faculty works for a department.
Convert ERD into Database

ER Diagram                    Database
Entity                        Table
Attributes                    Field
One-to-one relationship       Foreign key
One-to-many relationship      Foreign key
Many-to-many relationship     Tables
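The conversion rules can be sketched as an SQLite schema built from Python. This is one possible rendering, not the course's reference schema: the composite primary key on `enroll`, the reduced column lists, and the sample rows (course numbers `C1` and `C2`, faculty name `Faculty A`) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Entity -> table, attribute -> field
    CREATE TABLE faculty (
        FacultyID   INTEGER PRIMARY KEY,
        FacultyName TEXT
    );
    -- One-to-many (Faculty teaches Course): foreign key on the "many" side
    CREATE TABLE course (
        CourseNo   TEXT PRIMARY KEY,
        CourseName TEXT,
        FacultyID  INTEGER REFERENCES faculty (FacultyID)
    );
    CREATE TABLE student (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT
    );
    -- Many-to-many (Student enrolls in Course): a table of its own
    CREATE TABLE enroll (
        StudentID INTEGER REFERENCES student (StudentID),
        CourseNo  TEXT REFERENCES course (CourseNo),
        Semester  TEXT,
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseNo, Semester)
    );
    """
)

# Hypothetical sample rows.
conn.execute("INSERT INTO faculty VALUES (202, 'Faculty A')")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("C1", "Course One", 202), ("C2", "Course Two", 202)],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(301, "Randall Jones"), (302, "Brady Kalb")],
)
conn.executemany(
    "INSERT INTO enroll VALUES (?, ?, ?, ?)",
    [(301, "C1", "S1", "A"), (301, "C2", "S1", "B"), (302, "C1", "S1", "A")],
)

# One student in many courses, and one course with many students.
courses_for_301 = [
    row[0]
    for row in conn.execute(
        "SELECT CourseNo FROM enroll WHERE StudentID = 301 ORDER BY CourseNo"
    )
]
students_in_c1 = [
    row[0]
    for row in conn.execute(
        "SELECT StudentID FROM enroll WHERE CourseNo = 'C1' ORDER BY StudentID"
    )
]
print(courses_for_301, students_in_c1)
conn.close()
```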


ERD
[Figure: ERD. Course M:M Student through Enroll; Faculty 1:M Course through Teach; Faculty 1:M Student through Advise; Faculty M:1 Department through Work for. Attribute labels in the figure: CourseNo, CourseName, Classroom, Credit, FacultyID, FacultyName, StudentID, StudentName, StudentAddress, Email, AdvisorID, Grade, DepartmentID, DepartmentName, DepartmentHead, School]
ERD to Relations
STUDENT
StudentID StudentName StudentAddress StudentEmail AdvisorID

COURSE
CourseNo CourseName Classroom FacultyID

ENROLL
StudentID CourseNo Semester Grade

DEPARTMENT
DepartmentID DepartmentName DepartHead School

FACULTY
FacultyID FacultyName Email DepartmentID
Relational database design
Multi-table queries
-- Join 2 tables
SELECT
  DepartmentName AS Dept_name,
  FacultyName AS Dept_Head
FROM
  `m382-02.school.department` AS d
JOIN
  `m382-02.school.faculty` AS f
ON
  d.departhead = f.facultyID

-- Join 3 tables
SELECT
  e.StudentID AS Student_ID, StudentName, e.CourseNo, CourseName AS CRS_Name, Grade
FROM
  `m382-02.school.enroll` AS e
JOIN
  `m382-02.school.course` AS c
ON e.CourseNo = c.CourseNo
JOIN
  `m382-02.school.student` AS s
ON e.studentID = s.studentID
WHERE
  e.studentID = 301
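The two joins can be run locally against a small SQLite copy of the school tables. This is a sketch, not the course's BigQuery dataset: the tables carry only the columns the queries need, and every sample row except the two student names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT, DepartHead INTEGER);
    CREATE TABLE faculty (FacultyID INTEGER, FacultyName TEXT, DepartmentID INTEGER);
    CREATE TABLE student (StudentID INTEGER, StudentName TEXT);
    CREATE TABLE course (CourseNo TEXT, CourseName TEXT);
    CREATE TABLE enroll (StudentID INTEGER, CourseNo TEXT, Grade TEXT);

    -- Hypothetical sample rows
    INSERT INTO department VALUES (10, 'Dept X', 202);
    INSERT INTO faculty VALUES (202, 'Faculty A', 10);
    INSERT INTO student VALUES (301, 'Randall Jones'), (302, 'Brady Kalb');
    INSERT INTO course VALUES ('C1', 'Course One');
    INSERT INTO enroll VALUES (301, 'C1', 'A'), (302, 'C1', 'B');
    """
)

# Join 2 tables: match each department's head to a faculty record.
dept_heads = conn.execute(
    """
    SELECT DepartmentName AS Dept_name, FacultyName AS Dept_Head
    FROM department AS d
    JOIN faculty AS f ON d.DepartHead = f.FacultyID
    """
).fetchall()

# Join 3 tables: enrollment rows with course and student names, one student.
transcript = conn.execute(
    """
    SELECT e.StudentID AS Student_ID, StudentName, e.CourseNo,
           CourseName AS CRS_Name, Grade
    FROM enroll AS e
    JOIN course AS c ON e.CourseNo = c.CourseNo
    JOIN student AS s ON e.StudentID = s.StudentID
    WHERE e.StudentID = 301
    """
).fetchall()

print(dept_heads)
print(transcript)
conn.close()
```

Each JOIN ... ON pairs a foreign key with the primary key it refers to, which is how the separate tables are stitched back together.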
Practice ERD Requirements
[Figure: mapping from the physical world to the information system. Mapping = statements to capture the entities, their attributes, relationships, and activities]
A company has several departments. Each department has a supervisor and
at least one employee. Employees must be assigned to at least one, but
possibly more, departments. At least one employee is assigned to a
project, but an employee may be on vacation and not assigned to any
projects. The important data fields are the names of the departments,
projects, supervisors, and employees, as well as the supervisor and
employee numbers and a unique project number.
• Each department has exactly one supervisor.
• A supervisor is in charge of one and only one department.
• Each department is assigned at least one employee.
• Each employee works for at least one department.
• Each project has at least one employee working on it.
• An employee is assigned to 0 or more projects.
Practice ERD
Business Intelligence
Analyze airline on-time performance
INTRODUCTION
• OLTP (online transaction processing)
• Supports operational processing
• Sales orders, accounts receivable, etc
• Supported by operational databases & DBMSs
• OLAP (online analytical processing)
• Helps build business intelligence
• Supported by data warehouses and data-mining tools
BUSINESS INTELLIGENCE REVISITED
• Business intelligence (BI) – collective information about your
customers, competitors, business partners, competitive
environment, and internal operations that supports important,
effective, and strategic business decisions
• Hot topic in business today
• Current market is $50 billion, with double-digit annual growth
BI Objectives
• Help people understand
• Capabilities of the organization
• State of the art trends and future directions of the market
• Technological, demographic, economic, political, social, and regulatory
environments in which the organization competes
• Actions of competitors
| ABC: Big Data in the Cloud

The 5 Vs of Big Data

• Volume (data at rest)
• Velocity (data in motion)
• Variety (data in many forms)
• Veracity (data in doubt)
• Value (data into money)

We Hold These Truths…


• A database has a schema
• We transform the data into that schema
• Data conforms to the schema we define
• The schema defines the business
[Figure: a database made of tables and columns, annotated with data type, size, referential integrity, and defaults/checks]


Data lake architecture

[Figure: data lake architecture on Cloud Data Storage. Sources: streaming sources (clickstream, log stream), transactional/operational data (EDW, OLTP), and external sources (partners, SaaS). Data lake functions: integration, classification, enrichment, processing, security, metadata, quality, governance. Consumption: data discovery, data preparation, visualization, BI/analytics, BigQuery]

Business Intelligence Workflow

• Ingest data
• Download data

• Store data
• Fill data lake in Cloud Data Storage

• Prepare
• Create an enterprise data warehouse in BigQuery

• Analyze data
• Interactive
• Visualize in Data Studio

Bureau of Transportation Statistics Data

• All major US air carriers are required to file statistics about their domestic flights with
the BTS
• Actual departure and arrival times are defined precisely, based on when the parking
brake of the aircraft is released and when it is later reactivated at the destination
• Because of the precise nature of the rules, and the fact that they are enforced, arrival
and departure times from all carriers can be treated uniformly
• Had this not been the case, we would have to dig deeper into the quirks of how each
carrier defines “departure” and “arrival,” and do the appropriate translations
• Good business intelligence begins with such standardized, repeatable, trustworthy data
collection rules

Should you or should you not cancel your meeting?

[Figure: a flight from departure gate to arrival gate]
Arrival delay = f(I1, I2, I3, …), where the inputs are some, but not all, of the variables
Data Source
• Original Data Source:
https://www.transtats.bts.gov/Fields.asp

• Your Data Source


• Shared Bucket
• Files
• airlines.csv: https://storage.googleapis.com/382flights/airlines.csv
• airports.csv: https://storage.googleapis.com/382flights/airports.csv
• flights.csv: https://storage.googleapis.com/382flights/fligts.csv
• aircraft.csv: https://storage.googleapis.com/382flights/aircraft.csv
Airlines
• IATA_CODE
• AIRLINE
Airports
• IATA_CODE
• AIRPORT
• CITY
• STATE
• COUNTRY
• LATITUDE
• LONGITUDE
Aircraft
• Tail_number
• Type
• Manufacturer
• Issue_date
• Model
• Status
• Aircraft_type
• Engine_type
• year
Flights
- YEAR: integer (required)
- QUARTER: integer (required)
- MONTH: integer (required)
- DAY_OF_MONTH: integer
- DAY_OF_WEEK: integer
- FULL_DATE: string
- CARRIER: string
- TAIL_NUMBER: string
- FLIGHT_NUMBER: string
- ORIGIN: string
- DESTINATION: string
- SCHEDULED_DEPART_TIME: integer
- ACTUAL_DEPART_TIME: integer
- DEPARTURE_DELAY: integer
- TAKE_OFF_TIME: integer
- LANDING_TIME: integer
- SCHEDULED_ARRIVAL_TIME: integer
- ACTUAL_ARRIVAL_TIME: integer
- ARRIVAL_DELAY: integer
- FLIGHT_CANCELLED: integer
- CANCELLATION_CODE: string
- SCHEDULED_ELAPSED_TIME: integer
- ACTUAL_ELAPSED_TIME: integer
- AIR_TIME: integer
- DISTANCE: integer
- CARRIER_DELAY: integer
- WEATHER_DELAY: integer
- NAS_DELAY: integer
- SECURITY_DELAY: integer
- LATE_AIRCRAFT_DELAY: integer
Draw an ERD for ontime flights database
Important fields/variables
• Time-related variables
- year
- month
- quarter
- day of month
• Geography
- origin
- destination
- airport
• Airport
- lat/long
- taxi out
- taxi in
• Operations
- delays
- arrival
- departure
• Flight data
- carrier code
- tail number
- flight number
• Causes of delay
- weather
Key Performance Indicators
• Airport Information
- Visualize US airports
- Search airports by airport code
• Airline Popularity Analysis
- Top 10 airlines in terms of # of flights
- Average arrival delays of the top 10 airlines
• Performance analysis
- On-time performance analysis
- How does the arrival delay relate to the day of the week?
- How does the arrival delay relate to the month of the year?
- How does arrival delay relate to distance?
• Airline Performance
- Arrival punctuality
- Departure punctuality
- Regularity
- What is the minimum, maximum, and average arrival delay for each airline?
• Airport Performance
- Top 10 airports in terms of minimum departure delay
- Top 10 airports in terms of longest taxi out
- Top 10 airports in terms of shortest taxi in
• Cause of Delay Analysis
- Main cause of delay for each airline
- Main cause of delay for different routes
• Prediction
- Arrival delay for a specific flight
- More to be provided later
Simple Queries
-- aggregating all flights on the basis of month
#standardSQL
SELECT
  month,
  COUNT(*) AS count
FROM
  `maximal-reserve-287914.flights.ontime`
GROUP BY
  month
ORDER BY
  count DESC

-- getting origin/departure wise flight count
#standardSQL
SELECT
  ORIGIN_AIRPORT,
  COUNT(*) AS TotalFlight
FROM
  `maximal-reserve-287914.flights.ontime`
GROUP BY
  ORIGIN_AIRPORT
ORDER BY
  TotalFlight DESC
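The GROUP BY pattern in these queries can be tried on a tiny stand-in table using Python's sqlite3 module. This is a sketch with made-up rows, not the real on-time dataset; the table keeps only the `month` and `ORIGIN_AIRPORT` columns, and the alias `flights` is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ontime (month INTEGER, ORIGIN_AIRPORT TEXT)")

# Hypothetical flights: (month, origin airport).
conn.executemany(
    "INSERT INTO ontime VALUES (?, ?)",
    [
        (1, "SFO"), (1, "SFO"), (2, "SFO"), (2, "SFO"),
        (1, "ORD"), (2, "ORD"),
        (2, "JFK"),
    ],
)

# GROUP BY collapses rows sharing a value; COUNT(*) counts each group.
flights_by_month = conn.execute(
    "SELECT month, COUNT(*) AS flights FROM ontime "
    "GROUP BY month ORDER BY flights DESC"
).fetchall()

flights_by_origin = conn.execute(
    "SELECT ORIGIN_AIRPORT, COUNT(*) AS flights FROM ontime "
    "GROUP BY ORIGIN_AIRPORT ORDER BY flights DESC"
).fetchall()

print(flights_by_month)
print(flights_by_origin)
conn.close()
```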
Simple Queries
-- flight count by destination
#standardSQL
SELECT
  DESTINATION_AIRPORT,
  COUNT(*) AS TotalFlight
FROM
  `maximal-reserve-287914.flights.ontime`
GROUP BY
  DESTINATION_AIRPORT
ORDER BY
  TotalFlight DESC

-- average departure delay
#standardSQL
SELECT
  AVG(DEPARTURE_DELAY)
FROM
  `maximal-reserve-287914.flights.ontime`

-- standard deviation of departure delay
#standardSQL
SELECT
  STDDEV_POP(DEPARTURE_DELAY)
FROM
  `maximal-reserve-287914.flights.ontime`
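To see what the average and the population standard deviation compute, the same arithmetic can be done with Python's statistics module. This is a sketch on made-up delay values, not the real flight data; `pstdev` divides by the number of values, which is the population form, while `stdev` divides by one less and gives the sample form.

```python
import math
import statistics

# Hypothetical departure delays in minutes.
departure_delays = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: sum of the values divided by how many there are.
avg_delay = statistics.mean(departure_delays)

# Population standard deviation: square root of the mean squared deviation.
pop_std = statistics.pstdev(departure_delays)

# The same quantity written out by hand, as a cross-check.
squared_deviations = [(d - avg_delay) ** 2 for d in departure_delays]
manual_pop_std = math.sqrt(sum(squared_deviations) / len(departure_delays))

# Sample standard deviation divides by n - 1, so it is slightly larger.
sample_std = statistics.stdev(departure_delays)

print(avg_delay, pop_std, sample_std)
```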
Simple Queries
-- flights from SFO
#standardSQL
SELECT
  month,
  COUNT(*) AS TotalFlights
FROM
  `maximal-reserve-287914.flights.ontime`
WHERE
  ORIGIN_AIRPORT = 'SFO'
GROUP BY
  month
ORDER BY
  TotalFlights

-- average departure and arrival delay by origin airport
#standardSQL
SELECT
  ORIGIN_AIRPORT,
  AVG(DEPARTURE_DELAY) AS dep_delay,
  AVG(ARRIVAL_DELAY) AS arr_delay
FROM
  `maximal-reserve-287914.flights.ontime`
GROUP BY ORIGIN_AIRPORT
ORDER BY arr_delay
LIMIT
  1000
Simple Queries
-- airport table query
#standardSQL
SELECT
  IATA_CODE,
  AIRPORT,
  CITY,
  STATE,
  COUNTRY
FROM
  `maximal-reserve-287914.flights.airports`
LIMIT
  100

-- airport name and coordinates
#standardSQL
SELECT
  IATA_CODE,
  AIRPORT AS airportName,
  CONCAT('(', LATITUDE, ', ', LONGITUDE, ')') AS coords
FROM
  `maximal-reserve-287914.flights.airports`
LIMIT
  10
