Pareto Principle
Applying Statistics to Software QA
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
August, 2017
History
• Named after Italian economist Vilfredo Pareto, from his 1896
paper Cours d'économie politique.
• Noted in the paper:
• 80% of the wealth in Italy was owned by 20% of individuals.
• In his garden, 20% of the peapods generated 80% of the peas.
• The concept of the 80/20 principle was advanced by
consultant Joseph Juran in 1941 for quality management:
• He concluded that typically 80% of problems come from 20%
of the causes.
Generalization
• The Pareto Principle is the observation that most things
are not evenly distributed. For example:
• 20% of the inputs create 80% of the results.
• 20% of the workers produce 80% of the results.
• 20% of customers produce 80% of the revenue.
• 20% of features cause 80% of the usage.
• 20% of bugs cause 80% of the failures.
• The principle does not require an exact 80/20 distribution;
it is a rule of thumb:
• It may be 90/10.
• The numbers may not add up to 100, such as 90/20.
(A quick check of how top-heavy a distribution is appears in the sketch below.)
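A minimal sketch of measuring what share of the output the top 20% of contributors produce; the bug counts are made up purely for illustration:

def pareto_share(counts, input_fraction=0.2):
    """Fraction of total output produced by the top `input_fraction`
    of contributors (e.g., bugs per module)."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * input_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical bug counts per module, purely illustrative.
bugs_per_module = [120, 45, 18, 9, 7, 5, 3, 2, 1, 1]
print(f"Top 20% of modules account for {pareto_share(bugs_per_module):.0%} of bugs")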
Inputs vs. Outputs
The relationship of inputs to the flow of outputs: inputs feed a
software component, which produces outputs, and 20% of the inputs
produce 80% of the outputs.
Inputs pair with test levels, with complexity increasing from single
test cases up to end-to-end:
• Individual Values to a Single Test → Single Testcase
• Features → Feature Testing
• Use Cases → End to End (e2e)
Calculator Example - Typical
The tendency is to test from simplest to more complex. A typical
selection of inputs in QA, with their outputs:
0 x 0 = 0
1 x 0 = 0
1 x 1 = 1
1 x -1 = -1
2 x 1 = 2
2 x 2 = 4
3 x -2 = -6
3 x 4 = 12
6 x 11 = 66
10 x 123 = 1230
All of these PASSED, and each step adds only a small gain in output
(e.g., more digits). It's not whether these cases will have bugs;
it's that no one will EVER type them in. They produce ZERO percent
of the output.
Calculator Example - Pareto
Pick the cases in the top 20% of output gain per unit of input
complexity, targeting integer overflow and floating point errors:
∞ x ∞
MaxInt x MaxInt
MaxPrecision x MaxPrecision
MaxInt x 2
MaxInt x -1
MaxPrecision x π
(MaxInt/2) x 2
(rand(1000)) x -5
π x π
(rand(1.0)) x π
Each of these adds a large gain in output (e.g., more digits, more
computational complexity), so a PASSED here means the calculator
ACTUALLY passed a meaningful case.
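A minimal sketch of this selection style, assuming a hypothetical multiply() entry point and an assumed 64-bit integer limit for the implementation under test (Python's own integers do not overflow; the cases matter for fixed-width implementations):

import math
import sys

def multiply(a, b):
    # Hypothetical stand-in for the calculator operation under test.
    return a * b

MAX_INT = 2**63 - 1  # assumed fixed-width integer limit of the implementation

high_output_cases = [
    (MAX_INT, MAX_INT),             # integer overflow territory
    (MAX_INT, 2),
    (MAX_INT, -1),
    (MAX_INT // 2, 2),
    (math.inf, math.inf),           # infinity handling
    (math.pi, math.pi),             # floating point precision
    (sys.float_info.max, math.pi),  # float overflow (inf under IEEE 754)
]

for a, b in high_output_cases:
    print(f"{a} x {b} = {multiply(a, b)}")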
Smoke Test - Typical
Verify a build is sufficiently stable to consume QA resources to test.
Pipeline: Build Repository → Smoke Test → Acceptance Test.
The smoke test covers 2% (or less) of the software and is typically
always the same tests. After many iterations:
• It is not checking anything new.
• The statistical likelihood of detecting a failure decreases, because:
• The code around the smoke test becomes more hardened.
• The interface and configuration around the code stop changing.
• The code is better and better known by developers and build specialists.
Smoke Test - Sampling Distribution
First Round of Smoke Test(ing)
Take a random sample of test cases from the build (i.e., the
population); the distribution of the sample error rate x̅ across such
samples is the sampling distribution (here, the smoke test, covering
2% of the build). Its parameters follow the usual results:
µx̅ = µ (mean of the error rate)
σx̅ = σ/√n (standard deviation)
On the first smoke test, we start with the assumption that there is
some unknown distribution of errors across the build.
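A minimal simulation sketch of this setup, assuming a hypothetical build of 10,000 testable points with a true (in practice unknown) 5% error rate and a 2% sample per smoke test:

import random
import statistics

random.seed(42)
BUILD_SIZE, TRUE_ERROR_RATE, SAMPLE_FRACTION = 10_000, 0.05, 0.02

# Each point in the build either contains an error (True) or not.
build = [random.random() < TRUE_ERROR_RATE for _ in range(BUILD_SIZE)]
n = int(BUILD_SIZE * SAMPLE_FRACTION)  # 200 tests per smoke test

sample_means = []
for _ in range(1_000):  # many hypothetical smoke-test rounds
    sample = random.sample(build, n)
    sample_means.append(sum(sample) / n)  # x-bar for this round

print(f"mean of x-bar: {statistics.mean(sample_means):.4f}  (approximates mu)")
print(f"std of x-bar:  {statistics.stdev(sample_means):.4f}  (approximates sigma/sqrt(n))")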
Smoke Test - Sampling Distribution
Error rate of the random sample: x̅.
Unsorted list of tests in the first smoke test: Test 1, Test 2,
Test 3, …, Test n.
Examine the outputs and sort to identify the 20% of inputs that
produce 80% of the outputs. The sorted list for the first smoke test
(Test 1A, Test 1B, Test 1C, …, Test 3A, Test 3B, …, Test 10Z) splits
into the 20% → 80% portion and the remaining 80%.
To form the second (updated) smoke test, keep the identified 20%
(Test 1A, Test 1B, Test 1C, …) and replace the other 80% with new
tests (New 1, New 2, …, New 3).
Smoke Test - Sampling Distribution
Next Round of Smoke Test(ing)
Again take a random sample of the build (i.e., the population) with
error rate x̅, covering 2% of the build. The second smoke test
contains 20% of the first smoke test: the kept 20% (Test 1A, Test 1B,
Test 1C, …) plus 80% new tests (New 1, New 2, …, New 3).
Smoke Test - Sampling Distribution
Examine the outputs of the second smoke test (error rate x̅) and sort
again for the 20% of inputs → 80% of outputs. The sorted list now
splits into the original 20%, then the 20% of the new 80% (16% of the
whole, producing 80% of the new tests' outputs), then the remainder.
To form the third (updated) smoke test, keep the original 20% plus
the 16% → 80% of the new tests (36% kept in total) and replace the
remaining 64% with new tests.
Smoke Test - Sampling Distribution
Iterate, keeping the 20% → 80% of the remainder and replacing the rest:
• Third Smoke Test: 36% kept / 64% replaced
• Fourth Smoke Test: 58.8% / 41.2%
• Fifth Smoke Test: 67.2% / 32.8%
• Sixth Smoke Test: 73.8% / 26.2%
• Seventh Smoke Test: 79% / 21%
• Eighth Smoke Test: 84.8% / 15.2%
The kept portion converges toward 100%.
Smoke Test - Sampling Distribution
Next Round of Smoke Test(ing)
As the smoke test converges, x̅ predicts the error rate of the 80% of
the output/results for the build (i.e., the population). Kept /
replaced per iteration:
• 9th iteration: 88.2% / 11.8%
• 10th iteration: 90.6% / 9.4%
• 11th iteration: 92.5% / 7.5%
• 12th iteration: 94% / 6%
• 13th iteration: 95.2% / 4.8%
• 14th iteration: 96.2% / 3.8%
• 15th iteration: 97% / 3%
(The second number is the amount being replaced.)
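A minimal sketch of the keep/replace recurrence, assuming each round retains everything already kept plus the top 20% of the replaced remainder; the later iterations above follow this rule, with the remainder shrinking by a factor of 0.8 per round:

kept = 0.20  # after the first sort: 20% kept, 80% replaced
for iteration in range(2, 16):
    remainder = 1.0 - kept
    kept += 0.20 * remainder  # retain the 20% -> 80% slice of the remainder
    print(f"iteration {iteration:2d}: kept {kept:6.1%} / replaced {1 - kept:6.1%}")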
Smoke Test - Epochs
• Epoch = 15 iterations (e.g., 15 weeks)
• Gradual (fresh) increase in high-output tests.
• Gradual reduction in effort for new tests.
• Next Epoch:
• Keep 20% of the top outputters.
• Replace the remaining 80% with new tests.
• Repeat the cycle again.
Epoch I ends with 97% retained (the completed epoch); Epoch II starts
over from 20% kept, with the other 80% replaced (the 20% → 80% rule again).
Acceptance Test – Requirements Based
Verify that all the requirements of a build are met.
Requirements Specification → break down each requirement into one or
more features (Feature 1 … Feature N) → break down each feature into
one or more test cases (Testcase 1 … Testcase N).
Typically generates between 2000 and 5000 test cases.
Partial vs. Full Acceptance Test
• Partial Acceptance Test
• QA Manager/Lead selects only Features which have been
modified or added.
• Used to manage time constraints versus testing everything
each time.
• Typically 200 to 500 test cases.
• Full Acceptance Test
• Typically done at major milestones, and as the attempt at
Final Acceptance for Product Release/Evaluation.
• Typically 2000 to 5000 test cases.
Agile Build Release
• Incremental Sprints
• Feature development is preplanned before product
development is started.
• One or more new features are added per sprint.
• Selected set of bugs for existing features are fixed.
• Iterative Sprints
• Feature development is not preplanned, but evolves per
sprint.
• Planning injects and/or modifies features at the start of
each sprint.
• One or more new features are added per sprint.
• One or more existing features are modified.
• Selected set of bugs for existing features are fixed.
QA Automated
• Automation
• QA Automation was a buzzword of the 1990s.
• The belief was that all testing could be automated and run quickly,
with full acceptance test runs done on even minor build releases.
• Automated Testing Paradox
• Automated Tests are sensitive to code changes.
• Changes to interface.
• Changes to parameters.
• UI changes (even minor change can break automated
tests).
• Updated Requirements change behavior of code.
• Automated tests are only stable if there are no changes to the
code. If there are no changes, why run the tests?
Broken Automation Examples
• API: valid inputs could change.
http://localhost:.../api?param1=var&param2=var
The names of parameters could change or be dropped.
• UI: the location or type of an HTML tag could change.
<div id='at11'> … </div>
The id could change.
• Change in Function: the return type or return values could change.
int funcName( int a, int b)
The type or number of arguments could change.
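A minimal sketch of the UI case, with hypothetical markup and a hypothetical id: the application behavior is unchanged, but the automated check is pinned to the old tag and id, so the test breaks rather than the app:

import re

OLD_BUILD_HTML = "<div id='at11'>Total: 42</div>"
NEW_BUILD_HTML = "<span id='total-field'>Total: 42</span>"  # UI refactored

def read_total(page_html: str) -> str:
    # Test helper pinned to the old markup: both the tag name and the id.
    match = re.search(r"<div id='at11'>Total: (\d+)</div>", page_html)
    if match is None:
        raise AssertionError("element 'at11' not found: the test broke, not the app")
    return match.group(1)

print(read_total(OLD_BUILD_HTML))       # passes: 42
try:
    print(read_total(NEW_BUILD_HTML))   # same behavior, different markup
except AssertionError as err:
    print(err)                          # the automated test fails anyway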
Automation vs. Manual Testing
• When change occurs, test cases stop running or passing.
• A QA person needs to identify whether it is a bug in the code or
the test case that needs to be updated.
• If it is the test case, the QA person needs to rewrite the code
for the test case.
• Some organizations opt to limit or abandon
automated testing due to maintenance overhead.
• Automated tests only for long-stable code.
• Manual tests for new and changing code.
Manual Testing
• The typical number of manual tests run per person per day is 30.
• Read and follow the test instructions, verify the outcome.
• If there is an error, enter a defect report.
• Each day, review and answer questions/requests for more info from
the development team on previously entered defects.
• Verify defects that have been fixed.
• Start-of-day scrum, end-of-day accounting of activities.
• Rate of Progress
• 300 test cases = 10 person-days (2 weeks for one tester).
• 3000 test cases = 10 x 10 person-days (2 weeks for a team of ten).
• QA organizations are never staffed to handle the higher
test case loads for acceptance testing.
What to Test?
• If the tests are automated and need no maintenance,
then yes, run them.
• If the tests are manual and you have insufficient
resources, prioritize the 20% of tests that produce
80% of the output, judged by criteria such as:
• How likely an end-user is to do this.
• How much of the code it exercises.
• How much change it causes in the application state.
• The number of bugs it has found historically.
Setting Priorities
• Rank test cases 1 through 5, where rank 1 holds the highest
outputters.
• Maintain a 20% distribution across the ranking.
• Run the test cases ranked 1 first.
• If time permits, proceed to rank 2 and so forth.
• Aging: keep track of how often a test is run.
• Periodically review test rankings (see the sketch after this list).
• If a rank 1 test has been run frequently and has not
found bugs, consider switching it with a rank 2 test case.
• Apply a similar periodic evaluation to ranks 2, 3, and 4.
• If a lower-ranked test case changes to higher output,
consider switching it with a higher-ranked test case.
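A minimal bookkeeping sketch of rank-and-age review; the TestCase fields, the run threshold, and the swap rule are illustrative assumptions, not a prescribed policy:

from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    rank: int            # 1 (highest output) .. 5
    runs: int = 0        # aging: how often the test has been run
    bugs_found: int = 0

def review_rankings(tests, run_threshold=20):
    # Demote rank-1 tests that ran often without finding bugs and
    # promote the best-yielding rank-2 test in their place.
    for t in tests:
        if t.rank == 1 and t.runs >= run_threshold and t.bugs_found == 0:
            candidates = [c for c in tests if c.rank == 2]
            if candidates:
                swap = max(candidates, key=lambda c: c.bugs_found)
                t.rank, swap.rank = 2, 1  # switch the two test cases

suite = [TestCase("login_smoke", rank=1, runs=25),
         TestCase("bulk_import", rank=2, runs=12, bugs_found=3)]
review_rankings(suite)
print([(t.name, t.rank) for t in suite])  # bulk_import promoted to rank 1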
Acceptance Test - Sampling Distribution
Build (i.e. population), sample error rate x̅.
Assume 1000 selected tests per round:
• Round 1: 50 bugs
• Round 2: 40 bugs
• Round N: 6 bugs
Can we use the Central Limit Theorem? I don't think so; certain
conditions are not met:
- Samples are not truly randomly selected.
- The state is not static: new code is introduced, and old code is fixed.
Predicting Remaining Defect Rate
Defect Burndown Chart
• The traditional Agile method for predicting the defect rate:
• Plot the defects found per sprint (number of bugs per 1000 tests,
as a time series).
• Fit a straight line and extrapolate.
Statistically Zero Defects
Defect Burndown Chart
• Defect rates over time do not fit a straight line!
• They follow an S curve: over time the curve flattens out and
reaches a limit.
The phases of the curve track the mix of code (a curve-fitting sketch
follows this list):
• Amount of new/changed code > unchanged code (steep phase).
• Amount of new/changed code ~= unchanged code (inflection).
• Amount of new/changed code < unchanged code (flattening).
• A threshold where the defect change rate hits a limit: past it,
no amount of effort will have statistical impact. This is the
Statistically Zero (but not actual) Defect point.
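A minimal sketch of fitting an S curve instead of a straight line, assuming SciPy is available; the per-sprint defect counts are made-up illustrative data:

import numpy as np
from scipy.optimize import curve_fit

defects_per_sprint = np.array([5, 12, 30, 52, 60, 48, 25, 12, 6, 3])
sprints = np.arange(1, len(defects_per_sprint) + 1)
cumulative = np.cumsum(defects_per_sprint)

def logistic(t, L, k, t0):
    # L = total defects in the limit, k = steepness, t0 = inflection sprint.
    return L / (1.0 + np.exp(-k * (t - t0)))

(L, k, t0), _ = curve_fit(logistic, sprints, cumulative,
                          p0=[cumulative[-1], 1.0, len(sprints) / 2])
print(f"projected total defects: {L:.0f}, inflection at sprint {t0:.1f}")
# As the fitted curve flattens, further effort yields little change:
# the statistically-zero-defect point.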
Predicting Effort – A Bellman Approach
• A Bellman equation approach to predicting test effort over the
development lifecycle.
E = Effort
E(n) = Effort for n iterations
Di = Defects found on iteration i
γ = Discount factor > 1 (e.g., 1.2)
E(n) = γ * D1 + γ² * D2 + γ³ * D3 + … + γⁿ * Dn
The further into the lifecycle a defect is found, the more
progressively the effort is penalized.
Predicting Effort - Example
Number of defects per test run: 60, 50, 55, 40, 30, 20, 10, 5, 4, 4
E(n) = 1.2(60) + 1.4(50) + 1.7(55) + 2(40) + 2.5(30) + 3(20) + 3.6(10)
+ 4.3(5) + 5.2(4) + 6.2(4)
(the coefficients are successive powers of 1.2, rounded)
Plotted as a time series, the effort has plateaued. Statistically
Zero Defects: effort exceeds return.
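A minimal sketch computing the discounted effort sum for the series above:

defects = [60, 50, 55, 40, 30, 20, 10, 5, 4, 4]
gamma = 1.2  # discount factor > 1 penalizes late-found defects

def effort(defect_counts, gamma):
    # E(n) = sum over iterations i of gamma^i * D_i, with i starting at 1.
    return sum(gamma**i * d for i, d in enumerate(defect_counts, start=1))

# Cumulative effort per iteration: once defect counts flatten, each
# extra iteration adds cost with little return.
for n in range(1, len(defects) + 1):
    print(f"E({n}) = {effort(defects[:n], gamma):7.1f}")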
How About Using Machine Learning!
Train a model on per-build features and predict defects:
• From SCM: total volume of code, volume of code changed, age of code.
• From Management: person hours.
• From the Defect Reporting System: number of open defects, number of
closed defects.
• Label: number of defects found.
TRAIN MODEL → Predictor.
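A minimal sketch of that pipeline using scikit-learn; every number below is a fabricated placeholder purely to make the example runnable:

import numpy as np
from sklearn.linear_model import LinearRegression

# Feature columns: total LOC, LOC changed, code age (months),
# person hours, open defects, closed defects. Placeholder values only.
X = np.array([
    [120_000,  8_000, 24, 300, 40, 120],
    [125_000,  3_000, 25, 250, 35, 140],
    [130_000, 12_000, 26, 400, 55, 150],
    [131_000,  1_500, 27, 200, 30, 170],
])
y = np.array([52, 30, 70, 18])  # defects found per build (placeholders)

model = LinearRegression().fit(X, y)
next_build = np.array([[132_000, 6_000, 28, 320, 33, 180]])
print(f"predicted defects for next build: {model.predict(next_build)[0]:.0f}")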