Updated Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM and Intel
Contents
1 author:
Roy Longbottom
UK Government
All content following this page was uploaded by Roy Longbottom on 16 March 2018.
Summary
My original benchmarks generally ran successfully on devices controlled by up to Android 7. They could be installed,
using Android 8, but failed to run. This was found to be associated with an unimportant function that obtained the
version of Linux used for Android.
The earlier applications were produced using Eclipse, running via Linux Ubuntu, on a fast Core i7 based PC. With Eclipse
support being withdrawn, attempted updates made this unusable. I installed Android Studio, via Windows 10, on the
Core i7 based PC, initially with only Java support. The installation allowed access to an Android 8 emulator (AVD) that
demonstrated the faulty app condition. The existing Java only apps were imported into Android Studio and, with some
complicated option changes, produced working programs. I gave up with Native Code applications, because of painfully
slow attempts to update and use Android Studio, but still use it with Android 8 AVD. I managed to install Eclipse Neon
with Cygwin, via Windows 10, to reproduce working applications with no difficulty.
The Eclipse Neon software uses a later C compiler, that produces a few performance differences from the original, some
better and some worse. Results provided show old and new speeds, to demonstrate this. In the previous report, full
sets of results were provided up to systems using ARM Cortex-A53 CPUs. Here, for comparison purposes, results include
a set for a Cortex-A9 CPU, with new ones for Cortex A57, A72, A73 (Google Pixel 2) and Exynos 8890. The new devices
mainly use 64 bit Android.
In 2012, it looks as though the top end Android devices had Cortex-A9 CPUs, up to 33% faster than the one reported
here (quad core 32 bit Nexus 7 at 1200 MHz). Running the old single core Classic Benchmarks, compiled for 64 bits,
showed 2017 average performance per MHz improvements of Whetstone 1.6 x, Dhrystone 3.7 x, Linpack 4.5 x and
Livermore Loops 2.8 x. Allowing for the higher MHz of the newer devices, real performance gains were these ratios
multiplied by 1.44.
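The combination of per MHz gains and the 1.44 clock factor can be illustrated with a small calculation. This is a sketch only: it assumes the 1.44 factor simply multiplies each per MHz ratio, and the names `PER_MHZ_GAIN`, `CLOCK_FACTOR` and `real_gain` are illustrative, not taken from the benchmarks.

```python
# Per MHz improvement ratios (2017 versus 2012) quoted in the summary.
PER_MHZ_GAIN = {
    "Whetstone": 1.6,
    "Dhrystone": 3.7,
    "Linpack": 4.5,
    "Livermore Loops": 2.8,
}

# Clock speed factor quoted in the summary (assumed to multiply each ratio).
CLOCK_FACTOR = 1.44


def real_gain(per_mhz_gain, clock_factor=CLOCK_FACTOR):
    """Return the overall speed gain after allowing for the higher clock."""
    return per_mhz_gain * clock_factor


if __name__ == "__main__":
    for name, ratio in PER_MHZ_GAIN.items():
        print(f"{name}: {real_gain(ratio):.1f} x")
```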
The second set of tests is for CPU execution and/or transfer speeds with data from caches and RAM. Here,
2017/2012 unadjusted speed gains were, typically, 5 times from L1 cache, 9 times from L2 cache, 21 times reading RAM
and 10 times with RAM reading and writing.
The third benchmark group measures multi-core performance, using 1, 2, 4 and 8 threads. MP performance gains,
relative to single core speeds, could be expected to be at least 1.95 times for 2 cores and 3.80 times with 4 cores,
running at the same MHz. With the latest processors, half of the cores run at a lower MHz than the maximum
(big.LITTLE technology), where gains (for those covered here) might be 3.6 for a CPU with 4 cores and at least 6.8
using 8 out of 8 cores. It seems that the latter are difficult to achieve. On the plus side, the benchmarks demonstrate
that multiple cores are required to obtain maximum RAM data transfer speeds.
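One possible way to arrive at gains of this order is sketched below. This is an assumption, not the author's stated method: it supposes that throughput scales with the sum of core MHz relative to the fastest core, with a 0.95 efficiency factor taken from the 3.80 times expected for 4 cores. The function name `ideal_mp_gain` is hypothetical, and the example clock speeds are those of the 4 x 2350 + 4 x 1900 MHz Cortex-A73 device covered later in this report.

```python
def ideal_mp_gain(core_mhz, efficiency=0.95):
    """Estimate multi-core speed gain relative to one core at maximum MHz.

    core_mhz   -- clock speed of every core used, in MHz
    efficiency -- assumed scaling efficiency (0.95 gives 3.80 on 4 equal cores)
    """
    fastest = max(core_mhz)
    return efficiency * sum(core_mhz) / fastest


if __name__ == "__main__":
    # Four cores at the same MHz.
    print(f"4 equal cores: {ideal_mp_gain([1200] * 4):.2f}")
    # Eight cores, half of them at a lower MHz (big.LITTLE style).
    print(f"4 x 2350 + 4 x 1900: {ideal_mp_gain([2350] * 4 + [1900] * 4):.2f}")
```

Under these assumptions the eight core case comes out a little below 7, which is in line with the "at least 6.8" quoted above, but differences in work done per MHz between core types are ignored.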
The MP benchmarks include one intended to demonstrate high levels of MFLOPS. The latest technology (here) produces
up to 5.27 MFLOPS per MHz, for a single core, ten times more efficient than 2012 systems. Recorded maximum multi-
core performance was 42 GFLOPS (4 cores), seventeen times faster than a 1.2 GHz quad core Cortex-A9 CPU.
The final benchmarks are for graphics and drive speed. The two for graphics cover OpenGL and drawing via Java.
Performance of both is dependent on Java that comes with different Android releases. The latest devices are indicated
to be up to three times faster than those in 2012 and this could be much higher if maximum speed was not clamped at
60 frames per second. Measuring drive speed has become more complicated but greater than 200 MB/second is
indicated for recent main drives, four times faster than six years earlier.
Then there are the programs provided for stress testing, with one for floating point arithmetic and the other for fast
data transfer with integer arithmetic. These have input parameters for a choice of tests, number of threads, data size
and running time. A new app, that measures MHz for up to 8 cores, has been provided. Although other frequency
sources could be employed, it does appear to reflect changing performance due to heating effects and battery charge
changes. Results are provided for 10 minute tests, using 8 threads. The original Cortex-A9 and new Cortex-A73 based
devices survived the test period, with most frequency samples showing all cores running at maximum MHz. Worst case
indicated a 60% reduction in average MHz and measured performance, with others between 20% and 30%. These
appear to be due to heating effects.
An example of the stress tests used to monitor the effects of a discharging battery is also provided. Here, the fastest
four cores were switched off from running at maximum MHz. Later, speed of the other four cores was reduced by 17%
until power was automatically turned off. Measured performance reduced from 37 to 14 GB/second, with average per
core frequency down from 1364 to 500 MHz, a similar proportion.
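The "similar proportion" can be checked with simple arithmetic on the figures quoted above. The helper name `remaining_fraction` is illustrative only.

```python
def remaining_fraction(before, after):
    """Return the fraction of the starting value that remains."""
    return after / before


# Figures quoted in the text for the discharging battery test.
speed_fraction = remaining_fraction(37.0, 14.0)      # GB/second
clock_fraction = remaining_fraction(1364.0, 500.0)   # average MHz per core

if __name__ == "__main__":
    print(f"Performance remaining: {speed_fraction:.0%}")
    print(f"Frequency remaining:   {clock_fraction:.0%}")
```

Both fractions are a little under 40%, so measured throughput tracked the average clock speed closely.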
Download Benchmark Apps
A Settings, Security option may need changing to allow installation of non-Market applications.
NativeWhetstone2.apk    First standard benchmark
Dhrystone2i.apk         First integer benchmark
LinpackDP2.apk          First computational benchmark
LinpackSP2.apk          Single precision Linpack
LivermoreLoops2.apk     First supercomputer benchmark
MemSpeedi.apk           Floating Point Cache and RAM Test
BusSpeedv7i.apk         Integer Bus, Cache and RAM Test
RandMemi.apk            Random/Serial Access Cache and RAM Test
fft1.apk                Original FFT Benchmark
fft3c.apk               Optimised FFT Benchmark
NeonSpeedi.apk          NEON Memory Speed Test Using Intrinsic Functions
NEON-Linpacki.apk       Linpack Benchmark using ARM NEON Intrinsic Functions
MP BENCHMARKS
MP-WHETSi.apk           Whetstone Floating and Fixed Point Tests
MP-Dhryi.apk            Dhrystone Integer Benchmark
MP-MFLOPS2i.apk         CPU, Cache, RAM MFLOPS Long Running Test
MP-BusSpd2i.apk         Long running version with staggered start
NEON-MFLOPS2i-MP.apk    MP-MFLOPS using ARM NEON Intrinsic Functions
MP-RndMemi.apk          Multithreaded RandMem Benchmark
DRIVE TESTS
DriveSpeed1.apk         SD card/internal drive tests
DriveSpeed2.apk         Drive tests, user defined path
STRESS TESTS
MP-FPU-Stress.apk       Floating Point Stress Test, Variable Run Time Parameters
MP-Int-Stress.apk       Integer Stress Test, Variable Run Time Parameters
CP_MHz2.apk             Measure CPU MHz, Up to 8 Cores Separately
JAVA ONLY
JavaOpenGL1.apk         3D Graphics Frames Per Second
JavaDraw.apk            Draw Frames Per Second
Background
Details and downloads of my original Android benchmarks are available in Android Benchmarks For 32 Bit and 64 Bit CPUs
from ARM and Intel.pdf. They comprise either all Java code or a Java front end and native code programmed in C. The
latter provides facilities to automatically target 32 bit or 64 bit ARM or Intel technology based systems.
The latest of the above benchmarks were produced using gcc 4.8, via Eclipse Android Development Tools (ADT),
running under Linux Ubuntu 14.04. In 2017, when Android 8.0 was introduced, it was found that the apps could be installed
but failed to run. These included the all-Java benchmarks. By that time, the updated Ubuntu Eclipse could not be
used to produce Android apps. Following claims that Eclipse ADT was no longer supported and that Android Studio
should be used instead, I installed the software.
Android Studio was installed on my fast Core i7 based PC using Windows 10, initially with only Java support. The
installation allowed access to an Android 8 emulator (AVD) that demonstrated the faulty app condition. This was
associated with an unimportant function that obtained the version of Linux used for Android (probably a string of
excessive length). Avoiding use of this function provided a workable Android app, with some effort in modifying Android
Studio settings after importing the original project. One issue was that some activities were painfully slow. Then, after
several days spent installing options and trying to enable use of native code, I abandoned Android Studio, except for using
the AVD, which allows native code apps to be downloaded, installed and tested. The new project files, which can be
imported into Eclipse (or Android Studio?), are available in this zip file.
After reading suggestions to use Eclipse Neon, I decided to try that, running via Windows, where Cygwin also has to be
installed. I used procedures detailed in the following (posted February 2017): Setting up an Android Development
Environment in Windows with Eclipse. I was delighted to see that, unlike with Android Studio, existing projects could be
used with only minor changes: commenting out the error data, changing version identification and omitting compile directives
that include code for MIPS and old ARM CPUs, which are said not to be supported anymore. The icon images also needed to
be changed to differentiate between versions, but that only involved a few minutes for each app.
The latest Eclipse installation uses the newer gcc 4.9 compiler for ARM programs and (possibly) gcc 6, via Cygwin, for
Intel code. These result in performance variations, some better and some worse, compared with the earlier
benchmarks. As for the latter, the new ones have been tested on 32 bit and 64 bit ARM and Intel CPUs and emulators,
using Android 5, 6, 7 and 8. The 64 bit Intel CPU tests were run via Android/Remix OS (now discontinued). Running on a
3.9 GHz Intel Core i7 processor, these produced by far the fastest Android benchmark results. However, the latest ARM
CPUs are catching up.
Some of the benchmarks use ARM NEON intrinsic functions. The first of my Android benchmarks were only compiled for
ARM CPUs, but Android included a compatibility layer, called Houdini, that mapped ARM instructions into those for X86
processors. Later compilers appeared to translate these into Intel SSE type instructions. With the latest compiler, these
would not run using Eclipse Intel emulators. So they were compiled for ARM only, but still run on Intel CPUs.
CPU Benchmarks - The first set are the Classic Benchmarks that were the original 1970s to 1980s programs that set
standards of performance for computers, comprising Whetstone, Dhrystone, Linpack and Livermore Loops.
Memory Benchmarks - Next are programs that measure performance with data from caches and RAM. MemSpeed
(including NeonSpeed variant), BusSpeed and RandMem all use the same range of data sizes between 4 KB and 64 MB.
Then there is a Fast Fourier Transform benchmark with multiple data sizes.
MultiThreading Benchmarks - These all measure performance using 1, 2, 4 and 8 threads. The first are MP-
Whetstone, MP-Dhrystone and MP-Linpack. The next batch all use memory sized 12.8 KB, 128 KB and 12.8 MB,
comprising MP-MFLOPS (including NEON-MFLOPS MP), MP-BusSpeed and MP-RandMem.
CPU Stress Testing Programs - These have variable parameters to run MP benchmarks for extended periods, for
identifying overheating and discharging battery performance issues.
The Whetstone benchmark provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS
(floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and different
programming languages, see Whetstone Benchmark Results (including Windows tablet versions running on desktop PCs), also
Whetstone Benchmark History and Results from the 1960s.
Below are results from the latest 4A8 compilations along with some of the original ARM/Intel measurements, for
comparison purposes. Further results are in the Earlier Report.
Preceding the detailed results are comparisons in the form of overall MWIPS per MHz and average (Geometric) MFLOPS
per MHz. The former provides ratios of around 1.0, for 32 bit devices, except where the EXP test is particularly slow.
The ratio is greater than 1.4 using the newest 64 bit technology. MFLOPS performance of the latest compilations
is similar to that of the older ones.
As reflected by T7 and P37, Java performance can vary significantly between different versions of Android.
                  32 Bit                                 64 Bit
Code  CPU      MHz   MWIPS  MFLOPS     Code  CPU      MHz   MWIPS  MFLOPS
                     /MHz   /MHz                            /MHz   /MHz
Original ARM/Intel
T7    v7-A9    1200  0.61   0.22       P42   v8-A57   2000  1.00   0.19
P37   v8-A53   1500  1.04   0.26       P43   Ex8890   2300  1.42   0.31
P42   v8-A57   2000  0.64   0.20       R2    Core i7  3900  1.55   0.29
A5    Atom     1840  1.03   0.27
4A8 Compilations
T7    v7-A9    1200  0.86   0.19       P42   v8-A57   2000  0.99   0.18
P37   v8-A53   1500  0.97   0.23       P43   Ex8890   2300  1.45   0.33
T23   V8-A72   1800  1.14   0.26       P44   v8-A73   2350  1.41   0.28
A5    Atom     1840  1.02   0.28       R2    Core i7  3900  1.63   0.29
The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the
Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in Dhrystone Benchmark results.
The shown ratio, MIPS/MHz, is often quoted and is normally constant using the same benchmark on the same range of
processors. The significant improvements with the 64 bit versions might be due to using additional registers or newer
instructions, but it could be that newer features enable recognition of more code that can be optimised out. Over-
optimisation is a recognised feature of this benchmark. Further results are in the Earlier Report.
The Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. For
fastest speed, the benchmark was recompiled to use single precision numbers natively and via NEON instructions.
Performance of the Linpack benchmark is almost entirely dependent on the calculation x[i]=x[i]+c*y[i], where changes
in CPU instructions used can have a dramatic effect. Results from various hardware and software platforms can be
found in Linpack Benchmark results, and more in an earlier Android report.
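The calculation that dominates Linpack can be sketched as follows. This is an illustration in Python rather than the C used by the benchmark, so it shows only the arithmetic and the operation count, not realistic speeds. The function names `daxpy` and `mflops` are illustrative; counting one multiply and one add per element is the usual convention for this loop.

```python
import time


def daxpy(c, x, y):
    """Perform x[i] = x[i] + c * y[i] in place and return the flop count."""
    for i in range(len(x)):
        x[i] = x[i] + c * y[i]
    # One multiply and one add per element.
    return 2 * len(x)


def mflops(flops, seconds):
    """Convert an operation count and elapsed time to MFLOPS."""
    return flops / seconds / 1e6


if __name__ == "__main__":
    n = 100_000
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    flops = daxpy(0.5, x, y)
    elapsed = time.perf_counter() - start
    print(f"{mflops(flops, elapsed):.1f} MFLOPS (interpreted Python)")
```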
Besides MFLOPS ratings, calculations of MFLOPS per MHz are also shown below. The 64 bit ratios are generally better
than at 32 bits and all better than those for the Whetstone benchmark. Using different precision, or instructions at 32
or 64 bit operation, produces slightly different numeric results as shown in the sumchecks below.
Numeric Sumchecks
The Livermore Loops comprise 24 kernels of numerical application with speeds calculated in MFLOPS (double precision).
A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official
average. Details and results from various hardware and software platforms are provided in Livermore Loops Benchmark
results report (including Windows tablet versions running on desktop PCs), with further Android based results in an
Earlier Report.
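For reference, a geometric mean of the kind used as the official average can be computed as in the sketch below. The function name `geometric_mean` and the sample values are illustrative and are not benchmark results.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    if not values:
        raise ValueError("at least one value is required")
    # Averaging logarithms avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    sample = [1.0, 4.0, 16.0]  # illustrative MFLOPS values only
    print(f"Arithmetic mean: {sum(sample) / len(sample):.2f}")
    print(f"Geometric mean:  {geometric_mean(sample):.2f}")
```

The geometric mean is less affected than the arithmetic mean by a few very fast kernels, which is one reason it suits a set of loops with widely varying speeds.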
Below are MFLOPS scores for the 24 kernels, preceded by MFLOPS per MHz calculations for maximum, geometric mean
and minimum values. Geomean calculations are shown to be similar between the original and 4A8 compilations, but with
wide variations in those for maximum and minimum, due to the different compiler version used. Linpack DP ratios are
also replicated to show similarities, where newer technology and 64 bit working obtain higher ratings.
P37 199 296 331 325 132 141 236 267 250 328 229 225
v8-A53 342 475 452 242 150 75 435 229 391 213 185 167
82 105 237 279 413 295 111 113 187 228 316 316
209 239 131 133 353 100 186 250 185 138 286 136
A5 689 701 1108 874 230 488 673 677 824 696 377 469
Atom 662 770 877 405 440 428 24 750 958 365 386 767
Z8300 141 281 293 466 540 433 243 160 255 444 577 376
314 308 650 176 662 148 407 332 577 184 785 206
ARM/Intel 64 Bit Version
P42 1621 606 934 923 203 806 1413 1017 774 766 354 602
v8-A57 2303 1879 1633 765 146 773 1315 1663 507 633 390 1075
265 446 534 843 737 900 292 426 489 845 785 833
392 558 1145 240 570 163 366 553 666 252 672 361
P43 4375 1116 1039 1044 306 957 2180 1714 1289 1272 744 999
Ex8890 5905 3865 2546 1011 201 1281 2189 2282 2071 1168 857 1501
468 388 609 1371 1567 2029 395 379 606 1411 1298 1070
975 641 1217 339 1111 236 556 642 1222 351 1423 541
R2 9376 3395 2496 2523 560 2220 4519 3023 2579 2524 987 2306
Core i7 8892 5720 5828 2749 440 3146 4725 5820 5515 2572 1313 3565
1183 1273 2283 2333 2379 5723 534 1296 2274 1953 2496 3469
1069 967 2967 1436 1591 357 1067 2115 3103 1388 2023 2057
The MemSpeed benchmark measures data reading speeds in MegaBytes per second, carrying out calculations on arrays of cache
and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and
single precision (DP and SP) floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating
Point Operations Per Second (MFLOPS) speed can be calculated by dividing DP MB/second by 8 and 16, for the two
tests, and SP speeds by 4 and 8. For more details and older results see this archived file and earlier report.
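The MB/second to MFLOPS conversion described above can be written as a small lookup. This is a sketch with illustrative names (`DIVISOR`, `mb_per_sec_to_mflops`); the divisors are those given in the text. The example assumes that the first two figures in the 16 KB row of the Cortex-A9 results (1735 and 888 MB/second) are the DP and SP multiply and add tests, which is consistent with the 217 and 222 MFLOPS printed under that block.

```python
# Divisors from the text: bytes transferred per floating point operation.
DIVISOR = {
    ("DP", "multiply_add"): 8,   # x[m] = x[m] + s * y[m], double precision
    ("DP", "add"): 16,           # x[m] = x[m] + y[m], double precision
    ("SP", "multiply_add"): 4,   # x[m] = x[m] + s * y[m], single precision
    ("SP", "add"): 8,            # x[m] = x[m] + y[m], single precision
}


def mb_per_sec_to_mflops(mb_per_sec, precision, test):
    """Convert a MemSpeed MB/second result to MFLOPS."""
    return mb_per_sec / DIVISOR[(precision, test)]


if __name__ == "__main__":
    print(round(mb_per_sec_to_mflops(1735, "DP", "multiply_add")))
    print(round(mb_per_sec_to_mflops(888, "SP", "multiply_add")))
```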
Detailed results are below, following MFLOPS per MHz calculations for the multiply and add calculations. These are the
same as the main Linpack benchmark functions, whose results are provided for comparison purposes. As with earlier
benchmarks, for many results, 64 bit performance was better than at 32 bits. MemSpeed ratios were closer to those for
Linpack NEON, rather than DP/SP scores. There are exceptions with the latest compilations being slower, where Linpack
NEON also seems to suffer.
T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, DDR3 5.3 GB/s
16 1735 888 2456 2726 1364 2818 3254 1676 2213 4420 2068 2063 L1
32 1448 760 1474 1700 1039 1648 1538 800 1139 1712 1127 1103
64 1318 719 1290 1468 952 1385 1497 812 1166 1574 1129 1216 L2
128 1279 715 1289 1443 944 1336 1475 779 1099 1609 1133 1214
256 1268 714 1279 1435 943 1313 1494 783 1162 1551 1125 1213
512 1158 691 1204 1321 892 1228 1335 717 1110 1564 1110 1190
1024 729 553 735 772 632 742 780 584 705 824 690 720 RAM
4096 445 392 425 442 421 439 450 393 430 456 420 425
16384 435 390 428 435 412 431 441 403 426 457 428 432
65536 445 404 393 450 432 449 413 397 424 459 430 432
MFLOPS 217 222 407 419
16 2344 1172 2091 2998 1549 2528 4718 2397 2408 5062 2829 2534 L1
32 2227 1122 2014 2851 1514 2423 4465 2355 2319 4659 2714 2426
64 2169 1131 1945 2720 1476 2305 4160 2312 2231 4271 2601 2331 L2
128 2214 1141 1976 2778 1485 2333 4136 2300 2219 4246 2563 2182
256 2219 1142 1988 2816 1500 2343 4037 2298 2215 4214 2572 2311
512 2080 1095 1853 2579 1383 2178 3380 2074 1993 3426 2278 2082
1024 1474 952 1408 1657 1149 1543 1746 1478 1494 1756 1548 1508 RAM
4096 1333 945 1294 1414 1098 1341 1552 1366 1389 1565 1417 1409
16384 1328 947 1274 1395 1112 1357 1573 1373 1394 1583 1435 1421
65536 1351 936 1331 1461 1116 1413 1623 1387 1422 1642 1464 1444
MFLOPS 293 293 590 599
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
Dual Channel RAM 1600 MHz, 25.6 GB/sec
16 6580 3487 4236 6996 3453 6933 5927 3517 14427 7498 17577 16451 L1
32 4446 2719 3320 5506 3068 3445 5050 2743 9725 5088 10310 11020 L2
64 4781 2870 2783 4127 2670 3449 4308 2542 8109 5828 9151 7730
128 4800 2800 2692 4107 2722 3482 4461 2570 8038 5366 9642 8300
256 4847 2885 2642 4043 2715 3460 4733 2670 8315 6045 9901 8309
512 4740 2954 2603 3948 2633 3220 4735 2583 7952 5975 9686 8168
1024 3623 2444 2424 3376 2317 2850 3804 2337 5169 4358 5534 5121
4096 2645 2429 2353 2583 2301 2577 2668 2261 2720 2688 2732 2670 RAM
16384 2576 2390 2335 2603 2295 2545 2661 2266 2703 2680 2738 2714
65536 2594 2213 2203 2658 2318 1902 2723 2240 2755 2739 2769 2763
MFLOPS 823 872 741 879
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
Dual Channel RAM 1600 MHz, 25.6 GB/sec
16 15334 15434 14677 15163 16860 18519 13753 8271 8801 16927 8837 9117 L1
32 10261 13203 12917 13552 14029 14520 12295 7621 7988 13158 7772 8011
64 11339 11337 11961 12579 12436 13015 12746 8124 8674 12822 8496 8728 L2
128 11805 11414 11800 11654 11821 12351 12514 7896 8691 12662 8302 8746
256 10360 12029 11709 11976 9575 9569 12289 8050 8598 12206 8176 8650
512 12144 10733 11702 9326 11734 10165 12039 8057 8605 12138 8242 8350
1024 8995 8690 8649 8842 8670 8986 9706 6146 6400 8420 6197 6529
4096 3498 3510 3504 3519 3523 3530 3339 3213 3236 3358 3168 3175 RAM
16384 2972 2974 2974 2968 2985 2987 2863 2826 2822 2869 2764 2771
65536 2958 2908 2937 2932 2952 2928 2867 2825 2830 2881 2774 2771
MFLOPS 1917 3859 1719 2068
16 18080 27221 20496 32618 20498 36302 2050 4018 4220 4439 3472 3701 L1
32 23550 22959 19498 23419 19252 19533 4024 4026 4182 4280 3492 3709
64 19365 19132 19269 19291 19506 19233 4031 4035 4112 4006 3490 3482 L2
128 19350 19216 19065 19415 19104 19122 4034 4035 4112 3996 3441 3419
256 20535 20376 20278 20501 20184 20440 4029 4029 4112 3991 3411 3407
512 20626 20512 20392 20624 20313 20388 4026 4027 4113 3987 3405 3405
1024 20867 20714 20828 20845 20956 20621 4025 4014 4106 3983 3387 3403
4096 9587 8897 9341 9374 9427 9661 4042 4029 4225 4208 3475 3641 RAM
16384 9415 9448 9403 9398 9368 9291 4051 4044 4239 4203 3478 3648
65536 9319 9331 9309 9232 9310 9258 4057 4056 4249 4256 3483 3650
MFLOPS 2944 6805 507 1014
P44, Qualcomm 835, Cortex A73, MHz 4x2350 + 4X1900, Android 8.0
Dual channel LPDDR4 1866 MHz 29.8GB/s
16 64599 46528 48775 74152 58161 53413 35756 12734 58942 51023 77189 59013 L1
32 64632 47018 50799 79043 61563 57263 36041 12763 58972 51506 77980 59073
64 49628 42403 42997 51183 46691 46244 33758 12768 45965 42061 47033 46022 L2
128 48556 42075 42700 49985 46092 45831 33862 12777 46231 42263 47073 45834
256 44225 37976 38268 46194 41269 41254 29299 12767 41194 35646 43612 42116
512 31103 29151 29336 31472 30662 30681 24114 12753 29662 27226 30229 29998 L3
1024 30323 28673 28768 30925 29883 29889 23944 12757 28886 26658 29697 29404
4096 30004 28313 28366 30483 29488 29511 23464 12639 28599 26288 29366 29075
16384 15360 15647 15577 15118 15409 15430 13318 10367 13345 13707 13426 13383 RAM
65536 14766 15072 15004 14623 14807 14825 13051 10147 12924 13232 12923 12895
MFLOPS 8079 11755 4505 3194
The NeonSpeed benchmark carries out the same calculations as the MemSpeed Benchmark, except they are all in single precision,
for comparison with NEON sections. The latter are carried out using NEON intrinsic functions, where, for Intel CPUs, the
compiler translated the functions into SSE instructions. This did not work in the revised 4A8 version, leading to only
ARM code being produced, where it still runs on Intel CPUs, apparently using the original Houdini conversion layer. For
further details and results see earlier Android report.
Some of the results are rather strange, where expectations might be similar speeds as MemSpeed, with normal
calculations, NEON faster than normal and 64 bits faster than 32 bits. Perhaps there are memory alignment issues with
this benchmark and MemSpeed. Note that slow maximum MFLOPS are often associated with nearly constant slow
performance at all memory sizes used. Maybe caches are being flushed, with most data being read from RAM.
MFLOPS/MHz Comparisons
NeonSpeed MemSpeed
Original 4A8 Original 4A8
Norm Neon Norm Neon SP SP
ARM/Intel 32 Bit Version
T7 v7-A9 0.19 0.51 0.53 0.56 0.19 0.35
P37 v8-A53 0.20 0.78 0.53 0.69 0.20 0.40
T23 V8-A72 1.79 1.93 0.82 0.82
P42 v8-A57 0.87 1.76 0.87
A5 Atom Z830 0.48 0.82 0.89 0.81 0.47 0.45
ARM/Intel 64 Bit Version
P42 v8-A57 1.69 1.77 1.02 1.93 2.41 1.29
P43 Ex8890 2.29 2.41 1.35 3.73 2.96 0.44
P44 v8-A73 1.26 2.03 1.29
R2 Core i7 1.00 2.74 1.32 3.71 3.01 0.82
T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, DDR3 5.3 GB/s
16 881 2440 2501 3334 3206 3465 2514 2702 3418 3169 2985 2880
32 901 1868 1705 2260 2083 2186 2530 2681 3474 3111 2964 3212
64 801 1395 1365 1573 1548 1581 1438 1379 1543 1461 1600 1630
128 784 1282 1278 1405 1389 1411 1366 1196 1504 1327 1566 1604
256 787 1279 1285 1420 1380 1409 1415 1253 1432 1312 1560 1632
512 777 1266 1267 1409 1370 1394 1321 1159 1468 1287 1551 1531
1024 604 786 762 769 770 828 782 760 821 805 807 892
4096 458 479 477 463 486 488 451 455 430 432 455 493
16384 436 447 448 469 470 469 451 456 451 441 453 508
65536 450 472 469 240 482 483 438 327 449 218 448 501
MFLOPS 225 610 633 676
16 1174 4673 2189 4908 4779 5312 3192 4154 3979 4546 4445 5021
32 1152 4271 2100 4478 4370 4784 2847 3638 3446 3932 3790 4162
64 1129 3961 2030 4122 4036 4335 2817 3573 3416 3839 3721 4058
128 1131 4001 2002 4195 4108 4386 2884 3653 3578 3905 3806 4021
256 1129 4038 2049 4230 4138 4393 2868 3679 3618 3990 3865 4230
512 1068 2970 1815 3043 3012 3158 2601 3137 3019 3320 3293 3480
1024 933 1724 1422 1733 1723 1756 1637 1723 1724 1749 1730 1760
4096 914 1533 1279 1297 1385 1438 1448 1506 1495 1476 1510 1519
16384 887 1502 1287 1511 1515 1516 1452 1502 1418 1446 1479 1482
65536 902 1322 1221 1113 1530 1479 1417 1474 1482 1124 1481 1495
MFLOPS 294 1168 798 1039
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
Dual Channel RAM 1600 MHz, 25.6 GB/sec
16 3523 6058 4676 6052 6245 6163 6526 5962 6830 6346 7156 7165
32 2926 5163 3585 5275 5711 5322 4617 4048 5166 4707 5353 5166
64 2988 4406 2894 4892 4786 4713 4502 4027 4670 4157 5299 5168
128 2691 4085 2876 4648 4657 4662 4511 3932 4685 4222 5468 5242
256 2856 4274 2912 4865 4752 4714 4659 4141 4707 4056 5367 5176
512 2848 3929 2635 4550 4557 4509 4579 3590 4462 4070 5310 5078
1024 2613 3536 2634 3629 3597 3633 3532 3300 2601 2644 3814 3757
4096 2259 1928 2229 2407 2512 2503 2543 2523 1763 1903 2469 2637
16384 1654 1849 2373 2543 2592 2595 2549 2064 2516 2051 2557 1979
65536 2375 2632 2429 1902 2711 2731 2219 2530 2550 1499 1901 2639
MFLOPS 881 1515 1632 1491
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
Dual Channel RAM 1600 MHz, 25.6 GB/sec
16 13518 14199 12972 13623 14554 14776 4095 15470 8869 18296 20535 22364
32 11676 11531 11652 11339 11308 12194 8197 15159 8763 17449 19000 20012
64 12050 11708 12105 12263 13013 12928 8142 14200 8692 14906 15645 16157
128 10989 11306 11338 10762 11664 11724 8018 14568 8702 14138 15337 15349
256 11239 11153 10987 11141 11579 11695 8075 13967 8618 13932 14960 15169
512 10534 10140 8796 10847 9789 11507 7715 13740 8252 13954 14651 14860
1024 9034 8906 11408 11178 9832 11898 7990 13460 8605 13470 14847 14532
4096 3670 3741 3768 3758 3785 3790 3193 3375 3274 3348 3451 3468
16384 3155 3187 3188 3145 3202 3191 2847 2888 2844 2892 2969 2976
65536 2888 2971 2953 2506 2979 2980 2890 2942 2894 2441 3021 3029
MFLOPS 3380 3550 2048 3868
16 21103 22131 20494 23346 23247 24956 12298 34094 14419 36314 36453 40732
32 17495 18631 19910 20141 21206 23021 12381 34284 14422 36620 36611 41090
64 12659 13007 12652 13222 13167 12930 12017 18659 12514 18859 18952 18959
128 12494 13420 13658 13521 13434 13489 12280 20062 13452 20186 20290 20231
256 13783 13821 13771 13754 13896 13886 12397 20563 13632 20580 20592 20551
512 13708 13984 13950 14002 13973 13938 12441 20763 13716 20754 20741 20745
1024 13890 14021 13982 14040 14051 14041 12453 20821 13748 20789 20777 20822
4096 2061 6638 2902 6375 6771 4990 9413 9638 9490 9588 9602 9615
16384 2402 5549 3000 4899 4928 5254 8819 9317 8731 9268 8803 8959
65536 2924 5659 3392 5779 6314 6676 7989 8917 8773 9003 8950 9349
MFLOPS 5276 5533 3095 8572
P44, Qualcomm 835, Cortex A73, MHz 4x2350 + 4X1900, Android 8.0
Dual channel LPDDR4 1866 MHz 29.8GB/s
16 14705 42623 16797 46797 52332 52696 20310 57722 25805 75312 75974 65421
32 14750 42792 17798 49624 54033 54690 20274 57929 26466 76702 80293 69875
64 15576 37757 17823 41115 41340 43711 20497 48675 25965 49709 49954 49877
128 15603 37884 17864 41382 43787 44439 20547 47772 25887 48555 49167 49125
256 15595 34071 17867 36314 38923 39580 20062 40375 24022 41242 44438 44263
512 15466 27798 17871 28896 30298 30414 18587 30667 20853 30845 31208 31146
1024 15353 27510 17998 28548 29643 29725 18719 25618 14507 25767 29924 30827
4096 14411 27435 18067 28355 29534 29617 18516 29858 20632 30055 30046 29561
16384 13472 15856 14241 15865 15732 15711 13710 14873 14565 14816 15075 14810
65536 13154 15139 13886 14634 14979 14942 13645 14632 14210 14511 14372 14388
MFLOPS 3901 10699 5137 14482
The BusSpeed benchmark (based on the PC version, with details and results in busspd2k results.htm) is designed to identify reading
data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128
bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On
reading data from RAM, 64 Byte bursts are typically used. Then, measured reading speed reduces from a maximum,
when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum speed can be
estimated by multiplying this minimum value by 16. For more details and further results see old archived report and the
earlier Android report. It seems that there can also be burst reading from caches and multiple threads or programs are
needed to demonstrate maximum throughput. See MP-BusSpeed Benchmark.
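The access pattern and the "multiply by 16" estimate can be sketched as below. The names are illustrative, and the 64 byte burst size is the typical value quoted above rather than a measured one. The sample figure of 56 MB/second is the Inc16 RAM result for the Cortex-A9 in the table that follows.

```python
WORD_BYTES = 4    # size of each word read
BURST_BYTES = 64  # typical RAM burst size quoted in the text


def increments(start=32):
    """Return the word increments used by successive tests: 32 down to 1."""
    result = []
    inc = start
    while inc >= 1:
        result.append(inc)
        inc //= 2
    return result


def fraction_of_burst_used(inc_words):
    """Return the share of each fetched burst that the program actually reads."""
    stride_bytes = inc_words * WORD_BYTES
    if stride_bytes >= BURST_BYTES:
        # Only one word is used from every burst that is fetched.
        return WORD_BYTES / BURST_BYTES
    return WORD_BYTES / stride_bytes


def estimate_max_speed(min_mb_per_sec):
    """Estimate burst transfer speed from the 16 word increment result."""
    return min_mb_per_sec * (BURST_BYTES // WORD_BYTES)


if __name__ == "__main__":
    for inc in increments():
        print(f"Inc {inc:2d} words: {fraction_of_burst_used(inc):.4f} of burst used")
    print(f"Estimated maximum: {estimate_max_speed(56)} MB/second")
```

The estimate is an upper bound on what the bus delivers. As noted above, a single thread reading all data may not reach it.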
Results below indicate that the new 4A8 speeds and the earlier ones are effectively the same and 64 bit versions are
faster than 32 bit compilations. Maximum MB/second per MHz ratio calculations are also shown, demonstrating faster
speeds using later technology.
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All Words Words Words Words Words All
T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, DDR3 5.3 GB/s
16 2723 2420 3044 3364 3499 3500 2651 3171 3011 3063 3709 3064
32 1054 1087 1061 1382 1565 2145 886 955 906 1407 1542 1927
64 436 433 419 652 751 1160 394 402 398 648 669 1252
128 345 337 337 542 633 943 333 295 321 539 647 1109
256 329 309 322 522 614 961 327 338 322 534 640 1095
512 339 299 311 506 574 937 327 336 317 503 614 1069
1024 170 168 180 269 349 629 150 102 156 250 326 677
4096 59 55 84 127 176 338 57 56 88 129 177 383
16384 56 56 83 125 173 335 55 55 87 127 176 399
65536 56 56 82 125 174 334 55 55 87 127 175 400
Max/MHz 2.92 2.55
P37, ARM Cortex-A53 1500 MHz, Android 7.0 Single RAM LPDDR3 933 MHz, 7.5 GB/second
16 2080 2309 2730 2889 2905 2936 3235 3885 4370 4802 4860 3502
32 1081 1134 1734 2349 2806 2888 1033 1083 1802 3028 3898 3334
64 782 806 1437 2075 2689 2823 758 774 1430 2470 3543 3315
128 729 749 1338 2026 2687 2843 703 713 1294 2307 3436 3325
256 694 711 1294 2001 2686 2847 669 678 1249 2217 3372 3282
512 394 429 802 1272 2226 2638 432 459 693 1270 2369 2406
1024 199 196 317 595 1473 2497 187 179 391 783 1544 2753
4096 182 184 371 600 1242 2534 182 184 357 722 1396 2559
16384 184 186 372 620 1272 2509 172 178 253 520 1151 2178
65536 179 188 371 501 1223 2435 153 160 277 665 1326 2500
Max/MHz 1.96 2.33
P42, ARM Cortex A57, 2000 MHz, Android 5.1.1, Dual Channel RAM 1600 MHz, 25.6 GB/sec
Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Android 5.1, 4 GB DDR 3 1600
16 5857 5444 6672 6835 6777 6923 4700 5431 6659 6909 6985 7039
32 1432 1525 2134 2492 3511 4828 1465 1571 2356 2683 3660 5116
64 1400 1523 2392 2688 3485 4690 1450 1566 2434 2790 3667 5125
128 1444 1546 2253 2419 3153 4427 1469 1596 2412 2774 3652 5112
256 1410 1588 2387 2750 3593 4949 1473 1598 2426 2763 3542 4998
512 1464 1567 2367 2692 3530 4643 1331 1534 2284 2618 3347 4014
1024 236 284 601 1276 2118 3462 334 356 705 1394 2257 3787
4096 176 202 322 561 1505 3000 182 207 421 813 1587 3109
16384 173 202 417 796 1585 3061 181 206 418 812 1583 3103
65536 172 199 429 655 1095 2189 181 207 423 817 1630 3155
Max/MHz 3.76 3.83
P42, ARM Cortex A57, 2000 MHz, Android 5.1.1, Dual Channel RAM 1600 MHz, 25.6 GB/sec
16 2904 6468 7186 7375 7476 7524 6406 6677 7341 7515 7605 7775
32 5065 4374 5298 7136 7325 7397 1280 1907 2810 5404 6619 7600
64 853 1184 2259 4109 6015 6598 630 1022 2053 3883 5893 7594
128 774 1079 2069 3869 5803 6465 645 1046 2059 3891 5639 7304
256 773 1078 2082 3869 5817 6468 638 1047 2072 3873 5552 7365
512 778 1084 2085 3876 5822 6474 629 1054 2073 3744 5476 7587
1024 771 1059 2029 3765 5711 6225 595 1000 2075 3867 5485 7129
4096 251 318 637 1203 1891 3115 258 330 654 1227 1959 3770
16384 229 293 591 1155 2040 3659 235 311 621 1182 2116 4008
65536 219 295 594 1159 2079 3648 234 311 621 1184 2128 4003
Max/MHz 4.70 4.86
P43, Exynos 8890 2.3 GHz, Android 7.0, Quad Channel RAM 29.8 GB/s
16 2724 4271 5843 10001 10032 9994 3011 4418 6023 10176 10181 10345
32 4829 4500 5152 8049 9143 9514 4885 4392 5002 7947 9258 10284
64 1871 1885 3632 4950 8796 9369 1878 1886 3683 4943 8924 10332
128 1918 1937 3593 5010 8824 9402 1927 1934 3641 5009 8942 10347
256 1920 1938 3630 5050 8881 9437 1931 1934 3660 5033 9003 10349
512 1940 1952 3633 5056 8905 9452 1943 1948 3671 5050 9032 10368
1024 1947 1951 3641 5064 8919 9450 1948 1944 3663 5056 9030 10367
4096 963 454 1292 2613 5091 8894 909 384 1385 2647 4954 9446
16384 546 440 1331 2557 5027 8883 346 362 1324 2596 5077 9663
65536 533 445 1333 2598 5032 8945 355 356 1331 2616 5053 9667
Max/MHz 4.35 4.50
P44, Cortex A73, MHz 4x2350 + 4x1900, Android 8.0, Dual channel LPDDR4 1866 MHz 29.8 GB/s
R2 Core i7 4820K at 3900 MHz 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,
16 12958 13094 20980 20905 21003 21077 13043 13097 19228 20341 20174 20014
32 12716 11344 14122 20945 21006 20757 13121 11925 14133 20203 19860 20015
64 6820 7270 11515 16554 19455 21628 6853 6946 10494 15755 19000 19882
128 7313 7511 11450 16387 19483 21674 6965 7150 10397 15655 19042 19901
256 4824 4957 8757 13097 17966 21651 4079 4194 7446 12202 17497 19877
512 2738 2783 5437 9828 16051 21535 2591 2639 5153 9183 15731 19765
1024 2725 2778 5446 9668 16019 21543 2574 2633 5162 9172 15714 19764
4096 2717 2772 5399 9558 15730 21300 2562 2634 5117 9078 15407 19591
16384 756 1089 2329 4508 8828 15692 633 924 2002 3875 7704 13431
65536 723 1044 2219 4318 8809 15722 611 892 1912 3714 7685 13374
Max/MHz 5.40 5.13
The RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds, in MBytes per
second, from caches and memory. Serial and random address selections are employed, using the same program
structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower
performance can be when using random access. Here, speed can be considerably influenced by reading and writing in
bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results
see archived details and earlier Android report. PC version details are in randmem results.htm 2013.
Below are results for the original and 4A8 versions. Maximum BusSpeed RAM speeds are also shown, for comparison with
the Serial Read tests, indicating broad similarity. The old and new results are also mainly similar, as are all speeds
using L1 cache. Then there are the slower random speeds using L2 cache and much slower ones via RAM.
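The idea of using the same program structure for serial and random address selection can be illustrated with a minimal Python sketch. This is not the benchmark's code: the function names, the 64K word data size and the use of a shuffled index list are assumptions made for illustration, and Python integers are not the 32 bit integers used by RandMem.

```python
import random
import time

def make_index(words, serial, seed=1):
    """Return the order in which the data words are visited."""
    order = list(range(words))
    if not serial:
        # Random address selection: same addresses, shuffled order (assumed method).
        random.Random(seed).shuffle(order)
    return order

def read_test(data, order):
    """Read every word once in the given order; return (sum, seconds taken)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start

words = 1 << 16                      # 64K words, an arbitrary illustrative size
data = list(range(words))
serial_sum, serial_secs = read_test(data, make_index(words, serial=True))
random_sum, random_secs = read_test(data, make_index(words, serial=False))
```

Both tests touch exactly the same words, so any difference in time comes only from the order of access. In an interpreted language the difference is likely to be small; compiled code accessing caches and RAM directly shows the effect far more clearly.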
T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, DDR3 5.3 GB/s
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
Dual Channel RAM 1600 MHz, 25.6 GB/sec
Intel Atom Z8300 quad core 1.84 GHz Android 5.1, 4 GB DDR 3 1600
P42, ARM Cortex A57 2.0 GHz, Android 5.1.1 Dual Channel RAM, 25.6 GB/sec
BusSpd RAM
16 3569 9360 7200 6521 7390 9876 7418 6795 L1
32 6530 8417 5205 5689 7247 8828 6027 6624
64 5997 7708 3138 3140 7171 7896 2868 2911 L2
128 5998 7660 2247 2239 7070 7651 2227 2209
256 5954 7673 1866 1925 7074 7643 1832 1898
512 5957 7680 1792 1783 7074 7645 1751 1768
1024 5950 7681 1744 1641 7037 7278 1656 1649
4096 3768 1822 362 372 3422 1652 382 374 RAM
16384 3882 1576 233 224 3752 1509 227 219
65536 3937 1583 135 133 3796 1561 136 134
BusSpd RAM 3659 4008
P43, Exynos 8890 2.3 GHz, Android 7.0, Quad Channel RAM 29.8 GB/s
P44, Cortex A73, 2350 MHz, Android 8.0, Dual channel, 29.8 GB/s
R2 Core i7 4820K 3900 MHz, RAM 4 channels, 51.2 GB/s, Android 6.0.1,
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to
1024K), each one being run three times to identify variance. Results provided are running times in milliseconds. Besides
Android, the benchmarks are available to run via Windows and Linux. Two versions are available: FFT1, the original
version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in
FFTBenchmarks.htm 2012, with more in a later report. The latter includes a count of floating point operations executed
for each FFT size, enabling MFLOPS to be calculated. Particularly with the original Version 1.0 benchmark, data
addressing can be mainly on a skipped sequential basis, with speed degraded by burst reading and writing, as in the
RandMem Benchmark tests.
Results below tend to be the second set of measurements. Note that running times are generally more than double at
twice the FFT size, and sometimes greater still when a higher level cache has to be used. It could be expected that
single precision results would be at least as fast as double precision, but this is not always the case - see system
P43 Version 3C up to 128K size. The reason is unclear.
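The "more than double at twice the size" behaviour follows from the N log2 N operation count of an FFT. The Python sketch below compares that theoretical ratio with ratios between successive measured times. The sample times are copied from one column of the Version 3c.0 results in this section; which system and precision that column represents is not assumed here, and the function names are illustrative only.

```python
import math

def nlogn_ratio(n):
    """Expected time ratio when an N log2 N algorithm goes from n to 2n points."""
    return (2 * n * math.log2(2 * n)) / (n * math.log2(n))

def measured_ratios(times_ms):
    """Ratio of each running time to the one for half the FFT size."""
    return [later / earlier for earlier, later in zip(times_ms, times_ms[1:])]

# Running times in milliseconds for sizes 1K, 2K, 4K ... 1024K (one results column).
sample_ms = [0.020, 0.032, 0.072, 0.16, 0.39, 0.85, 1.82, 3.94, 8.47, 19.52, 47.35]

for k, ratio in enumerate(measured_ratios(sample_ms)):
    size = 1024 << k                 # size in points before doubling
    print(f"{size // 1024:5d}K -> {size // 512:5d}K  "
          f"measured {ratio:4.2f}  N log N {nlogn_ratio(size):4.2f}")
```

Measured ratios well above the N log N figure are a hint that the data has moved from one cache level to the next, or to RAM.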
Version 3c.0
1 0.270 0.248 0.184 0.230
2 0.65 0.65 0.42 0.65
4 1.67 1.45 1.12 1.50
8 4.30 3.23 2.41 3.34
16 8.33 10.35 5.73 8.04
32 19.23 25.38 12.63 20.28
64 46.41 58.90 30.28 47.96
128 103.31 128.44 70.34 111.33
256 221.99 267.12 162.60 241.52
512 464.30 558.13 374.86 517.52
1024 933.05 1182.49 744.08 1112.40
Version 3c.0
1 0.281 0.174 0.293 0.159 0.149 0.135
2 0.65 0.39 0.64 0.38 0.32 0.31
4 1.42 0.85 1.44 0.86 0.73 0.74
8 3.35 1.95 3.25 1.95 1.72 1.69
16 8.20 8.13 6.95 7.86 4.35 4.70
32 15.99 18.95 15.93 19.43 9.00 12.61
64 37.84 43.62 37.29 42.46 24.01 30.28
128 84.06 96.71 83.55 95.01 56.23 69.75
256 190.32 217.23 186.20 213.21 126.46 161.91
512 425.97 474.15 416.25 462.13 292.45 354.48
1024 928.38 1026.33 897.72 1001.54 638.54 791.93
Version 3c.0
1 0.041 0.041 0.075 0.045 0.043 0.042
2 0.087 0.096 0.149 0.111 0.093 0.100
4 0.20 0.23 0.35 0.26 0.22 0.23
8 0.49 0.58 0.77 0.55 0.51 0.52
16 1.12 1.23 1.65 1.88 1.21 1.19
32 2.45 2.88 4.00 4.30 2.62 2.76
64 7.31 8.49 8.89 13.05 6.21 8.16
128 16.05 23.84 22.07 30.64 15.44 22.83
256 40.14 60.04 54.22 71.70 39.50 58.00
512 99.35 142.94 159.88 173.48 96.11 136.55
1024 245.86 341.08 349.30 370.95 243.94 320.00
P43 Exy 8890 2.3GHz 4A8 Version P23 ARM A72 1.8GHz P44 ARM A73 2.35GHz
Android 7.0 Android 5.1.1 Android 8.0
64 Bit V 1 64 Bit 32 Bit 4A8 Version 64 Bit 4A8 Version
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.096 0.096 0.048 0.048 0.041 0.042 0.122 0.109
2 0.206 0.252 0.103 0.222 0.088 0.124 0.261 0.235
4 0.54 0.82 0.51 0.70 0.23 0.33 0.56 0.53
8 1.42 1.64 1.42 1.65 0.66 0.74 0.79 0.96
16 2.14 2.52 1.96 2.24 1.46 1.69 2.11 1.68
32 4.84 5.80 4.54 5.21 3.23 4.08 2.90 2.40
64 9.54 10.34 8.88 9.28 8.02 23.32 5.31 7.85
128 17.36 21.93 14.19 20.53 46.72 76.29 13.43 33.55
256 36.55 68.07 35.38 60.17 152.47 185.97 66.42 107.46
512 110.43 192.26 100.55 157.88 341.39 441.27 198.51 258.60
1024 318.35 465.55 280.70 384.91 779.47 1053.30 456.33 566.70
Version 3c.0
1 0.197 0.040 0.196 0.033 0.041 0.038 0.051 0.041
2 0.433 0.093 0.426 0.079 0.089 0.089 0.109 0.088
4 1.01 0.21 1.03 0.18 0.21 0.26 0.25 0.20
8 2.49 0.46 2.53 0.41 0.47 0.63 0.55 0.47
16 3.01 0.95 3.48 0.93 1.10 1.65 1.16 1.08
32 3.59 2.07 4.22 1.99 2.43 4.17 2.41 2.30
64 7.76 5.12 8.95 5.02 6.61 8.99 4.58 5.52
128 16.08 12.27 17.62 11.93 19.78 22.21 10.57 14.92
256 23.57 32.33 29.00 31.25 41.03 50.12 26.16 36.05
512 57.18 84.39 59.94 80.62 90.74 110.98 63.11 82.84
1024 150.93 210.62 150.64 190.89 222.78 251.03 140.31 199.10
Version 3c.0
1 0.147 0.131 0.115 0.069
2 0.198 0.212 0.305 0.167
4 0.45 0.52 1.69 0.36
8 0.97 1.05 1.36 0.81
16 2.14 2.61 3.34 2.29
32 4.82 6.53 5.83 5.32
64 11.10 17.79 13.69 17.75
128 29.95 43.74 35.52 34.74
256 77.43 86.13 89.41 74.67
512 152.95 185.74 166.10 165.81
1024 314.54 460.91 350.61 370.28
Version 3c.0
1 0.020 0.015 0.014 0.018 0.020 0.019
2 0.032 0.032 0.029 0.040 0.046 0.041
4 0.072 0.076 0.064 0.090 0.093 0.093
8 0.16 0.17 0.22 0.20 0.21 0.21
16 0.39 0.45 0.47 0.40 0.49 0.49
32 0.85 0.96 1.11 0.87 1.14 1.05
64 1.82 2.05 2.18 1.89 2.39 2.23
128 3.94 4.36 4.45 4.05 4.86 4.75
256 8.47 9.78 8.66 9.28 10.36 10.96
512 19.52 24.29 17.74 23.36 23.90 27.71
1024 47.35 57.59 43.23 56.68 58.40 67.42
For more information on the Whetstone Benchmark see the stand alone version, above. The multithreading version runs
multiple copies of the same shared code, with separate variables. In this case, performance of each of the eight test
functions, and the overall MWIPS rating, is nearly always proportional to the number of CPU cores available. The driving
program checks that calculations on every thread produce consistent numeric results. Further details and download
options for earlier MP-Whets versions can be found in original multithreading benchmarks 2013 Archive and Last Version
of Android Report.
The overall times for all threads to finish are included with detailed performance ratings. Timing is calibrated, using
a single thread, to determine the repeat parameters used for all tests. The resulting overall time can vary between
around 3 and 5 seconds, depending on the start up state. Although the actual times cannot be compared across different
systems, they can be used to indicate MP efficiency. T7 and A5 CPUs are quad core, with 8 threads taking nearly twice as
long as four. Then R2 is quad core with hyperthreading, where 8 threads provide significant gains. The others have 8
cores, with half running at a lower frequency, leading to lower gains with 8 threads. Lower gains can also be produced
by an automatic reduction in MHz, as more cores are used, or because of heating effects. See also Comparison Table.
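MP efficiency can be expressed as the ratio of each multi-thread rating to the single thread rating. The Python sketch below does this for the first column of the first 4A8 table that follows, which is taken here to be the overall MWIPS rating; the function name is illustrative only.

```python
def mp_gains(ratings):
    """Return each rating divided by the single thread rating, to 2 decimals."""
    base = ratings[1]
    return {threads: round(rating / base, 2) for threads, rating in ratings.items()}

# Threads -> first column of the first 4A8 table (assumed to be MWIPS).
mwips = {1: 840.7, 2: 1695.5, 4: 3458.8, 8: 3502.3}
gains = mp_gains(mwips)
print(gains)
```

A gain that stays near 4 when going from 4 to 8 threads is what would be expected from a quad core CPU without hyperthreading.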
4A8 Version
1T 840.7 249.0 236.4 131.5 25.7 14.8 560.6 1003.2 255.6
2T 1695.5 502.0 509.3 270.5 52.4 28.0 1183.7 2007.7 505.8
4T 3458.8 1022.4 1003.9 554.1 105.7 58.7 2322.9 3988.3 1011.8
8T 3502.3 1036.1 1025.9 560.9 106.7 59.7 2347.1 4076.5 1024.9
T23 32 Bit 2x1.8 GHz A72 + 2x1.4 GHz A53, Android 5.1.1
4A8 Version
1T 1904.7 519.1 513.1 315.1 67.0 25.0 1501.3 1803.6 770.3
2T 3756.8 925.9 989.5 631.4 134.6 47.9 2985.9 3499.3 1526.8
4T 5447.4 1352.9 1415.3 1018.9 200.6 65.5 4251.7 4667.4 2500.1
8T 6006.3 1425.0 1751.6 969.9 193.9 89.5 4475.5 5277.4 2520.1
P37 32 Bit 4x1.5 GHz A53 +4x1.2 GHz A53, Android 7.0
4A8 Version
1T 1138.0 370.7 375.7 185.7 31.5 20.6 582.0 897.1 494.8
2T 2291.3 630.1 590.6 373.3 64.5 41.3 1389.6 1870.3 979.2
4T 4585.5 1237.2 1206.1 740.7 129.2 83.2 2805.5 3734.6 1955.2
8T 8157.2 2261.8 1843.7 1340.3 234.8 150.5 4622.6 7014.8 3548.8
P42 32 Bit 4x2.0 GHz A57 + 4x1.5 GHz A53, Android 5.1.1
64 Bit
1T 1720.4 286.9 258.0 282.9 65.4 20.5 ###### 1587.3 1140.1
2T 3466.4 568.2 532.3 565.4 132.6 41.2 ###### 3258.7 2307.0
4T 7122.5 1257.8 1190.7 1270.1 228.2 88.0 ###### 6843.2 5016.2
8T 10558.7 2105.2 2025.0 2026.8 297.9 131.6 ###### 9168.3 7902.2
P42 64 Bit 4x2.0 GHz A57 + 4x1.5 GHz A53, Android 5.1.1
P44 64 Bit Cortex A73, 4x2.35 GHz + 4x1.9 GHz, Android 8.0
For further details see Dhrystone Benchmark above and the following, which include further results and a download
option for the earlier version: android multithreading benchmarks.htm 2013 and Earlier ARM/Intel Report. This
multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial
calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second.
Then all threads are run using the same pass count, running time being extended when there are more threads than
CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread but some
variables can be used by all threads. The latter is probably responsible for the failure to increase throughput much
when using multiple threads.
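The calibrate-then-run procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the benchmark's code: `work` is a hypothetical stand-in for the Dhrystone passes, the doubling strategy is assumed, and a short 0.05 second target is used instead of 1 second. Python's global interpreter lock also serialises the threads, so only the structure is shown, not real multi-core scaling.

```python
import threading
import time

def work(passes):
    """Hypothetical stand-in workload; the real benchmark runs Dhrystone passes."""
    total = 0
    for i in range(passes):
        total += i & 7
    return total

def calibrate(target_secs):
    """Double the pass count until one single-thread run lasts at least target_secs."""
    passes = 1000
    while True:
        start = time.perf_counter()
        work(passes)
        if time.perf_counter() - start >= target_secs:
            return passes
        passes *= 2

def run_threads(threads, passes):
    """Run the same pass count on every thread; return the overall elapsed time."""
    workers = [threading.Thread(target=work, args=(passes,)) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

passes = calibrate(target_secs=0.05)
elapsed = {t: run_threads(t, passes) for t in (1, 2, 4, 8)}
```

Because every thread uses the same pass count, total work grows with the thread count, and elapsed time grows once there are more threads than available CPUs.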
Threads
System CPU MHz Android 1 2 4 8 None
See
This is a multithreading version of the above. Further details and results can be found in android neon benchmarks.htm
2013 and in the 2017 Android Report.
This benchmark is not generally available with the new 4A8 compilation, as overall running time had increased to more
than 400 seconds on a new phone.
This is a multithreading version of the above. See Last Version of Android Report and here for further results. In the
original MP-BusSpdi benchmark, all threads read data from the beginning. With large shared caches, this could lead to
exaggerated data transfer speeds for RAM based data, using multiple threads. The revised MP-BusSpd2i attempts to
avoid this by arranging for threads to have staggered starting points, with each still reading all the data. It also
has a much longer running time, for consistent scores. Performance using a single thread is similar to the non-threaded
version and it is clear that multiple threads are needed to demonstrate maximum throughput. As usual, maximum RAM
speeds can be estimated from burst transfer results, such as 16 times Inc16 MB/second. Some results are provided
below. Expected running time for this 2i version is around 50 seconds.
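The staggered starting points can be pictured with a short Python sketch. The even spacing of start offsets and the wrap-around read are assumptions for illustration, as are the function names; the only figure taken from the text is the "16 times Inc16" burst estimate.

```python
def staggered_starts(words, threads):
    """Evenly spaced starting offsets, one per thread (assumed spacing)."""
    return [(t * words) // threads for t in range(threads)]

def read_all(data, start):
    """Read every word once, beginning at `start` and wrapping round to the beginning."""
    n = len(data)
    total = 0
    for k in range(n):
        total += data[(start + k) % n]
    return total

def estimate_max_speed(inc16_mb_per_sec):
    """Burst based estimate quoted in the text: 16 times the Inc16 speed."""
    return 16 * inc16_mb_per_sec

data = list(range(1000))
starts = staggered_starts(len(data), 4)
totals = [read_all(data, s) for s in starts]
```

Every thread still reads all of the data, so the totals match, but at any moment the threads are working on different parts of the array rather than all reading the same cached words.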
Considering just the important Read All results, there are some inexplicable differences between no threads and
threaded, old and new, and 64 bit and 32 bit results. Other than this, with 8 core systems, cached data mainly showed
reasonable performance gains using 2 to 8 threads. Multiple threads were also required for maximum RAM throughput, but
this was nowhere near the theoretical level. See the throughput Comparison Table at the end of the results below,
showing that cache based improvements are generally not quite as good as in the Whetstone Benchmark tests.
Apparent RAM speed improvements using the R2 system, with a Core i7 CPU running Remix/Android, were influenced by the
10 MB L3 cache, with the 49 MB of data being shared by all threads.
T7, ARM Cortex-A9 1.2 GHz, DDR3-1333, 5.3 GB/s Android 4.1.2,
4 x 32 KB L1 cache, 1 MB shared L2 cache
12.3 1T 2166 2774 3181 3307 3377 3263 2647 3281 3392 3642 3640 3287
2T 3924 5188 5207 5754 5759 5805 3985 5369 5635 6238 6266 5790
4T 7570 10011 10252 11165 11375 11777 7271 9901 10887 12418 12576 11488
8T 3510 4786 9011 8318 11351 11544 3404 4771 9734 8279 11375 11280
123 1T 383 409 359 558 663 983 345 352 333 547 651 1135
2T 525 541 520 741 1241 1814 497 522 495 722 1193 1923
4T 739 752 753 1219 1590 2776 704 726 725 1159 1539 2909
8T 735 741 753 1218 1607 2737 696 610 720 1158 1539 2900
49152 1T 56 51 81 126 172 330 54 54 85 125 170 404
2T 65 67 107 196 335 620 64 66 105 189 334 669
4T 70 68 108 215 426 835 71 69 107 211 423 844
8T 70 68 109 215 428 851 71 69 108 211 422 848
12.3 1T 2151 2396 2448 2516 2589 2632 2903 3715 3964 4258 4384 3335
2T 4042 4460 4824 4893 5336 5192 4908 6632 7279 7975 8065 6725
4T 6828 8657 9409 9755 10120 10339 7997 11780 13832 15518 16117 13141
8T 5401 6897 13508 11464 15960 16792 6406 9148 18628 17698 25240 20946
123 1T 674 692 1267 2019 2402 2584 675 666 1203 2094 3143 3239
2T 1031 1043 1999 3591 4737 5047 1018 1045 1984 3668 5781 6310
4T 1064 1164 2168 4185 7761 9879 1067 1110 2206 4366 8025 12191
8T 1734 1857 3429 6438 10447 15287 1800 1869 3622 6938 11690 17271
49152 1T 163 172 337 674 1236 2098 160 169 326 661 1300 2334
2T 297 282 566 1101 2175 3735 287 288 600 1175 2318 4224
4T 431 390 751 1470 3053 5716 430 360 739 1510 2956 5722
8T 406 369 786 1621 2897 6031 436 399 752 1716 4242 5817
12.3 1T 5322 6275 6475 6901 6959 6925 5150 5858 6174 6357 6430 6594
2T 4625 4163 6792 8964 10879 11027 7815 4272 6873 9035 10080 11443
4T 2221 3775 4091 8006 15158 19631 5105 11512 4076 8083 15779 18562
8T 1178 1840 3907 3884 8002 15691 2194 2360 4619 4036 8006 14740
123 1T 1438 1891 2342 2601 3477 4957 1325 1524 2154 2640 3445 4699
2T 2509 3489 4597 5115 6807 9275 2708 3028 4518 5321 6782 9398
4T 3591 4849 6905 8356 11204 14596 3821 4877 7098 8654 11165 15514
8T 3868 5327 7014 7860 10754 15998 3805 4686 7352 8170 9947 15291
49152 1T 179 205 391 802 1372 3023 175 185 429 789 1511 2787
2T 238 310 495 1204 2397 4559 215 302 605 1207 2352 4634
4T 240 336 653 1170 2008 4969 343 240 718 1196 2241 5784
8T 291 321 681 1316 2378 5329 312 335 641 1330 2589 5149
12.3 1T 3427 3104 3562 4203 3817 9313 4227 4385 3996 4604 4409 3679
2T 6433 6397 7282 8442 7780 17598 5439 6648 7744 9226 10911 6936
4T 6457 7015 8289 11472 12064 25356 6627 9079 11722 16045 15897 14411
8T 3052 3646 11766 11635 18518 35937 5970 7777 15033 17943 28098 23425
123 1T 527 718 1429 2151 3013 7563 537 642 1304 2086 3811 3327
2T 670 900 1818 3809 6057 13654 718 992 1894 3719 5963 7180
4T 707 977 1973 3996 7280 14666 667 994 1965 4015 7475 12584
8T 1315 1716 3222 6489 12504 24662 1292 1797 3423 6426 12678 20602
49152 1T 160 204 421 869 1372 2826 154 196 395 750 1437 2312
2T 259 293 576 1166 2281 4577 258 283 544 1130 2181 4130
4T 287 419 779 1593 3009 6519 321 425 801 1648 3023 6162
8T 233 419 750 1762 3377 6722 372 474 926 1774 3346 7758
12.3 1T 6897 7209 7237 7261 7363 37205 6109 9771 10070 7328 8449 7859
2T 8396 11837 13028 13850 14299 61243 14592 17326 18030 19515 20019 20347
4T 15301 15090 17095 19485 20765 85343 20825 26166 30411 33036 34255 32872
8T 4228 7820 13112 16952 23373 61980 15221 19637 36246 30585 42923 35993
123 1T 1760 1886 3392 5727 7105 19737 1562 1934 2648 4977 7451 10196
2T 2144 2203 4340 6587 12711 24501 2147 2191 4426 7450 11705 19772
4T 2308 2468 4675 8745 17676 34390 3045 3074 5427 10756 19892 31669
8T 3009 3173 6074 12099 22757 46088 3674 3658 6668 12791 24018 34620
49152 1T 598 613 1027 2253 4604 9114 373 324 1020 2452 3942 6888
2T 932 604 1214 2448 5380 9744 909 733 1323 2657 5963 11235
4T 513 513 1021 2215 4018 7752 1131 660 1181 2409 4740 8990
8T 661 671 1245 2493 5163 9666 1205 783 1500 2912 5630 10637
P44, Qualcomm 835, Cortex A73, GHz 4x 2.35 + 4x 1.9, Android 8.0
Dual channel LPDDR4 1866 MHz 29.8 GB/s
12.3 1T 11234 11268 11549 9728 11075 83709 13738 14268 13210 13927 14662 13099
2T 13975 18788 21241 20376 21981 126069 20786 23177 28330 28920 29140 29136
4T 11950 16021 25702 25888 22591 129598 22942 25700 44809 50297 38672 39427
8T 7847 11333 22999 26446 39027 137208 20309 23269 49422 48860 71339 76706
123 1T 7270 7472 9070 11037 11565 57013 6627 7110 10155 14135 14721 14773
2T 12151 13359 18497 21814 22939 110321 12189 13171 21547 27892 29038 29461
4T 23054 19821 35736 42796 23494 145387 23459 18394 31322 40386 58468 39959
8T 25125 32352 39249 44178 46373 261178 27662 34905 55828 70513 75218 78417
49152 1T 651 966 1872 3496 7749 18057 617 843 1824 3717 7446 11914
2T 930 1979 3815 6002 11796 33883 1354 1602 3570 6552 13104 21442
4T 2876 3639 7142 13308 26695 60051 2922 2316 6518 11828 24504 39309
8T 3802 4639 12125 22329 39597 106907 4248 5552 11600 20951 42633 69139
Whetstone MWIPS
CPU 2 1.98 1.99 1.95 1.89 1.99 2.19 2.00 2.01
4 3.98 3.95 2.51 3.46 3.25 3.72 3.75 3.89
8 4.03 6.50 2.84 3.49 5.86 4.88 5.49 7.04
These are multithreading varieties of RandMem above. The latest are ARM/Intel versions of the longer running
MP-RndMem2.apk, available from android long MP benchmarks.htm 2016, with further details and results in Last Version of
Android Report. Expected running time for this version is up to 50 seconds.
In most cases, the new 4A8 compilations produced similar speeds to the earlier ones. The most striking feature of these
MP results is the apparently constant performance at all thread counts, over the memory area covered, during read/write
tests. Although data access is started at staggered addresses, the whole data area is shared and it seems that this
leads to only one thread being used at a time, to ensure data integrity. Some examples demonstrated even worse
performance, with speed decreasing when using multiple threads. See the throughput Comparison Table at the end of the
results below, showing MP Read/Write performance ratios much less than 1.0 in certain cases. Read only multiple thread
throughput is shown as providing reasonable gains, in some cases, but not as good as seen in BusSpeed comparisons.
T7, ARM Cortex-A9 1.2 GHz, DDR3-1333, 5.3 GB/s Android 4.1.2,
P44, Qualcomm 835, Cortex A73, GHz 4x 2.35 + 4x 1.9, Android 8.0
Serial Rd/Wr
L1 2 0.97 0.93 1.03 0.99 0.69 1.01 0.95 0.89
4 0.96 0.87 0.97 0.98 0.56 1.01 0.89 0.92
8 0.93 0.76 0.88 0.90 0.55 0.80 0.71 0.94
Random Rd
L1 2 1.92 1.89 1.98 1.82 1.98 1.88 2.06 2.11
4 3.69 3.58 2.15 2.48 3.15 3.24 3.52 3.90
8 3.59 4.10 2.00 1.97 4.53 3.34 3.55 3.38
Random Rd/Wr
L1 2 0.98 0.91 0.96 0.97 0.62 1.20 1.04 0.98
4 0.97 0.79 0.92 0.95 0.55 1.01 0.87 0.74
8 0.95 0.79 0.81 0.91 0.59 0.91 0.68 0.96
The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2 and 32
operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, to exercise L1 cache, L2 cache
and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Further details,
results and links to download the original MP-MFLOPS benchmark can be found at android multithreading benchmarks.htm
2013, with the latest ARM only MP-MFLOPS2 compilations from android long MP benchmarks.htm 2016 and Last Version
of Android Report. The newer versions have longer running times that avoid the inconsistent speeds produced by the
original.
Each thread uses the same calculations but accesses different segments of the data. The program checks for
consistent numeric results, primarily to show that all calculations are carried out and can be run. An example of results
is shown below, including the result sumchecks. These are dependent on the actual type of processing hardware and
rounding effects. Examples of alternatives are also shown.
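The way threads share the work can be sketched in Python as below. This is an illustration only: the constants are placeholders, the segment boundaries are an assumed even split, and Python floats are double precision rather than the single precision words used by the benchmark. The kernel is the expression quoted in the text, which as written contains 8 arithmetic operations; the 2 and 32 operation variants are presumably shorter and longer expressions of the same pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder constants, not the benchmark's values.
A, B, C, D, E, F = 0.5, 0.25, 0.125, 0.5, 0.75, 0.25

def kernel(segment):
    """Apply the quoted expression to every word of one thread's segment."""
    return [(x + A) * B - (x + C) * D + (x + E) * F for x in segment]

def run(data, threads):
    """Split data into equal segments, process each on its own thread, rejoin."""
    n = len(data)
    bounds = [(t * n // threads, (t + 1) * n // threads) for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(kernel, [data[lo:hi] for lo, hi in bounds]))
    return [y for part in parts for y in part]

words = 3200                          # 12.8 KB of 4 byte words
data = [i / words for i in range(words)]
results = {t: run(data, t) for t in (1, 2, 4, 8)}
sumchecks = {t: sum(r) for t, r in results.items()}
```

Since each word is computed independently, the sumcheck should be identical for every thread count on the same hardware, which is the kind of consistency the benchmark verifies.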
Next, below, are detailed results for the earlier and 4A8 versions, many of which demonstrate similar performance.
Following these are MP performance gains and single thread maximum MFLOPS/MHz ratings. Most of the example gains
are not as good as in BusSpeed comparisons, possibly due to heating effects, or their avoidance by reducing MHz. On
the other hand, single core MFLOPS per MHz speeds were better than those for other benchmarks above, with the
latest ARM compatible technology achieving up to 5.27 MFLOPS/MHz, implying the concurrent use of four SIMD
multiplies with some linked add instructions, and approaching the best Intel processor results shown.
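MFLOPS divided by MHz is the average number of floating point operations completed per clock cycle, which is why a figure above 4 points to SIMD working. A minimal Python sketch of the arithmetic follows; the 4 lane width matches the "four SIMD multiplies" in the text, while the function names and the 2000 MFLOPS at 1000 MHz example are purely illustrative.

```python
def mflops_per_mhz(mflops, mhz):
    """Floating point operations per clock cycle."""
    return mflops / mhz

def simd_instructions_per_cycle(ops_per_cycle, lanes=4):
    """Number of `lanes`-wide SIMD arithmetic instructions per cycle implied by a rate."""
    return ops_per_cycle / lanes

# Illustrative only: 2000 MFLOPS at 1000 MHz is 2 operations per cycle.
example = mflops_per_mhz(2000.0, 1000.0)

# The 5.27 MFLOPS/MHz quoted in the text, expressed as 4-wide instructions per cycle.
implied = simd_instructions_per_cycle(5.27)
```

A value of about 1.3 four-wide instructions per cycle is consistent with one SIMD multiply per cycle plus a share of linked adds.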
MFLOPS
1T 188 156 116 598 578 574 180 152 113 625 602 593
2T 365 319 197 1195 1161 1145 366 316 193 1254 1217 1194
4T 682 709 237 2372 2345 2249 670 610 231 2481 2448 2311
8T 678 731 237 2361 2381 2254 724 722 230 2485 2471 2312
Total Elapsed Time 135.0 seconds Total Elapsed Time 131.5 seconds
MFLOPS
1T 229 227 220 814 813 801 230 228 221 891 889 875
2T 455 450 435 1626 1623 1609 454 448 430 1778 1772 1753
4T 891 867 687 3225 3219 3181 897 874 672 3530 3515 3460
8T 1283 1307 708 5156 5241 5142 1370 1279 739 5717 5718 5602
Total Elapsed Time 90.1 seconds Total Elapsed Time 83.7 seconds
MFLOPS
1T 1271 870 596 2229 2210 2192
2T 1474 1358 949 4388 4067 3917
4T 1985 1753 1140 6220 5876 5414
8T 1583 1551 1290 7514 7152 5962
MFLOPS
1T 1336 1337 1335 2467 2497 2481
2T 1820 2544 2162 4898 4986 4929
4T 2675 2703 2066 5714 5777 5799
8T 2595 2414 1712 6106 5746 5608
A5, Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
Total Elapsed Time 78.8 seconds Total Elapsed Time 32.3 seconds
MFLOPS
1T 1485 1484 718 3584 3434 3430 1772 1746 764 4872 4805 4656
2T 2026 4811 1072 5343 5924 5713 2901 2449 1121 9014 8764 8519
4T 3438 3700 1451 6848 6575 7100 4209 7472 1461 10314 10064 10072
8T 4311 6688 1693 11429 12799 10276 4980 8758 1890 14707 14050 16111
Total Elapsed Time 26.1 seconds Total Elapsed Time 19.2 seconds
MFLOPS
1T 4942 4085 3257 5914 5716 2313 3368 3085 2744 12117 11592 3015
2T 8655 7505 4041 11704 11351 4595 8917 7468 4184 20439 23380 6031
4T 9919 11422 3995 18471 18289 7728 9810 12004 4017 42010 41528 10848
8T 7303 14256 3560 20434 20882 12562 9345 15470 3877 36689 39021 14468
Total Elapsed Time 18.0 seconds Total Elapsed Time 12.2 seconds
P44, Qualcomm 835, Cortex A73, GHz 4x 2.35 + 4x 1.9, Android 8.0
MFLOPS
1T 4539 3709 1367 7894 7933 7732
2T 8500 9170 1770 15336 15388 14802
4T 15335 7610 1896 25973 27855 23491
8T 10395 10542 1973 29124 31385 27072
MFLOPS
1T 13176 8885 6002 21867 22182 21447 18822 14134 5448 22173 20536 20684
2T 21999 22460 11030 42151 43598 45387 18157 26725 9211 43179 44068 42414
4T 24740 31790 15002 82615 86988 87136 39829 54158 16291 43014 77688 83338
8T 24161 41857 27639 78321 89838 85588 31033 47607 31904 81823 84067 84516
Total Elapsed Time 3.4 seconds Total Elapsed Time 3.5 seconds
32 Ops/word
KB
12.80 2 2.01 2.00 1.99 1.94 1.85 1.69 1.94 1.95
4 3.97 3.96 2.32 3.21 2.12 3.47 3.29 1.94
8 3.98 6.42 2.48 3.22 3.02 3.03 3.69 3.69
NEON-MFLOPS-MP carries out the same calculations as the MP-MFLOPS Benchmarks above, but with NEON intrinsic
functions used for all calculations. For further results see android neon benchmarks.htm, with further details and results
in Last Version of Android Report. On the older technology, as demonstrated in the results of the first two systems
below, these NEON functions could outperform the standard C code, used in MP-MFLOPS, by three times. With the later
technology, particularly at 64 bit working, the NEON code did not help. For Intel processors, the original compiler could
translate the functions into x86 SIMD instructions. As indicated for PC Remix below, this did not work. The new 4A8
versions were compiled for ARM only, but it seems that the original Houdini compatibility layer is still there to translate
the NEON functions into x86 instructions, on installation.
MFLOPS
1T 657 407 132 1077 1074 1053 636 354 127 1697 1595 1531
2T 1265 817 222 2147 2150 2078 1256 803 221 3388 3250 2816
4T 2024 1695 234 4214 4276 3555 1544 1564 230 6648 6502 3674
8T 2435 2495 234 4196 4100 3523 2452 2496 231 6596 6666 3584
Total Elapsed Time 39.0 seconds Total Elapsed Time 30.4 seconds
MFLOPS
1T 716 686 432 1740 1740 1703 819 765 432 2146 2123 2081
2T 1367 1255 614 3457 3427 3358 1538 1431 605 4241 4158 4080
4T 2389 2131 726 6814 6682 6644 2708 2359 727 8308 8296 7853
8T 2914 2776 744 10082 9994 9712 2960 3613 763 12688 12314 10721
Total Elapsed Time 21.8 seconds Total Elapsed Time 18.4 seconds
MFLOPS
1T 3599 3491 760 4883 4065 4122
2T 1756 1479 1033 7368 5361 7255
4T 2697 4732 1557 9983 8890 7810
8T 2632 4657 1887 10550 9839 10595
MFLOPS
1T 2813 3375 2072 5614 5452 5556
2T 3580 4888 2139 11090 10928 8249
4T 6521 6702 2026 12717 9457 8833
8T 5857 7140 2003 11882 12152 9899
A5, Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
MFLOPS
1T 1501 1551 1030 2520 2485 2301 1373 1202 959 4022 4065 3878
2T 2300 2957 1161 4699 4999 4632 2487 2207 821 8360 7949 7700
4T 3106 5126 1097 7929 8173 8015 2512 3981 1074 12539 13701 12795
8T 2692 4623 1108 7830 8432 7989 2641 4158 1108 12452 13535 12853
Total Elapsed Time 15.7 seconds Total Elapsed Time 11.0 seconds
MFLOPS
1T 1089 1035 655 5759 5437 5302 1121 978 690 4406 4226 4147
2T 1890 1429 1109 8630 8497 8572 1625 1449 1058 7584 7166 7016
4T 2928 4321 1598 13407 12422 12254 2866 4020 1548 10354 9725 9481
8T 2702 5109 1899 20699 16673 15685 2938 5434 1817 16603 13018 12537
Total Elapsed Time 9.6 seconds Total Elapsed Time 11.3 seconds
MFLOPS
1T 3216 2451 2517 11445 10700 10626 3840 3724 2221 9408 12213 12479
2T 12016 8371 3440 20673 20914 20678 13142 7651 3492 22942 23924 25043
4T 10495 14259 3449 32220 40296 32776 14053 16381 3950 41799 40234 38691
8T 5955 20639 3908 35496 37170 33943 17176 20587 4104 40815 38050 44242
Total Elapsed Time 4.0 seconds Total Elapsed Time 3.6 seconds
P44, Qualcomm 835, Cortex A73, GHz 4x 2.35 + 4x 1.9, Android 8.0
MFLOPS
1T 3658 4176 1301 7865 7888 7811
2T 8498 7971 1749 15333 15402 15134
4T 14381 5276 1935 29803 21957 22070
8T 6871 5086 1930 26429 27159 24767
--------- Frames Per Second -------- --------- Frames Per Second --------
Triangles WireFrame Shaded Shaded+ Textured Triangles WireFrame Shaded Shaded+ Textured
9000+ 42.18 43.57 33.38 23.54 9000+ 22.61 23.23 17.71 13.46
18000+ 23.68 23.47 19.91 13.38 18000+ 12.03 12.11 10.36 7.57
36000+ 12.05 11.95 11.00 7.10 36000+ 6.14 6.01 5.64 4.03
Screen Pixels 1280 Wide 736 High Screen Pixels 1280 Wide 736 High
9000+ 27.46 27.68 21.16 17.96 9000+ 18.49 18.74 14.45 11.73
18000+ 14.56 14.60 12.47 10.36 18000+ 9.70 9.75 8.40 6.31
36000+ 7.17 7.21 6.56 5.37 36000+ 4.78 4.78 4.45 3.48
Screen Pixels 1776 Wide 1080 High Screen Pixels 1776 Wide 1080 High
9000+ 60.18 60.23 56.72 34.45 9000+ 29.77 30.17 22.58 18.54
18000+ 38.36 38.59 33.22 18.15 18000+ 16.09 16.03 13.70 10.78
36000+ 19.29 19.22 17.96 9.95 36000+ 8.31 8.27 7.79 5.76
Screen Pixels 1200 Wide 1848 High Screen Pixels 2048 Wide 1440 High
9000+ 35.89 35.50 28.79 25.02 9000+ 29.91 29.99 22.36 19.40
18000+ 19.48 19.51 17.13 12.62 18000+ 15.11 14.63 11.87 9.57
36000+ 8.60 8.34 8.00 6.65 36000+ 6.69 6.59 5.85 4.71
Screen Pixels 1080 Wide 1794 High Screen Pixels 1080 Wide 1920 High
9000+ 56.25 56.29 43.71 35.48 9000+ 60.02 60.00 59.70 59.58
18000+ 29.22 8.18 24.94 24.89 18000+ 59.31 58.40 53.62 37.95
36000+ 14.54 14.45 13.49 9.81 36000+ 34.69 34.32 32.09 21.58
Screen Pixels 1080 Wide 1794 High Screen Pixels 1920 Wide 996 High
Further details and results can be found in android graphics benchmarks.htm, which includes links to an off line version
that runs on PCs via Windows and Linux, with more in Last Version of Android Report. As with Java OpenGL, speeds are
limited to 60 FPS by imposed VSYNC, and the Java included in new Android releases can produce wide variations in
performance.
Performance gains are not great, over the range of systems shown, but some might be masked by the forced maximum
of 60 FPS.
Display PNG Bitmap Twice 204 20.38 Display PNG Bitmap Twice 487 48.70
Plus 2 SweepGradient Circles 165 16.48 Plus 2 SweepGradient Circles 297 29.66
Plus 200 Random Small Circles 145 14.50 Plus 200 Random Small Circles 231 23.02
Plus 320 Long Lines 113 11.30 Plus 320 Long Lines 149 14.85
Plus 4000 Random Small Circles 39 3.81 Plus 4000 Random Small Circles 39 3.90
Screen pixels 1280 Wide 736 High Screen pixels 1280 Wide 736 High
Display PNG Bitmap Twice 246 24.53 Display PNG Bitmap Twice 236 23.57
Plus 2 SweepGradient Circles 158 15.77 Plus 2 SweepGradient Circles 149 14.85
Plus 200 Random Small Circles 130 12.98 Plus 200 Random Small Circles 132 13.19
Plus 320 Long Lines 98 9.71 Plus 320 Long Lines 103 10.24
Plus 4000 Random Small Circles 27 2.66 Plus 4000 Random Small Circles 41 4.06
Screen pixels 1776 Wide 1080 High Screen pixels 1776 Wide 1080 High
Display PNG Bitmap Twice 598 59.75 Display PNG Bitmap Twice 447 44.62
Plus 2 SweepGradient Circles 377 37.65 Plus 2 SweepGradient Circles 212 21.12
Plus 200 Random Small Circles 317 31.62 Plus 200 Random Small Circles 171 17.02
Plus 320 Long Lines 238 23.76 Plus 320 Long Lines 93 9.25
Plus 4000 Random Small Circles 90 8.92 Plus 4000 Random Small Circles 32 3.13
Screen pixels 1200 Wide 1848 High Screen pixels 2048 Wide 1440 High
Display PNG Bitmap Twice 313 31.21 Display PNG Bitmap Twice 515 51.47
Plus 2 SweepGradient Circles 164 16.39 Plus 2 SweepGradient Circles 368 36.73
Plus 200 Random Small Circles 148 14.76 Plus 200 Random Small Circles 352 35.11
Plus 320 Long Lines 117 11.70 Plus 320 Long Lines 290 28.90
Plus 4000 Random Small Circles 48 4.75 Plus 4000 Random Small Circles 118 11.80
Screen pixels 1080 Wide 1794 High Screen pixels 1080 Wide 1920 High
Display PNG Bitmap Twice 594 59.31 Display PNG Bitmap Twice 582 55.49
Plus 2 SweepGradient Circles 588 58.74 Plus 2 SweepGradient Circles 601 60.01
Plus 200 Random Small Circles 504 50.33 Plus 200 Random Small Circles 415 41.41
Plus 320 Long Lines 328 32.78 Plus 320 Long Lines 303 30.25
Plus 4000 Random Small Circles 112 11.14 Plus 4000 Random Small Circles 43 4.20
Screen pixels 1080 Wide 1794 High Screen pixels 396 Wide 674 High
This is primarily intended for measuring performance of SD cards and internal drives, but can also be used to test USB
drives. DriveSpeed carries out four tests.
Test 1 - Write and read three files at 8 MB and 16 MB; Results given in MBytes/second
Test 2 - Write 8 MB, read can be cached in RAM; Results given in MBytes/second
Test 3 - Random write and read 1 KB from 4 to 16 MB; Results are Average time in milliseconds
Test 4 - Write and read 200 files 4 KB to 16 KB; Results in MB/sec, msecs/file and delete seconds.
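As a rough illustration of the Test 1 style of measurement, the following is a minimal Python sketch, not the DriveSpeed source. The function names, the 64 KB block size and the temporary file names are assumptions made for the example, as is the reading of Test 1 as three files at each size. Read speeds measured straight after writing are likely to come from the RAM cache.

```python
import os
import tempfile
import time


def write_read_speed(path, size_mb, block_kb=64):
    """Write then read one file of size_mb; return (write, read) in MB/second."""
    block = bytes(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before stopping the clock
    write_secs = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_kb * 1024)
            if not chunk:
                break
            total += len(chunk)
    read_secs = max(time.perf_counter() - start, 1e-9)

    if total != size_mb * 1024 * 1024:
        raise IOError("short read: %d bytes" % total)
    return size_mb / write_secs, size_mb / read_secs


def run_test1(directory, sizes_mb=(8, 16), files=3):
    """Time `files` write/read cycles at each size; files are deleted afterwards."""
    results = {}
    for size in sizes_mb:
        speeds = []
        for n in range(files):
            path = os.path.join(directory, "drivespeed_%d_%d.tmp" % (size, n))
            try:
                speeds.append(write_read_speed(path, size))
            finally:
                if os.path.exists(path):
                    os.remove(path)
        results[size] = speeds
    return results


if __name__ == "__main__":
    for size, speeds in run_test1(tempfile.gettempdir()).items():
        print(size, "MB", ["%.1f/%.1f" % s for s in speeds])
```

The directory argument plays the part of the file path that DriveSpd2 asks for.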
The first DriveSpeed benchmark has two run buttons, RunS for an SD card and RunI for the internal drive, the file path
being identified by standard functions. The external SD test worked on earlier Android tablets but failed on later Android
versions. RunS ran but provided distorted reading speeds by caching data in RAM. An extra button was added to
prevent large files from being deleted and a read only option to measure uncached speeds after rebooting.
DriveSpd2 requires input of the file path to use and this might be identified using a file browser app. The file path can
sometimes be selected for internal drives, SD cards and USB devices but there are complications associated with
permissions and caching.
Running these benchmarks can require a lot of experimentation. Many paths, results and explanations are given in the
DriveSpeed and Comparison sections of android benchmarks32.htm, with more in the last version of the Android report.
The new 4A8 compilations have been tested on devices with 32 bit and 64 bit ARM and Intel CPUs. Following is an
example of running DriveSpd1.apk on a new phone. The SD card test (RunS) would not run properly (wrong default
path?). The internal drive test (RunI) could be run, but data was cached for reading. In this case, the More button was
used to avoid deleting the files. After powering the phone off and on, the More button was used to select Read Only,
with RunI, providing measurements of reading speeds.
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1
READ ONLY
MBytes/Second
MB Write1 Write2 Write3 Read1 Read2 Read3
There are two main stress test programs that can use multiple threads to exercise (presently) all CPU cores, one using
floating point instructions, and the other carrying out integer arithmetic. Further detail is covered in the earlier report -
Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM and Intel.pdf. The stress testing programs were reproduced
as 4A8 versions, along with an enhanced CPU MHz measurement program. Each of the stress test applications has five
buttons:
RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and calculations per data word for
the FPU tests. This is mainly to help to decide which options to use for stress testing. The benchmark runs using fixed
parameters, carrying out exactly the same number of calculations using all thread combinations and data sizes. The
pass count changes according to the number of calculations per word, for the FPU tests.
RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size, intended for containment in L2
cache, using 8 threads, and 32 operations per word in the FPU tests.
False Errors - The need for continuous performance displays led to false error reports, due to multiple copies of the
stress test programs running. This could occur with the original versions on rotating the device. The new version runs
in forced portrait display mode, but false errors can be caused if the run button is clicked again when the tests are
running. The main unique symptoms are multiple “End Time” message displays.
SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads, 2, 8 or 32 Operations per
word for FPU tests, 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests, and running time in minutes.
Info - Test description and details - This is essentially the same as details provided here.
Save - This offers details of the results and identified CPU hardware and Operating System for E-mail. Default
addressee is the program author via results@roylongbottom.org.uk but this can be changed or additional addresses
added.
Unexpected Faster Speed - Performance depends on whether the data comes from caches or RAM, with a particular
effect on using the 160 or 128 KB options. Four threads, each using a dedicated quarter, should run at L2 cache speed
but eight or more threads, at 20 KB or less each, will probably run mainly at L1 cache speed. This can also apply, to
some extent, with 32 threads sharing 16 MB, where L2 cache can be the main source. See benchmark examples below.
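The arithmetic behind this effect can be sketched as follows. This is a simplified model, not taken from the benchmark code. The L1 and L2 sizes of 32 KB and 1 MB are illustrative defaults (those quoted for P43 in this report), and the model ignores the sharing of L2 cache between cores.

```python
def per_thread_kb(total_kb, threads):
    """Data share of each thread when the total is split evenly."""
    return total_kb / threads


def likely_cache_level(share_kb, l1_kb=32, l2_kb=1024):
    """Crude guess at where a thread's data will mainly be served from.

    l1_kb and l2_kb are assumed sizes; real devices differ.
    """
    if share_kb < l1_kb:
        return "L1"
    if share_kb < l2_kb:
        return "L2"
    return "RAM"


if __name__ == "__main__":
    for total_kb, threads in [(160, 4), (160, 8), (128, 8), (16 * 1024, 32)]:
        share = per_thread_kb(total_kb, threads)
        print(total_kb, "KB,", threads, "threads:", share, "KB each,",
              likely_cache_level(share))
```

With these assumed sizes, 160 KB over four threads is 40 KB each (L2), over eight threads it is 20 KB each (L1), and 16 MB over 32 threads is 512 KB each (L2).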
CP_MHz2 measurements are instantaneous at a constant sampling rate in seconds, default 10, for a specified number
of minutes, default 15. This has Set, Run and Save buttons, as above.
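One way of taking such instantaneous samples on a Linux based system is sketched below. This is an assumption about the general approach, not the program's actual code. It reads the standard cpufreq file scaling_cur_freq (in kHz) for each core, and reports a core as 0 MHz when the file cannot be read, for example when the core is switched off. The function and parameter names are hypothetical.

```python
import os
import time

CPU_ROOT = "/sys/devices/system/cpu"  # standard Linux sysfs location


def read_core_mhz(core, root=CPU_ROOT):
    """Return the current MHz of one core, or 0 if it cannot be read."""
    path = os.path.join(root, "cpu%d" % core, "cpufreq", "scaling_cur_freq")
    try:
        with open(path) as f:
            return int(f.read().strip()) // 1000  # file holds kHz
    except (OSError, ValueError):
        return 0  # assumed meaning: core offline or file not accessible


def sample_mhz(cores, samples, interval_secs=10.0, root=CPU_ROOT):
    """Sleep, then sample all cores; return rows of (seconds, [MHz], average)."""
    rows = []
    start = time.monotonic()
    for _ in range(samples):
        time.sleep(interval_secs)
        mhz = [read_core_mhz(c, root) for c in range(cores)]
        rows.append((time.monotonic() - start, mhz, sum(mhz) / cores))
    return rows


if __name__ == "__main__":
    for secs, mhz, average in sample_mhz(cores=os.cpu_count() or 1, samples=3,
                                         interval_secs=1.0):
        print("%6.1f" % secs, mhz, "%.0f" % average)
```

The average includes cores reported as 0, so it falls when cores are switched off.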
Below are example Stress Test Benchmark and associated MHz results on a phone that is subject to heat related
performance degradation. Besides speed, the results identify the sumchecks, indicating consistent numeric calculations.
These were run with the battery fully charged.
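The sumcheck idea can be illustrated with a minimal Python sketch. The integer workload, names and parameters here are hypothetical and are not the calculations used by the stress test programs. Each thread repeats the same deterministic calculation, and any sumcheck that differs from the reference value is counted as an error. In standard Python the threads do not load all cores, so this only demonstrates the checking logic.

```python
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF  # keep results in 32 bits


def integer_pass(words, seed=0x12345678):
    """One pass of a made-up integer workload; returns a 32 bit sumcheck."""
    total = 0
    for i in range(words):
        v = (seed + i) & MASK
        v = (v * 1664525 + 1013904223) & MASK  # hypothetical calculation
        total = (total + (v ^ i)) & MASK
    return total


def run_stress(threads, words, passes):
    """Run `passes` rounds on `threads` threads; return (sumcheck, errors)."""
    expected = integer_pass(words)  # reference value from a single run
    errors = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(passes):
            sums = list(pool.map(integer_pass, [words] * threads))
            errors += sum(1 for s in sums if s != expected)
    return expected, errors


if __name__ == "__main__":
    sumcheck, errors = run_stress(threads=8, words=4000, passes=5)
    print("Sumcheck %08X, errors %d" % (sumcheck, errors))
```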
In this case, there are wide, apparently random, variations in the operating frequency of all cores and, with different
thread counts, data area used and execution demands, it is impossible to identify which cores are used for a particular
set of calculations. For better understanding, see P42 Stress Test details in the next two pages.
MB/second
KB KB MB Same All
Secs Thrds 16 160 16 Sumcheck Tests
These are results for the same system covered in the previous page, running in Benchmark Mode. This is P42, an LG G
Flex2 phone. Maximum CPU MHz measurements could be 1555, for cores 0 to 3 (Cortex-A53), and 1958 for cores 4 to 7
(Cortex-A57). With 8 threads running continuously, the A57 cores were not indicated as being run at maximum speed
and that of the other cores was quickly degraded. After 10 minutes, measured total MB/second was reduced by 61%.
For most of the time, three of the A57 cores were not running. Remember that the MHz readings are instantaneous
samples and cannot be completely representative.
These are further results from the LG G Flex2 phone where MHz could be 1555, for cores 0 to 3, and 1958 for cores
4 to 7. Performance degradation was not quite as bad as for the integer tests, but with MHz samples indicating more
variability.
Secs CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 Average MHz
10.2 1555 1555 1555 1555 1248 960 960 1248 1330
20.5 1555 1555 1555 1555 1248 960 864 768 1258
30.8 1478 1478 1478 1478 0 1344 1248 0 1063
41.0 1344 1344 1344 1344 0 1344 1248 0 996
51.4 1248 1248 1248 1248 0 1248 1248 0 936
61.7 960 960 960 960 0 1248 960 0 756
72.0 960 960 960 960 0 1440 1440 0 840
82.3 960 960 960 960 0 960 1344 0 768
92.7 960 960 960 960 0 960 1248 0 756
103.0 960 960 960 960 0 960 960 0 720
113.5 960 960 960 960 0 864 864 0 696
124.2 960 960 960 960 0 768 768 0 672
134.5 960 960 960 960 0 634 634 0 639
144.9 960 960 960 960 0 480 480 0 600
155.2 960 960 960 960 0 384 384 0 576
166.0 1248 1248 1248 1248 0 384 384 0 720
176.8 1344 1344 1344 1344 0 384 384 0 768
187.3 1248 1248 1248 1248 0 384 384 0 720
198.0 960 960 960 960 0 384 384 0 576
208.8 1248 1248 1248 1248 0 384 384 0 720
219.7 1478 1478 1478 1478 0 384 384 0 835
230.2 1344 1344 1344 1344 0 384 384 0 768
240.7 1248 1248 1248 1248 0 384 384 0 720
251.5 960 960 960 960 0 384 384 0 576
262.4 1248 1248 1248 1248 0 384 384 0 720
273.0 1344 1344 1344 1344 0 384 384 0 768
283.4 1478 1478 1478 1478 0 384 384 0 835
294.2 1344 1344 1344 1344 0 384 384 0 768
304.8 1248 1248 1248 1248 0 384 384 0 720
315.7 960 960 960 960 0 384 384 0 576
326.3 960 960 960 960 0 384 384 0 576
337.1 1248 1248 1248 1248 0 384 384 0 720
348.1 1344 1344 1344 1344 0 384 384 0 768
358.7 1248 1248 1248 1248 0 384 384 0 720
369.4 960 960 960 960 0 384 384 0 576
380.4 1248 1248 1248 1248 0 384 384 0 720
391.3 960 960 960 960 0 384 384 0 576
402.0 1248 1248 1248 1248 0 384 384 0 720
412.9 960 960 960 960 0 384 384 0 576
423.7 1248 1248 1248 1248 0 384 384 0 720
434.6 1344 1344 1344 1344 0 384 384 0 768
445.7 1248 1248 1248 1248 0 384 384 0 720
456.5 960 960 960 960 0 384 384 0 576
467.5 960 960 960 960 0 384 384 0 576
478.1 960 960 960 960 0 384 384 0 576
489.1 960 960 960 960 0 384 384 0 576
499.5 1248 1248 1248 1248 0 384 384 0 720
510.2 960 960 960 960 0 384 384 0 576
521.3 960 960 960 960 0 384 384 0 576
532.2 960 960 960 960 0 384 384 0 576
543.2 960 960 960 960 0 384 384 0 576
554.1 960 960 960 960 0 384 384 0 576
564.9 960 960 960 960 0 384 384 0 576
575.5 960 960 960 960 0 384 384 0 576
586.4 960 960 960 960 0 384 384 0 576
597.4 960 960 960 960 0 384 384 0 576
608.6 960 960 960 960 0 384 384 0 576
Min/Max 43%
This is for P44, a Google Pixel 2 phone with a customised Cortex-A73 CPU, having 4 cores rated at 2350 MHz and 4 at
1900 MHz. Measurements indicated maximums of 2458 and 1901 MHz. There was some variation in measured speeds but
the MHz samples indicate that this only applied to the slower cores. These tests were actually run for 15 minutes, with
the same results pattern.
Secs CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 Average MHz
10.1 1171 1901 1901 1901 2458 2458 2458 2458 2088
20.3 1901 1901 1901 1901 2458 2458 2458 2458 2180
30.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
40.9 1901 1901 1901 1901 2458 2458 2458 2458 2180
51.1 1901 1901 1901 1901 2458 2458 2458 2458 2180
61.4 1037 1670 1901 1478 2458 2458 2458 2458 1990
71.6 1901 1901 1901 1901 2458 2458 2458 2458 2180
81.8 1901 1901 1901 1901 2458 2458 2458 2458 2180
92.1 1901 1901 1901 1901 2458 2458 2458 2458 2180
102.4 826 1824 1901 1901 2458 2458 2458 2458 2036
112.6 1901 1901 1901 1901 2458 2458 2458 2458 2180
123.0 1901 1901 1901 1901 2458 2458 2458 2458 2180
133.2 1478 1747 1901 1901 2458 2458 2458 2458 2107
143.4 1901 1901 1901 1901 2458 2458 2458 2458 2180
153.8 1901 1901 1901 1901 2458 2458 2458 2458 2180
164.1 826 1670 1901 1901 2458 2458 2458 2458 2016
174.3 1901 1901 1901 1901 2458 2458 2458 2458 2180
184.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
195.0 1325 1670 1901 1901 2458 2458 2458 2458 2079
205.2 1901 1901 1901 1901 2458 2458 2458 2458 2180
215.5 1901 1901 1901 1901 2458 2458 2458 2458 2180
225.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
236.1 883 1478 1901 1901 2458 2458 2458 2458 1999
246.3 1901 1901 1901 1901 2458 2458 2458 2458 2180
256.6 1901 1901 1901 1901 2458 2458 2458 2458 2180
266.9 1402 1901 1901 1901 2458 2458 2458 2458 2117
277.2 1901 1901 1901 1901 2458 2458 2458 2458 2180
287.5 1901 1901 1901 1901 2458 2458 2458 2458 2180
297.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
307.9 1248 1555 1901 1901 2458 2458 2458 2458 2055
Similar To
400.2 1901 1901 1901 1901 2458 2458 2458 2458 2180
410.5 1901 1901 1901 1901 2458 2458 2458 2458 2180
420.7 1248 1670 1901 1901 2458 2458 2458 2458 2069
430.9 1901 1901 1901 1901 2458 2458 2458 2458 2180
441.2 1901 1901 1901 1901 2458 2458 2458 2458 2180
451.5 826 1747 1747 1901 2458 2458 2458 2458 2007
461.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
472.0 1901 1901 1901 1901 2458 2458 2458 2458 2180
482.2 1901 1901 1901 1901 2458 2458 2458 2458 2180
492.5 1248 1478 1901 1901 2458 2458 2458 2458 2045
502.7 1901 1901 1901 1901 2458 2458 2458 2458 2180
512.9 1901 1901 1901 1901 2458 2458 2458 2458 2180
523.1 960 1824 1901 1901 2458 2458 2458 2458 2052
533.3 1248 1901 1901 1901 2458 2458 2458 2458 2098
543.5 1901 1901 1901 1901 2458 2458 2458 2458 2180
553.8 1901 1901 1901 1901 2458 2458 2458 2458 2180
564.1 826 1901 1901 1901 2458 2458 2458 2458 2045
574.3 1901 1901 1901 1901 2458 2458 2458 2458 2180
584.6 1901 1901 1901 1901 2458 2458 2458 2458 2180
594.9 1901 1901 1901 1901 2458 2458 2458 2458 2180
605.1 1094 1901 1901 1901 2458 2458 2458 2458 2079
Min/Max 91%
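The final column and the Min/Max figures in the two MHz tables appear to be consistent with a simple average over the eight cores, followed by the ratio of the lowest to the highest average. The following Python sketch checks this using rows copied from the tables. The function names are hypothetical and the rounding rule is an assumption.

```python
def average_mhz(core_mhz):
    """Average MHz over all cores, rounded to the nearest whole number."""
    return int(sum(core_mhz) / len(core_mhz) + 0.5)


def min_max_percent(averages):
    """Lowest average as a percentage of the highest."""
    return int(100 * min(averages) / max(averages) + 0.5)


# P42 rows at 10.2 and 155.2 seconds (highest and lowest averages)
p42_high = [1555, 1555, 1555, 1555, 1248, 960, 960, 1248]
p42_low = [960, 960, 960, 960, 0, 384, 384, 0]

# P44 rows at 20.3 and 61.4 seconds (highest and lowest averages)
p44_high = [1901, 1901, 1901, 1901, 2458, 2458, 2458, 2458]
p44_low = [1037, 1670, 1901, 1478, 2458, 2458, 2458, 2458]

if __name__ == "__main__":
    p42 = [average_mhz(p42_high), average_mhz(p42_low)]
    p44 = [average_mhz(p44_high), average_mhz(p44_low)]
    print("P42", p42, "Min/Max %d%%" % min_max_percent(p42))
    print("P44", p44, "Min/Max %d%%" % min_max_percent(p44))
```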
Following are sample average MHz recordings and total MB/second for the other ARM compatible CPUs covered in this
report. Note that the 2012 quad core Nexus 7 (T7) is shown to run for 10 minutes at full speed. Next best is the 2017
quad core Amazon Fire HD 10 (T23), with limited speed reductions shown and apparently random variations in MHz
(needs more frequent sampling?). The other two 8 core phones, 2016 Lenovo Moto G4 (P37) and 2016 Samsung Galaxy
S7 edge (P43), indicate gradual speed reductions with random MHz at the lower frequencies.
MB/sec/core
Av 2545 3754 5738 6704
Max 2592 4704 6820 9018
MB/sec/core/MHz
Av 2.12 3.51 3.95 4.31
Max 2.16 3.45 4.25 4.66
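The MB/sec/core/MHz figures are a normalisation of speed per core by clock frequency. A minimal sketch of the calculation follows, assuming that the first column belongs to T7 running at 1200 MHz (the column headings are not shown here, so this is an inference from the 1.2 GHz noted for T7). The function name is hypothetical.

```python
def mb_per_sec_per_core_per_mhz(mb_per_sec_per_core, mhz):
    """Normalise per-core speed by clock frequency, to two decimal places."""
    return round(mb_per_sec_per_core / mhz, 2)


if __name__ == "__main__":
    # First column of the table, with an assumed 1200 MHz clock
    print("Av ", mb_per_sec_per_core_per_mhz(2545, 1200))
    print("Max", mb_per_sec_per_core_per_mhz(2592, 1200))
```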
The following shows what can happen if the battery is nearly flat, again running with 8 threads on P37. The starting
point was all cores running at maximum MHz. When low battery charge (15%?) is detected, a warning is given to enable
Power Saver, ignored in this case. Shortly afterwards, the clock was turned off on the faster cores. Finally, the slower
cores ran at lower MHz. In this case, MHz and MB/second speeds were reduced by around 2.7 times. A later test
showed that the latter speeds continued until the phone turned itself off, due to insufficient battery capacity. It seems
that, whatever the charge state, selecting Power Saver switches off the faster processor cores.
T7 Nexus 7 quad core CPU 1.3 GHz, 1.2 GHz > 1 core
Device Asus Nexus 7
RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
Screen pixels w x h 1280 x 736
Twelve-core Nvidia GeForce ULP graphics 416 MHz
Android Build Version 4.1.2
Processor : ARMv7 Processor rev 9 (v7l)
processor : 0 BogoMIPS : 1993.93
processor : 1 BogoMIPS : 1993.93
processor : 2 BogoMIPS : 1993.93
processor : 3 BogoMIPS : 1993.93
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09 - Cortex-A9
CPU revision : 9
Hardware : grouper - nVidia Tegra 3 T30L
Revision : 0000
Linux version 3.1.10
Runs at 1.2 GHz
T23 Amazon Fire HD 10, 2 x 1.8 GHz Cortex A72 + 2 x 1.4 GHz
Cortex A53, GPU PowerVR GX6250
Device Amazon KFSUWI
Screen pixels w x h 1200 x 1848
Android Build Version 5.1.1
Hardware : MT8173
processor : 0, 1
model name : AArch64 Processor rev 0 (aarch64)
BogoMIPS : 26.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae
evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 2
processor : 2, 5
model name : AArch64 Processor rev 0 (aarch64)
BogoMIPS : 26.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae
evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 0
P43 Samsung Galaxy S7 edge, Exynos 8890 (2.3 GHz Quad + 1.6 GHz Quad)
14nm, Quad Channel RAM 29.8 GB/s, Mali T880 Graphics @ 624 MHz, L1 32KB, L2 1MB
Device Samsung SM-G935F
Screen pixels w x h 1080 x 1920
Android Build Version 7.0
processor : 0 to 3
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 4 to 7
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 1
Linux version 3.18.14
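The CPU implementer and CPU part fields in these listings can be used to identify the core type. The following Python sketch parses one processor block in the /proc/cpuinfo layout shown above. The lookup table is a small illustrative subset covering ARM Ltd part numbers seen in this report, and the function names are hypothetical. Other codes, such as implementer 0x53 with part 0x001 on P43, are reported as unknown rather than guessed.

```python
# (implementer, part) -> core name; 0x41 is the ARM Ltd implementer code
CPU_PARTS = {
    ("0x41", "0xc09"): "Cortex-A9",
    ("0x41", "0xd03"): "Cortex-A53",
    ("0x41", "0xd08"): "Cortex-A72",
}


def parse_cpuinfo(text):
    """Turn 'key : value' lines of one processor block into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def core_name(fields):
    """Look up the core name, ignoring any annotation after the part code."""
    implementer = fields.get("CPU implementer", "")
    part_words = fields.get("CPU part", "").split()
    part = part_words[0] if part_words else ""
    return CPU_PARTS.get((implementer, part),
                         "unknown (implementer %s, part %s)" % (implementer, part))


if __name__ == "__main__":
    sample = """CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 0"""
    print(core_name(parse_cpuinfo(sample)))
```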