Multi-Core Processing: Advantages & Challenges
Yousef Yaseen
Moore's Law
1965: Gordon Moore, later a co-founder of Intel, predicted that the number of transistors on a chip would double every 12 months for the near future (in 1975 he revised this to every two years)
Transistor Count
The steady decrease in feature size increases the number of transistors that fit in a given chip area, which enables the design of more complex processors
Power
Packing this many transistors into the chip area increases power consumption and produces more heat
Power vs Frequency
Power consumption and heat have limited the use of higher clock frequency as a way of improving performance, and processor performance gains have begun to slow
Multi-core
Multiple processor cores are placed on the same die
Multi-core chips don't necessarily run as fast as the highest-performing single-core models, but they improve overall performance by handling more work in parallel
Multi-core Benefits
Gain roughly 1.7x the performance of a single core without increasing the original power consumption
They improve an operating system's ability to multitask applications
Another benefit comes from individual applications optimized for multi-core processors
Multi-core Challenges
Power
Run the multiple cores at a lower frequency to reduce power consumption
Integrate many small cores instead of multiple complex cores on a die, accepting that each small core delivers lower performance than a large complex core
Incorporate a power management unit that has the authority to shut down unused cores or limit the amount of power they draw
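The trade-off between frequency and core count can be illustrated with a rough sketch. It assumes the textbook dynamic-power model (power proportional to capacitance times voltage squared times frequency) with supply voltage scaled in proportion to frequency, so power grows with the cube of frequency, and it assumes throughput scales linearly with frequency and core count. Both are simplifying assumptions, and the function names are hypothetical.

```python
def relative_power(freq_scale, cores=1):
    """Power relative to one core at full frequency, assuming P ~ f^3."""
    return cores * freq_scale ** 3


def relative_performance(freq_scale, cores=1):
    """Throughput relative to one core at full frequency,
    assuming perfect scaling with frequency and core count."""
    return cores * freq_scale


# Two cores, each clocked at 80% of the original frequency.
power = relative_power(0.8, cores=2)
perf = relative_performance(0.8, cores=2)
print(f"power x{power:.2f}, performance x{perf:.2f}")
```

Under these assumptions, two cores at 80% frequency deliver about 1.6x the throughput for roughly the original power budget, which is in the same range as the performance gain quoted under the benefits.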
Temperature
The chip is architected so that the number of hot spots doesn't grow too large and the heat is spread out across the chip
The CELL processor follows a common trend of building temperature monitoring into the system, with its temperature management unit
Cache Coherence
Cache coherence is a concern in a multicore environment because the L1 and L2 caches are distributed. Since each core has its own cache, the copy of the data in that cache may not always be the most up-to-date version.
If a coherence policy weren't in place, stale data would be read and invalid results would be produced, possibly crashing the program or the entire computer.
In general there are two schemes for cache coherence: a snooping protocol and a directory-based protocol.
The snooping protocol only works with a bus-based system. The directory-based protocol can be used on an arbitrary network and is, therefore, scalable. In this scheme a directory holds information about which memory locations are being shared in multiple caches.
Directory-based protocols are an alternative to snooping protocols. Snooping achieves low latency because every request is broadcast to all caches, at the cost of high bandwidth demand, and it is the scheme implemented in present-day processors such as the Core 2 Duo.
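The directory idea can be sketched in a few lines. This is a minimal illustration, not a real protocol: it assumes a write-through, invalidate-on-write policy, and the class and method names (Directory, add_core, read, write) are hypothetical. The point it shows is that a write notifies only the cores the directory lists as sharers, rather than broadcasting to everyone.

```python
class Directory:
    """Tracks, per memory address, which cores hold a cached copy."""

    def __init__(self, memory):
        self.memory = memory   # backing store: address -> value
        self.sharers = {}      # address -> set of core ids with a valid copy
        self.caches = {}       # core id -> {address: value}

    def add_core(self, core_id):
        self.caches[core_id] = {}

    def read(self, core_id, addr):
        cache = self.caches[core_id]
        if addr not in cache:
            # Miss: fetch from memory and register this core as a sharer.
            cache[addr] = self.memory[addr]
            self.sharers.setdefault(addr, set()).add(core_id)
        return cache[addr]

    def write(self, core_id, addr, value):
        # Invalidate only the copies the directory knows about (no broadcast).
        for other in self.sharers.get(addr, set()) - {core_id}:
            del self.caches[other][addr]
        self.sharers[addr] = {core_id}
        self.caches[core_id][addr] = value
        self.memory[addr] = value  # write-through, to keep the sketch simple


directory = Directory({0x10: 1})
directory.add_core(0)
directory.add_core(1)
directory.read(0, 0x10)
directory.read(1, 0x10)         # both cores now share address 0x10
directory.write(0, 0x10, 5)     # core 1's copy is invalidated
print(directory.read(1, 0x10))  # core 1 misses and re-fetches the new value
```

Without the invalidation step in write, core 1 would keep returning its old cached value, which is the stale-data problem a coherence policy exists to prevent.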
Multithreading
The most important issue is using multithreading or other parallel processing techniques to get the most performance out of the multicore processor
The limitation is Amdahl's Law: Parallel Speedup = 1 / (Serial% + (1 - Serial%) / N), where Serial% is the fraction of the work that cannot be parallelized and N is the number of cores
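As a quick numerical illustration of Amdahl's Law, the sketch below evaluates the formula for a program that is 10% serial. The function name amdahl_speedup and the 10% figure are assumptions chosen for the example.

```python
def amdahl_speedup(serial_fraction, cores):
    """Parallel speedup = 1 / (serial + (1 - serial) / cores)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


for cores in (2, 4, 8, 1024):
    print(cores, round(amdahl_speedup(0.10, cores), 2))
```

With a 10% serial fraction the speedup can never exceed 10x, however many cores are added, which is why reducing the serial portion matters more than adding cores once N is large.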
Rebuilding applications to be multithreaded means, in most cases, a complete rework by programmers so that the application consists of subroutines that can run on different cores. The workload should be balanced: if one core is used much more than another, the programmer is not taking full advantage of the multicore system.
Some companies have heard the call and designed new products with multicore capabilities; Microsoft's and Apple's newest operating systems can run on up to 4 cores, for example.
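The balancing point can be sketched as follows: the work is split into chunks whose sizes differ by at most one item, and each chunk is handed to its own worker. The helper names split_evenly and parallel_sum are hypothetical. Threads are used here only to keep the sketch self-contained; in CPython, CPU-bound work would typically need separate processes to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, workers):
    """Split items into `workers` chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), workers)
    chunks, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def parallel_sum(items, workers=4):
    """Sum each chunk in its own worker, then combine the partial results."""
    chunks = split_evenly(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


data = list(range(10))
print([len(chunk) for chunk in split_evenly(data, 4)])  # chunk sizes 3, 3, 2, 2
print(parallel_sum(data))                               # same result as sum(data)
```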
Open issues
Interconnection networks
Homogeneous vs. Heterogeneous Cores
Parallel programming
Software licensing
Interconnection Networks
The cores on a die must be connected to each other, and there are several possibilities: classical buses, rings, crossbars, switched networks, and hierarchical interconnects. It is quite clear that many-core processors will rely on neither buses, rings, nor crossbars. For buses, long lines give high power consumption and low speed. Crossbars scale as the square of the number of ports and thus become untenable. Rings scale well in area and power, but their latency grows with the number of cores. This leaves switched networks and hierarchical interconnects as the main competitors for the future.
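The scaling argument can be made concrete with a small count of interconnect resources. This sketch assumes a full crossbar needs one crosspoint per input/output pair, a ring needs one link per node, and a 2D mesh (one example of a switched network) links each node only to its grid neighbours. The function names are hypothetical.

```python
def crossbar_crosspoints(ports):
    """A full crossbar needs one crosspoint per (input, output) pair."""
    return ports * ports


def ring_links(nodes):
    """A ring connects each node to its successor, so one link per node."""
    return nodes


def mesh_links(rows, cols):
    """A 2D mesh has horizontal links within rows and vertical links within columns."""
    return rows * (cols - 1) + cols * (rows - 1)


for cores, side in [(16, 4), (64, 8), (256, 16)]:
    print(cores, crossbar_crosspoints(cores), ring_links(cores), mesh_links(side, side))
```

Going from 16 to 256 cores multiplies the crossbar cost by 256, while the ring and mesh link counts grow roughly in proportion to the number of cores.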
The coherency mechanism interacts heavily with the interconnect structure. A mesh network, for instance, fits naturally with a directory-based coherency mechanism, whereas a hierarchical system could use snooping in the leaves, over rings or buses, and directories between the groups.
State of the art: Crossbars are often used in designs with few processors, but rings and meshes are becoming more common.
Current challenges: Rings and buses fit well with snooping cache coherence protocols, but meshes need directory-based protocols, which have some scaling issues. Here hierarchical organizations might help.
Homogeneous vs. Heterogeneous Cores
Cores in a multicore environment can be homogeneous or heterogeneous. Homogeneous cores are all exactly the same: equivalent frequencies, cache sizes, and functions.
Each core in a heterogeneous system may have a different function, frequency, and memory model, and heterogeneous cores may or may not share the same instruction set.
Homogeneous cores are easier to produce since the same instruction set is used across all cores and each core contains the same hardware.
Each core in a heterogeneous environment could have a specific function and run its own specialized instruction set. This model is more complex, but may have efficiency, power, and thermal benefits that outweigh its complexity.
Parallel Programming
In May 2007, Intel fellow Shekhar Borkar stated that "The software has to also start following Moore's Law, software has to double the amount of parallelism that it can support every two years."
Since the number of cores in a processor is set to double every 24 months, programmers need to learn how to write parallel programs that can be split up and run concurrently on multiple cores, instead of trying to exploit single-core hardware to increase the parallelism of sequential programs
State of the art: Today, most multicore programming is done using threads (pthreads, Windows threads, or Java threads), OpenMP, or Intel TBB
Current challenges: New programming languages generally take a long time to be widely adopted, and very few programmers know how to program the massive on-chip parallelism afforded by multicore systems
Software licensing
Software vendors charge customers in various ways for using their products.
"Intel defines a processor as a unit that plugs into a single socket on the motherboard, regardless of whether it has one or more cores," and advocates that software vendors charge accordingly, explained Jeff Austin, the company's desktop product manager. Microsoft agrees and doesn't charge extra for using its software on multicore processors.
BEA Systems and Oracle, on the other hand, charge more to use their software on multicore chips under per-processor licensing. "Customers get added performance benefit by running our software on a chip with two cores, so we charge a fraction of the single CPU price for additional cores," said Bill Roth, the company's vice president of product marketing. Multicore-chip makers are concerned that this type of policy will hurt their products' sales.
Oracle vs Microsoft
As the software landscape continues to transform, we anticipate that software licensing will continue to transform along with it. Oracle assigns Processor Factors to classes of CPUs.
Selecting a new license model depends on: type of software, customer base, and competition.
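As a small arithmetic illustration of factor-based licensing, the sketch below assumes the number of licences is the core count multiplied by a per-class factor and rounded up. The class names, factor values, and price are hypothetical, chosen only to show the calculation, and are not vendor figures.

```python
import math

# Hypothetical factors per CPU class (illustrative only, not real vendor values).
PROCESSOR_FACTORS = {"class_a": 0.5, "class_b": 0.75}


def licences_required(cores, cpu_class):
    """Licences = cores x factor for the CPU class, rounded up to a whole licence."""
    return math.ceil(cores * PROCESSOR_FACTORS[cpu_class])


def licence_cost(cores, cpu_class, price_per_licence):
    """Total cost under the assumed factor-based model."""
    return licences_required(cores, cpu_class) * price_per_licence


print(licences_required(4, "class_a"))   # 4 cores x 0.5
print(licence_cost(6, "class_b", 1000))  # 6 cores x 0.75, rounded up, times price
```

By contrast, a per-socket model of the kind Intel advocates would count one licence per processor regardless of the number of cores.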
Conclusion
Putting multiple cores in a processor made it possible to run at lower frequencies, but it introduced interesting new problems. Multicore processors are architected to keep power consumption and heat dissipation reasonable and to implement cache coherence protocols. However, many issues remain unsolved. To use a multicore processor at full capacity, the applications run on the system must be multithreaded. Relatively few applications are written with any level of parallelism and, more importantly, few programmers have the know-how to write them. The interconnection networks also need improvement.
With so many different designs, it is nearly impossible to set any standard for cache coherence or interconnections. The greatest difficulty remains in teaching parallel programming techniques (since most programmers are so versed in sequential programming) and in redesigning current applications to run optimally on a multicore system. Multicore processors are an important innovation in the microprocessor timeline. With skilled programmers capable of writing parallelized applications, multicore efficiency could be increased dramatically. In years to come we will see many improvements to these systems, which will provide faster programs and a better computing experience.
References
[1] Gordon E. Moore: Cramming More Components onto Integrated Circuits. Electronics, April 19, 1965.
[2] Brooks, D., Martonosi, M.: Dynamic Thermal Management for High-Performance Microprocessors. In: Proceedings of the 7th International Symposium on High-Performance Computer Architecture, Monterrey, Mexico, January 2001.
[3] Naveh, A., Rotem, E., Mendelson, A., Gochman, S.: Power and Thermal Management in the Intel Core Duo Processor. Intel Technology Journal, 10(2), 2006.
[4] R.M. Ramanathan: Intel Multi-Core Processors - Making the Move to Quad-Core and Beyond, 2007.
[5] Geer, D.: Chip Makers Turn to Multicore Processors, 2005.
[6] Shekhar Borkar: Thousand Core Chips - A Technology Perspective, 2007.
[7] R. Merritt: CPU Designers Debate Multi-core Future. EETimes Online, February 2008, http://www.eetimes.com/showArticle.jhtml?articleID=206105179.
[8] Faxen, K., Bengtsson, C., Brorsson, M., Grahn, H.: Multicore Computing - the State of the Art, December 3, 2008.
[9] Agarwal, A., Levy, M.: The KILL Rule for Multicore. At 44th DAC, June 2007.
[10] Goth, G.: Entering a Parallel Universe. Communications of the ACM, 53(9), 2009.
[11] Williams, E.: Software Licensing Metrics - The Challenge in a Multicore Environment. SoftSummit, 2007.
[12] H. P. Hofstee: Power Efficient Processor Architecture and The Cell Processor. HPCA, 00:258-262, 2005.
[13] D. Geer: For Programmers, Multicore Chips Mean Multiple Challenges. Computer, September 2007.
[14] M. Creeger: Multicore CPUs for the Masses. QUEUE, September 2005.
[15] T. Holwerda: Intel: Software Needs to Heed Moore's Law, http://www.osnews.com/story/17983/Intel-Software-Needs-to-Heed-Moores-Law/
[16] Jeffery A.: Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures. Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures (SPAA), 2007.