White Paper Interconnect Solutions Debugging Issues Advanced ARM CoreLink
Finding the optimal configuration options that meet the requirements of a particular system requires
complementary design tools to enable the designer to rapidly explore and correlate trade-offs in
performance, power, and area (PPA). This paper describes the challenges confronting the designer and
proposes a new tool leveraging ARM® and Cadence technology to overcome the challenges of today’s
highly integrated, multi-processor system-on-chip (SoC) designs.
Contents
Introduction
Accelerating SoC Integration with CoreLink NIC-400 and Interconnect Workbench
Performance Implications of Interconnect Choices
Interconnect Design Choices
Verifying Latency
Push-Button Testbench Generation
And There’s More
Conclusion
Introduction
The evolution of today’s system-on-chip (SoC) devices from uni-processor systems to heterogeneous
multi-processor designs has added a significant burden to the SoC designer’s job. Designers are
confronted with integrating many high-performance masters and slaves with dynamically changing traffic
profiles.
Figure 1 illustrates how functions with real-time, maximum-latency requirements compete with
high-bandwidth streaming traffic, along with CPUs that need minimum latency to reach optimum
performance. Advanced system intellectual property (IP), the “glue” that provides the interconnect tying
all of the major functional blocks together and connecting them to main memory, is required to help
solve these competing requirements. Just as each system may have its own unique set of design
challenges, system IP is, by its nature, highly configurable, allowing the designer to choose the best
configuration for their design. Advanced system IP not only allows designers to select interconnect
topologies but also places solutions such as hardware-managed cache coherency and dynamic end-to-end
quality of service at their disposal.
Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components
[Figure 1: SoC block diagram. Blocks shown: CPU (comms processor, apps processor), GPU (geometry
control, renderer, tiling), network interface, DMA controller, display controller, audio CODEC, image
transform, HD video (motion estimation, motion compensate), peripherals, interconnect, dynamic and
static memory controllers, NAND flash, buffer, and texture buffer.]
The configuration options the designer chooses need to satisfy a multi-dimensional problem affecting the
performance of each function as well as the physical size and power dissipation.
Figure 2 shows a typical SoC core, which uses the ARM CoreLink™ System IP components connected to a
Cadence® Databahn DDR controller.
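[Figure 2: Typical SoC core built from ARM CoreLink System IP components and a Cadence Databahn DDR
controller. Recovered diagram labels: Thin Link; configurable AXI4/AXI3 interfaces.]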
The sophistication of these system IP components, which is necessary to allow the designer to integrate many
functions together, provides many choices to the designer. Finding the optimal configuration options that meet the
requirements of a particular system requires complementary design tools to enable the designer to rapidly explore
and correlate trade-offs in performance, power, and area (PPA). This paper describes the challenges confronting
the designer and proposes a new tool to accelerate the integration of many SoC functions with an optimized
system IP configuration.
Adding to this high configurability, the IP also allows the user to make additional choices to help with routing
congestion and layout through a mechanism called “Thin Links.” In a complex SoC with hundreds of IP blocks,
connecting them all to the main system memory can create situations where an AMBA bus must be routed
across the chip, which is difficult for wide AMBA buses. Thin Links allow the user to create
a point-to-point AMBA connection using only a few wires, thereby alleviating the routing problem.
This connection is a user configuration choice for each interface. In fact, the NIC-400 is so configurable that ARM
provides CoreLink AMBA Designer, an interactive tool created specifically to make it easy for users to select
implementation options. Figure 3 shows an example of using AMBA Designer to configure a complex NIC-400
interconnect.
The Cadence Interconnect Workbench provides a suite of capabilities to enable this kind of “what if?”
experimentation. Let’s look at an example of the kind of analysis that Interconnect Workbench enables.
Figure 4 illustrates a bandwidth plot from a performance scenario with specific read bandwidth criteria met;
displayed USB and High-Speed I/O bandwidths are in the 100-300 MBps range. Interconnect Workbench allows us
to quickly visualize this kind of simulation running on cycle-accurate register-transfer level (RTL) models of the
interconnect using Cadence VIP for AMBA to model the masters and slaves.
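The bandwidth curve in a plot of this kind can be approximated outside the tool by summing the bytes each master moves per time window. The following is a minimal Python sketch, assuming a hypothetical list of (completion time in ns, master name, bytes) records exported from a simulation. The record format, the master names, and the window size are illustrative assumptions and do not describe an Interconnect Workbench interface.

```python
from collections import defaultdict

def bandwidth_mbps(transfers, window_ns):
    """Return {master: {window_index: MB/s}} from (time_ns, master, nbytes) records."""
    bytes_per_window = defaultdict(lambda: defaultdict(int))
    for time_ns, master, nbytes in transfers:
        bytes_per_window[master][time_ns // window_ns] += nbytes
    # Bytes per nanosecond equals GB/s, so multiply by 1000 for MB/s (1 MB = 10**6 bytes).
    return {
        master: {w: total * 1000 / window_ns for w, total in windows.items()}
        for master, windows in bytes_per_window.items()
    }

# Hypothetical records: (completion time in ns, master, bytes transferred)
transfers = [
    (100, "usb", 64), (600, "usb", 64), (900, "usb", 64),
    (1200, "usb", 128),
    (300, "hs_io", 256), (1500, "hs_io", 256),
]
result = bandwidth_mbps(transfers, window_ns=1000)
```

Comparing each master's per-window value against its required bandwidth gives the same pass/fail view that the plot provides visually.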
To prevent blocking without adding more and more physical channels, virtual channels can be defined, allowing
one virtual channel to remain clear for latency-critical masters even when another virtual channel is fully utilized.
Dynamic regulators can be inserted at the ingress to the interconnect network to prioritize traffic within a single
virtual or physical channel, thus ensuring the required quality of service is met for each master. Once an
interconnect configuration is selected, the designer needs to be able to verify its performance under load.
Verifying Latency
An important question that designers should ask is, “What is the consequence of adding an asynchronous bridge
into my architecture with respect to latency?” The graph in Figure 5 shows the latency of accesses across a
CCI-400 interconnect with and without ADB-400 asynchronous domain bridges. The top chart is the latency
distribution without bridges; the bottom chart is the distribution with bridges.
Interconnect Workbench also allows us to investigate latency through statistical distributions. Figure 6 shows a
latency distribution view of a group of simulations. The slowest transactions are easy to identify on a distribution
because they fall in the right-most buckets. The chart also shows that read and write latencies are distributed
differently, with writes completing more quickly than reads.
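The bucketing behind a latency distribution view can be sketched in a few lines. The Python below is a minimal illustration, assuming hypothetical (kind, latency in ns) records and an arbitrary 50 ns bucket width; neither is taken from the tool. It counts reads and writes per bucket and returns the transactions in the right-most occupied bucket, which are the outliers a designer would inspect first.

```python
from collections import Counter

def latency_histogram(transactions, bucket_ns):
    """Count transactions per (kind, bucket); a bucket is named by its lower bound in ns."""
    counts = Counter()
    for kind, latency_ns in transactions:
        counts[(kind, (latency_ns // bucket_ns) * bucket_ns)] += 1
    return counts

def slowest_bucket(transactions, bucket_ns):
    """Return the transactions that fall in the right-most (slowest) occupied bucket."""
    top = max(latency_ns // bucket_ns for _, latency_ns in transactions)
    return [t for t in transactions if t[1] // bucket_ns == top]

# Hypothetical sample data: (transaction kind, latency in ns)
transactions = [
    ("read", 120), ("read", 135), ("read", 210), ("read", 480),
    ("write", 60), ("write", 75), ("write", 110),
]
hist = latency_histogram(transactions, bucket_ns=50)
outliers = slowest_bucket(transactions, bucket_ns=50)
```

Keeping reads and writes as separate keys is what lets the two distributions be compared side by side.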
From the latency distribution, Interconnect Workbench provides the ability to click on a bucket and show the
transaction(s) in that bucket along with all the details, thus enabling the rapid debug of latency outliers. As shown
in Figure 7, right-clicking on the transaction details further accelerates debugging by launching the SimVision tool.
Within the tool, the simulation waveform is already configured and markers highlight the transaction of interest.
Interconnect Workbench provides a complete solution for automatically generating a UVM testbench for any ARM
Interconnect from the NIC-301™, NIC-400, and CCI-400™ CoreLink System IPs. Once a user has defined the
interconnect implementation details, AMBA Designer generates the RTL as well as an IP-XACT XML file that matches
the design. Interconnect Workbench has been architected to read this IP-XACT and automatically generate a UVM
testbench in either of the most popular high-level verification languages (HVLs): e or SystemVerilog.
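To give a feel for what reading an IP-XACT description involves, the Python sketch below lists the bus interfaces of a component and pairs each with the opposite kind of verification agent. It is an illustration only: the XML fragment is a hypothetical, heavily simplified example written against the IEEE 1685-2009 namespace, the schema version that AMBA Designer actually emits is not stated here, and the `master_vip` and `slave_vip` labels are invented placeholders rather than Cadence VIP names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified IP-XACT fragment; a generated file carries far more
# detail (bus types, port maps, parameters) and may use another schema version.
IPXACT = """<spirit:component
    xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009">
  <spirit:name>example_interconnect</spirit:name>
  <spirit:busInterfaces>
    <spirit:busInterface>
      <spirit:name>cpu_s0</spirit:name>
      <spirit:slave/>
    </spirit:busInterface>
    <spirit:busInterface>
      <spirit:name>ddr_m0</spirit:name>
      <spirit:master/>
    </spirit:busInterface>
  </spirit:busInterfaces>
</spirit:component>
"""

def list_bus_interfaces(xml_text):
    """Return [(interface name, mode)] where mode is 'master', 'slave', or 'other'."""
    root = ET.fromstring(xml_text)
    # Take the namespace URI from the root tag, which looks like "{uri}component".
    ns = {"s": root.tag[1:].split("}")[0]}
    ports = []
    for bus in root.findall("s:busInterfaces/s:busInterface", ns):
        name = bus.findtext("s:name", namespaces=ns)
        mode = next(
            (m for m in ("master", "slave") if bus.find(f"s:{m}", ns) is not None),
            "other",
        )
        ports.append((name, mode))
    return ports

ports = list_bus_interfaces(IPXACT)
# A slave port on the interconnect is driven by a master agent, and vice versa.
opposite = {"slave": "master_vip", "master": "slave_vip"}
agents = {name: opposite[mode] for name, mode in ports if mode in opposite}
```

Deriving the namespace from the root element keeps the sketch independent of the exact schema version declared in the file.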
In a typical SoC, a mix of components makes up the “glue” connecting all of the major IP together. Reading an
IP-XACT description of these system IP cores enables Interconnect Workbench to provide performance analysis
capabilities for not only the interconnect components but also the cycle-accurate models of the DDR controller.
Figure 8 shows how Interconnect Workbench might be used to generate a testbench for the core of an SoC and
the included DDR controller.
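[Figure 8: Interconnect Workbench generating a testbench for an SoC core and its DDR controller.
Recovered diagram labels: AXI4 VIP; Plug-in.]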
Just as Interconnect Workbench can help with non-cached systems, it can be used with the AMBA 4
cache-coherent protocols. In a cached system using, for example, the ARM CCI-400 cache-coherent interconnect,
traffic from an I/O master can share the L2 cache of either of two processor clusters connected via the ACE
interfaces using snoop commands. If a transaction’s data is cached in one of these clusters, there is a “snoop hit.”
If the corresponding data is not stored in these caches, the transaction is eventually forced to go to
main memory, resulting in a “snoop miss.” The difference in latency between hits and misses is significant, and it
is important for an SoC designer to characterize the behavior of the system under differing loads
and conditions. Interconnect Workbench is well suited to this kind of analysis. Figure 9 shows a
latency distribution for a CCI-400 simulation with data split by hits and misses.
As can be seen, the expected lower latency for hits is validated by the analysis. The value of visualization is that
it is easy to see if hits were slower than expected or if misses were quicker, either of which might point to a
functional problem or perhaps an error in the scenario.
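The same sanity check can be automated on exported data. The Python below is a minimal sketch, assuming hypothetical (snoop result, latency in ns) records; the sample values are invented for illustration. It splits latencies by hit and miss and tests whether the median hit latency is below the median miss latency.

```python
from statistics import median

def split_by_snoop(transactions):
    """Group latencies (ns) by snoop result: 'hit' or 'miss'."""
    groups = {"hit": [], "miss": []}
    for result, latency_ns in transactions:
        groups[result].append(latency_ns)
    return groups

def hits_faster_than_misses(groups):
    """Return True when the median hit latency is below the median miss latency."""
    return median(groups["hit"]) < median(groups["miss"])

# Hypothetical sample data: (snoop result, latency in ns)
samples = [
    ("hit", 40), ("hit", 45), ("hit", 52),
    ("miss", 130), ("miss", 160), ("miss", 145),
]
groups = split_by_snoop(samples)
ok = hits_faster_than_misses(groups)
```

A `False` result would be the prompt to look for a functional problem or an error in the scenario. The median is used here so that a single outlier does not decide the comparison; that choice is an assumption of this sketch.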
Conclusion
The increase in complexity of SoCs based on heterogeneous, multi-core systems can be addressed by advanced
system IP. Design integration can be accelerated with appropriate tools that simplify choice of architectures, clock
schemes, power domains, memory sizes, cache sizes, QoS mechanisms, and other configuration options. ARM’s
AMBA Designer provides a quick way to generate CoreLink interconnect designs from a large set of configurable
options. The Cadence Interconnect Workbench is a valuable tool for measuring and comparing different
architectures and configurations in cycle-accurate RTL simulations for a variety of scenarios. Understanding how these
numerous and varying IP cores behave together in a system when pushed to their limits is key to ensuring that a
new design delivers on its expected performance targets.
Cadence Design Systems enables global electronic design innovation and plays an essential role in the
creation of today’s electronics. Customers use Cadence software, hardware, IP, and expertise to design
and verify today’s mobile, cloud and connectivity applications. www.cadence.com
© 2013 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence and the Cadence logo are registered trademarks of Cadence Design
Systems, Inc. in the United States and other countries. ARM and AMBA are registered trademarks and ACE, AHB, APB, AXI, AXI4, big.LITTLE,
CCI-400, CoreLink, Cortex, NIC-301, and NIC-400 are trademarks of ARM Ltd. 1496 10/13 CY/DM/PDF