RapidIO: The Embedded System Interconnect / Edition 1 available in Hardcover
By Sam Fuller
John Wiley & SonsCopyright © 2005 John Wiley & Sons, Ltd
All rights reserved.
Chapter One
The Interconnect Problem
This chapter discusses some of the motivations that led to the development of the RapidIO interconnect technology. It examines the factors that have driven the establishment of new standards for interconnects in embedded systems, and it contrasts the technical approach required for in-system communications with that of existing LAN and WAN technologies and legacy bus technologies.
The RapidIO interconnect technology was developed to ease two transitions that are occurring in the embedded electronics equipment industry. The first transition is technology based; the second is market based. The technology transition is a move towards high-speed serial buses, operating at signaling speeds above 1 GHz, as a replacement for the traditional shared bus technologies that have been used for nearly 40 years to connect devices together within systems. This transition is driven by the increasing performance capabilities and commensurate increasing bandwidth requirements of semiconductor-based processing devices, and by device-level electrical issues raised by semiconductor process technologies with feature sizes at 130 nm and below. The second transition is a move towards the use of standards-based technologies.
1.1 PROCESSOR PERFORMANCE AND BANDWIDTH GROWTH
Figure 1.1 shows the exponential growth of processor performance over the last 30 years. It also depicts the slower growth of processor bus frequency over the same period. The MHz scale of the chart is logarithmic, and the difference between core CPU performance, represented by the clock frequency, and the available bandwidth to the CPU, represented by the bus frequency, continues to grow. The use of cache memory and more advanced processor microarchitectures has helped to reduce this growing gap between CPU performance and available bus bandwidth. Increasingly, processors are being developed with large integrated cache memories and directly integrated memory controllers. However, multiple levels of on- and off-chip cache memory and directly integrated memory controllers, while useful for reducing the gap between a processor's data demands and the ability of its buses to provide the data, do little to support the connection of the processor to external peripheral devices or the connection of multiple processors together in multiprocessing (MP) systems.
In addition to the increasing performance of processors, the need for higher levels of bus performance is driven by two other key factors: first, the need for higher raw data bandwidth to support more demanding peripheral devices; second, the need for more system concurrency. Overall system bandwidth requirements have also increased because of the growing use of DMA, smart processor-based peripherals and multiprocessing in systems.
Multiprocessing is increasingly seen as a viable approach to adding more processing capability to a system. Historically multiprocessing was used only in the very highest end computing systems and typically at great cost. However, the continuing advance of semiconductor process technology has made multi-processing a more mainstream technology and its use can offer advantages beyond higher processing performance.
Figure 1.2 is a photograph of a multiprocessing computer system. This computer system uses 76 microprocessors connected together with RapidIO to solve very complex signal processing problems.
Multiprocessing can also be used to reduce cost while achieving higher performance levels. Pricing of processors is often significantly lower for lower-speed parts. The use of multiprocessing may also reduce overall system power dissipation at a given performance point. This occurs because it is often possible to operate a processor at a reduced frequency and achieve a greatly reduced power dissipation. For example, the Motorola 7447 processor has a rated maximum power dissipation of 11.9 W at an operating frequency of 600 MHz.
The same processor has a maximum power dissipation of 50 W at an operating frequency of 1000 MHz. If the processing work to be done can be shared by multiple processors, overall power dissipation can be reduced. In this case reducing the frequency by 40% reduces maximum power dissipation by 76%. When performance per watt is an important metric, multiprocessing of lower performance processors should always be considered as part of a possible solution.
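The power arithmetic above can be checked in a few lines. The sketch below uses only the 7447 figures quoted in the text and treats aggregate clock frequency as a crude stand-in for delivered performance, which is of course a simplification:

```python
# Performance-per-watt comparison using the Motorola 7447 figures above.
# Aggregate MHz is used as a crude performance proxy (a simplification).

def mhz_per_watt(freq_mhz: float, power_w: float) -> float:
    return freq_mhz / power_w

single = mhz_per_watt(1000, 50.0)        # one part at 1000 MHz, 50 W
dual = mhz_per_watt(2 * 600, 2 * 11.9)   # two parts at 600 MHz, 11.9 W each

print(f"single 1000 MHz part: {single:.1f} MHz/W")   # 20.0 MHz/W
print(f"two 600 MHz parts   : {dual:.1f} MHz/W")     # 50.4 MHz/W
print(f"power drop per part : {(1 - 11.9 / 50.0) * 100:.0f}%")  # 76%
```

Under this proxy the two slower parts deliver roughly two and a half times the performance per watt of the single faster part.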
Operating system technology has also progressed to the point where multiprocessing is easily supported as well. Leading embedded operating systems such as QNX, OSE, and Linux are all designed to easily support multiprocessing in embedded environments.
While these previous obstacles to multiprocessing have been reduced or removed, the processor interconnect has increasingly become the main limiting factor in the development of multiprocessing systems. Existing multiprocessor bus technologies restrict the shared bandwidth available to a group of processors. For multiprocessing to be effective, the processors in a system must be able to communicate with each other at high bandwidth and low latency.
1.3 SYSTEM OF SYSTEMS
Traditional system design has busied itself with the task of connecting processors together with peripheral devices. In this approach there is typically a single processor and a group of peripherals attached to it. The peripherals act as slave devices with the main processor being the central point of control. This master/slave architecture has been used for many years and has served the industry well.
With the increasing use of system on a chip (SoC) integration, the task of connecting processors to peripherals has become the task of the SoC developer. Peripheral functionality is increasingly being offered as part of an integrated device rather than as a standalone component.
Most microprocessors or DSPs designed for the embedded market now contain a significant amount of integration. L2 cache, memory controllers, Ethernet and PCI interfaces, all formerly discrete functions, have found their way onto the integrated processor die. This integration moves the system developer's task up a level, to one of integrating several SoC devices together in the system. Here a peer-to-peer connection model is often more appropriate than a master/slave model.
Figure 1.3 shows a block diagram of the Motorola PowerQUICC III communications processor. This processor offers a wealth of integrated peripheral functionality, including multiple Ethernet ports, communications-oriented ATM and TDM interfaces, a PCI bus interface, a DDR SDRAM memory controller and a RapidIO controller for system interconnect functionality. It is a good example of a leading-edge system-on-a-chip device.
1.4 PROBLEMS WITH TRADITIONAL BUSES
The connections between processors and peripherals have traditionally been shared buses and often a hierarchy of buses (Figure 1.4). Devices are placed at the appropriate level in the hierarchy according to the performance level they require. Low-performance devices are placed on lower-performance buses, which are bridged to the higher-performance buses so they do not burden the higher-performance devices. Bridging may also be used to address legacy interfaces.
Traditional external buses used on more complex semiconductor processing devices such as microprocessors or DSPs are made up of three sets of pins, which are soldered to wire traces on printed circuit boards. These three categories of pins or traces are address, data and control. The address pins provide unique context information that identifies the data. The data is the information that is being transferred and the control pins are used to manage the transfer of data across the bus.
For a very typical bus on a mainstream processor there will be 64 pins dedicated to data, with an additional eight pins for parity protection on the data pins. There will be 32-40 pins dedicated to address with 4 or 5 pins of parity protection on the address pins and there will be approximately another 30 pins for control signaling between the various devices sharing the bus. This will bring the pin count for a typical bus interface to approximately 150. Because of the way that semiconductor devices are built there will also be a large complement of additional power and ground pins associated with the bus. These additional pins might add another 50 pins to the bus interface pin requirement, raising the total pin count attributed to the bus alone to 200. This 200 pin interface might add several dollars to the packaging and testing cost of a semiconductor device. The 200 wire traces that would be required on the circuit board would add cost and complexity there as well. If the bus needed to cross a backplane to another board, connectors would need to be found that would bridge the signals between two boards without introducing unwanted noise, signal degradation and cost. Then, if you assume that your system will require the connection of 20 devices to achieve the desired functionality, you begin to understand the role that the bus can play in limiting the functionality and feasibility of complex embedded systems.
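The pin tally described above can be sketched as a simple budget. The per-group counts are those quoted in the text, taking the upper end of the address range:

```python
# Back-of-the-envelope pin budget for the shared bus described above.
# Per-group counts are from the text; power/ground is the rough
# estimate given there.

bus_pins = {
    "data":        64,
    "data_parity":  8,
    "address":     40,   # text gives a 32-40 pin range; upper end used
    "addr_parity":  5,
    "control":     30,
}

signal_pins = sum(bus_pins.values())   # ~150 signal pins
power_ground = 50                      # additional supply/return pins
total = signal_pins + power_ground     # ~200 pins per bus interface

print(f"signal pins: {signal_pins}")   # 147
print(f"total pins : {total}")         # 197
```

Multiply that total by the 20 devices in the hypothetical system and the scale of the routing and packaging burden becomes clear.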
For the sake of simplicity, we will discuss peak data bandwidth as opposed to sustained data bandwidth, which is often quite a bit lower than peak. The peak data bandwidth of a bus is the product of the bus frequency and the data bus width. As an example, the PCI-X bus is the highest-performance general-purpose peripheral bus available. If we assume that we are operating the PCI-X bus at 133 MHz with the 64-bit data path, then the peak data bandwidth is 133 MHz × 64 bits ≈ 8.5 Gbit/s, or approximately 1 Gbyte/s.
To increase the performance of the interface beyond 1 Gbyte/s we can either increase the frequency or we can widen the data paths. There are versions of PCI-X defined to operate at 266 and 533 MHz. Running at these speeds the PCI-X bus can support only one attached device.
When compared with the original bus interface on the Intel 8088 processor used by IBM in the first IBM PC, we find that available bus performance has increased significantly. The original 8088 processor had an 8-bit-wide data bus operating at 4.77 MHz. The peak data bandwidth was therefore 38.2 Mbit/s, or 4.77 Mbyte/s. Compared with this bus, the current PCI-X peripheral bus has widened by a factor of 8 and its signaling speed has increased by a factor of 28, for an overall improvement in peak bandwidth of roughly 220-fold. Owing to improvements in bus utilization, the improvement in actual bandwidth over the last 20 years has been even more dramatic than this, as has the improvement in actual processor performance.
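The comparison can be reproduced directly from the peak-bandwidth formula (frequency × width), using the figures as quoted in the text:

```python
# Peak bandwidth = bus frequency x data width, per the formula above.

def peak_mbyte_s(freq_mhz: float, width_bits: int) -> float:
    return freq_mhz * width_bits / 8  # Mbyte/s

pcix = peak_mbyte_s(133, 64)     # PCI-X: 133 MHz, 64-bit
i8088 = peak_mbyte_s(4.77, 8)    # 8088 bus: 4.77 MHz, 8-bit

print(f"PCI-X peak   : {pcix:.0f} Mbyte/s")    # 1064 (~1 Gbyte/s)
print(f"8088 peak    : {i8088:.2f} Mbyte/s")   # 4.77
print(f"width factor : {64 // 8}x")            # 8x
print(f"clock factor : {133 / 4.77:.0f}x")     # 28x
print(f"overall gain : {pcix / i8088:.0f}x")   # 223x
```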
While the growth in bus performance over the last several years has been impressive there are many indications that a new approach must be taken for it to continue. Here are four important reasons why this is the case.
1.4.1 Bus Loading
Beyond 133 MHz it becomes extremely difficult to support more than two devices on a bus. Each additional device places capacitive loading on the bus. This capacitance represents charge that must be supplied or drained to reach the desired signal level; the additional capacitance slows the rise and fall times of the signals.
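A rough RC model illustrates how quickly added loads eat into the timing budget. The driver impedance and capacitance values below are illustrative assumptions, not figures from the text:

```python
# Each attached device adds input capacitance; the 10-90% rise time of
# an RC-limited signal grows as t_r ~= 2.2 * R * C.
# R_DRIVER and both capacitances are assumed, illustrative values.

R_DRIVER = 25.0         # ohms, assumed driver output impedance
C_PER_DEVICE = 10e-12   # farads, assumed input capacitance per device
C_TRACE = 15e-12        # farads, assumed board trace capacitance

for devices in (2, 4, 8):
    c_total = C_TRACE + devices * C_PER_DEVICE
    t_rise = 2.2 * R_DRIVER * c_total  # 10-90% rise time, seconds
    print(f"{devices} devices: rise time {t_rise * 1e9:.2f} ns")
```

Under these assumed values, eight loads push the rise time past 5 ns, approaching the 7.5 ns cycle time of a 133 MHz bus and leaving little margin for settling and skew.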
1.4.2 Signal Skew
Because a traditional bus is a collection of parallel wires with signal valid times referenced to a clock signal, there are limits to how much skew can exist between the transition of the clock and the transition of a signal. At higher speeds, the length of the trace as well as the signal transition times out of and into the devices themselves can limit the speed at which the bus may be clocked. For a 133 MHz bus the cycle time is 7.5 ns, and the propagation delay in FR4 printed circuit board material is approximately 180 ps/inch. A quarter cycle of the bus (1875 ps) therefore corresponds to approximately 10 inches of trace.
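The trace-length budget worked through above can be expressed directly, using the figures from the text:

```python
# Trace-length budget at 133 MHz: a quarter of the cycle time divided
# by the FR4 propagation delay quoted above (~180 ps/inch).

BUS_MHZ = 133
PROP_PS_PER_INCH = 180

cycle_ps = 1e6 / BUS_MHZ             # ~7519 ps (7.5 ns)
quarter_ps = cycle_ps / 4            # ~1880 ps
max_trace_inches = quarter_ps / PROP_PS_PER_INCH

print(f"cycle time : {cycle_ps:.0f} ps")
print(f"max trace  : {max_trace_inches:.1f} inches")  # ~10 inches
```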
1.4.3 Expense of Wider Buses
The traditional solution of widening the buses has reached the point of diminishing returns. 128-bit-wide data buses have not been well accepted in the industry, despite their use on several processors. Wider buses further reduce the frequency at which the buses may run. They also increase product cost by increasing the package size and pin count requirements of devices, and they may increase system costs by forcing more layers into the printed circuit boards to carry all of the signal traces.
1.4.4 Problems with PCI
PCI is a very common peripheral bus used in computing systems. PCI plays a limited role in embedded systems for attachment of peripheral devices. PCI introduces several additional performance constraints to a system.
1. PCI doesn't support split transactions. This means that the bus is occupied and blocked for other uses for the entire time a transaction is being performed. When communicating with slower peripheral devices this could be for a relatively long time.
2. The length of a PCI transaction isn't known a priori. This makes it difficult to size buffers and often leads to bus disconnects. Wait states can also be added at any time.
3. Transactions targeting main memory typically require a snoop cycle to assure data coherency with processor caches.
4. Bus performance is reduced to the least common denominator of the peripherals that are attached. Typically this is 33 MHz, providing peak transfer rates of only 266 Mbyte/s and actual sustained transfer rates less than 100 Mbyte/s.
1.5 THE MARKET PROBLEM
Among the products of the various companies that together supply electronic components to the embedded marketplace, there is no single ubiquitous bus solution that may be used to connect all devices together. Many vendors offer proprietary bus or interconnect technology on their devices. This creates a market for glue chips that are used to bridge the various devices together to build systems. Common glue chip technologies are ASIC and FPGA devices, with the choice of solution typically guided by economic considerations.
The number of unique buses or interconnects in the system increases the complexity of the system design as well as the verification effort. In addition to the device buses, system developers often also develop their own buses and interconnect technologies because the device buses do not offer the features or capabilities required in their systems.
The embedded market is different in character from the personal computer market, where a single platform architecture has been stretched to meet the needs of notebook, desktop and server applications. The embedded equipment market is also generally older, with legacy telecommunications system architectures stretching back forty years. The technical approaches taken for embedded equipment reflect the availability of components, historical system architectures and competitive pressures. The resulting system designs are quite different from that of a personal computer.
The embedded market also does not depend as heavily on instruction set architecture (ISA) compatibility, favoring architectures such as ARM, PowerPC and MIPS over the x86 architecture that is predominant on the desktop.
Despite the disparate technical problems being solved by embedded equipment manufacturers and the variety of components and architectures available to produce solutions, there is still a desire for the use of standard interconnects to simplify the development task, reduce cost and speed time to market. This desire for standard embedded system interconnects is the primary impetus behind the development of the RapidIO interconnect technology.
1.6 RAPIDIO: A NEW APPROACH
The RapidIO interconnect architecture is an open standard which addresses the needs of a wide variety of embedded infrastructure applications. Applications include interconnecting microprocessors, memory, and memory mapped I/O devices in networking equipment, storage subsystems, and general purpose computing platforms.
This interconnect is intended primarily as an intra-system interface, allowing chip-to-chip and board-to-board communications at performance levels ranging from 1 to 60 Gbit/s.
Two families of RapidIO interconnects are defined: a parallel interface for high-performance microprocessor and system connectivity and a serial interface for serial backplane, DSP and associated serial control plane applications. The serial and parallel forms of RapidIO share the same programming models, transactions, and addressing mechanisms.
Supported programming models include basic memory-mapped I/O transactions, port-based message passing, and globally shared distributed memory with hardware-based coherency. RapidIO also offers very robust error detection and provides a well-defined hardware- and software-based architecture for recovering from and reporting transmission errors.
The RapidIO interconnect is defined as a layered architecture which allows scalability and future enhancements while maintaining backward compatibility.
Table of Contents
Preface.
1 The Interconnect Problem.
1.1 Processor Performance and Bandwidth Growth.
1.3 System of Systems.
1.4 Problems with Traditional Buses.
1.5 The Market Problem.
1.6 RapidIO: A New Approach.
1.7 Where Will it be Used?
1.8 An Analogy.
2 RapidIO Technology.
2.2 The Specification Hierarchy.
2.3 RapidIO Protocol Overview.
2.4 Packet Format.
2.5 Transaction Formats and Types.
2.6 Message Passing.
2.7 Globally Shared Memory.
2.8 Future Extensions.
2.9 Flow Control.
2.10 The Parallel Physical Layer.
2.11 The Serial Physical Layer.
2.12 Link Protocol.
2.13 Maintenance and Error Management.
2.15 Operation Latency.
3 Devices, Switches, Transactions and Operations.
3.1 Processing Element Models.
3.2 I/O Processing Element.
3.3 Switch Processing Element.
3.4 Operations and Transactions.
4 I/O Logical Operations.
4.2 Request Class Transactions.
4.3 Response Class Transactions.
4.4 A Sample Read Operation.
4.5 Write Operations.
4.6 Streaming Writes.
4.7 Atomic Operations.
4.8 Maintenance Operations.
4.9 Data Alignment.
5 Messaging Operations.
5.2 Message Transactions.
5.3 Mailbox Structures.
5.4 Outbound Mailbox Structures.
6 System Level Addressing in RapidIO Systems.
6.1 System Topology.
6.2 Switch-based Systems.
6.3 System Packet Routing.
6.4 Field Alignment and Definition.
6.5 Routing Maintenance Packets.
7 The Serial Physical Layer.
7.2 Control Symbols.
7.3 PCS and PMA Layers.
7.4 Using the Serial Physical Layer.
7.5 Transaction and Packet Delivery Ordering Rules.
7.6 Error Detection and Recovery.
7.7 Retimers and Repeaters.
7.8 The Electrical Interface.
8 Parallel Physical Layer Protocol.
8.1 Packet Formats.
8.2 Control Symbol Formats.
8.3 Control Symbol Transmission Alignment.
8.4 Packet Start and Control Symbol Delineation.
8.5 Packet Exchange Protocol.
8.6 Field Placement and Definition.
8.7 Link Maintenance Protocol.
8.8 Packet Termination.
8.9 Packet Pacing.
8.10 Embedded Control Symbols.
8.11 Packet Alignment.
8.12 System Maintenance.
8.13 System Clocking Considerations.
8.14 Board Routing Guidelines.
9 Interoperating with PCI Technologies.
9.1 Address Map Considerations.
9.2 Transaction Flow.
9.3 PCI-X to RapidIO Transaction Flow.
9.4 RapidIO to PCI Transaction Mapping.
9.5 Operation Ordering and Transaction Delivery.
9.6 Interactions with Globally Shared Memory.
9.7 Byte Lane and Byte Enable Usage.
9.8 Error Management.
10 RapidIO Bringup and Initialization Programming.
10.1 Overview of the System Bringup Process.
10.2 System Application Programming Interfaces.
10.3 System Bringup Example.
11 Advanced Features.
11.1 System-level Flow Control.
11.2 Error Management Extensions.
11.3 Memory Coherency Support.
11.4 Multicasting Transactions in RapidIO.
11.5 Multicasting Symbols.
12 Data Streaming Logical Layer (Chuck Hill).
12.2 Type 9 Packet Format (Data Streaming Class).
12.3 Virtual Streams.
12.4 Configuring Data Streaming Systems.
12.5 Advanced Traffic Management.
12.6 Using Data Streaming.
13 Applications of the RapidIO Interconnect Technology.
13.1 RapidIO in Storage Systems.
13.2 RapidIO in Cellular Wireless Infrastructure (Alan Gatherer and Peter Olanders).
13.3 Fault-tolerant Systems and RapidIO (Victor Menasce).
14 Developing RapidIO Hardware (Richard O’Connor).
14.2 Implementing a RapidIO End Point.
14.3 Supporting Functions.
14.4 Implementing a RapidIO Switch.
15 Implementation Benefits of the RapidIO Interconnect Technology in FPGAs (Nupur Shah).
15.1 Building the Ecosystem.
15.2 Advances in FPGA Technology.
15.3 Multiprotocol Support for the Embedded Environment.
15.4 Simple Handshake.
15.5 Low Buffering Overhead.
15.6 Efficient Error Coverage.
16 Application of RapidIO to Mechanical Environments (David Wickliff).
16.1 Helpful Features for Mechanical Environments.
16.2 Channel Characteristics.
16.3 Industry Standard Mechanical Platforms Supporting RapidIO.
Appendix A: RapidIO Logical and Transport Layer Registers.
A.1 Reserved Register and Bit Behavior.
A.2 Capability Registers (CARs).
A.3 Command and Status Registers (CSRs).
A.4 Extended Features Data Structure.
Appendix B: Serial Physical Layer Registers.
B.1 Generic End Point Devices.
B.2 Generic End Point Devices: Software-assisted Error Recovery Option.
Appendix C: Parallel Physical Layer Registers.
C.1 Generic End Point Devices.
C.2 Generic End Point Devices: Software-assisted Error Recovery Option.
C.3 Switch Devices.
Appendix D: Error Management Extensions Registers.
D.1 Additions to Existing Registers.
D.2 New Error Management Register.