Bringing HPC Technology to Mil-Aero, Embedded Deployment



Embedded HPEC systems are following the commercial HPC trend towards Intel x86 architectures running Linux, but deployed on VITA’s rugged OpenVPX form factor with RapidIO interconnect fabrics.

The five hundred fastest computers in the world, based on the Linpack benchmark, are featured on a website called top500.org.  Recently, 91% of the commercial high-performance computing (HPC) systems on the list were running Linux, nearly half (44%) were using Ethernet, and over 80% were using an x86 architecture CPU.  These systems are used in scientific research, complex simulation, and many other computationally intensive applications.  It’s a fair bet that these are the architectures integrators of high-end radar, SIGINT and other sensor systems would be using if their application requirements were not SWaP- and cost-constrained.

In contrast to these “big iron” systems, for many years embedded vendors and system integrators built large, distributed systems around niche architectures such as PowerPC, real-time operating systems, and the not-yet-mainstream RapidIO serial fabric.  Sensor modes developed for one platform could not be used on any other.  Software was platform-specific and difficult to port and maintain, which constrained innovation.  Different platforms could not talk to each other, so data sharing was difficult, resulting in lost opportunities to take advantage of actionable information.

But Intel’s investment in AVX changed that.  Advanced Vector Extensions (AVX) is an extension to the x86 instruction set architecture that makes the x86 Single Instruction, Multiple Data (SIMD) engine suitable for floating point-intensive calculations in multimedia, scientific and financial applications.  In short, Intel-based CPUs are now more than suitable for high-speed, floating point-intensive processing.  In addition, the OpenFabrics Alliance has created open source software that, along with development by Curtiss-Wright, allows RDMA-based Ethernet software layers to interface with the RapidIO-based embedded boards deployed in many SWaP-constrained sensor platforms.
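To make the SIMD point concrete, here is a minimal sketch (a generic illustration, not drawn from any particular sensor application) that uses 256-bit AVX intrinsics to compute y = a*x + y on eight single-precision floats per iteration; the assumption that the array length is a multiple of eight is made purely for brevity.

/* Sketch: scale-and-add (y = a*x + y) over 8 floats at a time with
 * 256-bit AVX intrinsics.  Compile with:  gcc -mavx avx_sketch.c */
#include <immintrin.h>
#include <stddef.h>
#include <stdio.h>

/* Assumes n is a multiple of 8 for brevity. */
static void saxpy_avx(float a, const float *x, float *y, size_t n)
{
    __m256 va = _mm256_set1_ps(a);                 /* broadcast a to 8 lanes */
    for (size_t i = 0; i < n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);        /* load 8 floats of x */
        __m256 vy = _mm256_loadu_ps(y + i);        /* load 8 floats of y */
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);               /* store 8 results */
    }
}

int main(void)
{
    float x[16], y[16];
    for (int i = 0; i < 16; i++) { x[i] = (float)i; y[i] = 1.0f; }
    saxpy_avx(2.0f, x, y, 16);
    printf("y[15] = %f\n", y[15]);                 /* expect 31.0 */
    return 0;
}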

HPEC is Embedded HPC
With this newfound performance, the large node-count parallel computing systems that are specifically oriented to sensor computing are moving away from PowerPC to Intel.  This gives the application developer access to a broad software ecosystem and opens up a whole new set of possibilities for open architecture development.

The shift to Intel CPUs also makes it easier for sensor computing architectures to use Linux.  While it’s true that Linux runs on practically every processor architecture known (and probably some unknown), the marriage of Intel and Linux provides the most seamless path to adopting software components developed for HPC.  The vast majority of HPC systems run Linux on Intel, and the majority of open source projects also focus on this architecture.  With Linux on Intel as the basis of OpenVPX computing, back-end sensor processing that needs to scale to many nodes can take advantage of commercial HPC’s software ecosystem (Figure 1).

Figure 1: RapidIO enables a cross-platform ecosystem composed of various CPU/GPU vendors, software, and sensors.

The process of adapting HPC technologies to the embedded space has come to be described as high-performance embedded computing (HPEC).  Several vendors in COTS computing, including Curtiss-Wright, use the term HPEC to mean embedded HPC.  Just as HPC is synonymous with the historical term “supercomputing,” HPEC systems are the SWaP-constrained variant of supercomputers.  In the defense computing market, the highest performing OpenVPX systems, from vendors like Curtiss-Wright, fit 28 Intel CPUs (112 cores) into a 16-slot chassis, interconnected with a 224 GB/sec dual-star system fabric (Figure 2).  But it’s not only about CPUs, buses and interconnects: HPEC is about being able to run the same software that is used in HPC.

Figure 2: Curtiss-Wright showcases a 224 GB/s dual-star fabric interconnecting 28 Intel CPUs (112 cores) in a 16-slot chassis.

Fabric Discontinuity – Software Continuity
HPC is dominated by Ethernet and InfiniBand, while HPEC 6U OpenVPX computing has been and continues to be dominated by RapidIO.  This apparent discontinuity has been one of the major roadblocks to bringing HPC technologies to the HPEC world, as the fabric has traditionally had a major impact on software architecture.

The first question to consider is why stick with RapidIO in the face of other reasonably good options.  The answer is simple: RapidIO dominates telecommunications DSP computing, which faces many of the same constraints as military DSP.  Even better, RapidIO is backed by a volume commercial market.  IDT, the leading RapidIO switch vendor, recently announced that it has shipped 2.5 million RapidIO switches.  RapidIO has a dominant position in the DSP processing that is essential to 3G and 4G wireless base stations, and it has captured virtually 100% of the 3G market in China, the fastest growing telecom market.  To put it another way, when you talk on your cell phone, there is something like a 90% chance that the bits representing your voice are at some point transmitted between two DSPs over a RapidIO link.

There are a number of reasons why RapidIO makes sense in the context of HPEC OpenVPX computing:

  • It supports distributed backplanes that don’t use a central switch, saving SWaP and cost.
  • It runs twice as fast as 10 Gb Ethernet, unlocking more processor performance.
  • It’s robust, so less CPU time is devoted to ensuring packet delivery.
  • It has an OpenVPX roadmap.  While InfiniBand is an excellent choice in HPC, it is a point technology in OpenVPX HPEC.  Unlike alternatives such as Ethernet and RapidIO, InfiniBand is not anticipated (per simulation) to run reliably at 10 GHz over existing OpenVPX technology.  It would require a connector change, which is a fairly involved and slow-moving process for an organization like VITA.

There were two major challenges in getting RapidIO working in the Intel environment.  The first was a classic interconnect problem: PowerPC processors supported RapidIO natively, but Intel processors do not, so a bridge was needed.  The IDT Tsi721 provides this critical piece of technology, converting between PCIe and RapidIO with full line-rate bridging at 20 Gbaud.  Using the Tsi721, designers can develop heterogeneous systems that leverage the peer-to-peer networking performance of RapidIO while using multiprocessor clusters that may only be PCIe-enabled.  Applications that need to move large amounts of data efficiently, without processor involvement, can use the Tsi721’s full line-rate block DMA and messaging engines.

The second major challenge related to RapidIO was software.  RapidIO isn’t used in HPC, so it doesn’t run the same software as the large cluster-based systems in the TOP500 that use fabrics like Ethernet and InfiniBand.  InfiniBand vendors encountered these same market constraints while trying to grow beyond their niche; it’s hard to “fight” Ethernet.  However, Ethernet wasn’t appropriate for the highest performance HPC systems because of the CPU and/or silicon overhead associated with TCP offload.  The answer came in the form of new protocols and new software.

OpenFabrics Alliance
The OpenFabrics Alliance (OFA) was formed to promote Remote Direct Memory Access (RDMA) functionality that allows Ethernet silicon to move packets from the memory of one compute node to the memory of another with very little CPU intervention.  There are competing protocols to do this, but wisely, the OFA created a unified software layer called OFED, which is supported by Intel, Chelsio, Mellanox and the other members of the Ethernet RDMA ecosystem.  OFED is used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing.

The OpenFabrics Enterprise Distribution (OFED™) is open-source software for RDMA and kernel-bypass applications.  One of the things that traditionally slowed Ethernet down and wasted CPU cycles was the need to copy a packet payload numerous times before it was shipped out the Ethernet interface (Figure 3).  RDMA eliminates this unnecessary copying, so a packet can be transferred from one application to another across the fabric with minimal CPU impact (single-digit CPU utilization).  The programming interface OFED exposes is known as “verbs”; for readers not deeply into software, verbs can be thought of as an API, a uniform software interface that provides application portability.
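To make the idea of verbs as an API concrete, the sketch below (a generic illustration using the standard libibverbs interface distributed with OFED, not code from the Curtiss-Wright port) opens the first RDMA-capable device it finds and registers a buffer for zero-copy access; queue pair setup and the actual RDMA write are left as comments because they require a connected peer.

/* Minimal OFED verbs sketch: open an RDMA device and register a buffer
 * for zero-copy transfers.  Compile with:  gcc verbs_sketch.c -libverbs
 * Queue-pair setup and the RDMA write itself are omitted for brevity. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB payload buffer (illustrative) */

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    /* Any OFED provider (RDMA Ethernet, InfiniBand, or a RapidIO port
     * such as the one described in this article) is enumerated the same way. */
    printf("using device: %s\n", ibv_get_device_name(devs[0]));

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    void *buf = malloc(BUF_SIZE);
    memset(buf, 0, BUF_SIZE);

    /* Register (pin) the buffer so the fabric hardware can read and write it
     * directly, with no intermediate kernel copies. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    printf("buffer registered: lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);

    /* ... create a completion queue and queue pair, exchange rkey/address
     *     with the peer, then post IBV_WR_RDMA_WRITE work requests ... */

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}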

Figure 3: OpenFabrics Enterprise Distribution (OFED) is open source RDMA that efficiently moves data from one application across a fabric to another application, with minimal CPU overhead. Contrast this to the multi-copy process of Ethernet, as shown here.

OFED over RapidIO
Our company has ported OFED to the IDT Tsi721 bridge, which we believe represents the first time OFED has run on the industry’s de facto standard IDT implementation of RapidIO (i.e., without requiring non-mass-market, proprietary FPGA IP controlled by a single vendor).  This makes data movement over the RapidIO fabric indistinguishable from any other fabric that uses OFED, such as Ethernet, except that RapidIO operates at about twice the speed of 10GbE.  OFED support on a RapidIO system enables it to run a broad set of open source software components supported by members of the OFA and the open source community.

One of the most important of these software components is the Message Passing Interface (MPI) middleware, supported by the open source project Open MPI.  MPI is a message-passing library interface specification and a key enabling technology for HPEC systems.  It is developed by the MPI Forum, a large developer community comprised of industry and research organizations.  MPI is a portable, language-independent interface used to share data among distributed processors, and it has become a de facto standard for communication among high-performance compute clusters, used by many of the TOP500 most powerful computers in the world.  (A minimal example follows the list below.)

MPI includes the following:

  • Point-to-point communications
  • Collective operations
  • Process groups
  • Process topologies
  • Environmental management and inquiry
  • Process creation and management
  • Parallel I/O
  • Support for FORTRAN, C, C++ and other languages
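
To illustrate the point-to-point style listed above, here is a minimal MPI sketch (generic example code, not from any production sensor application); the same source builds and runs unchanged whether the underlying OFED fabric is Ethernet, InfiniBand or RapidIO.

/* Minimal MPI point-to-point sketch: rank 0 sends a block of samples to
 * rank 1.  Build with an MPI compiler wrapper, e.g.:  mpicc mpi_sketch.c
 * Run with, e.g.:  mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

#define N 1024   /* arbitrary message length for the sketch */

int main(int argc, char **argv)
{
    int rank;
    float samples[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            samples[i] = (float)i;            /* stand-in sensor data */
        MPI_Send(samples, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0 sent %d samples\n", N);
    } else if (rank == 1) {
        MPI_Recv(samples, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d samples\n", N);
    }

    MPI_Finalize();
    return 0;
}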

Summary
With OFED support, OpenVPX systems based on a RapidIO data plane can seamlessly leverage the software ecosystem that has developed around high-performance computing, as exemplified by the TOP500 systems.  This brings a new level of software portability to the highest performing VPX systems as this new generation takes advantage of the wide ecosystem around Linux-based x86 computing.


Eran Strod is a System Architect in the Advanced Multicomputing Group at Curtiss-Wright Controls Defense Solutions.  He leads the HPEC Center of Excellence (HPEC COE), which provides middleware and tools for high-performance DSP applications.  Prior to joining CWCDS, while at Mercury Computer Systems, Eran served on the VITA Board of Directors.  Prior to that, at Freescale Semiconductor, Eran led the group-wide RapidIO fabric initiative.  His career in the embedded and software industries has recently passed the 20-year mark.
