An innovative architecture

FPGA co-processors have long been discussed in high-performance computing circles as a way to dramatically increase system performance and offload computation from CPUs. Celoxica's Accelerator is the first market data handler to connect a network interface directly to a co-processor, eliminating one of the major contributors to latency in a hardware/software co-processing system: peripheral bus transactions between the co-processor and the network device.

[Figure: Architecture diagram 1]

Once in the co-processor, the network data is processed at line speed in a highly parallel pipeline. The TCP/UDP/IP protocol stack, A/B feed arbitration, FAST and ITCH decoding, and message filtering based on customer-defined criteria are all implemented in dedicated hardware for the lowest possible latency.

[Figure: Co-processor diagram]

The co-processor board fits into a standard PCI Express (PCIe) or HyperTransport (HTX) slot, both of which offer high-speed, low-latency interconnects. Once converted to a binary representation, the network data is transferred over this bus directly into the CPU's memory. A C/C++ API lets you integrate the system with your own application, so you receive feed updates before the competition does.
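
The page does not document the C/C++ API itself. As a hypothetical illustration of the general pattern, a card writing fixed-size binary records into a ring buffer in host memory while the application polls it, consider the sketch below. `MarketUpdate`, `UpdateRing`, `push` and `poll` are invented names, and `push` merely stands in for the card's writes; a real device would also need the vendor's driver and pinned memory, which are outside this sketch.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical fixed-size binary record; the real record layout is not documented here.
struct MarketUpdate {
    std::uint64_t sequence;  // feed sequence number
    char symbol[8];          // instrument symbol, zero-padded
    std::int64_t price;      // price in integer ticks (assumed scaling)
    std::uint32_t quantity;
};

// Single-producer/single-consumer ring buffer. push() stands in for the card
// writing into host memory; poll() is what the application would spin on.
template <std::size_t N>
class UpdateRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: returns false if the ring is full (the consumer is behind).
    bool push(const MarketUpdate& u) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head - tail_.load(std::memory_order_acquire) == N) return false;
        slots_[head & (N - 1)] = u;
        head_.store(head + 1, std::memory_order_release);  // publish the record
        return true;
    }

    // Consumer side: returns the oldest unread record, or nullopt if none.
    std::optional<MarketUpdate> poll() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;
        MarketUpdate u = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return u;
    }

private:
    std::array<MarketUpdate, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The acquire/release pairing on `head_` ensures the consumer never reads a slot before the producer's write to it is visible, and the pairing on `tail_` stops the producer from overwriting a slot the consumer is still copying.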

Hardware recovery

[Figure: Accelerator card]

A common data feed topology has a pair of redundant UDP multicast feeds, known as the A and B feeds. Because each feed has its own parallel processing pipeline, the FPGA can detect an error in the A feed and switch to the B feed with zero latency. If the data is missing from both feeds, the FPGA can immediately request retransmission of the missing data over UDP or TCP, which can save many microseconds compared with recovery through a standard network card and the software network stack.
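
A/B arbitration is commonly done by sequence number: the first copy of each message to arrive from either feed is delivered, the later copy is dropped, and a gap that both feeds have moved past is a candidate for retransmission. The C++ sketch below models that logic in software, assuming every message carries a sequence number and each feed delivers its packets in order. `FeedArbiter` and its methods are hypothetical names, not the Accelerator's API, and the hardware implementation is not described on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Software model of A/B feed arbitration by sequence number.
// Assumes each feed delivers its own packets in sequence order.
class FeedArbiter {
public:
    enum class Feed { A, B, Recovery };  // Recovery = retransmitted data

    explicit FeedArbiter(std::uint64_t first_seq) : next_(first_seq) {}

    // Accepts a packet from any source and returns the messages that can now
    // be delivered in order. Copies of already-delivered messages are dropped.
    std::vector<std::string> on_packet(Feed feed, std::uint64_t seq, std::string payload) {
        if (feed == Feed::A) high_a_ = std::max(high_a_, seq);
        if (feed == Feed::B) high_b_ = std::max(high_b_, seq);
        if (seq >= next_) pending_.emplace(seq, std::move(payload));  // no-op if already held

        std::vector<std::string> ready;
        for (auto it = pending_.find(next_); it != pending_.end(); it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }

    // Sequence numbers that both feeds have moved past without delivering:
    // under the in-order assumption they were lost on both, so request them.
    std::vector<std::uint64_t> lost_on_both() const {
        std::vector<std::uint64_t> lost;
        const std::uint64_t horizon = std::min(high_a_, high_b_);
        for (std::uint64_t s = next_; s < horizon; ++s)
            if (pending_.count(s) == 0) lost.push_back(s);
        return lost;
    }

    std::uint64_t next_expected() const { return next_; }

private:
    std::uint64_t next_;        // next sequence number to deliver
    std::uint64_t high_a_ = 0;  // highest sequence seen on feed A
    std::uint64_t high_b_ = 0;  // highest sequence seen on feed B
    std::map<std::uint64_t, std::string> pending_;  // received out of order
};
```

For example, if both feeds drop message 2 and then deliver message 3, `lost_on_both()` reports `{2}`; once the retransmitted copy arrives as `Feed::Recovery`, messages 2 and 3 are released together in order.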

Message filtering

Rather than receiving the full feed on the host CPU, you can define custom filters through a simple API, dramatically reducing the amount of data that the CPU has to handle.

Define filters on the market data to get just what you need, minimising the load on your server

Filter on variables including symbol, price, quantity and expiration date
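
To illustrate the kind of criteria listed above, here is a hypothetical filter expressed in C++. The `Update` and `Filter` types, the integer price ticks and the YYYYMMDD expiration encoding are all assumptions made for this sketch; the Accelerator's actual filter API is not described on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>

// Hypothetical decoded update; field names and encodings are illustrative only.
struct Update {
    std::string symbol;
    std::int64_t price;        // integer ticks (assumed)
    std::uint32_t quantity;
    std::uint32_t expiration;  // YYYYMMDD (assumed)
};

// Hypothetical customer-defined filter: every criterion that is set must match.
struct Filter {
    std::set<std::string> symbols;                    // empty = any symbol
    std::optional<std::int64_t> min_price, max_price;  // inclusive bounds
    std::uint32_t min_quantity = 0;
    std::optional<std::uint32_t> expires_before;      // exclusive upper bound

    bool matches(const Update& u) const {
        if (!symbols.empty() && symbols.count(u.symbol) == 0) return false;
        if (min_price && u.price < *min_price) return false;
        if (max_price && u.price > *max_price) return false;
        if (u.quantity < min_quantity) return false;
        if (expires_before && u.expiration >= *expires_before) return false;
        return true;
    }
};
```

In the Accelerator these comparisons run in hardware before the data reaches the host; the sketch shows only the matching logic, which is why updates that fail any criterion never need to cross the bus.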

Low-latency interface

A co-processor FPGA connected directly to the network port means that the Accelerator can provide consistently low latency that is independent of data volume. The high-speed link to the host CPU provides the lowest-latency bus transfers available in standard servers.

Hardware protocol stack (TCP, UDP, IP): none of the overhead typically associated with a software protocol stack

Hardware FAST and ITCH decoders
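
FAST encodes integers with stop bits: each byte carries seven data bits, most significant group first, and the high bit is set only on the last byte of a field. Binary versions of NASDAQ's ITCH protocol instead use fixed-width fields with integers in big-endian byte order. The C++ sketch below shows only these two primitive reads; `fast_decode_uint` and `read_be32` are illustrative names, and a full FAST decoder would also need presence maps, templates, field operators and the nullable and signed encodings, none of which are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Decodes one FAST stop-bit encoded unsigned integer starting at buf[pos].
// On success, advances pos past the field. Returns nullopt (leaving pos
// unchanged) if the buffer ends before a stop bit or the value overflows.
std::optional<std::uint64_t> fast_decode_uint(const std::uint8_t* buf, std::size_t len,
                                              std::size_t& pos) {
    std::uint64_t value = 0;
    for (std::size_t i = pos; i < len; ++i) {
        if (value > (std::numeric_limits<std::uint64_t>::max() >> 7)) return std::nullopt;
        value = (value << 7) | (buf[i] & 0x7F);  // append 7 data bits
        if (buf[i] & 0x80) {                     // stop bit: last byte of field
            pos = i + 1;
            return value;
        }
    }
    return std::nullopt;
}

// Reads a fixed-width 32-bit big-endian (network byte order) integer.
std::uint32_t read_be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}
```

For example, the bytes `0x39 0x45 0xA3` decode to 942755: the groups 57, 69 and 35 combine as 57 × 128² + 69 × 128 + 35.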

Advancing ultra-low-latency trading

©Celoxica 2009. T: +44 (0)1235 863656