

Check out our Message Medium blog to find out what’s going on in the hardware-accelerated, low-latency messaging world.
Architecting for High-Performance Continuity
February 9, 2010 at 5:00 pm
Removing Systemic Risk
The goal of architectures is to remove systemic risk while ensuring predictable, market-leading performance. Fortunately – or realistically – this work does not require a rip and replace; it can be applied to existing trading infrastructures. The following are key items to consider:
1. Build continuity into systems with the greatest number of adjacencies
Redundancy and high availability are critical requirements for all components in a mission-critical system, but more so for those that interconnect other components. This would include liquidity connectivity, network infrastructure and messaging middleware. It’s important to note that building a mesh infrastructure as a means to resiliency actually has the opposite effect because risk increases when five nodes are interconnected in this manner. This frequently occurs at both the network and messaging levels. Redundancy does not mean plugging two systems into a risk-prone infrastructure – it doesn’t make the risk go away. The underlying paths must mitigate the “adjacency effect.”
2. Eliminate unnecessary integration
Much like a great team, each player in the trading infrastructure must play his part and play it well. Too often, technology is adopted that allows systems to do many things. The challenge is that individual component behavior is obscured and, surprisingly, put at risk along with the other items with which it is commingled. Only do what you need to do. This often generates conflicts because business units prefer to build out a separate infrastructure rather than embrace the economies of scale proposed by a centralized technology group. If the latter could guarantee a specific risk profile, that would be an ideal scenario. Otherwise, the savings may not outweigh potential operational issues.
3. Optimize performance characteristics
Major technology evolution occurs every five to seven years. When it does, the results are dramatic. Given that the benefits are often an order of magnitude higher and the cost a fraction of the previous incarnation, adoption at the right time is important. This is most often when the technology has been in the market (not a lab) for about nine months. Competitive pressures also influence this number. We saw this with networks evolving to hardware; processors adding more cores; and now messaging moving to silicon. It’s a continual evolution requiring ongoing education and evaluation for even the savviest of firms.
4. Effectively provision management of multiple information streams
A great deal of market data and order flow is moving through the organization. It’s critical that these streams do not all come together at under-provisioned rendezvous points, because volatility and micro-bursting can turn these into information dams, slowing data flow and causing risk-inducing latency. This is a key consideration in messaging systems for market data distribution and order routing: software-based systems will need distributed streams while hardware-based systems can handle aggregated flows. In this case, the risk profile is matched to the infrastructure capabilities.
Achieving operational excellence
Architecture is one side of the coin; operations is the other. Operational excellence is not a one-time activity; it requires regular tuning and modification. Quantified data and the ability to measure it are critical success factors. The following figure highlights the reconciliation of risk with some of the requisite operational elements.
1. Establish operational checkpoints
Gone are the days when high-performance systems were compromised by monitoring capabilities. Establish checkpoints at the system level to mitigate the risk of component failure. Establish checkpoints at major ingress and egress points to mitigate the risk of systemic failure. Make sure that the checkpoints can report independently of being polled, especially when baseline conditions are breached. Checkpoints need to be managed as well. Too many checkpoints - if improperly set up - can adversely affect the trading cycle by slowing systems down or creating excess network traffic.
2. Measure, measure, measure
Having the checkpoint in place is one thing, but the performance expectations both individually and in the aggregate must also be considered. Not knowing is not an excuse. Set the criteria, establish benchmarks, validate regularly, and warn when operational thresholds are compromised. Measure data volume - not just averages but peaks as well. Measure latency across the entire trading cycle in addition to specific execution points. Measure end-to-end system performance and compare with benchmarks and trending curves. Measure server utilization rates. Measure your service providers and your trading partners. Aggregate what you measure and perform regular statistical analysis.
3. Isolate problems dynamically while maintaining performance
Even with the proper planning, problems are inevitable. If components can be dynamically isolated while maintaining overall system performance, that’s a major step forward. The slow consumer problem described earlier is an excellent example from the messaging arena. This problem is being solved by contemporary messaging platforms because they have the intelligence to be self-isolating. High availability and resiliency are important, but have historically impacted performance, at least in the short term. Far better to add the requisite intelligence to prune systems (with notification of course) while ensuring consistent high performance.
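To make the "not just averages but peaks" point concrete, here is a minimal sketch of a latency summary with a threshold warning. The sample values, the 1,000-microsecond threshold and the function names are all hypothetical; a real checkpoint would feed in its own measurements and baselines.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct between 1 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_us, threshold_us):
    """Report mean, 99th percentile and peak, and flag threshold breaches."""
    mean = statistics.mean(latencies_us)
    peak = max(latencies_us)
    return {
        "mean": mean,
        "p99": percentile(latencies_us, 99),
        "peak": peak,
        "mean_breached": mean > threshold_us,
        "peak_breached": peak > threshold_us,
    }


# Hypothetical samples in microseconds: a quiet period with one micro-burst.
samples = [100] * 99 + [5000]
report = summarize(samples, threshold_us=1000)

for name, value in report.items():
    print(f"{name}: {value}")
```

With these made-up numbers the average stays well under the threshold while the peak breaches it, which is the kind of condition an average-only benchmark would never report.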
Continuity is key
The cost of a lapse in operational continuity in today’s high-performance trading infrastructures is too high to leave things to chance. Numerous and diverse systems create complex interdependencies with complicated risk profiles. The most effective means to model this risk is with chaos theory, but it becomes impractical in real-world environments. Capriciously adding new technology may not mitigate the perils either. Fortunately, risk can be driven down substantially by addressing continuity factors in key, individual components, especially those that have a large number of adjacencies. Architectural improvement can be done on both new and, more likely, existing trading platforms. Major technology innovations play a key role here. Architecture progression in isolation is not enough; operational controls and metrics must be established as well. Though we’ll never entirely remove all risk, we can certainly reduce the chance of systemic failure by orders of magnitude. That will keep savvy firms both in and ahead of the market.
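As a rough illustration of the adjacency effect raised in the architecture discussion, the sketch below counts the point-to-point links in a full mesh. Treating each link as one adjacency across which a fault can propagate is a simplifying assumption of mine, not a risk model, and the function name is hypothetical.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of point-to-point links when every node connects to every other."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


# Link count grows roughly with the square of the node count.
for n in (2, 5, 10, 20):
    print(f"{n} nodes -> {full_mesh_links(n)} links")
```

Five fully meshed nodes already carry ten links, and doubling the node count roughly quadruples the number of links that have to be provisioned, monitored and kept healthy.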
Rob Ciampa
Systemic Failures in High-Performance Trading
January 29, 2010 at 5:00 pm
In a scientific context, the word chaos has a slightly different meaning than it does in its general usage as a state of confusion, lacking any order. Chaos, with reference to chaos theory, refers to an apparent lack of order in a system that nevertheless obeys particular laws or rules; this understanding of chaos is synonymous with dynamical instability, a condition discovered by the physicist Henri Poincare in the early 20th century that refers to an inherent lack of predictability in some physical systems…The two main components of chaos theory are the ideas that systems - no matter how complex they may be - rely upon an underlying order, and that very simple or small systems and events can cause very complex behaviors or events.
In his book, Chaos Theory in the Financial Markets, Dimitris N. Chorafas analyzes in great depth the role of nonlinear systems, volatility, risk and cumulative exposure, as well as cognitive models for financial operations. Dr. Chorafas states:
Rob Ciampa
The Appliance Lifecycle Maturity Model
October 30, 2009 at 3:00 pm
I recently came across a commentary on technology appliances in the data center where the author was asking a large number of rhetorical questions. Underlying his questions was a premise that companies (except for the top .001%) can solve all their business problems with general purpose computing devices. Perhaps it’s time to throw out those Cisco routers, those Juniper firewalls, those IBM intrusion detection systems? Perhaps not.
Appliances, like other products, evolve through a lifecycle influenced by data conditions, operational economics, technology maturation and market acceptance. Early adopters of appliances are often large firms, but over time smaller firms and consumers dominate in broader, deeper markets. Importantly, though, what is an “appliance”? My internet access device at home now consists of a router, a switch, a firewall, a wireless access point and an intrusion detection system – and it’s the size of a book. In time, I expect my next one to have server capabilities, audio and video storage, etc. The point is: appliances should (and do) evolve. Some thoughts on the maturation process:
Regards,
Messaging Infrastructure Testing
September 30, 2009 at 2:30 pm
My radar goes off when legacy vendors with legacy products announce non-legacy performance numbers. “Wow,” I say, “I’m very interested in hearing more about the innovation.” Often, however, the innovation is in the testing and not in the architecture. Recently, one of the “big boys” called out Tervela when they touted the performance numbers of their new-and-improved-legacy-messaging.
I decided to check out their testing methodology. I couldn’t help but scratch my head on the lab and testing.
Years ago when I was a smarter man (in engineering), I had some excellent labs that would benchmark and test infrastructure performance and reliability: WANs, LANs, apps, etc. The challenge my group had was not in setting up the infrastructure, but rather in establishing flows that would test real-world scenarios, edge cases, etc. This was hard and, at times, non-scientific. The interdependencies and varying load conditions made it nearly impossible to model. That didn’t stop us and we called it dirty water testing. Clean water testing was just the opposite: simple, deterministic, and unrealistic. We did both, but the dirty water mimicked the real world. We had no surprises when we went into production.
I want to applaud the big boys for calling us out after their clean water test. As a courtesy, please check out our testing methodology and the results for messaging infrastructure testing. It’s dirty water and it’s why a Tervela messaging infrastructure works as advertised. We’ll discuss their approach later. (And their less-than-green footprint, too.)
If you can't see the whole messaging infrastructure testing article, email me and I'll send you a copy.
Regards,
Changing Liquidity Landscapes
August 19, 2009 at 2:30 pm
I was in a meeting recently where the topic of liquidity came up. It naturally moved to liquidity venues, their business models and the technical underpinnings of contemporary exchanges. I decided to pull an excerpt from a piece that Barry Thompson and I wrote several months ago for Banking Technology. It’s tough to separate superior business performance from technical excellence.
The ECN Assault
The baby is all grown up. BATS, the electronic communication network founded in 2005, has received exchange status from the SEC, making it the first ECN to reach this level organically. And it's not going unnoticed.
This means the threat level for traditional exchanges has just hit red alert. Infamous for highlighting weaknesses in traditional exchanges while advocating its price, technology and latency advantages, BATS has taken over 10% of the US equities market since its launch. As an exchange, BATS no longer has to publish quotes via other exchanges, which reduces overall trading latency vis-à-vis traditional exchanges. What's more, its newness means there is no legacy infrastructure dragging down performance. Instead investments in the latest technologies provide lower latency and more efficient use of human capital required to support internal systems.
The proliferation of ECNs - or multi-lateral trading facilities as they are known in Europe - also threatens the evolution of traditional liquidity pools around the world. Seeded by some of the largest market makers from exchanges like NYSE/EuroNext and Deutsche Börse, these investments are more than financial. They bring immediate order flow, enabling ECNs to quickly build up liquidity. Like BATS, they also are able to easily leverage new technologies to maintain advantage and can be more aggressive when it comes to pricing structure.
The "maker-taker" model, originally popularized by the equity and options trading industries, was also adopted by the former US Futures Exchange. The pricing model offers rebates for market makers while charging fees to customers who remove liquidity from the exchange. It brings together favorable market conditions and supports market growth by encouraging market makers to add liquidity, which results in tighter bid-ask spreads. The difference in price between the two becomes profit for the execution venue. This is where the lower cost base of the ECN model proves a key competitive differentiator over established exchanges.
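The fee mechanics described above can be sketched in a few lines. The per-share rates below are hypothetical values chosen only to make the arithmetic visible; they are not any venue's published schedule, and the function name is mine.

```python
from decimal import Decimal

# Hypothetical per-share rates in dollars, for illustration only.
TAKER_FEE = Decimal("0.0030")     # charged to the side that removes liquidity
MAKER_REBATE = Decimal("0.0025")  # paid to the side whose resting order added liquidity


def settle_fill(shares: int) -> dict:
    """Split the fees on one fill between taker, maker and the execution venue."""
    taker_pays = TAKER_FEE * shares
    maker_receives = MAKER_REBATE * shares
    return {
        "taker_pays": taker_pays,
        "maker_receives": maker_receives,
        "venue_keeps": taker_pays - maker_receives,
    }


fill = settle_fill(1_000)
for party, amount in fill.items():
    print(f"{party}: ${amount:.2f}")
```

Under these assumed rates the venue keeps only the spread between fee and rebate on each fill, so its margin depends on volume and on how lean its cost base is.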
The exchange strikes back
In today's volatile market, traditional exchanges are seeing order flow and liquidity being threatened. As such, they are actively seeking new ways to improve service, increase product 'stickiness' and remain competitive.
CME Group, which operates the CME, CBOT and NYMEX, has leveraged its innovation in technology to rapidly grow its business on the CME Globex trading platform from less than 15% of total volume in 2000 to nearly 90% total volume today. With more than two billion trades taking place at the exchange in 2007 (worth a notional value of $1.2 quadrillion), CME Group has differentiated itself from other exchanges through a vast distribution network. They have customers in more than 85 countries with access to the exchange through Globex nearly 24 hours every trading day and offer speeds of approximately 10 milliseconds within the CME network.
In August 2008 CME Group completed its acquisition of NYMEX and has publicly stated that the over-the-counter success of the ClearPort platform will figure into its future. Volume on the ClearPort platform is already up nearly 40% to 470,000 contracts a day over last year. CME Group also has strategic partnerships with several other key exchanges, building on the strength of Globex, including BM&FBovespa in Brazil, KRX in Korea, the Dubai Mercantile Exchange and the Green Exchange.
Interdependencies within the trading technology domain
Central to high-velocity trading, stability and resiliency are not traits synonymous with legacy architectures. The primary impediment to innovating through the addition of new asset classes or integrating new applications to bring about competitive advantage is a sagging foundation. Many traditional exchanges have built their market data distribution frameworks on outmoded software-based messaging infrastructures. When these systems break down the traditional mechanisms they use to compensate only exacerbate the problems.
Any time you are dealing with the sheer physics of information movement within complex systems - especially in a high-volume liquidity environment - your foundation needs to accelerate and integrate the entire trading ecosystem. The capacity requirements for this kind of trading are way beyond what software alone can provide. Only a hardware-based system can yield the substantial headroom and horsepower to handle large data bursts and unforeseen peaks in volume.
Because messaging systems are at the core of the vast array of financial applications running on an exchange backbone, they are often the prime culprit when system failures occur. With emerging trends like the proliferation of algorithmic trading influencing technology requirements on a constant basis, exchanges must ensure that they have a solid foundation on which they can layer the adjunct services they need to remain competitive.
Technology is the Path
Having the most stable and resilient underlying technology is critical when building out services to keep clients and attract new ones. Having close physical proximity to an exchange is important, as is deploying latency-tuned applications that add value and attract liquidity. Market demands are fueling the need for smarter technology architectures so that migrating applications and adding new services no longer requires a complete overhaul of the existing system. The liquidity venue with the most modern and scalable infrastructure is always going to win.
Regards,
High Frequency Trading Commentary
July 23, 2009 at 11:15 am
These systems are so fast they can outsmart or outrun other investors, humans and computers alike. And after growing in the shadows for years, they are generating lots of talk.
Speed matters. Predictable speed across the entire trade execution route matters even more.
Regards,
Silver Bullet for Trading?
July 13, 2009 at 10:00 am
Interesting discussion thread occurring on LinkedIn regarding Kevin McPartland’s recent Tabb Group Report “Hardware Acceleration: Traders and Teraflops.” I’ve echoed my comment below because it’s worth repeating: effective trades depend on several factors and the proper application of hardware can have a significant impact. Note the use of “proper application,” i.e., the right architecture. Here’s the note:
There is no silver bullet for high-performance financial transactions and order flow. Our market deals with ever-changing feeds, algos, distribution requirements, etc., making it difficult to apply one set of processing logic holistically. Add in bus contention, OS optimization, multiple cores, caching, threading, and network flows/QoS and you have some serious, complex tuning to do. Not an easy task, especially since some may be outside one’s control and operational domain.
ASICs, FPGAs, network processors, GPUs and processors are additional parts of the equation and each one does certain things well. When we designed our message switches, we chose to incorporate a blend of several of the components listed above, allowing us to deliver optimal performance because each element is doing what it does best. That’s the equation. I still hear many technically astute people say “I’m just going to put it in an FPGA.” Think again – an FPGA is only one variable and you have to solve for several more.
Regards,
TMX-500 Design Philosophy
June 30, 2009 at 1:00 pm
We announced the Tervela TMX-500 Message Switch last week at SIFMA to a great deal of fanfare. Attendee response to the new offering was entirely positive, many echoing the word “wow.” We had it running and opened up for all to see. Based on some questions at the show, I wanted to take a couple of minutes to share what was behind the announcement and illuminate our design philosophy.
Planning for the TMX-500 began in 2008, some time after the TMX-1000 was publicly available and in production at top-tier financial services firms: investment banks, hedge funds, broker-dealers, etc. We had a good deal of experience to work from. It’s worth noting that we saw demand for a complementary, smaller message switch from Tervela, so now we offer both the TMX-500 and the TMX-1000.
We run a continual and disciplined product management process at Tervela, one that is very much market-driven. We spend a great deal of time with customers (business leaders and technical architects), third-party thought leaders, standards groups, ISVs, systems integrators, and perhaps – most importantly – people who would never buy a hardware-accelerated messaging solution. Over my career, I have found this last group to be a treasure trove of product requirements. They also, ironically, become the largest purchasing group of the products they say they’d never buy.
So what were the drivers? What did the market want? What did customers and prospects want?
They wanted hardware-accelerated messaging, but in a smaller form factor that gave them deployment options for workgroups, data centers, co-lo facilities, etc. We brought it down to 2U (1U = 1.75 inches).
They wanted deployment flexibility. We can multi-home to diverse networks through 16 x 1 Gigabit Ethernet ports or trunk through 4 x 10 Gigabit Ethernet ports.
They wanted to scale linearly without pain. We can connect 1 to 16 TMX-500s into a single unified fabric.
They wanted to reduce the data center footprint. Our first TMX-500 customer order reduces 2+ racks of messaging servers by 92% (!) to 5U (2 x 2U TMX-500s + a 1U TPM management platform).
They wanted to cut power consumption, too. The TMX-500 tops off at 250 watts, but runs steady state in the 100s. This is a big deal. More on it in a subsequent post.
They wanted real economics. We gave them great CapEx, OpEx and TCO with the TMX-500. Software solutions can’t match it.
They wanted the Tervela differentiators: high-performance, low-latency, scalability, and a seamlessly integrated fabric. We didn’t compromise. In fact, we made them better.
They wanted insane reliability. We went solid state, added more environmental sensors, variable-rate N+1 fans, redundant power, ECC protected memory, etc.
They wanted future-proofing. The combination of programmable ASICs and Tervela’s operating system for messaging, TVOS, ensures future support without any compromise.
They wanted to beat their competition. We gave them the TMX-500.
There’s much, much more. Please let me know if you’d like additional details.
Regards,
SIFMA 2009
June 29, 2009 at 10:40 pm
First off, SIFMA 2009 was different from SIFMA 2008, which was different from all other previous ones as well. That’s how tradeshows go: each one has a different personality, much like the children in a family. Fewer attendees? Yes. Third floor of the venue closed? Yes. But so what? We knew this going in, adjusted our plan accordingly and had an outstanding event. What does that mean? We had twice as many scoped project discussions on enterprise messaging systems as we had last year. A 100% increase, not to mention the number of follow-on meetings. We didn’t have many tire kickers or “tourists,” as a press person remarked to me. Pete Harris from A-Team Group expressed similar positive observations in his recent post.
Next, we didn’t go there half-cocked with a crap booth and no story to tell. It’s just not our style. We introduced the Tervela TMX-500 Message Switch, had the hottest booth at the show (literally or pun intended) and staffed it with our team members who do real-world messaging projects. I’ll save the TMX-500 discussion for another post, but I was either part of or witness to some of the most poignant discussions on messaging philosophy and architecture that I’ve heard in some time. That’s why we were there. That’s why people attended.
Finally, what’s the scoop with this “Wicked Hot Message Sauce” theme? Candidly, it came to me several months ago during an early, weekend morning caffeine infusion at Starbucks. I was reflecting on a comment made by an IT executive last year about his desire to “spice up” his application performance. The rest, of course, is obviously history. I did take some grief from my New York colleagues about the word “wicked” and my inability to stray from my Boston roots. Frankly, that wasn’t even in the equation. I used it in the context of something more powerful than “freakin’.” I’ll close, however, with some of my Bostonian vernacular: check out Tervela's TMX-500; it’s wicked awesome.
P.S. To all the great attendees, press, analysts, friends, partners and vendors who stopped by our booth: Thank you for making the show wonderful.
Cheers,
Best Effort Messaging
May 31, 2009 at 4:30 pm
But it worked in the lab... A deep and profound (please read it a couple of times) post by Kirk Wylie. He claims that he's not trying to be down on best effort messaging, but he does have some valid points that we often hear from customers.
Here's a quote:
Best Effort = Development Win, Production Fail
Best effort messaging is meant to be fast. That's good. But hardware (with some decent architecture) can make guaranteed messaging fast. Check out our last post. The reality is that there are many types and qualities of service when it comes to messaging. Pick the one that makes the most sense for the applications you're running. And be realistic. To Kirk's point: a little bit of diligence in the lab will go a long way to avoiding surprises in production.