
On Friday, December 10, we experienced a service disruption that resulted in an exchange outage for ten hours. During that downtime, all our customers' funds remained secure.

This post provides additional information about the recent service disruption and how we are improving the reliability and performance of our exchange through a series of upgrades.

In addition to maintaining our trading and custody platform, we have been working in parallel to upgrade our backend exchange platform to improve system capacity and scalability. This involved migrating 79 trading pairs on a coordinated schedule to a new platform while continuing to operate our exchange 24 hours a day, seven days a week.

Prior to the planned final migration of BTC/USD and ETH/USD to the new platform on Friday, we experienced a failure of the messaging infrastructure that this system depends on. This messaging system is responsible for fast, high-capacity, reliable delivery of messages within our distributed exchange platform. It is multi-node and generally fault tolerant. Normally, this messaging infrastructure allows for strong reliability guarantees for applications that depend on it. All three nodes that make up this messaging platform failed at the same time with the same exception. The messaging system automatically restarted, but many internal message consumers and producers required manual intervention.

After we restarted the impacted systems, it was determined that the messaging system errors led to state divergence of some downstream systems due to how our systems interacted with it. Since this incident occurred at a time when Gemini was transitioning order flow to an upgraded version of the exchange matching engine, the process of state reconciliation prior to restarting production services required reconciling state across two trading systems. Once state reconciliation was completed and all services were stable, markets were restored by first enabling ActiveTrader and API connectivity in limit-only mode, then re-enabling Mobile and Retail Web.
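
To make the shape of that reconciliation work concrete, here is a minimal sketch of the kind of check involved, written in Python with hypothetical order-state structures rather than our production tooling: it compares open-order snapshots exported from two trading systems and lists any divergences that would need to be resolved before services are restarted.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class OrderState:
        """Minimal view of one order as reported by a single trading system (hypothetical)."""
        status: str      # e.g. "open", "filled", "cancelled"
        filled_qty: str  # decimal carried as a string to avoid float rounding

    def reconcile(legacy: dict, upgraded: dict) -> list:
        """Return human-readable divergences between two order-state snapshots."""
        issues = []
        for order_id in sorted(legacy.keys() | upgraded.keys()):
            in_legacy, in_upgraded = legacy.get(order_id), upgraded.get(order_id)
            if in_legacy is None:
                issues.append(f"{order_id}: only present in the upgraded system")
            elif in_upgraded is None:
                issues.append(f"{order_id}: only present in the legacy system")
            elif in_legacy != in_upgraded:
                issues.append(f"{order_id}: legacy={in_legacy} upgraded={in_upgraded}")
        return issues

    if __name__ == "__main__":
        legacy = {"o-1": OrderState("open", "0"), "o-2": OrderState("filled", "1.5")}
        upgraded = {"o-1": OrderState("open", "0"), "o-2": OrderState("open", "0")}
        for divergence in reconcile(legacy, upgraded):
            print(divergence)

Any non-empty result from a check like this is the kind of state divergence that, in this incident, had to be resolved before markets could be reopened.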

Moving forward and in support of our chaos engineering approach, we will ensure reproducibility of this failure mode in our test environments and improve our trading platform to gracefully degrade and recover from this kind of subsystem interruption. By ensuring we test this failure mode more frequently, we will be confident our systems can handle this and similar challenges well.
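
As an illustration of what reproducing this failure mode can look like, here is a simplified chaos experiment in Python; the stop_node, start_node, and health-check helpers are hypothetical stand-ins for whatever orchestration a test environment actually provides. The experiment stops all messaging nodes at once, checks that order entry degrades gracefully, and then requires dependent consumers and producers to recover without manual intervention within a fixed deadline.

    import time

    # Hypothetical hooks into a test environment; a real harness (containers,
    # a lab deployment, etc.) would supply equivalents of these.
    def stop_node(node: str) -> None: ...
    def start_node(node: str) -> None: ...
    def gateway_accepts_orders() -> bool: ...
    def consumers_caught_up() -> bool: ...

    MESSAGING_NODES = ["msg-1", "msg-2", "msg-3"]
    RECOVERY_DEADLINE_SECS = 120

    def run_experiment() -> None:
        # Inject the fault: every messaging node fails at the same time,
        # mirroring the incident described above.
        for node in MESSAGING_NODES:
            stop_node(node)

        # While the messaging layer is down, order entry should degrade
        # gracefully (reject or queue) rather than accept orders it cannot process.
        assert not gateway_accepts_orders(), "expected graceful degradation while messaging is down"

        # Restart the cluster and require automatic recovery within the deadline,
        # i.e. no manual intervention for consumers and producers.
        for node in MESSAGING_NODES:
            start_node(node)
        deadline = time.monotonic() + RECOVERY_DEADLINE_SECS
        while time.monotonic() < deadline:
            if gateway_accepts_orders() and consumers_caught_up():
                return  # recovered in time
            time.sleep(5)
        raise AssertionError("messaging consumers/producers did not recover automatically")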

Improvements to Performance and Scalability

We are very excited about the additional planned upgrades to our core exchange trading platform, which we preview below and will detail more in a future post. The new system introduces multiple architectural improvements, including messaging improvements such as isolating the exchange's high-throughput, low-latency messaging domain from the general purpose store/forward domain.

By isolating message traffic related to latency-sensitive trading operations, such as order placement and order cancellation, in this part of the system, we are able to observe improvements to the exchange's performance that we'll detail below.
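
As a rough sketch of the isolation idea (illustrative only, not our actual configuration), the Python snippet below routes latency-sensitive order traffic and general store/forward traffic to two separate messaging domains, so that bulk or best-effort messaging cannot queue behind, or interfere with, order placement and cancellation.

    from enum import Enum, auto

    class MessagingDomain(Enum):
        """Two separately provisioned messaging clusters (hypothetical names)."""
        LOW_LATENCY = auto()    # order placement, cancellation, fills
        STORE_FORWARD = auto()  # reporting, audit trails, downstream fan-out

    # Message types treated as latency-sensitive; everything else stays on the
    # general purpose store/forward domain.
    LATENCY_SENSITIVE = {"place_order", "cancel_order", "fill"}

    def domain_for(message_type: str) -> MessagingDomain:
        """Choose which messaging domain a message type is published to."""
        if message_type in LATENCY_SENSITIVE:
            return MessagingDomain.LOW_LATENCY
        return MessagingDomain.STORE_FORWARD

    if __name__ == "__main__":
        for message_type in ("place_order", "trade_report", "cancel_order"):
            print(message_type, "->", domain_for(message_type).name)

The important property is that the two domains run on separate infrastructure, so a backlog on the store/forward side cannot add latency to the order path.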

In the process of developing the upgraded exchange system, we have developed tooling to measure multiple performance characteristics of the exchange. This includes an application that can measure round-trip time as observed by an internal order gateway, to give a general sense of latency distributions of a single order pipeline.

Below, we show the latency profiles for order placement in the new backend exchange platform. This represents a simple order placement and fill flow. We plan to share more in-depth latency analysis in the future.
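
For a sense of what the measurement application behind those profiles does, here is a simplified sketch in Python; send_order_and_wait_for_ack is a hypothetical placeholder for one order round trip through the internal order gateway. It timestamps each placement and its acknowledgement, then summarizes the resulting distribution with the percentiles used in a latency profile.

    import statistics
    import time

    def send_order_and_wait_for_ack() -> None:
        """Placeholder for a single order placement round trip through the gateway.

        A real measurement tool would submit an order here and block until the
        acknowledgement or fill event comes back on the response stream.
        """
        time.sleep(0.001)  # stand-in for real round-trip work

    def measure_round_trips(samples: int = 1000) -> list:
        """Collect round-trip times, in milliseconds, for a batch of orders."""
        rtts = []
        for _ in range(samples):
            start = time.perf_counter()
            send_order_and_wait_for_ack()
            rtts.append((time.perf_counter() - start) * 1e3)
        return rtts

    def summarize(rtts: list) -> None:
        """Print the percentile summary typically used for a latency profile."""
        qs = statistics.quantiles(rtts, n=100)  # qs[k-1] is the k-th percentile
        print(f"p50={qs[49]:.3f}ms  p90={qs[89]:.3f}ms  p99={qs[98]:.3f}ms  max={max(rtts):.3f}ms")

    if __name__ == "__main__":
        summarize(measure_round_trips())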
