Technology

Crypto L3 Atom

The Problem

Large market participants can manipulate data, spoof trades, and commit other malicious acts that degrade data quality. Exchanges can and do throttle user connections to limit what users can do with the data; they have no incentive to deliver high-quality data. Yet the lowest possible data latency, the finest data granularity, rapid iteration speed, and data flexibility are the bedrock of the quantitative and HFT world.

Today's crypto market data
✗ Unregulated and hard to sue
✗ LOB events are aggregated (only three event types are reported for the LOB)
✗ Changes to the data are announced only as they happen on WebSocket feeds
✗ Connections run over the public internet and drop constantly (often as part of a throttling policy)
✗ A corruptible matching engine can let preferred users jump the queue
✗ Free, but with no SLA

Traditional markets
✓ Highly regulated, with data changes announced years in advance
✓ Protocols such as FIX offer hundreds of message types, including news and email
✓ Connections run over a private network
✓ All connections have identical latency, measured in nanoseconds
✓ The matching engine works on a strict FIFO basis
✓ Expensive, but with strict SLAs

Our BIG Innovation

Information is the most valuable commodity in the world

Whoever has access to the most exhaustive, reliable, and fastest data is poised to come out on top. Using high-performance computing (HPC) techniques and 3rd generation Intel Xeon Scalable processors (code-named Ice Lake), we have addressed crypto’s biggest problem — unreliable, dirty data. Our innovations enable us to reverse engineer the limit order book at an atomic level in real-time, opening up limitless possibilities.

Unlike in the heavily regulated traditional markets, crypto exchanges have no obligation to provide clean, accurate data. The data they do provide is plagued by throttling mechanisms, glitches and disconnections. To provide a next-generation order book for users, GDA has developed primary and replica microservices connected through different Content Delivery Networks, ensuring that if a connection is lost, data is still collected from a replica microservice.

Some of the innovative technologies we use to overcome these issues include:

High-Performance Computing and Better Network Engineering

Hardware optimised for computation

  • c6i.metal EC2 bare-metal instances optimised for computation
  • 3rd Generation Intel Xeon Scalable processors (code-named Ice Lake)
  • L1 bandwidth of 100 Gb/s
  • AVX-512 vector instruction set
  • 10 nm process architecture


Kernel Bypass Networking with FD.io and Vector Packet Processing

  • Data Plane Development Kit (DPDK)
  • FD.io VPP: the "magic" of vectors
  • Intel Xeon instruction and data caches kept hot
  • Minimised memory latency and usage
  • 1 Tb/s throughput

Max Software Optimisation

Python HPC Techniques

  • NumPy
  • NdArray: Shape shifting
  • Ufuncs
  • Collective methods
  • Statistical Analytics
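
As an illustration of the techniques listed above, the sketch below (using made-up data, not a GDA feed) shows ufuncs, ndarray shape shifting, and collective (aggregation) methods replacing Python-level loops:

```python
import numpy as np

# Hypothetical flat feed: (price, size, side) triples for 1,000 LOB events
events = np.random.default_rng(0).random((1000, 3))

# Ufunc: the multiplication runs in compiled code over the whole array at once
notional = events[:, 0] * events[:, 1]

# Shape shifting: view the same buffer as 100 batches of 10 events, without copying
batched = events.reshape(100, 10, 3)

# Collective method: reduce along the batch axis in a single vectorised call
mean_notional_per_batch = (batched[:, :, 0] * batched[:, :, 1]).mean(axis=1)
print(mean_notional_per_batch.shape)  # (100,)
```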

C++ HPC Techniques

  • Vectorisation
  • Arithmetic intensity and the roofline model
  • Memory traffic, unit stride, data alignment, and cache-reuse optimisation
  • SIMD-enabled functions
  • Intel C++ compiler
  • Intel OpenMP for multi-threading
  • Intel MPI for distributed computing across clusters

Creating Rich Data, Normalisation and Enhancement

Processing snapshots, trades, and the 3 LOB event types in memory, in real time

  • LOB rebuilt in memory, with metrics calculated in real time
  • Using multidimensional ndarrays and shape shifting, we expand the 3 reported events into 8 atomic events in the LOB
  • We track our own position in the LOB price queues (enhanced L3)
  • Data is normalised and enhanced with the above information in real time
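
The exact event taxonomy is proprietary, so the sketch below is only a simplified illustration (with hypothetical field names) of the underlying idea: combining an aggregated level update with the trade feed to infer finer-grained atomic events.

```python
def classify_update(prev_size, new_size, traded_size):
    """Infer atomic events at one price level from an aggregated L2 update."""
    events = []
    delta = new_size - prev_size
    if traded_size > 0:
        events.append(("trade_match", traded_size))
        delta += traded_size                      # separate fills from the net change
    if delta > 0:
        events.append(("limit_insert", delta))    # fresh liquidity joined the queue
    elif delta < 0:
        events.append(("limit_cancel", -delta))   # resting liquidity was pulled
    return events

# Level shrank from 5.0 to 4.0 while 3.0 traded: inferred as a fill plus a 2.0 insert
print(classify_update(5.0, 4.0, 3.0))
```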


Statistical Analytics on Market Quality

  • SciPy calculates order flow analytics
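
The precise analytics are not specified here, but the following sketch (with simulated data) shows the flavour of order-flow statistics SciPy supports, such as testing whether signed trade flow is persistently one-sided:

```python
import numpy as np
from scipy import stats

# Simulated signed trade flow (+ buys, - sells); a real run would use the LOB stream
signed_flow = np.random.default_rng(1).normal(loc=0.2, scale=1.0, size=5000)

imbalance = signed_flow.sum() / np.abs(signed_flow).sum()  # order flow imbalance
flow_skew = stats.skew(signed_flow)                        # asymmetry of the flow
t_stat, p_value = stats.ttest_1samp(signed_flow, 0.0)      # is the mean flow non-zero?

print(f"imbalance={imbalance:.3f} skew={flow_skew:.3f} p={p_value:.4f}")
```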

The Result

Atomic Granularity for Powerful L3 Data

Get a deeper understanding of the order book and liquidity

  • Mapping participant market impact to market microstructure correlations
  • Participant market impact
  • Participant fill rate
  • Mapping of value areas and liquidity & volume clusters
  • Shape-shifting phenomenon detection
  • Liquidity trap and SFP detection
  • Market-maker (MM) fakeout density

Typical Use Cases

GDA’s Open Crypto Data Initiative will be an essential part of crypto’s expansion into more institutional research, backtesting, and optimisation activities such as:

Research & development

Advanced quantitative research & backtesting, insights into market microstructure and order book dynamics, designing market behaviour models, data-driven contract insights, and liquidity cluster heat-map development.

Predictive Analytics & Forecasting

Liquidity cluster shape-shifting phenomena, market microstructure correlations, market strength and weakness indices, trend development, identification of trading ranges and trend distribution, and estimation of fluctuations and volatility.

Signal generation

Signal contact strategy development, multi-asset accelerated signals, and cross-strategy design and development (generating alpha by combining strategies across CEX/DEX).

Execution & Optimisation

Enhancing trade execution (CEX & DEX), smart order routing, deeper liquidity cost optimisation, HFT load-balancing, pricing & fee optimisation, and cost-effective market entry.

Management

Liquidity aggregation, portfolio construction, composition and analytics, and rebalancing built on reliable, lightning-fast data streams.

Risk

Advanced risk model development, tail hedging, and liquidation management.

Incorporated Design Principles

Integrating Multiple Content Delivery Networks (CDN)

Our infrastructure uses individual microservices to connect to multiple exchange CDNs. This allows us to duplicate the same stream across multiple pathways per product per exchange, enabling reliable and consistent data feeds. For example, if one CDN goes down in the US, the other CDN in Germany will still be up and running, ensuring an uninterrupted data stream.
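
A minimal sketch of this primary/replica pattern is shown below; the endpoint URLs are placeholders and the real adapters are more involved, but the failover logic is the essential point:

```python
import asyncio
import websockets  # assumption: WebSocket feeds, as used by the exchange adapters

ENDPOINTS = [
    "wss://primary-cdn.example-exchange.com/stream",   # e.g. US point of presence
    "wss://replica-cdn.example-exchange.com/stream",   # e.g. EU point of presence
]

async def collect(handle_message):
    while True:
        for url in ENDPOINTS:                          # primary first, then replicas
            try:
                async with websockets.connect(url) as ws:
                    async for message in ws:
                        await handle_message(message)
            except Exception:
                continue                               # drop to the next endpoint
        await asyncio.sleep(1)                         # brief back-off before retrying
```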

Federated Consensus Algorithm for the Valid Price

As we aggregate data from multiple sources, one issue to address is determining the valid price for a product. The federated consensus algorithm aggregates the reported values and separates true values from outliers.
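
The algorithm itself is not detailed here, so the sketch below only illustrates a robust aggregation in the same spirit: reject outliers using the median absolute deviation and take the median of the remaining quotes as the valid price.

```python
import numpy as np

def consensus_price(prices, k=3.0):
    prices = np.asarray(prices, dtype=float)
    median = np.median(prices)
    mad = np.median(np.abs(prices - median)) or 1e-12   # avoid division by zero
    inliers = prices[np.abs(prices - median) / mad <= k]
    return float(np.median(inliers))

# The venue reporting 30150.0 is rejected as an outlier
print(consensus_price([29402.1, 29401.8, 29403.0, 30150.0, 29402.5]))
```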

Custom Data Dissemination Methods

Our data feeds are available over custom-built, reliable adapters.

Simple-to-Understand Data Structure

High-quality, reliable data does not have to come in complex, hard-to-comprehend data structures. Our R&D team adopts NoSQL object formats and keeps the data as flat as possible, meaning no deeply nested, complex data structures. With this simplicity in mind, any trader can take full advantage of GDA’s advanced data environment.
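
For illustration only (these are not GDA's published field names), a flat payload looks like this: every attribute sits at the top level, with no nested objects to unpack.

```python
flat_lob_event = {
    "exchange": "binance",
    "symbol": "BTC-USDT",
    "event_type": "limit_insert",
    "price": 29402.5,
    "size": 0.75,
    "side": "bid",
    "queue_position": 3,
    "timestamp_ns": 1_700_000_000_000_000_000,
}
```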

Dashboard Monitoring

Monitoring of our infrastructure is possible through Amazon CloudWatch and OpenSearch dashboards. Alarms are carefully set up to report extremes in price, data usage, error messages, and costs, with all data volumes logged.
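
As a hedged example of this kind of alarm (the metric name, namespace, and threshold below are placeholders rather than GDA's actual configuration):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="lob-feed-error-rate",
    Namespace="L3Atom/Placeholder",           # hypothetical custom namespace
    MetricName="FeedErrors",                  # hypothetical custom metric
    Statistic="Sum",
    Period=60,                                # evaluate one-minute windows
    EvaluationPeriods=1,
    Threshold=10.0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```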

Data Collection and Warehousing

For data collection, GDA has designed microservices that connect to the data source and stream the data to our data storage.

Data Lake

The data lake utilises an Amazon S3 bucket to store the data. Data feeds stream into a buffer which flushes every minute into the data lake. This setup makes it easy to do research with the data using Amazon SageMaker and EMR clusters.
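
A simplified sketch of the buffer-and-flush pattern, with a placeholder bucket name; each flush writes the buffered messages as one newline-delimited JSON object into the lake:

```python
import json
import time
import boto3

s3 = boto3.client("s3")
buffer, last_flush = [], time.time()

def on_message(msg):
    global last_flush
    buffer.append(msg)
    if time.time() - last_flush >= 60:                        # flush every minute
        body = "\n".join(json.dumps(m) for m in buffer)
        key = f"raw/{int(last_flush)}.ndjson"                 # time-partitioned key
        s3.put_object(Bucket="example-data-lake", Key=key, Body=body.encode())
        buffer.clear()
        last_flush = time.time()
```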

Data Warehouse

Our data warehouse will support low-latency data streaming through our custom-built adapters. Its design is based on an OLAP cube.

Data Normalisation and Enhancement

Because of the nature of the data and the data analytics that we will be doing, incoming data that GDA collects needs to be normalised. When we say normalisation, we are referring to the normalisation of data within a database schema.

Database normalisation involves a number of steps, starting at the first normal form (1NF) and going up to the fifth normal form (5NF). Our implementation applies the first three normal forms:

  • First normal form — 1NF
  • Second normal form — 2NF
  • Third normal form — 3NF

Implementing normalisation makes inserting, updating and removing data much easier and simplifies the extraction of data metrics from the order book, especially where these metrics are aggregating data.

Open Data Streaming Suite (ODSS)

Extremely low latency and ultra-reliable.

The ODSS consists of market data adapters where Lite, Advanced and Max users can subscribe to real-time and historical data streams.

List of API and WebSocket Connections

The following shows the available connections users have access to:

REST API (HTTPS)

  • Clients connect to their assigned URL
  • The client must authenticate with a JWT before streaming data, using a client_id and password or an API token
  • The HTTP connection is then upgraded to a WebSocket, and real-time data is streamed over the WebSocket interface
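
A sketch of that flow from the client side, with placeholder URLs and field names (the real endpoints and payloads are defined by the ODSS):

```python
import asyncio
import requests
import websockets

def get_token(client_id: str, password: str) -> str:
    # Hypothetical auth endpoint: exchange client_id/password for a JWT
    resp = requests.post(
        "https://api.example-gda.com/auth",
        json={"client_id": client_id, "password": password},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

async def stream(token: str) -> None:
    # Placeholder stream URL; the JWT is presented when the connection is upgraded
    url = f"wss://stream.example-gda.com/v1/lob?token={token}"
    async with websockets.connect(url) as ws:
        async for message in ws:
            print(message)

# asyncio.run(stream(get_token("my-client-id", "my-password")))
```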

Apache Pulsar

  • Clients can join our native Pulsar messaging system and stream payloads in real time
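
For example, with the Apache Pulsar Python client (the service URL, topic, and subscription names below are placeholders):

```python
import pulsar

client = pulsar.Client("pulsar://broker.example-gda.com:6650")
consumer = client.subscribe("market-data", subscription_name="my-subscription")

msg = consumer.receive()          # blocks until a payload arrives
print(msg.data())
consumer.acknowledge(msg)
client.close()
```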

Apache Kafka

  • Clients can join our native Kafka infrastructure to stream real-time data
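
For example, with kafka-python (the broker address and topic name are placeholders):

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "market-data",                                   # hypothetical topic name
    bootstrap_servers="kafka.example-gda.com:9092",
    auto_offset_reset="latest",
)
for record in consumer:
    print(record.value)
```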

Additional dissemination options include:

  • Redis (Streams and Sorted Sets)
  • Arctic
  • ZeroMQ
  • RabbitMQ
  • PostgreSQL
  • PubNub

Security Management

Throttling controls and duplicate-IP restrictions are implemented to avoid adverse back pressure on our market data dissemination system, ensuring a fast and reliable service for all users. Upon connection, client IDs are identified and the corresponding user details and permissions are looked up. Source IP and browser information are ascertained, and connections can be throttled according to the user’s permissions. Max users will not be subject to throttling.
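
The exact controls are internal, but a per-client token bucket is a standard way to implement this kind of throttling; the sketch below uses illustrative rates and tiers.

```python
import time

class TokenBucket:
    """Allow bursts up to `burst` messages, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.updated = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g. a Lite-tier client limited to 20 messages per second;
# Max users would simply bypass the bucket.
lite_bucket = TokenBucket(rate_per_sec=20, burst=40)
```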

A comprehensive article outlining the open data initiative can be found here.