Crypto L3 Atom
Big market participants can manipulate data, spoof trades, and commit other malicious acts to the detriment of data quality. Exchanges can and do throttle user connections to limit their data capabilities; there is no incentive for them to deliver high-quality data. The lowest possible data latency (time delays), data granularity, iteration speed and data flexibility are the bedrock of the quantitative and HFT world.
Today's crypto market data
✗ Non-regulated, hard to sue
✗ LOB events are aggregated (only three event types are reported in the LOB)
✗ Changes to the data are announced only as they hit the WebSocket feeds
✗ Connections run over the public internet and drop constantly (part of a throttling policy)
✗ A corrupted matching engine lets preferred users jump the queue
✗ Free, but with no SLA
Traditional market data
✓ Highly regulated; changes to the data are announced years in advance
✓ Protocols such as FIX have hundreds of message types, including news and email
✓ Connections run over a private network
✓ All connections have identical latency, in the nanoseconds
✓ The matching engine works on a strict FIFO basis
✓ Expensive, but with strict SLAs
Our BIG Innovation
Information is the most valuable commodity in the world
Whoever has access to the most exhaustive, reliable, and fastest data is poised to come out on top. Using high-performance computing (HPC) techniques and 3rd generation Intel Xeon Scalable processors (code-named Ice Lake), we have addressed crypto’s biggest problem — unreliable, dirty data. Our innovations enable us to reverse engineer the limit order book at an atomic level in real-time, opening up limitless possibilities.
Unlike in the heavily regulated traditional markets, crypto exchanges have no obligation to provide clean, accurate data. The data they do provide is plagued by throttling mechanisms, glitches and disconnections. To provide a next-generation order book for users, GDA has developed primary and replica microservices that connect through different Content Delivery Networks, ensuring that if a connection is lost, data is still collected from a replica microservice.
Some of the innovative technologies we use to overcome these issues include:
High-Performance Computing and Better Networking Engineering
Hardware optimised for computation
c6i.metal EC2 instance optimised for computation
3rd Generation Intel Xeon Scalable processors (code-named Ice Lake)
L1 bandwidth of 100 GB/s
AVX-512 vector instruction set
10 nm process architecture
Kernel Bypass Networking With FD.io and Vector Parallel Processing
Data Plane Development Kit (DPDK)
FD.io VPP - the "Magic" of Vectors
Intel Xeon instruction and data caches kept hot
Minimized memory latency and usage
1 Tb/s Throughput
Max Software Optimisation
Python HPC Techniques
NdArray: Shape shifting
C++ HPC Techniques
Arithmetic intensity and the roofline model
Memory traffic, unit stride, data alignment, optimised cache reuse
SIMD-enabled functions
Intel C++ compiler
Intel OpenMP for multi-threading
Intel MPI for distributed computing
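As a minimal illustration of the SIMD-enabled-function idea above, the sketch below contrasts a scalar Python loop with a vectorized NumPy expression that runs over contiguous, aligned buffers, which is what lets the runtime dispatch to SIMD kernels. The data and function names are illustrative, not GDA's code.

```python
import numpy as np

def mid_prices_loop(bids, asks):
    """Scalar baseline: one element per iteration, low arithmetic intensity."""
    out = [0.0] * len(bids)
    for i in range(len(bids)):
        out[i] = (bids[i] + asks[i]) / 2.0
    return out

def mid_prices_vectorized(bids, asks):
    """Vectorized form: NumPy applies the operation across whole
    contiguous buffers, enabling SIMD execution under the hood."""
    return (bids + asks) / 2.0

bids = np.array([99.5, 100.0, 100.5])
asks = np.array([100.5, 101.0, 101.5])
print(mid_prices_vectorized(bids, asks).tolist())  # [100.0, 100.5, 101.0]
```

Both functions compute the same result; the vectorized one is the shape that compilers and NumPy's kernels can turn into wide SIMD instructions.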
Creating Rich Data, Normalisation and Enhancement
Snapshots, trades and the 3 LOB events are processed in memory, in real time
The LOB is rebuilt in memory and metrics are calculated in real time
Using multidimensional NdArrays and shape shifting, we move from 3 events to 8 atomic events in the LOB
We know our position in the LOB price queues (enhanced L3)
Data is normalised and enhanced with the above information in real time
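The "shape shifting" step can be sketched with NumPy: a flat decoded buffer is reinterpreted as an events-by-fields matrix without copying. The field layout (event_type, price, size) and the numeric event codes below are assumptions for illustration, not GDA's actual wire format.

```python
import numpy as np

# Hypothetical decoded feed: a flat buffer of (event_type, price, size)
# triples, where 1 = add, 2 = modify, 3 = delete (codes invented here).
flat = np.array([1.0, 100.0, 5.0,
                 2.0, 100.5, 3.0,
                 3.0, 100.5, 0.0])

# "Shape shifting": reinterpret the same memory as an (events x fields)
# matrix. reshape returns a view over the original buffer, so no copy.
events = flat.reshape(-1, 3)
print(events.shape)          # (3, 3)
print(events.base is flat)   # True: same underlying buffer

# Per-field views (e.g. all prices) are again zero-copy slices:
prices = events[:, 1]
print(prices.tolist())       # [100.0, 100.5, 100.5]
```

Because reshaping and slicing produce views rather than copies, the hot path stays within the CPU caches, which is the point of the in-memory design described above.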
Statistical Analytics on Market Quality
SciPy calculates order flow analytics
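As a hedged sketch of the kind of order-flow analytic meant here (the production system uses SciPy, per the text; this example sticks to the standard library, and the imbalance definition and interval volumes are illustrative):

```python
from statistics import mean, stdev

def order_flow_imbalance(buy_volumes, sell_volumes):
    """Signed imbalance per interval: +1 = all buying, -1 = all selling."""
    return [(b - s) / (b + s) for b, s in zip(buy_volumes, sell_volumes)]

# Traded volume per interval, split by aggressor side (made-up numbers):
buys  = [120.0,  80.0, 200.0]
sells = [ 80.0, 120.0,  50.0]

ofi = order_flow_imbalance(buys, sells)
print(ofi)                                    # [0.2, -0.2, 0.6]
print(round(mean(ofi), 6), round(stdev(ofi), 6))  # 0.2 0.4
```

Summary statistics over such per-interval series are the raw material for the market-quality analytics listed above.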
Atomic Granularity for Powerful L3 Data
Get a deeper understanding of the order book and liquidity
Mapping participant market impact to market-microstructure correlations
Participant market impact
Participant fill rate
Mapping value areas and liquidity & volume clusters
Shapeshifting phenomenon detection
Liquidity trap and SFP detection
Market-maker (MM) fakeout density
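As one concrete example of these L3 metrics, a participant fill rate can be computed as filled quantity over submitted quantity. The (submitted, filled) pair shape below is a hypothetical simplification of the atomic events, not GDA's schema:

```python
def fill_rate(orders):
    """Filled quantity over submitted quantity for one participant.

    `orders` is a list of (submitted_qty, filled_qty) pairs, a
    hypothetical simplification of atomic L3 order events."""
    submitted = sum(o[0] for o in orders)
    filled = sum(o[1] for o in orders)
    return filled / submitted if submitted else 0.0

# Three orders by one participant: fully filled, partial, unfilled.
print(fill_rate([(10.0, 10.0), (20.0, 5.0), (10.0, 0.0)]))  # 0.375
```

Tracking this per participant requires knowing queue position in the book, which is exactly what the enhanced L3 reconstruction provides.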
Typical Use Cases
GDA’s Open Crypto Data Initiative will be an essential part of crypto’s expansion into more institutional research, backtesting, and optimisation activities such as:
Research & development
Advanced quantitative research & backtesting, insights into market microstructure and order book dynamics, designing market behaviour models, data-driven contract insights, and liquidity cluster heat-map development.
Predictive Analytics & Forecasting
Liquidity-cluster shapeshifting phenomena, market-microstructure correlations, market strength & weakness indices, trend development, identification of trading ranges and trend distributions, and estimation of fluctuations and volatility.
Signal contract strategy development, multi-asset accelerated signals, and cross-strategy design and development (generating alpha by combining strategies across CEX/DEX).
Execution & Optimisation
Enhancing trade execution (CEX & DEX), smart order routing, deeper liquidity cost optimisation, HFT load-balancing, pricing & fee optimisation, and cost-effective market entry.
Liquidity aggregation, portfolio construction, composition & analytics, and rebalancing on reliable, lightning-fast data streams.
Advanced risk-model development, tail hedging and liquidation management.
Incorporated Design Principles
Integrating Multiple Content Delivery Networks (CDN)
Our infrastructure uses individual microservices to connect to multiple exchange CDNs. This allows us to duplicate the same stream over multiple pathways per product per exchange, enabling reliable and consistent data feeds. For example, if one CDN goes down in the US, the other CDN in Germany will still be up and running, ensuring an uninterrupted data stream.
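The primary/replica failover pattern can be sketched as follows; the feed names and in-process callables are illustrative stand-ins for real per-CDN microservice connections:

```python
def read_with_failover(feeds):
    """Try each CDN feed in order and return the first message obtained.
    `feeds` is a list of (name, callable) pairs; names are invented."""
    errors = []
    for name, feed in feeds:
        try:
            return name, feed()
        except ConnectionError as exc:
            errors.append((name, str(exc)))
    raise ConnectionError(f"all feeds down: {errors}")

def us_cdn():
    raise ConnectionError("primary dropped")  # simulated US outage

def eu_cdn():
    return {"bid": 100.0, "ask": 100.1}       # replica still streaming

source, msg = read_with_failover([("us", us_cdn), ("eu", eu_cdn)])
print(source, msg)  # eu {'bid': 100.0, 'ask': 100.1}
```

In production the replicas stream continuously rather than being polled on failure, but the ordering-and-fallback logic is the same idea.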
Federated Consensus Algorithm for the Valid Price
As we aggregate data from multiple sources, one issue to address is determining the valid price for a product. The federated consensus algorithm aggregates the values from all sources and separates true values from outliers.
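The text does not specify the exact algorithm, so the sketch below shows one common robust-aggregation approach (median plus MAD-based outlier rejection) as a stand-in for the federated consensus step; parameters and quotes are invented:

```python
from statistics import median

def consensus_price(quotes, k=3.0):
    """Robust aggregate of per-exchange prices: take the median, drop
    quotes more than k median-absolute-deviations away, and average
    the survivors. A stand-in sketch, not the published algorithm."""
    med = median(quotes)
    mad = median(abs(q - med) for q in quotes) or 1e-12
    inliers = [q for q in quotes if abs(q - med) <= k * mad]
    return sum(inliers) / len(inliers)

# Five feeds, one reporting a glitched price; the outlier is discarded:
print(round(consensus_price([100.1, 100.2, 100.0, 100.2, 250.0]), 6))  # 100.125
```

Any consensus scheme of this family trades off outlier robustness against sensitivity to genuine fast moves, so the rejection threshold `k` matters in practice.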
Custom Data Dissemination Methods
Our data feeds are available over custom-built, reliable adapters.
Simple-to-Understand Data Structure
High-quality and reliable data does not have to mean complex, hard-to-comprehend data structures. Our R&D team adopts NoSQL object formats and keeps the data as flat as possible, with no deeply nested structures. With this simplicity in mind, any trader can take full advantage of GDA's advanced data environment.
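The flat-versus-nested idea can be illustrated with a generic flattener that collapses nested objects into single-level dotted keys; the field names in the example are invented:

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into a single-level dict with dotted keys."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

# A deeply nested tick (field names invented) and its flat equivalent:
nested = {"data": {"book": {"bids": {"level0": {"px": 100.0, "qty": 5}}}}}
print(flatten(nested))
# {'data.book.bids.level0.px': 100.0, 'data.book.bids.level0.qty': 5}
```

A flat record like the output above can be consumed with a single key lookup, with no path traversal on the client side.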
Monitoring of our infrastructure is possible through Amazon CloudWatch and OpenSearch dashboards. Alarms are carefully set up to report any extremes in price, data usage, error messages and costs, with all data volumes logged.
Data Collection and Warehousing
For data collection, GDA has designed microservices that connect to the data source and stream the data to our data storage.
The data lake utilises an Amazon S3 bucket to store the data. Data feeds stream into a buffer which flushes every minute into the data lake. This setup makes it easy to do research with the data using Amazon SageMaker and EMR clusters.
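The buffer-that-flushes-every-minute pattern might look like the sketch below; the `sink` callable stands in for the S3 upload (e.g. via boto3) in production, and a fake clock is injected so the example runs instantly:

```python
import time

class FlushingBuffer:
    """Sketch of the buffer-and-flush pattern: messages accumulate in
    memory and are written to storage once per interval. `sink` stands
    in for the data-lake upload used in production."""

    def __init__(self, sink, interval_s=60.0, clock=time.monotonic):
        self.sink, self.interval_s, self.clock = sink, interval_s, clock
        self.buffer, self.last_flush = [], clock()

    def add(self, message):
        self.buffer.append(message)
        if self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))  # one object per flush window
            self.buffer.clear()
        self.last_flush = self.clock()

# Usage with a fake clock so the one-minute window elapses instantly:
written = []
now = [0.0]
buf = FlushingBuffer(written.append, interval_s=60.0, clock=lambda: now[0])
buf.add({"px": 100.0})
now[0] = 61.0
buf.add({"px": 100.5})  # the interval has elapsed, so this add flushes
print(len(written), len(written[0]))  # 1 2
```

Batching one object per window keeps the number of S3 objects manageable while making minute-level files easy to query from SageMaker or EMR.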
Our data warehouse will support low latency data streaming through our custom-built adapters. The design of the data warehouse is based on an OLAP Cube.
Data Normalisation and Enhancement
Because of the nature of the data and the data analytics that we will be doing, incoming data that GDA collects needs to be normalised. When we say normalisation, we are referring to the normalisation of data within a database schema.
Database normalisation has a number of steps involved starting at the first normal form (1NF) and going up to the fifth normal form (5NF). Our normalisation implementation consists of the first three normal forms:
First normal form — 1NF
Second normal form — 2NF
Third normal form — 3NF
Implementing normalisation makes inserting, updating and removing data much easier and simplifies the extraction of data metrics from the order book, especially where these metrics are aggregating data.
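Using SQLite purely as an illustration (the table and column names are invented, not GDA's schema), the 3NF idea looks like this: repeated exchange/instrument attributes are factored out of the trade records, and metric aggregation becomes a simple join:

```python
import sqlite3

# In-memory database; repeated (exchange, symbol) attributes are
# factored into their own tables so trade rows stay small (3NF).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE exchange (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE instrument (
        id INTEGER PRIMARY KEY,
        exchange_id INTEGER REFERENCES exchange(id),
        symbol TEXT);
    CREATE TABLE trade (
        id INTEGER PRIMARY KEY,
        instrument_id INTEGER REFERENCES instrument(id),
        price REAL, size REAL, ts INTEGER);
""")
db.execute("INSERT INTO exchange (name) VALUES ('binance')")
db.execute("INSERT INTO instrument (exchange_id, symbol) VALUES (1, 'BTC-USDT')")
db.executemany(
    "INSERT INTO trade (instrument_id, price, size, ts) VALUES (1, ?, ?, ?)",
    [(100.0, 1.0, 1), (101.0, 2.0, 2)])

# An aggregating metric (VWAP) is a plain join over the normalised tables:
vwap, = db.execute("""
    SELECT SUM(price * size) / SUM(size) FROM trade
    JOIN instrument ON instrument.id = trade.instrument_id
    WHERE instrument.symbol = 'BTC-USDT'
""").fetchone()
print(vwap)  # = (100*1 + 101*2) / 3
```

Because the instrument attributes live in one place, renaming a symbol or correcting an exchange detail is a single-row update rather than a scan over every trade.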
Open Data Streaming Suite (ODSS)
Extremely low latency and ultra-reliable.
The ODSS consists of market data adapters where Lite, Advanced and Max users can subscribe to real-time and historical data streams.
List of API and WebSocket Connections
The following shows the available connections users have access to:
REST API (HTTPS)
Clients can connect to the assigned URL
The client has to be authenticated with JWT before streaming data, using client_id and password or API token
The HTTP connection is upgraded to a WebSocket, and real-time data is streamed over the WebSocket interface
The client can join our native Pulsar messaging system and stream payloads in real time
Clients can join our native Kafka infrastructure to stream real-time data
Redis (Streams and Sorted Sets)
Throttling controls and duplicate-IP restrictions are implemented to avoid any adverse back pressure on our market data dissemination system; this ensures a fast and reliable service for all users. Upon connection, client IDs are identified and the corresponding user details and permissions are looked up. Source IP and browser information are ascertained and can be throttled according to the user's permissions. Max users will not be subject to throttling.
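One standard way to implement such per-client throttling is a token bucket; the sketch below is illustrative, with made-up rate and capacity parameters, and is not GDA's actual limiter:

```python
import time

class TokenBucket:
    """Minimal per-client rate limiter: each request spends one token,
    and tokens refill continuously at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens, self.last = capacity, clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Usage with a frozen clock: a burst of 5 against a capacity of 3.
now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: now[0])
print([bucket.allow() for _ in range(5)])  # [True, True, True, False, False]
```

One bucket per client ID (with "Max" users given an effectively unbounded capacity) reproduces the tiered behaviour described above.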
A comprehensive article outlining the open data initiative can be found here.