Big market participants can manipulate data, spoof trades, and commit other malicious acts to the detriment of data quality. Exchanges can and do throttle user connections to limit their data capabilities; there is no incentive for them to deliver high-quality data. The lowest possible data latency (time delays), fine data granularity, fast iteration speed, and data flexibility are the bedrock of the quantitative and HFT world.
Whoever has access to the most exhaustive, reliable, and fastest data is poised to come out on top. Using high-performance computing (HPC) techniques and 3rd generation Intel Xeon Scalable processors (code-named Ice Lake), we have addressed crypto’s biggest problem — unreliable, dirty data. Our innovations enable us to reverse engineer the limit order book at an atomic level in real-time, opening up limitless possibilities.
Unlike in the heavily regulated traditional markets, crypto exchanges have no obligation to provide clean, accurate data. The data they do provide is plagued by throttling mechanisms, glitches, and disconnections. To deliver a next-generation order book, GDA has developed primary and replica microservices across different Content Delivery Networks, ensuring that if a connection is lost, data is still collected from a replica microservice.
Some of the innovative technologies we use to overcome these issues include:
Hardware optimised for computation
Kernel Bypass Networking With FD.io and Vector Parallel Processing
Python HPC Techniques
C++ HPC Techniques
Processing snapshots, trades, and L3 LOB events in-memory, in real time
Statistical Analytics on Market Quality
Get a deeper understanding of the order book and liquidity
GDA’s Open Crypto Data Initiative will be an essential part of crypto’s expansion into more institutional research, backtesting, and optimisation activities such as:
Research & development
Advanced quantitative research & backtesting, insights into market microstructure and order book dynamics, designing market behaviour models, data-driven contract insights, and liquidity cluster heat-map development.
Predictive Analytics & Forecasting
Liquidity-cluster shapeshifting phenomena, market microstructure correlations, market strength & weakness indices, trend development, identifying trading ranges and trend distributions, and estimating fluctuations and volatility.
Signal contact strategy development, multi-asset accelerated signals, and cross-strategy design and development (generating alpha by combining strategies across CEX/DEX).
Execution & Optimisation
Enhancing trade execution (CEX & DEX), smart order routing, deeper liquidity cost optimisation, HFT load-balancing, pricing & fee optimisation, and cost-effective market entry.
Liquidity aggregation, portfolio construction, composition & analytics, and rebalancing for reliable and lightning-fast data streams.
Advanced risk-model development, tail-hedging, and liquidation management.
Integrating Multiple Content Delivery Networks (CDN)
Our infrastructure uses individual microservices to connect to multiple exchange CDNs. This allows us to duplicate the same stream across multiple pathways per product per exchange, enabling reliable and consistent data feeds. For example, if one CDN goes down in the US, the other CDN in Germany will still be up and running, ensuring an uninterrupted data stream.
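A minimal sketch of this failover behaviour is shown below. The endpoint URLs and the `connect` callable are illustrative assumptions, not GDA's actual infrastructure; the point is the ordering: try the primary CDN first, fall through to the replica on failure.

```python
# Hypothetical endpoints standing in for per-product, per-exchange CDN streams.
ENDPOINTS = [
    "wss://cdn-us.example-exchange.com/stream",   # primary CDN (US)
    "wss://cdn-de.example-exchange.com/stream",   # replica CDN (Germany)
]

def connect_with_failover(endpoints, connect):
    """Try each CDN endpoint in order; return the first live connection.

    `connect` is any callable that opens a stream or raises ConnectionError
    on failure (e.g. a WebSocket client wrapper).
    """
    last_error = None
    for url in endpoints:
        try:
            return url, connect(url)
        except ConnectionError as exc:
            last_error = exc  # primary down: fall through to the replica
    raise ConnectionError(f"all CDN endpoints failed: {last_error}")
```

In practice the same pattern would wrap a reconnect loop, so a dropped primary connection transparently resumes from the replica stream.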
Federated Consensus Algorithm for the Valid Price
As we aggregate data from multiple sources, one issue to address is determining the valid price for a product. The federated consensus algorithm aggregates the values from each source and separates true values from outliers.
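The details of the federated consensus algorithm are not public, so the sketch below substitutes a simple, well-known outlier rule (median absolute deviation) to illustrate the idea: quotes far from the cross-source median are discarded before averaging.

```python
import statistics

def consensus_price(quotes, k=3.0):
    """Aggregate per-source prices into one valid price.

    Illustrative only: outliers are flagged with a median absolute
    deviation (MAD) rule, and the surviving quotes are averaged.
    """
    median = statistics.median(quotes)
    mad = statistics.median(abs(q - median) for q in quotes)
    if mad == 0:
        return median  # all sources (near-)agree
    inliers = [q for q in quotes if abs(q - median) <= k * mad]
    return statistics.fmean(inliers)
```

For example, a spoofed or glitched quote of 140.0 among quotes near 100.0 would be excluded, leaving the consensus price at roughly 100.0.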
Custom Data Dissemination Methods
Our data feeds are available over custom-built, reliable adapters.
Simple-to-Understand Data Structure
High-quality, reliable data does not require complex, hard-to-comprehend data structures. Our R&D team adopts NoSQL object formats and keeps the data as flat as possible, meaning no infinitely nested complex data structures. With this simplicity in mind, any trader can take full advantage of GDA’s advanced data environment.
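To illustrate what "flat" means here (the field names are hypothetical, not GDA's actual schema), a nested object can be collapsed into single-level key/value pairs:

```python
def flatten(obj, prefix=""):
    """Flatten a nested NoSQL-style object into a single-level dict.

    Nested keys are joined with underscores, so downstream tooling can
    consume flat key/value pairs with no recursive traversal.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat
```

So a nested record like `{"bid": {"price": 1.0, "size": 2}, "ts": 5}` becomes `{"bid_price": 1.0, "bid_size": 2, "ts": 5}`.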
Monitoring of our infrastructure is possible through Amazon CloudWatch and OpenSearch dashboards. Alarms are carefully configured to report any extremes in price, data usage, error messages, and costs, with all data volumes being logged.
For data collection, GDA has designed microservices that connect to the data source and stream the data to our data storage.
The data lake utilises an Amazon S3 bucket to store the data. Data feeds stream into a buffer which flushes every minute into the data lake. This setup makes it easy to do research with the data using Amazon SageMaker and EMR clusters.
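The buffer-and-flush step can be sketched as below. The `sink` callable stands in for the S3 upload (e.g. a `boto3` `put_object` call), which is assumed rather than shown; the interval and record shapes are illustrative.

```python
import time

class MinuteBuffer:
    """Buffer incoming feed records and flush them in one batch.

    Records accumulate in memory and are written out together once the
    flush interval elapses, producing one data-lake object per flush.
    """
    def __init__(self, sink, interval_s=60.0, clock=time.monotonic):
        self.sink = sink              # stand-in for the S3 upload
        self.interval_s = interval_s
        self.clock = clock
        self.records = []
        self.last_flush = clock()

    def add(self, record):
        self.records.append(record)
        if self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        if self.records:
            self.sink(self.records)   # one object per flush in the lake
        self.records = []
        self.last_flush = self.clock()
```

Injecting the clock keeps the flush policy testable without waiting a real minute, and batching keeps S3 object counts (and request costs) low.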
Our data warehouse will support low-latency data streaming through our custom-built adapters. The design of the data warehouse is based on an OLAP Cube.
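As a toy illustration of OLAP-cube-style aggregation (the warehouse's real dimensions and measures are assumptions here), trades can be rolled up along dimensions such as exchange, product, and time bucket:

```python
from collections import defaultdict

def build_cube(trades):
    """Roll trades up along the cube dimensions (exchange, product, minute).

    Each cell holds the measures for one combination of dimension values,
    ready to be sliced or diced along any axis.
    """
    cube = defaultdict(lambda: {"volume": 0.0, "trades": 0})
    for t in trades:
        cell = cube[(t["exchange"], t["product"], t["ts"] // 60)]
        cell["volume"] += t["size"]
        cell["trades"] += 1
    return dict(cube)
```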
Because of the nature of the data and the data analytics that we will be doing, incoming data that GDA collects needs to be normalised. When we say normalisation, we are referring to the normalisation of data within a database schema.
Database normalisation involves a number of steps, starting at the first normal form (1NF) and going up to the fifth normal form (5NF). Our normalisation implementation consists of the first three normal forms (1NF, 2NF, and 3NF).
Implementing normalisation makes inserting, updating, and removing data much easier and simplifies the extraction of data metrics from the order book, especially where those metrics aggregate data.
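A hypothetical schema (not GDA's actual one) shows the shape of the first three normal forms: repeating exchange and product attributes are factored into their own tables, so order-book events reference them by key instead of duplicating them, and aggregation stays a simple join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exchange (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE product  (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
CREATE TABLE trade (
    id          INTEGER PRIMARY KEY,
    exchange_id INTEGER REFERENCES exchange(id),  -- no repeated exchange names
    product_id  INTEGER REFERENCES product(id),   -- no repeated symbols
    price       REAL,
    size        REAL
);
""")
conn.execute("INSERT INTO exchange (name) VALUES ('ExA')")
conn.execute("INSERT INTO product (symbol) VALUES ('BTC-USD')")
conn.executemany(
    "INSERT INTO trade (exchange_id, product_id, price, size) VALUES (1, 1, ?, ?)",
    [(100.0, 0.5), (101.0, 1.5)],
)

# Aggregating a metric stays simple: one join, one filter.
total_size, = conn.execute(
    "SELECT SUM(size) FROM trade t JOIN product p ON p.id = t.product_id "
    "WHERE p.symbol = 'BTC-USD'"
).fetchone()
```

Updating a product symbol now touches one row in `product` rather than every trade, which is exactly the update-anomaly avoidance normalisation buys.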
The ODSS consists of market data adapters where Lite, Advanced and Max users can subscribe to real-time and historical data streams.
List of API and WebSocket Connections
The following shows the available connections users have access to:
REST API (HTTPS)
Redis (Streams and Sorted Sets)
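A small sketch of consuming such a feed over Redis Streams: the decode function works on the reply shape redis-py's `xread` returns (a list of `(stream, entries)` pairs, each entry an `(entry_id, fields)` tuple). The stream key and field names are illustrative assumptions about the feed.

```python
def decode_xread_reply(reply):
    """Turn a Redis XREAD reply into a flat list of tick dicts."""
    ticks = []
    for stream, entries in reply:
        for entry_id, fields in entries:
            ticks.append({"stream": stream, "id": entry_id, **fields})
    return ticks

# Hedged usage sketch (requires a running Redis server and redis-py):
# import redis
# r = redis.Redis(decode_responses=True)
# reply = r.xread({"gda:btc-usd:trades": "$"}, block=1000)
# ticks = decode_xread_reply(reply)
```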
Throttling controls and duplicate-IP restrictions are implemented to avoid any adverse back pressure on our market data dissemination system; this ensures a fast and reliable service for all users. Upon connection, client IDs are identified and the corresponding user details and permissions are looked up. Source IP and browser information are ascertained and can be throttled according to the user’s permissions. Max users will not be subject to throttling.
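This tier-aware throttling can be sketched with a standard token bucket. The "Max users are unthrottled" rule comes from the text above; the rates, tier names as string literals, and bucket parameters are illustrative assumptions.

```python
import time

class TokenBucket:
    """Per-client token bucket for request throttling."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def allow(self):
        # Refill proportionally to elapsed time, then spend one token if able.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def allow_request(user_tier, bucket):
    # Max users bypass throttling entirely, per the policy above.
    return True if user_tier == "Max" else bucket.allow()
```

A bucket per (client ID, source IP) pair would implement the duplicate-IP restriction: a second connection from the same IP drains the same bucket instead of getting a fresh allowance.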
A comprehensive article outlining the open data initiative can be found here.