Industry: Consumer electronics
Author: Zidong Liu (Database Engineer at Xiaomi)
Transcreator: Ran Huang; Editor: Tom Dewan
Xiaomi Corporation is a consumer electronics company founded in 2010, with smartphones and smart hardware connected by an Internet of Things (IoT) platform at its core. We have established a leading consumer AI+IoT (AIoT) platform, with 324.8 million smart devices connected to our platform, excluding smartphones and laptops.
Our hypergrowth has brought massive data volumes. We used MySQL in many applications, but we ran into some problems:
To address these issues, we migrated from MySQL to TiDB, an open source, cloud-native, distributed SQL database that is compatible with the MySQL protocol. We also integrated TiDB into our private cloud platform and provided service for various Xiaomi applications. TiDB not only solves the problems above, but also provides better disaster recovery, boosts analytical performance, and lowers our maintenance costs.
In this article, I'll explain how we outgrew MySQL and how TiDB is a better alternative for us. Then, I'll share our experience with TiDB and our ambitions for building a hybrid cloud infrastructure with TiDB.
As our company grew, our MySQL-centric infrastructure gradually lagged behind our business expansion. The traditional standalone DBMS no longer met our expectations.
One major issue was storage bottlenecks. At most, a single drive used by MySQL at Xiaomi could store 2.6 TB of data. Yet many applications’ data already exceeded 2 TB, leaving only a few hundred GB free. A common way to increase storage was to shard the database. However, sharding could be a costly move unless you understood the application logic thoroughly. Many of our applications were older, so even if we wanted to refactor the code, it was hard to figure out where to begin.
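To illustrate why manual sharding is hard to change once it is in place, here is a minimal sketch. It assumes a naive scheme that routes each row by `user_id % shard_count`. The function names and the ID range are hypothetical and are not taken from Xiaomi's applications.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Route a row to a shard with naive modulo sharding (illustrative scheme only)."""
    return user_id % shard_count


def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Return the fraction of rows whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for uid in ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(ids)


if __name__ == "__main__":
    # Hypothetical workload: 20,000 sequential user IDs, growing from 4 to 5 shards.
    print(moved_fraction(range(20_000), 4, 5))
```

In this toy setup, adding a single shard relocates 80% of the rows, and every code path that computes the shard has to change with it. That is the kind of cost a database that scales out on its own avoids.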
A distributed, MySQL-compatible database like TiDB efficiently solves this problem. It's distributed, so scaling out to store more data is a piece of cake. It's MySQL compatible, so migrating from MySQL doesn't require drastic code changes.
MySQL's high availability solutions were complicated. Under the source-replica architecture, a cluster has one source node and multiple replica nodes. Without the source node, the cluster could not function.
We tried many methods to achieve high availability, such as combining a load balancer, Orchestrator, and middleware. These sometimes worked. However, whenever the source node crashed on a mission-critical application, our maintenance staff would be so worried about whether the database service would pull through that they couldn't sleep.
In MySQL, the source node accepted writes, and the other nodes replicated data from it. When an application performed heavy writes, the source node easily became a bottleneck, which caused high replication latency between the source and its replicas. This latency was a nagging pain: data was not replicated quickly enough across nodes, so applications could read inconsistent data from different nodes, which was a disaster.
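The stale-read problem described above can be sketched with a toy model. It assumes a single source that logs every write and a replica that replays that log with limited throughput. The class and method names are hypothetical; this is not MySQL's replication code.

```python
class Source:
    """Accepts all writes and keeps an ordered log of them."""

    def __init__(self):
        self.log = []   # ordered (key, value) entries
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Replays the source's log in order and may fall behind it."""

    def __init__(self):
        self.applied = 0  # number of log entries already replayed
        self.data = {}

    def catch_up(self, source, max_entries):
        """Apply at most max_entries pending entries (models limited replay speed)."""
        end = min(len(source.log), self.applied + max_entries)
        for key, value in source.log[self.applied:end]:
            self.data[key] = value
        self.applied = end

    def lag(self, source):
        """Number of log entries the replica has not applied yet."""
        return len(source.log) - self.applied


if __name__ == "__main__":
    source, replica = Source(), Replica()
    for i in range(100):            # a burst of heavy writes on the source
        source.write("balance", i)
    replica.catch_up(source, 10)    # the replica only keeps up with 10 of them
    print(source.data["balance"], replica.data["balance"], replica.lag(source))
```

After the burst, the source holds the latest value while the replica still serves an old one, so two reads of the same key can disagree depending on which node answers.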
TiDB is suitable for highly concurrent writes. Its storage layer splits data into Regions, replicates each Region with the Raft consensus algorithm, and spreads the Region Leaders across nodes, so applications can write data on multiple nodes at the same time.
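As a rough illustration of how spreading Leaders spreads writes, the sketch below assumes data is split into contiguous key ranges (TiKV calls them Regions), each with one Leader that accepts its writes. The start keys, store names, and round-robin placement are assumptions made for the example; real Leader scheduling in TiDB is more involved.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: six Regions of 1,000 non-negative integer keys each.
REGION_START_KEYS = [0, 1000, 2000, 3000, 4000, 5000]
STORES = ["tikv-1", "tikv-2", "tikv-3"]


def assign_leaders(region_count, stores):
    """Place Region Leaders round-robin (a stand-in for real scheduling)."""
    return [stores[i % len(stores)] for i in range(region_count)]


def leader_for_key(key, start_keys, leaders):
    """Find the Region that contains key and return the store holding its Leader."""
    region = bisect_right(start_keys, key) - 1
    return leaders[region]


def write_load(keys, start_keys, leaders):
    """Count how many writes each store receives for the given keys."""
    return Counter(leader_for_key(k, start_keys, leaders) for k in keys)


if __name__ == "__main__":
    leaders = assign_leaders(len(REGION_START_KEYS), STORES)
    print(write_load(range(6000), REGION_START_KEYS, leaders))
```

With uniformly spread keys, each of the three stores handles a third of the writes in this model, whereas a single-source setup would send all 6,000 to one node.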
Xiaomi does business in manufacturing, internet, IoT, financial payments, and so on. We use many applications, each of which may have its own middleware solution. It was a huge burden to maintain all of them.
TiDB makes everything easier. Because TiDB eliminates the need for manual sharding, we don't need middleware.
Not only does TiDB solve all our problems, it also brings added bonuses we didn't expect:
| Metric | Before HTAP | After HTAP |
| --- | --- | --- |
| Spark job duration | 35 min | 8 min |
| P99 latency | 300~400 ms | ~100 ms |
| CPU utilization | TiKV: 100% | TiFlash: 30% |
| IO utilization | TiKV: 80% | TiFlash: 30%~50% |
In the 10 months since we started using TiDB, it's grown to encompass more than 20 clusters and more than 100 TiKV nodes, covering various industries from e-commerce to supply chain to big data.
TiDB expanded rapidly within Xiaomi because many application teams were impressed with its scalability. When they encountered a storage bottleneck, it was easy to persuade them to migrate their applications to TiDB. The application scale doubled within just three months.
The migration to TiDB wasn't entirely smooth. We also encountered some challenges:
TiDB is also part of our hybrid cloud architecture. Hybrid cloud means combining private cloud and public cloud and providing a uniform environment for all applications. We plan to put at least 50% of our databases on the cloud.
The following figure shows our planned architecture, of which TiDB and its tools are an integral part:
Big thanks to the TiDB community for your strong support over the past year. Going forward, we will continue to grow with the community. We plan to: