Scaling Etsy Payments with Vitess: Part 2 – The “Seamless” Migration

Etsy Engineering | Enhancing Etsy Payments with Vitess: Part 2 – Achieving a Smooth Migration

Introduction:

Between Dec 2020 and May 2022, the Etsy Payments Platform, Database Reliability Engineering, and Data Access Platform teams successfully migrated 23 tables with over 40 billion rows from four unsharded payments databases into a single sharded environment managed by Vitess. In this second part of our series on Sharding Payments with Vitess, we discuss the process of cutting over a crucial high-traffic system. We created a staging infrastructure for operational testing, where we could make mistakes and rebuild as necessary. We discovered barriers and unknowns that required adjustments, learned to use VDiff to verify data consistency and tooling performance, and found effective solutions for secondary indexing. We also highlight the importance of Vitess’s VReplication feature in the read/write switch process. Despite some unexpected challenges, the migration was successful, and normal operations were unaffected throughout the process. Stay tuned for part 3, where we delve into reducing the risks during the cutover.

Title: Etsy Successfully Completes Major Data Migration to Improve System Efficiency

Introduction

Between December 2020 and May 2022, Etsy undertook a significant endeavor to migrate 23 tables containing over 40 billion rows from four unsharded payments databases into a single sharded environment managed by Vitess. This move aimed to enhance the efficiency and performance of Etsy’s payment system. In this article, we will explore the process of migrating the data, overcoming challenges, and the successful cutover of the production system.

Testing and Staging Infrastructure

To ensure a successful data migration, Etsy created a staging infrastructure that closely simulated the production environment. This staging environment allowed the operational testing of Vitess’s internal tooling against snapshots of the production MySQL data. Engineers had the freedom to make mistakes and rebuild the environment as necessary. By redistributing and testing the data multiple times, Etsy gained confidence in the efficacy of their migration strategy.

Discovering Barriers and Adjustments

During the testing phase, Etsy’s teams discovered certain barriers and unknowns that required adjustments. For example, they learned to use VDiff to confirm data consistency and verify the performance of the tooling. They also worked through Vitess’s tooling for secondary vindexes, namely the CreateLookupVindex and ExternalizeVindex commands, to overcome challenges like sharding on a nullable column. These discoveries enabled them to refine their approach and ensure a smooth transition.
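
The idea behind a consistency check of this kind can be illustrated with a small sketch. This is not how VDiff is implemented and it does not call any Vitess API; it is a toy comparison in Python, assuming the source table and each target shard fit in memory as dictionaries keyed by primary key. The function name `diff_tables` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple  # a row is modelled as a plain tuple of column values


def diff_tables(source: Dict[int, Row], shards: List[Dict[int, Row]]) -> Dict[str, List[int]]:
    """Compare an unsharded source table against the union of the target shards.

    Hypothetical helper for illustration only; not part of Vitess.
    """
    target: Dict[int, Row] = {}
    duplicated: List[int] = []
    for shard in shards:
        for pk, row in shard.items():
            if pk in target:
                duplicated.append(pk)  # the same primary key landed on two shards
            target[pk] = row

    return {
        # rows present in the source but absent from every shard
        "missing": sorted(pk for pk in source if pk not in target),
        # rows present on a shard but absent from the source
        "extra": sorted(pk for pk in target if pk not in source),
        # rows present on both sides with different column values
        "mismatched": sorted(pk for pk in source if pk in target and source[pk] != target[pk]),
        "duplicated": sorted(duplicated),
    }


# Made-up sample data: row 3 was never copied, row 2 was copied with a wrong amount.
source = {1: ("usd", 500), 2: ("eur", 125), 3: ("usd", 990)}
shards = [{1: ("usd", 500)}, {2: ("eur", 120)}]
report = diff_tables(source, shards)
print(report)
```

A clean migration would produce four empty lists. At production scale a real tool has to stream and compare rows incrementally rather than hold everything in memory, which is part of why the tooling's performance needed verifying as well as its correctness.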

Switching Reads and Writes with VReplication

One of the key milestones in the data migration process was the ability to switch reads and writes seamlessly. Etsy relied on Vitess’s VReplication feature, which sets up replication streams to propagate writes in the desired direction. This feature allowed them to distribute writes to the correct shards and reverse the streams when necessary. This capability provided Etsy with the confidence to switch back if any issues arose during the cutover.
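
The value of a reversible replication stream can be sketched with a toy model. The Python below is a simplified illustration under stated assumptions, not Vitess code: the class `ToyCutover` and its methods are hypothetical, and replication is applied synchronously here, whereas real replication streams are asynchronous.

```python
class ToyCutover:
    """Toy model of a cutover in which writes are always replicated to the other side."""

    def __init__(self) -> None:
        self.source = {}         # stands in for the old unsharded database
        self.target = {}         # stands in for the new sharded keyspace
        self.primary = "source"  # which side currently accepts writes

    def _sides(self):
        # Return (side taking writes, side receiving replicated writes).
        if self.primary == "source":
            return self.source, self.target
        return self.target, self.source

    def write(self, key, value) -> None:
        primary, replica = self._sides()
        primary[key] = value
        replica[key] = value  # the replication stream, applied immediately in this toy

    def switch_writes(self) -> None:
        # After the switch the stream runs target -> source, so the old side stays current.
        self.primary = "target"

    def reverse_writes(self) -> None:
        # Rolling back is only safe because the old side never fell behind.
        self.primary = "source"


cutover = ToyCutover()
cutover.write("payment-1", 100)  # written to the source, replicated to the target
cutover.switch_writes()
cutover.write("payment-2", 200)  # written to the target, replicated back to the source
cutover.reverse_writes()         # the source already holds payment-2
print(cutover.source == cutover.target)
```

The point of the sketch is the invariant: because every write is propagated to the side that is not currently primary, either side can take over without losing data.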

Unexpected Challenges and Solutions

During the migration, Etsy encountered unexpected challenges. For instance, they experienced a significant increase in query volume caused by inefficient scatter queries, which were sent to every shard and mostly returned empty result sets. To address this, Etsy used CreateLookupVindex to build a secondary vindex that directs queries to the appropriate shards, reducing the query volume to manageable levels. Identifying such challenges early on allowed Etsy to find solutions and prevent disruptions during the final transition.
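
To make the difference between a scatter query and a lookup-routed query concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the column names `shop_id` and `payment_id`, the four-shard layout, and the MD5-based `shard_for` function are hypothetical stand-ins, and a real lookup vindex maps a column value to a keyspace ID rather than directly to a shard index.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, for the sketch only

shards = [dict() for _ in range(NUM_SHARDS)]  # each shard maps payment_id -> row
lookup = {}                                   # toy lookup table: payment_id -> shard index


def shard_for(shop_id: int) -> int:
    """Pick a shard from the sharding key (a stand-in for a real vindex function)."""
    digest = hashlib.md5(str(shop_id).encode()).digest()
    return digest[0] % NUM_SHARDS


def insert(shop_id: int, payment_id: int, amount: int) -> None:
    idx = shard_for(shop_id)
    shards[idx][payment_id] = (shop_id, amount)
    lookup[payment_id] = idx  # maintained alongside the row


def scatter_find(payment_id: int):
    """Without a lookup, a query on a non-sharding column must ask every shard."""
    shards_queried = 0
    found = None
    for shard in shards:
        shards_queried += 1
        if payment_id in shard:
            found = shard[payment_id]
    return found, shards_queried


def lookup_find(payment_id: int):
    """With a lookup, one extra read identifies the single shard to ask."""
    idx = lookup.get(payment_id)
    if idx is None:
        return None, 0
    return shards[idx][payment_id], 1


insert(shop_id=10, payment_id=77, amount=250)
print(scatter_find(77))
print(lookup_find(77))
```

Both calls return the same row, but the scatter version touches all four shards and three of them return nothing. Multiplied across production traffic and many more shards, that is the kind of volume increase described above.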

Coordinating MoveTables and Secondary Vindexes

As Etsy grew more comfortable with the migration process, they began running MoveTables and building secondary vindexes concurrently. However, this came with a caveat: a secondary vindex could only be externalized after writes had been switched. Externalizing a vindex prematurely would result in missing lookup records. This required careful coordination and adjustment of workflows to ensure a smooth transition.
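
The ordering constraint can be expressed as a guard in a toy model. The sketch below is Python written for illustration and makes an assumption the article does not spell out: that before writes are switched, lookup records for newly copied rows are produced only by the vindex backfill stream, and that externalizing stops that stream. The class `ToyMigration` and its methods are hypothetical and do not correspond to Vitess commands.

```python
class ToyMigration:
    """Toy model of why a lookup vindex should be externalized only after writes switch."""

    def __init__(self) -> None:
        self.lookup = {}              # toy lookup table: payment_id -> shard index
        self.writes_switched = False
        self.backfill_running = True  # assumed: the backfill stream populates the lookup

    def write(self, payment_id: int, shard: int) -> None:
        if self.writes_switched:
            # Writes now go through the sharded side, which maintains the lookup itself.
            self.lookup[payment_id] = shard
        elif self.backfill_running:
            # The row was copied from the old database; the backfill stream records it.
            self.lookup[payment_id] = shard
        # Otherwise the row is copied but nothing records it in the lookup table.

    def switch_writes(self) -> None:
        self.writes_switched = True

    def externalize(self, force: bool = False) -> None:
        if not self.writes_switched and not force:
            raise RuntimeError("switch writes before externalizing the vindex")
        self.backfill_running = False


# Premature externalization (forced): the second row has no lookup record.
early = ToyMigration()
early.write(1, 0)
early.externalize(force=True)
early.write(2, 1)
print(sorted(early.lookup))

# Correct order: switch writes first, then externalize.
ordered = ToyMigration()
ordered.write(1, 0)
ordered.switch_writes()
ordered.externalize()
ordered.write(2, 1)
print(sorted(ordered.lookup))
```

Encoding the rule as a check that refuses to proceed, rather than relying on operators remembering the order, is one way a runbook or automation script could enforce this kind of dependency.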

Conclusion

Thanks to meticulous testing, the robust performance of Vitess, and careful planning, Etsy accomplished this major data migration without disruptions or impact to normal operations. The staging environment played a crucial role in providing a realistic testing ground, and the innovative use of VReplication and Secondary Vindexes helped overcome unforeseen challenges. Etsy’s successful migration enhances the efficiency and scalability of their payment system, setting the stage for continued growth and improved customer experiences.

Summary: Etsy Engineering | Enhancing Etsy Payments with Vitess: Part 2 – Achieving a Smooth Migration

Between Dec 2020 and May 2022, Etsy successfully migrated their payments data to a single sharded environment managed by Vitess. This process involved redistributing data to forty new shards, testing in a staging environment, and using Vitess’s VReplication feature to switch over production writes. They encountered some challenges, such as handling scatter queries and externalizing Secondary Vindexes, but were able to overcome them through careful planning and testing. Overall, the migration was completed without disruption or downtime, allowing Etsy to successfully shard their payments system. This is part 2 of a series on Sharding Payments with Vitess.
