Etsy Engineering | Unleashing the Power of Real-time ML with the So-fine Paradigm

Each year, Etsy hosts an internal hackathon called “CodeMosaic,” where teams propose and build innovative technologies in a short amount of time. This year, our team worked on a system for stateful machine learning and online machine learning. The goal was to update machine learning models in near-real time, which could cut costs and improve performance. In this article, we’ll discuss our journey during the hackathon and the potential impact of our project. Putting this system into production still poses challenges, but we believe that as machine learning continues to evolve, more advanced architectures will become possible with less complexity.

Revolutionizing Machine Learning: The Story of Etsy’s CodeMosaic Project

Each year, Etsy organizes an internal hackathon called “CodeMosaic,” where Etsy employees propose and build new technology across various themes. People from all departments come together to share ideas, form teams, and work for 2-3 days on proofs-of-concept that could benefit Etsy’s buyers and sellers or improve internal engineering systems. It is a fun event, and it also gives engineers room to experiment with new ideas and push the boundaries of technology.

The Ambitious Project

As part of this year’s CodeMosaic, our team took on an ambitious project: a system for stateful machine learning model training and online machine learning. Our ML pipelines at Etsy already work with streaming data, but we lacked models that could learn in a real-time context. We wanted models whose weights could be updated in near-real time, without retraining from scratch. Stateful training means applying incremental updates to a pre-trained ML model instead of rebuilding it, which can save significant cost. Online learning means updating model weights in production as data arrives rather than through batch processes. Combining the two approaches can result in powerful models.
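To make the idea of stateful, incremental updates concrete, here is a minimal sketch in Python. It assumes a simple logistic-regression click model trained with stochastic gradient descent. The function names, feature vectors, starting weights, and learning rate are all illustrative assumptions, not details of Etsy's models.

```python
import math

def predict(weights, features):
    """Probability of a positive label under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(weights, features, label, lr=0.1):
    """Return new weights after one gradient step on a single example."""
    error = predict(weights, features) - label  # gradient of log loss w.r.t. z
    return [w - lr * error * x for w, x in zip(weights, features)]

# Start from previously trained weights (hypothetical values), not from zeros.
initial_weights = [0.2, -0.1, 0.05]
weights = list(initial_weights)

# Each new event nudges the existing weights; nothing is retrained from scratch.
stream = [([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 0.0], 0), ([1.0, 0.0, 1.0], 1)]
for features, label in stream:
    weights = sgd_update(weights, features, label)
```

The cost argument rests on the loop at the bottom: the work done per event is proportional to the size of one example, not to the size of the full training history.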

A study conducted by Grubhub in 2021 demonstrated the potential benefits of stateful online learning, including up to a 45x reduction in costs and a 20% improvement in metrics. As cost-effectiveness is crucial, we were excited about the prospect of saving money while making significant technological advancements.

Day 1: Planning

We knew that building such a complex system would be no easy feat. Our current ML pipelines relied on offline, scheduled batch jobs to generate training data from user actions. Unfortunately, this meant that it took a minimum of 40 hours for user actions to impact a model’s weights. To ensure the success of our project within the limited timeframe of three days, we divided our work into three main streams:

  1. Real-time training data: Our goal was to bypass the batch jobs responsible for our current training data and directly obtain user attributions (actions) from the source.
  2. Learning from a data stream: We aimed to create a service that could consume the data stream, continuously update the model’s weights, and make it available for online serving.
  3. Evaluation: We needed to validate the performance benefits of our approach compared to our existing batch processes.
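The second workstream, a service that consumes a stream, updates weights, and hands the model to serving, can be sketched roughly as follows. This is an assumption-laden illustration: `OnlineModel`, `run_consumer`, `publish`, and the `publish_every` cadence are hypothetical names, and a plain Python iterator stands in for a real stream.

```python
import copy
import math

class OnlineModel:
    """Tiny logistic model that learns one example at a time (illustrative)."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict(self, features):
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, label, lr=0.1):
        error = self.predict(features) - label
        self.weights = [w - lr * error * x for w, x in zip(self.weights, features)]

def run_consumer(events, model, publish, publish_every=2):
    """Consume events, update the model, and publish a snapshot periodically."""
    seen = 0
    for features, label in events:
        model.learn(features, label)
        seen += 1
        if seen % publish_every == 0:
            publish(copy.deepcopy(model))  # serving gets a frozen copy
    return seen

# Stand-in for the serving layer: keep only the latest published snapshot.
serving = {}

def publish(snapshot):
    serving["model"] = snapshot

# Stand-in for a real event stream.
events = iter([([1.0, 1.0], 1), ([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 0.0], 0)])
trainer = OnlineModel([0.0, 0.0])
processed = run_consumer(events, trainer, publish)
```

Publishing a copy rather than the live object keeps serving from reading weights in the middle of an update. A real system would also need validation before each publish.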

Even with a narrowed scope, we knew it was going to be challenging. However, we formed three subteams, each dedicated to one track of work, and started our journey towards implementation.

Day 2: Implementation

One subteam began by exploring the Etsy Beacon Main Kafka stream, which contains bot-filtered events, to obtain real-time training data. By using Kafka SQL and leveraging a streaming feature platform called Rivulet, we devised a realistic approach to this part of the problem. Working with the Avro data format and joining multiple data sources proved difficult, but we managed to join the sources and generate real-time training data.
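The core of that join, pairing what a user was shown with what they did next, can be illustrated in plain Python. This is not Kafka SQL or Rivulet code. The event fields (`user_id`, `listing_id`, `ts`, `features`) and the 30-minute attribution window are assumptions chosen for the example.

```python
def build_training_rows(impressions, clicks, window_seconds=1800):
    """Label each impression 1 if a matching click follows within the window."""
    clicks_by_key = {}
    for click in clicks:
        key = (click["user_id"], click["listing_id"])
        clicks_by_key.setdefault(key, []).append(click["ts"])

    rows = []
    for imp in impressions:
        key = (imp["user_id"], imp["listing_id"])
        clicked = any(
            0 <= ts - imp["ts"] <= window_seconds
            for ts in clicks_by_key.get(key, [])
        )
        rows.append({"features": imp["features"], "label": int(clicked)})
    return rows

# Hypothetical events; timestamps are in seconds.
impressions = [
    {"user_id": "u1", "listing_id": "a", "ts": 100, "features": [1.0, 0.3]},
    {"user_id": "u1", "listing_id": "b", "ts": 105, "features": [1.0, 0.9]},
    {"user_id": "u2", "listing_id": "a", "ts": 110, "features": [1.0, 0.1]},
]
clicks = [
    {"user_id": "u1", "listing_id": "b", "ts": 160},
    {"user_id": "u2", "listing_id": "a", "ts": 5000},  # outside the window
]
rows = build_training_rows(impressions, clicks)
```

This version works on complete lists. A streaming join has the extra problem that an impression cannot be labeled negative until its window has closed, so it must buffer events for at least that long.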

Another subteam focused on building the consumer service responsible for continuous learning from the data stream. The team had to decide which type of model to use and how to simulate the training data stream. After some discussion, we opted for an Ad Ranking model, since our group included an Ads ML engineer. By structuring our code accordingly, we were able to load an older Ads model into memory and make incremental updates to its weights.
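A rough sketch of that warm-start pattern is shown below: load saved weights, apply only the events the model has not yet seen, and save the result. The JSON checkpoint format, the `trained_through` watermark, and the two-feature logistic model are assumptions for illustration. The actual Ads model and its storage format are not described here.

```python
import json
import math
import os
import tempfile

def save_checkpoint(path, weights, trained_through):
    with open(path, "w") as f:
        json.dump({"weights": weights, "trained_through": trained_through}, f)

def load_checkpoint(path):
    with open(path) as f:
        state = json.load(f)
    return state["weights"], state["trained_through"]

def update(weights, features, label, lr=0.1):
    """One gradient step of logistic regression on a single example."""
    z = sum(w * x for w, x in zip(weights, features))
    error = 1.0 / (1.0 + math.exp(-z)) - label
    return [w - lr * error * x for w, x in zip(weights, features)]

# Write a hypothetical "older" model so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "ads_model.json")
save_checkpoint(path, [0.4, -0.2], trained_through=1000)

weights, trained_through = load_checkpoint(path)
new_events = [
    {"ts": 900, "features": [1.0, 1.0], "label": 1},   # already learned, skipped
    {"ts": 1100, "features": [1.0, 1.0], "label": 1},
    {"ts": 1200, "features": [1.0, 0.0], "label": 0},
]
for event in new_events:
    if event["ts"] <= trained_through:
        continue
    weights = update(weights, event["features"], event["label"])
    trained_through = event["ts"]
save_checkpoint(path, weights, trained_through)
```

Storing the watermark next to the weights lets the service restart without replaying or double-counting events.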

Lastly, evaluating the model’s performance posed the most significant challenge. To avoid exhaustive evaluations, we decided to take a more lighthearted approach. What if we turned it into a competition? We selected a high-performing Etsy ad and measured how quickly our continuously trained model recommended it relative to the conventional batch-trained model. This offered a simplified yet entertaining way to gauge the effectiveness of our work.
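The shape of such a race can be shown with a toy simulation. Note the substitution: running click counts stand in for a ranking model here, and the event log and refresh intervals are invented for the example. It shows the measurement only and is not the evaluation the team ran.

```python
def events_until_top(events, target, refresh_every):
    """Return the number of events processed when `target` first ranks first.

    Scores are click counts per ad. The ranker only sees counts as of its
    last refresh, which happens every `refresh_every` events.
    """
    live_counts = {}
    served_counts = {}
    for i, (ad, clicked) in enumerate(events, start=1):
        live_counts[ad] = live_counts.get(ad, 0) + clicked
        if i % refresh_every == 0:
            served_counts = dict(live_counts)  # model refresh
        if served_counts and max(served_counts, key=served_counts.get) == target:
            return i
    return None

# Hypothetical click log: ad "hot" starts outperforming ad "old" partway through.
events = [("old", 1), ("old", 1), ("hot", 1), ("hot", 1), ("hot", 1),
          ("old", 0), ("hot", 1), ("old", 0), ("hot", 1), ("old", 0)]

continuous = events_until_top(events, "hot", refresh_every=1)  # 5
batch = events_until_top(events, "hot", refresh_every=8)       # 8
```

With a refresh after every event, the target ad reaches the top at event 5. With a refresh every 8 events, it has to wait for the next refresh boundary, which is the delay the competition was meant to expose.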

Presentation Takeaways and Impact

On the final day, we made the necessary adjustments and prepared our presentations. While we didn’t have a fully functional system at this point, we still had meaningful takeaways to share:

  • Cost Savings: By replacing daily “cold-start” training with continuous training, we estimated potential annual savings of $212K in Google Cloud costs solely for the four Ads models. Additionally, reactive models could yield improved metrics, given the ability to process events 40 hours earlier.

The project showed promising potential in terms of cost-efficiency and performance gains. However, transitioning it into a production-ready system would require substantial development and collaboration between multiple teams and experts. As machine learning continues to advance, we are optimistic about enabling more complex architectures with reduced overhead.

Future Directions and Conclusion

Similar to many hackathon projects, there are hurdles to overcome before integrating this work into a production environment. Apart from the necessary infrastructure for implementing a continuous-training pipeline, thorough checks and balances are crucial to ensure that real-time model updates do not negatively impact performance. The development and deployment process would require significant effort, involving various stakeholders and specialized expertise.

Nonetheless, as machine learning technology matures, we anticipate that more intricate architectures can be implemented more efficiently. The explorations and progress made during CodeMosaic have paved the way for future innovations at Etsy. By pushing the boundaries of what is possible in machine learning, we strive to create a more sophisticated platform that benefits both our users and our business.

Summary: Etsy Engineering | Unleashing the Power of Real-time ML with the So-fine Paradigm

Etsy’s annual hackathon event, CodeMosaic, aims to advance technology within the company. This year, one team’s project focused on stateful machine learning (ML) model training and online machine learning, with the goal of building a system that updates model weights in near-real time. Continuous training could yield significant cost savings and improved metrics, but there are challenges to overcome before such a system can run in production.
