Enhancing Machine Learning Model Deployment: Empowering Flexibility with Etsy Engineering’s Barista

Introduction:

Welcome to our blog post on machine learning (ML) model deployment! In this post, we will delve into the challenges and evolution of ML model deployment tools at Etsy. As ML practitioners, we know that deploying ML models is a complex process that combines both ML practice and software development. While ML work requires flexibility and experimentation, deployed models must adhere to rigorous engineering constraints.

At Etsy, we have been developing Barista, our ML model serving tool, since 2017. Barista manages the lifecycles of many kinds of models, from Recommendations and Vision to Ads and Search, and over the years it has evolved to meet the changing needs and scope of our ML practice.

Initially, Barista used a configuration-as-code approach: model configurations were managed as code and surfaced through a table-based, read-only UI. While this provided visibility and oversight, it became increasingly time-consuming and inefficient as our ML efforts grew. We needed a solution that would enable faster, safer model deployments.

To address these challenges, we decoupled Barista's configuration from the codebase and designed a new system backed by a CloudSQL database. ML practitioners could now make changes instantly through a Barista-provided CLI, with no code changes required. This greatly improved productivity and reduced bottlenecks.

However, we soon realized that the CLI had its own limitations and required technical expertise to use effectively. To make model deployment more accessible to a wider user base, we decided to build a purpose-built web interface for Barista. This web app now provides ML practitioners with a user-friendly interface to manage their model deployments directly from their browser.

The Barista web app integrates with various internal and third-party APIs, providing useful information and control over model deployments. It simplifies updating model and Kubernetes settings, and even surfaces the cost of serving models live in production.

As Barista gained popularity, we faced challenges with cloud costs and misconfigurations. To mitigate these issues, we adopted the Kube Downscaler, which allows us to scale deployment replicas to zero during off-hours and weekends. This has resulted in significant cost savings.

Overall, the evolution of Barista has made it easier and more efficient to serve ML models at Etsy. With the web app, we have seen an increase in the rate of experimentation and the number of live models. We continue to adapt and improve Barista to meet the growing demands of ML deployment.

Stay tuned for more updates on our ML deployment journey at Etsy!

Full Article: Enhancing Machine Learning Model Deployment: Empowering Flexibility with Etsy Engineering’s Barista

Unleashing the Power of ML Model Deployment: Etsy’s Barista Story

Machine learning (ML) model deployment is a hot topic in the industry, as it brings together two distinct worlds: ML practice and software development. While ML work is experimental and demands flexibility, when models are deployed, they become subject to strict engineering constraints. At Etsy, the ML Model Serving team has been working on Barista, a tool that manages the lifecycles of various models. In this article, we’ll dive into the evolution of Barista’s interface, its integration with Kubernetes, and the challenges faced along the way.

The Early Days of Barista

Back in 2017, Barista’s interface was simple and code-driven. Model deployment configurations were managed as code and surfaced in a table-based, read-only UI. This approach served its purpose at the time, allowing for oversight and auditing of changes. However, as ML efforts at Etsy grew, managing configurations became time-consuming and cumbersome. With hundreds of model configurations defined in a single large Python file, the build process became a bottleneck, hurting both visibility and productivity.
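
The post doesn’t show Etsy’s actual configuration file, but a configuration-as-code setup of the kind described might have looked roughly like the sketch below; every name and field here is hypothetical.

```python
# Hypothetical sketch of a config-as-code file; names and fields are
# illustrative, not Etsy's actual schema.
from dataclasses import dataclass

@dataclass
class ModelDeploymentConfig:
    name: str            # deployment name shown in the read-only UI
    model_uri: str       # where the trained model artifact lives
    replicas: int = 2    # Kubernetes replica count
    cpu: str = "500m"    # per-pod CPU request
    memory: str = "1Gi"  # per-pod memory request

# Hundreds of entries like these lived in one large Python file, so every
# change required a code review plus a full build and deploy of the file.
DEPLOYMENTS = [
    ModelDeploymentConfig("recs-ranker-v3", "gs://models/recs/ranker/v3"),
    ModelDeploymentConfig("search-relevance", "gs://models/search/relevance/v12", replicas=4),
]
```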

Decoupling Configuration from Code

Realizing the need for change in 2021, Etsy’s ML platform team decoupled configuration from code. They introduced a new system that utilized a CloudSQL database and provided a CLI for ML practitioners to make instantaneous changes. This simplified CRUD workflow allowed for faster deployment and management of models. However, the CLI had its limitations, requiring developer setup and causing an increased support burden for the platform team.
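
Barista’s CLI and schema aren’t public, but the general pattern of replacing an in-code config file with a database-backed CRUD workflow can be sketched as follows, with SQLite standing in for CloudSQL and all table and column names assumed for illustration.

```python
# Minimal sketch of a CRUD-style config store; SQLite stands in for
# CloudSQL, and the table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect("barista_stub.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS model_deployments (
           name TEXT PRIMARY KEY,
           model_uri TEXT NOT NULL,
           replicas INTEGER NOT NULL DEFAULT 2
       )"""
)

def upsert_deployment(name: str, model_uri: str, replicas: int = 2) -> None:
    """Create or update a deployment row; roughly what a CLI 'update' command might do."""
    conn.execute(
        "INSERT INTO model_deployments (name, model_uri, replicas) VALUES (?, ?, ?) "
        "ON CONFLICT(name) DO UPDATE SET model_uri = excluded.model_uri, replicas = excluded.replicas",
        (name, model_uri, replicas),
    )
    conn.commit()

# The change takes effect as soon as the serving layer reads the row;
# no code review, rebuild, or redeploy of a config file is needed.
upsert_deployment("recs-ranker-v3", "gs://models/recs/ranker/v4", replicas=3)
```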

Introducing the Barista Web Interface

To address these limitations and ensure wider adoption of the platform, the team set out to build a purpose-built, user-friendly web interface atop their API. While aesthetics took a backseat, the Barista web app proved to be a robust tool for managing ML deployments on Kubernetes. It lets users update various aspects of their models, links to the Google Kubernetes Engine (GKE) console for information on Pods, and even surfaces the cost of serving models live in production through API integrations with Etsy’s internal cost tool.
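
The post doesn’t document Barista’s API, but the core idea of a web interface driving Kubernetes changes can be sketched like this; the Flask route, namespace, and payload shape are assumptions, and only the Kubernetes client calls are standard.

```python
# Illustrative sketch of a web endpoint that updates a deployment's replica
# count via the Kubernetes API; Barista's real API surface is not public,
# so the route, namespace, and payload here are hypothetical.
from flask import Flask, request, jsonify
from kubernetes import client, config

app = Flask(__name__)
config.load_kube_config()  # or config.load_incluster_config() when running in-cluster
apps_v1 = client.AppsV1Api()

@app.route("/deployments/<name>/replicas", methods=["POST"])
def set_replicas(name: str):
    replicas = int(request.json["replicas"])
    # Patch only the replica count of the model-serving Deployment.
    apps_v1.patch_namespaced_deployment(
        name=name,
        namespace="model-serving",  # hypothetical namespace
        body={"spec": {"replicas": replicas}},
    )
    return jsonify({"name": name, "replicas": replicas})
```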

Increasing Efficiency and Control

The Barista web interface revolutionized ML deployment at Etsy, accelerating the rate of experimentation and increasing the number of live models. What previously took hours to change now happened within seconds, unblocking users and reducing unnecessary workflows. ML practitioners gained complete and immediate control over their model deployments, driving up productivity. However, the team also had to address concerns about potential spikes in cloud costs and misconfigurations due to the simplicity of the process.

Challenges in Cost Management

With more models being deployed through Barista, the team observed high daily CPU costs in the development environment, even with relatively low usage. To remedy this, they needed to scale down resources when they were no longer needed. Unfortunately, the default Kubernetes Horizontal Pod Autoscaler cannot scale a deployment below one replica, posing a challenge. The team had to find a solution that optimized costs without compromising functionality.
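
As the introduction and summary note, the team’s answer was the Kube Downscaler, which scales replicas to zero outside working hours. Here is a rough sketch of the idea as an annotation applied through the Kubernetes Python client; the exact annotation key and schedule syntax should be checked against the Kube Downscaler documentation, and the names below are hypothetical.

```python
# Rough sketch: annotate a development Deployment so a downscaler can scale
# it to zero outside working hours. The annotation key and schedule format
# are assumptions; consult the Kube Downscaler docs for the exact syntax.
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()

apps_v1.patch_namespaced_deployment(
    name="recs-ranker-v3-dev",        # hypothetical dev deployment
    namespace="model-serving-dev",    # hypothetical namespace
    body={
        "metadata": {
            "annotations": {
                # Keep the deployment up only during weekday working hours;
                # evenings and weekends it is scaled down to zero replicas.
                "downscaler/uptime": "Mon-Fri 08:00-18:00 America/New_York",
            }
        }
    },
)
```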

Conclusion

Etsy’s Barista journey reflects years of effort to streamline ML model deployment. From a simple code-driven interface to a robust web app, Barista empowered ML practitioners to deploy and manage models more efficiently. It brought control, visibility, and agility to the ML practice, enabling faster iterations and reducing unnecessary workflows. While facing challenges in cost management, the team continues to innovate and optimize the platform for better scalability and user experience. With Barista, Etsy has unlocked the potential of ML model deployment, paving the way for future advancements in the field.

Summary: Enhancing Machine Learning Model Deployment: Empowering Flexibility with Etsy Engineering’s Barista

Barista, the ML Model Serving team’s flagship product at Etsy, has evolved significantly since its inception in 2017. Initially, configurations for model deployments were managed as code, but as the ML practice grew, this approach became a bottleneck. To address it, Barista’s configuration was decoupled from the codebase and replaced with a simple CRUD workflow backed by a CloudSQL database. However, the accompanying CLI posed challenges for some users. To ensure wider adoption and improve usability, the ML platform team developed a purpose-built, user-friendly web interface for Barista. This interface allows ML practitioners to manage their model deployments directly from the browser and has significantly reduced the time and effort required to update and manage models. The web app integrates with various internal and third-party APIs to provide useful information and control over model deployments. With the improved efficiency and ease of use, the rate of experimentation and the number of live models at Etsy have increased. However, the success of Barista also raised concerns about potential spikes in cloud costs and misconfigurations. To address this, the team adopted the Kube Downscaler, which allows deployment replicas to be scaled down to zero during off-hours and weekends, resulting in significant cost savings. Overall, Barista has become a robust tool for managing ML deployments on Kubernetes, empowering users and driving innovation at Etsy.

Frequently Asked Questions:

1. What is machine learning and why is it important?
Machine learning is a branch of artificial intelligence that focuses on creating algorithms and models that allow computers to learn from and make predictions or decisions based on data. It is important because it enables computers to automate complex tasks without explicit programming, thereby improving efficiency, accuracy, and scalability in various industries.

2. How does machine learning work?
Machine learning works by using algorithms to analyze and learn patterns from large datasets. These algorithms identify underlying relationships and patterns in the data, allowing the computer to make predictions or decisions on new, unseen data. The process involves training the model with labeled data and then testing its performance with unseen data to ensure accuracy.
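
As a small, concrete illustration of this train-then-test loop (using scikit-learn and a toy dataset chosen purely for demonstration):

```python
# Minimal illustration of the train-then-test workflow using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out unseen data to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)              # learn patterns from labeled data
predictions = model.predict(X_test)      # predict on data the model has not seen
print("test accuracy:", accuracy_score(y_test, predictions))
```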

3. What are the different types of machine learning algorithms?
There are three main types of machine learning algorithms: supervised, unsupervised, and reinforcement learning. In supervised learning, the algorithm is trained using labeled data to make predictions or classifications. Unsupervised learning involves discovering patterns or structures in unlabeled data. Reinforcement learning is focused on training agents to make decisions based on rewards or punishments.

4. What are the applications of machine learning?
Machine learning has a wide range of applications in various fields. Some common applications include:

– Healthcare: Predicting disease diagnoses, optimizing treatment plans, and analyzing medical images.
– Finance: Fraud detection, stock market prediction, personalized financial planning.
– E-commerce: Recommender systems, customer segmentation, demand forecasting.
– Manufacturing: Quality control, predictive maintenance, supply chain optimization.
– Natural Language Processing: Sentiment analysis, language translation, chatbots.

5. What are the ethical considerations in machine learning?
As machine learning becomes more prevalent, ethical considerations arise. Some important factors include:

– Bias and fairness: Ensuring algorithms are not biased against certain groups or perpetuating discrimination.
– Privacy and security: Safeguarding sensitive data and ensuring secure handling of user information.
– Accountability and transparency: Understanding how algorithms make decisions and having mechanisms for recourse or explanations.
– Social impact: Considering the broader implications of automation and job displacement.

Overall, it is crucial to develop and implement machine learning practices with responsible and ethical considerations in mind.