Introducing Precog: Nubank’s AI Empowering Real-Time Event Analytics

Introduction:

Welcome to a world where your banking experience is seamless and efficient. Imagine calling your bank and instantly being connected to an expert agent who can quickly and effectively resolve your issue. No more navigating through complicated menus, listening to endless recordings, or being transferred from one agent to another. This is not just a distant dream, but a reality for many Nubank customers, thanks to our real-time event AI, Precog.

In this article, we will explore Nubank’s incredible journey over the past decade, from a single-product company to a multinational financial powerhouse operating in three countries. We will delve into the development of Precog and discuss its system architecture, the technical challenges we faced, and the remarkable results we achieved.

At Nubank, we have always embraced machine learning as a key tool in making data-driven decisions. Initially, we focused on credit underwriting and built highly successful models that propelled us to unprecedented success. However, as we expanded our operations, offering a wide range of financial products to a diverse customer base across multiple countries, we encountered new challenges. The rapid pace of growth and complexity made it difficult for some data science teams to keep up, particularly in areas like Customer Excellence, which handles support.

Fortunately, advancements in artificial intelligence, particularly in model architectures and the foundation model paradigm, opened up new possibilities. In late 2021, we began envisioning a new approach to AI at Nubank that would address these challenges. We decided to start with our customer support platforms, which play a crucial role in predicting and addressing customer needs in real time.

Nubank is a decentralized organization with various business units offering different products and functionalities. This decentralized structure allows for agility and autonomy but also presents the challenge of maintaining a holistic view of the customer across all products. Traditionally, we relied on business knowledge to identify useful features for specific models and then searched for relevant data across the organization. However, as our product portfolio and complexity grew, keeping up with this approach became increasingly difficult, and our models fell behind the evolving product landscape.

To address this fundamental issue, we needed to automate and scale up real-time data processing and feature engineering to support multiple product lines and use cases. This is where Precog came into play. The key insight behind Precog is the realization that customer events, such as app click streams and transactions, can be encoded as sequences of symbols. Drawing on techniques from natural language processing, like embeddings and sequence models, we can understand and predict customer needs using this rich source of data.

We started by focusing on app click streams, which provide valuable insights but lack a consistent structure across different product teams. Each record is identified by a metric name and a JSON structure with attribute/value pairs that can change rapidly with app updates. To handle this semi-structured data, Precog converts records into sequences of text-based identifiers, filtering out infrequent symbols. For our initial approach, we represented customers using a bag-of-words model based on their event symbols. Precog learns embeddings of the events using self-supervised contrastive learning, which minimizes the distance between anchor and positive samples while maximizing the distance to negative samples. These learned embeddings are then incorporated into downstream models for training and serving.

The main component of Precog is a pipeline that trains event and customer embeddings, integrating them into downstream models. We use the Starspace library for embedding training, which offers flexibility and efficiency. To train downstream models, we combine records keyed by anonymous customer identifiers and timestamps with the labels we want to predict, along with other features. These records are joined with relevant sets of events to make predictions, such as determining the contact reason for a customer support call.

At runtime, an event consumer microservice transforms raw event data into a string format and stores it in a low-latency temporary storage (Redis). The downstream model microservice retrieves relevant events from the cache, computes embeddings, and uses them as features for classification.

During the development of Precog, we encountered several challenges and made key decisions to optimize its performance and efficiency:

1. Optimal event window: Our modeling approach initially did not consider the order or age of events, but we realized that recent events were often more relevant. Balancing coverage and precision, we settled on a three-hour event window but continue to explore ways to incorporate event age information.

2. Data volume and cost: The volume of data involved presented storage cost concerns. We decided to keep prepared data in low-latency storage temporarily for inference, reducing costs. We initially used AWS DynamoDB but found it cost-prohibitive due to the volume of events. We then switched to Redis, which was more cost-effective despite requiring a slightly more complex implementation.

3. Frequent retraining: Event definitions change frequently, necessitating regular retraining of embeddings and downstream models. To address this, we implemented a standardized retraining pipeline that can be easily adopted by downstream models.

Our first application of Precog was in the routing of customer support phone calls. By incorporating Precog embeddings into the model, we were able to significantly increase the volume of calls correctly routed to specialized agents without customer input, improving issue resolution time and customer satisfaction.

In conclusion, Nubank’s journey over the past decade has led us to innovate and develop Precog, our real-time event AI. Precog has been instrumental in addressing the challenges posed by our expanding operations and diverse product offerings. By leveraging machine learning techniques and effectively processing customer event data, Precog has revolutionized our customer support platforms and delivered exceptional results.

Stay tuned for more insights into Nubank’s cutting-edge technologies and our ongoing commitment to providing seamless and personalized financial experiences.

Full Article: Introducing Precog: Nubank’s AI Empowering Real-Time Event Analytics

Nubank, a leading financial institution, has reshaped its customer support with Precog, a real-time event AI. Precog lets many Nubank customers reach an expert agent who can resolve their issue quickly. This article covers Nubank’s journey over the past decade, the development of Precog, its system architecture, the challenges faced, and the results achieved.

From its inception, Nubank has utilized machine learning to make informed decisions, starting with credit underwriting. Skilled data scientists developed highly effective models that propelled Nubank to unprecedented success. However, as Nubank expanded its operations to multiple countries and diversified its product offerings, the complexity of the business posed challenges for data science teams, particularly in horizontal departments like Customer Excellence, which handles support.

Simultaneously, the field of AI progressed rapidly, thanks to advancements in model architectures and the emergence of the foundation model paradigm. In late 2021, Nubank recognized the need to adapt to this new reality and envisioned a comprehensive AI strategy. The company decided to focus on enhancing its customer support platforms, specifically by predicting customers’ support needs in real-time.

Nubank operates through various business units, each offering a range of products and functionalities implemented as independent microservices. While this organizational structure allows for flexibility and agile development, it also presents difficulties in obtaining a holistic view of customers across different product lines. Traditionally, Nubank relied on business knowledge to identify useful features for models, requiring extensive data collection across the organization. However, the rapid growth of Nubank’s product portfolio made it challenging to keep pace, resulting in models lagging behind product evolution.

To address this fundamental issue, Nubank sought to automate and scale up real-time data processing and feature engineering to support multiple product lines and use cases. This is where Precog, Nubank’s real-time event AI, became the game-changer. Precog’s key insight lies in encoding customer events, such as app click streams and transactions, as sequences of symbols. Leveraging techniques originally developed for natural language processing, such as embeddings and sequence models, Precog enables Nubank to understand and predict customer needs.

The most valuable data source for Nubank’s models was the app click stream. Therefore, Nubank prioritized capturing and analyzing this data. Through an internal system that receives events from the Nubank mobile app and other sources via a flexible API, app click stream data is collected. However, this data presents a challenge due to its lack of enforced structure across different product teams. Each record is identified by a metric name and a JSON with attribute/value pairs determined by the engineers who implement each flow. To address this, Precog converts records into sequences of text-based identifiers, disregarding infrequent symbols like unique ids.
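The conversion step can be sketched as follows, assuming each record arrives as a metric name plus a dictionary of attributes. The `metric:key=value` symbol format, the `min_count` threshold, and the sample records are illustrative assumptions, not Precog's actual encoding.

```python
from collections import Counter


def record_to_symbols(metric_name, attributes):
    """Flatten one click-stream record into text symbols (hypothetical format)."""
    symbols = [metric_name]
    for key in sorted(attributes):
        symbols.append(f"{metric_name}:{key}={attributes[key]}")
    return symbols


def build_vocabulary(records, min_count=2):
    """Keep only symbols seen at least min_count times, which drops unique ids."""
    counts = Counter(
        symbol
        for metric_name, attributes in records
        for symbol in record_to_symbols(metric_name, attributes)
    )
    return {symbol for symbol, n in counts.items() if n >= min_count}


def encode(records, vocabulary):
    """Turn records into a sequence of symbols, skipping out-of-vocabulary ones."""
    return [
        symbol
        for metric_name, attributes in records
        for symbol in record_to_symbols(metric_name, attributes)
        if symbol in vocabulary
    ]


# Made-up records; request_id plays the role of a unique id.
records = [
    ("card_screen_viewed", {"tab": "limits", "request_id": "a1"}),
    ("card_screen_viewed", {"tab": "limits", "request_id": "b2"}),
    ("help_button_clicked", {"source": "card", "request_id": "c3"}),
]
vocabulary = build_vocabulary(records, min_count=2)
encoded = encode(records, vocabulary)
```

Because the filter is purely frequency-based, it needs no schema knowledge: attributes that take a fresh value on every record fall below the threshold on their own.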

Several methods, of varying complexity, can learn from these string representations. Nubank opted for a simple yet versatile approach: each customer is represented as a bag-of-words of their events’ symbols. Precog learns event embeddings in a self-supervised manner using contrastive learning. For each training example, one event is randomly removed from a customer’s set of events; the remaining events form the anchor and the removed event serves as the positive sample. Training minimizes the distance between the anchor and the positive sample while maximizing the distance to negative samples. The customer representation is then obtained by aggregating the learned vectors of the symbols in the customer’s events.
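To make the contrastive setup concrete, here is a small sketch of how one training example and its loss could be formed. It is not Starspace's implementation: the cosine similarity, the hinge-style margin loss, the averaging used for aggregation, the margin value, and the toy embeddings are all assumptions chosen for illustration, and the event list is assumed to be non-empty.

```python
import math
import random


def make_training_example(events, vocabulary, rng, num_negatives=2):
    """Hold out one random event as the positive; the rest form the anchor."""
    index = rng.randrange(len(events))
    positive = events[index]
    anchor = events[:index] + events[index + 1:]
    # Negatives are symbols this customer did not produce.
    candidates = sorted(vocabulary - set(events))
    negatives = rng.sample(candidates, k=min(num_negatives, len(candidates)))
    return anchor, positive, negatives


def bag_embedding(symbols, embeddings):
    """Average the vectors of known symbols; order is ignored (bag-of-words)."""
    vectors = [embeddings[s] for s in symbols if s in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def margin_ranking_loss(anchor_vec, positive_vec, negative_vecs, margin=0.1):
    """Zero once the positive beats every negative by at least the margin."""
    pos_sim = cosine(anchor_vec, positive_vec)
    return sum(
        max(0.0, margin - pos_sim + cosine(anchor_vec, neg))
        for neg in negative_vecs
    )


# Toy 2-d embeddings; real ones would be learned by minimizing the loss.
embeddings = {
    "open_app": [1.0, 0.0],
    "view_card": [0.8, 0.6],
    "block_card": [0.6, 0.8],
    "view_loan": [0.0, 1.0],
}
events = ["open_app", "view_card", "block_card"]
anchor, positive, negatives = make_training_example(
    events, set(embeddings), random.Random(0)
)
loss = margin_ranking_loss(
    bag_embedding(anchor, embeddings),
    embeddings[positive],
    [embeddings[n] for n in negatives],
)
```

The same `bag_embedding` function doubles as the customer representation at serving time: it aggregates the vectors of whatever symbols the customer has produced.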

Precog’s core component is a pipeline that trains event and customer embeddings and incorporates them into downstream models during training and serving. To train the embeddings, Nubank uses the Starspace library, which provides a flexible and efficient implementation of entity embeddings. To train a downstream model, Nubank starts from records keyed by anonymous customer identifier and timestamp, each carrying the label to predict and any other relevant features, and joins them with the corresponding sets of events. For instance, when predicting the contact reason for a customer support call, the events are those preceding the call and the label is the contact reason classified by the agent.
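The join between labeled records and events might look like the sketch below. The tuple layouts, field names, and sample data are hypothetical; the only detail taken from the article is that the events used are those preceding the call. The window is a parameter here, set to three hours for the example.

```python
from datetime import datetime, timedelta


def events_before(events, customer_id, at, window):
    """Symbols for one customer with at - window <= event time < at."""
    return [
        symbol
        for cid, ts, symbol in events
        if cid == customer_id and at - window <= ts < at
    ]


def build_training_rows(labeled_records, events, window=timedelta(hours=3)):
    """Attach to each labeled record the events that preceded it."""
    return [
        {
            "customer_id": cid,
            "events": events_before(events, cid, at, window),
            "label": label,
        }
        for cid, at, label in labeled_records
    ]


day = datetime(2023, 1, 10)
# (customer_id, timestamp, symbol); all values are made up.
events = [
    ("c1", day.replace(hour=8), "open_app"),
    ("c1", day.replace(hour=11, minute=30), "view_card"),
    ("c1", day.replace(hour=12, minute=5), "rate_call"),
    ("c2", day.replace(hour=11), "view_loan"),
]
# (customer_id, call time, contact reason classified by the agent)
labeled_records = [("c1", day.replace(hour=12), "credit_card")]

rows = build_training_rows(labeled_records, events)
```

Excluding events at or after the call time matters: anything the customer does during or after the call would leak the outcome into the features.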

During runtime, an event consumer microservice (implemented in Clojure, Nubank’s canonical business service language) converts raw event data into a string format and stores it in a low-latency temporary storage (Redis). When serving customer requests, the downstream model microservice (built in Python) retrieves relevant events from the cache, computes embeddings, and employs them as features for a classification model.
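The split between the two services can be illustrated with a toy cache. In this sketch a plain dictionary stands in for Redis so the example runs without a server, and both roles are written in Python for brevity. The class, its methods, the raw event fields, and the three-hour expiry are assumptions for illustration, not Nubank's service code.

```python
from collections import defaultdict


class EventCache:
    """In-memory stand-in for the Redis cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._events = defaultdict(list)  # customer_id -> [(timestamp, symbol)]

    def add(self, customer_id, timestamp, symbol):
        self._events[customer_id].append((timestamp, symbol))

    def recent(self, customer_id, now):
        """Return unexpired symbols and drop expired ones from storage."""
        cutoff = now - self.ttl_seconds
        fresh = [
            (ts, symbol)
            for ts, symbol in self._events.get(customer_id, [])
            if ts > cutoff
        ]
        if fresh:
            self._events[customer_id] = fresh
        else:
            self._events.pop(customer_id, None)
        return [symbol for _, symbol in fresh]


def consume(cache, raw_event):
    """Event consumer role: turn a raw event into a string and cache it."""
    cache.add(raw_event["customer_id"], raw_event["timestamp"], raw_event["metric"])


# Timestamps are seconds on an arbitrary clock.
cache = EventCache(ttl_seconds=3 * 3600)
consume(cache, {"customer_id": "c1", "timestamp": 1_000, "metric": "open_app"})
consume(cache, {"customer_id": "c1", "timestamp": 9_000, "metric": "view_card"})

# Model service role: fetch the customer's recent symbols at request time.
symbols = cache.recent("c1", now=12_000)
```

The model service would then look up an embedding for each returned symbol and pass the aggregate to the classifier; that step is left out here.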

Nubank also ran into challenges along the way. Choosing the event window was an important factor in model performance. Because the most relevant events tend to be the most recent, shortening the event window improved performance. However, shortening it too much lowered coverage, since many customers may not have interacted with the app recently. After testing, Nubank settled on a 3-hour event window, and intends to explore approaches that better balance coverage and precision, including incorporating event age information.
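The coverage side of this trade-off is easy to measure offline. The sketch below computes, for a given window, the share of calls that have at least one event inside it. The event ages are invented for illustration and are not Nubank data; the precision side would require evaluating a trained model at each window size and is not shown.

```python
def coverage(event_ages_per_call, window_hours):
    """Fraction of calls with at least one event no older than the window."""
    covered = sum(
        1
        for ages in event_ages_per_call
        if any(age <= window_hours for age in ages)
    )
    return covered / len(event_ages_per_call)


# Hypothetical data: for each call, how many hours before the call
# each of the customer's events happened. An empty list means no events.
event_ages_per_call = [
    [0.2, 5.0],
    [2.5],
    [7.0],
    [],
]

coverage_by_window = {
    hours: coverage(event_ages_per_call, hours) for hours in (1, 3, 8)
}
```

On this toy data coverage rises as the window grows, which is the pressure that pulls against the precision gained from keeping only recent events.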

Another challenge Nubank encountered was data volume and cost. Storing large volumes of data in a low-latency database (AWS DynamoDB) proved prohibitively expensive due to the significant number of writes and reads. To overcome this, Nubank switched to an in-memory database (Redis). Although this decision added complexity to the implementation, it reduced costs to a fraction of the original version, ensuring a cost-effective solution.

Frequent retraining was another obstacle faced by Nubank. As event definitions regularly change, embeddings and downstream models dependent on them must be retrained accordingly. To address this issue, Nubank devised a standardized retraining pipeline that can be easily adopted by downstream models, ensuring regular updates.

The initial application of Precog at Nubank is in routing phone calls. A downstream model employing customer embeddings and other features ranks the most likely product that customers require assistance with. If the model is confident in its prediction, customers are directed to specialists. Otherwise, they interact with generalist agents who may transfer them to specialists if needed. Precog’s implementation has significantly improved routing efficiency and customer support services at Nubank.
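The routing rule described here reduces to a confidence check on the top-ranked product. A minimal sketch, assuming the downstream model returns one score per product; the threshold value, product names, and queue labels are hypothetical.

```python
def route_call(product_scores, threshold=0.7):
    """Send the call to a specialist only when the top score is high enough."""
    product, score = max(product_scores.items(), key=lambda item: item[1])
    if score >= threshold:
        return f"specialist:{product}"
    return "generalist"


confident = route_call({"credit_card": 0.82, "loans": 0.11, "account": 0.07})
uncertain = route_call({"credit_card": 0.40, "loans": 0.35, "account": 0.25})
```

Raising the threshold sends fewer calls straight to specialists but makes those direct routes more reliable, so it is a natural knob to tune against transfer rates.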

In conclusion, Precog, Nubank’s real-time event AI, has improved customer support by routing customers to expert agents more effectively. By applying techniques originally developed for natural language processing to customer event data, Precog gives Nubank a broader view of customer needs. Along the way, Nubank built a system architecture that handles event-window selection, data volume and cost, and frequent retraining. Precog now supports Nubank as a multi-product company serving customers across multiple countries.

Summary: Introducing Precog: Nubank’s AI Empowering Real-Time Event Analytics

Nubank, a leading financial company, has developed an AI system called Precog that revolutionizes customer support. Precog uses machine learning and real-time event data to predict customer needs and provide instant solutions, eliminating the need for tedious menus and transfers. Nubank’s growth and diverse product offerings made it challenging for their Data Science teams to keep up, but Precog automated data processing and feature engineering to support multiple product lines. By encoding customer events as sequences of symbols, Precog learns embeddings and incorporates them into downstream models, resulting in improved call routing accuracy and customer satisfaction.

Frequently Asked Questions:

Q1: What is machine learning and how does it work?
A1: Machine learning is a branch of artificial intelligence that focuses on developing algorithms and models that enable computers to learn from and make predictions or decisions without being explicitly programmed. It works by using large datasets to train models and algorithms, which then analyze and identify patterns or relationships to make predictions or take actions.

Q2: What are the main types of machine learning algorithms?
A2: There are three main types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves training models on labeled data to make predictions or classify new data. Unsupervised learning deals with finding patterns or grouping similar data without any prior knowledge. Reinforcement learning focuses on training models through repeated trial and error interactions to maximize rewards or minimize negative outcomes.

Q3: How can machine learning be used in real-life applications?
A3: Machine learning has a wide range of applications in various industries. It can be used for image and speech recognition, fraud detection, recommendation systems, spam filtering, autonomous vehicles, medical diagnostics, and even predicting stock prices. The possibilities are endless, and machine learning is evolving rapidly to solve complex problems efficiently.

Q4: What are the challenges in implementing machine learning?
A4: Implementing machine learning algorithms and models comes with several challenges. One of the main challenges is acquiring and cleaning high-quality data, as the accuracy and reliability of the predictions depend heavily on the quality of the input data. Another challenge is selecting the appropriate algorithm or model, as different problems require different approaches. Additionally, ensuring the model’s interpretability and avoiding biased outcomes are important ethical considerations.

Q5: How can one start learning and working with machine learning?
A5: To start learning and working with machine learning, one can begin by gaining a solid understanding of the basic concepts and algorithms. This can be achieved through online tutorials, courses, or books that offer comprehensive explanations and practical examples. It’s also crucial to practice and experiment on real-world datasets to gain hands-on experience. Utilizing machine learning libraries and frameworks like TensorFlow or scikit-learn can help simplify the implementation process. Building a strong foundation and keeping up with the latest advancements in the field are key to becoming proficient in machine learning.