Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart

Introduction:

We are thrilled to announce that customers can now access the Llama 2 foundation models developed by Meta through Amazon SageMaker JumpStart. The Llama 2 family consists of large language models ranging from 7 billion to 70 billion parameters, which have been pre-trained and fine-tuned for various text generation tasks. These models are optimized for dialogue use cases and can be easily integrated into your machine learning projects using SageMaker JumpStart. In this post, we will guide you on how to use Llama 2 models via SageMaker JumpStart. So let’s dive in and explore the world of Llama 2 language models!

Llama 2 Foundation Models Now Available on Amazon SageMaker JumpStart

Today, the Llama 2 foundation models developed by Meta became available through Amazon SageMaker JumpStart. These models are part of the Llama 2 family of large language models (LLMs), a collection of pre-trained and fine-tuned generative text models ranging in size from 7 billion to 70 billion parameters.

Optimized for dialogue use cases, the fine-tuned LLMs, called Llama-2-chat, provide excellent performance in conversational scenarios. With SageMaker JumpStart, users can easily try out and utilize these models for their machine learning (ML) projects. SageMaker JumpStart is an ML hub that offers access to algorithms, models, and ML solutions, allowing users to quickly get started with ML.

Introduction to Llama 2

Llama 2 is an auto-regressive language model built on an optimized transformer architecture. It is primarily designed for commercial and research use in English. The Llama 2 models are available in different parameter sizes, including 7 billion, 13 billion, and 70 billion, and both pre-trained and fine-tuned variations. The tuned models have undergone supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Pre-training data for Llama 2 models consisted of 2 trillion tokens from publicly available sources. While the fine-tuned models are suitable for assistant-like chat, the pre-trained models can be adapted for various natural language generation tasks. Nevertheless, additional fine-tuning may be necessary to customize and optimize the models, and Meta provides a responsible use guide to assist developers in this process by incorporating appropriate safety mitigations.

Understanding SageMaker JumpStart

SageMaker JumpStart is an excellent resource for ML practitioners, offering a wide selection of publicly available foundation models. These models can be deployed on dedicated Amazon SageMaker instances, which operate in a network-isolated environment. Users can use SageMaker for model training and deployment, allowing them to customize the models according to their specific needs.

By integrating SageMaker JumpStart with Amazon SageMaker Studio or using the SageMaker Python SDK, users can easily discover and deploy Llama 2 models with just a few clicks. Additionally, SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs can be leveraged to monitor and control model performance and MLOps.

Deploying Llama 2 Models

Discovering Llama 2 models in SageMaker Studio is straightforward. After accessing SageMaker Studio, navigate to SageMaker JumpStart, where you will find pre-trained models, notebooks, and prebuilt solutions. In the Foundation Models: Text Generation carousel, you’ll find the flagship Llama 2 models.

If the Llama 2 models are not visible, update your SageMaker Studio version by shutting it down and restarting it. Alternatively, you can choose Explore all Text Generation Models or search for “llama” to find the Llama 2 model variants.

To deploy a Llama 2 model, you can either choose the Deploy button or open the provided notebook. The notebook offers comprehensive instructions on how to deploy the model for inference and clean up resources. To deploy using the notebook, specify the appropriate model_id, as shown in the following code:

```python
from sagemaker.jumpstart.model import JumpStartModel

my_model = JumpStartModel(model_id="meta-textgeneration-llama-2-70b-f")
predictor = my_model.deploy()
```

Fine-tuned chat models like Llama-2-7b-chat, Llama-2-13b-chat, and Llama-2-70b-chat are designed to accept a history of chat between the user and the chat assistant, generating subsequent chat responses. On the other hand, pre-trained models like Llama-2-7b, Llama-2-13b, and Llama-2-70b require a string prompt and perform text completion based on the given prompt.

Conducting Inference with Llama 2 Models

To run inference against a deployed Llama 2 model, use the SageMaker predictor with the appropriate payload. The payload should contain the input data and optional inference parameters. For example, the following payload demonstrates the structure for fine-tuned chat models:

```python
payload = {
    "inputs": [
        [
            {"role": "system", "content": "Always answer with Haiku"},
            {"role": "user", "content": "I am going to Paris, what should I see?"},
        ]
    ],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
}
```
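By contrast, the pre-trained text-completion models take a plain string prompt rather than a list of chat turns. A minimal sketch (the prompt text here is only an illustration):

```python
# Illustrative payload for a pre-trained (text completion) Llama 2 model.
# "inputs" is a plain string prompt rather than a chat history.
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
}
```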

It is important to note that by default, the endpoint does not accept the end-user license agreement (EULA). To successfully invoke the endpoint, pass `accept_eula=true` as a custom attribute on the request rather than in the payload itself. The `max_new_tokens` parameter controls the size of the output generated by the model, while the `temperature` parameter regulates the randomness in the output.
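Concretely, one way to pass the flag at invocation time looks like the following sketch. It assumes the `predictor` created earlier; the helper function is our own illustration, not part of the SageMaker SDK:

```python
def eula_custom_attributes(accept: bool) -> str:
    # The EULA flag travels as a key=value string in the custom
    # attributes of the request, not inside the payload body.
    return f"accept_eula={'true' if accept else 'false'}"

# Hypothetical invocation, assuming `predictor` and `payload` from above:
# response = predictor.predict(
#     payload, custom_attributes=eula_custom_attributes(True)
# )
```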

List of Available Llama Models

SageMaker JumpStart offers a variety of Llama models, each with its own unique capabilities. Here is a list of the available Llama models and their associated model_ids, default instance types, and maximum total tokens:

– Llama-2-7b: meta-textgeneration-llama-2-7b (4096 tokens, default instance type: ml.g5.2xlarge)
– Llama-2-7b-chat: meta-textgeneration-llama-2-7b-f (4096 tokens, default instance type: ml.g5.2xlarge)
– Llama-2-13b: meta-textgeneration-llama-2-13b (4096 tokens, default instance type: ml.g5.12xlarge)
– Llama-2-13b-chat: meta-textgeneration-llama-2-13b-f (4096 tokens, default instance type: ml.g5.12xlarge)
– Llama-2-70b: meta-textgeneration-llama-2-70b (4096 tokens, default instance type: ml.g5.48xlarge)
– Llama-2-70b-chat: meta-textgeneration-llama-2-70b-f (4096 tokens, default instance type: ml.g5.48xlarge)

Important Considerations

When using SageMaker endpoints, it’s crucial to note that there is a timeout limit of 60 seconds. Therefore, even though the model can generate up to 4096 tokens, requests that take longer than 60 seconds may fail. For the 7 billion, 13 billion, and 70 billion models, it is recommended to set the `max_new_tokens` parameter to no greater than 1500, 1000, and 500, respectively, while keeping the total number of tokens below 4K.
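One way to honor these limits programmatically is a small guard like the following. This is an illustrative helper of our own, not part of any SDK; the caps follow the recommendations above:

```python
# Recommended max_new_tokens caps per model size (per the guidance above),
# plus the 4K total-token budget shared by prompt and completion.
RECOMMENDED_CAPS = {"7b": 1500, "13b": 1000, "70b": 500}
MAX_TOTAL_TOKENS = 4096

def clamp_max_new_tokens(model_size: str, prompt_tokens: int, requested: int) -> int:
    """Return a max_new_tokens value that respects both the per-model
    recommendation and the remaining token budget for this prompt."""
    budget = MAX_TOTAL_TOKENS - prompt_tokens
    return max(0, min(requested, RECOMMENDED_CAPS[model_size], budget))
```

For example, requesting 2000 new tokens from a 70B model with a 100-token prompt is clamped to 500, and a 3800-token prompt on the 7B model leaves only 296 tokens of headroom.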

Conclusion

The availability of Llama 2 foundation models through Amazon SageMaker JumpStart opens up a world of possibilities for ML practitioners. With the ability to access and utilize these models easily, users can explore various natural language generation tasks and enhance their ML projects. The integration of SageMaker JumpStart with SageMaker Studio and the SageMaker Python SDK provides a seamless experience, allowing users to deploy and fine-tune models efficiently.

Summary: Newly Released on Amazon SageMaker JumpStart: Meta’s Llama 2 Foundation Models

We are thrilled to announce that Meta’s Llama 2 foundation models are now accessible to customers through Amazon SageMaker JumpStart. The Llama 2 family consists of pre-trained and fine-tuned language models that range in size from 7 billion to 70 billion parameters. These models are optimized for dialogue use cases and can be easily used with SageMaker JumpStart, a machine learning (ML) hub that offers access to algorithms, models, and ML solutions for quick ML implementation. In this article, we provide a step-by-step guide on how to use Llama 2 models via SageMaker JumpStart.

Llama 2 is an auto-regressive language model that utilizes an optimized transformer architecture. It is specifically designed for commercial and research use in English and is available in various parameter sizes, both pre-trained and fine-tuned. The fine-tuned models have undergone supervised fine-tuning and reinforcement learning with human feedback to ensure they align with human preferences for safety and usefulness. Llama 2 has been pre-trained on 2 trillion tokens of data from publicly available sources. The fine-tuned models are intended for assistant-like chat, while the pre-trained models can be adapted for various natural language generation tasks. Regardless of the model version being used, developers can refer to Meta’s responsible use guide for additional guidelines on customizing and optimizing the models with appropriate safety measures.

Amazon SageMaker JumpStart offers a wide selection of publicly available foundation models for ML practitioners. These models can be deployed to dedicated Amazon SageMaker instances within a network-isolated environment, and ML practitioners can also customize the models using SageMaker for training and deployment. With just a few clicks in Amazon SageMaker Studio, or programmatically using the SageMaker Python SDK, users can discover and deploy Llama 2 models. This allows for easy model performance evaluation and MLOps controls using SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, and container logs. The deployment of the model takes place within a secure AWS environment under the user’s VPC control, ensuring data security. Llama 2 models are currently available in Amazon SageMaker Studio in the us-east-1, us-west-2, eu-west-1, and ap-southeast-1 Regions.

To access the foundation models, ML practitioners can use SageMaker JumpStart in either the SageMaker Studio UI or the SageMaker Python SDK. In SageMaker Studio, users can access SageMaker JumpStart, which offers a collection of pre-trained models, notebooks, and prebuilt solutions under the Prebuilt and automated solutions section. On the SageMaker JumpStart landing page, users can browse through the available solutions, models, notebooks, and other resources. The Foundation Models: Text Generation carousel contains two flagship Llama 2 models. If the Llama 2 models are not visible, users can update their SageMaker Studio version by shutting it down and restarting it. Another way to find the Llama 2 models is by selecting Explore all Text Generation Models or searching for “llama” in the search box. Clicking the model card provides detailed information about the model, including the license, the training data used, and instructions on how to use it. The card also includes two buttons, Deploy and Open Notebook. Upon choosing either button, a pop-up displays the end-user license agreement and acceptable use policy for acknowledgment. After acknowledging the terms, the user can proceed to use the model: choosing Deploy starts the model deployment process, while Open Notebook provides an example notebook to deploy from instead.

The example notebook provides detailed instructions on how to deploy the model for inference and clean up resources afterward. To deploy the model using a notebook, users select the appropriate model by specifying the model_id, as shown in the code snippet earlier. Users can also customize the default instance type and default VPC configurations by specifying non-default values in JumpStartModel. Once the model is deployed, users can perform inference against the deployed endpoint using the SageMaker predictor. The inference payload should include a history of chat between the user and the chat assistant when using the fine-tuned chat models (Llama-2-7b-chat, Llama-2-13b-chat, Llama-2-70b-chat). For the pre-trained models (Llama-2-7b, Llama-2-13b, Llama-2-70b), a string prompt is required for text completion. To successfully invoke the endpoint, users need to set accept_eula=true, as accept_eula is set to false by default; the key/value pairs used to pass the end-user license agreement (EULA) are specified in the custom_attributes parameter. Users can control text generation at the endpoint using the provided inference parameters, all of which are optional. The max_new_tokens parameter defines the size of the output generated by the model; note that the number of tokens is not necessarily the same as the number of words, as each token may not correspond to an English-language word. The temperature parameter controls the randomness in the output, with higher values resulting in more creative outputs.

The article provides a table that lists all the available Llama models in SageMaker JumpStart, including the model_ids, default instance types, and the maximum number of total tokens supported for each model. It is recommended to set the max_new_tokens parameter to values no greater than 1500, 1000, and 500 for the 7B, 13B, and 70B models, respectively, while keeping the total number of tokens below 4K. Llama models can be used for various text completion tasks, including answering questions, language translation, sentiment analysis, and more; the input payload for the endpoint includes the text to be completed and optional inference parameters, and the article provides sample prompts along with the corresponding text generated by the model. To help ML practitioners get started, Amazon SageMaker JumpStart offers an easy-to-use interface to discover, deploy, and use these powerful language models.

Frequently Asked Questions:

Q1: What is artificial intelligence (AI)?

A1: Artificial Intelligence refers to the simulation of human intelligence in machines, enabling them to perform tasks that typically require human intelligence. It involves the development of intelligent machines that can imitate human behavior, learning, and problem-solving capabilities.

Q2: How does artificial intelligence work?

A2: AI systems rely on various techniques such as machine learning, natural language processing, computer vision, and robotics, among others. These techniques enable machines to analyze and interpret data, make decisions, recognize patterns, and adapt to new information or circumstances.

Q3: What are the main applications of artificial intelligence?

A3: Artificial intelligence finds applications across various industries, including healthcare, finance, transportation, customer service, and manufacturing. AI is used for tasks like speech and image recognition, autonomous vehicles, fraud detection, personal assistants (e.g., Siri, Alexa), and recommendation systems, to name a few.

Q4: What are the ethical concerns surrounding artificial intelligence?

A4: One of the prominent ethical concerns with AI is related to privacy and data protection. As AI systems rely on vast amounts of data, ensuring the responsible use and safeguarding of personal information becomes crucial. Additionally, there are concerns about job displacement, biases in AI algorithms, and potential misuse of AI in surveillance or military applications.

Q5: How can artificial intelligence benefit society?

A5: Artificial intelligence has the potential to bring about significant advancements in various sectors. It can enhance healthcare by enabling faster and more accurate diagnoses, improve transportation systems with autonomous vehicles, revolutionize customer service through chatbots, and contribute to more efficient energy usage. AI also has the potential to free humans from repetitive tasks, allowing them to focus on more creative and complex problem-solving.
