The latest AI models from Meta, Llama 4 Scout 17B and Llama 4 Maverick 17B, are now available as a fully managed, serverless option in Amazon Bedrock. These new foundation models (FMs) deliver natively multimodal capabilities with early fusion technology that you can use for precise image grounding and extended context processing in your applications.
Llama 4 uses an innovative mixture-of-experts (MoE) architecture that provides enhanced performance across reasoning and image understanding tasks while optimizing for both cost and speed. This architectural approach enables Llama 4 to offer improved performance at lower cost compared to Llama 3, with expanded language support for global applications.
The models were already available on Amazon SageMaker JumpStart, and you can now use them in Amazon Bedrock to streamline building and scaling generative AI applications with enterprise-grade security and privacy.
Llama 4 Maverick 17B – A natively multimodal model featuring 128 experts and 400 billion total parameters. It excels in image and text understanding, making it suitable for versatile assistant and chat applications. The model supports a 1 million token context window, giving you the flexibility to process lengthy documents and complex inputs.
Llama 4 Scout 17B – A general-purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters that delivers superior performance compared to all previous Llama models. Amazon Bedrock currently supports a 3.5 million token context window for Llama 4 Scout, with plans to expand in the near future.
Use cases for Llama 4 models
You can use the advanced capabilities of Llama 4 models for a wide range of use cases across industries:
Enterprise applications – Build intelligent agents that can reason across tools and workflows, process multimodal inputs, and deliver high-quality responses for business applications.
Multilingual assistants – Create chat applications that understand images and provide high-quality responses across multiple languages, making them accessible to global audiences.
Code and document intelligence – Develop applications that can understand code, extract structured data from documents, and provide insightful analysis across large volumes of text and code.
Customer support – Enhance support systems with image analysis capabilities, enabling more effective problem resolution when customers share screenshots or photos.
Content creation – Generate creative content across multiple languages, with the ability to understand and respond to visual inputs.
Research – Build research applications that can integrate and analyze multimodal data, providing insights across text and images.
Using Llama 4 models in Amazon Bedrock
To use these new serverless models in Amazon Bedrock, I first need to request access. In the Amazon Bedrock console, I choose Model access from the navigation pane to toggle access to the Llama 4 Maverick 17B and Llama 4 Scout 17B models.
The Llama 4 models can be easily integrated into your applications using the Amazon Bedrock Converse API, which provides a unified interface for conversational AI interactions.
Here's an example of how to use the AWS SDK for Python (Boto3) with Llama 4 Maverick for a multimodal conversation:
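Once access is granted, you can also confirm which models are visible to your account programmatically through the Bedrock control-plane client (`bedrock`, distinct from the `bedrock-runtime` client used for inference). This is a minimal sketch; the `is_llama4` filter is my own illustrative helper, not part of the SDK:

```python
def is_llama4(model_summary: dict) -> bool:
    # Illustrative helper (not part of the SDK): matches Llama 4 model IDs
    # such as "meta.llama4-maverick-17b-instruct-v1:0".
    return model_summary.get("modelId", "").startswith("meta.llama4")


def list_llama4_models(region: str = "us-west-2") -> list[str]:
    """Return the Llama 4 model IDs visible in the given AWS Region.

    Requires AWS credentials with the bedrock:ListFoundationModels permission.
    """
    import boto3  # imported here so the helper above stays dependency-free

    bedrock = boto3.client("bedrock", region_name=region)
    response = bedrock.list_foundation_models(byProvider="Meta")
    return [
        summary["modelId"]
        for summary in response["modelSummaries"]
        if is_llama4(summary)
    ]
```

If access has not yet been granted in the console, the models simply won't appear in the returned list.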
import boto3
import json
import os

AWS_REGION = "us-west-2"
MODEL_ID = "us.meta.llama4-maverick-17b-instruct-v1:0"
IMAGE_PATH = "image.jpg"


def get_file_extension(filename: str) -> str:
    """Get the file extension."""
    extension = os.path.splitext(filename)[1].lower()[1:] or 'txt'
    if extension == 'jpg':
        extension = 'jpeg'
    return extension


def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except Exception as e:
        raise Exception(f"Error reading file {file_path}: {str(e)}")


bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=AWS_REGION
)

request_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "What can you tell me about this image?"
                },
                {
                    "image": {
                        "format": get_file_extension(IMAGE_PATH),
                        "source": {"bytes": read_file(IMAGE_PATH)},
                    }
                },
            ],
        }
    ]
}

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"]
)

print(response["output"]["message"]["content"][-1]["text"])
This example demonstrates how to send both text and image inputs to the model and receive a conversational response. The Converse API abstracts away the complexity of working with different model input formats, providing a consistent interface across models in Amazon Bedrock.
For more interactive use cases, you can also use the streaming capabilities of the Converse API:
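You can also control generation through the Converse API's `inferenceConfig` field. This is a sketch under the same setup as above (the model ID is repeated so the snippet stands alone); the parameter values are illustrative, not recommendations:

```python
MODEL_ID = "us.meta.llama4-maverick-17b-instruct-v1:0"

# Standard Converse API inference parameters.
inference_config = {
    "maxTokens": 512,    # upper bound on generated tokens
    "temperature": 0.5,  # lower values make output more deterministic
    "topP": 0.9,         # nucleus sampling threshold
}

request_kwargs = {
    "modelId": MODEL_ID,
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "Summarize the benefits of mixture-of-experts architectures."}
            ],
        }
    ],
    "inferenceConfig": inference_config,
}

# With a bedrock-runtime client configured as in the previous example:
# response = bedrock_runtime.converse(**request_kwargs)
# print(response["output"]["message"]["content"][-1]["text"])
```

Because the same fields work across models in Amazon Bedrock, you can tune these values without changing the rest of your request.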
response_stream = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=request_body['messages']
)

stream = response_stream.get('stream')
if stream:
    for event in stream:
        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")
        if 'contentBlockDelta' in event:
            print(event['contentBlockDelta']['delta']['text'], end="")
        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")
        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print(f"Usage: {json.dumps(metadata['usage'], indent=4)}")
            if 'metrics' in metadata:
                print(f"Metrics: {json.dumps(metadata['metrics'], indent=4)}")
With streaming, your applications can provide a more responsive experience by displaying model outputs as they are generated.
Things to know
The Llama 4 models are available today with a fully managed, serverless experience in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. You can also access Llama 4 in US East (Ohio) via cross-region inference.
As usual with Amazon Bedrock, you pay for what you use. For more information, see Amazon Bedrock pricing.
These models support 12 languages for text (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese) and English when processing images.
To start using these new models today, visit the Meta Llama models section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws website.
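If you need the complete text as well as incremental display, you can fold the streamed deltas into a single string. This is a small helper of my own, assuming the event shapes shown in the streaming example; the mock events below are only for illustration:

```python
def collect_stream_text(stream) -> str:
    """Accumulate the text deltas from a Converse API event stream
    into the complete model response."""
    parts = []
    for event in stream:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)


# Mock events mirroring the shapes emitted by converse_stream:
mock_stream = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello, "}}},
    {"contentBlockDelta": {"delta": {"text": "world!"}}},
    {"messageStop": {"stopReason": "end_turn"}},
]
print(collect_stream_text(mock_stream))  # prints "Hello, world!"
```

In a real application you would pass `response_stream.get('stream')` to this helper instead of the mock list.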
— Danilo