Amazon SageMaker | AWS Machine Learning Blog

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI

Researchers developed Medusa, a framework to speed up LLM inference by adding extra heads to predict multiple tokens simultaneously. This post demonstrates how to use Medusa-1, the first version of the framework, to speed up an LLM by fine-tuning it on Amazon SageMaker AI and confirms the speed up with deployment and a simple load test. Medusa-1 achieves an inference speedup of around two times without sacrificing model quality, with the exact improvement varying based on model size and data used. In this post, we demonstrate its effectiveness with a 1.8 times speedup observed on a sample dataset.

Falcon 3 models now available in Amazon SageMaker JumpStart

We are excited to announce that the Falcon 3 family of models from TII are available in Amazon SageMaker JumpStart. In this post, we explore how to deploy this model efficiently on Amazon SageMaker AI.

GraphStorm SageMaker Arhcitecture Diagram

Faster distributed graph neural network training with GraphStorm v0.4

GraphStorm is a low-code enterprise graph machine learning (ML) framework that provides ML practitioners a simple way of building, training, and deploying graph ML solutions on industry-scale graph data. In this post, we demonstrate how GraphBolt enhances GraphStorm’s performance in distributed settings. We provide a hands-on example of using GraphStorm with GraphBolt on SageMaker for distributed training. Lastly, we share how to use Amazon SageMaker Pipelines with GraphStorm.

Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI

In this post, we demonstrate how you can deploy an LLM such as DeepSeek-R1—or another FM of your choice—from popular model hubs like SageMaker JumpStart or Hugging Face Hub to SageMaker AI for real-time inference. We explore inference frameworks like Hugging Face TGI which helps streamline deployment while integrating built-in performance optimizations to minimize latency and maximize throughput. Additionally, we showcase how the SageMaker developer-friendly Python SDK simplifies endpoint orchestration, allowing seamless experimentation and scaling of LLM-powered applications.

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

This post provides detailed steps for setting up the key components of a multi-account ML platform. This includes configuring the ML Shared Services Account, which manages the central templates, model registry, and deployment pipelines; sharing the ML Admin and SageMaker Projects Portfolios from the central Service Catalog; and setting up the individual ML Development Accounts where data scientists can build and train models.

The flow from input forms to the final output, including how integrations and AI services are utilized.

Enhancing LLM Capabilities with NeMo Guardrails on Amazon SageMaker JumpStart

Integrating NeMo Guardrails with Large Language Models (LLMs) is a powerful step forward in deploying AI in customer-facing applications. The example of AnyCompany Pet Supplies illustrates how these technologies can enhance customer interactions while handling refusal and guiding the conversation toward the implemented outcomes. This journey towards ethical AI deployment is crucial for building sustainable, trust-based relationships with customers and shaping a future where technology aligns seamlessly with human values.

How Travelers Insurance classified emails with Amazon Bedrock and prompt engineering

In this post, we discuss how FMs can reliably automate the classification of insurance service emails through prompt engineering. When formulating the problem as a classification task, an FM can perform well enough for production environments, while maintaining extensibility into other tasks and getting up and running quickly. All experiments were conducted using Anthropic’s Claude models on Amazon Bedrock.

The following diagram illustrates the workflow of patch-level prediction tasks on a WSI

Accelerate digital pathology slide annotation workflows on AWS using H-optimus-0

In this post, we demonstrate how to use H-optimus-0 for two common digital pathology tasks: patch-level analysis for detailed tissue examination, and slide-level analysis for broader diagnostic assessment. Through practical examples, we show you how to adapt this FM to these specific use cases while optimizing computational resources.

DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

DeepSeek-R1 is an advanced large language model that combines reinforcement learning, chain-of-thought reasoning, and a Mixture of Experts architecture to deliver efficient, interpretable responses while maintaining safety through Amazon Bedrock Guardrails integration.

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

In this post, we show how to use FMEval and Amazon SageMaker to programmatically evaluate LLMs. FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency.

Category: Amazon SageMaker