Anthropic Launches Claude 3.5 Sonnet

While Anthropic focuses heavily on LLM interpretability and has invested considerable resources in it, this does not seem to have slowed its R&D progress. Just over a month after the launch of GPT-4o, the company has released Claude 3.5 Sonnet, which is comparable to GPT-4o. It is intriguing to wonder how powerful Claude 3.5 Opus will be.

Today, we are launching Claude 3.5 Sonnet, the first release in the upcoming Claude 3.5 model family. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, while matching the speed and cost of our mid-tier model, Claude 3 Sonnet.

Claude 3.5 Sonnet is now available for free on Claude.ai and in the Claude iOS app, and Claude Pro and Team plan subscribers can access it with higher rate limits. It is also available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. The model costs $3 per million input tokens and $15 per million output tokens, and has a 200K-token context window.
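As a rough illustration of the pricing above, the following Python sketch estimates the cost of a single request from its token counts. The two prices and the context window size come from the announcement; the function and constant names are hypothetical, and the calculation assumes simple linear per-token billing with no discounts or other adjustments.

```python
# Prices quoted in the announcement, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00

# Context window size quoted in the announcement.
CONTEXT_WINDOW_TOKENS = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # A 10,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(10_000, 1_000):.3f}")
    # A prompt that fills the entire context window, ignoring output.
    print(f"${estimate_cost_usd(CONTEXT_WINDOW_TOKENS, 0):.2f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long responses tend to dominate the bill more than long prompts do.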

Get cutting-edge intelligence at twice the speed
Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding capabilities (HumanEval). It shows significant advancements in grasping nuances, humor, and complex instructions, and excels at writing high-quality content in a natural, friendly tone.

Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. This performance increase, combined with an affordable price, makes Claude 3.5 Sonnet an ideal choice for complex tasks such as context-aware customer support and coordinating multi-step workflows.

In our internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus, which solved 38%. The evaluation tests the model's ability to fix a bug or add a feature to an open-source codebase, given a natural language description of the required improvement. With guidance and the right tools, Claude 3.5 Sonnet can write, edit, and execute code independently, with sophisticated reasoning and troubleshooting capabilities. It handles code translation with ease, making it particularly well suited to updating legacy applications and migrating codebases.

State-of-the-art vision
Claude 3.5 Sonnet is our most powerful vision model to date, surpassing Claude 3 Opus on standard vision benchmarks. These significant improvements are most evident for tasks requiring visual reasoning, such as interpreting charts and graphics. Claude 3.5 Sonnet can also accurately transcribe text from imperfect images—a core functionality in retail, logistics, and financial services, where AI can glean more insights from images, graphics, or illustrations than from text alone.

Artifacts—the new way to use Claude
Today, we are also launching Artifacts on Claude.ai, a new feature that expands the ways users can interact with Claude. When a user asks Claude to generate content such as code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside the conversation. This creates a dynamic workspace where users can view, edit, and build upon Claude's creations in real time, integrating AI-generated content seamlessly into their projects and workflows.

This preview feature marks Claude's evolution from a conversational AI into a collaborative work environment. It is just the beginning of a broader vision for Claude.ai, which will soon expand to support team collaboration. In the near future, teams (and eventually entire organizations) will be able to securely centralize their knowledge, documents, and ongoing work in a shared space, with Claude serving as an on-demand teammate.

Committed to safety and privacy
Our models have undergone rigorous testing and are trained to reduce misuse. Despite Claude 3.5 Sonnet's leap in intelligence, our red teaming assessments concluded that the model remains at AI Safety Level 2 (ASL-2). For more details, please refer to the model card addendum.

As part of our commitment to safety and transparency, we collaborate with external experts to test and improve the safety mechanisms in our latest models. We recently provided Claude 3.5 Sonnet to the UK Artificial Intelligence Safety Institute (UK AISI) for a pre-deployment safety assessment. The UK AISI completed its testing of 3.5 Sonnet and shared the results with the US Artificial Intelligence Safety Institute (US AISI) under a memorandum of understanding, made possible by the partnership between the US and UK AISIs announced earlier this year.

We have integrated policy feedback from external subject matter experts to ensure our assessments are robust and consider new trends in abuse. This engagement has helped our team enhance the evaluation of 3.5 Sonnet against various types of misuse. For instance, we leveraged feedback from Thorn's child safety experts to update our classifiers and fine-tune our models.

One of the core constitutional principles guiding the development of our AI models is privacy. We do not use data submitted by users to train our generative models unless explicitly permitted by the user. To date, we have not used any customer or user-submitted data to train our generative models.

Upcoming releases
Our goal is to significantly improve the trade-off curve between intelligence, speed, and cost every few months. To round out the Claude 3.5 series, we will release Claude 3.5 Haiku and Claude 3.5 Opus later this year.

In addition to developing the next generation of models, we are also working on new modalities and features to support more enterprise use cases, including integrations with enterprise applications. Our team is also exploring features such as Memory, which will enable Claude to remember a user's preferences and interaction history as specified, making the experience more personalized and efficient.

We are constantly working to improve Claude and are eager for user feedback. You can submit feedback on Claude 3.5 Sonnet directly within the product to inform our development roadmap and help our team improve your experience. As always, we look forward to seeing what you build, create, and discover with Claude.