Anthropic makes ‘jailbreak’ advance to stop AI models producing harmful results

Leading tech groups including Microsoft and Meta also invest in similar safety systems

Artificial intelligence start-up Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as leading tech groups including Microsoft and Meta race to find ways to protect against the dangers posed by the cutting-edge technology.

In a paper released on Monday, the San Francisco-based start-up outlined a new system called “constitutional classifiers”. The system acts as a protective layer on top of large language models, such as the one that powers Anthropic’s Claude chatbot, and monitors both inputs and outputs for harmful content.
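As a purely illustrative sketch of the wrapping pattern described above, the guard sits between the user and the model, screening the prompt before generation and the completion before it is returned. All names here (HarmClassifier, guarded_generate, the generate callable) are hypothetical placeholders, not Anthropic’s actual API or implementation:

```python
# Illustrative sketch only: a classifier layer in the spirit of the
# "constitutional classifiers" idea described in the article. Every name
# below is a hypothetical stand-in, not a real Anthropic interface.

from dataclasses import dataclass
from typing import Callable


@dataclass
class HarmClassifier:
    """Scores text for harmfulness and blocks it above a threshold."""
    score: Callable[[str], float]  # returns a harm score in [0, 1]
    threshold: float = 0.5

    def is_harmful(self, text: str) -> bool:
        return self.score(text) >= self.threshold


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],  # the underlying LLM call
    input_clf: HarmClassifier,
    output_clf: HarmClassifier,
    refusal: str = "I can't help with that.",
) -> str:
    # Screen the incoming prompt before the model ever sees it.
    if input_clf.is_harmful(prompt):
        return refusal
    completion = generate(prompt)
    # Screen the model's completion before returning it to the user.
    if output_clf.is_harmful(completion):
        return refusal
    return completion


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_clf = HarmClassifier(score=lambda t: 1.0 if "weapon" in t.lower() else 0.0)
    echo_llm = lambda p: f"Echo: {p}"
    print(guarded_generate("How do chemical weapons work?", echo_llm, toy_clf, toy_clf))
    print(guarded_generate("What is the capital of France?", echo_llm, toy_clf, toy_clf))
```

In practice the classifiers Anthropic describes are themselves trained models rather than keyword checks; the toy scoring function above stands in only to keep the example self-contained.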

The development by Anthropic, which is in talks to raise $2bn at a $60bn valuation, comes amid growing industry concern over “jailbreaking” — attempts to manipulate AI models into generating illegal or dangerous information, such as producing instructions to build chemical weapons.
