Anthropic looks to fund a new, more comprehensive generation of AI benchmarks

Anthropic, an AI company, has announced a new program aimed at funding the development of innovative benchmarks to evaluate the performance and safety of AI models, including generative models like Claude. This program will provide financial support to third-party organizations that can effectively measure advanced capabilities in AI models, with applications accepted on a rolling basis.

The motivation behind this program is to enhance the overall field of AI safety by providing valuable tools that benefit the entire AI ecosystem. Anthropic acknowledges that creating high-quality, safety-relevant evaluations is a significant challenge, and the demand for such evaluations currently outpaces the supply.

The benchmarks developed through this program will focus on AI security and societal implications, addressing issues such as a model’s ability to carry out cyberattacks, enhance weapons of mass destruction, manipulate or deceive people, and self-censor toxic output. Anthropic also intends to support research into benchmarks and end-to-end tasks that probe AI’s potential for aiding scientific study, conversing in multiple languages, and mitigating ingrained biases.

To achieve these goals, Anthropic envisions new platforms that allow subject-matter experts to develop their own evaluations and large-scale trials of models involving thousands of users. The company has hired a full-time coordinator for the program and may purchase or expand projects it believes have the potential to scale.

While Anthropic’s effort to support new AI benchmarks is commendable, concerns have been raised about the company’s commercial ambitions and potential bias in the evaluation process. Anthropic’s desire for certain evaluations to align with its AI safety classifications, which it developed with input from third parties, could compel applicants to accept definitions of “safe” or “risky” AI with which they may not agree.

Additionally, some in the AI community question Anthropic’s emphasis on “catastrophic” and “deceptive” AI risks, such as those involving nuclear weapons. These experts argue that there is little evidence AI will gain world-ending, human-outsmarting capabilities anytime soon, if ever. Such concerns may make open, corporate-unaffiliated efforts to build better AI benchmarks reluctant to collaborate with Anthropic.
