ai benchmark - Search News

Testing The Limits: Three Ways AI Benchmarks Are Evolving

When it comes to real-world evaluation, appropriate benchmarks need to be carefully selected to match the context of AI ...

This new AI benchmark measures how much models lie

Researchers behind the MASK benchmark found that more knowledge doesn't mean more 'moral virtue.' See which model lies the ...

MIT Technology Review, 1d

These new AI benchmarks could help make models less biased

New AI benchmarks could help developers reduce bias in AI models, potentially making them fairer and less likely to cause harm. The research, from a team based at Stanford, was posted to the arXiv ...

9d on MSN

People are using Super Mario to benchmark AI now

Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.

Analytics Insight, 10h

Gemma 3: Google’s New AI Beats OpenAI’s o3-mini and DeepSeek-V3

Google has launched Gemma 3, the third generation of its open-source AI models. The model is better than rivals like DeepSeek ...

7d on MSN

Chatbots Are Cheating on Their Benchmark Tests

To measure the success of their work, companies cite industry-standard benchmark tests whenever they release a new model. The ...

21h

Google debuts two new AI models for powering robots

Google LLC today introduced two new artificial intelligence models, Gemini Robotics and Gemini Robotics-ER, that are ...

eWeek, 12d

Tencent’s New DeepSeek Competitor Looks Promising Based on Key AI Benchmarks

See how Tencent’s newest AI platform called Hunyuan Turbo S compared to top competitors, including DeepSeek-R1-Zero.

11h on MSN

Alibaba launches new version of AI assistant tool as competition heats up

The launch is its latest effort to gain an edge amid growing competition on the AI application front, further intensified ...

Science News, 6d

Medical AI tools are growing, but are they being tested properly?

AI medical benchmark tests fall short because they don’t test efficiency on real tasks such as writing medical notes, experts say.

NextBigFuture, 2d

AI Agent Competitive Landscape and Manus AI Innovations

Manus AI, developed by the Chinese startup Monica.im, is making a big splash as the world’s first fully autonomous AI agent ...

MIT Technology Review, 2d

These new AI benchmarks could help make models less biased

They could offer a more nuanced way to measure AI’s bias and its understanding of the world. New AI benchmarks could help developers reduce bias in AI models, potentially making them fairer and ...
