When it comes to real-world evaluation, appropriate benchmarks need to be carefully selected to match the context of AI ...
Researchers behind the MASK benchmark found that more knowledge doesn't mean more 'moral virtue.' See which model lies the ...
New AI benchmarks could help developers reduce bias in AI models, potentially making them fairer and less likely to cause harm. The research, from a team based at Stanford, was posted to the arXiv ...
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.
Google has launched Gemma 3, the third generation of its open-source AI models. The model is better than rivals like DeepSeek ...
To measure the success of their work, companies cite industry-standard benchmark tests whenever they release a new model. The ...
Google LLC today introduced two new artificial intelligence models, Gemini Robotics and Gemini Robotics-ER, that are ...
See how Tencent’s newest AI platform, Hunyuan Turbo S, compares with top competitors, including DeepSeek-R1-Zero.
The launch is its latest effort to gain an edge amid growing competition on the AI application front, further intensified ...
AI medical benchmark tests fall short because they don’t test efficiency on real tasks such as writing medical notes, experts say.
Manus AI, developed by the Chinese startup Monica.im, is making a big splash as the world’s first fully autonomous AI agent ...
New AI benchmarks could offer a more nuanced way to measure AI’s bias and its understanding of the world.