Terminal-Bench 2.0 launches alongside Harbor, a new framework for testing agents in containers

The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framework for testing, improving and optimizing AI agents in containerized environments. The dual release aims to address long-standing pain points in testing and optimizing AI agents, particularly those…

The Download: a new home under the sea, and cloning pets

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. The first new subsea habitat in 40 years is about to launch. Vanguard feels and smells like a new RV. It has long, gray banquettes that convert into bunks, a microwave cleverly hidden…

Cloning isn’t just for celebrity pets like Tom Brady’s dog

This week, we heard that Tom Brady had his dog cloned. The former quarterback revealed that his Junie is actually a clone of Lua, a pit bull mix that died in 2023. Brady’s announcement follows those of celebrities like Paris Hilton and Barbra Streisand, who also famously cloned their pet dogs. But some believe there…

The first new subsea habitat in 40 years is about to launch

Vanguard feels and smells like a new RV. It has long, gray banquettes that convert into bunks, a microwave cleverly hidden under a counter, a functional steel sink with a French press and crockery above. A weird little toilet hides behind a curtain. But some clues hint that you can’t just fire up Vanguard’s engine…

NYU’s new AI architecture makes high-quality image generation faster and cheaper

Researchers at New York University have developed a new architecture for diffusion models that improves the semantic representation of the images they generate. “Diffusion Transformer with Representation Autoencoders” (RAE) challenges some of the accepted norms of building diffusion models. The NYU researchers’ model is more efficient and accurate than standard diffusion models, takes advantage of…

Moonshot’s Kimi K2 Thinking emerges as leading open source AI, outperforming GPT-5, Claude Sonnet 4.5 on key benchmarks

Even as concern and skepticism grow over U.S. AI startup OpenAI’s buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition, and one has even caught up to OpenAI’s flagship, paid proprietary model GPT-5 in key third-party performance benchmarks with a new, free model. The Chinese AI startup Moonshot AI’s…
