When ChatGPT debuted in late 2022, it became a cultural and industrial flashpoint, turning large language models (LLMs) into an "everything app" for millions. But as the initial shock of the technology subsides, the industry is already looking past the chatbot. The next phase, which some are calling "LLMs+," moves beyond simple text generation toward systems that can solve complex, multi-part problems that currently require days or weeks of human labor.
To achieve this level of utility, AI labs are prioritizing autonomy and efficiency. If these models are to tackle the world’s most significant challenges, they must be able to operate independently for extended periods without constant human prompting. This requires a fundamental shift in how models are built and powered, moving away from the "brute force" scaling of the past few years toward more elegant, specialized architectures.
One of the most promising avenues is the "mixture-of-experts" (MoE) approach. Instead of activating a monolithic model for every query, MoE splits the LLM into smaller, specialized units and switches on only the parts relevant to a given task, significantly reducing the computational cost and energy required to run the model (a simplified sketch of the routing idea appears below). Other researchers are questioning the dominance of the Transformer, the neural network architecture that underpins almost all current LLMs, and exploring whether diffusion models, typically used for image generation, might offer a more robust path forward.
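To make the routing idea concrete, here is a minimal sketch in Python. It is not drawn from any production system; the layer width, expert count, and top-k routing rule are all hypothetical choices for illustration. The key point is that a small gating network scores the experts and only the top-scoring few actually run, so most of the model's weights stay idle on any given input.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16      # input/output width (illustrative)
N_EXPERTS = 8     # number of specialized sub-networks
TOP_K = 2         # experts activated per input

# Each "expert" is reduced to a single dense layer for brevity.
expert_weights = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1
                  for _ in range(N_EXPERTS)]

# Gating network: a linear map from the input to one score per expert.
gate_weights = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1


def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one input vector through its top-k experts only."""
    scores = x @ gate_weights                # one score per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the k best experts
    # Softmax over just the selected scores to weight their outputs.
    sel = np.exp(scores[top] - scores[top].max())
    sel /= sel.sum()
    # Only the chosen experts run; the other N_EXPERTS - k are skipped,
    # which is where the compute and energy savings come from.
    return sum(w * (x @ expert_weights[i]) for w, i in zip(sel, top))


token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)  # (16,) -- same width in and out, but only 2 of 8 experts ran
```

In a real MoE model, this routing happens inside every MoE layer for every token, and the gating network is trained alongside the experts, but the savings follow the same logic: compute scales with the experts selected, not the total parameter count.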
With reporting from MIT Technology Review.