For much of the current generative AI boom, high-end models have been effectively locked within the Nvidia ecosystem. The specialized kernels and CUDA-specific operations required for complex tasks—such as converting a single 2D image into a high-fidelity 3D mesh—often make these tools inaccessible to anyone without a dedicated server rack or a costly cloud subscription.
A new port of Microsoft’s TRELLIS.2 model, recently shared by developer Shivam Kumar, challenges this hardware hegemony. By rewriting several hundred lines of code to replace CUDA-specific operations with pure-PyTorch alternatives, Kumar has enabled the 4-billion-parameter model to run on Apple Silicon. The implementation swaps proprietary sparse convolution kernels and hashmap operations for native Metal Performance Shaders (MPS) equivalents, allowing the model to function entirely offline on Mac hardware.
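The general pattern behind such a port can be sketched in a few lines of PyTorch: detect the best available backend at runtime, preferring CUDA where present and falling back to Apple's MPS backend, then run ordinary PyTorch ops that are backend-agnostic. This is a minimal illustrative sketch, not Kumar's actual code; the `select_device` helper and the dense 3D convolution standing in for the model's sparse kernels are assumptions for demonstration.

```python
import torch

def select_device() -> torch.device:
    """Prefer CUDA, then Apple's Metal Performance Shaders, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = select_device()

# Illustrative stand-in for a CUDA-specific sparse convolution kernel:
# a pure-PyTorch Conv3d runs unchanged on CUDA, MPS, or CPU.
conv = torch.nn.Conv3d(in_channels=8, out_channels=16, kernel_size=3).to(device)
x = torch.randn(1, 8, 16, 16, 16, device=device)
y = conv(x)
print(tuple(y.shape))  # (1, 16, 14, 14, 14)
```

The actual port is far more involved, since TRELLIS relies on sparse voxel structures and hashmap lookups with no drop-in dense equivalent, but the device-agnostic dispatch shown here is the foundation that makes a Metal backend possible at all.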
While the performance on an M4 Pro chip—roughly three and a half minutes to generate a 400,000-vertex mesh—pales in comparison to the near-instant results of an enterprise-grade H100, the shift is significant. It represents a move toward local, sovereign computing for designers and developers. By removing the dependency on remote clusters, the port demonstrates that even resource-intensive generative tasks are beginning to find a more accessible home on the desktop.
With reporting from Hacker News.


