New top story on Hacker News: Llama.cpp can do 40 tok/s on M2 Max, 0% CPU usage, using all 38 GPU cores
20 points by samwillis | 4 comments on Hacker News.