Apple Teams Up With NVIDIA to Speed Up AI Language Models

Apple has shared details on a collaboration with NVIDIA to speed up large language models (LLMs) by implementing a new text generation technique that offers substantial performance gains for AI applications.

Apple earlier this year published and open-sourced Recurrent Drafter (ReDrafter), an approach that combines beam search and dynamic tree attention methods to accelerate text generation. Beam search explores multiple potential text sequences at once for better results, while tree attention organizes and removes redundant overlaps among these sequences to improve efficiency.
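As a rough illustration of the two ideas in that paragraph, here is a minimal Python sketch. It is not Apple's ReDrafter code and does not use the TensorRT-LLM API: `toy_next_token_probs` is a hypothetical stand-in for a model, and `count_unique_prefix_nodes` only approximates the tree attention idea by counting how many token positions remain once beams that share a prefix are merged.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]
Beam = Tuple[Sequence, float]  # (tokens so far, cumulative log-probability)


def toy_next_token_probs(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical stand-in for a model: the next-token distribution
    depends only on the last token of the prefix."""
    table = {
        None: {"the": 0.6, "a": 0.4},
        "the": {"cat": 0.5, "dog": 0.3, "a": 0.2},
        "a": {"cat": 0.4, "dog": 0.6},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"sat": 0.2, "ran": 0.8},
    }
    last = prefix[-1] if prefix else None
    return table.get(last, {"<eos>": 1.0})


def beam_search(
    next_probs: Callable[[Sequence], Dict[str, float]],
    beam_width: int,
    steps: int,
) -> List[Beam]:
    """Keep the `beam_width` highest-scoring sequences at every step."""
    beams: List[Beam] = [((), 0.0)]
    for _ in range(steps):
        candidates: List[Beam] = []
        for seq, score in beams:
            for token, prob in next_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        # Sort by cumulative log-probability, best first, and prune.
        candidates.sort(key=lambda beam: beam[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def count_unique_prefix_nodes(sequences: List[Sequence]) -> int:
    """Count the nodes of the prefix tree formed by the sequences.
    Tokens that several beams share as a common prefix are counted once."""
    prefixes = set()
    for seq in sequences:
        for end in range(1, len(seq) + 1):
            prefixes.add(seq[:end])
    return len(prefixes)


beams = beam_search(toy_next_token_probs, beam_width=3, steps=3)
sequences = [seq for seq, _ in beams]
total_tokens = sum(len(seq) for seq in sequences)
unique_nodes = count_unique_prefix_nodes(sequences)
print(sequences)
print(total_tokens, unique_nodes)  # 9 token positions, 8 after merging shared prefixes
```

In this toy run two of the three beams begin with "the", so merging them saves one token position. The saving grows when more beams share longer prefixes, which is the kind of redundancy the article says tree attention removes.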

Apple has now integrated the technology into NVIDIA's TensorRT-LLM framework, which optimizes LLMs running on NVIDIA GPUs, where it achieved "state of the art performance," according to Apple. In testing with a production model containing tens of billions of parameters, the technique delivered a 2.7x increase in tokens generated per second.

Apple says the improved performance not only reduces user-perceived latency but also leads to decreased GPU usage and power consumption. From Apple's Machine Learning Research blog:

"LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users. With ReDrafter's novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications."

Developers interested in implementing ReDrafter can find detailed information on both Apple's website and NVIDIA's developer blog.

Top Rated Comments

attohs
50 minutes ago at 03:25 am
NVidia? Did hell freeze over again?
Score: 6 Votes
Little Endian
13 minutes ago at 04:03 am
Apple is in triage mode over Siri!! Yes everyone knows how bad Siri is!! All AI LLM is far from perfect but so far I would rather deal with any AI/LLM engine rather than Siri. I have an android phone with Google’s Gemini which is far from perfect but I find myself using it 90% of the time over Siri. If my life depended on it I would avoid Siri at all costs. I would rather seek help from an alcoholic meth head with dementia rather than trust Siri. For heaven’s sake she still can’t even dial a phone number or route me to the correct address with a greater than ~90% success rate.
Score: 2 Votes
redbeard331
43 minutes ago at 03:32 am
Good we have to hurry this up.



Score: 1 Vote
Account25476
43 minutes ago at 03:32 am
What is needed here is a miracle
Score: 1 Vote
vegetassj4
41 minutes ago at 03:35 am
NVIDIA and Apple??!!? Working together again?



Score: 1 Vote
dannys1
36 minutes ago at 03:39 am
Not just Apple and Nvidia teaming up again - but on a product Apple won't sell and based on software Apple has opensourced!

In the long term this is only going to help consumers and businesses who want to run offline LLMs at home (on Nvidia hardware)
Score: 1 Vote