NVIDIA, in collaboration with Google, today launched optimizations across all NVIDIA AI platforms for Gemma, Google's new family of open, lightweight language models with 2 billion and 7 billion parameters that can run anywhere, reducing costs and speeding innovative work for domain-specific use cases.
Teams from the two companies worked closely together to accelerate the performance of Gemma, built from the same research and technology used to create the Gemini models, with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference when running on NVIDIA GPUs in the data center, in the cloud and locally on workstations with NVIDIA RTX GPUs or PCs with GeForce RTX GPUs. This lets developers target the installed base of more than 100 million NVIDIA RTX GPUs in high-performance AI PCs worldwide.
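As a rough illustration of what this looks like in practice, the sketch below runs a Gemma checkpoint through TensorRT-LLM's high-level Python API. It assumes a recent tensorrt_llm release that exposes the LLM and SamplingParams interface, a supported NVIDIA GPU, and access to the "google/gemma-2b" Hugging Face checkpoint; exact parameter names and defaults may differ between versions.

```python
# Minimal sketch: optimized local inference for Gemma via TensorRT-LLM's
# high-level Python API (assumed available in recent tensorrt_llm releases).
from tensorrt_llm import LLM, SamplingParams


def main() -> None:
    # Builds (or loads a cached) TensorRT engine for the checkpoint,
    # then runs inference directly on the local NVIDIA GPU.
    llm = LLM(model="google/gemma-2b")

    params = SamplingParams(temperature=0.8, max_tokens=128)
    outputs = llm.generate(
        ["Summarize what TensorRT-LLM does in one sentence."],
        params,
    )

    for output in outputs:
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```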
You can try Gemma 2B and Gemma 7B directly from your browser on the NVIDIA AI Playground. Gemma is also coming to Chat with RTX: support for the model is planned soon in this NVIDIA technology demo, which uses retrieval-augmented generation and TensorRT-LLM software to deliver generative AI capabilities to users on their local, RTX-powered Windows PCs.
Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on an RTX PC to a large language model. Because the model runs locally, results arrive quickly and the user's data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without sharing it with a third party or needing an internet connection.
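Chat with RTX's own implementation is not shown here, but the retrieval-augmented generation pattern it relies on can be sketched in a few lines: index local files, retrieve the passages most relevant to a question, and pass them as context to a locally running model. In this illustrative sketch the TF-IDF retriever and the generate callable are placeholders, not components of Chat with RTX; the callable could be backed by any local inference engine, such as a Gemma engine built with TensorRT-LLM.

```python
# Illustrative retrieval-augmented generation loop over local files.
# Not Chat with RTX's code: the retriever and the `generate` callable
# are stand-ins for whatever local components are actually in use.
from pathlib import Path
from typing import Callable

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def load_local_files(folder: str) -> list[str]:
    """Read plain-text files from a local folder; nothing leaves the machine."""
    return [p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")]


def retrieve(question: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank local documents by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(docs)
    query_vec = vectorizer.transform([question])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [docs[i] for i in top]


def answer(question: str, folder: str, generate: Callable[[str], str]) -> str:
    """Answer a question using only local files and a local generation callable."""
    docs = load_local_files(folder)
    context = "\n\n".join(retrieve(question, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```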