Faster LLM Inference - Search News

20d

Lumai Launches the World’s First Optical Computing System for Real-Time, Billion-Parameter LLM Inference

Lumai, the optical compute company addressing scalable AI, today announced its Lumai Iris inference server – the world’s first optical computing system to successfully run billion-parameter large ...

Morning Overview on MSN

OpenAI hires startup Gimlet Labs to optimize its models for Cerebras chips — claiming 10x faster AI inference at the same cost

A startup called Gimlet Labs says it can split AI workloads across chips from different manufacturers and make inference up ...

9to5Mac

Apple collaborates with NVIDIA to research faster LLM performance

In a blog post today, Apple engineers have shared new details on a collaboration with NVIDIA to implement faster text generation performance with large language models. Apple published and open ...

TweakTown

Dell PowerEdge XE9712: NVIDIA GB200 NVL72-based AI GPU cluster for LLM training, inference

Dell has just unleashed its new PowerEdge XE9712 with NVIDIA GB200 NVL72 AI servers, with 30x faster real-time LLM performance over the H100 AI GPU. Dell Technologies' new AI Factory with NVIDIA sees ...

Agent harnesses, like OpenClaw, are changing how we build and run AI models

After nearly four years and hundreds of billions burned building smarter and more capable models, folks understandably would ...

VentureBeat

AI chip race: Groq CEO takes on Nvidia, claims most startups will use speedy LPUs by end of 2024

Everyone is talking about Nvidia’s jaw-dropping earnings results — up a whopping 265% from a year ago. But don’t sleep on Groq, the Silicon Valley-based company creating new AI chips for large ...

Nasdaq

Apple and Nvidia Partner to Enable Faster LLM Token Generation

Apple introduced ReDrafter earlier ...

Business Wire

Meta Collaborates with Cerebras to Drive Fast Inference for Developers in New Llama API

SUNNYVALE, Calif.--(BUSINESS WIRE)--Meta has teamed up with Cerebras to offer ultra-fast inference in its new Llama API, bringing together the world’s most popular open-source models, Llama, with the ...

Hosted on MSN

Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech

ReDrafter delivers 2.7x more tokens per second compared to traditional auto-regression. ReDrafter could reduce latency for users while using fewer GPUs. Apple hasn't said when ReDrafter will be deployed ...

VentureBeat

ServiceNow open sources Fast-LLM in a bid to help enterprises train AI models 20% quicker

Training a large language model (LLM) is ...

MacStories

AI Experiments: Fast Inference with Groq and Third-Party Tools with Kimi K2 in TypingMind

It all started because I heard great things about Kimi K2 (the latest open-source model by Chinese lab Moonshot AI) and its performance with agentic tool calls. The folks at Moonshot AI specifically ...

9to5Mac

Three highlights from Apple’s recent workshop on natural language processing

A few months ago, Apple hosted a two-day event that featured talks and publications on the latest advancements in natural language processing (NLP). Today, the company published a post with multiple ...
