- GPT-4o Mini: the most cost-efficient model to date, 29 times cheaper than GPT-4, yet as good.
Leaked: Nvidia working on AI chips specifically for China
In today’s newsletter:
Faster, cheaper and better: Introducing GPT-4o Mini
(AI) Chips Ahoy! 🍪
OpenAI AI chip talks with Broadcom
Nvidia’s back in the China market with new AI chips
4 AI tools in the spotlight
4 new investments
Read time: 4-6 mins
Sorry, we couldn’t resist the 🍪 cookie reference.
GPT-4o Mini: Most cost-efficient model to date, 29 times cheaper than GPT-4, yet just as good.
OpenAI has another exciting update: GPT-4o mini, launched last Thursday, is a faster, more cost-efficient model intended to expand the range of use cases and applications built on its APIs.
Key points:
Superior performance:
Outperforms GPT-4 on chat preferences in LMSYS leaderboard
Surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o.
Beats Claude Haiku and Gemini Flash in maths and coding proficiency, as well as multimodal reasoning
Better than GPT-3.5 Turbo at tasks such as extracting structured data from receipt files or generating high-quality email responses when provided with thread history.
It will take over from GPT-3.5 Turbo as the underlying model of ChatGPT's free tier
Most cost-effective:
GPT-4o costs $5 per 1M input tokens and $15 per 1M output tokens, while GPT-4o mini costs $0.15 and $0.60 respectively. That makes it roughly 29 times cheaper than GPT-4o on average.
Feature/Model | GPT-4o Mini | GPT-4o | GPT-4 |
---|---|---|---|
Context window | 128K tokens | 128K tokens | 8,192 tokens |
Max output | 16.4K tokens | 8,192 tokens | 8,192 tokens |
Input cost | $0.15 per million | $5 per million | $30 per million |
Output cost | $0.60 per million | $15 per million | $60 per million |
MMLU score | 82.0% | 88.7% | 86.4% |
MMMU score | 59.4% | 69.1% | 56.0% |
MGSM score | 87.0% | 90.5% | 90.2% |
HumanEval score | 87.2% | 90.2% | 87.1% |
MMLU - multitask accuracy; MMMU - multimodal understanding & reasoning; MGSM - maths reasoning; HumanEval - code generation
So what?
It is clear that OpenAI is stepping up its game to deliver more accessible products to the market, pushing to defend and grow its market share. In less than 20 months, the cost of using its models has dropped by more than 26 times: in a medium-usage scenario of 10 million tokens per month, OpenAI's text-davinci-003 would have cost $2,400 for the year, while GPT-4o mini costs $90 annually.
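The cost figures above can be reproduced with a few lines of arithmetic. This is a sketch, and the token-accounting assumptions are ours, inferred from the quoted numbers: the "29x" is the average of the input and output price ratios between GPT-4o and GPT-4o mini, text-davinci-003 is billed at its flat legacy rate of $0.02 per 1K tokens, and the GPT-4o mini annual figure prices every token at the combined input + output rate.

```python
# USD per 1M tokens: (input, output) for current models,
# flat rate for the legacy text-davinci-003.
MINI = (0.15, 0.60)     # GPT-4o mini
GPT4O = (5.00, 15.00)   # GPT-4o
DAVINCI_FLAT = 20.00    # text-davinci-003: $0.02 per 1K tokens

# "29 times cheaper": average of the input and output price ratios.
ratio = (GPT4O[0] / MINI[0] + GPT4O[1] / MINI[1]) / 2
print(f"GPT-4o vs GPT-4o mini, average price ratio: {ratio:.0f}x")  # 29x

# Medium-usage scenario: 10M tokens per month, in millions per year.
tokens_per_year_m = 10 * 12

davinci_annual = tokens_per_year_m * DAVINCI_FLAT        # $2,400
mini_annual = tokens_per_year_m * (MINI[0] + MINI[1])    # $90
print(f"text-davinci-003: ${davinci_annual:,.0f}/yr, "
      f"GPT-4o mini: ${mini_annual:,.0f}/yr "
      f"(~{davinci_annual / mini_annual:.0f}x cheaper)")
```

Note that the two headline multiples use different baselines: the 29x compares mini against GPT-4o's per-token prices, while the 26x+ compares annual bills against the older text-davinci-003.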
At the same time, shifting to a small, task-specific model makes applications that require fast, real-time responses and network-independent operation much more feasible. It means we will see a proliferation of mobile apps powered by LLMs at the edge, e.g. a true AI companion running on your smartphone with no Internet connection required, which aligns with what all mobile makers are trying to build: AI smartphones.
The News: East meets West
generated by stability.ai
News from Asia:
Nvidia is working on a new AI chip series for China
It’s a big deal: The B20 chip is expected to generate substantial revenue for Nvidia.
A strategic response to competition from Chinese tech giants’ own AI chip development
But the US government might quash it with the foreign direct product rule
Tenstorrent emerges as Nvidia challenger, partners with Samsung, Hyundai Motor and Rapidus
Singapore is missing out on the AI chip boom but retains an advantage in traditional chips for the automotive, industrial, consumer, and mobile markets
Chinese big tech's AI talent are departing to start their own companies amid China's unicorn boom
iFlytek to invest $51 million to build an international HQ in Hong Kong
News from the West:
OpenAI and Broadcom in talks to create new AI chip
This is not just about technology; it is a power play to reduce OpenAI's dependence on Nvidia
Mistral AI announces NeMo, a 12B model created in partnership with Nvidia
128K-token context window with state-of-the-art reasoning, world knowledge and coding accuracy
Quantisation-aware training, which enables FP8 inference without compromising performance → crucial for organisations looking to deploy large language models efficiently.
United Nations makes the case for unified global AI governance
Apple and Meta have both decided to withhold AI models from EU users due to regulatory concerns
NVIDIA introduces Vision Language Models (VLMs) for dynamic video analysis, enabling users to interact with image and video input using natural language.
Who, what, how?
Trending Tools & Apps
Intelligent Canvas - Brainstorm, ideate and iterate with AI; Miro on steroids
Sourcer AI - AI powered fact checking at your fingertips via browser extension
Superjoin - ChatGPT in your (Google) Sheets
Cohesive - AI enrichment and web scraping at scale in Google Sheets
Until next time!
Leo & Lex
What content did you like in today's edition?