Good morning!
This edition is all about the new features in the OpenAI API. The updates are numerous!
Here is a quick recap of the announcement:
Updates to the O1 API (only accessible to Tier 5 users):
Vision capabilities: The model can now reason over images, like GPT-4o.
Function calling: Useful for calling your own or external APIs.
Structured output support: Guarantees well-formed JSON output.
Faster and leaner processing: Uses 60% fewer “thinking” tokens and runs significantly faster than the O1 preview.
A new reasoning_effort API parameter lets you control how long the model thinks before answering (see the sketch below).
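For a concrete picture, here is a minimal sketch of how that parameter might be set with the official openai Python SDK (a recent SDK version is assumed; the prompt is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# reasoning_effort accepts "low", "medium" (default), or "high" and trades
# thinking time for speed and token usage.
response = client.chat.completions.create(
    model="o1",
    reasoning_effort="low",
    messages=[{"role": "user", "content": "How many prime numbers are there below 100?"}],
)
print(response.choices[0].message.content)
```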
Real-Time API
WebRTC integration: Simplifies live API implementation. It enables smooth and responsive interactions in real-world conditions, even with variable network quality.
Lower costs for audio processing: gpt-4o-realtime-preview-2024-12-17 audio tokens (input and output) are now 60% cheaper. A new, smaller model, gpt-4o-mini-realtime-preview-2024-12-17, is also available in the Real-Time API.
Preference Fine-Tuning is now available.
Preference fine-tuning: It relies on DPO (Direct Preference Optimization), an algorithm that, unlike RLHF, does not use reinforcement learning. You fine-tune GPT-4o by providing pairs of outputs and marking which one you prefer; GPT-4o-mini support is coming soon.
This method is separate from SFT (supervised fine-tuning), which encourages the model to generate correct outputs by replicating labeled examples, and from reinforcement fine-tuning (for O1 models), which is also available on the platform.
Miscellaneous Updates
The system prompt is now called the “developer message.” This is the API’s new name for the system prompt. OpenAI says, “It has been renamed to more accurately describe its place in the instruction-following hierarchy.”
For the curious, let’s dive deeper into the impact of these changes.
First, what was previously known as a “system prompt” is now called a “developer message.” This message is a general set of instructions that guides the model’s behavior, whether adopting a particular persona (like speaking like a pirate) or restricting responses to a specific format, such as JSON.
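To make this concrete, here is a minimal sketch using the official openai Python SDK; the model name and prompts are just examples, and the only change from the old pattern is the role name:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # example model; O1 accepts developer messages as well
    messages=[
        # What used to be {"role": "system", ...} is now the developer message.
        {"role": "developer", "content": "Answer every question in pirate speak."},
        {"role": "user", "content": "What is the weather like today?"},
    ],
)
print(response.choices[0].message.content)
```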
We can speculate that the change in terminology from “system prompt” to “developer message” stems from the internal use of system prompts on OpenAI’s side. They likely employ their own system-level instructions to guide the model securely, and the term “system prompt” may have confused the models. Since various known methods exist to circumvent safeguards or “jailbreak” models, OpenAI enforces these underlying system prompts to maintain a higher level of security and prevent unethical outputs.
O1 Update: Function Calling, Structured Outputs, and Visual Understanding
Note: The O1 API is currently available only to Tier 5 users; to qualify, you must have spent at least $1,000 on the API. For everyone else, the updated model is available in the Playground and the web application.
Structured output wasn’t previously guaranteed in O1. While you could ask for a structured response like JSON before, the new update ensures a well-formed output without hallucinated keys or values.
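As an illustration, here is a minimal sketch of requesting guaranteed structure through the SDK’s parse helper, which converts a Pydantic model into a strict JSON schema; the field names here are purely illustrative:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

# Illustrative schema: the response is guaranteed to contain exactly these
# keys with the right types, with no hallucinated fields.
class TaxSummary(BaseModel):
    total_income: float
    total_deductions: float
    notes: list[str]

completion = client.beta.chat.completions.parse(
    model="o1",  # Tier 5 only at the time of writing
    messages=[{"role": "user", "content": "Summarize this tax document: ..."}],
    response_format=TaxSummary,
)
summary = completion.choices[0].message.parsed  # a validated TaxSummary instance
print(summary.total_income)
```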
O1 now supports function calling, a feature that has long been available on non-reasoning models. Now that O1 supports function calling, it can potentially perform internet calls, parse web pages, use math APIs for complex calculations, and much more. In essence, developers can now integrate their own custom functions to extend O1’s capabilities.
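For example, here is a rough sketch of wiring a custom function into O1 through the standard tools parameter; the get_weather function is hypothetical and stands in for any API you might call:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical local function we want the model to be able to call.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="o1",  # Tier 5 only at the time of writing
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call our function, run it with the arguments it chose.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "get_weather":
        args = json.loads(call.function.arguments)
        print(get_weather(**args))
```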
Finally, the O1 model now understands images, which is a huge win for many use cases. Although you can’t directly upload PDFs yet, you could work around this by converting each page into an image. It might be more costly, but O1 excels at challenging tasks like assisting with tax filing, as showcased in the demo.
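Here is a sketch of that PDF-to-image workaround, assuming the third-party pdf2image package (which needs poppler installed) and a placeholder file name:

```python
import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path  # third-party; requires poppler

client = OpenAI()

# Render the first page of the PDF as a PNG and base64-encode it.
page = convert_from_path("tax_form.pdf")[0]  # placeholder file name
buffer = io.BytesIO()
page.save(buffer, format="PNG")
image_b64 = base64.b64encode(buffer.getvalue()).decode()

response = client.chat.completions.create(
    model="o1",  # Tier 5 only at the time of writing
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which deductions does this form suggest I can claim?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```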
Overall, these O1 model updates bring it closer to having the same tools and comforts as the 4o models. With a bit of engineering, you can achieve nearly the same functionality—such as performing internet calls or parsing any documents. We believe it’s only a matter of time before O1 fully matches the feature set of 4o.
Live-stream
With live-streaming you can now build real-time voice assistants much more cheaply and easily than before! It takes just a few lines of code to get started and build cool voice features for the web.
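As a taste, here is a minimal server-side sketch (using the requests library) of minting the short-lived session token that a browser then uses to open its WebRTC connection; the voice name is just an example, and the response shape follows OpenAI’s Realtime docs at the time of writing:

```python
import os

import requests

# Mint an ephemeral session token server-side so the real API key never
# reaches the browser; the browser uses the token for its WebRTC handshake.
resp = requests.post(
    "https://api.openai.com/v1/realtime/sessions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini-realtime-preview-2024-12-17",
        "voice": "verse",  # example voice
    },
)
resp.raise_for_status()
session = resp.json()
print(session["client_secret"]["value"])  # hand this ephemeral key to the browser
```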
One of the initial challenges with the Real-Time API has been cost, particularly for audio tokens, which were more expensive than their text-based counterparts. While typing or copying large amounts of text can quickly rack up token usage, speech naturally limits the rate at which tokens accumulate. The good news is that OpenAI is now significantly reducing audio token costs, making real-time voice interaction much more affordable.
GPT-4o Real-Time Preview (2024-12-17):
GPT-4o Mini Real-Time Preview (2024-12-17):
These changes make real-time voice interaction more affordable, paving the way for richer, cost-effective user experiences.
Preference Fine-Tuning (with DPO)
Quick Recap of Fine-Tuning Methods
Supervised Fine-Tuning:
Trains the model to predict the next token given some context. Useful for adjusting the model’s overall generation patterns.
Reinforcement Fine-Tuning:
Focuses on teaching the model to produce specific, desired outputs for certain inputs. Ideal for guiding it toward particular tasks or behaviors.
Preference Fine-Tuning (DPO):
Essentially an A/B testing approach for AI. You provide two responses and indicate which one is better. This helps shape the model’s tone, style, and level of creativity.
How to Fine-Tune
You can start fine-tuning directly from the web platform with no code (Dashboard → Fine-Tuning), or head to the fine-tuning documentation (search for preference fine-tuning). It’s great that OpenAI allows you to customize the model to your needs.
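If you prefer the API to the dashboard, here is a rough sketch of what a preference dataset entry and job creation can look like; the prompt, answers, base model, and beta value are all illustrative, so check the preference fine-tuning docs for the exact schema:

```python
import json

from openai import OpenAI

client = OpenAI()

# One preference pair: the same prompt with a preferred and a rejected answer.
example = {
    "input": {"messages": [{"role": "user", "content": "Summarize this email: ..."}]},
    "preferred_output": [{"role": "assistant", "content": "Short, neutral summary."}],
    "non_preferred_output": [{"role": "assistant", "content": "Long, flowery summary."}],
}
with open("preferences.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the dataset and launch a DPO (preference) fine-tuning job.
training_file = client.files.create(file=open("preferences.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # example base model
    method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},  # beta is illustrative
)
print(job.id)
```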
Caveats and Recommendations
Fine-tuning can be computationally expensive and may not yield dramatic improvements. Before you invest in it, we recommend experimenting with few-shot prompting techniques. Prompt engineering can often produce better results at zero cost, making it well worth a try.
Also, note that you don’t get access to the model’s weights—OpenAI manages and hosts the model. If your use case involves highly sensitive data (e.g., medical, insurance, or financial information) that you cannot share with a third party, fine-tuning through OpenAI’s platform may not be viable.
Dev Day Conferences
OpenAI has released all the videos from their Dev Day conference on YouTube. They’re packed with valuable insights not just for developers but also for startup founders and investors. During one of the sessions, Sam Altman mentioned progress on faster image-based models. Could this be a hint at DALL·E 4?
We got pretty close on a few of our bingo picks. For example, “O1 file upload” now supports images, but since PDFs and CSVs are still missing, we’re not counting it as a win. “O1 internet search”? Still not natively supported. Sure, we could hack together our own backend and pretend it’s the same, but that doesn’t count either. And about that “O1 price cut (50%)”—we do have fewer tokens used for reasoning (-60%), which kind of feels like a discount, but it’s not an official price reduction. In other words, these updates came close, but not close enough to earn a checkmark on our bingo card.
See you tomorrow for day 10!