OpenAI has been showing off a new multimodal artificial intelligence model to some customers that can both talk to you and recognize objects, according to a new report from The Information. The outlet, citing unnamed sources who have seen it, said this could be part of the company’s planned reveal on Monday.
The new model reportedly interprets images and audio faster and more accurately than OpenAI’s existing separate transcription and text-to-speech models. It’s apparently capable of helping customer service agents “better understand a caller’s tone of voice, or whether they’re being sarcastic,” and “in theory,” the model could help students learn math or translate real-world signs, writes The Information.
The outlet’s sources said the model can outperform GPT-4 Turbo at “answering certain types of questions,” but that it is still prone to errors.
Developer Ananay Arora posted a screenshot of call-related code and said OpenAI may also be preparing a new built-in ChatGPT feature for making phone calls. Arora also found evidence that OpenAI had servers configured for real-time messaging and video communication.
If it’s announced next week, none of this will be GPT-5. CEO Sam Altman has explicitly denied that the company’s upcoming announcement has anything to do with the model that’s supposed to be “substantially better” than GPT-4. The Information writes that GPT-5 may be publicly released by the end of the year.