May 23, 2024 · 7 min
GPT-4o: OpenAI’s new AI model redefines Human-Computer interactions

On May 13, 2024, OpenAI launched GPT-4o, its latest generative language model. The release was strategically timed just before Google’s AI announcements, planned for the 14th. GPT-4o isn’t just an upgrade: it aims to redefine how we interact with computers and smartphones, pushing the limits of real-time and multimodal communication.
Unusually for the AI industry, the announced features are being released promptly, either immediately or within days of the announcement.
OMNI-MODAL AI
The “o” in GPT-4o stands for “omni,” highlighting its aspiration to handle text, audio, and visual inputs and outputs seamlessly. Previously, working with such varied data types required separate models: GPT-4 for text, DALL-E for images, and Whisper for audio. GPT-4o combines all these capabilities into a single, cohesive model, with the aim of making interactions more natural and fluid: users can input text, audio, or visual data and receive responses in any of these formats.
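As a minimal sketch of what this unified interface looks like for developers, the snippet below assembles a single chat request mixing a text prompt and an image, using the message format of the official `openai` Python SDK (the prompt and image URL are illustrative placeholders; audio follows the same multi-part pattern but is rolling out more gradually):

```python
# Minimal sketch, assuming the official `openai` Python SDK and an
# OPENAI_API_KEY in the environment. One GPT-4o user message can mix
# several content parts (here: text + image). The prompt and image URL
# are illustrative placeholders, not real resources.

def build_multimodal_messages(prompt: str, image_url: str) -> list[dict]:
    """Assemble one user message combining a text part and an image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_multimodal_messages(
    "Describe what is shown in this image.",
    "https://example.com/photo.jpg",  # placeholder URL
)

# With credentials configured, the request itself would look like:
#
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(model="gpt-4o", messages=messages)
#   print(response.choices[0].message.content)
```

The point is that one model and one endpoint take the mixed payload, rather than routing each modality to a separate specialized model.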
INTELLIGENCE AND SPEED
Even comparing GPT-4o to GPT-4 on text input and output alone, the improvement is impressive: the upgraded model delivers smarter responses, faster, offering the best of both worlds. While new features like voice mode and visual processing are exciting, the fundamental improvement in the model itself is remarkable on its own.
GPT-4o benchmark performances [https://admin.united4.ai/wp-content/uploads/2024/05/Capture-decran-2024-06-25-a-09.15.03-1024x547.png]
VOICE MODE: REINVENTING CONVERSATIONAL AI
A standout feature of GPT-4o is its advanced voice mode, which lets the AI engage in human-like conversations, complete with intonation, emotional responses, and the ability to handle interruptions smoothly. Thanks to its fast response time and the ability to interrupt it mid-sentence, interactions no longer feel robotic the way they used to with voice AI, and GPT-4o’s voice is notably expressive. Users can interact with ChatGPT through voice commands and receive responses with emotional inflections and natural pauses.
In demonstrations, OpenAI showcased the voice mode’s versatility, from reading users’ emotions via smartphone cameras to guiding them through breathing exercises, telling bedtime stories, and solving math problems. This ability to detect and respond to emotions adds depth to AI interactions, making them more engaging and supportive.
Even though it creates a “wow” effect the first time you see a demonstration or use it yourself, voice mode can still be improved. It was demonstrated participating in a group meeting, an excellent idea for summarizing discussions or refocusing the meeting when it drifts off-topic. While it handles being interrupted well, it often feels the need to intervene, sometimes excessively and at length, like an overzealous colleague: it frequently interjects between human speakers without being prompted, making its contributions feel unnecessary at times.
There has also been some controversy over the voice itself. Critics have noted its uncanny resemblance to the voice of Scarlett Johansson, who has expressed her dissatisfaction with the similarity.
MAC (APPLE) DESKTOP APP: QUICK AND SMOOTH ACCESS
Having a dedicated desktop app means users can access ChatGPT even more quickly and smoothly, without needing to open a browser. For instance, professionals working on projects can get real-time assistance with research, calculations, or content generation directly from their desktops without breaking their workflow. On Mac, just press Command+Space for Spotlight and Option+Space for ChatGPT: child’s play.
Currently, the app is available on Mac but not on Windows, which might seem surprising given Microsoft’s significant financial investment in OpenAI. The choice can be explained by the high number of developers and early adopters using Macs, and the closed environment of macOS also makes it easier to build a dedicated app.
WHAT ABOUT MICROSOFT?
Even though the ChatGPT desktop app isn’t expected on Windows before the end of the year, GPT-4o is already integrated into Microsoft’s ecosystem through Copilot. Newer Windows PCs even include a dedicated Copilot key (even better than Mac’s Option+Space!). This feature, powered by GPT-4o, offers functionality similar to ChatGPT, including text generation, voice mode, and the ability to “see” the user’s screen and interact with it.
GOOGLE’S ANNOUNCEMENTS
OpenAI’s announcements were strategically timed just before Google’s conference, capturing attention and setting high expectations. By going first, OpenAI’s innovations seem particularly impressive, as they are not immediately compared to other announcements. This leaves the audience waiting to see if Google can match or surpass these advancements.
OpenAI is the disruptor, showcasing breakthroughs that can overshadow Google’s offerings. However, these companies have different objectives in the AI field. OpenAI focuses exclusively on AI development, while Google incorporates AI into its already widely used products and services. Google’s strength lies in this integration, enhancing tools that millions already rely on, like Gmail, with its 2 billion users, and Google Search, with 8 billion daily queries.
ACCESSIBILITY AND FREE USAGE
A significant shift with GPT-4o is its accessibility. Unlike previous versions limited to paid subscribers, GPT-4o is free for all users. This democratization aligns with OpenAI’s stated mission to make powerful AI tools widely available. Free users do face limits on the number of queries they can make: for example, free users get 16 GPT-4o messages every 3 hours, Plus users 80, and Teams users 160.
REAL-WORLD APPLICATIONS AND FUTURE POTENTIAL
GPT-4o’s potential applications are vast. Beyond everyday tasks, the model can handle complex interactions like real-time translation, interactive storytelling, and live commentary for events like sports matches. OpenAI envisions a future where GPT-4o can watch a live video feed and provide real-time commentary, explaining events as they unfold.
The model’s advanced visual and audio processing abilities also open up new possibilities for accessibility tools. For example, GPT-4o could translate sign language into spoken words, facilitating communication for the hearing impaired.
Beyond everyday use, ChatGPT has limitless applications in businesses, and these new features expand the possibilities even further. For instance, GPT-4o’s advanced capabilities can enrich customer service, streamline operations, and improve decision-making processes. United4 specializes in analyzing business strategies to help companies identify use cases and implement AI solutions effectively. By leveraging these innovations, businesses can stay ahead of the curve and drive significant value.
LOOKING AHEAD: GPT-5 AND BEYOND
While GPT-4o represents a significant milestone in OpenAI’s development, the journey doesn’t end here. The AI community is already buzzing with anticipation for GPT-5, which promises even more advanced capabilities. The sheer volume of rumors and expectations, whether accurate or not, highlights the rapid pace of AI innovation and the genuine excitement surrounding it.
BALANCING INNOVATION AND SAFETY
Safety concerns have long been an issue in the AI sector. Several key figures have left OpenAI, including Chief Scientist Ilya Sutskever, who announced his departure on May 15 on X (Twitter). These departures, particularly from the “safety” team, raise questions about the company’s approach and the balance it strikes between innovation and safety measures. The debate within the AI community over how to ensure the responsible use of AI technologies while continuing to advance the field is far from settled.
CONCLUSION
Like a lot of AI announcements, GPT-4o has a “wow” effect. Watching the demonstration videos or interacting with the tool, you are bound to be impressed and eager to share your experiences with colleagues and friends. Whether it’s the natural conversations, the seamless integration of voice and vision, or the noticeably improved accuracy over GPT-4, GPT-4o is quite impressive.
One piece of advice: try it!