OpenAI has introduced new multimodal capabilities for ChatGPT, enabling voice conversations and image interactions with the model in real time. The features are initially available to Plus and Enterprise subscribers and will later become accessible to free users.
OpenAI brings voice and image prompts to ChatGPT
Voice input works much like mobile voice assistants such as Siri and Google Assistant: users speak their queries, and ChatGPT converts the speech to text, processes it, and responds aloud. The feature is available on both iOS and Android.
The image input feature lets users ask questions by sharing images, which ChatGPT analyzes to provide relevant responses. Users can highlight specific areas of an image with drawing tools and clarify their query through text or voice, which helps with tasks like bicycle repairs or cooking.
"ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms)." https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb (OpenAI, @OpenAI, September 25, 2023)
While these additions expand interaction possibilities, OpenAI is mindful of potential misuse and is taking precautions to prevent unethical use.
To use voice with ChatGPT, users can go to Settings in the mobile app, access New Features, opt into voice conversations, tap the headphone icon on the home screen, and select a preferred voice from five options.
For image prompts, users can tap the plus button, capture or select an image, and use drawing tools to guide the assistant.
OpenAI has also launched DALL-E 3, an upgraded AI art tool integrated with ChatGPT. The integration helps users create detailed prompts, and the tool handles complex commands better and addresses known weaknesses such as rendering realistic human hands.