ChatGPT Expands Capabilities to Include Voice Conversations and Image-Based Queries


ChatGPT is set to receive significant updates that will let the AI chatbot handle voice conversations and image-based queries. Users on Android and iOS will be able to hold spoken conversations with ChatGPT, while image input will be supported across the major platforms.

Initially, these new features will be accessible to Plus and Enterprise users, with the possibility of extending access to image-based features to others in the future.

To use voice conversations, users need to opt in through the ChatGPT mobile app by going to Settings and selecting the New Features option. Tapping the headphone button on the home screen then lets them choose from five distinct voices.

OpenAI says that voice conversations are powered by a new text-to-speech model capable of generating human-like audio from just text and a few seconds of sample speech. The five voices were developed with the help of professional voice actors.

In the other direction, OpenAI's speech recognition system, Whisper, transcribes users' spoken words into text.

OpenAI also mentions various practical applications for these features. For example, users can show an image, such as a malfunctioning grill, to the AI chat program and inquire about the issue. Alternatively, they can seek assistance in planning a meal based on a photo of their refrigerator contents or even request help with solving a math problem.

Image understanding is powered by multimodal versions of GPT-3.5 and GPT-4. To use ChatGPT's image features, users tap the photo button to capture a new image or choose an existing one from their device; on iOS and Android, they first need to tap the plus button. Users can also ask about multiple images and use a drawing tool to highlight specific parts of an image.

In its announcement blog post, OpenAI acknowledges the potential for misuse, such as bad actors impersonating public figures or ordinary individuals to commit fraud. To limit this risk, OpenAI is restricting the voice technology to a specific use case, voice conversations in ChatGPT, and is working with selected partners on other limited use cases.

Regarding images, OpenAI has collaborated with Be My Eyes, a free app that assists blind and visually impaired individuals by connecting them with volunteers through video calls to help them understand their surroundings.

OpenAI also says it has limited ChatGPT's ability to analyze and make direct statements about people who appear in images, both to respect individuals' privacy and because ChatGPT is not always accurate.

The company has released a paper detailing the safety properties of the image input capability, which it has named GPT-4 Vision.

However, ChatGPT is currently better at understanding English text in images than text in other languages. OpenAI acknowledges that the chatbot's performance in languages that use non-Roman scripts is "poor," and it advises non-English-speaking users not to rely on ChatGPT for reading text in images at this time.