OpenAI GPT Image 1.5: Enhanced Editing & Lower API Costs

OpenAI has released GPT Image 1.5, its new image generation model, which introduces precise editing: users can modify specific areas of an image while the rest is preserved. The update also improves instruction following, detail preservation, and processing speed, with the model running up to four times faster than its predecessor, GPT Image 1.

API pricing for image inputs and outputs is 20% lower than for the previous generation. The model is now available to all ChatGPT users.

Precise Editing and Creative Transformations

A core feature of GPT Image 1.5 is its precise editing functionality. Users can upload an image and request specific modifications, with the model altering only the designated parts. This maintains consistency in elements such as lighting, composition, and character appearance across multiple editing iterations.
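A minimal sketch of what such an edit request might look like through the OpenAI Python SDK. The model identifier `gpt-image-1.5`, the helper names, and the file paths are assumptions for illustration and are not taken from the announcement; check the platform documentation for the actual identifier and supported parameters. The prompt helper simply states the change and names the elements that should stay fixed.

```python
import base64
from pathlib import Path

# Assumed model identifier; confirm the real one in the platform documentation.
MODEL_ID = "gpt-image-1.5"


def build_edit_prompt(change: str, keep: list[str]) -> str:
    """Describe the requested change and list what must stay untouched."""
    preserved = ", ".join(keep)
    return (
        f"{change.rstrip('.')}. "
        f"Keep everything else unchanged, in particular: {preserved}."
    )


def edit_image(src: Path, dst: Path, prompt: str) -> None:
    """Send one edit request and save the returned image (hypothetical helper)."""
    # Imported lazily so the prompt helper works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with src.open("rb") as fh:
        result = client.images.edit(model=MODEL_ID, image=fh, prompt=prompt)
    # Assumes the response carries the image as base64, as GPT Image 1 does.
    dst.write_bytes(base64.b64decode(result.data[0].b64_json))


prompt = build_edit_prompt(
    "Change the jacket to red",
    ["lighting", "composition", "the subject's face"],
)
# edit_image(Path("portrait.png"), Path("portrait_red.png"), prompt)
```

Listing the preserved elements explicitly is a prompting choice, not a requirement of the API; for multi-step edits, the saved output can be fed back in as the next input.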

Official examples show the model adding, deleting, compositing, blending, and transplanting elements. One multi-step example started with three input images (two men and a dog) that were composited into a single scene. Later steps added background elements, blended different styles (e.g., anime, plush toy, real person) within the same image, changed outfits, and finally isolated one subject into a new context. The subjects' core features remained consistent throughout.

The model also supports creative transformations. In one test, the background of a photograph of Sam Altman was changed to a nighttime street scene in Chengdu and the subject was restyled as a plush toy, while the original pose and expression were maintained.

Instruction Following and Text Rendering

GPT Image 1.5 follows instructions more reliably than previous versions. In a test that asked for a 6x6 grid with a specific item in each cell, the new model placed most items correctly, whereas the older version miscounted cells and misplaced elements. This matters for applications that depend on precise execution of complex instructions, such as infographics, product catalogs, and educational materials.
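For readers who want to reproduce a test of this kind, the sketch below builds a grid prompt that names every cell explicitly. The prompt wording and the `grid_prompt` helper are assumptions for illustration; the announcement does not publish the exact prompt used.

```python
def grid_prompt(items: list[list[str]]) -> str:
    """Build a prompt that assigns one named item to each grid cell."""
    rows = len(items)
    cols = len(items[0]) if rows else 0
    if rows == 0 or any(len(row) != cols for row in items):
        raise ValueError("items must be a non-empty rectangular grid")

    lines = [f"Draw a {rows}x{cols} grid with exactly one item per cell:"]
    for r, row in enumerate(items, start=1):
        for c, item in enumerate(row, start=1):
            lines.append(f"Row {r}, column {c}: {item}")
    return "\n".join(lines)


# Placeholder contents; a real test would use 36 distinct, recognizable objects.
cells = [[f"object {r * 6 + c + 1}" for c in range(6)] for r in range(6)]
prompt = grid_prompt(cells)
```

Numbering rows and columns makes it straightforward to check the generated image cell by cell against the request.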

The model has also advanced in text rendering, handling denser and smaller text with improved clarity. An example demonstrated the model rendering Markdown content into a newspaper layout, where tables, titles, and body text were clearly presented. This marks a significant improvement over previous image models, which often blurred extensive text.

Quality Improvements and Limitations

Other quality enhancements in GPT Image 1.5 include more natural rendering of multiple faces and overall image realism. A comparison of 1970s London street scenes generated by the new and old versions showed that the new model produced more natural faces and a more accurate sense of the era.

Despite these advancements, certain limitations persist. The model still faces challenges with style consistency, occasionally struggles with generating multiple faces simultaneously, and its rendering quality for non-English text requires further improvement. For instance, an attempt to render product packaging with Chinese text resulted in blurred characters.

API Access and Pricing

The GPT Image 1.5 API offers the same capabilities as the ChatGPT Images feature. Detailed model information is available on the OpenAI platform documentation.

Image input and output with GPT Image 1.5 cost 20% less than with the previous generation. Pricing varies by quality and resolution.

The "High" quality mode of GPT Image 1.5 is priced similarly to Google's 2K resolution offerings, which cost $0.139 for 2048x2048 images.
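As a rough illustration of what the 20% reduction means for a batch job, the sketch below applies it to a baseline per-image price. The baseline of $0.10 is a hypothetical placeholder, not a published price; substitute the figure for the quality and resolution in use.

```python
def apply_reduction(price: float, reduction: float = 0.20) -> float:
    """Return the price after a fractional reduction (0.20 means 20% off)."""
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return round(price * (1 - reduction), 4)


def batch_cost(price_per_image: float, count: int) -> float:
    """Total cost of generating `count` images at a flat per-image price."""
    return round(price_per_image * count, 2)


baseline = 0.10  # hypothetical previous-generation price per image, in USD
new_price = apply_reduction(baseline)
old_total = batch_cost(baseline, 500)
new_total = batch_cost(new_price, 500)
savings = round(old_total - new_total, 2)
```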

GPT Image 1.5 is rolling out globally to all ChatGPT and API users, with no special selection required. Users can access the model through ChatGPT's image generation interface and the OpenAI Playground. A prompting guide is also available for developers.