DALL·E 2: the AI that generates images (and not just landscapes) from text

OpenAI has given birth to a new kind of “artist”. DALL·E 2, as it is called, is an artificial intelligence (AI) capable of turning almost any requested situation into an image from a simple text description.

“DALL·E 2 is a new AI system capable of creating realistic images and art from a natural language description”, reads the summary on the project website. It is indeed a system that quite literally puts “words into images”. You simply describe the image you want, and the AI composes, in its own way, something that matches the description.

GauGAN2, another fairly similar AI, had also attracted attention recently. The editorial staff of Trust My Science tested that tool, which generates landscapes from words. DALL·E 2 stands out from this cousin mainly because it is not limited to landscapes, nor even to images that resemble anything found in reality.

That is how Aris Konstantinidis, an engineer at OpenAI, was able to generate striking images of pandas speeding through the desert, their eyes covered by adorable vintage pilot goggles. The images put forward to promote the AI also include a koala playing basketball and an astronaut on horseback.

According to its creators, DALL·E 2 can combine concepts, attributes and styles to create images that match the proposed text as closely as possible. Mira Murati, another employee of the company, obtained an image from the following request: “35mm macro photograph of a large family of mice wearing hats sitting comfortably by the fireside”.

The AI can also edit existing images. All you have to do is select the area to be modified and indicate what must be removed or added. It can then adjust the composition, shadows and textures on the fly. This function is new compared with the first version of DALL·E, released in January 2021. The new version, which made headlines in January 2022, “generates more realistic and accurate images with 4x higher resolution”, according to its designers.

Ingesting images and text

To deploy all this creativity, “DALL·E 2 learned the relationship between images and the text used to describe them”, explains OpenAI. As is often the case, what is grouped here under the rather broad term “artificial intelligence” could more precisely be called “machine learning”. To “learn”, DALL·E uses what is called a neural network.

A neural network is so named because it was originally inspired by the functioning of biological neurons, before the approach moved closer to statistical methods. In concrete terms, the AI “feeds” on a large amount of data, extracts the regularities that link the examples together, and uses them to produce a result. The food for this AI was therefore a huge number of images, each associated with a text label. The company’s researchers detail this process in their research work.
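To give a rough idea of what “learning from labeled images” means, here is a deliberately tiny sketch. It is not a neural network and has nothing to do with OpenAI's actual code: it simply averages made-up feature vectors per label (a nearest-prototype method) and then picks the closest label for a new example. The two-number “images”, the labels and the function names are all assumptions chosen for illustration.

```python
def learn_prototypes(examples):
    """Average the feature vectors seen for each text label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def closest_label(prototypes, features):
    """Return the label whose averaged vector is nearest to `features`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(prototypes[label], features))


# Hypothetical "images" reduced to two numbers: (brightness, greenness).
examples = [
    ([0.9, 0.1], "desert"),
    ([0.8, 0.2], "desert"),
    ([0.3, 0.9], "forest"),
    ([0.2, 0.8], "forest"),
]
prototypes = learn_prototypes(examples)
guess = closest_label(prototypes, [0.85, 0.15])
print(guess)
```

The more labeled examples such a system ingests, the better its averages reflect what each word usually looks like. A real model learns far richer relationships, but the dependence on image and text pairs is the same.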

This AI also uses a process called “diffusion”. The idea is to start from a pattern of random dots and gradually transform that pattern into an image as specific aspects of the image are recognized. Of course, as the company points out, all that creativity can easily be undermined if mislabeled images are injected into the system, much like a child learning the wrong word for an object.
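The principle of diffusion can be illustrated with a toy sketch: start from random values and nudge them, step by step, toward a picture. The big simplification here (an assumption for illustration only) is that the sketch already knows the target, whereas a real diffusion model uses a trained neural network, guided by the text, to estimate what noise to remove at each step. The function names and the eight-value “image” are hypothetical.

```python
import random


def toy_denoise_step(pattern, target, step, total_steps):
    """Move each dot a fraction of the way toward the target.

    Stand-in for a learned denoiser: here the "prediction" is simply
    the known target, which a real model would not have access to.
    """
    remaining = total_steps - step
    return [p + (t - p) / remaining for p, t in zip(pattern, target)]


def toy_diffusion(target, total_steps=50, seed=0):
    """Return every intermediate pattern, from pure noise to the final image."""
    rng = random.Random(seed)
    # Start from a pattern of random dots (pure noise).
    pattern = [rng.gauss(0.0, 1.0) for _ in target]
    history = [pattern]
    for step in range(total_steps):
        pattern = toy_denoise_step(pattern, target, step, total_steps)
        history.append(pattern)
    return history


def distance(a, b):
    """Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical one-dimensional "image": eight gray levels between 0 and 1.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
history = toy_diffusion(target)
print(distance(history[0], target), distance(history[-1], target))
```

Each pass removes a little of the remaining noise, so the distance to the target shrinks at every step until the random pattern has turned into the intended picture.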

The company also highlights possible flaws in the use of DALL·E 2: “Without sufficient safeguards, models like DALL·E 2 could be used to generate a wide range of misleading and otherwise harmful content, and could affect how people perceive the authenticity of content more generally. DALL·E 2 additionally inherits various biases from its training data, and its outputs sometimes reinforce societal stereotypes”. For the moment, access to the tool is therefore limited, and you have to join a waiting list for a chance to test it.

