Imagen, from Google, is the latest example of an AI seemingly capable of producing high-quality images from a text prompt, but such systems aren't quite ready to replace human illustrators yet.
May 26, 2022
Tech companies are racing to create artificial intelligence algorithms that can produce high-quality images from text prompts. The technology appears to be advancing so quickly that some predict human illustrators and stock photographers will soon be out of work. In reality, the limitations of these AI systems mean it will likely be some time before they can be used by the general public.
Both models, Google's Imagen and OpenAI's DALL-E 2, use a neural network that has been trained on a large number of examples to learn how images relate to text descriptions. When a new text description is given, the neural network repeatedly generates images and adjusts them until they most closely match the text, based on what it has learned.
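The "generate and adjust until it matches the text" loop can be sketched in miniature. This is a toy illustration, not either company's actual method: the real systems use trained neural networks to embed text and score images, whereas the `embed_text` and `match_score` functions below are hypothetical stand-ins, and the refinement is simple random hill climbing rather than learned guidance.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(prompt, dim=16):
    """Hypothetical text encoder: maps a prompt to a vector.

    A real system would use a trained network; this stand-in seeds a
    random generator from the prompt's characters so the same prompt
    always yields the same vector.
    """
    seed = sum(ord(c) for c in prompt)
    return np.random.default_rng(seed).normal(size=dim)

def match_score(image_vec, text_vec):
    """Higher when the candidate 'image' sits closer to the text vector."""
    return -float(np.linalg.norm(image_vec - text_vec))

target = embed_text("a corgi riding a bicycle")
candidate = rng.normal(size=16)  # start from random noise
start = candidate.copy()

# Repeatedly perturb the candidate and keep only the changes that improve
# the match -- a crude stand-in for the guided, learned refinement the
# real models perform.
for _ in range(2000):
    proposal = candidate + rng.normal(scale=0.05, size=16)
    if match_score(proposal, target) > match_score(candidate, target):
        candidate = proposal
```

After the loop, `candidate` scores far better against the text vector than the random starting point did, which is the essence of the iterative matching described above.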
While the images presented by both companies are impressive, researchers have questioned whether the results are picked to show the systems in the best light. “You have to present your best results,” says Hossein Malekmohamadi at De Montfort University in the UK.
One problem with evaluating these AI creations is that neither company has released a public demo that researchers and others could use to test them. Part of the reason is the fear that the AI could be used to create misleading images, or simply that it could produce harmful results.
The models are based on datasets from large, unmoderated parts of the Internet, such as the LAION-400M dataset, which Google says contains “pornographic images, racist statements and harmful social stereotypes.” The researchers behind Imagen say that because they can’t guarantee it won’t inherit some of this problematic content, they can’t release it to the public.
OpenAI says it has improved DALL-E 2's "security system" by "refining its text filters and tuning its automated detection and response system for content policy violations", while Google is trying to address the challenges by developing a "vocabulary of potential harm". Neither firm was able to respond to New Scientist before publication of this article.
Unless these problems can be solved, it seems unlikely that major research teams like Google or OpenAI will make their text-to-image systems available for general use. Smaller teams may choose to release similar technology, but the sheer amount of computing power required to train these models on massive data sets tends to limit work on them to big players.
Despite this, friendly competition between the big companies is likely to keep the technology moving at a rapid pace, as tools developed by one group can be incorporated into another group's future model. Diffusion models, in which neural networks learn to reverse the process of gradually adding random noise to an image, have shown promise in machine learning over the past year. Both DALL-E 2 and Imagen rely on diffusion models, after the technique proved effective in less powerful models, such as OpenAI's GLIDE image generator.
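The core idea behind diffusion, corrupting an image with noise and then walking that corruption backwards, can be shown in a few lines. This is a minimal sketch, not a trained model: a real diffusion model must *predict* the noise at each reverse step with a neural network conditioned on the text prompt, whereas the code below simply records the noise it added and subtracts it again, to show that the process is exactly reversible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def diffuse(image, steps, scale=0.1):
    """Forward process: repeatedly add small amounts of Gaussian noise.

    Returns the noisy image and the list of noise arrays added. A real
    diffusion model would have to predict these, not remember them.
    """
    noisy = image.copy()
    noises = []
    for _ in range(steps):
        eps = rng.normal(0.0, scale, size=image.shape)
        noisy = noisy + eps
        noises.append(eps)
    return noisy, noises

def denoise(noisy, noises):
    """Reverse process: subtract the noise step by step, newest first."""
    image = noisy.copy()
    for eps in reversed(noises):
        image = image - eps
    return image

image = rng.random((8, 8))  # stand-in for an 8x8 grayscale image
noisy, noises = diffuse(image, steps=50)
recovered = denoise(noisy, noises)
```

In a text-to-image system, the reverse walk starts from pure noise, and the network's noise predictions steer each step toward an image that matches the prompt.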
"For these kinds of algorithms, if you have a very strong competitor, it helps you build your model better than the others," Malekmohamadi says. "For example, Google has multiple teams working on the same type of [AI] platform."