CFan Academy of Sciences: AI self-portrait unveils the tip of the iceberg of intelligent creativity

A foreign study concluded that 47 percent of existing jobs in the United States are at risk of being replaced by AI (artificial intelligence) and automation in the coming decades. Although many experts have questioned this estimate, the threat from AI is indeed drawing closer. The common view today is that manual or repetitive jobs may be taken over by AI, while creative jobs are relatively safe. Is this really the case? In fact, AI has long been moving into human creative work, and it has made great strides.

AI self-portraits are amazing

The editors of the New York Times approached IBM Research to have an AI draw the cover image for a special issue on AI. The request seemed simple, but it concealed a big challenge. Existing AI applications, such as self-driving cars, translation, game playing, and even cutting movie trailers, do not require the AI to create new material; they only need it to analyze the information at hand and make choices based on its training. Here, however, the AI had to go beyond the existing information and create a new work of art in a “self-thinking” way, which is hard to imagine. In the end, IBM’s AI turned in a surprisingly good result (Figure 1).


Figure 1 AI self-portrait – AI and human creativity go hand in hand (Source: IBM Research)

Refining the theme of the work like humans

The algorithm behind this AI work can be divided into three major parts, somewhat like the creative process of a human artist. We explain them one by one below.

The first part is to determine the core concept of the work, that is, its theme. In previous AI creations (such as writing poems), the theme keywords were basically specified by people in advance, and the AI played a passive role. For this creation, however, the researchers decided to let the AI determine the theme on its own.

The researchers first assembled about 3,000 AI-related articles from the New York Times and then analyzed them with natural language processing software to identify semantic concepts that are highly relevant to AI, such as “robotics,” “self-driving,” and “computing.” A total of 30 concepts were selected.

Because these words do not contain the word “AI”, a simple keyword search cannot extract them. The software can complete the screening accurately only if it “understands” the meaning of the words, much as a human being does. This is where natural language processing technology plays an important role (Figure 2).


Figure 2 One of the implementation models for natural language understanding (to understand language, the computer has to understand the world)
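
To make the difference between keyword search and semantic matching concrete, here is a minimal sketch. It assumes each word is represented by a numeric vector (an "embedding") and that related words have vectors pointing in similar directions. The three-dimensional vectors, the threshold, and the function names are all invented for illustration; the article does not say which technique IBM's software actually used.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for this example.
# Real systems learn vectors with hundreds of dimensions from large corpora.
TOY_VECTORS = {
    "AI":           (0.9, 0.8, 0.1),
    "robotics":     (0.8, 0.9, 0.2),
    "self-driving": (0.7, 0.6, 0.3),
    "computing":    (0.9, 0.5, 0.1),
    "gardening":    (0.1, 0.2, 0.9),
    "cooking":      (0.0, 0.3, 0.8),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_concepts(target, vectors, threshold=0.9):
    """Return the words whose vectors are close to the target's, most similar first."""
    target_vec = vectors[target]
    scored = [
        (cosine_similarity(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]

concepts = related_concepts("AI", TOY_VECTORS)
```

With these toy vectors, the sketch keeps “robotics,” “computing,” and “self-driving” and drops “gardening” and “cooking,” even though none of the kept words contains the string “AI”.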

Textual concepts then need to be represented graphically. Using the 30 concepts filtered out above, a neural network for visual recognition (Figure 3) was trained to pick out, from the diverse and complex images accompanying New York Times articles, those that convey the idea of AI, and to score each image on how closely it relates to AI. From the 10 highest-scoring images, project participants eventually selected one of a human and a robot shaking hands. (Human intervention is thus still not completely excluded; the AI has to work hand in hand with people.)


Figure 3 One of the neural network models for visual recognition
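
The scoring and shortlisting step can be sketched as follows. This assumes the recognition network outputs, for every image, a probability for each concept, and that an image's closeness to AI is simply the average of those probabilities. Both the averaging rule and the sample data are assumptions made for illustration, not details from the IBM project.

```python
import heapq

def closeness(concept_probs):
    """Average the per-concept probabilities into one relevance score."""
    return sum(concept_probs.values()) / len(concept_probs)

def shortlist(images, k=10):
    """Return the k best (score, image_id) pairs, highest score first."""
    scored = [(closeness(probs), image_id) for image_id, probs in images.items()]
    return heapq.nlargest(k, scored)

# Hypothetical classifier outputs for 20 images and two of the concepts.
images = {
    f"img_{i:02d}": {
        "robotics": (i * 7 % 20) / 20,
        "computing": (i * 3 % 20) / 20,
    }
    for i in range(20)
}

# People then make the final choice from these candidates.
candidates = shortlist(images, k=10)
```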

Original images start from learning by imitation

After the theme concept is determined, the AI enters the creation process. Most human painters learn by imitation when they start to paint, and so does an AI painter, though in its own way. The researchers first collected more than 1,000 images of robots and human hands to serve as the AI’s training data set (Figure 4).


Figure 4 Example of robot and human hand pictures

There are two broad categories of learning models in AI: discriminative models and generative models. Given a picture, deciding whether the animal in it is a cat or a dog is a task for a discriminative model. Given a set of cat pictures, producing a new cat picture that is not in the set is a task for a generative model.
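
A toy example may help separate the two ideas. Instead of pictures, it uses a single made-up number per animal (body weight in kilograms). The discriminative function only answers "cat or dog?", while the generative function models how cat weights are distributed and samples a new one. The data, the midpoint threshold, and the bell-curve assumption are all illustrative choices.

```python
import random
import statistics

# Made-up body weights in kilograms, purely for illustration.
cat_weights = [3.6, 4.1, 4.4, 3.9, 4.8, 4.2]
dog_weights = [18.0, 22.5, 25.1, 20.3, 27.4, 23.0]

def classify(weight_kg):
    """Discriminative view: only a boundary between the classes is needed."""
    boundary = (statistics.mean(cat_weights) + statistics.mean(dog_weights)) / 2
    return "cat" if weight_kg < boundary else "dog"

def generate_cat(rng):
    """Generative view: model the cat data itself, then sample a new example."""
    mu = statistics.mean(cat_weights)
    sigma = statistics.stdev(cat_weights)
    return rng.gauss(mu, sigma)

rng = random.Random(0)      # fixed seed so the sketch is repeatable
new_cat = generate_cat(rng)  # a plausible cat weight that is not in the data set
```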

The AI described in this article uses a deep learning model called a Generative Adversarial Network (GAN), which consists of both a discriminative model (D) and a generative model (G). D constantly judges whether the pictures produced by G are real or fake and rejects the fake ones, while G keeps trying to produce pictures that D accepts as real. The end result of this continual contest between the two modules is a picture of a handshake that looks realistic enough and is a new work, different from the images in the original data set (Figure 5).


Figure 5 Working principle of a generative adversarial network
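
The contest between D and G can be sketched with a deliberately tiny GAN that works on single numbers instead of images. Everything here is a simplifying assumption: the "real data" are numbers drawn around 4.0, G is a straight line G(z) = a*z + b, D is a logistic function D(x) = sigmoid(w*x + c), and the class name `TinyGAN` and the learning rate are invented. Real image GANs, including whatever IBM used, rely on deep neural networks, so this only shows the alternating update pattern.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

class TinyGAN:
    """One-dimensional toy GAN: G(z) = a*z + b, D(x) = sigmoid(w*x + c)."""

    def __init__(self, lr=0.05):
        self.a, self.b = 1.0, 0.0   # generator parameters
        self.w, self.c = 0.0, 0.0   # discriminator parameters
        self.lr = lr

    def generate(self, z):
        return self.a * z + self.b

    def discriminate(self, x):
        """Estimated probability that x is real rather than generated."""
        return sigmoid(self.w * x + self.c)

    def train_step(self, real_batch, noise_batch):
        """One round of the contest; both batches are assumed equal in size."""
        n = len(real_batch)
        fake_batch = [self.generate(z) for z in noise_batch]

        # D's turn: gradient ascent on log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real_batch:
            d = self.discriminate(x)
            grad_w += (1.0 - d) * x
            grad_c += (1.0 - d)
        for x in fake_batch:
            d = self.discriminate(x)
            grad_w -= d * x
            grad_c -= d
        self.w += self.lr * grad_w / n
        self.c += self.lr * grad_c / n

        # G's turn: gradient ascent on log D(G(z)), so fakes look more real to D.
        grad_a = grad_b = 0.0
        for z in noise_batch:
            d = self.discriminate(self.generate(z))
            grad_a += (1.0 - d) * self.w * z
            grad_b += (1.0 - d) * self.w
        self.a += self.lr * grad_a / n
        self.b += self.lr * grad_b / n

rng = random.Random(0)
gan = TinyGAN()
for _ in range(1000):
    real = [rng.gauss(4.0, 0.5) for _ in range(32)]   # stand-in for real pictures
    noise = [rng.gauss(0.0, 1.0) for _ in range(32)]  # random input to G
    gan.train_step(real, noise)
```

In the first round D learns that larger numbers are more likely to be real, and G responds by shifting its output upward. Such a simple pair of models may keep oscillating rather than settle, which is one reason practical GANs need careful tuning.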

Packaging the work: the icing on the cake

Since the work is customized for a specific newspaper, it should of course match that newspaper’s established style as closely as possible. This is no longer a difficult task for AI. Previous cover images of the New York Times were assembled into a data set, and a style transfer neural network was trained on it to automatically restyle the newly created image of a human and a robot shaking hands. This produced multiple pieces of artwork, from which the most satisfactory were selected.
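
The article does not say how the style network measures "style". One widely used ingredient in neural style transfer is the Gram matrix of a network's feature maps, which records which features tend to occur together while ignoring where in the image they occur. The sketch below illustrates that idea only, under the assumption that a feature map is given as a flat list of activations per channel; the function names and sample values are invented.

```python
def gram_matrix(feature_maps):
    """feature_maps: list of channels, each a flat list of activations.

    Entry [i][j] is the average product of channel i and channel j
    over all spatial positions.
    """
    n = len(feature_maps[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in feature_maps]
        for ci in feature_maps
    ]

def style_distance(feats_a, feats_b):
    """Sum of squared differences between the two Gram matrices."""
    gram_a, gram_b = gram_matrix(feats_a), gram_matrix(feats_b)
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(gram_a, gram_b)
        for x, y in zip(row_a, row_b)
    )

# Hypothetical two-channel feature maps over four spatial positions.
stripes = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
shifted = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]  # same texture, moved by one position
flat    = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # a different texture

same_style = style_distance(stripes, shifted)
other_style = style_distance(stripes, flat)
```

The shifted pattern has the same Gram matrix as the original, so its style distance is zero, while the flat pattern scores as a different style. A style transfer network is typically trained to make this kind of distance small between its output and the target style.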

More extended applications

The unique feature of IBM’s AI self-portrait technology is that the AI can determine the theme concept automatically, generate a new work of art from that concept, and adapt the result to the requirements of different art styles. This can be applied in many fields. For example, writers can use it to design the cover of a new book, film and television studios can use it to design posters, and musicians can use it to design album covers.
