Chinese tech giant Baidu recently showed off its AI capabilities with the unveiling of a newly “completed” ink painting by Chinese painter Lu Xiaoman (1903 – 1965), which was finished by the firm’s deep learning-based art generation platform. 

As part of the presentation of the artwork, Baidu held a roundtable discussion with local auction house Duo Yun Xuan on Nov. 16 in Shanghai. The two partnered on the completion of Lu’s work, which the beloved 20th-century cultural figure had left unfinished.

Lu Xiaoman’s original unfinished work (middle), human artist Le Zhenwen’s interpretation (left), and Baidu AI’s interpretation (right). Credit: Baidu

The discussion presented two attempts to complete Lu’s unfinished work: one by the well-known Chinese artist Le Zhenwen, and the other by Baidu Wenxin Yige, an art generation platform developed on Baidu’s deep-learning framework PaddlePaddle. The intention was to offer a comparison between the AI’s interpretation of the work and that of a human artist.

According to Baidu, its version of the work went through four phases: AI learning, AI painting, AI coloring, and theme poem composition. During the process, Baidu partnered with Duo Yun Xuan to collect public ink paintings to train models and reach a better outcome.

The generation process. Credit: Baidu

The twin artworks will be sold on Dec. 8 at Duo Yun Xuan’s 30th-anniversary auction event.

Below are comments on the project from Xiao Xinyan, chief architect of Baidu Wenxin Yige. His words have been translated, edited, and condensed for clarity. 

How does AI generate such artwork?

In short, AI shuffles and recombines the concepts and data it has previously learned, which is, in a sense, a presentation of knowledge.

From a technical point of view, AI learns before it paints, just as human beings do. It is trained on a vast amount of image-text pairs, in which every painting has a text description. AI can learn the association between language and images, as well as multiple corresponding concepts related to the images.

For instance, the concept of mountains could have a wide variety of image styles. So then how do people use AI to paint? They need to provide it with a text description, such as “a pine tree on a mountain.” AI will call on its learned experience and knowledge to generate a vague initial version randomly and then modify and perfect it continuously. There could be hundreds of rounds in the modification process, with the overall outline becoming clearer and the details richer along the way. The work is finally complete when it meets people’s aesthetic requirements.
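The refinement loop described above can be illustrated with a toy sketch. This is not Baidu's model or a real diffusion model: the `refine` function, the `target` list, and the step sizes are hypothetical stand-ins. In a real system no target image is known in advance; a trained network would estimate the direction of each correction from the text prompt. The sketch only shows the general shape of the process: a random start, then many small corrections with shrinking noise.

```python
import random

def refine(target, steps=200, seed=0):
    """Toy iterative refinement: start from noise, nudge toward a target.

    `target` is a hypothetical stand-in for what a trained model would
    predict from the text prompt; it is not part of any real API.
    """
    rng = random.Random(seed)
    # Vague, random initial version.
    image = [rng.uniform(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        # Injected noise shrinks as the outline becomes clearer.
        noise_std = 0.01 * (1 - step / steps)
        image = [
            x + 0.05 * (t - x) + rng.gauss(0.0, noise_std)
            for x, t in zip(image, target)
        ]
        # Mean absolute distance from the target after this round.
        errors.append(sum(abs(x - t) for x, t in zip(image, target)) / len(target))
    return image, errors

image, errors = refine([0.2, 0.8, 0.5, 0.1])
print(f"error after first round: {errors[0]:.3f}")
print(f"error after last round:  {errors[-1]:.3f}")
```

Each round changes the picture only slightly, which is why hundreds of rounds may be needed before the result looks finished.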

How is Baidu exploring art generation tech?

We [Baidu] adopt self-developed technology. There are two main points to our AI painting tech. Firstly, the image quality is high and looks delicate. We utilize a powerful diffusion model, which is a major technical innovation. Via multimodality of text and image, we can [give AI] a deep understanding, enabling it to create delicate artworks. 

Also, we have a better understanding of Chinese culture, and we will build a relevant dataset to feed the model so that it can generate works in that style. For the training datasets, we also developed algorithms that evaluate aesthetics to ensure the data meets the criteria.

And considering that users’ descriptions can be inaccurate, we enhanced the input system with a knowledge graph that provides related keywords for a better user experience.
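As a rough illustration of keyword suggestion from a knowledge graph, the sketch below uses a tiny hand-written dictionary. The graph contents, the `KNOWLEDGE_GRAPH` name, and the `suggest_keywords` function are all hypothetical; the source does not describe how Baidu's system is actually built.

```python
# Hypothetical toy graph: concept -> related keywords.
KNOWLEDGE_GRAPH = {
    "pine tree": ["evergreen", "gnarled branches", "ink wash"],
    "mountain": ["mist", "cliff", "landscape"],
}

def suggest_keywords(prompt, graph=KNOWLEDGE_GRAPH):
    """Return related keywords for every known concept found in the prompt."""
    prompt_lower = prompt.lower()
    suggestions = []
    for concept, related in graph.items():
        if concept in prompt_lower:
            for keyword in related:
                # Keep the first occurrence only, preserving order.
                if keyword not in suggestions:
                    suggestions.append(keyword)
    return suggestions

print(suggest_keywords("a pine tree on a mountain"))
```

A production system would presumably match concepts more robustly than by substring search, but the idea is the same: a short user prompt is enriched with related terms before generation.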

So far, the feedback from users is quite positive; the platform has greatly improved their efficiency. Most casual users find the AI generator quite helpful. Looking ahead, we plan to explore a wider range of usage scenarios, for example using AI to help children practice painting.

What is the position of human beings in AI art generation?

Human beings are of great importance in AI-driven painting. In my opinion, humans are the mentors of AI. We need to develop the neural network of the AI painting model: there are different models with various effects [and we need to choose ideal ones from them].

Humans also have to feed AI material to learn from and determine how the AI should be trained. For example, Baidu Wenxin Yige was fed with traditional Chinese elements and cultural data to have a better understanding of this genre.

[The platform] can generate an image within minutes. On the first version of the piece drafted by Lu Xiaoman, the Baidu team consulted artist Le for advice. He then provided more training samples for a better outcome.

At the very beginning, AI needs people to teach it to generate the image: what content should appear in the picture and what styles should be presented.

Humans are also the ones who make the final decision. Although the machine has an automatic algorithm to tell whether a generated work is good enough, AI is not as accurate as human beings in this case.

Ward Zhou is a tech reporter based in Shanghai. He covers stories about the digital content industry, hardware, and anything geek. Reach him via ward.zhou[a] or Twitter @zhounanyu.