Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Stroke-GAN Painter: Learning To Paint Artworks Using Stroke-Style Generative Adversarial Networks

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

Computational Visual Media

https://doi.org/10.1007/s41095-022-0287-3 Vol. 9, No. 4, December 2023, 787–806

Research Article

Stroke-GAN Painter: Learning to paint artworks using stroke-


style generative adversarial networks

Qian Wang1,2 , Cai Guo1,3 , Hong-Ning Dai4 ( ), and Ping Li2,5 ( )

c The Author(s) 2023.

Abstract It is a challenging task to teach machines to 1 Introduction


paint like human artists in a stroke-by-stroke fashion.
Painting, as an important visual art, symbolizes
Despite advances in stroke-based image rendering and
human imagination and ingenuity. Human artists
deep learning-based image rendering, existing painting
have used a variety of painting tools to create
methods have limitations: they (i) lack flexibility to
their artworks with specific characteristic styles.
choose different art-style strokes, (ii) lose content
details of images, and (iii) generate few artistic styles However, it is time-consuming for people to master
for paintings. In this paper, we propose a stroke-style painting skills, requiring much learning, imitating,
generative adversarial network, called Stroke-GAN, to and practising. Recent computer-aided painting
solve the first two limitations. Stroke-GAN learns styles methods generate non-photorealistic images similar
of strokes from different stroke-style datasets, so can to paintings, thereby offering effective painting-
produce diverse stroke styles. We design three players assistants for human painting learners, but it is
in Stroke-GAN to generate pure-color strokes close to still a challenging task to teach machines to paint
human artists’ strokes, thereby improving the quality artworks based on given images like human artists.
of painted details. To overcome the third limitation, Unlike directly generating a style-transfer image or
we have devised a neural network named Stroke-GAN photographic image [1–4], machine painting is carried
Painter, based on Stroke-GAN; it can generate different out by a machine or computer in a stroke-by-stroke
artistic styles of paintings. Experiments demonstrate manner. The key to teaching a machine to mimic
that our artful painter can generate various styles of human artists lies in addressing the following three
paintings while well-preserving content details (such challenges: (i) painting artistic strokes on the canvas
as details of human faces and building textures) and in a human-painting order, starting from a given input
retaining high fidelity to the input images.
image, (ii) generating artistic strokes with textures
Keywords AI painting; painting strokes; artistic style like human artists’ strokes, and (iii) preserving
detailed contents of a given image while creating
a painting instead of reconstructing a photorealistic
image.
1 School of Computer Science and Engineering, Macau
University of Science and Technology, Macau, China.
Some conventional methods include stroke-based
E-mail: anrogim@outlook.com. rendering (SBR) methods [5–7], which have made
2 Department of Computing, The Hong Kong Polytechnic contributions to stroke modeling. The quality of such
University, Hong Kong, China. stroke textures is good and mimics human strokes.
3 School of Computing and Information Engineering, However, these methods achieve a semi-automatic
Hanshan Normal University, Chaozhou, China. E-mail: painting process which needs substantial user
c.guo@hstc.edu.cn.
intervention. Furthermore, this is time-consuming and
4 Department of Computer Science, Hong Kong Baptist
University, Hong Kong, China. E-mail: hndai@ieee.org ( ).
requires considerable painting skills of the user, and
5 School of Design, The Hong Kong Polytechnic University, moreover, these SBR models have a limited number
Hong Kong, China. E-mail: p.li@polyu.edu.hk ( ). of painting styles. Unlike conventional SBR models,
Manuscript received: 2022-02-14; accepted: 2022-04-12 learning-based methods have flexible frameworks

787
788 Q. Wang, C. Guo, H.-N. Dai, et al.

which can adapt to diverse artistic styles. In addition, to-fine manner, as shown in Fig. 1. In particular,
they can create paintings without user intervention. the painting quality becomes better by repeating
Recently, researchers have typically used recurrent multiple learning-to-paint processes, during which it
neural networks (RNNs) [8, 9] and reinforcement proceeds from being a novice to being a veteran. In
learning (RL) models [10–12] to generate stroke-by- contrast to existing methods, such as sketching [8, 14],
stroke artworks. However, such unified frameworks doodling [10], Neural Painter (NP) [15], and MDRL
lack flexibility to choose different styles of strokes and Painter (MDRLP) [12], our painter can generate
some paintings generated in particular artistic styles diverse artistic styles of painting with different types
(e.g., pastel-like paintings) lack fine details. of strokes. Moreover, the images generated by our
To address limitations of existing painting methods, painter also preserve key content details (such as
we have built a new model leveraging advantages face details in portraits) well, as shown in Fig. 1. In
of both conventional SBR methods and learning- summary, our work makes the following three main
based methods, as an extension of Ref. [13]. We contributions.
first describe a novel stroke generative adversarial • We propose a three-player-game model, Stroke-
network (Stroke-GAN) which learns different stroke GAN, to generate strokes in an artistic style,
styles from stroke-style datasets and generates diverse which are fully adjustable in terms of stroke
stroke styles with adjustable parameters (stroke path, path, stroke size and shape, stroke color and
stroke size and shape, stroke color and transparency). transparency, thereby greatly improving the
Based on Stroke-GAN, we then describe a neural- stylization of generated paintings. Stroke-GAN
network painter which learns to create different styles has two generators and one discriminator; the
of paintings in a stroke-by-stroke paradigm. We call second generator learns to purify the stained-
the entire framework Stroke-GAN Painter. Stroke- color strokes generated by the first generator.
GAN Painter learns to generate a painting in a coarse- Consequently, the generated strokes have pure

Fig. 1 Learning-to-paint process of Stroke-GAN Painter. Rightmost column: input. Top to bottom: oil painting, watercolor painting, pastel
painting. N : number of painting times, rather than number of strokes.
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 789

colors and textures closer to human artists’ However, most of these methods require substantial
strokes. human intervention to choose key parameters, and are
• We design a painter based on Stroke-GAN; it does thus inconvenient for ordinary users. Moreover, SBR
not need to be trained on any image datasets. methods only generate a limited number of artistic
It learns to create different artistic styles of styles [20], consequently leading to inflexibility.
paintings based on stylized strokes, e.g., oil- 2.2 Learning-based rendering
painting-like artworks, watercolor-like artworks,
pastel-like artworks, in a unified framework. Recently, researchers have adopted learning-based
Our generated paintings preserve content details methods to improve the painting effects compared
of reference images well while offering stylistic to traditional SBR methods. SPIRAL [10, 21]
diversity of paintings. develops an adversarially-trained deep reinforcement
• Experimental results show that our painter learning (DRL) agent which learns structures of
generates diverse artistic styles of paintings images. However, it does not reconstruct details
for various types of the image-content, e.g., of human portraits well. Sketch-RNN [8] constructs
portraits, landscapes, animals, plants, and stroke-based drawings after training on human-drawn
buildings. Paintings generated by Stroke-GAN image datasets; it achieves excellent results for
Painter preserve content details well, such as eyes common objects, although it only makes simple-line
and teeth of portraits and fine details of buildings. paintings. StrokeNet [9] trains an agent to learn to
User studies with a variety of participants paint based on a differentiable render and recurrent
demonstrate that paintings generated by Stroke- neural network (RNN); it generalizes poorly on color
GAN Painter are preferred 77% of the time images. Computers are now able to generate quite
for pastel paintings, in comparison to another realistic oil paintings [12, 22, 23], and pastel-like
method, and 31% of the time for oil paintings, in paintings [15]. MDRLP [12] paints oil-painting-like
comparison to three other methods, in terms of pictures with a small number of strokes, although
fidelity and stylization. it only mimics one style in a unified framework and
loses brush-stroke textures. Methods such as those in
Refs. [22, 23] improve stroke textures by redesigning
2 Related work their stroke rendering. NP [15] has a similar design
We briefly survey closely related studies on to our method since both generate strokes using a
machine painting. We classify related studies into GAN-based module. However, our approach differs
conventional stroke-based rendering (SBR), learning- from theirs in several ways. Firstly, we use a three-
based rendering, and image-style transfer (IST). player GAN model to generate adjustable strokes
while NP only uses a normal GAN to generate fixed
2.1 Conventional stroke-based rendering
strokes. Secondly, our method can learn from any
SBR methods essentially reconstruct images as non- stroke datasets while NP must use strokes produced
photorealistic imagery using stroke-based models. by the MyPaint program as the stroke dataset.
Researchers have adopted SBR methods to different Thirdly, NP requires a massive manually-labelled
types of artworks, e.g., paintings [5–7], pen-and-ink stroke dataset to generate one-by-one action strokes.
drawings [16, 17], and stippled drawings [18, 19]. Their model requires the same strokes as the stroke
In particular, Ref. [5] introduces a semi-automatic dataset provided by MyPaint to ensure that the
painting method based on a greedy algorithm; it optimized strokes are those needed by the painting
needs substantial human intervention to control stroke model. Our model has no data-labelling owing to that it
shapes and select stroke locations. The authors of can paint well by using the three-player Stroke-GAN to
Ref. [6] propose a style design for their painting generate strokes similar to those in the stroke dataset.
method by using spline-brush strokes to render the
2.3 Image style transfer
image, but this method requires a high degree of
painting skill of its users. The work in Ref. [7] Image style transfer (IST) methods are popular in
proposes a method to segment an image into areas both research projects and industrial applications
with similar levels of salience to control the strokes. [24–28], although few methods have been applied to
790 Q. Wang, C. Guo, H.-N. Dai, et al.

stroke-based image rendering. PaintBot [29] based on


a DRL network recreates the target image in a stroke-
by-stroke manner while the painting style is only
restricted to the style of the reference image. Neural
Painter [15] uses a method to generate style strokes
to recreate the target image without style reference
images. However, it requires a large manually-labelled
stroke dataset and also lacks flexibility to choose
different styles of strokes. Moreover, the generated
paintings lack fine details, e.g., in the human face
and textures of buildings.
In this paper, we propose a new learning-based Fig. 2 The network architecture of Stroke-GAN Painter mainly
method, Stroke-GAN Painter. It integrates the comprises Stroke-GAN, the rendering module, and the canvas.
(a) shows style strokes generated by Stroke-GAN. (b) shows states of
advantages of SBR methods with learning-based the canvas during the learning-to-paint process.
methods to generate diverse styles of paintings with
high quality. Specifically, Stroke-GAN can generate to create paintings from novice to veteran ability
different styles of strokes, and stroke designer modules and its painting quality improves with more painting
can generate different style stroke datasets for Stroke- time. Section 3.2 presents the stroke designer module
GAN. Meanwhile, we use a rendering module to to generate different styles of stroke datasets for
optimize stroke selection, to ensure Stroke-GAN training Stroke-GAN. Section 3.3 presents Stroke-
generates well-behaved strokes, which are rendered GAN, which feeds in values obtained from stroke
onto the canvas to create high-quality paintings. selection to generate style strokes, which are then
painted on the canvas Cn . The rendering module
3 Stroke-GAN Painter feeds in both U0 and Cn to optimize stroke selection,
thereby finishing the painting process. Section 3.4
3.1 Overview
describes the painting process and stroke selection
We propose a new painting model, Stroke-GAN optimization.
Painter, to achieve stroke-by-stroke painting for
3.2 Stroke designer module
machines or computers. The goal of Stroke-GAN
Painter is to paint in diverse artistic styles of 3.2.1 Stroke modeling
paintings in a unified framework. We mainly consider The artistic style of a painting can be affected
oil paintings, watercolor paintings, and pastel by the stroke style. The stroke designer module
paintings here, although further painting styles could is used to create different styles of strokes to
also be easily added to our framework. When given an provide training datasets for Stroke-GAN. Inspired by
image, our model can continually render style-strokes previous studies [12, 15], our stroke designer modules
onto the canvas to create different artistic styles of consider the following variables: P denotes a set of
paintings by choosing the style of the strokes. Figure 2 control points of a Bézier curve, S denotes a set
depicts our proposed Stroke-GAN Painter which controlling size of a geometric shape, T denotes a set
consists of a stroke designer module, Stroke-GAN, to control stroke transparency, and V denotes a set
the rendering module, and the canvas. The stroke to control stroke color.
designer module provides different styles of stroke • Stroke path: We use different geometric shapes
datasets for training Stroke-GAN, which generates to represent the brush tip and Bézier curves to
style strokes (see Fig. 2(a)). The rendering module represent the path of a brush. The points in
feeds in both the reference image U0 and canvas Cn P = {(xi , yi )|i = 0, . . . , d} control a Bézier curve,
to optimize stroke selection, which controls Stroke- where d is the order of the Bézier curve. See
GAN to generate well-behaved strokes. The states Eq. (1) later.
of the canvas during the learning-to-paint process • Stroke size and shape: We use S = {s0 , s1 } to
are shown in Fig. 2(b). Stroke-GAN Painter learns define the size of a geometric shape to control the
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 791

stroke size and shape. The size of the brush tip since the model is designed to learn to paint from
varies with the values of S. novice to veteran ability.
• Stroke transparency: We use T = {t0 , t1 } to 3.3.2 Motivation for Stroke-GAN
control the transparency of the stroke. The Our design for Stroke-GAN follows the Deep
transparency of the stroke varies with the values Convolutional Generative Adversarial Network
of T. (DCGAN) [31], which offers more stable training than
• Color: Primary colors denoted by V = {r, g, b} conventional GANs. However, the strokes generated
determine the color of the stroke. by a normal DCGAN (i.e., a two-player game) have
3.2.2 Stroke datasets stained colors. It is unreasonable to try to reproduce
The stroke designer module provides different stroke- the painting process of human artists by rendering
style datasets for training Stroke-GAN. Each stroke strokes in stained colors. To address this problem,
dataset includes 200,000 stroke images, each with we design a second generator (the third player) to re-
64 × 64 resolution. We currently provide stroke color strokes with three color parameters to generate
datasets in three diverse styles, although others could pure-colored strokes, which are better for painting
be added: a watercolor-stroke dataset, an oil-painting- (closer to the strokes painted by human artists).
stroke dataset, and a pastel-stroke dataset. The 3.3.3 Three-player-game of Stroke-GAN
pastel-stroke dataset is from Ref. [15] while both oil- In order to obtain pure-colored strokes for reasonable
painting-stroke dataset and watercolor-stroke dataset painting strokes, we provide a coloring module as
are provided by our stroke designer modules. We a second generator G0 immediately following the
model the stroke as its path and tip, using a Bézier normal generator G (being essentially a convolutional
curve (BC) to construct the stroke path. Circles neural network), as shown in Fig. 3. Stroke-GAN
with variable radii are used to simulate the stroke tip. consists of a normal generator G, a coloring module
Each stroke is made from 100 variable circles moving G0 , and a discriminator D. Let ht denote the one-
along BC, which is given by dimensional (1-D) vector used as the random noise
d
!
X d to generated strokes. Let hst and hct denote 1-D
B(t) = (1 − t)(d−i) ti Pi , t ∈ [0, 1] (1)
i=0
i vectors being fed into G and G0 , respectively, and
where Pi denotes the control point with coordinates ht = [hst , hct ]. After being fed with hst and hct ,
(xi , yi ) ∈ P. respectively, G learns to generate stroke images but
stained-colored, and G0 learns to generate pure-color
3.3 Stroke-GAN
strokes. The discriminator first determines whether
3.3.1 Basics a stroke generated by G is valid or not, and then
In order to improve the painting quality and the determines the stroke image generated by G0 . Both
fidelity of the stroke, we have designed a three- G and G0 update during the adversary mode with
player GAN model, Stroke-GAN, to generate stylized D. Since the randomness of DCGAN leads to the
strokes for the painting process. Stroke-GAN is the unexpected stroke color, we build a second generator
core component which allows our model to use a G0 (the coloring module) to control the stroke color.
unified framework to produce diverse artistic styles Particularly, the vector ht consists of 50 elements.
of paintings from a given image. Its third player,
the coloring module, allows preservation of high
fidelity details. Paintings contain several elements:
lines, textures, colors, and so on [30]. Our Stroke-
GAN is designed to generate strokes containing these
elements; the stroke style has a major influence on
the painting style. As Stroke-GAN has an end-to-
end training model, our painter framework has the
flexibility to change the artistic style by choosing
Fig. 3 Structure of Stroke-GAN. The stroke designer module provides
differently-trained Stroke-GAN models. We do not stroke datasets. Stroke-GAN consists of a normal generator G, a
need to train the whole painter on any image datasets coloring module G0 , and a discriminator D.
792 Q. Wang, C. Guo, H.-N. Dai, et al.

The normal generator (G) feeds the variable hst with allows Stroke-GAN to easily learn different styles
47 elements and outputs the stroke image G(hst ) while of strokes. Therefore, while Stroke-GAN utilizes a
hct is used to control the stroke color {r, g, b} fed into unified framework, it can generate different styles of
G0 . Stroke-GAN improves the original DCGAN to paintings.
purify the colors of strokes (see Fig. 3). The coloring
3.3.4 Training of Stroke-GAN
module feeds hct with the stroke G(hst ) to learn to
purify the stroke. We train Stroke-GAN to acquire different stroke
Since the stroke image just contains the background models to endow our painter the ability to paint
and the stroke, we can use threshold segmentation [32] with different stroke styles. Figure 3 depicts the
to separate the stroke region from the background. structure of Stroke-GAN. Stroke-GAN is trained with
Let P = [p1 , . . . , pc ] denote the matrix of pixels in 128 images as a mini-batch for each stroke dataset
the stroke image, where c is the number of channels containing 200,000 images. We use the whole dataset
of the image. The average of P is denoted by P . as the training set since the DCGAN model has no
Since the pixel values of the background in the stroke need for validation. When training Stroke-GAN, we
image generated by G are unknown, we should train directly feed the generator with a 1-D vector (ht )
G0 to find the threshold of the background pixels. to generate a stroke image. The initial values of
We denote the threshold by γ. We use threshold the elements in ht are random. During training,
segmentation to eliminate the background region from the generator learns to produce images close to the
the stroke image, and the result is denoted by Pe , dataset by optimizing the values of these elements.
which is obtained as Eq. (2): We use the Adam optimizer to train the Stroke-GAN
Pe = γ − P (2) model, with a learning rate of 0.0002, and values
of betas of 0.5 and 0.999. Each pair comprising a
The values of the elements in Pe are 0 or close to 0
generated stroke image and a real stroke image is
in the background region. We use min(·) and max(·)
then fed into the discriminator. The discriminator
to calculate the minimum and maximum values in
next determines whether the pair of strokes is valid
Pe , respectively. We determine the stroke region and
or not. If the generated stroke image is similar to
denote the results by SP , which is obtained by
the real stroke image, the pair is valid, and invalid
SP = (Pe − min(Pe ))/(max(Pe ) − min(Pe )) (3)
otherwise.
In particular, the values of the elements in SP close The training procedure for Stroke-GAN is given
to 1 are stroke pixels, and values close to 0 represent in Algorithm 1. We essentially train the normal
background pixels. We then use hct to recolor the generator G, the coloring module G0 , and the
stroke and obtain the pure stroke image PS as Eq. (4): discriminator D by back-propagating the loss, to
PS = [SP , SP , SP ] · hct (4) update the parameters θg , θc , and θd , respectively.
The coloring module endows our painter with In particular, the discriminator D has the loss `d , the
more creativity in painting, e.g., outputting paintings generator G has the loss `g , and the generator G0
with different colors even from the same input has the loss `c . We use binary cross entropy (BCE)
image. The reason is that the coloring module denoted by `(z, y) to measure the loss of the input
in Stroke-GAN takes in hct directly to recolor the sample z on conditional variable y, with the number
stroke image by learning the color information of of the samples of T , as Eq. (5):
the input. This is the reason that Stroke-GAN can T
1 X 
be trained without the data-labelling restriction of `(z, y) = − yt log(zt ) − (1 − yt ) log(1 − zt )
T t=1
generating an image the same as the labeled image.
Even though Stroke-GAN generates strokes different (5)
from the dataset, the coloring module learns color When training the discriminator on real stroke images
information by analyzing the input image directly so (denoted by x), y = 1, so using Eq. (5), the loss `dr
as to ensure that the rendered canvas is close to the for real strokes is

input image. The design of the coloring module plays `dr = ` D(x), 1 (6)
an important role in ensuring the GAN generates When training the discriminator on fake stroke images
realistic human strokes. Moreover, this design also generated by G, we then have y = 0, so the loss for
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 793

Algorithm 1 Training Stroke-GAN. Discriminator D, so


normal generator G, second generator G0 . The number T
1 Xh    
of iterations is m, and the mini-batch is k `d = − log D(xt ) − log 1 − D G(hst )
T t=1
1: for m do  i
2: for k do − log 1 − D G0 (G(hst ), hct ) (10)
3: Sample real data {x1 , . . . , xk } from stroke dataset;
4: Set label = 1; Similarly, the normal generator G has the loss:
T
5: Train D with the real-loss function using 1X  
BCELoss(D(x), label); `g = − log D(G hst ) (11)
T t=1
6: Sample fake data {g1 , . . . , gk } generated by using
random noise ht = [hst , hct ]; and the loss function of the coloring module is
Set label = 0; T
7: 1X  
8: Train D with the real loss added to the fake-loss `c = − log D G0 (G(hst ), hct ) (12)
T t=1
function using
where the strokes in G(hst ) are visually stained. The
   
BCELoss D(x), 1 + BCELoss D G(hst ) , label ;
9: Set label = 1; coloring module learns to compute the content of the
10: Sample fake data {g1 , . . . , gk } generated by using stroke region in G(hst ) and recolor the stroke by hct .
random noise hst ; Figure 4(a) compares sample strokes generated
11: Train G with
 the lossfunction  by Stroke-GAN with and without the coloring
BCELoss D G(hst ) , label ; module (CM) for various stroke datasets. Figure 4(b)
12: Sample fake data {g1 , . . . , gk } generated by using shows convergence of Stroke-GAN for different stroke
random noise hst ;
datasets. The stained-color strokes generated by
13: Input hct ;
14: Train G0 with
 the loss function: 
Stroke-GAN W/O CM cannot mimic human artists’
BCELoss D G0 (G(hst ), hct ) , label ; strokes. The pure-color strokes generated by Stroke-


15: end for GAN W CM are close to human artists’ strokes,


16: end for benefiting the quality of the content. Although both
Stroke-GAN W/O CM and Stroke-GAN W CM can
generate strokes similar to the given strokes, the
fake strokes, `da , is former generates parti-colored strokes while the latter
`da = ` D(G(hst )), 0

(7) generates pure-colored strokes, which are better as
When training the discriminator on fake stroke images they are closer to those of human artists. Stroke-GAN
generated by G0 , we again have y = 0, so the loss for can learn any styles of strokes given an appropriate
fake strokes `db , is training dataset. In this paper, we use three stoke
   datasets: watercolor, oil-painting, and pastel stroke
`db = ` D G0 (G(hst ), hct ) , 0 (8)
datasets, and save the trained models as Style 1,
The entire loss of the discriminator is Style 2, and Style 3, respectively. We then choose the
`d = `dr + `da + `db (9) corresponding model to give a certain artistic style.

Fig. 4 Training Stroke-GAN. (a) Stroke samples generated by two comparative methods: Stroke-GAN with coloring module (W CM) and
without coloring module (W/O CM). (b) Loss of Stroke-GAN versus iterations for three different stroke datasets.
794 Q. Wang, C. Guo, H.-N. Dai, et al.

In summary, Stroke-GAN has the following merits.


Firstly, it can recolor the stroke according to the
design of the coloring module, thereby improving the
artistic creativity of the painter. For example, the
color of the painting can be recreated close to but
not the same as the input reference image. Secondly,
it can flexibly learn any style of strokes as long as a
stroke dataset is available, owing to its completeness
and independence. Thirdly, it enables end-to-end
training so that it can be easily applied in a painting
models for various artistic styles, by choosing different
Fig. 5 Coarse-to-fine learning-to-paint process. The reference image
Stroke-GAN models. is U0 , while Cn denotes the current state of the canvas, and N is the
number of painting processes. The width and the height of the canvas
3.4 Rendering module
are w and h, respectively.
3.4.1 Basics
We endow our painter with the capability to paint in U0 . The content state of a certain canvas and the
diverse artistic styles. Besides using diverse stroke- set of stroke selections are denoted by Cn and Hn ,
styles generated by Stroke-GAN, we also consider respectively. The height h and the width w of the
the feature-extraction network (FEN) to extract canvas are automatically configured according to the
contents of reference images. After processing by aspect ratio of the reference image.
the FEN, the original reference image may lose some We model our painting process as a stroke state
content but retaining the core information of the optimization process with a canvas state set S, a
image. We design the rendering module with an stroke selection H, and a mapping f : S → H. Let
FEN and an optimization algorithm. We use the S = {Cn |n = 1, . . . , N }, H = {Hn |n = 1, . . . , N },
FEN in our rendering module to process the features where N is the number of iterations of the painting
of the reference image and the canvas. We use the model (also the number of times of painting). Let
optimization algorithm to “pick” well-behaved strokes the total number of strokes needed to complete the
for rendering the canvas. painting be T . Since each Hn has T elements, we then
3.4.2 Painting process choose Hn = {ht |t = 1, . . . , T }, where ht is also the
We design our painter to mimic the painting process input of Stroke-GAN when training the Stroke-GAN
used by human artists painting in a given style. (mentioned in Section 3.3.3). One ht in the stroke
Painting is conducted in a coarse-to-fine manner, selection is used to pick one stroke (generated by
in which our painter learns to paint from scratch Stroke-GAN). For each painting iteration, the stroke
to a finely-detailed painting after multiple times of selection outputs a set of Hn of size T to let Stroke-
painting. The Stroke-GAN generates a sequence GAN generate T strokes. One painting process is
of strokes at one time and the rendering module finished when T strokes have all been rendered onto
optimizes stroke selection (see Section 3.4.3) to the canvas, i.e., Cn is completed for some n.
“render” these strokes on the canvas. The painting Each stroke generated by the Stroke-GAN is
process is shown in Fig. 5. The painting model essentially an image, with 64 × 64 pixels. Thus,
consists of the rendering module, stroke selection, the canvas is divided into grid cells for rendering
the canvas, and the Stroke-GAN. The rendering convenience, with the size of each cell also 64 × 64
module optimizes the stroke selection and the Stroke- pixels. We render T strokes onto the canvas cell-
GAN generates continually strokes used to paint. by-cell to finish one painting process. In each cell,
Stroke-GAN Painter learns to paint from novice the content at the stroke position in the new stroke
to veteran ability. We observe that the painting image replaces that at the corresponding position.
quality improves by repeating multiple learning-to- Stroke-GAN produces T strokes at a time, where T
paint processes. The painting quality becomes better equals the number of strokes in each cell multiplied by
with increased n. The reference image is denoted by h × w cells. The strokes are sequentially rendered in
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 795

a grid order. Stroke-GAN runs once in one painting the backpropagation algorithm for L(U0 , Cn ). Each
process. The mapping f : S → H uses the transition Cn (the state of the canvas) is rendered by T strokes,
function Cn+1 = f (Cn , Hn+1 ). Stroke selection first and each stroke is produced according to the values
outputs an initial set H1 ; each element in H1 denotes in ht . Therefore, the values of the elements in
an effect for rendering a stroke at a certain position each ht can be updated to the ones needed by
of the canvas. The rendering module then optimizes backpropagation for L(U0 , Cn ).
stroke selection via the stroke-selection-optimization
3.5 Style reconstruction
algorithm and generates a new set of elements Hn ,
which are used to render strokes on the canvas to get 3.5.1 Basics
Cn . We continue the coarse-to-fine process from Cn As explained in Section 3.3, Stroke-GAN is the core
to Cn+1 , where Cn+1 denotes a finer-grained painting component for generating the style. The rendering
(with optimized strokes). Continuing the above process, module also contributes to the stylization of the
we finally obtain the best painting CN . whole painting. In particular, we use the FEN in the
3.4.3 Stroke selection optimization rendering module to extract features of the reference
A rendering module is used to optimize stroke image as well as the painted canvas. This process
selection; it consists of an FEN and the optimization loses some content of the original image but mimics
algorithm. During the painting process, the rendering paintings close to the original image. This step can
module first feeds in both the reference image and make the generated painting differ from but be similar
the painted canvas to compute the distance between to the reference image, providing artistic mimesis or
them, and then optimizes the stroke selection. Stroke “realism”. We reconstruct the painting style by using
selection picks well-behaved strokes generated by the Stroke-GAN and FEN in the rendering module.
the Stroke-GAN. This ensures that the state of the In particular, Stroke-GAN endows the painter with
canvas Cn+1 is better than Cn . This procedure diverse styles of strokes and the FEN in the rendering
continues until one painting process is complete, and module creates the artistic style.
the painting CN is generated after N painting process. 3.5.2 Artistic style
The learning-to-paint process works in a coarse-to-fine Since Ref. [15] indicates that the content objective
manner. preserves only high-level features while the para-
It is a key task to optimize the stroke generated meterization can fill in details, we also only take high-
by the Stroke-GAN in the stroke selection step. The level features as inputs. We adopt two of the most
rendering module first extracts the feature maps of representative deep neural networks: GoogleNet [33]
U0 and Cn , and then processes the difference of the or residual nets (ResNets) [34] for the digital
input and the canvas by computing their `1 -distance. rendering module. The design of using FEN to process
We denote the extracted feature maps of the input the original reference image, rather than directly
reference image U0 and those of the painted canvas Cn using the original image, endows our painter with
by F(U0 ) = {Ij |j = 1, . . . , M } and F(Cn ) = {cj |j = artistic creativity while retaining a high similarity
0, . . . , M }, respectively, where M is the number of to the reference image. We focus on realism in oil
features extracted by the neural rendering module. paintings, so we use ResNet to build the FEN in the
In particular, Ij and cj denote the features of U0 and rendering module. ResNets have high accuracy when
those of a certain state of the canvas Cn , respectively. extracting information [34], so can keep a high fidelity
We calculate the `1 -distance loss function L(U0 , Cn ) as in the extracted features. On the other hand, features
M
1 X extracted by GoogleNet are relatively sparse so that
L(U0 , Cn ) = |Ij − cj | (13)
M j=1 may offer more space for our painterly creativity.
Therefore, we use GoogleNet for watercolor and pastel
This essentially computes the distance between
paintings.
the features of the reference image U0 and those of
canvas Cn . The stroke selection algorithm optimizes 3.5.3 Stroke style
the generated stroke by resetting the values of Different strokes can produce different styles of
the elements in ht based on the `1 -distance loss artwork even when used by the same human artist.
function. The function f(Cn , Hn+1 ) is computed by We provide three kinds of strokes to endow our painter
796 Q. Wang, C. Guo, H.-N. Dai, et al.

with more creativity. Figure 6 shows different stroke 4.2 Comparison of stroke styles
styles for watercolors, oil-painting, and pastels. The We compare paintings generated by different styles
watercolor strokes have smooth, soft contours and of strokes on CelebA, ImageNet, and real-world
the brush paths are simple and pure. In contrast, photos. In Fig. 7 and Fig. 8, the top row shows
oil-painting strokes have sharp contours and volatile the inputs, and the images in successive rows
paths. When stacking multiple strokes on the canvas, show oil painting, watercolor painting, and pastel
the oil-painting texture can be recognised easily due painting results. Img 39 and Img 12 were randomly
to these characteristics. The pastel strokes seem to be selected from CelebA [35], Img 32 and Img 35 were
accumulated from many uneven points (mimicking randomly selected from ImageNet [36], and the others
the granular textures of pastel paintings). These are real-world photos. Figures 7 and 8 show that
different styles of strokes cause the canvas to show all generated paintings exhibit different styles in
different styles of painting. After utilizing different contrast to the reference images. In particular,
FENs to process the reference image in conjunction the oil-painting-stroke paintings in the second row
with different types of strokes, we obtain different well preserve textures, lines, and color features,
styles of paintings. consequently capturing fine details in the reference
images. Meanwhile, we observe stroke textures from
oil paintings, demonstrating oil-painting stylization.
The watercolor-stroke paintings in the third row
exhibit a style between pastel paintings and oil
paintings; this style is good at expressing details
for scenery and building images (see Imgs 26, 15, 11,
21, and 23). The pastel-stroke paintings in the last
Fig. 6 Samples from three datasets used to mimic the styles of row preserve some textures and lines while losing
watercolor, oil-painting, and pastel strokes
some color features.
Figure 9 plots `1 -distances between the generated
4 Experimental results images and the reference images. Specifically, Fig. 9(a)
We have evaluated our approach with several plots the `1 -distance versus painting times for the
experiments. We first describe the implementation. three stroke styles. We observe that all three styles
Then, we evaluate three styles of paintings generated nearly converge after 300 painting times, although
by our painter and compare the output of the the oil-painting stroke style converges faster than
proposed Stroke-GAN Painter to state-of-the-art the other two. The pastel-stroke style paintings
methods. Finally, we study alternatives to under- converge the slowest since they lose more content detail.
stand how our Stroke-GAN Painter generates different Figure 9(b) compares convergence for three types of
styles of paintings by fine-tuning the design. image datasets using the same watercolor-stroke style.
It takes 200 painting iterations to recreate the images
4.1 Implementation
in CelebA, 300 iterations for images from ImageNet,
Our experiments were conducted on a workstation and 400 for real-world photos. The portrait images
with an i7-7700k CPU and an NVIDIA Titan RTX of CelebA are relatively easier to learn than those of
GPU. We evaluated our painter on three image ImageNet and real-world photos as they have fewer
datasets: CelebA [35], ImageNet [36], and real-world features.
photos. These images cover various types of content
4.3 Comparison to prior methods
including portraits, landscapes, animals, plants, and
buildings. All images used in experiments are labelled 4.3.1 Methods
by “Img No.”. Style 1 (Stroke-GAN model 1) and We further evaluate our painter by comparing it
the rendering module using GoogleNet were used to to various state-of-the-art learning-based methods,
generate watercolor paintings, Style 2 and ResNet including Neural Painter (NP) [15], MDRL Painter
were used to generate oil paintings, and Style 3 and (MDLRP) [12], SNP [22], and PaintTF [23]; these
GoogleNet were used to generate pastel paintings. outperform other learning methods and traditional
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 797

Fig. 7 Three artistic styles of paintings generated by Stroke-GAN Painter using images from CelebA [35], ImageNet [36], and a real-world
photo (Img 15).

Fig. 8 Three artistic styles of paintings generated by Stroke-GAN Painter using images from real-world photos.
798 Q. Wang, C. Guo, H.-N. Dai, et al.

we cannot see facial texture and teeth in the woman’s


portrait generated by NP while the image generated
by our painter well preserves those details, thereby
looking more vivid. NP [15] only generates fixed
strokes, while our Stroke-GAN generates variable
strokes thanks to the coloring module. This allows
strokes to be tuned according to the input image,
thus retaining more details. Figure 11 compares
Fig. 9 The `1 -distance between the generated images and reference
images. (a) `1 -distance of different stroke styles for real-world photos.
oil paintings generated by MDRLP, SNP, PaintTF,
(b) `1 -distance of different datasets for the watercolor-stroke style. and our painter. Images generated by our painter
suffer less content loss than those generated by
MDRLP, SNP, and PaintTF. It is quite obvious
SBR methods. We do so using two representative
when comparing inset close-ups, e.g., our painter
artistic styles for our model: pastel-stroke painting
well preserves details of the man’s eyes and mouth,
and oil-stroke painting. We use NP for comparative
and textures of the motorcycle and the cloud. Our
pastel-stroke paintings, and MDRLP, SNP, and
model uses the independent Stroke-GAN to generate
PaintTF for comparative oil paintings. In order to
strokes with diverse shapes and variable sizes so can
obtain the best paintings generated by the compared
depict detailed contents. However, brushstrokes used
methods, we use the authors’ pre-trained models and
in SNP and PaintTF have few shape variants as
default parameter values .
they are directly generated by their entire models.
4.3.2 Qualitative comparison
In particular, their models have only two shapes of
Figures 10 and 11 compare paintings generated by strokes despite variant stroke size and angles. On the
our painter and these compared models. Figure 10 other hand, the stroke-texture representation differs
compares pastel paintings generated by NP and between all compared methods. MDRLP presents
our painter. Our painter generates images with stroke textures while losing some content since it has
more details and textures than NP. For example, no special process during stroking to mimic the stroke
textures. Adding more strokes and painting steps
can make the result more similar to the input photo
instead of a painting. SNP and PaintTF present
stroke textures by adding a stroke-texture mask after
stroke generation. In other words, the stroke contains
no textures when generated and the textures have
no affect on optimizing the stroke. Our Stroke-GAN
painter renders the stroke texture by generating a
sharp contour that mimics the thick edge of oil paints
in a stroke. Therefore, the painting results present
irregular-line textures instead of the brush textures.
4.3.3 Quantitative comparison
To further compare the quality of paintings generated
by Stroke-GAN Painter and the other methods,
we conducted a two-step user study inspired by
Refs. [37, 38]. For fairness, the experiments were
blind trials, in which users did not know which
paintings were generated by which methods. User
Study I investigated relative preferences for paintings
generated by these methods. User Study II investi-
gated preservation of detailed content and stroke
Fig. 10 Comparison to the prior method: pastel-stroke paintings
generated by our painter and NP [15]. textures in paintings generated by these methods.
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 799

Fig. 11 Comparison results: oil-painting-stroke paintings generated by our painter, MDRLP [12], SNP [22], and PaintTF [23].

User Study I. User Study I used two ques- Painter as their preference, for all pairs of paintings;
tionnaires, the first one to compare pastel style our paintings gained 77% of all votes on average.
artworks, and the second one to compare oil- These high votes imply that paintings from Stroke-
painting style artworks. Since User Study I was GAN Painter present pastel-painting style better than
designed to evaluate the preferences of people for the compared ones.
artworks generated by different methods, we did not Similarly, the second questionnaire evaluated the oil-
emphasize the backgrounds of users in the comparison, painting effect, comparing our painter, MDRLP [12],
although they came from both artistic and non- SNP [22], and PaintTF [23]. The second group
artistic backgrounds. of participants was also chosen to have diverse
In the first questionnaire, we arbitrarily choose backgrounds, age, and gender (40 females, 32 males).
20 images from CelebA [35] (3 images), ImageNet [36] We also select 20 images from CelebA, ImageNet,
(6 images), and real-world photos (11 photos); the and real-world photos to cover different content types
images included various types of contents including (images numbered 21–40 in Fig. 12). We asked par-
landscapes, buildings, animals, and portraits. The ticipants to choose which image is closer to an oil
first group of participants had various backgrounds painting and which they preferred in each set of
(10% with artistic training), age groups (17–50), and images. Again, more participants (31% among four
gender (44 females, 43 males). We evaluated pastel- methods) voted paintings generated by our method
stroke paintings generated by our painter and NP [15]. presenting better oil-painting style than those from
For each reference image (images numbered 1–20 in other methods.
Fig. 12), we obtained a pair of images painted by our User Study II. We used the second user study
painter and NP. We evaluated the user preference to compare paintings generated by our Stroke-GAN
for and stylization of generated images: we asked Painter and other methods in terms of content detail
participants to choose which image better represented and stroke textures. We again used two questionnaires
a pastel-stroke painting and which they preferred in (on a Likert scale [39]) for pastel paintings and oil
each pair of images. Figure 12(a) depicts the results. paintings, separately. The participants were divided
Most users picked the results created by Stroke-GAN into two groups: users with and without an artistic
800 Q. Wang, C. Guo, H.-N. Dai, et al.

Fig. 12 User Study I. (a) Pastel-stroke-painting results generated by Stroke-GAN Painter and NP [15]. (b) Oil-painting results created by
Stroke-GAN Painter, MDRLP [12], SNP [22], and PaintTF [23]. Vertical axis: percentage of users’ preferences for an image. Horizontal axis:
numbered image pairs.

background. All participants were chosen from various Table 2 User Study II. Scores of paintings generated by our methods
and other methods for content details and stroke textures with artistic
age groups (17–50) and gender (20 females, 5 males) background. CI = confidence interval. LB = lower bound. UB =
for each questionnaire. We compared the average upper bound
score (µ), variance (σ), and the 95% confidence 95% CI
Item Method µ σ
interval for paintings generated by each method. LB UB
Tables 1 and 2 give results for the two user groups, NP 3.446 0.225 3.258 3.635
respectively. Ours 3.775 0.229 3.584 3.967
In Table 1 (users without an artistic background),
Contents MDRLP 2.717 0.446 2.459 2.974
the content details of paintings generated by
PaintTF 2.397 0.559 2.074 2.719
MDRLP [12], PaintTF [23], and SNP [22] gained low SNP 2.237 0.567 1.909 2.564
scores (lower than 3). One reason lies in the fact that Ours 3.803 0.313 3.623 3.984
their paintings lose too many details, and the stroke NP 3.594 0.286 3.355 3.833
textures generated by MDRLP [12] are also difficult Ours 3.956 0.221 3.772 4.141
to recognize for most users. In contrast, the paintings
Strokes MDRLP 2.583 0.398 2.353 2.813
generated by Stroke-GAN Painter have better scores PaintTF 3.668 0.214 3.544 3.791
for both content details and stroke textures. Similarly, SNP 3.682 0.153 3.594 3.770
in Table 2 (users with an artistic background), our Ours 3.604 0.239 3.466 3.742

Table 1 User Study II. Scores of paintings generated by our method


and other methods for content details and stroke textures without artistic method again gained higher evaluation scores than
background. CI = confidence interval. LB = lower bound. UB = upper the other methods. Comparing Table 1 to Table 2,
bound
users lacking an artistic background gave higher
95% CI
Item Method µ σ scores than users with an artistic background in most
LB UB cases. Nevertheless, both users with and without
NP 3.626 0.181 3.530 3.723 artistic backgrounds evaluated our paintings higher
Ours 3.940 0.227 3.820 4.060 than those of other methods. In particular, the
Content MDRLP 2.839 0.308 2.683 2.996 average score (µ) for the content details reaches
PaintTF 2.471 0.386 2.274 2.668 3.940 and the upper bound is 4.060 (in the 95%
SNP 2.374 0.298 2.153 2.595 confidence interval) in Table 1. In Table 2, the
Ours 3.361 0.214 3.251 3.611 average score of the stroke texture reaches 3.956
NP 3.605 0.265 3.465 3.746 with upper bound 4.141. Interestingly, for stroke
Ours 3.722 0.258 3.585 3.859
textures, users without artistic background gave a
Stroke MDRLP 2.816 0.286 2.670 2.962 higher score (2.816) for MDRLP [12] than users with
PaintTF 2.582 0.368 2.394 2.769 an artistic background (2.583), and the highest score
SNP 2.576 0.294 2.358 2.794 given by users without artistic background is 3.468
Ours 3.468 0.201 3.366 3.571
(our method). However, opposite scores are given by
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 801

users with artistic background. The scores given for Table 4 User Study III. Scores of paintings generated by our methods
and SOTA methods for color tone and aesthetic beauty with artistic
SNP [22] and PaintTF [23] are higher than those for background. CI = confidence interval. LB = lower bound. UB =
our method although our score is 3.604, close to these upper bound
two methods. 95% CI
Item Method µ σ
In summary, both pastel and oil paintings from LB UB
our method contain more detailed contents than NP 3.857 0.243 3.654 4.060
do the compared methods. For non-artistic users, Ours 3.913 0.250 3.704 4.121
the stroke textures are not well presented by most
Color tone MDRLP 3.823 0.238 3.686 3.961
methods, while artistic users think that PaintTF [23], PaintTF 3.607 0.290 3.439 3.774
SNP [22], and our methods (both pastel and oil- SNP 3.743 0.237 3.606 3.880
painting) present stroke textures well. Ours 4.217 0.357 4.010 4.423
User Study III. In order to evaluate the NP 3.707 0.346 3.418 3.996
aesthetics of the results, we further conducted User Ours 3.753 0.218 3.571 3.935
Study III to evaluate color tone and aesthetic beauty MDRL 3.550 0.327 3.361 3.739
Beauty
of the output paintings. We again used two user-study PaintTF 3.267 0.555 2.946 3.587
questionnaires (using a Likert scale [39]) for pastel- SNP 3.470 0.454 3.208 3.732
stroke paintings and oil-painting-stroke paintings, Ours 3.937 0.400 3.706 4.167
separately. The input images were those used in
User Study II. The participants were divided into
than 3. On the other hand, oil-paintings obtained
two groups: users with an artistic background (15)
lower scores. In particular, paintings generated by
and users without an artistic background (19). The
PaintTF [23] and SNP [22] gained much lower scores
participants came from various age groups (21–40),
than our method and MDRLP [12] for aesthetic
with 18 females and 16 males, for each questionnaire.
beauty. As the scores for content in Table 1 (given by
We compare the average score (µ), variance (σ), and
users without artistic background) are also lower than
the 95% confidence interval for paintings generated
3, we see that the paintings generated by PaintTF [23]
by each method. Tables 3 and 4 give results for the
and SNP [22] lose too much content detail, so users
two user groups.
without an artistic background give low scores for
In Table 3, scores for color tone and aesthetic
beauty. Although users without artistic background
beauty given by users without artistic background for
did not give high scores, the ranking of the compared
pastel paintings (NP [15] and our method) are higher
methods for aesthetic beauty item and color-tone
remain the same.
Table 3 User Study III. Scores of paintings generated by our methods
and SOTA methods for color tone and aesthetic beauty without artistic On the other hand, users with an artistic
background. CI = confidence interval. LB = lower bound. UB = background gave higher scores than users without an
upper bound
artistic background. In Table 4, paintings generated
95% CI by all compared methods obtain scores higher than 3.
Item Method µ σ
LB UB In particular, our paintings score 4.217 for color tone
NP 3.703 0.316 3.535 3.870 and 3.937 for aesthetic beauty. Comparing Tables 4
Ours 3.688 0.306 3.526 3.851 and 3, our method, MDRLP [12], PaintTF [23], and
Color tone MDRLP 3.211 0.274 3.071 3.350 SNP [22] achieve much higher scores than NP [15].
PaintTF 2.934 0.363 2.749 3.119 Evaluation of pastel-stroke paintings generated by
SNP 2.845 0.346 2.588 3.101 NP differ little between users with and without
Ours 3.708 0.190 3.611 3.805 artistic backgrounds. However, the difference between
NP 3.782 0.386 3.577 3.986 these two kinds of users is obvious when evaluating
Ours 3.833 0.293 3.678 3.988 the oil-painting style paintings. For example, for
Beauty MDRLP 2.963 0.327 2.796 3.130 aesthetic beauty, users without artistic background
PaintTF 2.537 0.380 2.343 2.731 give higher scores for paintings by PaintTF than by
SNP 2.484 0.356 2.220 2.748 SNP while users with artistic background give lower
Ours 3.508 0.188 3.412 3.604 scores. However, all users give consistent evaluations.
802 Q. Wang, C. Guo, H.-N. Dai, et al.

In particular, for both color tone and aesthetic the FEN for oil-painting images. We consider a new
beauty, our method is better than the others, while FEN with a combination of GoogleNet and ResNet,
PaintTF [23] and SNP [22] rank lowest. Considering namely (G+R) to generate paintings. We denote
Tables 1–4 overall, our method scores the highest the `1 -distance of features extracted by GoogleNet
among the state-of-the-art methods. NP performs well and the `1 -distance of features extracted by ResNet
for both content and color tone. MDRLP performs by LG (U0 , Cn ) and LR (U0 , Cn ), respectively: see
well for color tone. PaintTF and SNP perform well Eq. (13). The loss function of the G+R network
for stroke texture. can be written as
L(U0 , Cn ) = 0.5LG (U0 , Cn ) + 0.5LR (U0 , Cn ) (14)
4.4 Alternatives
We only perform backpropagation for the final
In this section, we investigate how our painter L(U0 , Cn ). Figure 13 compares results generated
generates different styles of paintings when using by the three different FENs and four stroke styles,
alternative FENs and strokes. Styles 1–4. Style 4 is a new stroke designed with a
4.4.1 Feature-extraction network hollow circle and a cubic Bézier curve; its stroke
Recall that we chose GoogleNet as the FEN for dataset is generated using a similar method to
watercolor and pastel-stroke images, and ResNet as that for Style 1. We see from Fig. 13 that the

Fig. 13 Paintings by various feature-extracting networks (FENs) and different stroke styles. Each row contains images created by models with
one style of strokes and an FEN (GoogleNet, GoogleNet+ResNet, or ResNet), with input images in the first column.
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 803

model using GoogleNet+ResNet as the FEN can process and generate different styles of paintings.
generate a style of artwork combining the characters In particular, we designed Stroke-GAN to generate
of pastel and oil-painting styles. Interestingly, the various styles of strokes. We model the painting
FEN essentially affects the artistic style of a painting, process as a stroke-state-optimization process, which
and the stroke style affects the painting style of the can be optimized by a deep convolutional neural
painting. Therefore, we confidently infer that various network. Our artistic painter can generate different
combinations of FENs and stroke styles can create a styles of paintings in a coarse-to-fine fashion, like
diversity of artistic styles of paintings. a human painter. User studies of the paintings
4.4.2 Number of strokes generated by Stroke-GAN Painter and competing
We next investigate the impact of the number of methods demonstrate that our painter gained most
strokes. In particular, the canvas is divided into h×w votes for the closeness to pastel paintings and oil
cells. We then generate images with various number paintings. Moreover, images generated by our painter
of strokes in each cell. Figure 14 depicts paintings also preserve more content detail than existing
generated using different numbers of strokes with the methods.
same FEN. The image generated using 2 strokes looks A deep learning algorithm and Stroke-GAN are
colorful and artistically creative although it also loses used to decompose the reference image into a grid
much content detail. Larger numbers of strokes (8 of cells to allow rendering of a sequence of strokes
or 10) lead to an exquisite image, much closer to to achieve the stroke-by-stroke effect. We designed
the reference image than images generated by fewer Stroke-GAN to generate style strokes by learning
strokes (2 or 5). Having an adjustable number of from stroke datasets. Our Stroke-GAN can learn
strokes provides the users with artistic choices. any stroke style, providing the painting agent with
creativity and flexibility. Although our generated
paintings do not compare with masters’ artworks,
5 Conclusions and future work
we have made an important step for learning-based
In this paper, we have presented a stroke-based image AI painting with creative and flexible artistic styles.
rendering approach to mimic the human painting Meanwhile, there is much that can be done to improve

Fig. 14 Paintings generated using 2, 5, 8, or 10 strokes in each cell. Center: reference image.
804 Q. Wang, C. Guo, H.-N. Dai, et al.

the quality of the output, and other artistic styles not International Conference on Multimedia, 244–251,
mentioned in this paper could also be emulated. In 2017.
future, we can combine the advantages of conventional [4] Zhou, W. Y.; Yang, G. W.; Hu, S. M. Jittor-GAN: A
stroke-based methods with learning-based methods to fast-training generative adversarial network model zoo
improve painting quality. On the other hand, we hope based on Jittor. Computational Visual Media Vol. 7,
to develop new style transfer methods in a stroke- No. 1, 153–157, 2021.
by-stroke manner to enrich the artistic styles of AI [5] Haeberli, P. Paint by numbers: Abstract image
painting. representations. ACM SIGGRAPH Computer Graphics
Vol. 24, No. 4, 207–214, 1990.
Availability of data and materials [6] Hertzmann, A. Painterly rendering with curved brush
strokes of multiple sizes. In: Proceedings of the
The data and materials generated during the study 25th Annual Conference on Computer Graphics and
are available from the corresponding authors on Interactive Techniques, 453–460, 1998.
reasonable request. [7] Lee, H.; Seo, S.; Ryoo, S.; Ahn, K.; Yoon, K. A
multi-level depiction method for painterly rendering
Author contributions based on visual perception cue. Multimedia Tools and
Q. Wang designed the study, performed experiments, Applications Vol. 64, No. 2, 277–292, 2013.
and wrote the manuscript. P. Li and H.-N. Dai helped [8] Ha, D.; Eck, D. A neural representation of sketch
to design the study and experiments and supervised drawings. In: Proceedings of the International
the project. C. Guo provided comments and feedback Conference on Learning Representations, 1–16, 2018.
on the study and the results. All authors reviewed [9] Zheng, N.; Jiang, Y.; Huang, D. StrokeNet: A
the manuscript. neural painting environment. In: Proceedings of the
International Conference on Learning Representations,
Acknowledgements 1–12, 2019.
[10] Ganin, Y.; Kulkarni, T.; Babuschkin, I.; Eslami, S. M.
The authors would like to thank the anonymous
A.; Vinyals, O. Synthesizing programs for images using
reviewers for their helpful suggestions and comments. reinforced adversarial learning. In: Proceedings of the
This work was supported in part by the Hong Kong 35th International Conference on Machine Learning,
Institute of Business Studies (HKIBS) Research 1666–1675, 2018.
Seed Fund under Grant HKIBS RSF-212-004, and [11] Xie, N.; Hachiya, H.; Sugiyama, M. Artist agent: A
in part by The Hong Kong Polytechnic University reinforcement learning approach to automatic stroke
under Grant P0030419, Grant P0030929, and Grant generation in oriental ink painting. IEICE Transactions
P0035358. on Information and Systems Vol. E96.D, No. 5, 1134–
1144, 2013.
Declaration of competing interest [12] Huang, Z. W.; Zhou, S. C.; Heng, W. Learning to
The authors have no competing interests to declare paint with model-based deep reinforcement learning.
that are relevant to the content of this article. In: Proceedings of the IEEE/CVF International
Conference on Computer Vision, 8708–8717, 2019.
References [13] Wang, Q.; Guo, C.; Dai, H. N.; Li, P. Self-stylized
neural painter. In: Proceedings of the SIGGRAPH
[1] Wang, L.; Wang, Z.; Yang, X. S.; Hu, S. M.; Zhang, J. Asia 2021 Posters, Article No. 9, 2021.
J. Photographic style transfer. The Visual Computer [14] Song, J. F.; Pang, K. Y.; Song, Y. Z.; Xiang,
Vol. 36, No. 2, 317–331, 2020. T.; Hospedales, T. M. Learning to sketch with
[2] Gatys, L. A.; Ecker, A. S.; Bethge, M. Image style shortcut cycle consistency. In: Proceedings of the
transfer using convolutional neural networks. In: IEEE/CVF Conference on Computer Vision and
Proceedings of the IEEE Conference on Computer Pattern Recognition, 801–810, 2018.
Vision and Pattern Recognition, 2414–2423, 2016. [15] Nakano, R. Neural painters: A learned differentiable
[3] Zhao, Y. R.; Deng, B.; Huang, J. Q.; Lu, H. T.; constraint for generating brushstroke paintings. In:
Hua, X. S. Stylized adversarial AutoEncoder for Proceedings of the 33rd Conference on Neural
image generation. In: Proceedings of the 25th ACM Information Processing Systems, 2019.
Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks 805

[16] Deussen, O.; Strothotte, T. Computer-generated pen- Transactions on Multimedia Vol. 21, No. 8, 1960–1970,
and-ink illustration of trees. In: Proceedings of the 2019.
27th Annual Conference on Computer Graphics and [28] Chu, W. T.; Wu, Y. L. Image style classification based
Interactive Techniques, 13–18, 2000. on learnt deep correlation features. IEEE Transactions
[17] Wilson, B.; Ma, K. L. Rendering complexity in on Multimedia Vol. 20, No. 9, 2491–2502, 2018.
computer-generated pen-and-ink illustrations. In: [29] Jia, B.; Fang, C.; Brandt, J.; Kim, B.; Manocha,
Proceedings of the 3rd International Symposium on D. PaintBot: A reinforcement learning approach
Non-photorealistic Animation and Rendering, 129–137, for natural media painting. arXiv preprint arXiv:
2004. 1904.02201, 2019.
[18] Deussen, O.; Hiller, S.; Van Overveld, C.; Strothotte, [30] Justin, R. Elements of art: Interpreting meaning
T. Floating points: A method for computing stipple through the language of visual cues. Ph.D. Thesis.
drawings. Computer Graphics Forum Vol. 19, No. 3, Stony Brook University, 2018.
41–50, 2000. [31] Radford, A.; Metz, L.; Chintala, S. Unsupervised
[19] Deussen, O.; Isenberg, T. Halftoning and stippling. In: representation learning with deep convolutional
Image and Video-based Artistic Stylisation. Compu- generative adversarial networks. In: Proceedings of the
tational Imaging and Vision, Vol. 42. Rosin, P.; International Conference on Learning Representations,
Collomosse, J. Eds. Springer London, 45–61, 2013. 1–16, 2016.
[20] Hertzmann, A. A survey of stroke-based rendering. [32] Donoho, D. L. De-noising by soft-thresholding. IEEE
IEEE Computer Graphics and Applications Vol. 23, Transactions on Information Theory Vol. 41, No. 3,
No. 4, 70–81, 2003. 613–627, 1995.
[33] Szegedy, C.; Liu, W.; Jia, Y. Q.; Sermanet, P.; Reed, S.;
[21] Mellor, J.; Park, E.; Ganin, Y.; Babuschkin, I.;
Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich,
Kulkarni, T.; Rosenbaum, D.; Ballard, A.; Weber,
A. Going deeper with convolutions. In: Proceedings of
T.; Vinyals, O.; Eslami, S. M. A. Unsupervised
the IEEE Conference on Computer Vision and Pattern
doodling and painting with improved SPIRAL. In:
Recognition, 1–9, 2015.
Proceedings of the Neural Information Processing
[34] He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep
Systems Workshops, 2019.
residual learning for image recognition. In: Proceedings
[22] Zou, Z. X.; Shi, T. Y.; Qiu, S.; Yuan, Y.; Shi,
of the IEEE Conference on Computer Vision and
Z. W. Stylized neural painting. In: Proceedings of
Pattern Recognition, 770–778, 2016.
the IEEE/CVF Conference on Computer Vision and
[35] Liu, Z. W.; Luo, P.; Wang, X. G.; Tang, X. O. Deep
Pattern Recognition, 15684–15693, 2021.
learning face attributes in the wild. In: Proceedings
[23] Liu, S. H.; Lin, T. W.; He, D. L.; Li, F.; Deng, R.
of the IEEE International Conference on Computer
F.; Li, X.; Ding, E.; Wang, H. Paint transformer:
Vision, 3730–3738, 2015.
Feed forward neural painting with stroke prediction.
[36] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.;
In: Proceedings of the IEEE/CVF International
Satheesh, S.; Ma, S. A.; Huang, Z.; Karpathy, A.;
Conference on Computer Vision, 6578–6587, 2021.
Khosla, A.; Bernstein, M.; et al. ImageNet large scale
[24] Gatys, L.; Ecker, A.; Bethge, M. A neural algorithm visual recognition challenge. International Journal of
of artistic style. Journal of Vision Vol. 16, No. 12, 326, Computer Vision Vol. 115, No. 3, 211–252, 2015.
2016. [37] Huang, H. Z.; Zhang, S. H.; Martin, R. R.; Hu, S. M.
[25] Jing, Y. C.; Yang, Y. Z.; Feng, Z. L.; Ye, J. W.; Yu, Y. Learning natural colors for image recoloring. Computer
Z.; Song, M. L. Neural style transfer: A review. IEEE Graphics Forum Vol. 33, No. 7, 299–308, 2014.
Transactions on Visualization and Computer Graphics [38] Tong, Z. Y.; Chen, X. H.; Ni, B. B.; Wang, X. H. Sketch
Vol. 26, No. 11, 3365–3385, 2020. generation with drawing process guided by vector flow
[26] Dutta, T.; Singh, A.; Biswas, S. StyleGuide: Zero-shot and grayscale. Proceedings of the AAAI Conference on
sketch-based image retrieval using style-guided image Artificial Intelligence Vol. 35, No. 1, 609–616, 2021.
generation. IEEE Transactions on Multimedia Vol. 23, [39] Liddell, T. M.; Kruschke, J. K. Analyzing ordinal data
2833–2842, 2021. with metric models: What could possibly go wrong?
[27] Xu, M. L.; Su, H.; Li, Y. F.; Li, X.; Liao, J.; Niu, J. Journal of Experimental Social Psychology Vol. 79, 328–
W.; Lv, P.; Zhou, B. Stylized aesthetic QR code. IEEE 348, 2018.
806 Q. Wang, C. Guo, H.-N. Dai, et al.

Qian Wang received her B.Eng. degree from 2021 to 2022. His current research interests include
in electronic information engineering Internet of Things, big data analytics, and blockchains. He
from Yangtze University, Jingzhou, has co-authored or co-edited 3 monographs and published
China, in 2012, and her M.Eng. degree more than 150 papers in top-tier journals and conferences.
in educational technology from Zhejiang
University of Technology, Hangzhou,
China, in 2016. She is currently pursuing Ping Li received his Ph.D. degree
a Ph.D. degree in computer technology in computer science and engineering
and applications in the School of Computer Science and from The Chinese University of Hong
Engineering, Macau University of Science and Technology. Kong, in 2013. He is currently an
She is also a research assistant with The Hong Kong assistant professor with The Hong
Polytechnic University. Her current research interests include Kong Polytechnic University. He has
image and video stylization, and AI drawing. published many top-tier scholarly
research papers and has one excellent
Cai Guo received his M.Eng. degree in research project reported worldwide by ACM TechNews.
software engineering from Guangdong His current research interests include artistic rendering and
University of Technology, Guangzhou, synthesis, stylization, colorization, and creative media.
China, in 2011. He is currently pur-
Open Access This article is licensed under a Creative
suing a Ph.D. degree in computer
Commons Attribution 4.0 International License, which
technology and applications in the School
permits use, sharing, adaptation, distribution and reproduc-
of Computer Science and Engineering,
tion in any medium or format, as long as you give appropriate
Macau University of Science and
credit to the original author(s) and the source, provide a link
Technology. He is also with Hanshan Normal University,
to the Creative Commons licence, and indicate if changes
Chaozhou, China. His current research interests include
were made.
deep learning, motion deblurring, and AI drawing.
The images or other third party material in this article are
included in the article’s Creative Commons licence, unless
Hong-Ning Dai received his Ph.D.
indicated otherwise in a credit line to the material. If material
degree in computer science and
is not included in the article’s Creative Commons licence and
engineering from The Chinese University
your intended use is not permitted by statutory regulation or
of Hong Kong, in 2008. He is currently
exceeds the permitted use, you will need to obtain permission
an associate professor in the Department
directly from the copyright holder.
of Computer Science, Hong Kong Baptist
To view a copy of this licence, visit http://
University, Hong Kong, China. He was
creativecommons.org/licenses/by/4.0/.
in the Faculty of Information Technology
at Macau University of Science and Technology as an Other papers from this open access journal are available
assistant/associate professor from 2010 to 2021, and the free of charge from http://www.springer.com/journal/41095.
Department of Computing and Decision Sciences, Lingnan To submit a manuscript, please go to https://www.
University, Hong Kong, China, as an associate professor editorialmanager.com/cvmj.
© The Author(s) 2023. corrected publication 2023. This work is published under
http://creativecommons.org/licenses/by/4.0/(the “License”). Notwithstanding the
ProQuest Terms and Conditions, you may use this content in accordance with
the terms of the License.

You might also like