Stroke-GAN Painter: Learning To Paint Artworks Using Stroke-Style Generative Adversarial Networks
Research Article
which can adapt to diverse artistic styles. In addition, they can create paintings without user intervention. Recently, researchers have typically used recurrent neural networks (RNNs) [8, 9] and reinforcement learning (RL) models [10–12] to generate stroke-by-stroke artworks. However, such unified frameworks lack flexibility to choose different styles of strokes, and some paintings generated in particular artistic styles (e.g., pastel-like paintings) lack fine details.

To address limitations of existing painting methods, we have built a new model leveraging advantages of both conventional SBR methods and learning-based methods, as an extension of Ref. [13]. We first describe a novel stroke generative adversarial network (Stroke-GAN) which learns different stroke styles from stroke-style datasets and generates diverse stroke styles with adjustable parameters (stroke path, stroke size and shape, stroke color and transparency). Based on Stroke-GAN, we then describe a neural-network painter which learns to create different styles of paintings in a stroke-by-stroke paradigm. We call the entire framework Stroke-GAN Painter. Stroke-GAN Painter learns to generate a painting in a coarse-to-fine manner, as shown in Fig. 1. In particular, the painting quality becomes better by repeating multiple learning-to-paint processes, during which it proceeds from being a novice to being a veteran. In contrast to existing methods, such as sketching [8, 14], doodling [10], Neural Painter (NP) [15], and MDRL Painter (MDRLP) [12], our painter can generate diverse artistic styles of painting with different types of strokes. Moreover, the images generated by our painter also preserve key content details (such as face details in portraits) well, as shown in Fig. 1.

Fig. 1 Learning-to-paint process of Stroke-GAN Painter. Rightmost column: input. Top to bottom: oil painting, watercolor painting, pastel painting. N: number of painting times, rather than number of strokes.

In summary, our work makes the following three main contributions.
• We propose a three-player-game model, Stroke-GAN, to generate strokes in an artistic style, which are fully adjustable in terms of stroke path, stroke size and shape, stroke color and transparency, thereby greatly improving the stylization of generated paintings. Stroke-GAN has two generators and one discriminator; the second generator learns to purify the stained-color strokes generated by the first generator. Consequently, the generated strokes have pure colors and textures closer to human artists' strokes.
• We design a painter based on Stroke-GAN; it does not need to be trained on any image datasets. It learns to create different artistic styles of paintings based on stylized strokes, e.g., oil-painting-like artworks, watercolor-like artworks, and pastel-like artworks, in a unified framework. Our generated paintings preserve content details of reference images well while offering stylistic diversity of paintings.
• Experimental results show that our painter generates diverse artistic styles of paintings for various types of image content, e.g., portraits, landscapes, animals, plants, and buildings. Paintings generated by Stroke-GAN Painter preserve content details well, such as eyes and teeth of portraits and fine details of buildings. User studies with a variety of participants demonstrate that paintings generated by Stroke-GAN Painter are preferred 77% of the time for pastel paintings, in comparison to another method, and 31% of the time for oil paintings, in comparison to three other methods, in terms of fidelity and stylization.

2 Related work

We briefly survey closely related studies on machine painting. We classify related studies into conventional stroke-based rendering (SBR), learning-based rendering, and image style transfer (IST).

2.1 Conventional stroke-based rendering

SBR methods essentially reconstruct images as non-photorealistic imagery using stroke-based models. Researchers have adopted SBR methods for different types of artworks, e.g., paintings [5–7], pen-and-ink drawings [16, 17], and stippled drawings [18, 19]. In particular, Ref. [5] introduces a semi-automatic painting method based on a greedy algorithm; it needs substantial human intervention to control stroke shapes and select stroke locations. The authors of Ref. [6] propose a style design for their painting method by using spline-brush strokes to render the image, but this method requires a high degree of painting skill of its users. The work in Ref. [7] proposes a method to segment an image into areas with similar levels of salience to control the strokes. However, most of these methods require substantial human intervention to choose key parameters, and are thus inconvenient for ordinary users. Moreover, SBR methods only generate a limited number of artistic styles [20], consequently leading to inflexibility.

2.2 Learning-based rendering

Recently, researchers have adopted learning-based methods to improve painting effects compared to traditional SBR methods. SPIRAL [10, 21] develops an adversarially-trained deep reinforcement learning (DRL) agent which learns structures of images. However, it does not reconstruct details of human portraits well. Sketch-RNN [8] constructs stroke-based drawings after training on human-drawn image datasets; it achieves excellent results for common objects, although it only makes simple-line paintings. StrokeNet [9] trains an agent to learn to paint based on a differentiable renderer and a recurrent neural network (RNN); it generalizes poorly to color images. Computers are now able to generate quite realistic oil paintings [12, 22, 23] and pastel-like paintings [15]. MDRLP [12] paints oil-painting-like pictures with a small number of strokes, although it only mimics one style in a unified framework and loses brush-stroke textures. Methods such as those in Refs. [22, 23] improve stroke textures by redesigning their stroke rendering. NP [15] has a similar design to our method since both generate strokes using a GAN-based module. However, our approach differs from theirs in several ways. Firstly, we use a three-player GAN model to generate adjustable strokes while NP only uses a normal GAN to generate fixed strokes. Secondly, our method can learn from any stroke dataset while NP must use strokes produced by the MyPaint program as the stroke dataset. Thirdly, NP requires a massive manually-labelled stroke dataset to generate one-by-one action strokes: its model requires the same strokes as the stroke dataset provided by MyPaint to ensure that the optimized strokes are those needed by the painting model. Our model requires no data labelling, since it can paint well by using the three-player Stroke-GAN to generate strokes similar to those in the stroke dataset.

2.3 Image style transfer

Image style transfer (IST) methods are popular in both research projects and industrial applications [24–28], although few methods have been applied to
stroke size and shape. The size of the brush tip varies with the values of S.
• Stroke transparency: We use T = {t0, t1} to control the transparency of the stroke. The transparency of the stroke varies with the values of T.
• Color: Primary colors denoted by V = {r, g, b} determine the color of the stroke.

3.2.2 Stroke datasets
The stroke designer module provides different stroke-style datasets for training Stroke-GAN. Each stroke dataset includes 200,000 stroke images, each with 64 × 64 resolution. We currently provide stroke datasets in three diverse styles, although others could be added: a watercolor-stroke dataset, an oil-painting-stroke dataset, and a pastel-stroke dataset. The pastel-stroke dataset is from Ref. [15] while both the oil-painting-stroke dataset and the watercolor-stroke dataset are provided by our stroke designer module. We model the stroke as its path and tip, using a Bézier curve (BC) to construct the stroke path. Circles with variable radii are used to simulate the stroke tip. Each stroke is made from 100 variable circles moving along the BC, which is given by
$$B(t) = \sum_{i=0}^{d} \binom{d}{i} (1 - t)^{d-i}\, t^{i} P_i, \quad t \in [0, 1] \tag{1}$$
where P_i denotes the control point with coordinates (x_i, y_i) ∈ P.
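To make Eq. (1) concrete, the following is a minimal sketch of how a stroke image can be rasterized by sweeping circles of varying radius along a Bézier curve, in the spirit of the stroke designer described above. This is an illustrative reconstruction, not the authors' released code: the function names, radii, and control points are our own choices, and only the 64 × 64 resolution and the 100 circles follow the text.

```python
import numpy as np
from math import comb

def bezier(points, t):
    """Evaluate Eq. (1): a degree-d Bezier curve at parameter t."""
    d = len(points) - 1
    return sum(comb(d, i) * (1 - t) ** (d - i) * t ** i * p
               for i, p in enumerate(points))

def render_stroke(points, r0, r1, size=64, steps=100):
    """Sweep `steps` circles along the curve, linearly interpolating
    the tip radius from r0 to r1; returns a grayscale stroke image."""
    img = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.mgrid[0:size, 0:size]
    for t in np.linspace(0.0, 1.0, steps):
        cx, cy = bezier(points, t)
        r = (1 - t) * r0 + t * r1
        img[(xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2] = 1.0
    return img

# Example: a quadratic (d = 2) stroke path on a 64 x 64 canvas.
ctrl = [np.array([8.0, 50.0]), np.array([32.0, 8.0]), np.array([56.0, 50.0])]
stroke = render_stroke(ctrl, r0=4.0, r1=1.5)
```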
3.3 Stroke-GAN

3.3.1 Basics
In order to improve the painting quality and the fidelity of the stroke, we have designed a three-player GAN model, Stroke-GAN, to generate stylized strokes for the painting process. Stroke-GAN is the core component which allows our model to use a unified framework to produce diverse artistic styles of paintings from a given image. Its third player, the coloring module, allows preservation of high-fidelity details. Paintings contain several elements: lines, textures, colors, and so on [30]. Our Stroke-GAN is designed to generate strokes containing these elements; the stroke style has a major influence on the painting style. As Stroke-GAN has an end-to-end training model, our painter framework has the flexibility to change the artistic style by choosing differently-trained Stroke-GAN models. We do not need to train the whole painter on any image datasets, since the model is designed to learn to paint from novice to veteran ability.

3.3.2 Motivation for Stroke-GAN
Our design for Stroke-GAN follows the Deep Convolutional Generative Adversarial Network (DCGAN) [31], which offers more stable training than conventional GANs. However, the strokes generated by a normal DCGAN (i.e., a two-player game) have stained colors. It is unreasonable to try to reproduce the painting process of human artists by rendering strokes in stained colors. To address this problem, we design a second generator (the third player) to re-color strokes with three color parameters to generate pure-colored strokes, which are better for painting (closer to the strokes painted by human artists).

3.3.3 Three-player game of Stroke-GAN
In order to obtain pure-colored strokes for reasonable painting strokes, we provide a coloring module as a second generator G′ immediately following the normal generator G (essentially a convolutional neural network), as shown in Fig. 3. Stroke-GAN consists of the normal generator G, the coloring module G′, and a discriminator D. Let h_t denote the one-dimensional (1-D) vector used as the random noise to generate strokes. Let h_t^s and h_t^c denote the 1-D vectors fed into G and G′, respectively, with h_t = [h_t^s, h_t^c]. After being fed with h_t^s and h_t^c, respectively, G learns to generate stroke images, albeit stained-colored, and G′ learns to generate pure-colored strokes. The discriminator first determines whether a stroke generated by G is valid or not, and then evaluates the stroke image generated by G′. Both G and G′ are updated adversarially against D. Since the randomness of DCGAN leads to unexpected stroke colors, we build the second generator G′ (the coloring module) to control the stroke color. In particular, the vector h_t consists of 50 elements.

Fig. 3 Structure of Stroke-GAN. The stroke designer module provides stroke datasets. Stroke-GAN consists of a normal generator G, a coloring module G′, and a discriminator D.
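For orientation, the following PyTorch skeleton is one plausible way to wire up the three players described above. The layer configurations are illustrative assumptions (only the 47-element h_t^s, the 3-element h_t^c, and the 64 × 64 stroke size come from the text), and the class names G, GPrime, and D are our own.

```python
import torch
import torch.nn as nn

class G(nn.Module):          # normal generator: h_t^s -> stained stroke
    def __init__(self, z_dim=47):
        super().__init__()
        self.net = nn.Sequential(                       # DCGAN-style upsampling
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())  # 64 x 64 RGB stroke

    def forward(self, h_s):                             # h_s: (B, 47)
        return self.net(h_s.view(h_s.size(0), -1, 1, 1))

class GPrime(nn.Module):     # coloring module: (stroke, h_t^c) -> pure-colored stroke
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(32, 3, 3, 1, 1), nn.Tanh())

    def forward(self, stroke, h_c):                     # h_c: (B, 3) color code
        color = h_c.view(-1, 3, 1, 1).expand_as(stroke) # broadcast color over canvas
        return self.net(torch.cat([stroke, color], dim=1))

class D(nn.Module):          # discriminator shared by both generators
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 1, 16, 1, 0), nn.Sigmoid())  # per-image real/fake score

    def forward(self, img):
        return self.net(img).view(-1)
```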
The normal generator G is fed the variable h_t^s with 47 elements and outputs the stroke image G(h_t^s), while h_t^c is used to control the stroke color {r, g, b} fed into G′. Stroke-GAN improves the original DCGAN to purify the colors of strokes (see Fig. 3). The coloring module is fed h_t^c together with the stroke G(h_t^s) to learn to purify the stroke.

Since the stroke image just contains the background and the stroke, we can use threshold segmentation [32] to separate the stroke region from the background. Let P = [p_1, ..., p_c] denote the matrix of pixels in the stroke image, where c is the number of channels of the image. The average of P is denoted by P̄. Since the pixel values of the background in the stroke image generated by G are unknown, we should train G′ to find the threshold of the background pixels. We denote the threshold by γ. We use threshold segmentation to eliminate the background region from the stroke image; the result, denoted by P̃, is obtained as Eq. (2):
$$\widetilde{P} = \gamma - P \tag{2}$$
The values of the elements in P̃ are 0 or close to 0 in the background region. We use min(·) and max(·) to calculate the minimum and maximum values in P̃, respectively. We determine the stroke region and denote the result by S_P, which is obtained by
$$S_P = \frac{\widetilde{P} - \min(\widetilde{P})}{\max(\widetilde{P}) - \min(\widetilde{P})} \tag{3}$$
In particular, the values of the elements in S_P close to 1 are stroke pixels, and values close to 0 represent background pixels. We then use h_t^c to recolor the stroke and obtain the pure stroke image P_S as Eq. (4):
$$P_S = [S_P, S_P, S_P] \cdot h_t^c \tag{4}$$
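A minimal numpy sketch of Eqs. (2)–(4), under our own naming and with a small epsilon added for numerical safety, is:

```python
import numpy as np

def recolor_stroke(stroke, gamma, h_c):
    """stroke: (H, W) image from G; gamma: background threshold learned by G';
    h_c: (3,) RGB color code. Returns the pure-colored stroke (H, W, 3)."""
    p_tilde = gamma - stroke                            # Eq. (2): background -> ~0
    s_p = (p_tilde - p_tilde.min()) / (p_tilde.max() - p_tilde.min() + 1e-8)  # Eq. (3)
    return np.stack([s_p, s_p, s_p], axis=-1) * h_c     # Eq. (4): tint stroke region

# Example with a toy 64 x 64 stroke and a red color code.
stroke = np.random.rand(64, 64).astype(np.float32)
pure = recolor_stroke(stroke, gamma=0.9, h_c=np.array([1.0, 0.1, 0.1]))
```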
The coloring module endows our painter with more creativity in painting, e.g., outputting paintings with different colors even from the same input image. The reason is that the coloring module in Stroke-GAN takes in h_t^c directly to recolor the stroke image by learning the color information of the input. This is why Stroke-GAN can be trained without the data-labelling restriction of generating an image identical to a labeled image. Even though Stroke-GAN generates strokes different from the dataset, the coloring module learns color information by analyzing the input image directly so as to ensure that the rendered canvas is close to the input image. The design of the coloring module plays an important role in ensuring the GAN generates realistic human strokes. Moreover, this design also allows Stroke-GAN to easily learn different styles of strokes. Therefore, while Stroke-GAN utilizes a unified framework, it can generate different styles of paintings.

3.3.4 Training of Stroke-GAN
We train Stroke-GAN to acquire different stroke models to endow our painter with the ability to paint with different stroke styles. Figure 3 depicts the structure of Stroke-GAN. Stroke-GAN is trained with mini-batches of 128 images for each stroke dataset containing 200,000 images. We use the whole dataset as the training set since the DCGAN model has no need for validation. When training Stroke-GAN, we directly feed the generator with a 1-D vector (h_t) to generate a stroke image. The initial values of the elements in h_t are random. During training, the generator learns to produce images close to the dataset by optimizing the values of these elements. We use the Adam optimizer to train the Stroke-GAN model, with a learning rate of 0.0002 and beta values of 0.5 and 0.999. Each pair comprising a generated stroke image and a real stroke image is then fed into the discriminator. The discriminator next determines whether the pair of strokes is valid or not: if the generated stroke image is similar to the real stroke image, the pair is valid, and invalid otherwise.

The training procedure for Stroke-GAN is given in Algorithm 1. We essentially train the normal generator G, the coloring module G′, and the discriminator D by back-propagating the loss, to update the parameters θ_g, θ_c, and θ_d, respectively. In particular, the discriminator D has the loss ℓ_d, the generator G has the loss ℓ_g, and the generator G′ has the loss ℓ_c. We use binary cross entropy (BCE), denoted by ℓ(z, y), to measure the loss of the input sample z on conditional variable y, where T is the number of samples, as Eq. (5):
$$\ell(z, y) = -\frac{1}{T} \sum_{t=1}^{T} \big[ y_t \log(z_t) + (1 - y_t) \log(1 - z_t) \big] \tag{5}$$
When training the discriminator on real stroke images (denoted by x), y = 1, so using Eq. (5), the loss ℓ_dr for real strokes is
$$\ell_{dr} = \ell(D(x), 1) \tag{6}$$
When training the discriminator on fake stroke images generated by G, we then have y = 0, so the loss for fake strokes is ℓ(D(G(h_t^s)), 0).
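Algorithm 1 is not reproduced here, but a schematic training step consistent with the description above (BCE losses; Adam with learning rate 0.0002 and betas 0.5 and 0.999) might look as follows, reusing the hypothetical G, GPrime, and D modules sketched earlier; the update order is our own simplification.

```python
import torch
import torch.nn as nn

g, g_prime, d = G(), GPrime(), D()     # hypothetical modules sketched above
bce = nn.BCELoss()                     # Eq. (5)
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_c = torch.optim.Adam(g_prime.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):                  # real: (B, 3, 64, 64) strokes from the dataset
    B = real.size(0)
    h_s, h_c = torch.randn(B, 47), torch.rand(B, 3)   # h_t = [h_t^s, h_t^c]
    ones, zeros = torch.ones(B), torch.zeros(B)

    # Generators: both try to make D output "valid" (updates theta_g, theta_c).
    fake = g(h_s)
    pure = g_prime(fake, h_c)
    loss_g, loss_c = bce(d(fake), ones), bce(d(pure), ones)
    opt_g.zero_grad(); opt_c.zero_grad()
    (loss_g + loss_c).backward()
    opt_g.step(); opt_c.step()

    # Discriminator: real strokes are valid, generated ones are not (theta_d).
    loss_d = (bce(d(real), ones) + bce(d(fake.detach()), zeros)
              + bce(d(pure.detach()), zeros))          # Eq. (6) plus fake-stroke terms
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_c.item(), loss_d.item()
```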
Fig. 4 Training Stroke-GAN. (a) Stroke samples generated by two comparative methods: Stroke-GAN with coloring module (W/ CM) and without coloring module (W/O CM). (b) Loss of Stroke-GAN versus iterations for three different stroke datasets.
a grid order. Stroke-GAN runs once in one painting process. The mapping f: S → H uses the transition function C_{n+1} = f(C_n, H_{n+1}). Stroke selection first outputs an initial set H_1; each element in H_1 denotes an effect for rendering a stroke at a certain position on the canvas. The rendering module then optimizes stroke selection via the stroke-selection-optimization algorithm and generates a new set of elements H_n, which are used to render strokes on the canvas to get C_n. We continue the coarse-to-fine process from C_n to C_{n+1}, where C_{n+1} denotes a finer-grained painting (with optimized strokes). Continuing the above process, we finally obtain the best painting C_N.

3.4.3 Stroke selection optimization
A rendering module is used to optimize stroke selection; it consists of an FEN and the optimization algorithm. During the painting process, the rendering module first takes in both the reference image and the painted canvas to compute the distance between them, and then optimizes the stroke selection. Stroke selection picks well-behaved strokes generated by the Stroke-GAN. This ensures that the state of the canvas C_{n+1} is better than C_n. This procedure continues until one painting process is complete, and the painting C_N is generated after N painting processes. The learning-to-paint process works in a coarse-to-fine manner.

It is a key task to optimize the stroke generated by the Stroke-GAN in the stroke selection step. The rendering module first extracts the feature maps of U_0 and C_n, and then processes the difference of the input and the canvas by computing their ℓ1-distance. We denote the extracted feature maps of the input reference image U_0 and those of the painted canvas C_n by F(U_0) = {I_j | j = 1, ..., M} and F(C_n) = {c_j | j = 1, ..., M}, respectively, where M is the number of features extracted by the neural rendering module. In particular, I_j and c_j denote the features of U_0 and those of a certain state of the canvas C_n, respectively. We calculate the ℓ1-distance loss function L(U_0, C_n) as
$$L(U_0, C_n) = \frac{1}{M} \sum_{j=1}^{M} |I_j - c_j| \tag{13}$$
This essentially computes the distance between the features of the reference image U_0 and those of canvas C_n. The stroke selection algorithm optimizes the generated stroke by resetting the values of the elements in h_t based on the ℓ1-distance loss function. The function f(C_n, H_{n+1}) is computed by the backpropagation algorithm for L(U_0, C_n). Each C_n (the state of the canvas) is rendered by T strokes, and each stroke is produced according to the values in h_t. Therefore, the values of the elements in each h_t can be updated to the ones needed by backpropagation for L(U_0, C_n).
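As a simplified illustration of how Eq. (13) drives the update of h_t by backpropagation, the sketch below treats the batch of stroke vectors as learnable parameters and descends the feature-space ℓ1 loss; render stands in for the frozen Stroke-GAN plus canvas compositing, fen for the feature-extraction network, and the shapes and step counts are our own assumptions.

```python
import torch

def optimize_strokes(h_t, render, fen, u0, steps=50, lr=0.01):
    """h_t: (T, 50) stroke vectors rendering one canvas state C_n.
    render(h_t) -> canvas image; fen(img) -> stacked feature maps."""
    h_t = h_t.clone().requires_grad_(True)
    opt = torch.optim.Adam([h_t], lr=lr)
    feats_ref = fen(u0).detach()                 # features I_j of U_0, fixed
    for _ in range(steps):
        canvas = render(h_t)                     # C_n rendered from the T strokes
        loss = (feats_ref - fen(canvas)).abs().mean()   # Eq. (13), mean over features
        opt.zero_grad(); loss.backward(); opt.step()    # update elements of h_t
    return h_t.detach()
```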
3.5 Style reconstruction

3.5.1 Basics
As explained in Section 3.3, Stroke-GAN is the core component for generating the style. The rendering module also contributes to the stylization of the whole painting. In particular, we use the FEN in the rendering module to extract features of the reference image as well as the painted canvas. This process loses some content of the original image but mimics paintings close to the original image. This step can make the generated painting differ from, yet remain similar to, the reference image, providing artistic mimesis or "realism". We reconstruct the painting style by using Stroke-GAN and the FEN in the rendering module. In particular, Stroke-GAN endows the painter with diverse styles of strokes and the FEN in the rendering module creates the artistic style.

3.5.2 Artistic style
Since Ref. [15] indicates that the content objective preserves only high-level features while the parameterization can fill in details, we also only take high-level features as inputs. We adopt two of the most representative deep neural networks, GoogleNet [33] and residual nets (ResNets) [34], for the digital rendering module. The design of using the FEN to process the original reference image, rather than directly using the original image, endows our painter with artistic creativity while retaining a high similarity to the reference image. We focus on realism in oil paintings, so we use ResNet to build the FEN in the rendering module. ResNets have high accuracy when extracting information [34], so can keep high fidelity in the extracted features. On the other hand, features extracted by GoogleNet are relatively sparse, which may offer more space for our painterly creativity. Therefore, we use GoogleNet for watercolor and pastel paintings.
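For illustration, a feature-extraction network of this kind can be obtained by truncating a pretrained backbone from torchvision; the cut points below are our own choices, not necessarily the layers used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_fen(style: str) -> nn.Module:
    """Frozen feature extractor: ResNet for oil paintings,
    GoogleNet for watercolor and pastel (cut points are illustrative)."""
    if style == "oil":
        backbone = models.resnet50(weights=None)
        fen = nn.Sequential(*list(backbone.children())[:-2])  # drop pool + fc
    else:
        backbone = models.googlenet(weights=None, aux_logits=False)
        fen = nn.Sequential(*list(backbone.children())[:-3])  # keep conv trunk
    for p in fen.parameters():
        p.requires_grad_(False)      # the FEN is fixed during painting
    return fen.eval()

feats = make_fen("oil")(torch.randn(1, 3, 224, 224))  # high-level feature maps
```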
3.5.3 Stroke style
Different strokes can produce different styles of artwork even when used by the same human artist. We provide three kinds of strokes to endow our painter with more creativity. Figure 6 shows different stroke styles for watercolors, oil-painting, and pastels. The watercolor strokes have smooth, soft contours and the brush paths are simple and pure. In contrast, oil-painting strokes have sharp contours and volatile paths. When stacking multiple strokes on the canvas, the oil-painting texture can be recognised easily due to these characteristics. The pastel strokes seem to be accumulated from many uneven points (mimicking the granular textures of pastel paintings). These different styles of strokes cause the canvas to show different styles of painting. After utilizing different FENs to process the reference image in conjunction with different types of strokes, we obtain different styles of paintings.

Fig. 6 Samples from three datasets used to mimic the styles of watercolor, oil-painting, and pastel strokes.

4 Experimental results

We have evaluated our approach with several experiments. We first describe the implementation. Then, we evaluate three styles of paintings generated by our painter and compare the output of the proposed Stroke-GAN Painter to state-of-the-art methods. Finally, we study alternatives to understand how our Stroke-GAN Painter generates different styles of paintings by fine-tuning the design.

4.1 Implementation

Our experiments were conducted on a workstation with an i7-7700k CPU and an NVIDIA Titan RTX GPU. We evaluated our painter on three image datasets: CelebA [35], ImageNet [36], and real-world photos. These images cover various types of content including portraits, landscapes, animals, plants, and buildings. All images used in the experiments are labelled by "Img No.". Style 1 (Stroke-GAN model 1) and the rendering module using GoogleNet were used to generate watercolor paintings, Style 2 and ResNet were used to generate oil paintings, and Style 3 and GoogleNet were used to generate pastel paintings.

4.2 Comparison of stroke styles

We compare paintings generated by different styles of strokes on CelebA, ImageNet, and real-world photos. In Figs. 7 and 8, the top row shows the inputs, and the images in successive rows show oil painting, watercolor painting, and pastel painting results. Img 39 and Img 12 were randomly selected from CelebA [35], Img 32 and Img 35 were randomly selected from ImageNet [36], and the others are real-world photos. Figures 7 and 8 show that all generated paintings exhibit different styles in contrast to the reference images. In particular, the oil-painting-stroke paintings in the second row well preserve textures, lines, and color features, consequently capturing fine details in the reference images. Meanwhile, we observe stroke textures from oil paintings, demonstrating oil-painting stylization. The watercolor-stroke paintings in the third row exhibit a style between pastel paintings and oil paintings; this style is good at expressing details for scenery and building images (see Imgs 26, 15, 11, 21, and 23). The pastel-stroke paintings in the last row preserve some textures and lines while losing some color features.

Figure 9 plots ℓ1-distances between the generated images and the reference images. Specifically, Fig. 9(a) plots the ℓ1-distance versus painting times for the three stroke styles. We observe that all three styles nearly converge after 300 painting times, although the oil-painting stroke style converges faster than the other two. The pastel-stroke style paintings converge the slowest since they lose more content detail. Figure 9(b) compares convergence for three types of image datasets using the same watercolor-stroke style. It takes 200 painting iterations to recreate the images in CelebA, 300 iterations for images from ImageNet, and 400 for real-world photos. The portrait images of CelebA are relatively easier to learn than those of ImageNet and real-world photos as they have fewer features.

4.3 Comparison to prior methods

4.3.1 Methods
We further evaluate our painter by comparing it to various state-of-the-art learning-based methods, including Neural Painter (NP) [15], MDRL Painter (MDRLP) [12], SNP [22], and PaintTF [23]; these outperform other learning methods and traditional
Fig. 7 Three artistic styles of paintings generated by Stroke-GAN Painter using images from CelebA [35], ImageNet [36], and a real-world
photo (Img 15).
Fig. 8 Three artistic styles of paintings generated by Stroke-GAN Painter using images from real-world photos.
Fig. 11 Comparison results: oil-painting-stroke paintings generated by our painter, MDRLP [12], SNP [22], and PaintTF [23].
User Study I. User Study I used two questionnaires, the first to compare pastel-style artworks, and the second to compare oil-painting-style artworks. Since User Study I was designed to evaluate people's preferences for artworks generated by different methods, we did not emphasize the backgrounds of users in the comparison, although they came from both artistic and non-artistic backgrounds.

In the first questionnaire, we arbitrarily chose 20 images from CelebA [35] (3 images), ImageNet [36] (6 images), and real-world photos (11 photos); the images included various types of content including landscapes, buildings, animals, and portraits. The first group of participants had various backgrounds (10% with artistic training), age groups (17–50), and genders (44 females, 43 males). We evaluated pastel-stroke paintings generated by our painter and NP [15]. For each reference image (images numbered 1–20 in Fig. 12), we obtained a pair of images painted by our painter and NP. We evaluated the user preference for and stylization of generated images: we asked participants to choose which image better represented a pastel-stroke painting and which they preferred in each pair of images. Figure 12(a) depicts the results. Most users picked the results created by Stroke-GAN Painter as their preference, for all pairs of paintings; our paintings gained 77% of all votes on average. These high votes imply that paintings from Stroke-GAN Painter present the pastel-painting style better than the compared ones.

Similarly, the second questionnaire evaluated the oil-painting effect, comparing our painter, MDRLP [12], SNP [22], and PaintTF [23]. The second group of participants was also chosen to have diverse backgrounds, ages, and genders (40 females, 32 males). We also selected 20 images from CelebA, ImageNet, and real-world photos to cover different content types (images numbered 21–40 in Fig. 12). We asked participants to choose which image is closer to an oil painting and which they preferred in each set of images. Again, more participants (31% among four methods) voted for paintings generated by our method as presenting better oil-painting style than those from other methods.

User Study II. We used the second user study to compare paintings generated by our Stroke-GAN Painter and other methods in terms of content detail and stroke textures. We again used two questionnaires (on a Likert scale [39]) for pastel paintings and oil paintings, separately. The participants were divided into two groups: users with and without an artistic background.
Fig. 12 User Study I. (a) Pastel-stroke-painting results generated by Stroke-GAN Painter and NP [15]. (b) Oil-painting results created by
Stroke-GAN Painter, MDRLP [12], SNP [22], and PaintTF [23]. Vertical axis: percentage of users’ preferences for an image. Horizontal axis:
numbered image pairs.
All participants were chosen from various age groups (17–50) and genders (20 females, 5 males) for each questionnaire. We compared the average score (µ), variance (σ), and the 95% confidence interval for paintings generated by each method. Tables 1 and 2 give results for the two user groups, respectively.

In Table 1 (users without an artistic background), the content details of paintings generated by MDRLP [12], PaintTF [23], and SNP [22] gained low scores (lower than 3). One reason lies in the fact that their paintings lose too many details, and the stroke textures generated by MDRLP [12] are also difficult to recognize for most users. In contrast, the paintings generated by Stroke-GAN Painter have better scores for both content details and stroke textures. Similarly, in Table 2 (users with an artistic background), our paintings gained the best scores for content details.

Table 2 User Study II. Scores of paintings generated by our method and other methods for content details and stroke textures, with artistic background. CI = confidence interval. LB = lower bound. UB = upper bound

Item      Method   µ      σ      95% CI (LB, UB)
Contents  NP       3.446  0.225  (3.258, 3.635)
          Ours     3.775  0.229  (3.584, 3.967)
          MDRLP    2.717  0.446  (2.459, 2.974)
          PaintTF  2.397  0.559  (2.074, 2.719)
          SNP      2.237  0.567  (1.909, 2.564)
          Ours     3.803  0.313  (3.623, 3.984)
Strokes   NP       3.594  0.286  (3.355, 3.833)
          Ours     3.956  0.221  (3.772, 4.141)
          MDRLP    2.583  0.398  (2.353, 2.813)
          PaintTF  3.668  0.214  (3.544, 3.791)
          SNP      3.682  0.153  (3.594, 3.770)
          Ours     3.604  0.239  (3.466, 3.742)
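As a sanity check on how the µ, σ, and CI columns relate, the following snippet, which assumes σ is the sample variance, a normal approximation, and the 25 participants per questionnaire stated above, approximately reproduces the reported interval for the first row of Table 2.

```python
import math

mu, var, n = 3.446, 0.225, 25                 # NP "Contents" row; n = 20 + 5 participants
half = 1.96 * math.sqrt(var) / math.sqrt(n)   # normal-approximation 95% half-width
print(f"95% CI ~ [{mu - half:.3f}, {mu + half:.3f}]")  # ~[3.260, 3.632] vs. reported (3.258, 3.635)
```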
For stroke textures, the scores given for SNP [22] and PaintTF [23] are higher than those for our method, although our score of 3.604 is close to these two methods.

In summary, both pastel and oil paintings from our method contain more detailed contents than do the compared methods. For non-artistic users, the stroke textures are not well presented by most methods, while artistic users think that PaintTF [23], SNP [22], and our methods (both pastel and oil-painting) present stroke textures well.

User Study III. In order to evaluate the aesthetics of the results, we further conducted User Study III to evaluate the color tone and aesthetic beauty of the output paintings. We again used two user-study questionnaires (using a Likert scale [39]) for pastel-stroke paintings and oil-painting-stroke paintings, separately. The input images were those used in User Study II. The participants were divided into two groups: users with an artistic background (15) and users without an artistic background (19). The participants came from various age groups (21–40), with 18 females and 16 males, for each questionnaire. We compare the average score (µ), variance (σ), and the 95% confidence interval for paintings generated by each method. Tables 3 and 4 give results for the two user groups.

Table 3 User Study III. Scores of paintings generated by our method and SOTA methods for color tone and aesthetic beauty, without artistic background. CI = confidence interval. LB = lower bound. UB = upper bound

Item        Method   µ      σ      95% CI (LB, UB)
Color tone  NP       3.703  0.316  (3.535, 3.870)
            Ours     3.688  0.306  (3.526, 3.851)
            MDRLP    3.211  0.274  (3.071, 3.350)
            PaintTF  2.934  0.363  (2.749, 3.119)
            SNP      2.845  0.346  (2.588, 3.101)
            Ours     3.708  0.190  (3.611, 3.805)
Beauty      NP       3.782  0.386  (3.577, 3.986)
            Ours     3.833  0.293  (3.678, 3.988)
            MDRLP    2.963  0.327  (2.796, 3.130)
            PaintTF  2.537  0.380  (2.343, 2.731)
            SNP      2.484  0.356  (2.220, 2.748)
            Ours     3.508  0.188  (3.412, 3.604)

Table 4 User Study III. Scores of paintings generated by our method and SOTA methods for color tone and aesthetic beauty, with artistic background. CI = confidence interval. LB = lower bound. UB = upper bound

Item        Method   µ      σ      95% CI (LB, UB)
Color tone  NP       3.857  0.243  (3.654, 4.060)
            Ours     3.913  0.250  (3.704, 4.121)
            MDRLP    3.823  0.238  (3.686, 3.961)
            PaintTF  3.607  0.290  (3.439, 3.774)
            SNP      3.743  0.237  (3.606, 3.880)
            Ours     4.217  0.357  (4.010, 4.423)
Beauty      NP       3.707  0.346  (3.418, 3.996)
            Ours     3.753  0.218  (3.571, 3.935)
            MDRLP    3.550  0.327  (3.361, 3.739)
            PaintTF  3.267  0.555  (2.946, 3.587)
            SNP      3.470  0.454  (3.208, 3.732)
            Ours     3.937  0.400  (3.706, 4.167)

In Table 3, the scores for color tone and aesthetic beauty given by users without an artistic background for pastel paintings (NP [15] and our method) are higher than 3. On the other hand, oil paintings obtained lower scores. In particular, paintings generated by PaintTF [23] and SNP [22] gained much lower scores than our method and MDRLP [12] for aesthetic beauty. As the scores for content in Table 1 (given by users without an artistic background) are also lower than 3, we see that the paintings generated by PaintTF [23] and SNP [22] lose too much content detail, so users without an artistic background give them low scores for beauty. Although users without an artistic background did not give high scores, the rankings of the compared methods for aesthetic beauty and color tone remain the same.

On the other hand, users with an artistic background gave higher scores than users without an artistic background. In Table 4, paintings generated by all compared methods obtain scores higher than 3. In particular, our paintings score 4.217 for color tone and 3.937 for aesthetic beauty. Comparing Table 4 with Table 3, the scores of our method, MDRLP [12], PaintTF [23], and SNP [22] increase much more than those of NP [15]: evaluations of the pastel-stroke paintings generated by NP differ little between users with and without artistic backgrounds. However, the difference between these two kinds of users is obvious when evaluating the oil-painting-style paintings. For example, for aesthetic beauty, users without an artistic background give higher scores for paintings by PaintTF than by SNP, while users with an artistic background give lower scores. Overall, however, all users give consistent evaluations.
In particular, for both color tone and aesthetic beauty, our method is better than the others, while PaintTF [23] and SNP [22] rank lowest. Considering Tables 1–4 overall, our method scores the highest among the state-of-the-art methods. NP performs well for both content and color tone. MDRLP performs well for color tone. PaintTF and SNP perform well for stroke texture.

4.4 Alternatives

In this section, we investigate how our painter generates different styles of paintings when using alternative FENs and strokes.

4.4.1 Feature-extraction network
Recall that we chose GoogleNet as the FEN for watercolor and pastel-stroke images, and ResNet as the FEN for oil-painting images. We consider a new FEN combining GoogleNet and ResNet, namely G+R, to generate paintings. We denote the ℓ1-distance of features extracted by GoogleNet and the ℓ1-distance of features extracted by ResNet by L_G(U_0, C_n) and L_R(U_0, C_n), respectively: see Eq. (13). The loss function of the G+R network can be written as
$$L(U_0, C_n) = 0.5\, L_G(U_0, C_n) + 0.5\, L_R(U_0, C_n) \tag{14}$$
We only perform backpropagation for the final L(U_0, C_n). Figure 13 compares results generated by the three different FENs and four stroke styles, Styles 1–4. Style 4 is a new stroke designed with a hollow circle and a cubic Bézier curve; its stroke dataset is generated using a method similar to that for Style 1.
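A minimal sketch of the combined objective in Eq. (14), reusing the hypothetical feature extractors from the earlier snippets (each assumed to map an image to a feature tensor), is:

```python
def feat_l1(fen, u0, canvas):
    """Eq. (13)-style mean l1-distance between FEN features of U_0 and C_n."""
    return (fen(u0) - fen(canvas)).abs().mean()

def loss_g_plus_r(u0, canvas, fen_google, fen_resnet):
    """Eq. (14): equal-weight sum of the two feature distances; backpropagation
    is performed only on this final combined loss."""
    return 0.5 * feat_l1(fen_google, u0, canvas) + 0.5 * feat_l1(fen_resnet, u0, canvas)
```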
Fig. 13 Paintings by various feature-extracting networks (FENs) and different stroke styles. Each row contains images created by models with
one style of strokes and an FEN (GoogleNet, GoogleNet+ResNet, or ResNet), with input images in the first column.
We see from Fig. 13 that the model using GoogleNet+ResNet as the FEN can generate a style of artwork combining the characteristics of pastel and oil-painting styles. Interestingly, the FEN essentially affects the artistic style of a painting, and the stroke style affects the painting style of the painting. Therefore, we confidently infer that various combinations of FENs and stroke styles can create a diversity of artistic styles of paintings.

4.4.2 Number of strokes
We next investigate the impact of the number of strokes. In particular, the canvas is divided into h × w cells. We then generate images with various numbers of strokes in each cell. Figure 14 depicts paintings generated using different numbers of strokes with the same FEN. The image generated using 2 strokes looks colorful and artistically creative, although it also loses much content detail. Larger numbers of strokes (8 or 10) lead to an exquisite image, much closer to the reference image than images generated by fewer strokes (2 or 5). Having an adjustable number of strokes provides users with artistic choices.

5 Conclusions and future work

In this paper, we have presented a stroke-based image rendering approach to mimic the human painting process and generate different styles of paintings. In particular, we designed Stroke-GAN to generate various styles of strokes. We model the painting process as a stroke-state-optimization process, which can be optimized by a deep convolutional neural network. Our artistic painter can generate different styles of paintings in a coarse-to-fine fashion, like a human painter. User studies of the paintings generated by Stroke-GAN Painter and competing methods demonstrate that our painter gained the most votes for closeness to pastel paintings and oil paintings. Moreover, images generated by our painter also preserve more content detail than existing methods.

A deep learning algorithm and Stroke-GAN are used to decompose the reference image into a grid of cells to allow rendering of a sequence of strokes to achieve the stroke-by-stroke effect. We designed Stroke-GAN to generate style strokes by learning from stroke datasets. Our Stroke-GAN can learn any stroke style, providing the painting agent with creativity and flexibility. Although our generated paintings do not compare with masters' artworks, we have made an important step for learning-based AI painting with creative and flexible artistic styles. Meanwhile, there is much that can be done to improve the quality of the output, and other artistic styles not mentioned in this paper could also be emulated.
Fig. 14 Paintings generated using 2, 5, 8, or 10 strokes in each cell. Center: reference image.
In future, we can combine the advantages of conventional stroke-based methods with learning-based methods to improve painting quality. On the other hand, we hope to develop new style-transfer methods in a stroke-by-stroke manner to enrich the artistic styles of AI painting.

Availability of data and materials

The data and materials generated during the study are available from the corresponding authors on reasonable request.

Author contributions

Q. Wang designed the study, performed experiments, and wrote the manuscript. P. Li and H.-N. Dai helped to design the study and experiments and supervised the project. C. Guo provided comments and feedback on the study and the results. All authors reviewed the manuscript.

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful suggestions and comments. This work was supported in part by the Hong Kong Institute of Business Studies (HKIBS) Research Seed Fund under Grant HKIBS RSF-212-004, and in part by The Hong Kong Polytechnic University under Grant P0030419, Grant P0030929, and Grant P0035358.

Declaration of competing interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

[1] Wang, L.; Wang, Z.; Yang, X. S.; Hu, S. M.; Zhang, J. J. Photographic style transfer. The Visual Computer Vol. 36, No. 2, 317–331, 2020.
[2] Gatys, L. A.; Ecker, A. S.; Bethge, M. Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2414–2423, 2016.
[3] Zhao, Y. R.; Deng, B.; Huang, J. Q.; Lu, H. T.; Hua, X. S. Stylized adversarial AutoEncoder for image generation. In: Proceedings of the 25th ACM International Conference on Multimedia, 244–251, 2017.
[4] Zhou, W. Y.; Yang, G. W.; Hu, S. M. Jittor-GAN: A fast-training generative adversarial network model zoo based on Jittor. Computational Visual Media Vol. 7, No. 1, 153–157, 2021.
[5] Haeberli, P. Paint by numbers: Abstract image representations. ACM SIGGRAPH Computer Graphics Vol. 24, No. 4, 207–214, 1990.
[6] Hertzmann, A. Painterly rendering with curved brush strokes of multiple sizes. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 453–460, 1998.
[7] Lee, H.; Seo, S.; Ryoo, S.; Ahn, K.; Yoon, K. A multi-level depiction method for painterly rendering based on visual perception cue. Multimedia Tools and Applications Vol. 64, No. 2, 277–292, 2013.
[8] Ha, D.; Eck, D. A neural representation of sketch drawings. In: Proceedings of the International Conference on Learning Representations, 1–16, 2018.
[9] Zheng, N.; Jiang, Y.; Huang, D. StrokeNet: A neural painting environment. In: Proceedings of the International Conference on Learning Representations, 1–12, 2019.
[10] Ganin, Y.; Kulkarni, T.; Babuschkin, I.; Eslami, S. M. A.; Vinyals, O. Synthesizing programs for images using reinforced adversarial learning. In: Proceedings of the 35th International Conference on Machine Learning, 1666–1675, 2018.
[11] Xie, N.; Hachiya, H.; Sugiyama, M. Artist agent: A reinforcement learning approach to automatic stroke generation in oriental ink painting. IEICE Transactions on Information and Systems Vol. E96.D, No. 5, 1134–1144, 2013.
[12] Huang, Z. W.; Zhou, S. C.; Heng, W. Learning to paint with model-based deep reinforcement learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 8708–8717, 2019.
[13] Wang, Q.; Guo, C.; Dai, H. N.; Li, P. Self-stylized neural painter. In: Proceedings of the SIGGRAPH Asia 2021 Posters, Article No. 9, 2021.
[14] Song, J. F.; Pang, K. Y.; Song, Y. Z.; Xiang, T.; Hospedales, T. M. Learning to sketch with shortcut cycle consistency. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 801–810, 2018.
[15] Nakano, R. Neural painters: A learned differentiable constraint for generating brushstroke paintings. In: Proceedings of the 33rd Conference on Neural Information Processing Systems, 2019.
[16] Deussen, O.; Strothotte, T. Computer-generated pen-and-ink illustration of trees. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 13–18, 2000.
[17] Wilson, B.; Ma, K. L. Rendering complexity in computer-generated pen-and-ink illustrations. In: Proceedings of the 3rd International Symposium on Non-photorealistic Animation and Rendering, 129–137, 2004.
[18] Deussen, O.; Hiller, S.; Van Overveld, C.; Strothotte, T. Floating points: A method for computing stipple drawings. Computer Graphics Forum Vol. 19, No. 3, 41–50, 2000.
[19] Deussen, O.; Isenberg, T. Halftoning and stippling. In: Image and Video-based Artistic Stylisation. Computational Imaging and Vision, Vol. 42. Rosin, P.; Collomosse, J. Eds. Springer London, 45–61, 2013.
[20] Hertzmann, A. A survey of stroke-based rendering. IEEE Computer Graphics and Applications Vol. 23, No. 4, 70–81, 2003.
[21] Mellor, J.; Park, E.; Ganin, Y.; Babuschkin, I.; Kulkarni, T.; Rosenbaum, D.; Ballard, A.; Weber, T.; Vinyals, O.; Eslami, S. M. A. Unsupervised doodling and painting with improved SPIRAL. In: Proceedings of the Neural Information Processing Systems Workshops, 2019.
[22] Zou, Z. X.; Shi, T. Y.; Qiu, S.; Yuan, Y.; Shi, Z. W. Stylized neural painting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15684–15693, 2021.
[23] Liu, S. H.; Lin, T. W.; He, D. L.; Li, F.; Deng, R. F.; Li, X.; Ding, E.; Wang, H. Paint transformer: Feed forward neural painting with stroke prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 6578–6587, 2021.
[24] Gatys, L.; Ecker, A.; Bethge, M. A neural algorithm of artistic style. Journal of Vision Vol. 16, No. 12, 326, 2016.
[25] Jing, Y. C.; Yang, Y. Z.; Feng, Z. L.; Ye, J. W.; Yu, Y. Z.; Song, M. L. Neural style transfer: A review. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 11, 3365–3385, 2020.
[26] Dutta, T.; Singh, A.; Biswas, S. StyleGuide: Zero-shot sketch-based image retrieval using style-guided image generation. IEEE Transactions on Multimedia Vol. 23, 2833–2842, 2021.
[27] Xu, M. L.; Su, H.; Li, Y. F.; Li, X.; Liao, J.; Niu, J. W.; Lv, P.; Zhou, B. Stylized aesthetic QR code. IEEE Transactions on Multimedia Vol. 21, No. 8, 1960–1970, 2019.
[28] Chu, W. T.; Wu, Y. L. Image style classification based on learnt deep correlation features. IEEE Transactions on Multimedia Vol. 20, No. 9, 2491–2502, 2018.
[29] Jia, B.; Fang, C.; Brandt, J.; Kim, B.; Manocha, D. PaintBot: A reinforcement learning approach for natural media painting. arXiv preprint arXiv:1904.02201, 2019.
[30] Justin, R. Elements of art: Interpreting meaning through the language of visual cues. Ph.D. Thesis. Stony Brook University, 2018.
[31] Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of the International Conference on Learning Representations, 1–16, 2016.
[32] Donoho, D. L. De-noising by soft-thresholding. IEEE Transactions on Information Theory Vol. 41, No. 3, 613–627, 1995.
[33] Szegedy, C.; Liu, W.; Jia, Y. Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9, 2015.
[34] He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
[35] Liu, Z. W.; Luo, P.; Wang, X. G.; Tang, X. O. Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, 3730–3738, 2015.
[36] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S. A.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211–252, 2015.
[37] Huang, H. Z.; Zhang, S. H.; Martin, R. R.; Hu, S. M. Learning natural colors for image recoloring. Computer Graphics Forum Vol. 33, No. 7, 299–308, 2014.
[38] Tong, Z. Y.; Chen, X. H.; Ni, B. B.; Wang, X. H. Sketch generation with drawing process guided by vector flow and grayscale. Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35, No. 1, 609–616, 2021.
[39] Liddell, T. M.; Kruschke, J. K. Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology Vol. 79, 328–348, 2018.
Qian Wang received her B.Eng. degree in electronic information engineering from Yangtze University, Jingzhou, China, in 2012, and her M.Eng. degree in educational technology from Zhejiang University of Technology, Hangzhou, China, in 2016. She is currently pursuing a Ph.D. degree in computer technology and applications in the School of Computer Science and Engineering, Macau University of Science and Technology. She is also a research assistant with The Hong Kong Polytechnic University. Her current research interests include image and video stylization, and AI drawing.

Cai Guo received his M.Eng. degree in software engineering from Guangdong University of Technology, Guangzhou, China, in 2011. He is currently pursuing a Ph.D. degree in computer technology and applications in the School of Computer Science and Engineering, Macau University of Science and Technology. He is also with Hanshan Normal University, Chaozhou, China. His current research interests include deep learning, motion deblurring, and AI drawing.

Hong-Ning Dai received his Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong, in 2008. He is currently an associate professor in the Department of Computer Science, Hong Kong Baptist University, Hong Kong, China. He was in the Faculty of Information Technology at Macau University of Science and Technology as an assistant/associate professor from 2010 to 2021, and in the Department of Computing and Decision Sciences, Lingnan University, Hong Kong, China, as an associate professor from 2021 to 2022. His current research interests include Internet of Things, big data analytics, and blockchains. He has co-authored or co-edited 3 monographs and published more than 150 papers in top-tier journals and conferences.

Ping Li received his Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong, in 2013. He is currently an assistant professor with The Hong Kong Polytechnic University. He has published many top-tier scholarly research papers and has one excellent research project reported worldwide by ACM TechNews. His current research interests include artistic rendering and synthesis, stylization, colorization, and creative media.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
© The Author(s) 2023. Corrected publication 2023.