1. Introduction
Creativity is a rising topic in current society. It is regarded as a key factor for success in all areas, including, but not limited to, politics, economics, culture, arts and product design. Effective approaches that inspire product designers are thus strongly desirable, and one such approach is visualization. Steve Jobs said: “Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something.” In this study, we propose a novel image-to-image visual generation approach for product innovation that combines a deep learning-based neural style transfer algorithm with Kansei engineering.
In recent years, deep learning methods such as convolutional neural networks (CNNs) have made breakthroughs in many fields, including computer vision, where they are widely applied to object recognition, detection and segmentation [1,2,3]. In the area of product design, Pedro et al. [4] proposed training a CNN with standard usability heuristics, which offers a simple, image-based way to evaluate the usability of thermostats. Pan et al. [5] used a scalable deep learning approach to predict and interpret customer perceptions of design attributes for heterogeneous markets. Wang et al. [6] presented a deep learning-based approach to automatically link customer needs to product design parameters. Zhu et al. [7,8] extended CNNs to generative adversarial networks (GANs) and built a system that supports three applications: (1) manipulating an existing product photo based on an underlying generative model to achieve different looks (shape and color); (2) “generative transformation” of one product image into another product; and (3) generating a new product image from scratch based on the user’s scribbles and a warping user interface (UI). In addition, Chai et al. [9] achieved automatic coloring of product sketches. Kim et al. [10] generated a new product image in one domain given an image from another domain; for example, they took a handbag (or shoe) image as input and generated the corresponding shoe (or handbag) image. These studies can generate product images directly, but they focus only on image generation and ignore users’ requirements for products. In this paper, we propose a Kansei engineering-based neural style transfer for product innovation (KENPI) framework, which directly generates high-quality product images that meet user preferences.
Kansei engineering emerged in Japan in the 1970s to connect customers’ affective responses to the product design process, translating emotions into measurable, physical design specifications [11]. Many scholars have pointed out that satisfying user requirements is the key to product design [12,13,14]. As a user-driven method, Kansei engineering has been widely applied to the design of various products [15,16,17], such as USB flash drives, running shoes, and in-vehicle rubber keypads. Chang et al. [18] used Kansei engineering to construct a relationship model between user requirements and steering wheel design parameters, and used this model to design a steering wheel that meets users’ preferences. Although Kansei engineering can maximize user satisfaction, it provides only theoretical guidance for designers and cannot generate products directly, which is a major limitation for product form design.
Deep learning-based neural style transfer can generate high-quality images. In 2015, Gatys et al. [19,20,21] demonstrated that the representations of content and style in a CNN are separable, so that each can be manipulated independently. They proposed a neural style transfer algorithm that recombines the content of a given photograph with the style of well-known artworks. Although this method produces results of high perceptual quality, it relies on a slow, memory-consuming optimization process, which limits its practical application. Ulyanov et al. [22] replaced the optimization process with a feed-forward generative convolutional network, greatly improving speed and opening the door to real-time applications. Huang et al. [23] achieved arbitrary style transfer by introducing an adaptive instance normalization (AdaIN) layer. Inspired by this research, we combine Kansei engineering with neural style transfer to develop a novel approach to product innovation design.
An evaluation method is needed to verify whether the style transfer is successful. In the product design field, back-propagation (BP) networks are commonly used to establish the relationship between product parameters and users’ evaluations of the product. Chen et al. [24] developed an integrated design approach based on a numerical definition of product form and applied it to knife design. They used a BP network to model the relationship between product form features and consumers’ perception of the product image, which allowed consumers’ evaluations of a knife to be predicted. Alibi et al. [25] used artificial neural networks to establish the relationship between the functional properties and structural parameters of knitted fabrics.
The main contributions of this paper are summarized as follows:
(1) We propose using a deep learning-based neural style transfer technique for new product innovation: the color and pattern features of the style image are reconstructed, merged, and transferred to the target product. The generated product design preserves the shape of the target product while taking on the features of the style image.
(2) To assess whether the style of the style image has been transferred to the product, we introduce factor analysis into Kansei engineering and analyze product styles from four perspectives: occasion, fashion, age and structure. We then combine Kansei engineering with BP neural networks to establish a relationship model between product properties and styles. Using Kansei engineering to capture user preferences for neural style transfer-based product design is one of our major contributions.
(3) We apply the proposed KENPI framework to female coat design to demonstrate the value of our method.
The rest of this paper is organized as follows. In Section 2, we first present the overall research framework and then describe the Kansei engineering method and the style transfer neural network algorithm. In Section 3, we present a case study that verifies the feasibility and effectiveness of the proposed framework, together with the experimental results. Finally, we give our concluding remarks in Section 4.
2. Methods
2.1. Research Framework
To exploit neural style transfer for product innovation design, we combined Kansei engineering, BP networks, and style transfer neural network models into a generative approach, KENPI. As shown in Figure 1, the KENPI framework consists of three parts. In part 1, factor analysis is used to compress Kansei words into product styles that span the product semantic space. Morphological analysis is then used to decompose the product shape into design elements that span the product property space. Finally, a BP network is used to construct a nonlinear mapping model between these two spaces. Because designers and users perceive products differently, obtaining product semantics from a quantitative model is more objective than relying on the designer’s direct subjective evaluation. The model also helps users understand their own style preferences. In part 2, the product semantics obtained from the BP model guide the selection of style images. The product image and style image are then fed into the style transfer model to generate a new product image. In part 3, we compare the semantics of the product before and after style transfer to assess whether the transfer is successful. Because modeling the color and texture of the style image and the generated product image is complex, we chose a user-oriented semantic differential (SD) method to evaluate the style images and the generated products.
2.2. Kansei Engineering
Kansei engineering is one of the main areas of ergonomics (human factors). “Kansei” is a Japanese word that covers the meanings of sensibility, impression, and emotion. It relates to a customer’s physiological and psychological feelings and refers to the cognitive processes of human perception. Kansei engineering was developed as a consumer-oriented technique to better understand customers’ emotional responses and translate them into the design elements of a product. In Kansei engineering, consumers often use adjectives, called Kansei words [11], to describe their perceptions of products. Kansei engineering studies typically follow a model with four main steps:
(1) Choosing the product domain:
The first step is to define the research object and collect data, including Kansei words and product form images. To obtain as complete a semantic and formal description as possible, we collect these from various sources, such as magazines, e-commerce platforms, and the Internet.
The number of Kansei words and the sample size affect the quality of the result. Because 50–600 Kansei words and samples are typically collected, a reasonable reduction must be carried out [26]. Card sorting, in which invited experts screen the items, is an effective method: the experts first group the words (or sample images) by affinity, then choose a representative for each group, yielding a set of representative Kansei words and samples.
(2) Semantic space spanning:
In this step, cluster analysis, term frequency-inverse document frequency (TF-IDF), factor analysis or other methods are usually used to filter the Kansei words and make the semantic space more rigorous. In Kansei engineering, cluster analysis and TF-IDF require word vector training. Since the adjectives we collect are isolated words, factor analysis is more suitable for this paper. The basic purpose of factor analysis is to explain most of the information in the original variables with a smaller number of factors. To obtain the data required for factor analysis, we constructed a questionnaire pairing Kansei words with sample images and invited users to complete it.
(3) Properties space spanning:
Kansei engineering approaches usually apply morphological analysis to divide the product into independent items (product properties) and subdivide the items into categories. In product design, product form features are commonly defined in graphical terms, because graphical descriptions make complex shapes and patterns simple for people to understand [27].
In this step, we divide the products into items and construct a questionnaire that combines the samples and Kansei words with a 7-point SD scale. The questionnaire data are used in the next step, relationship model building.
(4) Relationship model building:
In this step, we associate the property space with the semantic space. Commonly used methods include multiple regression analysis, Hayashi’s quantification method I, and artificial neural networks. Because BP neural networks have strong nonlinear mapping ability and good fault tolerance, they are well suited to building the relationship model between the property space (product items) and the semantic space (Kansei words). We used the data obtained in the previous step to train the model.
The details of the above steps are presented in Section 3, where we show the analysis steps when applying the technique to female coat design.
2.3. Neural Style Transfer Network
Style transfer is the technique of recomposing one image in the style of another. A content image and a style image are used to create an output image, whose “content” mirrors the content image and whose style resembles that of the style image. Batch normalization (BN), instance normalization (IN) and AdaIN are commonly used in neural style transfer. BN calculates the mean and variance of each channel for a batch of samples, while IN independently calculates the mean and variance for each channel and sample. The AdaIN layer is similar to IN, but it has no learnable affine parameters. Instead, it adaptively computes the affine parameters from the feature representations of an arbitrary style image.
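To make the difference between these normalization schemes concrete, the following NumPy sketch computes the statistics each one uses. It is an illustration only: it assumes feature maps stored in (N, C, H, W) order (TensorFlow, which the framework uses, defaults to channels-last), and the function names are hypothetical.

```python
import numpy as np


def batch_norm_stats(x):
    """BN: one mean/std per channel, pooled over the batch and spatial axes."""
    return x.mean(axis=(0, 2, 3)), x.std(axis=(0, 2, 3))


def instance_norm_stats(x):
    """IN (and AdaIN): one mean/std per sample and channel, pooled over spatial axes only."""
    return x.mean(axis=(2, 3)), x.std(axis=(2, 3))


x = np.random.default_rng(0).normal(size=(8, 3, 16, 16))  # (N, C, H, W) feature maps
bn_mean, bn_std = batch_norm_stats(x)       # shapes (3,) and (3,)
in_mean, in_std = instance_norm_stats(x)    # shapes (8, 3) and (8, 3)
```

Because BN pools over the batch, the statistics applied to one image depend on the other images in the batch; IN and AdaIN compute them per image, which is what allows AdaIN to substitute the statistics of a single style image.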
Figure 2 shows an overview of our neural style transfer network, which adopts an “Encoder-AdaIN-Decoder” architecture. Our style transfer network $T$ takes a content image $c$ and a style image $s$ as inputs and synthesizes an output image that recombines the content of the former with the style of the latter. The encoder $f$ is a fixed Visual Geometry Group (VGG)-19 network pre-trained on the ImageNet dataset for image classification; its structure is shown in Figure 3. In VGG-19, each layer takes the output of the previous layer and extracts progressively more complex features until the object is identified, so each layer can be regarded as an extractor of many local features.
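After encoding the content and style images in feature space, we feed both feature maps to an AdaIN layer that aligns the mean and variance of the content feature maps with those of the style feature maps, producing the target feature maps $t$:
$$t = \mathrm{AdaIN}\big(f(c), f(s)\big) = \sigma\big(f(s)\big)\left(\frac{f(c) - \mu\big(f(c)\big)}{\sigma\big(f(c)\big)}\right) + \mu\big(f(s)\big) \tag{1}$$
where $f$ is the encoder; $c$ and $s$ are the content and style images; and $\mu$ and $\sigma$ are the channel-wise mean and standard deviation.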
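The AdaIN operation described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors’ TensorFlow code: it assumes (N, C, H, W) feature arrays, and the small `eps` term is our addition to guard against division by zero for constant feature maps.

```python
import numpy as np


def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on (N, C, H, W) feature maps.

    Each content channel is normalized to zero mean and unit std, then rescaled and
    shifted with the std and mean of the matching style channel. No learnable parameters.
    """
    c_mean = content_feat.mean(axis=(2, 3), keepdims=True)
    c_std = content_feat.std(axis=(2, 3), keepdims=True)
    s_mean = style_feat.mean(axis=(2, 3), keepdims=True)
    s_std = style_feat.std(axis=(2, 3), keepdims=True)
    return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean


rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(2, 4, 8, 8))    # stand-in for f(c)
style = rng.normal(3.0, 2.0, size=(2, 4, 10, 10))    # stand-in for f(s); spatial size may differ
target = adain(content, style)                       # t, same shape as the content features
```

Because the statistics are per channel, the style features may have a different spatial size from the content features; only the channel count must match.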
A randomly initialized decoder $g$ is trained to map $t$ back to the image space, generating the stylized image $T(c, s)$:
$$T(c, s) = g(t) \tag{2}$$
The decoder mostly mirrors the encoder, with all pooling layers replaced by nearest-neighbor up-sampling to reduce checkerboard effects. We use reflection padding in both $f$ and $g$ to avoid border artifacts.
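Both architectural details mentioned above can be illustrated with NumPy. The sketch below uses hypothetical helper names and assumes an (N, C, H, W) layout; it shows nearest-neighbor up-sampling, which copies each pixel into a block, and reflection padding, which mirrors interior pixels across the border instead of inserting zeros.

```python
import numpy as np


def nearest_upsample(x, factor=2):
    """Nearest-neighbor up-sampling of (N, C, H, W) maps: each pixel becomes a factor x factor block."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)


def reflect_pad(x, pad=1):
    """Reflection padding of the spatial axes (the edge pixel itself is not repeated)."""
    return np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="reflect")


up = nearest_upsample(np.arange(4).reshape(1, 1, 2, 2))    # 2 x 2 -> 4 x 4
padded = reflect_pad(np.arange(9).reshape(1, 1, 3, 3))     # 3 x 3 -> 5 x 5
```

Padding by reflection keeps border values statistically similar to interior values, which is why it tends to produce fewer edge artifacts than zero padding.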
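We use the pre-trained VGG-19 to compute the loss function used to train the decoder:
$$\mathcal{L} = \lambda_c \mathcal{L}_c + \lambda_s \mathcal{L}_s \tag{3}$$
where $\mathcal{L}$ is the loss function; $\mathcal{L}_c$ and $\lambda_c$ are the content loss and its weight; and $\mathcal{L}_s$ and $\lambda_s$ are the style loss and its weight.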
The content loss is the Euclidean distance between the target features and the features of the output image. We use the AdaIN output $t$ as the content target, instead of the commonly used feature responses of the content image:
$$\mathcal{L}_c = \big\lVert f\big(g(t)\big) - t \big\rVert_2 \tag{4}$$
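Since our AdaIN layer only transfers the mean and standard deviation of the style features, our style loss only matches these statistics:
$$\mathcal{L}_s = \sum_{i} \big\lVert \mu\big(\phi_i(g(t))\big) - \mu\big(\phi_i(s)\big) \big\rVert_2 + \sum_{i} \big\lVert \sigma\big(\phi_i(g(t))\big) - \sigma\big(\phi_i(s)\big) \big\rVert_2 \tag{5}$$
where each $\phi_i$ denotes a layer in VGG-19 used to compute the style loss. Training the decoder minimizes the weighted sum of the content and style losses.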
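The two training losses can be sketched in NumPy as follows, assuming the VGG-19 activations have already been computed as (N, C, H, W) arrays (feature extraction itself is omitted). The function names are hypothetical, and the weights `lambda_c = 1.0` and `lambda_s = 10.0` are placeholder values, not the settings used in this paper.

```python
import numpy as np


def content_loss(output_feat, target):
    """Euclidean (L2) distance between f(g(t)) and the AdaIN target t."""
    return float(np.linalg.norm(output_feat - target))


def style_loss(output_feats, style_feats):
    """Sum over VGG layers of L2 distances between channel means and between channel stds."""
    total = 0.0
    for out, sty in zip(output_feats, style_feats):
        total += np.linalg.norm(out.mean(axis=(2, 3)) - sty.mean(axis=(2, 3)))
        total += np.linalg.norm(out.std(axis=(2, 3)) - sty.std(axis=(2, 3)))
    return float(total)


def total_loss(output_feat, target, output_feats, style_feats, lambda_c=1.0, lambda_s=10.0):
    """Weighted sum of the content and style losses (placeholder weights)."""
    return lambda_c * content_loss(output_feat, target) + lambda_s * style_loss(output_feats, style_feats)


rng = np.random.default_rng(0)
feat = rng.normal(size=(1, 3, 4, 4))   # stand-in for one layer's activations
shifted = feat + 1.0                   # same spread, every channel mean shifted by 1
```

Shifting every channel by 1 leaves the standard deviations unchanged, so the style loss between `shifted` and `feat` reduces to the norm of a vector of three ones, that is, the square root of 3.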
3. Empirical Study
In this research, a case study of female coat design was conducted to verify the practicality and effectiveness of the proposed framework. It consisted of the following steps: (1) use factor analysis to extract Kansei words and divide coats into eight styles; (2) adopt morphological analysis to obtain the properties of the coat; (3) use a BP network to establish the relationship model between coat style and coat properties; (4) use the neural style transfer model to transfer the style image to the target product; and (5) evaluate the new coat and check whether the style has been transferred successfully.
3.1. Product Domain Selection
In this stage, we collected 100 Kansei words and 200 female coat images from magazines, e-commerce platforms and the Internet. Five fashion designers were invited to reduce the number of samples and Kansei words using the card sorting method. This left 100 coat samples and 30 Kansei words (Relaxed, Natural, Peaceful, Formal, Strict, Capable, Modern, Fashionable, Particular, Classical, Traditional, Conservative, Mature, Steady, Sweet, Young, Energetic, Simple, Plain, Delicate, Luxurious, Dynamic, Clear, Romantic, Warm, Soft, Noble, Female, Sexy and Elegant).
3.2. Semantic Space Spanning
Questionnaires were constructed by combining the 100 samples and 30 Kansei words. We invited 15 women (5 designers and 10 consumers), aged 18 to 45, to evaluate the coat images on a 7-point Likert scale (7 corresponding to “strongly agree” that the Kansei word describes the image well, and 1 corresponding to “strongly disagree”). The questionnaire yielded dataset 1 with dimensions 15 × 30 × 100, where 15 is the number of women who completed the questionnaires, 30 is the number of Kansei words, and 100 is the number of sample images. We then averaged dataset 1 over the 100 images to obtain dataset 2 (15 × 30).
Factor analysis is a statistical method that converts many observable variables into a few latent factors: correlated variables are grouped into the same class, and each class becomes a factor. The number of factors is determined from their cumulative percentage of variance; generally, the retained factors should account for more than 60% of the total variance [28]. We used factor analysis to convert the 30 Kansei words into a few latent factors. In MATLAB (MathWorks, Natick, MA, USA), we first arranged dataset 2 as a 15 × 30 matrix, then used the principal component method for extraction and varimax as the orthogonal rotation method [29]. The results are shown in Table 1.
As shown in Table 1, Factors 1, 2, 3, and 4 account for 17%, 17%, 15%, and 15% of the variance, respectively. The cumulative percentage of variance is 64%, which exceeds 60%, so it is appropriate to group the 30 adjectives into four main factors. The four factors are named according to the loading coefficients of the Kansei words. Factor 1 describes the degree of professional or leisure style. It is defined by the occasion of use and comprises the adjectives “Relaxed”, “Natural”, “Peaceful”, “Formal”, “Strict” and “Capable”. Factor 2 describes the degree of vogue or classic style. It is defined by the degree of fashion and comprises “Modern”, “Fashionable”, “Particular”, “Classical”, “Traditional” and “Conservative”. Factor 3 describes the degree of grand or youth style. It is defined by the age of users and comprises “Mature”, “Steady”, “Sweet”, “Young” and “Energetic”. Factor 4 describes the degree of simple or delicate style. It is defined by the structure of the coat and comprises “Simple”, “Plain”, “Delicate” and “Luxurious”. Because each factor has two opposite poles, the four factors yield eight coat styles.
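The extraction and rotation steps can be sketched in NumPy as follows. This is not the authors’ MATLAB procedure: the helper names are hypothetical, the input is random placeholder data rather than the real questionnaire ratings, and the varimax routine omits Kaiser row normalization, which MATLAB’s implementation may apply by default, so loadings can differ slightly. The 60% rule is checked with the variance shares reported in Table 1.

```python
import numpy as np


def pc_loadings(data, n_factors):
    """Principal-component extraction from the correlation matrix (rows = observations)."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals / eigvals.sum()           # loadings, variance share per component


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (without Kaiser row normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt                               # stays orthogonal at every step
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def n_factors_needed(variance_share, threshold=0.60):
    """Smallest number of leading factors whose cumulative variance share reaches the threshold."""
    return int(np.searchsorted(np.cumsum(variance_share), threshold) + 1)


# Hypothetical stand-in for dataset 1 (15 raters x 30 words x 100 images, 7-point scale)
rng = np.random.default_rng(0)
dataset1 = rng.integers(1, 8, size=(15, 30, 100))
dataset2 = dataset1.mean(axis=2)                        # average over the images -> 15 x 30
loadings, share = pc_loadings(dataset2, n_factors=4)
rotated = varimax(loadings)
```

With the shares in Table 1 (0.17, 0.17, 0.15, 0.15, then the remainder), `n_factors_needed` returns 4, matching the 64% cumulative variance. An orthogonal rotation leaves each word’s communality (row sum of squared loadings) unchanged; it only redistributes loadings so that each word loads mainly on one factor.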
3.3. Property Space Spanning
Since this stage studies the form semantics of products, we need to remove interference from other information such as color, pattern, and texture. Using simple Photoshop processing, we obtained cutting illustrations of the 100 samples, some of which are shown in Figure 4.
Morphological analysis was used to divide the female coat into seven items: model, waist, length, collar, sleeve, pocket and opening. These items were then subdivided into 24 categories, as shown in Table 2.
We constructed another questionnaire (Figure 5) by combining the cutting illustrations of the samples with the eight styles on a 7-point SD scale. Fifteen women were invited to complete the questionnaire for all 100 samples. Sorting the questionnaire data gave dataset 3 with dimensions 15 × 100 × 8. We averaged dataset 3 over the 15 women to obtain dataset 4 (100 × 8).
3.4. Relationship Model Building
In this section, we establish a BP network-based relationship model between product parameters and styles. To keep the model simple to design while achieving good performance, a three-layer neural network was selected. The specific steps are as follows:
(1) Model Construction
The structure of the BP network is shown in Figure 6. The input layer consists of the seven coat parameters (the seven items listed in Table 2), so it has seven neurons. The output layer is composed of four groups of styles (professional-leisure, vogue-classic, grand-youth, simple-delicate), so it has four neurons. The empirical formula for the number of hidden neurons is:
$$p = \sqrt{m + n} + a \tag{6}$$
In Equation (6), $p$, $m$, and $n$ are the numbers of neurons in the hidden, input, and output layers, respectively, and $a$ is an empirical constant ($1 \le a \le 10$). The number of hidden neurons was then determined through repeated trials.
(2) Model training and results
As stated above, dataset 4 contains 100 samples. We used k-fold cross-validation (CV) with k = 5 to evaluate our model: the sample set was randomly divided into five subsets of 20 samples each, and each subset in turn was used as the validation set while the remaining four subsets formed the training set. The 100 samples and their corresponding style evaluation values (dataset 4) were used to train the BP networks. We repeated the trials with different numbers of hidden neurons $p$; the comparison results are shown in Table 3.
As Table 3 shows, the minimum CV error of 0.324 is obtained when $p = 11$. Since Kansei evaluation is a qualitative analysis whose values span a range, we consider this result satisfactory: it indicates that the relationship model is reliable for predicting female coat styles.
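The model selection in steps (1) and (2) can be sketched in NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the authors’ code: the hidden layer uses a sigmoid and the output layer is linear; training is full-batch gradient descent on squared error; the CV error is taken to be the mean validation root-mean-square error (the metric is not specified above); candidate values of $p$ round $\sqrt{m+n} + a$ for $a = 1, \ldots, 10$; and the data are random placeholders with seven inputs and four style outputs, not dataset 4. All function and class names are hypothetical.

```python
import math
import numpy as np


def hidden_neuron_candidates(n_in, n_out, a_values=range(1, 11)):
    """Empirical rule p = sqrt(m + n) + a, rounded, for each constant a (assumed 1..10)."""
    return sorted({round(math.sqrt(n_in + n_out) + a) for a in a_values})


def k_fold_splits(n_samples, k=5, seed=0):
    """Randomly partition sample indices into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


class BPNet:
    """Three-layer back-propagation network: sigmoid hidden layer, linear output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # hidden activations
        return h, h @ self.W2 + self.b2

    def fit(self, X, Y, lr=0.1, epochs=1000):
        n = len(X)
        for _ in range(epochs):
            h, out = self.forward(X)
            err = out - Y                             # gradient of 0.5 * squared error w.r.t. outputs
            dh = (err @ self.W2.T) * h * (1.0 - h)    # back-propagate through the sigmoid
            self.W2 -= lr * (h.T @ err) / n
            self.b2 -= lr * err.mean(axis=0)
            self.W1 -= lr * (X.T @ dh) / n
            self.b1 -= lr * dh.mean(axis=0)
        return self

    def rmse(self, X, Y):
        return float(np.sqrt(np.mean((self.forward(X)[1] - Y) ** 2)))


def cv_error(X, Y, n_hidden, k=5):
    """Mean validation RMSE over k folds; each fold serves as the validation set once."""
    folds = k_fold_splits(len(X), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        net = BPNet(X.shape[1], n_hidden, Y.shape[1]).fit(X[train_idx], Y[train_idx])
        errors.append(net.rmse(X[val_idx], Y[val_idx]))
    return float(np.mean(errors))


# Hypothetical stand-in for the training data: 100 coats, 7 coded items in, 4 style scores out
rng = np.random.default_rng(42)
X = rng.integers(1, 5, size=(100, 7)) / 4.0   # made-up category codes scaled to (0, 1]
Y = rng.uniform(1.0, 7.0, size=(100, 4))      # made-up 7-point style scores

candidates = hidden_neuron_candidates(n_in=7, n_out=4)
cv_errors = {p: cv_error(X, Y, p) for p in candidates}
best_p = min(cv_errors, key=cv_errors.get)
```

For seven inputs and four outputs the rule gives candidates from 4 to 13, a range that contains the $p = 11$ selected in Table 3. Because the placeholder data are random, `best_p` here carries no meaning; with real questionnaire data the same loop reproduces the kind of comparison reported in Table 3.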
3.5. Female Coat Style Transfer
Our KENPI framework was developed in Python: both the BP relationship model and the style transfer model were built with Python modules such as NumPy, TensorFlow, and SciPy. All experiments were run on a Dell Precision workstation (Dell Inc., Round Rock, TX, USA) with an Intel i9-7900X CPU, an NVIDIA Titan Xp GPU, and the Ubuntu 16.04 operating system (Canonical Ltd., London, England). We trained our network using the Microsoft Common Objects in Context (MS-COCO) dataset [30] for content images and a dataset collected mostly from WikiArt for style images, following the setting of [31]. Each dataset contained roughly 80,000 training examples. We used the Adam optimizer [32] with a batch size of eight content–style image pairs. During training, we first resized the smaller dimension of both images to 512 while preserving the aspect ratio, then randomly cropped regions of 256 × 256. Since our network is fully convolutional, it can be applied to images of any size at test time.
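The training-time preprocessing described above (smaller side resized to 512 with the aspect ratio preserved, then a random 256 × 256 crop) can be sketched in NumPy as follows. The interpolation method is not stated, so this sketch assumes simple nearest-neighbor sampling; the helper names are hypothetical, and in practice an image library such as Pillow would normally perform the resize.

```python
import numpy as np


def resize_shorter_side(img, target=512):
    """Nearest-neighbor resize so that min(H, W) == target, preserving the aspect ratio."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h = target if h <= w else round(h * scale)
    new_w = target if w < h else round(w * scale)
    # map each output pixel back to a source pixel (floor of the inverse scale)
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]


def random_crop(img, size=256, rng=None):
    """Cut a random size x size window out of an (H, W, C) image."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]


img = np.zeros((600, 800, 3), dtype=np.uint8)               # placeholder image (H, W, C)
resized = resize_shorter_side(img)                          # -> (512, 683, 3)
crop = random_crop(resized, rng=np.random.default_rng(0))   # -> (256, 256, 3)
```

Cropping only during training keeps the batch shape fixed; at test time the fully convolutional network can take the uncropped image.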
We randomly selected six female coats as content images (Figure 7a) and input their parameters into the BP model to obtain their styles. For each coat, we then chose an image of the same style as the style image for style transfer (Figure 7b). To obtain the styles of these images, we constructed questionnaires as shown in Figure 8, and thirty women consumers were invited to complete them. The final results, obtained by sorting the questionnaire data, are shown in Table 4.
We fed the content images (Figure 7a) and the style images (Figure 7b) into our style transfer model to obtain new product images.
To verify whether the style of each style image was successfully transferred to the product, we constructed another questionnaire (Figure 9), which thirty women consumers were invited to complete. After sorting the questionnaire data and calculating the averages, we obtained the results shown in Figure 10. For result 1, the leisure-style score changed from 2.5 to 2, indicating that the leisure semantic was weakened and the professional semantic enhanced. The classic-style score of result 1 changed from 4.3 to 5, indicating that the vogue semantic was weakened and the classic semantic enhanced. Although this was the largest change in result 1, it is not necessarily the strongest enhancement: semantics become harder to enhance the closer a score is to either end of the scale [33], and the starting score of 4.3 is close to the middle. The grand-style and simple-style scores of result 1 both changed from 3.2 to 3, indicating that the grand and simple semantics were enhanced. Although the two values are identical, the standard deviation of the grand-style scores (1.3) is greater than that of the simple-style scores (0.5); the grand-style evaluations are therefore more dispersed, indicating that the simple-style semantic is stronger. Similar enhancements were observed for results 2 to 6.
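The comparison above rests on two summary statistics per style dimension: the shift in mean score and the spread (standard deviation) of the ratings. A minimal sketch with Python’s standard `statistics` module follows; the function name and the four ratings are hypothetical, and whether the sample or the population standard deviation was used is not stated (the sketch uses the sample version).

```python
from statistics import mean, stdev


def summarize_ratings(before, after):
    """Mean shift and spread of one style dimension's ratings before and after transfer."""
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "shift": mean(after) - mean(before),  # sign tells which pole of the bipolar scale gained
        "sd_after": stdev(after),             # sample SD: larger means the raters agree less
    }


# Hypothetical ratings from four raters, for illustration only
summary = summarize_ratings(before=[2, 3, 2, 3], after=[2, 2, 2, 2])
```

Here the mean moves from 2.5 to 2 and the raters agree perfectly afterwards (standard deviation 0); two dimensions with the same mean shift can then be ranked by their spread, as done for the grand-style and simple-style scores.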
To further illustrate the effectiveness of the proposed framework, we repeated the experiment with another 20 samples. The results are shown in Figure 11. The scores of two samples (samples 15 and 18) did not increase; the corresponding images are shown in Figure 12. Of the 20 samples, ten samples’ scores increased by 0–0.5, five samples’ scores increased by 0.5–1, and three samples’ scores increased by 1–3. Thus 18 of the 20 samples (90%) had increased scores, while the remaining two (10%) showed neither an increase nor a decrease.
Table 5 shows the evaluation results of the generated images of samples 15 and 18. For image 15, eight people rated it as grand-style, while seven rated it as youth-style. Similarly, six people rated image 18 as fashion-style, while seven rated it as classic-style. Because the votes were so close, the styles were difficult to determine; we ultimately defined image 15 as grand-style and image 18 as classic-style. The evaluation results of samples 15 and 18 are shown in Figure 13. Both show only small changes, and their maximum standard deviation (1.7) is greater than those of the samples in Figure 10, indicating that the evaluations of samples 15 and 18 are more dispersed. In summary, ambiguity in the style of the style image affects the style evaluation of the generated product.
These conclusions are also consistent with human subjective analysis. The above results show that transferring a style image onto the form of a female coat can enhance the coat’s semantics: when the product form and the style image share the same style, the generated product expresses that style more strongly. Overall, the results show that the styles of the style images were successfully transferred to the target products.
3.6. Other Results
To illustrate the generality of the proposed framework, we conducted another experiment. As shown in Figure 14, given product images and style images as inputs, our framework automatically generates new product images that blend in the new style while preserving the basic design. We show style transfer results for a child’s shoe, a handbag, a dress, a sofa, and a car. Since our framework can change the color and texture of an image, it can also be used for automatic coloring of sketches. Furthermore, because the framework allows users to select arbitrary style images, it can be applied to product customization and other pattern design tasks, such as packaging box design, fashion design, and advertisement design.
4. Conclusions
As economies develop, customer demands have become more diversified and personalized, motivating designers to break with conventional wisdom and seek new approaches to innovative product design. In this research, we propose KENPI, a deep learning and Kansei engineering-based framework for product innovation design. First, we use Kansei engineering to capture user preferences and establish a BP mapping model between product properties and semantics. Through the BP model, we obtain the semantics of the selected content image and use them to guide the selection of the style image. Second, we construct a style transfer model that transfers the style image to the content image to generate a new product. Finally, we compare the semantics of the product before and after transfer to assess whether the style of the style image has been transferred to the product. Using female coats as an example, we demonstrate the effectiveness and feasibility of KENPI. While deep learning-based neural style transfer has previously been used to generate product images and other images, to our knowledge our work is the first to combine it with user preferences captured via Kansei engineering, which provides a foundation for neural style transfer-based product design.
Although our framework can generate new products automatically without human intervention, the evaluation of style images relies on questionnaires rather than objective models, which introduces a degree of subjectivity. In future work, we will therefore focus on making style image evaluation more objective to improve the framework.