Towards Sustainable Development of Online Communities in the Big Data Era: A Study of the Causes and Possible Consequence of Voting on User Reviews

Zhao, Jie; Wang, Jianfei; Fang, Suping; Jin, Peiquan

doi:10.3390/su10093156


by Jie Zhao ¹, Jianfei Wang ¹, Suping Fang ¹ and Peiquan Jin ²,*

¹ School of Business, Anhui University, Hefei 230601, China

² School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China

* Author to whom correspondence should be addressed.

Sustainability 2018, 10(9), 3156; https://doi.org/10.3390/su10093156

Submission received: 14 July 2018 / Revised: 30 August 2018 / Accepted: 31 August 2018 / Published: 4 September 2018

(This article belongs to the Special Issue Big Data Research for Social Sciences and Social Impact)


Abstract:

This paper focuses on review voting in online communities, which allows users to express their opinions on User-Generated Content (UGC). However, the sustainable development of online communities is likely to be affected by the social influence of UGC. In this paper, we study the so-called crowd intelligence paradox of review voting in online communities. The crowd intelligence paradox means that the quality of reviews is not closely connected with the growth in review votes. This implies that a review with many votes may be of low quality, and a review with few votes may be of high quality. The crowd intelligence paradox discourages users from participating in social networks and may impact the sustainable development of online communities. To demonstrate the existence of the crowd intelligence paradox in online communities, we first analyze a large set of reviews crawled from Net Ease Cloud Music, which is one of the most popular online communities in China. The maximum likelihood (ML) and the hierarchical regression approaches are used in this step. Then, we construct a new research model called the Voting Adoption Model (VAM) to study how different factors impact the crowd intelligence paradox in online communities. In particular, we propose six hypotheses based on the VAM model and conduct experiments based on the measurement model and the structural model to evaluate the hypotheses. The results show that the quality of reviews has little influence on review votes, and the hot-site attribute is a dominant factor influencing review voting. In addition, the variables of the VAM model, including information credibility, perceived ease of use, and social influence, have significant impacts on review voting. Finally, based on the empirical study, we present some research implications and suggestions for online communities to realize healthy and sustainable development in the future.

Keywords: social network; sustainable development; review voting; online community; paradox

1. Introduction

With the rapid development of social networks, more and more people spend a lot of time in online communities, such as online video sharing platforms or online music platforms. For example, Net Ease Cloud Music (http://music.163.com) is one of the most popular online entertainment communities in China and has attracted millions of users. The large number of users in online communities can bring substantial marketing and business value to enterprises, e.g., through online advertising. So far, online communities are mostly based on User-Generated Content (UGC), which allows users to create personalized information and share it with online communities [1]. Many UGC systems adopt a voluntary voting mechanism to identify valuable reviews that can help users browse content efficiently [2]. However, we note that there is a paradox of review voting in online communities, i.e., the number of votes is not consistent with the quality of the corresponding content. In particular, some low-quality content surprisingly receives more votes than high-quality content. Such a paradox is closely related to the crowd intelligence of online communities, and might have an impact on the sustainable development of online communities. Previous studies have pointed out that rating bias exists in online reviews [3]. However, so far there are no studies focusing on the paradox of review voting in online communities.

In this paper, we regard the paradox of review voting in online communities as the crowd intelligence paradox. The crowd intelligence paradox has negative effects on resource allocation and the sustainable development of online communities, because it may lower the quality of UGC, which violates the initial objective of the review voting mechanism, i.e., to recommend high quality content. Furthermore, online communities have more users than traditional communities. Therefore, the crowd intelligence paradox in online communities will lead to an unfair distribution of community resources such as content ranking, exposure time, and bonus points. As a result, the crowd intelligence paradox of review voting might inhibit users who wish to participate in UGC and hinders the development of online communities.

This paper aims to address the crowd intelligence paradox and to study the inconsistency between review voting and review quality in online communities. Particularly, we aim at answering two research questions:

(1): What is the paradox of review voting?
(2): Why does the crowd intelligence paradox exist?

To answer what the paradox of review voting is, we carried out two pieces of work. First, we analyzed the distribution of review votes in a real online community (Net Ease Cloud Music) using power law distribution theory, based on a large data set of 351,578 reviews. This step explores the distribution of review votes in online communities and shows that it matches the 80/20 rule, i.e., only a few reviews receive votes. Second, we propose a hierarchical-regression-based method to verify the existence of the crowd intelligence paradox. This step shows that review voting is not closely connected with the quality of reviews. Together, these two steps quantitatively measure the existence of the crowd intelligence paradox in online communities.

Furthermore, to identify the factors influencing the crowd intelligence paradox, we present a new research model named the Voting Adoption Model (VAM), which integrates the information adoption model (IAM), the self-determination theory (SDT), and the signal theory. Based on the VAM model, we propose six hypotheses and analyze the imbalance of voting motives and the signals of the crowd intelligence paradox through measurement model evaluation and structural model evaluation. In summary, we make the following contributions in this paper:

(1): We first study the crowd intelligence paradox in online communities, and experimentally reveal the distribution features of the review votes in online communities.
(2): We conduct experiments over a reviews dataset and discover that the number of review votes is not consistent with the quality of corresponding UGC, which demonstrates the existence of the crowd intelligence paradox of review voting in online communities.
(3): We propose a new research model called Voting Adoption Model (VAM) and six hypotheses to study why the crowd intelligence paradox exists in online communities. Compared with information credibility, perceived ease of use, and social influence, the influence of commentary quality is not significant. Hence, high-quality reviews do not always receive high votes.
(4): We present discussions on the experimental results and give some suggestions for the sustainable development of online communities.

To the best of our knowledge, this work provides the first empirical study on the crowd intelligence paradox in online communities. The crowd intelligence paradox of review voting reflects an unreasonable allocation of community resources. High-quality resources are not necessarily recognized, while low-quality ones get a lot of attention. This is a potential crisis for the UGC ecological environment and may restrict the traffic monetizing. Understanding the mechanism of the existence of the crowd intelligence paradox in review voting can help people solve the voting problem by increasing the cost of the voting signals and enhancing the demand of autonomy and competence.

The remainder of the paper is organized as follows. In Section 2, we survey the related work. In Section 3, we combine the distribution of review votes in online communities with a hierarchical regression to discuss the measurement of the crowd intelligence paradox. In Section 4, we construct a research model to study why the crowd intelligence paradox exists. Section 5 presents discussions of the data analysis as well as some suggestions for the sustainable development of online communities. Finally, Section 6 concludes the paper.

2. Related Work

2.1. User-Generated Content (UGC)

The network economy is increasingly dependent on creativity. With the technological advances provided by Web 2.0, users can creatively share UGC in communities. The most commonly cited definition of UGC is given by the Organization for Economic Co-operation and Development (OECD), which characterizes UGC by three features: the content is publicly available, it reflects creative effort, and it is created by non-professionals [4]. By user type, UGC can be divided into content from rational users (e.g., knowledge sharing), perceptual users (e.g., social activities, entertainment), group collaboration (e.g., Wikipedia), and individual creation (e.g., blogs, reviews) [5]. Online communities can be divided into trading communities, interest communities, fantasy communities, and relationship communities [6]. Traditional UGC research focused more on online reviews and eWOM in trading communities. This paper focuses on review voting in online communities.

Research on user-generated content can be divided into four categories: users’ motivation to participate [7,8], the influence of content [9,10,11], content mining [12,13], and content management [14,15,16]. In these studies, online reviews [17] and eWOM [18] are the most common themes, with a focus on three areas: consumers’ acts of releasing reviews [19], consumers’ perceived value of reviews [20], and consumers’ shopping decisions under the effect of UGC [21]. Among them, the helpfulness of online reviews is one of the important topics of UGC research in business. Research on helpfulness aims to identify the reviews that help support consumer purchase decisions [17]. For example, Amazon asks users whether a given review was helpful to them.

Many scholars have discussed the helpfulness of the comment from three perspectives: Information source (e.g., platform reputation [22], publisher characteristic [23], and product type [20]), information content (e.g., quality [24], quantity [25], valence [26], attribute [27], and expression [28]), and information receiver (e.g., consumer knowledge [29] and involvement [30]). They found the above factors have a significant impact on the helpfulness of reviews.

With the rapid growth of the volume and scale of Web data, massive UGC has aggravated information overload, so the industry began to use the “votes” of a social voting mechanism to identify helpful content (e.g., Facebook’s thumbs up). Studies based on software usage data and Amazon review data have shown that information characteristics (e.g., basic characteristics, style features, and semantic features) and commentator characteristics (e.g., commentator confidence) have a significant influence on the number of votes [2,31].

While many studies have looked at what makes an online review helpful (the helpfulness of content), little is known on what makes an online review receive votes (review voting). In addition, some scholars measure helpful content by a percentage of useful votes in the total number of votes [17]. We want to know whether the vote is as deviant as the rating data in trading reviews [3]. While it has been argued that more helpful comments are more likely to be voted for [2], the scope of the study is limited to the online reviews of electronic business platforms. Furthermore, it ignored the characteristics of review recipients. In addition, previous studies lacked a study of non-commercial information content in online communities. The number of votes has significantly affected the value perception of UGC content publishers [22] and studying review voting of online communities helps to improve the UGC ecosystem, which will help improve users’ stickiness and traffic monetizing.

2.2. IAM, SDT and Signal Theory

Review voting reflects recipients’ acceptance of content information. The theory of information communication points out that in the course of information communication, information content, information sources, and receiver characteristics all influence recipients’ acceptance and the persuasion of information. Among many persuasion process models, the Heuristic-Systematic Model (HSM) [32] and the Elaboration Likelihood Model (ELM) [33] are the most representative. The Information Adoption Model (IAM) [34] is an extension of ELM on network environments. The process of influencing people’s decision-making is seen as a process of information adoption. The model regards the quality of information as a central path and the information source as an edge path. While the model is generally used [35], it only focuses on characteristics of information, such as quality, credibility, and usefulness. The influence of information, however, should not be limited to characteristics of information [23].

Thus, we add information receiver characteristics to explore what characterizes review voting and why. These receiver characteristics mainly relate to voting motives, perceived ease of use, and so on. The theoretical explanation draws on the set of theories described below, including self-determination theory (SDT), the technology acceptance model (TAM), and signal theory.

Scholars apply the Theory of Motivation to explain individual behavioral intentions, including UGC behavior. For example, the self-determination theory (SDT) was used to successfully explain how UGC publishers perceive and respond to others’ feedback [22]. Self-determination theory is composed of basic psychological need theory, cognitive evaluation theory, organismic integration theory, and causality orientation theory [36]. The organismic integration theory points out the internalization of external motivation, and the basic psychological need theory points out the important influence of autonomy, competence, and relatedness. Social relations are the main part of the demand for users in online communities, and social relations have a significant driving effect on UGC generation [10]. We also believe that online users will exchange their voting to get some kind of reward, including community honor, altruism satisfaction, and social entertainment.

The technology acceptance model (TAM) focuses on behavioral characteristics of users and influencing factors. Perceived usefulness and perceived ease of use affect users’ motivation and action by affecting their attitude towards information technology [37]. Review voting relies on network platforms. The voting function is easy to understand, and its perceived usefulness affects users’ acceptance.

Review voting builds on the votes that have already been cast, so the number of votes becomes a signal that affects users’ judgment and analysis. Signal theory points out that a signal has two indispensable features: high cost (e.g., higher education) and being readily observable (e.g., a company’s financial statement) [38]. Recent research discussed the degree of consumer acceptance of expert blogs from the perspective of the cost of signals [9]. The expert blog is to consumers what UGC is to users. In addition, users want to know others’ views on a review, and the information between users is asymmetric. The number of votes therefore becomes a clear signal, a representation of how many people recommend the comment. Therefore, this paper uses signal theory to explain the reasons for review voting.

In conclusion, this paper focuses on the review voting in online communities. We not only uncovered the paradox of review voting from the perspective of information characteristics, but also explored the influencing factors of why the paradox exists from the perspective of receiver characteristics, on the basis of SDT and signal theory.

3. Validation of the Crowd Intelligence Paradox

In this section, we validate the following two propositions: (1) The distribution of review votes in online communities matches the 80/20 rule and only a few comments can receive votes; and (2) the inconsistency between review voting and review quality exists in the reviews receiving many votes.

Practically, we study the distribution of review votes in online communities by conducting data analysis on a real online community, Net Ease Cloud Music, which is one of the most popular online entertainment communities in China. In Section 3.1, we give the details about the vote dataset and subsequent survey data, and in Section 3.2, we present the results of distribution tests on the reviews dataset. Finally, we measure the crowd intelligence paradox in Section 3.3.

3.1. Dataset

Net Ease Cloud Music was officially released on 23 April 2013. Compared with other music software, Net Ease Cloud Music pays more attention to music socialization and builds a music community around UGC; its music comment function in particular attracts many users. As of April 2017, Net Ease Cloud Music had more than 300 million users and more than 400 million singles. We choose the Net Ease Cloud Music community as the research platform to study the paradox of review voting. We need two kinds of data: the vote data from Net Ease Cloud Music and the community survey data. The survey respondents are randomly selected from the reviewers of the 17 songs covered by the vote data, and all reviews written by the respondents are among the 351,578 reviews.

3.1.1. Vote Data

The vote dataset is prepared in two steps: data crawling and text mining. We crawled review votes from Net Ease Cloud Music and obtained a dataset containing 351,578 reviews on 17 songs.

The format of crawled data is shown in Table 1.

In order to effectively distinguish high quality reviews from low quality ones, we use text classification to perform review classification on the crawled data. There are many classification models proposed in the area of data mining, such as SVM (Support Vector Machine), KNN (K nearest neighbors), and Naive Bayes Model (NB) [39]. The process is shown in Figure 1.

In the manual annotation step, we extract a random sample of 20,000 reviews from our set (351,578 reviews). Five postgraduate students annotated review quality manually over five days, from 25 August 2017 to 30 August 2017. The quality score is labeled on a five-point Likert scale, ranging from strongly disagree (1) to strongly agree (5), according to personal perception. Because the annotators are themselves users of Net Ease Cloud Music, labeling by personal perception is a simple but effective approach. We roughly split the annotated reviews according to the average score [9]. Reviews with above-average scores are marked as high-quality reviews, and those with below-average scores are regarded as low-quality reviews. We did not use the full scoring scale because the annotated data was not large enough to conduct a full-scale analysis. Consequently, we get 8500 high-quality reviews (above-average scores) and 8500 low-quality reviews (below-average scores). The internal consistency reliability of the annotations was tested and found acceptable (Cronbach’s α = 0.809).
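The reliability check and the average-based split described above can be illustrated with a short sketch. It assumes, hypothetically, that each annotator is treated as one item of the scale and that reviews scoring exactly the average are left out of both groups; neither detail is stated in the text, and the function names are illustrative only.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: one row per review, one column per annotator (assumed layout).
    """
    k = len(ratings[0])                       # number of annotators ("items")
    item_vars = [variance(col) for col in zip(*ratings)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def split_by_average(ratings):
    """Return indices of above-average and below-average reviews."""
    review_means = [sum(row) / len(row) for row in ratings]
    overall = sum(review_means) / len(review_means)
    high = [i for i, m in enumerate(review_means) if m > overall]
    low = [i for i, m in enumerate(review_means) if m < overall]
    return high, low

# Toy data: three reviews scored by three annotators who agree perfectly.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
high_idx, low_idx = split_by_average(toy)
```

With perfectly consistent annotators the coefficient reaches its upper bound of one; real annotation data would give a lower value.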

In the feature selection step, we select single and double words as features. In the step of dimension reduction, we use a Chi-square method based on the features. We select 60% data as the training data and the rest are used as test data. Then, we construct classifiers with different classification algorithms over the training data. To ensure the generality of the experiments, we randomly change the sequence of records in the training dataset and repeat the training process five times to get the average results.
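A minimal sketch of these preprocessing steps follows. It assumes reviews are already tokenized into word lists and that the Chi-square score is computed from a 2x2 table of feature presence against the high-quality label; the helper names, the toy data, and the fixed random seed are hypothetical and not taken from the study.

```python
import random
from collections import Counter

def ngram_features(tokens):
    """Return the set of single words and adjacent word pairs in one review."""
    bigrams = [tokens[i] + "_" + tokens[i + 1] for i in range(len(tokens) - 1)]
    return set(tokens) | set(bigrams)

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table (feature present/absent x class)."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def select_features(docs, labels, positive, k):
    """Keep the k features with the highest Chi-square score."""
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    with_pos, with_neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (with_pos if label == positive else with_neg).update(ngram_features(doc))
    scores = {}
    for term in set(with_pos) | set(with_neg):
        n11, n10 = with_pos[term], with_neg[term]
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by score (descending), then alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def shuffled_split(items, train_percent=60, seed=0):
    """Shuffle the records and split them into training and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * train_percent // 100
    return items[:cut], items[cut:]

docs = [["good", "song"], ["good", "lyrics"], ["bad", "song"], ["bad", "noise"]]
labels = ["high", "high", "low", "low"]
top_features = select_features(docs, labels, "high", 2)
train, test = shuffled_split(range(10))
```

Repeating `shuffled_split` with different seeds and averaging the resulting scores mirrors the five-fold repetition mentioned above.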

Table 2 shows the classification results on reviews, where four classification models are compared, including Bernoulli Naive Bayes (denoted as BernoulliNB in Table 2), Multinomial Naive Bayes (denoted as MultinomialNB in Table 2), Logistic Regression, and SVM. We use the metrics of precision, recall, and the F-measure to indicate the performance of classification. These metrics are commonly used in classification and text mining.
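For reference, the three metrics can be computed from predicted and true labels as in the sketch below. The label names and the choice of the high-quality class as the positive class are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive="high"):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels: one correct and one missed high-quality review, one false alarm.
truth = ["high", "high", "low", "low"]
guess = ["high", "low", "high", "low"]
p, r, f = precision_recall_f1(truth, guess)
```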

3.1.2. Survey Data

The survey dataset is designed using a multi-item approach. All variables were carried out by a seven-point Likert-scale, ranging from strongly disagree (1) to strongly agree (7). Items were borrowed from previous literature and modified for the context of this study.

The questionnaire consists of three parts:

(1): Sample selection: This part asks whether the user has voted on a hot review of a song.
(2): Sample characteristics: This part mainly measures the sex, age, and community age of the user.
(3): Variable questionnaire: As shown in Table 3, this study includes six latent variables, which are information quality (IQ), information credibility (IC), perceived ease of use (EOU), social influence (SI), information usefulness (IU), and vote adoption (A). Items were borrowed from previous literature and modified for the context of online reviews. Specifically, ‘EOU’ is adapted from the study of Gefen et al. (2003). ‘IQ’ is assessed by adapting three items used by Park et al. (2007). ‘IC’ is based on Prendergast et al. (2010). Finally, to examine ‘IU’ and ‘A’, six statements were adopted from Ismail Erkan et al. (2016).

The 500 questionnaires were distributed manually through the short message function of Net Ease Cloud Music. Respondents were selected randomly and evenly from the reviewers of the above 17 songs. A total of 273 questionnaires were returned, of which 244 remained after data cleaning, giving an effective recovery rate of 48.8%. We then examined whether common method bias is a concern in this study. An exploratory factor analysis of all items extracted six factors which explain 75.08% of all the variance, with no single factor accounting for significant loadings (p < 0.10) for all items. We conclude that the common method variance (CMV) is probably not a concern in this data set. The sample characteristics are shown in Table 4.

3.2. Distribution of Review Votes

Table 5 shows the descriptive statistic of the vote data, which contains 351,578 reviews. We can see that the distribution of the number of votes is extremely skewed. Among the whole data set, the maximum value is 125,381 while the minimum value is zero. Additionally, 99.5% of the votes are less than 21. Figure 2 shows the distribution of votes ranging from 0 to 20.

By observing review voting in Net Ease Cloud Music, we suppose that review voting may match the power law distribution. A power law distribution implies that the probability of occurrence of an event is extremely skewed: small values occur very frequently, whereas large values occur rarely. As UGC connects massive numbers of users, the time intervals between users’ commentary replies have been found to match the power law distribution [43].

Because the long tail of a power law distribution is complex and highly volatile, the least squares method may produce a large error. In addition, it cannot be effectively compared with other forms of distributions. Thus, following the method of fitting power law distributed data [44], we apply the Maximum Likelihood (ML) method on the dataset and evaluate the results with the Kolmogorov-Smirnov test. Using an algorithm implemented in Python 3.6 [45], we examine the votes k and the corresponding occurrence number P(k) for the 351,578 reviews. The results are shown in Table 6 and Figure 3. We discovered that the data distribution has a multi-segment truncation feature and does not conform to a strict power law distribution.
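The following sketch illustrates the general idea of a maximum likelihood fit with a Kolmogorov-Smirnov distance. It is not the implementation cited in [45]: it assumes the continuous approximation of the estimator, takes the lower cut-off `xmin` as given rather than searching for it, and returns the exponent as a positive number for p(x) ∝ x^(−α), whereas the values in this section are reported with a negative sign.

```python
import math

def fit_power_law(values, xmin):
    """ML estimate of the exponent and KS distance for the tail x >= xmin."""
    tail = sorted(v for v in values if v >= xmin)
    n = len(tail)
    log_sum = sum(math.log(v / xmin) for v in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("tail is empty or contains only values equal to xmin")
    alpha = 1.0 + n / log_sum
    ks = 0.0
    for i, v in enumerate(tail):
        model_cdf = 1.0 - (v / xmin) ** (1.0 - alpha)
        # Compare the model CDF with the empirical CDF just before and at v.
        ks = max(ks, abs(i / n - model_cdf), abs((i + 1) / n - model_cdf))
    return alpha, ks

# Synthetic tail drawn by inverse transform from a power law with exponent 2.5.
sample = [(1 - (i + 0.5) / 2000) ** (-1 / 1.5) for i in range(2000)]
alpha_hat, ks_distance = fit_power_law(sample, xmin=1.0)
```

A small KS distance indicates that the fitted power law tracks the empirical tail closely; turning that distance into a p-value requires a further resampling step that is omitted here.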

As shown in Table 6, in the high range from 16,068 to 125,381, the power law is satisfied (α = −2.68, P = 0.057). However, compared to the entire data set, it discards a lot of data. In the lower range from 1 to 303, the power law is not satisfied (α = −1.31, P = 0.16).

Therefore, we conclude that the distribution of review voting is not always a power law distribution. There are two possible reasons. First, the power law distribution of online behavior changes with time, meaning that different voting behavior may occur at different times. Second, the dataset is relatively small compared to the entire community data; thus it cannot cover all review characteristics. However, we can see that review voting matches the 80/20 rule in our study: reviews with few votes occur in large numbers, while reviews with many votes occur in small numbers.
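One simple way to examine such concentration is to compute the share of all votes held by the most-voted fraction of reviews. The sketch below is an illustrative check under that assumption, not the procedure used in the study, and the toy numbers are made up.

```python
def top_share(votes, fraction=0.2):
    """Share of all votes received by the top `fraction` of reviews."""
    ranked = sorted(votes, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Toy example: one review collects every vote, so the top 20% hold all of them.
concentrated = top_share([100, 0, 0, 0, 0])
# Evenly spread votes: the top 20% hold only 20% of the votes.
even = top_share([1] * 10)
```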

3.3. Measuring the Crowd Intelligence Paradox

In this section, we aim to measure the existence of the crowd intelligence paradox in online communities. From Section 3.2, we know that the distribution of review votes in online communities matches the 80/20 rule and that only a few reviews receive votes. If the reviews with many votes fail to reflect the quality of review content, the voting mechanism cannot reflect crowd intelligence, which means that there is a crowd intelligence paradox in online communities.

3.3.1. Measures

Based on previous literature about the characteristics of information and information publisher, we choose content length, release time, content quality, hot-site (a mark of whether the content is/was entering the hot list), and the characteristics of information publisher, including activity, fans, likes, and levels. We construct a hierarchical regression model to analyze the effect of variables by the ordinary least squares method (OLS).

The multiple regression model is set as follows:

\begin{array}{l} \ln(\mathrm{Voting}_{i}) = β_{0} + β_{1} \ln(\mathrm{Level}_{i}) + β_{2} \ln(\mathrm{Activity}_{i}) + β_{3} \ln(\mathrm{Fans}_{i}) + β_{4} \ln(\mathrm{Like}_{i}) + β_{5} \ln(\mathrm{Time}_{i}) + \\ β_{6} \ln(\mathrm{Words}_{i}) + β_{7} \ln(\mathrm{Words}_{i})^{2} + β_{8} \ln(\mathrm{Quality}_{i}) + β_{9} \mathrm{Hotsite}_{i} + u_{i} \end{array}

(1)

The dependent variable in our model is Voting. We measure voting by the increase in the number of votes [2] for a music review between two dates in our experimental dataset, 20 August 2017 and 21 September 2017. For instance, if a review had 100 votes on 20 August 2017 and 300 votes on 21 September 2017, the increase in votes is 200. There are eight independent variables, including quality, time, words, activity, fans, hot-site, level, and like, as shown in Table 7. The variable quality is measured by the content quality score that comes from our text mining of the content; specifically, it is the probability with which the classifier assigns the review to the high-quality class. We measure time by the difference between the release time of a review of a song and that of the earliest review of the same song. The variable words is measured by the number of words in the review. The variable hot-site is a binary value indicating whether a review was included in the hot list on 20 August 2017: “1” denotes that the review was in the hot list and “0” means that it was not. In addition, through the collection of community data, we measured four commentator variables: level, by the level of the community account; activity, by the indicator of how often the user logs in and browses community content; fans, by the number of fans of the user; and like, by the number of other users the commentator actively follows (e.g., Facebook’s Likes). Table 7 presents the descriptive statistics and correlations of the variables.

3.3.2. Results

We analyzed the sub-dataset with the hierarchical regression method using Stata 14.0 and the ordinary least squares method (OLS). Table 8 shows the results of the regression model between the independent variables and the dependent variable, as shown in the left-most column of Table 7.

To address heteroscedasticity, we applied the logarithmic transform to all variables (except Hot-site) and used heteroscedasticity-robust standard errors in the model. Based on the Variance Inflation Factor test (VIF ranges from 1.3 to 4.9) and the above correlations of variables, we argue that there is no severe multicollinearity.
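The hierarchical procedure, in which blocks of variables are entered one after another and the change in adjusted R² is inspected, can be sketched in plain Python as follows. This is an illustrative stand-in for the Stata analysis: it fits ordinary least squares through the normal equations, it does not compute robust standard errors or VIF, and the variable names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return OLS coefficients (intercept first) and adjusted R-squared."""
    x = [[1.0] + list(r) for r in rows]
    k, n = len(x[0]), len(y)
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return beta, 1 - (1 - r2) * (n - 1) / (n - k)

def hierarchical_regression(data, target, steps):
    """Add each block of variables in turn; report adjusted R2 and its change."""
    used, previous, results = [], None, []
    y = [d[target] for d in data]
    for step in steps:
        used = used + step
        _, adj = ols([[d[v] for v in used] for d in data], y)
        results.append((list(used), adj, None if previous is None else adj - previous))
        previous = adj
    return results

# Toy data in which the outcome depends on both a control and a dummy.
toy = [{"words": w, "hotsite": h, "voting": 1 + 2 * w + 3 * h}
       for w, h in zip([0, 1, 2, 3, 4, 5], [0, 1, 0, 1, 1, 0])]
report = hierarchical_regression(toy, "voting", [["words"], ["hotsite"]])
```

Each entry of `report` lists the variables in the model, its adjusted R², and the gain over the previous step, which is the quantity compared across the columns of a hierarchical regression table.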

We first estimated a basic model that contains time, words, and the four commentator variables. We report the results in column (1) of Table 8. These results are similar to those of previous studies [2,17,30]. Specifically, the effect of content length follows an inverted U curve (Words: β = 0.595, P < 0.001; Words²: β = −0.239, P < 0.001) [17]. The more words a review has, the more information it can present. However, if a review carries too much information, it places an information burden on users and reduces the acceptance of the information [2]. The four commentator variables have different influences on review voting. Fans has a significant positive influence (β = 0.023, t = 8.76, P < 0.001), which is consistent with expectations, because a commentator with more fans has more personal channels through which to receive votes. Activity (β = −0.008, t = −4.52, P < 0.001) and Like (β = −0.012, t = −5.67, P < 0.001) have a significant negative influence, which may indicate that paying attention to external activities does not attract others’ attention to oneself. Additionally, such activity reduces the time available to think and write. Furthermore, because of the weak ties in network communities, people do not pay much attention to others who follow them. Finally, level (β = −0.004, t = −0.88, P > 0.05) has no significant influence. As sharing and listening to music is the main activity in Net Ease Cloud Music, reviewing is relatively unrelated to experience level.

Second, we add quality to the estimation and the results are presented in column (2) of Table 8. We can see that quality has a positive influence on voting (β = 0.036, t = 4.99, P < 0.001). However, the coefficient value is low and the model fitting degree changes very little (Δ {\bar{R}}^{2} = 0.0003, P < 0.001).

Third, we add hot-site to the estimation and the results are shown in column (3) of Table 8. The model fitting degree changes a lot (Δ {\bar{R}}^{2} = 0.1882, P < 0.001) and hot-site has a significant positive influence on voting (β = 2.498, t = 41.28, P < 0.001). Entering the hot list thus raises ln(Voting) by 2.498, which, read directly from the coefficient, corresponds to a 249.8% increase in votes. By contrast, a 1% improvement in quality increases votes by only 0.022%, which indicates that hot-site is more important than quality. In addition, words (β = 0.508, t = 18.01, P < 0.001), time (β = 0.022, t = 34.49, P < 0.001), and fans (β = 0.008, t = 5.00, P < 0.001) have much smaller coefficients than hot-site. Therefore, hot-site is a dominant factor of review voting. This result is easy to understand. The hot-site value of a review is either one (i.e., the review is listed in the hot list) or zero (the review is not listed in the hot list). In our study, if a review is listed in the hot list of the online community “Net Ease Cloud Music” on 20 August 2017, its hot-site value is set to one; otherwise it is set to zero. Generally, the reviews in the hot list are more likely to be viewed and commented on by users, which increases the votes for hot-site reviews. This indicates that the hot-site value of reviews has a highly positive influence on review votes.
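As a side note on reading such coefficients, when the dependent variable is in logarithms, reading a coefficient directly as a percentage is a linear approximation that becomes loose for large values. The sketch below shows the standard exact conversions for a dummy regressor and for a log-transformed regressor. It is offered as a general illustration rather than as a re-analysis of Table 8, and the function names are hypothetical.

```python
import math

def dummy_effect_percent(beta):
    """Exact percentage change in y when a 0/1 regressor switches to 1,
    in a model of the form ln(y) = ... + beta * dummy."""
    return (math.exp(beta) - 1.0) * 100.0

def elasticity_effect_percent(beta, percent_change=1.0):
    """Exact percentage change in y for a given percentage change in x,
    in a model of the form ln(y) = ... + beta * ln(x)."""
    return ((1.0 + percent_change / 100.0) ** beta - 1.0) * 100.0

# For a small coefficient the exact value stays close to 100 * beta ...
small = dummy_effect_percent(0.01)
# ... while for a large coefficient it is far above 100 * beta.
large = dummy_effect_percent(2.498)
# A log-log coefficient is an elasticity: a 1% change in x gives about beta %.
quality = elasticity_effect_percent(0.022)
```

Under this exact reading the gap between the hot-site effect and the quality effect is even wider than the direct reading suggests, so the qualitative conclusion is unchanged.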

What is more, the positive coefficient of time (β = 0.022, t = 34.49, P < 0.001) indicates that earlier reviews gain votes more easily, because users are more likely to be attracted by new songs.

Combined with the distribution of review votes, these results show that reviews with many votes fail to reflect the quality of review content and that this voting mechanism cannot reflect crowd intelligence. Hence, the crowd intelligence paradox exists in review voting in online communities. This finding differs from previous studies [2,30,31], which argued that the quality of reviews was the main factor influencing review voting in trading communities.

In addition, we found that hot-site has a dominant influence on review voting. The distribution of review votes follows an extreme 80/20 rule, and the number of votes a review receives is inconsistent with its quality. The voluntary voting mechanism is intended to recommend high-quality content, but it sometimes promotes low-quality content instead. Thus, the crowd intelligence paradox affects the UGC ecological environment.
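As a rough check of this concentration, the following sketch recomputes the share of votes held by the most-voted reviews from the descriptive statistics in Table 5. Only the figures come from the paper; the variable names are illustrative.

```python
# Figures taken from Table 5 (descriptive statistics of the vote data).
total_reviews = 351_578
total_votes = 3_273_687
top_reviews = 1_997      # reviews with more than 20 votes
top_votes = 2_656_404    # votes received by those reviews

review_share = top_reviews / total_reviews
vote_share = top_votes / total_votes

print(f"{review_share:.2%} of reviews hold {vote_share:.2%} of all votes")
```

With these figures, fewer than 1% of the reviews account for about 81% of all votes, which is far more concentrated than a literal 80/20 split.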

4. Factors Influencing the Crowd Intelligence Paradox

In this section, we analyze why the crowd intelligence paradox exists. Section 3 established that the paradox exists; however, it is more important for enterprises to understand the reasons behind it. Thus, we construct a research model that includes a set of variables to study how different factors influence the crowd intelligence paradox. Compared with previous research, this model explores the causes of the paradox from the perspective of receiver characteristics, on the basis of SDT and signal theory.

4.1. Research Model and Hypotheses

First, according to the organismic integration theory of SDT [36], the internalization of external motivation proceeds in four steps: external regulation, introjection regulation, identification regulation, and integration regulation. Autonomy, competence, and relatedness are three important factors that promote the internalization of external motives and influence actions. Usually, these three needs form an organic whole. For review voting, however, relatedness is the decisive factor; autonomy and competence hardly come into play because of the special environment.

Specifically, social relations are the main part of users’ relational needs in online communities [18]. First, because the community emphasizes user experience, voting does not require operational skill. Second, comments on songs are generally easy to understand, so they do not demand much cognitive effort. Third, people are completely autonomous in voting. As previous studies pointed out, self-efficacy is not related to commenting [19]. Therefore, relatedness is relatively more important. Meanwhile, community points act as an external influence on review voting. Moreover, altruism and herd mentality strengthen this voting behavior, and the tendency toward collectivism in Chinese culture makes this strengthening more significant. That means that, among the three needs, only relatedness affects users’ voting behaviors to some extent. “Readily praise (unconditionally vote)” becomes a custom under this imbalance of voting motives: people use their voting rights in exchange for social satisfaction.

On the other hand, the number of votes is the only signal available in Net Ease Cloud Music, so people tend to judge the quality of content by the number of votes. However, this signal is uninformative because it carries no substantial time cost, operating cost, reputation cost, or cognitive cost [38]. Thus, fake votes influence user behaviors.

Specifically, users can vote immediately while reading a comment, without waiting. Moreover, operating costs are negligible because of the high perceived ease of use: the voting mechanism is made extremely convenient in order to increase community traffic and interactions. In addition, voting is anonymous, so users’ reputations do not change after voting. Furthermore, because of herd mentality, people are not inclined to think about the true value of the content.

Therefore, such low-cost signals cannot be an effective tool for reducing or eliminating information asymmetry [9]; that is, votes cannot represent the quality of the content. Nevertheless, communities usually design rankings according to the number of votes. The paradox arises from this uninformative signal.

In summary, we propose the following variables to study the factors that influence the crowd intelligence paradox: information quality, information credibility, perceived ease of use, and social influence. Based on these variables, we present the voting adoption model (VAM) in Figure 4.

Next, we propose the following hypotheses.

Hypothesis 1 (H1).

Information quality of review significantly increases users’ decision on review usefulness.

Hypothesis 2 (H2).

Information credibility of review significantly increases users’ decision on review usefulness.

Hypothesis 3 (H3).

Perceived ease of use significantly increases users’ decision on review usefulness.

Hypothesis 4 (H4).

Social influence significantly increases users’ decision on review usefulness.

Hypothesis 5 (H5).

Social influence significantly increases users’ decision to provide a vote on the review.

Hypothesis 6 (H6).

Information usefulness significantly increases users’ decision to provide a vote on the review.

4.2. Measurement Model Evaluation

The measurement model evaluation covers three aspects: reliability analysis, convergent validity, and discriminant validity.

Reliability is usually examined by using internal consistency reliability and composite reliability. As shown in Table 9, the internal consistency reliability (Cronbach’s α) of every construct is over 0.70, and the composite reliability (CR) of all constructs is also over 0.70. Therefore, reliability is achieved, which indicates that the variables within the VAM model are internally consistent.

Convergent validity is usually examined by using composite reliability (CR) and average variance extracted (AVE). As shown in Table 9, all items load significantly on their respective constructs and none of the loadings is below the cutoff value of 0.60. The AVE of each variable is over 0.50, except for information usefulness (0.485). Thus, convergent validity is achieved.
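For readers who wish to verify Table 9, CR and AVE can be recomputed from the item loadings. The sketch below uses the standard formulas AVE = mean(λ²) and CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), assuming standardized loadings and uncorrelated measurement errors. The function names are ours, and the loadings are the rounded values printed in Table 9.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: squared sum of loadings over itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Item loadings as reported in Table 9.
table9_loadings = {
    "Information quality": [0.77, 0.83, 0.81],
    "Information credibility": [0.86, 0.93, 0.81],
    "Perceived ease of use": [0.76, 0.83, 0.71],
    "Social influence": [0.78, 0.71, 0.66],
    "Information usefulness": [0.68, 0.72, 0.69],
    "Vote adoption": [0.75, 0.93, 0.79],
}

for construct, loadings in table9_loadings.items():
    cr = composite_reliability(loadings)
    ave = average_variance_extracted(loadings)
    print(f"{construct}: CR = {cr:.3f}, AVE = {ave:.3f}")
```

Applied to the rounded loadings, these formulas agree with the CR and AVE columns of Table 9 to within about 0.001.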

Discriminant validity examines whether a construct is distinct from the other constructs in the model. As shown in Table 10, the square root of the AVE of each variable is greater than its correlation coefficients with the other variables. Thus, discriminant validity is achieved.
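This comparison of the square root of AVE with the inter-construct correlations is commonly known as the Fornell-Larcker criterion. A minimal sketch of the check, using the AVE values from Table 9 and the correlations from Table 10, is given below; the function and variable names are illustrative.

```python
import math

# AVE values from Table 9, keyed by the abbreviations used in Table 10.
ave_values = {"SI": 0.516, "IC": 0.754, "EOU": 0.590, "IQ": 0.646, "IU": 0.485, "A": 0.684}

# Off-diagonal correlations from Table 10.
correlations = {
    ("IC", "SI"): 0.625,
    ("EOU", "SI"): 0.438, ("EOU", "IC"): 0.405,
    ("IQ", "SI"): 0.396, ("IQ", "IC"): 0.23, ("IQ", "EOU"): 0.282,
    ("IU", "SI"): 0.667, ("IU", "IC"): 0.598, ("IU", "EOU"): 0.562, ("IU", "IQ"): 0.329,
    ("A", "SI"): 0.693, ("A", "IC"): 0.473, ("A", "EOU"): 0.363, ("A", "IQ"): 0.289,
    ("A", "IU"): 0.583,
}


def fornell_larcker_violations(ave, corr):
    """Return the pairs whose correlation is not below both square roots of AVE."""
    root = {name: math.sqrt(value) for name, value in ave.items()}
    return [
        (a, b, r)
        for (a, b), r in corr.items()
        if abs(r) >= min(root[a], root[b])
    ]


violations = fornell_larcker_violations(ave_values, correlations)
print("violations:", violations)
```

The closest case is the correlation between social influence and vote adoption (0.693), which remains below the square root of the AVE of social influence (0.718).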

4.3. Structural Model Evaluation

Next, we perform structural model evaluation using AMOS 23.0. The results of hypothesis verification are presented in Table 11, and Table 12 shows the goodness of fit.

Five hypotheses are statistically significant, while one hypothesis is not.

H1 states that the information quality significantly increases users’ decision on review usefulness, which is not supported (b = 0.04, p > 0.5).

H2 states that information credibility significantly increases users’ decision on review usefulness, which is supported (b = 0.24, p < 0.001).

H3 states that perceived ease of use significantly increases users’ decision on review usefulness, which is supported (b = 0.29, p < 0.001).

H4 states that social influence significantly increases users’ decision on review usefulness, which is supported (b = 0.37, p < 0.001).

H5 states that social influence significantly increases users’ decision to provide a vote on the review, which is supported (b = 0.55, p < 0.001).

H6 states that information usefulness significantly increases users’ decision to provide a vote on the review, which is supported (b = 0.22, p < 0.001).

Additionally, to test the mediating effect of information usefulness in the model, we use the bootstrapping method with 2000 samples [46]. At the 95% confidence level, information usefulness mediates the relationship between information quality and vote adoption (0.057, 0.218) and between perceived ease of use and vote adoption (0.160, 0.520). The mediating effect of information usefulness is thus confirmed, which means that voting for a hot review usually results from perceived information usefulness.
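The logic of a bootstrap test of mediation can be sketched as follows. The Python code below estimates an indirect effect a·b with two ordinary least squares regressions and builds a percentile confidence interval from 2000 resamples. It is a simplified illustration on synthetic data: the variable names and effect sizes are assumptions, and it is not the SEM-based procedure run in AMOS for this study.

```python
import numpy as np


def indirect_effect(x, m, y):
    """Indirect effect a*b: a from m ~ x, b from y ~ x + m (both OLS)."""
    a = np.polyfit(x, m, 1)[0]  # slope of the mediator on the predictor
    design = np.column_stack([np.ones_like(x), x, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return a * coef[2]  # coef[2] is the mediator's slope, controlling for x


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(estimates, [tail, 100 - tail])
    return float(lo), float(hi)


# Synthetic data (not the survey data): x -> m -> y with a true indirect effect of 0.3.
rng = np.random.default_rng(1)
n = 244  # same size as the survey sample in Table 4
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)

lo, hi = bootstrap_ci(x, m, y)
print(f"95% percentile bootstrap CI for the indirect effect: ({lo:.3f}, {hi:.3f})")
```

Mediation is taken as supported when the interval excludes zero, which is how the two intervals reported above are read.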

5. Discussions and Suggestions

5.1. Discussions

According to the data analysis in Section 4, the survey data confirm that information credibility (reliability), perceived ease of use, and social influence are the main factors influencing review voting. However, a relationship between voting behavior and text quality was not supported. The imbalance of voting motives and the failure of voting signals may explain why information quality has no effect. Therefore, the analysis supports our assumption on the causes of the crowd intelligence paradox.

Specifically, strong social needs suppress the role of autonomy and competence. An economic actor is embedded in a social structure [47]. In online communities, everyone is deeply embedded in a community environment. Users constantly collect information and make decisions while interacting with one another, and then consciously or unconsciously make choices under the influence of others. When people vote on a song, they are implicitly connected with other users who follow the song. As the voters and the followers of a song are supposed to have similar interests, they are likely to interact with each other in the online music community. The quality of a review therefore matters little. The satisfaction of social needs is much greater than the satisfaction derived from the cognitive value of a review [19]. At the same time, collective praise creates strong pressure to identify with the group. Voting for a hot review is a way to escape the pressure of judgment. Therefore, “Readily praise (unconditionally vote)” becomes a custom under an imbalance of voting motives.

Information credibility has a significant impact, which is similar to previous studies. The reliability of the information source (as a peripheral route) has an impact on information receivers [30]. However, compared with trading communities, online communities lack heterogeneous signals. Users often assess reliability by relying on the number of previous votes. Because low-cost signals cannot be an effective tool for reducing or eliminating information asymmetry, the number of votes cannot represent the quality of content. A comment of relatively low or even no value may receive the majority of votes under the fake signal. This may mislead users, which is similar to the bias of ratings [3]. The principle can also be observed through a counterexample. Stack Overflow (https://stackoverflow.com/) is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers. Its voting mechanism works relatively well. Because the knowledge exchanged there is more professional and standardized, users need to spend time and mental effort before voting. Given their professional and technical literacy, users also pay more attention to their reputations. Therefore, the cost of voting is high and the signal is relatively good.

5.2. Suggestions for the Sustainable Development of Online Communities

Based on the empirical research, we provide some research implications and advice for the sustainable development of online communities.

We suggest increasing the cost of the voting signal. For example, a minimum time interval between votes can be set so that a user cannot vote for three reviews within one second. This increases the selection cost and the time cost while potentially leaving normal voting unaffected. A negative vote can also be introduced to correct the vote total by prompting users to evaluate content actively and subjectively. To avoid discouraging publishers, negative votes could be visible only to the recipient and not displayed publicly.
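One possible way to realize such a time interval is a per-user cooldown. The sketch below is purely illustrative: the class name `VoteThrottle`, the one-second default, and the injected clock are hypothetical choices of ours and do not describe any existing community platform.

```python
import time
from typing import Callable, Dict


class VoteThrottle:
    """Hypothetical per-user cooldown: at most one vote every `min_interval` seconds."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_vote: Dict[str, float] = {}

    def try_vote(self, user_id: str) -> bool:
        """Return True and record the vote if the user's cooldown has elapsed."""
        now = self.clock()
        last = self._last_vote.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted vote
        self._last_vote[user_id] = now
        return True


# Usage with a fake clock so that the behaviour is reproducible.
fake_now = [0.0]
throttle = VoteThrottle(min_interval=1.0, clock=lambda: fake_now[0])
results = []
for t in (0.0, 0.3, 0.6, 1.5):
    fake_now[0] = t
    results.append(throttle.try_vote("user-1"))
print(results)
```

In this sketch, rapid repeat votes from the same user are rejected, while a vote cast after the interval has elapsed is accepted, so ordinary voting is not affected.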

We suggest setting up multiple signals for review voting in online communities. Because signals are scarce, users usually judge the value of content by the number of votes. Multiple signals can weaken the strength of the voting signal. For example, a community can assign a preset score to content (computed by an algorithm) and then encourage people to revise the score independently. Because changing the score requires some ability, it may enhance the satisfaction of autonomy and competence and, in turn, reduce the impact of social voting motives.

It is advisable to provide behavioral data, especially voting records, in the review voting environment. For example, a community can list a user’s voting history together with feedback on each vote, such as how many people voted for the same content. This may lead users to reflect on their voting behavior and strengthen the needs for autonomy and competence.

Enterprises need to avoid traffic traps. A key indicator of community development is user traffic. Community administrators are accustomed to regarding a large number of frequent voting interactions as a sign of community activity; however, this ignores the harm of “Readily praise (unconditionally vote)”. Content publishers often receive much praise but little substantive feedback. Such long-term, frequent, and ineffective interaction dampens publishers’ enthusiasm. Community managers should carefully avoid traffic traps and maintain the UGC ecological environment. Additionally, paying more attention to traffic than to quality is an extensive development pattern. At early stages, massive numbers of users provide natural resources for online communities, and network externality strengthens the power of the crowd. However, as the Internet matures, the demographic dividend is gradually disappearing. Refined management will become a feasible way to gain competitive advantages.

6. Conclusions and Future Work

In this paper, we showed what the crowd intelligence paradox of review voting in online communities is and why it exists. We first verified that the distribution of review votes in online communities follows the 80/20 rule and that only a few comments receive many votes. Then, we showed that reviews with many votes are not necessarily high-quality comments. They fail to reflect the quality of review content, and the voting mechanism cannot reflect crowd intelligence, which means that there is a crowd intelligence paradox in online communities. Together, these two findings quantitatively establish the existence of the crowd intelligence paradox in online communities.

Furthermore, we constructed a new research model called VAM (voting adoption model) to uncover the imbalance of voting motives and the failure of voting signals. The survey data confirm that information credibility (reliability), perceived ease of use, and social influence are the main factors influencing review voting. Based on SDT, strong social needs suppress the role of autonomy and competence. The satisfaction of social needs is much greater than the satisfaction derived from the cognitive value of a review. Therefore, “Readily praise (unconditionally vote)” becomes a custom under an imbalance of voting motives. On the other hand, because low-cost signals cannot be an effective tool for reducing or eliminating information asymmetry, the number of votes cannot represent the quality of content. A review of relatively low or even no value may receive the majority of votes under the fake signal.

One limitation of this study is that the manual annotation might not reflect users’ real intentions, because the volunteers who performed the annotation differ from the users of online communities. This is a persistent issue in classification and other text mining tasks, and in the future we will explore other feasible approaches that yield results closer to natural voting. Another direction for future work is to investigate more online communities to verify the crowd intelligence paradox. In addition, we will explore applying the research results to a real online community to improve the efficiency and effectiveness of online businesses.

Author Contributions

J.Z.: Conceptualization, Project administration, Supervision, and Writing—original draft. J.W.: Data curation and Formal analysis. S.F.: Investigation and Software. P.J.: Funding acquisition, Supervision, and Writing—review & editing.

Funding

This paper is partially supported by the National Science Foundation of China (Nos. 71273010 & 61672479) and the Doctor Start-up Fund of Anhui University.

Acknowledgments

We would like to thank the editors and anonymous reviewers for their suggestions and comments to improve the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Choi, B.; Lee, I. Trust in open versus closed social media: The relative influence of user-and marketer-generated content in social network services on customer trust. Telemat. Inform. 2017, 34, 550–559.
2. Kuan, K.; Hui, K.L.; Prasarnphanich, P.; Lai, H.Y. What makes a review voted? An empirical investigation of review voting in online review systems. J. Assoc. Inf. Syst. 2015, 16, 48–71.
3. Hu, N.; Pavlou, P.; Zhang, J. On self-selection biases in online product reviews. MIS Q. 2017, 41, 449–471.
4. Vickery, G.; Wunsch-Vincent, S. Participative Web and User-Created Content: Web 2.0 Wikis and Social Networking; Organization for Economic Cooperation and Development (OECD): Paris, France, 2007.
5. Krishnamurthy, S.; Dou, W. Note from special issue editors: Advertising with user-generated content: A framework and research agenda. J. Interact. Advert. 2008, 8, 1–4.
6. Armstrong, A.; Hagel, J. The real value of online entertainment communities. Knowl. Communities 2000, 74, 85–95.
7. Wang, W.; Li, Y. How trust and need satisfaction motivate producing user-generated content. J. Comput. Inf. Syst. 2017, 57, 49–57.
8. Li, G.; Yang, X. Effects of social capital and community support on online community members’ intention to create user-generated content. J. Electron. Commer. Res. 2014, 15, 190.
9. Luo, X.; Gu, B.; Zhang, J.; Phang, C. Expert Blogs and Consumer Perceptions of Competing Brands. MIS Q. 2017, 41, 371–395.
10. Shriver, S.; Nair, H.; Hofstetter, R. Social ties and user-generated content: Evidence from an online social network. Manag. Sci. 2013, 59, 1425–1443.
11. Christodoulides, G.; Jevons, C.; Bonhomme, J. Memo to marketers: Quantitative evidence for change. J. Advert. Res. 2012, 52, 53–64.
12. Zhao, J.; Wang, X.; Jin, P. Feature selection for event discovery in social media: A comparative study. Comput. Hum. Behav. 2015, 51, 903–909.
13. Jin, P.; Mu, L.; Zheng, L.; Zhao, J.; Yue, L. News feature extraction for events on social network platforms. In Proceedings of the Companion Proceedings of the 26th International Conference on World Wide Web Companion (WWW’17), Perth, Australia, 3–7 April 2017; pp. 69–78.
14. Yang, Y.; Wang, X.; Guan, T.; Shen, J.; Yu, L. A multi-dimensional image quality prediction model for user-generated images in social networks. Inf. Sci. 2014, 281, 601–610.
15. Ahn, D.; Duan, J.; Mela, C. Managing user-generated content: A dynamic rational expectations equilibrium approach. Mark. Sci. 2015, 35, 284–303.
16. Chen, J.; Xu, H.; Whinston, A.-B. Moderated online entertainment communities and quality of user-generated content. J. Manag. Inf. Syst. 2011, 28, 237–268.
17. Mudambi, S.; Schuff, D. What makes a helpful online review? A study of customer reviews on amazon.com. MIS Q. 2010, 34, 185–200.
18. Hennig-Thurau, T.; Gwinner, K.P.; Walsh, G.; Gremler, D.D. Electronic word-of-mouth via consumer-opinion platforms: What motivates consumers to articulate themselves on the internet? J. Interact. Mark. 2004, 18, 38–52.
19. Cheung, C.; Lee, M. What drives consumers to spread electronic word of mouth in online consumer-opinion platforms. Decis. Support Syst. 2012, 53, 218–225.
20. Luan, J.; Yao, Z.; Zhao, F.T.; Liu, H. Search product and experience product online reviews: An eye-tracking study on consumers’ review search behavior. Comput. Hum. Behav. 2016, 65, 420–430.
21. Jabr, W.; Zheng, Z. Know yourself and know your enemy: An analysis of firm recommendations and consumer reviews in a competitive environment. MIS Q. 2014, 38, 635–654.
22. Wang, X.; Li, Y. Trust, psychological need, and motivation to produce user-generated content: A self-determination perspective. J. Electron. Commer. Res. 2014, 15, 241–253.
23. Erkan, I.; Evans, C. The influence of eWOM in social media on consumers’ purchase intentions: An extended approach to information adoption. Comput. Hum. Behav. 2016, 61, 47–55.
24. Xu, X.; Yao, Z. Understanding the role of argument quality in the adoption of online reviews: An empirical study integrating value-based decision and needs theory. Online Inf. Rev. 2015, 39, 885–902.
25. Fan, Y.; Miao, Y.; Fang, Y.; Lin, R. Establishing the adoption of electronic word-of-mouth through consumers’ perceived credibility. Int. Bus. Res. 2013, 6, 58–65.
26. Wang, F.; Liu, X.F.; Fang, E. User reviews variance, critic reviews variance, and product sales: An exploration of customer breadth and depth effects. J. Retail. 2015, 91, 372–389.
27. Korfiatis, N.; García-Bariocanal, E.; Sánchez-Alonso, S. Evaluating content quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content. Electron. Commer. Res. Appl. 2012, 11, 205–217.
28. Moore, S. Attitude predictability and helpfulness in online reviews: The role of explained actions and reactions. J. Consum. Res. 2015, 42, 30–44.
29. Zhu, F.; Zhang, X. Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. J. Mark. 2010, 74, 133–148.
30. Cheung, M.; Sia, C.; Kuan, K. Is this review believable? A study of factors affecting the credibility of online consumer reviews from an ELM perspective. J. Assoc. Inf. Syst. 2012, 13, 618.
31. Cao, Q.; Duan, W.; Gan, Q. Exploring determinants of voting for the “helpfulness” of online user reviews: A text mining approach. Decis. Support Syst. 2011, 50, 511–521.
32. Chaiken, S. The heuristic model of persuasion. Ont. Symp. Soc. Influ. 1987, 5, 3–39.
33. Cacioppo, J.; Petty, R.; Kao, C.; Rodriguez, R. Central and peripheral routes to persuasion: An individual difference perspective. J. Pers. Soc. Psychol. 1986, 51, 1032–1043.
34. Sussman, S.; Siegal, W. Informational influence in organizations: An integrated approach to knowledge adoption. Inf. Syst. Res. 2003, 14, 47–65.
35. Knoll, J. Advertising in social media: A review of empirical evidence. Int. J. Advert. 2016, 35, 266–300.
36. Deci, E.; Eghrari, H.; Patrick, B.; Leone, D. Facilitating internalization: The self-determination theory perspective. J. Pers. 1994, 62, 119–142.
37. Zhao, J.; Fang, S.; Jin, P. Modeling and quantifying user acceptance of personalized business modes based on TAM, trust and attitude. Sustainability 2018, 10, 356.
38. Connelly, B.; Certo, S.; Ireland, R.; Reutzel, C. Signaling theory: A review and assessment. J. Manag. 2011, 37, 39–67.
39. Reyes, A.; Rosso, P. Making objective decisions from subjective data: Detecting irony in customer reviews. Decis. Support Syst. 2012, 53, 754–760.
40. Gefen, D.; Karahanna, E.; Straub, D. Trust and TAM in online shopping: An integrated model. MIS Q. 2003, 27, 51–90.
41. Park, D.; Lee, J.; Han, I. The effect of on-line consumer reviews on consumer purchasing intention: The moderating role of involvement. Int. J. Electron. Commer. 2007, 11, 125–148.
42. Prendergast, G.; Ko, D.; Yin, V. Online word of mouth and consumer purchase intentions. Int. J. Advert. 2010, 29, 687–708.
43. Wu, Y.; Zhou, C.; Chen, M. Human comment dynamics in on-line social systems. Phys. A Stat. Mech. Appl. 2010, 389, 5832–5837.
44. Clauset, A.; Shalizi, C.; Newman, M. Power-law distributions in empirical data. SIAM Rev. 2009, 51, 661–703.
45. Alstott, J.; Bullmore, E.; Plenz, D. Powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE 2014, 9, e85777.
46. Preacher, K.; Rucker, D.; Hayes, A. Addressing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivar. Behav. Res. 2007, 42, 185–227.
47. Granovetter, M. Economic action and social structure: The problem of embeddedness. Am. J. Sociol. 1985, 91, 481–510.

Figure 1. The process of reviews classification.

Figure 2. Distribution of votes (0, 20).

Figure 3. The power law distribution of the dataset.

Figure 4. The voting adoption model (VAM).

Table 1. Data crawling format.

ID	Time	Votes	Hot Site	Content
335940343 ……	2017/8/27 21:04 ……	13,051 ……	1 ……	The music is g ………
ID	Activity	Like	Fans	Level
335940343 ……	46 ……	17 ……	24,871 ……	7 ……

Table 2. Classification of reviews.

	Precision	Recall	F1-Score	Support		Precision	Recall	F1-Score	Support
BernoulliNB’s accuracy is 0.96					MultinomialNB’s accuracy is 0.91
low	1.00	0.93	0.96	3661	low	0.89	0.92	0.91	3291
high	0.92	1.00	0.96	3139	high	0.92	0.89	0.91	3509
ratio	0.96	0.96	0.96	6800	ratio	0.91	0.91	0.91	6800
Logistic Regression’s accuracy is 0.96					SVM’s accuracy is 0.71
low	0.98	0.93	0.96	3579	low	1.00	0.63	0.77	5359
high	0.93	0.98	0.96	3221	high	0.42	0.99	0.59	1441
ratio	0.96	0.96	0.96	6800	ratio	0.87	0.71	0.73	6800

Table 3. Variables used in the VAM model.

Variable	Item
Perceived ease of use [40]	EOU1: The vote function is easy to use.
	EOU2: It is easy to become skillful at using the vote function.
	EOU3: It is easy to interact with the vote function.
Social influence	SI1: Voting on the hot review will help the announcer.
	SI2: I like to vote the content for connecting other people
	SI3: There are many people visiting what I vote.
Information quality [41]	IQ1: I think they are understandable.
	IQ2: I think they are clear.
	IQ3: In general, I think the quality of them is high.
Information credibility [42]	IC1: I think they are convincing.
	IC2: I think they are credible.
	IC3: I think they are familiar.
Information usefulness [23]	IU1: I think they are generally useful.
	IU2: I think they are generally informative.
	IU3: I think they are generally appreciative.
Vote adoption [23]	A1: It is very likely that I will vote the review.
	A2: I will definitely vote the review.
	A3: I will recommend the review to my friends.

Table 4. Sample characteristics.

Measure	Gender		Age				Years of Registration
Measure	Male	Female	18–23	24–28	29–33	34–39	1	2	3	4
Frequency	132	112	113	83	22	26	29	31	85	99
Percentage (%)	54.1	45.9	46.3	34.03	9.02	10.66	11.89	12.7	34.84	40.57

Table 5. Descriptive statistics of the vote data.

Vote	Mean	Median	Max	Min	Sum	Standard Deviation	Reviews#
[0, 20]	1.76	1	20	0	617,283	2.18	349,581
[21, 125,381]	1330.19	64	125,381	21	2,656,404	6485.78	1997
All	9.31	1	125,381	0	3,273,687	476.66	351,578

Table 6. The power law index table over the dataset.

Power Law Index
Vote range	[1, 303]	[16,068, 125,381]
power law index	−1.31	−2.68
Sig.	0.16	0.057
Logarithmic Likelihood Ratio Compared with Other Distributions *
	R, p	R, p
Exponential	1173.38, 0.0	3.74, 0.00
Log-normal	−40.78, 0.0	−0.29, 0.44
Truncated power law	−22.50, 1.96	−0.57, 0.28
Stretched exponential	−45.94, 0.0	−0.35, 0.39

* R is logarithmic likelihood ratio of two distributions. When R is positive, then the first distribution is preferred; otherwise, we should consider the second distribution. P is the significance.

Table 7. Correlations and descriptive statistics over the vote data.

Variables	Type	1	2	3	4	5	6	7	8	9
1. Voting	Dependent	1
2. Quality	Independent	0.10	1.00
3. Time	Independent	0.07	0.09	1.00
4. Words	Independent	0.13	0.58	0.03	1.00
5. Activity	Independent	−0.01	−0.02	−0.10	−0.02	1.00
6. Fans	Independent	0.02	−0.02	−0.08	−0.04	0.57	1.00
7. Hot-site	Independent	0.44	0.03	−0.02	0.04	0.02	0.06	1.00
8. Level	Independent	0.00	−0.01	−0.01	−0.03	0.38	0.48	0.01	1.00
9. Like	Independent	−0.02	−0.04	−0.14	−0.06	0.55	0.56	0.01	0.37	1.00
Mean		2.49	0.27	54.81	23.03	13.07	37.43	0	5.99	21.47
Max		11,392	1	948.81	2.5	5248	136,089	1	10	2333
Min		0	0	0	1	0	0	0	0	0
Std. Dev.		88.74	0.39	170.57	26.18	69.53	795.7	0.04	1.9	65.81

Table 8. Results of the regression model over the vote data.

Variable	(1)	(2)	(3)
Voting	−0.085 (−11.59)	−0.075 (−10.30)	−0.067 (−10.44)
Level	−0.004 (−0.88)	−0.004 (−0.97)	0.004 (1.13)
Activity	−0.008 (−4.94)	−0.008 (−4.94)	−0.006 (−4.52)
Fans	0.023 (8.76)	0.023 (8.79)	0.008 (5.00)
Like	−0.012 (−5.67)	−0.019 (−5.85)	−0.007 (−4.33)
Time	0.020 (25.93)	0.020 (25.40)	0.022 (34.49)
Words	0.595 (18.8994)	0.557 (17.61)	0.508 (18.01)
Words²	−0.239 (−17.4796)	−0.225 (−16.37)	−0.205 (−16.71)
Quality		0.036 (4.99)	0.022 (3.47)
Hot site			2.498 (41.28)
${\bar{R}}^{2}$	0.0264	0.0267	0.2149
$Δ {\bar{R}}^{2}$	0.0264	0.0003	0.1882
Prob (F > 0)	0.0000	0.0000	0.0000

Table 9. Factor loadings, CR and AVE values.

Variable	Item	Cronbach’s α	Item Loading	CR	AVE
Information quality	IQ1	0.844	0.77	0.845	0.646
	IQ2		0.83
	IQ3		0.81
Information credibility	IC1	0.900	0.86	0.901	0.754
	IC2		0.93
	IC3		0.81
Perceived ease of use	EOU1	0.807	0.76	0.811	0.590
	EOU2		0.83
	EOU3		0.71
Social influence	SI1	0.770	0.78	0.761	0.516
	SI2		0.71
	SI3		0.66
Information usefulness	IU1	0.710	0.68	0.739	0.485
	IU2		0.72
	IU3		0.69
Vote adoption	A1	0.854	0.75	0.865	0.684
	A2		0.93
	A3		0.79

Table 10. Correlation matrix of the key variables.

Variable	SI	IC	EOU	IQ	IU	A
Social influence (SI)	0.718
Information credibility (IC)	0.625	0.868
Perceived ease of use (EOU)	0.438	0.405	0.768
Information quality (IQ)	0.396	0.23	0.282	0.804
Information usefulness (IU)	0.667	0.598	0.562	0.329	0.696
Vote adoption (A)	0.693	0.473	0.363	0.289	0.583	0.827

Note: Diagonal elements are the square root of AVE for each variable.

Table 11. Results of hypotheses verification.

Hypotheses	Relationship	Standard Regression Coefficient	CR	p
H1	Information quality→ Information usefulness	0.04	0.613	0.54
H2	Information credibility→ Information usefulness	0.24	2.598	<0.001
H3	Perceived ease of use→ Information usefulness	0.29	3.545	<0.001
H4	Social influence→ Information usefulness	0.37	3.377	<0.001
H5	Social influence→ Vote adoption	0.55	2.159	<0.001
H6	Information usefulness→ Vote adoption	0.22	5.058	<0.01

Table 12. Goodness of fitting.

Item	Value
$χ^{2} / d . f .$	1.434
Goodness-of-fit index (GFI)	0.928
Adjusted GFI (AGFI)	0.900
Comparative fit index (CFI)	0.975
RMSEA	0.042

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

