In this section, we first describe the dataset and variables. Then, we analyze the topological properties of the co-purchase network from both global and local perspectives. Lastly, we provide ERGM diagnostics and estimation to discuss the factors that affect the formation of the observed co-purchase network.
4.3. Concentration Effect of the Co-Purchase Network
A high in-degree centrality indicates a product’s popularity: the product is sold with many other products, which can drive more traffic to its webpage. On the other hand, a high out-degree centrality means that a larger number of other products are directly accessible from the product’s webpage, which may indicate a relationship between an original product and its accessories.
Figure 4 shows the correlation between these degree centralities. We observe a smaller variation of the in-degree centrality across the products compared with the out-degree centrality, but both distributions are similar, as seen in Table 4 and Figure 4. However, Table 5 shows that the data of the in-degree centrality do not follow a power-law distribution from a statistical inference perspective. To examine the intrinsic concentration effect of product node connectivity, or the mutuality effect, we conducted a Pearson correlation test and found that the in-degree centrality and out-degree centrality were positively correlated (correlation value = 0.28, p-value = 2.2 × 10^−16 < 0.001). These results imply the existence of an intrinsic concentration effect in the network, where products have mutual co-purchase relations.
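The correlation test between in-degree and out-degree centrality can be sketched as follows. This is a minimal illustration only: the edge list is hypothetical toy data, not the dataset analyzed here, and the function name `pearson_r` is an assumption introduced for the example.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical directed co-purchase links (source product -> target product).
edges = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
    ("b", "c"), ("d", "a"), ("a", "d"), ("e", "a"),
]

# Out-degree counts outgoing links; in-degree counts incoming links.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)
products = sorted(set(out_degree) | set(in_degree))

in_deg = [in_degree[p] for p in products]
out_deg = [out_degree[p] for p in products]

r = pearson_r(in_deg, out_deg)

# t statistic for H0: r = 0, with n - 2 degrees of freedom.
n = len(products)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t_stat:.3f}, df = {n - 2}")
```

The p-value would then be read from a Student t distribution with n − 2 degrees of freedom, which a statistics package normally reports directly.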
Besides the intrinsic concentration effect, we aim to study the relationship between a product’s aggregate sales volume and its co-purchasability. The concept of homophily suggests that consumers tend to purchase products that are similar to those that they have previously purchased [
1,
2]. This phenomenon is known as co-purchasing and has been extensively studied in network research [
13]. To test the potential directionality in a co-purchase network, we use the degree centralities to represent the extent to which consumers purchased a product i with other products (i.e., product i’s co-purchasability) and the strengths to represent the aggregate sales volume of product i within its shared co-purchase relations (i.e., the strength of product i’s connected co-purchase links). Prior studies have also shown that the degree centrality of a node (in this case, a product) is a good indicator of its importance within the network [
15,
31,
42]. The degree centrality represents the number of connections a node has with other nodes. Previous research has found that the sales volume of a product is correlated with degree centrality [
8,
9,
10]. In other words, products that have a higher degree centrality are likely to be more important within the co-purchase network. Therefore, it is reasonable to hypothesize that products with a higher degree centrality (i.e., more co-purchases) will also have a higher total sales volume. Specifically, we posit that a product’s co-purchasability is positively correlated with the aggregate sales volume between itself and its connected products. To examine this hypothesis, we performed a Pearson correlation test:
H0: The in-degree centrality (out-degree centrality) of a product is independent of its in-strength (out-strength).
H1: The in-degree centrality (out-degree centrality) of a product is correlated with its in-strength (out-strength).
As shown in
Figure 5, we can see a close linear relationship between the degree centralities and strengths. Through a Pearson correlation analysis, we found that hypothesis H1 is supported in terms of the incoming links (correlation value = 0.852,
p-value < 0.001 in
Figure 5a) and outgoing links (correlation value = 0.855,
p-value < 0.001 in
Figure 5b), respectively. This means that products with higher degree centralities have larger aggregate sales volumes, which demonstrates the directed rich-get-richer concentration effect in co-purchase behavior. Studies of network effects in online markets have shown that products with a higher centrality tend to have a higher sales volume [
8,
9,
10]. These studies suggest that products with a higher centrality benefit from a rich-get-richer effect, where they attract more customers due to their prominent position in the network. Furthermore, we analyzed the directed relationship between the degree centrality and sales volume in the context of a co-purchase network, which focuses on co-consideration and co-purchasing instead of co-engagement [
15,
31,
42]. The findings enhance our understanding of the directed rich-get-richer effect by examining the dyadic relationships between the products in terms of their co-purchase behavior.
We also note that the product category affects both the intrinsic and rich-get-richer concentration effects. For example, the products belonging to the second category, shown as triangle-shaped points in Figure 4 and Figure 5, have more intense concentration effects, as these points lie above the averages (as indicated by the slopes of the approximate linear regressions) in Figure 5a,b. We therefore control for the effect of the product category in the ERGM diagnostics and estimation.
4.4. ERGM Diagnostics and Estimation
In total, three ERGMs are proposed with different specifications under a paradigm of hierarchical regression-like analysis [
34,
42,
51]. Model A is a basic model that includes only two variables: edges (explaining density) and product category (explaining cross-category purchase patterns). Model B is an extended model that adds the network topological attributes to investigate the bilateral relationship. Model C is a refined model that includes all the variables and emphasizes the impact of eWOM. The models are fitted with the MCMCMLE method, and several model selection criteria, including the log-likelihood and the AIC (Akaike information criterion), are used to select the most parsimonious model [
33,
34].
The fitting processes of Models A to C all converged. According to the model selection criteria [
33,
34] and the similar research process [
37,
42], the lower AIC value and higher log-likelihood value from Model A to Model C in
Table 6 indicate that the improvement in model fit is due to the addition of the network topological structure and eWOM. Model C is selected as the best-fitting model, with the maximum log-likelihood (−41,538.06) and the minimum AIC (83,112). Furthermore, empirical research considers the coefficient estimation to be robust if the signs of the parameters are consistent in the hierarchical regression analyses [
42,
51]. The proposed ERGM is a data-augmentation explanatory model whose interpretability supports regression-like analyses [
34,
42]. In
Table 6, we observe that both the signs and the significance levels of the estimates are consistent from Models A to C, suggesting that the impacts of the topological structure, product category, and online reviews on co-purchase are robust. Overall, Model C is considered to be the best-fitting model to study the factors that affect the formation of the co-purchase network.
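The relation between the reported log-likelihood and AIC of Model C can be checked with the standard definition AIC = 2k − 2 ln L. In the sketch below, the number of parameters k is not stated in this section; it is back-derived from the two reported values and should be read as an assumption.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


# Values reported for Model C.
log_lik_c = -41538.06
reported_aic_c = 83112

# Assumption: k is inferred from the reported values, not taken from Table 6.
implied_k = (reported_aic_c + 2 * log_lik_c) / 2
n_params = round(implied_k)

print(f"implied number of parameters: {implied_k:.2f} -> {n_params}")
print(f"AIC recomputed with k = {n_params}: {aic(log_lik_c, n_params):.2f}")
```

Because the AIC adds a penalty of 2 per parameter, a richer model is preferred only when its gain in log-likelihood exceeds the number of parameters it adds.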
Edge term. The Edge term in our directed co-purchase network refers to the connections between the nodes. The network has 1825 nodes, meaning that it can contain at most 1825 × (1825 − 1) = 3,328,800 directed links. In reality, it only has 5731 links, resulting in a density of 0.172% (5731/3,328,800). This suggests that the network is sparse. As demonstrated in
Table 6, Model C’s coefficient estimate reveals a negative and significant value of −3.671 (
p-value < 0.01) for the Edge term. This significance suggests that co-purchase relationships are not likely to occur randomly between the products. Instead, it reflects the genuine preferences and associations among the consumers and products.
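In an ERGM, the Edge coefficient is the baseline log-odds of a link when all other change statistics are zero. The sketch below converts the reported estimate into a conditional probability with the logistic function; the function name is an assumption introduced for the example.

```python
import math


def logistic(log_odds: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


EDGE_COEF = -3.671  # Edge term estimate of Model C (Table 6)

# Conditional probability of a link when every other term contributes zero.
baseline_probability = logistic(EDGE_COEF)
print(f"baseline link probability: {baseline_probability:.4f}")
```

This is a conditional probability for a hypothetical dyad with no other effects active, so it need not equal the observed density of the network.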
Cross-category co-purchase. Additionally, we observe differential homophily effects in consumer cross-category purchasing behaviors. Our results indicate that, when consumers purchase products in categories #1 (0.319, p-value < 0.01) or #5 (0.280, p-value < 0.01), they tend to subsequently purchase products within the same category. On the other hand, if a product belongs to categories #4 (−0.253, p-value < 0.01) or #6 (−0.335, p-value < 0.01), then consumers are more likely to co-purchase it with a product from a different category.
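The category estimates are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The following sketch applies this to the four estimates quoted above, holding all other terms fixed; the dictionary layout is an assumption made for illustration.

```python
import math

# Same-category estimates quoted in the text (category number -> coefficient).
homophily_coef = {1: 0.319, 5: 0.280, 4: -0.253, 6: -0.335}

# exp(coefficient) is the multiplicative change in the odds of a link.
odds_ratio = {cat: math.exp(coef) for cat, coef in homophily_coef.items()}

for cat in sorted(odds_ratio):
    change = (odds_ratio[cat] - 1.0) * 100.0
    print(f"category #{cat}: odds ratio = {odds_ratio[cat]:.3f} ({change:+.1f}%)")
```

An odds ratio above 1 corresponds to a tendency toward same-category co-purchase, and a ratio below 1 to a tendency toward cross-category co-purchase.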
To further investigate this co-purchasing behavior across the categories, we developed a new model based on Model C. This new model replaced the 7 product category variables with 49 cross-category variables (7 × 7), while keeping all the other settings the same. The significant results are presented in
Table 7. While category #1 exhibits a positive homophily effect,
Table 7a shows that the consumers who purchase products in category #1 are also likely to purchase products in categories #3 and #6. However, recommending category #2 products to these consumers is not advisable. Looking at the negative homophily effect, it appears that product category #6 is unlikely to be co-purchased within the same category (as seen in
Table 6). However, it is more likely to be purchased together with categories #1, #2, #3, and #5 (as seen in
Table 7a). In addition, cross-category purchase behavior is observed from categories that do not exhibit homophily effects, as presented in
Table 7b. Thus, it can be concluded that categories #2, #3, and #7 can also significantly impact the co-purchasing behavior across the categories.
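The 49 cross-category variables correspond to the ordered pairs of source and target categories. A minimal sketch of how the links could be tabulated by category pair is given below; the product labels, category assignments, and links are hypothetical.

```python
from collections import Counter
from itertools import product

N_CATEGORIES = 7

# Hypothetical product -> category assignment and directed co-purchase links.
category = {"p1": 1, "p2": 1, "p3": 3, "p4": 6, "p5": 2}
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
    ("p4", "p1"), ("p4", "p5"), ("p2", "p1"),
]

# All ordered (source category, target category) pairs: 7 x 7 = 49.
pairs = list(product(range(1, N_CATEGORIES + 1), repeat=2))

# Count the links that fall into each ordered category pair.
mixing = Counter((category[src], category[dst]) for src, dst in edges)

for pair in pairs:
    if mixing[pair]:
        print(f"category #{pair[0]} -> category #{pair[1]}: {mixing[pair]} link(s)")
```

The diagonal pairs (same source and target category) capture homophily, while the off-diagonal pairs capture directed cross-category co-purchase.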
eWOM. As
Table 6 demonstrates, positive ratings and review volume were found to be key factors that impact co-purchase formation, as indicated by the large magnitudes of their estimates. We found that the positive ratings for outgoing products had a negative impact on the formation of co-purchase links (variable (a), estimate = −2.449,
p-value < 0.01), while the impact of the positive ratings for incoming products was insignificant (variable (b)). The results indicate that a product with a higher positive rating is less likely to subsequently have co-purchased links with other products. eWOM valences, such as positive ratings or negative ratings, reflect consumers’ opinions from a global audience [
18,
52]. An increment of negative publicity about a product often reduces the satisfaction or product attitude of potential consumers [
53,
54]. Conversely, a highly positive rating helps consumers to build a perception of a product’s quality and increases its sales [
55]. Consumers can therefore be expected to infer high product quality from a highly positive rating. When a user processes online review information and purchases a product with a highly positive rating, their demand for high-quality products might be met. This implies a lower probability of follow-up purchases of a relatively low-quality product. However, the estimate of variable (b) shows that purchasing a product does not significantly affect subsequent purchases of a product with a highly positive rating.
Furthermore, if all the other variables are held constant, a product with twice the positive rating of another product will have 3.098 lower log-odds of co-purchase, compared with two products with nearly the same positive ratings (variable (c), estimate = −3.098,
p-value < 0.01). In other words, a higher inconsistency of positive ratings among products decreases the likelihood of co-purchase. This is comparable to the self-reference effect in information decisions [
56,
57], where individuals have internalized information about the acceptable positive ratings related to themselves, and they react more quickly to products with similar ratings. The inconsistency effect of positive ratings also has the strongest explanatory power, as reflected by the magnitude of its estimate. The estimation result highlights that the inconsistency effect of positive eWOM ratings can be leveraged to design directed recommendation hyperlinks or sponsored product hyperlinks in e-commerce.
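The inconsistency effect can be illustrated with a small calculation. The sketch below assumes that variable (c) enters the model as the absolute difference between the positive ratings of the two products and that ratings lie on a 0 to 1 scale; neither the coding of the variable nor the scale is stated in this section, and the rating values are hypothetical.

```python
import math

COEF_RATING_DIFF = -3.098  # variable (c) estimate of Model C (Table 6)


def log_odds_shift(rating_i: float, rating_j: float,
                   coef: float = COEF_RATING_DIFF) -> float:
    """Change in the log-odds of a link due to the rating difference.

    Assumption: the statistic is the absolute difference |rating_i - rating_j|.
    """
    return coef * abs(rating_i - rating_j)


# Hypothetical positive-rating shares on a 0-1 scale.
same = log_odds_shift(0.90, 0.90)   # identical ratings: no penalty
diff = log_odds_shift(0.90, 0.60)   # a gap of 0.30 lowers the log-odds

odds_ratio = math.exp(diff)
print(f"shift for identical ratings: {same:.4f}")
print(f"shift for a 0.30 gap: {diff:.4f} (odds ratio {odds_ratio:.3f})")
```

Under these assumptions, the penalty grows linearly with the rating gap on the log-odds scale, so the odds of a co-purchase link shrink multiplicatively as the ratings diverge.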
When considering the impact of the eWOM volume, an increased number of online reviews corresponds with a greater product popularity and bolsters consumer buying inclination [
19,
20,
52]. Compared with products with a low popularity (i.e., with 0–50 reviews), the analysis of the variables (d) to (e) reveals that products with a high popularity (i.e., with more than 50 reviews) are more likely to form a co-purchase relationship with another product (2.793,
p-value < 0.01). However, there is no clear indication that positive ratings lead to a homophily effect where products with over 50 reviews are more likely to be purchased together. For the interaction effect (variable (g)), we observe that in highly popular products (with over 50 reviews), the higher the positive rating, the lower the likelihood of forming a co-purchase. One possible explanation for this is that consumers find it more difficult to perceive the review’s usefulness when the product has many positive reviews [
58,
59]. Thus, when a product has a higher positive rating and more reviews, consumers may be less likely to seek out additional products to co-purchase with it.
Network topological attribute. The refined Model C incorporates network topological attributes (g) to (j), which show that the number of outgoing connections (out-degree centrality) of a product has a positive effect, whereas the number of incoming connections (in-degree centrality) has a negative effect on the formation of co-purchase links. Despite the negative impact of in-degree centrality, the volume of incoming co-purchases (in-strength) still has a positive, but less significant, effect on the formation of the co-purchase network, whereas the volume of outgoing co-purchases (out-strength) has a trivial effect. These results again suggest that the observed co-purchase network is shaped by both consumer behavior preferences and product association effects.