1 Introduction
People in need of mental health support have reported benefits from interacting with peers through online mental health platforms (OMHPs) [
49]. These platforms have been growing in popularity [
68], are accessible and cost-effective [
56], reduce stigma about mental health by building anonymous connections among individuals [
10,
51], empower the sharing of individual journeys [
45,
78], and enable individuals in times of need to find advice and support for their problems [
64,
73,
79]. Prior research has provided valuable insights on effective support strategies for those in need by studying a variety of success metrics such as satisfaction with support [
3,
70], mental health status [
20], community participation [
73,
74], and linguistic behavior such as amount of self-disclosure [
72,
78]. The use of diverse metrics offers online platforms the ability to track the impact of peer counseling holistically.
Despite the multiplicity of perspectives in the study of OMHPs, prior research has tended to use singular outcomes without examination of a larger body of potentially meaningful metrics. Examining a single outcome may lead to non-robust and non-generalizable findings, and peer counseling strategies could correlate with different success outcomes in inconsistent or conflicting patterns, limiting potential applications in the design of OMHPs. In the related area of computational psychology, interest in measurement validity has led to calls for metrics triangulation. Chancellor and De Choudhury [
18] noted that lack of transparency in the operationalization of predictor variables raises concerns regarding validity, algorithm choice, and replicability of research that aims to predict mental health status using social data. Ernala et al. [
27] suggested triangulation of diagnostic signals for predictive models to remedy issues in the validity and contextualization of predictor variables.
Outcome triangulation can be used to study conversation success for OMHPs, which are simultaneously community platforms, on-demand counseling services, and clinical interventions. Some previous studies have researched how a user’s continued engagement with an OMHP constitutes a successful outcome in terms of a community’s ability to support those seeking help [
73,
78]. Others have found that online mental health support is sought out during major life transitions, arguing that a user may leave a community based on many factors including when a user has received sufficient support [
47,
80]. This raises an important question for healthcare technology design: How do we balance the notion that peer support increases user engagement with a community yet decreases the probability of users staying on a platform? Triangulation can begin addressing this question by allowing us to compare the social and clinical outcomes of OMHP design.
To the best of our knowledge, no studies have systematically triangulated outcomes for peer counseling on OMHPs. We expect that using multiple metrics will reveal tensions among outcomes and predictors of successful peer counseling. To examine this expectation, we propose two research questions (RQs):
•
RQ1: Does triangulating across multiple outcomes provide novel insights for finding indicators of counseling success?
•
RQ2: Do widely used predictors of counseling success have consistent relationships with separate outcome metrics?
We first review prior research and note inconsistencies and ambiguities in the way that different authors define and operationalize OMHP outcomes. Next, we document our process for identifying four measures of peer counseling success for therapy-focused platforms: retention in the community, following up on a previous session with a counselor, users’ evaluation of a counselor, and changes in users’ mood. Then, using statistical analysis of a large dataset of one-to-one chats between support seekers and support providers, we compare the relationships of several widely used predictor variables with these four measures. Results show that predictors of successful conversations correlate negatively with community retention and correlate positively with the likelihood of users following up and giving higher counselor evaluations.
Contribution. In this paper, we make an original contribution toward unifying diverse approaches to online peer counseling research by triangulating different success outcomes and studying how they correlate with widely used predictors of peer counseling. Concretely, we leverage a dataset of 1.7M chat sessions between support seekers and support providers to conduct a large-scale regression analysis of peer counseling on 7 Cups of Tea, an online therapy-focused platform. By investigating the relationship between widely used linguistic predictors of effective counseling and multiple outcome metrics, we validate previously reported outcomes using large-scale data from a popular OMHP. We discuss the implications of our results and provide novel insights for the design of peer-based therapy platforms.
3 Triangulating Outcome Measures
Our aim is to empower designers and researchers of OMHPs by unifying social and clinical perspectives from the literature and by developing a proposal for platform-level decision-making regarding the identification of success outcomes. In this section, we first conduct a review of outcome measures and organize them by the construct they measure. Next, we outline multiple metrics available on the 7 Cups of Tea platform and discuss our choice of outcome metrics.
3.1 Literature Review
Since our approach blends both social and clinical perspectives, we conduct a review of outcome measures for peer counseling using a keyword search of papers [
44] defining outcomes for participation in platforms or communities for online peer counseling, mental health support, and therapy. Studies in which peer support was part of a larger body of mental health-related features were excluded. Qualitative analyses of outcomes were excluded to focus on metrics suitable for platform analytics. Identified outcomes were grouped into community, conversation, and individual levels based on social computing, psychotherapy, and clinical perspectives, respectively. We also noticed differences in method of measure that led to multiple operationalizations of the same constructs in the literature. Lastly, we identify prior work that has used multiple outcomes and note how its approaches differ from ours.
3.1.1 Community.
Community-level constructs track behaviors or attitudes spanning interactions with a broader group of many users. Examples include continued seeker engagement across multiple conversations [
45,
73,
74], commitment [
77], support seeking behavior [
78,
80], support provision behavior [
14,
80], or general attitudes toward the community such as the desire for support [
39,
80]. Community metrics may also track broader perceptions of the platform on which the community is built, reflecting the UX practice of measuring user attitudes [
42]. For example, Alvarez-Jimenez et al. [
4] evaluated an online social therapy platform for first-episode psychosis recovery using attitudinal measures of perceived degree of social interaction and platform usability.
3.1.2 Conversation.
Conversation-level constructs track outcomes of support provided by specific individuals, capturing the impact of dyadic relationships within public forums or private chats. Prior work in this space has leveraged both attitudinal and behavioral measures. Attitudinal measures such as the Session Rating System [
10], counselor helpfulness [
83], and counselor rating [
62] align with psychotherapy research aimed at evaluating client-therapist alliance [
12,
35]. Some papers have examined behavioral outcomes such as amount of self-disclosure in response to what others have said [
7,
8,
72,
78], linguistic alignment [
71], and amount of support provision [
64,
71,
79,
80]. Vlahovic et al. [
70] evaluated seeker satisfaction with support using third-party annotation methods.
3.1.3 Individual.
Individual-level outcomes capture the impact of receiving support on a single user’s cognitive state. Outcomes include moments of cognitive change in how an individual thinks about a problem [
57,
61], clinical questionnaires of mental health symptoms [
4,
20,
41,
84], mood [
3,
41], and various proxies of mental health status such as affective language use and linguistic measures of cognitive behavior [
61].
3.1.4 Method of Measure.
Several outcomes not only spanned multiple construct levels but also differed in how they were measured. For example, support provision was operationalized both as an annotated, community-level outcome and as a behavioral, conversation-level one. The former occurred in analyses of the social roles an individual adopts within the larger community (e.g., old-timers who contribute significantly to their communities) [
14,
80]. The latter was measured within a conversation in which a member of an online community provided informational or emotional support to a seeker [
79]. Satisfaction was tracked as an attitudinal metric at the community level with regard to general peer counseling experience on a platform [
82], as an attitudinal metric at the conversation level in response to experience with a specific counselor [
10], and as an annotated metric at the conversation level based on seeker responses to receiving support [
70]. To accommodate a range of outcomes, we note a second dimension for the method of measure in our literature review to account for overlaps in measured construct but differences in data collection methods.
3.1.5 Multiple Outcomes.
Some studies leveraged more than one outcome to measure multiple hypotheses of peer counseling impact [
4,
41,
78,
80]. Others report hybrid constructs that span multiple construct levels or methods of measure depending on platform or experiment design. For example, a thread may constitute a single conversation when only one user responds to a top-level post, but can expand into multiple nested discussions. This has led to engagement outcomes that capture both community- and conversation-level results depending on the number of providers within a thread [
73,
74]. We found one example of a composite outcome that used multiple outcomes across individual and community levels to create an overall proxy score for mental health [
61]. In general, outcomes were chosen or created based on the theories being tested, which may lead to challenges in measurement validity [
9,
18]. This finding supported our hypothesis that systematic outcome triangulation is an opportunity to discover new insights about outcome selection in OMHP research.
3.2 Selecting Outcome Variables for Triangulation on 7 Cups of Tea
We contribute to research on effective peer counseling by studying conversations and multiple outcomes on 7 Cups of Tea (7 Cups), a coping and therapy platform where users can discuss a variety of issues with peer volunteers who are ready to listen. The platform allows support seekers to register as members and support providers to register as listeners; a user can also register in both roles. 7 Cups requires all listeners to complete a 30-60 minute initial training that teaches talk therapy techniques such as active listening, showing empathy, summarizing and reflecting members' concerns back to them, and asking guiding questions. Chats on 7 Cups start with a member requesting support. Available listeners then choose to chat with members based on incoming requests, a process that lets listeners match themselves with requests on topics they specialize in or have personal experience with.
7 Cups was chosen as a research site for validating prior work on peer counseling because the majority of interactions on 7 Cups occur through private, one-to-one conversations in which member-listener pairs converse anonymously. Private messages comprise roughly 90% of the over 400 million messages sent between users of the platform from January 2020 to August 2022. Other mental health apps for therapy, such as BetterHelp and TalkSpace, focus on professional therapist services, while online communities with public spaces, such as Reddit, CSN, or TalkLife, center on many-to-many interactions. Although 7 Cups offers such features as well, its emphasis on connecting seekers to providers in chats offers a comparable environment for replicating prior work on peer counseling strategies conducted on one-to-one channels such as crisis hotlines.
Next, we describe available member outcomes for 7 Cups and our rationale for selecting specific ones for triangulation.
3.2.1 Community.
One challenge with platform metrics is identifying meaningful measures of community outcomes as 7 Cups offers a number of individual and social features. Although 7 Cups administers several attitudinal measures such as a product market fit survey [
25] and the net promoter score [
59], these metrics may not capture community outcomes, since several platform features do not support peer-to-peer interaction. Similarly, product reviews may contain insights about user experience, but they are likely to reflect high-level perceptions of 7 Cups such as interface usability or app design [
2]. As a result, we considered possible behavioral outcomes at the community level based on user logs data. We define community-level outcomes as variables representing a member’s relationship with more than one listener.
•
Engagement is the presence or absence of a member’s continued participation in spaces with other 7 Cups users [
41,
61,
63,
73,
74].
•
Frequency of participation is a behavioral metric quantifying the amount by which something occurs such as the number of posts or responses over a member’s lifetime on the platform [
61,
64,
77,
80].
Retention was chosen as a community-level outcome, operationalized from engagement as whether or not a member chatted with other listeners after a conversation on 7 Cups. This parallels research on online health communities, where community engagement is defined as posting or commenting in multiple threads. We selected this measure to represent the significant body of prior work treating continued participation as a meaningful outcome of receiving support in communities. Frequency of participation, operationalized as the number of past conversations a member had prior to conversing with a listener, was used as a control variable (Section
4.6).
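As an illustration, both community-level variables can be derived from session logs. The sketch below is a minimal, hypothetical example: the column names and schema are our assumptions for illustration, not 7 Cups' actual data model.

```python
import pandas as pd

# Hypothetical session log: one row per member-listener chat session.
# Column names are illustrative, not 7 Cups' actual schema.
sessions = pd.DataFrame({
    "member_id":   ["m1", "m1", "m1", "m2", "m2", "m3"],
    "listener_id": ["l1", "l2", "l3", "l1", "l1", "l2"],
    "start_time":  pd.to_datetime([
        "2021-01-01", "2021-02-01", "2021-03-01",
        "2021-01-05", "2021-01-20", "2021-02-10",
    ]),
})

def community_outcomes(df: pd.DataFrame, focal_idx: int) -> dict:
    """Derive retention (outcome) and prior-session count (control)
    for one focal session."""
    focal = df.loc[focal_idx]
    member = df[df["member_id"] == focal["member_id"]]
    later = member[member["start_time"] > focal["start_time"]]
    earlier = member[member["start_time"] < focal["start_time"]]
    return {
        # Retention: did the member chat with a *different* listener afterward?
        "retention": bool((later["listener_id"] != focal["listener_id"]).any()),
        # Frequency of participation: number of conversations before this one.
        "n_prior_sessions": len(earlier),
    }
```

For example, the first session of member `m1` counts as retained because `m1` later chats with two other listeners.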
3.2.2 Conversation.
Conversation-level outcomes represent member outcomes for a single conversation involving a member and a listener. In addition to user logs, other metrics were available from instruments currently deployed on 7 Cups.
•
Engagement is the presence or absence of a member’s continued participation in conversation [
73,
74].
•
Rating is a single-question attitudinal measure of a member’s perception towards a listener [
62].
•
Hearts are a feature with which members and listeners can react to any message, similar to emoji reactions in SMS-based text chats and the like button on social media. Hearts are a novel measure that we speculate represents shallow engagement, based on social media research [
54].
We found no prior work exploring a metric similar to hearts in peer counseling on OMHPs, although turn- or message-level metrics exist in other domains [
6,
50]. With our focus on examining previously reported outcomes, we did not pursue adding hearts as a novel outcome due to a lack of interpretability. However, to account for the possibility that usage of hearts may impact conversation outcomes, we use the historical usage of hearts as a control variable (Section
4.6).
Two conversational outcomes, follow-up and rating, were chosen to address our primary interest in peer counseling. Follow-up was operationalized from engagement because we hypothesized that support provision should lead to a higher desire to continue a conversation [
61,
63,
73,
74,
79]. Rating data was chosen as a metric for tracking satisfaction with support provision [
10,
70,
83].
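Operationally, follow-up differs from retention only in that the later session must involve the same listener. A minimal sketch over a hypothetical session log (the schema is our assumption for illustration):

```python
import pandas as pd

# Hypothetical session log (illustrative schema, not 7 Cups' actual one).
sessions = pd.DataFrame({
    "member_id":   ["m1", "m1", "m2"],
    "listener_id": ["l1", "l1", "l2"],
    "start_time":  pd.to_datetime(["2021-01-01", "2021-01-08", "2021-01-03"]),
})

def followed_up(df: pd.DataFrame, focal_idx: int) -> bool:
    """Did the member chat with the *same* listener again after this session?"""
    focal = df.loc[focal_idx]
    later = df[
        (df["member_id"] == focal["member_id"])
        & (df["listener_id"] == focal["listener_id"])
        & (df["start_time"] > focal["start_time"])
    ]
    return len(later) > 0
```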
3.2.3 Individual.
Individual outcomes reflect measures of the member’s mental state. 7 Cups administers multiple self-report measures at different frequencies. In addition, user logs were also available for creating psycholinguistic proxy variables of mental health status.
•
General mood is a single-question instrument asking a member how they feel at that moment [
3], administered at most once every hour.
•
PHQ-9 is a nine-item battery of questions for depressed mood [
20] administered at most once every two weeks.
•
GAD-7 is a seven-item battery of questions for anxiety [
20] administered at most once every two weeks.
•
Psycholinguistic proxies of mental health status could be used to measure a member’s cognition based on language use found in user logs [
61].
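For context, the two clinical batteries above are simple summed Likert scales: each PHQ-9 item is scored 0-3 for a total of 0-27, with standard published severity cutoffs at 5, 10, 15, and 20 (the GAD-7 is scored analogously, totaling 0-21). A minimal scoring sketch:

```python
# PHQ-9: nine items, each scored 0-3, summed to a 0-27 total.
def phq9_score(items):
    assert len(items) == 9 and all(0 <= i <= 3 for i in items)
    return sum(items)

# Standard severity bands for the PHQ-9 total score.
PHQ9_CUTOFFS = [(20, "severe"), (15, "moderately severe"),
                (10, "moderate"), (5, "mild")]

def phq9_severity(total):
    for cutoff, label in PHQ9_CUTOFFS:
        if total >= cutoff:
            return label
    return "minimal"
```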
Mood was chosen over clinical questionnaires to replicate studies by Althoff et al. [
3] and Kushner and Sharma [
41], both of which used mood as a proxy for mental health status. While clinical questionnaires are the gold standard in clinical research, the two-week interval between administrations on 7 Cups raised doubts about their sensitivity to peer counseling effects. We did not pursue the replication of a psycholinguistic outcome following Saha and Sharma [
61], as they conducted a group-level aggregate analysis by counselor with this metric. It was unclear whether their method generalized to our session-based analysis, whereas all other outcomes in our study had been applied in similar contexts examining individual counseling sessions. Furthermore, the moment of cognitive change was excluded from modeling because it may confound our goal of identifying successful peer counseling sessions: a high rating or a lack of follow-up may result from a moment of cognitive change that appears before a session concludes.
3.3 Situating Outcomes in Literature
The review of literature and available measures on 7 Cups illustrated a need to differentiate between construct levels and methods of measure as part of outcome variable operationalization. As the method of measure can have an impact on validity of outcomes, we add a dimension capturing attitudinal, behavioral, and annotation methods of measure. Attitudinal measures offer a more direct connection to a user’s perceptions but suffer from issues with response rate or human bias [
42]. Behavioral measures such as engagement benefit from being observed [
60], avoiding drop-off and reporting biases at the cost of being harder to interpret. More recently, human annotation by experts or crowd workers has been used to label behavioral outcomes for machine learning models [
18].
Table
1 situates our four outcome measures among literature reviewed in Section
3.1 organized by construct level and method of measure. In total, we use four metrics across three different construct levels and two methods of measurement. We position our work as a form of external validation, leveraging regression models to ascribe relative merits to existing constructs and operationalization methods [
9,
42]. Considering that human annotation is itself a form of external validation, and given the private nature of 7 Cups chats, we chose not to pursue a secondary validation method through annotation for this study, focusing instead on attitudinal and behavioral outcomes. An attitudinal community outcome and a behavioral individual outcome were not included for triangulation, following the selection process in Section
3.2.
5 Results
5.1 Models
Figure
3 shows the correlations among the four outcome variables in this study using Kendall’s
τ. Correlations are low, with the mean absolute
τ coefficient being .04. Since these alternate measures of counseling success are relatively independent of each other, the low correlations offer the possibility that triangulating across the community, conversation, and individual levels may provide new insights for assessing counselor success (RQ1). The largest positive correlation is between rating and follow-up (
τ = .11), supporting our belief that conversation-level constructs should be more highly correlated with each other. The largest negative correlation is between rating and retention (
τ = −.05), which validated our choice in the triangulation stage to identify conversational-level outcomes separately from community-level ones. In general, mood and retention show a smaller correlation with other outcomes. Engagement at the community and conversational levels (
τ = −.03) can be seen as relatively independent measures with respect to predictors of effective counseling for an online therapy platform.
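The pairwise analysis above can be reproduced mechanically. The sketch below uses synthetic outcome data (random values, purely for illustration) to show the computation of pairwise Kendall's τ and the mean absolute coefficient:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the four outcomes (binary and ordinal).
outcomes = {
    "retention": rng.integers(0, 2, n),   # binary
    "follow_up": rng.integers(0, 2, n),   # binary
    "rating":    rng.integers(1, 6, n),   # ordinal 1-5
    "mood":      rng.integers(1, 6, n),   # ordinal 1-5
}

# Pairwise Kendall's tau over all outcome pairs.
names = list(outcomes)
taus = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        tau, _p = kendalltau(outcomes[a], outcomes[b])
        taus[(a, b)] = tau

mean_abs_tau = np.mean([abs(t) for t in taus.values()])
```

Kendall's τ is well suited here because all four outcomes are binary or ordinal; the tau-b variant used by `kendalltau` corrects for ties.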
Next, we examine the consistency of predictor variables across outcomes (RQ2). We focus our analysis on the consistency of coefficients for several reasons. The aim of our research is not to find the best-fitting model but to understand the relationship between previously reported measures of effective counseling and measures of counseling success. For ratings and mood, report timing does not directly coincide with the end of a session and may admit confounds within the temporal window. A large amount of observational data with unbalanced outcomes (e.g., 90% of the members in our dataset continue using 7 Cups) increases the likelihood of finding highly significant variables (Type I errors) and reduces the goodness of fit of each individual model. Differences among the logistic, Heckman, and ordinal regression models and in the amount of data for each outcome make it difficult to directly compare effect sizes across models.
Table
4 reports regression coefficients for all models. Tests of collinearity for all independent variables are provided in Appendix A. Coefficients for topical controls and stage 1 of the Heckman model are provided in Appendix
B. For logistic regression models, the coefficients represent the log odds of the outcome occurring. For the Heckman model, the coefficients follow a standard linear regression in the stage 2 phase. The coefficients for ordinal models represent the log odds of moving one interval up the ordinal scale. Our findings suggest that prior reported predictors of counseling success are not consistent across construct levels but are consistent within the conversational outcome level.
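To make these coefficient scales concrete: for the logistic and ordinal models, exponentiating a coefficient yields an odds ratio. The sketch below uses made-up coefficient values purely for illustration, not the paper's actual estimates:

```python
import math

# Hypothetical log-odds coefficients from a logistic model
# (illustrative values only).
coefs = {"total_words": 0.32, "member_words_ratio": -0.18}

# exp(beta) is the multiplicative change in the odds of the outcome
# (or, for an ordinal model, of moving one interval up the scale)
# per unit increase in the predictor.
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}
```

A positive log-odds coefficient maps to an odds ratio above 1, a negative one to an odds ratio between 0 and 1.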
5.1.1 Retention.
Retention on the platform is negatively associated with the total number of words exchanged in a session (p < 2e-16), the member words ratio (p < 5e-6), listener self-disclosure (p < 2e-16), member self-disclosure (p < 2e-16), and linguistic style matching (p < 2e-16). Since 90% of the members in our dataset stay on 7 Cups after their first session, this result suggests that predictors of conversation success may have an inverse relationship with the need to participate in more chats on 7 Cups. Our results align with those of [
41], who report that users turn to mental health platforms during times of need. We also replicate similar findings to those of Yang et al. [
78], who noted that self-disclosure decreases commitment to a community.
5.1.2 Follow-up.
Follow-up with the same listener is positively impacted by the total number of words (p < 2e-16), listener self-disclosure (p < 2e-16), and median response time (p < 2e-16). Follow-up shows a negative relationship with member words ratio (p < 2e-16), member self-disclosure (p < 4e-8), linguistic style matching (p < 1e-14), and topic matching (p < 0.0002). We expect higher amounts of member words ratio and member self-disclosure to represent information exchange during support provision [
79], so the higher likelihood of follow-up when these are low suggests an unfinished or problematic conversation. The negative impact of linguistic style matching and topic matching aligns with previous findings that more successful peer counselors change the topic or flow of conversation to progress toward important topics [
3,
83]. 10% of members in our dataset follow up with their listener after the initial session, which suggests that this outcome measure may track long-term relationships built between member and listener in cases of an unfinished initial conversation.
5.1.3 Rating.
A higher rating is positively associated with the total number of words (p < 2e-16), listener self-disclosure (p < 2e-16), member self-disclosure (p < 2e-16), median response time (p < 2e-16), and linguistic style matching (p < 2e-16). Rating is negatively associated with member words ratio (p < 2e-16) and topic matching (p < 2e-16). Rating is the outcome most consistent with predictors of effective counseling reported in prior literature. Unlike follow-up, member self-disclosure and linguistic style matching are associated with higher ratings in this model. This suggests that rating is an appropriate attitudinal measure of a member's satisfaction with support received as a conversation reaches its conclusion.
5.1.4 Mood.
Member mood is positively associated with the total number of words (p < 0.038) and member words ratio (p < 6e-6), but is the least associated with prior predictors of listener performance among our outcomes. Our findings replicate [
41], who found that individual mood does not change much with effective peer counseling, but contrast with those of [
3]. We discuss the differences in methodology between these two studies in Section
7.1.
5.1.5 Follow-up vs. Rating.
Comparing within construct levels, predictors with the same directionality for both outcomes are total words, member words ratio, median response time, and topic matching. Our results for total words and member words ratio are consistent with findings by Althoff et al. [
3], who found that successful supporters have longer message lengths and control the flow of conversation better than unsuccessful ones. Our results for topic matching are consistent with those of Zhang et al. [
83], who found that lower amounts of topic matching suggested that a supporter is better at controlling the flow of conversation. In contrast, we found that longer response times lead to a higher likelihood of follow-up and a higher rating, unlike Saha and Sharma’s [
61] claim that faster response times lead to more engagement within TalkLife threads. These differences may be due to community or channel differences between platforms as TalkLife’s interactions are forum-based.
Interestingly, member self-disclosure and linguistic style matching change signs within conversation outcomes, negatively correlating with follow-up but positively correlating with rating. The conflicting directionality in the two predictors suggests that follow-up and rating are related but distinct outcomes. Since prior work has suggested that more self-disclosure leads to more support provision [
64,
79] and that more linguistic style matching in both dyadic [
29] and group [
64] conversations represent alignment between participants in a conversation, our findings suggest that follow-up sessions occur on 7 Cups when a conversation has not reached closure. A member who has not yet had time to self-disclose information and align with their listener in their first session is more likely to follow up after an idle period.
5.2 Robustness Checks
To rule out bias introduced by our choice of models for self-report data, we use model triangulation to check the robustness of the Heckman selection model and our mood model.
The Heckman model handles missing ratings by estimating a Gaussian variable in the selection step (stage 1) and using it for linear regression in the prediction step (stage 2). One drawback to this model is that it cannot handle missing data in control variables, so we were unable to use a member’s prior ratings to control for individual differences in rating reports. We validate the results of our Heckman model using an ordinal regression model with a member’s average rating given in all sessions prior to the current one added as an additional control variable. Observations for members that did not have at least one prior rating in addition to a session rating were dropped. This resulted in a 126k session subset for ordinal regression. Results showed agreement between the Heckman model and the ordinal regression model. All explanatory variables in the ordinal regression were significant and had the same directionality as those of the Heckman model.
Post-session mood was modeled using pre-session mood as a control variable, which reduced the dataset from 14,434 to 4,839 data points. To check that this change in sample size does not affect model coefficients, we ran two comparison models without pre-session mood as a control variable. The first model used all 14,434 data points from members who reported post-session mood, regardless of whether they reported pre-session mood. The second model used the same 4,839 observations from members who reported both pre-session and post-session mood scores, but with the pre-session mood control removed. Results showed that member words ratio, member self-disclosure, and linguistic style matching are significant variables affecting mood in both comparison models. This suggests that the scarcity of significant variables for our mood outcome is not due to the reduced sample size.
6 Discussion
The most important finding of our work is that alternative ways of measuring counseling success used in prior research are not strongly correlated with each other and show different patterns of association with conversational features that others have hypothesized to influence counseling success on online mental health platforms. For RQ1, the results reinforced our hypothesis that outcome triangulation provides novel insight into interactions on OMHPs by revealing tensions among desirable outcomes previously noted in the literature. For RQ2, we find that the directionality of previously reported predictors of counseling success is mostly consistent within construct levels but not across them.
Retention has a negative relationship with almost all predictors of effective counseling previously reported in the peer counseling literature. It also shows a weak negative correlation with conversational outcomes, echoing [
78]’s findings that support provision inside a conversation may lead to seekers leaving a community. Although the weak relationship implies that individual conversations may not strongly influence a member’s decision to continue chatting with other listeners on 7 Cups, our findings reinforce reports that users leave platforms when their needs are fulfilled [
47,
80].
Both dyad-level outcomes were mostly consistent with predictors but showed nuanced differences. Rating is the most consistent outcome in relation to prior literature on predictors of effective counseling. Contrary to rating, less member self-disclosure and linguistic style matching correspond with an increased likelihood of follow-up. A novel insight from this difference is that some conversations have an idle period, yet are likely to continue if members are not given the opportunity to speak about their problems and receive feedback from a listener. Combined, high ratings and low follow-ups may signal effective single-session counseling on 7 Cups.
General mood shows little relationship with prior predictors. Outcomes that track the impact of a single conversation on an individual member need to be cautiously adopted when analyzing the impact of conversations on individuals. Our results replicate those of Kushner and Sharma [
41], who found little change in mood following conversations on TalkLife. Based on the positive relationship between member words ratio and mood, it is possible that counseling expertise does not correlate with positive moods, but simply chatting with someone does. However, our findings do not necessarily disagree with those of [
3], as our methodologies were different: we did not leverage group-level aggregation, and our 24-hour reporting window for mood has limitations in terms of temporal causality. Lastly, the weak relationship between predictors and mood scores may also be due to issues with sampling or questionnaire administration on 7 Cups.
In summary, our results demonstrate the value of systematic outcome triangulation across construct levels and methods of measurement. In line with previously reported findings on other platforms [
47,
78,
80], the small negative correlation between retention and both conversational outcomes leads us to hypothesize that some members leave 7 Cups once their peer counseling needs are met. There is also a small segment of 7 Cups users who continue a conversation with the same counselor after an idle period if their needs are not met. Rating, an attitudinal metric, becomes a key signal of attitude toward the support received when interpreting the behavioral metrics of retention and follow-up. One nuance in interpreting the relationship between retention and follow-up is that seekers on therapy platforms always have the option of finding new supporters to discuss their problems with. This design may lead to trade-offs between community engagement and conversation engagement.
Based on the above, we argue that 7 Cups provides an on-demand service similar to single-session therapy (SST), a therapy delivery method in which the aim is to maximize the efficacy of the first, and sometimes only, session with a walk-in therapist [
24,
37]. Yip et al. [
82] reported similar findings from an online text-based counseling service for youth called Open Up, noting that 23.6% of 81,654 sessions on the platform came from users who accessed the service only once. Our even larger dataset likewise suggests that SST may arise naturally in OMHPs. Viewed from this single-session perspective, the lack of significant individual mood change is no longer surprising: extratherapeutic circumstances are a large factor in the effectiveness of SST as a service [
17,
65].
8 Limitations and Future Work
Our methodology and analysis focused on correlations across metrics rather than the causal impact of particular predictors on specific outcomes. Future work can estimate true effect sizes and build predictive models for each individual outcome in this study, helping platforms identify the best predictors of key success outcomes. In addition, our choice of a therapy platform, 7 Cups of Tea, as a research site may limit the generalizability of our analysis to platforms of this type. We urge practitioners to apply our findings with caution to mental health sub-communities on social media or to topical platforms such as Breastcancer.org. Design choices in the development of these platforms, such as public versus private channels, the use of conversation threads versus chats, and illness-specific user needs, may change our understanding of which outcomes are important.
Future work can help validate the usefulness of our triangulation methodology by using a similar distinction between construct levels and methods of measurement to identify meaningful outcomes on OMHPs. In this study, follow-up and rating showed opposite relationships with member self-disclosure and linguistic style matching, suggesting additional nuances in the measured construct despite sharing the same relationships with all other predictor variables at the conversation level. There is also room to expand the construct levels to include turn-based metrics. In this study, hearts were used as a control variable for participation in a conversation on 7 Cups but not as an outcome measure. It remains unclear what a turn-level metric means in the context of peer counseling, unlike on social media, where hearts or likes represent positive sentiment and shallow engagement with a post [
54]. Although [
61] investigated turn-level predictors of online counseling satisfaction, turn-level outcomes remain poorly understood in the context of online peer counseling.
9 Conclusion
In this study, we examined two research questions: whether triangulating across multiple outcomes provides novel insights for identifying indicators of counseling success, and whether previously reported predictor variables of counseling success track multiple outcome metrics consistently. We answer these questions by modeling the relationships between previously reported linguistic predictors of effective counseling and four outcomes: retention in the community, following up on a previous session with a counselor, users’ evaluation of a counselor, and changes in users’ mood. Our findings suggest that community retention and conversational outcomes are relatively independent, that follow-up and rating capture two complementary measures of conversation progress, and that mood outcomes show little relationship with proposed predictors of counseling success. To the best of our knowledge, this paper is the first to systematically triangulate four outcome measures at different construct levels to examine effective peer counseling on therapy platforms. Our work shows that research on peer counseling benefits from a systematic approach to outcome measurement, which prior literature has not always defined clearly. Based on our findings, we raise questions and discuss future directions for interdisciplinary research on OMHPs.