6.1 General Discussion
This study pursued to examine what qualities of error messages explain the rate the participants succeeded to correct the query they were shown and how would they improve the error messages. Utilizing the participants’ suggestions, we modified the error messages used in this study and compiled a set of guidelines for error message design (Table
3).
As can be observed, the queries with common syntax errors are simple in the test suite. Additionally, as the test suite is based on previously identified common syntax errors, the empirical observations underneath also show that novices commit simple syntax errors [
57]. Despite this, the error messages do not reflect the simple nature of the errors. Based on the error messages listed in Appendix
B, it seems justified to argue that many error messages fail to identify the nature of the error correctly, identify the error position incorrectly, or both. This arguably highlights the rather unfortunate state of error messages in many modern DBMSs, instead of begging the question of why the test suite only considers simple syntax errors. Despite what a reader thinks about the error message guidelines presented in this study or of those presented previously in scientific literature, we argue that all the DBMSs subject to this study have error messages that contain at least some elements that seem unintuitive in facilitating query fixing. Table
4 lists characteristics typical to each DBMS. The table arguably shows that many (if not all) of the design guidelines presented in this study have been implemented in at least one of the DBMSs studied.
RG1, presented in Section
3.1, was concerned with previously identified system message qualities and how they affect SQL query fixing success. This was analyzed with binomial logistic regression. The results of the regression analyses presented in Section
4, with three exceptions, failed to reject the null hypothesis. This may indicate, at least with the data available, that general error message qualities do not explain SQL error fixing, i.e., the general guidelines fail to particularize. In addition, it should be noted that all the percentages of how much the three statistically significant models explained the success rate were very low. The results from the qualitative analysis, however, suggest that the participants value the error message qualities proposed by Shneiderman [
47], with the exception that error messages should be positive. In a sense, the results from the quantitative analyses are not in line with the results from the qualitative analyses. That is, the regression analyses suggest that general system error message qualities do not affect query fixing, but nevertheless the results from the content analyses rather uniformly suggest that if the error message qualities tested in the regression model were not present in the error messages, the participants suggested adding these qualities.
RG2 was to formulate an SQL error message design framework derived from the data. Table
5 compares the guidelines presented in this study to those presented by Shneiderman [
47], Traver [
60], and Becker et al. [
7]. The table shows that most of our guidelines map to most of the guidelines presented in previous studies, indicating that our study participants suggest improvements for error messages presented in previous studies unknown to them. The only clear omission in our guidelines is that the error message should be positive. According to our coding of the eight DBMS error messages, all DBMSs had at least one error message that was not positive, meaning that all 311 participants were exposed to at least one non-positive error message. Our data contained only three mentions (all from a single participant, approximately 0.3% of all participants) that the error message could be rephrased without the use of dramatic words. This observation may be biased due to the fact that the participants were recruited from a single university. Becker et al. [
7] summarize that in the context of programming language error messages, the effects of a positive tone have been empirically tested by merely two studies, while 14 studies on the subject are of historical or anecdotal nature. A closer inspection of the two studies reveals that in the first study [
31], 29% of the 77 participants observed that the word
illegal may intimidate the user. In the second study [
32], 6 of the 13 participants claimed that the error messages under observation were
friendly, yet it was not further discussed whether
friendly was considered helpful. Both our quantitative and qualitative results contest the recommendation that error messages should be positive. In fact, in test T16, a positive tone even reduced the odds of successful query fixing. Our interpretation of this result is that the positive error messages in test T16 simply had some other quality which hindered query fixing, rather than positive tone being detrimental to query fixing.
RG3 was to investigate how the error messages should be improved according to the participants, and RG4 was to propose examples of modified error messages based on the suggested improvements. The modified error messages are presented in Appendix
A. We listed
specify the error position as one of the guidelines. It seems both crucial and needless to expand that the error message must provide the error position
correctly, something that both programming language error messages [
60] as well as SQL error messages (e.g., VoltDB in Figure
24 in Appendix
B) sometimes fail to do. In cases when reliably pinpointing the error is not possible, the error message should provide a
near position like some DBMSs already do, although even these positions are not always accurate, or replicate a part of the query which does not contain the error (e.g., SingleStore in Figure
24 in Appendix
B). As demonstrated with a quotation in Section
5.1, at least one participant raised a concern that simply providing the line number of the erroneous part would be more informative than the error message provided. Although the consensus view seems to be that feedback with examples is more useful than binary feedback (correct/incorrect) [
14], even this might not hold true when the error message provides unnecessary or incorrect feedback. For example, the error message of VoltDB shown in Figure
22 in Appendix
B replicates the erroneous query in its entirety, yet does not provide the error position, demonstrating an error message that is neither brief nor specific. Also from the perspective of specificity and comprehensiveness, the error messages of VoltDB and SQL Server in Figure
22 in Appendix
B are different. Considering that the error in the query in Figure
22(a) is that the subquery returns too many columns for the
IN operator which is in this case only expecting values from one column, the cause of the error in VoltDB in Figure
22 in Appendix
B is ambiguous, stating “row column count mismatch”, which seems to imply that the error is somehow related to rows. The error can be fixed by making sure that there is the same number of columns in the upper-level query’s expression concerning
IN, as in the subquery’s
SELECT clause. The error message of SQL Server in Figure
22 in Appendix
B, however, uses a complete sentence and conveys the cause of the error more accurately.
Regarding suggestions on how to fix errors, Marceau et al. [
26] suggest that novices can follow suggestions on how to fix an error without understanding what causes the error and whether the suggestion is even the right fix. This may be problematic from an educational perspective, as the goal is arguably not to fix an error per se but to learn how to write queries. Although making mistakes is part of any learning process, simply committing errors for the sake of receiving hints and suggestions on how to correctly write a query seems counterproductive to learning. We believe that by explaining what causes the error and why the error occurs, the error message can provide a more deep-rooted understanding to a novice, as opposed to merely providing a suggestion on how to fix the error.
Both Shneiderman [
47] and Traver [
60] have suggested that error messages should be brief, and the need for brevity is usually argued with the need to reduce cognitive load [
46,
51]. Four of our guidelines,
provide line number,
specify the error position,
place the most important information first, and
remove unnecessary elements can be viewed as means to reduce cognitive load, yet it is worth noting that our data suggest very few observations on error message brevity per se. Based on the data, it seems reasonable to argue that brevity in itself is not a desirable goal for an SQL error message, as it can be seen to contradict guidelines such as using plain English with complete sentences, or with the guidelines of providing hints and suggestions. Additionally, some errors arguably cannot be described with both clarity and brevity, as the situations in which the errors occur, or why the error occurs may be complex. Therefore, we present that cognitive load should be reduced with other means such as the removal of unnecessary elements and ordering of information, rather than with brief error messages.
Previous results from scientific efforts toward more effective programming language error messages have been inconclusive, or not implemented by the industry, which somewhat diminishes the framework presented in this study. For example, some studies have tested enhanced error messages with novices, yet concluded that there are no positive effects [
15,
40]. In contrast, at least one study has shown that the utilization of enhanced error messages results in fewer errors, and fewer repetitions of a similar error [
6]. Nevertheless, it has been criticized that over several decades, error message guidelines revolve around similar themes, and despite the rise of new programming languages, the same problems persist [
7]. In our opinion, there have also been exceptions, such as the error messages presented in the programming language Rust. From a critical perspective, this study is also yet another one proposing guidelines for error message design, albeit in the novel context of SQL. Despite the criticism presented about programming language error messages, Table
5 shows that there are fundamental differences between the error messages in different DBMSs and that the effects of these differences have also been shown to affect the perceived usefulness for error finding and fixing [
54]. Therefore, it seems reasonable to argue that enhancing SQL error messages is a desirable goal for both industry and education, even though our quantitative analysis does not support the view that some error message qualities affect query fixing success rates. In other words, the effects of enhanced error messages may be explained with other variables besides success, not captured in our data or regression model.
Additionally, online learning environments have been proposed for SQL learning for decades [
12], and several learning environments that provide enhanced error messages have been studied in programming education [
8,
40]. Implementing enhanced error messages into learning environments seems like a natural and relatively fast way of helping novices, as well as acquiring empirical findings on the effectiveness of said error messages. The potential problems with such learning environments are the maintenance overhead and the fact that many of such environments are closed or proprietary, and to our knowledge not widely utilized in industry. Furthermore, in terms of SQL, which can be implemented by one of several different DBMSs with different internals, either the maintenance of the learning environment is even more laborious than that of programming language learning environments, or the SQL learning environment only supports a small subset of DBMSs. One workaround to maintenance could be to check the query syntax on the learning environment’s side, which, in turn, risks the situation where the learning environment evaluates the query syntax error-free, yet the underlying DBMS detects a syntax error (or vice versa). Given these considerations, we deem it beneficial for education that the DBMS vendors undertake the task of enhancing SQL error messages to consider design guidelines. That being said, the task is not as straightforward as changing the character strings provided by the compiler [
4], and arguably requires refactoring of the query parsing process with, e.g., reclassification [
64] or perhaps by utilizing the work done on automatic error correction in SQL [
37], or with large language models. However, it is unclear how DBMSs identify error messages, how different these implementations are between different DBMSs [
52], and how technically difficult it is to implement modified error messages. Finally, industry may be (rightfully) concerned about investing in enhancing error messages which have not been scientifically shown to affect the general user experience.
6.2 Limitations and Threats to Validity
The limitations concerning the scope of this study are that the error message design framework is solely based on data retrieval, and not on other types of SQL statements such as data insertion or updating and that we only considered the 16 most common syntax errors. The main reason behind limiting the scope of this study is the extent of previous studies. In terms of different SQL statements, data retrieval is the most well-studied in human-centered contexts [
56], and provided us with a limited yet scientifically justified starting point with reports on which errors are the most common [
57], and how these common errors can be tested [
54]. Another limitation is that many of the syntax errors can be interpreted in multiple ways (e.g., T08 misspellings), yet we only tested each syntax error with one query. Furthermore, all the syntax errors were tested with relatively simple queries. Arguably, more complex queries emphasize the error message qualities even more, e.g., specifying the error position in the error message is more valuable in a query spanning 50 lines when compared to a query spanning 5 lines. Finally, we only tested the syntax errors using novice participants. This could be seen as a limitation affecting the generalizability of the results, yet given that many previously reported guidelines have been based on expert opinion [
47,
60], and that this expert opinion has been critiqued [
7], we believe the use of novices is a justified approach toward filling an identified research gap, rather than a limitation. The use of appropriate study participants has been argued for in detail in several studies [
18,
19].
Regarding the regression analyses, it is possible that there may be a hidden factor or factors (i.e., predictors) not present in the data that affect query fixing success. Furthermore, the dependent variable (i.e., query fixing success) may not be a fitting metric for error message effectiveness. In the context of programming languages, it has been speculated that time taken to fix an error might be such a metric [
3,
41], but time was not measured in this study due to the shortcomings of our data collection instrument. In hindsight, measuring time would have been informative and should be taken into account in further studies if the effects of the modified error messages are studied. Finally, it is possible that our coding of the error messages (Appendix
B) captures Shneiderman’s [
47] error message qualities incorrectly. We have explained the general nature of said guidelines in Section
3.3, and based our coding on a rubric reported in the same section to mitigate this threat and make the coding more transparent.
Another threat to validity is the unnatural environment in which the participants fixed their queries. As explained in Section
2.1, the user typically engages in a feedback loop with the compiler. In this study, however, the participants fixed queries written by someone other than themselves (i.e., us), and received no feedback on whether their fixes were at least syntactically correct. As we wanted to base the error message design framework on previously identified common syntax errors, designing the research setting in another way would have introduced other threats to validity. Nevertheless, the results should be interpreted while taking the environment into account.
6.3 Future Directions
Although our framework was constructed based on empirical findings, this study provides no empirical evidence if these modified error messages actually facilitate, e.g., error fixing success rate, the time required to fix errors or user experience in general. For this reason, we have refrained from calling the new error message examples enhanced or improved. An intuitive topic for future research is to test the effectiveness of these messages using several metrics and iterate the messages based on empirical findings.
A potential—and to our knowledge little studied—topic is the suggestions given by compilers. In our data, several participants criticized the error messages for giving misleading suggestions, or identifying the erroneous position incorrectly. Although we did not systematically examine such situations in our data, it seems justifiable to speculate whether certain error messages are even detrimental to error fixing. Future research could categorize the queries the participants had fixed as, e.g., more incorrect, still incorrect, incorrect in a different way, more correct, and correct, and examine whether the error message plays a part in the evolution of the originally erroneous query.
Finally, both Becker et al. [
7] and Traver [
60] briefly discuss the interaction between the human user and the compiler through a more interactive user interface than plain text. For example, such a simple modification as hyperlinks in the error message pointing to more extensive documentation is something our participants also suggested. External online documentation would also make fixes and updates to error messages more effortless, without requiring updating the DBMS. Additionally, the error messages may be provided in a form other than textual, if the environment allows [
33]. While more rich feedback may arguably present problems in development contexts (as opposed to learning contexts) when, e.g., the DBMS error message is replicated in a plain text error stack, from an educational and human–computer interaction perspective, such richer error messages are an interesting future topic.