1 Introduction

Healthcare costs in most developed countries have grown dramatically over the last few decades. According to records from the Organisation for Economic Co-operation and Development (OECD), the average proportion of Gross Domestic Product (GDP) devoted to the healthcare systems of all member countries increased from 4.6% in 1970 to 6.5% in 1990, and reached 8.7% in 2010 (Eckermann and Coelli, 2013; OECD, 2023). Moreover, the proportion of GDP devoted to healthcare is generally even higher among the more developed member countries, for which the average surpassed 10% in 2021. For Australia, the proportion shows a persistent upward trend, reaching an estimated 10.6% in 2021 and placing the country at around the 40th percentile of all OECD members (OECD, 2023). Australia’s trajectory of healthcare spending as a share of GDP over the past half-century closely resembles that of other member countries with similar healthcare systems, such as the United Kingdom (UK), Canada, and New Zealand.

From a domestic perspective, in the Australian government budget for the financial year (FY) 2022/2023, expenditure on the “Health” function ($108 billion, 16.70% of total expenses, 4.24% of GDP) is second only to the “Social security and welfare” function ($226 billion, 35.11% of total expenses, 8.90% of GDP), and is followed by “Education” and “Defense”. In recent years, the health budget has even exceeded the combined budgets for education and defense (Commonwealth of Australia, 2023). Moreover, once spending from the consumer side is taken into account, total healthcare expenditure greatly exceeds the budgeted expenses. For example, according to the report from the Australian Institute of Health and Welfare (AIHW), nationwide spending on health goods and services in FY 2021/2022 was $241.3 billion, equating to about $9365 per capita and accounting for about 10.5% of overall economic activity (Australian Institute of Health and Welfare, 2022).

Prompted by both public pressure and government interest in cost containment, a substantial amount of research has been published to inform policymakers on how to address these tremendous and still rapidly increasing costs (O’Neill et al. 2008). It is widely believed that the inefficiency of healthcare institutions has, to some extent, contributed to the ongoing global increase in healthcare costs (Worthington, 2004). In addition, the resources of healthcare services are inherently limited, and therefore their more efficient utilization is important for any type of healthcare system (Kumbhakar, 2010). Efficiency measurement may be the first step in the evaluation of a coordinated healthcare system (O’Neill et al. 2008), whether the aim is better healthcare service outcomes or a more controllable budget. Hospitals, as the core institutions of the healthcare system, also account for the lion’s share of the expenditure (about 39.8% of Australian healthcare spending in FY 2021/2022) and have become the most popular research target (Australian Institute of Health and Welfare, 2022).Footnote 1

Given this continuously expanding research, what insights have we gained about hospital efficiency in Australia, and what promising directions could be pursued in the future? To obtain a broader scope, we extended the topic from Australia to several reference countries and regions that share a similar healthcare system structure with high-quality healthcare services, similar traditions inherited from the British administrative system, and a similar level of economic development, i.e., the UK, Canada, New Zealand, and Hong Kong.Footnote 2

Several comprehensive reviews of comparable topics can be found in the literature. For example, Hollingsworth (2003, 2008); Hollingsworth et al. (1999) broadly reviewed studies of healthcare delivery efficiency, focusing on the application and development of efficiency measures, the main findings, the indicators of output and quality, etc. The measurement of efficiency, including the input and output indicators of a healthcare facility and the approaches used to evaluate resource utilization, was also among the main concerns of the reviews by Hussey et al. (2009); O’Neill et al. (2008); Worthington (2004).

The novelty of our review primarily lies in two aspects. Firstly, to reduce selection bias in the journal articles, we conducted a systematic review instead of the commonly employed conventional approach. Indeed, articles chosen by authors, even field experts, might not be representative of the existing knowledge (Linnenluecke et al. 2020). A systematic review, by contrast, collects a more comprehensive set of available research and selects journal articles by precisely predetermined criteria for further analysis (Linnenluecke et al. 2020; Tranfield et al. 2003). Secondly, we deployed bibliometric techniques for visualization and network analysis, which have been widely used in surveys of other topics (e.g., Choi and Oh, 2019; Nepomuceno et al. 2022), specifically for the review of hospital efficiency studies. In turn, this helped reveal the dynamic patterns of the most researched topics and the most productive authors on these topics.

The remainder of this article is organized as follows. After the introduction, selected previous reviews on related topics are summarized in Section 2. The methodology for article collection, the processing procedure, and the bibliometric analysis are discussed in Section 3. Section 4 presents a detailed analysis of the articles and methodologies. A qualitative and critical analysis of the findings in the selected articles is provided in Section 5. Section 6 offers reflections on the interpretation of efficiency results, and concluding remarks are summarized in Section 7.

2 Key related works

Several key review studies have been conducted over the last three decades. Hollingsworth et al. (1999) conducted a seminal review of global applications of nonparametric methods to healthcare efficiency published before 1997, and found that more than two-thirds of the applications used data envelopment analysis (DEA) or DEA-related techniques, and more than two-thirds of the research concerned hospitals and nursing homes in the US. A few years later, Hollingsworth (2003) reinvestigated the study of healthcare efficiency with a broader spectrum of methods. Although different measures were used, studies of hospitals in the EU typically reported higher mean efficiency than that estimated for hospitals in the US. Parametric methods, especially within the stochastic frontier analysis (SFA) paradigm, had been employed more widely than in the earlier survey, although the dominant methods were still DEA and DEA-related techniques. Across frontier techniques overall, about three-quarters of the 188 reviewed articles were based on DEA, SFA, and their variants.

In another review of hospital efficiency studies using DEA-based methods, O’Neill et al. (2008) covered 79 studies from 1984 to 2004, which also mainly focused on samples from the US and Europe. Besides the selection of inputs and outputs, this cross-national comparison also revealed differences in preferences regarding research topics and model selection. For example, they concluded that European researchers paid more attention to allocative efficiency, relative to technical efficiency, than their US counterparts.

Specifically for SFA models, Rosko and Mutter (2008) reviewed 20 hospital efficiency studies in the US utilizing SFA. Varying assumptions were employed regarding the distributions of the composed error terms across different model specifications, while the choice of input and output variables was relatively consistent. SFA also exhibits a high level of flexibility, including adaptation to panel data settings and the use of a two-stage framework to examine the influence of environmental variables on (in)efficiency. In a subsequent review of SFA applied to US hospitals, Rosko and Mutter (2011) focused more on the empirical findings regarding the factors influencing hospital efficiency and the potential policy implications.Footnote 3

In a nutshell, three primary types of efficiency measures have been developed to satisfy the requirements of researchers and policymakers: technical efficiency, allocative efficiency, and a combination of both (often referred to as overall efficiency). In turn, these focus on maximizing output from a given input, minimizing input for an expected output, or optimizing over inputs and outputs jointly (Färe et al. 2019; Worthington, 2004). Other efficiency measures, such as cost, revenue, profit, and scale efficiency, may also be utilized in the context of hospitals (e.g., see Sickles and Zelenyuk (2019, Chap. 3, 8) for more details).

Worthington (2004) reviewed the efficiency measurements applied to healthcare topics, focusing on frontier techniques. The author considered the main approaches, such as DEA, SFA, the Malmquist index (MI), and their combinations; their implementation, including the input and output indicators; and the approaches used to explain differences in efficiency. The author concluded that, although efficiency measurement attracted increasing attention in the early 2000s, applications of advanced frontier techniques were still at a rudimentary stage (Worthington, 2004).

Subsequently, Hollingsworth (2008) reviewed a broader collection of 317 published studies. The widely preferred techniques remained the same as in prior years. In line with other discussions and reviews on this topic, the author found that output indicators mostly capture physical throughput, such as inpatient days, without considering the quality of treatment. Only 9% of studies included outcome measures, such as the mortality rate and changes in health status. Another weakness is that only a few studies supported their methods with statistical inference or sensitivity analysis. Furthermore, technical efficiency was the primary focus of most studies, while only a few focused on allocative efficiency.

Hussey et al. (2009) expressed similar concerns about the deficiencies of existing research, especially regarding outputs and statistical testing. By reviewing articles in Medline and EconLit from 1990 to 2008, the authors identified 265 measures in peer-reviewed articles and 8 in so-called ‘gray literature’. As defined in Paez (2017), gray literature is research produced by academics, government, industry, etc., that is not controlled by traditional publishers. Their systematic review focused on the efficiency measures and sought to establish a shared understanding of the adequacy of these approaches. Following McGlynn et al. (2008), the measures were classified into three branches: “perspective”, “inputs”, and “outputs”. Among the 265 measures abstracted from the 172 reviewed articles, the production of hospital services, such as length of stay and cost per discharge, was the most commonly used indicator. Half of the measures used physical resources to reflect inputs, while one-third used costs and one-quarter used both as input indicators. For outputs, most studies counted healthcare services, for example, discharges, procedures, and physician visits. Quality was rarely integrated into the output, which was the issue of greatest concern in the review and an enduring focus of discussion in the field. Another empirical issue is that only about 2.3% of the articles included tests of reliability or validity, while sensitivity analysis was considered in only about one-quarter of the articles, even though it is commonly used in multivariate statistical models.

More recently, Hadji et al. (2014) conducted a systematic review of studies on both the productivity and financial outputs of hospitals from 1990 to 2013. From the 38 articles reviewed, they summarized a taxonomy of the most commonly considered input/output categories, which are mainly applied within the prevalent DEA and SFA methods. Nepomuceno et al. (2022) systematically reviewed 65 hospital efficiency studies from 1996 to 2022, highlighting the effectiveness of bibliometric techniques in paper selection and visualization, and notably in bolstering subsequent critical and qualitative analysis.Footnote 4

In summary, the reviews by field experts show that healthcare efficiency research is flourishing, especially in the US and the EU. However, there is a lack of such reviews for particular countries other than the US, and we try to bridge this gap by providing such an analysis for Australia and some of its peers. One may wonder why Australia would be interesting to a general audience of this journal. The answer is that the Australian healthcare system is one of the best in the world (albeit with room for improvement), and knowledge about it can help other countries. Indeed, in their recent report, Schneider et al. (2021, p.14) concluded that

"International comparisons allow the public, policymakers, and health care leaders to see alternative approaches to delivering health care, ones that might be borrowed to build better health systems that yield better health outcomes. Lessons from the three top performers we highlight in this report—Norway, the Netherlands, and Australia—can inform the United States and other countries seeking to improve.”

For the same reasons, we hope that the insights about the studies on the performance of hospitals in Australia and its peers, with similar healthcare system structures and comparable levels of economic development, presented in this paper will also help researchers in other countries.

3 Methodologies and data

3.1 Collection of articles

In line with the article collection process in Hussey et al. (2009), we employed a systematic collection of published articles and gray literature on hospital efficiency studies conducted in Australia and its peer countries and regions. Following the bibliometric data collection procedures in the literature (e.g., Choi and Oh, 2019 and Linnenluecke et al. 2020), our collection process aims to capture, to the extent possible, all related published articles and gray literature.

Firstly, we chose Scopus as the main database for its comprehensive coverage of healthcare research and its export formats, which are compatible with mainstream bibliometric analysis techniques. Web of Science and more specialized platforms, such as Medline and PubMed, served as complements.

Subsequently, for the Boolean search, our review topic was decomposed into three categories, i.e., “location”, “topic”, and “object”. The “location” restricts the search to the countries or regions of interest. Besides the country names, the sub-national district names were also included. For example, for Australia, apart from “Australia” and “Australian”, the keywords “Queensland”, “Victoria”, and the names of all the other States and Territories were included as well. Moreover, commonly used abbreviations, such as “UK” for “United Kingdom”, and synonyms, such as “British” and “Britain”, were also listed. Since there are five countries and regions of interest, we compiled five keyword groups, for Australia, the UK, Canada, New Zealand, and Hong Kong, respectively.

The “topic” limits the search to research regarding “efficiency”, “inefficiency”, “productivity”, and “performance analysis”, which are the same for all targeted countries and regions. Finally, the research “object” is restricted to “hospital”, “healthcare” (or “health care”), and “health services”. Whilst our focus is “hospital”, we opted to include slightly more general related phrases so as not to overlook any crucial material for further analysis.

The result of the Boolean search is the intersection of the three categories, obtained with the logical operator “AND”. Within each category, the selected terms are searched in every specified field (e.g., title, keywords (including author keys and index terms), or affiliationFootnote 5) and combined with the logical operator “OR”. Another configuration worth noting is that the wildcard “*” is added at the beginning and end of one-word keywords, such as “efficiency”, to capture similarly spelled terms, such as “inefficiency”. An illustration of the Boolean search employed is presented in Fig. 1.

Fig. 1

Illustration of the Boolean search logic
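
To make this construction concrete, the following minimal Python sketch assembles such a query string from the three keyword groups. The keyword lists shown are abbreviated, illustrative examples rather than our full search strings, and the actual searches were configured directly in the Scopus interface with the field restrictions described above.

```python
# Minimal sketch: assembling a Boolean query from the three keyword
# categories ("location", "topic", "object").  The lists below are
# abbreviated examples, not the full lists used in this review.
location = ['Australia*', 'Queensland', 'Victoria', '"New South Wales"']
topic = ['*efficien*', 'productivity', '"performance analysis"']
research_object = ['hospital*', 'healthcare', '"health care"', '"health services"']

def or_group(terms):
    # Terms within one category are combined with OR.
    return "(" + " OR ".join(terms) + ")"

# The three categories are then intersected with AND.
query = " AND ".join(or_group(group) for group in (location, topic, research_object))
print(query)
# (Australia* OR Queensland OR ...) AND (*efficien* OR ...) AND (hospital* OR ...)
```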

As for gray literature, we searched similar keywords in Google Scholar and RePEc. For example, for Australia, we found 6 papers that were not captured by the previous search but were very close to our review topic. We reviewed these studies together with those collected from the published platforms; however, as a limitation of our study discussed in later sections, the gray literature could not be included in the subsequent bibliometric analysis because of the lack of bibliometric records.

The collection at this stage is usually too broad for effective review or analysis. Similar to procedures used in the literature, we dropped unrelated articles by manually reviewing them against a set of predetermined criteria. The selection criteria were informed by field experience and adjusted according to the composition of the existing paper pool, with the aim of retaining every article directly relevant to the theme of the review.

Accordingly, we acquired five sets of published articles, one for each country and region of interest, via the Scopus database, containing 2153 studies for Australia, 5398 for the UK, 6872 for Canada, 337 for New Zealand, and 192 for Hong Kong from 1970 to 2023.Footnote 6 Following the process shown in Fig. 2, after excluding articles containing disease or department terms in the title, we reviewed the contents of the remaining articles and identified those closely relevant to our review topic. Consequently, we selected 12 articles for Australia, 17 for the UK, 10 for Canada, as well as 9 and 5 for New Zealand and Hong Kong, respectively.

Fig. 2

Literature collection flow

3.2 Bibliometric analysis techniques

Academic knowledge is expanding so rapidly that it is becoming increasingly challenging for researchers to comprehensively review, analyze, and understand a field relying only on manual reading (Linnenluecke et al. 2020). Beyond the issue of efficiency, the sheer volume of material also pushes researchers to choose “high quality” sources rather than considering a broader range of evidence (Tranfield et al. 2003). Visualization and mapping techniques, owing to their remarkable capability to facilitate the comprehension of a substantial volume of knowledge along a variety of dimensions, have attracted widespread interest from researchers in various fields in recent years. In fact, such techniques are more than a beneficial tool; to some degree, they are a necessary route to a comprehensive systematic review.

Bibliometric analysis, such as co-citation analysis (Small, 1973) and co-word analysis (Callon et al. 1983), has long been applied in mapping the literature (Nieminen et al. 2013). With the development of big data methods, scientometric researchers have developed functional tools to meet the demand for analysis of comprehensive data sources.

A number of well-developed tools are based on the Java platform, such as VOSviewer (Van Eck and Waltman, 2010), Sci2 (Science of Science Tool) (Sci2 Team, 2009), and SciMAT (Cobo et al. 2012), which are powerful in mapping visualization and network analysis. These Java-based tools are also usually equipped with text-mining functions to handle data preparation prior to mapping. Other tools possess similar capabilities, such as CiteSpace II (Chen, 2006), whose main functions are similar to those of Sci2, and the Network Workbench Tool (NWB Team, 2006), which also provides open access to data modification. Another series of tools has been developed in the statistical programming language R (R Core Team, 2023). One advantage of these open-source packages is the flexible and extensible working environment, in which researchers and practitioners can continuously provide updates to the functions (Linnenluecke et al. 2020). A representative package developed in R is Bibliometrix (Aria and Cuccurullo, 2017), which supports both descriptive analysis and network analysis. Additionally, through the deployment of another package, Shiny (Chang et al. 2023), the functions of Bibliometrix can be accessed through a user-friendly, interactive web interface. Other tools have been developed for mapping and network analysis, such as Pajek (Batagelj and Mrvar, 1998) and Gephi (Bastian et al. 2009), whose functions can be combined with other applications, such as Bibliometrix and VOSviewer. Another pioneering software package, HistCite (Garfield, 2009), was developed for network analysis of key authors and articles, but it is no longer in development.

Most of these popular tools are designed to import bibliometric data from Scopus, Web of Science, etc., which is usually stored in BibTeX (.bib), plain text (.txt), or RIS (.ris) format. However, the export formats of the different platforms are not mutually compatible. Taking into account the functional requirements and the characteristics of our bibliometric data set, we mainly use Bibliometrix and VOSviewer in the subsequent analysis.

4 Analysis

We start our analysis by employing a series of bibliometric techniques to obtain a panoramic perspective on the research regarding the productivity and efficiency of hospitals within the selected countries and regions. This overview serves as a prerequisite for those seeking a deeper understanding of the research landscape. It provides fundamental yet pivotal insights into the dynamics of prevalent research topics and methodologies, the identification of principal contributors to the field, and the interconnection between local research endeavors and the broader global literature. We then delve deeper into each of the selected papers to identify the common inputs, outputs, and quality indicators, as well as major techniques utilized in the analysis of the productivity and efficiency of hospitals within these countries and regions.

4.1 Overview of the research landscape

4.1.1 The dynamics of research topics and methodologies

To depict the prominent research topics and methodologies, a word cloud of keywords was generated for each country and region, where general terms (such as country names) were omitted to emphasize the methodologies and research topics. In Fig. 3, the size of each term corresponds to its frequency relative to other terms within the same country or region. As can be seen from Fig. 3, in addition to output assessment terms, such as “mortality” and “length of stay”, researchers in Australia preferred using “risk assessment” and “cost-benefit analysis” in measuring “organizational efficiency”. Meanwhile, research in Canada focused more on methodological terms, such as “DEA”, “bootstrapping”, “Monte Carlo method”, and “regression analysis”. In New Zealand, researchers exhibited a greater focus on “public health” and “primary health care”, while frequently employing “cost-benefit analysis”, “DEA”, and “Monte Carlo method”. Researchers in the UK devoted more attention to the “National Health Service” (NHS), “state medicine”, and “quality”, as well as the novel technique of “machine learning”. In Hong Kong, researchers primarily concentrated on “health care delivery”, which is also a popular topic in Canada and New Zealand. Hong Kong’s research has also placed particular emphasis on “population density”, distinguishing it from its peers.

Fig. 3

Word clouds of keywords by country and region

For a dynamic perspective on the key topics and methods, as illustrated in Fig. 4, the keyword records from the countries and regions of interest were combined and divided into several periods: prior to 2005, 2006 to 2010, 2011 to 2015, and 2016 to 2023. Using the same generating algorithm, the word clouds for the later years appear denser and more informative, primarily because the majority of the research was conducted after 2010. Research in earlier years focused more on qualitative discussion of the effects of cost, policy, and reform on efficiency, while the focus later shifted to empirical methods, with “cost benefit analysis”, “DEA”, and “machine learning” gradually becoming more prevalent. Notably, “quality” was extensively discussed in the years from 2006 to 2015 but less so in more recent years.

Fig. 4

Word clouds of keywords by period

Another dynamic trend worth exploring is the evolution of the keywords. The results of the co-occurrence analysis of the modified keywords using VOSviewer (Van Eck and Waltman, 2010) are depicted in Figs. 5 and 6. The connections between the word frames denote co-occurrences between pairs of terms. The size of each frame reflects the number of occurrences of the respective keyword, while the dimension encoded by color differs between the two figures.
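
For readers less familiar with co-occurrence analysis, the following minimal Python sketch illustrates the underlying counting step on a few hypothetical keyword lists; VOSviewer carries out this counting (together with normalization, clustering, and layout) internally on the imported bibliographic records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-article keyword lists (in practice, parsed from the
# exported bibliographic records).
articles = [
    ["data envelopment analysis", "technical efficiency", "hospital"],
    ["stochastic frontier analysis", "technical efficiency", "hospital"],
    ["data envelopment analysis", "bootstrap", "hospital"],
]

# Count how often each unordered pair of keywords appears in the same article;
# these counts define the edge weights of the co-occurrence network.
cooccurrence = Counter()
for keywords in articles:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

for (a, b), n in cooccurrence.most_common(3):
    print(f"{a} -- {b}: {n}")
```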

Fig. 5

Co-occurrence network of keywords over time

Fig. 6

Co-occurrence network of keywords by number of citations

The frames in Fig. 5 are color-coded based on the average publication year. First, the time span is derived from the publication years of all the journal articles in the sample, and each year in the span is scored chronologically. For a given keyword, the publication-year score \({S}_{i}\) is the average of the scores \({S}_{i}^{j}\) of all the studies that used it, i.e.,

$${S}_{i}={m}^{-1}\sum\limits_{j=1}^{m}{S}_{i}^{j},$$
(1)

where i represents the ith keyword in the sample, m is the number of studies that mentioned this keyword, and j denotes the jth article among these m articles. Finally, the color gradient from warm to cold hues is assigned to each keyword based on the scores \(S={\{{S}_{i}\}}_{i = 1}^{w}\), where w represents the number of keywords included in the analysis. As a result, the more frequently a keyword appeared in recently published journal articles, the warmer the color of the frame would be. Analogously, a frame with a colder hue indicates that the term was more frequently employed in earlier publications.

Meanwhile, the color of the frame in Fig. 6 serves as an indicator of the average number of total citations of the journal articles containing the particular term as a keyword. The color gradient is determined by the citation scores \(C={\{{C}_{l}\}}_{l = 1}^{q}\), where q is the number of keywords and l indicates the lth keyword in the sample. The citation score for a certain keyword is

$${C}_{l}={p}^{-1}\sum\limits_{k=1}^{p}{C}_{l}^{k},$$
(2)

where p is the number of relevant studies that used this keyword, k represents the kth study among these p studies, and \({C}_{l}^{k}\) is the number of citations of the kth study containing the lth keyword. Therefore in Fig. 6, the published studies utilizing the warm-colored terms as keywords receive a higher citation count on average than those using the cold-colored keywords.
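
To make the calculations in Eqs. (1) and (2) concrete, the following minimal Python sketch computes both scores from a small set of hypothetical bibliographic records (VOSviewer performs the equivalent computation on the imported data).

```python
# Hypothetical records: publication year, total citations, and keywords of
# each article in the sample.
records = [
    {"year": 2012, "citations": 40, "keywords": ["DEA", "technical efficiency"]},
    {"year": 2016, "citations": 15, "keywords": ["DEA", "bootstrap"]},
    {"year": 2021, "citations": 5,  "keywords": ["bootstrap", "cost efficiency"]},
]

def keyword_scores(records, field):
    """Average a per-article attribute over all articles mentioning each
    keyword, as in Eq. (1) (field="year") and Eq. (2) (field="citations")."""
    totals, counts = {}, {}
    for rec in records:
        for kw in rec["keywords"]:
            totals[kw] = totals.get(kw, 0) + rec[field]
            counts[kw] = counts.get(kw, 0) + 1
    return {kw: totals[kw] / counts[kw] for kw in totals}

avg_year = keyword_scores(records, "year")            # S_i in Eq. (1)
avg_citations = keyword_scores(records, "citations")  # C_l in Eq. (2)
print(avg_year["DEA"], avg_citations["bootstrap"])    # 2014.0 10.0
```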

Consequently, the evolution of keywords reveals three main periods of different research interests across the regions of interest. In the early years, around 2005, “cost benefit analysis”, “health care organization”, and “teaching hospital” were the predominant research subjects. During the subsequent period, from around 2010 to 2015, more advanced methods and indicators emerged within the field, with the emphasis shifting from “organization management” and “healthcare delivery” to “length of stay”, “hospital quality”, “technical efficiency”, and “DEA”. Finally, new terms were introduced in recent years, such as “efficiency frontier estimation”, “bootstrap”, “factor analysis”, and “cost efficiency”. This keyword shift over time also corresponds to that revealed in the word clouds above.Footnote 7

The network in Fig. 6 implies that research focusing on “DEA”, “diagnosis-related group”, and the “Malmquist productivity index” (MPI) has received the highest number of citations, which indicates the popularity of these methods for this topic. When cross-checked with Fig. 5, most of the terms that emerged in recent years have received relatively few citations, which may be due to the limited time they have been in circulation. Nevertheless, the methodological terms, such as “bootstrap”, “DEA”, and “MPI”, were not prevalent in the early stage but have received a considerable amount of attention in more recent years.

4.1.2 Top contributors

To gain an overview of the top contributors to the research on the performance of hospitals across all the countries and regions of interest, we plotted the number of publications by the most productive authors (in descending order) in a time series chart. As shown in Fig. 7, the articles authored by these productive researchers span from 1995 to 2023. The connecting lines link the works of each researcher in our paper pool, with the size of the blue dots representing the number of articles published each year, ranging from one to three. Additionally, the depth of color at each point is determined by the average number of citations per article per year, ranging from approximately 0.2 to around 5.4.

Fig. 7

Production of top productive authors over time

Andrews has been the most productive author, publishing five articles on New Zealand hospitals over recent decades. In Australia, Yong stands out as the most prolific author over the same period and is also the most active local collaborator. Research in Canada is concentrated in the period from 2008 to 2016, during which Chowdhury and Laporte, along with their co-authors, conducted the majority of the studies. Research in Hong Kong is relatively scarce compared to the other peers and is thus not presented in the plot. In general, most of the articles were published after 2011, aligning with the conclusion drawn from the time series word cloud plots.

In addition to studying productive authors individually, we aimed to uncover connections among them by providing a comprehensive overview of the relationships between productive authors, prevalent keywords, and frequently used publication sources with a three-field plot (also referred to as a Sankey diagram) for each country. Sankey diagrams are designed to visualize flows in networks and processes using arrows and their widths (Froehlich, 2005). The three fields utilized are keywords (both author keys and index terms), authors, and published sources, arranged sequentially from left to right. The central field is “authors”, where the most productive authors are identified for each country. The width of the flow between an author and a keyword, or between an author and a source, represents the degree of relevance. Not all keywords or sources of each author are included in the analysis; only those most commonly shared across authors are depicted.

As shown in Fig. 8 for Australia, the productive authors in the middle field form groups of highly productive collaborators. For example, Chua, Palangkaraya, and Yong, whose works contain similar keywords (author keys and index terms) on the left side, have articles published in Economic Record and Health Economics, as displayed on the right side. Moreover, if we focus on research methods and research subjects in the keywords on the left side and disregard general terms such as “Australia”, “article”, and “humans”, some tendencies in local research can be discerned; specifically, “risk assessment” and “mortality” are the most frequently mentioned subjects.

Fig. 8

Sankey diagram of Australia

Analogous diagrams were generated for the other peer countries for comparison.Footnote 8 As shown in Figs. 8–11, similarly to the case of Australia, productive researchers on this topic in the UK also display a pattern of collaboration, as seen in the co-authorship among Bojke, Castelli, Street, Laudicella, and Ward. Furthermore, researchers in Canada and New Zealand demonstrate a greater emphasis on the DEA method, e.g., the Rouse, Harrison, and Turner group in New Zealand. The keywords on the left side of the diagrams also show that researchers in New Zealand exhibit a greater interest in assessing private healthcare.

Fig. 9

Sankey diagram of the UK

Fig. 10

Sankey diagram of Canada

Fig. 11

Sankey diagram of New Zealand

4.1.3 Connection to the global literature

To understand the connection between the local and global literature on hospital performance, we first examined the sources and authors most cited by local researchers in each region. For Australia, as summarized in Table 1, Health Economics is not only a journal where local research is frequently published, but also the journal most frequently cited by local researchers. Turning to the most cited authors, as listed in Table 2, researchers with global reputations in efficiency analysis and the healthcare sector, including Simar, Wilson, Zelenyuk, Braithwaite, Grosskopf, and Färe, are among those most commonly cited by Australian researchers on this topic.

Table 1 Most cited sources by region
Table 2 Most cited authors by region

Another noteworthy phenomenon is the resemblance among the authors most cited by researchers in different countries and regions. Most of the top-cited researchers in studies of Canadian hospitals are based in the US, such as Grosskopf, Färe, and Valdmanis, who are also among the most cited in the studies on Australia, which indicates that Australian research closely tracks the work of North America. In contrast, the top-cited researchers in the studies on the UK are local experts, such as Street, Castelli, and Gravelle. The higher level of independence of research in the UK may be due to its special topics of interest, such as the performance of the NHS. The situation in New Zealand is similar to that in Australia, in that local researchers tend to follow globally prominent researchers in the field as well as leading regional authors.

Finally, we collected the global citations of the local studies, and the most cited studies within each country and region are listed in Table 3. Comparing the total citations of these top-cited journal articles, studies in the UK and Canada have been more influential in the field than those in Australia, New Zealand, and Hong Kong.

Table 3 Influential journal articles in each region

4.2 Input, output, and quality indicators

Healthcare services can be considered as a production process, in which inputs are transformed into outputs by production decision-making units (DMUs), in our case, hospitals. Among the reviewed studies, as well as the literature (and reviews) covering a wider range of countries on this topic,Footnote 9 the most commonly applied inputs include different types of labor (e.g., hours of medical and surgical staff, nurses, administrative staff, etc.), supply expenditure (e.g., medical and surgical supplies and the cost of other consumables), equipment expenditure (e.g., physical investment and the cost of equipment), and capital investment (usually represented by the number of staffed beds). Two categories of outputs are generally evaluated for hospital performance: outpatient services (e.g., the number of outpatient visits, the number of ambulance visits, etc.) and inpatient care (e.g., the number of admitted/discharged episodes, the number of bed days, the number of case-mix weighted procedures, etc.).

The production process of a hospital, and of the healthcare sector in general, is different from that of other sectors, as the final ‘product’ or outcome of the service should be about improving the health of the patients, rather than admitting and discharging patients efficiently. Hence, when feasible, it would be more desirable to evaluate the performance of hospitals by the outcome. In practice, however, the aforementioned output categories are the most frequently considered dimensions in the evaluation of hospital performance. The reasons are usually the lack of data availability on outcomes and the difficulty of quantifying the health improvement attributable to a procedure.

Much effort has been devoted to remedying this inadequacy (though not many studies within our review scope address this aspect). Some indicators designed for specific procedures or the entire hospital can be recorded to represent the quality of the services (e.g., the mortality rate during admission, the post-surgery infection rate, and the readmission rate), which in turn may indicate the outcome. In practice, such indicators can be incorporated into the output as a weight or as an independent category, or considered as an exogenous variable explaining the variations in efficiency (e.g., Chua et al. 2011; Deng et al. 2019; Ferrier and Valdmanis, 1996). As proxies of the outcome, the currently developed quality indicators, as reviewed by Breyer et al. (2019); De Vos et al. (2009), are primarily focused on undesirable outcomes, aimed at controlling the lower limit of the service. For instance, among our reviewed studies, Andrews et al. (2022) evaluated the trade-off between technical efficiency and quality of hospitals in New Zealand, where quality is represented by an undesirable output, the readmission of inpatients. Nevertheless, incorporating quality indicators into the performance analysis of hospitals is a step closer to an evaluation based on the fundamental objective of healthcare.

As indicated in the reviews by Hollingsworth (2008); Hussey et al. (2009); O’Neill et al. (2008), applications examining the quality of care are sparse in the literature, which is also the case in our review. As with evaluating the outcome directly, the difficulty of data collection is usually one of the main reasons for this.

4.3 Major techniques

Various performance indicators are used for measuring hospital efficiency. For example, the length of stay (LOS) was analyzed by Bogomolov et al. (2017); Hanning (2007). Ratio-type indicators, e.g., total factor productivity (TFP) between periods or hospital groups, are another vein of computationally simple and easily interpreted measures (e.g., Aragón et al. 2019; Bojke et al. 2013; Cheng et al. 2020; Andrews and Emvalomatis, 2023).Footnote 10 The prevalence of the two most widely adopted techniques in the efficiency analysis field is also observed among our reviewed articles, as in other reviews on hospital efficiency (e.g., Hollingsworth and Wildman, 2003; Hollingsworth et al. 1999; Worthington, 2004). Both approaches belong to the vein of “frontier analysis”.Footnote 11 One is SFA, first proposed by Aigner et al. (1977) and Meeusen and van den Broeck (1977) and further advanced in many studies; the other is DEA, which originated with Farrell (1957) and was advanced by Charnes et al. (1978) and many others.

DEA is a nonparametric method derived from the axioms of production theory, and it is relatively easy to apply when both inputs and outputs are multi-dimensional. In the healthcare sector, it also has advantages in flexibility and versatility (O’Neill et al. 2008), and hence it is widely applied in our reviewed articles, e.g., Chua et al. (2011); Nghiem et al. (2011); Nguyen and Zelenyuk (2021a); Nguyen and O’Donnell (2023) for Australia, Giuffrida (1999); Hollingsworth et al. (1999); Parkin and Hollingsworth (1997) for the UK, Chowdhury and Zelenyuk (2016); Fixler et al. (2014); Wang et al. (2018) for Canada, Andrews (2021); Andrews et al. (2022) for New Zealand, and Li et al. (2019) for Hong Kong. At the same time, DEA can be more sensitive to outliers (which must be dealt with, if present, prior to the deployment of DEA), and it requires a relatively large sample for statistical inference when the dimension of inputs/outputs is high. Moreover, as the estimated production (cost) frontier of the envelopment-type estimator is downward (upward) biased relative to the boundary of the true technology set, the estimated efficiency is also biased. Hence, to obtain more accurate and reliable efficiency estimates, a bias correction procedure as well as the estimation of confidence intervals are necessary following the initial estimation.Footnote 12
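
To illustrate the mechanics behind these estimators, the following Python sketch computes output-oriented Farrell efficiency under VRS for a small, hypothetical sample of hospitals by solving the standard envelopment linear programs. It is a didactic illustration only, omitting the bias-correction and inference steps noted above, and is not the estimator used in any particular reviewed study; the function name and data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_vrs(X, Y):
    """Output-oriented DEA under VRS.  X: (n, p) inputs, Y: (n, q) outputs.
    Returns Farrell output efficiency scores in (0, 1] for each DMU."""
    n, p = X.shape
    q = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; maximize theta.
        c = np.r_[-1.0, np.zeros(n)]
        # Input constraints: sum_j lambda_j * x_j <= x_o.
        A_in = np.hstack([np.zeros((p, 1)), X.T])
        b_in = X[o]
        # Output constraints: theta * y_o - sum_j lambda_j * y_j <= 0.
        A_out = np.hstack([Y[o].reshape(q, 1), -Y.T])
        b_out = np.zeros(q)
        # VRS convexity constraint: sum_j lambda_j = 1.
        A_eq = np.hstack([[[0.0]], np.ones((1, n))])
        res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                      A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (n + 1),
                      method="highs")
        scores[o] = 1.0 / res.x[0]  # Farrell efficiency = 1 / theta
    return scores

# Hypothetical data: 4 hospitals, 2 inputs (beds, staff), 1 output (episodes).
X = np.array([[100, 250], [150, 300], [80, 200], [200, 500]], dtype=float)
Y = np.array([[5000], [5500], [4200], [7000]], dtype=float)
print(dea_output_vrs(X, Y).round(3))
```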

The SFA method, on the other hand, decomposes the regression-type overall error term into inefficiency and statistical noise, which makes it less sensitive to extreme values. It also provides more interpretable results (e.g., estimated coefficients of the assumed production relationship, etc.). SFA appears to be more frequently applied in Australia, e.g., Gabbitas and Jeffs (2009); Nghiem et al. (2011); O’Donnell and Nguyen (2013); Productivity Commission (2009, 2010); Wang and Zelenyuk (2024). It is also deployed in Street (2003) for the UK and Jiang and Andrews (2020) for New Zealand. However, SFA is usually implemented in a parametric framework and therefore requires additional assumptions on the frontier and error terms, which may lead to high estimation errors and could possibly influence the conclusions.
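
For concreteness, a canonical specification in the spirit of Aigner et al. (1977), the normal–half-normal production frontier, can be written as

$$\ln {y}_{i}={x}_{i}^{{\prime} }\beta +{v}_{i}-{u}_{i},\qquad {v}_{i} \sim N(0,{\sigma }_{v}^{2}),\qquad {u}_{i} \sim {N}^{+}(0,{\sigma }_{u}^{2}),$$

where \({y}_{i}\) is the output of hospital i, \({x}_{i}\) collects the (log) inputs, \({v}_{i}\) captures statistical noise, and \({u}_{i}\ge 0\) captures inefficiency; technical efficiency is then typically reported as \(T{E}_{i}=\exp (-{u}_{i})\). This is one common choice of distributional assumptions rather than the exact form used in every reviewed study.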

The two mainstream approaches have been widely compared, and they are also frequently applied jointly in the literature (e.g., Jacobs, 2001; Linna, 1998; Varabyova and Schreyögg, 2013 in the context of healthcare). The choice between them is often driven by the aims of the analysis rather than solely by the pros and cons of each method. Moreover, hybrid approaches may attenuate the limitations of both while leveraging their respective virtues, for example, semi-parametric and nonparametric SFA (e.g., see the review by Parmeter and Zelenyuk, 2019), stochastic DEA (Simar and Zelenyuk, 2011), and StoNED (Kuosmanen and Johnson, 2017).

In either vein of techniques, explaining the efficiency estimates is a critical task after the estimation. After all, it is not the level of efficiency or the ranking, but the potential policy interventions to improve performance, that matter. After obtaining the DEA estimates, researchers have applied a variety of regression-type methods to explain efficiency with environmental variables, ranging from ordinary least squares (OLS) (Street (2003) for the UK) to Tobit regression (Guo et al. 2017 for Hong Kong), and truncated regression in the most widely applied two-stage DEA approach based on Simar and Wilson (2007) (e.g., Chua et al. 2011 for Australia, Chowdhury and Zelenyuk, 2016; Wang et al. 2018 for Canada, and Andrews, 2020a for New Zealand). In the SFA stream, a number of models have been developed to explain the inefficiency term by incorporating environmental determinants into its assumed distribution. The idea can be traced back to Kumbhakar et al. (1991) and was further popularized by Battese and Coelli (1995); it is also widely applied in the case of hospitals, e.g., in the US and the EU by Herr (2008); Rosko (2001, 2004); Vitikainen et al. (2010), as well as by Sickles et al. (2024); Wang and Zelenyuk (2024) for Australian hospitals.
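
As a sketch of how such determinants enter the model, a Battese and Coelli (1995)-type specification lets the environmental variables shift the location of the truncated-normal inefficiency distribution, e.g.,

$${u}_{i} \sim {N}^{+}({z}_{i}^{{\prime} }\delta ,\,{\sigma }_{u}^{2}),$$

where \({z}_{i}\) is a vector of environmental variables (for instance, teaching status or remoteness, as in the studies discussed in Section 5.2) and \(\delta\) collects the associated coefficients, so that the frontier and the determinants of inefficiency are estimated jointly rather than in two separate stages.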

Concluding this section, a summary of the key findings of the bibliometric analysis is presented in Table 4. It is worth emphasizing that all approaches have their advantages, as well as caveats and limitations, which may be more or less relevant depending on the context, data, or aims of the analysis; we will discuss this in more detail in the next section.

Table 4 Summary of key findings from the bibliometric analysis

5 Summary of findings of reviewed research

The results and conclusions drawn from the reviewed studies across the selected countries and regions exhibit some degree of consistency in a number of aspects, while indicating contrasting views in certain cases, and are sometimes simply not comparable. In Table 5, we provide a succinct summary of selected articles with representative findings from our review and present a more detailed discussion in the following sub-sections.

Table 5 Key findings of selected research

5.1 Levels of inefficiency

The level of efficiency and/or productivity is one of the fundamental aspects of analysis for this topic. Such estimates are usually reported as an aggregate measure for the studied area. For example, Gabbitas and Jeffs (2009) suggested that improvements of up to 10% in aggregate productivity of Australian public hospitals are possible, as indicated by the SFA models. In the report of the Productivity Commission (2009), the average technical efficiency estimated with SFA was found to be similar between the 386 public and 122 private hospitals in Australia (2006/2007), at about 20% below the hypothesized best practice. With regard to the other countries of interest, the national technical efficiency of New Zealand public hospitals from 2011 to 2017 was evaluated at 86% (i.e., 14% inefficiency) on average by SFA and at 93% (i.e., 7% inefficiency) by DEA under the variable returns to scale (VRS) assumption (Jiang and Andrews, 2020). From a dynamic viewpoint, research in the UK and Canada utilizing the MPI and its decomposition with DEA (under constant returns to scale (CRS) and VRS) generally indicated an improvement in productivity and/or efficiency during the studied periods (e.g., Chowdhury et al. 2011; Giuffrida, 1999; McCallion et al. 2000; Valdmanis et al. 2017).

In the decade from 1999 to 2009,Footnote 13 SFA was among the most popular methods in the efficiency analysis of Australian hospitals.Footnote 14 It was widely applied in studies of public hospitals (e.g., Victoria, 1994/1995, Yong and Harris, 1999; and New South Wales, 1995/1996, Paul, 2002), public acute hospitals (e.g., nationwide, 1996 to 2006, Gabbitas and Jeffs, 2009; and New South Wales, 1997/1998, Wang et al. 2006), and both public and private hospitals (e.g., nationwide, 2006/2007, Productivity Commission, 2009). The results indicated an approximately 75% overall efficiency level (i.e., 25% inefficiency) of public hospitals in New South Wales in 1995/1996, a 10% potential improvement in productivity of nationwide acute hospitals from 1996 to 2006, and 80% mean technical efficiency of Australian hospitals in 2006/2007. From a cost perspective, the SFA models revealed 3% mean cost inefficiency in Victoria in 1994/1995 and about 9% inefficiency in total cost in New South Wales in 1997/1998. During a similar period, research on hospitals in the UK reached conclusions using more diverse approaches. In an analysis of acute hospitals in Scotland (1991 to 1994), DEA models under both the CRS and VRS assumptions were applied, yielding an average production efficiency score of around 85% to 90% in different financial years, where DEA-VRS generally reported a higher level of efficiency (Parkin and Hollingsworth, 1997). In a comparative study of DEA (under the VRS assumption) and SFA, the benchmark NHS hospitals (1995/1996) were analyzed with both methods using several specifications. The mean level of technical efficiency estimated with the DEA models ranged from around 65% to 94%, while the SFA models estimated a mean efficiency level of around 85% (Jacobs, 2001).

In the last decade, more studies on Australian hospitals applied DEA and DEA-related methods, such as the analysis of admitted episodes between 2003 and 2005 in Victorian hospitals (Chua et al. 2011). Following the two-stage semi-parametric approach (bootstrap bias-corrected DEA and truncated regression) proposed by Simar and Wilson (2007), the researchers obtained input-oriented technical efficiency estimates of around 72% to 82% on average. In research on acute hospitals in Canada (Ontario, 2003 and 2006), the two-stage double-bootstrap DEA-CRS was applied to evaluate the main factors associated with output-oriented technical efficiency, where an average efficiency score between 70% and 75% was estimated for most types of hospitals, and a score of 90% was obtained for the teaching hospitals (Chowdhury and Zelenyuk, 2016). Meanwhile, a similar two-stage algorithm with DEA under the VRS assumption was deployed by Wang et al. (2018) for Canadian acute hospitals in 2012/2013, which reported an average technical efficiency level of about 75%.

In more recent studies of Australian hospitals (Queensland, FY 2016/2017), Nguyen and Zelenyuk (2021b) employed DEA and Free Disposal Hull (FDH) estimators, as well as partial quantile frontier methods in estimating the individual and aggregated efficiency. In particular, the estimated efficiency levels were aggregated by local Hospital and Health Services (HHSs), through averaging individual efficiency scores weighted by individual output. The aggregated efficiency of the FDH estimators was within a range between 82% and 100%, while the estimates of DEA under VRS and CRS assumptions ranged from 57% to 99% and from 48% to 84%, respectively. In comparison, the results of the partial-quantile frontier model were between 82% and 136%, where the scores above 100% indicate a phenomenon of the so-called ‘super-efficiency’ observed for some hospitals, some of which can be considered as outliers relative to the rest of the sample (Nguyen and Zelenyuk, 2021b). In another study, the authors also utilized the bootstrapping method as proposed by Simar and Zelenyuk (2007) and the central limit theorem (CLT) method by Simar and Zelenyuk (2018) in aggregating the DEA (with VRS or CRS) estimates of the efficiency of Queensland hospitals (Nguyen and Zelenyuk, 2021a). When the VRS assumption was applied, the aggregated efficiency, bias-corrected with the generalized jackknife estimator proposed by Kneip et al. (2015); Simar and Zelenyuk (2018), was about 83% and 80% using the bootstrapping method and CLT method, respectively. Meanwhile, when the CRS assumption was applied, the bias-corrected aggregate efficiency following the two aggregation approaches was around 50% and 46%, respectively. Further discussion regarding the influence of returns to scale assumptions on the efficiency estimates will be presented in the next section.
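
In simplified form, the output-weighted aggregation referred to here can be sketched as

$$\overline{TE}=\sum\limits_{i=1}^{n}{w}_{i}\,T{E}_{i},\qquad {w}_{i}=\frac{{y}_{i}}{\sum\nolimits_{j=1}^{n}{y}_{j}},$$

where \(T{E}_{i}\) is the estimated efficiency of hospital i, \({y}_{i}\) its output, and the sum runs over the \(n\) hospitals in a given HHS; this is only a stylized sketch, while the cited studies rely on the formal aggregation results of Simar and Zelenyuk (2007, 2018).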

5.2 Determinants of inefficiency

An important issue for further analysis is to identify the factors within a hospital that determine, impact, or are associated with the efficiency scores. The size of a hospital is one of the most frequently investigated factors in the studies of Australian hospitals, where larger hospitals were typically found to be more efficient than smaller ones (e.g., Bogomolov et al. 2017; Cheng et al. 2020; Chua et al. 2010; Nguyen and Zelenyuk, 2021b; Paul, 2002; Wang and Zelenyuk, 2024). An alternative perspective, through a stochastic frontier cost model, indicates that smaller hospitals tend to be more labor-intensive and perform better in terms of scale economies (Wang et al. 2006). In research on Northern Ireland hospitals, the decomposed MPI using DEA with VRS and CRS suggested that smaller hospitals, starting from a lower base, tended to achieve greater improvements in productivity from 1986 to 1992 (McCallion et al. 2000).

Chua et al. (2010) deployed a two-stage regression on Victorian hospitals and found that teaching hospitals demonstrated higher efficiency than non-teaching hospitals, which was also related to their larger size. For hospitals in Queensland, Nguyen and Zelenyuk (2021a) aggregated the DEA-estimated efficiency with the bootstrap and the CLT approaches, where the superior efficiency of teaching hospitals was observed under the VRS assumption. By contrast, under the CRS assumption, the aggregated efficiency of non-teaching hospitals was significantly higher. Meanwhile, a larger teaching role usually leads to higher costs for hospitals (Yong and Harris, 1999), and according to an SFA-based study, higher levels of education of healthcare professionals (especially doctors and nurses, as well as managers at all levels) were found to contribute to the technical inefficiency of the hospital (Paul, 2002). Nevertheless, it is also important to acknowledge that a more highly educated workforce (especially medical staff and nurses) may result in better quality outcomes (i.e., improved healthcare, with lower mortality and readmission rates, etc.), even if technical efficiency is comparatively lower due to the provision of fewer services.

The location of a hospital is another significant factor associated with efficiency (Chowdhury and Zelenyuk, 2016; Lavers and Whynes, 1978). The relatively low efficiency of certain hospitals in Queensland, Australia, as demonstrated by DEA (CRS and VRS), FDH, and partial-quantile frontier estimators, was found to be partially explained by their remoteness (Nguyen and Zelenyuk, 2021b). Meanwhile, according to the SFA model of Battese and Coelli (1995), no discernible relationship was observed between the location and the efficiency level of hospitals in New South Wales (Australia) (Paul, 2002). In the context of Hong Kong, public hospitals in more affluent districts showed a lower level of efficiency in a DEA studyFootnote 15 and the corresponding regressions, which may also reflect the preference of people in better economic conditions for services from private hospitals (Guo et al. 2017). Hospitals in remote areas are important in ensuring equitable access to healthcare services for the local population, who are more likely to be under-insured, in poor health, and facing higher expenses. Rural hospitals also foster community cohesion and contribute to the local economy. Beyond technical efficiency, these are also important factors for policymakers to consider. See Valdmanis et al. (2024) for more detailed discussions.

Some other factors or indicators have also been found to be significantly associated with hospital efficiency. For example, a longer LOS can negatively impact the efficiency of a hospital, as suggested by a regression tree model in Ali et al. (2019), the two-stage DEA (under the VRS assumption) in Andrews (2020b), and the SFA model with determinants (i.e., the Kumbhakar et al. (1991) model) in Wang and Zelenyuk (2024). A lower occupancy rate was associated with a higher level of efficiency in Paul (2002) and Wang and Zelenyuk (2024), while the occupancy rate was inversely related to the efficiency estimated by SFA models with different specifications in Yong and Harris (1999). Regarding the effect of case-mix adjustment, the MPIs (with DEA) and their decompositions differed significantly between specifications with and without case-mix weighted output (Chowdhury et al. 2014).Footnote 16

The type of hospital was also found to be influential in some cases. In a comparative study of the operation and performance of public and private hospitals in Australia by the Productivity Commission (2009) and its supplement (Productivity Commission, 2010), the authors, inter alia, concluded that private hospitals showed higher partial productivityFootnote 17 in admitted-patient care than public hospitals across several specifications of the classic SFA model (Aigner et al. 1977; Meeusen and van den Broeck, 1977). Importantly, they also concluded that public hospitals provided a higher proportion of non-admitted patient care (including outpatient and emergency department visits) in the sample, while private hospitals preferred to treat the least morbid patients (Productivity Commission, 2009). As for technical efficiency, the public contract hospitalsFootnote 18 performed the most efficiently in both the output-oriented and input-oriented conventional SFA models (Productivity Commission, 2010).

It is worth noting the potential caveats inherent in the analysis of the determinants of inefficiency. For example, the widely applied two-stage DEA proposed by Simar and Wilson (2007) relies on certain assumptions. Among others, the ‘separability’ assumption requires that the environmental variables influence only the inefficiency scores and not the frontier, which may not always be supported by the data (Simar and Wilson, 2011). The ‘separability’ assumption is satisfied for the unconditional (or marginal) frontier by construction, yet such a frontier may or may not be the relevant reference for some observations in the sample. If the sample size is large enough for accurate inference, one may conduct statistical tests in advance to ascertain the validity of the ‘separability’ assumption.Footnote 19

Moreover, it is important to acknowledge the observed lack of application of causal inference methods in explaining the inefficiency estimates, whether within DEA, SFA, or other approaches (with or without the “separability” assumption). For example, even though it is typically impossible to conduct randomized controlled trials, methods including the regression discontinuity design (RDD) (Angrist and Lavy, 1999; Black, 1999), difference-in-differences (DID) (Card, 1990; Card and Krueger, 1994), and synthetic controls (Abadie and Gardeazabal, 2003), among others, could be adapted into a two-stage DEA framework to further explain changes in efficiency levels.Footnote 20
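
As one illustration of how such designs might be adapted, the sketch below estimates a simple difference-in-differences specification on hypothetical panel data of second-stage efficiency scores around a hypothetical policy change; all names, magnitudes, and the clustering choice are assumptions made for this example and are not drawn from the cited studies.

# A minimal difference-in-differences sketch on hypothetical efficiency scores.
# 'treated' hospitals are exposed to a hypothetical policy change once post = 1.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_hospitals, n_periods = 50, 6
df = pd.DataFrame({
    "hospital": np.repeat(np.arange(n_hospitals), n_periods),
    "period": np.tile(np.arange(n_periods), n_hospitals),
})
df["treated"] = (df["hospital"] < 25).astype(int)
df["post"] = (df["period"] >= 3).astype(int)
# Hypothetical efficiency scores with an assumed treatment effect of 0.05.
df["efficiency"] = (0.7 + 0.05 * df["treated"] * df["post"]
                    + rng.normal(0, 0.03, len(df)))

# The coefficient on treated:post is the difference-in-differences estimate.
did = smf.ols("efficiency ~ treated + post + treated:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["hospital"]})
print(did.params["treated:post"])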

The adaptation of advanced machine learning methods also exhibits considerable potential for further research within our reviewed topic. For example, the least absolute shrinkage and selection operator (LASSO), introduced in Tibshirani (1996), can be combined with DEA (Chen et al. 2021; Lee and Cai, 2020) to enhance performance in high-dimensional scenarios. Regression or classification utilizing random forests (RF) (Breiman, 1996a, b; Breiman et al. 1984), artificial neural networks (ANNs) (Rosenblatt, 1958; Rumelhart et al. 1986), and other machine learning algorithms can also be adapted to explain the efficiency estimates of DMUs.Footnote 21
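
For instance, a LASSO-type second stage could be used to screen a large set of candidate determinants of the efficiency scores. The sketch below is a generic illustration with hypothetical data, not an implementation of the specific LASSO-DEA estimators of Lee and Cai (2020) or Chen et al. (2021).

# A minimal sketch: LASSO screening candidate determinants of efficiency scores.
# Hypothetical data; not the specific LASSO-DEA estimators cited in the text.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n, p = 120, 30  # few observations, many candidate determinants

X = rng.normal(size=(n, p))
# Suppose (for illustration) only the first two variables truly matter.
efficiency = 0.8 - 0.1 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.02, n)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(X_std, efficiency)
print("Selected determinant indices:", np.flatnonzero(lasso.coef_))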

5.3 Policy implications

The objective of this subsection is to briefly summarize the key policy implications that we found in the studies mentioned above regarding Australia and its peers. For Australia, while many questions were tackled and various policy implications were made explicitly or implicitly in different studies, the primary policy implication appears to concern the size of hospitals. For example, some studies advised considering mergers of small hospitals situated close to one another in Victoria, Australia, in order to leverage the substantial advantage of large hospitals in the level and growth rate of TFP (Cheng et al. 2020). Similar recommendations were reached for hospitals in Northern Ireland, the UK (McCallion et al. 2000), and in Ontario, Canada (Chowdhury et al. 2011), where concentrating services in large hospitals was suggested to contribute to the improvement of technical efficiency.

In addition, small hospitals are usually located in remote areas, whereas teaching hospitals are typically larger and situated in urban areas. However, when considering merging small hospitals in remote areas of Australia, it is also necessary to emphasize the crucial contribution of rural hospitals in fulfilling special clinical functions and guaranteeing equitable access to healthcare services for local residents (Bogomolov et al. 2017; Cheng et al. 2020; Nguyen and Zelenyuk, 2021b). Another concern was raised regarding the consolidation of small hospitals in Canada: although technical efficiency improved, the decline in scale efficiency indicated that larger facilities face greater difficulties in managing and coordinating resources to absorb technical advancements, due to decreasing returns to scale (Chowdhury et al. 2011). Alternatively, for Australia, a possible policy implication could be to promote remote healthcare delivery models, such as Telehealth (Nguyen and Zelenyuk, 2021b), or to provide bonded medical training programs (Nguyen and O’Donnell, 2023). Similarly, in the context of New Zealand, video conferencing is also believed to play a vital role in addressing staff shortages (Andrews, 2020a).

For the Canadian healthcare system, it was concluded that the process of discharging patients to post-acute care is a key driver of efficiency for large and medium-sized non-teaching hospitals. Accordingly, it was suggested that provinces increase the capacity of post-acute care and improve the smoothness of transition procedures (Wang et al. 2018). Meanwhile, a reassessment of the funding policy in New Zealand was recommended to remove the incentives for secondary care facilities to keep patients longer than required, which in turn increases the length of stay and the level of inefficiency (Andrews, 2020a).

In Scotland, UK, it was also found that technical change in a given period exhibited a negative impact on the change in the subsequent period, which may be due to the time required for hospitals to assimilate novel treatment technologies. Consequently, another policy implication was to moderate the escalation of the same type of input in contiguous periods to prevent the over-concentration of investments and to constrain the increase in costs (Valdmanis et al. 2017).

The findings for countries with similar healthcare systems suggest feasible pathways for further investigation of Australian hospital efficiency and potential reform strategies. These include exploring the incentive mechanisms and influence of funding policies on the efficiency of healthcare facilities, the role of post-acute care in improving hospital efficiency, and the potential decreasing returns to scale resulting from the consolidation of hospital functions.

To conclude this section, it is worth emphasizing again that the aforementioned policy implications were derived (explicitly or implicitly) from particular approaches/models that rely on certain assumptions (e.g., orientation, choice of variables and of reference, etc.), some of which might be critical for the conclusions reached. Performing extensive robustness analyses of such conclusions with respect to various assumptions is paramount in such circumstances before deriving the policy implications.Footnote 22

Table 6 Selected policy implications from reviewed studies

Overall, the reviewed articles utilized a spectrum of approaches, predominantly DEA, SFA, their variants, and various productivity indices, to gauge the levels of efficiency of local hospitals and their changes over time. Further analysis, e.g., regression (OLS, bootstrap truncated regression, etc.) and decomposition of productivity indices, among others, was deployed in many studies to investigate the possible determinants of the estimated efficiency. A myriad of hospital features and exogenous factors were taken into account, and some determinants showed similar effects across different countries. For example, hospitals that are small in size, located in remote areas, or without teaching functions often obtained a lower level of technical efficiency. Ultimately, various policy recommendations were proposed based on the efficiency levels, the identified determinants, as well as the regional sociocultural situations. Interestingly, divergent implications stemming from different considerations were sometimes suggested even with the same or similar findings. Last but not least, besides technical efficiency and cost efficiency, access to and quality of healthcare are also very important to take into account when making policy decisions, yet they sometimes appear to be overlooked in the literature.Footnote 23

6 Reflections on the interpretations of efficiency results

Before concluding the paper, a closer look at how to compare and interpret the different efficiency scores, and related caveats, is in order. First of all, it is worth recalling that the DEA estimate of efficiency under the VRS assumption can be (and often is) higher than that estimated under the CRS assumption. As a hypothetical example, consider five hospitals {A, B, C, D, E} in a one-input-one-output scenario, and suppose the frontiers estimated by DEA from a large sample with CRS and VRS are depicted in Fig. 12 with the solid and dashed lines, respectively. Under the VRS assumption, hospital C lies on the estimated frontier, and hence its estimated efficiency level would be higher than that of hospital B. On the other hand, while hospital C is technically efficient under VRS, it is much less efficient than hospital B if the CRS assumption is employed.
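
To make this comparison concrete, the sketch below computes output-oriented DEA efficiency under CRS and VRS for a small hypothetical one-input-one-output dataset by solving the standard linear programs; the numbers are illustrative only and are not the coordinates underlying Fig. 12.

# A minimal output-oriented DEA sketch (CRS and VRS) for hypothetical
# one-input-one-output data; applied studies use richer data and dedicated
# packages with bias correction and statistical inference.
import numpy as np
from scipy.optimize import linprog

x = np.array([2.0, 4.0, 6.0, 8.0, 5.0])  # hypothetical inputs of five hospitals
y = np.array([1.5, 4.0, 5.0, 5.5, 2.0])  # hypothetical outputs

def dea_output_efficiency(x, y, vrs=True):
    n = len(x)
    scores = []
    for o in range(n):
        # Decision variables: [eta, lambda_1, ..., lambda_n]; maximize eta.
        c = np.concatenate(([-1.0], np.zeros(n)))
        A_ub = [np.concatenate(([0.0], x)),    # sum_j lambda_j * x_j <= x_o
                np.concatenate(([y[o]], -y))]  # eta * y_o <= sum_j lambda_j * y_j
        b_ub = [x[o], 0.0]
        A_eq = [np.concatenate(([0.0], np.ones(n)))] if vrs else None
        b_eq = [1.0] if vrs else None
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + 1), method="highs")
        scores.append(1.0 / res.x[0])          # Farrell output efficiency in (0, 1]
    return np.array(scores)

print("CRS:", dea_output_efficiency(x, y, vrs=False).round(3))
print("VRS:", dea_output_efficiency(x, y, vrs=True).round(3))

With these illustrative numbers, only one hospital is CRS-efficient while several are VRS-efficient, mirroring the general point that VRS scores are at least as high as CRS scores.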

Fig. 12 Illustration of assumptions of returns to scale

Furthermore, consider hospital D: it is efficient under VRS for the output-oriented approach, yet very inefficient under VRS for the input-oriented approach. Similarly, hospital E is efficient under VRS with the input-oriented approach, yet very inefficient under VRS for the output-oriented approach. It is worth noting that such ambiguity does not happen under CRS: theoretically, input and output orientations are equivalent with respect to any CRS technology.
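
For instance, in the one-input-one-output case, the equivalence under CRS can be seen directly: denoting a hospital's input and output by $x_o$ and $y_o$ (generic notation introduced here for illustration), both the input- and output-oriented Farrell measures against a CRS frontier reduce to comparing its average productivity with the best observed ratio,
$$
\widehat{TE}_o^{\,\mathrm{input}} \;=\; \widehat{TE}_o^{\,\mathrm{output}} \;=\; \frac{y_o/x_o}{\max_{j}\,(y_j/x_j)},
$$
so the two orientations can disagree (as for hospitals D and E) only when the reference technology is not CRS.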

An important question, therefore, is which reference (CRS, VRS, etc.) to choose, and which orientation is more appropriate to take. By and large, it depends on the research questions.Footnote 24 For example, if the goal is to benchmark efficiency relative to the individual frontier of a DMU, then it would be important to conduct appropriate statistical inference (tests) to identify the most likely true individual technology (whether it is CRS, VRS, or perhaps non-convex). A caveat here, however, is that such tests may require overly strong underlying assumptions to be valid. One such assumption is usually that the technology is identical for all the DMUs in the relevant sample, which can be far from the truth and, in fact, can be one reason for some DMUs to look less efficient than others. This reasoning sometimes motivates an alternative approach that allows the frontier to be conditional on some variables. While appealing, a caveat of this approach is that, potentially, one may find yet another conditional variable (and come up with a nice economic story for it), due to which a DMU of interest may get a very different (e.g., perfect) efficiency score than otherwise. Such a hunt for “a significant” conditional variable (which may depend on the researcher’s imagination and persistence) may lead to many or even all DMUs looking perfectly efficient, being near or on their individual frontiers. In other words, allowing individual frontiers to depend on conditional variables (whether in DEA, FDH, SFA, or other methods), while it sounds appealing, might be likened to opening a ‘Pandora’s box’ that may defeat the meaningfulness of an efficiency analysis. Moreover, the estimated efficiency scores in such an approach may not be comparable if they are measured with respect to different conditional frontiers.Footnote 25

Another related caveat in this framework is that, often, the currently used approaches that allow for conditional variables in frontier analysis are not based on causal inference. For example, they may not be accounting for possible endogeneity, or reverse causality, or might be identifying spurious relationships between the conditional variables and the dependent variable, which is also important to keep in mind when deriving policy implications.Footnote 26

An alternative way to look at the problem is to admit that each hospital may indeed have its own technology, which may be unique relative to others and potentially very different from CRS, VRS, or FDH-type technologies. The goal then may not be to estimate those individual technologies (which are hard to identify without overly strong assumptions), but rather to estimate the unconditional best-practice aggregate technology for the observed DMUs, and then measure the individual efficiency relative to it. What would such an aggregate technology look like? Interestingly, even if the individual technologies are non-CRS, non-convex, and very different across individuals, the aggregate technology (defined as the summation of the individual technology sets) may still be approximately convex (due to the Shapley-Folkman lemma) and, under fairly mild conditions, also approximately CRS.Footnote 27 Importantly, the use of such a reference is coherent with benchmarking relative to a socially optimal level, i.e., the level of optimal scale or the highest average productivity (e.g., hospital A in Fig. 12).

To illustrate this point vividly, suppose a rigorous statistical test rejects DEA-CRS in favor of DEA-VRS. A researcher then proceeds with DEA-VRS as the preferred approach and reports that hospital C is 100% efficient. A policymaker may rely on such results, think that other hospitals should be like the “100%-efficient” hospital C, and approve policies that reward and encourage hospitals like C while discouraging (and perhaps penalizing) others, including hospital B. However, a closer look at the matter reveals that the amount of inputs that hospital C is actually using would provide about twice as much output if two hospitals like B (even though inefficient relative to both the CRS and VRS frontiers) were deployed instead of a “100%-efficient” hospital like C. How important is such a difference in the interpretation of efficiency results? Ultimately, producing twice as much output may lead to a substantial reduction in the pain and suffering of many more patients, or it may simply mean saving many more lives of patients in urgent need, as they would receive health services due to the greater output from the same inputs.

Now, suppose a researcher is able to obtain two more data points, hospitals F and G in Fig. 12, which were not available before. If the researcher re-estimates DEA with the updated data, the DEA-VRS results change substantially (e.g., the output-oriented technical efficiency of hospitals C and D decreases from 100% to 92% and 80%, respectively), while the DEA-CRS results do not. Furthermore, suppose another data point, hospital H in Fig. 12, becomes available. In this case, the results change also for DEA-CRS, yet much less than for DEA-VRS. This illustrates the relatively higher robustness (lower sensitivity) of DEA-CRS relative to DEA-VRS, which is often the case, although the opposite can happen too, emphasizing the importance of robustness checks and sensitivity analysis in practice.

Finally, it is also worth remembering that while the DEA and FDH estimators (under certain assumptions) are consistent, they are biased. The bias converges to zero asymptotically, yet the speed of convergence decreases with the dimension of the DEA model (and depends on the assumptions on returns to scale and convexity), which is the price one pays for going nonparametric. It is therefore desirable (yet so far appears to be still rare in the analyzed literature) to correct the bias, for both individual and aggregate efficiency scores, as well as to present standard errors or estimated confidence intervals.Footnote 28 The parametric SFA approaches avoid some of these problems, yet at the expense of potentially large estimation errors due to misspecification of the imposed parametric assumptions.
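
To convey the mechanics of such a correction, the sketch below applies a naive bootstrap to bias-correct DEA-CRS scores in the simple one-input-one-output case, where the CRS score reduces to a ratio. The data are hypothetical, and the naive bootstrap is known to be inconsistent for DEA frontiers; consistent inference would rely on the smoothed or subsampling bootstraps developed in the literature (e.g., by Simar and Wilson), so this is purely an illustration of the bias-correction logic.

# A naive-bootstrap sketch of bias correction for DEA-CRS scores in the
# one-input-one-output case, where the CRS score is simply (y/x)/max(y/x).
# Hypothetical data; the naive bootstrap is inconsistent for DEA and is used
# here only to illustrate the mechanics of bias correction.
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(2.0, 10.0, n)              # hypothetical inputs
y = x * rng.uniform(0.4, 1.0, n)           # hypothetical outputs

def crs_scores(x, y, x_ref, y_ref):
    return (y / x) / np.max(y_ref / x_ref)

theta_hat = crs_scores(x, y, x, y)         # estimated DEA-CRS scores

B = 1000
boot = np.empty((B, n))
for b in range(B):
    idx = rng.integers(0, n, n)            # resample hospitals with replacement
    boot[b] = crs_scores(x, y, x[idx], y[idx])  # re-score against bootstrap frontier

bias = boot.mean(axis=0) - theta_hat       # estimated upward bias of the scores
theta_bc = theta_hat - bias                # bias-corrected scores
print(np.round(theta_hat[:5], 3))
print(np.round(theta_bc[:5], 3))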

All in all, this hypothetical example illustrates that the choice of a reference relative to which one benchmarks efficiency should not be just about running a statistical test of whether VRS, CRS, FDH, or any other frontier fits the data more tightly. Rather, it should be a combination of positive and normative judgments incorporated into the various modeling assumptions (orientation, variables, types of measures, etc.), some of which are chosen to be tested or checked for sensitivity/robustness of results, while others are chosen to remain as maintained assumptions. What is paramount is to make these judgment choices coherent with the policy goals (e.g., striving for the social optimum or for individual profits?) and transparent to the readers.Footnote 29 Of course, we acknowledge and respect that there could be different views on the mentioned issues, and we welcome an open discussion of them.

7 Concluding remarks

In this article, we systematically reviewed the published research about hospital-level efficiency with a focus on Australia and its peers, the UK, Canada, New Zealand, and Hong Kong from 1970 to 2023. In particular, we conducted a series of Boolean searches in Scopus to select a pool of published journal articles and then manually reviewed the selected articles to construct a pool for each country and region. Prior to a detailed review of the selected research, bibliometric analytic techniques were deployed to explore the field.

For each country, we identified distinct groups of productive authors, as depicted in the Sankey plots. As illustrated by the word clouds, noticeable divergence was revealed in the most prominent keywords for each country and region, for instance, “cost benefit analysis” and “length of stay” in Australian research, “DEA” in Canadian studies, and “public health” in New Zealand studies. According to the analysis of local and global citations, the European Journal of Operational Research and Health Economics are the sources most cited by researchers in all the regions of interest. Regarding the most cited authors, researchers in Australia, New Zealand, and Canada show comparable preferences for the global top researchers (mostly in the US) and the locally prominent authors of efficiency analysis in the healthcare sector. In contrast, UK researchers tend to focus more on domestic studies rather than global research.

We also employed both co-occurrence network analysis and dynamic word cloud analysis and identified a shift in popular methods and research topics. We found that the most frequently applied methods in the reviewed regions changed from “cost benefit analysis” in the early years to “bootstrap” and “MPI” in the middle period, and ultimately to “efficiency frontier estimation”, “factor analysis”, and “total factor productivity” in more recent years. Moreover, we found that DEA has been applied over an extensive period, as reflected by its inclusion as a keyword in some highly cited journal articles.

We also pointed out some caveats in the interpretation of results and policy implications from efficiency studies, which may depend on the assumptions and methods chosen by the researchers. Hence, we emphasized the importance of carefully choosing the assumptions of the models, which must be coherent with the goals of the measurement and transparently explained to the readers, and then carefully interpreting the obtained results and the corresponding policy implications. We also pointed out a relative lack of adaptation and development of causal inference methods and modern machine learning techniques in the context of efficiency and productivity analysis; in our view, such endeavors promise to pave a fruitful avenue for future research. The promotion of Bayesian methodologies within the realm of hospital efficiency analysis is also advocated, with a particular emphasis on addressing inherent challenges, such as endogeneity and stringent assumptions, commonly faced by the prevalent DEA and SFA estimators. For example, a recent Bayesian-type solution advanced by Tsionas et al. (2023) consolidates the benefits of DEA and SFA, providing considerable flexibility in both functional forms and distributional assumptions. Based on Bayesian artificial neural networks, the approach can also effectively tackle issues such as endogeneity and determinants of inefficiency. Table A1 provides a summary and conclusions of the reviewed studies.

One limitation of our review is that the quantity of the target research in the paper pool is still relatively small for more advanced machine learning analysis and visualization, although it is relatively large for manual analysis. In contrast to the large and increasing number of publications regarding the efficiency analysis of medical techniques and therapies, the efficiency analysis of hospitals is relatively sparse. Another constraint is that the gray literature, though potentially offering some fruitful conclusions (albeit possibly of lower quality than articles published in refereed journals), is not included in the deeper analysis due to the lack of bibliometric information. Moreover, a possible improvement in future studies could be to evaluate the citations from a deeper perspective when conducting the co-occurrence analysis of keywords or investigating the reasons for high citation rates. All in all, we hope this study will provide a valuable stepping stone for studies and practices aiming to improve the performance of hospitals in Australia and its peers, as well as in other countries.