The SPIRIT 2013 (Standard Protocol Items: Recommendations for Interventional Trials) statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there has been growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The SPIRIT-AI extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports: CONSORT-AI. Both guidelines were developed using a staged consensus process, involving a literature review and expert consultation to generate 26 candidate items, which were consulted on by an international multi-stakeholder group in a 2-stage Delphi survey (103 stakeholders), agreed on in a 2-day consensus meeting (31 stakeholders), and refined through…
The CONSORT 2010 (Consolidated Standards of Reporting Trials) statement provides minimum guidelines for reporting randomised trials. Its widespread use has been instrumental in ensuring transparency when evaluating new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate impact on health outcomes. The CONSORT-AI extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols: SPIRIT-AI. Both guidelines were developed through a staged consensus process, involving a literature review and expert consultation to generate 29 candidate items, which were assessed by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed on in a two-day consensus meeting (31 stakeholders), and refined through…
Clearly presented and transparently reported statistical code is a sine qua non for reproducible research. More than a decade ago, Annals began asking authors to report the availability of code that supported their statistical methods (1). Before that policy adoption, our statistical editors routinely requested and reviewed code underlying analyses of papers that we eventually published. Although we never formally graded submitted code, our experiences are similar to those reported by Assel and Vickers (2). We have found that authors increasingly apply complex statistical methods to account for such factors as correlated and repeated measures, missing data, incomplete adherence, and confounders on the causal pathway from exposure to outcomes. Reporting only that SAS version 9.2 (SAS Institute) was used for all analyses offers neither a hint of the actual methods nor the model specification or structure. Naming a procedure or library within a statistical package usually explains little more. Authors have responded to our requests with computer code that frustrates interpretation except by the author-programmer, as if clear statistical code were not needed to understand or reproduce the science. We beg to differ.

The goals of reproducible research in the biostatistics literature (3) apply equally to supporting statistical code in medical journals (4, 6). Although journals and the International Committee of Medical Journal Editors (5) provide guidance for reporting statistical methods, journals rarely address statistical code. Except for simple and obvious statistical methods, we believe that authors should describe the script: that is, the combination of comments and code that outlines the analysis; any programs and functions that the script calls; and any postestimation commands that produce and output the estimates of interest.

Statistical code translates raw data into interpretable information (6). Code must match the scientific question, reflect the estimators that answer the question (7), and disclose fundamental assumptions and choices. Reproducible research requires a level of transparency that would permit a newcomer to the analysis to read the description, code, and output and be able to use the same or similar data to arrive at the same or similar results. Statistical code is not merely a grocery list of commands and algorithms. The code, as annotated, must reveal the exposures of interest, covariates, effect modifiers, and functional form of the equation behind the model. For example, if hierarchical models include random effects, readers need annotation that defines random intercepts and slopes and their covariances or correlations, as well as the covariance pattern of error terms. If authors use splines, they should reveal the precise implementation. Communicating these features will require many lines of code and often complex function calls, together with adequate comments on each step's purpose.

Recommendations on interpretable statistical code are not new, and the Table offers some simple principles. Forty years ago, in their treatise The Elements of Programming Style, Kernighan and Plauger noted, "Write clearly; don't sacrifice clarity for efficiency" (8).
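To make the kind of annotation we have in mind concrete, here is a minimal, hypothetical sketch in R. The data set (bp_long), variable names, and model are invented for illustration and do not come from any study discussed here.

```r
## Hypothetical annotated analysis script (R); all names are illustrative only.
library(lme4)     # hierarchical (mixed-effects) models
library(splines)  # natural cubic splines

## Outcome: sbp, systolic blood pressure (mm Hg), measured repeatedly per subject.
## Exposure of interest: treat (0 = control, 1 = intervention).
## Covariates: age in years (modeled flexibly), sex.
## Random effects: intercept and time slope for each subject (id),
## allowed to correlate; residual errors assumed independent within subject.
fit <- lmer(
  sbp ~ treat * time       # treatment effect and its change over follow-up
      + ns(age, df = 3)    # natural spline for age, 3 df, default knot placement
      + sex
      + (1 + time | id),   # correlated random intercept and slope by subject
  data = bp_long,
  REML = TRUE
)

## Post-estimation: fixed effects with profile-likelihood confidence intervals.
summary(fit)
confint(fit, parm = "beta_", method = "profile")
```

Even a newcomer who has never seen the data can tell from the comments which variable is the exposure, how age enters the model, and what the random-effects structure and its covariance assumptions are.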
In the 1980s, computer pioneer Knuth proposed a paradigm called literate programming to focus not on providing instructions to the computer but on explaining to readers what the authors are asking the computer to do (9). Common fourth-generation languages, including SAS, Stata (StataCorp), SPSS (IBM), and R (R Project for Statistical Computing), offer full-screen, color-coded text editors with automatic syntax and error checking and a user interface suitable for producing statistical programs appropriately annotated with transparent prose. Clarity wins readers; tweet-like brevity loses them. Subroutines and macros may be elegant but can degrade ease of interpretation. Pseudocode, an English-like description of the formal program syntax to follow, can document both purpose and implementation, and focused statistical references in comments can point to methodological sources.

Table. Guides for Preparing and Reporting Statistical Code

Programs must leave audit trails. Some newer statistical software packages rely increasingly on point-and-click menus rather than code. Those who rely on menus should save a log of the code actually executed by those menus. Otherwise, no record of methods will exist, and neither the authors nor the readers will be able to reconstruct the analysis.

If transparent coding were entirely effortless, we would not be writing this editorial. Authors too often write code for the short term: the analysis and paper before them in the present moment. Programming tasks usually fall on junior team members unused to writing for outside comprehension, whereas senior investigators, vouching for the…
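What such an audit trail looks like depends on the package; as one hedged illustration, a few lines in R can capture both the session environment and the commands actually run. The file names are ours, and savehistory requires an interactive console session.

```r
## Hypothetical sketch: leave an audit trail for an interactive analysis session.
sink("analysis_log.txt", split = TRUE)  # mirror console output to a file as well
sessionInfo()                           # record R version and package versions

## ... analysis commands go here, each annotated with its purpose ...

sink()                                  # stop mirroring output
savehistory("analysis_history.R")       # save the commands actually executed
```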
In traditional meta-analyses, researchers combine data from individual studies into a summary measure to describe the benefits or harms of an intervention. The pooled estimate is calculated by first estimating the treatment effect and 95% CI for each individual study. Each study's estimated treatment effect is then weighted, usually by its precision, to reflect the amount of information the study contains relative to the others. The pooled overall treatment effect is the weighted average of the individual treatment effect estimates (expressed symbolically in the sketch that follows this passage). This installment of the Understanding Clinical Research series addresses issues (Table) that readers should consider when evaluating the meaning of a summary estimate and understanding to whom and under what circumstances it applies.

Table. Key Questions for Evaluating the Meaning of a Summary Estimate in a Meta-analysis

The Example

A meta-analysis compared the benefits and harms of transcatheter aortic valve implantation (TAVI) with those of surgical aortic valve replacement (SAVR) among patients with severe aortic stenosis (1). The authors searched several databases and identified 5 randomized controlled trials and 31 observational studies that matched TAVI patients with SAVR patients. They examined all-cause mortality as the main outcome of interest and considered mortality data in several ways: by timing of the mortality assessment (early, midterm, and long-term), patient surgical risk (low, intermediate, and high), and study design (randomized trials and observational studies). Overall, pooled estimates suggested no major differences in early or midterm mortality but a possible increase in long-term mortality with TAVI versus SAVR. Pooled estimates derived from studies involving low- to intermediate-risk patients suggested a possible decrease in early mortality with TAVI versus SAVR. Pooled estimates derived from the randomized trials were inconclusive, with wide confidence bounds, whereas analyses of observational matched studies (for the definition of this and other terms used in the article, see the Glossary) were similar to the overall results.

Glossary

Did the Groups of Studies Pooled Have Similar Patient Populations, Intervention and Comparison Strategies, Follow-up Durations, and Outcome Assessments?

Understanding to whom and under what circumstances the pooled effect estimate applies is the first step in unraveling the meaning of a pooled treatment effect. It requires attention to several clinical and methodological issues. Critical issues are whether the authors' analyses are driven by specific clinical questions relevant to the use of an intervention and whether the grouped and pooled data are appropriate sources of evidence to address those questions. A first step in examining these clinical issues is to consider the following aspects of the groups of studies: patient populations, intervention and comparison strategies, outcome assessments, and follow-up procedures. In this example, follow-up among individual studies varied by timing of assessments, with the authors specifying early (30-day) and midterm (1-year) all-cause mortality as the 2 main outcomes of interest. The study populations varied by surgical risk. Most studies included patients at high surgical risk, for whom TAVI was already the recommended alternative to SAVR.
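In symbols (our notation, a standard fixed-effect inverse-variance formulation, not reproduced from the article): given per-study estimates and their standard errors, the pooled estimate and its standard error are

```latex
\hat{\theta}_{\text{pooled}}
  = \frac{\sum_{i=1}^{k} w_i \, \hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad
w_i = \frac{1}{\widehat{\mathrm{SE}}(\hat{\theta}_i)^{2}},
\qquad
\widehat{\mathrm{SE}}\bigl(\hat{\theta}_{\text{pooled}}\bigr)
  = \Bigl(\sum_{i=1}^{k} w_i\Bigr)^{-1/2}.
```

Random-effects methods, such as the Knapp-Hartung approach used in this example, modify the weights to include an estimate of between-study variance, w_i = 1/(SE_i^2 + tau^2), so that more heterogeneous bodies of evidence yield wider confidence intervals.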
Results from these studies drove the overall pooled effect size estimates for early (odds ratio [OR], 1.01 [95% CI, 0.81 to 1.26]) and midterm (OR, 0.96 [CI, 0.81 to 1.14]) mortality. Of note, the pooled estimate from studies enrolling only patients at low to intermediate surgical risk was different for the early mortality outcome (OR, 0.67 [CI, 0.42 to 1.07]) (Figure).

Figure. Forest plots for all-cause mortality in the low- to intermediate-risk population. Knapp-Hartung random-effects OR and 95% CI for 30-day (top), midterm (middle), and long-term (bottom) all-cause mortality in patients at low to intermediate risk. NOTION = Nordic Aortic Valve Intervention; OR = odds ratio; PARTNER = Placement of Aortic Transcatheter Valves; SAVR = surgical aortic valve replacement; TAVI = transcatheter aortic valve implantation. * Percentages do not sum to 100% for early all-cause mortality because of rounding. (Reprinted from Gargiulo and colleagues [1] with permission.)

The intervention and comparison strategies in all studies were TAVI and SAVR, respectively. Given the 2 TAVI approaches (transfemoral vs. transapical), the researchers also examined a potential interaction between the TAVI approach and early all-cause mortality, demonstrating that transfemoral (but not transapical) TAVI was more beneficial than SAVR. Variability in outcome assessment was not an issue in this review because the primary end point was all-cause mortality.

Did the Design and Validity of Individual Studies Vary?

Readers should note whether and how researchers considered the design and validity of individual studies. In this meta-analysis, researchers reviewed only randomized or observational…
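To show what Knapp-Hartung random-effects pooling involves computationally, here is a hedged sketch in R using the metafor package. The log odds ratios and standard errors below are invented for illustration; they are not the data from Gargiulo and colleagues.

```r
## Hypothetical sketch: Knapp-Hartung random-effects meta-analysis in R (metafor).
library(metafor)

yi  <- c(-0.40, -0.22, -0.61)  # invented per-study log odds ratios (TAVI vs. SAVR)
sei <- c( 0.25,  0.33,  0.40)  # invented per-study standard errors

## REML estimate of between-study variance; Knapp-Hartung CI adjustment.
fit <- rma(yi = yi, sei = sei, method = "REML", test = "knha")
summary(fit)

## Back-transform the pooled log OR and its CI to the odds ratio scale.
exp(c(OR = fit$b[1], ci.lb = fit$ci.lb, ci.ub = fit$ci.ub))

## A forest plot in the spirit of the article's Figure.
forest(fit, atransf = exp)
```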
The 2009 U.S. Preventive Services Task Force (USPSTF) recommendations on breast cancer screening ignited a firestorm (1). Seven years later, the draft updated recommendations, which were available for public comment from 20 April to 18 May 2015, rekindled the fire (2). Sparks included full-page advertisements, likely costing up to a half-million dollars and appearing in such venues as The New York Times, USA Today, and The Washington Post, that asked, "Which of our mothers, wives, daughters, and sisters would it be OK to lose?" The named sponsors of the ad (Bright Pink, the Black Women's Health Imperative, the National Medical Association, the National Hispanic Medical Association, Men Against Breast Cancer, and the Prevent Cancer Foundation) urged readers to sign a petition to "stop the guidelines" (bit.ly/StopTheGuidelines).

Flames, fueled by controversy about the grade C screening recommendation for women aged 40 to 49 years, spread to the halls of Congress. A convoluted law passed in the wake of the 2009 recommendations required private insurers to cover procedures for which the USPSTF issued grade A or B recommendations, except in the case of the 2009 recommendations for mammography. The exception, which was meant to ensure coverage of screening mammography for women aged 40 to 49 years (a grade C recommendation in 2009), was set to expire when the USPSTF issued new recommendations. The planned expiration was averted, however, when Congress passed an omnibus bill in December 2015 that included a rider that effectively extended the exception for screening mammography indefinitely (3).

In this issue, Annals publishes the updated recommendations of the USPSTF (4). The USPSTF did a difficult job well, considering updated evidence reviews, a fuller panoply of potential harms, and tradeoffs of different screening strategies (5-9). The science led the USPSTF to conclude that the following recommendation (originally issued in 2009) still stands: each average-risk woman between the ages of 40 and 49 years should make her own decision about whether to have a mammogram, based on her personal balancing of the benefits and harms of screening (a grade C recommendation). Although for many years the dogma was that women should have mammograms "once a year for a lifetime" starting at age 40 years, current evidence shows that the balance of risks and benefits of screening, particularly among women in their 40s, warrants more nuanced decision making. Potential harms of overdiagnosis and overtreatment of lesions with little progressive potential, and harms of false-positive screening results with unnecessary biopsies and multiple repeated examinations, must be considered (10, 11). The potential benefits of preventing breast cancer deaths are real, but the likelihood of those benefits is small, and no definitive evidence shows that screening reduces total mortality.

The USPSTF grades its recommendations to reflect the evidence about the benefits and harms of the health care intervention. When the net benefits of a health care intervention for a specific patient group are clear, the USPSTF strongly recommends it for patients in that group (grade A or B recommendation). When the balance of risks and benefits is less clear or the net benefit is small, the USPSTF issues a grade C recommendation, as it did for average-risk women in their 40s.
As women who have had personal experiences with breast cancer and false-positive screening results and who devote much professional energy to evaluating medical evidence, we are concerned about efforts that conflate scientific evidence with policy decisions related to payment for health care. These efforts also create unwarranted suspicion of the USPSTF's work and divert attention and resources from gathering evidence to fill important gaps in knowledge about effective breast cancer prevention and screening. Most important, we may lose the attention and trust of the public with regard to the content of evidence-based recommendations unless scientists pay more attention to addressing existing gaps.

The evidence gaps are wide and concern issues surrounding breast cancer screening about which we and many women worry. For example, we need to identify better screening methods for all women, particularly those with dense breasts. We need to act on concerns about the prevalence of a more deadly form of breast cancer in African American women and promote research that evaluates the effect of screening in ethnic minority women. And we need to identify optimal strategies for the management of possible precursor lesions, such as ductal carcinoma in situ, while also better defining and quantifying the harms associated with overdiagnosis, an issue with direct application to screening mammography.

The firestorm that reignited in spring 2015 was not without reason. The public has had legitimate worry about whether copayment for mammograms would be required if a new recommendation were to supersede the special…