Grading Evidence-Based Guidelines—What Are the Issues?

Commentary on Sawaya GF, Guirguis-Blake J, LeFevre M, et al: Update on the Methods of the U.S. Preventive Services Task Force: Estimating Certainty and Magnitude of Net Benefit. Ann Intern Med 147:871-875, 2007.

Clinical practice guidelines aim to inform decision making in clinical practice. Guidelines build on summaries of the best available evidence and integrate judgments and values of an expert workgroup in the interpretation. When evidence is conclusive, for example when high-quality research studies show a practice to unequivocally improve patients' health and longevity, the values and preferences of the expert workgroup will play a lesser role in the interpretation of the evidence. In the setting of inconclusive evidence, however, the composition of the workgroup, the interests and biases of its members, and their attitudes and philosophies towards uncertainty can all affect the formulation of the recommendations.

Of particular relevance to nephrologists are guidelines developed by KDIGO (Kidney Disease Improving Global Outcomes), an international entity devoted to the coordination of guideline development in kidney diseases.1 It has representation from the guideline-development initiatives of the major English-language nephrology organizations.2 Its goal is to develop guidelines for a global setting that will undergo local adaptation by incorporating the regional context. A major challenge for developing rigorous and useful guidelines for important areas of nephrology is that the evidence base for treatments of kidney diseases is often of low quality. There is wide consensus that, in order to enhance transparency and accountability in guideline development and to facilitate adaptation of guidelines, both the quality of supporting evidence and the strength of a recommendation should be graded.
The grade of a recommendation communicates the confidence of the guideline development group that following the recommendation will do more good than harm, thereby expressing the degree to which users should follow the recommendation.3 The approach to grading used in KDIGO guidelines4 is derived from that developed by Grades of Recommendation, Assessment, Development, and Evaluation (GRADE),3 though it includes important differences. The GRADE working group consists of international epidemiologists, guideline developers, and methods experts, who over the course of several years have developed and continue to refine a widely adopted approach to grading guidelines.

In a recent article in the Annals of Internal Medicine, the US Preventive Services Task Force (USPSTF) updated its approach for grading.5 The USPSTF aims to provide evidence-based recommendations on a wide range of preventive services for the primary care general population in the United States. In this editorial, we compare the approaches taken by USPSTF, GRADE,3 and KDIGO4; discuss issues in grading guideline recommendations; and provide guidance on what guideline developers, users, and researchers should do. We hope to enhance the understanding of grading for the user of nephrology guidelines.

What Is the USPSTF Proposing?

Table 1 summarizes the sequence of grading steps followed by USPSTF. For each overarching question related to a preventive service, relevant studies related to screening, treatment benefits, and harms must be reviewed. The quality of each body of evidence is assessed according to criteria detailed in Table 2. The magnitude of the net overall benefit is then estimated across all relevant bodies of evidence, and the certainty surrounding this estimate is determined by again applying the criteria in Table 2. Based on the estimated magnitude of the net overall benefit and the certainty surrounding this estimate, the USPSTF assigns a letter grade from “A” to “D” to the preventive service.
The letter signifies the strength of the recommendation about whether the service should be provided. An example is the recent recommendation by the Task Force on carotid artery stenosis screening.6 The Task Force rated as “moderate” the certainty about the net benefits from screening followed by treatment. It judged the magnitude of net benefit as “zero or negative,” and issued a “D” grade recommendation that the screening should not be provided in asymptomatic people.6

Table 1

| Steps for Grading or Assessment | USPSTF | GRADE | KDIGO |
|---|---|---|---|
| Quality of individual study | | Not specified | |
| Quality of evidence for each key question in analytic framework (USPSTF) or for each outcome (GRADE and KDIGO)⁎ | Convincing; Adequate; Inadequate | | |
| Overall certainty of net benefit (USPSTF) or overall quality of evidence across all important outcomes (GRADE and KDIGO)⁎ | | | |
| Magnitude of net benefit | Substantial; Moderate; Small; Zero/Negative | Net benefit; Important trade-offs; Uncertain trade-offs; No net benefit | Net benefit; Important trade-offs; Uncertain trade-offs; No net benefit |
| Strength and meaning of recommendation | A = Should be provided for eligible patients; B = Should be provided for eligible patients; C = Should not be offered routinely; D = Should not be provided; I = Insufficient evidence | | B = “Should be considered”† |

⁎ Criteria used for assessment are listed in Table 2.
† These strengths can also be used for negative recommendations, but the wording for these has not yet been standardized in KDIGO.
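The final USPSTF step described above, in which a certainty rating and a magnitude rating are combined into a letter grade, can be sketched as a simple lookup. This is a minimal illustration only. It assumes the certainty-by-magnitude grid as we understand it from the USPSTF's published methods; that grid is not reproduced in this editorial, and the names `GRADE_GRID` and `assign_grade` are hypothetical. The lookup covers only the last step: the certainty and magnitude ratings that feed it remain workgroup judgments.

```python
# Assumed certainty-by-magnitude grid (not taken from this editorial).
# Keys: certainty of net benefit -> magnitude of net benefit -> letter grade.
GRADE_GRID = {
    "high": {
        "substantial": "A",
        "moderate": "B",
        "small": "C",
        "zero/negative": "D",
    },
    "moderate": {
        "substantial": "B",
        "moderate": "B",
        "small": "C",
        "zero/negative": "D",
    },
}


def assign_grade(certainty: str, magnitude: str) -> str:
    """Return a letter grade for a (certainty, magnitude) pair.

    Low certainty is treated as insufficient evidence ("I"),
    regardless of the estimated magnitude.
    """
    certainty = certainty.strip().lower()
    magnitude = magnitude.strip().lower()
    if certainty == "low":
        return "I"
    try:
        return GRADE_GRID[certainty][magnitude]
    except KeyError:
        raise ValueError(
            f"unknown rating: certainty={certainty!r}, magnitude={magnitude!r}"
        ) from None


# The carotid artery stenosis example from the text:
# moderate certainty, zero or negative net benefit.
print(assign_grade("moderate", "zero/negative"))  # D
```

Under this assumed grid, the carotid artery stenosis example yields the same “D” grade that the Task Force issued.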
Table 2

| | USPSTF | GRADE and KDIGO |
|---|---|---|
| Matching question of interest with research design | Appropriateness of the study design of the reviewed studies. | Appropriateness of the study design of the reviewed studies. |
| Internal validity | Methodological quality. | Methodological quality. |
| External validity | Generalizability to the general primary care population and situation in the United States. | Directness of the evidence, ie, similarity of populations, interventions, outcome measures, to those of interest. |
| Precision of evidence | Quantity and size of studies. | Precision and lack of sparseness of results. |
| Consistency | Consistency of results across studies. | Consistency of results across studies. |
| Additional factors | Dose-response effect, fit within a biological model. | Dose-response gradient; plausible residual confounders would have reduced observed effect; strength of association (for observational studies); lack of reporting bias. |
How Does the USPSTF Approach Compare to GRADE and KDIGO?

As shown in Tables 1 and 2, USPSTF, GRADE, and KDIGO use broadly similar but distinct approaches toward grading the quality of evidence and the strength of recommendations. The sequential steps used by all 3 groups highlight that guideline development follows a complex process. As detailed in the USPSTF approach, the strength of a recommendation relates to the size of the overall net effect as well as the certainty surrounding the estimate, which is based on the quality of the overall supporting evidence. However, no system has a formulaic approach to sum up all assessments. Furthermore, values and preferences are applied by the guideline development committee in each step of appraisal and generate a degree of arbitrariness. Implicit or explicit consideration of costs adds an additional layer of complexity.3 The focus for the USPSTF recommendations is the United States, while the context for KDIGO recommendations is global, and therefore values and judgments will inherently vary more widely.

These issues become even more complex when interpreting inconclusive evidence. Consider the question of whether dialysis patients with hepatitis C infection should be treated with interferon.7 There is moderate-quality evidence that interferon treatment results in a sustained viral response (SVR) in about 40% of the patients.8 There is, however, uncertainty whether achieving SVR reduces mortality or improves quality of life in patients on dialysis or after kidney transplantation. Furthermore, the treatment has frequent and sometimes severe adverse events. The treatment duration prolongs the waiting time and may close the window of opportunity for transplantation. Finally, the treatment is costly.
In the above scenario, if a guideline workgroup values SVR highly because it believes that SVR is a good surrogate for better survival and quality of life and deemphasizes adverse events because they are usually self-limited or manageable, it may issue a recommendation for the treatment. If, however, the workgroup judges the benefits of SVR to be uncertain and is more concerned about proven harms associated with interferon therapy and also its direct and indirect costs, it may issue a weak recommendation against the use of the treatment. Alternatively, a recommendation could advise using the treatment only in selected patients where the benefit-harm ratio is most favorable (for example, transplant candidates with longer life expectancy), or the workgroup could refrain from making any recommendation. Therefore, despite examining identical evidence, different workgroups could reasonably arrive at discordant recommendations or at recommendations of different strengths.

This example illustrates how, when interpreting inconclusive evidence, workgroup dynamics9 and members' philosophies and attitudes towards uncertainty10 can determine whether and how a recommendation is issued. At one end of the spectrum are workgroup members who feel that even if the evidence is not conclusive, it is still useful to suggest what makes sense or is currently acceptable practice. After all, not many practitioners will have the time and expertise to systematically review and appraise the primary literature to inform their own opinions. At the other end of the spectrum are members who hesitate to issue a recommendation that is predominantly based on expert opinion, no matter how its strength is rated, given the possibility of being proven wrong by future research. The direction provided by the guideline-issuing body to its workgroup will further affect the final guideline product.
For example, the KDIGO Board allows its workgroups to issue recommendations based predominantly on expert judgment in the setting of insufficient evidence.

Beyond the philosophical debate in guideline development, there is a tension between complexity and simplicity in grading. Guideline developers must follow a complex process, incorporate judgments, and transparently record decisions. But users want succinct guidelines with simple and intuitive grades for recommendations.

With the proliferation of guideline initiatives, different grading systems have been developed.11,12,13 The coexistence of different systems presents an apparent paradox in evidence-based practice, given the overall goal of improving patient care by reducing unjustified variability. While common sense suggests that there is benefit to having one convention for grading, a single system that fits all purposes and transparently summarizes a multidimensional process in a simple, unequivocal, and intuitively understood set of grades is likely to remain an elusive goal.

What Should Developers, Users, and Researchers of Guidelines Do?

Given the imperfection of any grading system, one might ask whether guideline recommendations should be graded at all. The answer depends on one's assessment of whether guideline users are able, and prefer, to independently synthesize and appraise the evidence to inform their decisions. Inasmuch as there is an audience for guidelines, it can be assumed that their usefulness is enhanced by being explicit about their strength. While there is no direct evidence proving the benefit of grading versus not grading recommendations in terms of clinical outcomes, an unpublished, crossover, randomized, controlled trial by UpToDate demonstrated a strong preference by its users for graded recommendations (P. Bonis, personal communication, November 2007).
Entities commissioning guidelines should be explicit in outlining the charge to the guideline workgroups, in particular their expectations regarding whether recommendations should be issued when evidence is not conclusive.

Guideline developers should acknowledge that grading is a process that includes some degree of subjectivity and that following a structured approach is useful to guide the deliberations. The guideline workgroup needs to be explicit about its values and preferences, especially in areas of lower-quality evidence. Furthermore, the workgroup needs to define what each grade means and which recommendations are or are not reasonable candidates for translation into performance measures. If the committee issues recommendations in areas of inconclusive evidence that are predominantly based on expert opinion, these should be clearly differentiated from guidelines supported by higher-quality evidence.

Guideline users need to develop an understanding of the complexity of the process and become literate about the issues that affect the strength of a recommendation.14 It is vital that the implications of recommendations of different strengths are understood, along with the degree to which they are based on a workgroup's judgments. The more sensitive a recommendation is to values and preferences, the greater the need for the practitioner to incorporate the values and preferences of the patient when deciding whether to follow a recommendation.

Researchers of guideline methods should empirically evaluate the utility of a grading system. Surveying guideline users will generate a better understanding of their preferences and needs. It can also provide information on how grading or formulating the strength of a recommendation affects guideline uptake and implementation in clinical practice.

Acknowledgements

The manuscript reflects the views of the authors, not necessarily those of the KDIGO or KDOQI (Kidney Disease Outcomes Quality Initiative) Advisory Boards.
Support: None.

Financial Disclosure: All authors receive salary support from the National Kidney Foundation (NKF) as staff at the NKF Center for Clinical Practice Guideline Development and Implementation, Boston, Massachusetts.

References

1. Kidney Disease Improving Global Outcomes (KDIGO) website. http://www.kdigo.org.
2. Uhlig K, Balk EM, Lau J, et al. Clinical practice guidelines in nephrology–for worse or for better. Nephrol Dial Transplant. 2006;21:1145–1153.
3. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490.
4. Uhlig K, Macleod A, Craig J, et al. Grading evidence and recommendations for clinical practice guidelines in nephrology (A position statement from Kidney Disease: Improving Global Outcomes (KDIGO)). Kidney Int. 2006;70:2058–2065.
5. Sawaya GF, Guirguis-Blake J, LeFevre M, et al. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med. 2007;147:871–875.
6. Wolff T, Guirguis-Blake J, Miller T, et al. Screening for carotid artery stenosis: an update of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2007;147:860–870.
7. KDIGO. KDIGO Clinical Practice Guidelines for the Prevention, Diagnosis, Evaluation and Treatment of Hepatitis C in Chronic Kidney Disease. Kidney Int Suppl. 2008;109:S1–S99.
8. Gordon C, Uhlig K, Lau J, et al. Interferon treatment in hemodialysis patients with chronic hepatitis C virus infection—a systematic review of the literature and meta-analysis of treatment efficacy and harms. Am J Kidney Dis. 2008;51:263–277.
9. Pagliari C, Grimshaw J, Eccles M. The potential influence of small group processes on guideline development. J Eval Clin Pract. 2001;7:165–173.
10. Djulbegovic B, Frohlich A, Bennett CL. Acting on imperfect evidence: How much regret are we ready to accept? J Clin Oncol. 2005;23:6822–6825.
11. West S, King V, Carey T, et al. Systems to Rate the Strength of Scientific Evidence (Evidence Report/Technology Assessment No. 47. AHRQ Publication No 02-E016). Rockville, MD: Agency for Healthcare Research and Quality; 2002.
12. Schunemann HJ, Best D, Vist G, et al. Letters, numbers, symbols and words: how to communicate grades of evidence and recommendations. CMAJ. 2003;169:677–680.
13. King SB, Smith SC, Hirshfeld JW, et al. 2007 Focused Update of the ACC/AHA/SCAI 2005 Guideline Update for Percutaneous Coronary Intervention: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines: 2007 Writing Group to Review New Evidence and Update the ACC/AHA/SCAI 2005 Guideline Update for Percutaneous Coronary Intervention, Writing on Behalf of the 2005 Writing Committee. Circulation. 2008;117:261–295.
14. Glasziou P, Vandenbroucke JP, Chalmers I. Assessing the quality of research. BMJ. 2004;328:39–41.

Tufts Medical Center, Tufts University School of Medicine, Boston, Massachusetts

Address correspondence to Katrin Uhlig, MD, MS, Division of Nephrology, Department of Medicine, Tufts Medical Center, Box 391, 800 Washington St, Boston, MA 02111.
PII: S0272-6386(08)00994-3 doi:10.1053/j.ajkd.2008.06.002 © 2008 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.