The use of the modified Rankin Scale (mRS)1 as one of the most widely accepted outcome measures in stroke is the likely motivator for the sustained attention to how ordinal stroke outcome data are, and should be, analyzed.2 The mRS is a textbook example of an ordinal scale, with 7 individual categories that are ordered but unequally spaced. The original debate about analytical approaches to mRS data was driven almost exclusively by considerations of statistical efficiency and power. This debate was resolved with the recognition that appropriate scale dichotomization and analysis across the full ordinal scale should not be considered 2 competing approaches, with one uniformly preferred to the other, but rather complementary analytical strategies that answer 2 different clinical research questions.3 Dichotomization of the scale at a clinically meaningful threshold (such as mRS score 0–1 versus mRS score 2–6 or mRS score 0–2 versus mRS score 3–6), analyzed with binary logistic regression, is best suited to the question of which treatment regimen has the higher probability (or odds) of absolute attainment of such a predefined level of functional performance. In contrast, analytical approaches across the full mRS are aimed at understanding the probability (or odds) of relative improvement gained by patients who received 1 treatment compared with another.
See related article, p 3025
Ordinal logistic regression (which relies on the proportional odds assumption) and all-to-all pairwise comparison methods (often referred to as assumption-free methods to emphasize the difference from ordinal logistic regression) are the 2 most widely encountered analytical approaches used to conduct ordinal analysis across the full mRS. Recent examples of trials in which the former and the latter approaches were used include WAKE-UP (Efficacy and Safety of MRI-Based Thrombolysis in Wake-Up Stroke) and EXTEND IA TNK Part 2 (Extending the Time for Thrombolysis in Emergency Neurological Deficits - Intra-Arterial Using Intravenous Tenecteplase Part 2), respectively.4,5 In this Editorial, we introduce the notion of a “Tournament Method” to refer to an all-to-all pairwise comparison method that explicitly considers all possible pairs of patients who received treatments A and B, respectively. The idea behind the Tournament Method has historical roots in the Wilcoxon-Mann-Whitney test and Kendall’s tau, and has received renewed attention as discussed below. The aim of the Tournament Method is to calculate 3 numbers: the number of pairs in which a patient who received Treatment A achieved a better outcome than a patient who received Treatment B (number of wins), the number of pairs in which the outcome for the patient receiving Treatment A was worse than that for the patient receiving Treatment B (number of losses), and the number of pairs in which the 2 patients achieved the same outcome (number of ties). These 3 numbers are then presented either as proportions of the total number of analyzed pairs (thus representing the probabilities that a random patient receiving Treatment A will win, lose, or tie in a head-to-head comparison against a random patient receiving Treatment B, respectively), or as the corresponding odds.
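The counting step described above can be sketched in a few lines of Python. This is a minimal illustration rather than any published implementation: the function name `tournament_counts` and the mRS scores are hypothetical, and lower mRS scores are treated as better outcomes.

```python
from itertools import product


def tournament_counts(mrs_a, mrs_b):
    """Count wins, losses and ties for Treatment A over all A-versus-B pairs.

    Lower mRS scores are better, so A wins a pair when its score is lower.
    """
    wins = losses = ties = 0
    for a, b in product(mrs_a, mrs_b):
        if a < b:
            wins += 1
        elif a > b:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties


# Hypothetical mRS scores (0 = no symptoms, 6 = death), for illustration only.
treatment_a = [0, 1, 1, 2, 3, 6]
treatment_b = [1, 2, 3, 3, 4, 6]

wins, losses, ties = tournament_counts(treatment_a, treatment_b)
total = wins + losses + ties  # equals len(treatment_a) * len(treatment_b)

print("wins, losses, ties:", wins, losses, ties)
print("proportions:", wins / total, losses / total, ties / total)
```

With these made-up scores, the 36 pairs split into 22 wins, 8 losses, and 6 ties for Treatment A.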
Zou et al6 in the current issue of Stroke present original research on how a particular version of a Tournament Method may be applied to improve the design and analysis of stroke clinical trials that use the mRS as an outcome measure. In doing so, Zou and colleagues contribute to a rich body of previously published literature dedicated to this topic. In particular, Koziol and Feng7 and Rahlfs et al8 both provided conceptual overviews of this class of methods and identified the links between the all-to-all pairwise comparison approach and other measures of treatment effect well known in stroke research. Howard et al9 proposed a computationally intensive permutation test based on pairwise comparisons. Our group10 drew upon prior statistical work by Agresti11 and O’Brien and Costelloe12 to introduce the Generalised Odds Ratio for mRS analysis, which reduced the computational burden of permutation testing and provided P values and confidence intervals based on a standard error that assumed the null hypothesis of no treatment effect was true. In their paper, Zou et al6 also drew from Agresti11 to calculate the win proportion, and provided confidence intervals based on a standard error that did not assume the null hypothesis of no treatment effect. They demonstrate the appropriate statistical properties of the proposed approach with a simulation study based on mRS outcome data from multiple randomized stroke trials. They also provide a convenient closed-form analytic expression for sample size estimation. As there now exist several alternative sample size formulas to facilitate the planning of stroke trials with an mRS outcome using Tournament Methods,6,12,13 it would be important for the stroke research community to explicitly compare these formulas to determine the most accurate one, and to better understand their robustness to misspecification of the assumed treatment effect.
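To give a sense of why permutation testing on pairwise comparisons is computationally intensive, the following is a generic Monte Carlo permutation test on a split-ties win proportion. It is a sketch under stated assumptions, not the exact procedure of Howard et al9 or any other cited work: the function names, the two-sided test statistic, the number of permutations, and the mRS scores are all hypothetical choices.

```python
import random


def win_proportion(treated, control):
    """Split-ties probability that a treated patient beats a control patient."""
    wins = sum(t < c for t in treated for c in control)   # lower mRS is better
    ties = sum(t == c for t in treated for c in control)
    return (wins + 0.5 * ties) / (len(treated) * len(control))


def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation P value for the win proportion."""
    rng = random.Random(seed)
    observed = abs(win_proportion(treated, control) - 0.5)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        stat = abs(win_proportion(pooled[:n_treated], pooled[n_treated:]) - 0.5)
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical mRS scores, for illustration only.
treated = [0, 0, 1, 1, 2]
control = [3, 4, 4, 5, 6]
print(permutation_p_value(treated, control))
```

Every permutation repeats the full all-to-all comparison, so the cost grows with the product of the two group sizes multiplied by the number of permutations, which is the burden that closed-form standard errors avoid.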
The article by Zou et al6 in this issue also provides an extra reason to focus the attention of stroke researchers on the following 2 tantalizing conceptual points: the handling of tied observations (which may emerge as a result of using Tournament Methods) and the careful and appropriate clinical interpretation of the resulting effect measures. There are differing views in the stroke literature about how to address tied comparisons. Howard and colleagues9 proposed that ties be dropped from the effect size measure, which results in a clinically interpretable measure of effect as the probability (or odds) that a random patient receiving Treatment A will have a better outcome than a patient receiving Treatment B, given that there was a difference in their outcomes. Our group,10 as well as Zou et al6 in this issue, instead followed the traditional Mann-Whitney convention and split ties evenly between being in favor of Treatment A and in favor of Treatment B before calculating the effect size. This method has close conceptual links to the estimation of the area under a receiver operating characteristic curve, as discussed by Rahlfs et al.8 The split-ties approach generally results in a receiver operating characteristic curve-like interpretation, where more pronounced treatment effects may be visualized as increased deviations from the diagonal line (which corresponds to the equal probability of 0.5 for both groups) on a percentile-percentile plot. It also results in a more conservative treatment effect with tighter confidence intervals than the drop-ties approach, which led our group10 to recommend its use over the drop-ties approach.
It is important to note that such even splitting of ties is based on the dual assumptions that the null hypothesis of no treatment effect is true, and that “ties occur only because of insufficient precision of measurement.”14 If both assumptions are correct (the second one is quite likely to hold for the mRS scale where it is commonly recognized that the simplicity and practicality of use is achieved at the expense of scale granularity), then a split-ties effect size measure may also be interpreted clinically as the probability (or odds) that a patient receiving Treatment A will be better off than a patient receiving Treatment B. However, if either assumption is violated, then a split-ties approach does not have a simple clinical interpretation and can only be interpreted as the weighted sum of win and tie probabilities.
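The difference between the drop-ties and split-ties conventions discussed above can be made concrete with a short sketch. The function names and the win, loss, and tie counts are hypothetical and serve only to show the arithmetic.

```python
def split_ties_proportion(wins, losses, ties):
    """Win probability with each tie counted as half a win and half a loss."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


def drop_ties_proportion(wins, losses, ties):
    """Win probability conditional on the pair not being tied."""
    # `ties` is deliberately unused: tied pairs are discarded.
    return wins / (wins + losses)


def to_odds(probability):
    """Convert a probability into the corresponding odds."""
    return probability / (1 - probability)


# Hypothetical counts from 36 patient pairs, for illustration only.
wins, losses, ties = 22, 8, 6

split = split_ties_proportion(wins, losses, ties)
drop = drop_ties_proportion(wins, losses, ties)

print("split-ties proportion and odds:", split, to_odds(split))
print("drop-ties proportion and odds:", drop, to_odds(drop))
```

For these counts the split-ties proportion is 25/36 (about 0.69) and the drop-ties proportion is 22/30 (about 0.73), so the split-ties estimate sits closer to the null value of 0.5, in line with its description as the more conservative measure.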
One assumption-free alternative to the split-ties approach would be to resolve all ties optimistically in favor of the experimental treatment group, resulting in an effect size measure that may be interpreted as the probability (or odds) that a patient will be no worse off under experimental treatment than under the control condition. Conversely, all ties may be resolved pessimistically in favor of the control group, resulting in an effect size measure interpreted as the probability (or odds) that a patient receiving experimental treatment will be better off than a patient under the control condition. The decision on whether to split ties or to adopt either the optimistic or the pessimistic approach to tie resolution should be made a priori and be driven by the clinical research objective and other potential considerations such as the cost or feasibility of a particular treatment regimen. For example, such an approach could allow ties to be resolved in favor of the cheaper, or easier to implement, intervention. Howard et al9 proposed that studies that use pairwise preference methods explicitly report the number of pairs in which treatment wins, treatment loses, and treatment and control are tied. We strongly encourage this practice for Tournament Methods regardless of how ties are handled.
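The optimistic and pessimistic tie resolutions amount to moving all tied pairs into, or out of, the numerator. A minimal sketch, with hypothetical function names and hypothetical counts:

```python
def optimistic_proportion(wins, losses, ties):
    """Ties credited to the experimental arm: probability of being no worse off."""
    return (wins + ties) / (wins + losses + ties)


def pessimistic_proportion(wins, losses, ties):
    """Ties credited to the control arm: probability of being strictly better off."""
    return wins / (wins + losses + ties)


# Hypothetical counts from 36 patient pairs, for illustration only.
wins, losses, ties = 22, 8, 6

optimistic = optimistic_proportion(wins, losses, ties)
pessimistic = pessimistic_proportion(wins, losses, ties)

print("no worse off:", optimistic)
print("strictly better off:", pessimistic)
print("wins / losses / ties:", wins, losses, ties)  # report all 3 counts
```

For these counts the 2 measures are 28/36 and 22/36. The split-ties proportion is always their midpoint (here 25/36), which is why reporting the wins, losses, and ties alongside the chosen measure lets readers recover any of the alternatives.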
Additional care may also be needed in relation to how the magnitudes of effect size measures are interpreted. While “small,” “medium,” and “large” effect magnitudes have been proposed for the win proportion by Zou et al,6 these thresholds are based on those for standardised mean differences with normally distributed outcome measures.8 It is important to keep in mind that Tournament Methods do not measure the magnitude of the difference in the actual mRS outcomes, only the tendency for one treatment group to have better mRS outcomes than the other group. This can be observed in the illustrative meta-analysis performed by Zou et al,6 which showed extremely similar win proportions for the EXTEND-IA (Extending the Time for Thrombolysis in Emergency Neurological Deficits - Intra-Arterial)15 and MR CLEAN (Multicenter Randomized Clinical Trial of Endovascular Treatment for Acute Ischemic Stroke in the Netherlands)16 clinical trials, despite the fact that these trials reported numbers needed to treat (NNT) for achieving mRS score 0 to 2 of 3 and 7, respectively. Additionally, the proposed benchmarks for effect magnitudes for the win proportion challenge the intuitive notion of effectiveness previously acquired in the stroke field: very effective reperfusion treatments (NNT below 10 or even 5 for achieving mRS score 0–2) can only be characterized as having small-to-medium effects based on the benchmarks proposed by Zou et al.6 Care must therefore be taken not to misinterpret or misrepresent a “large” treatment effect on the proposed scale as being indicative of a large numeric shift in mRS outcomes.
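The point that Tournament Methods capture the tendency toward better outcomes, rather than the size of the shift, can be illustrated with 2 made-up data sets that share the same pairwise ordering but differ in how far apart the scores lie. All names and scores below are hypothetical, and the mean shift is used only as a crude numeric summary of an ordinal scale.

```python
def win_proportion_from_scores(treated, control):
    """Split-ties win proportion computed directly from mRS scores."""
    wins = sum(t < c for t in treated for c in control)   # lower mRS is better
    ties = sum(t == c for t in treated for c in control)
    return (wins + 0.5 * ties) / (len(treated) * len(control))


def mean_shift(treated, control):
    """Difference in mean mRS score, control minus treated."""
    return sum(control) / len(control) - sum(treated) / len(treated)


# Hypothetical trial 1: every score sits 1 grade apart.
treated_1, control_1 = [1, 2, 3, 4], [2, 3, 4, 5]

# Hypothetical trial 2: same ordering of patients, but wider gaps between scores.
treated_2, control_2 = [0, 1, 3, 5], [1, 3, 5, 6]

for treated, control in [(treated_1, control_1), (treated_2, control_2)]:
    print(win_proportion_from_scores(treated, control), mean_shift(treated, control))
```

Both hypothetical trials yield a win proportion of 0.71875, while the mean shift is 1.0 grade in the first and 1.5 grades in the second. Any order-preserving relabelling of the scores leaves the win proportion unchanged.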
Looking to the future, in addition to extending the use of Tournament Methods to cluster randomized studies, as suggested by Zou et al,6 there has been growing interest outside of stroke research in extending Tournament Methods beyond statements of preference based on a single outcome measurement (such as the mRS). This development has been spearheaded by cardiology research17 and extends the definition of “better off” to consider differences across multiple types of outcomes (eg, time until death and time until rehospitalisation; functional outcome and quality of life). Such methods are equally applicable to stroke and may provide a valuable contribution to the ongoing discussion about the appropriateness of the mRS and its utility-weighted counterparts. It remains to be identified where and how Tournament Methods with preference on multiple outcomes could be most beneficial to stroke research.
Disclosures None.
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.