-
Causal Inference with Complex Surveys: A Unified Perspective on Sample Selection and Exposure Selection Am. Stat. (IF 1.8) Pub Date : 2024-11-05 Giovanni Nattino, Robert Ashmead, Bo Lu
Probability surveys are a major source of population representative data for policy research and program evaluation. However, the data come with the added complications of being observational and s...
-
Cross-validatory Z-Residual for Diagnosing Shared Frailty Models Am. Stat. (IF 1.8) Pub Date : 2024-10-29 Tingxuan Wu, Cindy Feng, Longhai Li
Accurate model performance assessment in survival analysis is imperative for robust predictions and informed decision-making. Traditional residual diagnostic tools like martingale and deviance resi...
-
Performance Analysis of NSUM Estimators in Social-Network Topologies Am. Stat. (IF 1.8) Pub Date : 2024-10-29 Sergio Díaz-Aranda, Jose Aguilar, Juan Marcos Ramírez, David Rabanedo, Antonio Fernández Anta, Rosa E. Lillo
The Network Scale-up Methods (NSUM) are methods to estimate unknown populations based on indirect surveys in which the participants provide information about aggregated data of their acquaintances....
-
A Pareto tail plot without moment restrictions Am. Stat. (IF 1.8) Pub Date : 2024-10-15 Bernhard Klar
We propose a mean functional that exists for arbitrary probability distributions and characterizes the Pareto distribution within the set of distributions with finite left endpoint. This is in shar...
-
Sparse-group boosting: Unbiased group and variable selection Am. Stat. (IF 1.8) Pub Date : 2024-10-07 Fabian Obster, Christian Heumann
For grouped covariates, we propose a framework for boosting that allows for sparsity within and between groups. By using component-wise and group-wise gradient ridge boosting simultaneously with ad...
-
Additive Hazards Regression Analysis of Massive Interval-Censored Data via Data Splitting Am. Stat. (IF 1.8) Pub Date : 2024-09-25 Peiyao Huang, Shuwei Li, Xinyuan Song
With the rapid development of data acquisition and storage space, massive data sets with large sample sizes increasingly emerge and make more advanced statistical tools urgently needed. To a...
-
An effective and small sample-size valid confidence interval for isotonic dose-response curves by inverting a partial likelihood ratio test Am. Stat. (IF 1.8) Pub Date : 2024-09-19 J. G. Liao
A dose-response curve is essential for determining the safe dosage of a drug and is widely used in bioassay and in phase 1 clinical trials. It is generally accepted that the probability of death or...
-
Estimation of Contact Time Among Animals from Telemetry Data Am. Stat. (IF 1.8) Pub Date : 2024-09-09 Andrew B. Whetten, Trevor J. Hefley, David A. Haukos
Continuous processes in most applications are measured discretely with error. This complicates the task of detecting intersections and the number of intersections between two continuous processes (...
-
Selecting the best compositions of a wheelchair basketball team: a data-driven approach Am. Stat. (IF 1.8) Pub Date : 2024-09-11 Gabriel Calvo, Carmen Armero, Bernd Grimm, Christophe Ley
Wheelchair basketball, regulated by the International Wheelchair Basketball Federation, is a sport designed for individuals with physical disabilities. This paper presents a data-driven tool that e...
-
When Heavy Tails Disrupt Statistical Inference Am. Stat. (IF 1.8) Pub Date : 2024-09-10 Richard M. Vogel, Simon Michael Papalexiou, Jonathan R. Lamontagne, Flannery Dolan
Heavy tails (HT) arise in many applications and their presence can disrupt statistical inference, yet the HT statistical literature requires a theoretical background most practicing statisticians l...
-
Tightening Blocks in Complementary Analyses of Observational Studies: Optimization Algorithm and Examples Am. Stat. (IF 1.8) Pub Date : 2024-08-20 Paul R. Rosenbaum
An observational block design has I blocks matched for covariates and J individuals per block, but treatments were not randomly assigned to individuals within blocks, as would have been done in an ...
-
Using Exact Tests from Algebraic Statistics in Sparse Multi-way Analyses: An Application to Analyzing Differential Item Functioning Am. Stat. (IF 1.8) Pub Date : 2024-08-12 Shishir Agrawal, Luis David Garcia Puente, Minho Kim, Flavia Sancier-Barbosa
Asymptotic goodness-of-fit methods in contingency table analysis can struggle with sparse data, especially in multi-way tables where it can be infeasible to meet sample size requirements for a robu...
-
Distance Covariance, Independence, and Pairwise Differences Am. Stat. (IF 1.8) Pub Date : 2024-07-03 Jakob Raymaekers, Peter J. Rousseeuw
Distance covariance (Székely et al. 2007) is a fascinating recent notion, which is popular as a test for dependence of any type between random variables X and Y. This approach deserves to be touche...
-
A Multi-Method Data Science Pipeline for Analyzing Police Service Am. Stat. (IF 1.8) Pub Date : 2024-07-01 Anna Haensch, Daanika Gordon, Karin Knudson, Justina Cheng
Despite the fact that most police departments in the U.S. serve jurisdictions with fewer than 10,000 residents, policing practices in small towns are understudied. This is due in part to data limit...
-
High-dimensional propensity score and its machine learning extensions in residual confounding control Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Mohammad Ehsanul Karim
The use of health care claims datasets often encounters criticism due to the pervasive issues of omitted variables and inaccuracies or mis-measurements in available confounders. Ultimately, the tr...
-
Integrative data analysis where partial covariates have complex non-linear effects by using summary information from an external data Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Jia Liang, Shuo Chen, Peter Kochunov, L. Elliot Hong, Chixiang Chen
A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functi...
-
On Misuses of the Kolmogorov–Smirnov Test for One-Sample Goodness-of-Fit Am. Stat. (IF 1.8) Pub Date : 2024-06-12 Anthony Zeimbekakis, Elizabeth D. Schifano, Jun Yan
The Kolmogorov–Smirnov (KS) test is widely employed to assess the goodness-of-fit of a hypothesized continuous distribution to a sample. Despite its popularity, the test is frequently misused in th...
-
Assessment and Continuous Improvement of an Undergraduate Data Science Program Am. Stat. (IF 1.8) Pub Date : 2024-06-07 Nicholas Clark, Christopher Morrell, Mike Powell
In recent years, there has been an explosion in the growth of undergraduate statistics and data science programs across the US. Simultaneously, there has been clear guidance written on curriculum d...
-
Analyzing Matched 2 × 2 Tables from all Corners Am. Stat. (IF 1.8) Pub Date : 2024-05-31 Marc Aerts, Geert Molenberghs
Squared 2 × 2 tables with binary data from matched pairs are typically analyzed using Cochran-Mantel-Haenszel methodology, conditional logistic regression, or random intercepts logistic regression....
-
Sequential monitoring using the Second Generation P-Value with Type I error controlled by monitoring frequency Am. Stat. (IF 1.8) Pub Date : 2024-05-28 Jonathan J. Chipman, Robert A. Greevy Jr., Lindsay Mayberry, Jeffrey D. Blume
The Second Generation P-Value (SGPV) measures the overlap between an estimated interval and a composite hypothesis of parameter values. We develop a sequential monitoring scheme of the SGPV (SeqSGP...
-
Binomial Confidence Intervals for Rare Events: Importance of Defining Margin of Error Relative to Magnitude of Proportion Am. Stat. (IF 1.8) Pub Date : 2024-05-24 Owen McGrath, Kevin Burke
Confidence interval performance is typically assessed in terms of two criteria: coverage probability and interval width (or margin of error). In this article, we assess the performance of four comm...
-
The R2D2 prior for generalized linear mixed models Am. Stat. (IF 1.8) Pub Date : 2024-05-09 Eric Yanchenko, Howard D. Bondell, Brian J. Reich
In Bayesian analysis, the selection of a prior distribution is typically done by considering each parameter in the model. While this can be convenient, in many scenarios it may be desirable to plac...
-
Boldness-Recalibration for Binary Event Predictions Am. Stat. (IF 1.8) Pub Date : 2024-05-13 Adeline P. Guthrie, Christopher T. Franck
Probability predictions are essential to inform decision making across many fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, that is, spread out enou...
-
The Best Time to Play the Lottery Am. Stat. (IF 1.8) Pub Date : 2024-05-07 Christopher M. Rump
The best time to play the lottery is when the jackpot has rolled over several times and grown large, but not so large that you must share the prize if you win. We examine maximizing the expected va...
-
A Simple and Fast Algorithm for Generating Correlation Matrices with a Known Average Correlation Coefficient Am. Stat. (IF 1.8) Pub Date : 2024-05-02 Niels G. Waller
This article describes a simple and fast algorithm for generating correlation matrices (R) with a known average correlation. The algorithm should be useful for researchers desiring plausible R m...
-
Tractable Bayesian Inference For An Unidentified Simple Linear Regression Model Am. Stat. (IF 1.8) Pub Date : 2024-04-24 Robert Calvert Jump
In this article, I propose a tractable approach to Bayesian inference in a simple linear regression model for which the standard exogeneity assumption does not hold. By specifying a beta prior for ...
-
Telling Stories with Data: With Applications in R Am. Stat. (IF 1.8) Pub Date : 2024-04-23 Piotr Fryzlewicz
Published in The American Statistician (Vol. 78, No. 4, 2024)
-
Deep Learning and Scientific Computing with R torch Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Yang Ni
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
An Introduction to R and Python for Data Analysis: A Side-by-Side Approach. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Gabriel Wallin
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
On Point Estimators for Gamma and Beta Distributions Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nickos D. Papadatos
Let $X_1,\ldots,X_n$ be a random sample from the gamma distribution with density $f(x)=\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$, $x>0$, where both $\alpha>0$ (the shape parameter) and $\lambda>0$ (the reciprocal scale parameter) are unknown. The m...
-
Introduction to Statistical Modelling and Inference Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nianpin Cheng, Beth Chance
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Statistical Theory: A Concise Introduction, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Juan Sosa
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Moments of the Nonnegative Adjusted Estimator of Squared Multiple Correlation Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Joseph F. Lucke
I present the moments of the nonnegative adjusted estimator of the squared multiple correlation $\rho^2$, the coefficient of determination for random-predictor regression. This estimator, first proposed ...
-
Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement Am. Stat. (IF 1.8) Pub Date : 2024-04-15 Minh Nguyen, Tiffany Eulalio, Ben J. Marafino, Christian Rose, Jonathan H. Chen, Michael Baiocchi
A gap remains between developing risk prediction models and deploying models to support real-world decision making, especially in high-stakes situations. Human-experts’ reasoning abilities remain c...
-
Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities Am. Stat. (IF 1.8) Pub Date : 2024-04-15 Yifan Yang, Chixiang Chen, Shuo Chen
Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, am...
-
On the Term “Randomization Test” Am. Stat. (IF 1.8) Pub Date : 2024-03-18 Jesse Hemerik
There is no consensus on the meaning of the term “randomization test.” Contradictory uses of the term are leading to confusion, misunderstandings and indeed invalid data analyses. A main source of ...
-
Fitting Log-Gaussian Cox Processes Using Generalized Additive Model Software Am. Stat. (IF 1.8) Pub Date : 2024-03-14 Elliot Dovers, Jakub Stoklosa, David I. Warton
While log-Gaussian Cox process regression models are useful tools for modeling point patterns, they can be technically difficult to fit and require users to learn/adopt bespoke software. We show th...
-
Sequential Selection for Minimizing the Variance with Application to Crystallization Experiments Am. Stat. (IF 1.8) Pub Date : 2024-03-07 Caroline M. Kerfonta, Sunuk Kim, Ye Chen, Qiong Zhang, Mo Jiang
For many crystal-based products (e.g., pharmaceuticals, energy storage), the size uniformity is not only a key quality attribute, but sometimes also an indicator of other attributes such as solid p...
-
Proximal MCMC for Bayesian Inference of Constrained and Regularized Estimation Am. Stat. (IF 1.8) Pub Date : 2024-02-26 Xinkai Zhou, Qiang Heng, Eric C. Chi, Hua Zhou
This article advocates proximal Markov chain Monte Carlo (ProxMCMC) as a flexible and general Bayesian inference framework for constrained or regularized estimation. Originally introduced in the Ba...
-
Parole Board Decision-Making using Adversarial Risk Analysis Am. Stat. (IF 1.8) Pub Date : 2024-02-13 Chaitanya Joshi, Charné Nel, Javier Cano, Devon L.L. Polaschek
Adversarial Risk Analysis (ARA) allows for much more realistic modeling of game theoretical decision problems than Bayesian game theory. While ARA solutions for various applications have been discu...
-
Prioritizing Variables for Observational Study Design using the Joint Variable Importance Plot Am. Stat. (IF 1.8) Pub Date : 2024-02-08 Lauren D. Liao, Yeyi Zhu, Amanda L. Ngo, Rana F. Chehab, Samuel D. Pimentel
Observational studies of treatment effects require adjustment for confounding variables. However, causal inference methods typically cannot deliver perfect adjustment on all measured baseline varia...
-
Hidden Markov Models for Low-Frequency Earthquake Recurrence Am. Stat. (IF 1.8) Pub Date : 2024-02-05 Jessica Allen, Ting Wang
Low-frequency earthquakes (LFEs) are small magnitude earthquakes with frequencies of 1–10 Hertz which often occur in overlapping sequence forming persistent seismic tremors. They provide insights i...
-
Applied Linear Regression for Longitudinal Data: With an Emphasis on Missing Observations Am. Stat. (IF 1.8) Pub Date : 2024-02-05 Maria Francesca Marino
Published in The American Statistician (Vol. 78, No. 1, 2024)
-
Hitting a Prime by Rolling a Die with Infinitely Many Faces Am. Stat. (IF 1.8) Pub Date : 2024-01-05 Shane Chern
Alon and Malinovsky recently proved that it takes on average 2.42849… rolls of fair six-sided dice until the first time the total sum of all rolls arrives at a prime. Naturally, one may extend the ...
-
Understanding the Implications of a Complete Case Analysis for Regression Models with a Right-Censored Covariate Am. Stat. (IF 1.8) Pub Date : 2023-12-21 Marissa C. Ashner, Tanya P. Garcia
Despite its drawbacks, the complete case analysis is commonly used in regression models with incomplete covariates. Understanding when the complete case analysis will lead to consistent parameter e...
-
Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments Am. Stat. (IF 1.8) Pub Date : 2023-12-21 Chancellor Johnstone, Dan Nettleton
The COVID-19 pandemic was responsible for the cancellation of both the men’s and women’s 2020 National Collegiate Athletic Association (NCAA) Division I basketball tournaments. Starting from the po...
-
Lessons from a Discussion-Based Course on the History of Statistics Am. Stat. (IF 1.8) Pub Date : 2023-12-21 David B. Hitchcock
We describe a special-topics undergraduate course on the history of statistics, taught in Spring 2023 at the University of South Carolina. We review other similar courses (past and cur...
-
One-Step Weighting to Generalize and Transport Treatment Effect Estimates to a Target Population Am. Stat. (IF 1.8) Pub Date : 2023-12-11 Ambarish Chattopadhyay, Eric R. Cohn, José R. Zubizarreta
The problems of generalization and transportation of treatment effect estimates from a study sample to a target population are central to empirical research and statistical methodology. In both ran...
-
The Phistogram Am. Stat. (IF 1.8) Pub Date : 2023-11-17 Adriana Verónica Blanc
This article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted group...
-
Missing Data Imputation with High-Dimensional Data Am. Stat. (IF 1.8) Pub Date : 2023-11-17 Alberto Brini, Edwin R. van den Heuvel
Imputation of missing data in high-dimensional datasets with more variables P than samples N, P≫N, is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill ...
-
Technical Validation of Plot Designs by Use of Deep Learning Am. Stat. (IF 1.8) Pub Date : 2023-11-16 Anne Helby Petersen, Claus Ekstrøm
When does inspecting a certain graphical plot allow for an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics—including model d...
-
A Note on Monte Carlo Integration in High Dimensions Am. Stat. (IF 1.8) Pub Date : 2023-11-16 Yanbo Tang
Monte Carlo integration is a commonly used technique to compute intractable integrals and is typically thought to perform poorly for very high-dimensional integrals. To show that this is not always...
-
Causal Quartets: Different Ways to Attain the Same Average Treatment Effect Am. Stat. (IF 1.8) Pub Date : 2023-11-15 Andrew Gelman, Jessica Hullman, Lauren Kennedy
The average causal effect can often be best understood in the context of its variation. We demonstrate with two sets of four graphs, all of which represent the same average effect but with much dif...
-
Bayesian Modeling and Computation in Python Am. Stat. (IF 1.8) Pub Date : 2023-10-31 P. Richard Hahn
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
A First Course in Linear Model Theory, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2023-10-31 Carlos Cinelli
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
ANOVA and Mixed Models: A Short Introduction Using R Am. Stat. (IF 1.8) Pub Date : 2023-10-31 Brady T. West
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
Comment on “Forbidden knowledge and specialized training: A versatile solution for the two main sources of overfitting in linear regression,” by Rohlfs (2023) Am. Stat. (IF 1.8) Pub Date : 2023-10-30 Ronald Christensen
Published in The American Statistician (Just accepted, 2023)
-
The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases Am. Stat. (IF 1.8) Pub Date : 2023-10-20 Weiwen Miao, Joseph L. Gastwirth
In practice, the ultimate outcome of many important discrimination cases, for example, the Wal-Mart, Nike and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request t...
-
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology Am. Stat. (IF 1.8) Pub Date : 2023-10-18 Nicholas Larsen, Jonathan Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the pa...
-
Melded Confidence Intervals Do Not Provide Guaranteed Coverage Am. Stat. (IF 1.8) Pub Date : 2023-10-16 Jesse Frey, Yimin Zhang
Melded confidence intervals were proposed as a way to combine two independent one-sample confidence intervals to obtain a two-sample confidence interval for a quantity like a difference or a ratio....