Grip strength as a simple method to predict postoperative delirium after lower extremity surgery in the elderly: a prospective diagnostic evaluation
Abstract
Background
Grip strength (GS), an indicator of sarcopenia, declines with age and is associated with frailty, which increases the risk of postoperative delirium (POD). Assessing frailty in patients undergoing lower extremity surgery is challenging owing to limited mobility. We investigated whether GS, a simple screening tool, can independently predict POD and aimed to determine the most appropriate criteria for low GS in the Korean population.
Methods
This prospective study included patients aged ≥ 65 years undergoing lower extremity surgery. Preoperative GS was measured, and sarcopenia was diagnosed according to the Asian Working Group for Sarcopenia (AWGS) cut-offs and three other established criteria for GS weakness. POD was assessed with the Confusion Assessment Method for the Intensive Care Unit. Receiver operating characteristic (ROC) curves were used to evaluate the sensitivity and specificity of low GS in predicting POD. Logistic regression analysis was performed to identify variables independently associated with POD.
Results
GS was measured in 150 patients with a median age of 73 years. POD was diagnosed in 17 patients (11.3%), 13 of whom had low GS. Among the four sarcopenia criteria, the AWGS showed the highest area under the ROC curve (0.796). Of the variables analyzed—including age, American Society of Anesthesiologists class, body weight, intraoperative opioid use, and postoperative pain—only low GS was identified as an independent predictor of POD (odds ratio: 15.543, P < 0.001).
Conclusions
GS is a simple, reliable measure that may serve as an independent predictor of POD in elderly patients undergoing lower extremity surgery.
INTRODUCTION
Postoperative delirium (POD) is a common and serious complication in elderly surgical patients, with a reported incidence ranging from 13% to 50% depending on the diagnostic criteria used [1]. POD is associated with cognitive dysfunction, increased morbidity and mortality, prolonged hospital stays, and increased medical expenses [2,3]. Early identification and management of high-risk patients are therefore critical. Among the various risk factors, frailty and sarcopenia have emerged as important predictors of POD [1,4]. Frailty, defined as a decline in physiological reserve and functional capacity, is particularly relevant in older adults and is closely associated with sarcopenia, the age-related loss of muscle mass and strength. As a predisposing factor, frailty is considered a major contributor to POD [4].
Fried et al. defined frailty using five phenotypic criteria: unintentional weight loss (shrinking), weakness (low grip strength), poor endurance and energy, slowness, and reduced physical activity levels. Frailty is identified when at least three of these five criteria are present [5]. However, assessing slowness and physical activity often requires measuring walking speed or documenting recent activity history, which may not be feasible in patients with limited mobility (such as those awaiting lower extremity surgery). Furthermore, poor endurance and energy are based on subjective self-reports (e.g., the Center for Epidemiologic Studies Depression Scale), introducing additional complexity to frailty assessment.
Grip strength (GS) is a widely recognized indicator of muscle strength and function and offers a convenient, noninvasive method to assess sarcopenia and the degree of associated frailty [4]. Reduced GS is associated with increased postoperative complications, longer hospital stays, functional limitations, and disability [6]. Sarcopenia, in turn, is closely related to frailty, a condition of declining physiological reserves that negatively impacts the prognosis of elderly patients and increases the risk of POD [7].
We therefore hypothesized that GS alone may offer a practical and objective predictor of POD in elderly patients with impaired mobility undergoing lower extremity surgery. While previous studies have evaluated frailty or sarcopenia as predictors of POD, few have explored the relationship between POD and GS alone in this patient group. To our knowledge, no prior study has specifically focused on patients undergoing lower extremity surgery—a population with inherently limited mobility, in whom comprehensive frailty assessments are often challenging.
Given that GS is a simple and reliable marker of muscle strength, this study aims to investigate whether GS can independently predict POD in this population. The primary objective is to determine whether a simple measurement of GS can predict POD in older patients with impaired mobility due to lower extremity dysfunction. Additionally, given the varying definitions of low GS across different criteria, we assessed the most appropriate diagnostic criterion for reduced GS for the Korean population.
MATERIALS AND METHODS
This prospective observational study was approved by the Institutional Review Board (IRB) of Korea University (IRB No.: 2021GR0133) and conducted at Korea University Guro Hospital between March 2021 and December 2022. The trial was retrospectively registered in the University Hospital Medical Information Network (UMIN) Clinical Trials Registry (UMIN000057442).
Study population
Patients aged ≥ 65 years who were scheduled for lower extremity surgery under general anesthesia and had no preoperative cognitive dysfunction were eligible for inclusion. Exclusion criteria were hemodynamic instability, intellectual disability, or refusal to participate. Written informed consent was obtained from all participants before surgery. Preoperative cognitive function was screened using the animal verbal fluency test; patients who named fewer than seven animals were considered to have a Clinical Dementia Rating score of 1 and were excluded [8,9].
Grip strength measurement
GS was measured by two trained physicians in the preoperative waiting room following a standardized procedure based on the Southampton protocol [10]. A mechanical Jamar Hydraulic Hand Dynamometer (Performance Health Supply) was used. Each patient was instructed to compress the dynamometer handle three times using the right hand at 1-minute intervals, followed by three measurements with the left hand. Although the Southampton protocol recommends a seated position, this was not feasible owing to the patients’ lower extremity conditions; therefore, measurements were taken in a semi-recumbent position. The highest value from either hand was used for analysis.
Diagnostic criteria for GS weakness vary across definitions of sarcopenia (Table 1). In this study, we considered four commonly used criteria: the Asian Working Group for Sarcopenia (AWGS) [11], the European Working Group on Sarcopenia in Older People 2 (EWGSOP2) [6,12], the Korea National Health and Nutrition Examination Survey (KNHANES) [13], and the Simpler Modified Fried Frailty Scale (SMFFS), which adjusts GS thresholds according to body mass index (BMI) [14,15]. We primarily applied AWGS criteria but compared all four sets of diagnostic cut-offs.
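The measurement and classification steps described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' analysis code: it takes the highest of the six readings and compares it with the AWGS cut-offs quoted later in this article (< 28 kg for men, < 18 kg for women). The function names and the example readings are hypothetical.

```python
# AWGS cut-offs as quoted in this article: < 28 kg (men), < 18 kg (women).
AWGS_CUTOFF_KG = {"male": 28.0, "female": 18.0}


def best_grip_strength(right_kg, left_kg):
    """Return the highest reading across both hands (three trials each)."""
    return max(list(right_kg) + list(left_kg))


def has_low_gs_awgs(sex, right_kg, left_kg):
    """True when the best reading falls below the sex-specific AWGS cut-off."""
    return best_grip_strength(right_kg, left_kg) < AWGS_CUTOFF_KG[sex]


# Hypothetical readings in kg, not study data.
example_low = has_low_gs_awgs("female", [15.0, 16.5, 16.0], [14.0, 15.5, 17.5])
example_normal = has_low_gs_awgs("male", [27.0, 29.5, 28.0], [26.0, 27.5, 27.0])
```

The other three criteria would follow the same pattern with their own thresholds (Table 1); they are left out here because their values are not stated in the text.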
Anesthesia management
Upon entering the operating room, standard monitoring was applied, including electrocardiography, pulse oximetry, an esophageal temperature probe, noninvasive blood pressure monitoring, neuromuscular transmission monitoring, and processed electroencephalography.
Anesthesia was induced with propofol (1–2 mg/kg) and rocuronium (0.6 mg/kg). Maintenance was achieved using desflurane, targeting a Bispectral Index of 40–60 or a Patient State Index of 25–50. Intraoperative analgesia was provided with remifentanil, titrated according to hemodynamic changes at the anesthesiologist’s discretion. At the end of surgery, a bolus of fentanyl was administered as preemptive analgesia upon discontinuation of remifentanil. The use (yes/no) and total doses of intraoperative remifentanil and fentanyl were recorded.
At the end of surgery, the inhalation agent was discontinued, and extubation was performed when the patient exhibited spontaneous breathing and eye-opening. Patients were then transferred to the post-anesthesia care unit (PACU), where discharge was determined according to the Aldrete score. Intravenous patient-controlled analgesia was initiated after recovery of consciousness, typically when patients achieved a full score in the consciousness domain of the Aldrete system. If patients reported significant pain (visual analogue scale ≥ 5), additional analgesics such as fentanyl or ketorolac were administered at the discretion of the PACU physician.
Regional analgesia, which could reduce opioid consumption (a known risk factor for POD), was intentionally excluded from anesthetic management in this study. Postoperative pain management followed a standardized protocol, consisting primarily of non-opioid intravenous medications such as nonsteroidal anti-inflammatory drugs or acetaminophen (if not contraindicated), administered at 4–6 hour intervals in addition to patient-controlled analgesia.
Postoperative delirium and outcomes assessment
Before PACU discharge, the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) was used to evaluate the presence of POD [16]. CAM-ICU assessments were conducted once daily for up to five consecutive days during the hospital stay by a trained research nurse blinded to GS results. To enhance consistency and minimize variability owing to the fluctuating nature of delirium, assessments were performed at a fixed time each day, typically before noon.
The primary endpoint was the incidence of POD. Secondary endpoints included length of hospital stay, 30-day discharge rate, and in-hospital mortality.
Statistical analyses
Sample size estimation was based on a retrospective pilot study at Korea University Guro Hospital, which reported a 20% incidence of POD in elderly patients undergoing lower extremity surgery. Previous literature, including a review by Inouye et al. [1], reported a POD incidence of 13–50% following orthopedic surgery, while a prospective cohort study by Gleason et al. [17] reported a 24% incidence following elective surgery in elderly patients. We conservatively estimated a 20% POD incidence and assumed that reduced GS would increase the risk with an odds ratio (OR) of 2, as reported by Arita et al. [18]. Based on these assumptions, a sample size of 150 was calculated as sufficient using a two-sided chi-square test with an alpha of 0.05 and a power of 0.8.
Patients were allocated into two groups based on the presence or absence of GS weakness, as defined by the AWGS criteria. The incidence of POD was compared between the groups. Statistical analyses were performed using IBM SPSS Statistics software (ver. 20.0, IBM Co.). Continuous variables were analyzed using Student’s t-test or the Mann–Whitney U test, depending on data normality. Categorical variables, including the presence or absence of complications in the PACU or ward, were analyzed using the chi-square test.
To assess the diagnostic value of GS, receiver operating characteristic (ROC) curve analysis was performed. The performance of the four criteria for low GS was further evaluated using diagnostic metrics, including the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Pairwise comparisons of AUCs between the AWGS criteria and each of the other models were performed using DeLong’s test to determine whether observed differences in discriminatory ability were statistically significant.
Logistic regression analyses were conducted to evaluate associations between POD and various variables, including age, sex, height, weight, American Society of Anesthesiologists physical status (ASA PS) classification, BMI, type of surgery, duration of anesthesia, verbal fluency test results, intraoperative fentanyl and remifentanil use, PACU pain scores, use of PACU analgesics, and GS. For multivariate modeling, two approaches were used: in Model 1, GS was entered as a binary variable based on the AWGS cut-off (presence or absence of sarcopenia), whereas in Model 2, GS was entered as a continuous variable (kg) to allow direct interpretation and exploration of potential risk factors beyond dichotomized sarcopenia status. These two models were compared to assess whether binary classification using cut-offs and continuous GS measurement yielded different covariates and predictive performance for POD.
In both models, variables with a P value < 0.1 in univariate analysis were entered into multivariate logistic regression using a stepwise forward selection method. Predictive performance for each model was assessed using the optimal cut-off determined based on the Youden Index.
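As a rough illustration of the Youden Index step, the sketch below scans every candidate cut-off on a set of predicted probabilities and keeps the one that maximizes J = sensitivity + specificity − 1. The data are hypothetical and the function name is an assumption for illustration; the study itself used SPSS.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) maximizing J.

    A case is called positive when its score is >= cutoff.
    Ties are resolved in favor of the lowest cut-off.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Hypothetical predicted probabilities and outcomes (1 = POD), not study data.
probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.40, 0.55, 0.70, 0.90]
pod = [0, 0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec, j = youden_optimal_cutoff(probs, pod)
```

With these toy data the selected cut-off is 0.25, where all four events are captured and four of five non-events are correctly classified.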
Data are presented as mean ± standard deviation, median (1Q, 3Q), or number of patients (%). A two-tailed P value < 0.05 was considered statistically significant.
RESULTS
A total of 150 patients with a median age of 73 years (IQR: 69, 78) were enrolled in the study (Fig. 1). Based on the AWGS criteria, patients were categorized into two groups: normal GS (n = 114) and low GS (n = 36) (Table 2). The low GS group was older (median 74.5 vs. 73 years; P = 0.026), had a higher proportion of ASA PS ≥ III (P = 0.008), and scored lower on the animal verbal fluency test (P = 0.007). Other baseline characteristics, including sex, BMI, duration of surgery, intraoperative opioid use, and PACU pain scores, were comparable between the groups.
The incidence of POD was significantly higher in the low GS group (36.1%) than in the normal GS group (3.5%) (P < 0.001). Overall, 17 patients (11.3%) developed POD (Table 3). No significant differences were observed between groups in length of hospital stay, 30-day discharge rate, or inpatient mortality.
To evaluate the diagnostic utility of different GS weakness definitions, ROC curve analysis was performed for all four criteria (Fig. 2). The AUC and 95% confidence intervals (CIs) were calculated for each model. The AWGS model demonstrated the highest diagnostic performance, with an AUC of 0.796 (95% CI: 0.673–0.919; P < 0.001). The SMFFS model yielded an AUC of 0.739, while the EWGSOP2 and KNHANES models each had AUCs of 0.719.
Fig. 2. Prediction of delirium across four grip strength models for sarcopenia. AUC and 95% CIs for EWGSOP2, AWGS, KNHANES, and SMFFS were 0.719 (95% CI, 0.574–0.864, P = 0.003), 0.796 (0.673–0.919, P < 0.001), 0.719 (0.574–0.864, P = 0.003), and 0.739 (0.614–0.865, P = 0.001), respectively. Pairwise comparison of AUCs between AWGS and the other models using DeLong’s test showed no significant differences (P > 0.638 for all). AUC: area under the curve, CI: confidence interval, EWGSOP2: European Working Group on Sarcopenia in Older People 2, AWGS: Asian Working Group for Sarcopenia, KNHANES: Korea National Health and Nutrition Examination Survey, SMFFS: Simpler Modified Fried Frailty Scale.
In addition to AUC, diagnostic performance was further evaluated using sensitivity, specificity, PPV, NPV, and likelihood ratios (Supplementary Table 1). The AWGS model showed the most balanced performance, with the highest sensitivity (0.765), NPV (0.965), and positive likelihood ratio (4.422), supporting its potential value in identifying patients at risk for POD. However, none of the pairwise AUC comparisons (using DeLong’s test) demonstrated a statistically significant difference between AWGS and the other models (all P > 0.638; Supplementary Table 2), suggesting that although AWGS may have marginally better diagnostic performance, the overall discriminative ability was comparable across models.
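The relationship between these metrics can be checked with a short worked example. The sketch below computes them from a 2 × 2 table for the AWGS criterion. The text states that 13 of the 17 patients with POD had low GS; the remaining cells (23 false positives and 110 true negatives among the 133 patients without POD) are an assumption, chosen because they reproduce the reported sensitivity, NPV, positive likelihood ratio, and AUC. For a single binary test, the ROC curve has one operating point, so the AUC reduces to the mean of sensitivity and specificity.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2 x 2 diagnostic accuracy metrics for a binary test."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),
        # One operating point: AUC is the mean of sensitivity and specificity.
        "auc_binary": (sens + spec) / 2,
    }


# Cell counts reconstructed from figures in the text (an assumption, see above).
metrics = diagnostic_metrics(tp=13, fn=4, fp=23, tn=110)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

Under these assumed counts, the rounded values agree with the sensitivity (0.765), NPV (0.965), positive likelihood ratio (4.422), and AUC (0.796) reported for the AWGS model.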
In multivariate logistic regression, Model 1 (GS as a binary variable per AWGS) retained only low GS as a significant predictor of POD (OR 15.543; 95% CI: 4.647–51.989; P < 0.001) (Table 4). Model 2 (GS as a continuous variable) retained ASA PS ≥ III, intraoperative use of remifentanil, and GS value; for each 1-kg increase in GS, the odds of POD decreased by 9.4% (OR 0.906; 95% CI: 0.839–0.979) (Table 5). The AUC, sensitivity, and NPV of the two models were compared using ROC curves, and the optimal cut-off for each model was determined based on the Youden Index. Model 2 showed marginally higher AUC (0.821 vs. 0.796), sensitivity (0.824 vs. 0.765), and NPV (0.974 vs. 0.965) compared with Model 1 (Supplementary Table 3). However, these findings based on the Youden Index were descriptive, and no formal statistical test confirmed a significant difference.
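The odds ratios above can be read more concretely with a small calculation. The sketch below converts a per-kilogram OR into a percentage change in odds, and recovers the standard error of log(OR) from a 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, which is typical of logistic regression output but is not stated in the text; the function names are illustrative.

```python
import math


def percent_change_in_odds(odds_ratio, units=1.0):
    """Percentage change in odds for a given increase in the predictor."""
    return (odds_ratio ** units - 1.0) * 100.0


def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and SE of log(OR) from a Wald-type 95% CI."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    point = math.exp((log_lower + log_upper) / 2)
    se = (log_upper - log_lower) / (2 * z)
    return point, se


per_kg = percent_change_in_odds(0.906)         # Model 2, per 1 kg of GS
per_5kg = percent_change_in_odds(0.906, 5)     # same OR compounded over 5 kg
point, se = log_scale_summary(4.647, 51.989)   # Model 1 CI for low GS
```

The per-kilogram value corresponds to the 9.4% decrease quoted above, and the geometric midpoint of the Model 1 interval falls close to the reported OR of 15.543, consistent with a log-symmetric CI. The implied standard error of roughly 0.6 on the log scale reflects the small number of POD events.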
DISCUSSION
In this study, 13 of the 17 patients with POD had low GS. The incidence of POD was 36.1% in the low GS group, significantly higher than the 3.5% observed in the normal GS group, indicating a strong association between low GS and the development of POD. These findings support the use of GS as an independent marker to stratify delirium risk preoperatively, enabling targeted perioperative strategies and resource allocation in high-risk individuals.
The proportion of elderly patients undergoing surgery is steadily increasing, and the risk of complications increases with age, leading to disability, loss of independence, reduced quality of life, and increased healthcare costs [19,20]. Identifying risk factors for adverse outcomes in older adults is therefore essential. Among these, POD is a common and clinically significant complication, with an incidence ranging from 13% to 50% in non-cardiac surgeries, depending on the type of surgery and diagnostic criteria used [1,17,18].
In our study of elderly patients undergoing orthopedic surgery, POD occurred in 17 patients (11.3%), a marginally lower rate than previously reported [1,17,18]. Notably, the median ages in the two key reference studies [17,18], which informed our study design, were 77 and 80 years, respectively—higher than in our cohort. This age difference, along with heterogeneity in surgical and perioperative contexts, likely contributed to the relatively lower POD rate observed. Interestingly, a recent study [21] in an Asian population with a mean age of approximately 70.6 years reported a POD incidence of only 5.2% following total hip or knee arthroplasty, suggesting that population characteristics, surgical context, and baseline frailty may all influence observed POD rates.
Although the observed incidence of POD (11.3%) was lower than the 20% assumed for sample size estimation, the strong association observed (OR 15.543) suggests that the study retained sufficient statistical power to detect meaningful differences. This large effect size may have compensated for the lower event rate, supporting the robustness of our findings. However, the CI for the OR was relatively wide (15.543; 95% CI: 4.647–51.989), likely due to the limited number of POD cases. A larger study population would be needed in future investigations to provide more precise risk estimates and narrower CIs.
Several prior studies have reported a direct relationship between GS and POD [18,22,23], particularly in Asian populations (Japan and China). Arita et al. [18] retrospectively studied patients with colorectal cancer and found that older age, lower Mini-Mental State Examination scores, higher Geriatric Depression Scale scores, and low GS were significant independent predictors of POD. Their ROC analysis suggested optimal GS cut-off values of 21.8 kg for men and 15.4 kg for women, which are lower than AWGS thresholds (28 kg for men, 18 kg for women) applied in our study. In contrast, Kotani et al. [22] used the revised Japanese Cardiovascular Health Study criteria and observed a 37% incidence of POD in patients undergoing cardiovascular surgery with cardiopulmonary bypass. They reported a strong association between low GS and POD, with an OR of 4.58 (95% CI: 1.57–13.2). A prospective study by Qian et al. [23], closely aligned with our design, examined POD following arthroplasty and found an incidence of 14.36%. Their ROC analysis identified GS thresholds of 22.05 kg for men and 18.05 kg for women, with the threshold for men being marginally lower than that of the AWGS. While previous studies primarily categorized patients by the presence or absence of delirium, our study stratified patients based on GS status, focusing on risk prediction. Despite differences in study design and populations, these findings consistently demonstrate an association between reduced GS and increased POD risk, supporting GS as a simple and effective screening tool to identify high-risk patients for early preventive strategies.
Low GS is considered a marker of general physical decline and frailty, both of which are associated with vulnerability to delirium. The mechanisms underlying this association may involve shared pathways linking sarcopenia to cognitive impairment, including systemic inflammation, reduced physical activity, and neurodegenerative changes [18,24]. In our study, patients with low GS were older, had higher ASA PS classifications, and scored lower on cognitive screening, further supporting the link between frailty and POD risk.
GS is widely used as a diagnostic indicator of sarcopenia, although thresholds vary across guidelines. In this study, we evaluated four different criteria, and the AWGS definition (< 28 kg for men, < 18 kg for women) demonstrated the highest predictive value, with an AUC of 0.796. This finding aligns with that of previous studies in Asian populations, which often apply similar GS thresholds unless data-driven cut-offs are calculated [18,22,23]. Although AWGS had the highest AUC, SMFFS (0.739), EWGSOP2 (0.719), and KNHANES (0.719) also showed acceptable predictive performance, and DeLong’s test revealed no significant difference between AWGS and the other models. Thus, in clinical environments where rapid screening is needed and precision may be less critical, these criteria may provide useful guidance.
We also developed two logistic regression models to assess whether different approaches to GS—binary classification using the AWGS cut-off (Model 1) and continuous GS measurement (Model 2)—would influence covariate selection and predictive performance. Both models identified GS as an important predictor, but Model 2 additionally retained ASA PS classification and intraoperative remifentanil use, suggesting that modeling choices can affect variable selection. Ultimately, the goal is to proactively identify and reduce the risk of POD, regardless of the criterion applied.
This study has several limitations. First, because of patients’ lower extremity conditions, we could not strictly follow the Southampton protocol, which recommends GS measurement in a seated position; measurements were instead taken in a semi-recumbent position. Nevertheless, our findings suggest that semi-recumbent measurements can provide clinically meaningful results, offering preliminary evidence for this adapted method in mobility-limited populations. Second, although our analysis focused on preoperative GS, POD is multifactorial and influenced by numerous perioperative and postoperative variables. Not accounting for additional factors such as sleep disturbance, nutritional status, or postoperative medications may have limited the comprehensiveness of our model. Third, while the OR was large and statistically significant, the relatively small number of POD cases contributed to the wide CI, potentially affecting reliability. Fourth, although our study focused on lower extremity surgery, the cohort included a variety of procedures. Postoperative pain management in the ward was determined by each orthopedic surgeon’s protocol, introducing some heterogeneity in analgesic strategies.
In conclusion, preoperative GS is a simple, objective, and independent predictor of POD. In elderly patients undergoing lower extremity orthopedic surgery in the Korean population, the AWGS criteria appear marginally better for identifying sarcopenia-related delirium risk, although their discriminative ability was comparable to that of the other criteria. Routine pre-anesthetic GS assessment may facilitate early risk stratification and targeted interventions to reduce POD incidence.
SUPPLEMENTARY MATERIALS
Supplementary data are available at https://doi.org/10.17085/apm.25254.
Supplementary Table 1. Diagnostic performance of GS criteria for predicting POD based on CAM-ICU assessment
Supplementary Table 2. Pairwise comparison of AUCs using DeLong’s test. Each comparison is between two grip strength criteria (AWGS, EWGSOP2, KNHANES, Fried), showing the AUC for each criterion, the Z statistic for their difference, and the P value. The first three comparisons (AWGS vs. EWGSOP2, AWGS vs. KNHANES, AWGS vs. Fried) are as previously reported, and the remaining three (EWGSOP2 vs. KNHANES, EWGSOP2 vs. Fried, KNHANES vs. Fried) are newly calculated from the provided dataset.
Supplementary Table 3. Predictive performance summary (with Youden Index cut-off for Models 1 and 2)
Notes
FUNDING
This work was supported by grant No. KSGAP-2020-002 from the Korean Society of Geriatric Anesthesia and Pain.
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
DATA AVAILABILITY STATEMENT
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
AUTHOR CONTRIBUTIONS
Conceptualization: Seok Kyeong Oh. Funding acquisition: Seok Kyeong Oh. Supervision: Seok Kyeong Oh, Young Sung Kim. Data curation: Hyo Sung Kim, Hyun Ah Lee. Formal analysis: Seok Kyeong Oh, Young Sung Kim. Writing - original draft: Hyo Sung Kim, Seok Kyeong Oh. Writing - review & editing: Hyo Sung Kim, Young Sung Kim, Hyun Ah Lee, Seok Kyeong Oh.
