Critical Appraisal of “Frailty screening in the emergency department: a consensus study” (Hubbard et al., 2023) Using the CREDES Framework

Study Summary and Key Findings

Purpose: Hubbard et al. conducted an international Delphi consensus study to identify the core requirements of an ideal frailty screening tool in the emergency department (ED). This was motivated by uncertainty and variability in how best to screen for frailty in ED settings, especially given the diverse professionals involved in geriatric emergency care. The Delphi method was chosen to gather expert agreement on what features a frailty screening instrument should have for effective use in busy EDs.

Main Findings: After two rounds of surveys with a multidisciplinary expert panel, the study reached consensus on 19 key statements defining an optimal ED frailty screening approach. These consensus recommendations emphasise practical and feasible screening in the ED environment. In summary, the panel agreed that an ideal ED frailty screening instrument should have the following characteristics:

  • Very brief and quick to administer: Ideally taking less than 5 minutes to complete. This addresses ED time pressures and resource limitations.
  • Multi-dimensional assessment: It should cover multiple domains of frailty (physical, cognitive, psychological, social), and crucially reflect the patient’s baseline function in the 2–4 weeks prior to the acute illness. This ensures the screen identifies pre-existing frailty rather than transient illness effects.
  • Early application in the patient’s ED stay: Frailty screening should be initiated promptly at the point of first contact (such as triage or initial assessment) and be completed within ~4 hours of arrival. Early identification can streamline comprehensive geriatric assessment (CGA) and tailored interventions during the ED visit.
  • Focus on feasibility and cost-effectiveness: The panel prioritised a screening tool that is practical for routine use over a theoretically “perfect” but impractical tool. In other words, ease of use and minimal burden on staff/resources were deemed more important than maximal accuracy in an ideal scenario.
  • Targeted outcomes: The screening should effectively identify older patients at high risk of adverse outcomes (e.g. functional decline, readmission) so that appropriate geriatric interventions can be arranged. However, the aim is to raise early awareness and prompt further assessment, rather than to definitively predict long-term outcomes in the ED itself.

Practical Recommendations: For ED clinicians and administrators in Australia, New Zealand, and similar healthcare settings, these findings translate into actionable guidance. Hospitals should consider implementing a frailty screening process for older patients in the ED that meets the above criteria – for example, using a short frailty checklist or score that can be completed by ED staff soon after arrival. The consensus indicates that any chosen tool must be fast, simple, and usable within existing ED workflows, even if it sacrifices some sophistication, to ensure uptake. Additionally, the study highlights ongoing uncertainties about the feasibility, efficacy, and cost-effectiveness of routine frailty screening. This suggests that while screening is recommended, it should be accompanied by evaluation and training to address staff concerns and to integrate the process smoothly into ED practice.

Title and Aim Clarity

The title of the article – “Core requirements of frailty screening in the emergency department: an international Delphi consensus study” – is clear and informative. It immediately conveys what was studied (frailty screening in the ED), how it was studied (Delphi consensus method), and the broad scope (core requirements, international panel). This adheres to good reporting practice by ensuring readers understand the study’s nature from the title alone. The aim of the study is explicitly stated in the introduction: the objective was “to identify the core requirements of an ideal frailty screening instrument [for ED], exploring important principles, practices and logistics to facilitate accurate and timely screening of frailty in ED”. This aim is clearly defined and appropriately scoped – it matches what the Delphi process addresses and what is reported in the results (a set of consensus statements about frailty screening requirements). According to Delphi study reporting guidelines, the study’s purpose should be unambiguous and justify using a Delphi method. Hubbard et al. meet this criterion by plainly articulating their goal and linking it to the need for expert consensus on a complex clinical question. In summary, the title and aim are highly clear and well-aligned with the content, giving readers a precise expectation of the article’s focus.

Rationale for Using the Delphi Method

The authors provide a solid rationale for employing a Delphi consensus technique. In the background, they note that many practical questions about frailty screening in the ED remain unanswered and require input from a broad range of experts. They explicitly state that “the Delphi consensus technique is a well-established approach to answering complex research questions by attaining a consensus view across subject and context experts”. This explanation aligns with the recommended practice that researchers justify the choice of Delphi for the problem at hand. By citing previous successful uses of Delphi methods in defining frailty-related concepts and assessments, the authors demonstrate that this approach is appropriate for their study’s aim. In essence, the Delphi method was chosen to synthesise expert opinion in a systematic way because no single evidence-based answer existed for what the “ideal” ED frailty screen should entail. The rationale is clear: only a consensus process could bridge the knowledge gaps and diverse perspectives (geriatricians, emergency physicians, nurses, etc.) on frailty screening. This justification is well-founded and explicitly conveyed, fulfilling the CREDES expectation that the choice of the Delphi technique is appropriate for the research aim.

Expert Panel Composition and Selection

Panel make-up: The Delphi panel consisted of experts from multiple disciplines and countries, reflecting the interdisciplinary nature of geriatric emergency care. Thirty-nine experts from 10 countries were invited; ultimately, 37 participated in Round 1. The panel included emergency physicians (clinical and academic), geriatricians, nursing leaders, allied health and social care professionals, frailty researchers, and public health experts. This diverse composition was intentional “to ensure that the Delphi group reflected the diversity of frailty domains and the interprofessional collaborative nature of Geriatric Emergency Medicine (GEM)”. Such breadth of expertise is a strength, as it incorporates perspectives from those who would use or be affected by frailty screening in the ED (e.g. doctors, nurses, allied health). Notably, the panel had a slight majority of participants from Europe (56%) but also significant representation from North America and a few from Asia and Australia. Over 60% of the experts were female, and a range of professional roles was included, which added to the panel’s representativeness.

Selection criteria: The authors were transparent about how panellists were chosen, satisfying the CREDES recommendation for clear expert recruitment criteria. Participants were selected based on their “professional expertise relating to frailty in acute care settings,” including having recent (last 5 years) peer-reviewed publications in this area or membership in relevant international frailty or emergency medicine associations. The panel comprised recognised leaders or active researchers in frailty and acute care. A core steering group of three researchers identified and invited these experts. This method of targeted recruitment is typical for Delphi studies, and it was justified by the need for knowledgeable contributors. All panel members gave informed consent to participate.

Appraisal: The composition appears appropriate and well-justified: the study drew on a credible pool of experts with diverse and relevant backgrounds. One consideration, as acknowledged by the authors, is that the panel was predominantly from Western countries (Europe, North America, and Australia) and included relatively fewer professionals from nursing and allied health compared to physicians. This could limit the perspective (e.g. underrepresented frontline nursing views or non-Western healthcare contexts). Nonetheless, within the Australian and New Zealand emergency care context, the panel’s makeup (including Australian experts and geriatric emergency specialists) lends confidence that the findings apply to our setting. Overall, the study satisfies the expected standard for panel selection by clearly reporting who the experts were and why they were chosen while also ensuring a mix of expertise relevant to the topic.

Recruitment and Attrition Reporting

The recruitment process and participant flow through the Delphi rounds are well documented, which enhances the study’s transparency. Initially, 39 experts were invited to take part in the e-Delphi. The response was excellent: 37 out of 39 (95%) agreed and completed the first round. This high uptake indicates strong engagement from the invited experts and likely reflects the topic’s relevance to them. There was some attrition between Round 1 and Round 2, with 32 experts completing Round 2, corresponding to an 86% retention from the previous round. The authors explicitly report these numbers (37 in Round 1; 32 in Round 2) and even provide a response rate for Round 2. Such reporting is consistent with good Delphi conduct, as it is important to track if significant dropout could bias the results. In this case, the attrition was modest (five dropouts), and the Round 2 response rate of 86% is still strong, suggesting minimal risk of non-response bias affecting the consensus.
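
To put the participant-flow figures in concrete terms, here is a minimal Python sketch of the arithmetic; the counts come from the paper, but the code itself is purely illustrative and not part of the study:

```python
# Participant flow reported in the paper; this script simply
# re-derives the percentages quoted above.
invited, round_1, round_2 = 39, 37, 32

uptake = round_1 / invited     # 37/39 ≈ 0.95 -> the reported 95% uptake
retention = round_2 / round_1  # 32/37 ≈ 0.86 -> the reported 86% Round 2 retention

print(f"Round 1 uptake: {uptake:.0%}")        # Round 1 uptake: 95%
print(f"Round 2 retention: {retention:.0%}")  # Round 2 retention: 86%
```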

Importantly, the article provides a table of participant characteristics (Table 1) and notes that those characteristics pertain to “all 37 invited participants”. This implies that even the few who did not complete the process were similar in profile to those who did, although it’s not explicitly stated whether the dropouts differed in any way. The reasons for attrition are not detailed (Delphi studies seldom can report this unless participants gave feedback on why they withdrew), but the high retention makes this a minor concern.

By documenting the recruitment (initial invitations and inclusion criteria) and round-by-round participation, Hubbard et al. meet the expectation of transparent reporting. Decision-makers reading this can see that the consensus isn’t based on a shrinking, self-selected subgroup but rather the large majority of the originally invited experts, lending credibility to the stability of the consensus.

Consensus Definition and Achievement

A key quality criterion for Delphi studies is having a pre-specified consensus definition. Hubbard et al. clearly defined their consensus threshold at the outset: they required ≥80% of the panel to rate a statement as “agree” or “strongly agree” for it to be accepted as a consensus agreement. Statements not meeting this 80% agreement level were “automatically excluded” from further consideration. This threshold is relatively stringent (many Delphi studies use 70% or 75%), and the authors cite literature suggesting that a higher threshold yields stronger consensus and that agreement tends to increase after feedback once roughly 75% agreement is reached. Indeed, they deliberately chose a higher cutoff to ensure only clear agreements were retained. Defining this criterion a priori and justifying it aligns with CREDES guidance for transparency in determining consensus.
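
To make the decision rule concrete, the following is a minimal sketch (not the authors’ analysis code) of how the 80% agree/strongly-agree threshold would be applied to a statement’s Likert ratings; the example panel of 37 ratings is hypothetical:

```python
from collections import Counter

# Pre-specified rule from the study: a statement reaches consensus when
# >= 80% of panellists rate it "agree" or "strongly agree" on the
# 5-point Likert scale. Everything below is an illustrative sketch.
AGREEMENT_LABELS = {"strongly agree", "agree"}
THRESHOLD = 0.80

def meets_consensus(ratings: list[str]) -> bool:
    """Return True if the share of agree/strongly-agree ratings is >= 80%."""
    counts = Counter(r.lower() for r in ratings)
    agreeing = sum(counts[label] for label in AGREEMENT_LABELS)
    return agreeing / len(ratings) >= THRESHOLD

# Hypothetical panel of 37 ratings: 30/37 ≈ 81% agreement -> consensus.
ratings = (["strongly agree"] * 18 + ["agree"] * 12
           + ["neutral"] * 4 + ["disagree"] * 3)
print(meets_consensus(ratings))  # True
```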

Achievement of consensus: By the end of the two Delphi rounds, the panel had achieved consensus on 19 statements (out of 35 that were rated in Round 2). In Round 1, 13 of 56 initial statements (23%) met the 80% agreement threshold (7 accepted outright and 6 after revisions or merging). The remaining statements were either rejected or revised for the next round. After Round 2, an additional set of statements reached consensus, bringing the total to 19 final agreed statements (54% of those considered in Round 2). These 19 statements, listed in the article’s Table 2, constitute the study’s consensus-based “core requirements” for ED frailty screening. The authors also transparently report the statements that were not accepted (with their disagreement percentages) in Table 3. This level of detail demonstrates how consensus evolved and which ideas did not gain agreement, which is valuable information.

It is evident how consensus was attained: any statement falling below the 80% agreement level did not carry forward (except those specifically revised for a second consideration), and only those meeting the threshold in the final round are presented as recommendations. Notably, no statement was forced into consensus; a number of items remained without consensus (indicating areas of ongoing uncertainty). This honest reporting – including the fact that no statement achieved unanimous (100%) agreement and that some topics had to be dropped – gives confidence that the authors did not arbitrarily cherry-pick or inflate the consensus. In summary, the consensus definition was appropriate and predefined, and the outcome of the Delphi process (19 consensus statements) is clearly delineated, fulfilling the criteria for rigour in consensus development.

Delphi Process and Iteration

The study followed a modified two-round Delphi process, and the methodology is described in depth, allowing readers to understand the iterative progression. As recommended by CREDES, the authors provide a comprehensible description of each stage of the Delphi, including preparatory steps, survey rounds, and data processing. A Delphi process summary flow chart is included in Figure 1 of the paper, which visually outlines the steps from initial statement generation to the conclusion of rounds.

Number of rounds: Two rounds of surveys were conducted. Round 1 took place over two weeks in August–September 2021, and Round 2 over two weeks in October 2021. The decision to limit to two rounds classifies it as a “modified” Delphi; the authors do not report any additional rounds or a final face-to-face meeting. This appears to have been sufficient to reach a stable set of consensus items, as the second round solidified which statements met the 80% agreement threshold. (The absence of further rounds or a consensus meeting is discussed as a limitation, addressed later.)

Feedback and iteration: After Round 1, participants were provided with structured feedback – specifically, tabulated results showing group responses and a summary of qualitative comments. The researchers took the feedback from Round 1 and used it to refine the statements for Round 2 in several ways: some statements that had high agreement were edited for clarity per participant suggestions, new statements were added (22 new items) based on ideas raised in open-text comments, some statements that failed to reach consensus were revised and retained for a second vote, and a few overlapping statements that individually met criteria were merged into single statements. This process was systematic. For example, they merged redundancies and clarified wording so that Round 2 would present a refined list of 35 statements (down from 56) for reconsideration. Each participant thus had the opportunity in Round 2 to re-rate statements with knowledge of the group’s Round 1 results and the incorporated feedback – a hallmark of Delphi methodology that allows opinions to converge.
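
As a concrete illustration of this between-round processing, here is a minimal sketch in the same spirit; the statement texts, agreement values, and flags are entirely hypothetical and do not come from the paper, and statement merging is omitted for brevity:

```python
# Illustrative sketch of the Round 1 -> Round 2 statement processing
# described above; all data below are invented for demonstration.
round_1 = [
    {"text": "Takes <5 minutes to administer", "agreement": 0.89, "revisable": False},
    {"text": "Requires laboratory biomarkers", "agreement": 0.30, "revisable": False},
    {"text": "Reflects pre-illness baseline",  "agreement": 0.72, "revisable": True},
]
# New items raised in the panel's free-text comments (hypothetical).
panel_suggestions = ["Repeat screening if clinical status changes"]

accepted, round_2_pool = [], []
for stmt in round_1:
    if stmt["agreement"] >= 0.80:
        accepted.append(stmt["text"])      # accepted outright (or edited for clarity)
    elif stmt["revisable"]:
        round_2_pool.append(stmt["text"])  # reworded and put to a second vote
    # otherwise the statement is dropped from further consideration

round_2_pool.extend(panel_suggestions)     # add items suggested by the panel

print(accepted)      # ['Takes <5 minutes to administer']
print(round_2_pool)  # ['Reflects pre-illness baseline',
                     #  'Repeat screening if clinical status changes']
```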

The study also notes using an online survey platform (SurveyMonkey™) to distribute the rounds and that reminders were sent to improve response rates. The time given for each round (2 weeks) and the steps taken to process data between rounds are explicitly stated. Such details make the Delphi process highly transparent and replicable. Moreover, the authors present the results of each round separately (including how many statements were accepted or rejected per round), which aligns with best practice recommendations to report the evolution of consensus across rounds. Overall, the Delphi process in this study was conducted and structured with careful iteration based on participant input, demonstrating methodological rigour and fidelity to Delphi principles.

Ethical Considerations

Ethical oversight and considerations were appropriately addressed for this Delphi study. Although a Delphi survey of experts might be considered low-risk, the researchers still obtained institutional ethical approval from the University College Cork Clinical Research Ethics Committee prior to commencing the study. They also secured informed consent from all participants, which is important given that expert panellists are human research participants, and their time/opinions constitute the data. By doing so, the study adheres to ethical standards for human research, even when participants are professionals rather than patients.

The Delphi design was also anonymised (panellists’ individual responses were not identified in the feedback). The anonymity of responses is mentioned as a way to “reduce bias” and minimise dominant voices. This is both an ethical and a procedural consideration: it encourages experts to express their honest opinions without pressure, and it helps prevent any single expert from unduly influencing others, in line with the Delphi ethos of weighting each participant’s input equally.

The paper makes no direct mention of conflicts of interest among the expert panel or researchers. If any panellists had commercial interests in a frailty tool, that would warrant acknowledgement; none is reported. The study also registered the preliminary systematic review on PROSPERO, which, while not an ethical requirement, indicates a commitment to transparency and rigour from the outset.

In summary, Hubbard et al. accounted for ethical considerations by obtaining ethics approval and consent and by designing the Delphi process to be fair and unbiased (anonymous feedback). This provides assurance that the consensus was developed responsibly and with respect for the contributors, aligning with general expectations for ethical research conduct.

Reporting Transparency and Replicability

The article excels in transparent reporting, making it relatively easy for readers to appraise the study and replicate a similar process. Key details of the study design, execution, and analysis are fully disclosed:

  • Preparatory Work: The authors describe a pre-Delphi systematic literature review that informed the initial pool of statements. They even supply the PROSPERO registration number for that review. From the review, they used reflexive thematic analysis to derive 56 candidate statements answering the question, “What are the core requirements of frailty screening in the ED?”. This level of detail about how the Round 1 statements were generated (including mention of using Braun and Clarke’s thematic analysis framework and involving an external expert to review the themes) is commendable. It means the foundation of the Delphi – the statements – was based on systematically gathered evidence rather than arbitrary guesses, and others could scrutinise or reproduce the process.
  • Methodology Description: All aspects of the Delphi execution are reported: number of rounds, timing, response scales (5-point Likert from “Strongly agree” to “Strongly disagree”), the platform used, and the inclusion of free-text feedback opportunities. The decision rules for progressing or dropping statements (the 80% consensus rule) are plainly stated. Using a flow chart (Figure 1) to illustrate these stages meets the recommendation to include a visual “procedure” summary. Together, these allow the reader to follow exactly how the study was conducted and to identify any potential deviations from standard Delphi methods (none of significance was apparent).
  • Results Availability: The paper provides comprehensive results. Each Delphi round’s outcome is summarised, and the final consensus statements are all listed (Table 2), as are the rejected statements (Table 3). By giving the full list of what was agreed and what was not, the authors ensure that the guidance arising from this Delphi is completely visible and traceable. Anyone reading the paper can see the consensus statements forming the basis of recommendations for ED frailty screening. This satisfies the idea that the “resulting guidance should be clearly identifiable from the publication”. Moreover, it aids replicability: future researchers or guideline developers could use these 19 statements as a starting point or compare them to their own setting.
  • Replicability: If another team wanted to replicate or build on this study (say, conducting a similar Delphi in another region or a few years later), the information provided would allow it. The participant selection criteria, the threshold for agreement, the thematic categories (principles, logistics, domains), and even the survey instrument (56 initial statements given in the Supplementary Appendix concept map) are essentially documented. The transparency in reporting aligns with CREDES and other consensus reporting guidelines, which emphasise complete and clear methodological descriptions for reproducibility.

In short, the study’s reporting is thorough and transparent. There is little ambiguity about what was done or how conclusions were drawn. This transparency not only builds trust in the findings but also means the work can inform practice and further research in a reliable way.

Limitations and Potential Biases

The authors acknowledge several limitations of their study, and it’s important to consider these when interpreting the results. They also reflect on possible biases, in line with CREDES guidance that calls for discussing study limitations and their impact. Key limitations and biases include:

  • Panel Representativeness: Despite international participation, the panel skewed toward experts from Western countries (Europe, North America, Australia) and had limited input from Asia, Africa, or other regions. There were also fewer nurses and allied health professionals within the panel than physicians. This means the consensus might under-represent viewpoints from those regions or professional groups. For instance, ED nurses in Australia/NZ – often implementing frailty screening – had a smaller voice in the panel, which could influence certain practical considerations. The authors note that some perspectives “may not have been included” due to this composition and suggest that further validation with a more diverse panel would be beneficial.
  • Anchoring Bias of Pre-defined Statements: The Delphi statements in Round 1 were derived from prior research rather than generated de novo by the experts. The study team gave the panel a set of “pre-written” statements to rate. This approach can introduce an anchoring effect – experts’ thinking might be influenced or constrained by the wording and content of those initial statements. The authors explicitly recognise this as a limitation: because they did not start with an open-ended first round, participants’ responses were anchored to the presented items. They did try to mitigate this by allowing free-text comments and adding new statements suggested by the panel in Round 2. Still, the risk remains that some novel ideas were not captured if they weren’t in the initial literature-derived list.
  • No Final Consensus Meeting: The study did not include a face-to-face or virtual consensus meeting after the survey rounds. Some Delphi methodologies incorporate a meeting to discuss and resolve any remaining disagreements or to finalise wording. The authors cite literature that a majority view can unduly influence minorities in face-to-face settings and that anonymity helps focus on content. Thus, they opted to rely solely on questionnaire rounds. They admit that holding a final meeting “may have changed the nature of the consensus” – possibly it could have resolved some borderline cases or, conversely, led to the dominance of certain opinions. Without a meeting, the consensus statements stand purely on the survey results. This isn’t a fatal flaw, but it means any nuanced differences were not debated in real time; some Delphi methodologists consider at least an opportunity for synchronous discussion useful, especially for complex or contentious items.
  • Other biases: The Delphi process itself carries the inherent limitation that consensus does not equal truth. As noted in methodological guidance, consensus reflects agreement, not necessarily empirical correctness. For example, the experts agreed that screening is worthwhile and should be done early, but there is still “uncertainty concerning [its] efficacy… and cost-effectiveness”. The study provides expert opinions on what should be done, but those opinions should ideally be validated by outcomes research in EDs. Another potential bias is selection bias: experts who were invited and chose to participate might be especially passionate or optimistic about frailty screening (given many are authors of frailty research). This could bias the results toward favouring frailty screening. The flip side is that if any invited experts were sceptical and did not participate, their dissenting views wouldn’t be captured. The high response rate mitigates this somewhat, but it’s worth noting that consensus guidelines might appear more uniformly positive due to such self-selection.

The authors’ “Strengths and limitations” section highlights the high response rate and extensive feedback as strengths, and the above points as limitations. They suggest that the consensus statements will require further validation and evaluation in practice and possibly with different panels. This is a prudent caveat: before EDs universally adopt these 19 requirements, piloting them in real-world settings (like some Australian or New Zealand EDs) would be wise to confirm they indeed improve frailty identification and outcomes.

While the study is methodologically robust, decision-makers should remain aware of these limitations. The consensus is strong among the experts involved. Still, it may not account for every perspective, and some agreed-upon “requirements” might face challenges in implementation (e.g. resource constraints or the need for staff training, which the panel did identify as barriers). Recognising biases and limitations helps temper overconfidence and indicates where further work or local adaptation might be needed.

Conclusion and Implications for Practice

This critical appraisal finds that the study by Hubbard et al. (2023) was conducted and reported with high rigour, adhering closely to the CREDES framework for Delphi studies. Each domain was addressed satisfactorily: a clearly stated aim, a justified use of the Delphi method, careful expert recruitment, predefined consensus criteria, thorough iteration and feedback, ethical conduct, and transparent reporting. The authors openly acknowledged the minor methodological limitations (panel composition biases, use of pre-drafted statements, and lack of a final meeting), which are common challenges in consensus studies but not severe flaws. Therefore, we can have considerable confidence in the quality of this research and the credibility of its outputs.

For decision-makers in emergency care (including those in Australia and New Zealand), the Delphi-derived consensus statements provide valuable, expert-vetted guidance on implementing frailty screening in EDs. The results suggest that if an ED is to introduce frailty screening for older patients, it should choose or design a tool that is short, simple, and fits within the first few minutes of patient contact. It should capture a patient’s recent baseline health status, and positive screens should trigger comprehensive geriatric assessment or other interventions without delaying acute care. These sensible and actionable recommendations align with the practical realities of ED operations (e.g., the 4-hour target for ED length of stay in Australia/NZ).

The overall quality of the research means that health administrators and clinical leaders can place a high degree of trust in these consensus recommendations. However, they should also be mindful that consensus guidelines are a form of expert opinion. As such, implementing them should go hand-in-hand with local audits and outcomes monitoring. In practice, an ED in Australia or New Zealand that adopts frailty screening per this study’s guidance should evaluate its impact on patient care and flow, contributing back to the evidence base. In conclusion, Hubbard et al.’s work offers a strong, evidence-informed foundation on which to build frailty screening practices in the ED, and decision-makers can be confident in using these core requirements as a blueprint – while remaining attentive to the need for continued validation and adaptation to their specific context.

Sources:

  • Hubbard RE, et al. (2023/2024). Frailty screening in the emergency department: a consensus study. Age & Ageing.
  • CREDES: Checklist for Conducting and REporting DElphi Studies.
