IJPTRS

INTRODUCTION:

Hip pain is a common presentation in primary care, and patients of all ages may present with hip discomfort. According to one study, 14.3% of adults aged 60 years and older reported significant hip pain.¹ Hip pain typically affects the groin, upper thigh, or buttock. The most frequent diagnoses in children and adolescents are apophyseal avulsion, avascular necrosis of the femoral head, and slipped capital femoral epiphysis. Trauma can result in femoral head fractures, common muscle and tendon sprains, and bursitis. Septic or inflammatory arthritis can develop at any age. True emergencies include septic arthritis, fractures, and acute epiphyseal slippage. Congenital abnormalities of the hip joint can cause labral tears and progress to early osteoarthritis.²,³

According to the National Arthritis Data Workgroup, drawing on the first National Health and Nutrition Examination Survey (NHANES I), adult women are more likely than adult men to have hip osteoarthritis (OA), with 0.5% reporting moderate or severe symptoms. The prevalence of symptomatic hip OA increased with age in both sexes up to the age of 80, after which it declined somewhat. In a study of patients in a health maintenance organization, the incidence of symptomatic hip OA in older women was 239 per 100,000 person-years at ages 60 to 69, 583 per 100,000 person-years at ages 70 to 79, and 441 per 100,000 person-years in women older than 80 years.² In older persons, degenerative OA and fractures should be considered first.⁴ The American Academy of Orthopaedic Surgeons developed a clinical guideline on the assessment of hip pain. This guideline, while helpful, addresses only three diagnoses (inflammatory arthritis, osteoarthritis, and avascular necrosis) and does not cover the numerous other causes of hip pain that may be brought to the attention of a primary care physician.⁴ It is important to ask about any referred pain as well as the duration, location, severity, and character of the pain.⁵,⁶ Although a wide range of disorders can affect the hip and its surrounding tissues, physical examination remains one of the most powerful diagnostic tools available to physicians.⁷

A test is considered reliable when one or more clinicians administering it arrive at the same conclusion on each administration. Formally, reliability is subject variability divided by the sum of subject variability and measurement error.⁸ Reliability of test methods is also essential. Results from physical examinations must be highly reliable between examiners in order to be therapeutically relevant. Their use in creating management plans should not be endorsed until this evidence is available. If inter-tester reliability is low, management choices made following the physical examination are based on incorrect assessments. The validity of physical examination tests used to examine the hip was examined in this systematic study.⁹
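The reliability ratio described above can be written as a short function. This is a minimal sketch: the function name and the variance values are hypothetical and are not taken from any included study.

```python
def reliability_coefficient(subject_variance: float, error_variance: float) -> float:
    """Reliability = subject variance / (subject variance + error variance)."""
    total = subject_variance + error_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return subject_variance / total


# Hypothetical variances, for illustration only.
print(reliability_coefficient(9.0, 1.0))  # 0.9
```

As measurement error shrinks relative to the true differences between subjects, the coefficient approaches 1.00.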

No study has attempted to investigate the reliability of the different physical examination procedures commonly used for hip problems. Thus, the aim of this review was to evaluate the reliability of the different examination procedures used in the assessment of hip problems.

METHODOLOGY

Searches of MEDLINE (January 1933 to August 2023), PEDro, AMED, the Wiley Library (2020) and CINAHL (1982 to August 2023) were conducted using the terms in Box 1, grouped into three subject areas: hip problems, reliability and physical examination.

Box 1: Search terms

Hip problems	Arthritis of the hip, Femoroacetabular Impingement (FAI), Hip bursitis, Hip dislocation, Hip dysplasia, Hip fracture, Osteoarthritis, Rheumatoid arthritis, bursitis, tendons, instability and hip.
Reliability	Reliability, reproducibility, inter examiner, inter-examiner, inter tester, inter-tester, inter observer, inter-observer, intra tester, intra-tester, kappa, intra class correlation, intra-class correlation, ICC.
Physical examination	Assessment, physical examination, physical tests, clinical examination, manual examination.

Terms within each subject area (hip problems, reliability, and physical examination) were first searched individually and then combined using the Boolean operator OR. Finally, the three subject areas were joined using the Boolean operator AND. The reference lists of the papers identified by the electronic search were also searched manually. Included articles had to meet the following requirements:

The study involved physical examination procedures used in hip examination.
The study involved human subjects with any sort of hip pain.
The study had to be an intra- and/or inter-examiner reliability design.
The study had to be available in English.
The study did not involve a mechanical device, but simple tape measures were accepted as commonly available.
The study did not involve subjects with non-musculoskeletal conditions.
The study did not involve asymptomatic volunteers alone, although studies including a mix of symptomatic and asymptomatic participants were included.
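The search strategy described above (OR within each subject area, AND across the three areas) can be sketched as follows. The function name, the quoting style and the subset of terms are assumptions for illustration; the exact syntax accepted by each database is not stated in the text.

```python
from typing import Dict, List


def build_query(groups: Dict[str, List[str]]) -> str:
    """Join terms with OR inside each subject area, then join areas with AND."""
    blocks = []
    for terms in groups.values():
        quoted = ['"{}"'.format(term) for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# A small subset of the Box 1 terms, chosen only to keep the example short.
search_groups = {
    "hip problems": ["hip", "hip dysplasia", "hip fracture"],
    "reliability": ["reliability", "kappa", "ICC"],
    "physical examination": ["physical examination", "clinical examination"],
}

query = build_query(search_groups)
print(query)
```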

All retrieved abstracts were screened by the reviewers, who decided which studies were relevant; those that were clearly not relevant were discarded. Studies proceeded to the next stage if they were clearly reliability studies involving hip examination or if the abstract lacked enough detail to determine whether this was the case. Full texts were then obtained. Each paper was judged by at least two reviewers, who then reached a decision. The two reviewers discussed and clarified the criteria checklist and the items for data extraction together. The pairs of reviewers recorded kappa values of 0.79 and 0.86, which were deemed acceptable.³⁵
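Agreement between two reviewers of the kind reported above is commonly summarised with Cohen's kappa. The sketch below shows the calculation on hypothetical include/exclude decisions; it does not reproduce the reviewers' actual data, and the function name is an assumption.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten abstracts.
reviewer_1 = ["in", "in", "out", "out", "in", "out", "out", "in", "out", "out"]
reviewer_2 = ["in", "in", "out", "out", "out", "out", "out", "in", "out", "out"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))  # 0.78
```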

Criteria checklist

There are no set or generally accepted criteria for evaluating the quality of reliability studies. A previously developed criteria checklist with three categories (study population, test technique, and test results) was used.¹⁰ Recent reviews have modified this set of criteria.¹¹⁻¹³,³⁵ The weighted criteria from a prior systematic review were applied to evaluate study quality. The criteria are listed in the footnote of Table 2. A study was deemed to be of higher quality if it scored over 60%, as in a prior systematic review, and these studies are indicated in bold. The maximum score was 100 points.¹¹,³⁵

Study selection flow:

Potentially relevant citations after hand and electronic search (N = 196).

Screening of abstracts by two researchers (N = 82).

Screening of full articles (N = 33).

13 relevant full articles reviewed.

DATA ANALYSIS

For nominal data, kappa is used as the reliability coefficient; for ordinal data, weighted kappa; and for continuous data, the Bland-Altman method or the intra-class correlation coefficient (ICC).¹⁴,³⁵ The numerical values for kappa and ICC range from 0.00 to 1.00. Kappa was interpreted as follows: 0.00 to 0.20, poor or slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, excellent agreement.¹⁴⁻¹⁶,³⁵
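The kappa bands listed above can be expressed as a small lookup. This is a sketch only: the function name and the short labels are assumptions ("substantial" stands for the 0.61 to 0.80 band), and values below 0.00, which the text does not discuss, are treated here as poor agreement.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the agreement band used in this review."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.00")
    if kappa <= 0.20:
        return "poor"  # assumption: negative values are also reported as poor
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


# The two inter-reviewer kappa values reported in the Methodology.
for value in (0.79, 0.86):
    print(value, interpret_kappa(value))
```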

Similarly, for the ICC, reliability increases as the value approaches 1.00. Values from 0.40 to 0.75 are considered fair to good, and values >0.90 are considered excellent.¹⁷ Higher thresholds have also been suggested, particularly when examining individuals as opposed to groups, for which coefficients of 0.85 or 0.90 are considered reasonable.¹⁶⁻¹⁹ Because of the heterogeneity of the tests, patients and analyses, direct comparison of reliability studies was deemed unsuitable.¹⁴ Data were therefore combined using the qualitative levels of evidence approach, adapted from van Tulder et al., as shown in Table 1.²⁰,³⁵

Level of evidence
Strong	Consistent findings from three or more high-quality studies
Moderate	Consistent findings from at least one high-quality study and a number of low-quality studies
Limited	Consistent findings in one or more low-quality studies
Conflicting	Inconsistent findings irrespective of study quality
No evidence	No studies found

Table 1 Levels of evidence
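The rules in Table 1 can be sketched as a simple classifier. The function name and argument names are hypothetical. One assumption is labelled in the code: consistent findings from one or two high-quality studies are treated as moderate evidence even when no low-quality studies are available, a case Table 1 does not spell out.

```python
def level_of_evidence(n_high: int, n_low: int, consistent: bool) -> str:
    """Classify the evidence for one test following the Table 1 levels."""
    if n_high + n_low == 0:
        return "No evidence"
    if not consistent:
        return "Conflicting"
    if n_high >= 3:
        return "Strong"
    if n_high >= 1:
        # Assumption: fewer than three high-quality studies count as moderate,
        # with or without supporting low-quality studies.
        return "Moderate"
    return "Limited"


print(level_of_evidence(n_high=1, n_low=2, consistent=True))   # Moderate
print(level_of_evidence(n_high=2, n_low=1, consistent=False))  # Conflicting
```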

RESULT

After the full papers were reviewed, many of the studies initially identified by the search were excluded. Finally, the search identified 13 studies that satisfied the inclusion criteria and were included. Numerous physical examination techniques were examined. Five studies were rated as high quality (scoring >60%); the mean quality score was 55.15% (Table 2).

1, adequate description of study population (0/4); 2, representative of clinical practice (0/4); 3, subjects selected randomly or consecutively (0/7); 4, number of subjects (25 = 3, >50 = 6; >75 or sample size calculation); 5, procedure clearly described and reproducible (0/5); 6, procedure executed in uniform manner (0/5); 7, adequate measures to reduce bias (0/10); 8, adequate description of examiners (0/10); 9, consensus procedure prior to testing or pilot study (0/5); 10, more than one pair of examiners tested (0/10); 11, multiple testing between examiners; 12, standardised measure of test outcome (0/5); 13, frequencies of outcome and agreement reported (0/10); 14, appropriate inferential statistics and measure of variance (0/10). Bold results indicate >60%/high quality.

STUDY	1	2	3	4	5	6	7	8	9	10	11	12	13	14	Total
Wilson 2014⁴	1	4	0	0	5	5	0	5	0	0	0	5	5	5	35
Jolanda 2008²¹	3	4	7	0	5	5	10	5	0	0	5	5	10	10	69
Charles,2013²²	3	4	0	0	5	5	0	5	0	5	0	5	10	5	47
Robroy l., 2015²³	2	4	0	6	5	5	10	5	0	0	5	5	10	5	62
Hananouchi T,2012²⁴	4	4	0	6	5	5	0	5	0	0	0	5	10	5	49
Reese NB, 2003²⁵	4	4	0	6	5	5	5	10	0	10	2	5	5	5	66
Melchione,1993²⁶	2	4	0	0	5	5	5	5	0	10	5	5	5	5	56
Haskel, 2020²⁷	2	4	0	10	5	5	0	10	0	10	0	0	5	0	51
J. Peeler,2007²⁸	4	4	0	6	5	5	0	10	5	0	5	5	10	5	64
Jason D., 2008²⁹	4	4	0	6	5	5	5	10	5	0	5	5	10	10	74
Troelsen A,2009³⁰	4	4	0	0	5	5	0	5	0	0	0	5	10	5	43
Nogier A, 2010³¹	2	0	0	10	0	0	0	10	0	0	0	5	5	0	32
Wakefield CB,2015³²	4	2	0	0	5	5	5	5	0	0	5	5	10	10	56
St-pierre, 2020³³	4	4	0	0	5	5	0	5	0	0	5	5	10	5	48
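The totals and the high-quality flag in Table 2 can be recomputed from the item scores. The sketch below transcribes the 14 criterion scores for each row and applies the >60 threshold; the variable names are assumptions.

```python
# Item scores for criteria 1 to 14, transcribed row by row from Table 2.
scores = {
    "Wilson 2014":        [1, 4, 0, 0, 5, 5, 0, 5, 0, 0, 0, 5, 5, 5],
    "Jolanda 2008":       [3, 4, 7, 0, 5, 5, 10, 5, 0, 0, 5, 5, 10, 10],
    "Charles, 2013":      [3, 4, 0, 0, 5, 5, 0, 5, 0, 5, 0, 5, 10, 5],
    "Robroy l., 2015":    [2, 4, 0, 6, 5, 5, 10, 5, 0, 0, 5, 5, 10, 5],
    "Hananouchi T, 2012": [4, 4, 0, 6, 5, 5, 0, 5, 0, 0, 0, 5, 10, 5],
    "Reese NB, 2003":     [4, 4, 0, 6, 5, 5, 5, 10, 0, 10, 2, 5, 5, 5],
    "Melchione, 1993":    [2, 4, 0, 0, 5, 5, 5, 5, 0, 10, 5, 5, 5, 5],
    "Haskel, 2020":       [2, 4, 0, 10, 5, 5, 0, 10, 0, 10, 0, 0, 5, 0],
    "J. Peeler, 2007":    [4, 4, 0, 6, 5, 5, 0, 10, 5, 0, 5, 5, 10, 5],
    "Jason D., 2008":     [4, 4, 0, 6, 5, 5, 5, 10, 5, 0, 5, 5, 10, 10],
    "Troelsen A, 2009":   [4, 4, 0, 0, 5, 5, 0, 5, 0, 0, 0, 5, 10, 5],
    "Nogier A, 2010":     [2, 0, 0, 10, 0, 0, 0, 10, 0, 0, 0, 5, 5, 0],
    "Wakefield CB, 2015": [4, 2, 0, 0, 5, 5, 5, 5, 0, 0, 5, 5, 10, 10],
    "St-pierre, 2020":    [4, 4, 0, 0, 5, 5, 0, 5, 0, 0, 5, 5, 10, 5],
}

HIGH_QUALITY_THRESHOLD = 60  # a study scoring over 60 is treated as high quality

totals = {study: sum(items) for study, items in scores.items()}
high_quality = [study for study, total in totals.items()
                if total > HIGH_QUALITY_THRESHOLD]

for study, total in totals.items():
    flag = "high quality" if study in high_quality else ""
    print(study, total, flag)
```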

Problem: Hip arthritis
Tests: Gait; External rotation gait; Muscle strength; Trendelenburg test; Leg length discrepancy; Hip pain; Patrick test; Log roll test; Internal rotation movement; Range of motion; Thomas test (hip flexion); Ober's test (IT band tightness)
References: Jolanda, 2008²¹; Jolanda, 2008
Statistics: 0.50 (PABAK); 0.50; 0.80 (reliability coefficient); 0.78-0.80 (PABAK); 0.60-0.52; 0.88; 0.80

Problem: Iliopsoas tenderness
Tests: Groin pain; Snapping hip; Pain with resisted SLR
References: Haskel, 2020³⁴; Haskel, 2020
Statistics: sensitivity 100%, specificity 7%; specificity 82%, sensitivity 24%; sensitivity 62%, specificity 25%

Problem: Femoral acetabulum impingement
Tests: Log roll test; FABER test; Hip IR pain; Posterior impingement test; FADIR; Anterior impingement test; Straight leg raises
References: Charles, 2013²²; Robroy l., 2015²³; Wilson JJ., 2014⁴; Charles, 2013²²; Robroy l., 2015²³; Wilson JJ., 2014⁴; Charles, 2013²²; Charles, 2013; Wilson JJ., 2014; Charles, 2013; Robroy l., 2015; Wilson JJ., 2014
Statistics: 0.99; 0.61 (0.41-0.8); sensitivity 56%; 0.84; 0.63 (0.43-0.8); sensitivity 96% to 99%; 0.84; 0.81; 0.78; sensitivity 88%; 0.76; 0.58 (0.29-0.8); sensitivity 30%

Trochanteric tenderness
References: Robroy l., 2015
Statistics: 0.66 (0.48-0.8)

Problem: Labral lesion
Tests: Anterior impingement test; Impingement test; FABER test; FADIR test
References: Hananouchi T, 2012²⁴; Hananouchi T, 2012; Troelsen A, 2009³⁰; Wilson JJ., 2014⁴
Statistics: sensitivity 53.1, specificity 81.9, positive predictive value 92.9, negative predictive value 27.1; sensitivity 59%, specificity 100%, positive predictive value 100%, negative predictive value 13%; sensitivity 41%, specificity 100%, positive predictive value 100%, negative predictive value 9%; sensitivity 88%; sensitivity 96% to 75%

Problem: IT band
Tests: Ober test; Range of motion of the hip (with the Ober test); Range of motion of the hip (with the modified Ober test)
References: Reese NB, 2003²⁵; William E., 1993²⁶; Reese NB, 2003²⁵; Reese NB, 2003
Statistics: intrarater reliability 0.90; intrarater reliability 0.91; intrarater reliability 0.94; interrater reliability 0.73; 18.9° ± 7.6°; 23.4° ± 7.0°

Problem: Rectus femoris muscle
Tests: Thomas test; Modified Thomas test; Goniometry technique; Trigonometry technique
References: Peeler, 2008²⁹; Peeler J, 2007²⁸; Wakefield CB, 2015³²; Wakefield CB, 2015
Statistics: reliability: intrarater = 0.40, interrater = 0.33, intrarater = 0.67, interrater = 0.50; reliability: intra-rater ℜ = 0.52, inter-rater ℜ = 0.60; reliability: intrarater 0.51 and 0.54, interrater 0.65 and 0.30; reliability: intrarater 0.90 and 0.95, interrater 0.91 and 0.94
Variance: Bland-Altman plots: difference between test and retest scoring is −1.2 ± 18 (average of 3 examiners), or between −19.2° and 16.8°; ANOVA = 19; coefficient of variance, %: 37, 34; 25, 25

Problem: Femoroacetabular impingement
Tests: Pain predominantly in flexion/internal rotation (%); Pain exclusively in flexion/internal rotation (%); Pain-free flexion amplitude influenced by internal rotation (%)
References: Nogier A, 2010³¹; Nogier A, 2010
Statistics: sensitivity 70, specificity 44, PPV 63, NPV 53; sensitivity 20, specificity 86, PPV 67, NPV 44; sensitivity 51, specificity 67, PPV 67, NPV 51

Problem: Hip ROM
Tests: Hip IR ROM; flexion-adduction-internal rotation (FADIR); flexion-abduction-external rotation-extension (FABER) test; hip internal rotation in 90° of hip flexion (HIP IR); FADIR
References: St-Pierre, 2020³³; St-Pierre, 2020; Margo, 2003³⁶; St-Pierre, 2020³³
Statistics: reliability 0.83 [0.53-0.94]; reliability 0.75 [0.60-0.89]; reliability 0.71 [0.42-0.87]; FABER ROM reliability 0.62 [0.27-0.83]; reliability of FABER variables, ROM: 0.58 [0.32-0.79]; sensitivity 88%; reliability 0.72 [0.51-0.87]; reliability 0.57 [0.32-0.78]
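Several entries above report sensitivity, specificity and predictive values rather than reliability coefficients. As a reminder of how those four figures relate to a 2x2 table of test result against reference standard, here is a minimal sketch; the counts are hypothetical and are not drawn from any of the included studies.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp/fn: reference-positive subjects with a positive/negative test.
    fp/tn: reference-negative subjects with a positive/negative test.
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
result = diagnostic_accuracy(tp=40, fp=10, fn=20, tn=30)
for name, value in result.items():
    print(name, round(value, 2))
```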

DISCUSSION

This article reviewed 14 studies and evaluated the reliability of physical assessment tests for hip pain, covering mobility tests and pain provocation tests. Despite our best efforts to collect all available papers, it remains possible that the authors missed unpublished studies whose findings could differ from those in this study (publication bias). One article by an Indian author was shortlisted but, because the full text was unavailable, it was not included.

On quality assessment, five studies were judged to be of excellent quality (scores of 60% or above), three of moderate quality, and six of limited quality.

This systematic review discusses each condition and its related tests, together with their reliability, sensitivity and specificity for that condition. For example, for the rectus femoris muscle, the goniometer method shows moderate reliability and the trigonometry method shows the highest reliability.³² There are also many well-established special tests, such as Ober's test for IT band tightness, which shows a reliability of 90%.²⁶

VALIDITY AND RELIABILITY

Although several systematic studies of the reliability of these tests have been published,²¹⁻³⁶ the authors are aware of no prior systematic reviews of hip physical examination protocols. These studies showed that many tests generally have low levels of sensitivity and/or specificity. Indeed, the present review found that the reliability data were inconclusive for all tests, even those for which these reviews found some data.

RELIABILITY OF PHYSICAL EXAMINATION PROCEDURES IN GENERAL

Numerous systematic reviews of physical examination techniques for the hip joint have been published.²¹⁻³³,³⁵⁻³⁶ The findings of some of these investigations, which identified generally low levels of reliability, were quite comparable to the findings of the present review. However, some of the reviews also found that procedures relying on symptom response rather than movement or palpation had higher levels of reliability. Many studies also reported technique-based reliability, which was higher. The current review did not find that methods based on symptom response were more reliable.

LIMITATION

The strengths of the current review are its systematic nature, the use of multiple reviewers, the inclusion of studies with symptomatic patients, and the application of a high threshold for reliability. The outcomes, as with all systematic reviews, depend on the papers that were included. Some of the quality-scoring criteria were somewhat ambiguous, which led to discussion among the reviewers; for six publications, the final decision depended on agreement between two reviewers. Reported coefficients are also affected by the prevalence and bias indices: a high prevalence or a homogeneous population would deflate coefficient values, whereas a strong observer expectation bias would inflate them, and vice versa.
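The effect of prevalence on kappa mentioned above can be illustrated with a 2x2 agreement table. The sketch below computes Cohen's kappa, the prevalence index, the bias index and the prevalence-adjusted bias-adjusted kappa (PABAK = 2 x observed agreement - 1). The function name and the counts are hypothetical; they are chosen only to show a high-prevalence sample in which observed agreement is high but kappa is low.

```python
def agreement_indices(a: int, b: int, c: int, d: int) -> dict:
    """Agreement statistics for two raters and a positive/negative test.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return {
        "observed_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
        "pabak": 2 * observed - 1,
    }


# Hypothetical counts: almost every subject is rated positive by both raters.
indices = agreement_indices(a=45, b=3, c=2, d=0)
for name, value in indices.items():
    print(name, round(value, 3))
```

With these counts the raters agree on 90% of subjects, yet kappa falls slightly below zero while PABAK is 0.8, which is the deflation by high prevalence that the paragraph describes.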

CONCLUSION

In conclusion, 13 reliability studies have investigated the physical examination techniques used to evaluate patients with hip pain. The evidence for their reliability was conflicting, and the majority of them failed to meet the standards for acceptable reliability. Making a diagnosis based on these procedures is therefore an error-prone and inconsistent process.

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to Dr. Jyoti Kataria for their invaluable contribution to this project. Their guidance and support played a crucial role in shaping the project's direction and improving the quality of the publication.

I am also deeply thankful to my family and batchmates for their contribution. Their contribution significantly enhanced the publication.

I am truly fortunate to have had the support of such dedicated individuals throughout this project. Their expertise and encouragement have been instrumental in its success.

Thank you all for your unwavering support and guidance.

REFERENCES
