Psychometric properties of self-report measures of eating disorder cognitions: a systematic review

Hatoum, Amaani H.; Burton, Amy L.; Berry, Sophie L.; Abbott, Maree J.

doi:10.1186/s40337-023-00947-0

Journal of Eating Disorders

Table 1 Criteria of quality of psychometric properties [87]


Property	Definition	Criteria of adequacy
Content validity	The degree to which the content of an instrument is an adequate reflection of the construct to be measured	(+) A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection AND target population and experts were involved in item selection (?) A clear description of above-mentioned aspects is lacking OR only target population involved OR doubtful design or method (−) No target population involvement (0) No information found on target population and experts’ involvement
Internal consistency	The degree to which items are intercorrelated, thus measuring the same construct	(+) Factor analyses performed on adequate sample size (7 times the number of items) AND Cronbach’s alpha(s) or McDonald’s omega(s) between 0.70 and 0.95 for each scale (?) Cronbach’s alpha(s) or McDonald’s omega(s) presented without factor analysis considered OR doubtful design or method (−) Cronbach’s alpha(s) or McDonald’s omega(s) < 0.70 or > 0.95 (0) No information found on internal consistency
Criterion validity	The degree to which the scores of an instrument are an adequate reflection of a ‘gold standard’	(+) Convincing arguments that gold standard is ‘gold’ AND correlation with gold standard ≥ 0.70 (?) ≥ 0.70 correlation presented without convincing arguments that gold standard is ‘gold’ OR doubtful design or method (−) Correlation with gold standard < 0.70 (0) No information found on criterion validity
Construct validity	The degree to which scores on a particular questionnaire relate (or are unrelated) to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured	(+) Explicitly tested for AND at least 75% of the results are in expected direction and size (e.g., reporting the correlation between two measures in the expected direction, or the expected lack of correlation) (?) Doubtful design or method (e.g., not explicitly tested) (−) Less than 75% of results as expected (0) No information found on construct validity
Reproducibility Agreement (test–retest)	The extent to which the scores on repeated measures are close to each other (absolute measurement error)	(+) Test–retest agreement r > 0.70 AND means and standard deviations are presented at both time points (?) r > 0.70 presented without means and standard deviations at both time points OR doubtful design or method (−) Test–retest agreement r < 0.70 (0) No information found on test–retest reliability
Reliability	The extent to which patients can be distinguished from each other, despite measurement errors (relative measurement error)	(+) T tests ICC or weighted Kappa > 0.70 (?) Doubtful design or method (e.g., time interval not mentioned or less valid measure than a Kappa used) (−) ICC or weighted Kappa < 0.70 (0) No information found on reliability
Responsiveness	The ability of an instrument to detect clinically important changes over time in the construct to be measured	(+) Treatment program outlined, and longitudinal expected changes presented AND > 75% of results are as expected OR RR > 1.96 OR AUC > 0.70 (?) Doubtful design or method (−) RR < 1.96 OR AUC < 0.70 (0) No information found on responsiveness
Floor and ceiling effects	The number of respondents who achieved the lowest or highest possible score	(+) < 15% of the respondents achieved the highest or lowest possible scores (?) Doubtful design or method (−) > 15% of the respondents achieved the highest or lowest possible scores (0) No information found on floor and ceiling effects
Interpretability	Degree to which one can assign qualitative meaning to an instrument’s quantitative scores or change in scores	(+) Mean and SD scores presented for at least four relevant subgroups of patients^c (?) Doubtful design or method (e.g., data provided on fewer than four subgroups) (0) No information found on interpretation

Adaptations made to supplement ‘Minimal important change’ (MIC): Criterion 5.1 (Reproducibility: Agreement) was modified so that test–retest reliability is sufficient to receive a positive score. Criterion 7 (Responsiveness) was modified so that MIC is not used. Criterion 9 (Interpretability) was modified so that MIC need not be defined for a positive score
SDC smallest detectable change, LOA limits of agreement, ICC Intraclass correlation, AUC area under the receiver operating characteristics curve, RR Responsiveness Ratio, SD standard deviation
^a + = positive rating; ? = indeterminate rating; − = negative rating; 0 = no information available
^bDoubtful design or method = lacking a clear description of the design or methods of the study, sample size smaller than 50 subjects, or any important methodological weakness in the design or execution of the study
^cTerwee et al. (2007) have used the term ‘patients’ in this table given the original application of these criteria to medical populations. More recently, the quality criteria have been employed to assess measures relevant to a variety of populations, including clinical, non-clinical, and normative samples. Despite not including medical samples in the present review, we have retained the term ‘patients’ here in order to present Terwee et al.’s original criteria
Printed with permission from the publisher: Elsevier
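
As an illustration of how one row of the table could be applied, the internal consistency criterion can be sketched as a small decision rule. This is a minimal sketch, not the review authors' procedure: the function name, its parameters, and the ASCII rating symbols are hypothetical, and the order in which the rules are checked (indeterminate before negative) is an assumption, since the table does not state a precedence. The 50-subject threshold comes from footnote b; other kinds of methodological weakness are not modelled.

```python
from typing import Optional, Sequence


def rate_internal_consistency(
    coefficients: Optional[Sequence[float]],
    n_items: int,
    sample_size: int,
    factor_analysis_done: bool,
) -> str:
    """Return '+', '?', '-' or '0' for the internal consistency row.

    coefficients: Cronbach's alpha or McDonald's omega, one per scale.
    All names here are illustrative, not taken from the source.
    """
    # (0) No information found on internal consistency.
    if not coefficients:
        return "0"

    # Footnote b: fewer than 50 subjects counts as doubtful design.
    doubtful_design = sample_size < 50
    # Adequate sample size is 7 times the number of items.
    adequate_sample = sample_size >= 7 * n_items

    # (?) Assumption: indeterminate is checked before negative, and a
    # factor analysis on an inadequate sample is treated as doubtful.
    if doubtful_design or not factor_analysis_done or not adequate_sample:
        return "?"

    # (+) Every scale's coefficient lies between 0.70 and 0.95.
    if all(0.70 <= c <= 0.95 for c in coefficients):
        return "+"

    # (-) At least one coefficient is below 0.70 or above 0.95.
    return "-"


if __name__ == "__main__":
    # A 20-item measure with two subscales, validated on 200 respondents.
    print(rate_internal_consistency([0.82, 0.91], 20, 200, True))  # prints "+"
```

With the same coefficients but only 100 respondents, the sample would fall below 7 × 20 = 140 and this sketch would return the indeterminate rating instead.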


ISSN: 2050-2974
