Table 1 Criteria of quality of psychometric properties [87]

From: Psychometric properties of self-report measures of eating disorder cognitions: a systematic review

Property

Definition

Criteria of adequacy

Content validity

The degree to which the content of an instrument is an adequate reflection of the construct to be measured

(+) A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection AND target population and experts were involved in item selection

(?) A clear description of above-mentioned aspects is lacking OR only target population involved OR doubtful design or method

(−) No target population involvement

(0) No information found on target population and experts’ involvement

Internal consistency

The degree to which items are intercorrelated, thus measuring the same construct

(+) Factor analyses performed on adequate sample size (7 times the number of items) AND Cronbach’s alpha(s) or McDonald’s omega(s) between 0.70 and 0.95 for each scale

(?) Cronbach’s alpha(s) or McDonald’s omega(s) presented without factor analysis considered OR doubtful design or method

(−) Cronbach’s alpha(s) or McDonald’s omega(s) < 0.70 or > 0.95

(0) No information found on internal consistency

Criterion validity

The degree to which the scores of an instrument are an adequate reflection of a ‘gold standard’

(+) Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥ 0.70

(?) ≥ 0.70 correlation presented without convincing arguments that gold standard is “gold” OR doubtful design or method

(−) Correlation with gold standard < 0.70

(0) No information found on criterion validity

Construct validity

The degree to which scores on a particular questionnaire relate (or are unrelated) to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured

(+) Explicitly tested for AND at least 75% of the results are in expected direction and size (e.g., reporting the correlation between two measures in the expected direction, or the expected lack of correlation)

(?) Doubtful design or method (e.g., not explicitly tested)

(−) Less than 75% of results as expected

(0) No information found on construct validity

Reproducibility: Agreement (test–retest)

The extent to which the scores on repeated measures are close to each other (absolute measurement error)

(+) Test–retest agreement r > 0.70 AND means and standard deviations are presented at both time points

(?) > 0.70 correlation presented without means and standard deviations at both time points OR doubtful design or method

(−) Test–retest agreement r < 0.70

(0) No information found on test–retest reliability

Reproducibility: Reliability

The extent to which patients can be distinguished from each other, despite measurement errors (relative measurement error)

(+) T tests ICC or weighted Kappa > 0.70

(?) Doubtful design or method (e.g., time interval not mentioned, or a measure less valid than Kappa used)

(−) ICC or weighted Kappa < 0.70

(0) No information found on reliability

Responsiveness

The ability of an instrument to detect clinically important changes over time in the construct to be measured

(+) Treatment program outlined, and longitudinal expected changes presented AND > 75% of results are as expected OR RR > 1.96 OR AUC > 0.70

(?) Doubtful design or method

(−) RR < 1.96 OR AUC < 0.70

(0) No information found on responsiveness

Floor and ceiling effects

The number of respondents who achieved the lowest or highest possible score

(+) < 15% of the respondents achieved the highest or lowest possible scores

(?) Doubtful design or method

(−) > 15% of the respondents achieved the highest or lowest possible scores

(0) No information found on floor and ceiling effects

Interpretability

Degree to which one can assign qualitative meaning to an instrument’s quantitative scores or change in scores

(+) Mean and SD scores presented for at least four relevant subgroups of patients (c)

(?) Doubtful design or method (e.g., data provided on less than four subgroups)

(0) No information found on interpretation

  1. Adaptations made to supplement ‘Minimal important change’ (MIC): Criterion 5.1 (Reproducibility—Agreement) modified such that test–retest reliability is sufficient to receive a positive score. Criterion 7 (Responsiveness) modified such that MIC not utilised. Criterion 9 (Interpretability) modified such that MIC not needed to be defined for a positive score
  2. SDC smallest detectable change, LOA limits of agreement, ICC Intraclass correlation, AUC area under the receiver operating characteristics curve, RR Responsiveness Ratio, SD standard deviation
  3. (a) + = positive rating; ? = indeterminate rating; − = negative rating; 0 = no information available
  4. (b) Doubtful design or method = lacking a clear description of the design or methods of the study, sample size smaller than 50 subjects, or any important methodological weakness in the design or execution of the study
  5. (c) Terwee et al. (2007) have used the term ‘patients’ in this table given the original application of these criteria to medical populations. More recently, the quality criteria have been employed to assess measures relevant to a variety of populations, including clinical, non-clinical, and normative samples. Despite not including medical samples in the present review, we have retained the term ‘patients’ here in order to present Terwee et al.’s original criteria
  6. Printed with permission from the publisher: Elsevier
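
The threshold rules in Table 1 can be expressed as simple decision functions. The following is a minimal Python sketch, not part of the original review, covering three of the properties (internal consistency, criterion validity, and floor and ceiling effects). The function names, the parameters, and the `doubtful_design` flag are hypothetical. The table does not say which rating applies when several criteria are met at once, so the order of the checks (no information, then doubtful design, then the negative threshold) is an assumption, as are treating "between 0.70 and 0.95" as inclusive, checking floor and ceiling percentages separately, and rating a value of exactly 15% as indeterminate.

```python
from typing import Optional, Sequence

# Rating symbols as defined in the table footnote.
POSITIVE, INDETERMINATE, NEGATIVE, NO_INFO = "+", "?", "-", "0"


def rate_internal_consistency(
    coefficients: Sequence[float],
    n_items: Optional[int] = None,
    sample_size: Optional[int] = None,
    factor_analysis: bool = False,
    doubtful_design: bool = False,
) -> str:
    """Rate internal consistency from per-scale alpha or omega values."""
    if not coefficients:
        return NO_INFO
    if doubtful_design:  # assumption: doubtful design is checked first
        return INDETERMINATE
    if any(c < 0.70 or c > 0.95 for c in coefficients):
        return NEGATIVE
    # Adequate sample size: at least 7 times the number of items.
    adequate_sample = (
        factor_analysis
        and n_items is not None
        and sample_size is not None
        and sample_size >= 7 * n_items
    )
    return POSITIVE if adequate_sample else INDETERMINATE


def rate_criterion_validity(
    correlation: Optional[float],
    gold_standard_justified: bool = False,
    doubtful_design: bool = False,
) -> str:
    """Rate criterion validity from the correlation with the gold standard."""
    if correlation is None:
        return NO_INFO
    if doubtful_design:
        return INDETERMINATE
    if correlation < 0.70:
        return NEGATIVE
    return POSITIVE if gold_standard_justified else INDETERMINATE


def rate_floor_ceiling(
    pct_lowest: Optional[float],
    pct_highest: Optional[float],
    doubtful_design: bool = False,
) -> str:
    """Rate floor and ceiling effects from percentages (0 to 100)."""
    if pct_lowest is None or pct_highest is None:
        return NO_INFO
    if doubtful_design:
        return INDETERMINATE
    worst = max(pct_lowest, pct_highest)  # assumption: each end checked separately
    if worst < 15:
        return POSITIVE
    if worst > 15:
        return NEGATIVE
    return INDETERMINATE  # exactly 15% is not covered by the table
```

For example, under these assumptions `rate_internal_consistency([0.82, 0.91], n_items=10, sample_size=120, factor_analysis=True)` returns `"+"`, while the same coefficients reported without a factor analysis return `"?"`.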