Item Property and Reason for Change or Deletion

Clarity or relevance
• Reported as not relevant by a large segment of the target population
• Generates an unacceptably large amount of missing data
• Generates many questions or requests for clarification from patients as they complete the PRO instrument
• Patients interpret items and responses in a way that is inconsistent with the PRO instrument’s conceptual framework

Response range
• A high percentage of patients respond at the floor (the response scale’s worst end) or ceiling (the response scale’s optimal end)
• Patients note that none of the response choices applies to them
• Distribution of item responses is highly skewed

Variability
• All patients give the same answer (i.e., no variance)
• Most patients choose only one response choice
• Differences among patients are not detected when important differences are known

Reproducibility
• Unstable scores over time when there is no logical reason for variation from one assessment to the next

Inter-item correlation
• Item is highly correlated (redundant) with other items in the same concept of interest

Ability to detect change
• Item is not sensitive (i.e., does not change when there is a known change in the concept of interest)

Item discrimination
• Item is highly correlated with measures of concepts other than the one it is intended to measure
• Item does not show variability in relation to known population characteristics (e.g., severity level, classification of condition, or other known characteristic)

Redundancy
• Item duplicates information collected with other items that have equal or better measurement properties

Recall period
• The population, disease state, or application of the instrument can affect the appropriateness of the recall period
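Several of these review criteria (missing data, floor and ceiling effects, lack of variability, skewness) are typically screened with simple descriptive statistics. The guidance does not prescribe any software or method of computation; the sketch below is purely illustrative, written in Python, and assumes a single item scored on a hypothetical 5-point (0 to 4) scale with NaN marking missing responses. The function name and scale endpoints are assumptions, not part of the guidance.

```python
import numpy as np

def item_review_stats(responses: np.ndarray) -> dict:
    """Descriptive screens mirroring the item-review criteria above.

    responses: one item's scores across patients (NaN = missing),
    assumed to lie on a 0-4 ordinal scale.
    """
    scale_min, scale_max = 0, 4  # assumed 5-point scale
    answered = responses[~np.isnan(responses)]
    stats = {
        "missing_rate": 1 - answered.size / responses.size,   # missing data
        "floor_pct": float(np.mean(answered == scale_min)),   # worst end
        "ceiling_pct": float(np.mean(answered == scale_max)), # optimal end
        "variance": float(answered.var(ddof=1)),              # variability
    }
    # Skewness (Fisher definition), computed without scipy:
    z = (answered - answered.mean()) / answered.std()
    stats["skewness"] = float(np.mean(z ** 3))
    return stats

# Example: an item loaded at the ceiling, with one missing response
item = np.array([4, 4, 4, 3, 4, np.nan, 4, 4, 2, 4])
print(item_review_stats(item))
```

A high ceiling percentage or near-zero variance would flag the item against the response-range and variability criteria above.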
Measurement Property, Type, What Is Assessed, and FDA Review Considerations

Reliability

Test-retest or intra-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Stability of scores over time when no change is expected in the concept of interest
FDA review considerations:
• Intraclass correlation coefficient
• Time period of assessment

Internal consistency
What is assessed:
• Extent to which items comprising a scale measure the same concept
• Intercorrelation of items that contribute to a score
• Internal consistency
FDA review considerations:
• Cronbach’s alpha for summary scores
• Item-total correlations

Inter-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Agreement among responses when the PRO is administered by two or more different interviewers
FDA review considerations:
• Intraclass correlation coefficient
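The reliability statistics named above have standard closed forms. As a minimal illustrative sketch (not part of the guidance), the Python fragment below computes Cronbach’s alpha, corrected item-total correlations, and a one-way intraclass correlation coefficient for test-retest stability. It assumes a complete-case, patients-by-items (or patients-by-occasions) score matrix; all function names are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total).

    items: patients x items matrix of scores (no missing values)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the total score excluding that item."""
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects ICC(1,1) for test-retest stability.

    scores: patients x occasions matrix (e.g., two administrations)."""
    n, k = scores.shape
    ms_between = k * ((scores.mean(axis=1) - scores.mean()) ** 2).sum() / (n - 1)
    ms_within = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Example with simulated data: four parallel items measuring one concept
rng = np.random.default_rng(0)
true_score = rng.normal(size=(100, 1))
items = true_score + rng.normal(scale=0.5, size=(100, 4))
print(cronbach_alpha(items))        # high alpha: items share one concept
print(corrected_item_total(items))  # each item vs. the rest of the scale
print(icc_oneway(items[:, :2]))     # stability across two assessments
```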
Validity

Content validity
What is assessed: Evidence that the instrument measures the concept of interest, including evidence from qualitative studies that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use. Testing other measurement properties will not replace or rectify problems with content validity.
FDA review considerations:
• Derivation of all items
• Qualitative interview schedule
• Interview or focus group transcripts
• Items derived from the transcripts
• Composition of patients used to develop content
• Cognitive interview transcripts to evaluate patient understanding

Construct validity
What is assessed: Evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with measures of related concepts or scores produced in similar or diverse patient groups
FDA review considerations:
• Strength of correlations testing a priori hypotheses (discriminant and convergent validity)
• Degree to which the PRO instrument can distinguish among groups hypothesized a priori to be different (known-groups validity)

Ability to detect change
What is assessed: Evidence that a PRO instrument can identify differences in scores over time in individuals or groups (similar to those in the clinical trials) who have changed with respect to the measurement concept
FDA review considerations:
• Within-person change over time
• Effect size statistic
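Known-groups validity and ability to detect change are likewise commonly summarized with standardized differences. Under the same assumptions as the previous sketch (hypothetical names, complete numeric data, purely illustrative), the fragment below computes a pooled-SD difference between groups hypothesized a priori to differ (Cohen’s d) and a standardized response mean, one common effect-size statistic for within-person change over time.

```python
import numpy as np

def known_groups_d(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Cohen's d with pooled SD: standardized gap between groups
    hypothesized a priori to differ (known-groups validity)."""
    na, nb = group_a.size, group_b.size
    pooled_sd = np.sqrt(((na - 1) * group_a.var(ddof=1)
                         + (nb - 1) * group_b.var(ddof=1)) / (na + nb - 2))
    return (group_a.mean() - group_b.mean()) / pooled_sd

def standardized_response_mean(baseline: np.ndarray,
                               follow_up: np.ndarray) -> float:
    """Mean within-person change divided by the SD of change scores:
    an effect-size statistic for ability to detect change."""
    change = follow_up - baseline
    return change.mean() / change.std(ddof=1)

# Example: severe vs. mild patients, and scores before/after a known change
severe = np.array([8., 7., 9., 6., 8., 7.])
mild = np.array([3., 4., 2., 5., 3., 4.])
print(known_groups_d(severe, mild))  # large gap supports known-groups validity

before = np.array([6., 5., 7., 6., 4., 5.])
after = np.array([3., 4., 4., 2., 3., 3.])
print(standardized_response_mean(before, after))  # instrument detects change
```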