Item Property: Reason for Change or Deletion

Clarity or relevance:
• Reported as not relevant by a large segment of the target population
• Generates an unacceptably large amount of missing data points
• Generates many questions or requests for clarification from patients as they complete the PRO instrument
• Patients interpret items and responses in a way that is inconsistent with the PRO instrument’s conceptual framework

Response range:
• A high percent of patients respond at the floor (response scale’s worst end) or ceiling (response scale’s optimal end)
• Patients note that none of the response choices applies to them
• Distribution of item responses is highly skewed

Variability:
• All patients give the same answer (i.e., no variance)
• Most patients choose only one response choice
• Differences among patients are not detected when important differences are known

Reproducibility:
• Unstable scores over time when there is no logical reason for variation from one assessment to the next

Inter-item correlation:
• Item highly correlated (redundant) with other items in the same concept of interest

Ability to detect change:
• Item is not sensitive (i.e., does not change when there is a known change in the concepts of interest)

Item discrimination:
• Item is highly correlated with measures of concepts other than the one it is intended to measure
• Item does not show variability in relation to some known population characteristics (i.e., severity level, classification of condition, or other known characteristic)

Redundancy:
• Item duplicates information collected with other items that have equal or better measurement properties

Recall period:
• The population, disease state, or application of the instrument can affect the appropriateness of the recall period
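Several of the item-level criteria above (floor and ceiling effects, lack of variance, high inter-item correlation) can be checked numerically. The following is a minimal sketch in Python, not a prescribed method: the function names and the cutoffs `max_edge_share` and `max_r` are hypothetical, chosen only for illustration, and the criteria above do not specify numeric thresholds.

```python
from statistics import mean, pvariance


def floor_ceiling_share(responses, worst, best):
    """Return the fraction of responses at the floor (worst) and ceiling (best)."""
    n = len(responses)
    at_floor = sum(1 for r in responses if r == worst) / n
    at_ceiling = sum(1 for r in responses if r == best) / n
    return at_floor, at_ceiling


def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def screen_item(responses, other_item, worst, best,
                max_edge_share=0.5, max_r=0.9):
    """Flag an item against three criteria; both cutoffs are illustrative assumptions."""
    flags = []
    at_floor, at_ceiling = floor_ceiling_share(responses, worst, best)
    if at_floor > max_edge_share or at_ceiling > max_edge_share:
        flags.append("response range")
    if pvariance(responses) == 0:  # every patient gave the same answer
        flags.append("variability")
    r = pearson(responses, other_item)
    if r == r and abs(r) > max_r:  # r == r is False only for NaN
        flags.append("inter-item correlation")
    return flags


# Made-up responses on a 0 (worst) to 4 (best) scale.
item_a = [0, 0, 0, 0, 1, 2, 0, 0]
item_b = [0, 0, 0, 0, 1, 2, 0, 0]
print(screen_item(item_a, item_b, worst=0, best=4))
```

A flag from a check like this would only prompt closer review of the item; the remaining criteria (relevance, interpretation, recall period) depend on qualitative evidence that a numeric screen cannot supply.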

Measurement Property; Type; What Is Assessed?; FDA Review Considerations

Reliability: Test-retest or intra-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Stability of scores over time when no change is expected in the concept of interest
FDA review considerations:
• Intraclass correlation coefficient
• Time period of assessment

Reliability: Internal consistency
What is assessed:
• Extent to which items comprising a scale measure the same concept
• Intercorrelation of items that contribute to a score
• Internal consistency
FDA review considerations:
• Cronbach’s alpha for summary scores
• Item-total correlations

Reliability: Inter-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Agreement among responses when the PRO is administered by two or more different interviewers
FDA review considerations:
• Interclass correlation coefficient

Validity: Content validity
What is assessed: Evidence that the instrument measures the concept of interest, including evidence from qualitative studies that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use. Testing other measurement properties will not replace or rectify problems with content validity.
FDA review considerations:
• Derivation of all items
• Qualitative interview schedule
• Interview or focus group transcripts
• Items derived from the transcripts
• Composition of patients used to develop content
• Cognitive interview transcripts to evaluate patient understanding

Validity: Construct validity
What is assessed: Evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with measures of related concepts or scores produced in similar or diverse patient groups
FDA review considerations:
• Strength of correlation testing a priori hypotheses (discriminant and convergent validity)
• Degree to which the PRO instrument can distinguish among groups hypothesized a priori to be different (known groups validity)

Ability to detect change
What is assessed: Evidence that a PRO instrument can identify differences in scores over time in individuals or groups (similar to those in the clinical trials) who have changed with respect to the measurement concept
FDA review considerations:
• Within person change over time
• Effect size statistic
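Two of the statistics named above, Cronbach’s alpha and an effect size for change over time, can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than a required computation: alpha uses the standard formula k/(k-1) × (1 − sum of item variances / variance of total scores), and the effect size is taken as mean change divided by the baseline standard deviation, which is one common definition among several. The function names and all data are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of per-patient scores."""
    k = len(items)
    # Total score per patient: sum across items at each patient position.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def effect_size(baseline, follow_up):
    """Mean within-person change divided by the baseline SD (assumed definition)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Made-up data: three items answered by five patients.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5],
]
alpha = cronbach_alpha(items)

# Made-up summary scores for four patients before and after treatment.
baseline = [10, 12, 14, 16]
follow_up = [13, 15, 17, 19]
es = effect_size(baseline, follow_up)

print(round(alpha, 3), round(es, 3))
```

In this sketch the three items move in lockstep, so alpha comes out at its maximum; real item sets would give lower values. Other effect size variants, such as dividing by the standard deviation of the change scores, would yield different numbers for the same data.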