Item Property / Reason for Change or Deletion

Clarity or relevance
• Reported as not relevant by a large segment of the target population
• Generates an unacceptably large amount of missing data points
• Generates many questions or requests for clarification from patients as they complete the PRO instrument
• Patients interpret items and responses in a way that is inconsistent with the PRO instrument’s conceptual framework

Response range
• A high percent of patients respond at the floor (response scale’s worst end) or ceiling (response scale’s optimal end)
• Patients note that none of the response choices applies to them
• Distribution of item responses is highly skewed

Variability
• All patients give the same answer (i.e., no variance)
• Most patients choose only one response choice
• Differences among patients are not detected when important differences are known

Reproducibility
• Unstable scores over time when there is no logical reason for variation from one assessment to the next

Inter-item correlation
• Item highly correlated (redundant) with other items in the same concept of interest

Ability to detect change
• Item is not sensitive (i.e., does not change when there is a known change in the concepts of interest)

Item discrimination
• Item is highly correlated with measures of concepts other than the one it is intended to measure
• Item does not show variability in relation to some known population characteristics (i.e., severity level, classification of condition, or other known characteristic)

Redundancy
• Item duplicates information collected with other items that have equal or better measurement properties

Recall period
• The population, disease state, or application of the instrument can affect the appropriateness of the recall period
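Several of the item-level reasons listed above (missing data, floor and ceiling responses, no variance, one dominant response choice) can be screened numerically before an item is revised or dropped. The following is a minimal sketch only; the function name `screen_item`, its parameters, and the assumption that the highest scale value is the optimal end are illustrative and do not come from the guidance, which sets no numeric thresholds.

```python
from collections import Counter


def screen_item(responses, scale_min, scale_max):
    """Summarize one item's responses for missingness, floor/ceiling, and variability.

    responses: list of integer responses, with None marking a missing answer.
    scale_min: value at the worst end of the response scale (assumed lowest).
    scale_max: value at the optimal end of the response scale (assumed highest).
    """
    if not responses:
        raise ValueError("no responses supplied")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("all responses are missing")

    n_total = len(responses)
    n_answered = len(answered)
    counts = Counter(answered)
    # Share of answered responses that fall on the single most popular choice.
    top_share = counts.most_common(1)[0][1] / n_answered

    return {
        "pct_missing": 100.0 * (n_total - n_answered) / n_total,
        "pct_floor": 100.0 * counts[scale_min] / n_answered,
        "pct_ceiling": 100.0 * counts[scale_max] / n_answered,
        "pct_most_common_choice": 100.0 * top_share,
        "no_variance": len(counts) == 1,  # every patient gave the same answer
    }


# Hypothetical responses on a 0-4 scale; None is a skipped item.
example = [0, 0, 0, 4, 2, None, 0, 0, 1, 0]
summary = screen_item(example, scale_min=0, scale_max=4)
```

What counts as "unacceptably large" or "a high percent" remains a judgment for the instrument developer; the sketch only reports the proportions.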
Measurement Property / Type / What Is Assessed? / FDA Review Considerations

Reliability

Test-retest or intra-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Stability of scores over time when no change is expected in the concept of interest
FDA review considerations:
• Intraclass correlation coefficient
• Time period of assessment

Internal consistency
What is assessed:
• Extent to which items comprising a scale measure the same concept
• Intercorrelation of items that contribute to a score
• Internal consistency
FDA review considerations:
• Cronbach’s alpha for summary scores
• Item-total correlations

Inter-interviewer reliability (for interviewer-administered PROs only)
What is assessed: Agreement among responses when the PRO is administered by two or more different interviewers
FDA review considerations:
• Interclass correlation coefficient

Validity

Content validity
What is assessed: Evidence that the instrument measures the concept of interest, including evidence from qualitative studies that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use. Testing other measurement properties will not replace or rectify problems with content validity.
FDA review considerations:
• Derivation of all items
• Qualitative interview schedule
• Interview or focus group transcripts
• Items derived from the transcripts
• Composition of patients used to develop content
• Cognitive interview transcripts to evaluate patient understanding

Construct validity
What is assessed: Evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with measures of related concepts or scores produced in similar or diverse patient groups
FDA review considerations:
• Strength of correlation testing a priori hypotheses (discriminant and convergent validity)
• Degree to which the PRO instrument can distinguish among groups hypothesized a priori to be different (known groups validity)

Ability to detect change
What is assessed: Evidence that a PRO instrument can identify differences in scores over time in individuals or groups (similar to those in the clinical trials) who have changed with respect to the measurement concept
FDA review considerations:
• Within person change over time
• Effect size statistic
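Three of the statistics named in the table above (Cronbach's alpha, item-total correlations, and an effect size for within-person change) can be computed with the Python standard library alone. The sketch below is illustrative, not a prescribed method: the function names are hypothetical, the item-total correlation is the "corrected" variant (each item against the sum of the remaining items), and the effect size is taken as mean change divided by the baseline standard deviation, which is only one of several definitions in use. The table does not specify which variants are expected.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, each holding one score per patient
    (patients in the same order in every list)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # summary score per patient
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items):
    """Correlate each item with the sum of the other items (assumed variant)."""
    totals = [sum(scores) for scores in zip(*items)]
    return [
        pearson(col, [t - c for t, c in zip(totals, col)])
        for col in items
    ]


def standardized_effect_size(baseline, follow_up):
    """Mean within-person change divided by baseline SD (assumed definition)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical data: three items answered by five patients.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
item_total = corrected_item_total(items)
es = standardized_effect_size([10, 12, 9, 14, 11], [13, 14, 12, 15, 13])
```

The intraclass correlation coefficient is left out here because it has several forms that depend on the study design, and the table does not say which one applies.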