High-level estimation methodology

MIT NEWDIGS FoCUS undertook a pipeline analysis to determine the scale of the financing challenge of durable, potentially curative therapies in the United States.  The figure below illustrates the basic methodology employed.

Figure: Schematic of Therapy Launch and Patient Number Estimation

Therapies were identified primarily by using appropriate therapeutic classes and modalities as search criteria in the citeline™ Pharmaprojects™ database. Further therapies were found in the clinicaltrials.gov database using a combination of natural language processing and manual searches and extraction. Clinical trials registered on clinicaltrials.gov were identified where possible for all identified therapies. Only interventional trials with a known status and Phase were included. In addition to these therapies, a sample of gene therapies in advanced preclinical investigation was identified from the citeline™ Pharmaprojects™ database for inclusion in our analysis.

Data were then filtered for cell and gene therapy on the basis of the following product types: plasmid DNA, viral vectors, human gene editing technology, and patient-derived cellular gene therapy products.

Qualifying therapies were those falling into the modalities below, and which produce or promise to produce durable effects beyond 18 months from treatment:

  • Gene replacement therapies both in vivo and ex vivo using viral vectors
  • T-cell receptors (TCRs) and immune cells engineered to incorporate chimeric antigen receptors (CARs)
  • Gene editing therapies:
    • Zinc finger nucleases (ZFNs)
    • Transcription activator-like effector nucleases (TALENs)
    • CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats)
  • Long-acting DNA plasmids

We excluded the following on the basis of focusing on durable, potentially curative therapies: siRNA therapies delivered naked, via liposomes, nanoparticles, or in bacteria; vaccines; mRNAs delivered via liposomes or nanoparticles; and oncolytic viruses.

Two scoping decisions affect the estimates:

  • First, the decision to exclude clinical trials based in China for therapies developed by Chinese companies. We assumed these therapies were targeted at the Chinese market and have not included them.
  • Second, the decision on an ‘expanded snapshot’ of the clinical pipeline that would include pre-clinical development programs. Accurate pipeline projection for therapies with no clinical history is not possible, so we have excluded these very early-stage therapies.

For use in the modeling process, the remaining data were characterized by product, indication, and disease group. Each product-indication pair was modeled separately in an overall Monte Carlo model.

To forecast the number of product launches, the team applied estimates of the time taken to progress through each level of trials and of the probability of success of each trials program (defined as the probability that a product, on completing one level in the trials process, will initiate a trial at the next level). Estimates of the duration of each phase of clinical trials were obtained by examining samples of completed trials. Therapies from the target categories above make up too small a sample to populate the model parameters for estimating trial durations; therefore, a larger set of cell and gene therapies was used, still restricted to novel, large-molecule therapies. These data were extracted in September 2018.

Start dates and primary end dates of individual trials were extracted from the data; these were the basis for estimating trial durations by phase: Phase 1, Phase 2 (including Phase 1/2), and Phase 3 (including Phase 2/3). Trials that had not reached a conclusion (completed, terminated, or withdrawn) were discarded. From these samples, completion curves were derived for each phase of trials, relating the time taken to complete a trial to the number of trials taking that amount of time. These phase-specific distribution functions were used probabilistically to determine whether an active trial would complete (successfully or not) in each cycle of the model, that is, in the current year of the forecast.

The same data set of completed trials was used to derive probable success rates for each phase of trials. For forecasting purposes, it was broken down into 4 groups: hematological cancers, solid tumors, gene therapies for orphan diseases, and gene therapies for higher prevalence diseases.
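As an illustration of how a completion curve can drive the yearly completion decision, here is a minimal sketch. It assumes an empirical cumulative curve built directly from observed durations; the function names and the sample durations are hypothetical and are not taken from the FoCUS model.

```python
from bisect import bisect_right

def completion_curve(durations_years):
    """Build an empirical completion curve from completed-trial durations.

    Returns a function F(t): the fraction of sampled trials that finished
    within t years. (Assumption: a plain empirical CDF, no smoothing.)
    """
    ordered = sorted(durations_years)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf

def prob_complete_this_year(cdf, years_elapsed):
    """P(trial completes in the coming year | still active after years_elapsed)."""
    still_running = 1.0 - cdf(years_elapsed)
    if still_running <= 0.0:
        # Past the longest observed duration: assume the trial completes.
        return 1.0
    return (cdf(years_elapsed + 1) - cdf(years_elapsed)) / still_running

# Hypothetical Phase 2 durations in years (illustrative only).
phase2_cdf = completion_curve([1.5, 2.0, 2.5, 3.0, 4.0])
p = prob_complete_this_year(phase2_cdf, years_elapsed=2)
print(f"Chance an active 2-year-old trial completes this year: {p:.2f}")
```

Conditioning on the trial still being active keeps the yearly draw consistent with the overall duration distribution; a model could equally draw a full duration once per trial.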

Population incidence and prevalence data for non-oncology gene therapy products were obtained through targeted searches of published and online literature for each disease for which a gene therapy is in our trials pipeline. The team reviewed clinical trial eligibility criteria carefully to derive as tight an estimate of the expected pool of treatment-eligible patients as possible.

Oncology data were obtained from the most recent SEER database. For our purposes we defined the treatment-eligible population as those who would not survive 5 years after diagnosis. The most relevant therapies in this analysis are CAR-T and T-cell receptor therapies, which are most likely to be second-line and third-line treatments. Patients who survive longer than 5 years have generally responded well to first-line or second-line therapies. Those with relapsed or refractory disease represent the patient pool who might benefit from CAR-T or T-cell receptor therapies. On average these are about 30% of those diagnosed, although for individual cancers the proportions range from more than 90% (for lung or pancreatic cancer) to less than 10% (for prostate cancer). The implication is that the potentially treatable pool in oncology is entirely incident; there is no prevalence.
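The oncology definition above amounts to simple arithmetic: the eligible pool is the share of newly diagnosed patients who do not survive 5 years. A minimal sketch follows; the function name and the input figures are hypothetical, not SEER values.

```python
def eligible_incident_patients(annual_incidence, five_year_survival):
    """Treatment-eligible incident patients per year for one cancer.

    annual_incidence: new diagnoses per year.
    five_year_survival: fraction (0 to 1) surviving 5 years after diagnosis.
    """
    if not 0.0 <= five_year_survival <= 1.0:
        raise ValueError("five_year_survival must be between 0 and 1")
    return annual_incidence * (1.0 - five_year_survival)

# Hypothetical cancer: 10,000 diagnoses per year, 70% five-year survival.
pool = eligible_incident_patients(10_000, 0.70)
print(round(pool))  # 3000
```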

In the case of approval, data on treatment-eligible incident and prevalent patient populations, together with projections of the penetration rate for the product, were used to estimate the potential number of patients on a year-by-year basis.

Once launched, new products will have market penetration established on the basis of adoption, which comprises 2 factors: the maximum penetration rate achieved for incident cases, together with the time taken to reach that ceiling; and the maximum proportion of prevalent cases to be “cleared,” together with the time taken to clear them. In diseases with poor prognosis (e.g., aggressive cancers), there will be very little backlog because few patients survive. Many cell and gene therapies in development, however, are targeted at chronic conditions. Both uptake and clearance are expected to be lower than 100% because of factors such as other non–cell and gene therapy products in the market, payer-imposed access restrictions, or individual willingness to try new-to-world treatments relative to existing alternatives.

Adoption and market penetration cannot be known reliably for some time for cell and gene therapies. In addition to insufficient real-world data, historical data make a poor foundation for forming assumptions because they will not include cell and gene therapies. Our analyses were based on flexible assumptions. In the base case, these are: a maximum penetration rate for new incident cases of 90% in total across all products in an indication, with a 2-year ramp-up to that ceiling from the time of the first product’s launch; and a maximum penetration rate for prevalent cases (the “backlog”) of 70%, also in total across all products in an indication, with a 5-year time frame to clear. These assumed base-case parameters were formed jointly by the authors, including the MIT NEWDIGS FoCUS Writing Group. The differential penetration rates for incident and prevalent cases reflect a decline based on access restrictions, patient deaths, or other factors that reduce eligibility. Capping penetration below 100% of all otherwise eligible patients is an explicit limit on access that reflects (1) access restrictions (e.g., by payers, but also through clinician prescribing) and (2) the likely narrowing of indications relative to the broad indications listed in clinical trials databases.
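The base-case parameters above can be turned into a year-by-year patient estimate for one indication. The sketch below is illustrative only: it assumes a linear ramp to the incident ceiling and an even spread of backlog clearance over the clearing period, neither of which is specified in the text, and the function name and inputs are hypothetical.

```python
def treated_patients_by_year(incident_per_year, prevalent_pool, years,
                             max_incident_share=0.90, ramp_years=2,
                             max_prevalent_share=0.70, clear_years=5):
    """Estimated treated patients per year after the first launch.

    Assumptions (not from the source): linear ramp-up to the incident
    ceiling, and the prevalent backlog cleared in equal yearly slices.
    """
    treated = []
    for year in range(1, years + 1):
        # Share of new incident cases treated, capped at the ceiling.
        incident_share = max_incident_share * min(year / ramp_years, 1.0)
        # Equal yearly slice of the backlog until the clearing period ends.
        if year <= clear_years:
            backlog = prevalent_pool * max_prevalent_share / clear_years
        else:
            backlog = 0.0
        treated.append(incident_per_year * incident_share + backlog)
    return treated

# Hypothetical indication: 1,000 new cases per year, 5,000 prevalent cases.
for year, n in enumerate(treated_patients_by_year(1_000, 5_000, years=6), 1):
    print(f"Year {year}: {n:,.0f} patients")
```

Because the ceilings apply to all products in an indication combined, a sketch like this models the indication total rather than any single product's share.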

Our model uses a Markov Chain Monte Carlo process; within each iteration of the model, each individual development program was forecast on a year-on-year basis. In any given cycle:

  • If a trial is successfully concluded, a new trial at a higher level will be considered to have started
  • If a trial is unsuccessful, this terminates the development program

The highest development level that can be achieved is approval.

For each development program the estimated year of approval, should it occur, is noted.  Across all iterations the years of approval are summarized to provide probabilities that approval will be obtained in specific years (typically, our results are based on 100,000 iterations).
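The yearly cycle and the summary across iterations can be sketched as follows. This is a simplified illustration, not the FoCUS model: it assumes a constant annual completion probability per phase (the text describes phase-specific completion curves), it treats success at Phase 3 as approval, and all parameter values and function names are hypothetical.

```python
import random
from collections import Counter

PHASES = ["Phase 1", "Phase 2", "Phase 3"]

def simulate_program(start_phase, start_year, horizon_year,
                     p_complete, p_success, rng):
    """Simulate one development program; return approval year or None."""
    level = PHASES.index(start_phase)
    for year in range(start_year, horizon_year + 1):
        phase = PHASES[level]
        if rng.random() >= p_complete[phase]:
            continue  # trial still running at the end of this cycle
        if rng.random() >= p_success[phase]:
            return None  # unsuccessful trial terminates the program
        level += 1  # successful trial: next level starts
        if level == len(PHASES):
            return year  # assumption: Phase 3 success counts as approval
    return None  # no approval within the forecast horizon

def approval_year_probabilities(n_iter, seed=0, **program):
    """Share of iterations in which approval falls in each year."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        year = simulate_program(rng=rng, **program)
        if year is not None:
            counts[year] += 1
    return {year: counts[year] / n_iter for year in sorted(counts)}

# Hypothetical program and parameters (illustrative values only).
probs = approval_year_probabilities(
    10_000,
    start_phase="Phase 2", start_year=2024, horizon_year=2035,
    p_complete={"Phase 1": 0.5, "Phase 2": 0.4, "Phase 3": 0.4},
    p_success={"Phase 1": 0.7, "Phase 2": 0.5, "Phase 3": 0.6},
)
for year, p in probs.items():
    print(year, f"{p:.3f}")
```

The probabilities sum to less than 1 because some iterations end in termination or run past the horizon without approval.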

To derive potential total expenditures, estimated patient numbers were multiplied by the estimated price per patient treated. Prices were assumed to be $1,500,000 per medicine for ultra-orphan diseases, $800,000 for orphan diseases, $500,000 for higher prevalence conditions, and $400,000 for CAR-Ts and TCRs (cancer therapies). These figures will be adjusted periodically to reflect market-based pricing information as more gene therapies gain approval and are launched.
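A minimal sketch of the expenditure calculation, using the assumed prices stated above. The category keys, function name, and patient counts are hypothetical labels chosen for illustration.

```python
# Assumed price per patient treated, in US dollars (from the text above).
PRICE_PER_PATIENT = {
    "ultra_orphan": 1_500_000,
    "orphan": 800_000,
    "higher_prevalence": 500_000,
    "car_t_tcr": 400_000,
}

def total_expenditure(patients_by_category):
    """Sum of patients treated times assumed price, across categories."""
    return sum(PRICE_PER_PATIENT[category] * patients
               for category, patients in patients_by_category.items())

# Hypothetical year: 100 orphan-disease patients and 1,000 CAR-T/TCR patients.
print(f"${total_expenditure({'orphan': 100, 'car_t_tcr': 1_000}):,}")
# $480,000,000
```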