Liu’s Insights on Navigating the Fragility of Phase 3 Trials

Fact checked by Sabrina Serani

Yufei Liu, MD, PhD, discussed the importance of careful data interpretation to accurately assess the efficacy of new treatment options in oncology.

Yufei Liu, MD, PhD

The design and interpretation of phase 3 trials in oncology are pivotal for advancing therapeutic strategies. However, according to Yufei Liu, MD, PhD, trial designs carry several complexities, particularly where patient crossover is involved.

As crossover designs spark considerable debate within the field, Liu, assistant professor of radiation oncology at City of Hope, emphasized how patient transitions between treatment arms can significantly impact trial outcomes, potentially skewing results and affecting clinical decision-making.

"I think the fragility of a clinical trial allows us to assess clinical trials in a different way. ... If we see a fragile, statistically significant result, it suggests that we may want to design another trial, potentially a larger one with a bigger sample size, to ensure that what we're seeing is not just a statistical fluke," said Liu in an interview with Targeted Oncology™.

In the interview, Liu highlighted the critical need for meticulous data interpretation to ensure that the efficacy of new treatments is accurately captured and understood.

Targeted Oncology: Could you elaborate on the potential biases introduced by crossover designs in phase 3 trials and how they impact the validity of results?

Liu: Not a lot of clinical trials use crossover designs. But for the ones that do, it certainly introduces the problem of how to interpret the data when patients do cross over. The fragility technique I used to assess phase 3 clinical trials directly addresses that question by crossing patients over from one arm of a trial to the other. Using that approach, we can see that crossover can significantly affect the results of clinical trials, because sometimes it takes only 1 or 2 crossover patients, and oftentimes only a single-digit number, to see a dramatic difference in the significance of a trial.

I think we do have to be careful when doing these crossover designs. One way of assessing this is to look at the outcomes of the patients who crossed over, take note of them, and then look at the results as if those patients had not crossed over, to see whether the conclusions still hold.

What are some of the challenges in selecting primary end points for phase 3 trials in oncology?

The gold standard end point for phase 3 trials is always going to be overall survival. Unfortunately, we are not always able to get a large enough sample size to produce meaningful overall survival data, or overall survival takes a long time to show a difference. For certain types of cancers where survival is very good, we often use surrogate end points, because we want to publish the results of trials before overall survival data mature, which can sometimes take 10 years or more. We have to use surrogate end points such as progression-free survival. Sometimes we use local control or distant metastasis-free survival.

There is certainly some danger in using these end points because, in a way, the more end points we use, the greater the chance of finding one that just happens to be statistically significant. Whereas overall survival is the gold standard, these other end points are a little harder to interpret, because you can find that one end point is significant but that it does not affect overall survival. In some respects, it leaves you wondering: Is this really a meaningful end point, and how valuable is it?

I think we do have to be careful in selecting these other end points, and I think this requires a conversation with practitioners out in the field, when designing the clinical trial, as to what is truly meaningful. Which outcomes do we think can truly stand in as surrogates for overall survival? That is an important conversation to have at the beginning of designing these trials.
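To illustrate the concern about testing several end points, here is a minimal sketch of how the chance of at least one spurious "significant" finding grows with the number of end points. It assumes the end points are statistically independent and each tested at the same threshold, which is a simplification: real end points such as progression-free and overall survival are correlated, so the true inflation is typically smaller. The function name is hypothetical and does not come from Liu's analysis.

```python
def chance_of_any_false_positive(num_endpoints: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several truly null end points
    crosses the significance threshold, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** num_endpoints


# Show how the chance grows as more end points are tested at alpha = 0.05.
for k in (1, 2, 4, 6):
    print(f"{k} end point(s): {chance_of_any_false_positive(k):.3f}")
```

Under these assumptions, four end points each tested at 0.05 carry roughly a 19% chance that at least one appears significant by chance alone.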

How do you think surrogate end points impact the perceived fragility of trial outcomes in oncology?

If we are looking at something like progression-free survival, oftentimes it is a little bit easier to see differences in progression-free survival compared with overall survival. From that standpoint, the trials can seem a little bit less fragile if you do use the surrogate end points, just because they allow you to see a more significant result. However, that does not necessarily mean that the overall survival data itself is not fragile. Sometimes these surrogate end points can mask the fragility of trials. It is almost like you are choosing whichever end point gives you the most significant P value. Your most important end point, which is survival, can sometimes be masked in these circumstances.

Can you discuss the role of subgroup analyses in phase 3 trials and how they contribute to or mitigate the fragility of these results?

Subgroup analyses are certainly a very tricky issue. Whenever clinical trials are designed, we always want to make sure that we have enough patients so that the study has enough power to detect a difference of a certain magnitude in a clinical end point. Whenever we do subgroup analyses, these are essentially unplanned, so we certainly do not have enough power to detect differences.

However, sometimes we are able to find interesting differences. If we find that [in] the overall group of patients that we are assessing, a certain intervention was not significant, oftentimes we like to ask whether that is because of certain groups of patients or certain characteristics. In that respect, these trials can still be beneficial and give us information about that. However, I would caution that, because the trial is not powered to detect those differences, this is more of an exploratory end point. It is very hypothesis generating. Potentially, we want to design a future clinical trial addressing the question that we addressed in our subgroup analysis if we found something interesting. However, I would say that with these subgroup analyses themselves, because they are not powered and the trial was not designed to assess them, it is hard to draw a conclusion from the overall trial itself. From the fragility standpoint, the results oftentimes tend to be very fragile as well. It does not take very many patients, oftentimes fewer than 3 or 4 flipping from one arm to another, for the results to disappear.

How do you suggest oncologists balance the results of phase 3 trials with real-world evidence?

Whenever we see the results of phase 3 trials, whether they end up being positive or negative, we also have to think about the clinical significance of what we are testing as well. Oftentimes, certain agents have been used for a long time. We have to try to incorporate all the data that has been found in addition to what has been done in that phase 3 trial.

How should oncologists approach clinical decision-making when a phase 3 trial demonstrates a fragile, statistically significant result?

It is one of those circumstances where, for instance, when we think about clinical trials, we tend to fixate on certain statistical values. So, we often use the 0.05, the 5% significance cutoff, when determining if a trial is significant or not. I think the fragility of a clinical trial allows us to assess clinical trials in a different way. For a lot of people, it is hard to visualize what it means for a trial to have a P value of, say, 0.049 versus 0.01. Whereas, if you think about fragility, the way we are assessing it is by determining how many patients it takes to flip a result from significant to not significant. This provides a much better visual cue.

So, if you have a trial with 100 patients in each arm, and you just flip one patient's result, that could change the significance of the trial. I think it provides a much better visual picture than, say, a P value that barely attains that 0.05 threshold. What it means is that, if we see a fragile, statistically significant result, it suggests that we may want to design another trial—potentially a larger one with a bigger sample size—to ensure that what we're seeing is not just a statistical fluke. That way, we can make sure that a single patient, who may be an outlier, didn’t tilt the results one way.
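The idea of counting how many flipped patients erase significance can be sketched for the simplest case, a binary outcome in a two-arm trial. This is an illustrative version of the classic fragility index built on Fisher's exact test, not the survival-based method Liu presented; the function names and the example event counts are assumptions made for demonstration only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    probs = [prob(x) for x in range(lo, hi + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-event -> event flips in the lower-rate arm needed to
    push P to alpha or above. Returns None if the result is not
    significant to begin with."""
    def p_value(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return None
    flip_a = events_a * n_b <= events_b * n_a  # arm with the lower event rate
    flips = 0
    while p_value(events_a, events_b) < alpha:
        if flip_a and events_a < n_a:
            events_a += 1
        elif not flip_a and events_b < n_b:
            events_b += 1
        else:
            break  # no patients left to flip
        flips += 1
    return flips


# Hypothetical trial: 100 patients per arm, 5 vs 25 events.
print(fragility_index(5, 100, 25, 100))
```

A small returned number means only a handful of patients separate a "positive" trial from a "negative" one, which is the visual cue described above.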

I would also say it depends on how the result is statistically fragile. Using the method that assesses fragility, we can flip patients who are the most average patients from one arm to the other. We can also flip the most extreme patients. When you flip the most extreme patients, it addresses the question of outliers. For example, if you have one major outlier in a group, and you’re asking whether that outlier is responsible for the statistical significance of the trial, this method can help assess that. So, I think it does depend on how fragile the trial is, but certainly, if it only takes 1 or 2 patients to flip the result, it leads us to think we may need more patients and further investigation.

How important do you think long-term follow-up studies are for validating the results of phase 3 trials?

I think they are extremely important, especially when looking at overall survival, because oftentimes, with surrogate end points, we can see that patients progress, but we have good therapies that allow them to have prolonged survival, even with progression. There is certainly no substitute for having long-term follow-up and giving it the time to see how patients do. The issue is that oftentimes we want to know the answers to questions quickly. Sometimes we do not have the luxury of waiting a decade or so to really see how the data play out over time. But I do think it is important, even if you have a positive or negative result initially, say at the 2- or 3-year mark, to still follow that up to 5 and 10 years, just to see whether what we are seeing in the early stage persists into the later stages.

What strategies would you recommend for increasing the robustness of phase 3 trials in oncology?

I think the first thing is sample size. There is certainly no substitute for having a large sample size for these clinical trials. I think that can be challenging for certain types of cancers, especially rare cancers, or for clinical trials that address questions that are a little harder to sell to patients. It can be difficult to get enrollment, and it's always a constant challenge. Whenever we design phase 3 clinical trials in oncology, we always have a robust statistical analysis, and we work closely with our teams to design these trials. However, that is essentially in the theoretical realm. In the real world, what happens is, say, we have a trial that is powered to address a question with 500 patients, but oftentimes we are only able to enroll 100 or 200 patients. Because we enrolled these patients, we still want to know what the trial results show.

Sometimes, the trial results show a statistically significant result, but it ends up being a fragile result because the patient number was so small. It only takes a patient or two to change the statistical significance of the result. There is certainly no substitute for increasing the number of patients and trying to conduct large trials in the real world. Of course, it's not always possible, so we have to be careful about certain ways we analyze the data.
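The cost of under-enrollment can be made concrete with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two proportions; the 30% and 42% event rates are hypothetical, chosen only so that 250 patients per arm (500 total, as in the example above) gives reasonable power. It is not drawn from any specific trial.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control: float, p_treatment: float,
                 n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions
    (normal approximation, equal arm sizes)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions.
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treatment * (1 - p_treatment) / n_per_arm)
    return nd.cdf(abs(p_treatment - p_control) / se - z_crit)


# Planned enrollment (500 total) vs. what was actually accrued (hypothetical).
for n in (250, 100, 50):
    print(f"{n} per arm: power ~ {approx_power(0.30, 0.42, n):.2f}")
```

With the same assumed effect, power falls sharply as enrollment drops, so a significant result from the smaller trial is both less likely and more dependent on a few patients.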

Crossover can certainly be a huge confounding factor when it comes to clinical trials, as it can skew the results. Whenever we consider doing a crossover (often for ethical reasons, like when we know an arm is doing well, or patients request a certain arm), we do honor the request, but we have to carefully track these patients and denote that it was a crossover. We should also analyze the data without the crossover, so do it both ways. Additionally, our results for clinical trials are often reported as intention-to-treat, meaning patients are analyzed based on the arm they were originally randomized to. However, sometimes patients who were intended for one arm ended up in a different arm, and the intention-to-treat analysis doesn't always capture that. This aspect is very important, especially if a significant number of patients were not treated based on their initial randomization. Taking into account what intervention patients actually received is another approach to decrease the fragility of trials and make the results more reproducible.
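Analyzing the data "both ways" can be sketched with a toy cohort in which one patient crosses over. The `Patient` record, the arm labels, and the outcomes below are all made up for illustration; a real analysis would use time-to-event methods rather than raw event rates, and an as-treated comparison no longer benefits from randomization.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str  # arm from randomization
    received: str  # arm actually received (differs after crossover)
    event: bool    # True if the patient had the outcome event


def event_rate_by(patients, key):
    """Event rate per arm, grouped by 'assigned' (intention-to-treat)
    or by 'received' (as-treated)."""
    totals, events = {}, {}
    for p in patients:
        arm = getattr(p, key)
        totals[arm] = totals.get(arm, 0) + 1
        events[arm] = events.get(arm, 0) + int(p.event)
    return {arm: events[arm] / totals[arm] for arm in totals}


# Toy cohort (hypothetical): one control patient crossed over.
cohort = [
    Patient("control", "control", True),
    Patient("control", "control", True),
    Patient("control", "experimental", False),  # crossover
    Patient("control", "control", False),
    Patient("experimental", "experimental", False),
    Patient("experimental", "experimental", True),
    Patient("experimental", "experimental", False),
    Patient("experimental", "experimental", False),
]

itt = event_rate_by(cohort, "assigned")
as_treated = event_rate_by(cohort, "received")
print("Intention-to-treat:", itt)
print("As-treated:        ", as_treated)
```

If the two groupings lead to different conclusions, the crossover patients are driving the result and deserve a closer look.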

As we know from fragility analysis, it sometimes doesn't take very many patients to change the outcomes. In terms of other things we can do to reduce the fragility of phase 3 trials, part of it comes down to conducting a good statistical analysis. Sometimes, the statistical tools we use aren't necessarily the best for analyzing the data. For instance, many trials rely on the proportional hazards assumption. However, when we look at the Kaplan-Meier curves, they sometimes cross, which means the proportional hazards assumption is not met. In such cases, we need to use a different type of statistical analysis to correctly analyze the data. So, I think it's always very important for phase 3 trials to have a robust statistical team and to use the right type of analysis for these studies.

I think the biggest thing is that this fragility tool is not a substitute for the P value. The P value, the statistical significance, whether it is 0.05 or 0.01, is something well established, and I foresee it continuing to be the main factor we look at in the future when assessing clinical trials. However, this is another tool we can use in our toolbox.

In terms of clinical trials, it helps us visualize things much better than a P value, especially when we are talking to patients about certain trials. When we discuss the P value, it is a theoretical construct, whereas if we talk about fragility—like flipping patients from one arm to another—that is much easier to visualize and a good tool to use.

For trials with P values close to the cutoff, we can do a fragility analysis to see how fragile these trials are. If they are extremely fragile, we can consider adding more patients or extending the trial to ensure the results are validated and potentially less fragile. This applies to negative trials as well, because sometimes negative trials come close to the P value we are looking for. In those cases, we need to examine whether there were crossovers or if we are doing an intention-to-treat analysis where many patients did not receive the arm they were randomized to. This helps us assess, "What if these patients stayed on the arm they were supposed to be on? Would that influence the results?"

REFERENCE:
Liu Y. The fragility of phase 3 trials in oncology. Abstract presented at: 2024 American Society for Radiation Oncology Annual Meeting; September 29-October 2, 2024; Washington, DC. Abstract 65267.