Bioequivalence, featuring Edgeworth and Friends

I’m a huge Costco guy. I mean, Whole Foods? In this economy? Anyway, 1 second into this blog post, you’re probably thinking what all this has to do with bioequivalence. Very little, except that Costco has their own generic drug brand and their drugs come in massive sizes that I dare not buy.

In the year of our lord 2022, I was doing a summer internship at the FDA, working with a team in the division responsible for the statistics behind generic drugs. For those of you who aren’t old and crusty enough to care about medications1, a generic drug is a medicine, for all intents and purposes, that’s supposed to be functionally identical to a brand-name drug. For example, the main active ingredient in a Tylenol is acetaminophen, which sounds like a scary chemical that can kill you (it can, it’s all about the dosage), but you’ll see tubs of acetaminophen at a CVS or a Costco, which are very much functionally the same as taking a Tylenol, but they’re much cheaper. Usually, these generic drugs have marketing labels that compare themselves to related reference drugs. So CVS brand acetaminophen oral pills will have a label saying “oh we’re 99.9% similar to Tylenol Turbo-fucking Fantastic Oral Pills” or something to that effect.

I love generic drugs because they’re cheaper and I don’t want to have to sell one of my kidneys in a black market to afford medicine. Therefore, I’m gonna talk about what happens when you try to take generic drugs to the market.

The FDA is a massive agency that regulates pretty much everything that goes into our bodies, and generic drugs are no exception. The FDA requires a lot of things but one thing they always ask of drug developers is statistical validation. Even if a drug is proven to be effective in a lab setting, it can cause problems in an uncontrolled situation in the real world with lots of confounding factors2, so it needs to be tested in both settings3.

For a new drug, you can choose4 to prove one of the following three:

  1. My new drug is fantastic and beats yours (superiority)
  2. My new drug is at least as good as yours (noninferiority)
  3. My new drug is on average gonna be similar to yours (bioequivalence)

For new drug applications, the FDA asks to show 1 or 2, but a generic drug is different. We already have a name-brand drug that works and the FDA is just allowing another similar drug on the market. So why be the meanie when you can take advantage of this opportunity to show them how generous you are? So they lower the bar slightly and ask to show 3. And that was fine until… it wasn’t.

When some drugs work but then don’t, and then kinda do

Look, the FDA wanted to play nice, but things got a little tricky. Generic drugs are required to show bioequivalence, meaning it should be therapeutically the same as the name-brand drug5. But then, they realized some name-brand drugs work for some people but don’t work for others, and kinda do work a little for them too, just not enough. These are called “highly variable drugs.” Because of how the bioequivalence testing is set up, highly variable drugs ended up requiring a huge study sample to meaningfully show whether the two drugs (the new test drug and reference drug) are bioequivalent.

You’d think pharmaceutical companies would’ve been like “well shit, I quit.” No, they went to the FDA and said “if you’ve set out to be nice, why not actually be nice all the way and let us change the hypotheses?”6 The FDA said “I’ll think about it” and thought about it for ~10 years. Come 2001, the FDA revised their guidance on bioequivalence and recommended what’s now called the reference-scaled bioequivalence (RSABE)." The usual average-bioequivalence (ABE) hypotheses looked like this: $$ H_0:|\mu_T-\mu_R| \geq \varepsilon \text{ vs } H_a:|\mu_T-\mu_R| < \varepsilon, $$ where $\mu_T$ is the mean of the logarithm of test-drug measurements and $\mu_R$ is the reference-drug equivalent.7 The RSABE hypotheses look like this: $$ H_0:\frac{|\mu_T-\mu_R|}{\sigma_{WR}} \geq \varepsilon \text{ vs } H_a:\frac{|\mu_T-\mu_R|}{\sigma_{WR}} < \varepsilon. $$ The $\varepsilon$ for ABE and RSABE is not the same, actually, but I’m lazy and this is my blog so I get to be lazy. You’ll see that the null hypothesis can be rewritten as $H_0:|\mu_T-\mu_R| \geq \varepsilon\sigma_{WR}$, so the BE limit8 that determines whether the drugs are bioequivalent now depends on the variance of the reference-drug measurements.

OK Daeyoung but what the fuck does this have to do with me?

Not much except that if a generic drug doesn’t work as well for you as a name-brand drug, you shouldn’t be surprised. And remember that daddy FDA made sure it doesn’t kill you.

Failure of textbook math stat

That was a long intro to why I’m writing this blog post. Turns out, that RSABE is a bitch (Try coming up with a good test statistic for the RSABE. You can’t). You’d think we can define $I_{ij}=T_{ij}-R_{ij}$, $\bar{I} = \frac{1}{nr}\sum_{i,j}I_{ij}$, $\overline{R}_{i\cdot}=\sum_{j=1}^{r} R_{ij}$, and $S_{WR}^2 = \frac{1}{n(r-1)}\sum_{i,j}(R_{ij}-\overline{R}_{i\cdot})^2$, and use something like $\bar{I} / S_{WR}$ as your test statistic. That’s a great idea, and there’s a paper about its sampling distribution9. Yeah, if everything was that warm and fuzzy, I wouldn’t be posting this, would I?

That test statistic turns out to be quite useless in our setting because we’re dividing the difference $\mu_T-\mu_R$ by $\sigma_{WR}$, not $\sqrt{\sigma_{WR}^2+\sigma_{WT}^2}$. It’s very challenging to get rid of these pesky variances. Such a nuisance!

Howe doin?

If you look at the guidance about bioequivalence testing handed down by the FDA or the European Medicines Agency (EMA), you’ll notice they both recommend using Howe’s approximation. To be Francis, I hadn’t heard of what the fuck that was until I interned at the FDA. But I had heard of the Edgeworth expansion. Howe’s approximation is a spicy version of inverting Edgeworth expansion so we can get an approximate quantile of a distribution that we don’t know the analytical form of. This inverting of the Edgeworth expansion is called Cornish-Fisher expansion. Here’s the original paper.

Honestly, I don’t really give a shit about who’s reading my blog post but what I’m about to write from now on isn’t for the faint of heart. Keep reading at your own peril.

Let’s first get through some basic facts:

  1. You don’t need me to tell you $(1+a/n)^n \to e^a$. Consult Thomas and Friends. But you probably need me (and Isaac Newton) to tell you that $$ \left(1+\frac{a}{n} \right)^{n-k} = e^a\left(1 - \frac{a(a+k)}{2n} \right) + o\left(\frac{1}{n} \right). $$ That cute little $o(\cdot)$ friend tells you that the rest of the terms will disappear to oblivion faster than $1/n$ does10.
  2. We need Hermite polynomials here. Let $\phi(x)$ denote the standard normal density function. Then the $k$-th Hermite polynomial11 $H_k(x)$ is defined by $$ (-1)^k \frac{d^k}{dx^k}\phi(x) = H_k(x)\phi(x). $$ We can even derive a differential equation slash recurrence relation of $$ \frac{d}{dx}[H_k(x)\phi(x)] = -H_{k+1}\phi(x). $$
  3. It’s public information that characteristic functions can be turned into densities (if they exist) by the inverse Fourier transform12. This is called the inversion formula. Suppose $X \sim F$ where $F$ is a distribution function, and $\psi_X(t)=\mathrm{E}(e^{itX})$ is the characteristic function of $X$. Then, $$ f(x) = \frac{1}{2\pi}\int_{-\infty}^\infty e^{-itx}\psi_X(t)\,dt. $$ Consult Shao for proof.
  4. An identity that will come in handy in a minute: $$ \begin{align} \frac{1}{2\pi}\int_{-\infty}^\infty e^{-itx}e^{-t^2/2}(it)^k\,dt &= \frac{1}{\pi}\int_{-\infty}^\infty (-1)^k\frac{d^k}{dx^k}e^{-itx}e^{-t^2/2}\,dt\\ &= \frac{(-1)^k}{2\pi}\frac{d^k}{dx^k}\int_{-\infty}^\infty e^{-itx}e^{-t^2/2}\,dt\\ &= (-1)^k \frac{d^k}{dx^k}\phi(x)\\ &= H_k(x)\phi(x). \end{align} $$ The second equality where we switch the order of differentiation and integration requires some care but it works for “nice” functions. And we love characteristic functions because they’re nice. The third equality is using the inversion formula on the standard normal distribution.

Edgeworth finally

The question Edgeworth set out to answer is “how do we approximate the distribution of a random variable expressed as the sum of standardized variables?” I would’ve used the central limit theorem and gone to sleep, but Edgeworth is an ambitious fella. So the point here is to beat the CLT.

Say $X_1,\ldots,X_n \overset{\text{iid}}{\sim} F$ with $\mathrm{E}(X_i)=0$ and $\mathrm{E}(X_i^2)=1$. Then definie $S_n = \sqrt{n}\overline{X}_n$ where $\overline{X}_n=\frac{1}{n}\sum_{i=1}^n X_i$. Then, with $F_n(x) = \mathrm{P}(S_n \leq x)$, the Berry-Esseen theorem tells us $$ \sup_x |F_n(x) - \Phi(x)| \leq \frac{9\mu_3}{\sqrt{n}}, $$ where $\mu_3 = \mathrm{E}(X_i^3)$, assuming $\mu_3 < \infty$. So in the CLT world, the convergence happens as $O(1/\sqrt{n})$.

The magic sauce to beating the CLT is using more information. See, if you’ve read the assumptions for the classical CLT, you should know the CLT has some strings attached: The sample needs at least the first two moments moments to be finite. This is because higher moments like skewness or kurtosis eventually don’t matter as a normal distribution is completely determined by the first two moments. But “eventually” is doing a lot of work here. See, the CLT talks about the asymptopia but we only have so many samples, and if we want to do better than the CLT, all of a sudden we need to start caring about how fast the CLT is happening.

To see if we can do better than $O(1/\sqrt{n})$, we use the fact that $$ \psi_{S_n}(t) = \left[\psi_X\left(\frac{t}{\sqrt{n}}\right) \right]^n, $$ and apply a Taylor expansion to $\psi_X(t/\sqrt{n})$ because why not. $$ \begin{align} \psi_X\left(\frac{t}{\sqrt{n}} \right) &= \mathrm{E}\left(1 + \frac{itX}{\sqrt{n}} + \frac{(it)^2X^2}{2! n} + \frac{(it)^3 X^3}{3!n\sqrt{n}} + \frac{(it)^4X^4}{4!n^2} \right) + o\left(\frac{1}{n^2} \right)\\ &= \left(1 - \frac{t^2}{2n} \right) + \frac{(it)^3\mu_3}{6n\sqrt{n}} + \frac{(it)^4 \mu_4}{24n^2} + o\left(\frac{1}{n^2} \right). \end{align} $$ Now we’re going to raise to the $n$-th power, but to simplify the algebra13, we won’t care about $o(1/n)$. That is, terms that go to zero faster than $1/n$ will be absorbed into $o(1/n)$. $$ \begin{align} \left[\psi_X\left(\frac{t}{\sqrt{n}} \right)\right]^n &= \left[\left(1 - \frac{t^2}{2n} \right)^n + \left(1 - \frac{t^2}{2n} \right)^{n-1}\left( \frac{(it)^3\mu_3}{6n\sqrt{n}} + \frac{(it)^4 \mu_4}{24n^2}\right) \right. \\ &+\left. \left(1 - \frac{t^2}{2n} \right)^{n-2} \right] \end{align} $$ By Fact 1, this turns into $$ \begin{align} \psi_{S_n}(t) &= e^{-t^2/2}\left\{1 + \frac{(it)^3\mu_3}{6\sqrt{n}} + \frac{(it)^4(\mu_4-3)}{24n} + \frac{(it)^6\mu_3^2}{72n} \right\} + o\left(\frac{1}{n} \right). \end{align} $$

Using the inversion formula, we get $$ \begin{align} f_{n}(x) &= \frac{1}{2\pi}\left(\int_{\mathbb{R}}e^{-itx}e^{-t^2/2}\, dt + \frac{\mu_3}{6\sqrt{n}}\int_{\mathbb{R}}e^{-itx}e^{-t^2/2}(it)^3\,dt\right.\\ &+ \left. \frac{\mu_4-3}{24n}\int_{\mathbb{R}}e^{-itx}e^{-t^2/2}(it)^4\,dt + \frac{\mu_4^2}{72n}\int_{\mathbb{R}}e^{-itx}e^{-t^2/2}(it)^6\,dt \right). \end{align} $$

Fact 4 provides the basis for turning the hideous integrals into familiar Hermite and Gauss. $$ f_{n}(x) \approx \phi(x)\left(1 + \frac{\mu_3H_3(x)}{6\sqrt{n}} + \frac{(\mu_4-3)H_4(x)}{24n} + \frac{\mu_3^2H_6(x)}{72n} \right). $$

And the recurrence relation in Fact 3 gives us $$ F_n(x) \approx \Phi(x) - \phi(x)\left( 1 + \frac{\mu_3H_2(x)}{6\sqrt{n}} + \frac{(\mu_4-3)H_3(x)}{24n} + \frac{\mu_3^2H_5(x)}{72n}\right). $$ This may look like it’s a third-order Edgeworth expansion since there are 3 terms, it’s not. It’s second. Why? Because the order is defined as the highest $m$ for $n^{-m/2}$ in the denominator. And that’s also because of the following: $$ \sup_x\left|F_n(x) - \Phi(x) + \phi(x)\left( 1 + \frac{\mu_3H_2(x)}{6\sqrt{n}} + \frac{(\mu_4-3)H_3(x)}{24n} + \frac{\mu_3^2H_5(x)}{72n}\right) \right| = o\left(\frac{1}{n} \right). $$ Finally, let’s give those Hermite terms a form: $$ F_n(x) \approx \Phi(x) - \phi(x) \left(\frac{\mu_3(x^2-1)}{6\sqrt{n}} + \frac{(\mu_4-3)(x^3-3x)}{24n} + \frac{\mu_3^2(x^5-10x^3+15x)}{72n} \right). $$

So what?

I don’t know. You might have bigger fish to fry and would rather go to bed early and wake up early in the morning to have a productive day than to waste your time deriving some obtuse formula concocted decades ago. But hey, if you need it, you know where to find it!

In fact, you can deduce some interesting facts about the CLT.

  1. Skewness affects the quality of the normal approximation to $F_n$. It shouldn’t be surprising taking the average of $X_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(0.5)$ will converge faster to a normal distribution than $X_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(0.0001)$. Anyone who’s been in the statistics business should’ve intuited this but here’s a concrete proof for ya. To be extra nitpicky, if we hadn’t scaled our $X_i$’s to have unit variances, the first-order term would depend on $\mu_3/\sigma^3$, not just $\mu_3$. So skewness matters less if the density is so diffuse it’s flat af.
  2. The first-order term corrects the skewness as an even function of $x$. Which means it forces the distribution function to be symmetric about zero. This is the root-$n$ skewness correction.

We have a long way to go to get to our friend Howe but it’s weekend and today’s Daeyoung has some bubble tea to grab, so I’m gonna be leaving the rest of this post for future Daeyoung.

  1. You just wait. ↩︎

  2. For example, a reasonable dose of acetaminophen alone is harmless but if one drinks around the same time, it can cause severe liver damage. ↩︎

  3. in vitro vs in vivo ↩︎

  4. Not really. The FDA is the crazy bitch around here and you do what she tells you. ↩︎

  5. Remember, Tylenol and acetaminophen. Or Motrin and ibuprofen. ↩︎

  6. I phrased it like that for comedic effect but the request was justified on multiple grounds. First, the FDA doesn’t want developers to conduct massive studies on ineffective drugs due to ethical reasons (i.e., those people could’ve received effective treatments). So a huge sample is not always great, although a lot of people tend to discredit small-sample studies—it’s for your own sake! Second, those drugs that are ineffective to some people are proven harmless in the process of obtaining FDA approval—they’re just a little placebo-ish. ↩︎

  7. For the setting, we assume the test-drug measurements $T_{ij}\overset{\text{iid}}{\sim} N(\mu_T,\sigma_{WT}^2)$ and the reference-drug measurements $R_{ij}\overset{\text{iid}}{\sim} N(\mu_R,\sigma_{WR}^2)$ for $i=1,\ldots,n$ and $j=1,\ldots,r$. There are $n$ subjects and each subject is measured $r$ times. ↩︎

  8. The thing that’s on the right-hand side of the inequality ($\varepsilon$ vs $\varepsilon\sigma_{WR}$). ↩︎

  9. It’s called the Glass’s estimator and it follows a noncentral $t$-distribution. It’s not difficult to derive this distribution since $\bar{I}$ and $S_{WR}$ are independent, using the affine properties of a multivariate normal. The difference in our setting is that the denominator is not the standard deviation of the differences, but rather of the reference measurements, so the variances $(\sigma_{WT}^2, \sigma_{WR}^2)$ survive the cancellation. ↩︎

  10. If $a_n = o(b_n)$, then for every $\epsilon > 0$, even for a very very very very tiny number, there is an integer $N$ such that $|a_n| \leq \epsilon|b_n|$ for all $n>N$. Consider $a_n = o(1/n)$. $1/n$ already vanishes pretty fast as $n\to \infty$ but $a_n$ only has a finite number of terms for any $\epsilon >0$ that’s bigger than $\epsilon /n$. ↩︎

  11. Hermite polynomials unfortunately come in two versions because engineers quite frankly don’t give as much of a fuck about $e^{-x^2/2}$ as they do $e^{-x^2}$. We’re not engineers so we give some fucks about that one half. Pledge your fealty to normal distribution. ↩︎

  12. Signal engineers swear by it. Whisper “Fourier transform” into a signal engineer’s ear and they won’t shut up about it as if you’ve put a spell on them. ↩︎

  13. It’s still complicated as hell. You might need some Kumon practice if you make a lot of algebra mistakes. ↩︎

Daeyoung Lim
Daeyoung Lim
Statistics PhD Candidate

My research interests include Bayesian statistics, biostatistics, and computational statistics. I’m an English grammar fiend and a staunch proponent of plain language.