The AAPS Journal, Vol. 14, No. 3, September 2012 (© 2012)

DOI: 10.1208/s12248-012-9359-0

Research Article
Theme: Facilitating Oral Product Development and Reducing Regulatory Burden through Novel Approaches to Assess Bioavailability/Bioequivalence
Guest Editors: James Polli, Jack Cook, Barbara Davit, and Paul Dickinson

In vitro–In Vivo Correlations: Tricks and Traps

J.-M. Cardot1,3 and B. M. Davit2

Received 6 January 2012; accepted 5 April 2012; published online 1 May 2012
Abstract. In vitro–in vivo correlation (IVIVC) is a biopharmaceutical tool recommended for use in formulation development. When validated, it can speed up formulation development, be used to set dissolution limits, and serve as a surrogate for in vivo studies. However, like all tools, it has limitations and traps. The aim of the present paper is to investigate five common traps that can limit either the establishment or the use of IVIVC: (1) using mean or individual values; (2) correction of absolute bioavailability; (3) correction of lag time and time scaling; (4) the flip-flop model; and (5) predictability corrections.
KEY WORDS: biowaiver; in vitro–in vivo correlation (IVIVC); predictability; prediction; time scaling.

INTRODUCTION

Time is an important factor in successful drug development, and all strategies that shorten development while supporting the safety and overall quality of the product are encouraged. One tool that can be used in this race against the clock is in vitro–in vivo correlation (IVIVC), and its establishment, when possible, is recommended by the authorities (1–12). An acceptable IVIVC requires that the in vitro dissolution and the in vivo release or dissolution behavior of a dosage form be either similar or related to each other by a scalable relationship. An IVIVC can only be established when the factor controlling the appearance of the drug in the bloodstream is linked with the formulation (for example, a slow release (SR) formulation of a Class I drug) or with the characteristics of the active pharmaceutical ingredient (API; for example, slow in vivo dissolution of a Class II API presented as an immediate release (IR) formulation), and not with a physiological limiting factor (for example, rate-limiting permeation). In practice, formulations whose release from the dosage form is slower than the dissolution of the API, combined with high permeability (SR formulations of Class I and II drugs), are the best candidates, because everything then depends on the formulation. In IVIVC, the pharmacokinetic (PK) absorption or in vivo release parameters are related to the in vitro dissolution, which reflects the global performance of the drug product. A difference in dissolution could reflect, as expected in IVIVC, a difference in release from the dosage form based on known characteristics (e.g., polymer grade or quantity for an extended release formulation), a difference in API characteristics, or a difference in the manufacturing process. This is of great interest for identifying critical quality attributes.

When a correlation is established and validated, the in vivo profile can be predicted from the in vitro dissolution profile. Such predictions have a wide range of applications. They can serve as a surrogate for an in vivo study when the formulation is varied, and can also be used, for example, to validate a scale-up by a factor of more than 10, to justify widening of dissolution limits, to validate production transfers, or to modify the manufacturing process (2,6,8–10). In each case, the predictions guarantee the in vivo performance and the bioavailability or bioequivalence of the formulations.

However, making accurate predictions of in vivo performance from in vitro dissolution is not a straightforward process, and many traps can inadvertently bias the predictions. The quality of the predictions depends on the tools, hypotheses, and shortcuts used to establish them.

The aim of the present paper is to investigate common traps that could limit either the establishment or the use of IVIVC. Five common traps will be investigated in detail: (1) using mean or individual values; (2) correction of absolute bioavailability; (3) correction of lag time and time scaling; (4) presence of a flip-flop model; and (5) predictability corrections.
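The prediction step described above, in which an in vivo profile is obtained from an in vitro dissolution profile, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used by any particular software. It assumes a Level A IVIVC in which the fraction absorbed equals the fraction dissolved after an optional linear time scaling (a lag and a scale factor, as read from a Levy plot). It also assumes a one-compartment unit impulse response exp(-ke*t)/V. The function names `interp` and `predict_plasma`, their parameters, and the example values are hypothetical. The convolution is a simple discrete approximation in which the amount absorbed during each step is treated as a bolus at the end of that step.

```python
import math


def interp(t, ts, ys):
    """Linear interpolation of ys(ts) at t, held constant outside the sampled range."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def predict_plasma(t_vitro, frac_diss, dose, v, ke,
                   lag=0.0, scale=1.0, dt=0.05, t_end=24.0):
    """Hypothetical Level A prediction (illustrative assumptions only).

    Fraction absorbed at in vivo time t = fraction dissolved in vitro at
    (t - lag) / scale. The amount absorbed in each step is then convolved
    with a one-compartment unit impulse response exp(-ke * t) / v.
    """
    n = int(round(t_end / dt))
    grid = [i * dt for i in range(n + 1)]
    f_abs = [interp(max(t - lag, 0.0) / scale, t_vitro, frac_diss) for t in grid]
    # Amount entering the body during each step, treated as a bolus at the step end.
    step_input = [dose * (f_abs[i] - f_abs[i - 1]) for i in range(1, n + 1)]
    conc = [0.0]
    for k in range(1, n + 1):
        amount = sum(step_input[j - 1] * math.exp(-ke * (grid[k] - grid[j]))
                     for j in range(1, k + 1))
        conc.append(amount / v)
    return grid, conc


if __name__ == "__main__":
    t_vitro = [0, 1, 2, 4, 6, 8]              # h, hypothetical sampling times
    frac = [0, 0.25, 0.45, 0.70, 0.90, 1.0]   # hypothetical fraction dissolved
    grid, conc = predict_plasma(t_vitro, frac, dose=100, v=10, ke=0.3, lag=1.0)
    c_max = max(conc)
    print(f"predicted Cmax {c_max:.2f} at t = {grid[conc.index(c_max)]:.2f} h")
```

Because the whole input eventually reaches 100 %, the predicted AUC should approach dose/(V*ke). A lag shifts the whole predicted curve to the right, which is one reason a lag time correction must be applied consistently (see TIME SCALING AND LAG TIME CORRECTION).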
Disclaimer: The views presented in this article are those of the author (Dr. Davit) and do not necessarily reflect those of the US Food and Drug Administration.
1 UFR Pharmacie, ERT-CIDAM, Biopharmaceutical Department, Auvergne University, 28 Place H. Dunant, BP 38, 63001 Clermont-Ferrand, France.
2 Division of Bioequivalence II, Office of Generic Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, 7520 Standish Place, Rockville, Maryland, USA.
3 To whom correspondence should be addressed. (e-mail: j-michel.cardot@u-clermont1.fr)

The first of these traps, the use of mean or individual values, has been extensively described (13–17). Averaging the full set of data restricts the dataset to a limited number of points, which implies a loss of information. However, is this loss the key factor in the failure of an IVIVC? To answer this question, the averaging of in vitro data and the averaging of in vivo data, together with their implications, must be examined separately.
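To make this loss of information concrete, the sketch below uses a small set of hypothetical individual profiles with different Tmax values. It compares the Cmax of the arithmetic mean curve with the arithmetic mean of the individual Cmax values, and the arithmetic mean of the individual AUCs with their geometric mean. The concentrations are invented for illustration only, and the function names are hypothetical. With a common sampling grid and the linear trapezoidal rule, the AUC of the arithmetic mean curve equals the arithmetic mean of the individual AUCs, but Cmax does not. The geometric mean implied by log-transformed BE statistics differs again. This is why, as discussed in the In Vivo section, reference values must be computed in the same way as the predictions.

```python
import statistics


def auc_trapezoid(times, conc):
    """Linear trapezoidal AUC between the first and last sampling times."""
    return sum((t1 - t0) * (c0 + c1) / 2
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))


def compare_averaging(times, profiles):
    """Contrast parameters of the mean curve with means of individual parameters."""
    mean_curve = [statistics.fmean(col) for col in zip(*profiles.values())]
    cmax = [max(c) for c in profiles.values()]
    auc = [auc_trapezoid(times, c) for c in profiles.values()]
    return {
        "cmax_of_mean_curve": max(mean_curve),
        "arithmetic_mean_cmax": statistics.fmean(cmax),
        "auc_of_mean_curve": auc_trapezoid(times, mean_curve),
        "arithmetic_mean_auc": statistics.fmean(auc),
        "geometric_mean_auc": statistics.geometric_mean(auc),
    }


# Hypothetical profiles (arbitrary concentration units) with Tmax of 2, 3 and 4 h.
TIMES = [0, 1, 2, 3, 4, 6, 8]
PROFILES = {
    "S1": [0, 8, 10, 7, 5, 2, 1],
    "S2": [0, 2, 8, 10, 7, 3, 1],
    "S3": [0, 0, 3, 9, 11, 5, 2],
}

if __name__ == "__main__":
    for name, value in compare_averaging(TIMES, PROFILES).items():
        print(f"{name}: {value:.3f}")
```

With these invented numbers, the mean curve peaks at 26/3 (about 8.7), while the individual Cmax values average 31/3 (about 10.3). Both AUC summaries equal 38.5, and the geometric mean AUC is slightly lower.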

In Vitro

Averaging in vitro data for analysis is a common practice, used for example in the f1 and f2 tests (8). Such averaging may not be a serious problem, for the following reasons:

– The content of each dosage unit cannot be analyzed before the subjects take it, so the analysis cannot be adjusted for the actual content.
– Content uniformity is usually satisfied and the difference from the theoretical value is quite low. In a BE study, the drug content of the test product cannot differ from that of the reference listed product by more than 5 % (6). If that is not the case, a correction could be applied (see CORRECTION OF BIOAVAILABILITY), but only if the batch is homogeneous.
– In clinical practice, the dose stated on the label is the only one the prescribing physician will use. The physician does not adjust the prescription to the actual content but relies on the strength indicated on the label.
– Dissolution is a reproducible technique performed in a controlled environment (pH, temperature, medium composition, etc.), leading to tight results and low variability. Highly variable dissolution denotes non-homogeneous production or a non-robust formulation, which may be associated in vivo with large inter-individual variability. Conversely, if dissolution is not highly variable, it is not the key factor contributing to in vivo variability.

In Vivo

The question of averaging is more interesting here, and a pragmatic approach must be taken. Several questions must be answered to make the right decision. Is the variability mainly within or between subjects? In the first case (large intra-subject variability), the IVIVC cannot necessarily distinguish a difference between two formulations from the inherent subject variability. In other words, two different profiles could be due either to the large intra-subject variability or to a difference between the formulations. For this reason, IVIVCs are discouraged for highly variable drugs, because study power and predictability are low.

The second case, with usual or low intra-subject variability, is more interesting. Here, the first question is whether or not the mean curve reflects the individual behaviors. A simple pragmatic approach is to study two main parameters, the lag time (Tlag) and the time to maximum concentration (Tmax). To a lesser extent, the time of the last non-zero value or the terminal half-life can also be examined. When a lag time exists and the difference between subjects is significant (very different Tmax values), the mean curve does not reflect individual behavior (see Fig. 1, top right). IVIVC is then not recommended, as for some enteric-coated (EC) formulations. In such a case, the in vivo behavior of the formulation does not depend only on release from the formulation but is influenced mainly by physiology (for example, gastric emptying; Fig. 1, top). If the Tlag and Tmax of all subjects and of the mean curve are close together, using a mean curve will not dramatically modify the results. The only problem is then the loss of information, such as variability (discussed later).

Let us now consider two calculations based on data from Hemmingsen et al. (18) that fulfill the previous requirements.

In the first approach, the absorption curve is obtained by deconvolution of the mean plasma profile of an SR formulation. The mean plasma profile of the IR formulation serves as the response function, and the numerical deconvolution method of Phoenix™ WinNonlin® (Pharsight, a Certara company) is used. As described in Ref. (18), the deconvolution technique was applied independently of any absorption model. Deconvolution uses a numerical algorithm to isolate the input (absorption) function from the concentrations observed for the studied tablet and for the IR reference tablet. In the current case, this input function reflected the in vivo release observed after administration of the SR test tablets. The curves were then simulated from this input by convolution.

In the second approach, individual deconvolutions were performed using the same technique, and the mean of all absorption curves was calculated, as some agencies recommend. The results of the individual deconvolutions and of the mean curves are presented in Fig. 1 (bottom).

In the example presented, the results of both techniques are similar. Some authorities frequently request individual deconvolutions followed by averaging of the results. In the present case, this gives no different outcome from performing the deconvolution directly on the mean plasma concentration curve. The more interesting question is whether it is better to perform the IVIVC on an individual basis or on some type of averaged values.

Fig. 1. The top shows mean curves without and with lag time; the bottom shows plasma concentration curves and the related absorption (data from (18); individual and mean curves)

In the case of an individual IVIVC, two major points must be taken into consideration. First, is the IVIVC established on a common basis for all subjects or on individual equations (see also the time scaling and prediction sections below)? Performing an IVIVC with an individual adjustment of the equation means that the IVIVC reflects the specific characteristics of each individual in the initial data set. It therefore cannot be extrapolated to another set of subjects. This contradicts the universality of prediction that the IVIVC needs in order to act as an in vivo surrogate.

Second, which type of averaging is used? Many functions can be used to calculate a mean value: arithmetic, geometric, harmonic, quadratic, etc. Most scientists take the simplest and most self-explanatory route, the arithmetic mean. This seems an obvious approach and is a common way of calculating mean plasma concentration curves. However, some simple points must be addressed. The PK parameters used to assess bioavailability are usually analyzed on a log-transformed scale. After the 90 % confidence interval (90 % CI) is calculated, the results are back-transformed using the exponential function. Such a transformation implies, de facto, the calculation of geometric means, which decreases the weight of extreme values and increases the chance of obtaining normally distributed parameters. Taking this mode of calculation into account also means that the reference values for internal predictability must be calculated consistently. This means taking the Cmax and AUC of the mean curve, not the mean of the individual values (or, more complicated still, the LS means extracted from the general linear or mixed model), and using the appropriate method (geometric or arithmetic) to calculate the means.

In addition, if the IVIVC is established on all the absorbed-drug sampling points obtained for all subjects
(which would reinforce the power of the IVIVC), the correlation coefficient (r), which depends on the number of values used, will decrease. This must always be kept in mind when assessing the quality of an IVIVC. It is better to judge the quality of the regression (IVIVC) by a p value than by an r value.

Direct convolution techniques and Bayesian or neural network approaches have been proposed as alternatives (13–17,19–22). They allow the investigator to treat all the information in a single process using specific algorithms and software. These approaches require a thorough understanding of all the underlying calculation mechanisms and of the covariates to be included in the programming.

TIME SCALING AND LAG TIME CORRECTION

Time scaling and lag time correction are based on the same approach (2,5,7,11,23). They are needed if the in vitro and in vivo curves exhibit different rates and/or a difference in the onset of a phenomenon, that is, a difference in lag time. For lag time, two basic cases exist (Fig. 2), corresponding either to an in vivo or to an in vitro lag time. The first curve (triangles in Fig. 2) corresponds to the situation in which absorption is observed in vivo before any drug release is observed in the in vitro dissolution test. This indicates an inadequate in vitro test, which must be improved.

The second case corresponds to drug release detected in vitro before any absorption is seen in vivo (diamonds in Fig. 2). This case is more understandable. It could correspond either to a formulation characteristic (enteric-coated formulation, delayed release, or an SR formulation) or to physiology (delayed gastric emptying). If the delay is linked to a formulation characteristic and not to physiology, and only if a certain homogeneity exists between subjects, a lag time correction common to all subjects could be tried. That is not the case if the lag time is linked to a parameter that does not depend on any controlled process (such as the formulation) but only on physiology (gastric emptying). In such a situation, lag time correction should not be encouraged, because it is not reproducible.

When a lag time correction is applied, all further simulations, including predictability, must be based on the same correction, even if the formulation is later modified by the formulator. Such an approach is close to the spirit of the Japanese guideline on bioequivalence, which in certain cases allows a correction for in vitro lag time when comparing formulations (24,25).

Time scaling is simply an extension of the concept of correcting the rate difference between in vivo and in vitro data (Fig. 3). As shown in Fig. 3, the correction can be of two kinds (case 1 and case 2). However, time scaling does not have the same meaning and strength in the two cases.

The usual way to estimate the time scaling is to draw a Levy plot (2,5) (Fig. 4). The X-axis shows the time needed to reach given percentages of drug dissolved in vitro, and the Y-axis shows the time needed to reach the same percentages absorbed in vivo. In the case of a lag time, the intercept on the Y-axis gives the estimated lag time (Fig. 4, left; 1.1 h). If similar processes are observed in vitro and in vivo, the Levy plot is a straight line (Fig. 4, left). If that is not the case (Fig. 4, right), breaks can be observed in the Levy plot, which indicate the time at which the in vitro and in vivo processes diverged.

Levy plots raise several questions. Among them are the key questions of how to interpolate the in vitro data and how to validate a time scaling. Interpolation of in vitro data relies on the existing in vitro dissolution points and on functions that represent all observed values. The equations generally used include, for example, linear interpolation and the Weibull, Hill, double Weibull, Makoid–Banakar, Higuchi, spline, and polyexponential functions (26–28). The quality of the estimation depends on the number
of sampling points used to generate the dissolution profile. The best and safest approach is to schedule a large number of dissolution samples, which is simple when on-line analysis can be programmed. Interpolation between two points is then simple, and linear interpolation and more sophisticated methods give broadly similar results.

When only a few dissolution points exist, which is often the case for an IVIVC established a posteriori (after the end of development), the equations often over-parameterize the profiles and thus lead to inaccurate predictions. Over-parameterization starts when the number of parameters in the equation equals the number of samples minus one. For example, a double Weibull equation requires seven parameters to be estimated to fit the model to the data: a weighting factor, the amount released at infinity, two mean dissolution times, two slopes, and one intercept. If the dissolution dataset has at most 8 dissolution points, the fit could be perfect, but the parameters would be poorly estimated and the predictability would be subject to variation. The best equation for interpolating between two dissolution points is either one that reflects a known behavior observed in vitro or the simplest one, such as linear interpolation, provided the sampling points are close enough to avoid misestimation.

The Levy plots and time scaling for the two previous examples are presented in Fig. 4.

In Fig. 4, the left graph shows a good linear relationship. This denotes similar processes in vitro and in vivo and thus validates de facto the in vitro dissolution method, even though it is faster than the in vivo process. On the right side of Fig. 4, the relationship is not linear, and two or three slopes can be observed. This nonlinear time scaling indicates that the processes explored in vivo and in vitro were not consistently similar: as time increased, the in vivo process became slower relative to the in vitro process.

As described in (29), the position of the formulation was monitored by scintigraphy through the small intestine, the ascending and transverse colon, and the descending colon. The rate of absorption was estimated to decrease gradually in each part of the intestine. This modeling of absorption could be directly linked to the observed plasma curves and the GI transit. The nonlinear time scaling could thus be explained by the location of the tablet in the GI tract and the resulting absorption rates (see (18,29) for more explanation). In this latter case, the time scaling is linked with observed physiology and is thus validated by observation.

In all other cases, nonlinear time scaling must be investigated and the reason for the lack of consistency between in vitro and in vivo data determined.

Fig. 2. Lag time correction, different cases

FLIP-FLOP MODEL

The Wagner–Nelson method is a mass balance equation that allows absorption to be calculated for the one-compartment model, as stated in guidelines (7,8). As presented in Eq. 1, it uses the observed concentrations (C(t)), the AUC, and the apparent elimination rate constant determined from the data (ke).

$$A\% = \frac{C(t) + k_e \cdot AUC_0^t}{k_e \cdot AUC_0^\infty} \times 100 \tag{1}$$

The values given by this equation range from 0 to 100 %. In some cases when the Wagner–Nelson equation is used, a flip-flop model could exist (30,31). This occurs especially with SR formulations, where the absorption rate is much lower than the elimination rate. In this case, the terminal declining phase of the plasma concentration curve, which normally reflects the elimination rate (ke), actually reflects the absorption rate (ka). Conversely, the initial rising part of the curve, which normally reflects the absorption rate (ka), actually represents the elimination rate (ke). The equation is presented in Eq. 2 for the normal model and in Eq. 3 for the flip-flop model.

$$C(t) = \frac{F \cdot D \cdot k_a}{V \cdot (k_a - k_e)} \left( e^{-k_e t} - e^{-k_a t} \right) \tag{2}$$

In Eq. 2, F is the bioavailable fraction, D the dose, V the volume of distribution, and ka and ke the absorption and elimination rate constants.
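Eq. 1 can be checked numerically on data generated with Eq. 2. The sketch below rests on stated assumptions rather than on any guideline's prescribed procedure. AUC from 0 to t uses the linear trapezoidal rule, AUC from 0 to infinity is extrapolated as C_last/ke, and ke is supplied by the user. The function names, dose, volume, and rate constants are hypothetical values chosen for illustration, not data from this paper. When ka > ke (no flip-flop) and sampling is dense, the percent absorbed returned by Eq. 1 should closely track the true first-order input, 100*(1 - exp(-ka*t)).

```python
import math


def conc_eq2(t, dose, f, v, ka, ke):
    """Eq. 2: one-compartment model with first-order absorption (ka != ke)."""
    return f * dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def wagner_nelson(times, conc, ke):
    """Eq. 1: percent absorbed at each sampling time.

    AUC(0-t) uses the linear trapezoidal rule; AUC(0-inf) is extrapolated
    from the last sample as C_last / ke.
    """
    auc = [0.0]
    for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]):
        auc.append(auc[-1] + (t1 - t0) * (c0 + c1) / 2)
    auc_inf = auc[-1] + conc[-1] / ke
    return [100 * (c + ke * a) / (ke * auc_inf) for c, a in zip(conc, auc)]


if __name__ == "__main__":
    ka, ke = 1.5, 0.2                       # hypothetical rate constants (1/h), ka > ke
    times = [0.25 * i for i in range(97)]   # 0 to 24 h
    conc = [conc_eq2(t, dose=100, f=1.0, v=20, ka=ka, ke=ke) for t in times]
    for t, pct in zip(times, wagner_nelson(times, conc, ke)):
        if t in (1.0, 2.0, 4.0):
            true_pct = 100 * (1 - math.exp(-ka * t))
            print(f"t = {t:4.1f} h  W-N {pct:5.1f} %  true {true_pct:5.1f} %")
```

By construction, the last value of Eq. 1 is always 100 %. This illustrates why %FD hides differences in extent, as discussed under CORRECTION OF BIOAVAILABILITY.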

Fig. 3. Different rates between in vitro and in vivo data: left, case 1; right, case 2
Fig. 4. Levy plots for the examples presented in Fig. 3 (right, extracted from (18))
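The flip-flop trap described by Eq. 3 can also be illustrated numerically. The sketch assumes hypothetical values ka = 0.1 1/h and ke = 0.5 1/h, so that absorption is much slower than elimination, as for an SR formulation. Under these assumptions, the terminal log-linear slope of the simulated profile returns ka rather than ke. Eq. 1 computed with this apparent constant then gives an absorption that is much faster than the true input. In contrast, Eq. 1 computed with the ke obtained from IV or IR data recovers the true input. The block redefines Eq. 1 and Eq. 2 as small helper functions so that it runs on its own. The function names, sampling design, and number of terminal points are arbitrary illustrative choices.

```python
import math


def conc_eq2(t, dose, f, v, ka, ke):
    """Eq. 2: one-compartment model with first-order absorption (ka != ke)."""
    return f * dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def wagner_nelson(times, conc, ke):
    """Eq. 1 with trapezoidal AUC(0-t) and AUC(0-inf) = AUC(0-tlast) + C_last / ke."""
    auc = [0.0]
    for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]):
        auc.append(auc[-1] + (t1 - t0) * (c0 + c1) / 2)
    auc_inf = auc[-1] + conc[-1] / ke
    return [100 * (c + ke * a) / (ke * auc_inf) for c, a in zip(conc, auc)]


def terminal_slope(times, conc, n_points):
    """Apparent terminal rate constant from a log-linear fit of the last points."""
    ts = times[-n_points:]
    ys = [math.log(c) for c in conc[-n_points:]]
    t_mean, y_mean = sum(ts) / n_points, sum(ys) / n_points
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return -slope


KA, KE = 0.1, 0.5  # hypothetical flip-flop case: absorption slower than elimination
times = [0.5 * i for i in range(145)]  # 0 to 72 h
conc = [conc_eq2(t, dose=100, f=1.0, v=20, ka=KA, ke=KE) for t in times]

k_apparent = terminal_slope(times, conc, n_points=20)  # close to KA, not KE
wn_apparent = wagner_nelson(times, conc, k_apparent)   # absorption looks too fast
wn_true_ke = wagner_nelson(times, conc, KE)            # ke taken from IV/IR data

if __name__ == "__main__":
    i = times.index(4.0)
    print(f"apparent terminal constant: {k_apparent:.3f} 1/h")
    print(f"absorbed at 4 h: apparent ke {wn_apparent[i]:.1f} %, "
          f"IV/IR ke {wn_true_ke[i]:.1f} %, true {100 * (1 - math.exp(-KA * 4)):.1f} %")
```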

$$C(t) = \frac{F \cdot D \cdot k_e}{V \cdot (k_e - k_a)} \left( e^{-k_a t} - e^{-k_e t} \right) \tag{3}$$

In the case of flip-flop kinetics, many parameters that influence BE studies and the IVIVC are modified:

– AUC from 0 to infinity is overestimated, leading to an incorrect estimation of relative or absolute bioavailability.
– Absorption is overestimated and appears faster than the real absorption, leading to a poor IVIVC.

This example is described in Fig. 5.

Fig. 5. Wagner–Nelson in case of flip-flop; on the right, the difference in terminal half-life between IR and SR is clearly visible

In the case of flip-flop, the IVIVC will not lead to a good estimate. Nonetheless, simple tricks exist to avoid such a problem. For example, the terminal half-life obtained for the formulations under consideration must be compared with IR or IV data (observed or published). In addition, a terminal half-life that changes between formulations, and increases when the formulation is designed to release over a longer time, is a good sign of a potential problem. In this case, the Wagner–Nelson method can be used only if a ke estimated from an IV or IR formulation is available.

CORRECTION OF BIOAVAILABILITY

During the development of new drugs and dosage forms, one of the main concerns of the development team is the correct estimation of bioavailability (absolute (F) or relative (f)).

In some cases, the development team may want to improve this absolute bioavailability. This can be done by modifying the API itself (new salt, crystal size, polymorph, etc.) or by improving the formulation (use of specific excipients, creation of a controlled release formulation, etc.). In addition, the absorption of the substance into the blood could be governed by the solubility of the API itself, the release of the API from the dosage form, or the permeability through the intestinal membrane. It can also be governed by efflux proteins such as P-gp and by first-pass metabolism. The absolute bioavailability is a function of the first-pass intestinal, membrane, and hepatic effects, and can be estimated according to Eq. 4:

$$F = 1 - \sum \left( \text{first-pass effect} \right) \tag{4}$$

A change in one of the parameters related to the intestinal, membrane, or hepatic behavior of the drug or of the dosage form could change the absolute bioavailability.

Fig. 6. Curve simulation with an overestimation of the F

The in vivo data are derived from the plasma concentration curve. After the calculations are performed, the input curve, known as the "absorption" curve, is presented versus time. It is expressed either as the percent of the fraction of dose absorbed (%FD) or as the percent of dose (%D). If the data are presented as %D, rate and extent are confounded. This is because the maximum absorption corresponds to the fraction of the dose of the test formulation relative to the formulation used to perform the deconvolution (F in the case of IV, or f for an oral solution or IR formulation). If %FD is used, only the rate of absorption
is calculated, because all formulations reach 100 % at the end. This is particularly the case with the Wagner–Nelson method (see above). With %FD, the value of F or f must be reintroduced in the convolution step to correct the final simulated concentration–time profiles. This presentation in %FD allows the rate of absorption to be distinguished from the extent of absorption (F or f).

This absorption curve, which is the input curve of the drug into the body, depends on the dosage form and the properties of the active pharmaceutical ingredient. It then depends on the drug's pharmacokinetic input processes (first-pass effect [FPE], location and type of absorption (32–35)). As in any chain of phenomena, the observation made at the end of the chain (blood concentrations) is determined by the slowest phenomenon in the whole chain. In practice, no modification of the bioavailability is allowed. This assumption is sensible when none of the mechanisms underlying release and absorption depends on the dose included in the formulation, on its release rate, or on any physiological problem.

If the release or availability of the drug is increased by any modification technique, the quantity that might be absorbed will depend on the formulation and will not be easily predicted. For this reason, for biowaivers as well as for IVIVC (2,4,7,8,36–38), it is strongly recommended not to change, between formulations, the qualitative and quantitative composition of excipients that might influence the solubilization and/or the permeation of the active ingredient. The problem could be similar for oral formulations or implants. For example, for an oral formulation that releases the drug over a long period, the limiting factor is the duration of transit.

If the fraction of the dose available for absorption is increased relative to the formulation used to establish the IVIVC, the simulated profile calculated from the in vitro release could have the proper shape but not the proper magnitude, because the exact dose absorbed cannot be estimated (Fig. 6). In this case, both the AUC and Cmax are overestimated by a similar magnitude.

Simulations performed with a Level A IVIVC can be used as a surrogate for in vivo data and can replace a bioequivalence study. For this reason, authorities do not allow a correction of bioavailability to adjust the curves to the expected results. This position is understandable: in such a case, the IVIVC does not have good predictability and thus cannot anticipate the bioavailability in vivo.

However, a simple question could be asked: could content uniformity (the content of the formulations, as established on the certificate of analysis) be taken into account? Guideline (38) partly answers this question. It specifies that if the content of the formulations differs by more than 5 %, a correction of the BE study results could be envisaged.

PREDICTABILITY AND CORRECTION OF THE REFERENCE

The last step before an IVIVC can be used for a biowaiver or as a surrogate for any in vivo data is validation by predictability (2,4,5,7,8).

Predictability is the ability of the IVIVC to predict the in vivo data accurately. It can be internal predictability (recalculation of the initial data) or external predictability (a new set of data). In the latter case, the data could come from the same BE study or from a new one (Fig. 7).

Fig. 7. Internal vs. external predictability
Fig. 8. Comparison of data between two studies. The curves on the left show the initial data; the curves on the right show the corrected data

If a new study is used to validate the IVIVC or to confirm some calculations made with the IVIVC, the same subjects rarely participated in both studies. In this case, the response could differ for many reasons, ranging from basic ones (sex distribution, age,
etc.) to more complicated ones (different genotypes or phenotypes). In this case, the subjects could respond differently to the drug, which would challenge the IVIVC. Figure 8 presents some data from two studies.

As shown in Fig. 8, the two test formulations are superimposable, which leads to the conclusion that they are equivalent. A more careful analysis, however, brings additional information. The reference formulation can be assumed to be of constant quality compared with the newly tested formulation. Under this assumption, the two curves corresponding to the reference must have similar patterns and magnitudes. If they do not, the subjects or patients did not react in the same way to this formulation. The test formulation could then be expected to show similar differences, and the new test formulation must therefore be reevaluated with this point in mind. The simplest correction factor is the ratio of the AUC values of the reference formulation in the two studies (20 % in this case). This correction leads to the curves shown on the right in Fig. 8. After correction, the two reference curves are comparable in terms of AUC, although this is no longer the case for Cmax. This approach helps to compare data between studies and to make a better decision about the formulation, but it cannot be used to support any surrogate application.

As stated in the first part of this paper for establishing an IVIVC, the prediction could be performed on the mean value or on an individual basis. Figure 9 presents the results of such a prediction.

Fig. 9. Prediction based on individual data compared with mean prediction. The curves on the left show the original individual data values, whereas the curves on the right show the predicted individual data values

When plasma concentrations are predicted from individual data, the input function is the same for all subjects, because both the IVIVC and the in vitro dissolution are common to all subjects. This leads to an identical shape for all predicted curves, even if their magnitudes differ. The advantage of such individual predictability is that it avoids restricting the pool of data to a single set called a mean curve. This mean curve cannot adequately estimate the variability of response and underestimates the subject effect. IVIVC is usually performed when intra-subject variability is lower than inter-subject variability. However, individual predictability does not reflect intra-subject variability; it reflects only the inter-subject variability. In any case, the variability of the predicted data set equals the variability of the initial one. It therefore cannot estimate a specific effect of the formulation on the variability between subjects.

The next point to take into account for predictability is the data used to calculate the percentage of error. When mean curves are used (either plasma concentration or absorption), the simulated data refer to this mean curve. The Cmax and AUC of this mean curve differ from the means of the individual Cmax and AUC values, so these parameters must be calculated in the same way in both cases to allow an accurate comparison. When parameters are calculated on a subject basis, the mean value could be different from the parameters presented in the results of the
