Statistical Thinking for Non-Statisticians in Drug Regulation
Ebook, 515 pages, 5 hours

About this ebook

Written by a well-known lecturer and consultant to the pharmaceutical industry, this book focuses on the pharmaceutical non-statistician working within a very strict regulatory environment. Statistical Thinking for Non-Statisticians in Drug Regulation presents the concepts and statistical thinking behind medical studies with a direct connection to the regulatory environment so that readers can be clear where the statistical methodology fits in with industry requirements. Pharmaceutical-related examples are used throughout to set the information in context. As a result, this book provides a detailed overview of the statistical aspects of the design, conduct, analysis and presentation of data from clinical trials within drug regulation.

Statistical Thinking for Non-Statisticians in Drug Regulation:

  • Assists pharmaceutical personnel in communicating effectively with statisticians using statistical language
  • Improves the ability to read and understand statistical methodology in papers and reports and to critically appraise that methodology
  • Helps readers to understand the statistical aspects of the regulatory framework better, quoting extensively from regulatory guidelines issued by the EMEA (European Medicines Evaluation Agency), the ICH (International Conference on Harmonisation) and the FDA (Food and Drug Administration)
Language: English
Release date: May 20, 2013
ISBN: 9781118702352
Author

Richard Kay

    Book preview

    Statistical Thinking for Non-Statisticians in Drug Regulation - Richard Kay

    1

    Basic ideas in clinical trial design

    1.1 Historical perspective

    As many of us who are involved in clinical trials will know, the randomised, controlled trial is a relatively new invention. As pointed out by Pocock (1983) and others, very few clinical trials of the kind we now regularly see were conducted prior to 1950. It took a number of high-profile successes, plus the failure of alternative methodologies, to convince researchers of the value of such trials.


    Example 1.1: The Salk Polio Vaccine trial

    One of the largest trials ever conducted took place in the US in 1954 and concerned the evaluation of the Salk Polio Vaccine. The trial has been reported extensively by Meier (1978) and is used by Pocock (1983) in his discussion of the historical development of clinical trials.

    Within the project there were essentially two trials and these clearly illustrated the effectiveness of the randomised controlled design.

    Trial 1: Original design; observed control

    1.08 million children from selected schools were included in this first trial. The second graders in those schools were offered the vaccine while the first and third graders would serve as the control group. Parents of the second graders were approached for their consent and it was noted that the consenting parents tended to have higher incomes. Also, this design was not blinded so that both parents and investigators knew which children had received the vaccine and which had not.

    Trial 2: Alternative design; randomised control

    A further 0.75 million children in other selected schools in grades one to three were to be included in this second trial. All parents were approached for their consent and those children where consent was given were randomised to receive either the vaccine or a placebo injection. The trial was double-blind with parents, children and investigators unaware of who had received the vaccine and who had not.

    The results from the randomised control trial were conclusive. The incidence of paralytic polio, for example, was 0.057 per cent in the placebo group compared to 0.016 per cent in the active group, and there were four deaths in the placebo group compared to none in the active group. The results from the observed control trial, however, were less convincing, with a smaller observed difference (0.046 per cent versus 0.017 per cent). In addition, in the cases where consent could not be obtained, the incidence of paralytic polio was 0.036 per cent in the randomised trial and 0.037 per cent in the observed control trial, event rates considerably lower than those amongst placebo patients and in the untreated controls respectively. This has no impact on the conclusions from the randomised trial, which is robust against this absence of consent; the randomised part is still comparing like with like. In the observed control part, however, the fact that the ‘no consent’ (grade 2) children have a lower incidence than those children (grades 1 and 3) who were never offered the vaccine potentially causes some confusion in a non-randomised comparison; does it mean that grade 2 children naturally have lower incidence than those in grades 1 and 3? Whatever the explanation, the presence of this uncertainty reduced confidence in other aspects of the observed control trial.


    The randomised part of the Salk Polio Vaccine trial has all the hallmarks of modern-day trials: randomisation, a control group and blinding. It was experiences of this kind that helped convince researchers that only under these conditions can clear, scientifically valid conclusions be drawn.

    1.2 Control groups

    We invariably evaluate our treatments by making comparisons; active compared to control. It is very difficult to make absolute statements about specific treatments and conclusions regarding the efficacy and safety of a new treatment are made relative to an existing treatment or placebo.

    ICH E10 (2001): ‘Note for Guidance on Choice of Control Group in Clinical Trials’

    ‘Control groups have one major purpose: to allow discrimination of patient outcomes (for example, changes in symptoms, signs, or other morbidity) caused by the test treatment from outcomes caused by other factors, such as the natural progression of the disease, observer or patient expectations, or other treatment.’

    Control groups can take a variety of forms; here are just a few examples of trials with alternative types of control group:

    Active versus placebo

    Active A versus active B (versus active C)

    Placebo versus dose level 1 versus dose level 2 versus dose level 3 (dose-finding)

    Active A+active B versus active A+placebo (add-on)

    The choice will depend on the objectives of the trial.

    Open trials with no control group can nonetheless be useful in an exploratory, maybe early phase setting, but it is unlikely that such trials will be able to provide confirmatory, robust evidence regarding the performance of the new treatment.

    Similarly, external or historical controls (groups of subjects external to the study either in a different setting or previously treated) cannot provide definitive evidence. Byar (1980) provides an extensive discussion on these issues.

    1.3 Placebos and blinding

    It is important to have blinding of both the subject and the investigator wherever possible to avoid unconscious bias creeping in, either in terms of the way a subject reacts psychologically to a treatment or in relation to the way the investigator influences or records subject outcome.

    ICH E9 (1998): ‘Note for Guidance on Statistical Principles for Clinical Trials’

    ‘Blinding or masking is intended to limit the occurrence of conscious or unconscious bias in the conduct and interpretation of a clinical trial arising from the influence which the knowledge of treatment may have on the recruitment and allocation of subjects, their subsequent care, the attitudes of subjects to the treatments, the assessment of the end-points, the handling of withdrawals, the exclusion of data from analysis, and so on.’

    Ideally the trial should be double-blind with both the subject and the investigator being blind to the specific treatment allocation. If this is not possible for the investigator, for example, then the next best thing is to have an independent evaluation of outcome, both for efficacy and for safety. A single-blind trial arises when either the subject or investigator, but not both, is blind.

    An absence of blinding can seriously undermine the validity of an endpoint in the eyes of regulators and the scientific community more generally, especially when the evaluation of that endpoint has an element of subjectivity. In situations where blinding is not possible it is essential to use hard, unambiguous endpoints.

    The use of placebos and blinding go hand in hand. The existence of placebos enables trials to be blinded and to account for the placebo effect: the change in a patient’s condition that is due to the act of being treated, but is not caused by the active component of that treatment.

    1.4 Randomisation

    Randomisation is clearly a key element in the design of our clinical trials. There are two reasons why we randomise subjects to the treatment groups:

    To avoid any bias in the allocation of the patients to the treatment groups

    To ensure the validity of the statistical test comparisons

    Randomisation lists are produced in a variety of ways and we will discuss several methods later. Once the list is produced the next patient entering the trial receives the next allocation within the randomisation scheme. In practice this process is managed by ‘packaging’ the treatments according to the pre-defined randomisation list.

    There are a number of different possibilities when producing randomisation lists:

    Unrestricted randomisation

    Block randomisation

    Unequal randomisation

    Stratified randomisation

    Central randomisation

    Dynamic allocation and minimisation

    Cluster randomisation

    1.4.1 Unrestricted randomisation

    Unrestricted (or simple) randomisation is simply a random list of, for example, As and Bs. In a moderately large trial, with say n = 200 subjects, such a process will likely produce approximately equal group sizes. There is no guarantee however that this will automatically happen and in small trials, in particular, this can cause problems.
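
    The risk of unequal group sizes under unrestricted randomisation can be explored by simulation. The sketch below is only an illustration: the function names, the number of simulated lists and the seed are assumptions, not part of any standard procedure. It compares a small trial (n = 20) with the n = 200 case mentioned above.

```python
import random

def simple_randomisation(n, rng):
    """Return n independent allocations, each 'A' or 'B' with probability 1/2."""
    return [rng.choice("AB") for _ in range(n)]

def imbalance_summary(n, n_sim=2000, seed=1):
    """Mean and largest absolute difference in group sizes over n_sim simulated lists."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sim):
        allocations = simple_randomisation(n, rng)
        diffs.append(abs(allocations.count("A") - allocations.count("B")))
    return sum(diffs) / n_sim, max(diffs)

for n in (20, 200):
    mean_diff, worst = imbalance_summary(n)
    print(f"n = {n}: mean |A - B| = {mean_diff:.1f}, largest = {worst}")
```

    The absolute difference tends to grow with n, but as a proportion of the sample size it shrinks, which is why the problem matters most in small trials.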

    1.4.2 Block randomisation

    To ensure balance in terms of numbers of subjects, we usually undertake block randomisation where a randomisation list is constructed by randomly choosing from the list of potential blocks. For example, there are six ways of allocating two As and two Bs in a ‘block’ of size four:

    AABB, ABAB, ABBA, BAAB, BABA, BBAA

    and we choose at random from this set of six blocks to produce our randomisation list, for example:

    ABBA BAAB ABAB ABBA,…

    Clearly if we recruit a multiple of four patients into the trial we will have perfect balance, and approximate balance (which is usually good enough) for any sample size.
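
    A minimal sketch of this procedure follows. It enumerates the possible blocks for two As and two Bs, draws blocks at random to build a list, and checks how far the group sizes can drift apart along the way. The function names and the seed are illustrative assumptions.

```python
import random
from itertools import permutations

def possible_blocks(pattern="AABB"):
    """All distinct orderings of the letters in `pattern`, in sorted order."""
    return sorted({"".join(p) for p in permutations(pattern)})

def block_randomisation(n_blocks, rng, pattern="AABB"):
    """Build a randomisation list by drawing n_blocks blocks at random, with replacement."""
    blocks = possible_blocks(pattern)
    return "".join(rng.choice(blocks) for _ in range(n_blocks))

def max_running_imbalance(allocations):
    """Largest |#A - #B| seen at any point while working through the list."""
    diff, worst = 0, 0
    for arm in allocations:
        diff += 1 if arm == "A" else -1
        worst = max(worst, abs(diff))
    return worst

print(possible_blocks())  # the six blocks for two As and two Bs
schedule = block_randomisation(5, random.Random(2013))  # five blocks, 20 patients
print(schedule, max_running_imbalance(schedule))
```

    With blocks of size four the running imbalance can never exceed two, whatever the point at which recruitment stops.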

    In large trials it could be argued that block randomisation is unnecessary. In one sense this is true; overall balance will be achieved by chance with an unrestricted randomisation list. However, it is usually the case that large trials will be multicentre trials, and not only is it important to have balance overall, it is also important to have balance within each centre. In practice, therefore, we would allocate several blocks to each centre, for example five blocks of size four if we are planning to recruit 20 patients from each centre. This will ensure balance within each centre and also overall.

    How do we choose block size? There is no magic formula but more often than not the block size is equal to two times the number of treatments.

    What are the issues with block size?

    ICH E9 (1998): ‘Note for Guidance on Statistical Principles for Clinical Trials’

    ‘Care must be taken to choose block lengths which are sufficiently short to limit possible imbalance, but which are long enough to avoid predictability towards the end of the sequence in a block. Investigators and other relevant staff should generally be blind to the block length…’

    Shorter block lengths are better at producing balance. With two treatments a block length of four is better at producing balance than a block length of 12. The block length of four gives perfect balance if there is a multiple of four patients entering, whereas with a block length of 12, perfect balance is only going to be achieved if there is a multiple of 12 patients in the study.

    The problem, however, with the shorter block lengths is that this is an easy code to crack and inadvertent unblinding can occur. For example, suppose a block length of four was being used in a placebo-controlled trial and also assume that experience of the active drug suggests that many patients receiving that drug will suffer nausea. Suppose the trial begins and the first two patients suffer nausea. The investigator is likely to conclude that both these patients have been randomised to active and that therefore the next two allocations are to placebo. This knowledge could influence the investigator’s willingness to enter certain patients into the next two positions in the randomisation list, causing bias in the mix of patients randomised into the two treatment groups.

    Note the comment in the ICH guideline regarding keeping the investigator (and others) blind to the block length. While in principle this comment is sound, the drug is often delivered to a site according to the chosen block length, making it difficult to conceal information on block size. If the issue of inadvertent unblinding is going to cause problems then more sophisticated methodologies can be used, such as having the block length itself varying, perhaps randomly chosen from two, four or six.
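
    One way of varying the block length can be sketched as follows, assuming two treatments and block sizes drawn with equal probability from two, four and six. The function names, the equal weighting of the sizes and the seed are assumptions made for illustration.

```python
import random

def random_block(size, rng):
    """One block of even `size` with equal numbers of A and B in random order."""
    block = list("A" * (size // 2) + "B" * (size // 2))
    rng.shuffle(block)
    return "".join(block)

def variable_block_list(n_patients, rng, sizes=(2, 4, 6)):
    """Join blocks of randomly chosen size until n_patients allocations exist."""
    allocations = ""
    while len(allocations) < n_patients:
        allocations += random_block(rng.choice(sizes), rng)
    return allocations[:n_patients]  # the final block may be cut short

schedule = variable_block_list(40, random.Random(7))
print(schedule, schedule.count("A"), schedule.count("B"))
```

    Because the block boundaries are no longer at fixed positions, an investigator who suspects the first few allocations cannot be sure where a block ends. The price is slightly looser balance: the difference in group sizes can reach three if recruitment stops part way through a block of six.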

    1.4.3 Unequal randomisation

    All other things being equal, having equal numbers of subjects in the two treatment groups provides the maximum amount of information (the greatest power) with regard to the relative efficacy of the treatments. There may, however, be issues that override statistical efficiency:

    It may be necessary to place more patients on active compared to placebo in order to obtain the required safety information.

    In a three-group trial with active A, active B and placebo (P), it may make sense to have a 2:2:1 randomisation to give more power for the A versus B comparison, as that difference is likely to be smaller than the A versus P and B versus P differences.

    Unequal randomisation is sometimes needed as a result of these considerations. In the second example above, the randomisation list would be designed with double the number of A and B allocations compared to placebo.

    For unequal randomisation we would choose the block size accordingly. For a 2:1 randomisation to A or P we could randomly choose from the blocks:

    AAP, APA, PAA
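
    For a 2:1 allocation every block of three must contain two As and one P, and for a 2:2:1 allocation every block of five must contain two As, two Bs and one P. The short sketch below enumerates the possible blocks in each case; the function name is an illustrative assumption.

```python
from itertools import permutations

def distinct_blocks(pattern):
    """All distinct orderings of the letters in `pattern`, in sorted order."""
    return sorted({"".join(p) for p in permutations(pattern)})

print(distinct_blocks("AAP"))         # ['AAP', 'APA', 'PAA']
print(len(distinct_blocks("AABBP")))  # 30 possible blocks for a 2:2:1 ratio
```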

    1.4.4 Stratified randomisation

    Block randomisation therefore forces the required balance in terms of the numbers of patients in the treatment groups, but things can still go wrong. For example, let’s suppose in an oncology study with time to death as the primary endpoint that we can measure baseline risk (say in terms of the size of the primary tumour) and classify patients as either high risk (H) or low risk (L) and further suppose that the groups turn out as follows:

    A: HHLHLHHHHLLHHHLHHLHHH (H = 15, L = 6)

    B: LLHHLHHLLHLHLHLHHLLHLL (H = 10, L = 12)

    Note that there are 15 patients (71 per cent) high risk and six (29 per cent) low risk patients in treatment group A compared to a split of 10 (45 per cent) high risk and 12 (55 per cent) low risk patients in treatment group B.

    Now suppose that the mean survival times are observed to be 21.5 months in A and 27.8 months in group B. What conclusions can we draw? It is very difficult; the difference we have seen could be due to treatment differences or could be caused by the imbalance in terms of differential risk across the groups, or a mixture of the two. Statisticians talk in terms of confounding (just a fancy way of saying ‘mixed up’) between the treatment effect and the effect of baseline risk. This situation is very difficult to unravel and we avoid it by stratified randomisation to ensure that the ‘case mix’ in the treatment groups is comparable.

    This simply means that we produce separate randomisation lists for the high risk and the low risk patients, the strata in this case. For example the following lists (which are block size four in each case):

    H: ABBAAABBABABABABBBAAABBAABABBBAA

    L: BAABBABAAABBBAABABABBBAABBAABAAB

    will ensure firstly that we end up with balance in terms of group sizes but also secondly that both the high and low risk patients will be equally split across those groups, that is balance in terms of the mix of patients.
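
    The mechanics can be sketched as follows: one blocked list per stratum, with each new patient taking the next unused allocation from the list for their own stratum. The class and function names, the seed and the order in which patients arrive are assumptions made for illustration; the arrival order simply re-uses the risk labels from the example above.

```python
import random
from collections import Counter

def blocked_list(n_blocks, rng, pattern="AABB"):
    """A randomisation list made of n_blocks randomly ordered copies of `pattern`."""
    allocations = []
    for _ in range(n_blocks):
        block = list(pattern)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations

class StratifiedRandomiser:
    """Keeps a separate blocked list, and a separate position, for each stratum."""

    def __init__(self, strata, n_blocks, seed=0):
        rng = random.Random(seed)
        self.lists = {s: blocked_list(n_blocks, rng) for s in strata}
        self.used = Counter()

    def allocate(self, stratum):
        arm = self.lists[stratum][self.used[stratum]]
        self.used[stratum] += 1
        return arm

randomiser = StratifiedRandomiser(strata=["H", "L"], n_blocks=8, seed=1)
arrivals = "HHLHLHHHHLLHHHLHHLHHH" + "LLHHLHHLLHLHLHLHHLLHLL"  # hypothetical order of entry
table = Counter((risk, randomiser.allocate(risk)) for risk in arrivals)
for key in sorted(table):
    print(key, table[key])
```

    Within each stratum the numbers on A and B can differ by at most two at any time, so the mix of high and low risk patients stays comparable across the treatment groups.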

    Having separate randomisation lists for the different centres in a multi-centre trial to ensure ‘equal’ numbers of patients in the treatment groups within each centre is using ‘centre’ as a stratification factor; this will ensure that we do not end up with treatment being confounded with centre.

    ICH E9 (1998): ‘Note for Guidance on Statistical Principles for Clinical Trials’

    ‘It is advisable to have a separate random scheme for each centre, i.e. to stratify by centre or to allocate several whole blocks to each centre. Stratification by important prognostic factors measured at baseline (e.g. severity of disease, age, sex, etc.) may sometimes be valuable in order to promote balanced allocation within strata…’

    Where the requirement is to have balance in terms of several factors, a stratified randomisation scheme using all combinations of these factors to define the strata would ensure balance. For example if balance is required for sex and age, then a scheme with four strata:

    Males, < 50 years

    Females, < 50 years

    Males, ≥ 50 years

    Females, ≥ 50 years

    will achieve the required balance.
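
    The number of strata is the product of the numbers of levels of the factors, so it grows quickly as factors are added. A small sketch, in which the function name and the ten hypothetical centres are assumptions:

```python
from itertools import product

def strata_for(factors):
    """Every combination of factor levels; each combination is one stratum."""
    return list(product(*factors.values()))

factors = {"sex": ["Males", "Females"], "age": ["< 50 years", ">= 50 years"]}
for stratum in strata_for(factors):
    print(", ".join(stratum))

# Adding a third factor, here ten hypothetical centres, multiplies the count.
factors["centre"] = [f"centre {i}" for i in range(1, 11)]
print(len(strata_for(factors)))  # 2 x 2 x 10 = 40 strata
```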

    1.4.5 Central randomisation

    In central randomisation the randomisation process is controlled and managed from a centralised point of contact. Each investigator makes a telephone call through an Interactive Voice Response System (IVRS) to this centralised point when they have identified a patient to be entered into the study and is given the next allocation, taken from the appropriate randomisation list. The blind can be preserved by simply specifying the number of the (pre-numbered) pack to be used to treat the particular patient; the computerised system keeps a record of which packs have been used already and which packs contain which treatment. Central randomisation has a number of practical advantages:

    It can provide a check that the patient about to be entered satisfies certain inclusion/exclusion criteria thus reducing the number of protocol violations.

    It provides up-to-date information on all aspects of recruitment.

    It allows more efficient distribution and stock control of medication.

    It provides some protection against biased allocation of patients to treatment groups in trials where the investigator is not blind. An investigator who knows the next allocation could (perhaps subconsciously) select patients to include or not include based on that knowledge; with central randomisation the patient is identified and their information given to the system before the next allocation is revealed to the investigator.

    It gives an effective way of managing multi-centre trials.

    It allows the implementation of more complex allocation schemes such as minimisation and dynamic allocation.

    Earlier we discussed the use of stratified randomisation in multi-centre trials and where the centres are large this is appropriate. With small centres however, for example in GP trials, this does not make sense and a stratified randomisation with ‘region’ defining the strata may be more appropriate. Central randomisation would be essential to manage such a scheme.

    Stratified randomisation with more than a small number of strata would be difficult to manage at the site level and the use of central randomisation is then almost mandatory.

    1.4.6 Dynamic allocation and minimisation

    ICH E9 (1998): ‘Note for Guidance on Statistical Principles for Clinical Trials’

    ‘Dynamic allocation is an alternative procedure in which the allocation of treatment to a subject is influenced by the current balance of allocated treatments and, in a stratified trial, by the stratum to which the subject belongs and the balance within that stratum. Deterministic dynamic allocation procedures should be avoided and an appropriate element of randomisation should be incorporated for each treatment allocation.’

    Dynamic allocation moves away from having a pre-specified randomisation list and the allocation of patients evolves as the trial proceeds. The method looks at the current balance, in terms of the mix of patients and a number of pre-specified factors, and allocates the next patient in an optimum way to help redress any imbalances that exist at that time.

    For example, suppose we require balance in terms of sex and age (≥65 versus < 65) and part way through the trial we see a mix of patients as in Table 1.1.

    Table 1.1 Current mix of patients

                    A     B
    Total          25    25
    Male           12    10
    Age ≥ 65        7     8

    Treatment group A contains proportionately more males (12 out of 25 versus 10 out of 25) than treatment group B but fewer patients aged 65 or over (7 out of 25 versus 8 out of 25). Further suppose that the next patient to enter is male and aged 68 years. In terms of sex we would prefer that this patient be placed in treatment group B, while for age we would prefer this patient to enter in group A. The greater imbalance, however, is in relation to sex, so our overall preference would be for treatment group B to help ‘correct’ for the current imbalance. The method of minimisation would simply put this patient in group B. ICH E9, however, recommends that we have a ‘random element’ to that allocation and so, for example, we would allocate this patient to treatment group B with, say, probability 0.7. Minimisation is a special case of dynamic allocation where the random assignment probability (0.7 in the example) is equal to one. Of course with a small number of baseline factors, for example centre and two others, stratified randomisation will give good enough balance and there is no need to consider the more complex dynamic allocation. This technique, however, has been proposed when there are more factors involved.
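
    The allocation rule can be sketched in code. The scoring used here is one simple choice and is an assumption: for each treatment group we add up the numbers of existing patients who share each of the new patient's characteristics, and prefer the group with the lower total. The function names and the seed are also illustrative.

```python
import random

def imbalance_totals(counts, patient_levels):
    """For each arm, total the existing patients sharing the new patient's levels."""
    return {arm: sum(levels[lv] for lv in patient_levels) for arm, levels in counts.items()}

def allocate(counts, patient_levels, rng, p_preferred=0.7):
    """Choose the arm with the lowest total with probability p_preferred.

    Setting p_preferred = 1.0 gives deterministic minimisation.
    """
    totals = imbalance_totals(counts, patient_levels)
    lowest = min(totals.values())
    preferred = sorted(arm for arm, t in totals.items() if t == lowest)
    if len(preferred) > 1:
        return rng.choice(preferred)  # tie: fall back on plain randomisation
    others = sorted(arm for arm in counts if arm != preferred[0])
    return preferred[0] if rng.random() < p_preferred else rng.choice(others)

# Current mix of patients: males and patients aged 65 or over in each group.
counts = {"A": {"male": 12, "age>=65": 7}, "B": {"male": 10, "age>=65": 8}}
new_patient = ["male", "age>=65"]  # male, aged 68

print(imbalance_totals(counts, new_patient))  # {'A': 19, 'B': 18}
print(allocate(counts, new_patient, random.Random(0), p_preferred=1.0))  # B
print(allocate(counts, new_patient, random.Random(0)))  # B with probability 0.7
```

    Group B has the lower total, so deterministic minimisation always chooses B, while the version with a random element chooses B about 70 per cent of the time.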

    Since the publication of ICH E9 there has been considerable debate about the validity of dynamic allocation, even with the random element. There is a school of thought which has some sympathy within regulatory circles that supports the view that the properties of standard statistical methodologies, notably p-values and confidence intervals, are not strictly valid when such allocation schemes are used. As a result regulators are very cautious:

    CPMP (2003): ‘Points to Consider on Adjustment for Baseline Covariates’

    ‘… techniques of dynamic allocation such as minimisation are sometimes used to achieve balance across several factors simultaneously. Even if deterministic schemes are avoided, such methods remain highly controversial. Thus applicants are strongly advised to avoid such methods.’

    So if you are planning a trial then stick with stratification and avoid dynamic allocation. If you have an ongoing trial which is using dynamic allocation then continue, but be prepared at the statistical analysis stage to supplement the standard methods of calculating p-values with more complex methods which take account of the dynamic allocation scheme. These methods go under the name of randomisation tests.
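
    The idea behind a randomisation test can be illustrated with a simplified sketch. Note the assumption: the treatment labels below are re-shuffled completely at random, which corresponds to a simple randomisation scheme. For a trial that used dynamic allocation, the re-shuffling step would instead have to re-run the actual allocation scheme on the patients in their order of entry. The data are invented for illustration.

```python
import random

def randomisation_test(a, b, n_perm=5000, seed=0):
    """Two-sided p-value for the difference in means, by re-shuffling group labels."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for rounding error
            extreme += 1
    # Count the observed allocation itself so that the p-value is never zero.
    return observed, (extreme + 1) / (n_perm + 1)

group_a = [12, 9, 15, 11, 14, 10, 13, 16]  # hypothetical outcomes on A
group_b = [8, 7, 11, 6, 9, 10, 5, 12]      # hypothetical outcomes on B
difference, p_value = randomisation_test(group_a, group_b)
print(f"difference = {difference:.2f}, p = {p_value:.4f}")
```

    The p-value is the proportion of re-allocations giving a difference at least as large as the one observed, so it relies only on the allocation scheme and not on distributional assumptions.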

    See Roes (2004) for a comprehensive discussion of dynamic allocation.

    1.4.7 Cluster randomisation

    In some cases it can be more convenient or appropriate not to randomise individual patients, but to randomise groups of patients. The groups for example could correspond to GPs so that each GP enters say four patients and it is the 100 GPs that are randomised, 50 giving treatment A and 50 giving treatment B. Such methods are used but are more suited to phase IV than the earlier phases of clinical development.
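
    In code, the only change from individual randomisation is that the unit being shuffled is the GP rather than the patient. A minimal sketch for the example above, in which the function name, the GP numbering and the seed are assumptions:

```python
import random

def cluster_randomise(cluster_ids, rng):
    """Randomly assign half of the clusters to A and the other half to B."""
    ids = list(cluster_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {cid: ("A" if i < half else "B") for i, cid in enumerate(ids)}

gp_allocation = cluster_randomise(range(1, 101), random.Random(4))  # 100 GPs
patients_per_gp = 4

# Every patient inherits the treatment allocated to their GP.
patients_on_a = sum(patients_per_gp for arm in gp_allocation.values() if arm == "A")
print(patients_on_a)  # 50 GPs x 4 patients = 200
```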

    Bland (2004) provides a review and some examples of cluster randomised trials while Campbell, Donner and Klar (2007) give a comprehensive review of the methodology.

    1.5 Bias and precision

    When we are evaluating and comparing our treatments we are looking for two things:

    An unbiased, correct view of how effective (or safe) the treatment is

    An accurate estimate of how effective (or safe) the treatment is

    As statisticians we talk in terms of bias and precision; we want to eliminate bias and to have high precision. Imagine having 10 attempts at hitting the bull’s eye on a target board as shown in Figure 1.1. Bias is about hitting the bull’s eye on average; precision is about being consistent.

    These aspects are clearly set out in ICH E9.

    Figure 1.1 Bias and precision

    ICH E9 (1998): ‘Note for Guidance on Statistical Principles for Clinical Trials’

    ‘Many of the principles delineated in this guidance deal with minimising bias and maximising precision. As used in this guidance, the term bias describes the systematic tendency of any factors associated with the design, conduct, analysis and interpretation of the results of clinical trials to make the estimate of a treatment effect deviate from its true value.’
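
    The target board analogy can be made concrete with a small simulation. In this sketch each ‘shot’ is a single number, the bull’s eye is at zero, bias is measured by how far the average shot lies from the target and precision by the standard deviation of the shots (a smaller value means higher precision). The scenario values, function names and seeds are assumptions.

```python
import random
import statistics

def simulate_shots(offset, spread, n=10, seed=0):
    """n shots centred `offset` away from the bull's eye with scatter `spread`."""
    rng = random.Random(seed)
    return [rng.gauss(offset, spread) for _ in range(n)]

def bias_and_precision(shots, target=0.0):
    """Return (mean shot minus target, standard deviation of the shots)."""
    return statistics.mean(shots) - target, statistics.stdev(shots)

scenarios = {
    "unbiased, high precision": (0.0, 0.5),
    "unbiased, low precision": (0.0, 3.0),
    "biased, high precision": (2.0, 0.5),
    "biased, low precision": (2.0, 3.0),
}
for label, (offset, spread) in scenarios.items():
    bias, sd = bias_and_precision(simulate_shots(offset, spread))
    print(f"{label}: mean offset = {bias:.2f}, SD = {sd:.2f}")
```

    With only 10 shots the estimated offset is itself subject to chance, particularly in the low precision scenarios.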

    What particular features in the design of a trial help to eliminate bias?

    Concurrent control group as the basis for a ‘comparison’

    Randomisation to avoid bias in allocating subjects to treatments

    Blinding of both the subject and the investigator

    Pre-specification of the methods of statistical analysis

    What particular features in the design of a trial help to increase precision?

    Large sample size

    Measuring the endpoints in a precise way

    Standardising aspects of the protocol which impact on patient-to-patient variation

    Collecting data on key prognostic factors

    Choosing a homogeneous group of patients

    Choosing the most appropriate design (for example using a cross-over design rather than a parallel group design where this is appropriate)

    Several of the issues raised here may be unclear at this point; simply be aware that eliminating bias and increasing precision are the key issues that drive our statistical thinking from a design perspective. Also be aware that if something should be sacrificed then it is precision rather than bias. High precision in the presence of bias is of no value. First and foremost we require an unbiased view; increasing precision is then a bonus. Similar considerations are also needed when we choose the appropriate statistical methodology at the analysis stage.

    1.6 Between- and within-patient designs

    The simplest trial design of course is the parallel group design assigning patients to receive either treatment A or treatment B. While this is a valid and effective design it is important to recognise some inherent drawbacks. For example, suppose we have a randomised parallel group design in hypertension with 50 patients per group and that the mean fall in diastolic blood pressure in each of the two groups is as follows:

    [Image not reproduced: mean fall in diastolic blood pressure for treatment groups A and B]

    It would be easy simply to conclude in light of the data that B is a more effective treatment than A, but is that necessarily the case? One thing we have to remember is that the 50 patients in group A are a different group of patients from the 50 in group B and patients respond differently, so in fact the observed difference between the treatments could simply be caused by patient-to-patient variation.

    As we will see later, unravelling whether the observed difference is reflective of a real treatment difference or simply a chance difference caused by patient-to-patient variation with identical treatments is precisely the role of the p-value; but it is not easy.
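
    To get a feel for how large a chance difference can be, we can simulate many trials in which the two treatments are identical. The sketch below assumes, purely for illustration, that falls in diastolic blood pressure are normally distributed with mean 8 mmHg and standard deviation 8 mmHg in both groups; these values, the function name and the seed are not taken from any real trial.

```python
import random
import statistics

def simulate_null_differences(n_per_group=50, mean_fall=8.0, sd=8.0, n_trials=2000, seed=0):
    """Observed B minus A differences in mean fall when A and B are identical."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        a = [rng.gauss(mean_fall, sd) for _ in range(n_per_group)]
        b = [rng.gauss(mean_fall, sd) for _ in range(n_per_group)]
        diffs.append(statistics.mean(b) - statistics.mean(a))
    return diffs

diffs = simulate_null_differences()
print(f"SD of chance differences: {statistics.stdev(diffs):.2f} mmHg")
print(f"largest chance difference: {max(abs(d) for d in diffs):.2f} mmHg")
```

    Under these assumptions the standard deviation of the observed difference is 8 × √(2/50) = 1.6 mmHg, so differences of 2 or 3 mmHg arise quite often even though the treatments are the same.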

    This design is what we refer to as a between-patient design. The basis of the treatment comparison is the comparison between two independent groups of patients.

    An alternative design is the within-patient design. Such designs are not universally applicable but can be very powerful under certain circumstances. One form of the within-patient design is the paired design:

    In ophthalmology; treatment A in the right eye, treatment B in the left eye

    In a volunteer study in wound care; ‘create’ a wound on each forearm and use dressing of type A on the right forearm and dressing of type B on the left forearm

    Here the 50 subjects receiving A will be the same 50 subjects who receive B and the comparison of A and B in terms of say mean healing time in the second example is a comparison based on identical ‘groups’ of subjects. At least in principle, drawing conclusions regarding the relative effect of the two treatments and accounting for the patient-to-patient variation may be easier under these circumstances.
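
    The gain from comparing treatments within the same subjects can be illustrated by simulation. The sketch assumes that each subject has their own underlying healing time, that these vary considerably from subject to subject (standard deviation 5 days) and that repeat measurements on the same subject vary much less (standard deviation 1 day). All numerical values, names and the seed are invented for illustration.

```python
import random
import statistics

def compare_designs(n=50, effect=2.0, sd_between=5.0, sd_within=1.0, n_trials=1000, seed=0):
    """SD of the estimated A minus B difference under parallel and paired designs."""
    rng = random.Random(seed)
    parallel, paired = [], []
    for _ in range(n_trials):
        # Parallel groups: two different sets of n subjects, one set per dressing.
        a = [20 + rng.gauss(0, sd_between) + rng.gauss(0, sd_within) for _ in range(n)]
        b = [20 - effect + rng.gauss(0, sd_between) + rng.gauss(0, sd_within) for _ in range(n)]
        parallel.append(statistics.mean(a) - statistics.mean(b))
        # Paired: the same n subjects receive both dressings, one on each forearm.
        diffs = []
        for _ in range(n):
            subject = 20 + rng.gauss(0, sd_between)
            healing_a = subject + rng.gauss(0, sd_within)
            healing_b = subject - effect + rng.gauss(0, sd_within)
            diffs.append(healing_a - healing_b)
        paired.append(statistics.mean(diffs))
    return statistics.stdev(parallel), statistics.stdev(paired)

sd_parallel, sd_paired = compare_designs()
print(f"parallel: {sd_parallel:.2f} days, paired: {sd_paired:.2f} days")
```

    In the paired design each subject's own level cancels out of the difference, so the estimate of the treatment effect varies far less from trial to trial.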

    Another example of the within-patient design is the cross-over design. Again each subject receives each of the treatments but now sequentially in time with some subjects receiving the treatments in the order A followed by B and some in the order B followed by A.

    In both the paired design and the cross-over design there is, of course, randomisation. In the second paired design example above, randomisation determines which forearm receives A and which receives B; in the cross-over design, randomisation is to treatment order, A/B or B/A.

    1.7 Cross-over trials

    The cross-over trial was mentioned in the previous section as one example of a within-patient design. In order to discuss some issues associated with these designs we will consider the simplest form of cross-over trial; two treatments A and B and two treatment periods I and II.

    The main problem with the use of this design is the possible presence of the so-called carry-over effect. This is the residual effect of one of the treatments in period I influencing the outcome on the other treatment in period II. An extreme example of this would be the situation where one of the treatments, say A, was very efficacious, so much so that many of the patients receiving treatment A were cured of their disease, while B was ineffective and had no impact on the underlying disease. As a consequence many of the subjects following the A/B sequence would give a good response at the end of period I (an outcome ascribed to A) but would also give a good response at the end of period II (an outcome ascribed to B) because they were cured by A. These data would give a false impression of the A versus B difference. In this situation the B data obtained from period II are contaminated and the data coming out of such a trial are virtually useless.
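
    The extreme example can be put into numbers with a short simulation. The assumptions are invented for illustration: treatment A cures 80 per cent of patients and the cure persists, while 10 per cent of patients show a good response on the ineffective treatment B regardless of what they are given.

```python
import random

def crossover_response_rates(n_per_sequence=5000, p_cure_a=0.8, p_response_b=0.1, seed=0):
    """Response rates recorded against B in period II (A/B) and in period I (B/A)."""
    rng = random.Random(seed)
    # Sequence A/B: A is given in period I, B in period II.
    ab_period2 = 0
    for _ in range(n_per_sequence):
        cured_by_a = rng.random() < p_cure_a
        responds_on_b = rng.random() < p_response_b
        if cured_by_a or responds_on_b:  # a cure from period I carries over
            ab_period2 += 1
    # Sequence B/A: B is given first, so nothing can carry over into it.
    ba_period1 = sum(rng.random() < p_response_b for _ in range(n_per_sequence))
    return ab_period2 / n_per_sequence, ba_period1 / n_per_sequence

after_a, b_first = crossover_response_rates()
print(f"response on B after A: {after_a:.2f}; response on B given first: {b_first:.2f}")
```

    Under these assumptions about 82 per cent of patients appear to respond to B when it follows A (1 − 0.2 × 0.9), compared with about 10 per cent when B is given first, even though B does nothing.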

    It is important therefore to only use these designs when you can be sure that carry-over effects will not be seen. Introducing a washout period between period I and period II can help to eliminate carry-over so that when the subject enters period II their disease condition is similar to what it was at