Multivariate Statistical Methods for the Environmental Forensic Classification of Coal Tars from Former Manufactured Gas Plants

Laura McGregor

Article, pubs.acs.org/est

Laura A. McGregor,*,† Caroline Gauchotte-Lindsay,† Niamh Nic Daéid,‡ Russell Thomas,§ and Robert M. Kalin†

† David Livingstone Centre for Sustainability, Department of Civil and Environmental Engineering, University of Strathclyde, Graham Hills Building, 50 Richmond Street, Glasgow, United Kingdom
‡ Centre for Forensic Science, Department of Pure and Applied Chemistry, University of Strathclyde, Royal College Building, 204 George Street, Glasgow, United Kingdom
§ Parsons Brinckerhoff, Queen Victoria House, Redland Hill, Bristol, United Kingdom

ABSTRACT: Compositional disparity within a set of 23 coal tar samples (obtained from 15 different former manufactured gas plants) was compared and related to differences between historical on-site manufacturing processes. Samples were prepared using accelerated solvent extraction prior to analysis by two-dimensional gas chromatography coupled to time-of-flight mass spectrometry. A suite of statistical techniques, including univariate analysis, hierarchical cluster analysis, two-dimensional cluster analysis, and principal component analysis (PCA), were investigated to determine the optimal method for source identification of coal tars. The results revealed that multivariate statistical analysis (namely, PCA of normalized, preprocessed data) has the greatest potential for environmental forensic source identification of coal tars, including the ability to predict the processes used to create unknown samples.
Received: October 19, 2011. Revised: January 20, 2012. Accepted: February 15, 2012. Published: February 15, 2012. © 2012 American Chemical Society. dx.doi.org/10.1021/es203708w | Environ. Sci. Technol. 2012, 46, 3744−3752

1. INTRODUCTION

Coal tar is a byproduct of the manufactured gas industry, a global industry which thrived in Europe, North America, and other parts of the world from the early 19th century until the late 20th century, when the discovery of plentiful natural gas fields led to its decline.1 The process of gas manufacture created a number of hazardous byproducts, including coal tar and ammoniacal liquor.2 Coal tars are dense non-aqueous-phase liquids (DNAPLs) containing thousands of organic and inorganic components, including known carcinogens such as polycyclic aromatic hydrocarbons (PAHs).3 They are a ubiquitous contaminant at former manufactured gas plant (FMGP) sites, where environmental forensic investigations may be required to trace present-day contamination to its historical source.4 It has been estimated that more than 3000 FMGP sites exist in the United Kingdom alone, making coal tar contamination a heavy environmental burden.2

The composition of coal tar is highly dependent on the raw materials and the method of gas production; therefore, environmental forensic investigators must have a measure of knowledge of historical gas-making processes to understand the chemical signatures obtained.5 Gas manufacturing processes evolved over the 100 years of operation to ensure the fastest, most economic methods of gas production were employed. Early gas production was based on low-temperature horizontal retorts (LTHRs), where coal was carbonized within large, cast iron cylinders in an anoxic environment to drive off volatiles as a gas.6 Impurities were removed by passing the gas through a plant which would cool (condensers), wash (washers and scrubbers), and purify (purifiers) the gas; any tar and liquor recovered would drain into the tar well.3 Low-temperature horizontal retorts were initially found on all gasworks, prior to their modification to withstand higher temperatures which provided optimal gas production.7 In 1903, high-temperature vertical retorts were introduced in Britain to allow continuous loading of coal and thus continuous gas production.8 These minor changes in retort size, shape, and temperature all had an influence on the composition of the byproducts.9

Table 1. Summary of Manufacturing Processes Employed at FMGP Sites Investigated in This Study^a

| site name | DNAPL label | site class^b | manufacturing process(es)^d | years of operation |
|---|---|---|---|---|
| S1 | 1−6 | VR | vertical retorts, potential traces of CWG, oil reforming, and early horizontal retort tar | 1836−1971 |
| S2 | 7 | HR | horizontal retorts | 1856−1969 |
| S3 | 8 | HR | horizontal retorts | 1856−1971 |
| S4 | 9 | VR/CWG | horizontal (early, low-temperature) and vertical retorts plus CWG plant | ceased production by 1953 |
| S5 | 10 | HR | horizontal retorts | 1849−1981 |
| S6 | 11 | CR | wood preservation site: distillation of coal tar for creosote oil | unknown |
| S7 | 12 | CWG | complex mixture, including CWG plant, horizontal and vertical retorts, tar distillation, and oil gas | 1854 to unknown |
| S8 | 13 | CWG | CWG plant | 1885 to unknown |
| S9 | 14 and 17 | HR | horizontal retorts, potential traces of CWG and coke oven tar | unknown |
| S10 | 15 | VR | Tully gas plant, combination of vertical retorts and water gas | 1841−1961 |
| S11 | 16 | LTHR | horizontal retorts (early, low-temperature) | 1854−1946 |
| S12 | 18 | CO | coke ovens (at a steelworks) | 1970s to present day |
| S12 | 19 | CO | coke ovens (at a steelworks) | 1930s to present day |
| S13 | 20^c | VR | vertical retorts, potential traces of CWG tar | 1885 to unknown |
| S14 | 21 and 22 | HR | horizontal retorts (early, low-temperature) | ceased production by 1870 |
| S15 | 23^c | VR | vertical retorts | 1896−1979 |

^a FMGP sites have been anonymized for confidentiality reasons. ^b The probable site classes (VR = vertical retort, HR = horizontal retort, LTHR = low-temperature horizontal retort, CR = creosote, CWG = carbureted water gas, and CO = coke oven) were assigned on the basis of historical site data which indicated the periods of operation for each process. Labels correspond with those given in Figures 2−5. ^c Sample used as part of a blind study, site details only disclosed after analysis and data processing were completed. ^d Additional site details are provided in the Supporting Information.

Another significant advance was the development of the carbureted water gas (CWG) process, introduced in 1889, to allow cheaper and more rapid gas production to meet the ever-increasing market demands. In the CWG process, steam was passed through a source of organic carbon to produce hydrogen and carbon monoxide gases. A spray of oil was then injected to the hot gas stream to induce thermal cracking and enrich the final product. The use of two fuel types (both coal and oil) gave rise to a different range of byproducts than traditional retort gasworks. An additional complicating factor was the use of coal tar as a feedstock for the chemical industry, such as the production of dyes or creosote for wood treatment.
Therefore, it is possible to find coal tar contamination at sites other than FMGPs.10

Environmental forensic chemical fingerprinting of different coal tar types has not been thoroughly reported in the literature, and analytical techniques have previously limited the information that could be obtained from such complex samples.3 The early literature focused on the presence/absence of certain chemical classes rather than individual constituents, as this would have required extensive distillation and fractionation processes.9,11 For example, it has been reported that CWG tars contain a high abundance of alkanes (due to the carburetion oil) but low quantities of naphthalenes and phenols compared to retort tars,9 while differences in tar composition due to retort shape are mainly thought to have been caused by differences in the surface area and contact time of evolved gases with the heated retort walls.12 More recent studies have investigated the possibility of using diagnostic ratios to differentiate between major coal tar types.4,13 However, simple ratios focus on a very small portion of the overall coal tar signature, thereby limiting the source identification capability.

The enhanced separation capacity of comprehensive two-dimensional gas chromatography (GC×GC) can provide a wealth of information on coal tar composition without the need for rigorous, time-consuming sample fractionation.14 Previous coal tar research by McGregor et al.14 has focused on the optimization of extraction and analysis procedures to provide the entire coal tar signature in a single-step process. Consequently, this study aims to refine the statistical methods which are essential to elucidate the compositional differences between coal tars from the large volumes of chemical data produced by GC×GC analyses.15 This study investigates a number of univariate and multivariate statistical methods for source-specific correlation of the chemical signatures of various types of coal tar DNAPLs.
2. EXPERIMENTAL SECTION

The analytical approach employed in this study consists of four separate steps: sample preparation, GC×GC−time-of-flight mass spectrometry (TOFMS) analysis, data preprocessing, and statistical data analysis, based on methodology previously applied by Christensen et al.16 and a continuation of coal tar research by McGregor et al.14

2.1. Samples and Standards. Twenty-three coal tar samples (labeled 1−23) were obtained from 15 different FMGP sites (labeled S1−S15) across the United Kingdom. All samples were obtained as free-phase coal tar DNAPLs which were sealed and stored at 4 °C prior to analysis. The gas manufacturing processes used at each site are summarized in Table 1. At sites where multiple samples were obtained (sites S1, S9, S12, and S14), different sampling locations were used; further details of these can be found in the Supporting Information. At sites where a single sample was obtained, the tar generally represents the sole area of DNAPL discovered at that site, e.g., from within a former tar tank during excavation. The closure of the last coal gasworks in Britain occurred in 1981;17 thus, all tar samples have been exposed to environmental processes for at least 30 years (excluding fresh tar samples 18 and 19 obtained from a present-day steelworks). All solvents were of analytical grade, purchased from Fisher Scientific (Loughborough, U.K.). All deuterated PAHs were obtained from Isotec, Sigma-Aldrich (Gillingham, U.K.).

2.2. Sample Preparation. Extraction was performed using an ASE 350 accelerated solvent extraction system (Dionex, Camberley, U.K.) equipped with 10 mL stainless steel extraction cells, using hexane as the extraction solvent. Sample extraction and cleanup were performed simultaneously by the addition of a layer of silica gel to each extraction cell.
The extraction procedure has been previously described in detail.14 Four deuterated surrogates (d8-naphthalene, d10-fluorene, d10-fluoranthene, and d12-chrysene) were used to monitor the extraction efficiency. The extraction efficiency was within the U.S. Environmental Protection Agency (USEPA) recommended limits of 70−120%.18 All coal tars were extracted in duplicate, spiked with an internal standard (150 μg/mL d10-phenanthrene), and analyzed by GC×GC−TOFMS.

2.3. GC×GC−TOFMS Analyses. All GC×GC−TOFMS analyses were performed using a Leco (St. Joseph, MI) time-of-flight mass spectrometer, model Pegasus 4D, connected to an Agilent 7890A gas chromatograph equipped with a Leco thermal modulator. The TOF ion source was fixed at 200 °C, and masses between 45 and 500 u were scanned at a 200 spectra/s rate. The detector voltage was set at 1700 V with an applied electron ionization voltage of 70 eV. All standards and extracts were analyzed with the primary oven temperature programmed at 10 °C/min from 55 °C (2 min isotherm) to 110 °C, 3 °C/min to 210 °C, and then 8 °C/min to 310 °C (15 min isotherm). The secondary oven and modulator temperatures were maintained at a 20 °C offset relative to the primary oven. The modulation period was 6 s with a 1.3 s hot pulse time. Helium was used as the carrier gas, with a flow rate of 1.0 mL/min. An MPS2 twister autosampler (Gerstel, GmbH & Co., Germany) was used to inject 1 μL of sample per run at a split ratio of 1:50 and injection port temperature of 250 °C. The column set comprised a TR-50 MS supplied by Thermo Scientific (30 m × 0.25 mm i.d. × 0.25 μm film thickness) as the primary column and an Rtx-5 (1.2 m × 0.18 mm i.d. × 0.2 μm film thickness) supplied by Thames Restek (Buckinghamshire, U.K.) as the secondary column, connected via a Thames Restek Press-tight connector.

2.4.
Data Collection and Preprocessing. The chromatograms for each sample were processed using Leco ChromaTOF software (version 4.22) to search, identify, and align peaks with signal-to-noise values greater than 10; peaks with similar retention times and mass spectra were selected. Initially, the 16 EPA PAHs and their alkyl homologues were selected (due to their high concentration in coal tars), and the data set was expanded by adding peaks which presented heightened intensity within at least one of the coal tar samples and thus were likely to aid source differentiation. For example, it was discovered that n-alkanes were prevalent in CWG tars and were therefore included in the data set. Chemical classes, such as phenols, which had been previously shown in the literature to vary in concentration between different tar types were also included.9 Furthermore, highly positive or negative loadings found during principal component analysis of the full chromatographic data set of 3479 peaks were also included (see the Supporting Information for score and loading plots). In total, 156 peaks were ultimately selected, and the peak areas were collated in an Excel (version 11.8) spreadsheet ready for preprocessing. The number of peaks included for each statistical data set is summarized by chemical class in Table 2. A complete list of selected compounds can be found in Table S1 of the Supporting Information. The peak area response for each target analyte was normalized against the internal standard (d10-phenanthrene) to account for any instrumental variability, before calculation of the average peak areas of duplicate extracts. A number of data transformations were then performed to evaluate their effect on source identification of coal tars, including application of square root, fourth root, eighth root, logarithm, and reciprocal transformations. Univariate statistical analyses were performed using Minitab version 16 (Minitab Ltd., Coventry, U.K.). 2.5. 
Multivariate Statistical Analysis. Hierarchical cluster analysis (HCA) is a statistical method of classifying samples into clusters within a dendrogram by using a similarity criterion and a clustering rule. The similarity criterion is a measure of the distance between samples; for the purpose of this study the Euclidean distance was used. A number of different clustering rules are available that describe the way in which samples are linked in the dendrogram; single, average, and complete linkage methods were all evaluated in this study to allow the method with optimal clustering to be selected.

The use of two-dimensional HCA to form a heat map (or clustergram) was also investigated. In this technique HCA is performed twice, on the observations (samples) within the data set as well as on the variables (chromatographic peak areas). The results are represented by a heat map, two dendrograms linked by a color-shaded mosaic representing the intensity of each variable within each sample.

Table 2. Number of Compounds from Each Chemical Class (and Class Labeling System) Used within Various Statistical Methods

| class no.^a | chemical class | no. of peaks, HCA heat map 1^b | no. of peaks, HCA heat map 2^b | no. of peaks, PCA^b |
|---|---|---|---|---|
| i | n-alkanes | 18 | 15 | 18 |
| ii | isoalkanes | 5 | 5 | 5 |
| iii | alkylbenzenes | 11 | 11 | 11 |
| iv | phenols | 7 | 7 | 7 |
| v | hydronaphthalenes | 3 | 3 | 3 |
| vi | naphthalenes | 22 | 5 | 22 |
| vii | parent PAHs (≥3 rings) | 24 | 24 | 24 |
| viii | alkyl-PAHs (≥3 rings) | 22 | 8 | 22 |
| ix | N-PAHs | 1 | 1 | 1 |
| x | O-PAHs | 5 | 5 | 5 |
| xi | parent S-PAHs | 5 | 5 | 5 |
| xii | alkyl-S-PAHs | 34 | 10 | 34 |
| | total | 156 | 99 | 156 |

^a Corresponds to chemical class numbering within Figures 3 and 4. ^b As illustrated by Figures 3−5, respectively.

Principal component analysis (PCA) is another standard technique for reducing data dimensionality and visualizing trends within a data set. In this method, large data sets can be converted into a small number of principal components (PCs), which are weighted sums of the original variables. The PCs describe the variation within a data set.
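As a minimal sketch of the preprocessing and PCA steps described above (not the authors' Matlab workflow), the following Python/NumPy example normalizes peak areas to an internal standard, averages duplicate extracts, applies a root transformation, and computes PC scores as weighted sums of the original variables via singular value decomposition. The array shapes, the function names `preprocess` and `pca_scores`, the mean-centering step, and the synthetic numbers are all assumptions made for illustration.

```python
import numpy as np


def preprocess(areas, istd_areas, root=4):
    """Normalize, average duplicates, and root-transform peak areas.

    areas:      (n_samples, 2, n_peaks) raw peak areas for duplicate extracts
    istd_areas: (n_samples, 2) internal standard peak area for each extract
    root:       order of the root transformation (4 = fourth root)
    """
    normalized = areas / istd_areas[:, :, None]   # ratio to internal standard
    averaged = normalized.mean(axis=1)            # mean of the two duplicates
    return averaged ** (1.0 / root)


def pca_scores(data, n_components=2):
    """Return PC scores, loadings, and percent variance explained per PC."""
    # Assumption: variables are mean-centered before decomposition.
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T                # weights of original variables
    explained = 100.0 * s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Synthetic example: 6 samples x 2 duplicates x 5 peaks (made-up numbers).
rng = np.random.default_rng(0)
raw = rng.uniform(1e3, 1e6, size=(6, 2, 5))
istd = rng.uniform(5e4, 6e4, size=(6, 2))

x = preprocess(raw, istd, root=4)
scores, loadings, explained = pca_scores(x, n_components=2)
print(scores.shape, loadings.shape, explained.round(1))
```

Changing `root` to 2 or 8 gives the square root and eighth root variants, so the effect of each transformation on the score plot can be compared on the same data.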
Generally, 2−3 PCs are sufficient to describe the variability between samples, allowing the data to be viewed as a simple two- or three-dimensional plot of PCA scores. The variance explained by each PC is given in terms of a percentage of the total variance; a large percentage (70−100%) of the variance should be explained by the first 2−3 PCs. HCA, PCA, and two-dimensional HCA were all performed using the Statistics Toolbox in Matlab (R2011a, version 7.12, MathWorks Inc.).

3. RESULTS AND DISCUSSION

3.1. Chromatographic Analysis. GC×GC chromatograms were compared by pattern recognition and grouped by general sample relationships. A number of samples were very similar in nature. For example, a comparison of two similar tars (samples 13 and 14) is shown in Figure 1a,b. The same components were present in each sample (with PAHs comprising the greatest portion) but in varying concentrations. Certain samples, however, showed clear compositional differences. Sample 11 contains a high aliphatic content compared to most tars investigated in this study, and these major differences can be easily seen by comparing the GC×GC contour plot to that of a fresh coke oven tar (Figure 1c,d). Nevertheless, with over 1000 peaks per sample identified by the software (ChromaTOF, Leco), it is difficult to get an accurate comparison of the results on a visual basis. Thus, a number of statistical methods were studied to develop a robust method of sample comparison as described in the following sections.

Figure 1. Comparison of total ion current (TIC) GC×GC contour plots of (a, b) CWG tars representing samples 13 and 14, respectively, (c) creosote oil (sample 11), and (d) a fresh coke oven tar (sample 18).

3.2. Univariate Statistical Analysis.
Initial processing of the chromatographic data focused on a traditional univariate study by investigation of a variety of diagnostic ratios. Many univariate methods have been reported in the literature for the analysis of crude oils, including a range of PAH and biomarker diagnostic ratios.19,20 However, there has been very little published data on chemical fingerprinting of coal tars. Saber et al.4 state that a plot of the fluoranthene/pyrene ratio against the dibenzofuran/fluorene ratio has the capacity to differentiate between major coal tar types. However, the double ratio plot did not distinguish any major clusters between the coal tar samples examined in this work, as shown in Figure 2a. In this study, fluoranthene/pyrene plotted against the acenaphthene/acenaphthylene ratio (Figure 2b) was found to produce the best clustering of all the diagnostic ratios investigated (which included various combinations using the 16 EPA priority pollutant PAHs and a range of heterocyclic PAHs). However, the plot is unable to distinguish between retort tar subtypes and shows considerable overlap between CWG and retort tars; as such, univariate methods are not recommended for environmental forensic interpretation of coal tar DNAPLs. A multivariate approach is therefore deemed necessary for full environmental forensic differentiation of tar types.

3.3. Hierarchical Cluster Analysis. Preliminary trials involving HCA showed that optimal clustering could be achieved using a normalized eighth root data set with Euclidean distance and complete linkage. A two-dimensional HCA heat map was prepared for the full data set of 23 coal tars and is given in Figure 3. Four main sample clusters were identified, and the approximate groupings by manufacturing process are highlighted (based on the site classes listed in Table 1). The CWG tars and creosote cluster together; however, the coke oven and retort tars do not fully separate and are spread out among the remaining three clusters.
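The clustering settings reported above (eighth root transformed data, Euclidean distance, complete linkage) could be reproduced along the following lines. This is a hedged sketch using SciPy rather than the Matlab toolbox used in the study; the data matrix is synthetic, and the helper name `cluster_samples` and the choice to cut the dendrogram into a fixed number of clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_samples(normalized_areas, n_clusters, root=8, method="complete"):
    """Cluster samples (rows) after a root transformation.

    normalized_areas: (n_samples, n_peaks) internal-standard-normalized areas
    method:           linkage rule: "single", "average", or "complete"
    """
    transformed = normalized_areas ** (1.0 / root)
    distances = pdist(transformed, metric="euclidean")  # condensed distances
    tree = linkage(distances, method=method)            # dendrogram as a matrix
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Synthetic data: two well-separated groups of three "tars" each (made up).
group_a = np.array([[1.0, 50.0, 2.0], [1.2, 55.0, 2.1], [0.9, 48.0, 1.8]])
group_b = np.array([[40.0, 1.0, 30.0], [42.0, 1.1, 33.0], [38.0, 0.9, 29.0]])
data = np.vstack([group_a, group_b])

# Compare the three linkage rules evaluated in the study.
for rule in ("single", "average", "complete"):
    tree, labels = cluster_samples(data, n_clusters=2, method=rule)
    print(rule, labels)
```

Running the same function on the transposed matrix clusters the variables instead of the samples, which is the second half of the two-dimensional HCA used to build a heat map.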
The second dendrogram associated with clustering of the variables (in rows) can provide extra information at a glance on the way in which the sample clusters have been formed. The shading of the heat map mosaic indicates the differences between variables within each cluster. The chemical classes in Table 2 are identified within the heat map by numbering next to the variable clusters.

Figure 2. Diagnostic ratio plot of (a) fluoranthene/pyrene (FLT/PYR) against (b) acenaphthylene/acenaphthene (ACY/ACE). Labeling corresponds to site classes given in Table 1, where VR = vertical retort, HR = horizontal retort, LTHR = low-temperature horizontal retort, CO = coke oven, CR = creosote, and CWG = carbureted water gas.

In an attempt to improve the classification power of the heat map, the variables that differentiated poorly between samples were removed. In general, alkyl-substituted PAH isomers showed similar responses in the heat map and were deemed unnecessary for differentiation of the tar sources. The reduced data set is indicated in Table S1 of the Supporting Information. In total, 57 data points were removed, and the HCA evaluation was repeated; the resulting heat map is given in Figure 4. The removal of unnecessary data points resulted in improved clustering for both samples and variables. Four main sample clusters were once again identified in Figure 4 and labeled according to the site class labels given in Table 1. The CWG/creosote tars are again grouped within a single cluster, while the coke oven and vertical and horizontal retort tars can now all be separated. However, samples 21 and 22, the low-temperature horizontal retort tars, were still misclassified with the vertical retort tars. These rare samples were obtained from an FMGP which closed in 1870, meaning they were released into the environment over 140 years ago.
It is possible that environmental degradation processes have altered the coal tar signature and interfered with source identification. Sample 20 also exhibited unexpected clustering. The vertical retort tar was located within the main retort cluster; however, it is represented by a single branch rather than as part of the vertical retort group.

The heat map mosaic of the reduced data set now shows defined sections resulting in the process-specific clusters, allowing easy interpretation of the results. By examination of the shading intensity, the main differences in coal tar composition can be found. The main chemical classes within each cluster are again labeled corresponding to the numbering system for chemical classes shown in Table 2. Sample 20 and, to a lesser extent, sample 15 have a high content of C27−C33 alkanes compared to the other retort tars. This deviation in alkane content is sufficient to cause sample 20 to branch outside of the vertical retort cluster. It is possible that these samples have resulted from multiple contamination sources (such as mixing with an aliphatic-rich petrogenic source) or may simply have been exposed to less degradation than the other retort tars studied.

The main distinction between coke oven tars and other samples is the high parent PAH content (Figure 4). The higher proportion of parent PAHs present within the coke oven tars is indicated by the highly positive (dark red) shading for that cluster in the heat map mosaic. The coke oven samples (18 and 19) were obtained fresh from a present-day coke oven; thus, high levels of parent PAHs were anticipated. Parent PAHs degrade faster than their alkylated homologues; therefore, weathered pyrogenic samples generally have a characteristic PAH pattern of C0 < C1 < C2 < C3, while parent PAHs are dominant in fresh, pyrogenic samples.21

Figure 3. Heat map of the coal tar data set of 156 peaks. The red-blue color gradient represents values of highest to lowest intensity (abbreviations correspond to the site classes identified in Table 1, while numbering corresponds to chemical classes shown in Table 2).

Figure 4. Heat map of a reduced coal tar data set (using 99 peak areas). The red-blue color gradient represents values of highest to lowest intensity (abbreviations correspond to the site classes identified in Table 1, while numbering corresponds to chemical classes shown in Table 2).

The heat map also depicts the main variation between horizontal and vertical retort tars. Unlike the initial heat map (Figure 3), the two retort types are now clearly separated into two main clusters, with the exception of low-temperature retort samples 21 and 22. This is due to significant differences in the production of phenol/alkylphenol compounds between the two major retort types (Figure 4), and as such, cluster analysis separates horizontal retorts (which generally have a high content of phenols) from vertical retorts (which have little or no phenols present). This variation is likely a result of the length of time in which hot gases evolving in the retort are kept in contact with the hot retort walls. In horizontal retorts, the gaseous compounds have a greater opportunity for further degradation and higher degree of oxygen and water vapor availability, thus explaining the presence of phenols.

The CWG tars exhibit a far greater concentration of low molecular weight alkanes, alkylated benzenes, and sulfur-containing heterocycles than other tars. As previously mentioned, during the CWG process a spray of oil is introduced to enrich the gas by thermal cracking. The addition of oil could easily account for the high levels of low molecular weight aliphatics and aromatics in tar samples originating from CWG plants, as they are abundant within petroleum products.
Furthermore, the enhanced levels of sulfur-containing hetero3749 dx.doi.org/10.1021/es203708w | Environ. Sci. Technol. 2012, 46, 3744−3752 Environmental Science & Technology Article Figure 5. PCA score plot of the full coal tar data set including blind study samples. Labeling corresponds to the site classes indicated in Table 1. The additional cluster of LTHRs comprises samples 9, 16, 21, and 22. On the basis of historical data of site processes, it was anticipated that tar samples 6, 21, and 22 were produced by low-temperature horizontal retorts; however, sample 9 was expected to have been formed by later vertical retorts or CWG processes on site. This alone indicates how powerful the technique can be for source identification of DNAPL plumes. The creosote oil sample is interestingly grouped within the CWG cluster, as with both HCA heat maps. There are two possible explanations for this: (a) the tar used to produce the creosote oil via distillation was produced by a CWG plant or (b) the high aliphatic content of the medium distillate creosote oil is simply most similar to that of CWG tar and is thus grouped accordingly. On the other hand, the creosote tar is clustered most closely to sample 12, a tar acquired from a complex FMGP where on-site distillation of tar was most likely performed. Nevertheless, the abundance of higher molecular weight PAHs in sample 12 suggests it is not a distilled tar. The inclusion of a greater number of creosote samples in the score plot is required to determine which analytes uniquely define this DNAPL. The basic site details for all coal tar samples are summarized in Table 1, where it can be seen that samples 1−6 were obtained from the same FMGP. The samples cluster closely in the PCA plot, despite small differences in the GC×GC chromatograms thought to be due to weathering. This shows that the model employs an accurate source fingerprint. 
Samples 14 and 17 were also obtained from different sampling locations within a single site, and as Figure 5 illustrates, this has a significant effect on the PCA score plot. Sample 14 clusters with the CWG tars, while sample 17 is clearly grouped with the horizontal retort tars. Historical site data show that the sampling site of sample 14 was located near a CWG structure, whereas the location of sample 17 was close to the horizontal retort house. This demonstrates that GC×GC analysis coupled with PCA has the capability to differentiate between tar signatures, not only from different sites, but within a single site also. The technique has the potential to allocate tar plumes to a specific time period when the identified process was in operation. The high degree of contamination present at most FMGP sites results in such sites being blamed immediately for cyclic PAHs, such as benzothiophenes and dibenzothiophenes, may also be explained by the addition of oil. Heavy oil or lighter petroleum products could be used in the CWG process and were generally chosen on the basis of availability and cost; therefore, the sulfur content may be a useful way of distinguishing different CWG sources. On the other hand, low-quality, inexpensive coals (with higher sulfur contents) may have been used to form the coke used within CWG systems. 3.4. Principal Component Analysis. PCA score plots were prepared for the chromatographic data set (as shown in Table S1 of the Supporting Information) using a variety of preprocessing transformations. The normalized, fourth root data set was found to produce the greatest differentiation between tar types, with the first two PCs describing 82% of the total variance (Figure 5). It is hypothesized that, without preprocessing transformations, the large range of peak intensities within the data set results in small peaks contributing less toward the principal components, regardless of their chemical importance. 
This was evident in PCA score plots using normalized only data and normalized square root data; thus, higher root transformations were performed. Moreover, the samples have been in the environment for decades, if not centuries in some cases (as indicated by the FMGP dates of operation in Table 1); therefore, environmental processes may have changed the contaminant ratios due to evaporation or dilution into groundwater. Taking the fourth root of the data allows the focus to be on the presence/absence of contaminants rather than their concentrations and as such relates to the primary production method instead of changes which may have occurred due to environmental factors. Two data sets were analyzed by PCA, an initial data set of 21 coal tars (samples 1−19 and 21−22) to establish source identifications and a blind study data set including a further two “unknown” samples (labeled 20 and 23) to validate the PCA model. PCA allowed separation of the coal tars into five groups according to historical manufacturing processes, as shown in Figure 5 (labeling corresponds to site classes identified by Table 1). The retort tars are now well separated from the CWG tars and further split into three subtypes; vertical (VR), horizontal (HR), and low-temperature horizontal (LTHR) retorts. 3750 dx.doi.org/10.1021/es203708w | Environ. Sci. Technol. 2012, 46, 3744−3752 Environmental Science & Technology Article separation of two tar types (samples 14 and 17) from the same FMGP site. Multivariate statistical analysis should be used for effective environmental forensic source identification of coal tar, as univariate methods were shown to be inadequate as an environmental forensic statistical approach. 
Multivariate methods, such as PCA, were shown to have a higher discriminatory power for the classification of coal tars collected from the environment (when a normalized, fourth root preprocessing transformation was employed), with blind study samples correctly identified according to the manufacturing process by which they were formed. any PAH contamination found in the vicinity. The level of knowledge on coal tar composition obtainable by GC×GC with PCA has the potential to easily settle any debates over liability at FMGP sites and the surrounding area. The separation of retort tars and coke oven tars appears to be described most by PC2. The loadings were investigated, showing that the phenols and toluene (positive loadings) and high molecular weight alkanes (C24−C29; negative loadings) were the most relevant variables for PC2 and were thus the peak areas with most variation between the retort tar subtypes and the coke oven tars. PC1 was mainly defined by acenaphthene, alkylbenzenes, biphenyls, numerous sulfur heterocycles, and C1-methylnaphthalenes (all positive loadings) and C30−C33 alkanes (negative loadings) and explained the distinction between CWG/creosote tars and the other tar types. These observations were consistent with the composition of the samples; for example, CWG scores fell in the positive areas of PC1, and such tars exhibited a noticeably high methylnaphthalene and acenaphthene content when compared to the other tar types. Loading plots are provided in the Supporting Information. With the initial PCA model established, the blind study samples were analyzed and added to the data set for a new PCA model to be developed. The blind study samples are highlighted by the arrows in the PCA score plot shown in Figure 5. The site details for blind study samples 20 and 23 were not disclosed by Parsons Brinkerhoff, who provided the samples, until the PCA model had been created and evaluated. 
The unknown samples were both classified by the PCA as vertical retort tars, the correct source identification for both samples. The PCA model allows source identification of all five tar types investigated in this study, with the potential for site-specific differences within each cluster to be identified. The model could prove invaluable for source allocation of FMGP wastes, by identifying the specific process(es) used to produce tar plumes across a site and thus the operator(s) responsible for the contamination.

Expansion of the data set of MGP coal tars in the environment will allow further validation of the groupings and could potentially allow additional industrial processes to be classified. For example, tars produced by a number of additional MGP processes (such as producer gas and Mond gas plants) could provide further clusters in the score plots. Additionally, it was not uncommon for wood or heavy oil to be used as the primary fuel source at FMGPs in certain parts of the world when the coal supply was limited. Therefore, PCA score plots of a range of global FMGP tar samples could potentially indicate differences in the raw materials used in gas production.

The results of this study have shown that coal tar samples can be classified according to the processes by which they were formed over 100 years ago. The enhanced analytical power of GC×GC−TOFMS provides more chemical information per sample than conventional GC techniques, allowing for more robust source identification. Traditional GC analyses coupled with statistical methods may overlook the compositional data vital for robust source identification (see Figure S6 in the Supporting Information for further details).
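The blind-study check can be approximated numerically. The sketch below refits a PCA with the unknown samples included and then assigns each unknown to the nearest class centroid in the PC1-PC2 score space. The nearest-centroid rule is an assumption introduced here for illustration (the study's classification is read from the score plot in Figure 5), and the data, class labels per row, and helper names are synthetic.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Scores from PCA computed by SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def nearest_centroid_labels(scores, labels):
    """Assign each unlabeled row (label None) to the closest class centroid."""
    known = [i for i, lab in enumerate(labels) if lab is not None]
    classes = sorted({labels[i] for i in known})
    centroids = {
        c: scores[[i for i in known if labels[i] == c]].mean(axis=0)
        for c in classes
    }
    assigned = {}
    for i, lab in enumerate(labels):
        if lab is None:
            dists = {c: np.linalg.norm(scores[i] - centroids[c]) for c in classes}
            assigned[i] = min(dists, key=dists.get)
    return assigned


# Synthetic preprocessed data: two samples per known class, then two unknowns.
X = np.array([
    [1.00, 0.10, 0.10],  # VR
    [0.90, 0.20, 0.10],  # VR
    [0.10, 1.00, 0.10],  # HR
    [0.20, 0.90, 0.10],  # HR
    [0.10, 0.10, 1.00],  # CWG
    [0.10, 0.20, 0.90],  # CWG
    [0.95, 0.15, 0.10],  # unknown
    [0.85, 0.10, 0.15],  # unknown
])
labels = ["VR", "VR", "HR", "HR", "CWG", "CWG", None, None]

scores = pca_scores(X, n_components=2)
assigned = nearest_centroid_labels(scores, labels)
print(assigned)  # {6: 'VR', 7: 'VR'}
```

Centroids are computed from the labeled samples only, so the unknowns influence the fitted components but not the class positions they are compared against.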
The combination of the powerful GC×GC−TOFMS technique and the evaluation of historical site data allows for source identification of multiple tar samples in a simple and timely manner, as illustrated by the clear

■ ASSOCIATED CONTENT
Supporting Information
Additional site information, a table giving the data set used for statistical classification of coal tars, and figures showing PCA score and loading plots for the full GC×GC data set of 3479 peaks, a scree plot of the percent variability explained by each PC, a PCA score plot using PC1 and PC3 for GC×GC data (156 peaks), a PCA loading plot for PC1 and PC2 for GC×GC data (156 peaks), and a PCA score plot using only 16 U.S. EPA PAHs and alkyl PAH peaks. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*Phone: +44 141 548 3902; fax: +44 141 553 2066; e-mail: l.a.mcgregor@strath.ac.uk.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
We thank the Scottish Funding Council (SFC) Glasgow Research Partnership in Engineering, the SFC-funded WestCHEM Partnership, and the Engineering and Physical Sciences Research Council (EPSRC; Grant EP/D013739/2) for funding support. Parsons Brinckerhoff and National Grid are also gratefully acknowledged for providing all coal tar samples investigated.
