by R. H. (Rick) Meeker, Jr., P.E.
President and Principal Controls Engineer
Process Control Solutions, Inc.
ABSTRACT
Over the last 15 years, Distributed Control Systems (DCSs) have made large quantities of process data readily available in digital form. Process data historians and information systems to take advantage of this have emerged and evolved as well, while the enabling computing technology has undergone a steady reduction in cost and increase in processing power and storage capability over the same period. Where once implemented only by large, forward-thinking manufacturing sites, process data historians are soon to be standard equipment for manufacturing and utility operations of all sizes (and budgets). At the low end, these systems can be fairly simple and easy to manage, often implemented within the framework of PC-based control systems. At the high end, however, for medium to large processes with large numbers of tags and sophisticated data collection and analysis needs, the considerations become more numerous and complex. For small or large systems, a reasonably good understanding of data historization and analysis tools is essential to proper collection, storage, management, and interpretation of process data. Presented here are a basic technical foundation and a discussion of important considerations to aid in selection, implementation, support, and appropriate use of process data historians and their associated tools. Data and perspective are drawn from actual experience supporting a large plant process database at a major pulp mill and the author's extensive experience in real-world process control and information systems.
BACKGROUND AND INTRODUCTION
During the 1980s, Distributed Control Systems (DCSs) grew to become the primary means of monitoring and control in pulp and paper, as well as most other process industries. Where once specialized data acquisition computers were required, in parallel with traditional analog control systems, to gather and store process data, abundant availability of process data in digital form would become a natural by-product of the improvements brought about by the DCS. The DCS, however, was designed first and foremost to be good at control, not the storage of large quantities of data. Though real-time and historical trending capabilities were designed in early on, the basic mission of the DCS limited the extent to which data would be archived in that environment for future retrieval, analysis, and decision-making. The inevitable desire by managers, engineers, and technicians for ready access to this data, continued advances in computing technology, and innovative supplier response to this perceived need soon led to systems specially designed to acquire, store, and present this data in an efficient and effective manner. The acronym of choice for these early systems was Computer Integrated Manufacturing, or CIM (which, not surprisingly, became part of the trade name of some of the early products). Today, the term CIM has largely lost its buzz in the process industries, and, though it appears as part of the trade name of some of the products still in use, it is now more appropriate and more popular to refer to process information systems and process historians, more apt descriptions of the role these systems play in modern process industries such as pulp and paper, and the subject of this paper.
BASIC CONCEPTS
Architecture
The major components of a process data history system are the collector, the server, and the clients, illustrated in Figure 1, below:
[Figure 1 shows the layered architecture: Clients (desktop retrieval and analysis tools; other databases and business applications) connect to the History Server (real-time process history database; relational process and/or tag reference database; history engine and history/query server; tag scan tables; data buffering; I/O driver/interface specific to the data source), which acquires data through the Collector Interface from the Data Source(s): Distributed Control Systems (DCSs), Programmable Logic Controllers (PLCs), data acquisition systems, and other databases.]
Figure 1. General process data history system architecture.
The architectural and software design implementation details within each of these major components vary with supplier, as does the terminology used to describe the major components and their internals. Collectors and servers are offered for most of the major hardware platforms and operating systems, but the offerings can vary considerably depending upon the supplier.
Interface. Most often, data is being acquired from a proprietary control system (DCS or PLC) of some sort, requiring the interface to the data source to be written for a specific system. Since interfaces are written for specific systems, the major suppliers of process data historians, typically, have large numbers of unique interface options. The many individual pieces of interface software that must be developed and maintained raise the cost and complexity of the process history system. This complexity is multiplied further when collector interfaces are offered on multiple hardware/operating system platforms. Owners and users of these systems must recognize that revision and version changes to the operating system or the surrounding history system software often require re-validation of the interface software as well, which, sometimes, is left to the user. The same may be true for testing of the software for sensitivity to specific situations or events, such as year 2000 or other date compliance testing. Note, too, that the data source can also be a data destination, when supervisory control strategies or mill process information architecture design require that data from the process history system (or elsewhere in the mill information system) be sent back down to the control system level. The use of bi-directional data flow with the control system can be expected to become more common as the open architecture evolution occurring with DCS
and PLC-based control systems results in more and more applications residing in the more traditional Ethernet-based information system environment. The specifics of how the process data collector interface functions depend on what type of interface(s) the control system supplier has made available into their proprietary system. It is not unusual for the control system supplier to make available an interface device containing special C and FORTRAN programming language libraries with which third parties, such as process data historian suppliers, can write their own custom interfaces. Ultimately the device either is, or connects directly to, a computer on the mill process information system network. There is currently a rapidly emerging (de facto) standard that has particular potential to reduce the quantity of unique interface software required. OLE for Process Control, or OPC, has been developed by a task force made up mainly of companies in the controls industry; OPC Version 1.0, released 29 August 1996, specifies client- and server-side implementation of data access interfaces between systems. For example, if a proprietary control system had available an OPC-compliant server, any OPC-compliant client on any system could communicate with it, with little or no special knowledge of control-system-specific communication requirements. OPC is based on a client-server, object-oriented programming model developed by Microsoft, known as the Distributed Component Object Model (DCOM).
[Figure 2 depicts field devices connected through a fieldbus and fieldbus interface to OPC server(s), with OPC client(s) communicating with the servers via OPC 1.0a and OPC 2.0 interfaces.]
Figure 2. Future model for flow of data from the field, using emerging digital standards.
Collector. The collector utilizes the interface to acquire data. The collector and server do not have to be on the same machine, or network node, and, if on different nodes, they do not necessarily have to be on the same hardware or operating system platform. The structure of the collector includes some sort of logical lists or tables of variables to collect (tag name and parameter or variable name) and information on when and how often to collect them. The collector is also responsible for buffering data in the event the server is unavailable to accept it. The amount of buffering is normally user-configurable and can be anywhere from hours to days, depending upon the quantity of data being collected and the available storage on the collector node. Buffering becomes more important when the collector and server are physically on different network nodes, due to the increased potential for reliability problems brought about by multiple machines and the network path between them.
Server. Together, the server components represent the most complex portion of a process history system. They consist of tag and variable configuration management facilities and the actual configuration data, the history engine, the actual history data files, applications to coordinate changes between the tag database, the history engine, and the collector, and a query server to process client requests for data. A simplified illustration appears in Figure 3, below.
Figure 3. Typical server architecture (functional): the collector feeding the history engine and on-line history files.
Clients. The clients for the process data historian are most often desktop personal computers (PCs). Typically, a suite of applications is available to retrieve and analyze the data. Often, components are available that integrate with office application suites to ease, for example, the direct import of data into spreadsheet or database applications. Within the client architecture, a connection mechanism must be installed and configured to access the process data historian. Common connection mechanisms are:
- NetDDE, the network version of DDE (dynamic data exchange)
- ODBC (open database connectivity)
- OPC (OLE for process control)
NetDDE is a separate application that must be supplied if it is to be used as the connection mechanism. In simple terms, NetDDE provides a conversation between an application on the client node (the DDE client) and an application on the server node (the DDE server). A standard syntax is defined for the user to acquire data directly into, for example, a spreadsheet application. Templates are usually supplied to spare the user from having to become an expert on DDE syntax. For self-contained applications, such as trending and analysis packages, the NetDDE details are usually transparent to the user. ODBC connections require a suitable database driver and proper setup of the driver and the ODBC data source on the client. Support for ODBC is built into the Windows 95, 98, and NT operating systems. ODBC connections imply the possibility of using Structured Query Language (SQL) to acquire data, but, ODBC and SQL are designed to provide a standard means of working with relational databases, not proprietary real-time databases. Therefore, some intermediary products or applications (such as stored procedures), usually in the server, are required to service a query. With the release of OPC version 2.0, covering access to historical data, OPC may become a more prevalent connection means. (See the discussion under Interface.)
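As a rough illustration of the ODBC path described above, the following sketch retrieves a week of historized values through an ODBC connection. The "ProcessHistory" data source name, the PV_HISTORY view, and its column names are hypothetical placeholders for illustration, not any particular supplier's schema.

```python
# Minimal sketch: retrieving historized values over ODBC.
# The DSN, view, and column names below are hypothetical, not a vendor schema.
import pyodbc

conn = pyodbc.connect("DSN=ProcessHistory;UID=reader;PWD=secret")
cursor = conn.cursor()

# Many historians expose history through an ODBC layer that maps SQL
# onto the proprietary real-time files (often via stored procedures).
cursor.execute(
    "SELECT timestamp, value, status "
    "FROM PV_HISTORY "
    "WHERE tagname = ? AND timestamp BETWEEN ? AND ?",
    ("32TI0105.PV", "1998-05-11 00:00:00", "1998-05-18 00:00:00"),
)

for timestamp, value, status in cursor.fetchall():
    print(timestamp, value, status)

conn.close()
```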
All of the above require some underlying network connection protocol and mechanism on both the client and the server.
Functional and Operational Description
Collection and Storage. The original focus of process data historians was the collection and storage of real-time (time-stamped) process data. In database terms, a real-time process database can be thought of as a relational database with only one table, in which the primary key is time (Figure 4).
[Figure 4 contrasts the two models. The relational example consists of a Point Tag Table (Point Tag Name, Point ID [key], Description, Location, etc.), a Variable table (Variable Name, Variable ID [key], Description, Point ID, Range Hi, Range Lo, Compression Type, Compression Band, Collection Grp ID, etc.), and a Collection Group Table (Collection Name, Collection Grp ID). The real-time examples consist of timestamped data records keyed by Timestamp, each holding Variable ID 1, Value 1, Value 1 status, Variable ID 2, Value 2, Value 2 status, and so on.]
Figure 4. Real-time vs. relational databases.
Technically, however, there are also differences in how the data is structured and stored. Databases dedicated to storage of real-time process data are usually of a proprietary design, optimized for efficient (in terms of speed and space requirements) storage and retrieval. A basic record in a real-time database may consist of the following:
Variable Identifier, Timestamp, Value, Status
where:
- Variable Identifier = unique identifier of the variable being acquired; sometimes in the form TAGNAME.VARIABLE, but in some cases may just be an internal ID number
- Timestamp = the time the data was acquired, usually based on the system time of the collector
- Value = the actual value retrieved; most standard data types are supported, including real, string, time, etc.
- Status = the status of the value collected, e.g. good, no data, bad data, etc.
The history files themselves may consist of one self-contained file for each discrete period of history, or a set of files (history filesets) for each period, with one containing the actual process data and the other(s) containing information about the data in the main file, to aid in retrieval (and restoration from off-line status, when required) of the data.
The management of what variables are to be collected and how they are to be collected and stored can be fairly sophisticated for the more popular process data historians, and the technical implementation varies considerably among suppliers. Typical information the user would specify for each variable to be collected is listed in Table 1, below:

Item                Description
Data Source         Identification of the system from which the variable will be acquired (e.g. which DCS)
Tag Name            The unique tag name associated with the data at the data source
Variable            The variable or parameter to be collected from the associated tag name
Description         Description of the variable being collected
Data Type           The data type of the variable to be collected
Units               Engineering units of the variable
Range Low           Low range limit of the variable at the data source, in engineering units
Range High          High range limit of the variable at the data source, in engineering units
Trend Low           Default low trend limit for the variable, for data trending and analysis tools
Trend High          Default high trend limit for the variable, for data trending and analysis tools
Collection Period   How often to sample the variable
Compression Select  Whether or not to perform compression on the variable
Compression Type    Type of compression to perform
Compression Band    Amount of movement in the variable required for storage (depends on Type)

Table 1. User-specified data for variables to be collected.

Management of the data to be collected is one of the more important considerations in the selection of a process history system. The system's tag management components should be designed for power, flexibility, dependability, and ease of use. How well a system performs these tasks can determine how successful implementation and use of the system will be and, of course, will have a significant impact on support costs. Features of a well-designed system for management of point tag and variable information include:
- Point tag and variable editing - intuitive, menu-driven interface for building and modifying tags, variables, and overall classes of the same.
- Bulk loading facilities - reliable, flexible, and easy-to-use tools for bulk-loading numerous variables at a time into the history system. Methods for extracting tag data from the control system for input into the bulk-load process should be available and as straightforward as possible.
- Variable type classes and inheritance - a means to set general configuration rules for whole classes of variables and, ideally, the ability to define, for each configuration parameter, whether it will be used only on initial configuration or to dynamically change variable configurations for existing variables.
- Synchronization - configurable automatic synchronization of the point tag and variable configuration data with control system (DCS) configuration data, including tag additions, deletions, and changes to key parameters such as engineering units range.
- Compression - an adequate selection of effective compression algorithms to minimize wasted storage space while still capturing important changes in the process data.
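To make the Table 1 configuration items concrete, here is a minimal sketch of a variable-configuration record; the field names mirror Table 1 but are illustrative assumptions, not any particular product's schema.

```python
# Illustrative only: a variable-configuration record mirroring Table 1.
# Field names and example values are assumptions, not a vendor schema.
from dataclasses import dataclass

@dataclass
class VariableConfig:
    data_source: str          # which control system (e.g. which DCS)
    tag_name: str             # unique tag name at the data source
    variable: str             # parameter to collect (e.g. "PV", "SP", "OP")
    description: str
    data_type: str            # e.g. "real", "integer", "string"
    units: str                # engineering units
    range_low: float
    range_high: float
    trend_low: float          # default trend limits for desktop tools
    trend_high: float
    collection_period_s: float    # how often to sample, in seconds
    compression_select: bool      # whether to compress at all
    compression_type: str         # e.g. "deadband", "boxcar_backslope"
    compression_band_pct: float   # band, as percent of span

cfg = VariableConfig(
    data_source="DCS-1", tag_name="32TI0105", variable="PV",
    description="Chiller return temperature", data_type="real", units="deg F",
    range_low=0.0, range_high=150.0, trend_low=40.0, trend_high=80.0,
    collection_period_s=60.0, compression_select=True,
    compression_type="deadband", compression_band_pct=0.5,
)
```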
When a new variable is configured in the process history system, or a change is made to an existing variable, the history engine and collector must have a means of picking up the new information. Automatic change detection is normally part of the design, but the actual mechanics depend upon the system; often the user configuring variables must initiate some sort of processing of the changes or a reset operation. Tools to monitor the status of a variable that has been configured for history collection are essential to troubleshooting and, again, due to the proprietary design of process history systems, can vary considerably with supplier. Extremely useful, when available, are means to check the following:
- If a point tag and variable are configured in the tag database
- If the variable is in the history engine, and its value and status there
- If the variable is in the collector, and its value and status there
- If the variable is available from the data source, and accessible through the interface
Other useful system capabilities include:
- Facilities to allow real-time data to be exported or relational data to be imported
- Application Programming Interface (API) libraries and tools for custom application development
When evaluating process data historians, make note of the licensing structure, which may be based on:
- Number of point tags configured
- Number of variables configured
- Number of concurrent users (for a site or system), with no limit on tags or variables
It is important to distinguish between licensing based on the number of tags and licensing based on the number of variables, since each point tag (from the control system) may have a number of variables (e.g. process variable, setpoint, output, mode, etc.) that could be historized.
Compression. Concerning collection and storage of real-time process data, compression refers to intelligent decision-making as to whether or not to store a value, thereby making the best use of available storage space by not storing values that are not changing at all or by some significant (user-defined) amount. This is important, since process data historians are designed to store large amounts of data for long periods of time, and disk space usage can become considerable. Compression can be performed at the collector or the server, and some systems provide the option to perform it at either or both. Popular compression methods, in order of increasing sophistication, are:
- Store-on-change - works exactly like it sounds: if the variable changes, it is stored. This is useful for Boolean, string, ordinal, and enumerated data types, where changes in the variable are stepwise, not continuous. (Examples include motor stop/start, digital I/O, and flag point variables, as well as strings such as point descriptor or engineering units descriptor.)
- Deadband - applies to numerical values (variables of integer or real/floating-point data types) and simply uses a fixed, user-defined limit above and below the last stored value; when the current value falls outside of the limits, it is stored, and the deadband then applies about its value (see the sketch following the list of considerations below). The deadband method generally is not as efficient at storage or data reconstruction as constrained slope or boxcar/backslope.
- Constrained slope - utilizes line segments between successive data points and a user-defined tolerance band about the segment. Essentially, for a line drawn between the last value saved and the current value, any intermediate values falling within the tolerance band will not be stored. When the line drawn causes a prior value to fall outside the tolerance band, the prior value is stored and becomes the new origin for the line segment.
- Boxcar/backslope - can be considered a combination of the deadband and constrained slope methods, except that it is always the previous value that is stored (not the current, as with the deadband method) and, in general, the current value must meet or exceed the tolerance or deviation limits for both the slope and the boxcar (i.e. deadband) for the previous value to be stored. This method is the most efficient with storage space, but is also the most complex.
For methods requiring the user to specify a tolerance, deviation, or deadband, it is important to know whether the number is entered in percent of span (where span = range high - range low) or in engineering units. Some systems permit the user to select which way it will be entered. When dealing with large numbers of variables, it is generally better to work with compression settings in percent, thereby allowing guidelines to be developed that are independent of the actual engineering units of the variables. Properly setting compression bands or limits for all variables in a large plant site, to achieve acceptable data resolution without excessive storage requirements, can be quite challenging. It is difficult to develop simple rules or formulae to determine compression band settings. In choosing the best setting, the following must be considered:
- The accuracy, repeatability, and resolution of the sensor and transmitter used to make the measurement - for example, it is not meaningful to set the compression band smaller than the repeatability band of the instrumentation.
- The definition of significant change for the measured variable in terms of the process and product - somewhat subjective, this should be defined in percent of span to properly take into account the configured range of the variable. Typically, the range configured in the process history system will be the same as the range on the control system (DCS or other data source). Sometimes instrument ranges are configured much wider than the normal operating region, in some cases by design (for example, a chemical addition flow rate for which small fluctuations at a given process rate may have a large effect on quality, but the chemical rate has to be measurable at all process rates and down to zero flow), and in other cases for no particular reason (for example, a boiler tube temperature transmitter ranged 0-1500 deg. F). In these cases, the compression band may have to be set smaller than what might be acceptable, on average, for most other variables.
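As a concrete illustration of the deadband method described in the list above, here is a minimal sketch (a simplification for illustration only, not any supplier's implementation), with the band expressed in percent of span as recommended in the text:

```python
# Minimal sketch of deadband compression: store a sample only when it moves
# outside a band around the last *stored* value. Illustrative only.
def deadband_compress(samples, range_low, range_high, band_pct):
    """samples: list of (timestamp, value); band_pct: percent of span."""
    band = (range_high - range_low) * band_pct / 100.0
    stored = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > band:
            stored.append((timestamp, value))
            last_value = value          # the band now applies about the new value
    return stored

# Example: a slowly drifting temperature, 0-150 deg F range, 0.5% band (0.75 deg F)
raw = [(t, 53.0 + 0.005 * t) for t in range(0, 600, 60)]
print(deadband_compress(raw, 0.0, 150.0, 0.5))
```

With the 0.75 deg F band implied by a 0.5% setting on a 0-150 deg F range, only four of the ten slowly drifting samples in this example are stored.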
When faced with bulk-configuring a large number of variables, a good starting point for selection of the compression band (in a typical continuous process environment) is 0.5% of span. This should provide, on average, an acceptable balance of resolution and storage requirements.
Retrieval. Dedicated applications usually exist within the history server to handle client requests for data. These sometimes utilize ODBC and SQL to maintain a familiar and standardized access method on the front end, while utilizing stored procedures and custom applications to actually extract the requested data from history. Once again, the actual mechanics are mostly proprietary and supplier-specific. A well-designed system should accommodate large numbers of users (50+) without difficulty. When evaluating process history system suppliers, inquire as to the number of concurrent users supported while maintaining acceptable performance, and gain a clear understanding and agreement as to the licensing arrangement for the desktop applications. Licensing can be based upon the number of installed desktop applications, the number of concurrent users, or both.
DESKTOP TOOLS - USING THE DATA
Understanding the Underlying Data
With a wealth of data readily available, and powerful tools on the desktop to easily retrieve and analyze the data, there comes considerable potential for misuse of tools, misinterpretation of data and analysis results, and the rapid
proliferation of faulty conclusions (supported by positively impressive color charts and graphs). Much of this can be avoided with basic understanding and training. Origin of the data. The user must be ever aware of the point of origin of the data and the path it took to arrive at the desktop. The point of origin is usually field instrumentation (sensor/transmitter), but may also be derived or calculated values, or manually entered data (such as lab tests). In the case of instrumentation, the user should ask:
- Is the device properly calibrated?
- What is the normal accuracy and repeatability?
- Is the device in service and functioning properly?
- At what point in the process is the device located?
- What kind of filtering or other signal processing (even data compression, in digital transmitters) is occurring at the device that may alter the meaning of the measurement or the speed of response?
Like instrumentation, lab tests typically have an accuracy and repeatability associated with them; and stated accuracy and repeatability are for a test performed according to a specified procedure under specified conditions, not necessarily accounting in full for the human variability factor (considering the fact that one test is often performed by different people, on different shifts, with varied training, qualifications, and attentiveness). Finally, derived or calculated values will ultimately have some sort of measured or tested data for inputs, so the above considerations still apply, in addition to the obvious need to understand the derivation or calculation.
Data path to the desktop. Data from the field will usually be subject to input conditioning and signal processing in the control system. When an analog signal (usually 4-20 mA) is used, there can also be noise and error introduced electronically. The gradual trend towards digital signaling from the transmitter to the control system eliminates that particular concern (but does not eliminate the possibility of data loss due to noise or electrical problems). For analog signals, anti-aliasing filtering usually occurs at the point the signal enters the control system, designed with a cut-off appropriate to the analog-to-digital (A/D) conversion rate. A first-order lag filter with a cut-off frequency around 1 Hz would be typical. Once in the control system, additional filtering can be added by the user. In some control system devices, the actual rate at which the fully conditioned variable (the one the process data historian will access) is updated is configurable. For industrial process DCS equipment (including systems commonly used in pulp and paper), update or scan periods typically run from a fraction of a second to 1 sec. For inferential measurements, calculated and derived values, and certain types of analyzer measurements, update periods typically run from 1 sec. to hours.
Aliasing. When the information contained in a measurement signal is periodic in nature, it must be sampled at a rate of at least twice the highest frequency present in the signal in order to completely recover the actual signal (Shannon's sampling theorem). Process signals are inherently noisy, with frequency components often well above those that will be practically reproducible in the control system and, certainly, the process data historian. A practical rule of thumb is to sample at 6-10 times the frequency of any periodic components that must be reproduced. Sampling a periodic waveform at less than twice any frequency present can produce aliasing, that is, a new set of periodic data at a different frequency than the actual data, and, most importantly, one that never existed in the actual data. Any periodic components at frequencies greater than or equal to one-half the sampling frequency should be attenuated by a proper anti-aliasing filter. Figure 5 illustrates the effects of aliasing when under-sampling a 60 Hz waveform. The result, in this case, is a 4 Hz waveform that does not exist in the actual/original data.
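The effect shown in Figure 5 can be reproduced numerically; the short sketch below (illustrative only, using NumPy) samples a 60 Hz sine wave at 16 samples per second and recovers an apparent 4 Hz cycle.

```python
# Illustrative sketch of aliasing: a 60 Hz signal sampled at 16 samples/sec
# appears as a 4 Hz signal, since |60 - 4*16| = 4 Hz.
import numpy as np

f_signal = 60.0   # Hz
f_sample = 16.0   # samples/second, far below the 120 samples/sec Nyquist requirement

t = np.arange(0.0, 2.0, 1.0 / f_sample)              # two seconds of samples
samples = 100.0 * np.sin(2 * np.pi * f_signal * t)

spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / f_sample)
print(freqs[np.argmax(spectrum)])                    # 4.0 -- the alias, not 60 Hz
```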
[Figure 5: two plots of amplitude (percent) vs. time (seconds) - (a) a 60 Hz signal sampled 2400 times/sec; (b) the same 60 Hz signal sampled 16 times/second, with a resulting waveform of 4 Hz.]
Figure 5. Example of aliasing.
Time series analysis (trends, SPC charts, etc.), correlation (regression), and frequency domain analysis tools are all vulnerable to aliasing. Due to the practical and economic considerations of computing power for data collection and disk space for data storage, most variables in the process data historian are configured for collection sample periods considerably longer than the control system's sampling time. Collection sample periods for long-term history have typically been around 1 minute (though this may grow shorter, on average, in the future, as computing and storage costs and capability continue to improve). Most desktop tools available today with process history systems do not incorporate any anti-aliasing filtering; the need for additional filtering, and how to accomplish it, is up to the user to determine. When analyzing data from process data historians, it is important to factor an understanding of aliasing into the interpretation of any data that appears cyclical.
Selecting and browsing data. Full-featured process history systems include tools for selecting variables from the process history database with, at a minimum, literal or wildcard searches on point tag and variable names. Searching on point description, or a combination of point description and name, is even more useful, when available. These selection tools are typically integrated with the particular desktop application into which the user is trying to retrieve data. In addition, it is helpful to have a tool for browsing the point tag and variable configuration database, to simply view configuration data such as range, compression, and collection period. These are things the average user needs to
know to intelligently interpret data, and, it should not be necessary to go to a system administrator to get this information. All data retrieval tools should have a means for choosing how the data will be extracted from the process history database. Normally, the user can request a value, or some derivation of a value, such as average, minimum, maximum, standard deviation, etc. Since data may be compressed, the user cannot be sure, prior to actually viewing the data, exactly how often values were actually stored. For most systems, when values are requested, the default is either some sort of a best fit of the data (different systems use different methods) or some sort of interpolation. Typical choices for retrieving data may include:
- Best fit - an algorithm is used to retrieve a subset of data from a very large data series, at the time interval determined by the best-fit algorithm, in order to reconstruct a representative plot (especially suited to plotting when a very large data series is involved).
- As-stored - the actual stored values are returned at their as-stored time intervals.
- Linear interpolation - values are returned at a fixed, user-specified interval by linear interpolation between the actual values stored.
- Non-linear interpolation/extrapolation - values are returned based on non-linear interpolation between the actual values stored, or an extrapolated fit through the stored values (e.g. exponential).
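Because compression leaves values stored at irregular intervals, analysis usually starts by resampling onto a fixed interval. A minimal sketch of linear-interpolation retrieval (illustrative only, using NumPy, not any supplier's API):

```python
# Illustrative: resample irregularly stored (compressed) history onto a
# fixed interval by linear interpolation, as a retrieval tool might do.
import numpy as np

def interpolate_history(stored_times, stored_values, start, end, interval):
    """stored_times in seconds; returns (times, values) on a fixed grid."""
    grid = np.arange(start, end + interval, interval)
    values = np.interp(grid, stored_times, stored_values)
    return grid, values

# As-stored samples (seconds, value) left behind by compression
t_stored = np.array([0.0, 180.0, 360.0, 540.0])
v_stored = np.array([53.0, 53.9, 54.8, 55.7])

grid, values = interpolate_history(t_stored, v_stored, 0.0, 540.0, 60.0)
print(list(zip(grid, np.round(values, 2))))
```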
It is important to have, and exercise, control over how the data is retrieved, particularly if any statistical or correlation analysis is to be done.
Time-series plots. Probably the most common way to view data is as a time-series trend. A good desktop trending tool will allow complete control over the appearance, time scale, and individual variable scaling of the plot. Multiple traces on a trend are standard, with typically 3-8 allowed. A typical trending application appears in Figure 6, below:
[Figure 6: trend display of chiller operation variables.]
Other useful features in time-series trending applications (availability depends on selected supplier) include:
- Hairline cursor - ability to turn on/off a hairline cursor to indicate an exact data value and time at a point selected on the plot.
- Pan and scroll - ability to move the plot backwards and forwards in time.
- Zoom - ability to enlarge/reduce the scale on the entire plot.
- Update - ability to turn on/off active updating of the plot with fresh data (for plots anchored to the current time).
- View and copy data - ability to view the underlying data points that make up the plot, and to copy them to the clipboard to paste directly into other applications (such as spreadsheets).
- Save - ability to save the current plot, with all its setup information.
- Analysis - built-in analysis tools (discussed in more detail in the paragraphs that follow).
Analysis Tools
Major suppliers of process data historians also provide basic desktop analysis tools. Whenever analysis of data is to be performed, the data must be retrieved on a consistent time interval over a time span appropriate to the type of analysis being performed. Best-fit or as-stored data is not acceptable for statistical analysis. Too large or too small a sample population can also be unacceptable. It is often necessary to avoid or parse out data for outage periods, reduced rates, or other special process conditions, and elimination of outlier data may also be required. Selection of the interval and span, and determination of data preprocessing requirements, means that the user must understand the source of the data and, further, the process it is associated with. In many cases, the necessary preprocessing tools may not be available within the standard desktop analysis packages.
SPC charts. Statistical Process Control (SPC) charts are time-series representations of Xbar and R, where:
Xbar = (X1 + X2 + ... + Xn) / n

R = Xmax - Xmin

where n is the subgroup (sample) size.
Upper and lower control limits should also be calculated using standard SPC rules (see statistical process control reference literature for a complete treatment of the subject). An example of an SPC chart produced with one supplier's process data historian analysis tools appears in Figure 7.
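As an illustration of the X-bar/R computation with standard textbook control-limit constants, here is a minimal sketch (the subgroup size and data are assumed for illustration, not taken from Figure 7):

```python
# Minimal X-bar / R chart computation. A2, D3, D4 are the standard tabulated
# SPC control-limit constants for small subgroup sizes.
import numpy as np

A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0,   3: 0.0,   4: 0.0,   5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}

def xbar_r_chart(values, subgroup_size):
    """values: 1-D array of samples; returns per-subgroup Xbar, R, and limits."""
    n = subgroup_size
    usable = (len(values) // n) * n
    groups = np.asarray(values[:usable]).reshape(-1, n)
    xbar = groups.mean(axis=1)
    r = groups.max(axis=1) - groups.min(axis=1)
    xbarbar, rbar = xbar.mean(), r.mean()
    limits = {
        "X UCL": xbarbar + A2[n] * rbar, "X LCL": xbarbar - A2[n] * rbar,
        "R UCL": D4[n] * rbar,           "R LCL": D3[n] * rbar,
    }
    return xbar, r, limits

rng = np.random.default_rng(0)
data = 53.4 + 1.8 * rng.standard_normal(300)   # assumed process data
_, _, limits = xbar_r_chart(data, subgroup_size=3)
print(limits)
```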
[Figure 7: X-bar and Range chart of a return temperature variable, 5/11/98 through 5/18/98, showing the X-bar and Range traces with center line (CL), control limits (LCL/UCL), out-of-control points, and sample size.]
Proper application of an SPC chart begins with choosing an appropriate subgroup size. For continuous process control applications, a subgroup size of 2 or 3 is typical. With any of the analysis tools, do not assume that the default settings are acceptable; for example, if the subgroup size defaults to 1, the result is merely a time-series trend.
Histograms. A histogram displays a bar chart of the distribution of numerical values. It is extremely useful in statistical analysis to view how data is distributed. Possible distributions include normal, skewed, multiple peaks, etc. The shape of the distribution often lends insight into the process or, sometimes, the quality of the data itself (for example, a distribution that falls off sharply at a specification limit may be cause for suspicion). If the distribution is not normal (a bell-shaped curve), then the interpretation of accompanying statistics, such as standard deviation, changes. For example, with normally distributed data, about 68.3% of the data sampled will fall within +/- one standard deviation (1 sigma) of the mean. The user should have control over the number of bars into which the data should be divided. Figure 8 illustrates a typical histogram.
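A quick way to view the distribution and check the 68.3% rule is sketched below (illustrative NumPy code; the simulated data simply echo the mean and standard deviation shown in Figure 8):

```python
# Illustrative check of distribution shape and the 68.3% rule for
# normally distributed data, using assumed (simulated) process values.
import numpy as np

rng = np.random.default_rng(1)
values = 53.39 + 1.808 * rng.standard_normal(5000)   # mean/std echo Figure 8

mean, std = values.mean(), values.std()
within_1_sigma = np.mean(np.abs(values - mean) <= std)
print(f"fraction within +/- 1 sigma: {within_1_sigma:.3f}")   # ~0.683

counts, bin_edges = np.histogram(values, bins=10)     # 10 bars, as in Figure 8
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:6.2f} - {right:6.2f}: {'#' * (count // 50)}")
```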
[Figure 8: histogram of 32TI0105.PV (RETURN), 5/11/98 1:24:46 AM to 5/18/98 1:24:46 AM, 10 bars, with overlaid normal distribution, mean value, LSL/HSL, and 3x standard deviation markers; #Data=248, LSL=0, HSL=99.79, Mean=53.39, Std Dev=1.808, Min=49.09, Max=56.26, Cp=9.199.]
Figure 8. Histogram.
Regression and correlation. Tools to analyze correlation between variables are frequently provided. Most common is the straightforward two-variable linear regression. Some packages may provide additional options, including other varieties of linear regression (inverse, logarithmic), non-linear regression (quadratic, polynomial, exponential), multi-variable regression, or other less common correlation tools. For any regression, the resultant coefficients of the regressed equation, as well as some measure of goodness of fit or degree of correlation, such as the coefficient of determination, R2, should be provided (R2 is always between 0 and 1; the closer to 1, the better the correlation). Figure 9 illustrates a typical linear regression:
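A two-variable linear regression with R2 can be sketched as follows (illustrative NumPy code with simulated supply/return temperature data, not a supplier's tool):

```python
# Illustrative two-variable linear regression with coefficient of
# determination (R^2), using simulated x/y data.
import numpy as np

rng = np.random.default_rng(2)
x = 43.0 + 6.0 * rng.random(200)                        # e.g. supply temperature
y = 10.0 + 1.05 * x + 0.8 * rng.standard_normal(200)    # e.g. return temperature

slope, intercept = np.polyfit(x, y, 1)      # least-squares fit y = slope*x + intercept
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"y = {slope:.3f} * x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```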
[Figure 9: XY plot of 32TI0105.PV RETURN (Y axis, approximately 45-70) vs. 32TC0117.PV SUPPLY (X axis, approximately 43-49), 5/11/98 to 5/18/98, with fitted regression line.]
Figure 9. Linear regression.
Frequency Domain Analysis. Various forms of frequency domain analysis tools, based on Fast Fourier Transform (FFT) calculations, are available with some desktop tools. A common and useful version of FFT analysis is the power spectrum, which is, essentially, the mean square, or power, of the FFT coefficients, plotted against frequency.
The primary purpose of frequency domain analysis is to identify cycles in the process data. The primary risk in using frequency domain analysis at the process history system level is aliasing; as long as this is understood, it maintains its usefulness. In process control, frequency domain analysis is particularly useful for identifying the source of cycles (which often affect product quality), because continuous processes and automatic control will tend to propagate a cycle of a particular frequency throughout the process. When applied correctly, it also helps separate controllable disturbances from uncontrollable disturbances and noise (though it is less likely to be used in this way at the process history system level, due to the typically long collection periods relative to the control system). Figure 10 illustrates the results of a power spectrum analysis, where the horizontal axis is displayed in cycles per hour (cph).
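A power spectrum of the kind shown in Figure 10 can be computed as follows (illustrative NumPy sketch; the 1-minute collection period and the simulated 24-hour cycle are assumptions):

```python
# Illustrative power spectrum of historized data collected at 1-minute
# intervals, expressed in cycles per hour (cph). Data are simulated.
import numpy as np

interval_s = 60.0                                       # 1-minute collection period
t_hours = np.arange(0, 7 * 24, interval_s / 3600.0)     # one week of samples

# Simulated return temperature with a ~24-hour (about 0.042 cph) cycle plus noise
rng = np.random.default_rng(3)
signal = (53.0 + 2.5 * np.sin(2 * np.pi * t_hours / 24.0)
          + 0.5 * rng.standard_normal(len(t_hours)))

detrended = signal - signal.mean()
coeffs = np.fft.rfft(detrended)
power = (np.abs(coeffs) ** 2) / len(detrended)          # mean square ("power") of FFT coefficients
freqs_cph = np.fft.rfftfreq(len(detrended), d=interval_s / 3600.0)

print(f"dominant cycle: {freqs_cph[np.argmax(power)]:.4f} cph")   # ~0.042 cph
```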
[Figure 10: power spectrum, 5/11/98 1:09:18 AM to 5/18/98 1:09:18 AM; #Data=252, Min=0 at Fmin=0 cph, Max=2.336 at Fmax=0.0418 cph.]
Figure 10. Power spectrum analysis.
The above example illustrates two dominant cycles, one at about 0.01 cph (a period of about 100 hours) and the other at about 0.042 cph (a period of about 24 hours). In this case, the 24-hour cycle (which is also obvious in the time-series plot shown in Figure 6) is in chilled water supply and return temperature, about a 5 deg. F fluctuation with normal ambient temperature fluctuation over the course of a day.
Retrieval of Data Directly into Other Desktop Applications
Tools to acquire data into popular desktop office applications, such as databases and, especially, spreadsheets, are quite common. For databases, these are typically supplied as add-ins. For spreadsheet applications, they may be supplied as add-ins or as a template spreadsheet containing the necessary coding to connect to and query data from the process data historian. This can include use of built-in scripting languages, such as Visual Basic. Regardless of the approach the supplier has chosen, there is added complexity in keeping these applications tested and working correctly, because they are vulnerable to changes in software versions and revisions of the office suite applications with which they work. The ability to query data directly into a spreadsheet, however, is considered a basic and mandatory function, due to the widespread use of, and comfort level with, spreadsheets among desktop users.
Graphical Display
Some process history system suppliers offer the ability to call up process or other graphics on the desktop, populated with values from the process data historian. At least one supplier even offers an application that will open and display proprietary graphic files from their own DCS on the desktop PC, again populated with actual values from the process data historian. The application can then be used to play the data backward and forward through time, at different rates and intervals, in the custom graphic. More commonly, suppliers provide a means to build custom graphics on the desktop PC that can then be populated with data, according to the user's design.
While these graphics applications make for good demonstrations, and, can potentially be good training tools, they are not the most useful, nor the most often used, of the tools discussed.
SUPPORT
A discussion of process data historians and process information systems is not complete without addressing the subject of support. Particularly for mid to large scale operations, the complexity of the process history system itself, along with the inherent need to manage the configuration of point tags and variables, can represent a significant effort.
Support Issues
Issues to consider include:
- On-going maintenance of point tag and variable configuration data, including additions, deletions, and changes. Keeping the process data historian configuration database consistent with the data source (control system, usually a DCS) can be a significant effort. When selecting process history systems, look for features that automatically synchronize configuration data with the control system.
- Support and troubleshooting of the process history system applications themselves, recalling that these systems are usually of proprietary design and quite complex.
- Support and troubleshooting of the hardware and software operating system on which the process history system, both the process data historian and the desktop applications, resides, including system administration and software upgrade management.
- Management of history data files - these eventually outgrow the on-line storage space and must be taken off-line to some other storage location (typically magnetic tape or optical storage). When the data is needed, it must be restored to the system. These files can be quite large and, sometimes, time consuming and awkward to deal with.
- Support of users with widely varying levels of desktop PC literacy and varying application requirements, from very simple to very sophisticated.
On an installation with approximately 6,000 point tags, over 15,000 historized variables, and about 35 users, support can be estimated to require an average of 8-12 effort-hours per week (an approximate guide, based on actual experience). The relative ease of support of various process history systems varies considerably with supplier and shouldn't be overlooked when selecting a system (the natural tendency is to focus only on what functions a system performs and how the user can access and utilize the data). Support can be handled in several ways.
Planning for Success
Site responsibility for process history systems is typically handled by process control or information systems support staff. Which group will depend upon a mill's organizational structure and philosophy. Because the process history system bridges the control system and the business information system world, it is important that, if process control and information systems are separate groups, they have a good model for working together. As with all hardware and software systems support efforts, it is useful to develop a consistent and disciplined approach. This should include the following:
- Establish and communicate a procedure for contacting key support resources.
- Establish a means to communicate system status information to all users (e.g. e-mail or voice mail lists).
- Keep a record of all installed desktop applications, including user and location.
- Keep good records of system problems and their resolution (an incident database can be useful).
- Develop a change request procedure for users to request tag and variable changes, additions, and deletions.
- Institute good backup procedures.
- Assess spares and reliability requirements of the critical data path, especially collector node(s).
CURRENT AND FUTURE
Current Application
It is difficult to identify hard dollar returns to justify implementation and support of a process history system, but there are benefits. Some general applications in pulp and paper include:
- Environmental monitoring and reporting - environmental data (such as air emissions and vent data) readily available at the environmental manager's desktop contributes to more accurate reporting (reducing fees and penalties based on emissions) and proactive response to impending permit violations.
- Production coordination - production managers and shift supervisors can construct and customize reports for daily communication and millwide production coordination, with little assistance from process control or information systems resources.
- Process control performance monitoring - process measurement and control data can be easily brought to the desktop for performance analysis and reporting with common desktop PC tools.
- Process analysis and troubleshooting - process engineers and technicians can easily acquire data at the desktop for process troubleshooting, reporting, optimization analysis, etc., again with common desktop PC tools.
Having process data readily available on the desktop, even if it is short-term data that is also available on the DCS, encourages utilization of data for process and business improvement. Access is improved by the abundance of desktop PCs, and the costly DCS operator station or workstation can be dedicated to process operations.
Future Application and Directions
Making use of extensive data. Continued advances in computing hardware and software can be expected to improve the practicality of collecting and storing data at faster rates. The true limitation will be a human being's capacity to make intelligent use of such massive amounts of data. Applications are needed that can distill information into digestible summaries and conclusions (in the case of information for managing the business or providing data to customers, suppliers, or regulatory agencies) or automatic action (in the case of high-level supervisory controls).
Relational/real-time integration. Relational database integration into the overall process history system is developing and should eventually result in some extremely powerful capabilities. Use of a relational database for point tags, variables, and their associated configuration data (see the example in Figure 4) is a fairly simple and logical application, but the real power and benefit will come from successful integration of real-time process databases with relational process and business databases. Examples of relational databases that would become even more useful if their data could be matched up with real-time process data include:
- Finished product data (e.g. by roll, bale, or lot number), including quality, production, and specifications
- Maintenance management data, including failure/repair data by equipment number, instrument loop number, or process area
- Reliability data, by rate/surge incident or process area
With good relational/real-time integration, one could retrieve, for instance, all relevant data from all mill databases associated with a specific finished product lot number: not only the measured finished product quality attributes, but
process conditions throughout the mill for the product associated with that lot number, even data on specific equipment in service, or operator-entered reliability incident data. Achieving this level of integration will be a formidable challenge, requiring different data models for different types of manufacturing businesses and process plant operations, tunable further for individual sites. Open control system architectures. The trend towards open (i.e. non-proprietary) control systems is further blurring the delineation between control and information systems. Some supplier models for control system architecture make the process data historian an integral and vital part of the open control system, assuming the task of storing all required process history, alarm and event journals, and even data required for advanced supervisory control applications. Adapting open systems computing and networking designs to the stringent reliability and performance requirements of process control will undoubtedly prove both interesting and challenging.
SUMMARY AND CONCLUSIONS
The most popular process data historians for medium to large scale operations are by no means simple shrink-wrapped software applications. They require special consideration in selection, implementation, and support, and the data itself must be used and interpreted with careful consideration and knowledge. Thoughtful application of the basic concepts and guidelines presented here can aid in successfully applying process history systems and desktop tools and, ultimately, in deriving true business value from the improved access to data.