Testing laboratories associated with manufacturing plants play a key role by assisting production personnel in monitoring the manufacturing process. The laboratories are often chartered with providing customers, both internal and external, with proof that the material and/or products sold meet customer specifications. To ensure the data provides users with the appropriate level of reliability, the laboratories also need to monitor their own processes: the testing procedures.


Figure 1. Accuracy and Precision

Process and Quality Control
The methods used to monitor processes, track conformance to specifications and evaluate the measurement process are collectively known as statistical process control (SPC). SPC enables an organization to track and reduce process variability by using tools such as control charts. Laboratories often refer to the use of SPC methods in their internal quality control program as statistical quality control (SQC).

Table 1. Subset of Sulfide Data

Precision and Accuracy
Laboratories use the terms precision and accuracy to characterize a method's performance. Precision describes how well a measurement can be repeated while accuracy refers to the closeness of the measured value to the true value. Figure 1 illustrates how accuracy and precision work together to define method performance.

Ideally, customers of testing data want test results to be generated from a method that is both accurate and precise. In reality, achieving a high level of reliability may be cost prohibitive or overly time consuming. To determine if the data is useful, the laboratory and its customer must understand how the data will be used as well as the types of decisions that will be made. They must also be aware of the industry standards and regulatory guidelines. Control charts measure the level of method variability as well as provide ongoing feedback to laboratory staff. This article discusses how a laboratory can use the control charting capabilities of NWA Quality Analyst® software to monitor the performance of its testing methods.


Figure 2. Sulfide Duplicate Data Relative Percent Difference

Types of Errors
To successfully use control charts to monitor and manage variation in analytical processes, the analyst must first understand the sources of variability. Four types of errors characterize the uncertainty of a test result:

  • Random errors

  • Variable systematic errors

  • Constant systematic errors

  • Gross errors

Random
Random errors are uncontrollable fluctuations seen on either side of the mean. They may be caused by environmental factors such as small changes in temperature, humidity or barometric pressure, or by electrical fluctuations or differences in sample homogeneity. All glassware used to contain or deliver materials has published tolerances in which the true volume is found. Balances also have published tolerances and will add to the variability of the test method. This type of error is often thought of as noise and is uncontrollable or indeterminate in nature. No matter how carefully a method is validated or performed, random errors will always cause some variability in the measurement. Ideally, the analyst wants these errors to be small and consistent.

Variable
Systematic errors arise from the measurement system itself and can be classified as variable or constant. Variable systematic errors, like random errors, can appear on both sides of the mean, but are larger in magnitude. These errors can be identified and removed, or reduced if necessary. Factors causing these errors include environmental changes, differences between analysts, static buildup inside balance housing, and dirty glassware. The presence of random and systematic errors will affect the precision of a test method.

Constant
Constant systematic errors appear as a constant positive or negative offset from the mean or true value of a reference material. This type of error is a bias and affects the accuracy of a method. Constant systematic errors can be caused by improper instrument calibration, reagents used at the wrong concentration or glassware used incorrectly. Both variable and constant systematic errors are determinate because the source of the error can be identified and corrected.

Gross
Uncertainty can also be introduced by gross errors. These errors occur when material is spilled or calculations are performed incorrectly. Usually, these errors are caught by the analyst, the mistake is noted and the test is repeated.

Table 2. Raw Spike Recovery Data

Monitoring the Precision of Test Methods
The complexity of determining method precision depends on the nature of the test and the requirements surrounding the method. In its simplest form, the standard deviation can be calculated from repeat measurements made on a representative sample. With a small sample set, the range can be used as a measure of precision. The more precise the method, the smaller the standard deviation or range. Determining precision in this manner reflects the method's inherent capability rather than its day-to-day performance, because the measurements are likely to be made on the same day by the same analyst, using the same equipment and reagents.
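As a sketch of this simple approach, the standard deviation and range of a set of repeat measurements can be computed directly. The measurement values below are illustrative, not taken from the article's data sets:

```python
import statistics

# Ten repeat measurements on a representative sample (illustrative values)
repeats = [4.98, 5.02, 5.01, 4.97, 5.03, 5.00, 4.99, 5.02, 4.96, 5.02]

mean = statistics.mean(repeats)
stdev = statistics.stdev(repeats)          # sample standard deviation
value_range = max(repeats) - min(repeats)  # range: a quick precision measure for small n

print(f"mean = {mean:.3f}, s = {stdev:.4f}, range = {value_range:.2f}")
```

For very small sample sets, the range is often reported in place of the standard deviation.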

A more rigorous process to determine method uncertainty involves conducting a propagation of errors determination, which identifies all of the method's components that contribute to the uncertainty, and measures their combined variability. Differences in analysts, equipment and environment can all contribute to method variability and should be included in the measurement of precision.

To monitor precision over time, the laboratory needs a stable sample that contains the analyte in a matrix similar to production samples. This type of sample may be commercially available as a certified standard, or may be obtained in-house. Once this material has been found, testing can be conducted over a period of time and the standard deviation can be calculated.

The next step is to plot the results on a laboratory control chart. Each value is plotted on the chart rather than averaging repeat measurements to make an X-bar chart. The control chart graphically represents how the method performs over time. Trends and cycles are illustrated and the control limits offer a measure of method variability.


Figure 3. Recovery Spike Control

Calculating Control Limits
Control limits can be calculated in a number of ways. NWA Quality Analyst provides two:

  • Use the average moving range (the difference between successive measurements) to estimate the standard deviation of the process.

  • Calculate the standard deviation directly from the individual data points.

The first method gives a more robust estimate of the standard deviation, and the resulting control limits will typically be narrower than those based on the overall standard deviation. The second method is preferred by some regulatory agencies. The control chart should indicate which method was used. All control limits discussed in this article were calculated from individual measurements.
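The two approaches can be sketched in a few lines of Python. The d2 = 1.128 divisor for a moving range of two is the standard individuals-chart constant; the data values themselves are illustrative:

```python
import statistics

# Illustrative control-standard results for an individuals chart
data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0]

xbar = statistics.mean(data)

# Method 1: estimate sigma from the average moving range (d2 = 1.128 for n = 2)
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
sigma_mr = statistics.mean(moving_ranges) / 1.128

# Method 2: sample standard deviation of the individual points
sigma_s = statistics.stdev(data)

for label, sigma in (("moving range", sigma_mr), ("direct s", sigma_s)):
    print(f"{label}: UCL = {xbar + 3 * sigma:.3f}, LCL = {xbar - 3 * sigma:.3f}")
```

Whichever estimate is used, it should be noted on the chart so that the limits can be interpreted correctly.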

The key to establishing this type of control chart is to have a control sample that is both stable and representative of the process being monitored.

Occasionally, these two criteria cannot be met due to the nature of the material. In this case, precision can be tracked by conducting multiple measurements (usually two) on an actual sample. The difference between the two measurements (the range) can be plotted on a control chart. This gives laboratory staff the standard deviation of duplicate measurements over time, as well as a graphical illustration of how the method is performing.

Duplicate Chart
The colorimetric analysis of sulfide in wastewater by methylene blue is a fairly easy test to perform; however, a stable control standard is not available. Standards prepared for calibration must be used within one hour and should be prepared daily. Process samples are also unstable. In this case, duplicate measurements on actual samples are a suitable way to track precision. Table 1 shows a subset of data that will be plotted using Quality Analyst. Note that in addition to the date and two data points for each sample, the analyst performing the test is also documented. An additional comment field captures other information that may affect the test result.

The absolute difference between duplicate values one and two is easily computed using the calculation editor. The calculated values are shown in Table 1, Columns 5 and 6. A control chart can then be constructed from the difference data.

The difference calculated between each duplicate represents the precision associated with the analyst performing the test on a single day. The differences seen daily represent variability due to different analysts, environmental conditions and other changes that can affect this method.

None of the data falls outside the control limits, even with the unit upset on 2/17/1999 (Table 1, Row 6). The plotted data represents differences between measurements, not the original measured values, so the chart reflects method variability rather than the process unit itself. There are, however, a number of points with an absolute difference greater than 0.1.

Looking at the original data, many of these points correspond to higher levels of sulfides in the sample. It is common for the imprecision of the test to increase with increasing values of the analyte. Two control charts can be kept, or the relative percent difference (RPD) can be calculated and plotted on the same chart.

RPD is equal to the difference between the two duplicates, divided by the average value of the duplicates, then multiplied by 100.

RPD = [|Dup1 - Dup2| / ((Dup1 + Dup2) / 2)] x 100

This calculation can be entered into Quality Analyst with the resulting chart shown in Figure 2.
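Outside of Quality Analyst, the same RPD calculation is straightforward to sketch; the duplicate sulfide values below are hypothetical:

```python
def rpd(dup1: float, dup2: float) -> float:
    """Relative percent difference: |d1 - d2| divided by the duplicate average, times 100."""
    return abs(dup1 - dup2) / ((dup1 + dup2) / 2) * 100

# Hypothetical duplicate sulfide results (mg/L)
for d1, d2 in [(0.45, 0.50), (1.20, 1.10), (0.30, 0.33)]:
    print(f"dup1 = {d1}, dup2 = {d2}, RPD = {rpd(d1, d2):.1f}%")
```

Because RPD normalizes the difference by the average of the duplicates, results at different analyte levels can be tracked on a single chart.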

No out-of-control points are indicated on this chart, and the data seem to be randomly distributed around the mean. Depending on how the data will be used, the average RPD of 11 percent may be high and may warrant improvement.

Monitoring Method Accuracy
Quantifying and monitoring method accuracy is more difficult than measuring precision. Accuracy is the closeness of the measured value to the true value. The true value is a quantity that can be estimated, but is unknown. Accuracy is affected by the amount of method imprecision as well as bias. To determine bias, the analyst needs a reference material with a known mean value similar in composition to the material measured on a daily basis.

While certified reference materials are available from standards organizations, they are expensive and often do not represent the material being tested. When used, these materials can assure the laboratory that the method is being conducted properly with regard to the reference material. Sampling and matrix effects associated with actual samples make it difficult to translate the results obtained from a highly characterized standard to actual samples tested in the laboratory.

Spike Recovery
"Spike recovery" quality control samples can indicate the method bias under daily operating conditions. Because spike recovery is calculated from multiple tests conducted on the same sample, the random and variable systematic errors should be low. Usually, the sample is analyzed in duplicate and a known amount of the analyte is added to a third sample. Percent spike recovery (PSR) is calculated using the following equation:

PSR = [(Spiked Result - Unspiked Average) / Amount Spiked] x 100

If there were no interferences or matrix effects and the variability was low, the recovery would be expected to be close to 100 percent. Spike recoveries that are always low or always high indicate a method bias that should be investigated. If testing is conducted under regulatory guidelines, spike recovery limits may be supplied by the agency. Spike recovery limits of 75 to 125 percent or 80 to 120 percent are common.
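A minimal sketch of the recovery calculation and limit check, assuming the common PSR definition (recovered spike divided by the amount added); the nickel values are hypothetical:

```python
def percent_spike_recovery(spiked_result: float, unspiked_avg: float,
                           spike_amount: float) -> float:
    """Recovered spike expressed as a percentage of the amount added."""
    return (spiked_result - unspiked_avg) / spike_amount * 100

# Hypothetical nickel data (mg/L): duplicate unspiked results and one spiked result
dup1, dup2 = 2.00, 2.04
spike_added = 1.00       # known amount of analyte added
spiked_result = 2.98

psr = percent_spike_recovery(spiked_result, (dup1 + dup2) / 2, spike_added)
print(f"PSR = {psr:.1f}%")
print("within 75-125% limits:", 75 <= psr <= 125)
```

Recoveries consistently above or below 100 percent on the resulting chart point to a method bias worth investigating.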


Figure 4. Capability Plot of Nickel Spike Recovery

Spike Recovery Chart
Spike recovery data can be collected, calculated and charted using Quality Analyst. Table 2 shows typical data that would be collected for nickel tested using atomic absorption spectroscopy.

Using the calculation editor, the average of the unspiked duplicates and the percent spike recovery can be computed. The control chart of the percent spike recovery values is shown in Figure 3.

Figure 3 shows a run of points that has fallen below the lower control limit, along with some pattern rule violations. Reviewing the data, we can see that a newly hired analyst performed these analyses. This person produced results that are consistently low, introducing constant systematic error. All the data generated by this analyst is suspect and should be recalled, and the samples rerun if possible. Continued training and closer supervision are warranted until improvements are seen.

Because there is an assignable cause associated with this data, it can be tagged or set aside and not used in subsequent data analysis.

Capability Analysis
U.S. Environmental Protection Agency guidelines suggest that spike recoveries fall in the range of 75 to 125 percent if the result is to be considered valid.

The expected level of performance can be treated as specifications. Capability analysis compares the width of the measured distribution to the specification limits. Before performing this analysis, however, the process must be stable and in statistical control. Two common measures of capability are the Cp and Cpk indexes:

Cp = (USL - LSL) / 6s

Cpk = min[(USL - Mean) / 3s, (Mean - LSL) / 3s]
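As a short sketch, the standard Cp and Cpk formulas can be applied against the 75 to 125 percent recovery limits; the recovery values below are hypothetical:

```python
import statistics

def cp_cpk(data, lsl, usl):
    """Cp and Cpk from individual data and two-sided specification limits."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return cp, cpk

# Hypothetical spike recoveries (%) evaluated against 75-125% limits
recoveries = [96, 101, 99, 103, 97, 100, 98, 102, 95, 104]
cp, cpk = cp_cpk(recoveries, lsl=75, usl=125)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cp reflects only the spread of the method, while Cpk also penalizes any offset of the mean from the center of the specification range; a gap between the two indicates bias.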

The higher the ratios, the more capable the method is of meeting specifications. Capability is represented graphically by plotting a histogram of the data. If the mean and specification limits are added to the chart, the user can see if method bias is present as well as how well the method is performing with regard to the specification. The calculated values of the indexes are also reported in Figure 4.

In this example, both Cp and Cpk are greater than one, which means the process (in this case, the analytical method) is capable of consistently producing data within the spike recovery guidelines. However, there may be an opportunity for method improvement by investigating the points that fall below 90 percent recovery and above 105 percent recovery.

Conclusion
Evaluating and monitoring the measurement process are key activities in any SPC program. Understanding the nature of measurement imprecision and being able to quantify testing imprecision will give the consumers of the data an indication of reliability.

Monitoring the testing process at a predefined frequency assures the analyst and laboratory that the test method is in control and that results can be released to production personnel with confidence. Having control standards and charts may be a requirement of regulatory agencies and is an indication that the data produced by the laboratory are defensible.

Proper tools such as NWA Quality Analyst, which can store the data, perform calculations and produce a variety of control charts, simplify the monitoring program and reduce laboratory workload.