- I. Introduction
- II. Daubert’s Dilemma: Determining the Scientific Validity of the Forensic Sciences in the Absence of Statistical Proof of Validity
- III. Proficiency Testing to Demonstrate Scientific Validity
- IV. How One Forensic Lab Is Solving Daubert’s Dilemma Through Blind Testing
- A. Laying the Groundwork by Creating a Client Services/Case Management Division
- B. HFSC’s Blind Proficiency Testing Program
- C. Lessons Learned—Challenges and Benefits of Blind Proficiency Testing
- V. Conclusion
“An empirical measurement of error rates is not simply a desirable feature; it is essential for determining whether a [forensic science] method is foundationally valid.”
The Supreme Court’s landmark Daubert decision calls on trial courts to consider the “error rates” of scientific evidence, among other factors, in deciding whether the proffered evidence is sufficiently reliable to be admitted. When a conviction hinges on a critical finding produced by a forensic test, the argument for requiring an expert to disclose the error rate is compelling: depending on the error rate, a test’s findings may be sufficiently trustworthy to support a conviction or sufficiently untrustworthy to lack probative value. Yet courts have admitted numerous types of forensic science evidence without requiring data that quantifies the error rates of the disciplines. Why not? The truth is that for most forensic disciplines empirical proof of efficacy simply does not exist. Thus, Daubert poses a dilemma for the criminal courts. On the one hand, they could insist on validation studies demonstrating error rates low enough to consider the forensic evidence reliable. Because such validation studies did not exist, that rule would have excluded a vast amount of evidence and made it impossible to prosecute a huge number of cases. On the other hand, courts could admit a wide array of forensic evidence without requiring statistical proof of the error rates associated with each type of evidence, relying instead on past precedent and the testimony of forensic scientists claiming that the evidence is reliable. The courts chose the latter approach.
This lack of statistical support for the forensic sciences is not a mere academic issue. By indiscriminately admitting a wide range of forensic science evidence, courts have allowed in both reliable evidence and junk science, leading to numerous wrongful convictions involving junk-science disciplines such as bite mark evidence, hair microscopy, and traditional arson techniques. Remarkably, despite growing awareness of the unscientific nature of these disciplines, some courts continue to admit these forms of unreliable evidence.
The 2009 report of the National Academy of Sciences (NAS Report) on the state of forensic science laid bare the shocking lack of empirical data to support the scientific validity of most forensic disciplines. The report provides detailed analyses of numerous disciplines describing the methodologies used and evaluating the scientific research available on the disciplines’ efficacy. In the end, the report makes clear that much work remains to be done by scientists to provide statistical data establishing the scientific validity of the forensic disciplines.
In the ten years since the NAS Report, other prominent national organizations have likewise called for additional research on the forensic sciences. In particular, they have called for validation studies to determine the “foundational validity” of the various disciplines as a whole, and for each laboratory to conduct studies assuring the proficiency of the work performed within its walls, or “validity as applied.” Notably, several of these organizations have recommended that proficiency tests be administered without analysts knowing they are being tested—known as “blind testing”—to more accurately determine the reliability of the disciplines.
Researchers have completed a small number of validation studies for a few disciplines, and others are underway. The National Institute of Standards and Technology (NIST) has begun validity assessments, one on DNA analysis involving multiple sources (or “mixtures”) of DNA and another on bite mark analysis. NIST may also conduct validation studies on firearms examination and digital facial recognition.
Despite these efforts, the need for empirical research to support the forensic disciplines continues to be the most pressing issue facing forensic science. This Article reports on a blind testing program implemented by the Houston Forensic Science Center (HFSC) that puts it on track to solve Daubert’s dilemma. The program, implemented in six forensic disciplines, enables HFSC to collect statistical data on the efficacy of the forensic testing process and will ultimately yield the data needed to determine the error rate for each discipline as practiced in the laboratory. The data also allow HFSC to perform more refined assessments for evidence of varying degrees of difficulty within a particular discipline. Moreover, blind testing provides information on the entire process, including evidence packaging, storage, and handling, as well as the many aspects of testing and reporting.
This Article provides a detailed description of the laboratory’s blind testing in three disciplines: toxicology, firearms, and latent print examination. HFSC’s blind testing program puts to rest the previous assumption that blind testing is not feasible in the forensic sciences. Indeed, HFSC has established a robust blind testing program without any substantial budget increase. This is not to say that HFSC’s blind testing program is easily replicated, especially at smaller crime laboratories that do not have a dedicated quality division. While we strongly believe that similar blind testing programs can and should be implemented across the country, the HFSC experience also demonstrates the challenges a laboratory would face. Moreover, the fledgling blind testing program in Houston could itself be strengthened if there were similar programs around the country, and if rigorous blind testing became a feature of accreditation programs.
Part II begins with a discussion of the Supreme Court’s decision in Daubert and the challenge that it poses for demonstrating the scientific validity of many forensic science disciplines given the nonexistence of proper validation studies. Part II then summarizes the findings of two major government reports by the National Academy of Sciences (NAS) and the President’s Council of Advisors on Science and Technology (PCAST), both of which call for additional empirical research to demonstrate the scientific validity of the forensic disciplines.
Part III addresses the literature on validation studies conducted by means of open proficiency tests and through blind testing, and explains the advantages and disadvantages of each type of study. It demonstrates the urgent need for forensic laboratories to conduct validation studies using blind testing to provide empirical data establishing the scientific validity of each discipline as practiced in each laboratory.
Part IV reports on the blind testing program that HFSC has established in six forensic disciplines. It begins with a description of the case management system at HFSC, in which case managers act as a buffer between test requestors and laboratory analysts. A case management system is a necessary predicate for implementing blind testing, and it also enhances the quality of the laboratory’s work by making it more efficient and eliminating sources of bias. Part IV then provides an overview of HFSC’s blind testing in three of the disciplines—toxicology, latent prints, and firearms. It concludes with a discussion of the challenges and benefits of blind testing. Of critical importance, the blind testing program in these six disciplines will provide the error rate data expected of good science, as explained in Daubert, for the testing done in the laboratory. Moreover, as a quality control measure, blind testing offers valuable insights for improving the laboratory’s processes across the board, from receiving the evidence to issuing a report.
HFSC’s experience with blind testing provides a roadmap for criminal justice stakeholders to support similar programs in their jurisdiction’s forensic laboratories, at least, for the time being, in the larger crime laboratories. Such studies would provide the empirical evidence as contemplated in Daubert to support the admission of forensic test results. Moreover, blind testing provides the means to continually monitor the quality of the laboratory’s processes across the entire organization. In the future, blind testing could be a requirement for accreditation.
II. Daubert’s Dilemma: Determining the Scientific Validity of the Forensic Sciences in the Absence of Statistical Proof of Validity
When the Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals, Inc. in 1993, it established a new framework for deciding the admissibility of scientific evidence. No longer would the “general acceptance” of the evidence in the pertinent field be sufficient to find a scientific theory or technique reliable. Instead, the Court called on the lower courts to decide for themselves whether the evidence is sufficiently reliable to be admitted. From 1993 until 2009, when the National Academy of Sciences published its report, Daubert had no real impact on the admissibility of the many forensic disciplines. Criminal courts continued to admit forensic evidence of all kinds without demanding additional proof of scientific validity. The NAS report took the courts to task for this practice and exposed the lack of statistical research supporting many of the forensic sciences, as have other critics. Since the publication of the NAS report, researchers have made progress in designing appropriate research studies, but much work remains to be done.
The following Sections address: (1) the legal standard for admission of scientific evidence as expressed in the Supreme Court’s decision in Daubert; (2) the calls for validation studies to support the forensic sciences; and (3) a survey of research efforts.
A. Courts Fail to Demand Proof of Scientific Validity as Contemplated in Daubert
In Daubert, the Supreme Court interpreted Rule 702 of the Federal Rules of Evidence to place on the judiciary the task of determining whether evidence qualifies as “scientific,” which “implies a grounding in the methods and procedures of science.” The Court stated: “[I]n order to qualify as ‘scientific knowledge,’ an inference or assertion must be derived by the scientific method.” By means of the “scientific method,” scientists conduct “validation studies” to test the integrity of a method or technique. Thus, the Court states that, “Proposed testimony must be supported by appropriate validation—i.e., ‘good grounds,’ based on what is known.”
The Court then describes the features of the validation process. First, a principal feature of good science is that “it can be (and has been) tested.” Pursuant to the scientific method, a scientist develops a hypothesis to see if it can be refuted, or as scientists might say, “falsified.” To further the scientific enterprise, scientists will publish the findings of their research in peer-reviewed publications to subject the research to further scientific inquiry by others in the field. Peer review and testing provide the means by which the scientific community can develop the statistical data to produce a “known or potential rate of error,” which is the ultimate measure of validity for a scientific theory or technique. Once the scientific community has adequately tested a theory or technique and found it to produce the expected outcome on a consistent basis, the theory or technique should garner “general acceptance” in the relevant scientific community, another indicium of validity under Daubert.
In criminal cases, courts have historically admitted forensic evidence of many types. When analysts testify on behalf of the government, their testimony typically links a physical specimen found at a crime scene to the defendant. For example, if a woman was raped and murdered in her home, analysts might link a defendant to the crime through DNA evidence taken from the victim’s body or clothing, a fingerprint on a window, or a cartridge case matched to a gun found in the defendant’s possession.
Within police crime laboratories, forensic scientists have developed a wide variety of forensic disciplines. Crime laboratory analysts have performed forensic testing in drug chemistry, toxicology, gunshot residue analysis, blood spatter analysis, and digital and multimedia analysis. Analysts have also offered testimony comparing shoe treads, tire treads, paint chips, fabric, glass, carpet threads, and handwriting samples.
Most forensic techniques were developed as investigative tools for solving crimes and have no use outside of law enforcement, where they generate leads for investigations and evidence for criminal trials. Thus, with few exceptions, the forensic disciplines were invented within police crime laboratories rather than in university or medical research laboratories, where scientists would be expected to follow the scientific method outlined in Daubert. For a variety of reasons, crime laboratories, most of which are located within law enforcement agencies, simply did not undertake the basic scientific research that would have been standard practice in other laboratories. Moreover, both before and after Daubert, the criminal courts have admitted all sorts of forensic testimony without requiring evidence of the techniques’ underlying scientific validity. Thus, the courts failed to demand that crime laboratories undertake such foundational research.
B. Scientists Call for Validation Research for the Forensic Sciences
For many years, forensic science thrived without drawing much negative attention to itself. The testimony of crime laboratory analysts definitively linked defendants to the crime scene evidence, and it provided powerful evidence of guilt, leading to countless convictions. The advent of DNA exonerations changed all that. Approximately a quarter of all wrongful convictions involved flawed forensic testimony. The cases pointed to a few forensic areas as particularly unreliable, such as bite mark evidence, traditional arson techniques, comparative bullet lead analysis, and microscopic hair comparisons, each of which contributed to a significant number of wrongful convictions.
The discovery that flawed forensic evidence contributed to many miscarriages of justice shined a harsh light on crime laboratories and forensic evidence and called into question whether the forensic disciplines can be trusted to produce correct results. Two major national studies have examined whether the forensic sciences have been validated through appropriate research studies.
First, Congress tasked the National Research Council of the NAS with examining the field of forensic science more closely. After years of extensive study, in 2009 the NAS announced its startling conclusion: “[N]o forensic method other than nuclear DNA analysis has been rigorously shown to have the capacity to consistently and with a high degree of certainty support conclusions about ‘individualization’ (more commonly known as ‘matching’ of an unknown item of evidence to a specific known source).” This one sentence in the report called into question the criminal courts’ practice of allowing forensic experts to testify to a definitive match. Indeed, it was common for forensic scientists to testify that a technique was “infallible” or had a “zero error rate,” or to declare that two specimens matched “to the exclusion of all others” in the world. The central purpose of forensic science had been to provide “individualization[s]” by informing jurors that certain crime scene evidence matches, or comes from the same source, as that associated with a defendant. The NAS Report raised awareness about the lack of scientific basis for such statements and recommended the development of programs to “improve understanding of the forensic science disciplines and their limitations within the legal systems.”
The 2009 NAS Report generated much political, legal and academic interest. Then in 2016, a second major government group, PCAST, took up the question again in a report entitled, “Forensic Science in the Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.” This report provides a roadmap for researchers to develop empirical support for the forensic sciences. In particular, the PCAST Report focuses on the “feature-comparison methods,” meaning the “methods that attempt to determine whether an evidentiary sample (e.g., from a crime scene) is or is not associated with a potential ‘source’ sample (e.g., from a suspect), based on the presence of similar patterns, impressions, or other features in the sample and the source.” The report specifically analyzed several feature-comparison methods: “analysis of DNA, hair, latent fingerprints, firearms and spent ammunition, toolmarks and bitemarks, shoeprints and tire tracks, and handwriting.” The report specifies the two types of validation that the forensic sciences should endeavor to achieve: foundational validity and validity as applied.
First, with regard to foundational validity, the PCAST Report explains that established scientific methods can be used to measure the efficacy of the feature-comparison methods, even though these methods are largely “subjective” processes that involve substantial human judgment rather than an application of objectively measurable criteria. As such, “the overall procedure must be treated as a kind of ‘black box’ inside the examiner’s head.” The report tells us that it is “possible and important” to determine the validity of the feature-comparison methods, and that “science has clear standards for determining whether such methods are reliable.” These validation methods “belong squarely to the discipline of metrology—the science of measurement and its application.”
The report defines “scientific validity” or “[f]oundational [v]alidity” to mean that “a method has [been] shown, based on empirical studies, to be reliable with levels of repeatability, reproducibility, and accuracy that are appropriate to the intended application.” Such studies measure the incidence of error in a process, yielding “error rates.” As the report makes clear, “An empirical measurement of error rates is not simply a desirable feature; it is essential for determining whether a method is foundationally valid.” Evidence of foundational validity is likewise crucial to the justice system to enable judges and jurors to assign the proper weight to forensic reports. The discovery of hundreds of wrongful convictions shows the dangers of admitting forensic evidence without having the means to distinguish valid methods from junk science.
Of special importance to the forensic sciences, validation studies should test the methods to determine the false positive rate (the probability that the method declares a match between samples that actually come from different sources) and sensitivity (the probability that the method declares a match between samples that actually come from the same source). As the PCAST Report explains, “The false positive rate is especially important because false positive results can lead directly to wrongful convictions.”
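To make these two quantities concrete, the following sketch computes them from the outcome counts of a black-box study. The counts here are invented for illustration; they do not come from any real validation study discussed in this Article.

```python
# Illustrative only: computing the two rates PCAST highlights from
# hypothetical black-box study counts (all numbers below are invented).

def false_positive_rate(false_positives: int, different_source_trials: int) -> float:
    """Probability the method declares a match when the samples
    actually come from different sources."""
    return false_positives / different_source_trials

def sensitivity(true_positives: int, same_source_trials: int) -> float:
    """Probability the method declares a match when the samples
    actually come from the same source."""
    return true_positives / same_source_trials

# Hypothetical study: 1,000 different-source comparisons yielding 12
# erroneous "match" calls; 1,000 same-source comparisons yielding 950
# correct "match" calls.
fpr = false_positive_rate(12, 1000)   # 0.012, i.e., 1.2%
sens = sensitivity(950, 1000)         # 0.95, i.e., 95%
print(f"false positive rate: {fpr:.1%}, sensitivity: {sens:.1%}")
```

Note that the two rates use different denominators: the false positive rate is computed over different-source comparisons only, while sensitivity is computed over same-source comparisons only.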
However, since the feature-comparison methods involve significant subjective human judgment that cannot be measured directly, it is not possible to conduct empirical studies except by conducting “studies of examiner’s performance to determine whether they can provide accurate answers; such studies are referred to as ‘black-box’ studies.” By black-box studies, PCAST meant empirical studies in which “many examiners are presented with many independent comparison problems—typically, involving ‘questioned’ samples and one or more ‘known’ samples—and asked to declare whether the questioned samples came from the same source as one of the known samples.” Then researchers could “determine how often examiners reach erroneous conclusions.”
The PCAST Report notes that errors can occur for several reasons. An incorrect conclusion may result from human error in misapplying a procedure, which may imply fault. However, errors may also occur because of technical failures or by chance occurrence, even when the procedure itself was properly performed. From a statistical point of view, all of these failures count as “errors” because each can lead to incorrect conclusions.
Done under appropriate conditions, black-box studies can provide empirical measurements of accuracy (or “error rates”) for the forensic disciplines. The “[k]ey criteria for validation studies to establish foundational validity” include three important aspects: (1) the size of the study and number of studies; (2) the realism of the evidentiary samples; and (3) the need to guard against possible distortions of the results by cheating or the effects of bias. First, PCAST calls for sufficient scale to ensure robust scientific conclusions. Validation studies should include a “sufficiently large number of examiners” and “sufficiently large collections of known and representative samples,” which should be “large enough to provide appropriate estimates of the error rates.” The report also recommends “multiple studies by separate groups” in order to “ensure that conclusions are reproducible and robust.”
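PCAST’s insistence on “sufficiently large” studies can be illustrated with a standard statistical tool. The sketch below uses the Wilson score interval, a conventional method for binomial proportions (chosen by us for illustration, not prescribed by the report), to show how the plausible range around an observed error rate narrows as a study grows; the study sizes are hypothetical.

```python
import math

def wilson_interval(errors: int, n: int, z: float = 1.96):
    """95% Wilson score interval for an error rate observed as
    `errors` out of `n` comparisons."""
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

# Five false positives observed in 100 comparisons leaves a wide range
# of plausible underlying error rates...
lo, hi = wilson_interval(5, 100)
# ...while 50 in 1,000 (the same observed 5% rate) pins the rate down
# far more tightly.
lo2, hi2 = wilson_interval(50, 1000)
print(f"n=100:  {lo:.3f} to {hi:.3f}")
print(f"n=1000: {lo2:.3f} to {hi2:.3f}")
```

The same observed rate thus supports very different conclusions depending on study size, which is why PCAST calls for samples “large enough to provide appropriate estimates of the error rates.”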
Second, to ensure realistic evidentiary samples, the report specifies that they should be “from relevant populations to reflect the range of features or combinations of features that will occur in the application,” and they should be “representative of the quality of evidentiary samples seen in real cases” and “chosen from populations relevant to real cases.”
Third, the report recommends shielding examiners from learning the correct answers (effectively, to prevent cheating), stating that studies “should be conducted so that neither the examiner nor those with whom the examiner interacts have any information about the correct answer.” To eliminate the possibility of bias in the proficiency testing process, the report recommends that the research “should be conducted or overseen by individuals or organizations that have no stake in the outcome of the studies.”
In addition to foundational validity, the PCAST Report also urges scientific research to establish validity as applied, meaning “that the method has been reliably applied in practice.” The aim of research on validity as applied is to demonstrate empirically that an individual “forensic practitioner has reliably applied a method to the facts of a case.” The report suggests the use of proficiency tests, by which is meant “ongoing empirical tests to ‘evaluate the capability and performance of analysts.’” PCAST also makes clear that proficiency tests “should be performed under conditions that are representative of casework and on samples, for which the true answer is known, that are representative of the full range of sample types and quality likely to be encountered in casework in the intended application.” In other words, the report envisions an ongoing, established program in which analysts are regularly tested on realistic samples for which the true answer is known, using the same procedures and under the same time constraints that apply to real casework. Such a program can generate the data to demonstrate that a particular analyst has the capability to perform the foundationally valid method. A laboratory report would need to include comprehensive information about “the procedures actually used in the case, the results obtained, and the laboratory notes” to allow scientific review by others.
Thus, the PCAST Report urges researchers to follow the guidelines they have outlined in conducting validation studies for the feature-comparison forensic disciplines to establish both foundational validity and validity as applied. In what has apparently become a consensus view, the PCAST Report also proposes that validity studies use blind proficiency testing, as addressed in the next Part.
III. Proficiency Testing to Demonstrate Scientific Validity
The lack of statistical support for most of the forensic disciplines continues to pose a conundrum for the courts: If the error rate is not zero, what is it? Are error rates low enough that courts can consider these disciplines scientifically valid or so high that courts should exclude test results? These fundamental questions posed by the NAS and the PCAST reports remain largely unaddressed today. Fortunately, researchers are at work trying to answer these questions.
Over the years, research studies involving the administration of proficiency tests have been conducted in different types of laboratories, using different methods of administration, and for different goals. A proficiency test can determine whether an analyst performs a particular forensic test properly and obtains the correct result. To use proficiency testing as part of a validation study, however, the study should be designed to conform to the standards of good science. In what follows, we report on studies involving the administration of “open” or “declared” proficiency tests (meaning that the analysts know that they are taking a test) and “blind” or “partly declared blind” proficiency tests (meaning that analysts are informed that proficiency tests have been disguised as real casework and inserted into the normal workflow in the laboratory). The research on validation studies by means of proficiency tests makes a compelling case for laboratories to implement blind testing programs.
A. Open Proficiency Tests
Forensic laboratories have long taken part in open proficiency testing as a requirement of laboratory accreditation. These exams are not designed to test the proficiency of individual analysts. Rather, they function as a minimal check on the quality of work performed within the laboratory as a whole. For a variety of reasons, the results of these open proficiency tests do not provide the data needed for validation studies, nor do the commercial manufacturers of these tests claim that the testing results are suitable for estimating the error rates of the disciplines.
Moreover, research on proficiency testing within clinical laboratories confirms that open proficiency tests do not provide a reliable measure of performance on actual casework. In one study, conducted by the Centers for Disease Control (CDC), researchers compared the results of open proficiency tests with blind proficiency tests. The blind tests were inserted into the normal workflow of the laboratory, accompanied by fake documentation, to give the appearance that they were real evidence. In this study, the blind tests were administered without informing the analysts that they were being tested in this manner. Most of the laboratories had perfect scores on the open proficiency tests, but the same laboratories scored at least twenty-seven percentage points lower on the blind proficiency tests, a performance rated as unacceptable. The superior performance on open proficiency tests in clinical laboratories was due to the special attention that the laboratories paid to such regulatory tests, including by “assigning only their most experienced personnel to analyze the samples, by repeatedly analyzing survey samples, or by sending the samples to another laboratory for analysis.” Scientists refer to “the tendency of people to alter their behavior when they know they are being monitored” as the “Hawthorne Effect.”
B. Blind Proficiency Tests
For validation studies in the forensic sciences, both government organizations and scholars have recommended blind proficiency testing instead of open testing. The lead federal organization working on forensic science validation studies, the Organization of Scientific Area Committees (OSAC), part of NIST, recommends blind proficiency testing, but not the method used in the CDC study. The OSAC does not recommend keeping analysts in the dark about the fact that proficiency tests will be inserted into their workflow as in the CDC study. Rather, analysts should be informed that realistic proficiency tests will be inserted along with their casework. Researchers have referred to this method of proficiency tests as “declared double blind” or “part[ly]-declared blind.” (In this Article, we refer to this methodology simply as “blind testing.”) The PCAST Report also recommends blind testing as the preferable design for validation studies to counteract the Hawthorne Effect.
A recent internal research study by the Central Unit of the Dutch National Police implemented blind testing in its forensic laboratory. While valuable, the study was small in scale and involved only one discipline, firearms examination. In this study, conducted in 2015 and 2016, researchers inserted twenty-one cartridge case comparisons into the normal workflow. The researchers understood that the experiment would not serve as a validation study for measuring error rates but reported the findings as a blueprint for such studies. The Dutch study discusses the optimal design of a validity study for the forensic sciences, in terms that comport with those outlined by PCAST: blind testing, prepared by external parties, properly designed to mimic real casework, and sufficient in number.
While recognizing the value of externally prepared blind tests, the Dutch study also sheds light on the practical challenges external sources present. The firearms examiners in this study proved adept at detecting the blind tests. During the period of the study, the examiners completed 779 real cases and twenty-one blind tests. Of the twenty-one, the examiners correctly identified three blind tests, which they spotted because cartridge cases from more than one firearm were included in a single request. The examiners knew that it was atypical to receive cartridge cases from two or more different firearms of the same caliber in a single request. Laboratory managers generally avoid combining multiple cartridge cases of the same caliber in one testing request because it “creates the potential for reporting a false positive.” The fact that the examiners in the study could spot the blinds highlights the challenge the preparers of blind tests face. They must take care to design blind tests that mimic the requests each laboratory receives from the various law enforcement agencies in its community. Such fine customization would be more challenging for an external agency or commercial vendor, which would have to obtain and incorporate feedback from examiners on how they detect blind tests on a continuing basis. With this information, the provider could adapt the blind tests to the constantly changing nature of evidence requests in each community.
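The workload figures reported for the Dutch study reduce to simple proportions. The arithmetic below is ours, computed from the numbers as reported in this Article, not a calculation performed by the study itself.

```python
# Figures as reported for the Dutch firearms study.
real_cases, blind_tests, detected = 779, 21, 3

# Blind tests as a share of the examiners' total workload
# (21 of 800 requests, roughly 2.6%).
blind_share = blind_tests / (real_cases + blind_tests)

# Fraction of blind tests the examiners spotted
# (3 of 21, roughly 14%).
detection_rate = detected / blind_tests

print(f"blinds were {blind_share:.1%} of the workload; "
      f"examiners detected {detection_rate:.1%} of them")
```

Even this small detection rate matters: each spotted blind is a test taken with full awareness, reintroducing the Hawthorne Effect that blind testing is designed to avoid.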
Another critical feature of blind testing is that it tests the accuracy of the entire laboratory process, as opposed to assessing an individual examiner’s proficiency. The error rate for a forensic discipline should include all sources of error. Improper evidence handling, labeling, and storage, as well as equipment and technical failures can cause errors in applying a forensic method. By inserting a known sample disguised as a real test in the normal casework, researchers have the means of determining an actual error rate for the type of work done.
C. A Two-Tiered Approach to Validation Studies
Demonstrating the scientific validity of the forensic disciplines will inevitably require a two-tiered approach: validation studies using both open and blind proficiency testing. Black-box studies using open proficiency testing have the advantage that they can include large numbers of analysts from many laboratories, but they lack the realism that blind testing provides. The federal government, through NIST, is currently examining the available research as part of “scientific foundation reviews” in several areas. The agency plans to “review the published, peer reviewed literature and collect data from proficiency tests, laboratory validation studies, and other non-peer reviewed sources.” Reviews have begun in the fields of DNA mixture interpretation and bite mark evidence. Additional reviews might cover firearms examination and digital evidence, among others.
The PCAST Report found insufficient black-box studies to satisfy scientific standards for foundational validity for bite mark analysis, hair analysis, footwear analysis, and firearms examination. Similarly, it found insufficient research to validate the statistical model used to compute the probability of inclusion in complex DNA mixtures. Bite mark analysis was so lacking in evidence of scientific validity as to be “far from meeting such standards.” Firearms examination, however, had one appropriate black-box study, and the discipline has the potential to achieve foundational validity with additional research.
In the latent print field, the PCAST Report found only two black-box studies, one of which had not yet been published in a peer-reviewed scientific journal. The available black-box studies found error rates, including false positive errors, “much higher than the general public (and, by extension, most jurors) would likely believe based on longstanding claims about the accuracy of fingerprint analysis.” The validation studies can be helpful in identifying further research avenues. For example, one of the black-box studies suggested that “a properly designed program of systematic, blind verification,” in which a second examiner verifies the decision of the first, might decrease the false positive error rate. PCAST found that an empirical study of such a blind verification program would be a worthy project.
The authors of the larger latent print black-box study, conducted by the FBI Laboratory, noted that future research should include laboratory studies utilizing blind testing. They write, “Ideally, a study would be conducted in which participants were not aware that they were being tested.” The authors caution about various practical challenges of doing a large-scale blind testing study involving multiple laboratories. For one thing, they state that it would be “complex to the point of infeasibility” to insert blind tests in laboratories where examinations involve physical evidence samples, as opposed to electronic samples. A study comparing the work of numerous laboratories doing blind tests would need to provide the same samples to the various laboratories, which is not feasible if it involves replicating physical evidence. Moreover, “Combining results among multiple [laboratories] with heterogeneous procedures and types of casework would be problematic.” Laboratories that follow different standard operating procedures are not performing their tasks in the same manner. Thus, it would be “problematic” to combine the results from the blind tests as a measure of overall proficiency.
Scientific research on the forensic disciplines is also developing a more sophisticated understanding of error rates. An important study in latent fingerprint examination sought to “identify and quantify fingerprint image features that are predictive of identification difficulty and accuracy.” The study successfully showed that it is possible to predict examiner performance based on the quantifiable level of difficulty in examining a particular image. For purposes of refining “error rates” for the entire discipline, this study makes the critical finding that “it is not especially helpful to seek a field-wide ‘error rate’ for latent fingerprint identification.” Instead, the study indicates that “it will be more useful to seek error rates for different categories of comparisons, based on objective difficulty level.”
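The stratification the study recommends—reporting error rates per difficulty category rather than one field-wide figure—can be made concrete with a short sketch. The outcome data and category labels below are hypothetical, invented purely for illustration; they are not drawn from the study:

```python
from collections import defaultdict

# Hypothetical comparison outcomes: (difficulty category, was the conclusion erroneous?)
comparisons = [
    ("easy", False), ("easy", False), ("easy", False), ("easy", True),
    ("moderate", False), ("moderate", True), ("moderate", False),
    ("difficult", True), ("difficult", True), ("difficult", False),
]

def error_rates_by_difficulty(results):
    """Return the error rate within each objective difficulty category."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for category, erroneous in results:
        totals[category] += 1
        if erroneous:
            errors[category] += 1
    return {c: errors[c] / totals[c] for c in totals}

rates = error_rates_by_difficulty(comparisons)
# A single field-wide rate (here 4 errors in 10 comparisons) masks the spread:
# the hypothetical "difficult" category is far more error-prone than "easy."
overall = sum(erroneous for _, erroneous in comparisons) / len(comparisons)
```

On this toy data, the field-wide rate of 40% conceals category rates ranging from 25% to roughly 67%, which is precisely why the study argues that a single discipline-wide "error rate" is unhelpful.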
Computers can also play a role in conducting validation studies, not to mention in supporting everyday casework. In the field of firearms examination, a team of researchers from NIST developed a statistical method using an algorithm that compares the three-dimensional impressions left on the cartridges after firing. The computer then produces a numerical score that describes how similar the pairs of cartridges are and provides a probability of random effects causing a false positive match. The study showed that the algorithm correctly distinguished all matching and nonmatching pairs and has the potential to establish accurate error rates that firearms examiners can use in court. The authors emphasized, however, that their study did not include a sufficient number of cartridge pairs to allow for generalized error rate findings. They cautioned that “[t]he current test is intended mainly to demonstrate the error rate procedure rather than to show application to a real result from case work.” However, the report gives hope that a similar computer program can assist firearms examiners in their casework: “[I]t would be feasible to scale up the statistical procedure to case work with large population sizes and still arrive at reasonable and usefully small false identification error rates.” The infrastructure needed, however, would be considerable. To apply this computerized approach to casework would require “a database with accurate counts of firearms manufactured by different methods with different class characteristics,” as well as additional data on firearms and additional statistical research. The research project is slated to expand to combine the algorithmic process with black-box studies of open proficiency tests.
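The core idea behind such algorithmic comparison—reducing two toolmark impressions to a numerical similarity score—can be illustrated with a deliberately simplified sketch. This is not the NIST team's actual method (which operates on three-dimensional surface topography); it merely shows, using synthetic one-dimensional "profiles" and a basic correlation score, how matching and nonmatching pairs can separate numerically:

```python
import random

def similarity(profile_a, profile_b):
    """Toy similarity score: normalized correlation of two 1-D 'impression' profiles."""
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    return cov / (var_a ** 0.5 * var_b ** 0.5)

random.seed(0)
base = [random.gauss(0, 1) for _ in range(200)]  # synthetic "toolmark" profile

# A matching pair: the same (hypothetical) firearm, with small measurement noise.
match = [x + random.gauss(0, 0.2) for x in base]
# A nonmatching pair: an unrelated firearm's profile.
nonmatch = [random.gauss(0, 1) for _ in range(200)]

match_score = similarity(base, match)        # high score
nonmatch_score = similarity(base, nonmatch)  # score near zero
```

In a real system, distributions of such scores over large reference populations—rather than a single comparison—are what allow researchers to attach a false-positive probability to a reported match.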
In short, scientific research in the forensic science disciplines since the NAS Report has begun to produce some validation studies and other pertinent research. There is clearly a place for black-box studies involving open proficiency testing, and other efforts to provide objective measurement and computer modeling have advanced these efforts as well. However, most observers recognize that large-scale validation studies across numerous organizations using blind testing would provide an ideal methodology for estimating error rates.
D. The Dual Benefit of Blind Proficiency Testing: Error Measurement and Continual Quality Improvement
Blind proficiency testing offers a second benefit in addition to facilitating validation studies: quality control. A properly designed validation study that utilizes blind proficiency testing serves the purpose of scientific research while also promoting effective internal quality control. This has led groups like the American Bar Association to call for blind testing in forensic laboratories as a quality control tool.
The Dutch study points to the benefit of a blind testing program for quality control that goes beyond the obvious advantage of identifying problems in the testing process. The study lauded the importance of such a program to enable “proper feedback and ‘deliberate practice’ in acquiring and maintaining expert performance.” As the authors explain:
Experts should be actively and constantly looking for feedback on their performance, preferably the most relevant feedback one can get. . . . The current study showed that it is possible to set up a blind case program that may result in the type of feedback forensic examiners need in order to acquire and maintain their expertise.
Creating a feedback loop that enables constant process improvement is a central objective of quality assurance and quality control programs.
Since the 1980s, military toxicology laboratories in the United States that perform drug screening analyses have operated “blind quality control” programs. Military laboratories use the blind program “to verify all aspects of the laboratory processing” and as a part of a broader quality assurance program that includes retesting on demand, laboratory inspections, and open proficiency testing. Unfortunately, the data generated by this long-standing blind testing program has not been published in scientific journals. The program has focused only on quality assurance and has not served as a validation study.
Likewise, in Ontario, Canada, the government forensic laboratory has done a modest amount of blind testing as a quality assurance measure. Police and fire investigation agencies submit mock evidence samples for forensic testing, and the Quality Assurance personnel report on the results and provide feedback to analysts and supervisors.
IV. How One Forensic Lab Is Solving Daubert’s Dilemma Through Blind Testing
In the past, the scientific validity of the forensic disciplines was not a subject of great interest to lawyers. The numerous wrongful convictions caused by the admission of faulty forensic evidence changed that. The NAS Report explicitly called on the courts to scrutinize forensic methods more rigorously, in the way that Daubert suggested they should. The PCAST Report provides guidance to the forensic science community and other researchers for designing proper black-box validation studies.
Many researchers have heeded the call to conduct black-box studies, including some of those addressed in the previous Part. One forensic laboratory, the Houston Forensic Science Center (HFSC), has taken it upon itself to institute the type of blind testing proposed by PCAST and others. The HFSC blind testing program deserves close study by scientists and lawyers alike. While many of the details of the program are unique to the laboratory or other agencies in Houston, it nonetheless illustrates many of the benefits and challenges that other laboratories would experience if they implemented blind testing. For the legal profession, HFSC’s experience makes clear the important evidentiary benefits of having implemented the client services/case management system and blind testing. Putting aside the imperative to demonstrate the scientific validity of the disciplines, the following discussion also explains the quality assurance gains these changes produce, including more efficient, proactive corrective measures. The quality and timeliness of forensic work greatly affect the criminal justice system.
A. Laying the Groundwork by Creating a Client Services/Case Management Division
To implement a blind testing program in which mock evidence samples are inserted into the laboratory’s workflow, analysts must, as a practical necessity, be limited in their ability to communicate directly with investigators or to access police case file information that is irrelevant to their work. Otherwise, it would be exceedingly difficult to prevent them from detecting blind tests. In recommending blind testing, the National Commission on Forensic Science noted that “blind studies are much easier to conduct in laboratories that employ a context management system to shield examiners from task-irrelevant contextual information.”
Such a system offers important benefits in improving the quality of forensic work as well. More than a decade’s worth of research establishes that forensic analysts are vulnerable to bias-induced errors when they are exposed to certain contextual information regarding casework. Fingerprint examiners who are told that a suspect has confessed to a crime, for example, may be more likely to conclude that the suspect’s fingerprint matches a latent print lifted from the crime scene. Similarly, forensic examiners who work closely with law enforcement officers may be influenced to reach a particular conclusion that supports an investigator’s theory of the case. To combat this “cognitive contamination” of evidence, scholars suggest that forensic analysts be shielded to the extent possible from both case-irrelevant information and unnecessary interactions with investigators that could influence their conclusions.
Much of the important research in this area has been conducted by Dr. Itiel Dror, who has proposed that, in addition to training analysts about the dangers of cognitive bias, crime laboratories implement a “case manager” system. This would require crime laboratories to restructure their evidence intake and work assignment procedures to insert a third-party case manager as a buffer between forensic analysts and law enforcement personnel. The case manager would, where possible, remove case-irrelevant data from evidence submissions and handle interactions between the laboratory and the investigative team, thereby limiting analysts’ exposure to information that could bias forensic testing results.
In July 2015, HFSC engaged Dror to train HFSC analysts, as well as deliver a public lecture, regarding cognitive bias. Within two months, HFSC management had begun creating a Client Services/Case Management Division (CS/CM) to separate production-line analysts from test requestors. By mid-December 2015, the Division was fully staffed and operational. Since that time, HFSC case managers have been responsible for: (1) handling most client interactions, in part to shield examiners from case-irrelevant information; (2) collecting evidence from submitting agencies, distributing it to the appropriate laboratory sections and returning those items after analysis is complete; (3) managing laboratory record-keeping and responding to subpoenas and discovery requests; and (4) assisting in the implementation of HFSC’s blind quality control testing program.
Under this arrangement, CS/CM picks up evidence to be tested from the Houston Police Department (HPD) property room and delivers it to the various laboratory sections where it is assigned for testing to a particular examiner by the section manager or supervisor. Most inquiries regarding that evidence, or the status of testing, are then handled by CS/CM. In the event that CS/CM employees lack the technical knowledge to answer an investigator’s question, CS/CM usually will forward the question to a laboratory manager or supervisor (not a bench analyst) for a response. This significantly minimizes communications between investigators and analysts, thereby reducing opportunities for examiners to learn potentially biasing information.
The amount of case-related information that HFSC examiners receive depends on their forensic discipline. For example, before CS/CM’s creation, toxicology analysts could view the entire police case file, including the complete offense report. Now, toxicology examiners receive only blood tubes labeled with a bar-coded laboratory case number printed by CS/CM staff. Analysts are also told the type of offense that has been charged with respect to each blood tube because offenses that involve a fatality require additional testing as a matter of policy; however, they can no longer access law enforcement databases that provide detailed case information.
Compare this to the forensic biology section, where analysts need access to more detailed case information to test appropriately and effectively. Because DNA testing is expensive, analysts must determine how to optimize the chance of obtaining a useful result. Accordingly, analysts may need access to the full case report so they know what portion of the evidence to test. If, for example, a victim told investigators or a sexual assault nurse that her attacker deposited semen on the right shoulder of her shirt, the analyst can then perform DNA testing on the precise spot where the attacker’s biological material is most likely to be found. Biology analysts may also need to contact investigators to determine whether a particular sample is eligible under FBI rules to be entered into the Combined DNA Index System (CODIS), the national DNA database.
In other laboratory sections, analysts have access to whatever case information is included in HFSC’s Laboratory Information Management System (LIMS)—generally, the offense date, type and location, as well as the names of the complainant and any suspect. This information is transferred automatically from the HPD’s electronic database system into HFSC’s LIMS system when CS/CM accepts the evidence from the HPD property room and can include comments made by officers or requestors. Commercially produced laboratory information systems are not currently designed to hide information fields. This not only hinders the laboratory’s ability to deny analysts access to case-irrelevant information, but also makes it impossible to implement the “sequential unmasking” protocols recommended by Dror and others.
Even if HFSC could expend the enormous effort that it would take to redact this information manually from LIMS, HFSC cannot remove the information from HPD systems, some of which may transfer automatically to HFSC’s system when a testing request is made. However, only a handful of HFSC staff can access HPD’s record management system, where the majority of potentially biasing case-related information exists. Additionally, officers often note various pieces of information on the materials they use to package evidence. To remove all information would require CS/CM to repackage most of the evidence submitted to the laboratory, a process that would be both prohibitively time-consuming and expensive.
In 2019, HFSC transitioned to a new LIMS that allowed it to implement an improved online portal where stakeholders submit requests for evidence testing in a way that significantly reduces the need for communications between investigators and laboratory personnel. Laboratory management designed the new portal to make it easier for requestors to provide all of the important, task-relevant information at the front end. Moreover, requestors who use the new portal no longer see which analyst has been assigned to a particular testing request, further ensuring that any questions those requestors have must be directed to either CS/CM or the laboratory section manager.
B. HFSC’s Blind Proficiency Testing Program
In 2015, HFSC’s Executive Director worked with the HFSC Quality Division to implement blind proficiency testing at the laboratory, starting in the toxicology section. Importantly, the HFSC Quality Division reports directly to laboratory management and is organizationally independent of the laboratory sections that handle casework, which allows it to implement blind tests and evaluate the results objectively.
The program began modestly with four “fake” evidence samples introduced into the toxicology section’s regular workflow in such a way that the analysts would think they were working on real cases. By the end of 2018, blinds had been inserted into examiners’ casework in all but one of the laboratory’s seven sections, for a total of about 900 completed samples. Ultimately, laboratory management’s goal is for 5% of each discipline’s caseload (excluding the crime scene unit) to be blind proficiency test cases.
Some of these blind test cases were submitted with testing requests directed to multiple laboratory disciplines—for example, one piece of mock “evidence” might be accompanied with requests for both DNA and latent print analysis. As the laboratory’s Quality Division Manager told the HFSC Board of Directors in 2018, “We don’t always submit single discipline blinds because we don’t just have single discipline evidence.”
To illustrate more precisely how HFSC implements its blind proficiency testing program, we next take a closer look at how blind proficiency testing operates in three representative laboratory sections—toxicology, firearms, and latent prints. In this Section we draw primarily on personal interviews conducted with laboratory executive management, including members of HFSC’s Quality Division, which operates the program.
1. Toxicology.
Implementation of blind proficiency testing in HFSC’s toxicology section begins when HFSC’s Quality Division purchases human blood samples mixed with different concentrations of alcohol from a third-party vendor. These blood samples are delivered in the same grey-top test tubes HPD uses to submit blood samples to the laboratory in driving-while-intoxicated (DWI) cases. To create a test case, HFSC Quality Division staff place two of the blood tubes into the usual toxicology evidence kit used by HPD.
Next, the Quality Division fabricates standard case information, including a fake subject name and driver’s license number, as well as a date, time, and location for the offense, to complete the paperwork associated with the blind sample. Because a real DWI evidence submission form contains handwritten information from both an officer and a phlebotomist, HFSC Quality Division employees strive to create handwriting differences between those two sections of the form, while at the same time disguising their own handwriting. According to analyst feedback, the single biggest tip-off to toxicology examiners that a case sample is a blind is that Quality Division employees’ handwriting tends to be more legible than that of many HPD officers.
To pass muster as real cases, all blind test samples, not just those in toxicology, must be assigned an HPD case number and submitted under the name of a bona fide HPD officer. HPD has agreed to allow the Quality Division to access its records management system, through which it can reserve actual case numbers for this purpose. HPD case numbers proceed sequentially; therefore, they reveal the age of a request or sample. This means case numbers assigned to blinds must be time-appropriate in relation to the rest of the section’s casework; otherwise, an out-of-sequence number can alert the analyst to a blind sample. Importantly, a group of HPD officers who work in the appropriate HPD divisions have agreed to participate in the blind program by allowing their names to be used as blind test “requestors.”
At this point, CS/CM delivers the blind toxicology evidence kit to the HPD property room, to ensure that all blind samples follow the same chain of custody and display the same item barcodes, as authentic evidence. Using the name of a cooperating HPD officer, the Quality Division then submits a testing request through the online submission portal. CS/CM picks up the test kit from the property room, returns it to HFSC, enters required case information about the blood tubes into HFSC’s LIMS, and attaches the case-number labels described above.
CS/CM staff typically place blood tubes into rack stands of thirty tubes each, for delivery to a toxicology analyst for testing. Early in the blind testing program’s implementation, a CS/CM employee unknowingly placed several blind samples bearing nearly consecutive case numbers into the same rack. As a result, the assigned analyst quickly identified the samples as blinds, given their similar case numbers and alcohol concentrations. To prevent this from recurring, CS/CM employees are now informed which case numbers are blind tests so that they can be sure to place only one blind sample into each tube rack. The Quality Division also changed its submission schedule to spread the delivery of blind samples over more days.
Between September 2015 and July 2018, examiners in the toxicology section completed 317 blind proficiency test cases involving five different blood alcohol concentrations. Fewer than 5% of those blind samples were identified as blinds by toxicology analysts, and no unsatisfactory test result was reported. During the rest of 2018, an additional sixty-four blinds were completed by HFSC toxicologists, all of which were successfully analyzed.
2. Firearms.
In December 2015, HFSC announced that it had incorporated blind proficiency testing into its firearms section, supplemented with a blind verification program. No matter the forensic discipline, the first step in blind proficiency testing always requires HFSC’s Quality Division to construct or obtain fake evidence samples that appear sufficiently authentic to fool analysts. Procuring these samples for testing in the firearms section is much more complicated than simply buying blood tubes from a third-party. Rather, the firearms section manager, assisted by a section employee who does not do comparison analysis, works with HFSC’s Quality Division to create or acquire the blind samples, which consist of fired cartridge casings, bullets, and/or guns.
To make fired bullets and cartridge casings, these two firearms section employees test-fire personal guns, guns from the HFSC reference collection, or guns borrowed from personal contacts. The section manager examines the resulting markings on the bullets and casings to make sure that those chosen as blind samples will not be too easy or too difficult. The goal is to create blind samples that mimic actual casework, with a range of difficulty levels. Bullet and casing samples may be fired from different gun makes or models; often, they are fired from the same make and model of two different guns so that the resulting comparisons will be more challenging. Then, one of the two firearms will be submitted to the laboratory and the examiner will be asked to determine which, if any, of the fired evidence was fired from the known gun. 
All cartridge casings sent to HFSC for testing, whether as blind controls or actual casework, must first be entered into the National Integrated Ballistic Information Network (NIBIN) database. This creates a difficulty for the blinds program because, in Houston, HPD officers enter cartridge casings into NIBIN. Those officers place evidence cartridges into a NIBIN unit for digital imaging and send the resulting images to the National Correlation and Training Center (NCTC) in Huntsville, Alabama. NCTC technicians run those images through the NIBIN database, obtain a printout of potential matches, and then determine whether those cartridges were fired from a weapon that was used in another crime.
This is problematic because the test-fired cartridges created as blind samples do not originate from crime scenes—they are manufactured evidence, not real evidence. Scanning by HPD cannot be omitted, however, because any cartridge without a related NIBIN entry would stand out as a blind sample to firearms examiners. Nevertheless, it is imperative that HFSC avoid creating a link between a fictional crime and a NIBIN-listed gun. To ensure that this will not occur, the firearms section manager creates blind cartridge samples using guns that do not have a NIBIN history, so that cartridges fired from them will not result in a “hit” to NIBIN.
When guns are submitted directly to HFSC, they are test-fired by HFSC analysts, who also enter information relating to those guns into NIBIN. To procure guns for the blind testing program, the section manager borrows guns from other gun owners or salvages guns scheduled for destruction from the HPD property room. The guns obtained through the latter method will only work as blind samples if any identifying marks made on those guns by HFSC personnel in the past can be removed. Even so, if examiners have seen a certain gun even once before, they are not easy to fool; one analyst identified a property room weapon as a blind because, she said, it “smelled familiar.”
Manufactured bullet samples, on the other hand, present fewer problems because bullets are not entered into NIBIN. Normally, the firearms section manager and her assistant create bullet “evidence” by test-firing into a water tank. This method, however, produces bullets that are more pristine than those recovered from bodies or after hitting concrete. To make test-fired bullets that are less “perfect” and more representative of real casework, the section manager and her assistant also fire weapons into other substances, including, on one occasion, a cow’s liver.
Once the evidence materials have been constructed, HFSC’s Quality Division packages guns, bullets, and/or cartridges into “cases” with assigned HPD case numbers, and has them delivered by CS/CM to the HPD property room. A request for testing is submitted through HFSC’s online portal using the name of an HPD officer who has agreed to participate in the blind testing program, and CS/CM staff picks up the evidence and conveys it to HFSC’s firearms section. The firearms section has a dedicated CS/CM case manager who opens firearms evidence packages, inventories the contents, engraves any bullets or cartridge cases with the HFSC forensic case number and a unique item number, and uses a waterproof marker to write identifying numbers on any weapons. After logging in the evidence, the CS/CM firearms case manager—who does not know which cases are blinds—assigns the evidence to the various analysts for testing.
After testing is complete, CS/CM returns the blind samples to the HPD property room and the Quality Division evaluates the test results. To determine whether examiners reached the expected result, HFSC’s Quality Division counts “cases,” not conclusions within each case. In other words, if a blind case contains ten cartridge cases for review, the Quality Division counts all ten cartridge identifications as one test result. The analyst must obtain the right answer with respect to all ten cartridges for the Quality Division to conclude that the examiner reached the expected case result. In the event that an examiner correctly identifies nine out of ten cartridges, the Quality Division would report that the examiner reached the incorrect result—although the report would specify what the examiner missed (i.e., that one cartridge case out of ten was incorrectly identified). According to the head of HFSC’s Quality Division, results are tabulated by case rather than by conclusion because it more closely approximates how laboratory stakeholders view real test results.
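The case-level tabulation rule can be made concrete with a short sketch. The conclusion labels below are hypothetical placeholders, not HFSC's actual reporting categories:

```python
def case_result(expected, reported):
    """Case-level scoring: a blind test case counts as correct only if every
    conclusion within it matches the expected answer."""
    return all(e == r for e, r in zip(expected, reported))

# Hypothetical blind case with ten cartridge-case conclusions ("ID" = identification,
# "EX" = exclusion); the examiner misses one of the ten.
expected = ["ID"] * 10
reported = ["ID"] * 9 + ["EX"]

# Conclusion-level accuracy is 90%, but the case as a whole is scored incorrect.
per_conclusion_accuracy = sum(e == r for e, r in zip(expected, reported)) / len(expected)
passed = case_result(expected, reported)
```

As the example shows, case-level counting is the stricter of the two measures: a single missed conclusion fails the entire case, mirroring how a stakeholder would view the reported case result.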
Since the program’s inception in 2015 through December 31, 2018, twenty-five blind proficiency test cases have been completed by HFSC firearms analysts. They were all analyzed satisfactorily—in each case, HFSC firearms examiners reached the expected conclusions.
3. Latent Prints.
HFSC introduced blind proficiency testing in its latent prints section in November 2016, with respect to both fingerprint processing and fingerprint comparisons. To construct blind case samples, HFSC Quality Division employees first find five volunteers who agree to be fingerprinted by the latent prints section manager to create five ten-print cards. The Quality Division attaches fake names to the ten-print cards using a fake name generator, and the cards are uploaded into the regional Automated Fingerprint Identification System (AFIS). This is made possible through the cooperation of the Harris County Sheriff’s Office, which controls the regional AFIS and allows HFSC’s fingerprint section manager to submit five “fake” fingerprint records at a time into the database.
The Quality Division next creates multiple blind “cases” by having the fingerprint donors handle everyday items, such as sunglasses, hats, shoes, and knives, that have been purchased from local stores by Quality Division staff members. Creating these test items can be more challenging than it sounds. For example, one blind-test item was identified as such by an HFSC fingerprint processor because of the way the fingerprint “donor” gripped the item. According to the processor, “[N]o one holds a crowbar like this.”
Once constructed, the items are assigned HPD case numbers and transported to the HPD property room. As with all blind-test samples, requests for testing are submitted through HFSC’s online portal using the name of a participating HPD officer, and CS/CM picks up the evidence, delivers it to HFSC’s latent prints section and enters the fake requests into HFSC’s LIMS. At that point, the blind evidence samples are assigned to fingerprint processors who lift and preserve any usable prints from the items onto lift cards.
The blind print cards are then sent, in the same manner as any other lift card submitted to HFSC, to print examiners for comparison and identification purposes. First, however, HFSC’s Quality Division provides a list of blind case numbers to the assigning section supervisor. This is necessary so that the supervisor can prevent the same examiner from working on multiple blind samples that “hit” in AFIS to the same fake case number. Otherwise, print examiners may realize they are working on a blind sample because they remember either the print itself or the AFIS identifier from a previous case.
After the lift cards are assigned, fingerprint examiners create high resolution digital images of the lifted fingerprints, determine whether those prints are suitable for analysis, and compare the unknown evidentiary fingerprints with known comparison samples. Usually, this means that HFSC examiners will upload a digital fingerprint image from the lift card to the AFIS database, which will return possible match prints along with associated identifier numbers. Next, the examiner performs a preliminary analysis to determine whether any of the database “hits” is a possible match. If so, the examiner will take a screen shot and have a second examiner verify the potential match. If a verification is obtained, the latent print supervisor retrieves the name associated with that number identifier and provides it to the officer as an investigative lead. Only if the investigating officer determines that the print has potential investigative value will an HFSC examiner perform a full comparison of the potential match print.
In February 2019, HFSC’s latent print section employed ten fingerprint comparison examiners—meaning that one fingerprint donor’s prints could be used to generate at most ten blind quality samples, one for each examiner in the section. After the blind cases associated with a set of five “fake” ten-print cards have been fully worked by HFSC print examiners, the fingerprint section manager and the Quality Division find five new fingerprint donors and create five new “fake” ten-print cards. When these are delivered to the Harris County Sheriff’s Office to be uploaded into AFIS, the previous five fake records are removed from the regional database, and the blind proficiency testing cycle begins again.
From November 2016 through December 31, 2018, analysts in HFSC’s latent prints section completed 141 blind proficiency test samples, only four of which were recognized by analysts as blind samples. All blind samples were completed successfully.
C. Lessons Learned—Challenges and Benefits of Blind Proficiency Testing
Even in recent years, crime laboratory directors and scholars have expressed skepticism regarding whether a system of blind proficiency testing could successfully be implemented at a working forensic lab. Some objected that blind proficiency testing would be too time-consuming, complex, and expensive to execute; others predicted that some forensic analysts would be unwilling to submit voluntarily to a blind proficiency testing regime. Certainly, over the last four years HFSC has had to overcome a number of obstacles as its blind proficiency testing program has expanded across six forensic disciplines. In hopes that HFSC’s experiences may make it easier for other laboratories to employ blind proficiency testing, we have summarized some of those challenges—and, where possible, how they were resolved—below. Finally, we discuss the corresponding benefits experienced by HFSC thus far in embracing blind proficiency testing, benefits that we believe far outweigh any costs associated with its adoption.
1. Challenges in Implementing Blind Proficiency Testing.
Constructing or procuring test samples that fool analysts. It goes without saying that there can be no blind proficiency testing without fake evidence samples to test. Creating these samples is complicated because any materials used in blind testing must accurately mimic ordinary casework or the testing will not be blind. Devising test cases that analysts cannot identify as such is not easy; forensic examiners are smart people who make their living analyzing evidence and who pick up on very subtle differences between constructed blinds and actual casework samples. While HFSC can purchase the blood tubes used as blind test cases in toxicology, most blind samples used in other laboratory divisions are significantly more difficult for the Quality Division to obtain and/or manufacture.
In particular, the HFSC Quality Division has found it both costly and demanding to construct realistic blind samples for the laboratory’s digital forensics section. The electronic items submitted to that section for analysis, such as cellphones, laptops, and tablet computers, can be expensive to purchase in the first instance. Then, they must be loaded with digital information—photos, e-mails, text messages, call records, documents, etc.—so they will pass muster as legitimate case materials. For example, to construct a cellphone, the Quality Division first purchases a burner phone from a convenience store. Staff members then rely on a network of personal contacts who call and send text messages to the phone over a period of time to provide it with a plausible “history.” In the future, the Quality Division hopes to obtain electronics scheduled for destruction from the HPD property room. In the meantime, the Quality Division tries to reduce costs by buying less-expensive phones and tablets from local pawnshops.
Experience has shown that the best way for Quality Division staff members to build blinds that do not stand out as such is to learn from their mistakes. Analysts who successfully spot blind proficiency tests are debriefed, and the Quality Division modifies future blinds accordingly. HFSC has been able to obtain this feedback by incentivizing analysts to report the blind samples as they are discovered. At the inception of HFSC’s blind testing program, laboratory director Dr. Peter Stout made a standing bet with his staff: any analyst who correctly reports a blind receives a $10 Starbucks gift card. If the analyst is incorrect, he or she owes Stout a dollar.
Over the past four years, Stout has paid out about a dozen Starbucks cards, and has himself received a couple of dollars. The important point, however, is that analysts genuinely enjoy matching wits with the Quality Division and are eager to report blinds and explain how they beat the system. This is how, for example, the Quality Division learned that toxicology analysts were identifying blind samples because the handwriting on the accompanying paperwork was too neat—a “tell” that Quality Division staff members then worked to eliminate by filling out that paperwork with their nondominant hands.
One obvious way to reduce the cost, both in time and money, of constructing blind samples would be to develop an inter-laboratory evidence exchange program among crime laboratories that use blind proficiency testing. In 2018, HFSC shared some of its blind sample blood tubes with several other crime laboratories to help them implement blind proficiency pilot studies in toxicology. Representatives from these laboratories have indicated interest in apportioning costs if and when they elect to expand these studies to full-scale programs. As soon as additional crime laboratories adopt blind proficiency testing, an exchange program could be devised to significantly reduce the time and expense involved in constructing case materials.
Obtaining program buy-in from bench analysts. Some scholars have speculated that crime laboratory analysts will oppose blind proficiency testing because of its deceptive nature, seeing it as an underhanded and unfair method of evaluating their work. To keep laboratory staff from feeling threatened by or resentful of the introduction of blind proficiency testing, HFSC’s Quality Division informed examiners about the program in advance, explaining its goals and benefits. Laboratory management assured analysts that the blind testing program was simply another method to ensure that the laboratory produced quality work; if any issues were discovered through blind testing, they would be treated no differently from any other quality incidents and addressed pursuant to existing laboratory protocols.
After implementation, Quality Division staff also devised and conducted a blind chocolate chip cookie taste test to defuse any fears and gauge analysts’ comfort levels with the idea of blind testing.
By far the most successful tactic used to obtain analyst buy-in, however, has been the Starbucks gift card challenge described above, where examiners try to outsmart the Quality Division by correctly identifying blind test samples in return for a gift card awarded by HFSC’s executive director. The enthusiasm with which HFSC lab analysts have embraced this simple incentive program is hard to overstate. As a result, HFSC experienced negligible, if any, pushback from its analysts, and they remain highly supportive of blind proficiency testing. Analysts frequently report that the blind testing program gives them added confidence and credibility on the witness stand because it gives them an objective way to prove their competency.
Securing collaboration from other criminal justice agencies. Convincing other criminal justice agencies to collaborate with HFSC has been essential to the success of its blind testing program. For HFSC, the most important partnering agency has been HPD. Recall, for example, that each testing request must be submitted through HFSC’s online portal under the name of a real HPD officer and with a real HPD case number. Without the support of HPD leadership and officers willing to have their names used in this way, blind proficiency tests could not be submitted in the same manner as regular casework—which would be a dead giveaway to HFSC analysts. The cooperation of both HPD and the Harris County District Attorney’s Office has also allowed HFSC’s Quality Division to obtain firearms and drugs scheduled for destruction for use in making blinds. The willingness of these agencies to collaborate was achieved primarily through the efforts of HFSC’s executive director, who worked extensively to educate them about how the program would advance their interests as well as the laboratory’s goals.
Achieving access to forensic databases. In a number of forensic disciplines, examiners use local, state, and national databases on a daily basis to link evidence with specific individuals or other evidence items. Forensic biologists, for example, upload DNA samples to CODIS seeking a “match”; fingerprint examiners use AFIS as a repository of identified fingerprints; firearms examiners employ NIBIN to link fired cartridge casings to individual guns. Ideally, any blind proficiency testing of analysts in these disciplines would also evaluate their ability to use these databases. To do this, blind evidence samples must be introduced into the databases on at least a temporary basis. Devising procedures to do this without either corrupting the database or violating database eligibility rules is a continuing challenge for HFSC’s quality staff.
In general, HFSC has had some success at the local level with respect to obtaining database access for its blind testing program. For example, the Harris County Sheriff’s Office allows HFSC’s latent prints section to enter five “fake” ten-print cards at a time into the regional AFIS and then remove them when blind testing is completed. To date, however, HFSC’s efforts to develop similar procedures for CODIS at the state and national levels have been unsuccessful.
2. Benefits Attained Through Blind Proficiency Testing.
Enabling the calculation of error rates and other statistical evaluation of forensic methods. The need for forensic scientists to use rigorous scientific methods to collect data regarding the rates at which they make mistakes has been proclaimed again and again, and has been a major theme of this Article. Without reliable information about how often forensic scientists get the wrong answer, the probative value of forensic evidence is impossible to assess. This leaves our criminal justice system vulnerable to contamination by junk science and forensic fraud, often with tragic consequences. Yet how can error rates be determined in forensic disciplines where ground truth—the actual source of the bullet or fingerprint, for example—is rarely, if ever, known?
Blind proficiency testing fills this critical research gap. The whole premise behind blind testing is that test administrators know ground truth with respect to test materials. HFSC Quality Division staff members are aware of the source of the latent prints submitted on fake fingerprint cards; they know which guns fired the cartridge casings dropped into the firearms section’s workflow. This means that HFSC can calculate error rates not only for the various laboratory sections, but also for individual analysts and specific laboratory instruments.
In 2017, HFSC’s Executive Director Peter Stout reported preliminary findings regarding HFSC’s error rates at an international symposium organized by NIST, to an assembly composed primarily of forensic scientists. Based on 189 blind test samples that had been completed in the HFSC toxicology section at that time, Stout could demonstrate with 95% confidence that HFSC’s error rate with respect to positive samples was less than 3%. Because negative samples made up a much smaller percentage of the 189 test cases than positive ones, Stout could show only that the error rate was something less than 9% on those cases.
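Bounds of this shape are consistent with the statistician’s “rule of three” for error-free trials. As a sketch of the arithmetic only—the Article does not report the exact split of positive and negative samples, so the sample sizes below are illustrative assumptions, not Stout’s actual figures:

```latex
% 95% upper confidence bound p* on an error rate after n tests
% in which zero errors were observed: solve (1 - p*)^n = 0.05.
\[
(1 - p^{*})^{n} = 0.05
\quad\Longrightarrow\quad
p^{*} = 1 - 0.05^{1/n} \;\approx\; \frac{3}{n}.
\]
% Illustration: roughly 100 error-free positive samples would give
% p* \approx 3/100 = 3\%, while roughly 33 error-free negative samples
% would give p* \approx 3/33 \approx 9\%.
```

The formula makes concrete why the bound on the negative samples was looser: with zero observed errors, the best provable upper bound shrinks only in proportion to the number of samples tested.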
While HFSC’s toxicology section’s actual error rate is probably significantly lower than the preliminary results reported above, the laboratory will need to obtain larger sample sizes to prove it. Four years into the program, HFSC is beginning to accumulate sufficient data to compute error rates with a higher degree of precision. Because HFSC introduced blind proficiency testing into its various forensic sections at different times, sample sizes are larger in some sections than in others and will take time to develop. As the laboratory gets closer to its goal of completing blind test cases equal to 5% of its annual casework, the statistical power of these error rate calculations will increase.
Blind proficiency testing can also provide other types of statistical support for the validity and reliability of forensic methodology. HFSC scientists demonstrated how this can be achieved in an article recently published in the Journal of Analytical Toxicology, which focused on 317 blind test samples reported by eight HFSC toxicologists using three different instruments and two pipettes. The tests were submitted from September 2015 through July 2018, and included 89 negative samples and 228 positive samples containing five different alcohol concentrations. In every case, HFSC toxicologists reached the result expected under the manufacturer’s specifications.
Simple and multiple linear regression analyses were used to determine whether the blood alcohol concentration levels reported by HFSC analysts differed depending on the age of the sample, the identity of the analyst, the instrument used to perform the test, or the pipette used to aliquot the test sample. Importantly, the analysis showed no statistically significant difference in the blood alcohol test results with respect to any of these variables. Because blind test samples at HFSC are treated in exactly the same manner as actual casework, the study concluded that HFSC toxicology methods were reliable, and its analysts produced accurate results, with respect to both blind samples and real evidence. This objective data also gives HFSC analysts a way to demonstrate during courtroom testimony that their test results are both valid and reliable.
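The logic of the regression check described above can be sketched in a few lines. The data here are simulated: the number of samples, the noise level, and the concentration values are illustrative assumptions, not HFSC’s actual data, and only the analyst and instrument counts are drawn from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated blind-sample data (illustrative only): each sample has a known
# alcohol concentration, an assigned analyst, and an assigned instrument.
n = 300
true_bac = rng.choice([0.00, 0.04, 0.08, 0.12, 0.16, 0.20], size=n)
analyst = rng.integers(0, 8, size=n)       # eight analysts, as in the study
instrument = rng.integers(0, 3, size=n)    # three instruments, as in the study
reported = true_bac + rng.normal(0.0, 0.002, size=n)  # unbiased measurement noise

# Multiple linear regression: intercept, true concentration, and dummy
# variables for analyst and instrument (one level of each is the baseline).
X = np.column_stack(
    [np.ones(n), true_bac]
    + [(analyst == a).astype(float) for a in range(1, 8)]
    + [(instrument == i).astype(float) for i in range(1, 3)]
)
beta, *_ = np.linalg.lstsq(X, reported, rcond=None)

# If the method is reliable, the coefficient on true concentration is ~1 and
# the analyst/instrument coefficients are ~0 (no systematic differences).
print(abs(beta[1] - 1.0) < 0.01)        # → True (reported BAC tracks true BAC)
print(np.abs(beta[2:]).max() < 0.01)    # → True (no analyst/instrument effect)
```

A statistically significant nonzero coefficient on any analyst or instrument dummy would be the kind of systematic difference the HFSC study looked for and did not find.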
In sum, a blind testing program like that at HFSC can establish validity as applied for the forensic practitioners who practice the six forensic disciplines in the laboratory. If replicated in a large enough number of other laboratories, blind testing of this type, in combination with other black-box studies involving open proficiency tests, might provide the data necessary to establish foundational validity as well.
It goes without saying that blind proficiency testing will only provide support for forensic disciplines and methods that have valid scientific bases. If a forensic discipline, such as bite mark analysis, lacks a scientific foundation, blind proficiency testing will demonstrate that the results obtained by its practitioners cannot be replicated. Through blind testing, real science will be strengthened, and junk science discredited.
Improving quality control procedures. The much-criticized reliance by crime laboratories on open proficiency tests as a quality control measure has already been discussed. Blind proficiency tests provide a more accurate reflection of an analyst’s actual performance on daily casework because they are, in fact, a continuous, indistinguishable part of that casework. An analyst cannot devote extra attention to a proficiency test if he or she is unaware which samples are test cases. When any case might be a test, it stands to reason that analysts’ overall work performance improves. At the same time, by presenting a more realistic picture of a laboratory’s work product, results from blind test cases provide a road map for quality specialists to focus additional training or other remedial efforts where they are most needed.
Beyond this common sense conclusion, blind proficiency testing allows laboratories to evaluate their analysts in subject areas and at difficulty levels not contemplated by declared proficiency tests. Declared proficiency tests, which are commercially manufactured for purchase by laboratories across the country, have been faulted for being much easier than typical crime laboratory casework. By their nature, declared proficiency tests are limited in subject matter; for example, they may test drug analysts with respect to the powdered, but not the crystallized, form of a controlled substance.
Compare this with HFSC’s Quality Division, which manufactures blind test samples that include a wide variety of test materials in all their different forms. Additionally, the Quality Division constructs a number of blind “extreme challenge samples,” designed to evaluate how examiners perform on the near-match, near-nonmatch cases that can confound even the best analysts. The goal here is not to trap analysts into making a wrong choice, but rather to assist them in deciding how best to approach such complex cases when they arise.
Testing all laboratory systems. Because blind proficiency tests travel the same path through the crime laboratory as real casework, they provide a means for the laboratory to evaluate all its processes from evidence handling to report writing. In this way, blind samples act as a notification system regarding bottlenecks or other issues that exist in any area of the laboratory. In fact, blind proficiency tests even check the operation of forensic databases that are outside of the laboratory’s control. For example, when a “fake” fingerprint is uploaded into the local AFIS database, a blind test case using that fingerprint not only tests the examiner’s ability to find the correct print in AFIS, but also ensures that the database includes the donor’s print on its list of possible matches.
Other real-life examples from HFSC’s experience demonstrate how blind proficiency testing evaluates more than just analytic judgments made by bench examiners:
— A drug blind test sample unexpectedly transformed from a powder into a liquid before it could be delivered to the controlled substances section by CS/CM. This event allowed the Quality Division to evaluate CS/CM’s response to ensure that it followed proper protocols. It had: CS/CM repackaged the evidence and documented the occurrence before notifying supervisors in the controlled substances section, all in accordance with HFSC’s standard operating procedures.
— During shipment to an outside laboratory for confirmatory toxicology testing, a test tube in a blood collection kit broke. Although the tube was a blind sample, it was packaged in the same manner as real casework. The broken test tube alerted laboratory management to the need to improve evidence packaging in advance of shipment.
— In 2016, a refrigerator used by the toxicology section containing 529 evidence samples malfunctioned, with the result that evidence sat at room temperature for almost four hours. Sixteen of those samples were blind test cases with known alcohol concentrations. Analysis of those samples after the event showed no difference in blood alcohol concentrations, demonstrating that the event had not compromised the remaining evidence.
— An HFSC controlled substance analyst determined that a blind test sample contained both cocaine and methamphetamine, even though the Quality Division believed that the test sample contained only cocaine. Additional testing revealed that the analyst was correct; the sample indeed contained a mixture of cocaine and methamphetamine. An investigation showed that the Quality Division most likely had itself introduced the methamphetamine into the test sample while preparing the blind test. This illustrates how the blind proficiency testing program acts as a check on the Quality Division itself, and can indicate areas where the Quality Division needs to improve its own procedures.
Exposing forensic fraud. Last but certainly not least, blind proficiency testing provides a means for laboratory management to detect and prevent forensic fraud. Stories about forensic analysts who tamper with evidence or engage in dry-labbing, such as drug chemists Sonja Farak and Annie Dookhan of the Massachusetts state crime laboratory, have been far too frequent in recent years. Dookhan not only issued reports without testing the underlying evidence, but also added controlled substances to evidence vials that tested negative for drugs. According to the state inspector general’s report, the laboratory’s “laissez-faire” management style was the primary cause of the crisis, which resulted in the dismissal of charges in 36,000 drug cases. A blind proficiency testing program as robust as that implemented by HFSC will go a long way toward ensuring that fraudulent analysts are either deterred or unmasked before inflicting this kind of damage on the criminal justice system.
V. Conclusion
“In an ideal world, all crime labs would be able to do this.”
—Ray Wickenheiser, President of the American Society of Crime Laboratory Directors, on the use of blind testing
In deciding the admissibility of scientific evidence, Daubert requires courts to find the evidence to be valid according to the standards of good science. To date, the research on most forensic disciplines is insufficient to demonstrate foundational validity. The PCAST Report urged large-scale validation studies for the forensic sciences as a means of validating the disciplines themselves. Ongoing blind testing occurs routinely in clinical laboratories, as required by federal law, and military laboratories have also done blind testing for many years. However, most forensic laboratories (including those abroad) have yet to organize blind testing programs. One laboratory, HFSC, has taken up the challenge of the PCAST Report and developed robust, ongoing blind testing programs in six forensic disciplines as a regular feature of its quality control system. Thus, HFSC will be uniquely positioned to amass the statistical data to support the validity of the six technical disciplines practiced in this laboratory.
The NAS and PCAST reports, as well as the discovery of numerous wrongful convictions, make clear the urgent need to demonstrate the scientific validity of the forensic disciplines to enable criminal courts to make more informed decisions about admissibility and to give factfinders appropriate information on the probative value of the evidence. A blind testing program like that at HFSC can provide this critical information. A robust blind testing program can even provide more refined error rates for subsets of testing, distinguishing between easier cases and hard cases, to give the criminal justice system even better information.
Moreover, the top-to-bottom quality assessment provided by blind testing creates a continual feedback loop regarding every aspect of the laboratory process, from evidence packaging, storage, and transportation to the numerous procedures involved in forensic testing and the use of databases, as well as the issuance of laboratory reports. The case management system, a necessary predicate to implementing blind testing, can help reduce analysts’ exposure to irrelevant investigation information that is known to create cognitive bias. In and of itself, this change in forensic laboratory procedures is long overdue and should be a priority for criminal justice stakeholders—prosecutors, defense attorneys, and the judiciary.
In 2015, when HFSC embarked on its ambitious plan to implement blind testing across the entire laboratory, many forensic scientists thought blind testing would be infeasible. Today the laboratory runs blind tests totaling approximately 5% of the caseload in all six technical disciplines it practices. As this Article has explained, doing so was not without substantial challenges, but HFSC’s experience has shown that it can be done and should be replicated in other laboratories, especially the larger ones with dedicated quality assurance personnel. Indeed, the more laboratories that conduct blind testing, the greater the opportunities for sharing constructed samples. An interlaboratory exchange program for blind test samples would reduce costs for all of the laboratories, and, perhaps more importantly, would enable large black-box studies for foundational validity.
Writing in 2016, PCAST viewed the adoption of blind testing as a priority, stating that “[t]est-blind proficiency testing of forensic examiners should be vigorously pursued, with the expectation that it should be in wide use, at least in large laboratories, within the next five years.” Local criminal justice stakeholders in communities with large forensic laboratories should expect to see progress in moving toward a case management system and blind testing. State legislatures and accrediting bodies can also play a role in setting future accreditation requirements for forensic laboratories to include standard operating procedures that utilize case management and blind proficiency testing. At the federal level, the Department of Justice can likewise implement blind testing in federal forensic laboratories.
See President’s Council of Advisors on Sci. & Tech., Exec. Office of the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods 53 (2016) [hereinafter PCAST Report], https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf [https://perma.cc/YCY4-3T7Z].
Daubert v. Merrell Dow Pharm., 509 U.S. 579, 592–94 (1993).
See infra notes 47–48 and accompanying text.
See infra notes 51–86 and accompanying text.
See Sandra Guerra Thompson, Cops in Lab Coats: Curbing Wrongful Convictions Through Independent Forensic Laboratories 91–92 (2015).
See Keith A. Findley, Innocents at Risk: Adversary Imbalance, Forensic Science, and the Search for Truth, 38 Seton Hall L. Rev. 893, 942 (2008) (citing study showing that Daubert did not significantly change the admission rate of scientific evidence in criminal cases, both at trial and on appeal). Commentators point to “inertia or tradition” as one reason, since courts have long admitted many types of forensic science evidence. Id.; see also Boaz Sangero, Safety from Flawed Forensic Sciences Evidence, 34 Ga. St. U. L. Rev. 1129, 1137 (2018).
See Michael J. Saks et al., Forensic Bitemark Identification: Weak Foundations, Exaggerated Claims, 3 J.L. & Biosciences 538, 566 (2016) (concluding that “the foundations of bitemark identification are unsound”). The Texas Forensic Science Commission found bite mark analysis so lacking in scientific support as to call for a moratorium on its admission in court and undertook a case review of past convictions. See Erik Eckholm, Texas Panel Calls for an End to Criminal IDs via Bite Mark, N.Y. Times, Feb. 13, 2016, at A10.
The FBI reported that “74 of the 329 wrongful convictions overturned by DNA evidence involved faulty hair evidence.” FBI Testimony on Microscopic Hair Analysis Contained Errors in at Least 90 Percent of Cases in Ongoing Review, FBI (Apr. 20, 2015) [hereinafter FBI Testimony on Microscopic Hair], https://www.fbi.gov/news/pressrel/press-releases/fbi-testimony-on-microscopic-hair-analysis-contained-errors-in-at-least-90-percent-of-cases-in-ongoing-review [https://perma.cc/P6QX-GBT7]. DNA exonerations in cases involving hair analysis prompted case reviews of past convictions by the FBI, as well as by the Texas Forensic Science Commission. See Tex. Forensic Sci. Comm’n, Sixth Annual Report 14 (2017), https://www.txcourts.gov/media/1440353/fsc-annual-report-fy2017.pdf [https://perma.cc/8HPJ-S6NQ].
The Texas Forensic Science Commission recommended a case review of prior convictions to identify cases based on scientifically invalid evidence. See Texas Fire Marshal Speaks About Arson Case Review, Innocence Project (Feb. 8, 2017), https://www.innocenceproject.org/texas-fire-marshal-discusses-arson-case-review/ [https://perma.cc/CC5R-N5Y4].
In some jurisdictions, prosecutors continue to admit microscopic hair analysis “where mitochondrial DNA testing is deemed too expensive, time consuming or is otherwise unavailable.” FBI Testimony on Microscopic Hair, supra note 8.
Nat’l Research Council, Strengthening Forensic Science in the United States: A Path Forward 87 (2009) [hereinafter NAS Report], https://www.ncjrs.gov/pdffiles1/nij/grants/228091.pdf [https://perma.cc/59PP-X864]; see also infra note 47 and accompanying text.
See NAS Report, supra note 11, at 127–82.
See id. at 22–23 (recommending research to demonstrate the scientific validity of forensic methods).
See infra notes 82–86 and accompanying text.
See infra notes 97–100 and accompanying text.
See infra notes 118–22 and accompanying text.
See infra note 116 and accompanying text.
See infra note 117 and accompanying text.
Cf. Nat’l Comm’n on Forensic Sci., U.S. Dep’t of Justice, Views of the Commission: Facilitating Research on Laboratory Performance 1 (2016) [hereinafter NCFS], https://www.justice.gov/archives/ncfs/page/file/909311/download [https://perma.cc/LK48-47QR] (noting that while blind studies may be possible, they are both “burdensome and expensive”).
See E-mail from Dr. Peter Stout, President & CEO, Hous. Forensic Sci. Ctr., to Nicole Bremner Cásarez, Professor of Commc’n, Univ. of St. Thomas (July 15, 2019, 2:00 PM) [hereinafter July 15 Email from Stout to Cásarez] (on file with Author).
See PCAST Report, supra note 1, at 58–59.
See infra notes 88–89 and accompanying text.
Daubert v. Merrell Dow Pharm., 509 U.S. 579, 592–93 (1993).
Id. at 587–89. Derived by the Court from its understanding of Federal Rules of Evidence 104, 401, 402, and 702, the Daubert standard replaced the prior “general acceptance” standard of Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923).
See infra notes 31–38 and accompanying text.
See infra notes 47–48 and accompanying text.
See infra notes 47–48 and accompanying text.
See infra note 47.
See, e.g., Michael J. Saks & David L. Faigman, Expert Evidence After Daubert, 1 Ann. Rev. L. Soc. Sci. 105, 121–22 (2005) (reviewing studies of criminal jurisprudence post-Daubert and concluding that “if Daubert gatekeeping were rationally based on the quality of the underlying expert evidence, the exclusion rate pursuant to defense challenges would be higher than it is”).
See infra notes 118–22 and accompanying text.
Daubert v. Merrell Dow Pharm., 509 U.S. 579, 589–90 (1993).
Id. at 593.
Id. (quoting Michael D. Green, Expert Witnesses and Sufficiency of Evidence, in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643, 645 (1992)).
Id. at 594.
See generally Thompson, supra note 5, at 87 (addressing the invention of many forensic disciplines as investigative tools and that these disciplines are now often practiced in law enforcement laboratories).
See, e.g., Crime Laboratory Service Overview, Tex. Dep’t Pub. Safety, https://www.dps.texas.gov/CrimeLaboratory/index.htm [https://perma.cc/AJ6L-B738] (last visited June 11, 2019) (listing areas of trace evidence analysis—hair, fiber, glass, gunshot residue, shoe prints, and tire impressions, biological evidence/DNA, drugs, blood alcohol, fire debris, and toolmarks/firearm examination, toxicology, latent fingerprints, questioned documents, digital/multimedia evidence); Forensic Services, Va. Dep’t Forensic Sci., https://www.dfs.virginia.gov/laboratory-forensic-services/ [https://perma.cc/8VPV-GHLB] (last visited June 11, 2019) (listing breath alcohol, controlled substances, digital and multimedia evidence, DNA, firearms and toolmarks, forensic biology, forensic training, latent prints and impressions, toxicology, and trace evidence).
See Crime Laboratory Service Overview, supra note 41; Bloodstain Pattern Analysis Service Now Offered by the Crime Lab, Utah Dep’t Pub. Safety, https://dpsnews.utah.gov/bloodstain-pattern-analysis-service-now-offered-by-the-crime-lab/ [https://perma.cc/42E7-HTJV] (last visited June 11, 2019) (announcing that Utah’s state police laboratory had begun to provide bloodstain pattern analysis).
See Crime Laboratory Service Overview, supra note 41. For a discussion of the use of composite bullet lead analysis, see FBI Laboratory Announces Discontinuation of Bullet Lead Examinations, FBI (Sept. 1, 2005), https://archives.fbi.gov/archives/news/pressrel/press-releases/fbi-laboratory-announces-discontinuation-of-bullet-lead-examinations [https://perma.cc/CKV5-7MP6].
See NAS Report, supra note 11, at 127–28; see also Thompson, supra note 5, at 86–87.
See Thompson, supra note 5, at 86.
Id. at 181–82.
The NAS Report states: “‘[T]here is no evident reason why [“rigorous, systemic”] research would be infeasible.’ However, some courts appear to be loath to insist on such research as a condition of admitting forensic science evidence in criminal cases, perhaps because to do so would likely ‘demand more by way of validation than the disciplines can presently offer.’” See NAS Report, supra note 11, at 109 (alterations in original) (quoting Joan Griffin & David J. LaMagna, Daubert Challenges to Forensic Evidence: Ballistics Next on the Firing Line, Champion, Sept.–Oct. 2002, at 21).
See id. at 101–02 (explaining the difficulty defendants face in challenging adverse rulings on the admission of evidence).
Overturning Wrongful Convictions Involving Misapplied Forensics, Innocence Project, https://www.innocenceproject.org/overturning-wrongful-convictions-involving-flawed-forensics/ [https://perma.cc/LY4Y-ZQEG] (last visited June 29, 2019) (noting that 24% of all wrongful convictions nationally involved the misapplication of forensic science as a contributing factor).
See id. (addressing arson, comparative bullet lead analysis, and hair comparisons); see also Minutes from February 11, 2016 Bite Mark Analysis Panel Meeting in Austin, Texas, Tex. Forensic Sci. Comm’n (Feb. 11, 2016), https://www.txcourts.gov/media/1439998/20160211-minutes-fsc-bite-mark-comparison-review-panel.pdf [https://perma.cc/XA6V-CD5S] (unanimously recommending “a temporary moratorium on the use and admission of bitemark evidence in Texas courts until the appropriate research, criteria and guidelines are established”); Cases Where DNA Revealed that Bite Mark Analysis Led to Wrongful Arrests and Convictions, Innocence Project (Jan. 31, 2007), https://www.innocenceproject.org/cases-where-dna-revealed-that-bite-mark-analysis-led-to-wrongful-arrests-and-convictions/ [https://perma.cc/W43C-27RR] (providing information about five wrongful convictions based largely on bite mark analysis).
NAS Report, supra note 11, at 87.
See Thompson, supra note 5, at 108.
See NAS Report, supra note 11, at 43–44.
Id. at 20.
See Eric S. Lander, Fixing Rule 702: The PCAST Report and Steps to Ensure the Reliability of Forensic Feature-Comparison Methods in the Criminal Courts, 86 Fordham L. Rev. 1661, 1663–65 (2018) (discussing the establishment of the National Commission on Forensic Science and President Obama’s request for recommendations from PCAST).
PCAST Report, supra note 1.
Id. at 1.
Id. at 5–6.
Id. at 49.
Id. at 44.
Id. at 47–48.
Id. at 53.
See Jonathan J. Koehler, Proficiency Tests to Estimate Error Rates in the Forensic Sciences, 12 Law Probability & Risk 89, 90 (2013) (“Researchers have long suggested that it is crucial to measure error rates for the various forensic sciences because the probative value of forensic science evidence is inextricably linked to the rates at which examiners make errors. Without such information, jurors and other legal decision makers have no scientifically meaningful way of assigning weight to forensic match reports across the various forensic subfields.”).
See supra notes 49–50 and accompanying text.
See PCAST Report, supra note 1, at 50.
Id. at 49.
Id. at 51.
Blind testing has also been recommended by other national organizations. See infra notes 97–100 and accompanying text.
PCAST Report, supra note 1, at 52–53; see also Koehler, supra note 66, at 89, 92 (calling for “methodologically rigorous, blind, external proficiency tests using realistic samples across the forensic sciences” and stating that the study should involve a “broad participant pool” so as to conduct “proper sampling [which] will allow for the identification of industry-wide error rates across various forensic subfields”).
PCAST Report, supra note 1, at 52 (emphases omitted).
Id. at 53.
Id. at 52 (emphasis omitted).
Id. at 53.
Id. at 56 (emphasis omitted).
Id. at 57.
Id. at 56.
In one study, researchers inserted proficiency tests into the laboratory’s workflow without informing analysts that the samples they were testing might include a proficiency test. See infra note 93 and accompanying text. In this Article, we refer to this practice as “blind testing.”
See Thompson, supra note 5, at 195–96 (explaining the operation of laboratory proficiency tests and addressing the criticisms of external proficiency tests used as part of the accreditation process for forensic laboratories).
See Brandon L. Garrett & Gregory Mitchell, The Proficiency of Experts, 166 U. Pa. L. Rev. 901, 917 (2018).
See Joseph L. Peterson et al., The Feasibility of External Blind DNA Proficiency Testing. I. Background and Findings, 48 J. Forensic Sci. 21, 26–27 (2003) (explaining that blind proficiency testing is a “truer” measure of laboratory performance because the operation of open proficiency testing omits “pre- and post-analytic process” errors and allows for special treatment to be given to those specimens that are the subject of the proficiency test).
D. Joe Boone et al., Laboratory Evaluation and Assistance Efforts: Mailed, On-Site and Blind Proficiency Testing Surveys Conducted by the Centers for Disease Control, 72 Am. J. Pub. Health 1364, 1364–66 (1982).
Id. at 1365.
Id. at 1366–68.
Id. at 1367.
See Jackeline Moral et al., Implementation of a Blind Quality Control Program in Blood Alcohol Analysis, 43 J. Analytical Toxicology 630, 635 (2019) (identifying one benefit of blind testing: it “capitalizes on the Hawthorne Effect”).
Org. of Sci. Area Comms., Nat’l Inst. of Standards & Tech., Draft Guidance on Testing the Performance of Forensic Examiners 12–13 (2018) [hereinafter OSAC], https://www.nist.gov/system/files/documents/2018/05/21/draft_hfc_guidance_document-may_8.pdf. The National Commission on Forensic Science also recommended blind testing in the forensic sciences. See NCFS, supra note 19, at 1; see also William Thompson et al., Am. Ass’n for the Advancement of Sci., Forensic Science Assessments: A Quality and Gap Analysis, Latent Fingerprint Examination 47–48 (2017), https://www.aaas.org/sites/default/files/s3fs-public/reports/Latent%2520Fingerprint%2520Report%2520FINAL%25209_14.pdf [https://perma.cc/8ZW7-XPGQ].
OSAC, supra note 97, at 14. The PCAST Report also recommends blind testing “under conditions that are representative of casework and on samples, for which the true answer is known . . . .” PCAST Report, supra note 1, at 57–58.
See W. Kerkhoff et al., A Part-Declared Blind Testing Program in Firearms Examination, 58 Sci. & Just. 258, 258 (2018).
See PCAST Report, supra note 1, at 58; see also Moral et al., supra note 96, at 635.
See Kerkhoff et al., supra note 99, at 259–60. Another blind testing program was similarly instituted in the Victoria (Australia) Police Forensic Services Department, but only in the questioned documents section. See Bryan Found & John Ganas, The Management of Domain Irrelevant Context Information in Forensic Handwriting Examination Casework, 53 Sci. & Just. 154, 158 (2013).
Kerkhoff et al., supra note 99, at 262.
Id. at 258, 262.
Id. at 263.
Id. at 261.
See infra notes 291–95 and accompanying text (regarding HFSC’s experience).
One article explains that in DNA laboratories, blind testing serves as a check on “the whole system,” usually meaning “all the steps and record keeping that go into case intake, sorting and selection of items for analysis, screening or preliminary tests, DNA analysis itself, interpretation of the results and preparation of a report.” Peterson et al., supra note 91, at 29–30. Laboratories may use different methods for responding to these types of process discrepancies.
Error rates for validity studies should be distinguished from estimates of measurement uncertainty that are calculated and reported in some forensic disciplines, such as toxicology. See Edward J. Imwinkelried, Regulating Expert Evidence in US Courts: Measuring Daubert’s Impact, in Forensic Science Evidence and Expert Witness Testimony: Reliability Through Reform? 275, 301–07 (Paul Roberts & Michael Stockdale eds., 2018). Measurement uncertainty estimates reflect the fact that properly calibrated equipment will produce a result that will be accurate within a margin of error. Id. at 302–06. These estimates allow legal decision-makers to know whether a test result, when considering the margin of error, indicates a violation of the law or not. Id. at 302–05.
See supra notes 91–100 and accompanying text.
See NIST Details Plans for Reviewing the Scientific Foundations of Forensic Methods, Nat’l Inst. Standards & Tech. (Sept. 24, 2018), https://www.nist.gov/news-events/news/2018/09/nist-details-plans-reviewing-scientific-foundations-forensic-methods [https://perma.cc/4WQD-935L].
PCAST concluded that bite mark analysis “does not meet the scientific standards for foundational validity, and is far from meeting such standards.” See PCAST Report, supra note 1, at 87. Hair and footwear analysis were likewise completely lacking in scientific support. Id. at 117, 120. For firearms examination, PCAST found only one black-box study, which was not sufficient to validate the discipline. Id. at 111. The group also found the statistical model used to report the probability of a person’s inclusion in a DNA mixture was “not foundationally valid.” Id. at 82.
PCAST found the statistical model used to report the probability of a person’s inclusion in a DNA mixture was “not foundationally valid.” Id. at 82.
Id. at 87.
Id. at 111–13.
Id. at 91.
Id. at 95 (emphasis omitted). The PCAST Report found: “Of the two appropriately designed black-box studies, the larger study (FBI 2011 study) yielded a false positive rate that is unlikely to exceed 1 in 306 conclusive examinations while the other (Miami-Dade 2014 study) yielded a considerably higher false positive rate of 1 in 18.” Id. at 96.
Id. at 96.
See Bradford T. Ulery et al., Accuracy and Reliability of Forensic Latent Fingerprint Decisions, 108 Proc. Nat’l Acad. Sci. 7733, 7734 (2011).
Jennifer Mnookin et al., Error Rates for Latent Fingerprinting as a Function of Visual Complexity and Cognitive Difficulty 4 (2016), https://www.ncjrs.gov/pdffiles1/nij/grants/249890.pdf [https://perma.cc/5QSA-7WP3].
Id. at 6.
See John Song et al., Estimating Error Rates for Firearm Evidence Identifications in Forensic Science, 284 Forensic Sci. Int’l 15, 16 (2018). The PCAST Report had encouraged the research on developing image-analysis algorithms for firearms examination. See PCAST Report, supra note 1, at 113.
See How Good a Match Is It? Putting Statistics into Forensic Firearms Identification, Nat’l Inst. Standards & Tech. (Feb. 8, 2018), https://www.nist.gov/news-events/news/2018/02/how-good-match-it-putting-statistics-forensic-firearms-identification [https://perma.cc/TEH3-YYTR].
Song et al., supra note 134, at 19. The authors also “emphasize that the estimated error rates in this report are specific to the sets of firearms studied here and are not applicable to other firearm scenarios.” Id. at 27.
Id. at 29.
The American Bar Association recently recommended blind proficiency testing in the forensic sciences as a means of quality assurance/quality control. Section of Criminal Justice, Am. Bar Ass’n, Report to the House of Delegates 111B, at 8 (2004). The Technical Working Group on DNA Analysis Methods (TWGDAM) also recommends blind proficiency testing for quality assurance purposes in forensic DNA laboratories. See Peterson et al., supra note 91, at 24. However, one group balked at instituting such a requirement for quality control purposes, arguing that other measures might be as effective and with fewer costs. The National Forensic DNA Review Panel of the National Institute of Justice recommended deferring the implementation of a blind testing program requirement for forensic DNA laboratories. Id. at 30. The panel found that “blind proficiency testing is possible, but fraught with problems (including costs)” and its purposes might be served through an accreditation system that should include stringent external case audits. Id.
Kerkhoff et al., supra note 99, at 263.
Blind testing also leads to efficiency gains by giving analysts “real-time feedback about their casework” that allows them to “more quickly see potential problems in casework and take preventative measures rather than corrective actions.” See Moral et al., supra note 96, at 635.
John Irving, Drug Testing in the Military—Technical and Legal Problems, 34 Clinical Chemistry 637, 639 (1988).
Id. at 639–40; see also Richard Lardner, Friendly Fire: An Apache Pilot Fights to Survive the Army’s War on Drugs, 8 Inside Army 17, 22 (1996) (noting that in 1996 open and blind proficiency testing was overseen by the Armed Forces Institute of Pathology (AFIP)).
Our research found little public information regarding the results of the military blind quality control program. At least one journalist has obtained a statement regarding the results. See Lardner, supra note 147, at 22 (reporting that the AFIP stated that from 1983 to 1996 “no military lab has reported a false positive on an AFIP blind sample” and that the only errors involved “clerical errors made by the submitting units such as incorrect transcription of social security numbers prior to the submission of the samples”); Thomas H. Maugh II, Navy Viewed as Setting Drug-Testing Standard, L.A. Times, Oct. 29, 1986, at 33 (citing chemist who helped to establish the Navy program who reports that over four years (1982–1986), there had been over 4,000 blind tests, with no false positives, and “a small percentage of false negatives”). However, there does not appear to be a study published in a peer-reviewed scientific journal that would include a detailed description and data.
See Garrett & Mitchell, supra note 90, at 958; July 15 Email from Stout to Cásarez, supra note 20.
See Garrett & Mitchell, supra note 90, at 958.
See supra notes 49–50 and accompanying text.
See supra note 47 and accompanying text.
See supra notes 56–86 and accompanying text.
See supra notes 97–100 and accompanying text.
See Moral et al., supra note 96, at 631–32 (regarding importance of evidence packaging to the insertion of blind testing into entire chain-of-custody).
See NCFS, supra note 19, at 3.
See Jeff Kukucka et al., Cognitive Bias and Blindness: A Global Survey of Forensic Science Examiners, 6 J. Applied Res. Memory & Cognition 452, 452–53 (2017) (listing studies); see also Nat’l Comm’n on Forensic Sci., U.S. Dep’t of Justice, Views of the Commission: Ensuring that Forensic Analysis Is Based upon Task-Relevant Information 4 (2015), https://www.justice.gov/archives/ncfs/file/818196/download [https://perma.cc/36T9-YJHV]; NAS Report, supra note 11, at 122–24 (discussing cognitive bias in forensic science).
See Itiel E. Dror & David Charlton, Why Experts Make Errors, 56 J. Forensic Identification 600, 606–10 (2006).
See Jessica D. Gabel, Realizing Reliability in Forensic Science from the Ground up, 104 J. Crim. L. & Criminology 283, 302 (2014).
See Itiel E. Dror, Cognitive Neuroscience in Forensic Science: Understanding and Utilizing the Human Element, Phil. Transactions Royal Soc’y B: Biological Sci., Aug. 5, 2015, at 1, 4; see also Nat’l Comm’n on Forensic Sci., supra note 157, at 1–4.
See Itiel E. Dror, Practical Solutions to Cognitive and Human Factor Challenges in Forensic Science, Forensic Sci. Pol’y & Mgmt., May 14, 2014, at 1, 7.
Research shows that analysts who have not been trained regarding the problem of cognitive bias in forensic testing are more likely to deny that their own judgments could be impaired by it. See Kukucka et al., supra note 157, at 456.
See Hous. Forensic Sci. Ctr., Meeting of Board of Directors Minutes 2 (June 12, 2015), http://www.houstonforensicscience.org/meeting/57716600Nex5cSigned.pdf [https://perma.cc/GP6T-5B7Y]. Dr. Dror has presented additional workshops on cognitive bias at HFSC since 2015. See HFSC Brings Leading Expert on Cognitive Bias, Dr. Itiel Dror, to CSI Academy, What’s News @ HFSC (Hous. Forensic Sci. Ctr., Houston, Tex.), Nov. 2017, http://www.houstonforensicscience.org/event/5acd1ae24wqblr 2017.pdf [https://perma.cc/S42U-E54R].
See Hous. Forensic Sci. Ctr., Meeting of Board of Directors Minutes 2 (Oct. 9, 2015), http://www.houstonforensicscience.org/meeting/57716878EFA4LSigned.pdf [https://perma.cc/T3RB-7SLD]; see also Hous. Forensic Sci. Ctr., Houston Forensic Science, Hous. Television (Oct. 9, 2015) [hereinafter Oct. 2015 Board Meeting Video], http://houstontx.swagit.com/play/10132015-1596 [https://perma.cc/XFY3-3VLK]. Most testing requests received by the HFSC come from either the Houston Police Department or the Harris County District Attorney’s Office. See Just Blind Proficiency Testing, Just Sci. (Jan. 14, 2019), https://forensiccoe.org/js7-e10/ [https://perma.cc/2WLV-BCHZ] (interviewing HFSC CEO Dr. Peter Stout).
See Hous. Forensic Sci. Ctr., Houston Forensic Science, Hous. Television (Jan. 8, 2016), http://houstontx.swagit.com/play/01112016-501 [https://perma.cc/J2S7-4VHD] (reporting at the HFSC Board of Directors Meeting, during the Operations Report, that in December 2015, the CS/CM Division became operational).
See Oct. 2015 Board Meeting Video, supra note 166 (setting out, during the presentation of the Operations Report at the HFSC Board of Directors Meeting, the responsibilities of CS/CM Division and how it will serve as a structural bulwark between forensic analysts and outside influences); HFSC’s Client Services/Case Management Division Achieves Accreditation, What’s News @ HFSC (Hous. Forensic Sci. Ctr., Houston, Tex.), June 2018, http://www.houstonforensicscience.org/event/June2018.pdf [https://perma.cc/DMN7-FBSQ] (describing CS/CM’s evidence handling/packaging responsibilities).
Interview with Ashley Henry, CS/CM Manager, Donna Eudaley, Firearms Section Manager, Lori Wilson, Quality Division Director, Erika Ziemak, Assistant Quality Director, Ramit Plushnick-Masti, Communications Director, and Dr. Peter Stout, HFSC CEO & Executive Director, in Hous., Tex. (Mar. 14, 2019) [hereinafter Henry Interview].
According to HFSC Firearms Section Manager Donna Eudaley, who worked at HFSC’s predecessor laboratory, which was operated by HPD, in those days investigating officers had “virtually unlimited ability to discuss their testing requests with the assigned [crime] lab[oratory] examiners.” Id. Currently, however, e-mail communications to CS/CM from investigators—some of which may include potentially biasing information—are uploaded into HFSC files where they could be viewed by examiners. Id.
Telephone Interview with Ramit Plushnick-Masti, HFSC Communications Director (Jan. 28, 2019) [hereinafter Plushnick-Masti Interview].
Id. CS/CM employees, not toxicology analysts, are responsible for entering case information associated with toxicology evidence into HFSC’s laboratory information system. Id.
Id. Blood samples in cases involving a fatality are automatically tested for drugs if testing for alcohol does not reveal a blood-alcohol content above the legal limit. Id.
HFSC did at one time employ a technician as a buffer to communicate with investigators regarding CODIS eligibility; however, management quickly learned that only DNA analysts had sufficient knowledge and training to gather this information. Id. The need for analysts to contact investigators regarding CODIS entries has diminished, however, since HFSC introduced its online request portal. See infra notes 187–89 and accompanying text. For a basic discussion of CODIS, see Mark Nelson, Making Sense of DNA Backlogs – Myths vs. Reality, Nat’l Inst. Just. (July 15, 2010), https://nij.ojp.gov/topics/articles/making-sense-dna-backlogs-myths-vs-reality [https://perma.cc/GC3Y-G2L6].
Henry Interview, supra note 169.
See id.; Just Blind Proficiency Testing, supra note 166. Sequential unmasking of information involves blinding laboratory analysts to potentially biasing but relevant case information for as long as possible in the testing process to reduce the potential for cognitive bias. For a more detailed discussion of sequential unmasking, see Itiel E. Dror et al., Letter to the Editor, Context Management Toolbox: A Linear Sequential Unmasking (LSU) Approach for Minimizing Cognitive Bias in Forensic Decision Making, 60 J. Forensic Sci. 1111 (2015).
Henry Interview, supra note 169.
See Hous. Forensic Sci. Ctr., Houston Forensic Science, Hous. Television (Dec. 14, 2018), http://houstontx.swagit.com/play/12192018-800 [https://perma.cc/EGB2-KQL6] (describing the new LIMS at the HFSC Board of Directors Meeting).
See Henry Interview, supra note 169.
See Hous. Forensic Sci. Ctr., Meeting of Board of Directors Minutes 2 (Sept. 11, 2015), https://www.houstonforensicscience.org/meeting/5771681c96wa8Signed.pdf [https://perma.cc/G5GK-SFNF].
See Moral et al., supra note 96, at 631.
Id. at 631–32.
See HFSC Quality Division Chart (July 8, 2019) (indicating that blind proficiency tests completed by the laboratory sections as of December 31, 2018 were as follows: toxicology, 381; seized drugs, 284; firearms, 25; latent prints, 141; biology, 16; digital forensics, 10; and forensic multimedia, 4) (on file with Author).
Devising blind proficiency tests for crime scene investigators presents obvious operational and ethical problems, not least of which is how to create a fake crime scene that investigators would believe is real.
See Seth Augenstein, Houston Forensic Science Center Slides Blind Testing into Workload, Forensic Mag. (Mar. 13, 2018, 12:22 PM), https://www.forensicmag.com/news/2018/03/houston-forensic-science-center-slides-blind-testing-workload. In January 2019, this goal was achieved in all analytic laboratory sections with the exception of seized drugs. See Hous. Forensic Sci. Ctr., Meeting of Board of Directors: Quality Division Report (Feb. 8, 2019), http://www.houstonforensicscience.org/meeting/5c5dd875SD1Pu (002).pdf [https://perma.cc/DKZ3-3A7K]. HFSC management has also established blind verification programs in two laboratory sections—firearms and latent prints—as a further check against the influence of cognitive bias in forensic testing. See Jordan Benton, Latent Prints Expands Blind Program, What’s News @ HFSC (Hous. Forensic Sci. Ctr., Houston, Tex.), Apr.–May 2019, at 7, https://houstonforensicscience.org/event/5cc8514d0z6ksl 2019.pdf [https://perma.cc/NU4B-ZD39].
See Hous. Forensic Sci. Ctr., Houston Forensic Science, Hous. Television (Nov. 9, 2018) [hereinafter Nov. 2018 Board Meeting Video], http://houstontx.swagit.com/play/11162018-752/2/ [https://perma.cc/5CZS-XL29].
For a description of how the HFSC blind proficiency testing program operates in all the laboratory sections, see Callan Hundl et al., Implementation of a Blind Quality Control Program in a Forensic Laboratory, J. Forensic Sci., Dec. 24, 2019, https://onlinelibrary.wiley.com/doi/abs/10.1111/1556-4029.14259 [https://perma.cc/U523-GJYN].
See Moral et al., supra note 96, at 631.
Id. The Quality Division creates subject names and birth dates using a fake name generator, and offense locations are chosen from an online map to ensure that they are valid addresses. Id.