Links to PDFs of papers and PowerPoint presentations are
provided below the title of each paper as they become available.
Brooke Abounader
University of Toronto
The Importance of Error in Learning from Scientific Models
This poster introduces a new conceptual framework within which scientific models can be understood. Though expanding on the work of Mary Morgan and Margaret Morrison (Models as Mediators, 1999), my proposed schema differs from its predecessor by introducing a scientist-entity, responsible for model generation and selection, that is not found in Morgan and Morrison’s framework. This new entity allows me to assess the occurrence and impact of errors in model selection. That is, I can see what happens in the entire system when a model fails to adequately represent some critical aspect of its target in the world. This analysis reveals that errors in model selection are not only common and unavoidable, but also necessary to the process of learning from scientific models. To illustrate this point, as well as demonstrate the utility of my proposed schema, this poster also depicts the application of the above schema to two brief case studies. One is a recent pharmaceutical drug trial, and the other an excerpt from the history of physics.
M. Emrah Aktunc
Virginia Polytechnic Institute & State University
The Tacking Paradox: A Critique of Bayesian Treatments and an Error-Statistical Proposal for Its Solution
The tacking paradox, also commonly known among Bayesian philosophers of science as the problem of irrelevant conjunction, describes a problem in philosophical accounts of scientific confirmation. The problem is essentially this: if there is a hypothesis h and empirical evidence e that h fits, then h is confirmed by e. However, the conjunction h&x fits e as well, where x might be a statement completely irrelevant to h, and therefore e also confirms the conjunction h&x, with the irrelevant conjunct x 'tacked on' to h. This is the basic formulation of the tacking paradox.
Though the tacking paradox was originally introduced by Hempel (1945), it has recently attracted considerable attention, mostly from Bayesian philosophers of science (e.g. Earman (1992), Rosenkrantz (1994), Fitelson (2002), and Maher (2004)). The common thread in Bayesian formulations of the tacking paradox is to place the problem within the standard Bayesian account of confirmation, in which a hypothesis h is confirmed by evidence e if its posterior probability is higher than its prior probability. Then, it is argued that evidence e, if it has a confirmatory effect on h, will confirm the hypothesis h to a greater extent than it confirms the conjunction h&x with the irrelevant conjunct x. In Bayesian terms, the posterior probability of h on e will be greater than the posterior probability of h&x on e. It is acknowledged that in this treatment the conjunction h&x still gains some confirmation from e, but to a lesser degree than h. Though this approach to the problem is a common thread among Bayesian accounts, this is not to say that there is a commonly accepted Bayesian solution. Maher (2004), for instance, criticizes such Bayesian treatments because of the drawback of h&x gaining some confirmation from e, and then offers a different Bayesian approach. The discussion among Bayesian philosophers of science on the ideal Bayesian treatment of the tacking paradox appears to be ongoing.
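The Bayesian treatment described above can be made concrete with a short worked calculation. The sketch below assumes, purely for illustration, that h entails e (so that h&x entails e as well) and that 0 < P(e) < 1. It uses the difference measure d(h, e) = P(h|e) - P(h), which is only one of several confirmation measures and is not attributed here to any of the authors cited.

```latex
\begin{align*}
P(h \mid e) &= \frac{P(e \mid h)\,P(h)}{P(e)} = \frac{P(h)}{P(e)} > P(h)
  && \text{(assuming } P(h) > 0\text{)} \\
P(h \,\&\, x \mid e) &= \frac{P(h \,\&\, x)}{P(e)} > P(h \,\&\, x)
  && \text{(assuming } P(h \,\&\, x) > 0\text{)} \\
d(h, e) &= P(h)\left(\frac{1}{P(e)} - 1\right) \\
d(h \,\&\, x, e) &= P(h \,\&\, x)\left(\frac{1}{P(e)} - 1\right) \le d(h, e)
  && \text{since } P(h \,\&\, x) \le P(h)
\end{align*}
```

Under these assumptions the conjunction h&x is still confirmed by e, only to a degree no greater than h, which is the residual confirmation that the text identifies as a drawback.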
In this presentation, I will attempt to offer an error-statistical approach to the tacking paradox. Alan Chalmers (1999) pointed toward a potential error-statistical solution to the tacking paradox based on Mayo's (1996) notion of severe tests. In Mayo's account of severe tests, a hypothesis h is confirmed by evidence e only if the test that yields e is a severe test of h. A test is a severe test of h if and only if the test has a high probability of not passing h if h is false. So, in this account, Chalmers suggests, if the test that passes h is a severe test and hence confirms h, and if it is not a severe test of h&x, then evidence e yielded by this test does not confirm h&x. Chalmers gives the example of Newton's theory, which has been confirmed by observations of the motions of a comet, and the statement 'emeralds are green', which is 'tacked on' to a statement of Newton's theory. The argument is that since the motions of a comet would not be affected if some emeralds were blue, the observations, which provide the evidence e for the confirmation of Newton's theory, will not constitute a severe test of the hypothesis 'emeralds are green.' Therefore, the evidence e, which confirms Newton's theory, will not have any confirmatory import on h&x, not even to a lesser degree than h, and the problem is solved. In this presentation, I will take a closer look at this intuition that the notion of severe tests solves the tacking paradox and offer a formal analysis of the problem and its potential error-statistical solution. Finally, this account of the tacking paradox and severe tests will be put in the broader context of severe tests in general, and the insights that it may offer for a general logic of severe testing will be discussed.
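The intuition behind the comet and emerald example can be illustrated with a toy simulation. This is only a sketch of the informal idea, not the formal analysis promised in the abstract and not Mayo's formal definition of severity. The function names and the 5% chance that a false theory still fits the data are assumptions made for illustration.

```python
import random

def comet_test_passes(newton_true, rng):
    """Toy test: comet observations fit the prediction whenever Newton's theory
    holds. If it is false, we ASSUME a 5% chance of a misleading fit.
    Emerald colour never enters the test at all."""
    if newton_true:
        return True
    return rng.random() < 0.05

def prob_test_fails(newton_true, trials=10_000, seed=0):
    """Estimate the probability that the comet test does NOT pass."""
    rng = random.Random(seed)
    fails = sum(not comet_test_passes(newton_true, rng) for _ in range(trials))
    return fails / trials

# Case 1: h is false (Newton's theory is false). The test very probably fails.
severity_for_h = prob_test_fails(newton_true=False)

# Case 2: h&x is false only because x is false (some emeralds are blue),
# while h is true. The comet test still passes every time.
severity_for_tacked_on_x = prob_test_fails(newton_true=True)

print(f"P(test fails | h false)          = {severity_for_h:.3f}")
print(f"P(test fails | h true, x false)  = {severity_for_tacked_on_x:.3f}")
```

In this toy setting the test has a high probability of failing h when h is false, but no capacity at all to detect the falsity of the tacked-on conjunct x.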
References:
Chalmers, A. (1999). What Is This Thing Called Science?, Third Edition. Indianapolis, IN: Hackett Publishing Company, Inc.
Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.
Fitelson, B. (2002). Putting the irrelevance back into the problem of irrelevant conjunction, Philosophy of Science, 69, 611-622.
Hempel, C. (1945). Studies in the logic of confirmation, Mind, 54, 1-26, 97-121.
Maher, P. (2004). Bayesianism and irrelevant conjunction, Philosophy of Science, 71, 515-520.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago, IL: The University of Chicago Press.
Rosenkrantz, R. D. (1994). Bayesian confirmation: Paradise regained, British Journal for the Philosophy of Science, 45, 467-476.
Dr. John Byrd
Joint POW/MIA Accounting Command, Central Identification Laboratory
john.byrd@jpac.pacom.mil
The Role of E.R.R.O.R. in the Forensic Identification of Human Remains
Philosophers of science are spending considerable effort today, as they have
always, in describing the way scientists work. The classic writings of Popper,
Kuhn, and Hempel were required readings for our generation of scientists training
in graduate school during the late 1980s. These great philosophers
wrote eloquently about the nature of science and we aspired to conduct our
research in a manner that fit the picture they had painted for us. However,
it is apparent to us that the models of the scientific process we learned
in school do not adequately reflect the way that we conduct research and draw
conclusions. Reality is far messier. More recent models of the scientific
process as described by philosophers such as Mayo and Haack are closer to
our reality, and are useful in helping to organize our thinking about the
scientific process.
Our work in the Joint POW/MIA Accounting Command, Central Identification Laboratory
centers on the forensic identification of human skeletal remains. We have
been interested in examining more formally the process we follow in generating
data, analyzing data, and drawing conclusions. One broad question we have
is particularly haunting: When is there enough evidence in hand to draw a
conclusion (and close a case)? The reader might be amused to learn that prior
to setting out on this research path, the author was confident that he could
purchase a philosophy of science textbook, open to the index, search under
“evidence”, and find the answer. After all, philosophers of science
have written countless volumes over the years and what could be more fundamental
than a detailed exposition of how one draws conclusions from evidence? This
plan failed. What we have discovered is that Deborah Mayo’s (1996) account
of the scientific process is a fair depiction of what we do and why we do
it.
We agree with Mayo that there is an important nexus between science, philosophy,
and statistics that must be appreciated if we are to understand what it is
that we are doing and to use that understanding to improve our process of
learning. We also agree that learning from error is at the core of the process.
We offer two examples of analytical work that illustrate the scientific process
as we practice it. The first is the sorting of commingled human remains, wherein
we engage numerous methods, some statistical and some qualitative, to resolve
an assemblage of bones into individual skeletons. The process works by systematically
comparing each bone to all others using a variety of methods seeking to exclude
the bone from as many others as possible. The comparisons are treated as tests,
and some of the tests (DNA, osteometrics, pair-matching) are more severe than
others (bone preservation, color, articulations). The second process is the
identification of an individual skeleton, whereby the data generated by analyzing
the skeleton (dental pattern, DNA sequence, stature, age, sex, etc.) is compared
to all possible candidates for identification. Each line of evidence is used
to exclude as many candidates as possible. An identification is made when
one individual remains. Note that this process is fundamentally different
from a Bayesian approach, whereby we would identify the individual with the
“highest probability.”
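The exclusion-based identification described above can be sketched in a few lines. This is a minimal illustration and not the laboratory's actual procedure: the candidate records, field names, and the 5 cm stature tolerance are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; real casework uses far richer data.
    name: str
    sex: str
    stature_cm: float
    dental_pattern: str

def identify(candidates, tests):
    """Apply each exclusion test in turn. An identification is made only
    when exactly one candidate survives every test; otherwise return None."""
    remaining = list(candidates)
    for _label, is_consistent in tests:
        remaining = [c for c in remaining if is_consistent(c)]
    return remaining[0] if len(remaining) == 1 else None

candidates = [
    Candidate("Candidate A", "M", 168.0, "pattern-1"),
    Candidate("Candidate B", "M", 180.0, "pattern-2"),
    Candidate("Candidate C", "F", 179.0, "pattern-2"),
]

# Each test compares a candidate with data from the skeleton (values assumed).
tests = [
    ("sex", lambda c: c.sex == "M"),
    ("stature", lambda c: abs(c.stature_cm - 181.0) <= 5.0),
    ("dental pattern", lambda c: c.dental_pattern == "pattern-2"),
]

match = identify(candidates, tests)
print(match.name if match else "no identification yet")
```

Note that the function never ranks candidates by probability; with only the first test applied, two candidates remain and no identification is returned.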
Scientists clearly need philosophers of science to help them formalize and improve the reasoning processes that lead to conclusions. Likewise, philosophers of science need scientists to help them understand the work they do and the rationale (flawed or not) behind it. We hope that our poster will shed some light on these topics and inspire fruitful debate.
Andre J. Crawford
Department of Economics
Virginia Polytechnic Institute & State University
Evaluating Economics: How much have we learned from Economic Theory?
This study maintains that empirical modeling, specifically the form practiced over the 70 years since the founding of the Econometric Society, has failed to evolve into an apparatus that reliably evaluates economic theory in the manner promulgated by the Society's constitution. That econometrics, the form of modeling used in testing and measuring economic phenomena, scarcely provides justification for conferring the title of 'science' on economics is an irrefutable position; however, there is additional cause for concern, since the fundamental flaws undermining empirical modeling appear deeply ingrained and are due to a pervasive system of beliefs shared by economists that: (a) a theory which explains real-world phenomena 'reasonably well' while employing as few assumptions as possible is unequivocally more powerful than one which uses more stipulations and conditions; (b) criticisms leveled against model assumptions because of their disconnect from reality are unwarranted, as the assumptions are usually false anyway; and (c) theoretical (structural) and statistical models are identical save for those factors which are not being controlled for but which are captured by a term that is 'well-behaved' under certain regularity conditions. This approach to statistical analysis has also permeated the natural and physical sciences, where scant attention is paid to the methodological bases of the respective strategies employed to yield (valid) statistical inference. Hence, in the spirit of Mayo & Spanos (2004), the current study takes traditional empirical modeling to task insofar as it is interpreted in the language of (a), (b) and (c), and recommends the more systematic Probabilistic Reduction (PR) approach as a viable alternative, since it advocates achieving 'statistical adequacy' of any model or sub-model designed to provide statistical inference.
Adopting the universally recognized framework of the Linear Regression Model, the study uses several illustrative examples to indicate the potentially debilitating effects of the Duhem-Quine (D-Q) 'joint-testing' thesis on much of econometrics. These examples also indicate the extent to which the PR approach mitigates the issues raised by the D-Q thesis, since it recommends listening to the 'voice' of the data in conducting a post-specification/pre-inferential statistical exercise. Unless empirical modeling is conducted in this systematic manner, it will never achieve its intended purpose, which is to provide scientific rigor to economic theory.
Dr. Jeffery Downard
Dept of Philosophy, Northern Arizona University
Inductive Forms of Inference in Law
Dr. Damien Fennell
Centre for Philosophy of Natural and Social Science, London School of Economics
The Error Term and its Interpretation in Structural Models in Econometrics
This paper makes explicit important properties of the error term in structural models in econometrics, properties that ensure successful statistical and causal inference. Structural models, unlike forecasting models, aim to capture relationships among variables of interest that are robust to intervention and thus useful for policy-making. The simultaneous equation models analysed in this paper are widely used in econometrics for modelling equilibrium systems. The paper is part of a larger research which aims to develop a clear interpretation for such models in order to understand their strengths, limits and scope.
The interpretation of the error term is a central part of understanding structural models since the error term acts as a cover-all term for those parts of the economy that play a significant role, but are not explicitly modelled. The error term is also important because by definition it is unobservable and thus, its assumed properties can only be indirectly tested. However, the error term's unobservability raises the danger that important parts of the model will be hidden in the error term and remain untested as a result. Therefore, it is important to bring out just what properties of the error term we should be concerned to justify, by observation or otherwise, if we want our statistical and causal claims based on the model to hold.
The first part of the paper summarises, using a simple representative simultaneous structural model, the properties of the error term that are used to ensure successful statistical inference. In short, it presents an overview of the existing, highly developed analysis in econometric theory for statistical inference with structural models. In order to flesh out the properties assumed for the error term for causal inference purposes, it is first necessary to present a causal interpretation of the structural equations. This is done in the second part of the paper by presenting an interpretation of structural equations based on Herbert Simon's definition of causal order. This includes an explicit causal interpretation of the error term, one that fits with convention, in which the error term denotes the net effect of factors not explicitly modelled in an equation.
In the third and final part of the paper, important properties of the error term, typically required for identifiability of the model, such as errors being mutually independent or having a specific covariance matrix, are interpreted using the causal interpretation presented in the previous part of the paper. These properties of the error term, those that are used to support causal inference, are then compared and contrasted with properties of the error term used to support statistical inference. Some properties of the error term are useful for both causal and statistical inference, for example the mutual independence of error terms and the independence of error terms from exogenous factors. Crucially, however, the roles and interpretations of these common properties of error terms are distinct depending on whether the aim is to carry out statistical inference or causal inference. By highlighting the importance of these error term properties from a causal perspective, the paper aims to make explicit how certain assumptions about the error terms, typically presented as requirements for statistical inference, also play a significant role in causal inference.
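For readers unfamiliar with simultaneous equation models, the kind of system discussed above might look like the following two-equation sketch. This is an illustrative textbook-style example and not the representative model used in the paper; the variables q, p, z_1, z_2 and the coefficients are assumptions introduced here.

```latex
\begin{align*}
q &= \alpha_1\, p + \beta_1\, z_1 + u_1 && \text{(first structural equation)} \\
q &= \alpha_2\, p + \beta_2\, z_2 + u_2 && \text{(second structural equation)} \\[4pt]
&\mathrm{E}[u_i \mid z_1, z_2] = 0,\quad i = 1, 2
  && \text{(errors independent of exogenous factors)} \\
&\mathrm{Cov}(u_1, u_2) = 0
  && \text{(mutually uncorrelated errors)}
\end{align*}
```

Here q and p are determined jointly by the system, z_1 and z_2 are exogenous, and u_1 and u_2 stand for the net effect of unmodelled factors in each equation. The last two lines are examples of the error term properties whose statistical and causal readings the abstract distinguishes.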
Ulrich Frey
Department of Philosophy
Technical University of Braunschweig
u.frey@tu-bs.de
Scientific errors: Their cognitive basis and evolutionary roots
Some errors in science can be linked to errors known from Cognitive Psychology.
These in turn can be explained through an evolutionary model. Optical illusions
prove that our vision is prone to errors – and so is our thinking prone
to misjudgments, fallacies and errors in general. There is very good experimental
evidence from cognitive psychology for these fallacies. One plausible explanation
for many of these errors is provided by Evolutionary Biology: Like any other
ability or attribute, human thinking is the result of an adaptational process.
Therefore there are two sources of errors. First, these adaptations are never
perfect. Second, cognitive facilities that evolved over 100 000 years ago
are employed today to solve scientific problems they have never been designed
to solve in the first place. These shortcomings and systematic errors cannot
always be compensated for by the control mechanisms of scientific method. For
these cognitive errors (e.g. linear problem-solving, context dependency, exaggerated
expectancy of regularities, etc.) I try to provide evidence in scientific
practice through three case studies: Management of ecosystems, cold fusion
and the history of deficiency diseases. These reflections contribute towards
a cognitive theory of science, that is, a naturalized philosophy of science.
This area has been neglected to date; the focus has instead been on
historical and sociological aspects of science. What are the implications
for the natural sciences and a philosophy of science? An analysis of science
has to take into account the cognitive performance – the day-to-day
cognitive decisions and problem-solving strategies – of scientists.
As the human problem-solving abilities have been developed for different purposes,
errors in science are not surprising. Scientific method may be able to compensate
for some of these errors, but certainly not for all of them. Identifying and
classifying errors will make it possible to devise counter-measures. As soon
as we understand what kind of errors we are most likely to commit, we will
be able to avoid them, thus gaining in efficiency.
Emily K. Gibson
Western Kentucky University
"Debunking" the Global Warming Myth: Error and the Experimental Process in Climatology
As the debate around global warming heats up, the process of experimental inquiry is gaining more attention. As shown by Deborah Mayo in Error and the Growth of Experimental Knowledge (1996), subjecting experimental knowledge to "error probing tests" extends experimental data beyond the limitations of the traditional scientific approach. We use the debate over global warming data to show, in this special case, that Mayo's method:
1) Targets experimental errors;
2) Explains why these targeted errors occur; and,
3) Enhances the likelihood that this kind of error will be avoided in the future.
The case we consider is described as follows: An experiment designed to test atmospheric temperatures yielded an anomalous result, namely that the temperature of the earth's atmosphere was actually decreasing rather than increasing. By discovering that raw data had been incorrectly interpreted, a subsequent group of scientists analyzing the previous experiment targeted an error at what Mayo labels the Data Model Stage. After locating the error, scientists were then able to guard against the commission of the same error in future climatological experiments. We use Mayo's "error probing" techniques to show how climatologists in this case demonstrated a positive inference generated through error.
Dr. Clark Glymour
Department of Philosophy
Carnegie Mellon University
Rocks, Genes, Fire and Lead: Avoiding Testing
PDF of paper
We care about the truth of theories, but we should also care about the reliability of methods of inquiry, whether of experimental design, theory formation, theory assessment, or forecasting. The reliability of methods is quite as important as the truth of theories, and as subject to test. The advent of computerized search and forecasting methods, sometimes applied to very large data sets, has altered how methods must be tested, has in some respects made assessment more difficult, and has often resulted in evading severe testing.
In this poster paper I will describe several cases in which the assessment
of the reliability of automated methods has been evaded or falsified, and
one case in which an automated method found an error in the procedures of
other automated and human methods.
My examples will be from published work addressing the following problems:
predicting the prognosis of hospitalized pneumonia patients; estimating the
effects of low-level lead exposure on children's intelligence; robotic determination
of mineral composition for exploring the surface of Mars; techniques for determining
gene regulation networks; and forest fire forecasting.
I conclude with some conjectures as to why automated methods are too often
subjected to inadequate testing.
Dr. Galina Granek
Department of Philosophy
University of Haifa
granek@research.haifa.ac.il
Scanning Tunneling Microscope (STM): an instrument that evolved from an error
The central theme of this study is the way a minor instrumental error led to a precipitous transition from an apparently humdrum electrical instrument, devoid of any engrossing functionality, to what turned out to be the harbinger of nanotechnology, the Scanning Tunneling Microscope (STM).
In 1981 Gerd Binnig and Heinrich Rohrer of the IBM Zurich research laboratory actually had everything they wanted, as much as they dared hope for. After a laborious odyssey they built the first working vacuum tunneling unit. This apparatus was so complicated that, as Binnig and Rohrer said in their Nobel lecture, they never actually used it. Indeed, the problem of insulation from mechanical vibrations was not solved satisfactorily. It was in desperate need of repair. But what seemed to be an articulated recognition of an error in the construction of the instrument precipitated a transition to what Binnig and Rohrer then called an STM, the second instrument.
Although the instrument functioned
well as a vacuum tunneling unit it staggered and trembled. Still, Binnig and
Rohrer succeeded in producing function graphs with this instrument; they sought
to demonstrate exponential dependence of the tunnel current on the gap separation
between the two metal electrodes of the tunneling unit (verifying vacuum tunneling).
In fact, the vacuum tunneling unit could function properly. Binnig and Rohrer
managed to overcome the insulation problem with lots of Scotch tape, and a
primitive version of vibration suppressor.
We thus have good reasons to ask: did Binnig and Rohrer correct an error in
their first instrument or did they "simply" switch its function
into a microscope?
On the one hand, in 1985 Binnig and Rohrer referred to their vacuum tunneling unit as a "first generation STM" and to their second instrument, the "second generation STM". It seems that for Binnig and Rohrer the vacuum tunneling unit was a microscope as well, but of the first generation. They thus did correct an error in the vacuum tunneling unit and they built the second generation STM.
On the other hand, Binnig and Rohrer did not produce images with the vacuum tunneling unit. The gap-width stability was sufficient to resolve some preliminary atomic steps, but by and large an overall change was needed. Binnig and Rohrer were simply discussing and demonstrating vacuum tunneling in a configuration with a tip and a sample. In 2004 Binnig remarked that, "the scanning tunneling microscope was developed … without us intending to invent it." Indeed, imaging was not on the agenda; no microscope was in the offing. This raises the second option: Binnig and Rohrer switched the function of the vacuum tunneling unit into a microscope. In what ways did the 1981 switch lead to a new tradition of microscopy, scanning tunneling microscopy? Binnig and Rohrer produced STM images and used topographic language to describe these images. They gradually built up a novel tradition of microscopy; but then as "Scanning Tunneling Microscopists" they turned the STM configuration into what we know it as today, a microscope.
In my presentation I pose the above question and discuss possible answers by showing the two instruments, the function graphs, and the images that they produced.
Dr. Thomas J. Koehnle
Dept. of Neuroscience, University of Pittsburgh
koehnle@bns.pitt.edu
Dr. Jeffrey C. Schank
Dept. of Psychology, Animal Behavior Graduate Group, University of California,
Davis
jcschank@ucdavis.edu
Using Monte Carlo Simulations to Evaluate the Design and Analysis
of Experiments: the Case of Pseudoreplication
The data sets generated by the day-to-day work of researchers bear little or no resemblance to the idealized structures demanded by typical models of statistical inference. Worse, the advice offered to remedy these shortcomings by our training, our colleagues, and the established literature can be incorrect or incomplete. There are two basic solutions to this problem. First, one can recognize that practical limitations severely constrain the ability of researchers to fit their data sets into the procrustean bed of off-the-shelf statistics, as Campbell and Stanley (1963) did when they coined the idea of "quasi-experimental designs." Alternatively, one can alter the procedures used in the standard statistical models on an ad hoc basis in an attempt to salvage the analysis. We propose that the best solution to this problem is to test a hypothetical (or actual) experimental design and proposed (or actual) statistical analysis using Monte Carlo simulations. To demonstrate the efficacy of Monte Carlo simulations in aiding the design and analysis of experiments, we will focus on one set of analytical procedures commonly used in research in ecology and animal behavior, which collectively fall under the term pseudoreplication (Hurlbert, 1984; 2004). Our simulations show that the advice typically offered to combat pseudoreplication can dramatically reduce statistical power, mask statistical dependencies, and reduce the ability to detect subtle block or contamination effects. We propose that widespread use of Monte Carlo simulations would enhance experimental design and increase the ability to avoid inferential errors in the analysis of data sets.
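The general idea of testing a design and analysis by Monte Carlo simulation can be sketched as follows. This is a minimal illustration and not the authors' simulation: the design (two groups, 4 subjects per group, 10 repeated observations per subject), the variance components, and the two analyses compared are all assumptions. The hard-coded critical values are two-sided 5% Student t values for exactly this design.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for the assumed design below.
T_CRIT_ALL_OBS = 1.991   # df = 78: all 40 + 40 observations treated as independent
T_CRIT_MEANS = 2.447     # df = 6: one mean per subject, 4 + 4 subjects

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def simulate_group(rng, shift, n_subjects=4, n_obs=10, subject_sd=1.0, noise_sd=1.0):
    """Repeated observations share a subject-level effect, so they are dependent."""
    group = []
    for _ in range(n_subjects):
        subject_level = rng.gauss(shift, subject_sd)
        group.append([rng.gauss(subject_level, noise_sd) for _ in range(n_obs)])
    return group

def rejection_rates(effect, n_sims=2000, seed=1):
    """Return (rate for all-observations analysis, rate for subject-means analysis)."""
    rng = random.Random(seed)
    all_obs_hits = means_hits = 0
    for _ in range(n_sims):
        control = simulate_group(rng, 0.0)
        treated = simulate_group(rng, effect)
        flat_c = [y for subject in control for y in subject]
        flat_t = [y for subject in treated for y in subject]
        if abs(pooled_t(flat_t, flat_c)) > T_CRIT_ALL_OBS:
            all_obs_hits += 1
        means_c = [statistics.fmean(subject) for subject in control]
        means_t = [statistics.fmean(subject) for subject in treated]
        if abs(pooled_t(means_t, means_c)) > T_CRIT_MEANS:
            means_hits += 1
    return all_obs_hits / n_sims, means_hits / n_sims

false_pos_all, false_pos_means = rejection_rates(effect=0.0)
power_all, power_means = rejection_rates(effect=1.0)
print("false-positive rate:", false_pos_all, false_pos_means)
print("rejection rate with a true effect:", power_all, power_means)
```

Running both analyses under a null effect and under a true effect lets one read off the false-positive rate and the power of each, which is the kind of trade-off the abstract argues should be examined by simulation before choosing an analysis.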
Jane Mazzagatti
UNISYS Corporation, Blue Bell, PA
Jane.mazzagatti@UNISYS.com
The Potential for Recognizing Errors in a Dataset Using a Computer Memory Resident Data Structure Based on the Phaneron of C. S. Peirce
Finding errors in datasets, sometimes referred to as data cleansing, is a daunting task. The range of data error types is wide and the datasets are ever increasing in size. Another dimension of the challenge is to identify and correct data errors as quickly as possible. While exploring the properties of the Phaneron or K (knowledge) data structure*, characteristics were isolated that facilitate the identification and correction of data errors in a real-time environment as well as in static data. This poster presents one such characteristic, the point at which a new structure must be built to record data into an existing Phaneron data structure because the data is unique to the data structure. That is, this particular sequence of events has never before been recorded into the K structure.
Because the Phaneron data structure is a new paradigm for data analysis, a large portion of the poster is devoted to describing how the structure is created. Construction of a Phaneron data structure begins with defining the basic elements of the 'data universe', the sensors. For field/record datasets the sensor set is the set of alphanumeric characters.
To realize the triadic relationships of the Phaneron in a computer data structure, a SIGN-node is created to represent each aspect of the structure. To begin a SIGN-node is created for each sensor. Then as a data stream is recorded into the Phaneron, the sensors are recognized, and new SIGN-nodes are created to represent and record the sensor sequences.

Figure 1 shows the basic triad. The '1' K node is a representamen or the current K location in a Phaneron at a specific K node during the building of the K. The '2' K node is the next sensor recognized to be recorded into the sequence. The '3' K node is the K node created to represent the sequence '1' & '2'.
Bidirectional arrowed lines show that pointers in two K nodes reference each other.
It is important to note that there is never a direct connexion between the '1' K node and the '2' K node. Such a connexion confounds the structure.
Figure 2 shows a Phaneron recording of the word 'Tom'. The sensors are shown as small ovals at the bottom of the figure and the K structure resulting from the recognition of the sequence of sensors is shown by the solid and dotted lines. For convenience we'll establish a unique SIGN-node for the beginning of a sequence (BOT) and another for the end of a sequence (EOT).

Figure 3 shows the beginning of the recording of the word 'Thomas' into the K that already contains the recording of the word 'Tom'. The sequence 'BOT' and 'T' is already recorded and these K nodes will be reused, but as the 'h' is recognized (as the next character in the sequence), there is no structure to record an 'h' following the 'BOT' and 'T' sequence, and new structure must be created.
At this point in the creation of a Phaneron data structure it is obvious that the new sequence has never been recorded before. This unique attribute of being able to immediately recognize a new sequence can be used to validate the incoming sequence.
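The recording process walked through above can be approximated with a small trie-like sketch. This is a deliberate simplification for illustration and not the patented K structure: the class names, the `children` dictionary, and the `record` method are hypothetical. Each new node keeps a reference to the current location (the '1' node) and to the sensor (the '2' node), and itself plays the role of the '3' node.

```python
class KNode:
    """Simplified stand-in for a SIGN-node (hypothetical, for illustration)."""
    def __init__(self, case=None, sensor=None):
        self.case = case        # the '1' node: location reached so far
        self.sensor = sensor    # the '2' node: sensor just recognized
        self.children = {}      # sensor -> '3' node for the extended sequence
        self.count = 0          # how many times this sequence was recorded

class KStore:
    def __init__(self):
        self.bot = KNode(sensor="BOT")   # beginning-of-sequence node

    def record(self, word):
        """Record a word; return the index of the first sensor that required
        new structure, or None if the whole sequence was already recorded."""
        node = self.bot
        first_new = None
        for i, sensor in enumerate(list(word) + ["EOT"]):
            child = node.children.get(sensor)
            if child is None:
                # No existing structure for this sensor here: the sequence is new.
                child = KNode(case=node, sensor=sensor)
                node.children[sensor] = child
                if first_new is None:
                    first_new = i
            child.count += 1
            node = child
        return first_new

k = KStore()
first_tom = k.record("Tom")        # everything after BOT is new
first_thomas = k.record("Thomas")  # 'T' is reused, 'h' needs new structure
repeat_tom = k.record("Tom")       # fully recorded already
print(first_tom, first_thomas, repeat_tom)
```

In this sketch the return value of `record` marks the point at which new structure had to be built, which is the signal the text proposes to use for validating an incoming sequence.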
New variable sequences might be compared to a set of valid variable sequences. If the new sequence is not found in the set of variable sequences the incident would be logged for further analysis. It would also be possible to use the partial sequence and the sequence set to attempt to correct the variable and then log the event.
Another approach would be to correct the variable with a statistically most probable variable from the variables already recorded in the Phaneron data structure and then log the event for further analysis.
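One way the correction step might work is sketched below. The rule used here (prefer the recorded value sharing the longest prefix with the incoming value, breaking ties by recorded frequency) is an assumption chosen for illustration, not the method described in the patents, and the function and variable names are hypothetical.

```python
from collections import Counter

def shared_prefix_len(a, b):
    """Number of leading characters that two strings have in common."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

def suggest_correction(value, recorded, log):
    """Return `value` if already recorded; otherwise log the event and return
    the recorded value with the longest shared prefix, most frequent first."""
    if value in recorded:
        return value
    if not recorded:
        log.append((value, None))
        return None
    best = max(recorded, key=lambda r: (shared_prefix_len(value, r), recorded[r]))
    log.append((value, best))   # keep the incident for further analysis
    return best

# Hypothetical frequencies of values already recorded in the structure.
recorded = Counter({"Tom": 5, "Thomas": 2, "Tim": 1})
events = []
fix_a = suggest_correction("Thomsa", recorded, events)  # shares 'Thom' with 'Thomas'
fix_b = suggest_correction("Tx", recorded, events)      # tie on prefix, 'Tom' is most frequent
fix_c = suggest_correction("Tom", recorded, events)     # already recorded, nothing logged
print(fix_a, fix_b, fix_c, events)
```

Every suggested correction is appended to the log, so a later review can accept or reject it.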
Coordinating the error detection process with metadata and/or BI (business intelligence) rules makes the Phaneron structure a very effective tool for data cleansing.
[CP] Peirce, Charles S.: Collected Papers of Charles Sanders Peirce: 8 vols.: Vols. 1-6 ed. Charles Hartshorne and Paul Weiss: Vols. 7-8 ed. Arthur Burks: Cambridge: Harvard University Press, 1931-58
* UNISYS Patent References U.S. Patent No. 6,961,733 and
U.S. Published Patent Application Nos. 2004/0181547 A1, 2005/0076011 A1, and 2005/0165772 A1, all issued to Jane Campbell Mazzagatti and U.S. Published Patent Application No. 2005/0165749 A1, issued to Jane Campbell Mazzagatti, et al.
Department of Philosophy, University of Haifa
How Experimental Error is Discovered by Rational Belief Change Theory
The philosophical investigation of experimental error has greatly advanced in the last few decades. For various reasons, and broadly speaking, the philosophical investigation of error has concentrated on the specific and the descriptive, as opposed to the general and normative: philosophers are more likely to try to explain how and why errors that occurred in a specific experiment affected a specific scientific hypothesis, experimental design, or other aspect of the work of a certain scientist (or scientific community) in practice. They are less likely to propose theories about the essence or role of experimental error tout court.
Specifically, the recommendations
of theoretical rationality have not often been considered when investigating error.
The normative and general character of such theories seems to argue against
their applicability to such cases. For what useful advice could the general
desiderata of rational belief formation (or belief change) give us about
how scientists should behave (let alone how they do behave) in particular
instances?
I wish to argue against this view. Theoretical rationality (in particular, the issue of what, if any, information an agent must give up once falsehood in her corpus of belief is suspected) has quite a bit of relevance to how scientists should, and do, deal with experimental error. Very often, for instance, experimental error is stipulated in the first place as a possible explanation for the origin of the data an agent (usually a scientist) now suspects is false or inaccurate. Part of the reason for such a stipulation, as opposed to simply accepting that the experiment had disproved relativity or quantum mechanics, for example, is the adherence of scientists to the theoretical rationality principle of not giving up well-confirmed, informative theories without good reason.
This almost trivial observation differs in complexity, but not in kind, from cases I discuss in the poster. Scientists make complex decisions when looking for error: they decide, for example, what type of error to investigate as a potential ‘culprit’ in a particular case, how to correct for its influence once identified, and so on. A neglected aspect of these decisions, I argue, is the implicit commitment scientists have to principles of theoretical rationality. Making this commitment explicit, and noting whether scientists obey or violate these principles, often gives us insight into the way scientists treat error in particular cases. This is the issue discussed in my poster.
Department of Biological Sciences & The Mallinson Institute for Science Education
Western Michigan University
Kettlewell from an Error Statistician's Point of View
Bayesians and error statisticians have relied heavily upon examples from physics in developing their accounts of scientific inference. The present essay demonstrates it is possible to analyze H.B.D. Kettlewell's classic study of natural selection from Deborah Mayo's error statistical point of view (Mayo 1996). A comparison with a previous analysis of this episode from a Bayesian perspective (Rudge 1998) reveals that the error statistical account makes better sense of investigations such as Kettlewell's because it clarifies how core elements in the design of experiments are used to minimize erroneous inferences rather than dwelling on whether the strategies used are reasonable.
References
Mayo, D. (1996) Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.
Rudge, D.W. (1998) A Bayesian Analysis of Strategies in Evolutionary Biology. Perspectives on Science 6(4):341-360.
Department of Philosophy, University of Osnabrueck
How something works is most easily found out if it doesn't work: The methodology of learning from an object's errors, deficits, and malfunctions
Errors, deficits, and malfunctions, or rather, the systematic analysis of patterns of malfunction and function, can help in explanation and understanding of the things in which they are observed and their normal functioning. Based on the errors of a system, information about its structure, its function, its underlying mechanisms can be gained – often information of considerable amount and usefulness. In some cases and in some phases of the research process, more can be learned by focusing on errors than from the analysis of normal behavior alone.
Sometimes malfunctions can be the most advantageous or even the only feasible way of gaining knowledge about a system. Accordingly, the analysis of errors and malfunctions plays a most important role in scientific discovery – a role that is frequently employed across a wide range of scientific disciplines and problems. But errors are also essential in the evaluation of hypotheses: Any appropriate theory or model of a system not only has to explain the working or functioning of the system, but also has to account for the system's errors and malfunctions.
Of course, such methods do not work with any object, but only with objects error can be reasonably ascribed to, i.e., those where some function or success is usually, or at least sometimes, present – and can be altered, reduced, or lost. I shall argue that "error", "malfunction", "deficit", and similar concepts are most suitably characterized as absence, damage or alteration of a function, where function is best understood in terms of an object's causal role in a system, i.e., its Cummins function.
These findings motivate the following hypotheses to be discussed and evaluated: (a) Using the malfunctions of a system, one can gain scientific knowledge about that system, for instance about its structure, function, and mechanism. In many cases, this is particularly important, and sometimes unique, information. (b) The different methods that use error successfully to gain scientific knowledge form a transdisciplinary family of methods, a distinct methodology we might refer to as "error analysis". (c) Formulated as a piece of advice: Above all, analyze the errors committed by the objects of investigation, because it is precisely from these errors that one learns particularly much about the objects! Or, paradoxically: How something works is most easily found out if it does not work.
I demonstrate that in many areas errors, deficits, or malfunctions do play an important role in the discovery and evaluation of hypotheses. Examples include (a) linguistics, where the analysis of speech errors and slips of the tongue helps in the understanding of the mechanisms of speech production, (b) genetics and molecular biology, where spontaneous, but also induced mutations resulting from errors in the replication and translation of genetic information help in analyzing biological mechanisms and processes, whether it be biosynthesis of amino acids, mating ability of yeast, or many others, (c) sensory physiology, where the analysis of optical illusions facilitates gaining insight into the normal processing of visual information, (d) neuropsychology, where deficits resulting from brain damage help in finding out about the normal working of the brain, (e) psychology and cognitive science, where cognitive errors or illusions aid in elucidating cognitive mechanisms and biases, (f) behavioral research, where anomalies in animal behavior lead to the understanding of the mechanisms and the evolution of behavioral features, or (g) evolutionary biology, in which rudiments, atavisms and many other suboptimal features contribute to the establishment of the fact, the course and the causes of evolution. Even the tinkering with technical items, like the engine of a car, trying to provoke errors, can help enormously in understanding how it works.
I argue that the methods and strategies employed in these cases are sufficiently similar to justify their inclusion into a family of methods focusing on and analyzing error we might designate as "error analysis". This family of methods is analyzed, and its scope, its strengths and its weaknesses are discussed.
Based on case studies of empirical research and on the reconstruction of relevant scientific methodologies I conclude that error analyses constitute, in fact, valuable contributions to scientific research. They are successfully employed in identifying "transparent" systems (i.e., systems where invisibility is a prerequisite for good design, e.g., constancy mechanisms in perception), in decomposition and determination of the degree of modularity of a system, in localization of a system's components, in determination of functional differentiation, in establishing the order of components in a mechanism or pathway, and in identifying the causal connections that make up a mechanism.
In general, the strengths of error analyses include stimulation of research, facilitation of discovery, contribution of ideas and building blocks for theories, as well as generation of hypotheses about systems where other empirical data are hard to obtain. In certain cases, error analysis alone provides convincing arguments for or against certain models or hypotheses. Nevertheless, certain limitations hold: Error analysis requires a minimum of modularity in system structure, and in unfavorable circumstances, error analysis by itself cannot rule out jumps to false conclusions. Therefore, error analysis is best regarded not as a replacement, but rather as a valuable addition to the scientists' tool-box of methods, strategies, and heuristics.
Oxford Centre for Industrial and Applied Mathematics, Oxford University
From MUD to SEA: Using Error(s) to Improve and Interpret Nonlinear Models of Dynamic Systems
From physics to environmental science, our best models of dynamic systems are often nonlinear. I argue that in the dynamical systems context methodological underdetermination (MUD) is not a problem, while severe empirical adequacy (SEA) trials are likely to rule out all our models, given observational data over any interesting duration. SEA trials of iota-shadowing are of near maximum severity ($1 - \epsilon$) and our best models, whether of electric circuits or of the weather, fail robustly; models of the climate system fail much less severe tests. Nevertheless, all these models are useful; but useful how, exactly? And why, given that accountable probability forecasts are beyond their reach?
Within the mathematical fiction of the Perfect Model Scenario the use of severe tests swiftly guides the Bayesian Way towards the correct answer; but here we can prove that for chaotic models, if the model class at hand does not contain a model diffeomorphic to the system which generated the data, then (a) all our models will with probability one fail SEA trials while (b) the Bayesian Way flounders in nonsense. In short, the denominator in Bayes' Rule, P(obs|Information), approaches zero with astounding speed.
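For reference, the denominator in question can be written out explicitly. The notation here is an assumption made for illustration (a discrete model class $\{M_j\}$ and background information $I$), not the author's own formulation:

$$
P(M_i \mid \mathrm{obs}, I) = \frac{P(\mathrm{obs} \mid M_i, I)\, P(M_i \mid I)}{P(\mathrm{obs} \mid I)},
\qquad
P(\mathrm{obs} \mid I) = \sum_j P(\mathrm{obs} \mid M_j, I)\, P(M_j \mid I).
$$

On this reading, if every likelihood $P(\mathrm{obs} \mid M_j, I)$ shrinks rapidly as observations accumulate, the sum shrinks with it, and the posterior becomes a ratio of two vanishing quantities.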
In practice, where we are (almost certainly) outside the Perfect Model Scenario, SEA trials are shown to point towards physical insight and model improvement. They also highlight rather basic questions on (i) the various species of error (noise, uncertainty, inadequacy, and stochasticity), (ii) how scientists can inform policy and provide decision support, and (iii) claims for the (empirically obvious) advancement of science.
Department of Philosophy, University of British Columbia
Expert Knowledge vs. Quantitative Methods: P-value Fallacy in Epidemiology
Recently, we have seen an increased dissatisfaction among clinical physicians, more specifically epidemiologists, over the proper interpretation of quantitative results and the appropriateness of research methods, including statistical techniques for drawing conclusions (Berger 1987, Feinstein 1998, Goodman 1999, Sterne 2003). Among the reasons for dissatisfaction we find the widespread misunderstanding of the nature of statistical significance. While acknowledging that the founders of statistical inference did not intend it, critics claim that p-values are commonly misunderstood. This misunderstanding is known as the “p-value fallacy”: the misguided idea that a single value can capture both the long-run outcomes of an experiment and the evidential support of a single result. This is supposedly due to the adoption of a classical-hybrid statistical inferential framework based on Fisherian methods (e.g., confidence intervals and the idea of p-values as an index measuring the strength of evidence against the null hypothesis) and Neyman-Pearson methods (type I and II errors and decision rules, fixed in advance, for interpreting the results of an experiment, with the result amounting to the rejection or acceptance of the null hypothesis, so that inductive behavior is adjusted to minimize errors in the long run).
This classical-hybrid statistical framework is said to produce, among other maladies, (i) an automaticity in interpreting medical research results, (ii) publication bias, i.e., a tendency to accentuate positive results, and (iii) potentially contradictory results, since the very nature of the framework produces “false alarms” by chance alone.
A popular response has been the promotion and adoption of an approach derived from Bayesian statistical methods. This approach is supposed to allow for measuring the weight of quantitative evidence via an index (Bayes factor or likelihood ratio), thus allowing the integration of statistical summaries with expert field knowledge that is purported to lead to a “better understanding of the role of scientific judgment” in the interpretation of epidemiological research. This approach reflects the idea that experiments, including statistical techniques for drawing conclusions, should be seen as part of a bigger framework based on decision-making principles.
This poster will explicate the conflicting epistemological views and consequences of an account of scientific evaluation when experiments are seen as decision-making devices, in contradistinction to devices providing for reliable statistical inference for further learning. It will suggest that, while agreeing with the idea that the classical-hybrid statistical framework must be supplemented, Bayesian approaches may have some place, possibly during the planning of an investigation, but certainly not during the inferential stages of experimental findings and/or promotion of anomalies for further investigation.
Christopher Tomanek
Institute of Sociology, Jagiellonian University
How Philosophical Decisions Shape Social Knowledge: On some Experimental Errors in Sociology
Observing contemporary debates on the role that Bayesian and error statistical methods play in scientific investigations, one might come to the conclusion that two different and incomparable ways of scientific enquiry exist. One cannot be just a little bit Bayesian, as Deborah Mayo has noted [D.G. Mayo and M. Kruse 2001]. Even though sociologists have discovered some basic epistemological distinctions, they are still strongly convinced that there is one sociological knowledge and one way to achieve it. One common belief, among others, is that methodological and statistical models are identical. In sociology, theories built according to the Bayesian approach are usually used for so-called second-genus theories, while error statistical models serve as the origin of third-genus theories. Despite the differences in theory building between these strategies, there is still hope. Ideas for a more systematic approach in sociology exist; one quite convincing example is a Scientific Research Program devoted to Group Processes, especially one using the Exchange Networks concept. Yet, like any other program, it is not error-free. Quite simple analyses show how a concept came out of an error in a simple statistical investigation not bound to any sociological discussion (an instance of a frequentist approach, so to speak). This example indicates how a "dogmatic coma" may lead researchers to overlook social behavior that is interesting from a sociological viewpoint (e.g., ways of learning, social orientation) and that has not so far been taken into consideration in experimental design. The poster also discusses a criterion of scientific rigor in sociological modeling techniques, and the accuracy of the distinction between methodological and theoretical models pointed out by Skvoretz [Skvoretz J. 1998]. This poster paper investigates the example of the Bargaining idea in Exchange Networks.
References
Skvoretz J., (1998) Theoretical Models: Sociology’s Missing Links, in: Sica A., What Is Social Theory? The Philosophical Debates, Malden.
Mayo, D. G. and M. Kruse (2001), "Principles of Inference and their Consequences," pp. 381-403 in D. Corfield and J. Williamson (eds.) Foundations of Bayesianism, Kluwer Academic Publishers, Netherlands.
Dr. Eric Walker
NASA Langley Research Center
Physical Insight from Error Analysis of a Mechanistic Model: An example from the National Transonic Facility
When testing small-scale aircraft or other vehicle configurations in a wind tunnel, the airflow around the test article is constrained due to the test section walls or boundaries. This constraint of airflow, not present in flight, can have a dramatic impact on the estimated performance of the test article. The effect of the boundaries must be modeled and proper adjustments made to acquired data. When testing near the speed of sound, it is necessary to ventilate the walls. This ventilation adds considerable complexity to the mathematical modeling of the wall effect.
Historically, approximate models for the wall boundary condition have been constructed based on the physical properties of airflow measured near a ventilated wall. These mechanistic models have parameters that must be determined for each facility. To implement these wall boundary models, it is necessary first to calibrate and then to validate them.
For the National Transonic Facility at NASA Langley Research Center, a hierarchy of three linear models has historically been used to represent the type of wall used. Two of these models represent the effect of the wall by using a single physical mechanism, which is assumed to be dominant. The third model is a combination of the two physical mechanisms represented by the first two models.
Calibration was performed for each of the three boundary conditions by modeling and minimizing a measure of the error between measured quantities on the walls in the wind tunnel and computational predictions of those quantities using various values of the parameters in the boundary condition. The purpose of calibrating the models is to give them the best possible chance of representing the acquired data. Calibration is also necessary in this case to allow model discrimination and validation comparisons to be made.
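The general idea of calibrating by minimizing an error measure over trial parameter values can be sketched in a few lines. This is a toy illustration only, not the procedure used at the National Transonic Facility: the sum-of-squared-residuals measure, the one-parameter predictor, the data, and the names `calibrate` and `toy_prediction` are all assumptions.

```python
def calibrate(measured, predict, candidates):
    """Return the candidate parameter value with the smallest
    sum of squared residuals between measured and predicted values."""

    def sum_squared_error(param):
        predicted = predict(param)
        return sum((m - p) ** 2 for m, p in zip(measured, predicted))

    return min(candidates, key=sum_squared_error)


# Hypothetical one-parameter predictor: prediction = k * x at fixed stations.
stations = [1.0, 2.0, 3.0]


def toy_prediction(k):
    return [k * x for x in stations]


measured = [2.1, 3.9, 6.0]           # made-up "wall measurements"
candidates = [1.0, 1.5, 2.0, 2.5]    # trial values of the parameter

best = calibrate(measured, toy_prediction, candidates)
```

Running the same routine once per candidate model gives each model its best-fitting parameters before the models are compared against each other.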
Model comparisons are made in the presence of measurement error in an attempt to discriminate among the models. In this particular case, it was found that the three wall boundary condition models gave detectably different results. Upon closer inspection, it was found that differences between measured and computed quantities were significantly lower for the model which included both physical mechanisms. This evidence supported previous work for this particular type of wall ventilation that both physical mechanisms are necessary to represent the physical presence of these boundaries.
It was also important to determine the inference domain over which the model is applicable. To determine the domain boundaries, models were severely tested to a breaking point. The boundary of the domain of applicability should occur where the error being generated from the model is no longer tolerable for the application. Validation comparisons are used to help determine the domain of applicability for these models. Model validation comparisons also provide insight into the epistemic (mostly systematic) uncertainty of the models by comparing them in the presence of aleatory (mostly random with some fossilized uncertainty) experimental uncertainty.
For this example, the more general of the models better explained the data. Evidence for this statement came both from the analysis of error between the measured and computed results and from the comparison of independent cases, which were expected to give the same result provided the wall boundary condition model is appropriate. Additionally, the validation comparison for this model indicated that significant model form error existed. Part of the domain where model form error is significant is believed to reflect a potentially fixable aspect of the formulation of the problem. It is also believed that an error indicator can be developed such that appropriate use of the model can be enforced and quality results can be achieved.