Links to PDFs of papers and PowerPoint presentations are
provided below the title of each paper as they become available.
Thomas Bartz-Beielstein
Department of Computer Science, University of Dortmund
thomas.bartz-beielstein@udo.edu
NPT* in Evolutionary Computing
PDF of Paper Powerpoint Presentation
Evolutionary computing (EC) is a relatively new discipline in computer
science (Eiben & Smith, 2003). It tackles hard real-world optimization problems,
e.g., problems from chemical engineering, airfoil optimization, or bioinformatics,
where classical methods from mathematical optimization fail. Many theoretical
results in this field are too abstract to match reality. To
develop problem-specific algorithms, experimentation is necessary. During
the first phase (before 1980), which can be characterized as "foundation
and development," the comparison of different algorithms was mostly based
on mean values; hardly any further statistics were used. In the second
phase, where EC "moved to mainstream" (1980-2000), classical statistical
methods were introduced. There is a strong need to compare EC algorithms to
mainstream mathematical optimization methods. Adequate statistical tools
for EC have been under development in the third phase (since 2000). They should be able
to cope with problems like small sample sizes, strange distributions, noisy
results, etc.
However, even if these tools are under development, they do not bridge the gap between the statistical significance of an experimental result and its scientific meaning. Based on Mayo's learning model (NPT*), we will propose some ideas on how to bridge this gap (Mayo, 1983, 1996). We will present plots of the observed significance level and discuss the sequential parameter optimization (SPO) approach. SPO is a heuristic but implementable approach that provides a framework for a sound statistical methodology in EC (Bartz-Beielstein, 2006).
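As a rough illustration of the observed significance level plots mentioned above, the following Python sketch tabulates the probability of seeing a performance difference at least as large as the observed one, for a range of hypothesized true differences. It assumes a normally distributed difference with a known standard error; the function names, the normal approximation, and the numbers are illustrative assumptions and are not taken from Bartz-Beielstein (2006).

```python
import math


def observed_significance_level(d_obs, delta, std_err):
    """P(D >= d_obs) when the true difference is delta.

    Assumes D ~ Normal(delta, std_err**2). This is an illustrative
    simplification, not a method stated in the abstract.
    """
    z = (d_obs - delta) / std_err
    # Upper tail of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def osl_curve(d_obs, std_err, deltas):
    """Tabulate the observed significance level over hypothesized differences."""
    return [(delta, observed_significance_level(d_obs, delta, std_err))
            for delta in deltas]


if __name__ == "__main__":
    # Hypothetical numbers: algorithm A beats B by 2.0 on average, std. error 1.0.
    for delta, osl in osl_curve(2.0, 1.0, [0.0, 0.5, 1.0, 1.5, 2.0]):
        print(f"delta = {delta:.1f}  observed significance level = {osl:.3f}")
```

Plotting the second column against delta shows for which hypothesized differences the observed result would be unsurprising, which is one way to relate a significant result to the size of the effect it supports.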
References
Bartz-Beielstein, T. (2006). Experimental Research in Evolutionary Computation— The New Experimentalism. Berlin, Heidelberg, New York: Springer.
Eiben, A. E. & Smith, J. E. (2003). Introduction to Evolutionary Computing. Berlin, Heidelberg, New York: Springer.
Mayo, D. G. (1983). An objective theory of statistical testing. Synthese, 57, 297-340.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Chicago IL: The University of Chicago Press.
Rodolfo de Cristofaro
University of Florence
decrist@ds.unifi.it
Foundations of the ‘Objective Bayesian Inference’
PDF of Paper (revised 6/8/06) Powerpoint Presentation (revised 5/4/06)
This article goes to the foundations of statistical inference through a review of Carnap's logical theory of induction. From this point of view, it brings another solution to the attention of the scientific community in the light of certain recent results and, in particular, shows how it relates to a new formulation of the Principle of Indifference. As is known, the idea of a probabilistic logic of inductive inference based on some form of the Principle of Indifference has always retained a powerful appeal. Keynes recommended a modified version of the principle in order to achieve this aim. Carnap followed Keynes in this attempt to create a purely logical theory of induction. However, up to now all modifications of the principle have failed. In this paper a modified version of the Principle of Indifference is provided that does not generate paradoxes or inconsistencies. In addition, a general criterion for assigning prior probabilities in the case of initial ignorance is suggested, thus providing a reply to the objections to so-called objective Bayesianism. It is worth noting that this article has 'new' (i.e., not previously published) results that we think will be found interesting and useful. Because of these results, the thesis that probabilities cannot be logical quantities, determined in an objective way through some form of the Principle of Indifference, is no longer supportable. Finally, the paper compares the analytical solution to the problem of statistical induction with the pragmatic or synthetic solution based on decision theory. The conclusion is that each solution is legitimate and autonomous within its respective axioms. In particular, the analytical solution needs no justification. It suffices to be ready to accept the assumed probability axioms.
Department of Fish and Wildlife Resources and Department of Statistics
University of Idaho
Keeping the faith: how prior beliefs can become data resistant
'Nuisance parameters' have long posed a thorny conceptual challenge in statistics. The presence of nuisance parameters complicates, and sometimes even ruins, estimation of parameters of interest. Advocates of Bayesian statistical methods are fond of touting the elegance and ease with which the Bayesian approach handles nuisance parameters.
Here I point out a seldom noted cost incurred in the Bayesian solution. With a simulation of a classical problem, I show that a prior distribution for a parameter of interest can become alarmingly data-resistant in the presence of a nuisance parameter. The problem is the familiar one of estimation of the mean in a normal distribution with unknown variance (the nuisance). In this instance, an 'uninformative' prior for the nuisance parameter transforms prior beliefs effectively into persistent superstitions.
I conclude with a call for greater attention to, and the systematic study of, the types and frequency of errors that can be committed when conducting inferences in the Bayesian framework.
Frederick Eberhardt
Department of Philosophy
Carnegie Mellon University
Conflicts in Sequences of Experiments
PDF of Paper (Revised 1 Jun 2006) Powerpoint Presentation (Revised 1 Jun 2006)
The work I present will focus on causal discovery using interventions. There is a large literature on causal discovery based on passive observational data, but the search for causal structure under these circumstances is limited to Markov equivalence classes (assuming the search is only based on conditional independence relations). Statisticians have used interventions for causal discovery at least since Fisher introduced randomized trials in the 1930s. However, the focus in statistics has been on the estimation of parameters that are supposed to represent the causal strength, and consequently a distinction between potential cause and effect is presumed prior to any experiment. This form of structural constraint was maintained even when the framework was extended to multiple treatment and effect variables in the experimental design literature on Latin Squares and Graeco Latin Squares. Consequently, the optimization problem focused on an optimal value assignment to the treatment variables.
In the framework of causal Bayes nets, no such separation is presumed a priori.
Any variable could potentially be a cause of any other variable. How then,
should interventions be placed in order to recover the causal structure? The
first thing to realize is that the problem cannot be solved with one
intervention, rather a sequence of experiments is required and the discovery of
the causal structure depends both on the (statistical) success of each
experiment in the sequence, as well as being able to combine results from
different experiments in such a way that any causal structure can be recovered
given the assumptions (such as faithfulness, causal Markov, possibly causal
sufficiency).
In Eberhardt et al. we provided worst case bounds on the number of experiments
necessary and sufficient to discover the causal structure assuming causal
sufficiency. However, we did not address the statistical aspects of such a
discovery procedure. In particular, I will now focus on the problem that arises
for the researcher if two experiments in the sequence provide conflicting
results, and suggest solutions (and their problems). I will, furthermore,
point out how causal discovery in sequences of experiments depends on the
results from discovery based on passive observational data.
The results I will show aim at the development of an account of efficient
causal discovery using interventions. But the questions that will arise relate
very closely to questions in meta-analysis and questions about the interplay
of assumptions that are made about the experimental setting and the interventions
that are performed.
Dr. Malcolm Forster
University of Wisconsin-Madison
PDF of Paper (revised 7/4/06)
Abstract: The Likelihood Theory of Evidence (LTE) states that the observed data are relevant to the evidential comparison of hypotheses (or models) only through the likelihoods of the simple hypotheses involved. LTE is closely related to the Likelihood Principle, which is famous in the foundations of statistics. The paper describes examples in which one can tell which of two hypotheses is true from the full data, but not from the likelihoods alone. The examples demonstrate the power of alternative forms of scientific reasoning, such as the consilience of inductions described by William Whewell in 1858. The conclusion is that any philosophy of science based on LTE is limited in its scope, including current Neyman-Pearson, Bayesian, and Likelihoodist approaches.
Dr. Thomas Kepler
Division Chief, Computational Biology
Department of Biostatistics and Bioinformatics, Duke University
Whither Statistics on Biology's Wings?
The human genome project has been completed (as have the genome projects for more than one hundred other organisms) and biology has begun its ascendancy. Where physics aimed to distill data down to its essence (an elegantly simple law, perhaps), biology has no such pretensions. Shortly after the death of Newton, who found that two equations could represent the planetary motions, Linnaeus could do little more than bring a modicum of order into the riot that is the living world. Today, biology has risen, not by emulating physics and mathematics, but by utilizing physics and mathematics in the development of measurement technologies that remain faithful to the biological perspective. High-throughput assays developed alongside the genome projects now permit the simultaneous measurement of the expression levels of all of a cell’s genes, or the states of all of its proteins, or the concentrations of all of its metabolites. The Systems Biology that is emerging in the wake of these developments is fundamentally concerned with the recording and synthesis of as much information as possible from a given process, with all its attendant variation, and relatively less concerned with the isolation of simple subsystems through the suppression of variability judged a priori to be extraneous.
This changing landscape promises enormous opportunities and challenges for statisticians (as well as for philosophers who seek to understand the underpinnings of the scientific enterprise), but the statistical methods that will be most influential may bear closer resemblance to exploratory data analysis than to hypothesis testing. I will argue that where once the statistician strove toward data reduction, she should now focus on data representation; that instead of reducing error in search of truth, we ought now to transform variability in search of salience.
Dr. Subhash Lele
Department of Mathematical and Statistical Sciences
University of Alberta, Edmonton, Alberta T6G 2G1, Canada.
slele@ualberta.ca
On quantifying evidence in the presence of nuisance
parameters: Evidence functions and their applications in ecology.
A major goal of any statistical analysis is to quantify evidence in the data.
The law of the likelihood (Hacking, 1965; Royall, 1997) suggests that the
likelihood-ratio function is an appropriate measure of the strength of evidence
when comparing two simple hypotheses. Many practical situations, however,
require quantification of evidence for a single parameter of interest in the
presence of many nuisance parameters. The concept of evidence functions (Lele,
2004) formalizes the concept of evidence in terms of an estimating function
instead of the likelihood function. In this paper, I show how this extension
can be successfully used to quantify evidence in the presence of nuisance
parameters. I will discuss two ecological situations. In the first case, I
will discuss how to combine evidence from several 2 x 2 tables using the Mantel-Haenszel
estimating function with an application in conservation biology. In the second
case, I will discuss quantification of the evidence for relative risk and
log-odds ratio in the context of the effect of hunting on the survival probabilities
of bears. Ramifications of nuisance parameters in the computation of error
probabilities such as the probability of misleading evidence and the probability
of weak evidence will be discussed.
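For readers unfamiliar with the Mantel-Haenszel estimating function mentioned above, the following Python sketch shows its standard textbook form for a common odds ratio across several 2 x 2 tables, together with the closed-form root. The function names and the example counts are hypothetical, and the sketch does not show how evidence functions are built from it in the paper.

```python
def mantel_haenszel_function(psi, tables):
    """Mantel-Haenszel estimating function for a common odds ratio psi.

    Each table is a tuple (a, b, c, d): a and b are the counts in the
    first row, c and d the counts in the second row.
    """
    return sum((a * d - psi * b * c) / (a + b + c + d)
               for a, b, c, d in tables)


def mantel_haenszel_odds_ratio(tables):
    """Root of the estimating function: the pooled odds-ratio estimate."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    if denominator == 0:
        raise ValueError("estimate undefined: every table has b * c == 0")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two strata (made up for illustration only).
    strata = [(10, 5, 4, 8), (6, 3, 2, 9)]
    psi_hat = mantel_haenszel_odds_ratio(strata)
    print("pooled odds ratio:", psi_hat)
    print("estimating function at the root:",
          mantel_haenszel_function(psi_hat, strata))
```

The estimate depends on the tables only through the cross products and table totals, so the stratum-specific baseline rates (the nuisance parameters) never have to be estimated.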
Dr. Wendy Parker
University of California, San Diego
Computer Simulation through an Error-Statistical Lens
When can scientists take computer simulation results to be good evidence for some hypothesis about the natural world? This is a core question that any adequate epistemology of computer simulation must address. Nevertheless, philosophers have made little progress on this question up to now. In this paper, I consider how it would be answered by an epistemology of computer simulation that is modeled on Mayo's (1996) error-statistical epistemology of experiment.
Taking an error-statistical approach, we are led to reformulate our core question as follows: What warrants our taking a computer simulation to be a severe test of some hypothesis about the natural world? That is, what warrants our concluding that the simulation would be unlikely to give the results that it in fact gave, if the hypothesis of interest were false? An error-statistical approach counsels us to supply arguments from error, just as in the case of traditional experiments. Thus, we need some account of the canonical errors that can arise in the context of computer simulation, as well as strategies for probing for their presence. I identify such errors as well as some of the lower-level testing procedures that simulation modelers might use to check for them.
This leads me to two additional questions. First, can we ever be warranted in concluding that some computer simulation constitutes a severe test of an hypothesis about the real-world system being modeled? I identify and examine some situations that seem promising in this regard, i.e. in which it seems that we would be warranted in drawing such a conclusion. Second, will formal statistical tools and analysis make the same contribution in the epistemology of computer simulation as Mayo suggests they make in the epistemology of experiment? While it is beyond the scope of the paper to address this matter in detail, I offer some preliminary remarks and argue that even if only informal arguments from error typically can be made, a severe-testing approach to the epistemology of computer simulation would help bring about some important and healthy changes in the way that scientists think about computer simulation modeling.
Dr. John T. Roberts
Department of Philosophy
University of North Carolina, Chapel Hill
Coping With Severe Test Anxiety: Problems and Prospects for an Error-Statistical Approach to the Testing of High-Level Theories
This paper raises a problem for the error-statistical philosophy of science (henceforth “ESPOS”) defended by Mayo (1996), argues against Mayo’s proposed solution, and tentatively suggests a way of revising ESPOS that retains its core commitments while providing hope for a solution.
The core commitments of ESPOS are that all scientific evidence takes the form
of severe tests, and that severity requires low error probabilities. I examine
the question of whether the basic commitments of ESPOS are compatible with
a satisfactory account of the experimental testing of high-level theories.
I argue that Mayo’s arguments for the affirmative are unconvincing:
Not only are severe tests of high-level theories impossible, but the strategies
Mayo proposes for learning about high-level theories via severe tests are
not promising. I then propose a way of extending ESPOS to make possible a
satisfactory treatment of the testing of theories.
I discuss in some detail the case of testing of general relativity using the
PPN formalism. Mayo has suggested that this example shows how piecemeal probing
for errors can help scientists acquire evidence for high-level theories that
meet the standards imposed by ESPOS. I argue that even in this example, such
piecemeal testing might be extremely valuable but it cannot result in a severe
test of a high-level theory. Since ESPOS identifies evidence with the passing
of severe tests, this means it cannot allow that there is any evidence for
a high-level theory.
However, I think that the core idea behind ESPOS can be saved from this sort
of problem, if it is combined with an idea introduced by Glymour (1980) and
defended by Longino and others: that of relativizing claims about evidence
or confirmation to a background theory. Adding this idea to ESPOS, we can
relativize the severity of a test to a background theory, and then retain
the core commitments of ESPOS in a relativized form: All scientific evidence
relative to background B takes the form of severe tests relative to B, and
severity relative to B requires that B implies that the error probabilities
are low. The obvious worry now is that our account of evidence will be trivialized,
since it seems plausible that you can generate a severe test of just about
any hypothesis you like so long as you carefully choose the right background
theory. We might suppose that we have genuine scientific evidence for a hypothesis
only when we have a severe test of that hypothesis relative to a background
theory that is itself well-supported by the evidence—but this takes
for granted a notion of evidential support, which is what ESPOS seeks to provide.
I end the paper by suggesting a criterion of adequacy for background theories
that addresses this worry. This criterion is that the background theory used
must make possible multiple, independent measurements of quantities that could
not be measured without the use of the theory. It is plausible that a severe
test relative to a background theory satisfying this criterion provides impressive
evidence for the hypothesis under test, especially when the tests whose severity
is in question constitute measurements of the quantities in question.
Dr. Kent Staley
Department of Philosophy, Saint Louis University
staleykw@slu.edu
Error-statistical Theory Assessment and Alternative Hypothesis Problems:
A Role for Judgments of Plausibility?
The error-statistical theory of evidence contends that data constitute evidence in support of a hypothesis only on the condition that the hypothesis passes a severe test with those data, where a severe test is understood to require that the probability of the hypothesis erroneously passing is quite low. Because the error statistical account sets forth requirements that are more stringent than those found in many competing views of evidence, it is able to provide useful insights into numerous questions concerning experimental science.
Nonetheless, even friends of the account see the strength of error statistics as most clearly exhibited when it is applied to relatively low-level phenomenal hypotheses. The ability of error statistics to shed light on experimental reasoning about “high-level” theories is an area that remains largely unexplored. This paper seeks to shed light on how error statistics does and does not apply to high-level (i.e., very general and relatively fundamental) theories in physics.
The key to understanding how the error-statistical theory can apply to high-level theories lies in understanding how it treats alternative hypothesis problems generally. Thus, I begin with a discussion of Mayo's treatment of alternative hypothesis objections. Drawing upon and generalizing an argument from John Roberts as presented in this same workshop, I show how Mayo's discussion fails to vindicate the error-statistical treatment of alternative hypotheses. I then offer my own friendly amendment to Mayo's account that I claim will facilitate the error-statistical treatment of alternative hypothesis problems.
I argue that if, in order to have error-statistical evidence for a hypothesis, it is necessary that the hypothesis be tested severely against all logically possible alternatives, then error statistics lacks the resources to answer alternative hypothesis objections. To meet the challenge of alternative hypothesis objections (and hence avoid an extreme form of skepticism about science), some way of limiting the range of relevant alternative hypotheses is needed. I propose that it suffices, in testing a hypothesis severely, to test it against all competing hypotheses that are not implausible, where an implausible hypothesis is understood as a hypothesis that is almost certain to fail any genuinely informative test (as opposed to a test that passes a hypothesis without regard to its truth or falsity).
Having then considered how alternative hypothesis problems arise for error-statistics in general, I show how the same problem arises in the context of Mayo's proposal for using error-statistical learning for high-level theories. Her proposal is that by combining results of low-level severe tests, we can in some cases “squeeze theory space,” and thus severely test classes of parametrically-related theories. Building upon John Roberts’ criticism of this proposal, I offer a different perspective on the difficulty, showing how this is the same difficulty encountered with regard to alternative hypotheses in general.
The strategy proposed for dealing with alternative hypothesis objections more
generally applies here, but now with somewhat different results. Extending
the existing discussions by Mayo and Roberts of the case of theories of gravity,
I show how plausibility judgments of just the sort my account calls for have
played a role in the experimental testing of theories of gravity. More generally,
I claim that, while plausibility judgments play a role in testing high-level
theories that is substantially the same as their role in low-level tests,
the justification for claiming to have considered all theories that are not
implausible in the context of high-level testing is typically much weaker.
The account of error statistical testing facilitated by plausibility judgments has a number of advantages:
(1) That plausibility judgments do play a role in experimental reasoning seems undeniable. On the one hand, a pure error-statistical approach would seem to offer no positive role for such judgments. On the other hand, Bayesian accounts give them too much of a role. In my account, plausibility judgments are needed to “clear the ground” so as to make error-statistical reasoning possible, but do not figure in the assessment of the evidential import of data.
(2) Such an account conceptually unifies the treatment of low- and high-level hypotheses while also explaining why it is typically appropriate to be much more cautious in our claims regarding even very successful high-level theories than in our commitment to well-tested low-level hypotheses.
(3) On the present account, we can choose between two modes of thinking about evidence with different strengths and weaknesses. We can make our evidence statements relative to the ground-clearing assumptions needed to facilitate severe testing, which will make those statements more secure against being in error, but also make them weaker in content. Alternatively, we can make our evidence statements categorical, by taking those ground-clearing assumptions to be true, but at the expense of making our claims less secure (in a sense elsewhere discussed by the author).
Dr. Andrew Ward
Dr. Pamela Jo Johnson
University of Minnesota, State Health Access Data Assistance Center & Minnesota Population Center
Specification and Confounding Errors When Using Non-Experimental, Observational Data to Make Causal Inferences
In the tradition of Emile Durkheim and Max Weber, social epidemiologists are interested in understanding the relation of social facts to health facts. In their recent book, Is Inequality Bad for Our Health?, Daniels, Kennedy, and Kawachi make the following claim: “To act justly in health policy, we must have knowledge about the causal pathways through which socioeconomic (and other) inequalities work to produce differential health outcomes.” One of the central problems with this social epidemiological casting of the appropriate approach to the creation and evaluation of just health policy is its dependence on “knowledge about the causal pathways.” Increasingly, epidemiologists, as well as philosophers of science, have adopted some version of a counterfactual approach to causal relations. For example, suppose that the goal of inquiry is to discover whether some treatment has a causal effect on a treated population. Using a counterfactual framework, this amounts to asking whether there is a difference in outcomes between a population given the treatment and that same population if, instead, they had not been given the treatment. This difference between the outcomes in exposed versus the unexposed populations is the “causal contrast”. However, as this simple example makes clear, for those engaged in empirical research there is an especially important problem with the counterfactual approach. The problem is that it is not possible to observe counterfactual populations. In the example, it may be possible to observe the population given the treatment or the same population not given the treatment, but it is not possible to observe both. 
Thus, if we think of the population not given the treatment as the “counterfactual population”, there is a need to provide criteria that, if satisfied, will identify an observable substitute population for the counterfactual population that is similar enough to the counterfactual that its use in expressing the causal contrast is justified. If one chooses the wrong substitute, the result is confounding. In the words of Maldonado and Greenland, confounding “is present if our substitute imperfectly represents what our target would have been under the counterfactual condition.” For social epidemiologists and others interested in using non-experimental, observational data to make causal claims, the important question is whether such criteria must involve random assignments. If so, then we seem back to saying that there can be no knowledge about the causal pathways (used in making just health policies) based on non-experimental, observational data.
Acknowledging that “good observational studies are designed, not found,” we
propose two alternatives to the standard randomized experimental method, each
of which permits epidemiologists to make warranted causal claims using observational
data. The first alternative to an experimental study design that addresses
the problem of confounding errors is the combined use of a counterfactual
framework, explicit causal contrasts, and propensity score matching methods.
Using the randomized control trial as our gold standard, we will describe
how this proposed quasi-experimental approach to non-experimental data permits
the social epidemiologist to mimic closely the virtues of an experimental
study design. Specifically, in the perfect (treatment/control) randomized
trial, all subjects have a treatment allocation probability (propensity score)
of 0.50, which represents random treatment assignment and the expectation
of balance in both observed and unobserved/unobservable covariates across
treatment groups. Thus, in the perfect (treatment/control) randomized trial,
treated and untreated (counterfactual) subjects are exchangeable, except for
the treatment. Because treated and untreated (counterfactual) subjects are
exchangeable, it follows that it is possible to reverse treatment assignments
without affecting the effect estimates. Of course, in the non-experimental
world of the social epidemiologist, there are typically no random allocations
of social conditions (exposures); the exposed and unexposed are usually systematically
different. Yet, we still desire to assess the putative social causes of ill-health
effects. The proposed alternative approach frames causal questions as explicit
causal contrasts for which we must construct exchangeable groups of exposed
and unexposed (counterfactual) subjects. Claiming that exchangeability (non-confounding)
is the crux of causal inference within a counterfactual framework, we argue
that propensity score matching methods are a useful tool for constructing
and assessing the exchangeability of populations, thus minimizing the potential
for confounding errors with observational data. Further, this study design
makes transparent when there are in fact no data from which to construct an
appropriate counterfactual substitute. We also highlight 'errors' that this
approach alone (just like randomized trials) cannot guarantee to overcome.
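The matching step described above can be sketched as follows. This minimal Python example assumes the propensity scores have already been estimated by some other means, and it uses greedy one-to-one nearest-neighbour matching with a caliper; the function names, the caliper value, and the data are hypothetical choices for illustration, not the authors' procedure.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated and controls are lists of (propensity_score, outcome) tuples.
    Returns (pairs, unmatched): pairs holds (treated_unit, control_unit)
    tuples; unmatched holds treated units with no control within the caliper.
    """
    available = list(controls)
    pairs, unmatched = [], []
    for unit in treated:
        # Closest remaining control on the propensity score, if any remain.
        best = min(available, key=lambda c: abs(c[0] - unit[0]), default=None)
        if best is None or abs(best[0] - unit[0]) > caliper:
            unmatched.append(unit)  # no acceptable substitute in the data
            continue
        available.remove(best)  # each control is used at most once
        pairs.append((unit, best))
    return pairs, unmatched


def matched_contrast(pairs):
    """Mean outcome difference (treated minus control) over matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs: contrast is not estimable")
    return sum(t[1] - c[1] for t, c in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical (score, outcome) data, made up for illustration.
    exposed = [(0.30, 5.0), (0.60, 7.0), (0.95, 9.0)]
    unexposed = [(0.28, 4.0), (0.62, 5.0), (0.10, 1.0)]
    pairs, unmatched = match_on_propensity(exposed, unexposed)
    print("estimated contrast:", matched_contrast(pairs))
    print("exposed units without a substitute:", unmatched)
```

The list of unmatched units is the practical counterpart of the transparency point made above: exposed subjects with no comparable unexposed subject are reported rather than silently extrapolated.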
The second alternative to an experimental study design that addresses the
problem of confounding errors is the combined use of a counterfactual framework,
explicit causal contrasts, and structural equation models (SEMs). To use SEMs,
it is first necessary to construct a structural model (a system of equations
defined over a set of random variables) that captures the purported causal
relation. Thus, while it is possible to use SEMs to test premises of models,
it is not possible to use SEMs to develop models. We use SEMs to formalize
conjectures about causal relations, where those conjectures come from a combination
of theory and data analysis. The important point here is that the starting
point for structural equation modeling is not (directly) counterfactual. SEMs
divide variables into exogenous variables and endogenous variables. Exogenous
variables provide the fixed background of assumptions against which it is
possible to test causal claims about the endogenous variables. Once the structural
model is constructed, we create a sub-model in which the distribution of the
endogenous variable conjectured to be causally efficacious is changed. Thus,
the substitute model is isomorphic relative to the original SEM except for
that variable whose distribution is changed. This is the counterfactual. The
resulting causal contrast is then the difference between the expected value
of the response variable in the original model and the expected value of the
response variable in the sub-model. The problem of confounding, in the counterfactual
framework, has therefore become the problem of appropriately selecting the
endogenous variables in the original SEM. Once the endogenous variables and
the structural equations modeling them are set, then the creation of substitutes
becomes a formal matter. We address this latter problem by examining criteria
by which to select amongst competing SEMs (or, more strictly, between equivalence
classes of SEMs), and the possible problems of error that such criteria are
bound to introduce.
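The construction of a sub-model and the resulting causal contrast can be sketched with a toy simulation. The Python example below assumes a hypothetical linear SEM with one exogenous variable Z and two endogenous variables X and Y, and it changes the distribution of X in the simplest possible way, by fixing it at a constant. The coefficients, the variable names, and the choice of a point intervention are all assumptions made for illustration.

```python
import random


def simulate_mean_response(n, a, b, c, x_fixed=None, seed=0):
    """Monte Carlo mean of Y in a hypothetical linear SEM.

        Z ~ Normal(0, 1)        exogenous background variable
        X = a*Z + e_x           endogenous, conjectured cause
        Y = b*X + c*Z + e_y     endogenous, response

    If x_fixed is given, the equation for X is replaced by X = x_fixed,
    which yields the sub-model.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e_x = rng.gauss(0.0, 1.0)  # drawn in both models to keep draws aligned
        e_y = rng.gauss(0.0, 1.0)
        x = a * z + e_x if x_fixed is None else x_fixed
        total += b * x + c * z + e_y
    return total / n


def causal_contrast(n, a, b, c, x_fixed, seed=0):
    """E[Y] in the original model minus E[Y] in the sub-model."""
    original = simulate_mean_response(n, a, b, c, seed=seed)
    sub_model = simulate_mean_response(n, a, b, c, x_fixed=x_fixed, seed=seed)
    return original - sub_model


if __name__ == "__main__":
    print("simulated contrast:", causal_contrast(20000, 0.5, 2.0, 1.0, x_fixed=1.0))
```

In this toy model the exact contrast is -b * x_fixed, because every variable has mean zero in the original model; the simulation should come close to that value. Everything apart from the equation for X is shared between the two runs, which mirrors the requirement that the substitute model match the original except for the changed variable.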
Our conclusion is that these two approaches complement one another. The use
of SEMs requires a conjecture about what kind of causal relations exist. Using
propensity scores permits us to delimit a range of plausible candidates for
use in constructing SEMs.
Dr. Gregory Wheeler
CENTRIA-AI Center
Department of Computer Science
Universidade Nova de Lisboa (New University of Lisbon)
Compounding Doubts
One difference between logics for probability statements and probabilistic logics is how each framework handles combining and detaching basic elements within the calculus. We are aware of some of these differences from various puzzles and paradoxes in the philosophical literature, such as Kyburg's lottery paradox, which concerns adjunction, and Simpson's paradox, which concerns reasoning by cases, among others. However, some of the constructive insights from these puzzles are obscured by our limited understanding of the structural differences between boolean combination of probability statements and the calculation of joint or marginal events within a probability logic. This paper discusses some recent results for conjunction and disjunction within systems of each type.
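The lottery paradox mentioned above gives a compact numerical picture of why conjunction is delicate for probability statements. The Python sketch below assumes a fair lottery with exactly one winning ticket and an acceptance threshold of 0.99; the function names and the threshold are illustrative assumptions.

```python
from fractions import Fraction


def prob_ticket_loses(n_tickets):
    """Probability that one given ticket loses in a fair one-winner lottery."""
    return Fraction(n_tickets - 1, n_tickets)


def prob_first_k_all_lose(n_tickets, k):
    """Probability that tickets 1..k all lose (exactly one ticket wins)."""
    if not 0 <= k <= n_tickets:
        raise ValueError("k must lie between 0 and n_tickets")
    return Fraction(n_tickets - k, n_tickets)


if __name__ == "__main__":
    n, threshold = 1000, Fraction(99, 100)
    print("single statement acceptable:", prob_ticket_loses(n) >= threshold)
    for k in (1, 10, 11, n):
        p = prob_first_k_all_lose(n, k)
        print(f"conjunction of {k} statements: {p} acceptable: {p >= threshold}")
```

Each statement "ticket i loses" clears the threshold on its own, yet the probability of the conjunction falls as statements are adjoined and reaches zero when all tickets are included.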
The paper is motivated from an applied logic point of view. By bringing to
light some of the key structural differences between combinations of probability
statements and the behavior of probability logics, we wish to pinpoint important
domain assumptions necessary to warrant using the latter type of framework
to model probability statements. This in turn would place us in a better position
to assess whether the necessary boundary conditions obtain in particular problem
applications.
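The difference between conjoining probability statements and computing a joint probability can be illustrated with the standard Fréchet bounds. This is a textbook sketch, not one of the results the paper reports; the function names are hypothetical. Given only the marginals, the probability of a conjunction is confined to an interval, and a point value requires an extra domain assumption such as independence.

```python
from fractions import Fraction


def conjunction_bounds(probabilities):
    """Fréchet bounds on P(A1 and ... and An) given only the marginals."""
    n = len(probabilities)
    lower = max(Fraction(0), sum(probabilities) - (n - 1))
    upper = min(probabilities)
    return lower, upper


def conjunction_if_independent(probabilities):
    """Point value for the conjunction under the added assumption of independence."""
    product = Fraction(1)
    for p in probabilities:
        product *= p
    return product


# Lottery with 1000 tickets and exactly one winner:
# each statement "ticket i loses" has probability 999/1000.
tickets = [Fraction(999, 1000)] * 1000
lottery_lower, lottery_upper = conjunction_bounds(tickets)

if __name__ == "__main__":
    print(conjunction_bounds([Fraction(9, 10), Fraction(9, 10)]))
    print(lottery_lower, lottery_upper)
```

In the lottery case the lower bound drops to 0, so accepting each highly probable statement separately gives no support for accepting their conjunction.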
Dr. Jon Williamson
Department of Philosophy
University of Kent
Inductive Influence
Objective Bayesianism has been criticised for not allowing learning from experience: it is claimed that an agent must give degree of belief 1/2 to the next raven being black, however many other black ravens have been observed. I argue that this objection can be overcome by appealing to objective Bayesian nets, a formalism for representing objective Bayesian degrees of belief. Under this account, previous observations exert an inductive influence on the next observation. I show how this approach can be used to capture the Johnson-Carnap continuum of inductive methods, as well as the Nix-Paris continuum, and show how inductive influence can be measured.
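The abstract names the Johnson-Carnap continuum without stating it. As a minimal sketch, its usual textbook form gives the probability that the next individual is of a given kind as (count + λ/k) / (total + λ), where k is the number of kinds and λ is a free parameter. The function name below is hypothetical, and the sketch does not model objective Bayesian nets themselves.

```python
from fractions import Fraction


def predictive_probability(count, total, num_kinds, lam):
    """Johnson-Carnap rule: (count + lam / num_kinds) / (total + lam).

    count     -- observed individuals of the kind in question
    total     -- all observed individuals
    num_kinds -- number of mutually exclusive kinds (2 for black / not black)
    lam       -- the continuum parameter lambda (non-negative)
    """
    lam = Fraction(lam)
    if total + lam == 0:
        raise ValueError("undefined with no observations and lambda = 0")
    return (count + lam / num_kinds) / (total + lam)


if __name__ == "__main__":
    # Ten ravens observed, all black; two kinds (black, not black).
    for lam in (0, 2, 10**6):
        print(lam, float(predictive_probability(10, 10, 2, lam)))
```

With λ = 0 the rule returns the observed frequency, while a very large λ keeps the value near 1/2 whatever has been observed, which corresponds to the no-learning behaviour that the objection targets.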
Jiji Zhang
Department of Philosophy
Carnegie Mellon University
Seeking Truth and Avoiding Error: What Can We Hope Causal Inference Procedures to Achieve?
Inferences about causal relations have been the cynosure of skeptical arguments since Hume. For modern statisticians the skepticism is crystallized into the wisdom that correlation does not imply causation. The caution is rightly placed, but the simple slogan does not seem to do full justice to the rich story of the possibility of inferring causality from non-experimental or semi-experimental data. In this paper I intend to enrich the general message by presenting a relatively big picture concerning the interplay between background assumptions one is willing to make and the kind of reliability one can hope causal inference to achieve, drawing on various formal results established in the causal modeling literature.
Recent developments in the causal modeling literature feature a non-reductive probabilistic theory of causation. I will start by pointing out that this conceptually non-reductive approach still amounts to an epistemic reduction: it reduces the problem of causal inference to one of statistical model selection by way of bridge principles linking causality to probability. A "model" here simply means a set of probability distributions. Reasonable bridge principles, however, usually do not translate a causal inference problem into a problem of model selection among disjoint models. In other words, the set of probability distributions corresponding to one causal hypothesis typically overlaps with the set corresponding to an alternative causal hypothesis. This, I gather, is what most people have in mind when they claim that correlation does not imply causation. But this does not tell the whole story: the overlap implies only that some correlational patterns (those implied by distributions in the intersection of the models) fail to discriminate between the models, while others may tell them apart.
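The overlap between models can be made concrete with two binary variables. The sketch below is an illustration under a stated assumption, not the paper's construction: it takes the causal Markov condition as the only bridge principle, so the hypothesis "no causal connection between X and Y" corresponds to the set of independent joint distributions, while "X causes Y" places no constraint on the joint distribution. All function names are hypothetical.

```python
from fractions import Fraction


def is_independent(joint):
    """True if P(x, y) = P(x) * P(y) for a joint distribution over binary X, Y."""
    px = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
    py = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}
    return all(joint[(x, y)] == px[x] * py[y] for x in (0, 1) for y in (0, 1))


def in_model_no_causation(joint):
    # Assumption: the empty causal graph plus the Markov condition implies independence.
    return is_independent(joint)


def in_model_x_causes_y(joint):
    # Assumption: under the Markov condition alone, X -> Y allows any joint distribution.
    return True


def discriminates(joint):
    """True if the distribution belongs to exactly one of the two models."""
    return in_model_no_causation(joint) != in_model_x_causes_y(joint)


independent = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
               (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}
correlated = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
              (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}

if __name__ == "__main__":
    print(discriminates(independent), discriminates(correlated))
```

The independent distribution lies in the intersection of the two models and so cannot separate the hypotheses, whereas the correlated one belongs only to the second model.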
I will thus consider the generic problem of choosing between two arbitrary models (sets of probability distributions) and discuss a range of reliability criteria that might be used to gauge procedures for such model selection. These reliability criteria vary along such dimensions as whether and to what extent suspension of judgment can be tolerated (which may induce a tension between the goal of seeking truth and the goal of avoiding error), and the manner of convergence (which is relevant to the possibility of controlling worst-case error probability with finite sample size). For each criterion, simple conditions, stated in terms of set-theoretic relationships between the models, are presented under which the criterion cannot possibly be met.
Given the generic characterization, we turn to the familiar bridge principles and background assumptions in the causal modeling and social science literature. We analyze carefully what each (kind of) assumption contributes to defeating the 'impossibility conditions' identified earlier. The upshot is a comprehensive menu of different (combinations of) background assumptions matching different reliability criteria. The menu is potentially helpful in two ways. For skeptics of causal inference, it presents (part of) a unified picture in which to locate their specific and diverse reasons for skepticism. For practitioners of causal inference, it provides a guide as to what assumptions are needed in order to achieve a certain kind of success, and conversely, what kind of success one can hope for given certain assumptions and/or background knowledge.