Fiedler on the Replicability Project

This was originally posted on the ISCON Facebook page; I repost it here in its entirety:


Klaus Fiedler has granted me permission to share a letter that he wrote to a reporter (Bruce Bowers) in response to the replication project. This letter contains Klaus’s words only; the only edit I made was to remove his phone number. I thought this would be of interest to the group.

These are his words on the 2015 article on estimating the replicability of psychology.


Dear Bruce:

Thanks for your email. You can call me tomorrow but I guess what I have to say is summarized in this email.

Before I try to tell it like it is, I ask you to please attend to my arguments, not just the final evaluations, which may appear unbalanced. So if you want to include my statement in your article, maybe along with my name, I would be happy not to detach my evaluative judgment from the arguments that in my opinion inevitably lead to my critical evaluation.

First of all I want to make it clear that I have been a big fan of properly conducted replication and validation studies for many years – long before the current hype of what one might call a shallow replication research program. Please note also that one of my own studies has been included in the present replication project; the original findings have been borne out more clearly than in the original study. So there is no self-referent motive for me to be overly critical.

However, I have to say that I am more than disappointed by the present report. In my view, such an expensive, time-consuming, and resource-intensive replication study, which can be expected to receive so much attention and to have such a strong impact on the field and on its public image, should live up (at least) to the same standards of scientific scrutiny as the studies that it evaluates. I’m afraid this is not the case, for the following reasons …

The rationale is to plot the effect size of replication results as a function of original results. Such a plot is necessarily subject to regression toward the mean. On a-priori-grounds, to the extent that the reliability of the original results is less than perfect, it can be expected that replication studies regress toward weaker effect sizes. This is very common knowledge. In a scholarly article one would try to compare the obtained effects to what can be expected from regression alone. The rule is simple and straightforward. Multiply the effect size of the original study (as a deviation score) with the reliability of the original test, and you get the expected replication results (in deviation scores) – as expected from regression alone. The informative question is to what extent the obtained results are weaker than the to-be-expected regressive results.

To be sure, the article’s muteness regarding regression is related to the fact that the reliability was not assessed. This is a huge source of weakness. It has been shown (in a nice recent article by Stanley & Spence, 2014, in PPS) that measurement error and sampling error alone will greatly reduce the replicability of empirical results, even when the hypothesis is completely correct. In order not to be fooled by statistical data, it is therefore of utmost importance to control for measurement error and sampling error. This is the lesson we took from Frank Schmidt (2010). It is also very common wisdom.

The failure to assess the reliability of the dependent measures greatly reduces the interpretation of the results. Some studies may use single measures to assess an effect whereas others may use multiple measures and thereby enhance the reliability, according to a principle well-known since Spearman & Brown. Thus, some of the replication failures may simply reflect the naïve reliance on single-item dependent measures. This is of course a weakness of the original studies, but a weakness different from non-replicability of the theoretically important effect. Indeed, contrary to the notion that researchers perfectly exploit their degrees of freedom and always come up with results that overestimate their true effect size, they often make naïve mistakes.

By the way, this failure to control for reliability might explain the apparent replication advantage of cognitive over social psychology. Social psychologists may simply often rely on single measures, whereas cognitive psychologists use multi-trial designs resulting in much higher reliability.

The failure to consider reliability refers to the dependent measure. A similar failure to systematically include manipulation checks renders the independent variables equivocal. The so-called Duhem-Quine problem refers to the unwarranted assumption that some experimental manipulation can be equated with the theoretical variable. An independent variable can be operationalized in multiple ways. A manipulation that worked a few years ago need not work now, simply because no manipulation provides a plain manipulation of the theoretical variable proper. It is therefore essential to include a manipulation check, to make sure that the very premise of a study is met, namely a successful manipulation of the theoretical variable. Simply running the same operational procedure as years before is not sufficient, logically.

Last but not least, the sampling rule that underlies the selection of the 100 studies strikes me as hard to tolerate. Replication teams could select their studies from the first 20 articles published in a journal in a year (if I correctly understand this sentence). What might have motivated the replication teams’ choices? Could this procedure be sensitive to their attitude towards particular authors or their research? Could they have selected simply studies with a single dependent measure (implying low reliability)? – I do not want to be too suspicious here but, given the costs of the replication project and the human resources, does this sampling procedure represent the kind of high-quality science the whole project is striving for?

Across all replication studies, power is presupposed to be a pure function of the size of participant samples. The notion of a truly representative design in which tasks and stimuli and context conditions and a number of other boundary conditions are taken into account is not even mentioned (cf. Westfall & Judd).


What do you think about this?


I 100% agree with his concern about the expense. Speaking with some of the replicators, we estimated that the endeavor cost over 1 million euros, all told. This paid for the time of 300 psychologists, who ‘donated’ their time to the endeavor. The taxpayer paid for this… Is it the best use of their tax dollars? I guess not.

I also definitely agree with his assessment about regression to the mean.
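To make the regression argument concrete, here is a rough sketch of my own (it is not part of Klaus’s letter). It applies the rule he describes: take the original effect as a deviation from the average effect, multiply by the reliability, and you have the replication effect expected from regression alone. It also uses the standard Spearman-Brown formula to show how much a multi-item measure raises reliability. All the numbers (an original effect of 0.60, an average effect of 0.30, a single-item reliability of 0.40) are made up for illustration, and the function names are mine.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a composite of n_items parallel items."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


def expected_replication_effect(original_effect: float,
                                mean_effect: float,
                                reliability: float) -> float:
    """Effect expected in a replication from regression to the mean alone."""
    deviation = original_effect - mean_effect  # original as a deviation score
    return mean_effect + reliability * deviation


# Hypothetical numbers, for illustration only.
original_effect = 0.60
mean_effect = 0.30
single_item_reliability = 0.40

for n_items in (1, 10):
    reliability = spearman_brown(single_item_reliability, n_items)
    expected = expected_replication_effect(original_effect, mean_effect, reliability)
    print(f"{n_items:>2} item(s): reliability = {reliability:.2f}, "
          f"expected replication effect = {expected:.2f}")
```

With these made-up inputs, a single-item measure would be expected to shrink from 0.60 to 0.42 even if nothing at all went wrong, while a ten-item measure would shrink only to about 0.56. The informative comparison is then between the observed replication effect and this regressed value, not the original one.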



Fishing for effects – on estimating the average size of fish in our pond.

There has been much nice discussion recently, including a Data Colada post, about how difficult it is to estimate an effect size.

I am not sure whether I have written this up before, but the argument is succinctly captured by trying to estimate the average size of fish in a particular pond (except that the pond is way, way easier). Each effect size a researcher (or group of researchers) obtains is like catching a particular fish in that pond (though of course sampling error is essentially taken care of in the pond example).

If one catches 10 or 15 fish from the pond, one can begin estimating the ‘average size of fish in the pond’. But of course, this is only ‘the average size of fish that we caught, in the pond’.

What you catch depends on how you fish… 

The key is that, of course, the size of fish one catches in the pond is related to how one fishes, where one looks, what type of bait one uses, the strategy of reeling, the time of day, the depth that one looks at, and even things like how one approaches the spot one will try to fish.

And of course one only takes a picture of (that is, documents) the largest fish, and the fish become bigger over time (so long as no picture is present). And when the person doing the documenting owns the land and wants people to come fish it, there is some potential upward drift in the average size of the fish.


A pair of happy academics with the fish they want to report on, with about as much information about how they obtained it as in the average journal article.

This says nothing of the fact that in our real world example, we don’t actually know how big the pond is, or even if there are fish in it, and we can mistake an old shoe for a fish and a fish for an old shoe.

The key is, of course, that one must be careful in making proclamations about the size or presence of fish in the pond.
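For anyone who wants to play with the pond, here is a toy simulation. Everything in it is an assumption of mine: the pond holds 2,000 fish with log-normally distributed lengths, a "fair" fisher catches fish at random, a "big bait" fisher catches fish with probability proportional to their size, and only the largest fish of each trip gets photographed. It is a sketch of the argument, not a model of any real literature.

```python
import random
import statistics


def make_pond(n_fish, rng):
    """Hypothetical pond: fish lengths (cm) drawn from a log-normal distribution."""
    return [rng.lognormvariate(3.0, 0.4) for _ in range(n_fish)]


def fishing_trip(pond, n_catch, rng, size_biased):
    """Catch n_catch fish (with release). If size_biased, bigger fish bite more often."""
    weights = pond if size_biased else None
    return rng.choices(pond, weights=weights, k=n_catch)


def run_study(n_trips=1000, n_catch=12, seed=1):
    """Compare the true pond average with what different fishers would report."""
    rng = random.Random(seed)
    pond = make_pond(2000, rng)
    fair, biased, trophies = [], [], []
    for _ in range(n_trips):
        fair.extend(fishing_trip(pond, n_catch, rng, size_biased=False))
        trip = fishing_trip(pond, n_catch, rng, size_biased=True)
        biased.extend(trip)
        trophies.append(max(trip))  # only the biggest fish gets photographed
    return {
        "true_mean": statistics.mean(pond),
        "fair_mean": statistics.mean(fair),
        "biased_mean": statistics.mean(biased),
        "trophy_mean": statistics.mean(trophies),
    }


for name, value in run_study().items():
    print(f"{name:>12}: {value:.1f} cm")
```

The fair fisher lands close to the true average, the big-bait fisher overshoots it, and the photo album overshoots it by a lot more. Note that this toy pond is still the easy case: here we know the pond exists and that everything in it is a fish.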


On suggestions that there are no fish in the pond…

This is especially true after any single fishing trip, or for any group of fishers who all use the same lure, or look in the same area or in the same way, as maybe they were just doing it wrong, or at the wrong time, or with the wrong bait, etc.


In any case, I think you get the point. Would be happy to discuss below.

Love ya, Keep on,





A Treatise of Human Nature, by David Hume (1739)


Attached here is Hume’s Treatise of Human Nature. It is actually quite interesting. Looking at the table of contents, you see that it is mostly concerned with the understanding (how we make sense of the world), the passions (which really seem to be motivations), and morals (which is essentially about justice and how we understand right and wrong).

Really interesting, and actually it was one of Einstein’s favorite books; he even said it was influential in his thinking. I do wish I had time to read it, but I am at least glad I was able to look at the TOC.

Have you read it? What do you think is best?

Love ya,




Editor’s Preface.


Book I: Of the Understanding


Part I.: Of Ideas, Their Origin, Composition, Connexion, Abstraction, &c.

Section I.: Of the Origin of Our Ideas.
Section II.: Division of the Subject.
Section III.: Of the Ideas of the Memory and Imagination.
Section IV.: Of the Connexion Or Association of Ideas.
Section V.: Of Relations.
Section VI.: Of Modes and Substances.
Section VII.: Of Abstract Ideas.

Part II.: Of the Ideas of Space and Time.

Section I.: Of the Infinite Divisibility of Our Ideas of Space and Time.
Section II.: Of the Infinite Divisibility of Space and Time.
Section III.: Of the Other Qualities of Our Ideas of Space and Time.
Section IV.: Objections Answer’d.
Section V.: The Same Subject Continu’d.
Section VI.: Of the Idea of Existence, and of External Existence.

Part III.: Of Knowledge and Probability.

Section I.: Of Knowledge.
Section II.: Of Probability; and of the Idea of Cause and Effect.
Section III.: Why a Cause Is Always Necessary.
Section IV.: Of the Component Parts of Our Reasonings Concerning Cause and Effect.
Section V.: Of the Impressions of the Senses and Memory.
Section VI.: Of the Inference From the Impression to the Idea.
Section VII.: Of the Nature of the Idea Or Belief.
Section VIII.: Of the Causes of Belief.
Section IX.: Of the Effects of Other Relations and Other Habits.
Section X.: Of the Influence of Belief.
Section XI.: Of the Probability of Chances.
Section XII.: Of the Probability of Causes.
Section XIII.: Of Unphilosophical Probability.
Section XIV.: Of the Idea of Necessary Connexion.
Section XV.: Rules By Which to Judge of Causes and Effects.
Section XVI.: Of the Reason of Animals.

Part IV.: Of the Sceptical and Other Systems of Philosophy.

Section I.: Of Scepticism With Regard to Reason.
Section II.: Of Scepticism With Regard to the Senses.
Section III.: Of the Antient Philosophy.
Section IV.: Of the Modern Philosophy.
Section V.: Of the Immateriality of the Soul.
Section VI.: Of Personal Identity.
Section VII.: Conclusion of This Book.


Book II: Of the Passions

Part I.: Of Pride and Humility.

Section I.: Division of the Subject.
Section II.: Of Pride and Humility; Their Objects and Causes.
Section III.: Whence These Objects and Causes Are Deriv’d.
Section IV.: Of the Relations of Impressions and Ideas.
Section V.: Of the Influence of These Relations On Pride and Humility.
Section VI.: Limitations of This System.
Section VII.: Of Vice and Virtue.
Section VIII.: Of Beauty and Deformity.
Section IX.: Of External Advantages and Disadvantages.
Section X.: Of Property and Riches.
Section XI.: Of the Love of Fame.
Section XII.: Of the Pride and Humility of Animals.

Part II.: Of Love and Hatred.

Section I.: Of the Objects and Causes of Love and Hatred.
Section II.: Experiments to Confirm This System.
Section III.: Difficulties Solv’d.
Section IV.: Of the Love of Relations.
Section V.: Of Our Esteem For the Rich and Powerful.
Section VI.: Of Benevolence and Anger.
Section VII.: Of Compassion.
Section VIII.: Of Malice and Envy.
Section IX.: Of the Mixture of Benevolence and Anger With Compassion and Malice.
Section X.: Of Respect and Contempt.
Section XI.: Of the Amorous Passion, Or Love Betwixt the Sexes.
Section XII.: Of the Love and Hatred of Animals.

Part III.: Of the Will and Direct Passions.

Section I.: Of Liberty and Necessity.
Section II.: The Same Subject Continu’d.
Section III.: Of the Influencing Motives of the Will.
Section IV.: Of the Causes of the Violent Passions.
Section V.: Of the Effects of Custom.
Section VI.: Of the Influence of the Imagination On the Passions.
Section VII.: Of Contiguity, and Distance In Space and Time.
Section VIII.: The Same Subject Continu’d.
Section IX.: Of the Direct Passions.
Section X.: Of Curiosity, Or the Love of Truth.

Book III: Of Morals

Part I.: Of Virtue and Vice In General.

Section I.: Moral Distinctions Not Deriv’d From Reason.
Section II.: Moral Distinctions Deriv’d From a Moral Sense.

Part II.: Of Justice and Injustice.

Section I.: Justice, Whether a Natural Or Artificial Virtue?
Section II.: Of the Origin of Justice and Property.
Section III.: Of the Rules, Which Determine Property.
Section IV.: Of the Transference of Property By Consent.
Section V.: Of the Obligation of Promises.
Section VI.: Some Farther Reflexions Concerning Justice and Injustice.
Section VII.: Of the Origin of Government.
Section VIII.: Of the Source of Allegiance.
Section IX.: Of the Measures of Allegiance.
Section X.: Of the Objects of Allegiance.
Section XI.: Of the Laws of Nations.
Section XII.: Of Chastity and Modesty.

Part III.: Of the Other Virtues and Vices.

Section I.: Of the Origin of the Natural Virtues and Vices.
Section II.: Of Greatness of Mind.
Section III.: Of Goodness and Benevolence.
Section IV.: Of Natural Abilities.
Section V.: Some Farther Reflexions Concerning the Natural Virtues.
Section VI.: Conclusion of This Book.


The Behavior of Organisms: An Experimental Analysis

This is Skinner’s first book, so far as I can see. It is really quite interesting, at least in its table of contents.

IX. Drive, p. 341


Basically it is about conditioning; he doesn’t come to drive until the end. He doesn’t mention person differences at all, in the table of contents at least. He also doesn’t talk about the biological basis (the neuron) at all, so far as I can see, nor about personality, influence, or really anything about people beyond the individual.

It is also really quite long, at 450 pages.


The full book is available here.

Find a significant effect in any study

Too much yall :,D too much


Stephen Politzer-Ahles is Assistant Professor at the Department of Chinese and Bilingual Studies of The Hong Kong Polytechnic University. He is committed to finding solutions to current challenges in the cognitive sciences. For instance, he is developing efficient and transparent strategies to empty out his own file drawer.

p>.05. We’ve all been there. Who among us hasn’t had a student crying in our office over an experiment that failed to show a significant effect? Who among us hasn’t been that student?

Statistical nonsignificance is one of the most serious challenges facing science. When experiments aren’t p<.05, they can’t be published (because the results aren’t real), people can’t graduate, no one can get university funding to party it up at that conference in that scenic location, and in general the whole enterprise falls apart. The amount of taxpayer dollars that have been wasted on p>.05 experiments is frankly…


The parable of the NBA replicators

The normal players in the NBA could see that the Michael Jordans and Kobe Bryants had once scored 80 points in a single NBA game (20 years ago), and they wanted to replicate such results for themselves. After all, this was an excellent outcome, really something to talk about, and they wanted some of it themselves.

Unfortunately, try as they might, they could not replicate that result; they could not get the 80 points in a game. No matter how hard they tried, they could only get 30 to 40 points in a game, at best. Not being able to do it in their own court, no matter how many times they tried, they traveled to other courts, and even asked many of their friends to try to get 80 points in a night, even to publicly claim that they would try to get the 80 points in their next game. They felt that if they logged their intentions of getting 80 points in a game and how they would do it, and had many people look, it would make it more likely, or cast doubt on the original 80 point performance.

Again unfortunately, even players who were the best at doing other things, those who could get 30 rebounds or 35 assists in a night, could not get the 80 points in a night.

It was dismaying to everyone around, and the average and bench players began to cast doubt upon the original performances, saying it couldn’t be done anymore, or even that the original was somehow faked.

This was very dismaying to the players who had achieved such a laudable result. Unfortunately, when they tried to assert that they had actually achieved the desired 80 points in a game, they were met only with scorn and with suggestions that, if they wanted people to believe them, they should publicly register that they would recreate that 80-point game in the next match they played (despite the fact that the original performance was now more than 20 years ago, with a different team, and MJ had since had 2 surgeries on his knee).

And so the average players called a march, and demanded that drastic changes be made in the league, such that the only things valued were those that could be easily replicated by anybody on any court.

The end.

Identifying Impact

Identifying Impact is the title of a game we are developing to examine whether people can tell how much impact a scientific paper has based upon its title and abstract.


The idea is to test how well people can identify scientific impact by the titles alone. Not only can we correlate people’s scores with actual impact metrics, we can also examine which metrics are the best indicators of what people think is best or most interesting.

More than this, we can examine how much of the eventual impact of a paper can be predicted from, e.g., expert or layperson perceptions of the value of the work based on the title alone.

Finally, the study makes it possible to examine whether person differences affect, e.g., what people find interesting, or how well their scores relate to commonly used impact metrics.
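As a rough idea of what the core analysis could look like, here is a small sketch that rank-correlates people’s ratings of titles with an impact metric. The data are invented, the choice of Spearman’s rank correlation is my assumption (citation counts tend to be heavily skewed, so ranks seem a sensible starting point), and the function names are hypothetical; the real analysis may well end up different.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: mean rating of each title (1-5) and its citation count.
mean_ratings = [4.2, 2.1, 3.8, 1.5, 4.9]
citations = [120, 40, 30, 8, 310]

print(f"rating vs. citations: rho = {spearman(mean_ratings, citations):.2f}")
```

The same function could be run once per impact metric (citations, downloads, and so on) to see which metric tracks people’s judgments most closely, or once per rater group to compare experts with laypeople.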


More generally, it will be made available so that other researchers can utilize the template to run their own studies that are similar.