The truth behind the Gilbert et al critique

There is much discussion going on concerning the recent critiques, and the critiques of the critiques, and the critiques of the critiques of the critiques (which, like most people, I love and find very exciting!), but amongst all the fighting I want to point out at least one bit of truth that exists in the Gilbert et al. reply.

As they make clear in their reply to the reply (referred to as a critique by those trying to cause drama), the main concern of Gilbert et al. is that the RP:P project cannot report on the replicability of psychological science as a whole (as the title of the paper suggests). They argue (as I will explain below) that this is an overgeneralization which would not be allowed in a normal journal, let alone in Science, with all the attention that paper received.

Setting aside that each participant cost thousands of dollars to collect data on, at its core this is a (meta)study with a single measurement and 100 ‘participants.’ That is all well and fine, but the conclusion has been generalized to the entire population (of studies in psychological science). To quote from the social media post where the idea originally came to me: it is essentially like an author taking 100 students from three elementary schools, examining how many of them have blue eyes (or any other specific characteristic that can be categorized dichotomously), and then writing a paper titled ‘estimating the prevalence of blue eyes among the world’s population’. The evidence simply cannot uphold the claim being made from it. That doesn’t mean the study itself is bad, just that it is being used to say more than the evidence can support, which is a very normal thing among human scientists.

Actually, this blue-eye comparison is not even quite right, because the trait being measured is not truly dichotomous and fixed: replicability can change according to the area of the field (i.e., replication rates differ across research areas) or even by how the anchors of the scale on a questionnaire are worded (i.e., tiny situational variables can strongly affect the results; at least most psychologists believe so).

Does that make sense? The conclusion of the study was that only forty-something percent of *psychological studies* are replicable (the title of the paper was ‘Estimating the reproducibility of psychological science’), but the studies that were replicated in the OSC paper came from a single year of three (of the leading) journals in psychology.

In light of this, I (and I think Gilbert et al.) would suggest that a far more appropriate generalization from the data would be something like: only forty-something percent of psychological studies from these three journals, in the year 2008, are replicable. And that is IF we choose to go beyond simply describing what we found (similar to what researchers studying drug abuse in a single city would do).
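To make concrete what a sample of this size can support, here is a minimal sketch (my own illustration, not from the paper; the 40-of-100 figure is a round number used only for the example) of the statistical uncertainty around such an estimate:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Roughly 40 'successful' replications out of 100 attempts (illustrative)
low, high = wilson_interval(40, 100)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Even before any question of generalization, the interval runs from roughly 31% to 50%, and that uncertainty applies only to the population actually sampled.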

More than this, one could even suggest that the sample the OSC utilized is one which would have an especially low replication rate, exactly because it comes from the leading journals in the field (i.e., Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory, and Cognition). These are the journals people are most likely to cheat to get into; they publish the flashiest, most controversial, most uncertain and exciting findings (I have even seen a citation suggesting that papers in higher-JIF journals have lower statistical power). The fact that the OSC found differences across the journals indicates that the sample matters.
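To illustrate how much the sampling frame drives the estimate, here is a toy simulation; the journal names and per-journal replication probabilities are invented for illustration and are not taken from the RP:P data:

```python
import random

# Hypothetical per-journal replication probabilities (invented numbers)
rates = {"Flashy Letters": 0.25, "Solid Methods": 0.60, "Middling Review": 0.45}

def estimate(journals, n=1000, seed=1):
    """Draw n studies uniformly from `journals`; return the observed replication rate."""
    rng = random.Random(seed)
    hits = sum(rng.random() < rates[rng.choice(journals)] for _ in range(n))
    return hits / n

top_only = estimate(["Flashy Letters"])  # frame: only the flashiest journal
broad = estimate(list(rates))            # frame: all three journals
print(f"top-journal frame: {top_only:.2f}, broad frame: {broad:.2f}")
```

The two frames give noticeably different answers even though nothing about the underlying studies changed, which is exactly the over-generalization worry.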

So, to take it to the extreme (back to studying cocaine use in a city versus the entire population): the OSC team went into the ‘ghetto’, sampled 100 people for cocaine use (our dichotomous variable), found the percentage, and then extrapolated this to say something about the percentage of people in the world (of psychological studies) that use cocaine. Ok, I know that is wayyyyy too far, but it will give the snarkers something to snark about and maybe get some extra shares 😀 and it actually gets the point across nicely.

But really, nobody (here at least) is saying that the study was poorly run or otherwise bad, so long as it is only used to say something about the specific corner or neighborhood that was sampled (the corner where the behavior is most likely to occur). It just can’t support the claim it was suggested to support. Certainly I am not saying this was done with malicious intent, or whatever bad thing you will want to say I said (let me be explicit). The people I know know that I respect them, even when I criticize them and their science (the way they are criticizing others’ science?).

The thing is that we are all human, and this is the real problem: we have these biases and tendencies, and they especially come out when we are in large groups.

Psychological scientists especially (most of the OSC researchers were psychologists) have a tendency to overgeneralize their research findings, sometimes inappropriately. [To quote again from the social media conversation:] This overgeneralizing is a problem, I think most psychologists agree, and one that the OS community has been working to demonstrate (e.g., by showing that these effects are more fragile than we thought). The thing is that in another 10 years a different group of researchers will come along, sample a different 100 papers (e.g., from different years or journals or areas) and conclude that the OSC project cannot be replicated (potentially because they chose a different sampling frame within the population).

That was essentially the last post in the conversation (at least for now; no new posts appeared 30 minutes later, or the next day).


What do you think? Does it make sense? I am still not sure it is right, but I do think it is worth thinking about a bit, and you are welcome to tell me I am wrong! Again, just to be sure, I want to reiterate that this does not mean the OSC study was poorly done as an estimate of the replicability of three psychology journals for 2008, but I do think it is a bit light on data to say anything about ‘the replicability of psychological science’ as a whole (which is really the core of the point Gilbert et al. have been arguing, I would say).


Anyways, let me know what you think down below, or come yell at me on twitter like everyone else does. 😀 Thanks for coming by! 😀



  1. “The thing is that now in another 10 years a different group of researchers will come along, sample a different 100 papers (e.g., from different years or journals or areas) and conclude that the OSC project cannot be replicated”

    And that would give us more information about replicability again! It seems to me that there will always be many choices to make in gathering a “representative” sample, and one can always argue that the choices made were sub-optimal.

    But it is fun thinking about the most optimal manner to achieve the most “representative” sample possible. I’ve tried that, and I think it’s impossible to achieve. What would be “representative”? Take a random article from every psychological journal ever published? That would mean thousands of studies, and would be nearly impossible to do.

    100 studies, as done in the OSC study, seems doable, so how would one get the most “representative” sample of “psychological science” using 100 studies as the limit? Randomly select 100 psychological journals, from which you then randomly select 1 year, from which you then randomly select 1 issue, from which you then randomly select 1 article/study each?

    Perhaps the title of the original study could have been something different from “Estimating the reproducibility of psychological science”, but I sincerely hope the information gained from the project is not thrown away just because of the title of the paper.

    1. Hey, thanks for coming by! 🙂

      Yes, I definitely agree, and there is, I would say, very little chance of the information being thrown out; I honestly hope that doesn’t happen! 😀 It is definitely information, even very good information, for at least certain questions.

      The only thing I want to say, and I think the main thing Gilbert et al. wanted to say, is that this study and its results cannot be taken to inform such a broad question as ‘how much of the entire psychological literature is reproducible’, which is what the title and much of the press coverage suggested. The press would also have you believe that Gilbert et al. want us to throw out the data and call the study useless, or that they are suggesting there is no reproducibility problem, but this is simply not the case. There is a ‘reproducibility problem’, at least among the results from these three journals in this particular year. The point I (and they) are trying to make is simply that it cannot be generalized the way it was (to ALL of psychological science).

      I think one of the most indicative aspects is that the different journals had different results, indicating that if we picked from different journals and different fields, we would get different results. So we can’t generalize to ALL of psychological science.

      The funny part, though, is that the conclusion from all this is that if we want to say something about the reproducibility of psychological science, the sample will need to be much broader: definitely across the different areas, and across the different impact factors. There is already evidence, without needing to replicate all the studies, that journals with higher impact factors publish papers with lower statistical power, for instance. (This study, in essence, really had an n of 100 :p 😀 which is great considering the amount of effort it took, but still somehow limited.)
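The multistage scheme sketched in the comment above (randomly pick journals, then a year, then an issue, then one article each) could look something like this; the archive structure and journal names are hypothetical placeholders, not a real index:

```python
import random

def sample_studies(archive, n_journals=100, seed=42):
    """Multistage sample: journal -> year -> issue -> one article each.

    `archive` maps journal -> year -> issue -> list of article IDs
    (a hypothetical stand-in for a real journal index).
    """
    rng = random.Random(seed)
    journals = rng.sample(sorted(archive), k=min(n_journals, len(archive)))
    picks = []
    for journal in journals:
        year = rng.choice(sorted(archive[journal]))
        issue = rng.choice(sorted(archive[journal][year]))
        picks.append(rng.choice(archive[journal][year][issue]))
    return picks

# Toy archive with only three journals, so all three get sampled
toy = {
    "Journal A": {2008: {1: ["A-2008-1-1", "A-2008-1-2"]}},
    "Journal B": {2008: {1: ["B-2008-1-1"]}, 2009: {1: ["B-2009-1-1"]}},
    "Journal C": {2010: {2: ["C-2010-2-1", "C-2010-2-2"]}},
}
print(sample_studies(toy))
```

Even this scheme bakes in choices (uniform over journals rather than over articles, one study per journal), which is the commenter’s point: any frame can be argued to be sub-optimal.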
