There is much discussion going on concerning recent critiques and critiques and critiques of critiques of critiques (which I love and find very exciting, like most people!), but amongst all the fighting and such, I want to point out at least one bit of truth that exists in the Gilbert et al retreat.
As they make clear in their reply to the reply (referred to as critique by those trying to cause drama), the main concern of Gilbert et al is that the RP:P project cannot report on the replicability of Psychological Science, in total (as the title of the paper suggests). They suggest (as I will explain below) that this is an over generalization which would not be allowed in a normal, journal, let alone in Science with all of the etc that happened.
Taking away that each participant cost thousands of dollars to collect data on, at its core, this is a (meta)study with a single measurement and 100 ‘participants.’ Ok, that is great and fine, but the conclusion from the has been generalized to the entire population (of studies in psychological science). To quote from this social media post where the idea originally came to me, It is essentially like an author having 100 students
from three elementary schools, examining how many of them have blue eyes (or any other
specific characteristic that can be categorized dicotomously) and then writing a paper titled ‘estimating the prevalence of blue eyes among the world’s population’. I mean, the evidence just cannot uphold the claim being made from them. That doesn’t mean the study itself is bad, just that it is being used to say more than the evidence can support, which is a very normal thing among human scientists.
Actually though, this blue eye comparison is not even actually correct, because the dichotomous trait being measured is not really dichotomous and would need to be able to change according to the area of the world it is in (i.e., changes in replicability across area) or even by how the anchors of the scale on the questionnaire are scaled (i.e., tiny situational variables can strongly affect the results; at least most psychologists believe so).
Does that make sense? The conclusion of the study was that only 40 something percent of *psychological studies* are replicable (the title of the paper was ‘estimating the replicability of psychological science’), but the studies that were replicated in the OSC paper came from a single year of three (of the leading) journals in psychology.
In light of this, I (and I think Gilbert et al) would suggest
that a far more appropriate generalization (if we choose to generalize; we could simply describe like researchers studying drug abuse in a city would) from the data would be something like: only 40 something percent of psychological studies from these three journals, in the year 2008, are replicable. And that is IF we choose to go beyond simply describing what we found (similar to researchers studying drug abuse in a city would do).
More than this, one could even suggest that the sample the OSC utilized is one which would have especially Low replication rates, exactly because they are the leading journals in the field (i.e., Psychological Science, Journal of Personality and Social Psychology and Journal of Experimental Psychology: Learning, Memory, and Cognition). These are the ones that people are most likely to cheat to get into, they are the flashiest, the most controversial, uncertain and exciting (there is even a citation I saw once to say that higher JIF papers have lower power). The fact that they found differences across the journals indicates that the sample matters.
So, to take it to the extreme (if we are studying cocaine use in a city or the entire population), the OSC team went into the ‘ghetto’, took a sample of 100 people for cocaine (our dichotomous variable), found the %, and then extrapolated this to say something about the % of people in the world (of psychological studies) that use cocaine. Ok, I know that is wayyyyy to far, but it will give the snarkers something to snark about and maybe get some extra shares. 😀 and actually it gets the point across nicely.
But really, nobody (here at least) is saying that the study was poorly run or otherwise bad, assuming they are only trying to say something about the specific corner or neighborhood that they sampled from (the corner where it is most likely to happen). It just can’t support the claim it was suggested to support. Certainly I am not saying this was done with malicious intent, or whatever bad thing you will want to say I said (let me be explicit). The people I know know that I respect them, even if I criticize them and their science (the way they are critizing others’ science?).
The thing is that we are all human,
and this is the real problem; we have these biases and tendencies, that especially come out when we are in large groups.
Especially psychological scientists (most of the OSC researchers were psychologists) have a tendency to overgeneralize their research findings, sometimes inappropriately. [to quote again from the social media conversation] This overgeneralizing is a problem, I think most psychologists agree, and one that the OS community has been working to demonstrate (e.g., by showing that these effects are more fragile than we thought). The thing is that now in another 10 years a different group of researchers will come along, sample a different 100 papers (e.g., from different years or journals or areas) and conclude that the OSC project cannot be replicated (potentially because they chose a different sampling frame within the population).
That was essentially the last post in the conversation (at least for now, about 30 minutes later [and again the next day, no new posts]).
What do you think? Does it make sense? I am still not sure it is right, but I do think it is worth you thinking a bit about and potentially telling me I am wrong! I think it might be right. Again, just to be sure, I want to reiterate that this does not mean the OSC study was poorly done as an estimate of the replicability of three psychology journals for 2008, but I do think it might be a bit light on data to say something about ‘the replicability of psychological science’, in total (which is really the core of the point Gilbert et al have been arguing, I would say).
Anyways, let me know what you think down below, or come yell at me on twitter like everyone else does. 😀 Thanks for coming by! 😀