Author: Brett Buttliere


This is going to be a series I do, when I have time, on instances of cognitive conflict in the world.






Find a significant effect in any study

Too much, y'all :,D too much


Stephen Politzer-Ahles is Assistant Professor at the Department of Chinese and Bilingual Studies of The Hong Kong Polytechnic University. He is committed to finding solutions to current challenges in the cognitive sciences. For instance, he is developing efficient and transparent strategies to empty out his own file drawer.

p>.05. We’ve all been there. Who among us hasn’t had a student crying in our office over an experiment that failed to show a significant effect? Who among us hasn’t been that student?

Statistical nonsignificance is one of the most serious challenges facing science. When experiments aren’t p<.05, they can’t be published (because the results aren’t real), people can’t graduate, no one can get university funding to party it up at that conference in that scenic location, and in general the whole enterprise falls apart. The amount of taxpayer dollars that have been wasted on p>.05 experiments is frankly…


The parable of the NBA replicators

The normal players in the NBA could see that the Michael Jordans and Kobe Bryants had once scored 80 points in a single NBA game (20 years ago), and wanted to replicate such results for themselves. After all, this was an excellent outcome, really something to talk about, and they wanted some of it for themselves.

Unfortunately, try as they might, they could not replicate that result; they could not get the 80 points in a game. No matter how hard they tried, they could only get 30 to 40 points in a game, at best. Not being able to do it on their own court, no matter how many times they tried, they traveled to other courts, and even asked many of their friends to try to get 80 points in a night, and to publicly claim that they would try to get the 80 points in their next game. They felt that if they logged their intentions of getting 80 points in a game, and how they would do it, and had many people watch, it would make it more likely, or cast doubt on the original 80-point performance.

Again, unfortunately, even players who were the best at doing other things, those who could get 30 rebounds or 35 assists in a night, could not get the 80 points.

It was dismaying to everyone around, and the average and bench players began to cast doubt upon the original performances, saying it couldn’t be done anymore, or even that the original was somehow faked.

This was very dismaying to the players who had achieved such a laudable result. Unfortunately, when they tried to assert that they had actually achieved the desired 80 points in a game, they were met only with scorn and suggestions that, if they wanted people to believe them, they should publicly register that they would recreate that 80-point game in the next match they played (despite the fact that the original performance was now more than 20 years ago, with a different team, and MJ had since had two surgeries on his knee).

And so the average players called a march, and demanded that drastic changes be made in the league, such that the only things that were valued were those that could be easily replicated by anybody on any court.

The end.

Identifying Impact

is the title of a game we are developing, examining whether people can tell how much impact a scientific paper has based upon its title and abstract.


The idea is to test how well people can identify scientific impact from titles alone. Not only can we correlate people's scores with actual impact metrics, we can also examine which metrics best indicate what people think is most interesting or valuable.

More than this, we can examine how much of the paper's eventual impact can be predicted utilizing, e.g., expert or layperson perceptions of the value of the work from the title alone.

Finally, the study opens up the ability to examine whether individual differences affect, e.g., what people find interesting, or how well the scores relate to commonly utilized impact metrics.


More generally, it will be made available so that other researchers can utilize the template to run their own studies that are similar.
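The core analysis described above is just a correlation between human ratings and impact metrics. Here is a minimal sketch of what that computation might look like; all of the data below (ratings, citation counts) are made-up illustrative numbers, not results from the actual game:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson computed on the ranks.

    (No tie handling, which is fine for this sketch.)
    """
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: five papers, average layperson rating of the
# title (1-7 scale) vs. citation count three years later.
ratings = [3.2, 5.1, 4.0, 6.3, 2.8]
citations = [12, 45, 20, 90, 5]
print(round(spearman(ratings, citations), 2))  # → 1.0 (ranks match exactly here)
```

Rank correlation seems the safer choice here, since citation counts are heavily skewed while ratings are bounded on a scale.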




Evolution in action – super resistant bacteria


A neat little video about how bacteria evolve; but I also want to point out how they are basically creating super mutant bacteria in like 11 days, yes?


Also watch from about 1:19, where you can see the shortest route from the original mutation to the first bacterium that breaks into the 10× and 100× antibiotic concentrations, and notice how the ones that first broke into the 1000× concentration all stuck together really well.


Context, Theory, and Expertise


Had a very good conversation here, and I think I explained quite well the problem with saying that (even many) perfectly executed but failed direct replications mean that the original was probably a false positive and/or that the theory needs to be reevaluated. The reasons are, essentially: Context, Theory, and Expertise.


This is just the same as in the picture, just a little more worked out:

  1. is again the context stuff. You can use literally the exact same materials but mean different things. Having someone make the exact same hand gesture in different cultures can mean different things.
  2. is the difference between a specific manipulation and the theory. The Strack study not replicating does not mean that smiling when we are unhappy will not make us happier (even if it is just placebo or experimenter effects in the world). It is clear, I think, that even if holding a pen in your mouth doesn't make you happier or cartoons funnier, that doesn't mean that every other instance of facial feedback fails, or, even more generally, that there is no such thing as facial feedback.
  3. is that even Kobe Bryant and Michael Jordan could not hit 80 points on demand, and certainly many other people cannot do it even once! 😀 even though we have video of them doing it, and so it doubtlessly happened. Maybe the team was different, maybe they didn't feel as good. Sure, we can agree it is not generalizable then, but that doesn't mean it didn't happen or that it is no longer evidence for the theory. If you don't believe in expertise in psychology, you should read my last blog post, which was exactly on that subject (tl;dr: it's a real thing).


And I find those three reasons convincing enough to at least not dismiss the original study and theory outright. Especially number 2 seems strong to me; what do you think?








On Expertise

What do you think about expertise (within the field of social psychology)? From recent discussions it would seem that Any (social) psychologist should be able to 'reproduce' Any study within (social) psychology, and I am not sure that this is really the case. It IS a simple fact of the matter that people differ in how good they are at designing experiments, experimental protocols, and the things that the participants see.

Look at it this way: some psychologists can't even get participants to take their experiments, let alone really pay attention or care. That is a real thing. How can you expect to reproduce something if you can't get participants to even take your experiments?

This point is especially driven home by my own research (and also many other people's research).1 Small effects are changed by small things. The wording of a single sentence, even a single word (!!!!), can seriously influence the outcomes of a study. How researchers treat their participants from the first contact, in the consent form or the first email, conveys to them how much you care about their time and what they think the study is about, and ultimately affects how much they care about the study (and these things are likely to affect the results you get).



THE THING IS (from a conversation I had tonight with Levordashka): would you expect a chemist to be able to pick up any chemistry study in their field and replicate it (on their first try)? The difference is, I guess, that their studies take mixing a little baking soda and such, while ours require 100 people spending 20 minutes each, just to collect the data. 😀 3

No, we would not expect this; at least I would not. Even a specialist in the field, I would guess, needs more than a single try, even more than two or three tries, to get it right. Again, think of it this way: when you were in high school, did you get the chemistry experiments right each time? Maybe you were better than me at chemistry, but I did well enough to get here, and I definitely didn't do it right each time. And those were, basically, the simplest experiments possible, explained in the simplest ways possible, with a bunch of other people around you trying to do the exact same thing at the same time. I mean... that is a serious thing! 😀

Do I think that psychologists can (should?) do a better job of reporting their studies, being more honest about what is involved, and even sharing the exact materials they used? Yes, absolutely; it is even better for me as an individual researcher (if the effects replicate better and I am actually interested in doing real science). But to say that just because someone couldn't replicate an effect, that effect doesn't exist, or even that it isn't robust, is too far in my opinion. I've gotten a solid effect 5 times, and if you tried to tell me that you couldn't get it, I would absolutely tell you it is probably your fault. Really. 😀 When I do it, it works. But my work is not important enough to replicate (and I don't have enough of it, yet :,).


It IS a fact that some people are better at designing experiments (or coming up with ideas for experiments, or writing, etc.) than other people, and that is OK with me. These things are why we collaborate, exactly so we don't have to be great at every aspect. I am not sure how to solve the problem of expertise (or how it affects replication more generally), but I definitely think that to dismiss expertise, especially the expertise of some of the older generation in the field, who have been studying psychology for 50 or 60 years, would be foolish at best. Maybe the expertise is not in designing or doing high-powered experiments, but I could definitely see how that experience might influence participants to care more, or definitely help in designing the experiments (in the case of explicit conceptual replications, or trying to directly replicate studies without the original materials, consent form, or way of approaching participants). These things matter; I think we know this (e.g., from advertising).

In any case, I have to go; I am tired, and I want to go before I run out of things to say. 😀 Thank you for coming by, and I do wonder what you have to say about the things I've said! Best, Brett.



1: We have a quite replicable difference between two conditions, and we have even shown that we can 'switch' the person's behavior by putting one condition after the other. In our last study we made another tiny change, pushing it even further by putting the two conditions at the same time. Both effects work, so why not? But it didn't work. I am pretty sure it is because the participants figured it out (it reveals quite plainly a kind of stupid thing people do). The effect is significant in the first trial but not in any of the others, or all together. This is a 'conceptual replication' that failed, even with 'expertise' (at creating the original effect).

2: Footnotes are actually pretty useful for sidenotes, so ya, thanks Simine 😀

3: Psychology is harder than chemistry? 😀 just teasing! but I think so. (that's why I do it?) 😀


Let me know what you think! 😀 On Twitter (@brettbuttliere) or here, or where ever. Thanks again! 🙂

The truth behind the Gilbert et al critique

There is much discussion going on concerning recent critiques and critiques and critiques of critiques of critiques (which I love and find very exciting, like most people!), but amongst all the fighting and such, I want to point out at least one bit of truth that exists in the Gilbert et al retreat.

As they make clear in their reply to the reply (referred to as a critique by those trying to cause drama), the main concern of Gilbert et al is that the RP:P project cannot report on the replicability of psychological science in total (as the title of the paper suggests). They suggest (as I will explain below) that this is an overgeneralization that would not be allowed in a normal journal, let alone in Science, with all of the etc. that happened.

Setting aside that each 'participant' cost thousands of dollars to collect data on, at its core this is a (meta)study with a single measurement and 100 'participants.' OK, that is great and fine, but the conclusion from it has been generalized to the entire population (of studies in psychological science). To quote from the social media post where the idea originally came to me: it is essentially like an author taking 100 students from three elementary schools, examining how many of them have blue eyes (or any other specific characteristic that can be categorized dichotomously), and then writing a paper titled 'estimating the prevalence of blue eyes among the world's population.' I mean, the evidence just cannot uphold the claim being made from it. That doesn't mean the study itself is bad, just that it is being used to say more than the evidence can support, which is a very normal thing among human scientists.

Actually, though, this blue eye comparison is not even quite right, because the trait being measured is not really dichotomous, and it would need to be able to change according to the area of the world it is in (i.e., replicability changes across areas), or even by how the anchors of the questionnaire scale are worded (i.e., tiny situational variables can strongly affect the results; at least most psychologists believe so).

Does that make sense? The conclusion of the study was that only 40 something percent of *psychological studies* are replicable (the title of the paper was ‘estimating the replicability of psychological science’), but the studies that were replicated in the OSC paper came from a single year of three (of the leading) journals in psychology.

In light of this, I (and I think Gilbert et al) would suggest that a far more appropriate generalization from the data would be something like: only 40-something percent of psychological studies from these three journals, in the year 2008, are replicable. And that is IF we choose to go beyond simply describing what we found (similar to what researchers studying drug abuse in a single city would do).

More than this, one could even suggest that the sample the OSC utilized is one which would have especially LOW replication rates, exactly because these are the leading journals in the field (i.e., Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory, and Cognition). These are the journals people are most likely to cheat to get into; they publish the flashiest, most controversial, most uncertain and exciting findings (there is even a citation I saw once saying that higher-JIF papers have lower power). The fact that the OSC found differences across the journals indicates that the sample matters.

So, to take it to the extreme (if we are studying cocaine use in a city or the entire population), the OSC team went into the 'ghetto', tested a sample of 100 people for cocaine (our dichotomous variable), found the percentage, and then extrapolated this to say something about the percentage of people in the world (of psychological studies) that use cocaine. OK, I know that is wayyyyy too far, but it will give the snarkers something to snark about and maybe get some extra shares. 😀 and actually it gets the point across nicely.
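The sampling-frame point can be made concrete with a toy simulation. All the numbers here are made up purely for illustration (there is no claim that 30% or 60% are the real rates); the point is only that sampling from a subpopulation with an unusual rate and then generalizing gives a biased population estimate:

```python
import random

random.seed(0)  # fixed seed so the toy example is reproducible

# Hypothetical population of 10,000 "studies": 20% come from flashy
# top-tier journals where (by assumption) only 30% replicate; the
# other 80% of studies replicate at an assumed 60%.
population = ([("top", random.random() < 0.30) for _ in range(2000)] +
              [("other", random.random() < 0.60) for _ in range(8000)])

true_rate = sum(rep for _, rep in population) / len(population)

# Sample 100 studies from the top-tier journals only (the analogue of
# sampling one "neighborhood") and generalize to "all studies".
top_only = [rep for tier, rep in population if tier == "top"]
sample = random.sample(top_only, 100)
estimate = sum(sample) / len(sample)

print(f"true overall rate:         {true_rate:.2f}")
print(f"top-journal-only estimate: {estimate:.2f}")
```

Under these made-up numbers the top-journal-only estimate lands near 30% while the true overall rate is near 54%, which is exactly the shape of the Gilbert et al objection: a fine measurement, but of the wrong population.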

But really, nobody (here at least) is saying that the study was poorly run or otherwise bad, assuming they are only trying to say something about the specific corner or neighborhood that they sampled from (the corner where it is most likely to happen). It just can't support the claim it was suggested to support. Certainly I am not saying this was done with malicious intent, or whatever bad thing you will want to say I said (let me be explicit). The people I know know that I respect them, even if I criticize them and their science (the way they are criticizing others' science?).

The thing is that we are all human, and this is the real problem; we have these biases and tendencies, which especially come out when we are in large groups.

Psychological scientists especially (most of the OSC researchers were psychologists) have a tendency to overgeneralize their research findings, sometimes inappropriately. [To quote again from the social media conversation:] This overgeneralizing is a problem, I think most psychologists agree, and one that the OS community has been working to demonstrate (e.g., by showing that these effects are more fragile than we thought). The thing is that in another 10 years a different group of researchers will come along, sample a different 100 papers (e.g., from different years or journals or areas), and conclude that the OSC project cannot be replicated (potentially because they chose a different sampling frame within the population).

That was essentially the last post in the conversation (at least for now, about 30 minutes later [and again the next day, no new posts]).


What do you think? Does it make sense? I am still not sure it is right, but I do think it is worth you thinking a bit about and potentially telling me I am wrong! I think it might be right. Again, just to be sure, I want to reiterate that this does not mean the OSC study was poorly done as an estimate of the replicability of three psychology journals for 2008, but I do think it might be a bit light on data to say something about ‘the replicability of psychological science’, in total (which is really the core of the point Gilbert et al have been arguing, I would say).


Anyways, let me know what you think down below, or come yell at me on twitter like everyone else does. 😀 Thanks for coming by! 😀