Author: Brett Buttliere

Find a significant effect in any study

Too much yall :,D too much


Stephen Politzer-Ahles is Assistant Professor at the Department of Chinese and Bilingual Studies of The Hong Kong Polytechnic University. He is committed to finding solutions to current challenges in the cognitive sciences. For instance, he is developing efficient and transparent strategies to empty out his own file drawer.

p>.05. We’ve all been there. Who among us hasn’t had a student crying in our office over an experiment that failed to show a significant effect? Who among us hasn’t been that student?

Statistical nonsignificance is one of the most serious challenges facing science. When experiments aren’t p<.05, they can’t be published (because the results aren’t real), people can’t graduate, no one can get university funding to party it up at that conference in that scenic location, and in general the whole enterprise falls apart. The amount of taxpayer dollars that have been wasted on p>.05 experiments is frankly…


The parable of the NBA replicators

The normal players in the NBA could see that the Michael Jordans and Kobe Bryants had once made 80 points in a single NBA game (20 years ago), and wanted to replicate such results for themselves. After all, this was an excellent outcome, really something to talk about, and they wanted some of it themselves.

Unfortunately, try as they might, they could not replicate that result; they could not get the 80 points in a game. No matter how hard they tried, they could only get 30 to 40 points in a game, at best. Not being able to do it in their own court, no matter how many times they tried, they traveled to other courts, and even asked many of their friends to try to get 80 points in a night, even to publicly claim that they would try to get the 80 points in their next game. They felt that if they logged their intentions of getting 80 points in a game and how they would do it, and had many people look, it would make it more likely, or cast doubt on the original 80 point performance.

Again unfortunately, even players who were the best at doing other things, those that could get 30 rebounds in a night, 35 assists, could not get the 80 points in a night.

It was dismaying to everyone around, and the average and bench players began to cast doubt upon the original performances, saying it couldn’t be done anymore, or even that the original was somehow faked.

This was very dismaying to the players who had achieved such a laudable result. Unfortunately, when they tried to assert that they had actually achieved the desired 80 points in a game, they were met only with scorn and suggestions that, if they wanted people to believe them, they should publicly register that they would recreate that 80 point game in the next match they played (despite the fact that the original performance was now more than 20 years ago, with a different team, and MJ had since had 2 surgeries on his knee).

And so the average players called a march, and demanded that drastic changes in the league be made, such that the only things that were valued were those things that could be easily replicated by anybody on any court.

The end.

Identifying Impact

is the title of a game we are developing, examining whether people can tell how much impact a scientific paper has based upon its title and abstract.


The idea is to test how well people can identify scientific impact by the titles alone. Not only can we correlate people's scores with actual impact metrics, we can examine which metrics are the best indicators of what people think is best or most interesting.

More than this, we can examine how much of the eventual impact of the title can be predicted utilizing e.g., expert or layperson perceptions about the value of the work from the title alone.

Finally, the study opens the ability to examine whether person differences affect e.g., what people find interesting, or how well the scores are related with commonly utilized impact metrics.


More generally, it will be made available so that other researchers can utilize the template to run similar studies of their own.
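To make the 'correlate people's scores with actual impact metrics' step a bit more concrete, here is a minimal sketch of how that comparison could look. It is only an illustration, not the analysis we will necessarily run: the ratings, the citation counts, and the second 'attention' metric are all made-up numbers, and the function names are hypothetical. It uses a rank (Spearman) correlation because impact metrics like citation counts tend to be heavily skewed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL data: one entry per paper title shown in the game.
mean_rating = [4.2, 3.1, 2.5, 4.8, 1.9, 3.6]   # average player rating
citations = [120, 45, 60, 300, 5, 30]          # made-up citation counts
attention = [15, 80, 2, 40, 1, 22]             # made-up second metric

# Which metric tracks what people find interesting more closely?
for name, metric in [("citations", citations), ("attention", attention)]:
    print(name, round(spearman(mean_rating, metric), 3))
```

The same comparison could be repeated within groups of players (e.g., experts versus laypeople) to look at the person differences mentioned above.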




Evolution in action – super resistant bacteria


A neat little video about how bacteria evolve, but I also want to point out how they are basically creating super mutant bacteria in like 11 days, yes?


Also watch from about 1:19, where it shows the shortest route from the original mutation to the first one that breaks into the 10 and 100 times antibiotic. And notice how the ones that first really broke into the 1000 times all stuck together really well.


Context, Theory, and Expertise


Had a very good conversation here, and I think I explained quite well the problem with saying that (even many) perfectly failed direct replications mean that the original was probably wrong and/or that the theory needs to be reevaluated. The reasons are, essentially: Context, Theory, and Expertise.


This is just the same as in the picture, just a little more worked out:

  1. is again context stuff. You can use literally the exact same material and have it mean different things. The exact same hand gesture can mean different things across cultures.
  2. is the difference between a specific manipulation and the theory. The Strack study not replicating does not mean that smiling when we are unhappy will not make us happier (even if it is just placebo or experimenter effects in the world). It is clear, I think, that even if holding a pen in your mouth doesn’t make you happier or cartoons funnier, that doesn’t mean that every instance of facial feedback doesn’t exist, or even more generally that there is no such thing as facial feedback.
  3. is that even Kobe Bryant and Michael Jordan could not hit 80 points on demand, and certainly many other people cannot even do it once! 😀 Even though we have video of them doing it and so it doubtlessly happened. Maybe the team was different, maybe they didn’t feel as good. Sure, we can agree it is not generalizable then, but it also doesn’t mean it didn’t happen or that it is no longer evidence for the theory. If you don’t believe in expertise in psychology, you should read my last blog post, which was exactly on that subject (tl;dr: it’s a real thing).


And I find those three reasons convincing enough to at least not dismiss the original study and theory outright. Number 2 especially seems strong to me. What do you think?








On Expertise

What do you think about Expertise (within the field of Social Psychology)? From recent discussions it would seem that Any (social) psychologist should be able to ‘reproduce’ Any study within (social) psychology, and I am not sure that this is really the case. It IS a simple fact of the matter that people differ in how good they are at designing experiments, experimental protocols, and the things that the participants see.

Look at it this way: Some psychologists can’t even get participants to take their experiments, let alone really pay attention or care. That is a real thing. How can you expect to reproduce something if you can’t get participants to even take your experiments?

This point is especially driven home by my own research (and also many other people’s research).[1] Small effects are changed by small things. The wording of a single sentence, even a single word (!!!!), can seriously influence the outcomes of a study. How researchers treat their participants, from the first instance, in the consent form or the contact or the first email contact, conveys to them how much you care about their time, what they think the study is about, and ultimately affects how much they care about the study (and these things are likely to affect the results you get).



THE THING IS (from a conversation I had tonight with Levordashka): would you expect a chemist to be able to pick up any chemistry study in their field and be able to replicate it (on their first try)? The difference is, I guess, that their studies take mixing a little baking soda etc., and ours require 100 people spending 20 minutes each, just to collect the data. 😀 [3]

No, we would not expect this, at least I would not. Even a specialist in the field, I would guess, needs more than a single try, even more than two or three tries, to get it right. Again, think of it this way: When you were in high school, did you get the chemistry experiments right each time? Maybe you were better than me at chemistry, but I did well enough to get here, and I definitely didn’t do it right each time. And those were, basically, the simplest experiments possible, explained in the simplest ways possible, with a bunch of other people around you trying to do the exact same thing at the same time. I mean... that is a serious thing! 😀

Do I think that psychologists can (should?) do a better job of reporting their studies and being more honest about what is involved, and even sharing the exact materials they used? Yes, absolutely, it is even better for me as an individual researcher (if the effects replicate better and I am actually interested in doing real science). But to say that just because someone couldn’t replicate an effect, that effect doesn’t exist, or even that it isn’t robust, goes too far in my opinion. I’ve gotten a solid effect 5 times and if you tried to tell me that you couldn’t get it, I would absolutely tell you it is probably your fault. Really. 😀 When I do it, it works. But my work is not important enough to replicate (and I don’t have enough of it, yet :,).


It IS a fact that some people are better at designing experiments (or coming up with ideas for experiments, or writing, etc.) than other people, and that is ok for me. These things are why we collaborate, exactly so we don’t have to be great at every aspect. I am not sure how to solve the problem of expertise (or how it affects replication more generally), but I definitely think that to dismiss expertise, especially the expertise of some of the older generation in the field, who have been studying psychology for 50 or 60 years, would be foolish at best. Maybe the expertise is not in designing or doing high powered experiments, but I could definitely see how that experience might influence participants to care more, or definitely in designing the experiments (in the case of explicit conceptual replications, or trying to directly replicate studies without original materials, consent form, approaching them in the same way, etc.). These things matter, I think we know this (e.g., advertising).

In any case, I have to go, I am tired, and I want to go before I run out of things to say. 😀 Thank you for coming by and I do wonder what you have to say about the things I’ve said! Best, Brett.



1: We have a quite replicable difference between two conditions, and we have even shown that we can ‘switch’ the person’s behavior by putting one condition after the other. In our last study we made another tiny change, namely pushing it even further by presenting the two conditions at the same time. Both effects work, so why not? But it didn’t work. I am pretty sure it is because the participants figured it out (it reveals quite plainly a kind of stupid thing people do). The effect is significant in the first trial, but not in any of the others, nor in all of them together. This is a ‘conceptual replication’ that failed, even with ‘expertise’ (at creating the original effect).

2: Footnotes are actually pretty useful for sidenotes, so ya, thanks Simine 😀

3: Psychology is harder than Chemistry? 😀 Just teasing! But I think so. (That’s why I do it?) 😀


Let me know what you think! 😀 On Twitter (@brettbuttliere) or here, or wherever. Thanks again! 🙂

The truth behind the Gilbert et al critique

There is much discussion going on concerning recent critiques and critiques and critiques of critiques of critiques (which I love and find very exciting, like most people!), but amongst all the fighting and such, I want to point out at least one bit of truth that exists in the Gilbert et al critique.

As they make clear in their reply to the reply (referred to as critique by those trying to cause drama), the main concern of Gilbert et al is that the RP:P project cannot report on the replicability of Psychological Science, in total (as the title of the paper suggests). They suggest (as I will explain below) that this is an overgeneralization which would not be allowed in a normal journal, let alone in Science with all of the etc that happened.

Taking away that each participant cost thousands of dollars to collect data on, at its core, this is a (meta)study with a single measurement and 100 ‘participants.’ Ok, that is great and fine, but the conclusion from the study has been generalized to the entire population (of studies in psychological science). To quote from this social media post, where the idea originally came to me: it is essentially like an author having 100 students from three elementary schools, examining how many of them have blue eyes (or any other specific characteristic that can be categorized dichotomously) and then writing a paper titled ‘estimating the prevalence of blue eyes among the world’s population’. I mean, the evidence just cannot uphold the claim being made from it. That doesn’t mean the study itself is bad, just that it is being used to say more than the evidence can support, which is a very normal thing among human scientists.

Actually though, this blue eye comparison is not even actually correct, because the dichotomous trait being measured is not really dichotomous and would need to be able to change according to the area of the world it is in (i.e., changes in replicability across area) or even by how the anchors of the scale on the questionnaire are scaled (i.e., tiny situational variables can strongly affect the results; at least most psychologists believe so).

Does that make sense? The conclusion of the study was that only 40 something percent of *psychological studies* are replicable (the title of the paper was ‘estimating the replicability of psychological science’), but the studies that were replicated in the OSC paper came from a single year of three (of the leading) journals in psychology.

In light of this, I (and I think Gilbert et al) would suggest that a far more appropriate generalization (if we choose to generalize; we could simply describe, like researchers studying drug abuse in a city would) from the data would be something like: only 40 something percent of psychological studies from these three journals, in the year 2008, are replicable. And that is IF we choose to go beyond simply describing what we found (similar to what researchers studying drug abuse in a city would do).

More than this, one could even suggest that the sample the OSC utilized is one which would have especially Low replication rates, exactly because they are the leading journals in the field (i.e., Psychological Science, Journal of Personality and Social Psychology and Journal of Experimental Psychology: Learning, Memory, and Cognition). These are the ones that people are most likely to cheat to get into, they are the flashiest, the most controversial, uncertain and exciting (there is even a citation I saw once to say that higher JIF papers have lower power). The fact that they found differences across the journals indicates that the sample matters.
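Here is a tiny sketch of what I mean, with numbers I made up purely for illustration (they are NOT estimates from the OSC paper or anywhere else). It does two things: it puts an approximate 95% Wilson interval around a rate like '40 out of 100 replicated', and it shows how the rate for the whole literature could look quite different if the unsampled journals replicate at a different rate. The function names and every share or rate below are assumptions.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def weighted_rate(strata):
    """Overall rate from (share_of_literature, replication_rate) pairs."""
    total_share = sum(share for share, _ in strata)
    return sum(share * rate for share, rate in strata) / total_share


# HYPOTHETICAL: 40 of 100 sampled studies replicate.
low, high = wilson_interval(40, 100)
print(f"sampled frame: 0.40, interval roughly {low:.2f} to {high:.2f}")

# HYPOTHETICAL strata: the sampled journals are a small share of the
# literature, and the rest replicates at some other (unknown!) rate.
sampled_journals = (0.05, 0.40)    # (share of literature, replication rate)
everything_else = (0.95, 0.60)     # pure assumption for illustration

overall = weighted_rate([sampled_journals, everything_else])
print(f"whole literature under these assumptions: {overall:.2f}")
```

The interval only describes the uncertainty inside the sampled frame (three journals, one year). It says nothing about the strata that were never sampled, which is exactly why the overall rate here can sit outside it.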

So, to take it to the extreme (if we are studying cocaine use in a city or the entire population), the OSC team went into the ‘ghetto’, tested a sample of 100 people for cocaine (our dichotomous variable), found the %, and then extrapolated this to say something about the % of people in the world (of psychological studies) that use cocaine. Ok, I know that is wayyyyy too far, but it will give the snarkers something to snark about and maybe get some extra shares. 😀 And actually it gets the point across nicely.

But really, nobody (here at least) is saying that the study was poorly run or otherwise bad, assuming they are only trying to say something about the specific corner or neighborhood that they sampled from (the corner where it is most likely to happen). It just can’t support the claim it was suggested to support. Certainly I am not saying this was done with malicious intent, or whatever bad thing you will want to say I said (let me be explicit). The people I know know that I respect them, even if I criticize them and their science (the way they are criticizing others’ science?).

The thing is that we are all human, and this is the real problem; we have these biases and tendencies, that especially come out when we are in large groups.

Especially psychological scientists (most of the OSC researchers were psychologists) have a tendency to overgeneralize their research findings, sometimes inappropriately. [to quote again from the social media conversation] This overgeneralizing is a problem, I think most psychologists agree, and one that the OS community has been working to demonstrate (e.g., by showing that these effects are more fragile than we thought). The thing is that now in another 10 years a different  group of researchers will come along, sample a different 100 papers (e.g., from different years or journals or areas) and conclude that the OSC project cannot be replicated (potentially because they chose a different sampling frame within the population).

That was essentially the last post in the conversation (at least for now, about 30 minutes later [and again the next day, no new posts]).


What do you think? Does it make sense? I am still not sure it is right, but I do think it is worth you thinking a bit about and potentially telling me I am wrong! I think it might be right. Again, just to be sure, I want to reiterate that this does not mean the OSC study was poorly done as an estimate of the replicability of three psychology journals for 2008, but I do think it might be a bit light on data to say something about ‘the replicability of psychological science’, in total (which is really the core of the point Gilbert et al have been arguing, I would say).


Anyways, let me know what you think down below, or come yell at me on twitter like everyone else does. 😀 Thanks for coming by! 😀

When do replications become wasteful?

This writing came about when Dalmeet Singh Chawla emailed me concerning a tweet about the potential slowing of science by engaging in too many replications and not discovering enough novelty. It eventually led to this article, but as most of it didn’t get in (only the main quote), I will put the rest of it here.

The main thing to take away from all this is that we actually do know some things about people, and replicating those things, or re-branding them as new, occurs way too often and substantially hinders (in my own opinion, at least) moving forward as a field. From the classic ‘confirmation bias’ to modern day ‘self esteem’, effects, biases, and processes have a dozen names, each with their own set of studies supporting essentially the same conclusion or underlying construct.

Yes, doing novel and revolutionary work is really difficult, but it is far too often that I go to a conference to see a prediction that most psychologists, if not most of the public, would make (e.g., being excluded makes people feel bad, people take the path of least resistance). The focus is now on making sure what we believe is actually true, with many ‘high powered’ replication attempts and large scale efforts, but what are the opportunity costs?


It is clear that replications become wasteful at some point; the question is when, and this is what I would like to talk about and what I wanted to talk about when I posted the target tweet.


Without further ado, his questions and my slightly edited/ added to answers (I only had a very brief time in which to answer him!).

  1. Could you please explain to me what prompted you to post this tweet?

Put simply, there is a reason the departmental colloquia are more or less mandatory and that people are generally paying only mediocre attention at conferences... and it is not because the work is extremely interesting!

Yes, interests differ, but I don’t think I would believe you if you said that you have never gone to a (potentially cross disciplinary) talk that demonstrated something that is already well established within your own field. Or demonstrated something that you would consider to be common knowledge and even slightly boring. It might have been done in a different population, with a different manipulation, even with a different name, but it was essentially something that was shown 50 or 100 years ago. I would say we’ve all been there, many times!

But this may not be because of a lack of work on the researcher’s part. Oftentimes different fields (e.g., sociology, psychology, communications, advertising, group theorists, basically everyone) contain literature supporting common underlying ideas from different angles. A good example here is social capital and equity theory, which both essentially indicate the amount of value people can extract from their network. Each area has hundreds of studies that support the notions, but each area is a silo of its own, rarely interacting with the others.

This is a major problem in my opinion, because we are essentially spending money demonstrating the same phenomenon over and over. And I would suggest that ‘rediscovering’ these things over and over, or arguing the minute details, becomes a waste of taxpayer money at some point.

  2. What exactly is the ‘stasis’ problem that Dorothy Bishop describes in her tweet?

Essentially, stasis is when a field is not making much progress or producing actually new knowledge, I would say. Philosophers of science talk about the best science as being conflicting or ‘creating waves’ (e.g., Galileo, Darwin) and this is basically a lack of that (more on this here and here).

There are, actually, more and less efficient ways of doing science; it is obvious that not publishing when something doesn’t work is bad, but so is doing the same thing over and over again without learning something new. It is important to learn new (real!) things.

  3. How does psychology face this problem and how do you think it can be resolved?

The concern now in psychology is that nothing is replicable and that we have basically constructed a house of cards, but this is simply not true. We Do know things about people (e.g., they seek value, they gather and utilize meaning to do so). Those two statements alone explain… a LOT of psychology. And honestly, most of the nonreplicable studies actively sought to demonstrate how the tiniest thing could make such a massive impact.

Priming is actually a great example. Essentially the idea is that small changes in the environment can lead to large (behavioral) differences. It is in some sense all of advertising.

But the thing is that these studies are actually designed to demonstrate how such a fragile little change can produce such behavior changes. The studies are designed to be fragile, as fragile as possible! 😀 Things that shouldn’t matter, actually matter, this is what they show.

Now, what is at issue is whether reminding people of being old makes them walk slower. Is that important? I would most respectfully say not really. Would we want to challenge priming in general? I think this would be a poor decision, but maybe. We can easily spend half a million dollars, or 100 million, of taxpayer money to determine whether or not old primes make people walk slow, but that seems like a waste to me! 😀

The way to solve it, I would suggest, would be first to utilize more social media for scientists. Just knowing what is going on, connecting with those in your field and sharing knowledge amongst the tweeps has a huge benefit in my own experience. More than that, I would suggest (as I did in this paper and a few other places) that we create a ‘special’ site for this. It would be a (functional and valuable!) social network for scientists that is designed in such a way, and has the tools, to aid the discovery and exchange of scientific knowledge. There are many great efforts, but this actually hurts the overall initiative as it splits the majority and thus value. Creating, remembering the passwords for, learning the how-to, and building a network up on 5 different sites is Much harder than doing it for one.

  4. I thought the trouble in psychology is lack of replication/reproducibility, no?

This is a problem, and this is what the focus is on, but it is also a problem to just keep redoing the same thing over and over again without learning anything new.

  5. What do you think is the answer to your question, i.e., is there a standard number of times a study needs to be replicated before it is accepted by a field?

No standard unfortunately, it is up to the individual scientists to choose good beliefs, and this is part of what makes a good scientist, I guess. – added after: we could do some sort of polling to determine the number of scientists who believe in a certain theoretical position or not, but this doesn’t mean the crowd is right. If we asked scientists if Galileo was right, we would conclude not! 😀

  6. Does it vary between different disciplines?

Definitely varies by discipline, again it is up to the individual scientists 🙂 – added after: one Super study in Physics should change the entire world’s mind, but in psychology it is not so easy as this, simply because what we study is more complex. Some things are the same (these are the things I study!) but many things are not, and we don’t Really know yet what changes and what doesn’t.





That’s all I will say for now! The resulting article is here but it doesn’t contain most of these ideas, which is why I put them here. 😀

Let me know what you think below or elsewhere!