Open Science

The parable of the NBA replicators

The normal players in the NBA could see that the Michael Jordans and Kobe Bryants had once scored 80 points in a single NBA game (20 years ago), and wanted to replicate such results for themselves. After all, this was an excellent outcome, really something to talk about, and they wanted some of it themselves.

Unfortunately, try as they might, they could not replicate that result; they could not get the 80 points in a game. No matter how hard they tried, they could only get 30 to 40 points in a game, at best. Not being able to do it on their own court, no matter how many times they tried, they traveled to other courts, and even asked many of their friends to try to get 80 points in a night, and even to publicly claim that they would try to get the 80 points in their next game. They felt that if they logged their intentions of getting 80 points in a game and how they would do it, and had many people look, it would either make success more likely or cast doubt on the original 80-point performance.

Again unfortunately, even players who were the best at doing other things, those who could get 30 rebounds or 35 assists in a night, could not get the 80 points in a night.

It was dismaying to everyone around, and the average and bench players began to cast doubt upon the original performances, saying it couldn’t be done anymore, or even that the original was somehow faked.

This was very dismaying to the players who had achieved such a laudable result. Unfortunately, when they tried to assert that they had actually achieved the desired 80 points in a game, they were met only with scorn and with suggestions that, if they wanted people to believe them, they should publicly register that they would recreate that 80-point game in the next match they played (even though the original performance was now more than 20 years ago, with a different team, and MJ had since had 2 surgeries on his knee).

And so the average players called a march, and demanded that drastic changes be made in the league, such that the only things that were valued were those things that could be easily replicated by anybody on any court.

The end.

Let’s pretend we all agree priming old doesn’t make people walk slower, now what?

Over the last few years there has been an inordinate amount of attention and resources dedicated to examining whether subtly priming people with old concepts can make them walk slower (no citation needed). It has gone on for years and taken pages and pages of our limited journal, blog, and feed space.

Yesterday, I was talking with Brent Donnellan and Uli Schimmack in the Psychological Methods Discussion Group and they suggested that it is important to determine whether old primes make people walk slower. (also look at the way that Donnellan entered his comment with only 1 sentence and then came back later and finished the comment after Uli and I were done talking; this is something I will have to watch out for in the future, especially as the time nuances get lost later)

Anyways, rather than again having some empty argument about replicability, regurgitating all the same old party lines, I simply agreed with them that making people think about old either doesn’t make them walk slower, or does so only in some contexts. I’m ok with saying that and I think you should be too (there are at least a few failed replications, so it doesn’t ALWAYS work, even if it does).

Then the question is.. now what?

So we said that, but what did we gain? Very little, in my opinion.

There are almost no contexts in which this old-slow link matters. I mean, when does it matter whether this link exists or not? Never! 😀 The point of the experiment (as also said by Bargh, Chen, and Burrows) is to say something about how our thoughts and behaviors depend upon factors such as what is salient in our mind at the time.

And this, I would say, is fairly well established: the exogenous emotions literature, stereotype threat, and the IAT all depend on some sort of stimulus priming a certain behavior (which, in the case of the IAT, then makes it harder for people to do the opposite).

What would it mean for (Social) Priming Theory to throw out this experiment/ paradigm?

..almost nothing as far as I can tell.

Even the original Doyen et al. (2012) failure to replicate says nothing about the theory that Bargh et al. use the study to support. The whole discussion is about how poor the methods are and how we need to do better. Saying the methods can be better is something I am totally ok with, and I would even suggest that this may be why it is presented in a chapter, rather than in its own paper.

The idea they are suggesting stands with or without this paper. Or does it? This is what I would like to ask and this is the point of the blog post. I see little value in endlessly debating the (un)reality of the ability for old primes to make people walk slower, unless it says something for the theory, but it doesn’t (as far as I can tell).

What part of the theory is at risk here?

What makes this study matter? So far as I can see, it says nothing novel and has relatively little value, theoretically speaking (and even those decrying the study have said little about its theoretical implications). Hold it up as an example of bad methods, that is fine with me, but we are not moving ourselves forward by saying that this effect does not replicate or exist (as far as I can tell). It has no implications (or please point them out!).

Let us define which aspect of the theory (of which this is just an example) is vulnerable and then examine the literature to see if the notion is supported elsewhere. My guess is that it will be.

Would it actually matter at all? It doesn’t seem like it.

  • Would we say it calls into question that unconscious stimuli can affect our later behavior? That seems.. far. Certainly I would not feel comfortable saying that there are no examples where stimuli we don’t ‘consciously’ experience change our behaviors. What about nudges, or the IAT? These are essentially demonstrations of a stimulus making a certain response more likely (which also makes it harder to do the opposite, in the case of the IAT).

  • Is it that the stimulus is social? It does not seem absurd to say that our behavior can be unconsciously changed by the people that we (expect to) interact with. After all, I suspect that (at least sometimes) you change the clothes that you will wear based upon who you expect to interact with throughout that day. Even the IAT is social and about prejudices.
  • Is it that the people are not actually there? Neither are Santa or ghosts, but they change some people’s behavior! 😀

Once we know which part of the theory behind the potentially false study is being questioned, we will be on our way to making real progress in determining whether there is truth there or not.

Until then, let’s just agree to do better in the future, and we can even pretend that the prime is false, because it will have no substantial effects upon the broader theory (of which this study is only an example).

In sum

I am not as interested as Brent or Uli in determining whether old primes make people walk slower. To me, this is not an important question unless it says something about the theory, but nobody seems to be arguing that (social) priming doesn’t exist. So I wonder how much it actually matters. 😀 Maybe instead we should move on to something more interesting, like how we can use science to improve science, or understanding why many of the female pronouns are longer than the male pronouns (except in the family setting).

me priming you to like this post, or does it not work?! we’ll never know if we keep up like this.

Re-examining the core of science.

There are real problems facing science today. While many surface level changes have been made, here we explore a deeper transformation of the discipline.

The changes suggested so far are relatively surface level (getting scientists to use a ‘social’ science website). Though they are psychologically informed and thus likely to work, they don’t get to the real core of the problem (human greed), as they all come after a piece of work is published.

Here we reexamine the peer review system at its core, in order to see if we can design a more efficient, well-functioning system. These decisions (about the core) are most prone to small design flaws which, over the years, will grow into bigger and bigger problems (as the current issues have).

It is imperative that we have a spirited debate about the specifics outlined below and not believe that our decisions are set in stone when we make them. Only continual maintenance of the system can ensure fidelity over time (Blanchard & Fabrycky, 1990).

Nosek and Bar-Anan (2012) do an excellent job outlining a general proposal, and our system is designed with almost all of these suggestions at the core. That being said, we think more discussion should be had about the specific mechanism of peer review and the ways that we can use the information within a ‘social’ science website to aid review.

Specifically, the system knows how individuals’ papers do, who clicks on them, and where else they click; all of this information can be utilized in the selection of reviewers and in review generally. Nosek and Bar-Anan (2012) propose the following peer review mechanism:

“Instead of submitting a manuscript for review by a particular journal with a particular level of prestige, authors submit to a review service for peer review… The review service does not decide whether the article meets the “accept” standard for any particular journal. Instead, it gives the article a grade. Journals could retain their own review process, as they do now, or they could drop their internal review system and use the results of one or many review services.” (pg. 232)

Again, we would suggest the information within the system be used to facilitate review, as the system knows which authors are a part of which professional organizations and their impact factors. It knows who the authors of the paper interact with and who are the leaders in the field.

For instance, professional organizations could stipulate that in order for a paper to be considered for dissemination within that group, it has to have certain keywords and have had x number of members comment on or like it, including a few people with higher impact factors (professors or fellows in the organization). Note that the work is already public.
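A rule like this is simple enough to write down. The following is only a rough sketch in Python of how a group might state its standard; the class names, the fields, and the idea of counting each member once are our assumptions for illustration, not part of any existing platform or of Nosek and Bar-Anan's proposal.

```python
from dataclasses import dataclass


@dataclass
class Endorsement:
    """One comment or like from a group member (hypothetical structure)."""
    member: str
    is_senior: bool  # e.g., a professor or fellow in the organization


@dataclass
class GroupRule:
    """The standard a professional organization stipulates (hypothetical)."""
    required_keywords: set[str]
    min_endorsements: int  # the "x number of members" in the text
    min_senior: int        # the "few people with higher impact factors"


def is_eligible(paper_keywords: set[str],
                endorsements: list[Endorsement],
                rule: GroupRule) -> bool:
    """Return True if the paper meets the group's predetermined standard."""
    # Count each member only once, however often they interacted.
    unique = {e.member: e for e in endorsements}
    senior_count = sum(1 for e in unique.values() if e.is_senior)
    return (rule.required_keywords <= paper_keywords
            and len(unique) >= rule.min_endorsements
            and senior_count >= rule.min_senior)
```

Each organization would pick its own keywords and thresholds; the point is only that the check can run automatically on information the system already has.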

If the paper meets the predetermined standard, it is automatically sent to a selection of individuals from the group, who are likely to be interested (because they have interacted with similar papers). Algorithms can be set up within the group to assess the reaction to the paper (based on likes/ ratings, comments, shares).

If the paper is received well within the community, it is then sent to a larger portion of the community (similar to virality on current social media). Each new level achieved upgrades the ‘stamp of approval’ (e.g., Bronze, Silver, Gold; Nosek et al., 2012) on the paper, which can then be used as another metric beyond the impact factor.

One further addition we would like to make to Nosek’s proposal is the ability to award, at the end of the year (or decade), extra badges for the top ten (or top 100) papers published in a particular (sub)domain. These collections could be put together for any aspect of the paper, could be printed, and would provide something to aim for in the creation of content (besides high impact).

This is certainly not exactly what the system will look like in its final iteration, but no matter what, if there is an online utility involved, we should be using the information within that system to maximize utility and inform the review process.
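To make the staged rollout a little more concrete, here is a minimal sketch in Python. The badge names come from the text; the scoring weights, the 0.25 threshold, and all function and class names are made up for illustration and would need real debate before anyone used them.

```python
from dataclasses import dataclass

# Badge names follow the text; index 0 means no badge has been earned yet.
BADGES = ["Unrated", "Bronze", "Silver", "Gold"]


@dataclass
class Reaction:
    """How one audience (one rollout stage) responded to the paper."""
    views: int
    likes: int
    comments: int
    shares: int


def reception_score(r: Reaction) -> float:
    """Engagement per view. The 1/2/3 weights are illustrative assumptions."""
    if r.views == 0:
        return 0.0
    return (r.likes + 2 * r.comments + 3 * r.shares) / r.views


def earned_badge(stage_reactions: list[Reaction], threshold: float = 0.25) -> str:
    """Advance one level per well-received stage; stop at the first weak one."""
    level = 0
    for reaction in stage_reactions:
        if level == len(BADGES) - 1 or reception_score(reaction) < threshold:
            break
        level += 1
    return BADGES[level]


# A paper that does well with the first small audience but not the second.
stages = [Reaction(views=100, likes=20, comments=5, shares=2),
          Reaction(views=500, likes=30, comments=5, shares=1)]
print(earned_badge(stages))  # Bronze
```

Scoring per view rather than by raw counts is a deliberate choice in this sketch: it keeps a paper shown to a larger audience from advancing simply because more people happened to see it.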

Do you agree? Is there anything you would add or change? Leave it below! 😀

Don’t forget to find us on Facebook.

Citations can be seen here.

A new scientific communication system, in brief

The goal is to take the best existing suggestions in the literature (Nosek et al., 2012; Priem et al., 2010; Giner-Sorolla, 2012; Frey, 2005; Skinner, 1976; Deci, 1971; Legris et al., 2003; Thaler et al., 2008) and design one system that is coherent, easy to use, and conducive to good science (because it makes sense psychologically). If this is your first article in the series, read more about the motivation for change or the motivating principles behind this system at the links.

In one sentence, we are looking to design a Facebook for scientists or a Reddit for research: a profile, a feed of stories, a sophisticated like/ comment section, and a new set of impact metrics which make use of the available information, allowing us to realign individual and group motives.

The basic unit of the system is the academic profile; here researchers post papers, drafts of papers, datasets, syntax files or other content which can be viewed, liked, and commented on by the scientific community at large. This and content from others (collaborators & professional groups) can be viewed in the profile feed, keeping scientists up to date on all the latest advancements in their field(s).

The fundamental reinforcements are the notifications which we receive when others interact with our content and the quality content provided in the newsfeed. These are the exact same things that make Facebook, Twitter and Instagram so phenomenally successful, yet they have not been integrated into any science platform that I know of. The feed is crucial in getting scientists the best content (thus increasing utility) and the notifications make us feel good (again, increasing utility). The key to the whole thing is low effort with high utility.

The comment section is the place to discuss a paper and how it relates to the other literature; this is where the debate about the paper’s conclusions and assumptions takes place. This is also where the people who checked the stats, re-ran the analyses, or replicated the results comment with their findings. This kind of discussion is common in topical Facebook groups, though these communications could be utilized much more efficiently. As in other online outlets (e.g., ResearchGate, PPPR), comments can be viewed by impact (most liked/ sub-commented), date, or other metrics.

The utility comes from the information within the ‘social’ system.

Not only does the computer use information about your clicks and comments to place things in the feed (bringing you articles you are interested in; Bian, Liu, Zhou, Agichtein, & Zha, 2009), the information can also be used to determine which individuals find or create quality content (e.g., content that generates likes, downloads, and further discussion).

The system knows who uploads content that gets many return comments or shares. The system knows who makes comments on others’ work that receive many likes and response comments. The system knows which thinkers are in which fields, how often papers are linked together, which papers generate discussion, which syntax protocols or data files are often downloaded, which authors or keywords are trending, and much more.

This information can be used for many things; for instance, to develop ‘network maps’ of the research space, which could help theory development, newcomers to the field, or science researchers generally. These maps could also be constructed for authors, papers, keywords, or many other pieces of the science game.
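As a rough illustration of what a keyword version of such a map could be built from, the sketch below counts how often two keywords appear on the same paper. The input format (paper id mapped to its keywords) and the function name are assumptions for the example; a real system would likely use richer signals such as clicks and citations.

```python
from collections import Counter
from itertools import combinations


def keyword_map(papers: dict[str, list[str]]) -> Counter:
    """Count, for every pair of keywords, how many papers carry both.

    The resulting counts can be read as weighted edges of a network map.
    """
    edges = Counter()
    for keywords in papers.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


papers = {
    "paper-1": ["priming", "replication"],
    "paper-2": ["priming", "replication", "methods"],
}
for pair, weight in keyword_map(papers).most_common():
    print(pair, weight)
```

The same counting approach would work for authors (who publishes with whom) or papers (which are downloaded together), just by swapping what goes into each list.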

More importantly, this information can be used to reward pro-group behaviors that are currently undervalued and thus not done. The current impact metric is how often the author’s paper gets cited; this could easily be nuanced to include clicks, comments, shares, downloads, and formal citations on all content, including uploaded syntax files, datasets, or comments on others’ work that elicit more discussion (Altmetrics; Priem, Taraborelli, Groth, & Neylon, 2010).

For instance, let’s say someone uploads an interesting, easy-to-use dataset into the system (e.g., book reviews for 5,000 Goodreads profiles, 1,000 bank employee satisfaction surveys). If many people like, share, download, and use the dataset, it could be made to help their impact metric as much as (or more than!) a merely average paper. If one confirms that the statistics make sense with the data, or reports statistical errors in others’ papers, that could be made to help their impact factor. If one consistently makes interesting comments on others’ work, that could be made to help their impact factor too (endorsing good reviews). In this way, these behaviors become relevant and interesting for the individual to do and the group benefits enormously.

These steps not only alleviate the need to cheat (by broadening the reward to things other than perfect, novel papers), they make it much harder to do so. For instance, under this system, it is reasonable to imagine scientists making their entire careers by certifying, with their badge of approval, that the statistics and data in a paper are valid, because the community comes to trust these individuals and their posts are appreciated with likes and sub-comments.
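Here is a minimal sketch of how such a broadened metric could be tallied. The point values are invented purely for illustration (choosing them is exactly the kind of thing that needs debate), and the function and event names are hypothetical.

```python
from collections import defaultdict

# Hypothetical point values per interaction; whole numbers keep the sums exact.
POINTS = {"citation": 100, "download": 20, "share": 10,
          "comment": 10, "like": 4, "click": 1}


def impact_by_content(events: list[tuple[str, str]]) -> dict[str, int]:
    """Sum points per content type for one researcher.

    Each event is a (content_type, interaction) pair, e.g. ("dataset", "download").
    Interactions without a point value are ignored.
    """
    totals: dict[str, int] = defaultdict(int)
    for content_type, interaction in events:
        totals[content_type] += POINTS.get(interaction, 0)
    return dict(totals)


# A well-used dataset next to a paper with three citations.
events = ([("dataset", "download")] * 40
          + [("dataset", "like")] * 10
          + [("paper", "citation")] * 3)
scores = impact_by_content(events)
print(scores)                # {'dataset': 840, 'paper': 300}
print(sum(scores.values()))  # 1140
```

Under these made-up weights the popular dataset counts for more than the modestly cited paper, which is the outcome the example in the text is after.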

The same principle holds for making interesting datasets available, designing useful protocols or holding virtual office hours/ discussion sessions on the profiles (like a Google hangout). It becomes worthwhile to upload the data and syntax not because you need to in order to publish, but because it helps your ‘impact factor’ when individuals download these documents. Suddenly, doing the good thing by the group is not utility negative and it will stick.

The system outlined so far could be implemented without changing the fundamental peer review system. The proposed changes will improve things by encouraging (through ease and likes/ comments) open practices and endorsing group-centered behavior, but only adding this to the current system does not adequately deal with the need for competition (as this comes only after a paper has been published), the time papers spend sitting on desks, or the excess cost of the current system (Rennie, 1999; Edlin & Rubinfeld, 2004). It is time to re-examine the core of the system in light of the advances afforded by the system.

Citations can be read here.

What do you think, could a system like this work? What would it take to bring it about? Leave a comment below! 😀