Two Half-Remembered Replications
First author: Anonymous.
Recently I led a scientific methods workshop. The thesis of the workshop was that by getting hands-on experience reading and then immediately replicating simple psychology studies, participants would have a tangible experience to connect science to. This way, when they encounter studies in their day-to-day lives, they will be better able to empathize with the researchers who ran them, and therefore better able to understand science as a whole!
As an added bonus, any participants who want to be scientists or researchers will gain some simple experience. After the workshop I asked the participants’ permission to publish the results of their experiments on my blog, and they said yes. Some of the participants were children, so I didn’t publish anyone’s name.
Study 1
Abstract
We found that giving people an anchor number, by asking a question such as “Is Mount Everest taller or shorter than 2,000 feet?”, heavily influences their answer to an estimation question such as “What is your best estimate for the height of Mount Everest?”
Introduction
Study 1 was a replication of the first experiment in Measures of Anchoring in Estimation Tasks by Jacowitz & Kahneman, and it was a study that I helped to run. At the end of the conference I had memorized the results of this study, but I gave away the paper with our results to another experimenter. So although I remember the effect replicating so strongly that I didn't feel the need to run any statistical tests to confirm it, I don't remember any of the actual numbers, and I forgot to take a photo.
Original Study
Measures of Anchoring in Estimation Tasks was a study about quantifying the effectiveness of various anchor numbers. Here is an example of an anchor number: if you ask someone whether the tallest redwood is taller or shorter than 65 feet, you anchor them to the number 65, and they become more likely to estimate the height of the tallest redwood as closer to 65 feet. In this example, 65 is the anchor number. The concept comes up very often in sales and negotiations.
Measures of Anchoring gathered independent data on various anchor numbers for various questions and used that data to demonstrate a novel statistical test for anchoring. We did not use the statistical test; we only replicated the anchoring effect in estimation tasks.
Jacowitz & Kahneman defined an estimation task as answering the question “what is your best estimate for ________?” They used the question structure “Is ______ greater or less than [anchor number]?” to prime participants with an anchor number prior to asking them to complete estimation tasks.
Methods
We used four of the fifteen questions developed by Jacowitz & Kahneman, along with the high and low anchors they chose for each question.1
This meant we had a total of four anchor questions, each with two variants:
Is the length of the Mississippi River greater or less than [70/2,000] miles?
Is the height of Mount Everest greater or less than [2,000/45,500] feet?
Is the height of the tallest redwood greater or less than [65/550] feet?
Is the maximum speed of a housecat greater or less than [7/30] miles per hour?
Each anchor question was followed by the estimation question “What is your best estimate of _________?” For example: “What is your best estimate of the maximum speed of a housecat?”
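Jacowitz & Kahneman set each low and high anchor at the 15th and 85th percentile of estimates from a group who answered without any anchor. Here is a minimal sketch of that step. It assumes a nearest-rank percentile, which is one common convention; the paper's exact convention is not stated here. The helper names `nearest_rank_percentile` and `derive_anchors` are hypothetical, and the calibration numbers are made up, not data from either study.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method.

    ASSUMPTION: nearest-rank is one common convention; the original
    paper's exact percentile convention may differ.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Smallest value with at least pct% of the data at or below it.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def derive_anchors(unanchored_estimates, low_pct=15, high_pct=85):
    """Hypothetical helper: (low anchor, high anchor) from unanchored estimates."""
    low = nearest_rank_percentile(unanchored_estimates, low_pct)
    high = nearest_rank_percentile(unanchored_estimates, high_pct)
    return low, high


# Hypothetical calibration answers (NOT data from either study): 10, 20, ..., 200.
example = list(range(10, 201, 10))
print(derive_anchors(example))  # (30, 170)
```

Using percentiles instead of fixed multiples of the true answer makes the anchors "low" and "high" relative to what unanchored people actually say.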
We had five researchers, split into two groups. Group one used the high anchor for questions one and three and the low anchor for questions two and four. Group two used the low anchor for questions one and three and the high anchor for questions two and four. Both groups asked the questions in the same order.
Each researcher surveyed other conference attendees on their own and then combined responses with their groupmates. Group one collected 9 responses to every question they asked, and group two collected 12.
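This group design is counterbalanced: each question is asked with the high anchor by one group and with the low anchor by the other, and each group gets two high and two low anchors. The sketch below encodes that schedule so the assignment can be checked. `QUESTIONS` and `anchor_schedule` are hypothetical names for illustration.

```python
# (question, low anchor, high anchor), in the order both groups asked them.
QUESTIONS = [
    ("length of the Mississippi River (miles)", 70, 2_000),
    ("height of Mount Everest (feet)", 2_000, 45_500),
    ("height of the tallest redwood (feet)", 65, 550),
    ("maximum speed of a housecat (mph)", 7, 30),
]


def anchor_schedule(group):
    """Return (question, condition, anchor) for each question for group 1 or 2.

    Group 1: high anchor on questions 1 and 3, low on 2 and 4.
    Group 2: the reverse.
    """
    if group not in (1, 2):
        raise ValueError("group must be 1 or 2")
    schedule = []
    for number, (question, low, high) in enumerate(QUESTIONS, start=1):
        odd_question = number % 2 == 1
        use_high = odd_question if group == 1 else not odd_question
        schedule.append((question, "high" if use_high else "low", high if use_high else low))
    return schedule


for group in (1, 2):
    for question, condition, anchor in anchor_schedule(group):
        print(group, condition, anchor, question)
```

Each question's high-vs-low comparison is still between two different groups of respondents. The counterbalancing only keeps either group from receiving all of the high anchors.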
Results
People who received the high anchor gave markedly higher median estimates than people who received the low anchor.2
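No statistical tests were run in this replication, so the comparison above is descriptive. The sketch below shows that kind of comparison: the median estimate under each anchor condition. It also shows why a single extreme answer (like a one-million-foot Everest) pushes the mean far more than the median. `compare_medians` is a hypothetical helper, and all numbers are made up for illustration. They are not the workshop's data, which were lost.

```python
from statistics import mean, median


def compare_medians(high_anchor_answers, low_anchor_answers):
    """Return (median with high anchor, median with low anchor, difference)."""
    high_median = median(high_anchor_answers)
    low_median = median(low_anchor_answers)
    return high_median, low_median, high_median - low_median


# Hypothetical Everest estimates in feet (NOT the workshop's data).
low_group = [5_000, 8_000, 10_000, 12_000, 15_000]
high_group = [20_000, 25_000, 30_000, 35_000, 40_000]
print(compare_medians(high_group, low_group))  # (30000, 10000, 20000)

# One extreme answer drags the mean far more than the median.
with_outlier = low_group + [1_000_000]
print(mean(low_group), median(low_group))        # 10000 10000
print(mean(with_outlier), median(with_outlier))  # 175000 11000.0
```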
Study 2
Abstract
Study 2 found that, when people were asked “How could _________ be better or worse?”, 29 of 35 responses described a positive change (e.g. Netflix could load faster) and 5 described a negative change.
Introduction
Study 2 was a replication of Things Could Be Better by Adam Mastroianni and Ethan Ludwin-Peery.
I did not help replicate this study because the group replicating Measures of Anchoring in Estimation Tasks needed help understanding the language the paper was written in. In contrast, the group replicating Things Could Be Better started their replication within 15 minutes of being handed the paper and had no follow-up questions for me before they began.
Most of my memory of this study comes from a 7-minute voice memo of me interviewing one of the replicators at the conference, on the evening after the study was run. I also remember a few other things, which I discuss in the Discussion section.
Original Study
Things Could Be Better was originally conducted to investigate the origin of many human judgments. One idea was that “...if you can easily imagine something being better, then it must not be very good.” Conversely, “phones seem great because it’s hard to imagine huge improvements—even if their batteries lasted much longer, for instance, it wouldn’t change phones from terrible to amazing.” This idea proved to be false.
Instead, Ethan and Adam found that, almost universally, when you ask people to imagine how something could be different, they imagine how it could be better.
Ethan and Adam used the websites Amazon Mechanical Turk and Prolific to pay people to take their studies. From these online participants, they first collected a list of 52 things that could be different. Then they asked a second group of online participants many versions of the question “how could ______ be different?” Finally, they asked this second group to rate their own proposed change from -3 (a lot worse) to +3 (a lot better). On average, people rated their changes to every single item as better.
This effect replicated across cultures, languages, and scores on some personality and mental health screening tests. Imagining how things can be better may be a universal bias in human imagination.
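The rating step in the original study comes down to averaging each item's self-ratings on the -3 to +3 scale and checking which side of the midpoint (0) the mean falls on. Here is a minimal sketch of that summary. `mean_rating_by_item` and `every_item_rated_better` are hypothetical helpers, and the ratings are invented for illustration. They are not data from the paper, and the paper's actual analysis may differ.

```python
from statistics import mean

SCALE = range(-3, 4)  # -3 = a lot worse ... +3 = a lot better


def mean_rating_by_item(ratings):
    """Map each item to the mean of participants' self-ratings on the -3..+3 scale."""
    means = {}
    for item, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {item!r}")
        if any(score not in SCALE for score in scores):
            raise ValueError(f"rating outside -3..+3 for {item!r}")
        means[item] = mean(scores)
    return means


def every_item_rated_better(ratings):
    """True if every item's mean self-rating is above the scale midpoint (0)."""
    return all(m > 0 for m in mean_rating_by_item(ratings).values())


# Hypothetical self-ratings (NOT data from the paper).
example = {"sleep": [2, 1, -1, 3], "pets": [1, 0, 2]}
print(mean_rating_by_item(example))      # {'sleep': 1.25, 'pets': 1}
print(every_item_rated_better(example))  # True
```

Because participants rate their own answers, the researchers never have to decide whether a change counts as better or worse.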
Methods
The experimenters at the conference replicated six of the questions from Study 1 of Things Could Be Better. The questions all had the same structure, “how could ______ be better or worse?”, and were as follows:
How could pets be better or worse?
How could life be better or worse?
How could sleep be better or worse?
How could nature be better or worse?
How could YouTube or streaming services be better or worse?
How could “an elusive sixth question that [the experimenters] forgot” be better or worse?
Participants could skip questions if they wanted to, so although 6 questions × 7 participants would give 42 possible responses, there were only 35.
The following is an excerpt from my voice memo interview with one of the experimenters:
Experimenter: “So we would go up to the closest person that we saw and we would ask them ‘how could ______ be better or worse?’ And I think that one or two of them we asked [whether things could be] worse or better.”
Me: “So, when you asked someone [how could streaming services be better or worse] typically you would get a positive response?”
Experimenter: “Yeah”
Me: “Do you have an example of that?”
Experimenter: “I remember that somebody said that Netflix could have actually good things to watch.”
Me: “How did you evaluate whether that person was saying something that was better, or worse, or about the same?”
Experimenter: “Completely subjectively. We forgot the part where you ask them to interpret their own response.”
In Things Could Be Better, Adam and Ethan asked participants to rate their own responses to “how could things be [different/better or worse/worse or better]?” so that the researchers’ interpretations would not bias the results. The experimenters at the conference did not do this. Instead, they deliberated on each answer and decided whether it described something better, worse, or about the same.
Experimenter: “I think [this decision] definitely could’ve affected [the results]... Two or three of the answers that we got [were] very ambiguous and there was disagreement among us about whether or not the person who answered the question would’ve interpreted it as a positive change or not.”
Results
The experimenters themselves judged participants' answers to “how could _____ be better or worse?” or “how could _____ be worse or better?” “completely subjectively”. They found that in 29 of 35 responses participants described how things could be better, while in 5 of 35 responses they described how things could be worse. One participant said “pets are already perfect.” This answer was judged to be neither better nor worse because it is not comparative.
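To show how lopsided 29 "better" versus 5 "worse" is, here is a sketch that tallies the coded responses and applies an exact two-sided sign test. The test drops the one "neither" answer and assumes better and worse are equally likely under the null. This is an illustration only, not an analysis the experimenters ran. The codes were assigned subjectively by the experimenters rather than by participants, and each participant answered up to six questions, so the responses are not independent as the sign test assumes. `tally` and `sign_test_two_sided` are hypothetical helpers.

```python
from math import comb


def tally(codes):
    """Count coded responses; each code is 'better', 'worse', or 'neither'."""
    counts = {"better": 0, "worse": 0, "neither": 0}
    for code in codes:
        counts[code] += 1  # raises KeyError on an unexpected code, on purpose
    return counts


def sign_test_two_sided(better, worse):
    """Exact two-sided sign test: P(a split at least this lopsided | p = 0.5).

    'neither' responses are excluded before calling this.
    """
    n = better + worse
    k = max(better, worse)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


codes = ["better"] * 29 + ["worse"] * 5 + ["neither"]
counts = tally(codes)
print(counts, sum(counts.values()), "of", 6 * 7, "possible")
print(sign_test_two_sided(counts["better"], counts["worse"]))  # about 3.9e-05
```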
This is consistent with the findings of Ethan Ludwin-Peery and Adam Mastroianni in their study Things Could Be Better.
Discussion
To my knowledge, this experiment is the first (and only) replication of Things Could Be Better, and it is heartening to see the effect replicate even in a very different context. Most of Things Could Be Better’s response data came from online surveys on Mechanical Turk and Prolific, whereas ours came from in-person surveys carried out by four researchers at once! This is good news!
It was also exciting to see how interested most conference-goers were in Ethan and Adam’s research! People were sharing its ideas up until the end of the conference, and many people (experimenters, experimentees, and others) were busy making up their own theories to explain why imagining things as better might be a universal bias in human imagination.
One experimenter said, “When asking people whether things could be better or worse, people only seemed to talk about a way things could be worse when they didn’t know a way things could be better.” While this is anecdotal, and I heard no other experimenters independently express the same sentiment, it is worth keeping an eye on, especially because Study 8 from Things Could Be Better may have shown that it is harder to imagine how things can be better than to imagine how they can be worse or about the same.
Both this study and Ethan and Adam’s study assumed that people expressing ways something could be better is a proxy for their imagining how it could be better, or at least that imagining how things could be better is functionally indistinguishable from expressing how they could be better.3 I believe this question of external expression versus internal imagination becomes hyper-relevant in social situations, because humans often perform how they expect others to expect them to perform [citation needed].
I bring this up because the nature of a survey (whether digital or in-person) may create some kind of expectancy effect. Ethan and Adam controlled for whether such an effect was primed by the specific phrasing of their question (“how could things be [better or worse/worse or better/different]?”). Even so, if it is true that it is harder to imagine how things could be better than how they could be worse or about the same, something must be causing people to express or imagine ways things could be better. An expectancy effect could explain this; however, I have no idea what people would be expecting, if there is an expectancy effect at all!
I am excited to run other variations of Study 8 to try to determine whether it is truly harder to think of ways things could be better than ways they could be worse or about the same. This may be particularly illuminating for behaviorist descriptions of humans, since most behaviorist logic assumes that creatures tend toward the easiest path, or the path of least resistance. That would lead us to expect humans to express how things could be better less often than how things could be worse or about the same, unless some evolutionary or social pressure pushes them to express how things could be better.
If it is truly universal that humans try to express or imagine ways for things to be better (albeit with vastly different paradigms for what betterness entails), the only immediate conclusion I can draw is that teaching improv can’t be that hard, since everyone is basically saying “yes, and” to each other all the time.
1. Jacowitz & Kahneman developed these anchor numbers by asking a group of 53 people the estimation questions without giving them any anchor. They then plotted the answers and used the 15th-percentile answer as the low anchor and the 85th-percentile answer as the high anchor.
2. One kid told me that Mount Everest was one million feet tall, which was part of my reasoning for using the median answer rather than the mean.
3. This may be true, especially if imagination is primarily built on entertaining ideas that have already been expressed.
For example, suppose I imagine how a mountain could be different, I imagine that it is taller (which I think is better), and then I keep iterating from there (maybe there’s snow on the top, maybe there’s a dragon who lives in a cave halfway up, maybe the dragon has a hoard of magic gold). In that case I can only base my image of how the mountain can be different on my previous images of the mountain. So if I am constantly ignoring ideas for how the mountain could be worse in favor of how it could be better, and therefore only expressing ways it could be better, I am functionally only allowing myself to imagine ways that the mountain can be better.
I wonder whether, at least for me, this has anything to do with an aversion to boredom, or maybe to frustration.


