Empirical Question: A Depletion Crisis Critique

Plus a Paisley Work Outfit--Wednesday, 3/9/16

This post about work-to-weekend dresses featured a floral dress that reminded me of my cream/black paisley skirt that I need to wear for the Work the Wardrobe Challenge.

From bridgetteraes.com

In keeping with the weather, a high of 42F (and the fact that this skirt is short enough that I prefer wearing it to work with tights), I took a spin on the first look with a black blazer and the color red, which I used in the form of a cashmere sweater.

Cream/black paisley skirt (thrifted, Loft), $1.67/wear+
Red cashmere pullover sweater (Macy's), $8.00/wear
Black velvet blazer (thrifted, Talbots), $0.63/wear
Black leggings
Tall black boots by Fitzwell, $2.67/wear
Black Burberry plaid scarf (Sheinside/gift)

Outfit total: $12.97

I went off script by adding a plaid scarf, but print mixing is an addiction I can't control. And I can justify it by noting that it's still nippy in the mornings. I mostly just wore the scarf outside to and from work, but it was cool in our conference room so I was especially glad of the blazer during our 3-hour meeting.

In other news...Tam and Robert both sent me this article [link fixed!] from Slate about a soon-to-be-published, large-sample study of ego depletion that gets a null result (i.e., shows no effect of depletion) and possibly invalidates the concept. I will reserve judgment until I see the actual paper, but will say this--the methodology as described in the article does not seem like a good ego depletion task. Here's what it says:

Subjects watched as simple words flashed on a screen: level, trouble, plastic, business, and so on. They were asked to hit a key if the word contained the letter e, but only if it was not within two spaces of another vowel (i.e., they had to hit the key for trouble but withhold their button-press for level and business).

Um, I kind of hate to break it to the authors but I wouldn't expect that task to be depleting. The task used by previous researchers (yep, including me) involves first having your participants cross out (physically) every instance of the letter e in a long, boring piece of text (I drew mine from a statistics textbook). They do this for 5 minutes...until it becomes very ingrained and habitual (it's a very easy habit to pick up). Then you switch up the task and ask them to cross out every instance of the letter e except when another vowel follows the e in the same word or when the vowel is one letter removed from the e in either direction (e.g., in the word "vowel") for another 5 minutes.

(For the record, in my research, I did the e task described above and then, for an extra dose of depletion, had people spend 3 minutes freely thinking and writing about whatever comes to mind except a white bear because suppressing a dominant response is also depleting.)

My problem with their version of the e task is that it does not create the conditions that are depleting in the original version, i.e., the inhibition of a habitual response. Rather, participants are merely doing a judgment task, and not one that I would expect to be very cognitively challenging either. (I mean, yes, it's more difficult than pressing a button for every word with an e in it, but during the task, that simple rule would itself become pretty habitual, I think. And really, if that kind of basic "please use your brains just a little bit now" task wiped people's regulatory resources in a few minutes, the world would be in even bigger trouble than it is.)
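To make the screen version of the rule concrete, here is a minimal sketch of it in Python. It assumes that "within two spaces" means within two letter positions in either direction, and that a word is a "withhold" word if any of its e's sits that close to another vowel; the article doesn't spell out either detail, and the function name `should_press` is just something I made up for illustration.

```python
VOWELS = set("aeiou")

def should_press(word):
    """Return True if the participant should hit the key for this word.

    Assumed rule: the word contains an e, and no e is within two letter
    positions (either direction) of another vowel.
    """
    word = word.lower()
    e_positions = [i for i, ch in enumerate(word) if ch == "e"]
    if not e_positions:
        return False  # no e at all, so no key press
    for i in e_positions:
        # Look at the two letters on each side of this e.
        for j in range(max(0, i - 2), min(len(word), i + 3)):
            if j != i and word[j] in VOWELS:
                return False  # an e too close to another vowel: withhold
    return True

for w in ["level", "trouble", "plastic", "business"]:
    print(w, should_press(w))
```

Under these assumptions the sketch agrees with the article's examples: press for trouble, withhold for level and business. The whole rule fits in a dozen lines with no prior habit to override, which is kind of my point.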

So yeah, I really need to read the actual paper to understand exactly what their participants did because exactly what participants do matters a lot. I think this is a point that journalists seem not to understand, and something even researchers (especially when they are working in an area where they don't have a lot of theoretical depth) get wrong. There is nothing magical about e's or white bears or chocolate chip cookies in creating a depletion effect. These are merely operationalizations of underlying ideas--that the act of inhibiting habitual or dominant responses (for example) is depleting. This means that there are a zillion very different manipulations you can do in a study to cause the effect. But it also means that a subtle change to a "proven" manipulation can be fatal to your study, if the little change you make means that your task is no longer an operationalization of the underlying idea. Or that your manipulation is no longer strong enough to deplete people enough for there to be a performance deficit on the second task.

Which brings me to this--there's also the question of what the second task in the study is, the one in which depleted participants should be expected to do worse, and whether it is sensitive enough to depletion effects to give a robust result.

If the depletion task is truly as described in the article--just a judgment task about the presence of e's in familiar words--then I do not agree that "the study clearly shows that the effect is not as sturdy as it seemed." Actually, it's possible that the researchers have shown that not every use of cognitive effort in very moderate amounts leads to ego depletion (which most people in the field would tell you).

This said, I do believe that there are some serious lacunae in the ego depletion research conducted to date. One of them is this idea that tasks are "depleting" or "not depleting." Indeed, one of my major papers in grad school was a research proposal that attempted to rectify the "lack of basic description of the depletion phenomenon" in terms of the duration-performance relationship--e.g., how self-control exertion over time influences performance and how varying the duration of the initial self-control task across a range of levels influences performance. Is there a threshold effect, a warm-up effect, an adaptation effect? Because most researchers care about "how can I get my participants into the desired state," we don't really know a lot about how depletion happens experimentally. (We know pretty much zero about how depletion happens in the brain. And I think the whole "a little bit of lemonade brings more glucose to the brain" explanation is very, very wrong. It makes no sense from a neuroscience standpoint.)

I look forward to seeing the published paper, but I am not ready to throw the ego depletion concept out the window quite yet. It could be bunk, of course, but I'm not convinced of it by what this article has to say.

I think the bigger problem is that our estimates of effect sizes (e.g., how big of a difference on a second task does it make if people are depleted?) of this and other social psych phenomena are inflated...ironically because studies are generally under-powered (i.e., use too few participants) and thus you are less likely to get a statistically significant result than you should be. The issue of under-powered studies in psychology is a truth universally acknowledged.

Let's say you run 20 studies on depletion. If you had a reasonable sample size in each, which allows you to detect effects that exist, you might get 13 with a small effect size, 6 with a moderate effect size, and 1 that isn't significant (I just totally made up those results, ok?). You publish your studies that find significant results (the 13 and the 6) and people would probably conclude that the effect exists, but it is rather weak. However, if you use a small sample size, which makes it harder to detect effects that exist, you might get 3 studies with a small effect size, 6 with a moderate effect size, and 11 that aren't significant. You publish your studies that find significant results (the 3 and the 6) and people would likely conclude that the effect exists and is of moderate strength. Because small effect size studies turned into non-significant studies, people base their estimates of effect size on only the studies that have larger effects. This is one reason that people complain about the "file drawer problem"--that studies without statistically significant results don't get published; they just get filed away.
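If you'd rather see that inflation happen than take my made-up numbers on faith, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: a true effect of d = 0.3, 20 versus 200 participants per group, a normal approximation to the two-group test (cutoff of 1.96) instead of a proper t-test, and the function name `simulate_published_effect`. None of it comes from the Slate article or the study.

```python
import math
import random
import statistics

def simulate_published_effect(true_d, n_per_group, n_studies=1000, seed=0):
    """Simulate many two-group studies and 'publish' only the significant ones.

    Returns (share of studies that were significant,
             mean observed effect size among those significant studies).
    """
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        depleted = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        pooled_sd = math.sqrt(
            (statistics.variance(control) + statistics.variance(depleted)) / 2
        )
        d_obs = (statistics.mean(depleted) - statistics.mean(control)) / pooled_sd
        # Normal approximation to the test statistic (an assumption, not a t-test).
        z = d_obs * math.sqrt(n_per_group / 2)
        if abs(z) > 1.96:
            published.append(d_obs)  # everything else goes in the file drawer
    share = len(published) / n_studies
    mean_d = statistics.mean(published) if published else float("nan")
    return share, mean_d

small_share, small_d = simulate_published_effect(true_d=0.3, n_per_group=20)
large_share, large_d = simulate_published_effect(true_d=0.3, n_per_group=200)
print("small samples:", small_share, small_d)
print("large samples:", large_share, large_d)
```

The true effect is identical in both runs. The only things that change are how many studies clear the significance bar and how big the surviving effects look, so the small-sample "literature" should report a noticeably larger average effect than the large-sample one.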

But the interpretation of the "null result" (no significant effect) is not straightforward, as the depletion study described by Slate illustrates. Does it mean that the effect doesn't exist? Or does it mean that the experiment was flawed? In a lot of cases, "Or does it mean that the sample size was too small to detect the effect?" is also a serious contender. That's why they make a big deal about how big the study in the article is--a sample that large is so rare.

In any event, man, given the worsening replication crisis in social psychology, I am glad not to be an active researcher in the field, with the entirety of my future career dependent on the difference between p less than .05 and p=.056 and all subject to being thrown out later. Of course, learning deeply about the messed up state of the science in grad school was a major contributor to my decision to get the fuck out. I mean, the academic environment can be very horrible, and if the scholars/journals in your field are scientific magpies ("ooooh, look at that shiny new novel result! I like!") with low attention spans who are not nearly concerned enough with shoring up the fundamentals of your field so that you can trust what's being published and who don't appropriately value or reward the kind of foundational work you find yourself drawn to a lot of the time--well, that makes it easier to walk away from the whole clusterfuck. (Though I freely admit that my published work from my master's program is totally a shiny new novel result! That's why it got published.)

Social psychology, it's time to simply grow up.

This poor mini lop is totally depleted by all this experimental design/stats talk.

4 comments:

rvman said...: The reporter is clearly in love with the "Big Idea" that "Big Ideas" are the problem and need to be debunked.; March 9, 2016 at 7:53 PM
mom said...: I can totally relate to the bunny! I got depleted just reading about ego depletion! Ha Ha; March 10, 2016 at 8:24 AM
Tam said...: I know far less about this than you do, so it's probably fruitless to comment, but I feel like an ego-repletion effect from lemonade or chocolate is more likely because they are pleasurable than because they send sugars to the brain. Pleasure improves attitudes and motivation, according to introspection.; March 10, 2016 at 11:36 AM
Sally said...: Rvman--Yes! Motivated reasoning at work.

Mom--I hope the bunny helped undo the depletion effect a little ;)

Tam--That's a good hypothesis. It is interesting that artificially sweetened lemonade doesn't work, even though (if I recall correctly) people could not distinguish it from sugary lemonade.; March 10, 2016 at 5:10 PM
