Wednesday, March 19, 2008

lies, damned lies and misconceptions

A reader at Columbia raised the question, what kind of novelist would be interested in statistics.

Calvino once described a city in which a man passes a girl with a puma on a leash, a sailor with a tattoo. He imagines that these are remarkable events. The comment is made: there's always some girl walking a puma for a whim. (I am paraphrasing wildly, too lazy to check.) The point is, when we observe events we can't tell whether they're unusual. Most of us are aware of this to some extent, but we start out with only a very loose sense of the ways we might go wrong; when one starts reading about data analysis, one sees how little we actually know about what we actually see. Which means, of course, that the typical 'realistic' novel is not realistic in showing us a fictional analogue for what's actually in the world, it's realistic only in replicating the kinds of mistakes the naive observer makes in looking at the world. The unreliable narrator, of course, is part of fiction's stock in trade, but this is not unreliability used as literary strategy; it's just a form of unreliability that's invisible to author and readers alike.

Here's a simple example of the intuitionist's approach to probability:

We come into a room. It contains two indistinguishable urns. We’ve been told that one has 300 white balls and 700 red, the other 300 red balls and 700 white. I bet you $10 that Urn 1 has 700 red balls. Will you bet $10 that it’s not? (You may not be willing to bet with only even odds, but it’s a fair bet.)

I take a ball from Urn 1. It’s red. I put it back; the Mechanical Ball Mixer mixes the balls. I take another ball. It’s red. I put it back; the Mechanical Ball Mixer mixes the balls. After 12 draws, I have this result: RRWRRRWWRRWR (i.e. 8 red balls and 4 white). Will you still bet $10?

Almost certainly not. Most people wouldn’t. But suppose I bet $12 against your $10? $14? $16? $20? At what point would it again be a fair bet?

Most people* think something between $14 and $20 would be fair. This is not correct - I would have to bet $293 to make it a fair bet. In other words, intuition goes very quickly from being a good guide to being wildly astray.

* I do have a reference somewhere among my papers to a study with actual data, but I am not convinced that the relevant papers are not in storage in London.
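For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the calculation behind the urn example. It assumes what the setup describes: even prior odds, and draws with replacement from a well-mixed urn. The function and variable names are illustrative only. Each red ball multiplies the odds in favour of "Urn 1 has 700 red" by 7/3, and each white ball multiplies them by 3/7. Worked exactly, the fair stake comes out at about $296; the $293 quoted above is presumably what results if the posterior probability is first rounded to 0.967 (0.967/0.033 is about 29.3).

```python
from fractions import Fraction


def posterior_odds(draws, p_red=Fraction(7, 10), prior_odds=Fraction(1)):
    """Odds that Urn 1 is the 700-red urn, given draws with replacement.

    Assumes the other urn has the mirror-image mix (300 red, 700 white)
    and that the prior odds are even unless stated otherwise.
    """
    p_red_other = 1 - p_red  # chance of red if Urn 1 is really the 300-red urn
    odds = prior_odds
    for ball in draws:
        if ball == "R":
            odds *= p_red / p_red_other              # likelihood ratio 7/3
        elif ball == "W":
            odds *= (1 - p_red) / (1 - p_red_other)  # likelihood ratio 3/7
        else:
            raise ValueError(f"unexpected ball colour: {ball!r}")
    return odds


draws = "RRWRRRWWRRWR"            # 8 red, 4 white
odds = posterior_odds(draws)      # exact value: (7/3)**4 = 2401/81
probability = odds / (1 + odds)   # posterior probability Urn 1 is the 700-red urn
fair_stake = 10 * odds            # what I must bet against your $10

print(f"odds in favour: {float(odds):.2f} to 1")
print(f"posterior probability: {float(probability):.4f}")
print(f"fair stake against $10: ${float(fair_stake):.2f}")
```

Only the difference between the red and white counts matters here: four more reds than whites gives (7/3) to the fourth power, whatever the order of the draws.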

Most fiction does nothing to make us aware of the gulf between cases where intuition serves us well and those (surely far more common) where it does not. It does nothing to show where we should be wary, or how to think through tough cases. Most fiction is confined to the realm of false intuition; it offers us no viewpoint with a better understanding of chance. Which is simply to say that, because we live in a culture with a profound hostility to mathematics, the type of person who writes fiction is likely to be the type of person who shares that hostility and can rely on a large audience which also shares it. Among other things, this means that someone like my friend Rafe Donahue, a biostatistician at Vanderbilt, tends to be both underrepresented and misrepresented among fictional characters. Once upon a time persons of color could only get parts in films playing servants, often with amusing eccentricities which confirmed the supposed preconceptions of the audience; the sort of person who grapples with data analysis is either not seen in fiction or appears as some sort of eccentric.

As I write the Fed has cut its key interest rate by 75 basis points, down to 2.25%. The Fed has brokered a deal with JPMorgan Chase; Bear Stearns, the fifth largest investment bank in the country, has been bought by JPMorgan at $2 a share, down from $160 a share last year. All this comes as a result of the collapse of the subprime market - a market dependent on financial instruments which were the pride of the finance industry only a couple of years ago. One thing fiction could have been doing all this time was enabling people to see on the page the way those in the risk business think about risk; it could have used the techniques of Edward Tufte's information design, for example, to present data in a way that did not numb the mind of the general reader.

This looks interesting from a formal point of view: fiction has made use for centuries of free indirect discourse, in which narrative is presented in the inner language of a character who is not the author, but has steered clear of the sort of inner language that helps itself to the bag of tricks of Tukey, Mosteller, Bill Cleveland and others too numerous to mention. It looks important simply because the management of risk is integral to our society; if fiction ignores the way this actually works, its view of the world is not much less primitive than one in which storms blow up because Odysseus angered Poseidon.


Lee said...

Though an interesting thesis, and certainly I see no reason why fiction can't make use of risk analysis (likewise any other area of endeavour/study), I shy away from anything which tells us what fiction 'should' be doing. And 'the realm of false intuition' is too broad a term - intuition with regard to what? I'm not sure we should lump together risk analysis with behaviour, for example.

Ithaca said...

I think the phrase I used was 'one thing fiction could have been doing'. I suppose it would be possible to unpack this. It does seem to me that it would be A Good Thing if aspects of the world that are consistently omitted in fiction - that is, left out of ALL fiction that has come my way - were to appear in SOME fiction. I don't think all fiction should include African American characters, or that there is something wrong with a novel in which the only African American character is a servant, but there would be something amiss (to my mind) if no African American characters appeared in any novels, or if the only ones seen were servants. (Of course, I can't be sure that the omissions I note in the fiction that comes my way are not made good in work I haven't seen...)

The false intuition to which I referred was that which relates to assumptions about the likelihood of a particular event (drawing a red ball from an urn, encountering a girl with a puma in the street); I think I assumed this had been established by the examples offered.

There are fictional universes whose creators consciously accept a prescientific style of explanation; it would seem strange to find fault with Tolkien on such grounds. But a distinction is generally drawn between magic realism and realism tout court; it seemed interesting that what passes for the latter so consistently prefers an anecdotal approach to chance.

Mithridates said...

This is beautiful. Where do you recommend I start acquiring a non-antiquated view of the representation of chance? I'm not sure if I'm a beginner or not - I know something about probability - but which books would be good for beginners?

One reason I find this observation interesting is that realism did, at one time, view itself as closely linked with science. Nineteenth-century realists were always engaging with scientific discourses of all kinds when they argued about realism. Much fiction no longer seems interested in keeping up with developments in science. Some of it attempts to view it from without, to examine it's psychological, sociological, etc. effects--in other words, to view it as content; but you're right: it doesn't have any effect on the form.

Perhaps a related point: I was reading Clifford Geertz recently and, while I enjoyed some of his book, the initially interesting and then really depressing thought occurred to me that he'd drawn on literature - a fairly old view of literature, essentially a kind of aestheticism - for his notion of "thick description." It felt like a tangible step backwards, even though it is or was a highly regarded way of writing anthropology.

nsiqueiros said...

And once upon a time people used to say, "What kind of writer would be interested in novels?"

Elatia Harris said...

Well, you could try reading _Chance_, by Joseph Conrad. And you wouldn't have wasted your time. Penelope Lively wrote a novel all about contingency, _Cleopatra's Sister_. Or _Border Crossing_, by Pat Barker, in which a central element of the plot is analyzed for how deeply improbable it is. Don't we like our risk of disappointment in reading fiction to be managed by asking those fictions we choose to have plots that feel neither all too neat nor utterly incoherent? Whereas we know life is full of horrifying coincidences, and really messy too.

Mithridates said...

Thanks, Elatia. I know the Conrad but not the other two. Actually, I was thinking more along the lines of something about data analysis.

Lee said...

Yes, you said 'could have been doing', but the whole tone of the post, at least to my mind, implies 'should be doing'. However, I'm glad for your elucidation, since - similarly - I felt the statement about false intuition, though later qualified by 'chance', again left me with what was obviously a mistaken impression. (And not all of us have such a faulty sense of 'intuition' regarding probability, BTW - but then I come from something of a gambling family.)

Whether I want my lessons in risk management from fiction is of course open to debate. As is the whole thorny question of realism...

SnowLeopard said...

A few years ago I attended a seminar on statistical analysis in employment litigation that whetted my appetite, and I've been looking for a good book on statistics ever since. The choice on the market seems to be between college texts that assume I already know the subject matter, and therefore skip what I consider crucial steps, and paperback reprints that assume I'm too stupid or uninterested to handle the math. Baldus & Cole's Statistical Proof of Discrimination is reasonably accessible on a conceptual level, but is a bit specialized (intended for attorneys and judges) and doesn't include much math or probability per se. Bruning's Computational Handbook of Statistics seems to focus (a good thing) on applying specific mathematical procedures to concrete problems, but I haven't tackled it yet. Richard Feynman's brilliant and completely accessible Lectures on Physics contains the clearest explanations of key scientific and mathematical concepts you can hope to find anywhere (while dumbing down nothing), but isn't specifically a statistics text.

Rachel said...

In terms of giving a philosophical/historical perspective on how thinking about statistics and probability has changed over time, I recommend The Probabilistic Revolution, edited by Lorenz Krüger.

In terms of understanding the mathematics involved, sometimes introductory texts in probability (for example Sheldon Ross) might be more elucidating than introductory texts in statistics.

Rachel said...

Also, the books I mentioned above were more academic, but for a popular nonfiction treatment of chance, there are Nassim Taleb's Fooled by Randomness and The Black Swan.

Mithridates said...

Hey, Rachel, thanks. Oddly, Kruger and Ross seem to be in high demand at the CUNY library: they've both been reported missing. (. . . And why do I seem incapable of reducing the probability that I will write it's instead of its and effect instead of affect: so frustrating.)

This reading might link up in interesting ways with stuff I've been reading in philosophy and cognitive psychology on emotions, particularly on feelings of depression. In _Helplessness_ Martin Seligman writes about the link between depression and a sense that one lacks control over one's environment. He makes an interesting argument about a type of learning called extinction. Extinction, as he defines it, is when we learn to stop doing stuff: "a response that once produced an outcome now produces nothing." We learn to stop doing something once we perceive that it no longer has any effect. "People and animals," he writes, "learn readily that their responses are followed only intermittently by the outcome. Moreover, once they learn this, their responses become highly resistant to extinction. To accommodate facts, a slightly more complicated organism is required: one that can out together the two kinds of moments--explicit unpairing and explicit pairing--and come up with an average. In other words, organisms can learn 'sometimes' or 'maybe,' as well as 'always' and 'never.'" Interesting, too, is that intermittent success at producing an outcome followed by a period of no success at producing that outcome means that you will be less prone to "extinction" than if you started with complete success. So, in other words, say I push an elevator button a hundred times. If the elevator door opens immediately 50 days in a row, and then suddenly does not open at all, I will give up after 3 or 4 days. But if the elevator door opens intermittently the first fifty days, it will take me much longer, several weeks, to finally give up.

Rachel said...

I think the NYU (specifically Courant Institute of Math on Mercer Street) library has them. Maybe there is reciprocity with CUNY?

Andrew said...

I'm biased, of course, but I recommend the book, Teaching Statistics: A Bag of Tricks. I've been told that it's fun to read even if you're not actually teaching statistics.

Ithaca said...

mith, my personal selection is probably not what an expert would come up with, but I love: Gerd Gigerenzer's Reckoning with Risk, also the Empire of Chance which he co-edited (which includes a discussion of the way statistics textbooks serve up a mishmash of incompatible statistical methodologies); The Significance Test Controversy (ed. Morrison and Henkel); Against the Gods (Peter Bernstein); Data Analysis & Regression (Mosteller & Tukey) and Exploratory Data Analysis (Tukey); The New Statistical Analysis of Data (Anderson & Finn); Statistics (Freedman, Pisani et al.); Jim Pitman's Probability; Edward Tufte's books on information design (Envisioning Information, The Visual Display of Quantitative Information (my favourite, though EI is prettier), Visual Explanations and Beautiful Evidence); Bill Cleveland's two books on information design (Visualizing Data, The Elements of Graphing Data); Thaler's Quasi-Rational Economics. I also have and love Maindonald (Data Analysis and Graphics using R) and Murrell (R Graphics) but Rafe Donahue strongly disapproves of the sort of person who thinks s/he is doing statistics because s/he can do cool plots in R. Rafe has a great short paper on How Statisticians Think and Why It Matters, available for download on his website. I think Gigerenzer is a better popular introduction than Taleb's Fooled by Randomness.

Lee, I have a habit which I tend to think is a bad one, of writing posts and then putting them in the drafts folder because I think the argument could be more fully substantiated, expressed with greater logical rigour and so on. I argue with myself that a principle of charity applies to blogs: it is better to write something in an hour and post it than a) put it in the drafts folder or b) spend a whole day revising it several times, thereby doing no work at all on my book.

Strictly speaking, fiction is not an agent; there are only writers of fiction who can do all kinds of things in their books. Anyone could do just about anything in a work of fiction. I may think: it is a bad thing that NO ONE has introduced the general reader to Greek and Japanese in a work of fiction, given that these are not commonly available below university level - in other words, this SHOULD have found a place in fiction somewhere. That is not the same thing as saying that EVERYONE should have smuggled Greek and Japanese into their fiction; it is not to say that any particular writer of fiction is to blame for not having done so. It is to make a comment, I think, on the fact that, because writers generally decide on their projects independently, the work produced taken in the aggregate displays undesirable omissions.

It would have been possible to say 'most people (but not all)' rather than 'most people'; while it's true that 'all people' are also 'most people', I think it would be unusual to say 'most' when one meant 'all'. Most people don't study Greek at school, but I have a friend who studied it from the age of 12.

It would be unusual for anything in a work of fiction to appeal to all readers. If something interests me that I haven't seen included in fiction, I assume there are other readers who would find it entertaining. I wouldn't normally expect such a book to appeal to all other readers, or even the majority - but then even Dan Brown, with his 60 million sales, appeals to only a minority.

Mithridates said...

Typo in the Seligman: I meant "one that can put [not out] together two types of moments." Didn't want to misquote him.

Thanks, Andrew. I love your blogs. And of course it would be odd if someone wasn't biased towards their own book. "You know, I wrote a really worthless book that you definitely should have some real reservations about if you wind up reading it at all." Come to think of it, this approximates how I feel about my dissertation.

Ithaca said...

AG: Teaching Statistics: A Bag of Tricks sounds great; another must-have.

Levi Stahl said...

I add this more for fun than in a hope of elucidating anything, but here's what I just now came across in The Man Without Qualities, a conversation between Ulrich--the quality-less man of the title--and his sister, Agathe:

"But for young people it is part of the song of life; they want to have a destiny but don't know what it is."

"In times to come, when more is known, the word 'destiny' will probably have acquired a statistical meaning," Ulrich responded.

. . . .
"What we still refer to as a personal destiny," Ulrich said, "is being displaced by collective processes that can finally be expressed in statistical terms."

Agathe thought this over and had to laugh. "I don't understand it, of course, but wouldn't it be lovely to be dissolved by statistics?" she said. "It's been such a long time since love could do it!"

Ithaca said...

Levi - That's fabulous. The terrible thing is, I've read this, but so long ago I'd completely forgotten.

Lee said...

Oh, don't take my quibbles too seriously: the only thing I really think a writer should do is write to satisfy themself - or to attempt to.

But what interests me more is how you'd begin to include statistics/risk analysis etc in a formal way in fiction: I'm fascinated by FID, but wonder if inner musings alone would be a radical formalistic change - or do you mean not the content of these musings, but their structure?

Hassan said...

Forgive me for jumping in so late to a discussion and post so fascinating, it has made my day. In fact, I'm actually sorry there are only a few hours left to my day, I wish I had come across this tomorrow morning.

By association, I may ruin the eloquence of Helen's argument, but I really couldn't agree with her more. Because probability is constructed on nothing other than the incomparably sound law of the excluded middle, in excluding a statistical analysis from a discussion where it would be profitable (e.g. any one involving sizeable collections of people: community/state/national level), one might as well walk into the discussion completely stoned; one would be just as far off from coming to accurate conclusions. This is often what I feel when I see the slipshod reasoning of our political minds of today. I look forward to the day when no one will have the bravery to cite conclusions based on sample sizes too small to be statistically relevant.

A novel that involves an awareness of statistics? About bloody time. *raises whiskey tumbler*

Anonymous said...

I'm responding late, but... My college writing teacher used to say that "A robber mugged me" is a less interesting story than "My sister mugged me." The idea is that statistically unlikely events make better fiction, because the reader has never seen them before.

Of course, "A robber mugged me" is more likely to reach the bestseller list and become a major motion picture. But if you've seen the books on the bestseller list, they might prove my teacher was right on target.

One risk of writing original, unconventional stories is that you might go too far and pile up the improbabilities until no one believes a word you say. Or, worse, people might call you "quirky," and then you'd have to become an independent filmmaker.

But there's also what I call the zeitgeist problem: Several years ago, there were a lot of news stories about surrogate mothers, and just about every television show on the air decided it would be brilliantly original to make one of the main characters into a surrogate parent. So if you'd watched American television around that time, and you weren't familiar with U.S. culture, you might think that every third woman in the country was a surrogate mother.

And there's also the issue of political correctness. Some writers will go so far out of their way to avoid stereotypes that every African-American character will be witty, intelligent, and financially secure--which, depending on your point of view, is either a much-needed corrective to most of American history or a bizarre, anomalous sameness, like giving every character a sister who's a mugger.

So I guess my point is, I like reality in my stories, just not too much of it.