How to Count Style: An Interview with Justin Rice from LitCharts

Recently, we got in touch with Justin Rice, the creator of the statistical analysis “What Makes Hemingway Hemingway?” from LitCharts.  In the interview, we discuss why Rice chose Hemingway for the inaugural analysis, what he’s learned about the writer’s evolving style, and where he sees the overall project headed in the future.  Our thanks to him and to the co-founders of LitCharts, Ben Florman and Justin Kestler.


Q: How did the "What Makes Hemingway Hemingway?" statistical analysis project originate?

A: LitCharts focuses on close reading: we summarize, analyze, and assign a theme to every plot point in each of the 400 books we cover. Our unique editorial approach helps readers understand the importance of every plot development and trace how each theme develops over the course of a text.  With LitCharts Analitics, we set out to take an even closer look at the “data” that defines a text or a writer’s style. This type of data-driven textual analysis isn’t exactly new--there’s an entire academic field called “Digital Humanities” dedicated to similar efforts--but with LitCharts Analitics our goal is to make that approach less academic, and more accessible and illuminating to readers of all ages.


Q: Why choose Hemingway?

A: Hemingway has a signature writing style that lends itself well to the type of data-driven analyses we’re creating for our Analitics series. Any Hemingway reader can give you a quick sense of his style--short sentences, lots of dialogue, not too much flourish--but our goal was to take a much closer look at how Hemingway achieves that style, and also to see if his style changed over time and across his most famous texts. And because Hemingway is so widely read in school and is a favorite of ours and many of our readers, we thought analyzing Hemingway would be a great way to start the series.


Q: Based on what you already knew about Hemingway, what new aspects of his writing style did you discover?

A: I assumed Hemingway used very few adverbs, so it was surprising to find that he actually used more adverbs than average. At first, I thought there was a bug in the part-of-speech-tagging code, but on closer investigation, I realized that almost all of the adverbs he used were short and straightforward (“then,” “now,” “never”), and almost none were the multisyllabic words ending in “-ly” that I think of when I think of adverbs (“approximately,” “immediately,” “ridiculously”). After a little research into (or maybe a refresher on?) types of adverbs, I realized that he was using a lot of time, place, and frequency adverbs, but very few manner adverbs. Hemingway’s style is more nuanced than I realized.
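
The split Rice describes, between short adverbs and “-ly” adverbs, can be illustrated with a few lines of Python. This is a minimal sketch and not the code LitCharts used: it assumes the adverbs have already been picked out by a part-of-speech tagger, it treats the “-ly” ending as a rough stand-in for manner adverbs, and the sample list and the exception set are invented for the example.

```python
# Assumption: "-ly" is only a rough proxy for manner adverbs, so a small,
# hypothetical exception list covers "-ly" adverbs of time or degree.
NON_MANNER_LY = {"early", "only", "daily"}


def is_ly_adverb(word: str) -> bool:
    """Return True if an (already tagged) adverb looks like a manner adverb."""
    w = word.lower()
    return w.endswith("ly") and w not in NON_MANNER_LY


def ly_share(adverbs) -> float:
    """Return the fraction of adverb tokens that end in -ly."""
    if not adverbs:
        return 0.0
    return sum(is_ly_adverb(a) for a in adverbs) / len(adverbs)


# Invented sample: adverb tokens as a tagger might return them.
sample = ["then", "now", "never", "then", "there", "immediately", "now", "again"]
print(ly_share(sample))
```

A low share on a real text would point to the pattern described above: plenty of adverbs, but few of the “-ly” kind.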


Q: Of the many important insights here, you underscore Hemingway's evolving style.  For instance, “while short sentences are characteristic of Hemingway, they define his work less and less as his career progresses.”  Referring to dialogue, you also say that, in contrast to the early WWI novels, characters in later works talk more but say less or barely speak.  It seems, then, that you are actually uncovering various styles, especially a later one that doesn't quite fit with received ideas about "the Hemingway code."  Is that right?  Do you happen to know if the writer's evolution reflects any changes in “typical writing” taking place at the time?  Could you explain a little more how you calculate this "typical writing"?

A: Because writers tend to evolve and write different kinds of work at different points in their careers, it’s always a little tricky to talk about work in aggregate. That said, there are often common threads that run throughout. When I read late Hemingway, I can tell it’s different from early Hemingway, but it still feels like Hemingway. Why is that?

We were interested in trying to answer that question by looking at what’s consistent throughout his work--the tendency to avoid long words, for instance--as well as things that changed, like his use of dialogue.  

To get a sense of what makes Hemingway’s writing distinctive, we needed something to compare it to, and we chose the Brown corpus. It's a million-word compilation of 500 texts from 1961 that was meticulously parsed and tagged by hand, and it was created by linguists as a benchmark for typical writing.  It’s static, so it doesn’t show how typical writing evolves, but since it’s more or less contemporary with Hemingway, it was the best gauge we could find.

At some point, it would be interesting to chart Hemingway’s evolution against a more fluid corpus of typical writing, but we’d need a bigger team for that! 
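
The idea of measuring a writer against a benchmark can be sketched in Python. This is an illustration under stated assumptions, not the method LitCharts used: the two texts are made up, the tokenizer and sentence splitter are deliberately crude, and the function names are hypothetical. In practice the reference text would be something like the Brown corpus rather than a single invented sentence.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text: str) -> dict:
    """Compute two simple style measures for a text."""
    tokens = tokenize(text)
    # Crude assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words_per_sentence": len(tokens) / len(sentences),
        "letters_per_word": sum(len(t) for t in tokens) / len(tokens),
    }


# Invented stand-ins for an author sample and a reference corpus.
author_text = "He rowed out. The sea was flat. He rowed on."
reference_text = (
    "The committee subsequently considered several alternative "
    "proposals regarding municipal transportation."
)

author = profile(author_text)
reference = profile(reference_text)
for measure in author:
    print(measure, round(author[measure], 2), round(reference[measure], 2))
```

Each measure only means something relative to the reference, which is why the choice of benchmark matters as much as the measures themselves.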


Q: You give a pretty thorough list of Hemingway's defining verbs, adjectives, nouns, and adverbs.  Out of this list, do you have a favorite, quintessentially Hemingwayesque word?

A: My favorites are the verbs, and I think they’re great strung together: slung, rowed, joked, stroked, punched, dipped. It’s tough guy poetry.


Q: Throughout, you compare Hemingway to a handful of other writers, including Carver, Fitzgerald, Proust, Stein, and Steinbeck.  Why these writers?  Who else did you consider including?  Would other writers have yielded a different comparative study, a different understanding of Hemingway's style?

A: We chose some of Hemingway’s contemporaries as well as some of his self-proclaimed followers.  If we’d looked at other writers, the results would have been slightly different, but I’m sure we would have discovered a lot of the same trends.  

We did consider other writers for comparison (Faulkner, Shakespeare, Thomas Pynchon), but we felt like the writers we chose gave us enough information to tease out Hemingway’s style, and we didn’t want to introduce writing from eras with different vernacular.  We could keep adding writers to the mix, and may well do that in the future, but the writers we picked seemed to fit the task at hand.   


Q: At various points, you refer to Steinbeck as “more Hemingway than Hemingway,” even ending the statistical analysis with a reference to Steinbeck.  Any chance he’ll be the focus of a data analysis in the near future?

A: There is a chance we’ll focus on Steinbeck.  Like Hemingway, he’s a distinctive stylist, and it would be interesting to see if we could find statistical measures of his uniqueness.  I’d be curious to see how closely the Steinbeck analysis hews to the Hemingway analysis.  They’re definitely different, but what we found suggests more statistical similarity than I would have imagined. 


Q: In keeping with the educational aspects of LitCharts' literary guides, how do you envision this project helping teachers teach and students learn about Hemingway?

A: I think the results of our analysis could lead to some interesting conversations and might give some insight into how Hemingway’s language works.  The nice thing about the zoomed-out statistical view is that it distills essential features of writing. Like a map or an aerial photograph, it gives you a sense of the lay of the land, and in Hemingway’s case, that means you can start to think about what it means to write “truer than true.”   

We also wanted to make this kind of analysis approachable. Students who have a hard time getting into close reading might be drawn in by a more mathematical approach to literature, and we’re hoping to show them that there’s more than one way to crack a book. Soon after we published “What Makes Hemingway Hemingway?” we got a bunch of requests from teachers for a PDF version for use in the classroom, so we now plan to offer a free PDF of each Analitic for exactly this purpose. The Hemingway PDF is here.


Q: Are there any other Hemingway-related projects on the horizon?

A: Right now, we’re working on applying the same kind of analysis to other authors, so while there’s not a Hemingway-related project on the horizon, there’s a good chance he’ll come up again as a point of comparison.  One nice thing about this project: it got me to re-read his work.  And I think I got more out of it this time!


Michael Von Cannon, March 28, 2017