When the cowardly “Resistance” op-ed came out, my first thought was, Gee, I bet we could get some insights on authorship by doing an automated textual analysis. Because of course that was my first thought. Well, somebody was kind enough to do one for us. Specifically, Michael W. Kearney, a journalism and informatics professor at the University of Missouri. Here is the result; I’ll do a layperson’s explanation below, and then some technical links for those so inclined.
Executive summary: This analysis suggests that it was somebody from the office of the Vice President, the State Department, or the Department of Commerce.
What is this?
- The y-axis is various Twitter accounts, labeled on the left.
- The x-axis is the textual correlation between each account's tweets and the op-ed.
- Kearney took up to 3,200 tweets from each of the accounts listed, and ran an analysis on those corpora. He then compared the resulting numbers to the results of the same analysis run on the text of the op-ed.
- The line at the top shows, of course, a 1.0 correlation with the op-ed itself. The next-highest are the Twitter accounts for the Vice President, Trump (who we can discount), Secretary Pompeo, Secretary Ross, and the State Department.
- The analysis includes figures for things like comma usage, sentiment, politeness, word choice, first- and second-person preference, and so on.
- It probably wasn’t somebody at the Department of Transportation.
- Update: I assumed this went without saying, but obviously tweets are not an ideal data source; they were just the most readily usable one, given what Kearney had lying around and the very short time frame.
- We know from reporting on the Wolff book that anonymous sources sometimes intentionally steal other staffers’ phrasing when providing quotes.
- This could explain the use of ‘lodestar,’ a strongly Pence-affiliated word.
- However, it is harder to fake things like comma usage.
- Higher-ranking officials are likely, in their Twitter communications, to try to sound more like Trump, or in general to use more homogeneous language.
- This could explain the ~0.7 cluster of the most important officials and departments.
- These are not huge volumes of text, and thus the figures are potentially not representative.
- The replies here are worth perusing:
— Mike Kearney📊 (@kearneymw) September 6, 2018
This was quick work, enabled by the library in the final bullet point, which I’m going to have to check out. Cool stuff.
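For the technically inclined, here is a rough sketch of the general idea described in the bullets above: turn each body of text into a handful of stylistic numbers, then correlate each account's numbers with the op-ed's. To be clear, this is not Kearney's code or his feature set. The five features, the z-scoring step, and every function name here are my own assumptions for illustration, and the sample texts are made up.

```python
import math
import re

# Hypothetical word lists, chosen only for illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}


def style_features(text):
    """Return five per-word stylistic rates for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)  # avoid dividing by zero on empty text
    return [
        text.count(",") / n,                        # comma usage
        text.count("!") / n,                        # exclamation usage
        sum(w in FIRST_PERSON for w in words) / n,  # first-person preference
        sum(w in SECOND_PERSON for w in words) / n, # second-person preference
        sum(len(w) for w in words) / n,             # average word length
    ]


def zscore_columns(rows):
    """Standardize each feature across all texts so none dominates by scale."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled_cols.append([(x - mean) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_by_similarity(target_text, corpora):
    """Rank accounts by correlation between their features and the target's.

    corpora maps an account name to a list of tweet strings.
    """
    names = list(corpora)
    rows = [style_features(target_text)]
    rows += [style_features(" ".join(corpora[name])) for name in names]
    z = zscore_columns(rows)
    scores = {name: pearson(z[0], z[i + 1]) for i, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample texts, not real tweets.
    op_ed = "We believe our first duty is to this country, and we will do what we can."
    accounts = {
        "account_a": ["Great day! You will love it!", "You are going to win!"],
        "account_b": ["We met with our partners today, and we made progress."],
        "account_c": ["Road closures are scheduled for Tuesday."],
    }
    for name, score in rank_by_similarity(op_ed, accounts):
        print(f"{name}: {score:.2f}")
```

With only five features and a few sentences per account, the numbers this produces mean very little, which is the small-sample caveat from the list above in miniature.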