JCL · 14(3) · 2024
Journal of Computational Linguistics
Vol. 14, No. 3 · 2024 · pp. 412–438

Frequency of the Lexeme "asshole" in Published English-Language Texts, 1920–2024: A Corpus Study

L. F. Penrose · Department of Computational Linguistics, University of Manchester
Abstract
We report the frequency of the lexeme asshole (incl. all attested variant spellings) in the Google Books English corpus (n ≈ 8.1 × 10¹⁰ tokens) and the Corpus of Contemporary American English (COCA, n ≈ 1.0 × 10⁹ tokens) over the period 1920–2024. The lexeme exhibits sustained, monotonic growth in frequency per million tokens, with an overall increase of approximately 3,700% (95% CI: 3,610–3,790%) across the study period. We argue that this rate is anomalously high relative to comparable vulgarities, and that the trajectory is consistent with what we propose to call ontological inflation: an increase in occurrence not solely attributable to changes in editorial tolerance.

1. Data

Frequencies were measured as occurrences per million tokens (PMTok), calculated by decade. Variant spellings included in the count: asshole, ass-hole, ahole, a*hole, a$$hole, A.H., the A-word. Compound forms (e.g., asshole-adjacent) were excluded; quoted attestations within fiction were included.
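The counting rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the pattern `VARIANTS`, the helper `pmtok`, and whitespace tokenization are all hypothetical choices made here. The spelled-out variants are matched case-insensitively, `A.H.` is matched case-sensitively, and compound forms such as asshole-adjacent are rejected by requiring that no word character or hyphen adjoin a match.

```python
import re

# Hypothetical pattern built from the variant spellings listed in Section 1.
# The scoped (?i:...) flag makes the spelled-out forms case-insensitive;
# "A.H." stays case-sensitive so that unrelated lowercase "a.h." is ignored.
# The lookarounds reject matches touching a word character or a hyphen,
# which excludes compound forms such as "asshole-adjacent".
VARIANTS = re.compile(
    r"(?<![\w-])"
    r"(?:(?i:asshole|ass-hole|ahole|a\*hole|a\$\$hole|the a-word)|A\.H\.)"
    r"(?![\w-])"
)


def pmtok(text: str) -> float:
    """Return variant occurrences per million tokens in `text`.

    Tokens are whitespace-delimited, a simplifying assumption; a real
    corpus study would use the corpus's own tokenizer and token counts.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = len(VARIANTS.findall(text))
    return hits / len(tokens) * 1_000_000


sample = ("What an asshole. Not asshole-adjacent, but an a$$hole, "
          "per A.H. and the A-word.")
print(len(VARIANTS.findall(sample)))  # 4: the compound form is skipped
print(pmtok(sample))                  # 4 hits over 13 tokens, scaled to PMTok
```

Note that "the A-word" spans two whitespace tokens but is counted as a single occurrence, which is one reason a production count would need a more careful tokenization policy.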

Table 1 — Frequency by decade (PMTok)
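Decade | Google Books | COCA | Cumulative index (Google Books, 1920s = 100)
--- | --- | --- | ---
1920s | 0.4 | – | 100
1930s | 0.5 | – | 125
1940s | 0.7 | – | 175
1950s | 1.1 | – | 275
1960s | 2.4 | – | 600
1970s | 4.8 | – | 1,200
1980s | 7.1 | 5.8 | 1,775
1990s | 9.4 | 8.9 | 2,350
2000s | 11.2 | 10.7 | 2,800
2010s | 13.8 | 13.2 | 3,450
2020s (to 2024) | 15.1 | 14.9 | 3,775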
Index value for the 2020s (to 2024) = 3,775, i.e. 37.75 times the 1920s frequency. This corresponds to an increase of 3,675%, rounded to 3,700% in the abstract.
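The index column and the headline percentage can both be recomputed from the Google Books column. The Python sketch below assumes the index is each decade's frequency divided by the 1920s frequency, times 100, which reproduces every row of Table 1; the helper name `cumulative_index` is hypothetical.

```python
# Google Books frequencies (PMTok) transcribed from Table 1.
GOOGLE_BOOKS_PMTOK = {
    "1920s": 0.4, "1930s": 0.5, "1940s": 0.7, "1950s": 1.1,
    "1960s": 2.4, "1970s": 4.8, "1980s": 7.1, "1990s": 9.4,
    "2000s": 11.2, "2010s": 13.8, "2020s": 15.1,
}


def cumulative_index(freqs: dict[str, float], base: str = "1920s") -> dict[str, int]:
    """Rescale each decade's frequency so that the base decade equals 100."""
    base_value = freqs[base]
    return {decade: round(100 * value / base_value) for decade, value in freqs.items()}


index = cumulative_index(GOOGLE_BOOKS_PMTOK)
# An index of I means I/100 times the baseline, i.e. an increase of (I - 100)%.
increase_pct = index["2020s"] - 100
print(index["2020s"], increase_pct)  # 3775 3675
```

The distinction matters: an index of 3,775 is a 37.75-fold frequency, whereas the percentage increase subtracts the baseline, giving 3,675%.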

2. Method Note

An obvious confounder is the loosening of editorial conventions in published English between 1920 and 2024. We attempted to control for this by comparing the trajectory of asshole to that of three other markedly vulgar lexemes (damn, hell, bastard), which exhibit increases of 480%, 320%, and 290% respectively over the same period. Asshole’s increase is 7.7× that of the next-fastest comparator, damn. Editorial loosening cannot account for this gap; some additional factor is required.
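The 7.7× figure is a ratio of percentage increases. A minimal Python check, using only the increases reported in this section (the helper name `ratio_to_next_fastest` is hypothetical), confirms that the ratio rounds to 7.7 whether the rounded 3,700% or the unrounded 3,675% implied by the Google Books column of Table 1 is used.

```python
# Percentage increases over 1920-2024 for the comparison lexemes (Section 2).
COMPARATORS = {"damn": 480, "hell": 320, "bastard": 290}


def ratio_to_next_fastest(target_increase: float,
                          comparators: dict[str, float]) -> tuple[str, float]:
    """Return the fastest-growing comparator and target_increase / its increase."""
    name, increase = max(comparators.items(), key=lambda item: item[1])
    return name, target_increase / increase


for target in (3700, 3675):  # rounded and unrounded increases for asshole
    name, ratio = ratio_to_next_fastest(target, COMPARATORS)
    print(target, name, round(ratio, 1))  # ratio rounds to 7.7 against "damn" in both cases
```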

3. Discussion: Ontological Inflation

We propose that the lexeme’s growth in frequency reflects a real growth in referents — not solely an increase in willingness to name them. This position is unfashionable. The present authors note that it is also the position implied by the data, and that the unfashionable is not, by itself, an argument.

4. Limitations

The Google Books corpus is biased toward edited published prose and undersamples private correspondence, internet text, and oral speech, all of which the present authors expect to exhibit considerably higher frequencies than those reported here. The figure of 3,700% should therefore be regarded as a conservative lower bound. The true growth is, in our judgement, larger still; we have not attempted to quantify it.

Submitted 4 January 2024; accepted 19 February 2024. The authors declare no conflict of interest. The authors declare that the present study was conducted independently of the Department of Useless Provocations, which has been cited in several news outlets as having “commissioned” it. The Department has not. The Department has, in fact, declined to comment.