Curiosity as to which pairs of consonants occur most frequently in the English language led me to conduct this interesting research, the conclusions of which will be presented here forthwith. My interest in this topic arose from looking at J.R.R. Tolkien’s runes in The Lord of the Rings. Some of his letters have the sounds of two consonants. I wondered what would need to be changed to make runes more useful for writing English.

After an unfruitful search of the internet, I decided to write my own program in the C programming language to answer this question. The source code and executable of the program (named lettray.exe) can be found here. It is a very small, quick program, and you should have no trouble using it if you wish.
As for texts to analyze, I was able to download the full texts of public-domain works easily from the Project Gutenberg website (www.gutenberg.net).

A quick way to find the number of occurrences of a particular letter sequence is to use the ‘replace’ function of a word processor. For example, replace all "CH" with "QQ", and the program will report something like "327 replacements made".

My table of results follows. Each entry gives the work and/or its author, followed by the 15 most frequent consonant pairs in that text with their counts.

by Geoffrey Chaucer
TH 9089, ND 3933, ST 1880, GH 1631, SH 1387, HT 1318, WH 1178, CH 1018, RT 936, YN 912, TR 894, NG 873, LL 809, LY 791, LD 763

by William Shakespeare
TH 3563, ND 1321, ST 1131, LL 920, NT 771, NG 633, WH 609, MY 484, SH 433, CH 418, RS 407, LD 397, RT 340, GH 336, RD 320

MACBETH, by William Shakespeare
TH 2855, ND 953, ST 676, LL 559, NG 484, NT 433, WH 429, CH 371, CB* 291, RS 282, RT 277, SH 276, LD 246, GH 242, SS 241

ESSAYES of Francis Bacon
TH 8741, ND 3287, ST 1755, NG 1482, NT 1322, LL 1215, CH 1207, WH 1040, RS 925, NS 854, RT 842, SS 673, PR 671, LY 669, TR 610

by John Bunyan
TH 10521, ND 3092, ST 1992, NG 1636, LL 1539, CH 1396, WH 1288, NT 1118, HR 808, LD 759, SH 726, GH 715, RS 545, NS 528, LY 524

by Daniel Defoe
TH 16505, ND 7468, NG 3883, ST 3263, LL 2703, WH 2598, NT 2594, MY 2442, CH 2093, LD 1851, SH 1708, GH 1625, LY 1323, HT 1174, RY 1084

by Jonathan Swift
TH 14410, ND 6133, ST 3807, NG 3576, NT 3081, WH 2488, CH 2195, LL 2076, MY 1831, RS 1718, LD 1558, TR 1514, RT 1421, NS 1376, PR 1373

[title and author missing]
TH 8406, ND 3883, NG 2859, NT 2269, ST 1781, PR 1302, CH 1295, LL 1283, WH 1262, RS 1042, NS 1006, LY 984, SS 899, NC 898, MY 822

by Jane Austen
TH 15727, ND 7914, NG 6334, ST 4652, LL 4376, SH 3847, NT 3724, CH 3198, LY 2823, LD 2731, RY 2482, WH 2435, GH 2419, RS 2391, SS 2344

DEMOCRACY IN AMERICA (vol. 1) by Alexis de Tocqueville
TH 34639, ND 9791, ST 9300, NT 8201, NS 5204, WH 5025, CH 4898, NG 4605, NC 4035, TS 3807, PR 3704, LL 3395, CT 3343, TR 3235, SS 3129

by Mark Twain
TH 12312, ND 8340, NG 3978, LL 3090, ST 2833, WH 1874, LD 1840, NT 1593, SH 1541, CH 1422, CK 1395, GH 1387, HT 1026, YS 960, DN 905

by Jack London
TH 4908, ND 2292, NG 1581, ST 1112, NT 870, LL 738, CK 689, WH 608, CH 556, TR 526, LY 523, GH 472, SH 451, LD 380, SS 356

by F. Scott Fitzgerald
TH 9881, ND 4328, NG 3129, ST 2882, LL 2386, NT 2344, LY 1690, RY 1652, SH 1352, WH 1291, CH 1285, GH 1226, RS 1162, SS 1082, NC 901

NASB version of the Bible (the four Gospels)
TH 12627, ND 6694, NG 3053, WH 2408, LL 2254, ST 1704, NT 1456, SH 1016, CH 986, RS 785, RD 782, NS 741, LD 725, PL 621, GH 599

NIV version of the Bible (the four Gospels)
TH 12142, ND 4142, LL 2463, NG 2267, WH 2223, ST 1689, NT 1489, PL 975, CH 942, RS 803, RD 704, LD 703, NS 693, SH 642, GH 599

Brief Analysis

I tried to see whether the most common consonant pairs varied over the centuries. You can see that they vary somewhat, but not much. The YN in Chaucer comes from words like "myn" for "mine". In the 1700s, words containing MY seem to have been more common. In the latter half of the 20th century, there seem to be more words containing PL, but I would need to analyse another two non-copyright texts to confirm this. Shakespeare’s plays give the name of the speaker at the beginning of every speech; hence CB (from "Macbeth") is one of the top 10 consonant pairs in Macbeth.

It is possible that some editions of these works have modernized or ‘corrected’ spelling, which would skew the results. However, from my brief look at the texts, Project Gutenberg seems to have kept the original wording and spelling. I also wanted to see whether there were any differences between US and British works, but found nothing obvious in this small sample of texts. It was difficult to think of texts from both countries for each century. One known difference is that American spelling often uses Z where British spelling uses S, as in "analyze", "paralyze" and "recognize" versus "analyse", "paralyse" and "recognise".

Commentary on Peculiarities

One can see from this sample of texts that in every case, from Chaucer onward, TH is by far the most common consonant pair. This is very unfortunate for non-English speakers, as few other languages have the TH sound; the ones I know of are Greek, Gothic (long since gone!) and Icelandic. (Oh! In Spain, including the Zaragoza region, Z and a soft C are pronounced as TH; a popular story says this is because one of their rulers a long time ago had a lisp and they all had to imitate him. I wonder how they would speak if he had had a stutter?!)

Why do we have a TH sound? The TH words in English are related to words in Gothic and in Scandinavian languages such as Icelandic, and more recently, many have come from Greek. Here’s probably what happened:

Gothic died out more than 1000 years ago (between 500 and 900 AD). [Perhaps the Goths came up with this name for themselves so that no one else could pronounce it!] Some of our words may have come from the Goths, but Gothic is not a direct ancestor of English. The Goths were busy razing Rome and then raising Gothic cathedrals, and never made it to Bonnie England. The Anglo-Saxons already had the dreaded TH sound, but marauding Vikings (750-1050) brought even more of it to English shores, giving us words such as "they", "them" and "their".
(th-th-thanks for noth-th-thing!) Icelandic actually has two types of TH sound, written with the letters ‘thorn’ (þ) and ‘edh’ (ð). I don’t know how much Icelandic there is in English, but we also have two types of TH sound: say "think" and "this" to hear the difference. Further down the (historical) road, Britain imported many things from the continent, but TH was not one of them: German and Dutch have a ‘D’ in many words where English has ‘TH’ (e.g. ‘dank’ and ‘thank’), because those languages changed the old TH sound to D while English kept it. Anglo-Saxon stubbornness or just a thtrange custom? Finally, in more recent centuries, cool technical words have been imported from Greek (like ‘thermometer’) or simply made up from Greek-sounding parts (like ‘telethon’). The Greek letter for TH is Θ or θ (theta), belovéd by trigonometry students around the world.

In every text except Jane Austen’s, WH is also within the top 10 consonant pairs (in hers it comes 12th). Why do we have so many WH combinations when the H is silent? Well, in Old English, WH was both written and pronounced as HW: say ‘hwat’, literally. Almost all of the words that are now written as "WH…" were originally written "HW…". This changed when the Normans, who conquered England in 1066 AD, redefined the spelling of many words, including these HW ones (don’t you wish that you had that power!). Finally, a few WH words were originally "W…" words that were aspirated by the addition of an H, such as "whip" from "wippe". Posh? Not!

Nowadays we generally don’t pronounce the ‘H’ any more, except perhaps in Wales and in the word "white". (Are there hwite hwales in Wales?) Interestingly enough, professional singers pronounce WH words as HW to make them easier to understand and to soften the vowels. Tolkien, too, included a rune for the HW sound when he invented his elvish alphabet.

Having come full circle back to Tolkien, we will stop here.
Michael Harwood
12 Dec 2002

I just found out from my students that Albanian and Arabic also have the TH sound! If this is true, then perhaps there are more TH languages out there!

12 Jan 2003