Curiosity as to which pairs of consonants occur most frequently in the English language led me to conduct this interesting research, the conclusions of which will be presented here forthwith. My interest in this topic arose from looking at J.R.R. Tolkien’s runes in The Lord of the Rings. Some of his letters stand for two consonant sounds. I wondered what would need to be changed to make runes more useful for writing English.
After an unfruitful search of the internet, I decided to write my own
program in the C programming language to answer this question. The source
code and executable of the program (named lettray.exe) can be found here.
It is a very small, quick program, and you should have no trouble using
it if you wish.
As for texts to analyze, I was able to easily download the full texts
of public domain works from the Project Gutenberg website (www.gutenberg.net).
A quick way to find the number of occurrences of a certain letter sequence is to use the ‘replace’ function of a word processor. For example, replace all "CH" with "QQ" and the program will tell you "327 replacements made".
My table of results follows below:
Each row lists the 15 most frequent consonant pairs in one text, from most to least frequent, with the number of occurrences of each.

| Text | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TROILUS & CRISEYDE by Geoffrey Chaucer | TH 9089 | ND 3933 | ST 1880 | GH 1631 | SH 1387 | HT 1318 | WH 1178 | CH 1018 | RT 936 | YN 912 | TR 894 | NG 873 | LL 809 | LY 791 | LD 763 |
| KING LEAR by William Shakespeare | TH 3563 | ND 1321 | ST 1131 | LL 920 | NT 771 | NG 633 | WH 609 | MY 484 | SH 433 | CH 418 | RS 407 | LD 397 | RT 340 | GH 336 | RD 320 |
| MACBETH by William Shakespeare | TH 2855 | ND 953 | ST 676 | LL 559 | NG 484 | NT 433 | WH 429 | CH 371 | CB* 291 | RS 282 | RT 277 | SH 276 | LD 246 | GH 242 | SS 241 |
| ESSAYES of Francis Bacon | TH 8741 | ND 3287 | ST 1755 | NG 1482 | NT 1322 | LL 1215 | CH 1207 | WH 1040 | RS 925 | NS 854 | RT 842 | SS 673 | PR 671 | LY 669 | TR 610 |
| PILGRIM’S PROGRESS by John Bunyan | TH 10521 | ND 3092 | ST 1992 | NG 1636 | LL 1539 | CH 1396 | WH 1288 | NT 1118 | HR 808 | LD 759 | SH 726 | GH 715 | RS 545 | NS 528 | LY 524 |
| ROBINSON CRUSOE by Daniel Defoe | TH 16505 | ND 7468 | NG 3883 | ST 3263 | LL 2703 | WH 2598 | NT 2594 | MY 2442 | CH 2093 | LD 1851 | SH 1708 | GH 1625 | LY 1323 | HT 1174 | RY 1084 |
| GULLIVER’S TRAVELS by Jonathan Swift | TH 14410 | ND 6133 | ST 3807 | NG 3576 | NT 3081 | WH 2488 | CH 2195 | LL 2076 | MY 1831 | RS 1718 | LD 1558 | TR 1514 | RT 1421 | NS 1376 | PR 1373 |
| AUTOBIOGRAPHY OF BENJAMIN FRANKLIN | | ND 3883 | NG 2859 | NT 2269 | ST 1781 | PR 1302 | CH 1295 | LL 1283 | WH 1262 | RS 1042 | NS 1006 | LY 984 | SS 899 | NC 898 | MY 822 |
| EMMA by Jane Austen | TH 15727 | ND 7914 | NG 6334 | ST 4652 | LL 4376 | SH 3847 | NT 3724 | CH 3198 | LY 2823 | LD 2731 | RY 2482 | WH 2435 | GH 2419 | RS 2391 | SS 2344 |
| DEMOCRACY IN AMERICA (vol. 1) by Alexis de Tocqueville (England?) | TH 34639 | ND 9791 | ST 9300 | NT 8201 | NS 5204 | WH 5025 | CH 4898 | NG 4605 | NC 4035 | TS 3807 | PR 3704 | LL 3395 | CT 3343 | TR 3235 | SS 3129 |
| HUCKLEBERRY FINN by Mark Twain | TH 12312 | ND 8340 | NG 3978 | LL 3090 | ST 2833 | WH 1874 | LD 1840 | NT 1593 | SH 1541 | CH 1422 | CK 1395 | GH 1387 | HT 1026 | YS 960 | DN 905 |
| CALL OF THE WILD by Jack London | TH 4908 | ND 2292 | NG 1581 | ST 1112 | NT 870 | LL 738 | CK 689 | WH 608 | CH 556 | TR 526 | LY 523 | GH 472 | SH 451 | LD 380 | SS 356 |
| THIS SIDE OF PARADISE by F. Scott Fitzgerald | TH 9881 | ND 4328 | NG 3129 | ST 2882 | LL 2386 | NT 2344 | LY 1690 | RY 1652 | SH 1352 | WH 1291 | CH 1285 | GH 1226 | RS 1162 | SS 1082 | NC 901 |
| NASB version of the Bible (4 gospels) | TH 12627 | ND 6694 | NG 3053 | WH 2408 | LL 2254 | ST 1704 | NT 1456 | SH 1016 | CH 986 | RS 785 | RD 782 | NS 741 | LD 725 | PL 621 | GH 599 |
| NIV version of the Bible (4 gospels) | TH 12142 | ND 4142 | LL 2463 | NG 2267 | WH 2223 | ST 1689 | NT 1489 | PL 975 | CH 942 | RS 803 | RD 704 | LD 703 | NS 693 | SH 642 | GH 599 |

* See the note on Macbeth under Brief Analysis.
Brief Analysis
I tried to see if there was any variation in the most common consonant pairs over the centuries. You can see that there is some, but not much. The YN in Chaucer comes from words like "myn" for "mine". In the 1700s, words containing MY seem to have been more popular. In the latter half of the 20th century, there seem to be more words containing PL, but I need to find another two non-copyright texts to analyse. Shakespeare’s plays have the name of the speaker at the beginning of every speech; hence CB, from MACBETH, is one of the top 10 consonant pairs in Macbeth.
Some editions of these works may have had their spelling modernized
or ‘corrected’, which would skew the results. However, from my brief look
at the texts, Project Gutenberg seems to have kept the original wording
and spelling. I also wanted to see if there were any differences between
US and British works, but found nothing obvious in this small sample of
texts. It was difficult to think of texts from both countries in each century.
One known difference is that American spelling often replaces the British
S with Z, turning "analyse", "paralyse" and "recognise" into "analyze",
"paralyze" and "recognize".
Commentary on Peculiarities
One can see from this sample of texts that in every case, across more than 600 years of writing from Chaucer onward, TH is by far the most common consonant pair. This is very unfortunate for non-English speakers, as the only other languages I know of with the TH sound are Greek, Gothic (long since gone!) and Icelandic. (Oh! Castilian Spanish pronounces Z, and C before E or I, as TH; legend has it that one of their kings long ago had a lisp and everyone had to imitate him. I wonder how they would speak if he had had a stutter?!)
Why do we have a TH sound? The TH words in English came from Gothic, Scandinavian (Icelandic), and more recently, Greek. Here’s what probably happened:
Gothic died out more than 1000 years ago (between 500 and 900 AD). [Perhaps
the Goths came up with this name for themselves so that no one else could
pronounce it!] Some of our words may have come from the Goths, but Gothic is
not a direct ancestor of English. The Goths were busy razing Rome and then
raising Gothic cathedrals, and never made it to bonnie England. Marauding
Vikings (750-1050) may have brought the dreaded TH sound to English shores.
(th-th-thanks for noth-th-thing!) Icelandic actually has two types of
TH sound: the ‘thorn’ (þ) and the ‘edh’ (ð). I don’t know how much Icelandic
there is in English, but we also have two types of TH sound: say "think" and
"this" to hear the difference. Further down the (historical) road, Britain
was importing many things from the continent. It seems that when words were
imported from Germany and Holland, many ‘D’s were changed to ‘TH’s (e.g.
‘dank’ to ‘thank’). Definitely an improvement, right? Anglo-Saxon snobbery or
just a thtrange custom? Finally, in the past few centuries, cool technical
words have been imported from Greek (like ‘thermometer’), or simply made up
from Greek-sounding words (like ‘telethon’). The Greek letter for TH is Θ or θ
(theta), belovéd by trigonometry students around the world.
In every text except Emma, WH is also within the top 10 consonant pairs. Why do we have so many WH combinations when the H is silent? Well, in Old English, WH was both written and pronounced as HW (say ‘hwat’, literally). Almost all of the words that are now written as "WH…" were originally written "HW…". This changed after the Normans conquered England in 1066 AD and redefined the spelling of many words, including these HW ones (don’t you wish that you had that power!). Finally, a few WH words were initially "W…" words that were aspirated by the addition of an H, such as "whip" from "wippe". Posh? Not!
Nowadays we generally don’t pronounce the ‘H’ anymore, except perhaps in Wales and in the word "white". (Are there hwite hwales in Wales?) Interestingly enough, professional singers pronounce WH words as HW to make them easier to understand and to soften the vowels. Tolkien, too, included a letter for the HW sound when he invented his elvish alphabet.
Having come full circle back to Tolkien, we will stop here.
[sic] means that the spelling error was in the original document and
not intruduced [sic] in the publishing process.
Addendum:
I just found out from my students that Albanian and Arabic also have the TH sound!
If this is true, then perhaps there are more TH languages out there!