Core vocabulary is quite old

An exciting new find from the world of evolutionary biology.

‘Oldest English Words’ identified, trumpets the BBC. Let’s pretend that when Hengest and Horsa stepped off the boat at Ebbsfleet in 449, they and their business associates instantly started speaking English. That means the oldest words in English (that is, all the native vocabulary which has survived since then) are 1,560 years old. But this is to be rather generous about what constitutes the English language.[1]

The article begins

Some of the oldest words in English have been identified, scientists say.

Reading University researchers claim “I”, “we”, “two” and “three” are among the most ancient, dating back tens of thousands of years.

But English isn’t tens of thousands of years old, which means that this is nonsense by implication. Instead, the study, which was conducted by Mark Pagel, an evolutionary biologist, appears to demonstrate the comparative durability of core vocabulary.

The researchers used the university’s IBM supercomputer to track the known relations between words, in order to develop estimates of how long ago a given ancestral word diverged in two different languages.

I’m sure that the process is a little more complicated than this and a little more scientific, although glottochronology, a dubious idea at the best of times, bobs around in the back of my mind. Yet the more I read in the article, the more I wonder how plausible such research is. There seems to be too much use of words such as “estimate”, and I cannot reconcile

That is, the model provides a list of words that are unlikely to have changed from their common ancestral root by the time of William the Conqueror.


However, the model cannot offer a guess as to what the ancestral words were.

and the sentence which follows it:

It can only estimate the likelihood that the sound from a modern English word might make some sense if called out during the Battle of Hastings.

There’s a big difference between a sound and a word, and even sounds from the two periods of the language, though they might be represented by the same letter in Old and Modern English, are not necessarily phonetically identical.

What the researchers found was that the frequency with which a word is used relates to how slowly it changes through time, so that the most common words tend to be the oldest ones.

Yeah, this is that core vocabulary thing again. This isn’t exactly news.

For example, “dirty” is a rapidly changing word; currently there are 46 different ways of saying it in the Indo-European languages, all words that are unrelated to each other. As a result, it is likely to die out soon in English, along with “stick” and “guts”.

Perhaps there is a correlation between the range of words used for a thing in the Indo-European languages and the likelihood that it’s going to fall out of use in any of those languages to be replaced by something else. But I also wonder about the semantics. “Dirt”, which comes from Old Norse drit, meant “excrement”,[2] and while it’s still used in that sense as a euphemistic term for “dog excrement”, the general sense has moved away from it. In other words, does the study take semantic shifts into account?

Also, how soon is soon? In the next 50 years? Next hundred? Next two hundred or more?

I’ll finish with these two paragraphs in which “sound” is thrown about with gay abandon.

“We think some of these words are as ancient as 40,000 years old. The sound used to make those words would have been used by all speakers of the Indo-European languages throughout history,” Professor Pagel said.

“Here’s a sound that has been connected to a meaning – and it’s a mostly arbitrary connection – yet that sound has persisted for those tens of thousands of years.”

As before, I don’t know exactly what “sound” is meant to mean here unless it’s the whole pronunciation of a word and not a specific phoneme. The problem is that the word is being used in the singular, which, to me, means an individual phoneme, not a whole word.

Even if the BBC story has rather simplified the research for the uninitiated, I can’t help but feel that the research rests on too many inherent uncertainties and may not survive close scrutiny. I’m also uncertain about its real value for linguistics. There is this penultimate paragraph:

The work casts an interesting light on the connection between concepts and language in the human brain, and provides an insight into the evolution of a dynamic set of words.

but it’s ever so vague. However, that’s the BBC hack at work picking out a suitable statement while avoiding being too specific.


  1. I’ve possibly mentioned before that I think English wasn’t really a distinct language until nearer the end of the Old English period at the earliest. In a similar vein, I think Wulfilas’ Gothic is merely a dialect of East Germanic and not, in truth, some distinct language.
  2. This may be a little euphemistic, although I’m not sure. ON drit and the verb dríta may have had the same colloquial sense as “shit”. The dictionary may be glossing the word euphemistically.

[27.02.09. Language Log has now caught up with the story (Scrabble tips for time travellers?). It seems it’s not just the BBC which has mangled the story. Other papers have also managed to publish preposterous nonsense about it. The abstract for the original article in Nature doesn’t suggest that there are any great revelations to be gained from this research, but that’s not the same as poor reporting.]


3 thoughts on “Core vocabulary is quite old”

  1. I’m surprised you even managed to read the article. I can’t detect even the slightest shred of logic in those quotations you’ve posted. Utterly bizarre. Perhaps the BBC should consider closing its science department until it can find staff with at least a high school education.

  2. And this has been bugging me: How is it that English has supposedly preserved I, we, two and three unchanged over millennia while all other Indo-European languages have either altered (even if only slightly in the case of the Germanic languages) or replaced them? Or is this a phenomenon related to "If King James’ English was good enough for Jesus, it’s good enough for me!"

  3. Let’s just say that I was very persistent about the article, but it is so badly written that it’s hard to make head or tail of it. The whole matter of certain words being preserved in English is nonsensical. In OE, I was ic (pronounced something like itch); we was spelt identically, but pronounced like way; and two was twegen (m) (>twain); twa (f) or twa, tu (n). Presumably it comes from the fem/neuter forms. I guess that these are words which haven’t been displaced by others even though the pronunciation has changed since Indo-European. I don’t know. I’m inclined to be a little suspicious about this research regardless of the shocking reporting.
