Re: Phonetic Name Generator



Mario Donick wrote:


This is a good step in the right direction. How do you determine which
pairs are "weird"?

[There exists some linguistic knowledge about typical and non-typical
phoneme combinations in several languages. I don't know about them for
English in particular (I am German (and German linguist)), but I am
sure that you can find information on this topic on the Internet or in
books.]

English is very strange because it borrows words (and spelling conventions)
from many other languages. Generally, however, there's a three-phoneme
test that works well for identifying words whose pronunciation will seem
consistent.

You can find phonetic combination frequency tables for English - I
recommend Freidman's and Norvig's, although nothing prevents you from
compiling your own. These give relative frequencies for the most
common three-phoneme sequences found in English.

A first approximation gives one phoneme per letter, but there are a lot
of special cases where it isn't true. English uses a lot more phonemes
than there are letters in its alphabet, and for each phoneme it seems
there are at least three or four different ways to write it. Doubled
P's, L's, T's, O's and S's, for example, represent single phonemes, as do
digraphs like ch, ph, gh, th, and so on. English also treats a number
of combined vowels like ie, ea, and so on as single phonemes. Trailing
E is usually not a phoneme itself, but instead denotes a modified
vowel sound in the last syllable. 's' codes for a different phoneme at
the beginning of a word or following a consonant than when found
following a vowel. And so on. There's a big list of rules in an
appendix -- I think it was in Norvig's book, but most of my computer
linguistics books are in boxes right now so I'm not looking it up.
Anyway, if you code those rules, you have a much closer approximation
to the phoneme sequence.
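
To give a feel for what coding those rules looks like, here is a minimal
Python sketch. The rule lists and the function name rough_units are made up
for illustration and cover only the handful of cases mentioned above
(doubled consonants, digraphs, vowel pairs, trailing E). They are not the
rules from the appendix, and a usable rule set would be much longer.

```python
# Hypothetical, deliberately tiny rule set for illustration only.
DIGRAPHS = ("ch", "ph", "gh", "th", "sh")
VOWEL_PAIRS = ("ie", "ea", "oo", "ee")
DOUBLED = ("pp", "ll", "tt", "ss")


def rough_units(word):
    """Split a spelled word into rough phoneme-like units."""
    w = word.lower()
    # Trailing 'e' after a consonant usually modifies the previous vowel
    # rather than sounding itself, so this sketch simply drops it.
    if len(w) > 2 and w.endswith("e") and w[-2] not in "aeiou":
        w = w[:-1]
    units = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DOUBLED:
            # A doubled consonant counts as one unit.
            units.append(pair[0])
            i += 2
        elif pair in DIGRAPHS or pair in VOWEL_PAIRS:
            # Two letters that stand for a single sound.
            units.append(pair)
            i += 2
        else:
            units.append(w[i])
            i += 1
    return units
```

With these toy rules, rough_units("teach") gives ['t', 'ea', 'ch'] and
rough_units("little") gives ['l', 'i', 't', 'l']. The units are spelling
chunks standing in for phonemes, so the frequency table would have to be
keyed the same way.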

Now you take your phonetic combination frequency table and use it like
a "sliding window" to examine each subsequence of three phonemes. Each
word is assigned a score, which is the product of the relative frequencies
of its three-phoneme sequences divided by the logarithm of the word's
length. Words that score higher are more likely to seem to 'belong' in
English, or to be 'familiar' to speakers of English. Words that are
unpronounceable will generally score zero (they contain at least one
three-phoneme sequence whose frequency is zero).
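
The sliding-window score can be sketched in a few lines of Python. This is
only an illustration under some assumptions of mine: the word is already a
list of phoneme units, the table is a dict keyed by 3-tuples of units, the
length is counted in phonemes, and the logarithm is natural unless you pass
another base. The name familiarity_score and the toy table below are
hypothetical, and the frequencies in it are invented.

```python
import math


def familiarity_score(units, trigram_freq, log_base=math.e):
    """Product of trigram relative frequencies over log of the length."""
    # Fewer than three units means no window to examine (and log(1) is
    # zero), so this sketch just scores such words as zero.
    if len(units) < 3:
        return 0.0
    product = 1.0
    for i in range(len(units) - 2):
        window = tuple(units[i:i + 3])
        # A sequence missing from the table counts as frequency zero,
        # which drives the whole score to zero.
        product *= trigram_freq.get(window, 0.0)
    return product / math.log(len(units), log_base)


# Toy table with invented numbers, just to exercise the function.
TOY_TABLE = {
    ("t", "ea", "ch"): 0.02,
    ("ea", "ch", "er"): 0.01,
}
```

For example, familiarity_score(["t", "ea", "ch", "er"], TOY_TABLE) is
positive, while familiarity_score(["t", "ch", "ea", "er"], TOY_TABLE) is
zero because none of its windows appear in the table. With real tables the
product gets very small for long words, so summing logarithms of the
frequencies may be the more practical way to compute it.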

The same test works fine in other languages. Identifying phonemes
unambiguously in software is usually easier there than in English, but
the base of the logarithm needs adjustment. Use the same base as for the
logarithm in Zipf's frequency rule for spelling for your language,
adjusted for the average number of letters per phoneme.

Alternatively -- almost as accurate and with far less linguistic
analysis -- you can use alphabetic rather than phonetic frequency
tables but you need four-letter sequences rather than three-phoneme
sequences. The disadvantage here is that the tables have to be utterly
huge or else they'll throw out a lot of words that are quite
pronounceable, and because they'd fill a whole book and they're not
very interesting, I don't know of anybody who's published a good
set. Compiling your own tables in this case would require simple
software, but you'd need access to a quite large corpus of text.
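
The "simple software" for compiling such a table could look roughly like
the Python sketch below. It assumes the corpus fits in a string, counts
only sequences inside a single word, and ignores anything that is not a
plain a-z letter. The function name four_letter_table is made up for the
example.

```python
import re
from collections import Counter


def four_letter_table(text):
    """Relative frequencies of four-letter sequences within words."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Slide a four-letter window across each word.
        for i in range(len(word) - 3):
            counts[word[i:i + 4]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}
```

Called on "The teacher teaches", it finds eight windows in total, two of
which are "teac", so that sequence gets a relative frequency of 0.25.
Whether to pad words with start and end markers, so that word-initial and
word-final sequences are distinguished, is a design choice left out here.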

Bear
