Phonetic Name Generator



Hi there, folks!

I started writing a roguelike about two weeks ago and took a break
from perfecting the game object system to muck about with random name
generation. Below is my first stab at it, a probabilistic approach
using English phonetics.

Header File:

#include <map>
#include <string>

typedef unsigned int uint;

class NameGenerator {

enum PHENOM_TYPE {
CONSONANT = 1,
VOWEL = 2,
DIPHTHONG = 3
};

typedef std::map<std::string, uint> phenom_list;


private:

// A list of phenoms, mapped to how likely they are
// The chance of each appearing is independent
phenom_list consonant_list;
phenom_list vowel_list;
phenom_list diphthong_list;

// a list of phenoms, mapped to their English character translations
std::multimap<std::string, std::string> phenom_trans;

protected:

// rolls _die d _sides, adds up the result, then returns the number
uint roll(uint _die, uint _sides);

// Chooses a phenom from the passed in list
std::string choosePhenom(phenom_list &_list);

// Choose a consonant
std::string chooseConsonant(void) {

return choosePhenom(consonant_list);
}

// Choose a vowel
std::string chooseVowel(void) {

return choosePhenom(vowel_list);
}

// Choose a diphthong
std::string chooseDip(void) {

return choosePhenom(diphthong_list);
}

public:
// constructor
NameGenerator();

// Generates a random string of phenoms up to _max_length long,
// being a minimum of _min_length, and having an early exit chance of
// _exit_chance each time through the generation routine
std::string generatePhenom(uint _max_length, uint _min_length,
uint _exit_chance);

// Translate a string of phenoms into English characters
std::string translatePhenom(std::string _phenom);
};

CPP File:

#include <cstdlib> // for rand()

// constructor
NameGenerator::NameGenerator() {

// Set up consonant tables
consonant_list["p"] = 10; // p as in pen, strip, tip
consonant_list["b"] = 10; // b as in but, web
consonant_list["t"] = 10; // t as in two, sting, bet
consonant_list["tS"] = 10; // 'cha' as in CHair, naTure, teaCH
consonant_list["T"] = 10; // th as in THing, breaTH
consonant_list["d"] = 10; // d as in do, odd
consonant_list["dZ"] = 10; // 'g' as in Gin, Joy, eDGE
consonant_list["k"] = 10; // k as in Cat, Kill, sKin, QUeen, thiCK
consonant_list["g"] = 10; // g as in go, get, beg
consonant_list["f"] = 10; // f as in fool, enough, leaf
consonant_list["v"] = 10; // v as in voice, have, of
consonant_list["D"] = 10; // th as in THis, breaTHE
consonant_list["s"] = 10; // s as in See, City, paSS
consonant_list["z"] = 10; // z as in zoo, rose
consonant_list["S"] = 10; // s as in SHe, Sure, emoTIon, leaSH
consonant_list["Z"] = 10; // z as in pleaSUre, beiGE
consonant_list["h"] = 10; // h as in ham
consonant_list["m"] = 10; // m as in man, ham
consonant_list["n"] = 10; // n as in No, tiN
consonant_list["N"] = 10; // ng as in siNGer, riNG
consonant_list["l"] = 10; // l as in left, bell
consonant_list["r"] = 10; // r as in Run, veRy
consonant_list["w"] = 5; // w as in We
consonant_list["j"] = 10; // 'ja' as in Yes
consonant_list["W"] = 10; // w as in WHat
consonant_list["x"] = 10; // 'och' as in loCH

// Translation table for consonants

phenom_trans.insert(std::pair<std::string,std::string>("p","p"));
phenom_trans.insert(std::pair<std::string,std::string>("b","b"));
phenom_trans.insert(std::pair<std::string,std::string>("t","t"));
phenom_trans.insert(std::pair<std::string,std::string>("T","th"));
phenom_trans.insert(std::pair<std::string,std::string>("tS","t"));
phenom_trans.insert(std::pair<std::string,std::string>("tS","ch"));
phenom_trans.insert(std::pair<std::string,std::string>("d","d"));
phenom_trans.insert(std::pair<std::string,std::string>("d","dd"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","g"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","j"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","dge"));
phenom_trans.insert(std::pair<std::string,std::string>("k","c"));
phenom_trans.insert(std::pair<std::string,std::string>("k","k"));
phenom_trans.insert(std::pair<std::string,std::string>("k","qu"));
phenom_trans.insert(std::pair<std::string,std::string>("k","ck"));
phenom_trans.insert(std::pair<std::string,std::string>("g","g"));
phenom_trans.insert(std::pair<std::string,std::string>("f","f"));
phenom_trans.insert(std::pair<std::string,std::string>("f","ough"));
phenom_trans.insert(std::pair<std::string,std::string>("v","v"));
phenom_trans.insert(std::pair<std::string,std::string>("v","f"));
phenom_trans.insert(std::pair<std::string,std::string>("th","th"));
phenom_trans.insert(std::pair<std::string,std::string>("D","th"));
phenom_trans.insert(std::pair<std::string,std::string>("D","the"));
phenom_trans.insert(std::pair<std::string,std::string>("s","s"));
phenom_trans.insert(std::pair<std::string,std::string>("s","c"));
phenom_trans.insert(std::pair<std::string,std::string>("s","ss"));
phenom_trans.insert(std::pair<std::string,std::string>("z","z"));
phenom_trans.insert(std::pair<std::string,std::string>("z","x"));
phenom_trans.insert(std::pair<std::string,std::string>("z","se"));
phenom_trans.insert(std::pair<std::string,std::string>("S","s"));
phenom_trans.insert(std::pair<std::string,std::string>("S","sh"));
phenom_trans.insert(std::pair<std::string,std::string>("S","ti"));
phenom_trans.insert(std::pair<std::string,std::string>("Z","su"));
phenom_trans.insert(std::pair<std::string,std::string>("Z","ge"));
phenom_trans.insert(std::pair<std::string,std::string>("h","h"));
phenom_trans.insert(std::pair<std::string,std::string>("m","m"));
phenom_trans.insert(std::pair<std::string,std::string>("n","n"));
phenom_trans.insert(std::pair<std::string,std::string>("N","ng"));
phenom_trans.insert(std::pair<std::string,std::string>("l","l"));
phenom_trans.insert(std::pair<std::string,std::string>("l","le"));
phenom_trans.insert(std::pair<std::string,std::string>("l","ll"));
phenom_trans.insert(std::pair<std::string,std::string>("r","r"));
phenom_trans.insert(std::pair<std::string,std::string>("w","w"));
phenom_trans.insert(std::pair<std::string,std::string>("j","y"));
phenom_trans.insert(std::pair<std::string,std::string>("W","wh"));
phenom_trans.insert(std::pair<std::string,std::string>("x","ch"));

// Vowel list
vowel_list["A"] = 10; // 'a' as in father
vowel_list["i"] = 30; // 'e' as in sEE
vowel_list["I"] = 10; // 'i' as in cIty
vowel_list["E"] = 30; // 'e' as in bEd
vowel_list["3`"] = 10; // 'ir' as in bIRd
vowel_list["{"] = 10; // 'a' as in lAd, cAt, rAn
vowel_list["Ar"] = 10; // 'ar' as in ARm
vowel_list["V"] = 10; // 'u' as in rUn, enOUgh
vowel_list["0"] = 10; // 'a' as in nOt, wAsp
vowel_list["O"] = 10; // 'o' as in lAW, cAUght
vowel_list["U"] = 10; // 'u' as in pUt
vowel_list["u"] = 30; // 'u' as in sOOn, thrOUgh
vowel_list["@"] = 10; // 'a' as in about
vowel_list["@`"] = 10; // 'er' as in winnER

// Translation table for vowels
phenom_trans.insert(std::pair<std::string,std::string>("A","a"));
phenom_trans.insert(std::pair<std::string,std::string>("i","e"));
phenom_trans.insert(std::pair<std::string,std::string>("i","ee"));
phenom_trans.insert(std::pair<std::string,std::string>("I","ei"));
phenom_trans.insert(std::pair<std::string,std::string>("I","i"));
phenom_trans.insert(std::pair<std::string,std::string>("E","e"));
phenom_trans.insert(std::pair<std::string,std::string>("3`","ir"));
phenom_trans.insert(std::pair<std::string,std::string>("{","a"));
phenom_trans.insert(std::pair<std::string,std::string>("Ar","ar"));
phenom_trans.insert(std::pair<std::string,std::string>("V","u"));
phenom_trans.insert(std::pair<std::string,std::string>("V","ou"));
phenom_trans.insert(std::pair<std::string,std::string>("0","o"));
phenom_trans.insert(std::pair<std::string,std::string>("O","au"));
phenom_trans.insert(std::pair<std::string,std::string>("O","aw"));
phenom_trans.insert(std::pair<std::string,std::string>("U","u"));
phenom_trans.insert(std::pair<std::string,std::string>("u","oo"));
phenom_trans.insert(std::pair<std::string,std::string>("u","ou"));
phenom_trans.insert(std::pair<std::string,std::string>("@","a"));
phenom_trans.insert(std::pair<std::string,std::string>("@`","er"));

// Diphthongs
diphthong_list["e"] = 10; // 'ay' as in dAY
diphthong_list["aI"] = 10; // 'iy' as in mY
diphthong_list["OI"] = 10; // 'oy' as in bOY
diphthong_list["o"] = 10; // 'oh' as in nO
diphthong_list["aU"] = 10; // 'ow' as in nOW
diphthong_list["ir"] = 10; // 'ere' as in nEAR, hERE
diphthong_list["er"] = 30; // 'air' as in thERE, hAIR
diphthong_list["Ur"] = 30; // 'our' as in tOUR
diphthong_list["ju"] = 10; // 'oou' as in pUpil

// Diphthong translations
phenom_trans.insert(std::pair<std::string,std::string>("e","ay"));
phenom_trans.insert(std::pair<std::string,std::string>("aI","y"));
phenom_trans.insert(std::pair<std::string,std::string>("aI","ai"));
phenom_trans.insert(std::pair<std::string,std::string>("OI","oy"));
phenom_trans.insert(std::pair<std::string,std::string>("o","o"));
phenom_trans.insert(std::pair<std::string,std::string>("aU","ow"));
phenom_trans.insert(std::pair<std::string,std::string>("ir","ear"));
phenom_trans.insert(std::pair<std::string,std::string>("ir","ere"));
phenom_trans.insert(std::pair<std::string,std::string>("er","ear"));
phenom_trans.insert(std::pair<std::string,std::string>("er","air"));
phenom_trans.insert(std::pair<std::string,std::string>("Ur","our"));
phenom_trans.insert(std::pair<std::string,std::string>("ju","u"));
}

// rolls _die d _sides, adds up the result, then returns the number
uint NameGenerator::roll(uint _die, uint _sides) {

uint result = 0;

// For each die to roll,
for (int i = 0; i < _die; ++i) {

// Roll the die and add it to the result
result = result + (rand() % _sides + 1);
}

// return the result
return result;
}
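
A side note on roll(): rand() % _sides is slightly biased whenever
_sides does not evenly divide the range of rand(), and rand() repeats
the same sequence on every run unless srand() is called somewhere. A
minimal sketch of the same dice roll using the C++11 <random> header
follows. The names rollDice and rng are mine, not part of the class
above, and this assumes a C++11 compiler is available.

```cpp
#include <cassert>
#include <random>

// Rolls `dice` dice with `sides` sides each and returns the sum.
// The generator is passed in so the caller controls seeding.
unsigned rollDice(std::mt19937 &rng, unsigned dice, unsigned sides) {
    assert(sides > 0);  // a zero-sided die has no valid face
    std::uniform_int_distribution<unsigned> face(1, sides);
    unsigned total = 0;
    for (unsigned i = 0; i < dice; ++i) {
        total += face(rng);  // each face 1..sides is equally likely
    }
    return total;
}
```

Seeding the generator once, e.g. std::mt19937 rng(std::random_device{}());
and passing it to every call gives different names on each run, while a
fixed seed makes a run reproducible for debugging.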

// Chooses a phenom from the passed in list
std::string NameGenerator::choosePhenom(phenom_list &_list) {

std::string phen = "";

// We try until a winner is found
while (phen == "") {

// Choose a random spot in the phenom array
uint pos = roll(1, _list.size()) - 1;

// Roll to see if we add this
uint chance = roll(1, 100);

phenom_list::iterator p = _list.begin();

// Move the iterator
for (int i = 0; i < pos; ++i) {
++p;
}

// Accept with a probability of (weight)%, so that higher
// weights are chosen more often
if (chance <= (*p).second) {

// Add it to the string
phen = (*p).first;

// Break out of the loop
break;
}

}

return phen;
}
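
choosePhenom() is rejection sampling: pick an entry uniformly, then
keep it with a chance given by its weight, and retry otherwise.
Assuming the intent is "probability proportional to weight", the same
distribution can be drawn in one step with std::discrete_distribution
(C++11). This is a sketch of that swapped-in technique, not the code I
actually ran; pickWeighted and WeightTable are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <random>
#include <string>
#include <vector>

typedef std::map<std::string, unsigned> WeightTable;

// Picks one key with probability proportional to its weight.
std::string pickWeighted(const WeightTable &table, std::mt19937 &rng) {
    assert(!table.empty());  // nothing to choose from otherwise

    // Copy the weights out in map order so index i matches entry i.
    std::vector<unsigned> weights;
    for (WeightTable::const_iterator it = table.begin();
         it != table.end(); ++it) {
        weights.push_back(it->second);
    }

    // discrete_distribution returns index i with probability
    // weights[i] / sum(weights).
    std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                 weights.end());
    WeightTable::const_iterator chosen = table.begin();
    std::advance(chosen, dist(rng));
    return chosen->first;
}
```

Unlike the retry loop, this never spins when every weight is small, and
the weights no longer have to be read as percentages out of 100.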


// Generates a random string of phenoms up to _max_length long, being
// a minimum of _min_length, and having an early exit chance of
// _exit_chance each time through the generation routine
std::string NameGenerator::generatePhenom(uint _max_length,
uint _min_length, uint _exit_chance) {

std::string phenom;

// Choose a random state to start as
uint state = roll(1, 3);
uint chance = 0;
std::string phenom_type;

// For every possible phenom
for (int len = 0; len < _max_length; ++len) {

// Switch on what type of sound we want
switch (state) {

case CONSONANT:

phenom_type = chooseConsonant();

// Append it to the string with the separator token
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);

// vowels mostly come after consonants
if (chance <= 60) {

state = VOWEL;
}
// Another consonant
else if (chance <= 95) {
state = CONSONANT;
}
// A diphthong
else {
state = DIPHTHONG;
}

break;

case VOWEL:

phenom_type = chooseVowel();

// Append it to the string
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);
// vowel pairs are unusual
if (chance <= 10) {

state = VOWEL;
}
// Consonants aren't
else if (chance <= 95) {
state = CONSONANT;
}
// A diphthong
else {
state = DIPHTHONG;
}

break;

case DIPHTHONG:

phenom_type = chooseDip();

// Append it to the string
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);
// Only consonants and vowels allowed after
// diphthongs
if (chance <= 10) {

state = VOWEL;
}
else {
state = CONSONANT;
}

break;

}

// Check to see if we break early. Written as len + 1 so that a
// _min_length of 0 does not underflow the unsigned subtraction.
if (len + 1 >= _min_length) {
if (roll(1, 100) <= _exit_chance) {
break;
}
}

}

return phenom;
}
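
The three case blocks in generatePhenom() differ only in their
thresholds, so the state changes can be read off a small table instead.
Below is a sketch of that idea using the same numbers as the switch.
Transition, kTransitions and nextState are hypothetical names, and the
enum here is zero-based (unlike PHENOM_TYPE above) so it can index the
array directly.

```cpp
#include <cassert>

enum SoundType { CONSONANT = 0, VOWEL = 1, DIPHTHONG = 2 };

// Cumulative percentage thresholds: rolls up to vowel_upto give a
// vowel, rolls up to consonant_upto give a consonant, and anything
// higher gives a diphthong.
struct Transition {
    unsigned vowel_upto;
    unsigned consonant_upto;
};

// Rows are indexed by the current SoundType.
static const Transition kTransitions[3] = {
    {60, 95},   // after a consonant: 60% vowel, 35% consonant, 5% diphthong
    {10, 95},   // after a vowel: 10% vowel, 85% consonant, 5% diphthong
    {10, 100},  // after a diphthong: 10% vowel, 90% consonant
};

// Returns the next state for a d100 roll in the range 1..100.
SoundType nextState(SoundType current, unsigned d100) {
    assert(d100 >= 1 && d100 <= 100);
    const Transition &t = kTransitions[current];
    if (d100 <= t.vowel_upto) return VOWEL;
    if (d100 <= t.consonant_upto) return CONSONANT;
    return DIPHTHONG;
}
```

With the rules in a table, swapping in a different set of thresholds
changes the feel of the names without touching the generation loop.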

// Translate a string of phenoms into English characters
std::string NameGenerator::translatePhenom(std::string _phenom) {

std::string translated; // The translated string
std::string token; // The current token

std::multimap<std::string, std::string>::iterator p_find;
std::multimap<std::string, std::string>::iterator p_last;

// skip delimiters at beginning.
std::string::size_type lastPos = _phenom.find_first_not_of("|",
0);

// find the first delimiter after the token.
std::string::size_type pos = _phenom.find_first_of("|", lastPos);

// For every token in the string,
while (std::string::npos != pos || std::string::npos != lastPos)
{

// Find a token
token = _phenom.substr(lastPos, pos - lastPos);

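// Look up the first translation for this token. lower_bound is
// used because find may return any of the matching entries.
p_find = phenom_trans.lower_bound(token);
//p_last = phenom_trans.upper_bound(token);

// If a token was found,
if (p_find != phenom_trans.end() && (*p_find).first == token) {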

// Get one of the possible translations randomly
uint trans_num = roll(1, phenom_trans.count(token)) - 1;

for (int i = 0; i < trans_num; i++) {
++p_find;
}

translated = translated + (*p_find).second;

}
else {
translated = translated + "'";
}

// Skip delimiters. Note the "not_of"
lastPos = _phenom.find_first_not_of("|", pos);

// Find the next delimiter
pos = _phenom.find_first_of("|", lastPos);

}

return translated;
}
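
For comparison, here is a sketch of the same translation step built on
std::getline for the tokenizing and multimap::equal_range for the
lookup, which hands back exactly the run of entries for one key. The
names spell and SpellingTable are invented for this example, and it
assumes C++11 for <random>.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <random>
#include <sstream>
#include <string>
#include <utility>

typedef std::multimap<std::string, std::string> SpellingTable;

// Translates a '|'-separated phoneme string, choosing one spelling at
// random per token. Unknown tokens become an apostrophe.
std::string spell(const SpellingTable &table, const std::string &phonemes,
                  std::mt19937 &rng) {
    std::string out;
    std::istringstream in(phonemes);
    std::string token;
    while (std::getline(in, token, '|')) {
        if (token.empty()) continue;  // skip the leading separator

        // [first, second) covers every spelling stored for this token.
        std::pair<SpellingTable::const_iterator,
                  SpellingTable::const_iterator>
            range = table.equal_range(token);
        if (range.first == range.second) {
            out += "'";
            continue;
        }

        std::size_t n = std::distance(range.first, range.second);
        assert(n > 0);
        std::uniform_int_distribution<std::size_t> pick(0, n - 1);
        SpellingTable::const_iterator it = range.first;
        std::advance(it, pick(rng));
        out += it->second;
    }
    return out;
}
```

Given a table holding "c" and "ck" for "k" and "a" for "A", the input
"|k|A" comes out as either "ca" or "cka".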

It produces some pretty interesting output, even with all those
hard-coded magic numbers. The results below came from
generatePhenom(10, 3, 60); you can get wildly different generations
just by changing those three settings:

|O|T|A|k
authac

|aU|dZ|A
owga

|V|f|I|t|@`
oufiter

|OI|Z|j
oygey

|3`|dZ|E
irje

|e|r|W
ayrwh

|o|n|w|V
onwou

|O|l|w
aulew

|OI|t|@
oyta

|E|f|3`
efir

|s|w|I|tS
ssweich

|p|m|I
pmei

|Ur|j|d
ourydd

|Ur|r|OI
ourroy

|l|j|x|g
lychg

|s|W|b
swhb

|OI|s|0|u|w|u|p|I
oysoouwoopei

|x|@`|s
cherss

|aI|z|@
yxa

|@|f|A
afa

Not bad, but not perfect. My next step is to move the rules and
translation tables over to a separate object. Something like a
Phonetic Dictionary. To change the style of the words (or the rules by
which they are generated), you could then just set a different
dictionary. Markov chains would probably be a better, easier way to
go, but... what's life without a bit of over-engineering, eh?
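
For anyone curious about the Markov chain route, here is a minimal
sketch of an order-1, character-level chain. It is not something I have
wired into the game; MarkovNames and its members are made-up names, and
it assumes you supply your own list of example names to train on.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Order-1 character Markov chain: for each character, the list of
// characters seen to follow it. '^' marks the start of a name and
// '$' marks the end, so neither may appear in the training names.
class MarkovNames {
public:
    // Records every adjacent pair of characters in one example name.
    void train(const std::string &name) {
        char prev = '^';
        for (std::size_t i = 0; i < name.size(); ++i) {
            followers_[prev].push_back(name[i]);
            prev = name[i];
        }
        followers_[prev].push_back('$');
    }

    // Walks the chain from the start marker until an end marker is
    // drawn or max_length characters have been produced.
    std::string generate(std::mt19937 &rng, std::size_t max_length) const {
        assert(!followers_.empty());  // train() must be called first
        std::string out;
        char prev = '^';
        while (out.size() < max_length) {
            const std::vector<char> &next = followers_.find(prev)->second;
            std::uniform_int_distribution<std::size_t> pick(0, next.size() - 1);
            char c = next[pick(rng)];
            if (c == '$') break;
            out += c;
            prev = c;
        }
        return out;
    }

private:
    std::map<char, std::vector<char> > followers_;
};
```

Because followers are stored with repeats, common letter pairs in the
training names are drawn more often, which is where the "style" comes
from. Changing the style means changing the training list rather than
the tables.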