iListen - optimising performance - an experienced user's comments.
- From: Cameron Downunder <contactcameron@xxxxxxxxxxx>
- Date: Fri, 6 Feb 2009 17:58:07 -0800 (PST)
iListen is a speech-to-text program by MacSpeech. It has recently been
superseded by "Dictate", the next generation of MacSpeech's
speech-to-text software.
Unfortunately for some Mac users, Dictate requires an Intel Mac. Not
all Mac users wanting speech to text can upgrade their hardware, so in
support of those still using iListen, here are some comments from my
experience of seeking optimum performance from it. Perhaps you can
tweak a little more out of it to make it usable.
From my reading of the MacSpeech website, MacSpeech is no longer
providing support for iListen users.
As a long-term iListen user (or tester is probably more accurate), I
thought it might help if I took an hour or so to write up some notes
on what I found helpful in optimising my setup. These findings come
from rigorous testing to see what level of performance I gained with
various setups.
If you cannot upgrade to Dictate at this time, this may help you
tweak marginally more useful performance out of iListen in the
meantime.
Originally posted on comp.sys.mac.os in response to some other posts
there, but I have realised this post is better suited to
comp.sys.mac.apps.
My Testing of iListen:
I have tested iListen quite methodically on two PowerBooks: a 500 MHz
and a 1 GHz machine. I first began with iListen on a G3 PowerBook with
iListen 1.2.1 or something like that. I spent years trying to get good
performance out of it.
My most comprehensive testing was on the titanium series PowerBook,
1 GHz, 500 MB RAM, recently upgraded to 1 GB RAM.
Testing and use were mostly with a VXi TalkPro headset with Andrea USB
sound pod. Before that I used an NC 7100 headset. Both these headsets
are top-of-the-range, noise-cancelling headsets.
Both produced similar performance, though I suspect the optimal setup
in iListen differs slightly for each headset.
Before sharing my findings from testing re optimal settings, some
notes to other iListen users.
It is important to remember that, to benefit from an optimal setup,
you have to train the voice profile using that setup, not just dictate
with it using an old profile trained with a different setup.
Unfortunately your optimal setup will differ somewhat from mine, as
you will have a different machine, headset, and personal voice
characteristics.
Even so, these settings may prompt you to question some of your
current setup, and trial something that proves better. It takes time,
I know. I hope it helps some iListen users.
I went from a raw recognition rate of around 70% (70 words per 100) to
the low 90s (90-95 words per 100) on my easiest test texts with
careful testing and finding my optimal setup.
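
One way to arrive at a "words per 100" figure is to compare the test
text you read aloud against what was recognised, word by word. The
Python sketch below is only an illustration under stated assumptions:
the function name recognition_rate is made up, and counting matched
words with the standard difflib module is one possible scoring
method, not necessarily how the figures above were counted.

```python
import difflib


def recognition_rate(reference: str, transcript: str) -> float:
    """Return the percentage of reference words found, in order, in the transcript.

    Assumption: words are compared case-insensitively and split on whitespace;
    punctuation is not stripped.
    """
    ref_words = reference.lower().split()
    out_words = transcript.lower().split()
    if not ref_words:
        raise ValueError("reference text is empty")
    # Align the two word lists and count the words that line up.
    matcher = difflib.SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref_words)


if __name__ == "__main__":
    read_aloud = "the quick brown fox jumps"
    recognised = "the quick brown box jumps"
    print(recognition_rate(read_aloud, recognised))  # 4 of 5 words match
```

Scoring the same easy and hard test texts after each change keeps the
comparison between setups like for like.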
What is an effective recognition rate?
For effective, productive speech to text (for a user without a
disability), e.g. to replace manual typing of letters or a secretary,
a recognition rate of better than 98% seems to be the base threshold
in my experience.
That is fewer than 2 words incorrect per 100 words.
Below this level of performance, speech to text is still useful for
commands (e.g. opening applications on your computer using only your
voice), or for getting ideas out, knowing the transcription only needs
to be good enough for you to recognise the point or idea, and you can
type it up better later, manually.
How much training? How many Learn My Voice tutorials?
With iListen (*not the new Dictate*) I was surprised to find that the
learning curve is not linear over successive Learn My Voice training
tutorials, nor even by number of individual frames (the paragraphs you
read that make up a training tutorial).
I tested three of my better voice profiles consistently after each
tutorial with easy and hard test texts.
All three showed a similar pattern of progress. For all, performance
improved up to the third tutorial, then went backwards for some, then
improved again to all reach a similar performance around the
completion of the 7th Tutorial.
From my experience, it would seem that if you are trialling a new
voice profile with what you hope is a more optimal setup, you are best
comparing accuracy after completing the first two training tutorials,
not just the first tutorial.
If you get a clearly better result than previous profiles after the
second training session, that generally means you have a better
setup. I think this is useful information.
To reiterate, after only the first training session, most profiles
with optimal and not quite so optimal setups tended to perform much
the same.
If you are in a hurry, and if my experience is a good guide, training
to the completion of the third tutorial is the best performance gain
for amount of time committed to the task - but stop there.
If you train further, it would appear you need to complete at least up
to the end of tutorial 7. This was where all my good profiles tended
to converge and perform about the same. So the poorer of the good
profiles caught up, and the best profiles ended up about the same as
they were at the end of the third tutorial.
From tutorial 7 onwards, there was small steady improvement with each
tutorial: about an additional 5 words per 100 on my most difficult
test text, peaking around 83% to 85%, and up to 93% to 94% on my
easiest test texts.
My optimal settings:
Optimal Volume and Sound Threshold were the most significant factors
in improving performance.
Volume:
I found Volume setting was quite critical for optimal performance,
with a narrow band within about ± 5% being optimal. Auto volume
setting did not get it right in my use of it.
On my titanium series PowerBook (1 GHz, 1 GB RAM) optimal volume was
around 75% to 77%. All this was with a VXi TalkPro headset and Andrea
USB sound pod.
This volume was essentially the highest volume with no hint of sound
distortion - even accounting for variations in my own voice volume
when dictating. So no sound distortion at the loudest my voice may get
when dictating.
Sound Threshold:
I found Sound Threshold even less reliable on automatic setup, and
lowering it to 15% considerably improved performance, provided I also
had the Volume optimal and the mic distance right (3 finger widths out
from the left corner of my mouth).
Below a Sound Threshold of 15% (e.g. 10%), background noise would
become significant (mic peaking into green when not talking,
generating random sounds to be transcribed by iListen). Above that
level, the lower the better.
So a way to find this on your system is to play with the Sound
Threshold, and choose a level a little above background noise.
(Background noise stays in the yellow band, then add a little bit more
for a safety margin.)
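
The "background noise plus a safety margin" rule can be written down
as a small calculation. This Python sketch is purely illustrative:
the function name, the sample readings and the 3-point margin are all
assumptions, and the readings would have to be estimated by eye from
the level meter while you sit silent.

```python
def suggest_sound_threshold(background_levels, margin=3.0):
    """Suggest a threshold (percent) a little above the loudest background reading.

    background_levels: meter readings (percent) noted while not talking.
    margin: hypothetical safety margin in percentage points.
    """
    if not background_levels:
        raise ValueError("need at least one background reading")
    noise_ceiling = max(background_levels)
    # Keep the suggestion inside the 0-100 % range of a percentage setting.
    return max(0.0, min(100.0, noise_ceiling + margin))


if __name__ == "__main__":
    # Made-up readings taken with the headset on and nobody speaking.
    print(suggest_sound_threshold([9, 11, 12, 10]))  # prints 15.0
```

Treat the result as a starting point only, then check it against
actual dictation with a freshly trained profile.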
A test that you do not have it too low is to quietly sip some water
while wearing the headset with the mic on. You should be just able to
do so without exceeding the Sound Threshold. (No slurping.)
A good test of whether I needed to play with manual settings was to
repeat the mic setup several times and see how much the auto settings
varied. Sound Threshold for me would vary from the high 20s to the
50s, though generally around the mid 30s. Volume varied from 60% to
80%. Desktop machines may be more consistent.
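
If you note down the values from each run of the automatic mic setup,
a few lines of Python will show how far they wander. This is a sketch
only: the function name summarise_runs and the sample numbers are
invented for illustration, chosen to fall within the ranges described
above.

```python
from statistics import mean


def summarise_runs(readings):
    """Summarise repeated auto-setup readings (percent) from the mic setup."""
    if not readings:
        raise ValueError("need at least one reading")
    lowest, highest = min(readings), max(readings)
    return {
        "min": lowest,
        "max": highest,
        "mean": mean(readings),
        "spread": highest - lowest,  # how far apart the auto values landed
    }


if __name__ == "__main__":
    # Hypothetical values from five runs of the automatic mic setup.
    threshold_runs = [28, 35, 34, 52, 36]
    volume_runs = [60, 72, 80, 75, 68]
    print("Sound Threshold:", summarise_runs(threshold_runs))
    print("Volume:", summarise_runs(volume_runs))
```

A spread much wider than the narrow band in which a setting performs
well suggests the automatic value is not to be trusted and a manual
setting is worth trying.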
What about optimal mic distance from the mouth?
I tried different volume settings with different mic distances. For
example, I wondered which was better: closer mic with lower volume
setting, or further mic distance with higher volume setting. I found
3 finger widths distance was the best distance, out from the left
corner of my lips.
Again, the test that you have the mic not too close and in a good
position is to quietly sip some water while wearing the headset with
the mic on. You should be able to do so repeatedly while only just
avoiding bumping the mic.
Something to watch out for with regard to random noises: I had to set
my mic headset and cord up so that if I moved my head, there was no
movement of the cord against my clothes or collar, leading to sound
going into the mic through the materials of the headset itself. The
VXi has a cord clip that helps me ensure a little free cord that
loops out a bit around to the headset so there is no rubbing of the
cord as I turn my head.
Watch out for bushy beards also. No rubbing of things triggering mic
noise.
Voice speed in training:
I found it helpful to train the first tutorial with slow, deliberate
speech, as slow as possible *while still speaking normally*, then to
progress to a moderate though still careful speech rate on subsequent
tutorials, speaking normally.
I found that training with a fast voice did not generate a Voice
Profile that was better suited to faster speech.
Speed vs accuracy Preference setting:
I set speed vs accuracy in the Preferences at 0.5 (half way between
the two).
Hardware limiting performance:
Beyond 1 GHz and 500 MB of RAM, I suspect there is not a lot of
performance difference with more CPU speed or more RAM. The gain in
learning rate from a 500 MHz to a 1 GHz machine was only small.
Upgrading the 1 GHz machine from 500 MB RAM to 1 GB RAM did not make
any significant difference to performance.
The limitation on iListen performance would appear to be the Philips
speech engine rather than computer performance from there on.
Changing OS from OS 10.2.x to OS 10.3.9 did not seem to make a
significant difference. However, my laptop DVD drive, which was not
working, has been working beautifully and without a hitch since I
upgraded to 10.4.11. Who would have guessed?
I can only hope this relaying of my experiences is of help to some
iListen users out there.
Cheers
Cameron Downunder.