Re: Speech to text



Thank you for your response. Actually, what I want is quite simple. I
want to take audio and create a table of time stamps marking where each
word starts. For instance, the first word starts at 250 ms, the second at
700 ms, the third at 1.5 s, etc. I have access to both the audio
and the corresponding text.


tony.nospam@xxxxxxxxxxxxxxxxxxxxxxx wrote:
agalkin@xxxxxxxxxxx writes:

Is there any software, preferably shareware, that can find the time
offsets of words in speech audio?

The quick answer is: Yes!

The slower answer has some questions for you:

Do you know what was spoken?

How long is your utterance?

How good are your acoustic conditions?

Do you want off-the-shelf or are you prepared to code?


If you know what was spoken then various companies (inc mine) can supply
you with a solution that is accurate to within perceptual discrimination
(some tens of milliseconds). Given good hacking abilities you can now
pretty much do it yourself with toolkits such as HTK
(htk.cam.ac.uk). I'm pleased to have been responsible for the first
automated subtitles broadcast in the UK - which is a similar task
although it involved a lot more than straight speech to text alignment.

If you don't know what was spoken then you can't get every word right,
but it's still doable. Prof. Steve Renals and I ran a conference
in Cambridge called "Accessing information in spoken audio"
(http://svr-www.eng.cam.ac.uk/~ajr/esca99.html) back in 1999 which
covered much of the basics.

So, if your audio is good it's pretty much a question of how much you
want to spend on software. As you say that you'd prefer freeware then
I'd definitely recommend HTK - it won't cost you anything in software
licenses to download, train up your system and produce speech to text
alignments - although it will take significant expenditure in time.
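
Once an alignment exists, turning it into the table of word start times
described above is a small scripting job. Here is a minimal sketch in
Python, assuming the aligner writes HTK-style label lines of the form
"<start> <end> <label>" with times in HTK's 100 ns units. The sample
lines, the function name word_start_table and the silence labels "sil"
and "sp" are made up for illustration; the exact layout of real output
depends on the options used when aligning.

```python
# Sketch: turn HTK-style label lines into a table of word start times (ms).
# Assumes each line is "<start> <end> <label> [score]" with times in
# HTK's 100 ns units; the sample lines below are invented for illustration.

HTK_UNITS_PER_MS = 10_000  # 1 ms = 10,000 units of 100 ns


def word_start_table(label_lines):
    """Return a list of (word, start_ms) tuples from HTK-style label lines."""
    table = []
    for line in label_lines:
        fields = line.split()
        # Skip blank lines, MLF headers and terminators that lack timing fields.
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        start, label = int(fields[0]), fields[2]
        if label in ("sil", "sp"):  # assumed names for silence / short-pause models
            continue
        table.append((label, start / HTK_UNITS_PER_MS))
    return table


# Hypothetical alignment output for a three-word utterance.
sample = [
    "#!MLF!#",
    '"*/utt1.lab"',
    "0 2500000 sil",
    "2500000 7000000 hello",
    "7000000 15000000 world",
    "15000000 21000000 again",
    ".",
]

for word, start_ms in word_start_table(sample):
    print(f"{word}\t{start_ms:.0f} ms")
```

With the invented sample this lists the three words starting at 250 ms,
700 ms and 1500 ms, i.e. the kind of table asked for.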


Tony

(ob ad: CxO Cantab Research: see http://www.cantabResearch.com)
