Re: Speech to text



agalkin@xxxxxxxxxxx writes:

Thank you for your response. What I want is actually quite simple: I
want to take audio and create a table of timestamps marking where each
word starts. For instance, the first word starts at 250 ms, the second
at 700 ms, the third at 1.5 s, and so on. I have access to both the
audio and the corresponding text.

You could use the software we use for labeling prompts in building
speech synthesis voices.

http://www.festvox.org/bsv/bsv-label-ch.html

It can do the alignment with CMU Sphinx, with DTW against synthetic
speech, or with EHMM in the latest version of festvox.
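To make the "DTW with synthetic speech" option concrete: synthesize the
known text (so the word boundaries in the synthetic version are known),
warp it onto the natural recording, and carry each boundary across the
warping path. The sketch below is plain textbook DTW, not festvox's own
code. It assumes both utterances have already been turned into
per-frame feature vectors (e.g. MFCCs) at a common 10 ms frame shift;
the function names and the toy data are hypothetical.

```python
import math


def dtw_path(ref, tgt):
    """Textbook DTW: return the lowest-cost warping path between two
    sequences of feature vectors as (ref_frame, tgt_frame) pairs."""
    n, m = len(ref), len(tgt)
    # cost[i][j] = cheapest cumulative cost aligning ref[:i] with tgt[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def map_word_starts(path, synth_word_starts):
    """Map word-start frames of the synthetic utterance to the first
    natural-speech frame each one is aligned with."""
    first_match = {}
    for r, t in path:
        first_match.setdefault(r, t)
    return [first_match[f] for f in synth_word_starts]


if __name__ == "__main__":
    FRAME_SHIFT_MS = 10  # assumed analysis frame shift
    synth = [[0], [0], [5], [5], [9]]                   # toy 1-D "features"
    natural = [[0], [0], [0], [5], [5], [5], [9], [9]]  # slower speaker
    starts = map_word_starts(dtw_path(synth, natural), [0, 2, 4])
    print([f * FRAME_SHIFT_MS for f in starts])  # -> [0, 30, 60]
```

Multiplying the mapped frame indices by the frame shift gives exactly
the kind of word-start table asked for above. Real systems compare
spectral features rather than toy scalars, and the quadratic cost
matrix is one reason long recordings need to be split first.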

How well this works will depend on the quality of the recordings and,
of course, their length: you'd need a more elaborate segmentation if
your sentences are longer than about 30 s.
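One simple heuristic for that kind of pre-segmentation (an assumption
for illustration, not what festvox itself does) is to cut the audio at
its quietest points so that no chunk exceeds the length limit. The
sketch assumes per-frame energies at a fixed frame shift; with 10 ms
frames, 30 s is 3000 frames. The function names are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_at_pauses(energies, max_frames, min_frames):
    """Greedily cut a long utterance into chunks of at most max_frames,
    placing each cut at the quietest frame between min_frames and
    max_frames after the previous cut. Returns (start, end) frame pairs."""
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")
    cuts, start, n = [], 0, len(energies)
    while n - start > max_frames:
        window = range(start + min_frames, start + max_frames + 1)
        cut = min(window, key=lambda k: energies[k])  # quietest candidate
        cuts.append(cut)
        start = cut
    bounds = [0] + cuts + [n]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `split_at_pauses(energies, max_frames=3000, min_frames=500)`
keeps every chunk under 30 s at a 10 ms frame shift. Word times found
inside a chunk must be offset by the chunk's start. This only splits
the audio; deciding which part of the transcript belongs to each chunk
is the harder half of the problem and is not handled here.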

Alan

Alan W Black email: awb@xxxxxxxxxx
Language Technologies Institute http://www.cs.cmu.edu/~awb/
Carnegie Mellon University tel: +1-412-268-6299
5000 Forbes Ave, Pittsburgh PA, 15213, USA. fax: +1-412-268-6298




tony.nospam@xxxxxxxxxxxxxxxxxxxxxxx wrote:
agalkin@xxxxxxxxxxx writes:

Is there any software, preferably shareware, that can find the time
offsets of words in speech audio?

The quick answer is: Yes!

The slower answer has some questions for you:

Do you know what was spoken?

How long is your utterance?

How good are your acoustic conditions?

Do you want off-the-shelf or are you prepared to code?


If you know what was spoken, then various companies (including mine)
can supply you with a solution that is accurate to within perceptual
discrimination (some tens of milliseconds). Given good hacking
abilities, you can now pretty much do it yourself with toolkits such
as HTK (htk.cam.ac.uk). I'm pleased to have been responsible for the
first automated subtitles broadcast in the UK, which is a similar task,
although it involved a lot more than straight speech-to-text alignment.
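Once an HTK forced alignment (for instance with HVite) has produced a
label file, turning it into a word-start table is mostly a unit
conversion: HTK label times are in units of 100 ns, so 1 ms is 10,000
units. The sketch below assumes word-level output with one
"start end word [score]" entry per line; phone-level output puts the
word name in a later field and would need a different parser. The
silence model names `sil` and `sp` follow the HTKBook tutorial
convention and are an assumption about your setup, as are the
function name and the example labels.

```python
HTK_UNITS_PER_MS = 10_000  # HTK label times are in 100 ns units


def word_starts_from_htk_labels(text, skip=("sil", "sp")):
    """Turn word-level HTK label lines ("start end word [score]") into a
    list of (word, start_ms) pairs, dropping silence models."""
    table = []
    for line in text.splitlines():
        fields = line.split()
        # Skip MLF headers, quoted file names, "." terminators, etc.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        start, word = int(fields[0]), fields[2]
        if word not in skip:
            table.append((word, start / HTK_UNITS_PER_MS))
    return table


if __name__ == "__main__":
    lab = """0 2500000 sil
2500000 7000000 the
7000000 15000000 quick
15000000 21000000 fox
"""
    for word, ms in word_starts_from_htk_labels(lab):
        print(f"{word}\t{ms:.0f} ms")
```

With these hypothetical labels the words start at 250 ms, 700 ms and
1500 ms, matching the shape of the table in the original question.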

If you don't know what was spoken, then you can't get every word
right, but it's still doable. Prof. Steve Renals and I ran a conference
in Cambridge called "Accessing information in spoken audio"
(http://svr-www.eng.cam.ac.uk/~ajr/esca99.html) back in 1999, which
covered many of the basics.

So, if your audio is good, it's pretty much a question of how much you
want to spend on software. As you say you'd prefer freeware, I'd
definitely recommend HTK: it won't cost you anything in software
licenses to download it, train up your system and produce
speech-to-text alignments, although it will take a significant
investment of time.


Tony

(ob ad: CxO Cantab Research: see http://www.cantabResearch.com)


