Re: Splitting a text file into sentences
- From: Damphyr <damphyr@xxxxxxxxxxx>
- Date: Wed, 30 Nov 2005 17:43:37 +0900
Ryan Leavengood wrote:
> On 11/29/05, basi <basi_lio@xxxxxxxxxxx> wrote:
> > Yes, I learned this convention when I took a keyboarding (i.e., typing) lesson in high school. Some time ago, a style manual for word processing appeared, and one piece of its advice was to use only one space to separate sentences. The reason given was that in a justified format, those two spaces can become four spaces, or even more. Anyway, a lot of text now has either one or two spaces between sentences, so this wouldn't be a reliable indicator of a sentence boundary.
> I too learned the two-spaces-after-a-period convention years ago, and also recently learned that with modern fonts and word processors it is not necessary. It was tricky to retrain myself, but I did, and have been using just one space ever since.
> So, like you say, that isn't a reliable way to discern sentences.
> I would recommend following the advice of first filtering out false positives (possibly even replacing them with temporary markers, so that Mr. becomes $MISTER$ or similar), then splitting on punctuation. If you then test on various sample texts, you should be able to find more false positives that you might have missed.
Which will not help you at all with foreign languages. And don't forget to put i.e., e.g. and etc. in the list.
This is an ongoing problem (think about the auto-correction 'feature' of capitalizing the first letter of every sentence in OpenOffice or Word, something I always turn off because it is so insistent when it's wrong).
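For what it's worth, the approach Ryan describes (swap known abbreviations for temporary markers, split on punctuation, then swap them back) could be sketched roughly as below. This is only an illustration, not code anyone in the thread posted: the ABBREVIATIONS list, the $ABBRn$ marker format and the method names are all made up for the example, and the list is English-only, so the foreign-language objection still stands.

```ruby
# Hypothetical abbreviation list; extend it as testing turns up more
# false positives. It is English-only by construction.
ABBREVIATIONS = ["Mr.", "Mrs.", "Dr.", "i.e.", "e.g.", "etc."].freeze

# Replace each known abbreviation with a dot-free marker such as "$ABBR0$".
def mask_abbreviations(text)
  ABBREVIATIONS.each_with_index.reduce(text) do |masked, (abbr, i)|
    masked.gsub(/\b#{Regexp.escape(abbr)}/, "$ABBR#{i}$")
  end
end

# Put the original abbreviations back in place of their markers.
def unmask_abbreviations(text)
  ABBREVIATIONS.each_with_index.reduce(text) do |restored, (abbr, i)|
    restored.gsub("$ABBR#{i}$", abbr)
  end
end

# Split on whitespace that follows ".", "!" or "?", so one space or two
# between sentences makes no difference.
def split_sentences(text)
  mask_abbreviations(text)
    .split(/(?<=[.!?])\s+/)
    .map { |sentence| unmask_abbreviations(sentence.strip) }
    .reject(&:empty?)
end

sample = "Mr. Smith arrived late.  He brought fruit, e.g. apples and pears. Was Dr. Jones there? Yes!"
puts split_sentences(sample)
```

One known weakness of this sketch: an abbreviation that really does end a sentence (a sentence finishing with "etc.", say) gets masked too, so that boundary is missed. That is exactly the kind of case that testing on sample texts should turn up.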
Cheers,
V.-
--
http://www.braveworld.net/riva