Re: Splitting a text file into sentences
- From: Ryan Leavengood <leavengood@xxxxxxxxx>
- Date: Wed, 30 Nov 2005 13:56:43 +0900
On 11/29/05, basi <basi_lio@xxxxxxxxxxx> wrote:
> Yes, I learned this convention when I took a keyboarding (i.e., typing)
> lesson in high school. Some time ago, a style manual for word processing
> appeared, and one piece of its advice was to use only one space to separate
> sentences. The reason given is that in justified text, those two
> spaces can stretch to four spaces, or even more. Anyway, a lot of text now
> has either one or two spaces between sentences, so spacing wouldn't be a
> reliable indicator of a sentence boundary.
I too learned the two-spaces-after-a-period convention years ago, and
more recently learned that with modern fonts and word processors it is
not necessary. It was tricky to retrain myself, but I did, and I have
been using just one space ever since.
So, as you say, spacing isn't a reliable way to discern sentences.
I would recommend following the advice to first filter out false
positives (possibly even replacing them with temporary markers, so Mr.
becomes $MISTER$ or similar), then split on punctuation and swap the
markers back afterwards. If you then test on various sample texts, you
should be able to find more false positives that you missed.
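Here is a rough sketch of what I mean, not a finished solution. The abbreviation list, the marker names other than $MISTER$, and the method name split_sentences are all made up for illustration, and I am assuming a Ruby with regexp lookbehind support (1.9 or later) and that a sentence ends with ., ! or ? followed by whitespace:

```ruby
# Hypothetical table of known false positives and their temporary markers.
# Extend it as testing on sample texts turns up more cases.
ABBREVIATIONS = {
  "Mr."  => "$MISTER$",
  "Mrs." => "$MISSUS$",
  "Dr."  => "$DOCTOR$"
}.freeze

def split_sentences(text)
  # Step 1: mask the false positives so their periods cannot trigger a split.
  masked = text.dup
  ABBREVIATIONS.each do |abbr, marker|
    masked.gsub!(/\b#{Regexp.escape(abbr)}/) { marker }
  end

  # Step 2: split on whitespace that follows sentence-ending punctuation.
  # The lookbehind keeps the punctuation attached to its sentence, and \s+
  # accepts one space, two spaces, or a newline.
  sentences = masked.split(/(?<=[.!?])\s+/)

  # Step 3: put the original abbreviations back.
  sentences.map do |sentence|
    ABBREVIATIONS.each do |abbr, marker|
      sentence = sentence.gsub(marker) { abbr }
    end
    sentence.strip
  end.reject(&:empty?)
end

puts split_sentences("Mr. Smith went home. He slept.  Dr. Jones called! Really?")
```

Because the split is on \s+, it does not matter whether the writer used one space or two after a period. Anything not in the table (initials, "e.g.", decimal numbers at the end of a line, and so on) will still split in the wrong place, which is where testing on sample texts comes in.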
Ryan