Re: Splitting a text file into sentences



On 11/29/05, basi <basi_lio@xxxxxxxxxxx> wrote:
> Yes, I learned this convention when I took a keyboarding (i.e., typing)
> lesson in high school. Some time ago, a style manual for word processing
> appeared, and one piece of its advice was to use only one space to
> separate sentences. The reason given was that in a justified format,
> those two spaces can become four spaces, or even more. Anyway, a lot of
> text now has either one or two spaces between sentences, so spacing
> wouldn't be a reliable indicator of a sentence boundary.

I too learned the two-spaces-after-a-period convention years ago, and
more recently learned that it is not necessary with modern fonts and
word processors. It was tricky to retrain myself, but I did, and I have
been using just one space ever since.

So, as you say, spacing isn't a reliable way to discern sentences.

I would recommend following the advice of first filtering out false
positives (possibly even replacing them with temporary markers, so that
"Mr." becomes $MISTER$ or similar), then splitting on punctuation. If you
then test on various sample texts, you should be able to find any
false positives that you missed.
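
Here is a rough sketch of that approach in Ruby. The abbreviation list, the marker names, and the method name split_sentences are all made up for illustration, and the list is nowhere near complete, so treat this as a starting point rather than a finished splitter:

```ruby
# Hypothetical abbreviation table: each false positive maps to a
# temporary marker that contains no sentence-ending punctuation.
# Extend it as testing on sample texts turns up more cases.
ABBREVIATIONS = {
  "Mr."  => "$MISTER$",
  "Mrs." => "$MISSUS$",
  "Dr."  => "$DOCTOR$"
}.freeze

def split_sentences(text)
  masked = text.dup

  # 1. Filter out false positives by swapping in the markers.
  ABBREVIATIONS.each do |abbr, marker|
    masked.gsub!(/\b#{Regexp.escape(abbr)}/, marker)
  end

  # 2. Split after ., ! or ? when followed by whitespace.
  #    \s+ accepts one space, two spaces, or a newline.
  sentences = masked.split(/(?<=[.!?])\s+/)

  # 3. Put the original abbreviations back.
  sentences.map do |sentence|
    ABBREVIATIONS.each do |abbr, marker|
      sentence = sentence.gsub(marker, abbr)
    end
    sentence.strip
  end
end

puts split_sentences("Mr. Smith went home.  He slept! Did Dr. Jones call? Yes.")
```

Because the split uses \s+ rather than a fixed number of spaces, it does not matter whether the text has one or two spaces between sentences. It will still trip on things like initials, decimal numbers, and quoted sentences, which is where testing on sample texts comes in.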

Ryan

