Re: Average length of words
- From: "George Johnson" <matrix29@xxxxxxxxxxx>
- Date: Sat, 29 Aug 2009 01:24:04 -0400
"glen herrmannsfeldt" <gah@xxxxxxxxxxxxxxxx> wrote in message
news:h75j0l$s5r$2@xxxxxxxxxxxxxxxxxxx
Mok-Kong Shen <mok-kong.shen@xxxxxxxxxxx> wrote:
(snip)
< In such texts each word is mostly followed by a space. So, assuming
< the above value, the effective average space occupied by a word
< on the medium is 6 bytes with ASCII coding. By how much could this
< figure generally be reduced by a good text compression scheme?
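The arithmetic behind the 6-byte figure can be checked on real text with a minimal sketch. It assumes one byte per character (as in ASCII), treats any run of non-whitespace as a word (so attached punctuation counts as letters), and charges one separator byte per word. The function name `bytes_per_word` is hypothetical. With an average of about 5 letters per word, this gives roughly 6 bytes per word.

```python
def bytes_per_word(text: str) -> float:
    """Average bytes each word occupies, counting one separator per word.

    Assumes a one-byte-per-character encoding (e.g. ASCII) and treats any
    run of non-whitespace as a word, so attached punctuation is counted too.
    """
    words = text.split()
    if not words:
        return 0.0
    letters = sum(len(w) for w in words)
    # +1 byte per word for the following space (or end-of-line byte)
    return (letters + len(words)) / len(words)


print(bytes_per_word("the cat sat"))  # 4.0
```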
The storage format used by the WYLBUR text editing system
on some IBM systems compresses out blanks. The compressed
line length is known. The line contents consist of descriptor bytes,
each followed by literal text: four bits give the number of blanks
before the next non-blank character, and four bits give the number
of non-blank characters until the next descriptor byte or the end
of the line. Compared to the IBM standard format, 80-byte fixed-length
records padded with blanks, it is very good. This was used
starting in the late 1960s, when CPU time was somewhat more
important than today.
It also has the advantage that in many cases string searches
can be done on the compressed data.
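The descriptor-byte scheme described above can be sketched roughly as follows. This is not WYLBUR's actual code. The nibble order (high nibble = blanks, low nibble = literal count), the use of ASCII instead of EBCDIC, the handling of runs longer than 15 (split across extra descriptors), and the names `encode_line`/`decode_line` are all assumptions made for illustration.

```python
FIXED_RECORD = 80  # IBM fixed-length record size mentioned above
MAX_RUN = 15       # largest count a 4-bit field can hold


def encode_line(line: str) -> bytes:
    """Compress one line into descriptor bytes, each followed by literal text.

    Hypothetical layout: high nibble = blanks to insert,
    low nibble = number of non-blank bytes that follow the descriptor.
    """
    data = line.rstrip(" ").encode("ascii")  # trailing pad blanks are dropped
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # count up to 15 blanks before the next non-blank
        blanks = 0
        while i < n and data[i] == 0x20 and blanks < MAX_RUN:
            blanks += 1
            i += 1
        # count up to 15 non-blank characters (0 if the blank run was capped)
        start = i
        while i < n and data[i] != 0x20 and i - start < MAX_RUN:
            i += 1
        out.append((blanks << 4) | (i - start))
        out += data[start:i]
    return bytes(out)


def decode_line(data: bytes, width: int = FIXED_RECORD) -> str:
    """Expand descriptor bytes back into a blank-padded fixed-width line."""
    out = bytearray()
    i = 0
    while i < len(data):
        blanks, count = data[i] >> 4, data[i] & 0x0F
        out += b" " * blanks
        out += data[i + 1 : i + 1 + count]
        i += 1 + count
    return out.decode("ascii").ljust(width)


if __name__ == "__main__":
    card = "    HELLO WORLD"
    packed = encode_line(card)
    print(len(packed), "bytes instead of", FIXED_RECORD)  # 12 bytes instead of 80
    assert decode_line(packed) == card.ljust(FIXED_RECORD)
```

Because non-blank runs are stored verbatim, a search for a string containing no blanks can often scan the literal bytes directly. A match that spans a 15-character split, or one that includes blanks, would need extra handling in this sketch.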
-- glen
http://www.mang.canterbury.ac.nz/writing_guide/writing/flesch.shtml
========
http://able2know.org/topic/114565-1
Average English word length is 5.10 letters. For comparison, Korean
averages 3.05 letters and the German average is 6.26 letters.
Average English sentence length is 14.3 words. Sentence length, like word
length, depends largely on the subject and audience.
======
http://answers.yahoo.com/question/index?qid=20080526032554AAB28AF
What is the average length of a word in the English language?
Five letters is a good rule of thumb (and the old standard for counting how
many words one has written toward an assignment's required total).
A more precise figure, given at the following link, is 5.1:
http://blogamundo.net/lab/wordlengths/
One qualification: this type of calculation is typically based on a chunk
of written text, and counts every occurrence of words that are used MULTIPLE
times within the text. Since the most repeated words tend to be short, especially
common ones like articles (a, the), pronouns (I, me, he, she, it, ...), and
conjunctions (and, but, or), these repetitions pull the average DOWN.
If you took exactly the same text and listed the different words it uses
without repetition (i.e., "a" is counted only once, no matter how many times
the text uses it), the average would be significantly higher.
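The difference between averaging over every occurrence and averaging over distinct words can be illustrated with a small sketch. The tokenizer (lowercased runs of letters and apostrophes) and the name `average_word_lengths` are assumptions for illustration; real studies may define "word" differently.

```python
import re


def average_word_lengths(text: str) -> tuple[float, float]:
    """Return (running-text average, distinct-word average) of word length.

    Hypothetical tokenizer: lowercase runs of ASCII letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    # every occurrence counts, so frequent short words weigh heavily
    running = sum(len(w) for w in words) / len(words)
    # each distinct word counts once, no matter how often it appears
    distinct = set(words)
    vocabulary = sum(len(w) for w in distinct) / len(distinct)
    return running, vocabulary


running, vocabulary = average_word_lengths("I saw a cat and a dog and a horse")
print(running, vocabulary)
```

For this sentence the running-text average is 2.4 letters, while the distinct-word average is about 2.71, because the repeated "a" and "and" are counted only once in the second figure.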
Also, note that this is "normal" language prose. Something written in
technical language (e.g., a scientific paper) would include many longer
words, and consequently have a higher average word length.
==========
http://blogamundo.net/lab/wordlengths/
Languages by Average word length
This table shows a listing of languages by average word length, as
calculated from the texts at the UDHR in Unicode.
Caveats:
1. My definition of "word" consists of splitting on spaces. (Hence the
screwed-up counts for Amharic, Thai, etc., which don't use spaces.)
2. I believe there are some incomplete texts in the UDHR collection I
used, but I'm not sure.
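Caveat 1 can be reproduced with a minimal sketch of "splitting on space". Removing the spaces from an English phrase stands in for a script written without spaces between words; the function name is hypothetical. Note also that `len` counts Unicode code points, which for scripts with combining marks is not the same as counting letters.

```python
def avg_len_split_on_space(text: str) -> float:
    """Average word length where a 'word' is any run of non-whitespace."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


spaced = "all human beings are born free"
unspaced = spaced.replace(" ", "")  # stand-in for text written without spaces
print(avg_len_split_on_space(spaced))    # about 4.17
print(avg_len_split_on_space(unspaced))  # 25.0: the whole phrase is one "word"
```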
Rank -- Length -- Language
#122 -- 5.10 -- English
.