Re: How to make Forth interesting?



Jonah Thomas <jethomas5@xxxxxxxxx> writes:
William James wrote:

Count the occurrences of distinct sequences of letters ("words") in
a text file, sorting the results primarily by the counts and
secondarily by the "words".

Ruby:

h = Hash.new(0)
IO.read("Bible--kjv10.txt").scan(/[a-z]+/i){|w| h[w] += 1}
puts h.map{|k,v| [v,k]}.sort.map{|a| a.join " "}

Shell utilities:

awk 'BEGIN {RS="[^a-zA-Z]+"; ORS="\n"} {print}' vmgen.texi|sort|uniq -c|sort -n

Standard Forth doesn't give you all the tools to do that. My natural
thought here is to set up some new wordlists whose hash function is the
first four characters. So they'll be mostly sorted.
....

That sounds quite complicated. Forth certainly does give you the tool
to do a table indexed with the words: wordlists. If you have long
words, just use a Forth system that supports long names in the
wordlist (e.g., Gforth); if you need case sensitivity (or
insensitivity), use a system that supports that (Gforth supports
both).

I guess that there are also systems that include a sort, but
strangely, I have not needed a sort outside of shell programming in
the last 20 years, so I have not yet ported the one I wrote so many
years ago to standard Forth.

BTW, as shown above with the shell script, you don't need the lookup
table; sorting with counted uniqueness is good enough. If you do a
heap sort or a tree sort, the counted uniqueness is a trivial variant.
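
To illustrate the point about counted uniqueness, here is a minimal sketch
in Ruby (the language of the example quoted earlier) that uses no hash at
all: it sorts the words, then counts runs of equal neighbours, much as
sort|uniq -c does. The method name count_sorted_runs and the sample
sentence are made up for this illustration; it is not a port of anyone's
actual sort.

```ruby
# Count words by sorting first and then measuring runs of equal neighbours.
# No lookup table is involved. count_sorted_runs is a hypothetical name.
def count_sorted_runs(text)
  # Same notion of "word" as in the quoted Ruby example: runs of letters.
  words = text.scan(/[a-z]+/i).sort
  # After sorting, equal words are adjacent, so each run is one distinct word.
  runs = words.chunk_while { |a, b| a == b }.map { |run| [run.size, run.first] }
  # Order primarily by count, secondarily by word, as the task asks.
  runs.sort
end

if __FILE__ == $0
  count_sorted_runs("the cat and the hat and the bat").each do |count, word|
    puts "#{count} #{word}"
  end
end
```

The cost is two sorts instead of one hash pass, but the counting step
itself is just a linear walk over the sorted words.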

http://www.cs.utah.edu/dept/old/texinfo/gawk/gawk_19.html
They describe how to do this in Unix using tr, awk, and sort.

Ok, my script can be made a little shorter with tr:

tr -cs 'a-zA-Z' '\n' <vmgen.texi|sort|uniq -c|sort -n

It's funny that the gawk book used tr where I used awk.

And obviously the Ruby code above is just a huge pile of bloat; any
Forth version would be even more bloated.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2009: http://www.euroforth.org/ef09/