Re: Self-Documenting Code Contest
- From: Reinout Heeck <reinout@xxxxxxxx>
- Date: Thu, 07 Aug 2008 12:19:07 +0200
Panu wrote:
So let me "comment" below what I find could perhaps be more self-documenting. [...] Feel free to incorporate any suggestions here if you modify your entry ...
Likewise feel free to enhance my entry and submit it ;-)
| twoWordAnagrams query wordListURL wordList selectedWords |
query := 'documenting' asSortedCollection.
" 'query' is not the most obvious name for this variable
I would think - at least if I haven't read the rules
of the contest. So perhaps something like
'stringToCreateAnagramsFromSorted' would be more
self-documenting.
"
This was one of the hardest variable names for me.
It 'had to' be at the beginning - declaring the inputs in a way - so it has to 'introduce' the intent of the algorithm.
I didn't want to go for the 'stringToCreateAnagramsFromSorted' style. This was a personal choice, I fully expect the majority of the other entries to use this form and wanted mine to stand out from the crowd :-)
Having said that I did use 'twoWordAnagrams' but that was as far as I wanted to go whith long identifiers. One sneaky thing I did was to make that the very first word of the program, not something most people will consciously notice but it might help at the subconscious level...
Mulling a bit over 'stringToCreateAnagramsFromSorted' - I guess I would use something like 'charactersToCreateAnagramsFrom' if I had chosen to allow such long identifiers.
In hindsight something like 'availableCharacters' or 'charctersToUse' might have been a stronger choice. The 'sorted' bit is already clear in the assignment so I don't think that needs to be repeated in the variable name.
"wordListURL := 'http://pragdave.pragprog.com/data/wordlist.txt'.
wordList :=
(wordListURL asURI get contents) tokensBasedOn: Character cr.
selectedWords :=
wordList select: [:string |
string size < query size
and: [string allSatisfy: [:character | query includes: character]]].
I'm not sure about the exact definition of 'anagram'. Shouldn't
the anagram have the exactly same length as the base word,
the above makes me ask.
Why? You haven't yet seen the word 'anagram' at this point (except in the temp declarations of course).
I deliberately used #< instead of #<=, to force people to place a mental marker at that point of the program because it seems fishy, I tried to force the reader to come back here later.
See http://en.wikipedia.org/wiki/Anagram . Ah but I see below
that you combine the 'selectedWords' into 'twoWordAnagrams'.
In any case I would say this is not quite self-documenting
since I'm getting confused here.
Heh :-)
I consciously structured the code so it had these 'little surprises' -- they force you to look back in the listing and verify your mental picture. I thought this was a great way of documenting a program -- it seems you prefer a linear story wheras I don't mind having a coherent whole that needs some non-linearities in reading.
My aim was to have full program comprehension by the time you read it all (that was my private definition of 'self documenting') - not to adhere to a linear explanation.
I think the point is that looking at the above statement,
and without knowing the algorithm beforehand, or reading
the rest of the code, it is hard to understand the PURPOSE
of the above statement. You can usually understand a comment
without having to read any other comments, so self-documenting
code should strive towards the same ideal. Obviously this is
not always possible. (Which I think is a good argument for
having good comments embedded in your code)
"
twoWordAnagrams :=
OrderedCollection new.
selectedWords do: [:word1 |
selectedWords do: [:word2 | | characters |
characters := word1 , word2.
" Here, unless I know Smalltalk, I may wonder how come
two words becomes a set (?) of Characters.
Is characters
the same as a word? Perhaps
combinationWord := word1, word2
would be more self-explanatory.
Funny, I was concerned with expressing that it was *used* as a set, not with *how* that use was achieved in the case of Smalltalk.
I had to introduce the semantics of #, in the program and thought this was sufficient hint that it somehow combines strings. Later on, where results are added to twoWordAnagrams there is further explanation in that it tries to make clear that #, means concatenation.
I had to balance between explaining how Smalltalk works (hence type info used as block argument names here and there, like 'string') and how the algorithm works. Other names I tried for this temp where 'string' and 'anagram', I found neither satisfactory for this particular audience.
""Hm. I had to think about this a bit, how and why it works.characters asSortedCollection = query
ifTrue: [twoWordAnagrams add: word1 , ' ' , word2]]].
Good! This was my intent here -- look at what 'query' holds, realize that is is comparable with a set of characters that has been sorted, and see the explicit example of #, being concatenation.
I was afraid that a lot of programmers from other languages might not automatically think of strings as collections so I wanted my program to force that notion on them. Between the assignment of 'query' and the use of it in an equality test I needed to put the notion of a collection which was my main driver for using 'characters'. Another was that I needed to disambiguate the #asSortedCollection send in the assignment to 'query'; some readers might construe that as creating a sorted collection of strings instead of a sorted collection of characters. Explicitly using the variable name 'characters' and the expression
characters asSortedCollection = query
disambiguates this - it documents the shape of 'query'.
I don't know how it could be better, but
I sure hope there would be some comments here.
But maybe that was the point of the competition
to start with. It's hard to code a non-obvious
algorithm in a self-documenting manner.
"
twoWordAnagrams do: [:anagram |
Transcript cr; show: anagram]
This lattter bit was added to emphasize that the routine has done its work -- if you haven't grokked it at this point it forces you to to look to the previous statements. Again an intentionally added piece of code - it could have simply read 'twoWordAnagrams' or 'twoWordAnagrams inspect' or something like that but I found that explicitly coding the printing loop gave that sensation of 'hey computation is clearly finished at this point - I'd better take a good look at the previous loop, that's where the work was done.
"I might call it anagramsFromTwoWords. I could add
lines saying:
anagramsFromMoreThan2WordsEtc := 'not implemented yet'.
anagramsFrom1word := 'not interesting enough'.
"
Fortunately the contest rules prohibit these lines.
I had also considered using cop-outs like
self fixme: 'this is not a comment :-)'.
but decided firmly against using such form.
One interesting thing I found is that the 'explaining variable' pattern does not work too well in the setting of a single unfactored script - it tends to introduce noise that obscures the overall structure of the program.
I ended up structuring the program as a clear sequence of assignments (which I emphasized by formatting) with as few variables as necessary.
The snippet
> twoWordAnagrams :=
> OrderedCollection new.
> selectedWords do: [:word1 |
> selectedWords do: [:word2 | | characters |
> characters := word1 , word2.
was a hard choice though - I had to balance between a single statement that would emphasize the assignment (using #inject:into: and a #do:) or two statements that would better show the nested iteration (a #do: loop inside a #do: loop).
Considering the audience I decided for the latter even though the former makes for much nicer top-level structure.
Thanks!
Reinout
-------
.
- References:
- Self-Documenting Code Contest
- From: Panu
- Re: Self-Documenting Code Contest
- From: Reinout Heeck
- Re: Self-Documenting Code Contest
- From: Panu
- Self-Documenting Code Contest
- Prev by Date: SmallTalk on Windows
- Next by Date: Re: Smalltalk and flash memory
- Previous by thread: Re: Self-Documenting Code Contest
- Next by thread: Looking for a Seaside/Squeak VPS
- Index(es):
Relevant Pages
|