<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Word Counts Example in Ruby and Scala</title>
	<atom:link href="http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/</link>
	<description>Technical and process thinking from Source Allies employees</description>
	<lastBuildDate>Wed, 28 Jul 2010 14:59:50 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Brianary</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-1129</link>
		<dc:creator>Brianary</dc:creator>
		<pubDate>Sun, 16 May 2010 18:51:06 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-1129</guid>
		<description>And in Perl (functional style):

perl -lne &#039;map{$c{$_}++}(/(\w+)/g);END{map{print}(sort keys%c)}&#039; **/*.txt</description>
		<content:encoded><![CDATA[<p>And in Perl (functional style):</p>
<p>perl -lne 'map{$c{$_}++}(/(\w+)/g);END{map{print}(sort keys%c)}' **/*.txt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Virasak</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-330</link>
		<dc:creator>Virasak</dc:creator>
		<pubDate>Tue, 12 Jan 2010 10:22:10 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-330</guid>
		<description>Hi, I am a Scala beginner, and I tried to clean up your Scala code to make it more readable. Here is the result:
http://gist.github.com/265899#file_word_freq.scala</description>
		<content:encoded><![CDATA[<p>Hi, I am a Scala beginner, and I tried to clean up your Scala code to make it more readable. Here is the result:<br />
<a href="http://gist.github.com/265899#file_word_freq.scala" rel="nofollow">http://gist.github.com/265899#file_word_freq.scala</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: C# vs. Clojure vs. Ruby &#38; Scala &#171; mookid on code</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-318</link>
		<dc:creator>C# vs. Clojure vs. Ruby &#38; Scala &#171; mookid on code</dc:creator>
		<pubDate>Wed, 06 Jan 2010 20:02:05 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-318</guid>
		<description>[...] from a bunch of files into two files, sorted alphabetically and by word count respectively, which he did in Ruby and Scala. This led Lau Bjørn Jensen to do the same thing in Clojure, which apparently sparked other people [...]</description>
		<content:encoded><![CDATA[<p>[...] from a bunch of files into two files, sorted alphabetically and by word count respectively, which he did in Ruby and Scala. This led Lau Bjørn Jensen to do the same thing in Clojure, which apparently sparked other people [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zach  Cox</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-280</link>
		<dc:creator>Zach  Cox</dc:creator>
		<pubDate>Mon, 04 Jan 2010 16:16:50 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-280</guid>
		<description>Thank you to everyone for your comments and follow-up posts!  When writing this post I hoped it would spark creative discussion and am glad that&#039;s what happened.

Some additional background on the original problem: the task was stated basically as &quot;in your job as a developer, you&#039;re given a one-off task one day where you need to produce a report of the word frequencies in these newsgroup files, write some code in whatever language you want.&quot;  So my intentions with this post were never to say &quot;language X is better than language Y&quot; and my intention for including timing data was never to prove &quot;language X is faster than language Y&quot;; I just wanted to post my particular solutions to the problem and hopefully encourage others to provide constructive criticism on my solutions and provide solutions of their own in other languages.  Studying other developers&#039; code is a great way to learn, and that&#039;s why I made this post.

I learned a lot from viewing the code submitted in comments to this post as well as to the other follow-up posts, and revised my original code:
Ruby: http://gist.github.com/268579
Scala: http://gist.github.com/268580

I decided to remove the timing data because I don&#039;t want this to become a performance test.  I also removed the &quot;status update&quot; lines, since we now know these scripts execute fast enough to make that code unnecessary.

The Ruby version now uses Dir[], as it should have from the start, instead of the ridiculous directory-traversal method; I definitely rushed this part and should have looked for something like Dir[] in Ruby core.

In the Scala version I condensed the use of the files method into one line and used an underscore to represent the file passed to the first block, but I couldn&#039;t get an underscore to work for the word in the second block; I&#039;m not sure why.  

Regarding the Scala LoC count: only the first 15 lines are problem-specific; the remainder are 3 methods that probably should be in the core Scala library and probably already exist in various 3rd-party libs.  I think those 3 methods show a great strength of Scala, though, namely the ability to extend the language so that the main problem-specific code is more readable:
 - files method just lets you say &quot;do this to each file under this directory&quot;: files(dir) { file =&gt; //use file }
 - file2String lets you use a File as a String: file.split(regex) &lt;== split is a method of class String, not File
 - using is a new control structure that safely closes any object with a close() method: using(dbConnection) {...}, using(reader) {...}, using(writer) {...}, etc.

Again, thanks to everyone for the comments &amp; follow-ups, and let&#039;s continue to learn from one another!</description>
		<content:encoded><![CDATA[<p>Thank you to everyone for your comments and follow-up posts!  When writing this post I hoped it would spark creative discussion and am glad that&#8217;s what happened.</p>
<p>Some additional background on the original problem: the task was stated basically as &#8220;in your job as a developer, you&#8217;re given a one-off task one day where you need to produce a report of the word frequencies in these newsgroup files, write some code in whatever language you want.&#8221;  So my intentions with this post were never to say &#8220;language X is better than language Y&#8221; and my intention for including timing data was never to prove &#8220;language X is faster than language Y&#8221;; I just wanted to post my particular solutions to the problem and hopefully encourage others to provide constructive criticism on my solutions and provide solutions of their own in other languages.  Studying other developers&#8217; code is a great way to learn, and that&#8217;s why I made this post.</p>
<p>I learned a lot from viewing the code submitted in comments to this post as well as to the other follow-up posts, and revised my original code:<br />
Ruby: <a href="http://gist.github.com/268579" rel="nofollow">http://gist.github.com/268579</a><br />
Scala: <a href="http://gist.github.com/268580" rel="nofollow">http://gist.github.com/268580</a></p>
<p>I decided to remove the timing data because I don&#8217;t want this to become a performance test.  I also removed the &#8220;status update&#8221; lines, since we now know these scripts execute fast enough to make that code unnecessary.</p>
<p>The Ruby version now uses Dir[], as it should have from the start, instead of the ridiculous directory-traversal method; I definitely rushed this part and should have looked for something like Dir[] in Ruby core.</p>
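<p>For illustration only (this is not the code in the gist above), a Dir[]-based word count might look roughly like the sketch below. The glob pattern, the \w+ word regex, and the method names word_counts, by_word, and by_count are assumptions made for the example.</p>

```ruby
# Hypothetical sketch: count word frequencies across all files matched by a glob.
# The \w+ regex and lowercasing are assumed definitions of a "word".
def word_counts(pattern)
  counts = Hash.new(0)                 # missing keys start at 0
  Dir[pattern].each do |path|          # Dir[] expands the glob, including **
    next unless File.file?(path)       # skip directories matched by the glob
    # scrub replaces invalid byte sequences so scan does not raise on odd input
    File.read(path).scrub.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  end
  counts
end

# Word/count pairs sorted alphabetically by word.
def by_word(counts)
  counts.sort_by { |word, _count| word }
end

# Word/count pairs sorted by descending count, ties broken alphabetically.
def by_count(counts)
  counts.sort_by { |word, count| [-count, word] }
end

# Example call; "newsgroups" is an assumed directory name.
counts = word_counts("newsgroups/**/*")
by_count(counts).first(10).each { |word, count| puts "#{word}\t#{count}" }
```

<p>Dir[] expands the ** wildcard recursively, so no hand-written directory traversal is needed.</p>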
<p>In the Scala version I condensed the use of the files method into one line and used an underscore to represent the file passed to the first block, but I couldn&#8217;t get an underscore to work for the word in the second block; I&#8217;m not sure why.  </p>
<p>Regarding the Scala LoC count: only the first 15 lines are problem-specific; the remainder are 3 methods that probably should be in the core Scala library and probably already exist in various 3rd-party libs.  I think those 3 methods show a great strength of Scala, though, namely the ability to extend the language so that the main problem-specific code is more readable:<br />
 &#8211; files method just lets you say &#8220;do this to each file under this directory&#8221;: files(dir) { file =&gt; //use file }<br />
 &#8211; file2String lets you use a File as a String: file.split(regex) &lt;== split is a method of class String, not File<br />
 &#8211; using is a new control structure that safely closes any object with a close() method: using(dbConnection) {&#8230;}, using(reader) {&#8230;}, using(writer) {&#8230;}, etc.</p>
<p>Again, thanks to everyone for the comments &amp; follow-ups, and let&#039;s continue to learn from one another!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-258</link>
		<dc:creator>John</dc:creator>
		<pubDate>Wed, 30 Dec 2009 19:57:17 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-258</guid>
		<description>I implemented both a Python version and a Factor version, for comparison.

http://re-factor.blogspot.com/2009/12/counting-word-frequencies.html</description>
		<content:encoded><![CDATA[<p>I implemented both a Python version and a Factor version, for comparison.</p>
<p><a href="http://re-factor.blogspot.com/2009/12/counting-word-frequencies.html" rel="nofollow">http://re-factor.blogspot.com/2009/12/counting-word-frequencies.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: karakfa</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-256</link>
		<dc:creator>karakfa</dc:creator>
		<pubDate>Wed, 30 Dec 2009 15:47:27 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-256</guid>
		<description>My code got eaten in the previous comment.  I posted on my blog.  Here it is:  http://karakfa.blogspot.com/2009/12/counting-word-frequencies.html</description>
		<content:encoded><![CDATA[<p>My code got eaten in the previous comment.  I posted on my blog.  Here it is:  <a href="http://karakfa.blogspot.com/2009/12/counting-word-frequencies.html" rel="nofollow">http://karakfa.blogspot.com/2009/12/counting-word-frequencies.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: karakfa</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-255</link>
		<dc:creator>karakfa</dc:creator>
		<pubDate>Wed, 30 Dec 2009 15:34:01 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-255</guid>
		<description>This is a favorite question of mine as well, but I expect the developer to be pragmatic and provide me a scripting solution that can be written and run in less than 5 minutes.

// cat individual files into all.txt

and do the rest in an awk one-liner:

awk &#039;{for(i=1;i freqsorted.txt</description>
		<content:encoded><![CDATA[<p>This is a favorite question of mine as well, but I expect the developer to be pragmatic and provide me a scripting solution that can be written and run in less than 5 minutes.</p>
<p>// cat individual files into all.txt</p>
<p>and do the rest in an awk one-liner:</p>
<p>awk '{for(i=1;i freqsorted.txt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hossam Karim</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-229</link>
		<dc:creator>Hossam Karim</dc:creator>
		<pubDate>Mon, 28 Dec 2009 19:05:47 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-229</guid>
		<description>I believe this is a typical MapReduce problem.</description>
		<content:encoded><![CDATA[<p>I believe this is a typical MapReduce problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike McNally</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-228</link>
		<dc:creator>Mike McNally</dc:creator>
		<pubDate>Mon, 28 Dec 2009 18:27:39 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-228</guid>
		<description>If I got to use the &quot;language&quot; of my choice, I think I&#039;d type in a quick shell pipeline to find the regular files, use xargs to run sed and strip out the words, tr to normalize alpha case, awk to count the occurrences, and sort to order the results.</description>
		<content:encoded><![CDATA[<p>If I got to use the &#8220;language&#8221; of my choice, I think I&#8217;d type in a quick shell pipeline to find the regular files, use xargs to run sed and strip out the words, tr to normalize alpha case, awk to count the occurrences, and sort to order the results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Harrop</title>
		<link>http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-215</link>
		<dc:creator>Jon Harrop</dc:creator>
		<pubDate>Tue, 22 Dec 2009 03:34:20 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=509#comment-215</guid>
		<description>I have written an F# solution that is both short and efficient:

http://fsharpnews.blogspot.com/2009/12/zach-cox-word-count-challenge.html</description>
		<content:encoded><![CDATA[<p>I have written an F# solution that is both short and efficient:</p>
<p><a href="http://fsharpnews.blogspot.com/2009/12/zach-cox-word-count-challenge.html" rel="nofollow">http://fsharpnews.blogspot.com/2009/12/zach-cox-word-count-challenge.html</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
