<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>thoughts from the test eye &#187; WordFreq</title>
	<atom:link href="http://thetesteye.com/blog/tag/wordfreq/feed/" rel="self" type="application/rss+xml" />
	<link>http://thetesteye.com/blog</link>
	<description>by rikard edgren, henrik emilsson and martin jansson - with torbjörn ryber and henrik andersson</description>
	<lastBuildDate>Sun, 13 May 2012 17:27:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>New tool &#8211; WordFreq</title>
		<link>http://thetesteye.com/blog/2009/12/new-tool-wordfreq/</link>
		<comments>http://thetesteye.com/blog/2009/12/new-tool-wordfreq/#comments</comments>
		<pubDate>Sat, 19 Dec 2009 20:37:57 +0000</pubDate>
		<dc:creator>Martin Jansson</dc:creator>
				<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Machines]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[WordFreq]]></category>

		<guid isPermaLink="false">http://thetesteye.com/blog/?p=702</guid>
		<description><![CDATA[<img src="http://thetesteye.com/blog/wp-content/uploads/documentation.png" width="48" height="48" alt="" title="Documentation" /><img src="http://thetesteye.com/blog/wp-content/uploads/machines.png" width="48" height="48" alt="" title="Machines" /><br/>A disclaimer&#8230; I am no developer, but I have developed a tool. While developing, I have the mindset of a developer, not a tester. I have made lots of mistakes, intentionally left out good and needed things, and considered what I could get away with in the first release. This tool might not seem big and [...]]]></description>
			<content:encoded><![CDATA[<img src="http://thetesteye.com/blog/wp-content/uploads/documentation.png" width="48" height="48" alt="" title="Documentation" /><img src="http://thetesteye.com/blog/wp-content/uploads/machines.png" width="48" height="48" alt="" title="Machines" /><br/><p>A disclaimer&#8230; I am no developer, but I have developed a tool. While developing, I have the mindset of a developer, not a tester. I have made lots of mistakes, intentionally left out good and needed things, and considered what I could get away with in the first release. This tool might not seem big and useful, but I have used it and it has produced many interesting results in the past. While developing it I tried a new way of working&#8230; I wrote down all my ideas about what functions the tool should have, what was supposed to work, and what was not supposed to work in a testideas document. One column recorded whether each item worked in a specific release. I added all good feedback to that list.</p>
<p>This is the first tool we have created at the test eye that is open to the public. At the test eye we have chosen to publish our material under the license <a href="http://creativecommons.org/about/licenses/" target="_blank">Attribution No Derivatives</a>. My personal aim with this tool was to increase my knowledge of coding. I used Python, with Tkinter for the graphical interface. In the <a href="http://thetesteye.com/blog/publications/" target="_self">Publications</a> section you will find the link to the tool and the currently released version.</p>
<p><strong>General discussion</strong></p>
<p>The general idea is to use the frequency of words as a way to find errors. The more text you analyze, the more clearly the correctly spelled words separate from the rare ones, which makes the erroneous words easier to spot. This kind of script is very often found as a code example, although I did not know that when I first wrote one. I ran it on a fairly large text corpus and found that the company name had been spelled incorrectly 7 times in the copyright text. I also found lots and lots of spelling mistakes, as well as some strange API function names that were incorrect.</p>
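<p>As a rough illustration of the idea (not the actual WordFreq source), a word-frequency check can be sketched in a few lines of Python. The function names <code>word_frequencies</code> and <code>rare_words</code> and the sample text are made up for this example.</p>

```python
import re
from collections import Counter

# Runs of letters only (Unicode-aware); digits and underscores are skipped.
WORD = re.compile(r"[^\W\d_]+")


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    return Counter(WORD.findall(text.lower()))


def rare_words(text, max_count=1):
    """Return the words seen at most max_count times, sorted alphabetically."""
    counts = word_frequencies(text)
    return sorted(word for word, n in counts.items() if max_count >= n)


# Hypothetical sample: the misspelled company name shows up as a rare word.
sample = "Acme Corp. Copyright Acme Corp. Copyright Amce Corp."
print(rare_words(sample))
```

<p>On a real corpus the list of rare words will also contain plenty of correct but unusual words, so it is a starting point for investigation rather than a list of errors.</p>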
<p><strong>Use cases</strong></p>
<ul>
<li>Run on documentation to find infrequent words (which often contain spelling errors)</li>
<li>Run on code to find variable names that are similar but not identical and therefore may be used incorrectly</li>
<li>Run on code to find unused variables, that is, variables that appear only once</li>
<li>Run on code + API documentation to find things that should not be there or code that is not covered anywhere</li>
<li>Localization specific: When doing translations you might be allowed a certain number of errors. This is one way of finding a few extra faults that you can remove</li>
</ul>
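<p>The two code-related use cases above can be sketched roughly as follows. This is an assumption about how one might do it with Python's standard library, not how WordFreq itself works: identifiers are counted with a simple regular expression, names that occur once are listed, and <code>difflib.get_close_matches</code> pairs each of them with more frequent names that look similar. The helper names and the 0.85 similarity cutoff are chosen for illustration only.</p>

```python
import difflib
import re
from collections import Counter

# Identifier-like tokens: a letter or underscore followed by word characters.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*")


def identifier_counts(source):
    """Count every identifier-like token in a piece of source code."""
    return Counter(IDENTIFIER.findall(source))


def used_once(counts):
    """Names that occur exactly once (candidates for unused or misspelled)."""
    return sorted(name for name, n in counts.items() if n == 1)


def near_duplicates(counts, cutoff=0.85):
    """Pair each one-off name with more frequent names that look similar."""
    frequent = [name for name, n in counts.items() if n >= 2]
    pairs = []
    for name in used_once(counts):
        matches = difflib.get_close_matches(name, frequent, n=3, cutoff=cutoff)
        pairs.extend((name, match) for match in matches)
    return pairs


# Hypothetical snippet with a misspelled variable on the last line.
code = """
total_count = 0
for x in data:
    total_count += x
print(totl_count)
"""
counts = identifier_counts(code)
print(used_once(counts))
print(near_duplicates(counts))
```

<p>Language keywords such as <code>for</code> and <code>in</code> also end up in the one-off list, so some manual filtering is still needed.</p>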
<p><strong>How I use it</strong></p>
<p>I run the tool on a directory tree. I open the result file in Excel or OpenOffice Calc. I then sort on frequency&#8230; and start deleting uninteresting records. You can open the list in MS Word or something similar to filter out words that are in fact spelled correctly. After a few rounds of cleanup you might have a list that is worth investigating.</p>
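<p>A minimal sketch of that workflow, assuming a CSV result file with one word per row (the real tool's output format may differ): walk a directory tree, count the words in every file, and write the counts sorted by frequency so that the rarest words come first in the spreadsheet. The names <code>count_tree</code> and <code>write_report</code> are hypothetical.</p>

```python
import csv
import os
import re
from collections import Counter

# Runs of letters only (Unicode-aware); digits and underscores are skipped.
WORD = re.compile(r"[^\W\d_]+")


def count_tree(root):
    """Count words in every file below root, ignoring case."""
    counts = Counter()
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Undecodable bytes are skipped so binary files do not crash the run.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for line in handle:
                    counts.update(WORD.findall(line.lower()))
    return counts


def write_report(counts, csv_path):
    """Write word/frequency rows, rarest words first, ties in alphabetical order."""
    rows = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["word", "frequency"])
        writer.writerows(rows)


# Example use (paths are placeholders):
# write_report(count_tree("path/to/docs"), "wordfreq.csv")
```

<p>The resulting file opens directly in Excel or OpenOffice Calc, where it can be sorted and trimmed by hand.</p>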
<p><strong>Bugs and Enhancements</strong></p>
<p>The testideas.xls file contains the current tests and some of the enhancement suggestions that I&#8217;ve received so far. If you have any suggestions, feel free to mail me at <a href="mailto:martin.jansson@thetesteye.com">martin.jansson@thetesteye.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thetesteye.com/blog/2009/12/new-tool-wordfreq/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

