I've had enough spam

Posted by Doug Wed, 29 Jan 2003 22:13:00 GMT

I finally got tired enough of hitting the delete key that I set up a Bayesian-style filter to categorize my mail using statistical analysis. The point is that I don’t want to stop what I’m working on and switch over to my mail just to see something about a Nigerian needing assistance. So, I’m using bogofilter. After training it on the more than 3000 spam messages I happen to have lying around on my hard drive (that’s roughly the number of spam messages I’ve received from the evening of Dec 4 to the present) and on an equally large pile of legitimate mail, I’m happy to announce that not one spam has leaked through to my inbox. I’m also sad to say there have been several false positives. So, I’ll probably have to spend the next few days going through all of my junk mail saying, “Yes, that’s really spam,” and “No, that’s not really spam.”
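For the curious, here is roughly what "training" and "categorizing" mean in a filter like this. This is only a toy naive Bayes sketch, not bogofilter's actual algorithm; the function names, the add-one smoothing, and the four sample messages are all made up for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count how many messages of one class contain each word."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word evidence in log space, with add-one smoothing.

    Simplification: only words present in the message are considered.
    """
    log_odds = math.log(n_spam / n_ham)  # prior odds from the training mix
    for word in set(text.lower().split()):
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Made-up training piles; a real filter would be trained on thousands of messages.
spam = ["urgent assistance needed transfer funds", "claim your funds now"]
ham = ["meeting notes attached", "patch for the build attached"]
spam_counts, ham_counts = train(spam), train(ham)

p_spam = spam_probability("urgent funds transfer", spam_counts, ham_counts, len(spam), len(ham))
p_ham = spam_probability("build notes attached", spam_counts, ham_counts, len(spam), len(ham))
print(f"{p_spam:.2f} {p_ham:.2f}")
```

With piles this tiny the scores swing hard on a single word, which is one way false positives happen; the cure is the tedious one, more "yes, that's spam" and "no, that's not" training.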


Binary Search Debugging

Posted by Doug Wed, 29 Jan 2003 00:47:00 GMT

Joel Spolsky comes up with another good article:

Something we had done since the last release of CityDesk somehow caused our publish times to increase by about 100%; on a particular large site we use for stress testing it had gone from about a minute to about two minutes.

The first thing I tried was a profiler: Compuware DevPartner Studio. Indeed this showed me where a lot of bottlenecks are; that data will be useful to speed up our publish times even more, but I really wanted to find the specific bug that I thought we had introduced which was slowing us down.

The next thing I tried was a method I learned from Gabi at Juno: the old binary search method. Before we started work on this release, publishing took 1’04”. Today it takes 1’57”. So I started checking out old versions of the source from CVS by date, rebuilding, and timing how long publishing took with each day’s build. Here’s what I found:

As of May 1: 1’57”
As of April 1: 1’05”
As of April 15: 1’05”
As of April 22: 1’06”
As of April 26: 1’58”
As of April 24: 1’05”
As of April 25: 1’05”

Aha! Now all I had to do was run WinDiff to compare the source tree from April 25th and April 26th, and I discovered four things that were changed that day, one of which was a function that DevPartner had told me was kind of slow, anyway. Within minutes I found the culprit— that function was originally written to cache its results because it’s often called with the same inputs, and I had inadvertently changed the cache key in one place and not another, so we were getting 100% misses instead of 99% hits. Solved! Total elapsed time to find this bug: about an hour. If your source code is much bigger than CityDesk, builds and checkouts may be slow. This is as good a reason as any to keep all your old daily builds around.

© 2003 Joel Spolsky
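The method Joel describes is easy to sketch in code. This is my own minimal version, not anything from CityDesk: it assumes the build is fast at the start of the range, slow at the end, and stays slow once it turns slow. The `publish_seconds` function is a hypothetical stand-in for "check out that day's source, rebuild, and time a publish"; it just mimics the quoted timings (about 65 seconds through April 25, about 117 seconds from April 26 on), and the 90-second threshold is an assumption.

```python
from typing import Callable


def first_slow_index(lo: int, hi: int, is_slow: Callable[[int], bool]) -> int:
    """Return the smallest index in (lo, hi] for which is_slow is true.

    Assumes is_slow(lo) is False, is_slow(hi) is True, and that every
    build after the first slow one is also slow.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(mid):
            hi = mid  # the regression landed at mid or earlier
        else:
            lo = mid  # the regression landed after mid
    return hi


# Index 0 is April 1 and index 30 is May 1.
probes = []


def publish_seconds(day_index: int) -> int:
    """Hypothetical stand-in for checkout + rebuild + timed publish."""
    probes.append(day_index)
    return 117 if day_index >= 25 else 65


def is_slow(day_index: int) -> bool:
    return publish_seconds(day_index) > 90  # threshold is an assumption


culprit = first_slow_index(0, 30, is_slow)
print(f"first slow build: April {culprit + 1}, after {len(probes)} timed builds")
```

Halving the range each time means a month of daily builds needs only about five rebuilds between the two known endpoints, which is why the whole hunt fits in an hour.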


Stupid Patent Tricks

Posted by Doug Thu, 23 Jan 2003 15:25:00 GMT

Yes, I’m boycotting amazon.com because they not only hold but are actively enforcing several unbelievably obvious patents. The only reason these patents are considered “novel” is that they take an everyday business practice and apply it to the Internet. The USPTO must be thinking, “Ooo, the Internet! It’s so new and powerful! This patent mentions the Internet, it must be new and powerful too!”

Well, it appears to me that The Register is the first to report on another stupid patent. And it is a fine piece of reporting: they not only report that SBC Communications Inc is enforcing this stupid patent, they also did the legwork to find the prior art!



Copyright 2001 - 2005 by Lathi.net and Doug Alcorn

Creative Commons, Some Rights Reserved