Microsoft gives more info on Hotmail's SmartScreen

In December, Microsoft launched a number of new features for its Hotmail web-based email service. One of them involves SmartScreen, which until now has been used to deal with the massive amount of spam that Hotmail users receive every day. Now, in a new entry on Microsoft's Windows Live blog, the company's Dick Craddock explains how SmartScreen is used for newsletters as well.

Microsoft found that in a typical Hotmail inbox, 14 percent of emails are person-to-person, two percent are true spam and another two percent are classified as "other". The remaining 82 percent is what Microsoft calls "graymail".

The company then looked at what kinds of emails fall under the graymail category. It found that newsletters and deal emails make up 50 percent of a Hotmail inbox. Craddock says, "Every day the average person’s inbox is flooded with messages from thousands of different retailers, clubs, societies, and schools, or with coupons, deals, and notifications from deal aggregators talking about all the exciting things that people need to be buying, doing, or seeing."

So how does SmartScreen filter out all those newsletter and deal emails? Craddock states:

To get Hotmail to identify newsletters for us, we began by making a list of newsletter characteristics and built a piece of software to extract them from incoming emails. This list forms the model of what makes newsletters different from all other mail and includes three aspects: presence of the List-Unsubscribe header, the sending email address, and what gets shown to the user.

With a clear definition of what we considered a newsletter, we created a reference set of about 10,000 messages that we classified as “newsletter” or “not a newsletter.” Think of the reference set as a test for our newsletter filter: the rate at which it correctly identifies newsletters defines its accuracy.
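As a rough illustration of the two ideas Craddock describes, extracting characteristics from a message and scoring a filter against a labeled reference set, here is a minimal Python sketch. It is not Microsoft's code: the function names, the four sample messages, and the single hand-written rule (treat any message with a List-Unsubscribe header as a newsletter) are all assumptions made for the example, standing in for the trained model and the roughly 10,000-message reference set.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_features(raw_message: str) -> dict:
    """Pull out two of the characteristics named in the post."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    return {
        "has_list_unsubscribe": msg.get("List-Unsubscribe") is not None,
        "sender": sender.lower(),
    }


def looks_like_newsletter(features: dict) -> bool:
    # Hypothetical hand-written rule standing in for a trained model.
    return features["has_list_unsubscribe"]


def accuracy(reference_set) -> float:
    """Fraction of reference messages the filter labels correctly."""
    correct = sum(
        looks_like_newsletter(extract_features(raw)) == label
        for raw, label in reference_set
    )
    return correct / len(reference_set)


# Tiny made-up reference set: (raw message, is_newsletter).
REFERENCE_SET = [
    ("From: Deals <deals@shop.example>\n"
     "List-Unsubscribe: <mailto:unsub@shop.example>\n"
     "Subject: Sale\n\nBig sale today.", True),
    ("From: Alice <alice@example.com>\n"
     "Subject: Lunch?\n\nAre you free at noon?", False),
    # A newsletter without the header: the simple rule misses it.
    ("From: Club <news@club.example>\n"
     "Subject: Monthly update\n\nClub news inside.", True),
    ("From: Bob <bob@example.com>\n"
     "Subject: Re: report\n\nLooks good.", False),
]

print(accuracy(REFERENCE_SET))  # 0.75
```

The third message shows why a single characteristic is not enough: the rule gets three of the four messages right, and the miss is exactly the kind of failure a larger reference set is meant to expose.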

Microsoft used what it calls machine learning to tune how Hotmail detects newsletters, working from the reference set of about 10,000 messages. It then had its own employees test the system, a practice it calls "dogfooding". Craddock says:

We provided the dogfood users with a way to report missed and incorrectly identified newsletters just as we do for the occasional spam message that gets through our filters. We spent several weeks analyzing the failures and adjusting the model until we’d worked out the known kinks.

For example, a major problem we identified early on was that financial services businesses tend to send all their mail from the same domain, and often have a lot of boilerplate language that closely resembles newsletters—even though they may not be. Rather than take the risk of filing away your bank statements, we decided it was better to leave these messages alone and trained the newsletter filter to ignore them.
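The "leave these messages alone" decision can be pictured as a check that runs before the newsletter verdict is applied. Craddock says the filter was trained to ignore such mail; the sketch below swaps in a simpler technique, an explicit list of exempt sender domains, purely for illustration. The domain names, the function names, and the returned labels are all hypothetical.

```python
from email.utils import parseaddr

# Hypothetical exempt domains; the post does not list any.
EXEMPT_DOMAINS = {"bank.example", "broker.example"}


def sender_domain(from_header: str) -> str:
    """Return the lowercased domain part of a From header."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def classify(from_header: str, model_says_newsletter: bool) -> str:
    # Exempt senders are never filed as newsletters, whatever the model says.
    if sender_domain(from_header) in EXEMPT_DOMAINS:
        return "leave alone"
    return "newsletter" if model_says_newsletter else "not a newsletter"


print(classify("Big Bank <statements@bank.example>", True))  # leave alone
print(classify("Deals <deals@shop.example>", True))          # newsletter
```

The trade-off matches the one in the quote: a bank's marketing mail may stay in the inbox, but a statement is never swept away by mistake.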

Microsoft's new system now lets users sweep or filter the newsletters that are of interest to them. Craddock says that Microsoft is adding new newsletter categories to help Hotmail users handle all of that traffic.
