Similarity searches accelerate P2P downloads by 30-70%

I'm sure you've all searched for a file in a P2P client, only to see what is clearly the same file displayed over and over in the results. The file name or description may differ, but you can tell it's the same file from the size or some other detail. Now imagine being able to download your file by connecting to everyone who has (more or less) the same file.

The limitation of P2P file sharing is, of course, the number of active users who have the file in question. A research team with members at Carnegie Mellon, Purdue, and Intel claims it can increase P2P transfer speeds by 30-70% using what it calls Similarity-Enhanced Transfer (SET), which exploits the fact that many of the files being shared contain pieces of identical data. Once the requested file is divided into small chunks, the SET software searches for similar files using a method called "handprinting" (similar to techniques used to cluster search results or filter spam). Once similar files are identified, they are scanned for individual chunks that are identical to pieces of the file being downloaded. As a result, SET should greatly expand the available sources for any given file. The beauty of the system is that it's not all theory; the team has successfully tested the technology with existing P2P networks.
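The article doesn't spell out SET's exact chunking or sampling parameters, but the general handprinting idea can be sketched in a few lines of Python: split a file into fixed-size chunks, hash each one, and publish a small deterministic sample of those hashes, so that files sharing many chunks are very likely to share a handprint entry in a global lookup table. The chunk size, sample size, hash choice, and file name below are illustrative assumptions, not values taken from the paper.

```python
import hashlib

CHUNK_SIZE = 16 * 1024   # 16 KB chunks (assumed; matches the figure quoted in the comments)
HANDPRINT_SIZE = 8       # number of sampled chunk hashes; illustrative only

def chunk_hashes(path):
    """Hash each fixed-size chunk of a file (SHA-1 as a stand-in hash)."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            hashes.append(hashlib.sha1(chunk).hexdigest())
    return hashes

def handprint(hashes, k=HANDPRINT_SIZE):
    """Deterministic sample: the k lexicographically smallest chunk hashes.
    Files that share many chunks are very likely to share at least one
    handprint entry, so a lookup on any entry surfaces 'similar' files."""
    return sorted(hashes)[:k]

# Hypothetical usage ("some_video.avi" is a placeholder path):
#   print(handprint(chunk_hashes("some_video.avi")))
```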

The team will present the technique at the 4th Symposium on Networked Systems Design and Implementation tomorrow, along with actual implementation code. "This is a technique that I would like people to steal. Developers should just take the idea and use it in their own systems," said David Andersen of Carnegie Mellon.

News source: Ars Technica

33 Comments

The reason this has not been implemented is the massive overhead of searching for who has a similar chunk. I do not see any P2P software implementing this any time soon.

In addition, this similarity search mostly benefits music files with changed ID3 tags, so how much can you really accelerate the download of a 3-10 MB MP3?

Well, it's kind of based on BitTorrent. The only difference is what can act as a source. Say there are two videos, initially from the same source, and one guy changes the metadata (actors, title, etc.). That changes the hash, so the system sees them as two different files, and they can no longer act as compatible sources for each other.

This system is smart enough to recognize that everything but the header is the same, so it can use the modified file as a source for the download (sans the header).

As a generalization, if two files have any number of identical 16 KB chunks, each file can serve as a source for those chunks, even if the files as a whole are entirely different.

This system would greatly increase the number of sources, which in turn would increase speed. Note that if you are already downloading at close to peak speed, it won't help. And if there is already a proliferation of sources, it won't be much help either.
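To put the point above in concrete terms, here is a toy Python sketch (hypothetical data; the fixed 16 KB chunk size and SHA-1 hash are assumptions for illustration, not necessarily what SET uses): two files that differ only in an appended tag block still share almost all of their chunk hashes, so a peer holding one can serve most of the other.

```python
import hashlib, os

CHUNK = 16 * 1024  # 16 KB chunks, as in the comment above

def chunk_hashes(data):
    """SHA-1 of each fixed-size chunk (a stand-in for whatever hash SET uses)."""
    return [hashlib.sha1(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

# Hypothetical example: the same ~1 MB payload with two different tag blocks
# appended at the end (an in-place metadata edit would behave the same way).
payload = os.urandom(1024 * 1024)
file_a = payload + b"title=Original Rip\x00"
file_b = payload + b"title=Renamed Copy\x00"

shared = set(chunk_hashes(file_a)) & set(chunk_hashes(file_b))
print(f"{len(shared)} of {len(chunk_hashes(file_a))} chunks are identical")
# -> 64 of 65: a peer holding file_b can serve almost every chunk of file_a.
```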

soumyasch said,
Well, it's kind of based on BitTorrent. The only difference is what can act as a source. Say there are two videos, initially from the same source, and one guy changes the metadata (actors, title, etc.). That changes the hash, so the system sees them as two different files, and they can no longer act as compatible sources for each other.

No, it's based on faulty principles that the P2P community abandoned for a multitude of reasons.

BitTorrent splits the file(s) into pieces and hashes each one (the DHT is a separate mechanism for finding peers, not a hashing method). If any part of the torrent changes, the changed pieces are marked as corrupt if your client notices; if it doesn't, you send them out anyway, but the receiver notices the hash mismatch and discards them.

Oooo, me likes. It appears we're not only increasing connection speeds between our home computers, but with the whole world.

Lt-DavidW said,
BitTorrent uses identically referenced files already, so no, this will make no difference for BitTorrent.
Not really. BitTorrent uses "parts" of a single file. SET seems to use the same patterns in ANY file, so if you've got the same pattern in file.exe and file.doc, you download that pattern no matter which file it comes from.

SHADOW-XIII said,
Not really. BitTorrent uses "parts" of a single file. SET seems to use the same patterns in ANY file, so if you've got the same pattern in file.exe and file.doc, you download that pattern no matter which file it comes from.

Hmm.. on-the-fly compression?

SHADOW-XIII said,
Not really. BitTorrent uses "parts" of a single file. SET seems to use the same patterns in ANY file, so if you've got the same pattern in file.exe and file.doc, you download that pattern no matter which file it comes from.

And with that you greatly increase the chance of hash collisions causing corruption.

Sounds like all they've done is create a way for P2P clients on searchable networks to get valid pieces of a file from partially corrupt sources (i.e. someone downloaded the file over a bad connection and a few bytes got messed up). If that's really all this does, then they're almost a decade behind: tiger tree hashing already solved the core problem by preventing the corruption in the first place, redownloading a chunk of the file whenever it doesn't match its hash.
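Python's hashlib has no Tiger implementation, so the sketch below substitutes SHA-1, but the verify-and-redownload loop the comment describes looks much the same whatever piece hash is used; the flaky peer here is simulated rather than a real network fetch.

```python
import hashlib, os, random

PIECE = 16 * 1024

def verified_download(piece_hashes, fetch, max_attempts=10):
    """Fetch each piece and re-request it until its hash matches, discarding
    corrupt copies -- the behaviour the comment above attributes to tree hashing."""
    pieces = []
    for i, expected in enumerate(piece_hashes):
        for _ in range(max_attempts):
            data = fetch(i)
            if hashlib.sha1(data).digest() == expected:  # real TTH clients use Tiger
                pieces.append(data)
                break
        else:
            raise IOError(f"piece {i} failed verification {max_attempts} times")
    return b"".join(pieces)

# Simulated usage: a flaky peer that zeroes out roughly 20% of its responses.
original = os.urandom(4 * PIECE)
hashes = [hashlib.sha1(original[i:i + PIECE]).digest()
          for i in range(0, len(original), PIECE)]

def flaky_fetch(i):
    piece = original[i * PIECE:(i + 1) * PIECE]
    return piece if random.random() > 0.2 else b"\x00" * len(piece)

assert verified_download(hashes, flaky_fetch) == original
```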

It's a little more elaborate than that. It scans for identical data and simply doesn't download it again; it duplicates what it already has.

So if a file is 1 GB, but 300 MB of that 1 GB is identical chunks of data, it doesn't re-download those chunks; it simply duplicates what it already has. At least, that's what it sounds like.

Not exactly. What this does is search through other files available on the network to see if any of them have data chunks that are identical to chunks of the file you are trying to download. For instance, say you are downloading a Linux distribution with a few hundred extra MB of software included, such as word processors and the like, and that file only has a few sources. Now, let's say the same distro, minus the extra apps, is available on the same P2P network. Normally, that wouldn't affect your download at all. With this idea, your download could source the chunks it shares with the vanilla distro, speeding along the download of your more expanded distro.

Nose Nuggets said,
It's a little more elaborate than that. It scans for identical data and simply doesn't download it again; it duplicates what it already has.

So if a file is 1 GB, but 300 MB of that 1 GB is identical chunks of data, it doesn't re-download those chunks; it simply duplicates what it already has. At least, that's what it sounds like.


To do that, it does what was thought of long ago and quickly discarded as technically infeasible. For a file to have chunks that match at the same offsets, it has to be the same file that is either corrupt or has had metadata (like ID3 tags) changed. Some P2P protocols simply strip out the metadata and send it separately from the media portion of the file; others treat it as a corrupt file but still use a tree-hashing method to send the chunks that are common. Both methods solve the problem better than SET does, did it about a decade earlier, and don't carry SET's huge overhead of hashing every 16 KB and sending requests to the network for such tiny chunks. This method also eventually runs into collisions, where two different chunks have the same hash; so if they tried to allow something like downloading the similar portions of different Linux distros that vary in size, you could easily end up with a different chunk that has the same hash as the one you're trying to get.
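How real the collision risk is depends mostly on the hash width. A quick birthday-bound estimate (a back-of-the-envelope calculation, not a figure from the article, and the chunk count below is a made-up assumption) suggests accidental collisions are negligible with a full-length cryptographic hash but effectively guaranteed with a short one:

```python
# Birthday bound: P(any two of n chunks share a hash) ~= n^2 / 2^(b+1) for a b-bit hash.
def collision_probability(num_chunks: int, hash_bits: int) -> float:
    return num_chunks ** 2 / 2 ** (hash_bits + 1)

chunks_indexed = 10 ** 12   # hypothetical: a trillion 16 KB chunks known to the network
print(collision_probability(chunks_indexed, 160))  # 160-bit hash (SHA-1 sized): ~3e-25
print(collision_probability(chunks_indexed, 64))   # 64-bit hash: >> 1, collisions certain
```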

Impressive, that could be very useful! It wouldn't work for torrents though, would it? Also, some software already does something similar to this, in that even if the file name is different, it recognises it's the same file and downloads from both.

That's cool. Data Domain has a backup product that works on a similar principle: instead of backing up every bit, it only backs up data that differs, essentially allowing a 1 TB volume to hold 200-300 TB of usable data (after compression).

I'll be watching this for sure.
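For what it's worth, the deduplicating backup described above rests on the same content-addressing principle as SET's chunk matching. A minimal toy sketch (hypothetical in-memory store with SHA-1 keys and 16 KB chunks; nothing to do with Data Domain's actual, proprietary format):

```python
import hashlib

CHUNK = 16 * 1024

class DedupStore:
    """Toy content-addressed store: an identical chunk is kept only once,
    however many backups reference it."""
    def __init__(self):
        self.chunks = {}    # chunk hash -> chunk bytes
        self.backups = {}   # backup name -> ordered list of chunk hashes

    def backup(self, name, data):
        refs = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            key = hashlib.sha1(piece).hexdigest()
            self.chunks.setdefault(key, piece)  # stored once, no matter how often seen
            refs.append(key)
        self.backups[name] = refs

    def restore(self, name):
        return b"".join(self.chunks[h] for h in self.backups[name])

# Usage: two near-identical 1 MB volumes share all but one chunk in the store.
store = DedupStore()
volume = bytes(1024 * 1024)
store.backup("monday", volume)
store.backup("tuesday", volume + b"new log line")
print(len(store.chunks), "unique chunks stored")  # far fewer than two full copies
assert store.restore("monday") == volume
```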

Ideas Man said,
So does the image format for Windows Vista and the backup solution included in Windows Home Server ;)

No, that would be impossible. Data Domain's backup format is proprietary. But it would have to be; there is no way around it.

It would be impossible to get a stable percentage because, as you can see, the technology is completely dependent on the files other people have uploaded. The more copies of the same file there are, the better the speeds, meaning a higher percentage. So, obviously, if you're looking for a movie called "Frankenstein Does a Chicken in France on April 32nd", you probably won't find many other people with the same file, which means a lower percentage.