Google admits word database came from third party

Slimy   on 09 April 2007 - 16:51 · 8 comments & 5203 views

Google has acknowledged that a dictionary of Chinese words used with one of its recently released software tools came from a third party. Google's Pinyin Input Method Editor (IME) "was built leveraging some non-Google database resources," wrote Google China spokeswoman Cui Jin in an e-mail response. Google's Pinyin IME bears an uncanny resemblance to Sohu's Sogou Pinyin IME, which draws search queries from the company’s search engine to suggest characters that match the Pinyin entered by a user. On Friday, the Chinese Internet company Sohu.com gave Google until Monday to stop downloads of its IME software and issue an apology. Sohu also wants compensation from Google.

The dictionaries used with the software from both Google and Sohu shared several mistakes, in which Chinese characters were matched with the wrong Pinyin equivalents. In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME. A review of the first version conducted by Sohu's engineers revealed a dictionary of around 330,000 words and their Pinyin equivalents, including more than 300,000 entries identical to those in Sohu's dictionary, said Wang Xiaochuan, Sohu's vice president of technology and head of the company's research and development center. On Friday, Google released an updated version that removed the names of the Sohu engineers and 600 words while adding just one word to the dictionary. That update did not remove the Pinyin errors, but Sunday’s did. "The new dictionary is now based on tens of thousands of entries Google's enormous search database has accumulated over the years," Cui wrote. That claim was confirmed Monday by Sohu, which said the similarity between Google's dictionary and its own dictionary had fallen from 96% to 79% with the latest version of the software.

Link: Forum Discussion (Thanks Express)
News source: InfoWorld

Comments · There are 8 additional comments
(1 reply) #1 SacrificialSoldier on 09 Apr 2007 - 16:56
so?
#1.1 Poof on 09 Apr 2007 - 18:11
I believe because it's akin to Google ripping the cover off "Sohu Chinese for Dummies" and replacing it with "Google's Chinese for Dummies" with a couple extra Xeroxed pages stuffed in between a couple pages...?

=/
#2 leo221 on 09 Apr 2007 - 18:30
wow, didn't know google has pinyin IME. seem much better than microsoft's pinyin 3.0. time to switch.
#3 OfF3nSiV3 on 09 Apr 2007 - 19:06
but..did google steal sohu technology or just acquire it from the same source?
#4 linsook on 09 Apr 2007 - 21:08
where to get?

nvm found it: http://tools.google.com/pinyin/

very fast. but i'm too used to the way MS IME does it. maybe i'll switch to it another day.

Last edited by linsook on 09 Apr 2007 - 21:16
#5 thejessman on 09 Apr 2007 - 21:09
I wonder how exactly they thought they could get away with this.

I also wonder how much the compensation for Sohu will be.
#6 lOUDsCREAMEr on 09 Apr 2007 - 22:59
how about a Google changjie IME?
#7 Hak Foo on 10 Apr 2007 - 05:14
I'd expect that there's going to be significant commonalities no matter how you go about it. There are only a finite number of valid ways to convert a given piece of input.
