Google has acknowledged that a dictionary of Chinese words used with one of its recently released software tools came from a third party. Google's Pinyin Input Method Editor (IME) "was built leveraging some non-Google database resources," wrote Google China spokeswoman Cui Jin in an e-mail response. Google's Pinyin IME bears an uncanny resemblance to Sohu's Sogou Pinyin IME, which draws on search queries from the company's search engine to suggest characters that match the Pinyin entered by a user. On Friday, the Chinese Internet company Sohu.com gave Google until Monday to stop downloads of its IME software and issue an apology. Sohu also wants compensation from Google.
The dictionaries used with the Google and Sohu software shared several mistakes, in which Chinese characters were matched with the wrong Pinyin equivalents. In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME. A review of the first version of Google's software conducted by Sohu's engineers revealed a dictionary of around 330,000 words and their Pinyin equivalents, including more than 300,000 entries that are identical to those in Sohu's dictionary, said Wang Xiaochuan, Sohu's vice president of technology and head of the company's research and development center. On Friday, Google released an updated version that removed the names of the Sohu engineers and 600 words while adding just one word to the dictionary. That update did not remove the Pinyin errors, but Sunday's update did. "The new dictionary is now based on tens of thousands of entries Google's enormous search database has accumulated over the years," Cui wrote. That claim was confirmed Monday by Sohu, which said the similarity between Google's dictionary and its own had fallen from 96% to 79% with the latest version of the software.