Google has acknowledged that a dictionary of Chinese words used with one of its recently released software tools came from a third party. Google's Pinyin Input Method Editor (IME) "was built leveraging some non-Google database resources," wrote Google China spokeswoman Cui Jin in an e-mail response. Google's Pinyin IME bears an uncanny resemblance to the Sogou Pinyin IME from Chinese Internet company Sohu.com, which draws on search queries from Sohu's search engine to suggest characters that match the Pinyin entered by a user. On Friday, Sohu gave Google until Monday to stop downloads of the Google IME software and issue an apology. Sohu also wants compensation from Google.
The dictionaries used with both Google's and Sohu's software shared several mistakes, in which Chinese characters were matched with the wrong Pinyin equivalents. In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME. A review of the first version of Google's software, conducted by Sohu's engineers, revealed a dictionary of around 330,000 words and their Pinyin equivalents, including more than 300,000 entries that are identical to those in Sohu's dictionary, said Wang Xiaochuan, Sohu's vice president of technology and head of the company's research and development center. On Friday, Google released an updated version that removed the names of the Sohu engineers and 600 words from the dictionary, while adding just one word. That update did not remove the Pinyin errors, but a further update on Sunday did. "The new dictionary is now based on tens of thousands of entries Google's enormous search database has accumulated over the years," Cui wrote. That claim was confirmed Monday by Sohu, which said the similarity between Google's dictionary and its own had fallen from 96% to 79% with the latest version of the software.