Microsoft has announced that the document translation feature built into Azure Translator can now scan and translate PDF documents. The company said that users no longer need to preprocess documents through an OCR engine before trying to translate them.
The document translation feature was introduced a year ago and can translate several documents at once into more than 110 languages and dialects. With today’s update, PDF files are fully supported alongside Word and PowerPoint files. According to the firm, the ability to translate PDFs containing scanned image content was a highly requested feature.
Explaining some of the features, Microsoft said:
Document translation services now have the intelligence
- to identify whether the PDF document contains scanned image content or not,
- to route PDFs containing scanned image content to an OCR engine internally to extract text,
- to reconstruct the translated content as regular text PDF while retaining the original layout and structure.
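The three steps Microsoft lists can be pictured with a small sketch. This is purely illustrative and is not Microsoft's implementation: the `Page` type, the `ocr` and `translate` callables, and the rule "a page with an image and no text layer counts as scanned" are all assumptions made for the example. Keeping one output per page, in the original order, stands in for layout preservation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical, simplified view of one PDF page."""
    text: str        # embedded text layer; empty for a scanned page
    has_image: bool  # True if the page carries a raster image


def is_scanned(page: Page) -> bool:
    # Assumption: an image with no usable text layer means scanned content.
    return page.has_image and not page.text.strip()


def contains_scanned_content(pages: List[Page]) -> bool:
    # Step 1: decide whether the document needs OCR at all.
    return any(is_scanned(p) for p in pages)


def extract_text(pages: List[Page], ocr: Callable[[int], str]) -> List[str]:
    # Step 2: send only scanned pages to OCR, keep existing text otherwise.
    return [ocr(i) if is_scanned(p) else p.text for i, p in enumerate(pages)]


def translate_pdf(
    pages: List[Page],
    ocr: Callable[[int], str],
    translate: Callable[[str], str],
) -> List[str]:
    if contains_scanned_content(pages):
        texts = extract_text(pages, ocr)
    else:
        texts = [p.text for p in pages]
    # Step 3: one translated text per page, in the original page order.
    return [translate(t) for t in texts]
```

With stub callables, a two-page document whose second page is a scan would pass the first page's text straight to translation and route only the second through the OCR stub.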
While document translation works with 110 languages and dialects, the new scanning feature only works with 68 source languages and 87 target languages. Microsoft has pledged to add support for more in “due course”.
Microsoft said that no code changes are required to use the new feature and that PDFs can be submitted to Translator right away. The feature comes at no additional cost to customers. Document translation on Azure is available under two pricing plans: pay-as-you-go and the D3 volume discount plan for higher volumes. Further pricing details are available from Microsoft.
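Since no code change is required, a scanned PDF would be submitted the same way as any other document. The sketch below builds (but does not send) a batch translation request, assuming the v1.0 Document Translation REST route `/translator/text/batch/v1.0/batches` and the `Ocp-Apim-Subscription-Key` header; the endpoint, key and storage URLs are placeholders, and the helper name `build_batch_request` is made up for this example. Check the current Azure documentation before relying on the route or body shape.

```python
import json
import urllib.request


def build_batch_request(
    endpoint: str,
    key: str,
    source_url: str,
    target_url: str,
    target_language: str,
) -> urllib.request.Request:
    """Build a POST request that starts a document translation batch.

    source_url and target_url are assumed to be storage container URLs
    that the Translator resource is allowed to read from and write to.
    """
    body = {
        "inputs": [
            {
                "source": {"sourceUrl": source_url},
                "targets": [
                    {"targetUrl": target_url, "language": target_language}
                ],
            }
        ]
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/translator/text/batch/v1.0/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Nothing in the request body marks a PDF as scanned, which matches Microsoft's statement that the service detects scanned content itself.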