Microsoft builds the world's largest transformer-based language generation model

Transformer-based language generation models have enabled better conversational applications. Though these models still have shortcomings, some of which were recently exposed by a team at MIT, researchers continue to improve them, building larger and more robust models.

Now, a team at Microsoft has built the Turing Natural Language Generation model (T-NLG), a 17-billion-parameter model with 78 transformer layers that outperforms today's state-of-the-art models on many downstream natural language processing (NLP) tasks.

To train the model, researchers at Microsoft used an NVIDIA DGX-2 system housing multiple NVIDIA V100 GPUs interconnected with InfiniBand. The training data was similar to that used for the Megatron-LM models. Tensor slicing was applied on the NVIDIA Megatron-LM framework to shard the model across four NVIDIA V100 GPUs. The team also used DeepSpeed with ZeRO to reduce the model-parallelism degree from 16 to 4, quadruple the batch size per node, and cut training time by a factor of three.
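The tensor slicing mentioned above can be illustrated with a toy sketch. This is a hypothetical, heavily simplified illustration (not T-NLG's actual code): a linear layer's weight matrix is split column-wise across "devices", each device computes a partial output from the same input, and the partial outputs are gathered to recover the full result.

```python
# Toy sketch of column-parallel "tensor slicing" in the spirit of Megatron-LM.
# All names here are illustrative; plain Python lists stand in for GPU tensors.

def matmul(x, w):
    """Multiply a vector x (length n) by a weight matrix w (n x m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def slice_columns(w, num_shards):
    """Split the columns of w into num_shards contiguous groups, one per device."""
    per = len(w[0]) // num_shards
    return [[row[s * per:(s + 1) * per] for row in w] for s in range(num_shards)]

x = [1.0, 2.0]                        # input activation (replicated on every device)
w = [[1.0, 2.0, 3.0, 4.0],            # 2 x 4 weight matrix
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)                   # computed on a single "device"

shards = slice_columns(w, num_shards=4)           # one column group per "device"
partials = [matmul(x, shard) for shard in shards]  # each device's partial output
combined = [v for p in partials for v in p]        # the "all-gather" step

assert combined == full  # sharded computation matches the unsharded one
```

Each shard only ever stores a fraction of the weights, which is what lets a model too large for one GPU's memory be split across several.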

The result, T-NLG, can improve systems that leverage NLP for chatbots, document understanding, and sentence/paragraph completion tasks. Among its capabilities, T-NLG can simplify and summarize text to provide direct answers to search queries. For example, instead of returning a paragraph containing the answer to a search query (as many search engines traditionally do), T-NLG returns the answer directly. Similarly, the model can answer questions zero-shot, that is, without being given a supporting context passage.

  • Query input to T-NLG: Who was Jason Mraz engaged to?
  • T-NLG's direct answer: Jason Mraz was engaged to Tristan Prettyman.
  • Question input to T-NLG: How many people live in the U.S.?
  • T-NLG's direct answer: There are over 300 million people living in the U.S.
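The prompt-and-decode pattern behind direct answers like those above can be sketched as follows. T-NLG is not publicly available, so `generate_tokens` below is a stub standing in for a real model; only the overall pattern (frame the bare question as a prompt, decode an answer) reflects the technique described in the article.

```python
# Hypothetical sketch of zero-shot "direct answer" generation.
# generate_tokens is a canned stub, NOT a real language model.

def generate_tokens(prompt):
    """Stub for a language model's greedy decoder (illustrative only)."""
    canned = {
        "Q: How many people live in the U.S.?\nA:":
            ["There", "are", "over", "300", "million", "people",
             "living", "in", "the", "U.S."],
    }
    return canned.get(prompt, ["<unknown>"])

def direct_answer(question):
    # Zero-shot framing: the question alone, with no supporting passage.
    prompt = f"Q: {question}\nA:"
    return " ".join(generate_tokens(prompt))

print(direct_answer("How many people live in the U.S.?"))
# → There are over 300 million people living in the U.S.
```

In a real system the stub would be replaced by an actual generative model; the point is that the answer is produced as free text rather than extracted from a retrieved passage.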

T-NLG is also capable of generating concise summaries. For Microsoft's blog post on its pledge to be carbon negative by 2030, T-NLG produced the following summary:

Microsoft is committed to being carbon negative by 2030. We are launching an aggressive program to cut our carbon emissions by more than half by 2030, both for our direct emissions and for our entire supply and value chain. We are also launching an initiative to use Microsoft technology to help our suppliers and customers reduce their own carbon footprints and a new $1 billion climate innovation fund to accelerate the development of carbon reduction, capture, and removal technologies that will help us and the world become carbon negative. In addition to our aggressive carbon goals, we are launching a new Climate Innovation Fund to accelerate carbon reduction and removal opportunities. We are also launching a program to use our technology to improve the efficiency of our supply chain and reduce our own carbon footprint as well…

In terms of numbers and benchmark tests, Microsoft's new T-NLG outperforms Megatron-LM 8.3B on both the LAMBADA and WikiText-103 benchmarks. Its ROUGE scores were also promising, and human evaluators rated T-NLG above an LSTM baseline (CopyNet) for grammatical and factual correctness. You can check out the full test results and benchmark comparisons in Microsoft's blog post.
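For readers unfamiliar with ROUGE, here is a minimal sketch of ROUGE-1 recall, the kind of n-gram-overlap metric used to score generated summaries against reference summaries. This is a simplified illustration with made-up example strings; real ROUGE implementations also report precision, F1, and higher-order n-gram and longest-common-subsequence variants.

```python
# Minimal sketch of ROUGE-1 recall: the fraction of reference unigrams
# that also appear in the candidate summary (simplified illustration).
from collections import Counter

def rouge1_recall(reference, candidate):
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each reference word counts at most as often as it
    # appears in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())

ref = "microsoft pledges to be carbon negative by 2030"
cand = "microsoft commits to become carbon negative by 2030"
print(round(rouge1_recall(ref, cand), 2))
# → 0.75 (6 of the 8 reference words appear in the candidate)
```

Higher recall means the summary retains more of the reference's wording, which is why ROUGE is a common automatic proxy for summary quality alongside human evaluation.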

Moving forward, Microsoft believes that advancements like these will lead to improvements in natural language processing applications. The firm expects the model to save users time by summarizing documents and emails, and to enhance the overall user experience of the Microsoft Office suite by offering writing assistance to authors and answering questions that readers may ask about a document.
