One of the biggest highlights of Build, Microsoft’s annual software development conference, was the presentation of a tool that uses deep learning to generate source code for office applications. The tool uses GPT-3, a massive language model that OpenAI developed last year and made available to select developers, researchers, and startups through a paid application programming interface (API).
Many have touted GPT-3 as the next-generation artificial intelligence technology that will usher in a new breed of applications and startups. Since GPT-3’s release, many developers have found interesting and innovative uses for the language model. And several startups have declared that they will be using GPT-3 to build new products or augment existing ones. But creating a profitable and sustainable business around GPT-3 remains a challenge.
Microsoft’s first GPT-3-powered product provides important hints about the business of large language models and the future of the tech giant’s deepening relationship with OpenAI.
A few-shot learning model that must be fine-tuned?
Image caption: Microsoft uses GPT-3 to translate natural language commands to data queries (Credit: Microsoft Power Apps blog)
According to the Microsoft Blog, “For instance, the new AI-powered features will allow an employee building an e-commerce app to describe a programming goal using conversational language like ‘find products where the name starts with “kids.”’ A fine-tuned GPT-3 model [emphasis mine] then offers choices for transforming the command into a Microsoft Power Fx formula, the open source programming language of the Power Platform.”
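In practical terms, this kind of translation is usually framed as a text-completion problem: the user’s request is embedded in a prompt with a few worked examples, and the model completes the pattern. The sketch below is my own guess at what such a setup might look like; the prompt format, the `suggest_formula()` wrapper, and the specific Power Fx formula are illustrative assumptions, not Microsoft’s actual implementation.

```python
# Illustrative sketch: translating a natural-language command into a
# Power Fx-style formula with a completion-style language model.
# The prompt format and the suggest_formula() stub are assumptions;
# Microsoft has not published the exact setup.

FEW_SHOT_PROMPT = """Translate the request into a Power Fx formula.

Request: show orders placed after January 1
Formula: Filter(Orders, OrderDate > Date(2021, 1, 1))

Request: find products where the name starts with "kids"
Formula:"""

def suggest_formula(prompt: str) -> str:
    # In a real system this would call the language-model API,
    # e.g. completion = api.complete(prompt, stop="\n").
    # Here we hardcode the kind of output the model is expected
    # to produce for the prompt above.
    return ' Filter(Products, StartsWith(Name, "kids"))'

formula = suggest_formula(FEW_SHOT_PROMPT).strip()
print(formula)  # Filter(Products, StartsWith(Name, "kids"))
```

Note that the formula is only a suggestion: as Microsoft’s announcement says, the model “offers choices” that the user can accept or reject, keeping a human in the loop.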
I didn’t find technical details on the fine-tuned version of GPT-3 that Microsoft used. But there are generally two reasons you would fine-tune a deep learning model. In the first case, the model doesn’t perform the target task with the desired accuracy, so you need to fine-tune it by training it on examples of that specific task.
In the second case, your model can perform the intended task, but it is computationally inefficient. GPT-3 is a very large deep learning model with 175 billion parameters, and the costs of running it are huge. Therefore, a smaller version of the model can be optimized to perform the code-generation task with the same accuracy at a fraction of the computational cost. A possible tradeoff is that the model would perform poorly on other tasks, such as question answering. But in Microsoft’s case, where the model only needs to generate Power Fx formulas, that penalty would be irrelevant.
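To get a feel for why size matters, here is a back-of-envelope estimate. A transformer’s forward pass costs roughly 2 floating-point operations per parameter per generated token (a standard approximation), and the GPT-3 paper trained variants at several sizes, including a 6.7-billion-parameter model. The choice of 6.7B as the comparison point below is mine, purely for illustration.

```python
# Back-of-envelope inference cost: a forward pass costs roughly
# 2 FLOPs per parameter per generated token (standard approximation).
def flops_per_token(n_params: float) -> float:
    return 2 * n_params

gpt3_full = 175e9    # GPT-3's published parameter count
small_model = 6.7e9  # one of the smaller variants trained in the GPT-3 paper

ratio = flops_per_token(gpt3_full) / flops_per_token(small_model)
print(f"Full GPT-3 costs ~{ratio:.0f}x more per token than the 6.7B variant")
```

If a fine-tuned small model matches the big model’s accuracy on the one task that matters, that roughly 26x per-token saving compounds across every formula suggestion served to every Power Apps user.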
In either case, a fine-tuned version of the deep learning model seems to be at odds with the original idea discussed in the GPT-3 paper, aptly titled, “Language Models are Few-Shot Learners.”
Here’s a quote from the paper’s abstract: “Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.” This basically means that, if you build a large enough language model, it will be able to perform many tasks without the need to reconfigure or modify the neural network.
So, what’s the point of the few-shot machine learning model that must be fine-tuned for new tasks? This is where the worlds of scientific research and applied AI collide.