Recent advances in pretrained language models (PLMs) have pushed the boundaries of performance on downstream natural language processing (NLP) tasks through finetuning. However, task-specific finetuning is highly resource-intensive: it updates hundreds of millions of parameters and demands large computational capacity. To circumvent this, researchers have been devising “parameter-efficient” methods for tuning models.
Prompt tuning (PT), a parameter-efficient transfer learning strategy for PLMs, has shown promise. It prepends adjustable continuous prompt vectors to the input while keeping the base PLM frozen, so only a small number of prompt vectors needs to be learned per task, conserving computational resources. However, despite PT’s remarkable performance, it still lags significantly behind full finetuning. It also requires longer training times, as it is highly sensitive to initialization.
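At the level of tensor shapes, the mechanism is simple. The following minimal sketch (with hypothetical sizes; the function name and dimensions are illustrative, not from the paper) shows the core idea: the PLM's weights stay fixed, and the only trainable parameters are the soft-prompt vectors concatenated in front of the input embeddings.

```python
import numpy as np

# Hypothetical sizes: 20 prompt vectors, 50 input tokens, 768-dim embeddings.
prompt_len, seq_len, d = 20, 50, 768
soft_prompt = np.random.randn(prompt_len, d) * 0.02   # the ONLY trainable parameters

def prepend_soft_prompt(input_embeds, prompt=soft_prompt):
    """Prepend the soft-prompt vectors to a batch of input embeddings;
    the result is fed to the frozen PLM as usual."""
    batch = input_embeds.shape[0]
    tiled = np.broadcast_to(prompt, (batch,) + prompt.shape)
    return np.concatenate([tiled, input_embeds], axis=1)

embeds = np.random.randn(4, seq_len, d)    # embeddings from the frozen PLM
extended = prepend_soft_prompt(embeds)     # shape (4, 70, 768)
```

Note how few parameters are trained: 20 × 768 ≈ 15K per task, versus hundreds of millions for full finetuning.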
In an attempt to address these issues, recent research has proposed reusing prompt vectors from other tasks. The idea is to train soft prompts on an array of source tasks and then use these pretrained prompts to initialize finetuning on a target task, typically selecting a source prompt via a similarity measure, which may itself be learned.
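A rough sketch of that retrieval step, under the assumption that each task has some embedding against which similarity is computed (the function name, cosine similarity, and the toy task embeddings below are illustrative, not taken from any specific paper):

```python
import numpy as np

def retrieve_source_prompt(target_emb, source_embs, source_prompts):
    """Initialize the target prompt from the source prompt whose task
    embedding is most similar (here: cosine) to the target's."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scores = [cosine(target_emb, e) for e in source_embs]
    return source_prompts[int(np.argmax(scores))]

# Toy usage: two source tasks; the target is closer to the second.
source_embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
source_prompts = [np.zeros((4, 8)), np.ones((4, 8))]
init = retrieve_source_prompt(np.array([0.1, 0.9]), source_embs, source_prompts)
```

In the learned-similarity variant, the fixed cosine score would be replaced by a trainable scoring function.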
Building on this, a team of researchers from the Ohio State University, MIT-IBM Watson AI Lab, and the Massachusetts Institute of Technology have introduced multitask prompt tuning (MPT). This innovative technique uses multitask data to learn a single prompt that can be efficiently transferred to target tasks.
In theory, learning a shared prompt space sounds straightforward, but implementing it can be quite challenging: the technique must capture the commonalities between the various source tasks while minimizing interference between them. Instead of simply sharing one prompt matrix across all tasks, the researchers achieved this by decomposing the soft prompt of each source task into a multiplicative combination of a shared matrix and a low-rank task-specific matrix. This decomposition is learned by distilling knowledge from soft prompts obtained through regular prompt tuning; adapting to a new task then amounts to learning a low-rank multiplicative modification to the shared prompt matrix.
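The decomposition can be sketched in a few lines. In the version below, the task-specific matrix is rank one (the Hadamard product of the shared matrix with an outer product u vᵀ); the dimensions and variable names are hypothetical, chosen only to make the parameter-count argument concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
prompt_len, d = 8, 16                      # hypothetical sizes

# One prompt matrix shared across all source tasks.
P_shared = rng.normal(size=(prompt_len, d))

def task_prompt(u, v, shared=P_shared):
    """Prompt for task k as the element-wise (Hadamard) product of the
    shared matrix and a rank-one task-specific matrix u v^T."""
    W = np.outer(u, v)                     # (prompt_len, d), rank one
    return shared * W

# Task-specific trainable vectors for some task k (hypothetical values).
u_k = rng.normal(size=prompt_len)
v_k = rng.normal(size=d)
P_k = task_prompt(u_k, v_k)                # this task's soft prompt
```

The payoff is in the parameter count: each new task adds only prompt_len + d values (here 8 + 16 = 24) instead of a full prompt_len × d matrix (128), while the shared matrix carries the cross-task knowledge.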
Exhaustive testing on 23 NLP datasets across a variety of tasks revealed that MPT significantly outperforms state-of-the-art prompt transfer techniques. On the SuperGLUE benchmark, MPT with T5-Base achieved a 16.3% improvement over the conventional prompt tuning baseline while tuning far fewer task-specific prompt parameters. Impressively, on some metrics MPT even exceeded full finetuning performance while using a mere 0.035% of the tunable parameters per task. The researchers also demonstrated MPT’s efficacy in few-shot learning, with 4–32 labeled examples per target task.