In this paper, we introduce a novel framework for improving intent classification performance in domains with limited labeled data. We call our framework RIPPED: Recursive Intent Propagation using Pretrained Embedding Distances. Unlike most graph-based semi-supervised approaches, RIPPED uses dense pretrained embeddings to construct the representation graph. By combining this representation scheme with a recursive variant of the label propagation algorithm, RIPPED accurately propagates labels throughout the unlabeled dataset, even in domains with a large number of unbalanced classes and complex, noisy decision boundaries. In a given data-poor domain, RIPPED acts as an augmentation system: it adds to the labeled dataset by classifying unlabeled examples, which allows a more effective inductive classifier to be trained on the enlarged set. Because it only augments the training data, RIPPED can be easily incorporated into any classification pipeline. RIPPED is simple to apply to new domains, and our results indicate its empirical effectiveness. On four intent classification datasets, given access to only a few labeled examples per class, RIPPED achieved performance comparable to state-of-the-art classifiers given access to the entire training dataset. In some cases (including the one-shot setting), RIPPED outperformed the next-best semi-supervised methods by more than 70%. We propose that RIPPED can be used as an out-of-the-box tool for bootstrapping natural language understanding systems in data-poor domains.
Ball, Michael. "RIPPED: Recursive Intent Propagation using Pretrained Embedding Distances" (2019). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.26300/heed-ps97
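
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of graph-based label propagation over embedding distances. It is not the thesis's implementation. The abstract does not say how the graph is built or how the recursion works, so every specific here is an assumption made for illustration: the k-nearest-neighbour graph weighted by cosine similarity, the rule that each round promotes the most confident half of the unlabeled points to labeled status before propagating again, the function names (`knn_cosine_graph`, `label_propagation`, `recursive_propagation`), the parameters `k` and `frac`, and the toy 2-D vectors standing in for pretrained embeddings.

```python
import numpy as np

def knn_cosine_graph(emb, k=2):
    """Symmetric k-nearest-neighbour graph weighted by cosine similarity (assumed design)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops from the neighbour search
    n = sim.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar points per row
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    return np.maximum(W, W.T)                 # symmetrise the graph

def label_propagation(W, labels, n_classes, n_iter=100):
    """Iterative label propagation with clamped seeds; labels == -1 marks unlabeled points."""
    deg = W.sum(axis=1, keepdims=True)
    P = W / np.where(deg == 0, 1.0, deg)      # row-normalised transition matrix
    seed_idx = np.flatnonzero(labels >= 0)
    Y = np.zeros((len(labels), n_classes))
    Y[seed_idx, labels[seed_idx]] = 1.0
    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F
        F[seed_idx] = Y[seed_idx]             # keep known labels fixed
    return F

def recursive_propagation(emb, labels, n_classes, k=2, frac=0.5):
    """Assumed recursion: promote the most confident predictions, then propagate again."""
    labels = labels.copy()
    W = knn_cosine_graph(emb, k)
    while (labels < 0).any():
        unlabeled = np.flatnonzero(labels < 0)
        F = label_propagation(W, labels, n_classes)
        confidence = F[unlabeled].max(axis=1)
        n_take = max(1, int(np.ceil(frac * unlabeled.size)))
        chosen = unlabeled[np.argsort(-confidence)[:n_take]]
        labels[chosen] = F[chosen].argmax(axis=1)
    return labels

# Toy stand-ins for pretrained utterance embeddings: two directional clusters.
emb = np.array([
    [1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [1.0, 0.05],   # intent 0
    [0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [0.05, 1.0],   # intent 1
])
seed_labels = np.array([0, -1, -1, -1, 1, -1, -1, -1])  # one labeled example per class
predicted = recursive_propagation(emb, seed_labels, n_classes=2)
print(predicted)
```

In this sketch the propagated labels would then be merged with the original seeds to train any inductive classifier, which is the augmentation role the abstract describes. Points in a graph component that contains no seed receive all-zero scores and fall back to class 0 through `argmax`; a real system would need an explicit policy for that case.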