From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey
Keyphrase prediction is a significant task in natural language processing that aims to generate concise phrases summarizing a document. Researchers have studied it extensively using statistical and deep learning methods, and by incorporating external knowledge sources. They have focused on improving keyphrase quality, extracting keyphrases from different document types, and applying keyphrases in areas such as indexing, summarization, and recommendation systems. These advances enhance information extraction and support various downstream tasks.

In this paper, we comprehensively summarize representative studies from the perspectives of dominant models, datasets, and evaluation metrics. Our work analyzes as many as 167 previous works, achieving greater coverage of this task than previous surveys. In particular, we focus on deep learning-based keyphrase prediction, which has attracted increasing attention in recent years. We then conduct several groups of experiments to carefully compare representative models. To the best of our knowledge, our work is the first attempt to compare these models using the same commonly used datasets and evaluation metric, facilitating in-depth analyses of their advantages and disadvantages. Finally, we discuss possible future research directions for this task.