Automatic Abstract Generation
Text summarization for long documents like research papers.
As the volume of available text grows, so does interest in automatic summarization. A summary can be produced either by selecting elements of the input verbatim (extractive) or by interpreting the content and generating new text from it (abstractive). Both approaches struggle on long documents such as research papers.
In this project we propose an approach that combines the two: salient sentences are first extracted from the long document and then fed to a sequence-to-sequence RNN, which generates the final summary. We experimented with several extraction methods, including LDA (latent Dirichlet allocation), LSA (latent semantic analysis), and TextRank, and passed the output of the best-performing method to the RNN. We evaluated the results with the ROUGE metric on a dataset of research papers from NIPS 2015.
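To make the extraction stage and the evaluation more concrete, here is a minimal sketch of TextRank-style sentence ranking followed by a ROUGE-1 recall score, using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the function names (`split_sentences`, `textrank`, `extract`, `rouge1_recall`) are hypothetical, the sentence splitter and tokenizer are deliberately naive, the similarity is the length-normalised word overlap commonly used with TextRank, and ROUGE is reduced to unigram recall. The sequence-to-sequence RNN stage is not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric tokens; no stop-word removal or stemming."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Shared-word count normalised by the log lengths of both sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract(text, k=3):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams (with multiplicity) found in the candidate."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())
```

In a pipeline like the one described, the output of `extract(paper_text, k)` would be joined and passed to the abstractive model, and `rouge1_recall(generated_summary, paper_abstract)` would give a rough quality signal. A full evaluation would normally report several ROUGE variants with precision and F-scores rather than unigram recall alone.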