Palash Chauhan

Automatic Abstract Generation

Undergraduate Course Project, IIT Kanpur · NLP · Deep Learning

Text summarization for long documents like research papers.

As the volume of available text grows, so does interest in automatic summarization. Summarization can be done either by extracting elements from the input (extractive) or by understanding the content and generating new language (abstractive). Both methods struggle on long documents like research papers.

In this project we proposed an approach that combines the two: salient sentences are first extracted from the long document and then fed to a sequence-to-sequence RNN. We experimented with several ways to extract salient elements, including LDA, LSA, and TextRank, and fed the best extraction to the RNN to generate an enhanced summary. We evaluated the results using the ROUGE metric on a dataset of research papers from NIPS 2015.
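The extraction and evaluation steps can be illustrated with a minimal, dependency-free sketch. This is not the project's code: it assumes a TextRank-style ranking with word-overlap sentence similarity and a unigram-recall version of ROUGE (ROUGE-1 recall), and the function names (`tokenize`, `textrank`, `extract_salient`, `rouge1_recall`) are hypothetical. The sequence-to-sequence RNN stage is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Shared-word count normalized by log sentence lengths (TextRank-style)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_salient(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams (with multiplicity) found in the candidate."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[word]) for word, count in ref.items())
    return matched / total
```

In a pipeline like the one described, the output of `extract_salient` would serve as the shortened input to the abstractive model, and `rouge1_recall` would compare the generated summary against the paper's own abstract. A full ROUGE evaluation typically also reports precision, F-measure, and longer n-gram variants, which this sketch leaves out.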