Automatic Abstract Generation

As information overload grows with the volume of available data, interest in automatic summarization is increasing, and text summarization has been the subject of extensive research. Summarization can be achieved either by extracting elements from the input (extractive) or by understanding the content of the input and producing new text with language generation techniques (abstractive). Neither method performs well on long documents such as research papers. In this project, we propose an approach to automatic summarization that combines the two: salient sentences are first extracted from the long document and then fed to a sequence-to-sequence RNN. We experimented with several ways to extract salient elements, including LDA, LSA and TextRank, and fed the best extraction to the RNN to generate an enhanced summary. We evaluated the generated summaries using the ROUGE metric on a dataset of research papers from NIPS 2015.
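
To make the extract-then-evaluate part of this pipeline concrete, here is a minimal sketch in Python using only the standard library. It is not the project's implementation. It assumes a simplified TextRank (word-overlap similarity plus weighted PageRank) for the extraction step and a unigram-recall version of ROUGE-1 for scoring. The function names (tokenize, textrank, extract, rouge1_recall) are hypothetical and chosen only for illustration. The sequence-to-sequence RNN that would rewrite the extracted sentences is omitted.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not tokens_a or not tokens_b:
        return 0.0
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the edge weight; isolated sentences contribute nothing.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def extract(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams (with multiplicity) found in the candidate."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    if not ref:
        return 0.0
    matched = sum(min(count, cand[word]) for word, count in ref.items())
    return matched / sum(ref.values())
```

In a setup like the one described above, the output of extract would be joined into a shorter input for the abstractive model, and rouge1_recall (or a full ROUGE implementation) would compare the generated summary against the paper's own abstract.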

Palash Chauhan
Master's Student in Computer Science

My research interests include distributed systems, machine learning, and their intersection.