Using Regression Models to Pinpoint Relevant Content in Research Papers


Thanks to the worldwide diffusion of web-based applications, digital libraries play a fundamental role in giving access to research papers, thus allowing researchers to disseminate their main findings. From the researchers’ perspective, navigating such a huge mass of documents can become critical, since they often need to identify the papers that fit their research interests. Our work focuses on automatically extracting the sentences that best summarize the main topics and findings of a research manuscript. To this end, we propose a machine learning approach that implicitly ranks sentences according to their informative content. Specifically, we train regression-based algorithms on a variety of document features in order to pinpoint relevant text snippets.
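As a rough illustration of the general idea, the sketch below scores sentences with a linear regression model and sorts them by predicted relevance. It is not the method evaluated in this work: the three features (relative position, capped length, overlap with title terms), the gradient-descent fitting routine, and all function names are assumptions chosen only to keep the example self-contained.

```python
import re


def sentence_features(sentence, index, total, title_terms):
    """Hypothetical features for one sentence (illustrative only)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    overlap = sum(1 for t in tokens if t in title_terms)
    return [
        1.0,                               # bias term
        1.0 - index / max(total - 1, 1),   # earlier sentences get a larger value
        min(len(tokens), 40) / 40.0,       # capped, normalized length
        overlap / max(len(tokens), 1),     # fraction of tokens shared with the title
    ]


def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the mean squared error."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Predicted relevance score for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x))


def rank_sentences(sentences, title, w):
    """Return the sentences sorted by decreasing predicted relevance."""
    title_terms = set(re.findall(r"[a-z]+", title.lower()))
    scored = [
        (predict(w, sentence_features(s, i, len(sentences), title_terms)), s)
        for i, s in enumerate(sentences)
    ]
    return [s for _, s in sorted(scored, key=lambda p: p[0], reverse=True)]
```

In such a setup, the training targets `y` would be relevance scores assigned to sentences of already annotated papers, and the top-ranked sentences of a new paper would be returned as its relevant snippets.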