This paper presents the Poli2Sum approach to the 5th Computational Linguistics Scientific Document Summarization Shared Task (BIRNDL CL-SciSumm 2019). Given a set of reference papers and the papers that cite them, the approach has three aims: (1a) identifying the text spans in the reference paper that a specific citation in a citing paper refers to; (1b) assigning each citation a facet that describes the semantics behind it; and (2) generating a summary of the reference paper made up of its most relevant cited text spans. For tasks (1a) and (1b), Poli2Sum relies on an ensemble of classification and regression models trained on the annotated pairs of cited and citing sentences. Facet assignment is based on the relative position of each cited sentence, both locally within its section and globally within the entire paper. Task (2) is addressed by predicting the overlap, in units of text, between the selected text spans and the summary written by domain experts. The output summary consists of the subset of sentences that maximizes the predicted overlap score.
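The pipeline described above can be illustrated with a minimal Python sketch, assuming a much simpler setup than the paper's. It computes the local (section) and global (paper) relative position of each sentence, which are the inputs the abstract names for facet assignment. It then averages the predictions of several ensemble members to obtain a per-sentence overlap score, and selects sentences with that score under a word budget. Everything here is hypothetical and not the Poli2Sum implementation: the names `Sentence`, `position_features`, `ensemble_score` and `select_summary`, the plain-averaging combination rule, the toy lambda "models", and the greedy selection under `max_words`. The abstract specifies neither the ensemble combination rule nor the length constraint or search procedure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Sentence:
    text: str
    index_in_paper: int    # 0-based position in the whole reference paper
    index_in_section: int  # 0-based position within its section
    section_length: int    # number of sentences in that section
    paper_length: int      # number of sentences in the paper


def position_features(s: Sentence) -> List[float]:
    """Relative position in [0, 1]: locally (within the section) and globally (within the paper)."""
    local_pos = s.index_in_section / max(s.section_length - 1, 1)
    global_pos = s.index_in_paper / max(s.paper_length - 1, 1)
    return [local_pos, global_pos]


# Hypothetical: an ensemble member is any callable mapping a feature vector to a score.
Model = Callable[[Sequence[float]], float]


def ensemble_score(models: Sequence[Model], features: Sequence[float]) -> float:
    """Combine member predictions by plain averaging (an assumed combination rule)."""
    return mean([m(features) for m in models])


def select_summary(sentences: Sequence[Sentence], scores: Sequence[float],
                   max_words: int) -> List[int]:
    """Greedily add sentences by decreasing predicted overlap while the word budget allows.

    This approximates, but does not guarantee, the subset with maximal total score.
    """
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        n_words = len(sentences[i].text.split())
        if used + n_words <= max_words:
            chosen.append(i)
            used += n_words
    return sorted(chosen)  # restore document order


# Usage on a toy four-sentence paper with two two-sentence sections.
sentences = [
    Sentence("We propose a new parser", 0, 0, 2, 4),
    Sentence("It runs in linear time", 1, 1, 2, 4),
    Sentence("Results improve on the baseline", 2, 0, 2, 4),
    Sentence("Code is available", 3, 1, 2, 4),
]
# Toy stand-ins for trained regressors (hypothetical; both favour earlier sentences).
models: List[Model] = [lambda f: 1.0 - f[1], lambda f: 1.0 - f[0]]
scores = [ensemble_score(models, position_features(s)) for s in sentences]
summary = select_summary(sentences, scores, max_words=10)
print(summary)  # [0, 2]
```

The toy models reuse the position features only to keep the sketch short. In a real system, each member would be a trained classifier or regressor over richer features of the cited and citing sentences. Choosing the subset that maximizes a summed score under a length limit is a knapsack-style problem, so an exact solver could replace the greedy loop if optimality matters.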
Recommended citation: La Quatra, M., Cagliero, L., & Baralis, E. (2019). Poli2Sum@CL-SciSumm-19: Identify, Classify, and Summarize Cited Text Spans by means of Ensembles of Supervised Models. In 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019) @ SIGIR 2019 (Vol. 2414, pp. 233–246).