Event-based summarization of news articles

In recent years, with the increase of available digital information on the Web, the time needed to find relevant information has also increased. Therefore, to reduce the time spent on searching, research on automatic text summarization has gained importance. The proposed summarization process is based on event extraction methods and is called event-based extractive single-document summarization. In this method, the important features of event extraction and summarization methods are analyzed and combined to extract summaries from single-source news documents. Among the tested features, six are found to be the most effective in constructing good summaries. The constructed summaries are evaluated on the benchmark Document Understanding Conferences 2001 and 2002 datasets, and the results outperform most other well-known summarization methods.


Introduction
Advances in information technology enable us to share enormous amounts of data on the Web and its applications. Therefore, managing and using vast amounts of data for the needs of users has become an important research area. Automatic document summarization extracts the main idea of a document by eliminating the less significant and redundant parts of the information. Depending on the number of documents it processes, automatic summarization is categorized as single- or multidocument summarization. In 2000, the National Institute of Standards and Technology (NIST) introduced the Document Understanding Conferences (DUCs) [1] to encourage researchers to work on text summarization. However, in 2002, DUC stopped supporting research on single-document summarization, considering that it is harder to extract summaries from single documents. Since then, most research has focused on multidocument summarization.
Although today there is a need for accessing lots of information in a short time, regardless of its presentation in multiple sources, this research uses single-document summarization. One of the best application areas of the summarization is information retrieval. In retrieving information, search engines (Google, Yahoo, etc.) use automatic summarization to help users find the most relevant information for their searches. Besides search engines, summarization can be used for social media marketing, automated content extraction for writing a good blog, tailoring financial documents for decision making, gathering important information for political moves, and detecting the emotions of users, among others.
Today, online news sources provide instant information about the events around us, which feeds the articulations in social media channels such as Facebook and Twitter. Although summarization of news documents is studied intensively, few works have been conducted on event-based news summarization. As summarization techniques aim to extract the main ideas of documents, analyzing an event and its context elements in news articles is not far from finding the most important information conveyed in the text. Therefore, in this research, an event-based summarization technique is used to extract the most meaningful information from long articles to ease the work of readers.
The rest of the paper is arranged as follows: Section 2 presents related work and Section 3 introduces the proposed method. The experimental setup and results of event-based extractive single-document summarization (EBDS) are explained in Section 4. Lastly, Section 5 is the conclusion.

Related work
In this paper, two related research tasks, event extraction and summarization, are combined together in the news domain to extract important parts of the text. Therefore, the research in both fields is investigated to be merged.

Summarization
The main goal of automatic text summarization is to construct a brief and coherent summary of one or more documents. The process of automatic summarization can be categorized in different ways: based on structure (extractive/abstractive) [2][3][4][5], the number of source documents, the use of external resources (knowledge-poor techniques, knowledge-rich techniques) [6], features such as query words [7], and chronological news.
Summarization can also be processed by supervised and unsupervised techniques. Supervised techniques [8,9] use a classifier trained on a set of documents. On the other hand, unsupervised techniques eliminate the need for training data for identifying the prominent sentences [10]. Graph-based methods are unsupervised extractive methods. There are many summarization algorithms that use graph methodology in their processes.
A graph is a mathematical structure consisting of a set of vertices connected by edges [7]. The most frequently used graph-based ranking algorithms are Kleinberg's HITS [11] and Google's PageRank [12] algorithms. HITS and PageRank both operate on a set of pages connected by hyperlinks. The TextRank [10] and LexRank [13,14] algorithms are both graph-based algorithms derived from PageRank for ranking sentences and are frequently used in summarization. In both algorithms, vertices represent sentences and edges show the semantic similarity between the sentences. The two methods differ in similarity algorithms, such that the LexRank algorithm uses cosine similarity of TF-IDF vectors while TextRank uses varying similarity measures. Garcia et al. [15] also proposed a graph-based methodology to estimate the coherence of the sentences. Their method uses integer linear programming (ILP) for optimizing the selection of sentences based on maximizing the relevant concepts while minimizing the size of the summary.
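To make the graph-based ranking idea concrete, the following sketch ranks sentences LexRank-style: edge weights are cosine similarities of TF-IDF vectors and scores come from a PageRank-style power iteration. This is an illustrative simplification under assumed tokenization and damping choices, not the published implementations of TextRank or LexRank.

```python
# Minimal LexRank-style sentence ranking: cosine similarity of TF-IDF
# vectors as edge weights, PageRank-style power iteration for scores.
import math
from collections import Counter

def tfidf_vectors(sentences):
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    # TF-IDF weight per word; words occurring in every sentence get weight 0.
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()} for d in docs]

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, d=0.85, iters=50):
    vecs = tfidf_vectors(sentences)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    rowsum = [sum(row) for row in sim]   # for row-stochastic normalization
    scores = [1.0 / n] * n               # uniform initial rank
    for _ in range(iters):
        scores = [(1 - d) / n + d * sum(scores[j] * sim[j][i] / rowsum[j]
                                        for j in range(n) if rowsum[j] > 0)
                  for i in range(n)]
    return scores
```

Sentences sharing distinctive vocabulary reinforce each other's scores, while a sentence with no lexical overlap keeps only the teleportation mass.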
Dimension reduction methods are also used to eliminate the least important features and gather the important sentences for summaries. Latent semantic analysis (LSA) [16] was introduced to exploit semantic relations among the words and sentences of a text. Through the use of singular value decomposition (SVD), LSA projects sentences onto orthogonal dimensions, which helps eliminate redundancy, an important property for multidocument summarization. Gong and Liu [16] applied SVD together with LSA for automatic summary generation. Steinberger and Karel [17] combined the SVD method of Steinberger and Karel [18] with a sentence compression algorithm to remove unimportant parts of sentences. In order to capture the latent topics in documents, Hidayta et al. [19] used the latent Dirichlet allocation (LDA) [20] statistical model, and Lee et al. [21] used nonnegative matrix factorization (NMF) [21,22] to produce an additive parts-based representation of the data for the interpretation of semantic features.
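The LSA-based selection of Gong and Liu can be sketched as follows: build a term-by-sentence matrix, take its SVD, and pick for each leading singular vector the sentence with the largest loading. The raw-count weighting and fixed rank here are simplifying assumptions; real systems use TF-IDF weighting and tune the number of topics.

```python
# LSA-style sentence selection sketch: one sentence per leading latent topic.
import numpy as np

def lsa_select(sentences, k=2):
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))  # term-by-sentence counts
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0
    # Rows of Vt give each sentence's weight in every latent topic.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic in Vt[:k]:                 # one sentence per leading topic
        j = int(np.argmax(np.abs(topic)))
        if j not in chosen:              # skip duplicates
            chosen.append(j)
    return chosen
```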

Mendoza et al. [23] introduced a memetic optimization model based on weighting statistical features of each sentence (position, sentence length, sentence-to-title relationship, cohesion, and coverage) combined with group features (candidate sentence to summary/original document similarity). Their experiments showed that the position, title-to-sentence relationship, and length features contribute most to improved results. Long et al. [24] investigated the relation of the sentences in a document and used entailment among sentences to extract the important ones from the text. Their method was based on the assumption that the words in a sentence can be aligned with the words in other sentences. Once the sentences are scored, greedy or ILP approaches are used to select the sentences for summaries. In their tests, they evaluated combined scoring methods, and the best results were obtained by the model that combined entailment scores normalized by logarithmic functions with position scores.
More recent studies on summarization use hybrid methods including machine learning techniques. Alguliyev et al. [25] proposed a method named COSUM, which uses two stages for sentence selection. In the first stage, sentences are clustered based on k-means. In the second stage, salient sentences are selected from the clusters based on a proposed optimization algorithm that focuses on the coverage and diversity of the selected sentences. Mao et al. [26] combined graph-based techniques with supervised and unsupervised learning to produce extractive single-document summaries. In this method, the importance of sentences is measured by exploring statistical features and the relationships between sentences obtained using graph methods. Mao proposed three techniques to score the sentences: a graph-based model with a supervised model, a graph-based model with an independent feature of the supervised model, and sentence importance with a supervised model. Liu et al. [27] proposed a learning-based algorithm called the structured summarization model (SUMO).
SUMO uses iterative refinement to induce the multiroot tree into summaries by repeatedly refining the results predicted in the previous iterations.

Event detection
Starting from the early 1990s with the widespread usage of the Internet, research on event extraction and representation gained acceleration. In 2005, the Automatic Content Extraction (ACE) program annotated events from broadcast transcripts and newswire [28] for specific domains with intense annotation work. Consequently, Li et al. [29] used a cooccurrence graph between named entities and event terms. They assigned initial relevance scores to each of the named entities and event terms and ran the PageRank algorithm to define the context-dependent relevance of named entities and event terms. Kolya [30] used semantic role-labeling (SRL) [31], WordNet [32], and handcrafted rules with machine learning techniques to extract deverbal nouns as events from the TempEval 2010 dataset.
Similarly, event detection has been studied in information retrieval to retrieve event-based documents as a result of query searches [33]. Lin et al. [34] used semantic roles such as predicate-argument in queries with the ones in the main document and proved that semantic roles perform better than syntactic dependencies. Pai et al. [35] used predicate-subject-object triplets to construct content maps and find the similarity among the triplets as a representation of the event score of the documents.
Recent research on the detection of events has also gained importance in the behavior analysis of people [36]. Extracting people's reactions to events and the effect of the conveyed information on their emotions increases the need for combining event extraction methodologies with summarization methodologies.

Event-based summarization
While summarization and event detection research was ongoing in different fields, at the beginning of the 2000s researchers started to use event information in summarization. Filatova and Hatzivassiloglou [37] defined events as verbs/action nouns and named entities. They obtained the weight of events in sentences to identify the important sentences for the construction of summaries from multidocument text.
Filatova and Hatzivassiloglou [37] and Wu [38] showed that relevance between similar event terms improves the performance of summarization. Xu et al. [39] derived event relevance from an event ontology constructed with formal concept analysis (FCA) [29] from multidocument text. The ontology is built from a set of relevant documents according to the named entities associated with the events. They showed that relevant named entity recognition improves event-based summarization.
Liu et al. [40] worked on multidocument event-based summarization. They used semantic relevance obtained from the VerbOcean external linguistic resource together with the PageRank algorithm to evaluate the significance of sentences, in which the sentences with more event terms are assigned higher significance. Aksoy et al. [41] used semantic features like agent, patient, and instrument features obtained from SRL [42] on the DUC 2004 dataset. They observed that the SRL-based sentence scoring approach outperformed a term-to-term-based sentence scoring system. By considering the semantics of events, Glavas and Snajder [43] used event graphs to obtain summaries from multiple texts. In their model, sentences are selected based on event importance, which is constituted by participant importance, event informativeness, and temporal relations among events.
Many of the above event-based summarization techniques are used in multidocument summarization and also require different methodologies to eliminate redundancy in the summaries. Among existing summarization methods, it is hard to identify event-based summarization of single documents. Therefore, although event identification is not mentioned in their processes, some summarization methods are related to event-based methods because of the relatedness of the techniques used. One related method was proposed by Ferreira et al. [14]. In the proposed system, 15 sentence-scoring methods are combined within three categories (word-based, sentence-based, graph-based) and tested on news articles, blogs, and scientific documents. In the experiments, some combinations of these methods are also tested. As a result, it is observed that features such as TF-ISF, sentence position, lexical similarity, and sentence-title resemblance achieve the best performance for news articles in single-document summarization. Oliveira et al. [44] later presented a comparative analysis of 18 generic sentence-scoring techniques for single and multiple documents as an extension of Ferreira et al.'s work [14]. In their experiments, each individual technique and different combinations of these techniques are tested on the CNN, DUC 2001, and DUC 2002 news datasets. Although the combined techniques performed better, none of the combinations performed best for all datasets.
Through the literature, it is easy to observe that news documents are the main evaluation ground for summarization methods. We believe that the main idea of news articles is highly dependent on events and articulated in event-intense sentences. However, the research on detecting events in news documents for summarization is still not well investigated. Therefore, in this project, summarization techniques are combined with various event-extraction methodologies. The event features that are most prominent in determining the main idea are identified to construct the most effective context elements of events to be used in obtaining the relevant summaries from single-document articles.

Proposed method
News sites such as CNN, Rappler, and BBC inform users about current events. An event can contain several subevents or other dependent events, which indicates a semantic relationship among them [45]. Therefore, in this research, the relationships between events and event semantics are investigated to extract meaningful summaries, as presented in the EBDS framework in the figure. The first step of the framework is to preprocess the text through tokenization and part-of-speech (POS) tagging [46], to be used in the further modules of the system: feature detection, feature selection, and sentence selection.

Feature detection
The preprocessed data are used to find the different features in the area of event extraction and summarization to be tested and used in the proposed summarization method.
Named entity recognition (NER) is a well-known technique for extracting meaningful pieces of text that are concerned with events. Named entities are also known as "atomic elements" of a sentence. In our study, 18 NER entity types [47], such as person, organization, location, expressions of time, and money, are extracted from sentences to constitute the named entity features, using the NLTK toolkit, which provides text-processing facilities such as tokenization, stemming, tagging, and parsing.
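The shape of the named-entity feature can be illustrated with the sketch below. The paper uses NLTK's NER over 18 entity types; here a small hand-made gazetteer stands in for the trained chunker so the feature-extraction step can be shown without external models. The gazetteer entries and the `ner_features` helper are illustrative, not part of the original system.

```python
# Toy stand-in for the NER feature: a gazetteer plays the role of
# NLTK's trained named-entity chunker for demonstration purposes.
GAZETTEER = {
    "london": "LOCATION", "paris": "LOCATION",
    "reuters": "ORGANIZATION", "nato": "ORGANIZATION",
    "monday": "TIME", "tuesday": "TIME",
}

def ner_features(sentence):
    """Return (entity, type) pairs found in a sentence."""
    out = []
    for tok in sentence.split():
        key = tok.lower().strip(".,")   # normalize trailing punctuation
        if key in GAZETTEER:
            out.append((tok.strip(".,"), GAZETTEER[key]))
    return out
```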
As verbs are the main entities of events, another set of features is extracted through SRL [42] to find predicate-argument structures, including the main verbs and participants, in each input sentence. The Practical Natural Language Processing Tools (practnlptools) package is used for extracting the SRL features. Using SRL, it is possible to identify 17 features [48]. Among them, the features most relevant to event information, namely agent (A0), patient (A1), and verb (V), and their adjuncts temporal (TMP) and location (LOC), are chosen for this work. The difference between the location attributes obtained by NER and SRL is that NER-location detects location names, whereas SRL-location identifies the location or spatial orientation of a state or action. An example of the temporal and locative roles is presented in Example 1.
Also, additional features such as non-deverbal and deverbal nouns are obtained via WordNet [32] semantic ontology. In the detection of events, some of the rules mentioned by Kolya et al. [30] and their variations are used as follows:

Noun and Verb Senses (NAVS):
Words with noun (NN) POS tags are searched in WordNet, and those having both noun and verb senses are considered events, for example, film, dream, etc.

After Stemming Verb Senses (ASV):
After stemming, nouns are looked up in WordNet. If the obtained word has a verb sense, the word is also considered an event, for instance, landings, heading, stands, etc.
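The NAVS and ASV rules can be sketched as follows, with a toy sense lexicon standing in for WordNet (the paper queries WordNet itself via NLTK) and a deliberately crude suffix stripper standing in for a proper stemmer; both stand-ins are assumptions made for illustration.

```python
# Sketch of the NAVS and ASV event-noun rules with a toy sense lexicon.
SENSES = {            # word -> set of POS senses it can take
    "film": {"n", "v"}, "dream": {"n", "v"}, "land": {"n", "v"},
    "head": {"n", "v"}, "stand": {"n", "v"},
    "sky": {"n"},
}

def naive_stem(word):
    # Crude suffix stripping; a real system would use a proper stemmer.
    for suf in ("ings", "ing", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def is_event_noun(noun):
    if "v" in SENSES.get(noun, set()):                  # NAVS rule
        return True
    return "v" in SENSES.get(naive_stem(noun), set())   # ASV rule
```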
Overall, through the above-mentioned methods, 11 features (NOUN, NE, NAVS, ASV, DEV, WAPREPOS, A0, A1, V, TMP, and LOC) are extracted to be candidates in the proposed event-based summarization as listed in Table 1.

Feature selection
The feature selection module is used to obtain important event features for summarization. Feature selection is an important process for data mining and pattern recognition. A good feature selection process helps to increase the performance of summarization [51].
Many well-developed statistical and mathematical methods have been offered for feature selection [23,52]. As the aim of this work is to find the set of features that perform best together, a chromosome-based search over feature subsets is applied for the selection of the important features. In this method, for each chromosome, the set of terms corresponding to the active features of the chromosome is selected as a representative of each sentence, which then constitutes the summary of a text. The summary constructed for each chromosome is then compared with the DUC 2002 reference summaries using the ROUGE-2 metric; Table 2 shows the ordered list of chromosomes, based on their F-measures, that are most effective in obtaining relevant summaries.
The "SequenceofFeatures" column in Table 2 represents the chromosomes by their active feature positions; for example, {9, 10} corresponds to the chromosome (00000000011). The "Position" column represents the rank of the sequences in terms of F-measure values. Note that the sequences selected for the top 10 do not simply occupy the first 10 positions: selection is based on position but also on the identifying features of a sequence. For example, although the sequence {4, 7, 9, 10} has position 6, it is not selected in the top 10 list. The reason is that the sequence {4, 7, 9, 10} already contains the sequence {9, 10}, which holds position 1, and the inclusion of features 4 and 7 does not improve the F-measure results.
Again in Table 2, since only a few feature terms are selected to represent a sentence, F-measure values are very low. However, we would like to note that the presented F-measure results are not used to judge the success of summarization but rather to evaluate the effectiveness of the features, which is used for feature selection. Therefore, from the top sequences, it is observed that summaries with various sequences of 6 features, namely TMP (9), LOC (10), NER (1), A0 (6), A1 (7), and WAPREPOS (5), are effective in extracting event-based summaries.
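The chromosome encoding and the subset-based pruning described for Table 2 can be sketched as follows. This is an illustrative reconstruction of the selection logic, not the authors' code; the feature numbering is assumed to be 0-based over the 11 features of Table 1.

```python
# Chromosome decoding and subset-based pruning of ranked feature sequences.
def decode(chromosome):
    """'00000000011' -> {9, 10}: 0-based positions of the active (1) bits."""
    return {i for i, bit in enumerate(chromosome) if bit == "1"}

def prune(ranked_sequences):
    """Given sequences ordered best-F-measure-first, drop any sequence that
    is a superset of an already selected one, since the extra features add
    no F-measure gain."""
    selected = []
    for seq in ranked_sequences:
        if not any(prev <= seq for prev in selected):
            selected.append(seq)
    return selected
```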

Sentence selection
In recent summarization tasks, the sentence position is widely used [55,56]. It has been argued that the first sentence of the document contains the most important information about the document that is crucial for solid summaries [57,58]. Therefore, in the constructed extract, the first sentence of the text is added to the summaries regardless of the other features.
Apart from the first sentence, the other sentences in the document are evaluated based on the sequences of features presented in Table 2. Starting from the top, each sequence of features is checked against a sentence as presented in Algorithm 1, and once a feature sequence is detected, the remaining sequences are neglected. The features found in the matched sequence are then used to calculate the importance score of a sentence as follows:

Score(s) = Σ_{i=0}^{n−1} F(i)     (1)

where i ∈ {0, 1, 2, ..., n−1}, n is the number of features in the matched sequence for a sentence, and F(i) is the number of occurrences of the ith feature in the sentence. Once the sentences' importance values are calculated, the sentences with the top scores are selected to construct the 100-word summary of a document. To prevent the construction of longer summaries, if an added sentence would cause the summary to exceed 100 words, it is eliminated from the summary.
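The sentence-selection step can be sketched as follows: the first sentence is always kept, every other sentence is scored as the sum of its matched feature occurrences, and top-scoring sentences are added while the summary stays within the 100-word budget. The `feature_counts` structure (feature id to occurrence count per sentence) is a hypothetical precomputation standing in for the feature detection and sequence matching of Algorithm 1.

```python
# Sentence selection under a word budget, per the importance score above.
def select_sentences(sentences, feature_counts, budget=100):
    """feature_counts[k] maps feature id -> occurrences in sentence k."""
    summary = [0]                        # first sentence is always included
    words = len(sentences[0].split())
    scored = sorted(
        ((sum(feature_counts[k].values()), k)
         for k in range(1, len(sentences))),
        reverse=True)                    # highest importance first
    for _, k in scored:
        length = len(sentences[k].split())
        if words + length <= budget:     # skip sentences that would overflow
            summary.append(k)
            words += length
    return [sentences[k] for k in sorted(summary)]
```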

Evaluation metrics
Over the years, different evaluation metrics [53,59] have been proposed to evaluate the quality of generated summaries. Among them, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [53] is found to be the most correlated with human evaluation [34]. ROUGE includes several measures, such as ROUGE-N, ROUGE-L, and ROUGE-SU. ROUGE-N is computed as

ROUGE-N = Σ_{S ∈ references} Σ_{gram_n ∈ S} count_match(gram_n) / Σ_{S ∈ references} Σ_{gram_n ∈ S} count(gram_n)     (2)

where N is the length of the n-gram, count_match(gram_n) is the maximum number of n-grams co-occurring in a candidate summary and the reference summaries, and count(gram_n) in the denominator of Equation (2) is the number of n-grams appearing in the reference summaries. Based on its similarity to human judgment and its popularity, the performance of the proposed method is evaluated with the ROUGE-1 and ROUGE-2 metrics. The system evaluation is done between the generated summaries and the model summaries of each dataset using the ROUGE 2.0 Java package.
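A minimal ROUGE-N recall computation matching the definition above looks as follows; it handles a single reference with whitespace tokenization, whereas real evaluations use the ROUGE toolkit with its stemming and stopword options.

```python
# Minimal ROUGE-N (recall): clipped n-gram matches over reference n-grams.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clip each match count by its frequency in the candidate.
    overlap = sum(min(cnt, cand[g]) for g, cnt in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```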

Performance evaluation
The performance of the proposed EBDS system is compared with 17 well-known or recent single-document summarization methods explained in Section 2: (a) the well-known graph-based methods TextRank [10], LexRank [13], and Garcia's coherence scoring method (Coh-SingleDocSum) [15]; (b) state-of-the-art dimension reduction methods including LSA [16], LDA [20], NMF [22], STEIN [17], and OZ [60]; (c) the best results of Long et al. [24]; and (d) the baseline, which takes the first n sentences that do not exceed 100 words. In Table 3, the best results are observed for R-2 on the DUC 2001 and DUC 2002 datasets, proving the competitiveness of EBDS with recent methods in obtaining summary sentences. In the tests, summaries obtained by dimension reduction methods fall behind many of the machine learning and recent methods while performing better than the graph-based methods. As a method related to EBDS, WA-CombA makes the list of the best-performing algorithms, while the other related method, HP-UFPE FS, did not achieve comparable results.

Conclusion
In this study, a summarization method that takes into account the tight relationship between news and events is used. Although most of the datasets used in summarization include news articles, the relation between the main idea of the news and the event context has not been investigated in much detail. Through the introduced EBDS framework, the most effective event extraction and summarization features are combined to extract summaries from single documents. Among the various event features, six, namely TMP, LOC, NER, A0, A1, and WAPREPOS, are found to be effective for producing coherent summaries. In particular, sentences containing the TMP, LOC, A0, and A1 features, together with the first sentence, have proven to be the most effective in representing relevant summaries.
The proposed EBDS method is evaluated with the R-1 and R-2 performance measures on the DUC 2001 and DUC 2002 datasets. In the experiments, the presented EBDS method outperformed all the compared methods in the R-2 measure and obtained comparable results in the R-1 measure, achieving the third highest score in the ranking. The analysis of the results indicates that the features relating to the context of events have a close relation to the main idea of the text, which can be analyzed in more detail to further improve the quality of summaries.