HTV-News: A New Dataset with High Novelty Rate for Turkish Text Summarization

Date

2024

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this study, we introduce HTV-News, a dataset of over 173,000 news article and summary pairs extracted from an online news website covering various topics. Automatic Text Summarization (ATS), which addresses the challenge of quickly extracting and understanding the necessary information from large volumes of data, is recognized as a demanding task in Natural Language Processing. ATS approaches are typically evaluated with various methods on existing datasets in the literature. One difficulty in text summarization research is creating a public, large-scale, high-quality dataset. In particular, the lack of datasets suitable for text summarization in Turkish limits the scope of research on the Turkish language. HTV-News, a novel dataset suitable for both extractive and abstractive methods, was compared statistically with other datasets. Summarization performance on the dataset is reported using novelty rate, ROUGE, and manual evaluation, with models such as BERTurk, mT5, and mBART. Compared to existing datasets used in summarization, HTV-News has several notable features: i) it surpasses the other datasets in novelty rate, ii) it performs strongly in terms of compression ratio, and iii) it achieves the best results with most of the models. © 2024 IEEE.
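
The abstract names novelty rate and compression ratio without defining them. The sketch below shows one common way such statistics are computed, and it is not taken from the paper: it assumes novelty rate is the fraction of summary n-grams that do not occur in the article, and compression ratio is the article word count divided by the summary word count. The function names, the whitespace tokenization, and the toy English strings are all illustrative assumptions.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty_rate(article, summary, n=1):
    """Fraction of summary n-grams absent from the article (assumed definition)."""
    # Whitespace tokenization and plain lowercasing are simplifying assumptions.
    article_ngrams = set(ngrams(article.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in article_ngrams)
    return novel / len(summary_ngrams)


def compression_ratio(article, summary):
    """Article length divided by summary length, in words (assumed definition)."""
    summary_len = len(summary.split())
    if summary_len == 0:
        raise ValueError("summary is empty")
    return len(article.split()) / summary_len


# Toy example, not drawn from HTV-News.
article = "the cat sat on the mat"
summary = "a cat sat"
unigram_novelty = novelty_rate(article, summary, n=1)
bigram_novelty = novelty_rate(article, summary, n=2)
ratio = compression_ratio(article, summary)
```

For real Turkish text, a proper tokenizer and Turkish-aware case folding would likely be needed, since Python's default `str.lower()` does not apply the Turkish dotted and dotless I rules.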

Description

9th International Conference on Computer Science and Engineering, UBMK 2024 -- 26 October 2024 through 28 October 2024 -- Antalya -- 204906

Keywords

Abstractive Text Summarization, Automatic Text Summarization, Natural Language Processing, ROUGE, Transformers

Source

UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering
