HTV-News: A New Dataset with High Novelty Rate for Turkish Text Summarization

dc.contributor.author: Kayali, Nihal Zuhal
dc.contributor.author: Omurca, Sevinç Ilhan
dc.date.accessioned: 2025-02-20T08:46:30Z
dc.date.available: 2025-02-20T08:46:30Z
dc.date.issued: 2024
dc.department: Türk-Alman Üniversitesi [en_US]
dc.description: 9th International Conference on Computer Science and Engineering, UBMK 2024 -- 26 October 2024 through 28 October 2024 -- Antalya -- 204906 [en_US]
dc.description.abstract: In this study, we introduce 'HTV-News', a dataset comprising over 173,000 news article and summary pairs collected from an online news website covering a variety of topics. Automatic Text Summarization (ATS), which addresses the challenge of rapidly extracting and understanding the necessary information from large volumes of data, is recognized as a demanding task in Natural Language Processing. ATS methods are typically evaluated on existing datasets from the literature. One of the difficulties in text summarization research is creating a public, large-scale, high-quality dataset. For Turkish in particular, the lack of datasets suitable for text summarization limits the scope of research in the language. HTV-News, a novel dataset suitable for both extractive and abstractive methods, was compared statistically with other datasets. Summarization performance on the dataset is reported using novelty rate, ROUGE, and manual evaluation with models such as BERTurk, mT5, and mBART. Compared to existing summarization datasets, HTV-News has several notable features: i) it surpasses other datasets in novelty rate, ii) it performs strongly with respect to compression ratio, and iii) it achieved the best results with most of the models. © 2024 IEEE.
dc.identifier.doi: 10.1109/UBMK63289.2024.10773431
dc.identifier.endpage: 6 [en_US]
dc.identifier.isbn: 979-835036588-7
dc.identifier.scopus: 2-s2.0-85215511666
dc.identifier.startpage: 1 [en_US]
dc.identifier.uri: https://doi.org/10.1109/UBMK63289.2024.10773431
dc.identifier.uri: https://hdl.handle.net/20.500.12846/1759
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_Scopus_20250220
dc.subject: Abstractive Text Summarization [en_US]
dc.subject: Automatic Text Summarization [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: ROUGE [en_US]
dc.subject: Transformers [en_US]
dc.title: HTV-News: A New Dataset with High Novelty Rate for Turkish Text Summarization
dc.type: Conference Object
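The abstract reports novelty rate and compression ratio but this record does not define them. The sketch below assumes definitions that are common in the summarization literature and may differ from the paper's exact formulas: novelty rate as the share of summary n-grams that never appear in the article, and compression ratio as article length divided by summary length, both counted in tokens. Tokenization is a naive lowercase-and-split, and all function names are hypothetical.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercasing + whitespace split. Python's str.lower()
    # is not Turkish-aware (I/İ vs ı/i), and punctuation is not stripped.
    return text.lower().split()


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    # All contiguous n-grams in order; repeated n-grams are kept.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty_rate(article: str, summary: str, n: int = 1) -> float:
    """Share of summary n-grams (counting repeats) absent from the article."""
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = set(ngrams(tokenize(article), n))
    novel = sum(1 for g in summary_ngrams if g not in article_ngrams)
    return novel / len(summary_ngrams)


def compression_ratio(article: str, summary: str) -> float:
    """Article length divided by summary length, in tokens."""
    summary_len = len(tokenize(summary))
    if summary_len == 0:
        raise ValueError("summary must contain at least one token")
    return len(tokenize(article)) / summary_len


if __name__ == "__main__":
    article = "kedi paspasın üstünde uyudu ve köpek bahçede oynadı"
    summary = "kedi paspasta uyudu"
    print(round(novelty_rate(article, summary, n=1), 3))  # 0.333
    print(round(novelty_rate(article, summary, n=2), 3))  # 1.0
    print(round(compression_ratio(article, summary), 3))  # 2.667
```

In this toy Turkish pair, the inflected forms "paspasın" and "paspasta" count as different tokens, so surface novelty rises even though the summary reuses the same stem. A more careful measurement for an agglutinative language would likely need a proper tokenizer or stemmer.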
