Title: HTV-News: A New Dataset with High Novelty Rate for Turkish Text Summarization
Authors: Kayali, Nihal Zuhal; Omurca, Sevinç Ilhan
Type: Conference Object
Conference: 9th International Conference on Computer Science and Engineering, UBMK 2024 -- 26 October 2024 through 28 October 2024 -- Antalya -- 204906
Year: 2024
Date accessioned: 2025-02-20
Date available: 2025-02-20
ISBN: 979-835036588-7
DOI: 10.1109/UBMK63289.2024.10773431
URI: https://doi.org/10.1109/UBMK63289.2024.10773431
Handle: https://hdl.handle.net/20.500.12846/1759
Page data: 16
Scopus ID: 2-s2.0-85215511666
Language: en
Access rights: info:eu-repo/semantics/closedAccess
Keywords: Abstractive Text Summarization; Automatic Text Summarization; Natural Language Processing; ROUGE; Transformers

Abstract:
In this study, we introduce HTV-News, a dataset of over 173,000 news article and summary pairs extracted from an online news website covering a range of topics. Automatic Text Summarization (ATS), which addresses the challenge of rapidly extracting and understanding the necessary information from large volumes of data, is recognized as a demanding task in Natural Language Processing. ATS methods are typically evaluated on existing datasets from the literature. One of the difficulties in text summarization research is creating a public, large-scale, high-quality dataset. For Turkish in particular, the lack of suitable summarization datasets limits the scope of research. HTV-News, a new dataset suitable for both extractive and abstractive methods, was compared statistically with other datasets. Summarization performance on the dataset is reported using novelty rate, ROUGE, and manual evaluation, with models such as BERTurk, mT5, and mBART. Compared with existing summarization datasets, HTV-News has several notable features: (i) it surpasses other datasets in novelty rate; (ii) it achieves strong results in terms of compression ratio; (iii) it yields the best results with most of the models.
© 2024 IEEE.
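The abstract compares datasets by novelty rate and compression ratio without defining either. The following is a minimal sketch using common definitions from the summarization literature, which may not match the paper's exact formulation: novelty rate is taken as the share of distinct summary n-grams that never occur in the article, and compression ratio as article length divided by summary length, both in tokens. The lowercased whitespace tokenizer and the function names (`tokenize`, `ngrams`, `novelty_rate`, `compression_ratio`) are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercased whitespace tokenization. The paper's tokenizer
    # may differ, and str.lower() does not apply Turkish dotted/dotless i rules.
    return text.lower().split()


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    # Distinct n-grams (as tuples) occurring in a token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novelty_rate(article: str, summary: str, n: int = 1) -> float:
    # Fraction of distinct summary n-grams that do not appear in the article.
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = ngrams(tokenize(article), n)
    novel = summary_ngrams - article_ngrams
    return len(novel) / len(summary_ngrams)


def compression_ratio(article: str, summary: str) -> float:
    # Article length divided by summary length, both counted in tokens.
    summary_len = len(tokenize(summary))
    if summary_len == 0:
        raise ValueError("summary is empty")
    return len(tokenize(article)) / summary_len


# Toy example (made-up strings, not HTV-News data).
article = "the cat sat on the mat today"
summary = "a cat sat"
print(novelty_rate(article, summary, n=1))  # 1 of 3 unigrams ("a") is novel
print(novelty_rate(article, summary, n=2))  # 1 of 2 bigrams ("a cat") is novel
print(compression_ratio(article, summary))  # 7 article tokens / 3 summary tokens
```

Under these definitions, a higher novelty rate means the reference summaries reuse less of the article's wording, which favors abstractive rather than purely extractive models; a higher compression ratio means shorter summaries relative to their articles.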