Unraveling the capabilities of language models in news summarization performance evaluation and comparative study

Odabaşı, Abdurrahman

dc.contributor.advisor	Biricik, Göksel
dc.contributor.author	Odabaşı, Abdurrahman
dc.date.accessioned	2024-11-27T06:47:47Z
dc.date.available	2024-11-27T06:47:47Z
dc.date.issued	2024	en_US
dc.date.submitted	2024-07-18
dc.identifier.citation	Odabaşı, A. (2024). Unraveling the capabilities of language models in news summarization performance evaluation and comparative study. Türk-Alman Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği, Yüksek Lisans Programı.	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.12846/1416
dc.description.abstract	Given the recent introduction of multiple public Large Language Models (LLMs) and the ongoing demand for improved Natural Language Processing tasks, particularly summarization, this thesis provides a comprehensive benchmarking of 20 recent LLMs on the news summarization task. The study systematically evaluates the capability and effectiveness of these models in summarizing news articles across different styles, utilizing three distinct datasets. Specifically, this study focuses on zero-shot and few-shot learning settings, employing a robust evaluation methodology that integrates automatic metrics, human evaluation, and LLM-as-a-judge. Interestingly, including demonstration examples in the few-shot learning setting did not enhance models’ performance and, in some cases, even led to worse outcomes. This issue arises mainly due to the poor quality of the gold summaries used as references, which hinders the models’ learning process and negatively impacts their performance. Furthermore, our study’s results highlight the exceptional performance of GPT-3.5 and GPT-4, which generally dominate due to their advanced capabilities. However, among the public models evaluated, certain models such as Qwen1.5-7B, SOLAR-10.7B-Instruct-v1.0, and Zephyr-7B-Beta demonstrated promising results. These models showed significant potential, positioning them as competitive alternatives to private models for the task of news summarization.	en_US
dc.language.iso	eng	en_US
dc.publisher	Türk-Alman Üniversitesi Fen Bilimleri Enstitüsü	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Automatic text summarization	en_US
dc.subject	News summarization	en_US
dc.subject	Natural language generation	en_US
dc.subject	Generative artificial intelligence	en_US
dc.subject	In-context learning	en_US
dc.title	Unraveling the capabilities of language models in news summarization performance evaluation and comparative study	en_US
dc.type	masterThesis	en_US
dc.relation.publicationcategory	Tez	en_US
dc.contributor.department	TAÜ	en_US

Files in this item:

Name: Odabaşı, Abdurrahman.pdf
Size: 16.56 MB
Format: PDF
Description: Thesis

This item appears in the following collection(s):

Thesis Collection [2]