dc.contributor.advisor | Biricik, Göksel | |
dc.contributor.author | Odabaşı, Abdurrahman | |
dc.date.accessioned | 2024-11-27T06:47:47Z | |
dc.date.available | 2024-11-27T06:47:47Z | |
dc.date.issued | 2024 | en_US |
dc.date.submitted | 2024-07-18 | |
dc.identifier.citation | Odabaşı, A. (2024). Unraveling the capabilities of language models in news summarization: Performance evaluation and comparative study. Türk-Alman Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği, Yüksek Lisans Programı. | en_US |
dc.identifier.uri | https://hdl.handle.net/20.500.12846/1416 | |
dc.description.abstract | Given the recent introduction of multiple public Large Language Models (LLMs) and the ongoing demand for improved Natural Language Processing tasks, particularly summarization, this thesis provides a comprehensive benchmarking of 20 recent LLMs on the news summarization task. The study systematically evaluates the capability and effectiveness of these models in summarizing news articles across different styles, utilizing three distinct datasets. Specifically, this study focuses on zero-shot and few-shot learning settings, employing a robust evaluation methodology that integrates automatic metrics, human evaluation, and LLM-as-a-judge. Interestingly, including demonstration examples in the few-shot learning setting did not enhance models' performance and, in some cases, even led to worse outcomes. This issue arises mainly due to the poor quality of the gold summaries used as references, which hinders the models' learning process and negatively impacts their performance. Furthermore, our study's results highlight the exceptional performance of GPT-3.5 and GPT-4, which generally dominate due to their advanced capabilities. However, among the public models evaluated, certain models such as Qwen1.5-7B, SOLAR-10.7B-Instruct-v1.0, and Zephyr-7B-Beta demonstrated promising results. These models showed significant potential, positioning them as competitive alternatives to private models for the task of news summarization. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Türk-Alman Üniversitesi Fen Bilimleri Enstitüsü | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Automatic text summarization | en_US |
dc.subject | News summarization | en_US |
dc.subject | Natural language generation | en_US |
dc.subject | Generative artificial intelligence | en_US |
dc.subject | In-context learning | en_US |
dc.title | Unraveling the capabilities of language models in news summarization: Performance evaluation and comparative study | en_US |
dc.type | masterThesis | en_US |
dc.relation.publicationcategory | Thesis | en_US |
dc.contributor.department | TAÜ | en_US |