Unraveling the capabilities of language models in news summarization performance evaluation and comparative study

Date

2024

Publisher

Türk-Alman Üniversitesi Fen Bilimler Enstitüsü

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Given the recent introduction of multiple public Large Language Models (LLMs) and the ongoing demand for improved Natural Language Processing tasks, particularly summarization, this thesis provides a comprehensive benchmarking of 20 recent LLMs on the news summarization task. The study systematically evaluates the capability and effectiveness of these models in summarizing news articles across different styles, utilizing three distinct datasets. Specifically, this study focuses on zero-shot and few-shot learning settings, employing a robust evaluation methodology that integrates automatic metrics, human evaluation, and LLM-as-a-judge. Interestingly, including demonstration examples in the few-shot learning setting did not enhance models’ performance and, in some cases, even led to worse outcomes. This issue arises mainly due to the poor quality of the gold summaries used as references, which hinders the models’ learning process and negatively impacts their performance. Furthermore, our study’s results highlight the exceptional performance of GPT-3.5 and GPT-4, which generally dominate due to their advanced capabilities. However, among the public models evaluated, certain models such as Qwen1.5-7B, SOLAR-10.7B-Instruct-v1.0, and Zephyr-7B-Beta demonstrated promising results. These models showed significant potential, positioning them as competitive alternatives to private models for the task of news summarization.

Keywords

Automatic text summarization, News summarization, Natural language generation, Generative artificial intelligence, In-context learning

Citation

Odabaşı, A. (2024). Unraveling the capabilities of language models in news summarization performance evaluation and comparative study. Türk-Alman Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği, Yüksek Lisans Programı.
