Exploring the Performance of Large Language Models for Data Analysis Tasks Through the CRISP-DM Framework

Year of publication

2024

Authors

Nurlan Musazade; Jozsef Mezei; Xiaolu Wang

Abstract

This paper investigates the impact of Large Language Models (LLMs), specifically GPT, on data analysis tasks within the framework of CRISP-DM (Cross-Industry Standard Process for Data Mining). To assess the efficiency of text-to-code language models in data-related tasks, we systematically examine the performance of LLMs in the stages of the data mining process. GPT models are tested against a series of Python programming and SQL tasks derived from a Master’s program’s curriculum. The tasks focus on data exploration, visualization, preprocessing, and advanced analytical tasks like association rule mining and classification. The findings show that GPT models exhibit proficiency in Python programming across various CRISP-DM stages, particularly in Data Understanding, Preparation, and Modeling. They adeptly utilize Python libraries for data manipulation and visualization, demonstrating potential as effective tools in data science. However, the study also uncovers areas where the GPT text-to-code model shows partial correctness, highlighting the need for human oversight in complex data analysis scenarios. This research contributes to understanding how AI can augment traditional data analysis methods, particularly under the CRISP-DM framework. It reveals the potential of LLMs in automating stages of data analysis, suggesting an acceleration in analytical processes and decision-making. The study provides valuable insights for organizations integrating AI into data analysis, balancing AI strengths with human expertise.

Organizations and authors

Åbo Akademi University

Musazade Nurlan

Mezei Jozsef

Wang Xiaolu

Publication type

Publication format

Article

Parent publication type

Conference

Article type

Other article

Audience

Scientific

Peer-reviewed

Peer-Reviewed

MINEDU's publication type classification code

A4 Article in conference proceedings

Open access

Open access in the publisher’s service

No

Self-archived

No

Other information

Fields of science

Computer and information sciences; Business and management

Internationality of the publisher

International

Language

English

International co-publication

No

Co-publication with a company

No

DOI

10.1007/978-3-031-60227-6_5

The publication is included in the Ministry of Education and Culture’s Publication data collection

Yes