Exploring the Performance of Large Language Models for Data Analysis Tasks Through the CRISP-DM Framework
Year of publication
2024
Authors
Nurlan Musazade; Jozsef Mezei; Xiaolu Wang
Abstract
This paper investigates the impact of Large Language Models (LLMs), specifically GPT, on data analysis tasks within the framework of CRISP-DM (Cross-Industry Standard Process for Data Mining). To assess the efficiency of text-to-code language models in data-related tasks, we systematically examine the performance of LLMs across the stages of the data mining process. GPT models are tested against a series of Python programming and SQL tasks derived from the curriculum of a Master’s program. The tasks focus on data exploration, visualization, preprocessing, and advanced analytical tasks such as association rule mining and classification. The findings show that GPT models exhibit proficiency in Python programming across various CRISP-DM stages, particularly in Data Understanding, Data Preparation, and Modeling. They adeptly utilize Python libraries for data manipulation and visualization, demonstrating potential as effective tools in data science. However, the study also uncovers areas where the GPT text-to-code model produces only partially correct results, highlighting the need for human oversight in complex data analysis scenarios. This research contributes to understanding how AI can augment traditional data analysis methods, particularly under the CRISP-DM framework. It reveals the potential of LLMs in automating stages of data analysis, suggesting an acceleration in analytical processes and decision-making. The study provides valuable insights for organizations integrating AI into data analysis, balancing AI strengths with human expertise.
Organizations and authors
Publication type
Publication format
Article
Parent publication type
Conference
Article type
Other article
Audience
Scientific
Peer-reviewed
Peer-Reviewed
MINEDU's publication type classification code
A4 Article in conference proceedings
Publication channel information
Journal/Series
Good Practices and New Perspectives in Information Systems and Technologies - WorldCIST 2024
Parent publication name
Good Practices and New Perspectives in Information Systems and Technologies - WorldCIST 2024
Volume
989
Pages
56-65
ISSN
ISBN
Publication forum
Publication forum level
1
Open access
Open access in the publisher’s service
No
Self-archived
No
Other information
Fields of science
Computer and information sciences; Business and management
Internationality of the publisher
International
Language
English
International co-publication
No
Co-publication with a company
No
DOI
10.1007/978-3-031-60227-6_5
The publication is included in the Ministry of Education and Culture’s Publication data collection
Yes