Computing Infrastructure and Data Pipeline for Enterprise-scale Data Preparation: Scalability Optimization Study

Akhund, Sadig

dc.contributor.author	Akhund, Sadig
dc.date.accessioned	2024-12-19T23:25:39Z
dc.date.available	2024-12-19T23:25:39Z
dc.date.issued	2023-04
dc.identifier.uri	http://hdl.handle.net/20.500.12181/926
dc.description.abstract	In today's data-driven landscape, enterprises face significant challenges in managing and processing massive amounts of data for meaningful insights and informed decision-making. Data preparation, a critical process that converts raw data into a usable format, plays a pivotal role in the data pipeline and significantly impacts downstream data analysis and modeling. However, traditional data preparation methods may struggle to keep up with the increasing volume and complexity of data, leading to scalability issues, inefficiencies, delays, and suboptimal performance in the data pipeline. This thesis presents a comprehensive scalability optimization study that analyzes and optimizes the data preparation process in enterprise-grade data pipelines. The study begins by analyzing common components of data pipelines and identifying limitations and bottlenecks that hinder scalability. It then examines existing data preparation methods, tools, and technologies, as well as newer tools such as Apache NiFi, Apache Atlas, and Apache Spark, for addressing scalability challenges. The research draws on literature, industry practices, and state-of-the-art technologies to propose practical strategies and recommendations for designing a scalable data strategy and improving the performance of data pipelines in enterprise-grade data environments. The thesis concludes with a summary of key findings, limitations, and future research directions, emphasizing the need for a well-designed data preparation pipeline that incorporates scalable data ingestion, efficient data transformation, and intelligent data storage strategies to ensure reliable and efficient data processing in enterprises dealing with large volumes of data.	en_US
dc.language.iso	en	en_US
dc.relation	School of IT and Engineering	en_US
dc.relation	Graduate program	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Data processing -- Scalability	en_US
dc.subject	Big data -- Management	en_US
dc.subject	Database management -- Optimization	en_US
dc.subject	IT and Engineering	en_US
dc.title	Computing Infrastructure and Data Pipeline for Enterprise-scale Data Preparation: Scalability Optimization Study	en_US
dc.type	Thesis	en_US