ADA Library Digital Repository

Computing Infrastructure and Data Pipeline for Enterprise-scale Data Preparation: Scalability Optimization Study

dc.contributor.author Akhund, Sadig
dc.date.accessioned 2024-12-19T23:25:39Z
dc.date.available 2024-12-19T23:25:39Z
dc.date.issued 2023-04
dc.identifier.uri http://hdl.handle.net/20.500.12181/926
dc.description.abstract In today's data-driven landscape, enterprises face significant challenges in managing and processing massive amounts of data for meaningful insights and informed decision-making. Data preparation, a critical process that converts raw data into a usable format, plays a pivotal role in the data pipeline and significantly impacts downstream data analysis and modeling. However, traditional data preparation methods may struggle to keep up with the increasing volumes and complexity of data, leading to scalability issues, inefficiencies, delays, and suboptimal performance in the data pipeline. This thesis presents a comprehensive scalability optimization study that analyzes and optimizes the data preparation process in enterprise-grade data pipelines. The study begins by analyzing common components of data pipelines and identifying limitations and bottlenecks that hinder scalability. It thoroughly examines existing data preparation methods, tools, and technologies, as well as cutting-edge tools and methodologies such as Apache NiFi, Apache Atlas, and Apache Spark for addressing scalability challenges. The research draws insights from literature, industry practices, and state-of-the-art technologies to propose practical strategies and recommendations for designing a scalable data strategy in an enterprise setting. The study provides actionable insights and recommendations to enhance the performance of data pipelines in enterprise-grade data environments. The paper concludes with a summary of key findings, limitations, and future research directions, emphasizing the need for a well-designed data preparation pipeline that incorporates scalable data ingestion, efficient data transformation, and intelligent data storage strategies to ensure reliable and efficient data processing in enterprises dealing with large volumes of data. en_US
dc.language.iso en en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Data processing -- Scalability en_US
dc.subject Big data -- Management en_US
dc.subject Database management -- Optimization en_US
dc.title Computing Infrastructure and Data Pipeline for Enterprise-scale Data Preparation: Scalability Optimization Study en_US
dc.type Thesis en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.
