Abstract:
This thesis presents the design and practical implementation of a database system for storing and managing descriptive data about datasets drawn from various repositories. The project is motivated by the goal of giving researchers, data scientists, and enthusiasts a single place to access, explore, and analyze data spread across different repositories. The main objective is to build a powerful, easy-to-use database solution that can store a wide range of dataset descriptions and support efficient storage, retrieval, and querying. MongoDB, a NoSQL database, was chosen for the implementation for the following reasons. It provides a schema-free document model, so documents with different fields and data types can be stored without conforming to a predefined schema. This is necessary because dataset schemas differ considerably between data repositories. In addition, MongoDB's flexibility and high performance make it a reliable choice for handling large volumes of data and for supporting concurrent access by many users. The database draws mainly on three repositories: the UCI Machine Learning Repository, Kaggle, and OpenML. In its initial phase, the MongoDB database contains 100 documents. MongoDB also offers several ways of querying, such as the Mongo Shell and the MongoDB Compass graphical user interface (GUI), so users can interact with the database according to their preferences, skills, and expertise. Additionally, a new GUI for querying dataset information was developed in Python using Flask; it improves the overall usability and accessibility of the database system by giving users an easy-to-use interface to search, filter, and view dataset descriptions. Overall, the database system designed and described in this thesis is a convenient resource for the data science community.
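As an illustration of the schema-free storage and querying summarized in the abstract, the following minimal Python sketch shows two dataset descriptions with different fields and a helper that builds a MongoDB filter document. The field names (`repository`, `name`, `num_instances`, `tags`), the placeholder values, and the helper `build_filter` are hypothetical and are not taken from the thesis; the actual document layout may differ.

```python
import re
from typing import Any, Dict, Optional

# Two hypothetical dataset descriptions. They do not share the same fields,
# which a schema-free document store accepts within a single collection.
doc_a = {"repository": "UCI Machine Learning Repository",
         "name": "example-dataset-a", "num_instances": 500}
doc_b = {"repository": "Kaggle",
         "name": "example-dataset-b", "tags": ["classification"]}


def build_filter(repository: Optional[str] = None,
                 keyword: Optional[str] = None,
                 min_instances: Optional[int] = None) -> Dict[str, Any]:
    """Build a MongoDB filter document from optional search criteria.

    The field names are assumptions made for illustration only.
    """
    query: Dict[str, Any] = {}
    if repository:
        # Exact match on the source repository.
        query["repository"] = repository
    if keyword:
        # Case-insensitive substring match on the dataset name;
        # re.escape keeps user input from being read as a regex pattern.
        query["name"] = {"$regex": re.escape(keyword), "$options": "i"}
    if min_instances is not None:
        # Only datasets with at least this many instances.
        query["num_instances"] = {"$gte": min_instances}
    return query
```

With a driver such as PyMongo, a filter built this way could be passed to a collection's `find` method, for example `collection.find(build_filter(repository="Kaggle"))`; a Flask view could construct the same filter from the fields of a search form.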