Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry
Research output: Thesis › Master's Thesis
2021.
RIS (suitable for import to EndNote)
TY - THES
T1 - Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry
AU - Kernbauer, Stefan Philip
N1 - embargoed until 02-02-2023
PY - 2021
Y1 - 2021
N2 - This master’s thesis addresses methods and best practices for data integration and storage as well as measures to assess and assure data quality. Furthermore, algorithms for descriptive analytics are described and applied. While the first part is dedicated to the theoretical elaboration of these techniques, the second part looks at their practical application. In collaboration with an enterprise from the automotive industry, a critical review of the implemented IT solution is conducted before an IT architecture that fits the purposes of predictive analytics in the manufacturing industry is presented. Additionally, the data maturity model and descriptive algorithms are applied in practice, and the results are discussed. Building on the methodical approach of the Data Mining Method for Engineers (DMME), which is based on CRISP-DM, the first theoretical emphasis of this thesis is on data integration. Using the PERFoRM guideline for system architecture, a comprehensive overview of the requirements for system architecture and middleware in modern manufacturing processes is given. This chapter concludes with an overview of some of the most widely used middleware solutions. Subsequently, the mechanics of data storage are described. This includes relational (SQL) as well as non-relational (NoSQL) database structures and popular representatives of the respective groups. Besides data integration and storage, data quality assessment and assurance form another focus of this master’s thesis. To that end, the different data quality dimensions are highlighted, with particular attention to sensor data quality. In addition, methods for automated assessment and improvement of data quality are presented. With the introduction of the data maturity model, a method is characterized that facilitates determining the ability to answer business problems through data analytics.
To enable analytics of the available data, the theoretical part concludes with a description of methods applicable to the data at hand. The techniques described in the theoretical part are aligned with the practical part of the master’s thesis, thus focusing on exploratory data analytics (e.g., principal component analysis) and descriptive algorithms such as clustering methods. The practical part of the master’s thesis includes a critical review of the current solution for data integration and storage in the manufacturing company. In addition, a scalable and performant IT infrastructure that allows the implementation of predictive analytics at a large scale is presented. Based on the data available, the data maturity model is applied, giving insights into the ability of the business to implement data analytics. Although the data maturity model indicates a high degree of maturity in most aspects, the dimension of data quantity limits the number of algorithms applicable to the data. Hence, the applied algorithms are exploratory and descriptive in nature. Nevertheless, the outcome of the algorithms is promising and can be used as a proof of concept for future applications of predictive analytics.
KW - Data Integration
KW - Data Storage
KW - Data Analytics
KW - Data Quality
KW - Data Conditioning
KW - Principal Component Analysis
KW - Data Analysis
KW - Digitalization
M3 - Master's Thesis
ER -