Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry

Research output: Thesis › Master's Thesis

BibTeX

@mastersthesis{c2b6751771974bbaa35b0f529d29d242,
title = "Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry",
abstract = "This master{\textquoteright}s thesis addresses methods and best practices for data integration and storage, as well as measures to assess and assure data quality. Furthermore, algorithms for descriptive analytics are described and applied. While the first part elaborates these techniques theoretically, the second part examines their practical application. In collaboration with an enterprise from the automotive industry, the currently implemented IT solution is critically reviewed, and an IT architecture suited to predictive analytics in the manufacturing industry is then presented. Additionally, the data maturity model and descriptive algorithms are applied in practice, and the results are discussed. Building on the methodical approach of the Data Mining Method for Engineers (DMME), which is based on CRISP-DM, the first theoretical focus of this thesis is data integration. Using the PERFoRM guideline for system architecture, a comprehensive overview of the requirements for system architecture and middleware in modern manufacturing processes is given. This chapter concludes with an overview of some of the most widely used middleware solutions. Subsequently, the mechanics of data storage are described, covering relational (SQL) and non-relational (NoSQL) database structures as well as popular representatives of each group. Besides data integration and storage, data quality assessment and assurance form a further focus of this master{\textquoteright}s thesis. To this end, the different data quality dimensions are highlighted, with particular attention to sensor data quality. In addition, methods for automated assessment and improvement of data quality are presented. The data maturity model is then introduced as a method for determining whether business problems can be answered through data analytics.
To enable analysis of the available data, the theoretical part concludes with a description of methods applicable to the data at hand. These techniques are aligned with the practical part of the master{\textquoteright}s thesis and therefore focus on exploratory data analytics (e.g., principal component analysis) and descriptive algorithms such as clustering methods. The practical part includes a critical review of the manufacturing company{\textquoteright}s current solution for data integration and storage. In addition, a scalable, high-performance IT infrastructure that allows predictive analytics to be implemented at large scale is presented. The data maturity model is then applied to the available data, giving insight into the company{\textquoteright}s ability to implement data analytics. Although the data maturity model indicates a high degree of maturity in most aspects, the data quantity dimension restricts the number of algorithms that can be applied. Hence, the applied algorithms are exploratory and descriptive in nature. Nevertheless, their results are promising and can serve as a proof of concept for future applications of predictive analytics.",
keywords = "Data Integration, Data Storage, Data Analytics, Data Quality, Data Conditioning, Principal Component Analysis, Datenanalyse, Datenintegration, Datenspeicherung, Digitalisierung",
author = "Kernbauer, {Stefan Philip}",
note = "embargoed until 02-02-2023",
year = "2021",
language = "English",
school = "Montanuniversitaet Leoben",

}

RIS (suitable for import to EndNote)

TY  - THES

T1  - Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry

AU  - Kernbauer, Stefan Philip

N1  - embargoed until 02-02-2023

PY  - 2021

Y1  - 2021

N2  - This master’s thesis addresses methods and best practices for data integration and storage, as well as measures to assess and assure data quality. Furthermore, algorithms for descriptive analytics are described and applied. While the first part elaborates these techniques theoretically, the second part examines their practical application. In collaboration with an enterprise from the automotive industry, the currently implemented IT solution is critically reviewed, and an IT architecture suited to predictive analytics in the manufacturing industry is then presented. Additionally, the data maturity model and descriptive algorithms are applied in practice, and the results are discussed. Building on the methodical approach of the Data Mining Method for Engineers (DMME), which is based on CRISP-DM, the first theoretical focus of this thesis is data integration. Using the PERFoRM guideline for system architecture, a comprehensive overview of the requirements for system architecture and middleware in modern manufacturing processes is given. This chapter concludes with an overview of some of the most widely used middleware solutions. Subsequently, the mechanics of data storage are described, covering relational (SQL) and non-relational (NoSQL) database structures as well as popular representatives of each group. Besides data integration and storage, data quality assessment and assurance form a further focus of this master’s thesis. To this end, the different data quality dimensions are highlighted, with particular attention to sensor data quality. In addition, methods for automated assessment and improvement of data quality are presented. The data maturity model is then introduced as a method for determining whether business problems can be answered through data analytics.
To enable analysis of the available data, the theoretical part concludes with a description of methods applicable to the data at hand. These techniques are aligned with the practical part of the master’s thesis and therefore focus on exploratory data analytics (e.g., principal component analysis) and descriptive algorithms such as clustering methods. The practical part includes a critical review of the manufacturing company’s current solution for data integration and storage. In addition, a scalable, high-performance IT infrastructure that allows predictive analytics to be implemented at large scale is presented. The data maturity model is then applied to the available data, giving insight into the company’s ability to implement data analytics. Although the data maturity model indicates a high degree of maturity in most aspects, the data quantity dimension restricts the number of algorithms that can be applied. Hence, the applied algorithms are exploratory and descriptive in nature. Nevertheless, their results are promising and can serve as a proof of concept for future applications of predictive analytics.

AB  - This master’s thesis addresses methods and best practices for data integration and storage, as well as measures to assess and assure data quality. Furthermore, algorithms for descriptive analytics are described and applied. While the first part elaborates these techniques theoretically, the second part examines their practical application. In collaboration with an enterprise from the automotive industry, the currently implemented IT solution is critically reviewed, and an IT architecture suited to predictive analytics in the manufacturing industry is then presented. Additionally, the data maturity model and descriptive algorithms are applied in practice, and the results are discussed. Building on the methodical approach of the Data Mining Method for Engineers (DMME), which is based on CRISP-DM, the first theoretical focus of this thesis is data integration. Using the PERFoRM guideline for system architecture, a comprehensive overview of the requirements for system architecture and middleware in modern manufacturing processes is given. This chapter concludes with an overview of some of the most widely used middleware solutions. Subsequently, the mechanics of data storage are described, covering relational (SQL) and non-relational (NoSQL) database structures as well as popular representatives of each group. Besides data integration and storage, data quality assessment and assurance form a further focus of this master’s thesis. To this end, the different data quality dimensions are highlighted, with particular attention to sensor data quality. In addition, methods for automated assessment and improvement of data quality are presented. The data maturity model is then introduced as a method for determining whether business problems can be answered through data analytics.
To enable analysis of the available data, the theoretical part concludes with a description of methods applicable to the data at hand. These techniques are aligned with the practical part of the master’s thesis and therefore focus on exploratory data analytics (e.g., principal component analysis) and descriptive algorithms such as clustering methods. The practical part includes a critical review of the manufacturing company’s current solution for data integration and storage. In addition, a scalable, high-performance IT infrastructure that allows predictive analytics to be implemented at large scale is presented. The data maturity model is then applied to the available data, giving insight into the company’s ability to implement data analytics. Although the data maturity model indicates a high degree of maturity in most aspects, the data quantity dimension restricts the number of algorithms that can be applied. Hence, the applied algorithms are exploratory and descriptive in nature. Nevertheless, their results are promising and can serve as a proof of concept for future applications of predictive analytics.

KW  - Data Integration

KW  - Data Storage

KW  - Data Analytics

KW  - Data Quality

KW  - Data Conditioning

KW  - Principal Component Analysis

KW  - Datenanalyse

KW  - Datenintegration

KW  - Datenspeicherung

KW  - Digitalisierung

M3  - Master's Thesis

ER  - 