Use and limitations of various metrics to assess the quality of extreme sparse datasets in geotechnics
Publications: Thesis / Student theses and habilitation theses › Master's thesis
2023.
RIS (suitable for import to EndNote)
TY - THES
T1 - Use and limitations of various metrics to assess the quality of extreme sparse datasets in geotechnics
AU - Hahn, Matthias
N1 - no embargo
PY - 2023
Y1 - 2023
AB - In data science and statistics, metrics are measures used for the quantitative assessment of datasets. In machine learning (ML), metrics are used to monitor the performance of a model during training and testing (and are therefore sometimes called "performance metrics") by calculating a distance between predicted and true outputs. Every ML model needs a metric to assess its accuracy in mapping the inputs X to the outputs y. The ML task can be classification or regression, and the performance metrics differ accordingly. Classification is a supervised learning method that predicts qualitative responses. A classification problem requires that examples are assigned to one of a finite number of classes. Thus, classification maps the input variables to discrete output variables. Regression is a supervised learning method used to determine the relationship between independent variables X and dependent variable(s) y. A regression model maps the input variables to one or more continuous output variables. There are several metrics for both problems. To mention a few: regression metrics include mean absolute error, mean squared error, root mean squared error, and R²; classification metrics include accuracy, precision/recall combinations, and AUROC (area under the receiver operating characteristic curve). To understand how close the results are to the objectives in research and development projects, choosing an appropriate evaluation metric for each class of ML task is crucial. In geoengineering, datasets often exhibit extreme sparsity and low observation frequency (e.g., rare events). Therefore, applying either ML task to such data requires special preprocessing (e.g., under-sampling, over-sampling, compressing). After ML models are trained on the preprocessed data, their output should be evaluated using the metric that provides the most comprehensive evaluation of the results.
KW - metrics
KW - machine learning
KW - geotechnics
U2 - 10.34901/2023.04
DO - 10.34901/2023.04
M3 - Master's Thesis
ER -
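
The regression and classification metrics named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the toy data (two rare events in 100 observations) are assumptions made for illustration and are not taken from the thesis.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R^2 is undefined when the true values have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Convention chosen here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical rare-event labels: 2 events in 100 observations.
y_true = [1, 1] + [0] * 98
# A trivial model that never predicts the rare event.
always_negative = [0] * 100
cls = classification_metrics(y_true, always_negative)

# Hypothetical regression targets and predictions.
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])

print(cls)
print(reg)
```

In this toy case the trivial classifier reaches high accuracy while its recall on the rare class is zero, which is one reason accuracy alone can be misleading on sparse, imbalanced data of the kind the abstract describes.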