Use and limitations of various metrics to assess the quality of extreme sparse datasets in geotechnics

Research output: Thesis (Master's Thesis)

BibTeX

@mastersthesis{7d66fbbb3dfd44d79d1faacac005c94b,
title = "Use and limitations of various metrics to assess the quality of extreme sparse datasets in geotechnics",
abstract = "In data science and statistics, metrics are measures used for the quantitative assessment of datasets. In machine learning (ML), metrics are used to monitor the performance of a model during training and testing (and are therefore sometimes called ``performance metrics'') by calculating a distance between predicted and true outputs. Every ML model needs a metric to assess its accuracy in mapping the inputs X to the outputs y. The ML task can be classification or regression, and the performance metrics differ accordingly. Classification is a supervised learning method that predicts qualitative responses. A classification problem requires that examples be assigned to one of a finite number of classes. Thus, classification maps the input variables to discrete output variables. Regression is a supervised learning method used to determine the relationship between independent variables X and dependent variable(s) y. A regression model maps the input variables to one or more continuous output variables. There are several metrics for both problems. To mention a few: regression metrics include mean absolute error, mean squared error, root mean squared error, and R$^2$; classification metrics include accuracy, precision/recall combinations, and AUROC (area under the receiver operating characteristic curve). To understand how close the results are to the objectives in research and development projects, choosing an appropriate evaluation metric for each class of ML task is crucial. In geoengineering, datasets often exhibit extreme sparsity and low observation frequency (e.g., rare events). Therefore, applying either ML task to such data requires special preprocessing (e.g., under-sampling, over-sampling, compressing). After ML models are trained on the preprocessed data, their output should be evaluated using a metric that provides the most comprehensive evaluation of the results.",
keywords = "metrics, machine learning, geotechnics",
author = "Matthias Hahn",
note = "no embargo",
year = "2023",
doi = "10.34901/2023.04",
language = "English",
school = "Montanuniversitaet Leoben",

}

RIS (suitable for import to EndNote)

TY - THES

T1 - Use and limitations of various metrics to assess the quality of extreme sparse datasets in geotechnics

AU - Hahn, Matthias

N1 - no embargo

PY - 2023

Y1 - 2023

AB - In data science and statistics, metrics are measures used for the quantitative assessment of datasets. In machine learning (ML), metrics are used to monitor the performance of a model during training and testing (and are therefore sometimes called "performance metrics") by calculating a distance between predicted and true outputs. Every ML model needs a metric to assess its accuracy in mapping the inputs X to the outputs y. The ML task can be classification or regression, and the performance metrics differ accordingly. Classification is a supervised learning method that predicts qualitative responses. A classification problem requires that examples be assigned to one of a finite number of classes. Thus, classification maps the input variables to discrete output variables. Regression is a supervised learning method used to determine the relationship between independent variables X and dependent variable(s) y. A regression model maps the input variables to one or more continuous output variables. There are several metrics for both problems. To mention a few: regression metrics include mean absolute error, mean squared error, root mean squared error, and R²; classification metrics include accuracy, precision/recall combinations, and AUROC (area under the receiver operating characteristic curve). To understand how close the results are to the objectives in research and development projects, choosing an appropriate evaluation metric for each class of ML task is crucial. In geoengineering, datasets often exhibit extreme sparsity and low observation frequency (e.g., rare events). Therefore, applying either ML task to such data requires special preprocessing (e.g., under-sampling, over-sampling, compressing). After ML models are trained on the preprocessed data, their output should be evaluated using a metric that provides the most comprehensive evaluation of the results.

KW - metrics

KW - machine learning

KW - geotechnics

U2 - 10.34901/2023.04

DO - 10.34901/2023.04

M3 - Master's Thesis

ER -