Distributional Framework for the Shapley Value in Context of Data Valuation Methods for Machine Learning Applications

Alexander Münker

Abstract

This thesis focuses on data valuation for a medical classification data set in a machine learning application. Recent research has produced data valuation methods that use a notion from cooperative game theory, the Shapley value, to evaluate the worth of data points within a data set. A new approach, the Distributional Framework, was developed in 2020. The method is also based on the Shapley value and provides new properties. The goal of this thesis is to evaluate its empirical effectiveness in performing data valuation. Furthermore, as this new approach seems able to influence the way data can be traded, it may stimulate future research on one of the most important challenges for machine learning and artificial intelligence: obtaining sufficient data for training machine learning models. After implementing the Distributional Framework on the “AIforCOVID” data set for a binary classification task, several experiments are performed. In conclusion, the Distributional Framework seems able to value data successfully with respect to value assignment. Still, it faces challenges concerning runtime and applicability, which can be seen as potential goals for future research.

Topics
Machine Learning Data Valuation
Research Methods

Publication Data

Author: Alexander Münker
Thesis Type: Bachelor's Thesis
Pages: 68
Language: English
DOI:
About the Author:
Major / Study Program: Industrial Engineering and Management
Primary Field of Study:
Additional Study Interests:
License: CC BY 4.0
Date of Publication: 02/23/22
Status: Available
Date of Grading: 12/22/21
Institution: Karlsruhe Institute of Technology, Germany

Endorsements

# Name Details Endorsement
1 Konstantin Pandl Supervisor 02/22/22 12:00:00 AM

Thesis Documents and Supplemental Materials

11/30/22 09:10:38 AM
# Description Type Upload Date Location
1 Thesis Document PDF (7.89MB) 02/22/22 12:00:00 AM IPFS