Distributional Framework for the Shapley Value in Context of Data Valuation Methods for Machine Learning Applications

Alexander Münker

Verified Thesis

Abstract

This thesis focuses on data valuation for a medical classification data set in a machine learning application. Recent research has produced data valuation methods that use a notion from cooperative game theory, the Shapley value, to evaluate the worth of data points within a data set. A completely new approach, the Distributional Framework, was developed in 2020. The method is also based on the Shapley value and provides new properties. The goal of this thesis is to evaluate its empirical effectiveness in performing data valuation. Furthermore, as this new approach seems able to influence the way data can be traded, it may stimulate future research on one of the most important challenges for machine learning and Artificial Intelligence: finding a way to obtain sufficient data for training machine learning models. After implementing the Distributional Framework on the “AIforCOVID” data set for a binary classification task, several experiments are performed. In conclusion, the Distributional Framework seems able to value data successfully with regard to value assignment. Still, it faces some challenges concerning runtime and applicability, which can be seen as potential goals for future research.

Topics

Machine Learning Data Valuation

Research Methods

Publication Data

Author: Alexander Münker

Signing Author Pub-Key: 0x5d2aDFfC3DE581801E0B5753c962E1Afa5b7B438

Thesis Type: Bachelor's Thesis

Pages: 68

Language: English

DOI:

About the Author:

Major / Study Program: Industrial Engineering and Management

Primary Field of Study:

Additional Study Interests:

Publication Contract: 0xcAee71c031999e081E4681cD308426290e5b01c0

License: CC BY 4.0

Date of Publication: 02/23/22

Status: Available

Date of Grading: 12/22/21

Institution: Karlsruhe Institute of Technology, Germany

Endorsements

#	Name	Details	Endorsement
1	Konstantin Pandl (Supervisor)	Karlsruhe Institute of Technology; Email: konstantin.pandl@kit.edu; Web: https://www.aifb.kit.edu/web/Konstantin_Pandl; Pub-Key: 0xa43190EF71561A9229D8ddC5449d68FFE518b1D8	02/22/22 12:00:00 AM

Thesis Documents and Supplemental Materials

#	Description	Type	Upload Date	Location
1	Thesis Document	PDF (7.89MB)	02/22/22 12:00:00 AM	IPFS