Abstract
This thesis focuses on data valuation for a medical classification data set in a machine learning application. Recent research has produced data valuation methods that use a notion from cooperative game theory, the Shapley value, to assess the worth of individual data points within a data set. A new approach, the Distributional Framework, was developed in 2020. It is also based on the Shapley value and provides additional properties. The goal of this thesis is to evaluate its empirical effectiveness in performing data valuation. Furthermore, because this approach may influence how data can be traded, it may stimulate future research on one of the most important challenges for machine learning and artificial intelligence: obtaining sufficient data to train machine learning models. After implementing the Distributional Framework on the “AIforCOVID” data set for a binary classification task, several experiments are performed. In conclusion, the Distributional Framework appears to valuate data successfully with respect to value assignment. Still, it faces challenges concerning runtime and applicability, which future research could address.
Publication Data
Endorsements
# | Name | Details | Endorsement |
---|---|---|---|
1 | Konstantin Pandl | Supervisor | 02/22/22 12:00:00 AM |