Current developments, including the high availability of data and ever-rising computing power, constantly enable new approaches in the field of artificial intelligence. By using machine learning algorithms, an instance can iteratively learn from data and perform cognitive tasks. Because a large number of machine learning algorithms already exist and that number is still growing rapidly, researchers, data scientists, and machine learning engineers must choose which algorithm to apply and optimize for their individual problem. In most cases, the selection of a specific algorithm appears to be motivated primarily by prediction performance and to depend on the situational tendencies of the user. For use in non-productive environments, these motives may initially be sufficient. For operational applications, however, they do not provide satisfactory solutions, as they represent a one-sided perspective and an unstructured approach. Thus, when productive applications are developed, machine learning algorithms are not compared methodically in a way that considers factors of the data basis, the operational view, and the explainability of a machine learning model in addition to the commonly used metrics for evaluating predictive performance. To close this gap, we follow a design science research approach and develop two artifacts: first, a structured benchmarking procedure for the comparison of supervised machine learning algorithms, the particular paradigm of machine learning on which we focus; and second, a list of criteria, implemented in the benchmarking model, for identifying the most appropriate supervised machine learning algorithm from a holistic perspective. To provide robust, practical, and user-friendly artifacts, we validate our results in a four-step approach that includes, for example, a discussion with research experts and is prospectively completed by a real-world application of the model.
Our results contribute a structured, generic procedure that supports the benchmarking of supervised machine learning algorithms and provides users with benchmarking-relevant dimensions for identifying the most appropriate supervised machine learning algorithm for their individual use case.