About


Mantik: A workflow platform for Machine Learning on High-Performance Computers

The use of machine learning (ML) is growing rapidly, and many scientific applications rely on high-performance computing (HPC) infrastructure to train large models. However, the tooling for deploying models for training or inference on HPC infrastructure is unsatisfactory: existing toolsets do not address reproducibility, collaboration, or monitoring of ML models. With Mantik, we provide an open-source cloud platform that simplifies the development of, and collaboration on, ML models on HPC facilities and enhances reproducibility by supporting data and code versioning as well as experiment tracking.

Users can develop their applications in the environment they are most comfortable with: their local machine. Working with their preferred IDE and the most recent software versions lets them leverage the full potential of the software stack for their research. Mantik’s remote file service simplifies managing and keeping track of data in remote storage. As soon as an application is ready for training or inference, users can immediately submit it to an HPC cluster. During application development, users can train and/or evaluate their models on HPC clusters via the CLI on their local machine or via our browser-based Mantik cloud platform. The latter requires only an internet browser, so that even ML training from a phone becomes feasible. Once training or inference has begun, users can monitor the application in real time on the Mantik cloud platform.

References

T. Seidler, F. Emmerich, K. Ehlert, R. Berner, O. Nagel-Kanzler, N. Schultz, M. Quade, M. G. Schultz, M. Abel, “Mantik: A Workflow Platform for the Development of Artificial Intelligence on High-Performance Computing Infrastructures”, submitted to The Journal of Open Source Software (JOSS).