👉MMMLM 🧠💥 Multi-Model for Machine Learning Metadata
by
Matthew von-Maszewski, Sr. Architect, ArangoDB
With the rapid rise of data science, the machine learning platforms being built are becoming more complex. You might be aware of the importance of high-quality training data for machine learning, but in a production setup the metadata is equally important. This includes dataset versions, versions of Jupyter notebooks, training parameters, test and training accuracy, features, model-serving statistics, and much more.
It is critical to have a common view across all the metadata in a production setup, because we need to determine which Jupyter notebook was used to build the model currently running in production, whether there is new data for a given dataset, and whether the models currently serving in production need to be updated.
One particular challenge with metadata is choosing the data model. Individual data entities, such as the statistics of a training run, are typically unstructured and would be a perfect fit for a document database. Other scenarios require capturing the relations between different entities, for example to determine which models have been derived from a dataset, and those are better served by a graph model.
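To make the two shapes of metadata concrete, here is a minimal sketch in plain Python. It is not the API discussed in the talk and uses no database client; every identifier, field name, and edge label below is a hypothetical assumption chosen for illustration. Entities are stored as schema-free documents, relations are stored as edges, and lineage questions become graph traversals.

```python
from collections import defaultdict, deque

# Documents: schema-free records. All ids and fields are hypothetical.
documents = {
    "datasets/mnist-v1": {"type": "dataset", "version": 1},
    "notebooks/train-v3": {"type": "notebook", "version": 3},
    "notebooks/train-v4": {"type": "notebook", "version": 4},
    "runs/run-17": {"type": "training_run", "params": {"lr": 0.01}, "test_accuracy": 0.97},
    "runs/run-18": {"type": "training_run", "params": {"lr": 0.001}},
    "models/digits-a": {"type": "model", "in_production": True},
    "models/digits-b": {"type": "model", "in_production": False},
}

# Edges: (source id, target id, label). Labels are illustrative only.
edges = [
    ("datasets/mnist-v1", "runs/run-17", "used_by"),
    ("datasets/mnist-v1", "runs/run-18", "used_by"),
    ("notebooks/train-v3", "runs/run-17", "executed_as"),
    ("notebooks/train-v4", "runs/run-18", "executed_as"),
    ("runs/run-17", "models/digits-a", "produced"),
    ("runs/run-18", "models/digits-b", "produced"),
]

# Build adjacency lists for both traversal directions.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for src, dst, _label in edges:
    outgoing[src].append(dst)
    incoming[dst].append(src)


def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def derived_models(dataset_id):
    """Models derived, directly or indirectly, from a dataset."""
    return sorted(n for n in reachable(dataset_id, outgoing)
                  if documents[n]["type"] == "model")


def notebooks_behind(model_id):
    """Notebooks that contributed to building a model."""
    return sorted(n for n in reachable(model_id, incoming)
                  if documents[n]["type"] == "notebook")
```

Calling `derived_models("datasets/mnist-v1")` follows the edges forward from a dataset, while `notebooks_behind("models/digits-a")` follows them backward from a model. The documents themselves stay free-form, so one training run can record an accuracy figure that another omits.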
In this presentation we will explore how a multi-model approach can help us manage ML pipelines at scale, in particular Machine Learning Metadata (MLMD), which is part of the TensorFlow ecosystem. We will also propose a first draft of an MLMD-compatible universal Metadata API and demo the first implementation of this API using ArangoDB.
⁃⁃⁃ Matthew von-Maszewski, Sr. Architect, ArangoDB ⁃⁃⁃
Matthew is a Senior Architect @ ArangoDB and a C/C++ developer. His prior experience includes Basho Technologies (Riak), Akamai, and Intuit. Work has ranged from 4-bit microcontrollers to 30+ server DB clusters. When not coding, Matthew is often found running a marathon or participating in a triathlon.
#MachineLearning #NoSQL #Metadata #ArangoDB #jupyter #dataModel #TensorFlow