
IBM's ModelMesh goes open-source, enabling developers to deploy AI models at scale


Model serving is a critical component of AI use cases: it is the process of returning an inference from an AI model in response to a user request. Anyone who has worked on enterprise-grade machine learning applications knows that it is usually not a single model producing an inference, but hundreds or even thousands of models running in tandem. Serving them is computationally expensive because you cannot spin up a dedicated container every time you need to serve a request. This poses a challenge for developers deploying a large number of models across Kubernetes clusters, which impose limits such as the maximum number of pods and IP addresses, as well as constraints on compute resource allocation.

IBM solved this challenge with its proprietary ModelMesh model-serving management layer for Watson products such as Watson Assistant, Watson Natural Language Understanding, and Watson Discovery. Since these models have been running in production environments for several years, ModelMesh has been thoroughly tested for various scenarios. Now, IBM is contributing this management layer to open-source complete with controller components as well as model-serving runtimes.

ModelMesh enables developers to deploy AI models on top of Kubernetes at "extreme scale". It provides cache management and also acts as a router that load-balances inference requests. Models are intelligently placed in pods and are resilient to temporary outages. ModelMesh deployments can be upgraded with ease, without any external orchestration mechanism; it automatically ensures that a model has been fully updated and loaded before routing new requests to it.
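For context, the open-sourced ModelMesh Serving integrates with KServe's InferenceService custom resource. Below is a minimal sketch, using the official Kubernetes Python client, of how registering a model in ModelMesh mode might look; the namespace, model name, storage key, and model path are hypothetical placeholders, not IBM's configuration.

```python
# Sketch: registering a model with ModelMesh Serving via KServe's
# InferenceService custom resource, using the Kubernetes Python client.
# The namespace, model name, storage key, and path are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a pod
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "example-sklearn-model",
        "annotations": {
            # Ask KServe to serve this model through ModelMesh rather than
            # a dedicated deployment per model.
            "serving.kserve.io/deploymentMode": "ModelMesh",
        },
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                # Model artifacts live in external storage that a ModelMesh
                # serving runtime can load into its cache on demand.
                "storage": {"key": "localMinIO", "path": "sklearn/example-model.joblib"},
            }
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="modelmesh-serving",
    plural="inferenceservices",
    body=inference_service,
)
```

Because many models share the same pool of serving-runtime pods, registering another model this way does not spin up another container; ModelMesh loads it into, or evicts it from, an existing runtime's cache as traffic demands.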

Explaining the scalability of ModelMesh with some statistics, IBM went on to say:

One ModelMesh instance deployed on a single worker node 8vCPU x 64G cluster was able to pack 20K simple-string models. On top of the density test, we also load tested ModelMesh serving by sending thousands of concurrent inference requests to simulate a high-traffic holiday season scenario in which all loaded models respond with single-digit millisecond latency. Our experiment showed that the single worker node supports 20k models at up to 1000 queries per second and responds to inference requests with single-digit millisecond latency.
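As a rough illustration of what such a load test involves (this is not IBM's benchmark harness), the sketch below fires concurrent inference requests at a hypothetical ModelMesh REST endpoint and summarizes per-request latency; the URL, port, and payload are assumptions.

```python
# Illustrative load-test sketch (not IBM's benchmark code): send concurrent
# inference requests to a hypothetical ModelMesh REST endpoint and record
# per-request latency. The endpoint URL and payload are placeholders.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://modelmesh-serving:8008/v2/models/example-sklearn-model/infer"
PAYLOAD = {
    "inputs": [
        {"name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0] * 64}
    ]
}

def one_request(_):
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=5)
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000  # latency in milliseconds

# Send 1,000 requests across 50 concurrent workers and summarize latency.
with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = list(pool.map(one_request, range(1000)))

print(f"p50={statistics.median(latencies):.1f} ms, max={max(latencies):.1f} ms")
```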

IBM has contributed ModelMesh to the KServe GitHub organization, a project it developed jointly with Google, Bloomberg, NVIDIA, and Seldon back in 2019. You can check out the ModelMesh implementation contributions in the repositories under the KServe organization on GitHub.
