My first assumption was that I have to implement an interface that is list-like (in order to receive a list of `np.ndarray`). In that method I log the length of the incoming list, and no matter how many concurrent requests come in, the length never changes.

After that I wanted to discover how batching works, and in `bentoml._internal.runner.runner_handle.local.py` I found the class `LocalRunnerRef` with this implementation:
```python
class LocalRunnerRef(RunnerHandle):
    def __init__(self, runner: Runner) -> None:  # pylint: disable=super-init-not-called
        self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
        self._limiter = None

    async def is_ready(self, timeout: int) -> bool:
        return True

    def run_method(
        self,
        __bentoml_method: RunnerMethod[t.Any, P, R],
        *args: P.args,
        **kwargs: P.kwargs,
    ) -> R:
        if __bentoml_method.config.batchable:
            inp_batch_dim = __bentoml_method.config.batch_dim[0]
            payload_params = Params[Payload](*args, **kwargs).map(
                lambda arg: AutoContainer.to_payload(arg, batch_dim=inp_batch_dim)
            )
            if not payload_params.map(lambda i: i.batch_size).all_equal():
                raise ValueError(
                    "All batchable arguments must have the same batch size."
                )
        return getattr(self._runnable, __bentoml_method.name)(*args, **kwargs)
```
I don't understand why `payload_params` is not passed to the method invocation, if I am right in my assumption that `getattr(self._runnable, __bentoml_method.name)(*args, **kwargs)` invokes the custom Runnable's `predict` method.
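As far as I can tell (this is my reading of the code, so treat it as an assumption), in this local in-process path the payload conversion is used only to validate that all batchable arguments agree on batch size; the original `*args`/`**kwargs` are then passed through unchanged, and no cross-request fusing happens here. A toy re-creation of just that validation step, using plain NumPy arrays in place of BentoML's `Params`/`AutoContainer` machinery:

```python
import numpy as np


def batch_sizes(args: list[np.ndarray], batch_dim: int = 0) -> list[int]:
    # Stand-in for mapping each argument to a payload and reading its
    # batch_size along the configured batch dimension.
    return [a.shape[batch_dim] for a in args]


def all_equal(xs: list[int]) -> bool:
    # Stand-in for payload_params.map(...).all_equal().
    return len(set(xs)) <= 1


a = np.zeros((4, 3))
b = np.zeros((4, 5))
c = np.zeros((2, 3))

assert all_equal(batch_sizes([a, b]))      # both batch size 4: check passes
assert not all_equal(batch_sizes([a, c]))  # 4 vs 2: run_method would raise ValueError
```

So the check guards against mismatched batchable inputs, but the actual adaptive batching (fusing requests from different callers) would have to happen elsewhere, before `run_method` is reached.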
I have been getting into BentoML these last few days and can't fully understand how adaptive batching works. I want to create a custom runner with batching enabled, but I don't see any benefit from `batchable=True`. For example, https://github.com/bentoml/gallery/tree/main/custom_runner/torch_hub_yolov5 doesn't behave as I expected; I was trying to measure an RPM/response-time improvement using the Locust framework.

I also tried to run https://github.com/bentoml/gallery/tree/main/pytorch_yolov5_torchhub with different configs in the bento_configuration.yaml file, just switching `enabled:` to true or false, and didn't see any difference at all.
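For reference, the per-runner batching section I was toggling looks roughly like this (a sketch based on my understanding of the BentoML 1.x configuration schema; the runner name `yolo_v5` and the exact field values are placeholders to verify against the docs for your version):

```yaml
runners:
  yolo_v5:
    batching:
      enabled: true
      max_batch_size: 64
      max_latency_ms: 10000
```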