Replies: 2 comments
- We're pursuing the last approach in #177
- #177 is near completion and there is much more clarification and analysis there. Closing.
Background
This is somewhat related to #5 in that it will affect how we proceed with it. It's more closely related to #151 in that some decisions made here may affect decisions there. In any case, decisions made here should supersede any made in #5 and #151.
So. The kinda-sorta plan for now, and what we actually do, is to have (our forks of) third party inference libs as submodules here and add them as subdirectories in the CMake scripts. We also deduplicate any common dependencies we encounter: we use a single instance of ggml for both llama.cpp and whisper.cpp, and we rewrote their "common" libs to avoid their third party deps. Deduplicating the deps is motivated not only by frugality but, more importantly, by the need to avoid potential ODR violations and CMake target clashes.
ODR violations are very important to avoid, but if we wrap the inference libs in shared-lib plugins where all their deps are private, they should be a non-issue.
CMake target clashes are a problem, though. CMake requires target names to be globally unique, so if we hadn't deduplicated ggml, both llama.cpp and whisper.cpp would have added that target and there would have been a clash.
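To make the clash concrete, here is a minimal sketch of the dedup in CMake terms. The paths and the exact mechanism by which our forks consume the external ggml are illustrative, not our actual scripts:

```cmake
# Illustrative sketch only; the paths and the fork patching are glossed over.
# The naive approach clashes: both submodules bundle their own ggml
# subdirectory, and each would try to define a target named `ggml`:
#
#   add_subdirectory(llama.cpp)   # defines target `ggml`
#   add_subdirectory(whisper.cpp) # error: target `ggml` already exists
#
# So instead we add a single shared ggml instance once...
add_subdirectory(ggml)

# ...and our forks of the inference libs are patched to link against the
# existing `ggml` target rather than adding their bundled copies.
add_subdirectory(llama.cpp)
add_subdirectory(whisper.cpp)
```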
Issue
Supporting two inference libs is workable for now, but as we add more, juggling the target clashes will become harder. Even now there are occasional issues where ggml in llama.cpp has changes incompatible with whisper.cpp. Having 3, 4, 5... 10 inference libs would make supporting the deduplication of ggml alone a nightmare, not to mention whatever other deps need deduplicating. It would also impose a lot of effort on inference lib contributors, as they would have to support our build system and deduplicate what's needed on their end.
And that's not all. We do want to explore inference libs like https://github.com/ikawrakow/ik_llama.cpp (optimized for ARM) or https://github.com/Mozilla-Ocho/llamafile (with x86 and ARM optimizations), and maybe others which have their own forks of ggml, all incompatible with upstream. Deduplicating ggml for them would be super hard to impossible (super hard if we try to do what they didn't bother to do: integrate their ggml changes into our fork).
Solutions
There are several possible solutions. Well, they are all on the spectrum from "do everything" to "do nothing", so I'll just present some samples from that spectrum to give an idea of what can be done:
Don't use third party inference libs
We could just implement all inference code ourselves. We could start by entirely merging the submodules into this repo and completely taking over the code. This is mentioned in #5, and even finer-grained nuances of it exist (more below). Pushing this further, we could ditch even ggml and work on our own inference library...
Anyway. This is towards the "do everything" end. We certainly don't have the budget and, more importantly, time to pursue this, cool as it is.
Continue with the current strategy
This is "do nothing" only in terms of decision making. In terms of future work it is more along the lines of "do A LOT".
We could continue deduplicating. Merging third party forks. Even doing PRs with the merges. This is the more open-source-community-friendly path: our efforts would empower projects similar to ours, and we could potentially lead the way toward standardizing inference library interop within the C++ ecosystem.
This is still a huge effort. Just juggling it seems like a full-time job for at least one person, if not more.
Don't use third party CMake
We could ignore the CMake files of the third party inference libs and provide our own. Our CMake scripts would name the targets appropriately and wrap the libs into mostly opaque plugins.
This too is mentioned in #5. It's unity-build friendly and allows some leeway when trying to use private library functionality. It's still a lot of CMake work, though, and upstream changes would have to be monitored and manually merged on our end, leading to a continuous support burden for every inference lib.
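For illustration, a rough sketch of this approach with hypothetical target names and paths (a real script would list sources explicitly rather than globbing):

```cmake
# Hypothetical sketch: ignore the upstream CMakeLists and list the sources
# ourselves, under target names that we control and that cannot clash.
file(GLOB llamaGgmlSources third-party/llama.cpp/ggml/src/*.c)
add_library(ac-llama-ggml STATIC ${llamaGgmlSources})
target_include_directories(ac-llama-ggml PUBLIC third-party/llama.cpp/ggml/include)

# Wrap the inference lib into a mostly opaque plugin. All third party deps
# stay PRIVATE, so they never leak into consumers (which also keeps the
# ODR surface small, per the shared-lib argument above).
add_library(ac-llama-plugin MODULE code/llama-plugin.cpp)
target_link_libraries(ac-llama-plugin PRIVATE ac-llama-ggml)
```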
Don't use third party code
Now, this is more or less "do nothing" in the context of this particular issue, but it would still involve a significant amount of work. That work, however, would be one-shot: once it's done, there would be a procedure for adding third party libs, one which doesn't involve continuous manual support for upstream updates.
This is essentially completely detaching third party inference libs from this codebase. The only one to remain would be `dummy`, which would be a reference implementation of inference lib integration. To add an inference lib, one would create a separate repo whose build produces a mostly opaque, self-contained plugin which implements the AC Local Programming Language API and introduces its own Inference API. The code would never be part of `ac-local`, as it should be free to use CMake targets with no regard for other plugins or for ac-local itself. Thus we physically circumvent both ODR violations and CMake target clashes. Well... at least we make it possible to do so.
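To illustrate, the top-level CMakeLists.txt of such an external plugin repo might look something like the sketch below. All names here are made up; the point is that the repo owns its entire build:

```cmake
# Hypothetical top-level CMakeLists.txt of a standalone plugin repo.
# Nothing else lives in this build, so upstream target names (ggml,
# llama, ...) can be used as-is with no risk of clashing with ac-local
# or with any other plugin.
cmake_minimum_required(VERSION 3.22)
project(ac-plugin-ik-llama)

add_subdirectory(ik_llama.cpp) # the upstream fork, its own ggml and all

# The plugin itself: a self-contained shared module, everything private.
add_library(aclp-ik-llama MODULE src/plugin-entry.cpp)
target_link_libraries(aclp-ik-llama PRIVATE llama)
```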
Why is this a lot of work, then? Because it's very dangerous, and much work needs to be put into making it less so.
We have a C++ interface, and having a separate repo which binds to an existing C++ interface is dangerous because of potential ODR violations (again). The main effort in supporting this design would be making sure that the plugin is compatible with the core. I think that diligent manual versioning is not enough and that some automation needs to exist to make sure we're doing the right thing.
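As a stand-in for that automation (nothing like this exists yet), even bluntly pinning the exact core version at configure time would catch mismatches early, assuming ac-local ships a CMake package config:

```cmake
# Hypothetical guard in a plugin repo: pin the exact ac-local version whose
# C++ interface the plugin was written against, so a mismatch fails loudly
# at configure time instead of as a silent ODR/ABI break at runtime.
# (Package name, version, and imported target are all assumptions.)
find_package(ac-local 0.1.2 EXACT REQUIRED)
target_link_libraries(aclp-ik-llama PRIVATE ac-local::ac-local)
```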
Conclusion
This is what I could think of. And, really, only the last two options are actually feasible, unless our budget explodes for some reason.
So, disregarding the first two, I don't like any of the rest. If I have to choose, I would go with the last option, as it imposes the least effort on external inference lib contributors, but I still don't like it.
I hope a better idea comes along before we have to make a decision.