With LLMs as popular as they are right now, there are many different tools for what seem to be very similar tasks: generating responses from an LLM and fine-tuning LLMs for your own needs. To understand this market better myself, I decided to write this post, summarizing what I already knew about the topic and what I have learned recently. All the information presented here is accurate as of April 2025 and may become outdated, as the ecosystem around LLMs changes quickly.

llama.cpp

First, I want to mention llama.cpp, which is used as a backend by several other popular tools and can also be used on its own for user-friendly, high-throughput inference.

[Figure: the role of llama.cpp in the LLM inference ecosystem]

Disclaimer: this is not true for ALL LLM inference tools, as some of them use custom backends or other backends such as TensorRT-LLM, but it is hard to overstate the importance of llama.cpp for the LLM inference ecosystem.

| Tool | License | Open Source | Platforms | Supported Models |
|---|---|---|---|---|
| llama.cpp | MIT | ✅ Yes | Windows, Docker, macOS, Linux | GGUF models |

Links:
https://github.com/ggml-org/llama.cpp
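
To give a feel for what local inference on top of llama.cpp looks like, here is a minimal sketch using the llama-cpp-python bindings. The model path and generation parameters are placeholders, not part of any specific setup described above.

```python
# Minimal local inference sketch using the llama-cpp-python bindings
# (pip install llama-cpp-python). The model path below is a placeholder;
# point it at any GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=128,
    stop=["Q:"],       # stop before the model starts a new question
)
print(output["choices"][0]["text"])
```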

User-Friendly LLM Interfaces

These tools focus on user experience. Most come with GUIs, are relatively simple to install, and allow you to quickly start chatting with an LLM. Some also support fine-tuning and provide API access to local models. Many of these tools use llama.cpp or similar backends and are compatible with GGUF quantized models. Most are open source and support local inference, though multi-GPU support is usually limited or dependent on the backend used.

| Tool | License | Open Source | Platforms | Supported Models | Backend Used |
|---|---|---|---|---|---|
| LM Studio | Commercial | ❌ No | Windows, macOS, Linux | Various (GGUF and MLX), no image generation | Unconfirmed |
| Ollama | MIT | ✅ Yes | Windows, macOS, Linux | https://ollama.com/library | llama.cpp |
| GPT4All | MIT | ✅ Yes | Windows, macOS, Linux | 3–13B models | llama.cpp |
| AnythingLLM | MIT | ✅ Yes | Windows, macOS, Linux | Various via integrations | Unconfirmed |
| Jan | AGPL-3.0 | ✅ Yes | Windows, macOS, Linux | https://cortex.so/models/ | cortex.cpp (configurable backends; default is llama.cpp) |

Links:
https://lmstudio.ai/work
https://ollama.com
https://docs.gpt4all.io/gpt4all_desktop/models.html
https://github.com/Mintplex-Labs/anything-llm
https://github.com/menloresearch/jan
https://cortex.so/models/
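
As an example of the API access to local models mentioned above, here is a hedged sketch that calls Ollama's REST endpoint with Python's requests library. It assumes Ollama is running locally on its default port and that a model named llama3 has already been pulled; both are illustrative choices.

```python
# Sketch of calling a locally running Ollama server (default port 11434).
# Assumes you have already run `ollama pull llama3`; the model name is an example.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain what a GGUF file is in one sentence.",
        "stream": False,   # return the full response at once instead of streaming
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```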

Tools for High-Performance Inference

These tools are mostly used in production for high-performance inference and support quantization, batching, and other optimization techniques. Most of them provide Docker images as well.

| Tool | License | Open Source | Platforms | Supported Models | Backend Used |
|---|---|---|---|---|---|
| TGI | Apache 2.0 | ✅ Yes | Docker, Linux | https://huggingface.co/docs/text-generation-inference/supported_models | Supports several backends |
| vLLM | Apache 2.0 | ✅ Yes | Docker, Linux | https://docs.vllm.ai/en/latest/models/supported_models.html | Custom backend with PagedAttention |
| ExLlamaV2 | MIT | ✅ Yes | Linux | EXL2-quantized models on Hugging Face (exl2 suffix) | Custom backend |
| TensorRT-LLM | Apache 2.0 | ✅ Yes | Linux | https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html | PyTorch and TensorRT backends |

Links:
https://github.com/turboderp-org/exllamav2
https://blog.vllm.ai/2023/06/20/vllm.html
https://github.com/huggingface/text-generation-inference
https://huggingface.co/docs/text-generation-inference/multi_backend_support
https://nvidia.github.io/TensorRT-LLM/torch.html
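
To give a sense of the code-level experience with these engines, here is a minimal offline batched-inference sketch using vLLM's Python API. The model ID is just an example; a production deployment would more likely serve requests through vLLM's OpenAI-compatible server instead.

```python
# Minimal offline batched-inference sketch with vLLM (pip install vllm).
# The model ID is an example; any supported Hugging Face model works.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the idea of speculative decoding in one sentence.",
    "What is PagedAttention?",
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model ID
outputs = llm.generate(prompts, sampling_params)     # prompts are batched internally

for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```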

Tools for High-Performance Fine-Tuning

These tools are used for fine-tuning, i.e., training existing models for your specific needs. They support quantization, multi-GPU setups, and different model families, and they take different approaches to fine-tuning: a code-first approach in Unsloth versus configuration files and a command-line interface in Torchtune.

| Tool | License | Open Source | Platforms | Supported Models |
|---|---|---|---|---|
| Unsloth | Apache 2.0 | ✅ Yes | Linux, Windows | https://docs.unsloth.ai/get-started/all-our-models |
| Torchtune | BSD-3-Clause | ✅ Yes | Linux | https://github.com/pytorch/torchtune |
| Axolotl | Apache 2.0 | ✅ Yes | Linux | https://github.com/axolotl-ai-cloud/axolotl |

Links:
https://github.com/pytorch/torchtune
https://unsloth.ai/introducing
https://github.com/axolotl-ai-cloud/axolotl
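
To illustrate the code-first style mentioned above, here is a hedged sketch of loading a model with Unsloth and attaching LoRA adapters. The model name, sequence length, and LoRA hyperparameters are illustrative defaults rather than recommendations, and the actual training step would typically follow with trl's SFTTrainer.

```python
# Sketch of the code-first fine-tuning style using Unsloth (pip install unsloth).
# Model name and hyperparameters are illustrative; training itself would
# typically be driven by trl's SFTTrainer on top of the returned model.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # example 4-bit base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```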