Tools and libraries for LLM Fine-Tuning and Inference

Due to LLM popularity at the moment, there are a lot of different tools for what seem to be very similar tasks — generating responses from the LLM and fine-tuning LLMs for your own needs. To understand this market better myself, I decided to write this post, summarizing knowledge that I already had and recently learned on the topic. All the information presented in the post is accurate as of April 2025 and may change, as the ecosystem around LLMs changes quickly.

LLAMA.CPP

First, I want to mention llama.cpp, which is both used as a backend for some other popular tools and could be used on its own for user-friendly and high-throughput inference.

LLM inference llama.cpp importance.

Disclaimer: this is not true for ALL LLM inference tools, as some of them use custom backends or other backends such as TensorRT-LLM, but it is hard to highlight enough the importance of llama.cpp for the LLM inference ecosystem.

Tool	License	Open Source	Platforms	Supported Models
llama.cpp	MIT	✅ Yes	Windows, Docker, macOS, Linux	GGUF models

Links:
https://github.com/ggml-org/llama.cpp

User-Friendly LLM Interfaces

These tools focus on user experience. Most come with GUIs, are relatively simple to install, and allow you to quickly start chatting with an LLM. Some also support fine-tuning and provide API access to local models. Many of these tools use llama.cpp or similar backends and are compatible with GGUF quantized models. Most are open source and support local inference, though multi-GPU support is usually limited or dependent on the backend used.

Tool	License	Open Source	Platforms	Supported Models	Backend Used
LM Studio	Commercial	❌ No	Windows, macOS, Linux	Various (GGUF and MLX), no image generation	Unconfirmed
Ollama	MIT	✅ Yes	Windows, macOS, Linux	https://ollama.com/library	llama.cpp
GPT4All	MIT	✅ Yes	Windows, macOS, Linux	3–13B models	llama.cpp
AnythingLLM	MIT	✅ Yes	Windows, macOS, Linux	Various via integrations	Unconfirmed
Jan	AGPL-3.0	✅ Yes	Windows, macOS, Linux	https://cortex.so/models/	cortex.cpp (supports configurable backends; default is llama.cpp)

Links:
https://lmstudio.ai/work
https://ollama.com
https://docs.gpt4all.io/gpt4all_desktop/models.html
https://github.com/Mintplex-Labs/anything-llm
https://github.com/menloresearch/jan
https://cortex.so/models/

Tools for High-Performance Inference

These tools are mostly used in production for high-performance inference and support quantization, batching, and other optimization techniques. Most of them provide Docker images as well.

Tool	License	Open Source	Platforms	Supported Models	Backend Used
TGI	Apache 2.0	✅ Yes	Docker, Linux	https://huggingface.co/docs/text-generation-inference/supported_models	Supports several backends
vLLM	Apache 2.0	✅ Yes	Docker, Linux	https://docs.vllm.ai/en/latest/models/supported_models.html	Custom with PagedAttention
ExLlamaV2	MIT	✅ Yes	Linux	Hugging Face models with exl2 postfix	Custom backend
TensorRT-LLM	Apache 2.0	✅ Yes	Linux	https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html	PyTorch and TensorRT backends

Links:
https://github.com/turboderp-org/exllamav2
https://blog.vllm.ai/2023/06/20/vllm.html
https://github.com/huggingface/text-generation-inference
https://huggingface.co/docs/text-generation-inference/multi_backend_support
https://nvidia.github.io/TensorRT-LLM/torch.html

Tools for High-Performance Fine-Tuning

These tools are used for fine-tuning, where you train models for your specific needs. They support quantization, multi-GPU setups, and different model families. They use different approaches for fine-tuning, such as a code-first approach with Unsloth or fine-tuning through configuration and command line with Torchtune.

Tool	License	Open Source	Platforms	Supported Models
Unsloth	Apache 2.0	✅ Yes	Linux, Windows	https://docs.unsloth.ai/get-started/all-our-models
Torchtune	BSD 3 license	✅ Yes	Linux	https://github.com/pytorch/torchtune
Axolotl	Apache 2.0	✅ Yes	Linux	https://github.com/axolotl-ai-cloud/axolotl

Links:
https://github.com/pytorch/torchtune
https://unsloth.ai/introducing
https://github.com/axolotl-ai-cloud/axolotl