NEW: Agentic Mode Now Available! Transform your local AI into an autonomous coding assistant that can read, create, edit, and organize files in your workspace. Perfect for automating development tasks, generating boilerplate code, and managing complex workflows - all running privately on your machine. Learn more about Agentic Mode below.
A beginner-friendly, privacy-first desktop application for running large language models locally on Windows, Linux, and macOS. Load and chat with GGUF format models like Mistral, LLaMA, DeepSeek, and others with zero setup required.
Direct Download: GGUFLoader_v2.1.1.agentic_mode.exe (~150-300 MB)
Step 2: Run the App
- Double-click the downloaded `GGUFLoader_v2.1.1.agentic_mode.exe` file
- Windows may show a security warning - click "More info" then "Run anyway" (this is normal for new apps)
- The app will start automatically - no installation needed!
Step 3: Download a Model
- Visit Local AI Zone for curated model recommendations
- Or browse Hugging Face for thousands of GGUF models
- Save it anywhere on your computer (e.g., Downloads folder)
Step 4: Load the Model
- In GGUF Loader, click the "Load Model" button
- Browse to where you saved your GGUF model file
- Select the model and click "Open"
- Wait for the model to load (progress bar will show)
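If a model refuses to load, a quick sanity check is whether the file really is GGUF. Per the GGUF specification, every file begins with the ASCII magic bytes `GGUF` followed by a little-endian version number. The helper below is an illustrative sketch, not part of GGUF Loader:

```python
import struct

def looks_like_gguf(path: str) -> bool:
    """Check the 4-byte GGUF magic and read the format version from the header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    # The next 4 bytes are the little-endian format version (e.g. 3).
    version = struct.unpack("<I", header[4:8])[0]
    print(f"GGUF version: {version}")
    return True
```

A file that fails this check was likely truncated during download or is in another format (e.g. an old GGML file), which would explain a loading error.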
Step 5: Start Chatting!
- Look for the floating chat button on your screen
- Click it to open the chat window
- Type your message and press Ctrl+Enter or click "Send"
- Enjoy your private, local AI assistant!
```shell
pip install ggufloader
ggufloader
```

Easy method - No coding knowledge needed!
Step 1: Download the ZIP file
- Click here: Download ZIP
- Save it anywhere on your computer
Step 2: Extract the ZIP file
- Right-click on the downloaded ZIP file
- Select "Extract All..." (Windows) or "Extract Here" (Linux/macOS)
- Choose where to extract it
Step 3: Run the launcher
For Windows:
- Open the extracted folder
- Double-click on `launch.bat`
- First time only: Wait 1-2 minutes while it downloads dependencies
- The app will start automatically!
- Next time: Just double-click `launch.bat` again - it starts instantly!
For Linux/macOS:
- Open the extracted folder
- Double-click on `launch.sh` (or right-click and choose Open)
- First time only: Wait 1-2 minutes while it downloads dependencies
- The app will start automatically!
- Next time: Just double-click `launch.sh` again - it starts instantly!
That's it! No Python installation needed, no command line, no complicated setup.
- Universal Model Support - Load ANY GGUF model from anywhere, not limited to pre-installed models
- Zero-Setup Model Loading - Use any downloaded GGUF model instantly without configuration or conversion
- Modern UI - Clean, intuitive interface built with PySide6
- Powerful Addon System - Enhance functionality by creating custom addons without modifying core code
- Floating Chat Button - Always-accessible chat interface that stays on top of all windows
- Agentic Mode - Advanced reasoning and task automation with multi-step problem solving
- Privacy First - All processing happens locally on your machine, no data leaves your computer
- Cross-Platform - Works seamlessly on Windows, Linux, and macOS
- Lightweight & Fast - Efficient memory usage and quick response times
- Beginner Friendly - No technical knowledge required, just download and run
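To give a feel for the addon system, here is a minimal plugin-registry sketch. The names (`AddonBase`, `register`, `on_message`) are hypothetical illustrations, not GGUF Loader's actual API - see the Addon Development guide for the real interface:

```python
# Hypothetical sketch of a plugin registry; AddonBase, register, and
# on_message are illustrative names, not GGUF Loader's real API.
REGISTRY = {}

class AddonBase:
    """Minimal interface an addon system like this might expose."""
    name = "base"

    def on_message(self, text: str) -> str:
        """Hook called on each chat message; default is a pass-through."""
        return text

def register(addon_cls):
    """Instantiate and register an addon class under its name."""
    REGISTRY[addon_cls.name] = addon_cls()
    return addon_cls

@register
class UppercaseAddon(AddonBase):
    """Toy addon: transforms chat output to uppercase."""
    name = "uppercase"

    def on_message(self, text: str) -> str:
        return text.upper()
```

The point of such a design is that new behavior lives entirely in the addon class; the core application only iterates over `REGISTRY` and calls the hooks.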
Mistral-7B Instruct (4.23 GB) - Recommended for Agentic Mode
- Download Q4_0
- Excellent reasoning and task automation capabilities
- Perfect for agentic workflows and multi-step problem solving
- Fast inference with strong instruction following
GPT-OSS 20B (7.34 GB)
LLaMA 3 8B Instruct (4.68 GB)
- Quick Reference - Fast answers to common tasks
- Installation Guide - Detailed setup instructions
- User Guide - How to use GGUF Loader
- Addon Development - Create your own addons
- FAQ - Frequently asked questions
- All Documentation - Complete documentation index
GGUF Loader now supports agentic mode, enabling the AI assistant to autonomously manage your workspace. The assistant can read, create, edit, and organize files within your project folder, automating development tasks and workflows.
- Read Files - Analyze code, documentation, and project structure
- Create Files - Generate new source files, configs, and documentation
- Edit Files - Modify existing code and update configurations
- Organize Files - Create folders, move files, and restructure projects
- Automate Tasks - Execute multi-step workflows without manual intervention
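File operations like these should stay confined to the workspace folder you grant access to. The guard below sketches the kind of path-traversal check such a system needs; it is an illustration of the principle, not GGUF Loader's implementation:

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, relative: str) -> Path:
    """Resolve a requested path, refusing anything that escapes the workspace.

    Guards against traversal requests like "../../etc/passwd".
    """
    root = Path(workspace).resolve()
    target = (root / relative).resolve()
    # After resolving symlinks and "..", the target must still sit under root.
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {relative}")
    return target
```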
- Load Mistral-7B (recommended for best results)
  - Download from the models section above
  - Load it in GGUF Loader
- Enable Agentic Mode
  - Open the chat window
  - Select "Agentic Mode" from the settings
  - Grant workspace access permissions
- Example Tasks
- "Create a new feature module with proper structure"
- "Refactor this codebase and organize files"
- "Generate boilerplate code for a new component"
- "Update all configuration files with new settings"
- "Create documentation for this project"
- Mistral-7B Instruct - Best choice: excellent reasoning, fast inference, perfect for code generation
- LLaMA 3 8B - Strong reasoning and code understanding
- GPT-OSS 20B - More powerful for complex refactoring tasks
- OS: Windows 10/11, Linux, or macOS
- RAM: 4GB minimum (8GB recommended)
- Storage: 2GB free space
- GPU: Optional (CUDA/OpenCL support)
GGUF Loader supports GPU acceleration for significantly faster inference speeds. If you have an NVIDIA GPU, follow these steps:
- NVIDIA GPU (GTX 1060 or newer recommended)
- CUDA Toolkit installed (CUDA 12.x recommended)
- Latest NVIDIA drivers
Step 1: Run the GPU installation script
Option A: Pre-built wheel (Recommended - Fastest)
```shell
# Windows
install_gpu_llama.bat

# Linux/macOS
chmod +x install_gpu_llama.sh
./install_gpu_llama.sh
```

Option B: Build from source (requires Visual Studio Build Tools)

```shell
# Windows
install_gpu_llama_source.bat

# Linux/macOS
chmod +x install_gpu_llama_source.sh
./install_gpu_llama_source.sh
```

Step 2: Verify GPU support

```shell
python verify_gpu_support.py
```

Step 3: Use GPU acceleration
- Launch GGUF Loader
- In the "Processing Mode" dropdown, select "GPU Accelerated"
- Load your model - you'll see "(GPU)" in the status
- Start chatting with GPU-accelerated inference!
- RTX 4060 (8GB): Can offload 25-40 layers depending on model size
- RTX 3060 (12GB): Can offload 40-50 layers
- RTX 4090 (24GB): Can offload entire models (60+ layers)
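The layer counts above can be roughly estimated from available VRAM. The heuristic below is a sketch under stated assumptions: the ~0.25 GB-per-layer figure approximates a Q4-quantized 7B-class model, and 1 GB is reserved for the KV cache and CUDA overhead. Real numbers vary with model architecture and quantization:

```python
def estimate_gpu_layers(vram_gb: float, total_layers: int = 60,
                        gb_per_layer: float = 0.25,
                        reserve_gb: float = 1.0) -> int:
    """Roughly estimate how many transformer layers fit in VRAM.

    Assumes about `gb_per_layer` per layer (Q4-quantized 7B-class model)
    and keeps `reserve_gb` back for the KV cache and runtime overhead.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(total_layers, int(usable / gb_per_layer))
```

For example, an 8 GB card yields (8 - 1) / 0.25 = 28 layers under these assumptions, consistent with the 25-40 range quoted above; a 24 GB card comfortably covers a full model.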
Run this in a separate terminal while using GGUF Loader:
```shell
# Windows
monitor_gpu.bat

# Linux/macOS
watch -n 1 nvidia-smi
```

Watch the "GPU-Util" column increase when generating responses - this confirms GPU acceleration is working!
- "pip not recognized": The script will automatically activate your virtual environment
- Slow speeds: Try increasing GPU layers in `models/model_loader.py` (default: 35)
- Out of memory: Reduce GPU layers or use a smaller model
- No speedup: Verify CUDA is installed with `nvidia-smi`
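A quick programmatic version of that last check - does `nvidia-smi` exist on PATH and run cleanly - can be done with the standard library. This is a convenience sketch, not a script shipped with GGUF Loader:

```python
import shutil
import subprocess

def cuda_driver_available() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # NVIDIA driver tools not installed or not on PATH
    return subprocess.run([exe], capture_output=True).returncode == 0
```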
We welcome contributions! See CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE for details.
Report security vulnerabilities to: hossainnazary475@gmail.com
See SECURITY.md for our security policy.
- Report Issues
- Discussions
- Email: hossainnazary475@gmail.com
Built with ❤️ by the GGUF Loader community
