DeepSeek has become one of the most capable and efficient AI model families available, and running it locally gives you private access with no per-token costs. Here is a complete setup guide for getting DeepSeek running on your own hardware.
Which DeepSeek Model to Run
DeepSeek has released several model families. For local inference, the most useful are:
DeepSeek-R1 Distilled: The reasoning model distilled into smaller sizes. The 14B distilled version delivers excellent chain-of-thought reasoning and runs well on 16GB VRAM. The 7B distilled version works on 8GB VRAM.
DeepSeek-V3: The latest general-purpose model. The full model is a 671B-parameter mixture-of-experts that requires multi-GPU server hardware; for consumer GPUs, the distilled R1 models are the practical choice.
DeepSeek-Coder V2: Specialized for code generation and analysis. The 16B Lite variant runs on 16GB VRAM.
Method 1: Ollama (Recommended for Most Users)
Ollama is the easiest way to run DeepSeek locally.
Step 1: Install Ollama
Download from ollama.ai and install. Ollama runs as a background service.
Step 2: Pull DeepSeek
Open your terminal and run:
ollama pull deepseek-r1:14b
For the 7B version (less VRAM required):
ollama pull deepseek-r1:7b
For coding-specific work:
ollama pull deepseek-coder-v2:16b
Step 3: Start a conversation
ollama run deepseek-r1:14b
You now have a local DeepSeek session. The model runs entirely on your hardware — no data leaves your machine.
Step 4: Use the API
Ollama serves a local HTTP API at http://localhost:11434 and an OpenAI-compatible endpoint under /v1, so any OpenAI-compatible client can talk to your local DeepSeek.
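A minimal sketch of calling that endpoint from Python using only the standard library. It assumes the Ollama service is running and deepseek-r1:14b has already been pulled; the Authorization value is arbitrary, since the local server does not check it:

```python
import json
import urllib.request

# Assumption: Ollama is running locally with deepseek-r1:14b pulled.
OLLAMA_BASE = "http://localhost:11434/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_BASE + "/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # any non-empty key works locally
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the Ollama service to be running):
# print(chat("deepseek-r1:14b", "Explain quantization in one sentence."))
```

Because the endpoint mimics OpenAI's Chat Completions API, official OpenAI client libraries also work if you point their base URL at http://localhost:11434/v1.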
Method 2: LM Studio
For users who prefer a graphical interface:
1. Download and install LM Studio
2. Go to the Model Browser (search icon in sidebar)
3. Search for "DeepSeek"
4. Choose your preferred model and click Download
5. Once downloaded, select it as your active model
6. Start chatting in the Chat interface
LM Studio lets you adjust parameters like temperature and context length through its GUI, which is useful for tailoring DeepSeek's behavior to specific tasks.
Quantization: Getting the Best Performance
Quantization reduces model size and memory requirements at the cost of some quality. For local inference, Q4_K_M quantization offers the best quality-to-size ratio for most use cases:
Q8_0: Near-full quality, requires ~8GB for 7B models
Q4_K_M: Good quality, requires ~5GB for 7B models (recommended)
Q3_K_M: Some quality loss, requires ~4GB for 7B models
Ollama's default model tags are already quantized (typically Q4_K_M), so pulling a model handles this for you. LM Studio lets you choose the quantization level in the model browser.
Getting the Best Results from DeepSeek Locally
Use the reasoning model for complex tasks: DeepSeek-R1 was trained to reason step-by-step. For technical problems, math, and complex analysis, it will show its work — making its outputs more verifiable.
Give context in system prompts: DeepSeek responds well to detailed system prompts. Set context about your role and task before starting a conversation.
For coding: DeepSeek-Coder V2 is specialized for code and generally outperforms DeepSeek-R1 on purely coding tasks.
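A system prompt is sent as the first entry in the messages list of a chat request. A sketch against the same local OpenAI-compatible API; the role text and the deepseek-coder-v2:16b model name are illustrative choices, not requirements:

```python
import json

# Illustrative system prompt; tailor the role and task to your own work.
SYSTEM_PROMPT = (
    "You are a senior Python reviewer. The user will paste code; "
    "respond with concrete issues and suggested fixes, most severe first."
)

def build_messages(user_text: str) -> list:
    """Prepend the system prompt so it frames the whole conversation."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

payload = {
    "model": "deepseek-coder-v2:16b",
    "messages": build_messages("def divide(a, b): return a / b"),
}
print(json.dumps(payload, indent=2))
```

The same structure works in `ollama run` sessions via a Modelfile SYSTEM directive, or in LM Studio's system prompt field.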
Integrating Local DeepSeek with Your Research Workflow
Once DeepSeek is running locally, use Notebook Toolkit to capture your local AI conversations and add them to NotebookLM notebooks. Your private, local DeepSeek analyses become part of a searchable, AI-synthesizable knowledge base — without any of that content ever going through DeepSeek's cloud servers.