
8 posts tagged with "python"


How to Set Up and Run DeepSeek-R1 Locally With Ollama and FastAPI

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

DeepSeek-R1 is a family of large language models (LLMs) known for advanced natural language capabilities. While hosting an LLM in the cloud can be convenient, local deployment provides greater control over latency, privacy, and resource utilization. Tools like Ollama simplify this process by handling model downloading and quantization. However, to truly scale or integrate these capabilities into other services, you often need a robust REST API layer—FastAPI is perfect for this.

This article covers the entire pipeline:

  1. Installing and configuring Ollama to serve DeepSeek-R1 locally
  2. Interacting with DeepSeek-R1 using the CLI, Python scripts, or a FastAPI endpoint for streaming responses
  3. Demonstrating a minimal FastAPI integration, so you can easily wrap your model in a web service

By the end, you’ll see how to run DeepSeek-R1 locally while benefiting from FastAPI’s scalability, logging, and integration features—all without sending your data to external servers.


1. Why Run DeepSeek-R1 Locally?

Running DeepSeek-R1 on your own machine has multiple advantages:

  • Privacy & Security: No data is sent to third-party services
  • Performance & Low Latency: Local inference avoids remote API calls
  • Customization: Fine-tune or adjust inference parameters as needed
  • No Rate Limits: In-house solution means no usage caps or unexpected cost spikes
  • Offline Availability: Once downloaded, the model runs even without internet access

2. Setting Up DeepSeek-R1 Locally With Ollama

2.1 Installing Ollama

  1. Download Ollama from the official website.
  2. Install it on your machine, just like any application.
Note: Check Ollama’s documentation for platform-specific instructions; it’s available for macOS, Linux, and Windows.

2.2 Download and Test DeepSeek-R1

Ollama makes model retrieval simple:

ollama run deepseek-r1

This command automatically downloads DeepSeek-R1 (the default variant). If your hardware cannot handle the full 671B-parameter model, specify a smaller distilled version:

ollama run deepseek-r1:7b
Info: DeepSeek-R1 is available in several parameter sizes (e.g., 1.5B, 7B, 14B, 70B, 671B) to suit different hardware setups.

2.3 Running DeepSeek-R1 in the Background

To serve the model continuously (useful for external services like FastAPI):

ollama serve

By default, Ollama listens on http://localhost:11434.
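Before wiring the server into other services, it can help to confirm it is reachable. Below is a minimal sketch using only the Python standard library; it calls Ollama's /api/tags endpoint, which lists the models available locally:

import json
import urllib.request

# Sanity check: list the models the local Ollama server can serve.
# Assumes `ollama serve` is running on the default port 11434.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    print(model["name"])  # e.g. "deepseek-r1:latest"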


3. Using DeepSeek-R1 Locally

3.1 Command-Line (CLI) Inference

You can chat directly with DeepSeek-R1 in your terminal:

ollama run deepseek-r1

Type a question or prompt; responses stream back in real time.

3.2 Accessing DeepSeek-R1 via API

If you’re building an application, you can call Ollama’s REST API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve: 25 * 25" }],
  "stream": false
}'
Note: Set "stream": true to receive chunked streaming responses—a feature you can integrate easily into web apps or server frameworks like FastAPI.

3.3 Python Integration

Install the ollama Python package:

pip install ollama

Then use:

import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain Newton's second law of motion"},
    ],
)
print(response["message"]["content"])

4. FastAPI Integration and Streaming Responses

To wrap DeepSeek-R1 in a fully customizable FastAPI service, you can define streaming endpoints for advanced usage. Below is an example that sends chunked responses to the client:

import os
import json
from typing import List
from pydantic import BaseModel
from dotenv import load_dotenv
from fastapi import FastAPI, Query
from fastapi.responses import StreamingResponse
from openai import OpenAI

from .utils.prompt import ClientMessage, convert_to_openai_messages
from .utils.tools import get_current_weather  # example tool
from .utils.tools import available_tools  # hypothetical dict of tool funcs

load_dotenv(".env.local")

app = FastAPI()
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1/")


class Request(BaseModel):
    messages: List[ClientMessage]


def stream_text(messages: List[ClientMessage], protocol: str = 'data'):
    stream = client.chat.completions.create(
        messages=messages,
        model="deepseek-r1",
        stream=True,
    )

    if protocol == 'text':
        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    break
                else:
                    yield "{text}".format(text=choice.delta.content)

    elif protocol == 'data':
        draft_tool_calls = []
        draft_tool_calls_index = -1

        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    continue

                elif choice.finish_reason == "tool_calls":
                    for tool_call in draft_tool_calls:
                        yield f'9:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]}}}\n'

                    for tool_call in draft_tool_calls:
                        tool_result = available_tools[tool_call["name"]](**json.loads(tool_call["arguments"]))
                        yield (
                            f'a:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]},'
                            f'"result":{json.dumps(tool_result)}}}\n'
                        )

                elif choice.delta.tool_calls:
                    for tool_call in choice.delta.tool_calls:
                        id = tool_call.id
                        name = tool_call.function.name
                        arguments = tool_call.function.arguments

                        if id is not None:
                            draft_tool_calls_index += 1
                            draft_tool_calls.append({"id": id, "name": name, "arguments": ""})
                        else:
                            draft_tool_calls[draft_tool_calls_index]["arguments"] += arguments

                else:
                    yield f'0:{json.dumps(choice.delta.content)}\n'

            # usage (the final chunk arrives with no choices)
            if chunk.choices == []:
                usage = chunk.usage
                prompt_tokens = usage.prompt_tokens
                completion_tokens = usage.completion_tokens

                yield (
                    f'd:{{"finishReason":"{"tool-calls" if len(draft_tool_calls) > 0 else "stop"}",'
                    f'"usage":{{"promptTokens":{prompt_tokens},"completionTokens":{completion_tokens}}}}}\n'
                )


@app.post("/api/chat")
async def handle_chat_data(request: Request, protocol: str = Query('data')):
    messages = request.messages
    openai_messages = convert_to_openai_messages(messages)

    response = StreamingResponse(stream_text(openai_messages, protocol))
    response.headers['x-vercel-ai-data-stream'] = 'v1'
    return response

Key Points:

  • stream=True allows the server to stream content chunk by chunk.
  • The code handles optional “tool calls” logic—customizable for your own environment.
  • FastAPI’s StreamingResponse ensures the client receives partial output in real time.

With this setup, you can embed DeepSeek-R1 into more complex microservices or orchestrate multi-step workflows that rely on streaming LLM responses.
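As a rough usage sketch (assuming the file above lives at api/index.py and that ClientMessage accepts plain role/content objects), you could start the service with uvicorn and stream the plain-text protocol from a small client script:

# Hypothetical client sketch. Assumes the service was started with something like:
#   uvicorn api.index:app --reload --port 8000
import requests

payload = {"messages": [{"role": "user", "content": "Summarize Newton's laws."}]}

with requests.post(
    "http://localhost:8000/api/chat",
    json=payload,
    params={"protocol": "text"},
    stream=True,
) as resp:
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)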


5. Conclusion

DeepSeek-R1 combined with Ollama and FastAPI gives you a powerful local LLM service. You can handle all aspects of data ingestion, retrieval, and inference in one place—without relying on third-party endpoints or paying subscription costs. Here’s a recap:

  • Ollama manages downloading and serving the DeepSeek-R1 models.
  • FastAPI provides a flexible web layer for streaming responses or building microservices.

Build your local AI solutions confidently and privately—DeepSeek-R1 is now at your fingertips.

Powering Quant Finance with Qlib’s PyTorch MLP on Alpha360

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Qlib is an AI-oriented, open-source platform from Microsoft that simplifies the entire quantitative finance process. By leveraging PyTorch, Qlib can seamlessly integrate modern neural networks—like Multi-Layer Perceptrons (MLPs)—to process large datasets, engineer alpha factors, and run flexible backtests. In this post, we focus on a PyTorch MLP pipeline for Alpha360 data in the US market, examining a single YAML configuration that unifies data ingestion, model training, and performance evaluation.

Harnessing AI for Quantitative Finance with Qlib and LightGBM

· 6 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the realm of quantitative finance, machine learning and deep learning are revolutionizing how researchers and traders discover alpha, manage portfolios, and adapt to market shifts. Qlib by Microsoft is a powerful open-source framework that merges AI techniques with end-to-end finance workflows.

This article demonstrates how Qlib automates an AI-driven quant workflow—from data ingestion and feature engineering to model training and backtesting—using a single YAML configuration for a LightGBM model. Specifically, we’ll explore the AI-centric aspects of how qrun orchestrates the entire pipeline and highlight best practices for leveraging advanced ML models in your quantitative strategies.

Correct Exchange Mapping in VeighNa to Resolve IB Security Definition Errors

· 14 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the intricate world of algorithmic trading, seamless integration between trading platforms and broker APIs is paramount.

One common issue when interfacing with Interactive Brokers (IB) API is encountering the error:

ERROR:root:Error - ReqId: 1, Code: 200, Message: No security definition has been found for the request

This error typically arises due to incorrect exchange mapping, preventing Interactive Brokers (IB) from recognizing the requested security. This article delves into the importance of accurate exchange mapping within the VeighNa trading platform, provides a detailed overview of IB's symbol rules, explains the updatePortfolio method, and offers guidance on implementing correct mappings to avoid such errors.
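As an illustrative sketch only (the full article covers the details; the Exchange constants come from vnpy.trader.constant, while the exact mapping dictionary used inside the IB gateway may differ), the fix amounts to translating VeighNa exchange enums into the exchange codes IB recognizes:

from vnpy.trader.constant import Exchange

# Illustrative mapping from VeighNa exchange enums to the exchange codes
# Interactive Brokers expects in contract definitions. An incomplete or wrong
# mapping here is what triggers the "No security definition" (code 200) error.
EXCHANGE_VT2IB = {
    Exchange.SMART: "SMART",     # IB smart routing for US stocks
    Exchange.NYSE: "NYSE",
    Exchange.NASDAQ: "ISLAND",   # IB historically uses "ISLAND" for NASDAQ
    Exchange.CME: "CME",
}

def to_ib_exchange(exchange: Exchange) -> str:
    """Translate a VeighNa exchange to the string IB's API recognizes."""
    return EXCHANGE_VT2IB.get(exchange, "SMART")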

Understanding the Sniper Algorithm Implementation in Algorithmic Trading

· 8 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the realm of algorithmic trading, execution algorithms play a pivotal role in optimizing trade orders to minimize market impact and slippage. One such algorithm is the Sniper Algorithm, which is designed to execute trades discreetly and efficiently by capitalizing on favorable market conditions.

This article aims to review and understand the implementation of the Sniper Algorithm as provided in the VeighNa trading platform's open-source repository. By dissecting the code and explaining its components, we hope to provide clarity on how the algorithm functions and how it can be utilized in practical trading scenarios.

Backtesting NVIDIA Stock Strategies on VeighNa - Moving Average Crossover Strategy

· 15 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Backtesting is essential for validating trading strategies, especially in the high-frequency and volatile world of stocks like NVIDIA (NVDA). Using VeighNa, an open-source algorithmic trading system, provides traders with the flexibility to thoroughly test strategies and optimize for performance. In this guide, we'll walk through setting up VeighNa, backtesting a simple Moving Average Crossover strategy on NVIDIA, explaining the strategy in detail, troubleshooting common installation issues, and optimizing your strategy.
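As a rough, framework-agnostic sketch of the signal logic (plain pandas rather than VeighNa's strategy API; the window lengths are illustrative):

import pandas as pd

def crossover_signals(close: pd.Series, fast: int = 10, slow: int = 30) -> pd.Series:
    """Return +1 when the fast MA crosses above the slow MA, -1 when it crosses below, else 0."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    above = (fast_ma > slow_ma).astype(int)
    return above.diff().fillna(0)  # +1 = golden cross (buy), -1 = death cross (sell)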

Automating Financial Data Collection and Uploading to Hugging Face for Algorithmic Trading

· 6 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the fast-paced world of algorithmic trading, accessing reliable and timely financial data is essential for backtesting strategies, optimizing models, and making data-driven trading decisions. Automating data collection can streamline your workflow and ensure that you have access to the most recent market information. In this guide, we’ll walk through how to automate the collection of stock data using Python and yfinance, and how to upload this data to Hugging Face for convenient access and future use.

Although this article uses NVIDIA stock data as an example, the process is applicable to any publicly traded company or financial instrument. By integrating data collection and storage into one automated pipeline, traders and analysts can focus on what matters most—developing strategies and maximizing returns.
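A compact sketch of the idea (the repo id and file names below are placeholders; yfinance.download and huggingface_hub's upload_file are the relevant calls):

import yfinance as yf
from huggingface_hub import HfApi

# 1. Collect daily OHLCV data for NVIDIA (any ticker works the same way).
data = yf.download("NVDA", start="2020-01-01", end="2024-12-31")
data.to_csv("nvda_daily.csv")

# 2. Upload the CSV to a Hugging Face dataset repo (placeholder repo id;
#    requires `huggingface-cli login` or an HF token in the environment).
api = HfApi()
api.upload_file(
    path_or_fileobj="nvda_daily.csv",
    path_in_repo="nvda_daily.csv",
    repo_id="your-username/nvda-stock-data",
    repo_type="dataset",
)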

Algorithmic Trading with VeighNa and Interactive Brokers - Installation Guide and Troubleshooting

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Algorithmic trading is transforming the financial landscape, and frameworks like VeighNa combined with Interactive Brokers (IB) offer traders the tools they need to optimize their trading strategies and automate execution across global markets. However, setting up these tools on macOS, particularly on Apple Silicon (M1/M2), can be tricky due to package compatibility issues. This guide will walk you through the installation process of VeighNa with IB on macOS, highlighting all the potential gotchas we encountered, along with their solutions.