
Building Long-Running TTS Pipelines with LangGraph: Orchestrating Long-Form Audio Generation

· 17 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Generating long-form audio content—audiobooks spanning hours, educational courses, or extended podcasts—presents unique challenges: API rate limits, network failures, resource constraints, and the sheer duration of processing. This article explores a production-ready architecture for long-running TTS pipelines that can gracefully handle long-form generation tasks, resume after failures, and maintain state across distributed systems.

Built with LangGraph, the system orchestrates complex workflows involving AI content generation (DeepSeek), text-to-speech conversion (OpenAI TTS), and distributed storage (Cloudflare R2). The key innovation: PostgreSQL checkpointing enables resumable execution, making it possible to generate 5-30+ minute audio segments reliably, even when individual API calls or processing steps fail.

The Challenge: Long-Form Audio at Scale

Why Long-Running Pipelines Are Hard

Traditional TTS approaches fail at scale:

  1. Time Constraints: A 30-minute audio narrative requires ~4,500 words, chunked into 10-15 API calls, taking 2-5 minutes to generate
  2. Failure Points: Each step (text generation, chunking, TTS, storage) can fail independently
  3. Memory Pressure: Holding all audio segments in memory for hours is impractical
  4. Cost Management: Retrying from scratch wastes API credits and compute time
  5. State Loss: Without persistence, crashes mean starting over

Our Solution: Stateful Orchestration

  • LangGraph manages workflow state transitions
  • PostgreSQL persists checkpoints after each successful step
  • R2 provides durable storage for completed segments
  • Resumable execution using thread_id for job recovery

System Overview

The pipeline orchestrates three main workflows:

  1. Research Generation: Structured content research using DeepSeek
  2. Narrative Text Generation: Long-form content creation with context awareness
  3. Audio Synthesis: Text-to-speech conversion with OpenAI TTS and Cloudflare R2 storage

Tech Stack

  • LangGraph: State machine orchestration with built-in checkpointing
  • DeepSeek: Long-form text generation (deepseek-chat, 2500+ token outputs)
  • OpenAI TTS: Streaming audio synthesis (gpt-4o-mini-tts, 4096 char limit per request)
  • PostgreSQL: Durable checkpointing for long-running jobs (Neon serverless for production)
  • Cloudflare R2: S3-compatible storage with zero egress fees (critical for multi-GB audio)
  • FastAPI: Async REST API for non-blocking long operations
  • Docker: Containerized deployment with ffmpeg for audio merging

Why This Stack for Long-Running Jobs:

  • Postgres checkpointing: Resume from any point in the workflow (text generation → chunking → TTS → upload)
  • Streaming TTS: Memory-efficient direct-to-disk writes (no buffering entire audio in RAM)
  • R2 durability: Segments uploaded immediately, survive process crashes
  • Async execution: Non-blocking background processing for hours-long jobs

Architecture Patterns

1. Core LangGraph State Machine

The system implements three distinct LangGraph workflows, each optimized for specific tasks.
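
The exact graph wiring lives in the service code, but a minimal sketch of how one of these workflows can be assembled looks like this (node names and edges are simplified assumptions; the routing helpers and node functions referenced here appear later in this article):

# Sketch only: node functions (node_generate_text, node_chunk_text,
# node_generate_audio) and routing helpers are defined elsewhere; the real
# pipeline contains additional nodes (e.g. database saves).
from langgraph.graph import StateGraph, END

def build_graph(checkpointer=None):
    g = StateGraph(TextState)

    g.add_node("generate_text", node_generate_text)
    g.add_node("chunk_text", node_chunk_text)
    g.add_node("generate_audio", node_generate_audio)

    # Entry point decides between full generation and audio-only mode
    g.set_conditional_entry_point(
        should_skip_text_generation,
        {"generate_text": "generate_text", "chunk_text": "chunk_text"},
    )
    g.add_conditional_edges(
        "generate_text",
        should_generate_audio,
        {"chunk_text": "chunk_text", END: END},
    )
    g.add_edge("chunk_text", "generate_audio")
    g.add_edge("generate_audio", END)

    # The checkpointer (e.g. AsyncPostgresSaver) persists state after every node
    return g.compile(checkpointer=checkpointer)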

2. Research Generation Pipeline

The research pipeline generates structured research content using a focused LangGraph workflow.

Key Features:

  • Low temperature (0.3) for factual accuracy
  • Structured JSON output with validation
  • Evidence level classification (A/B/C)
  • Relevance scoring for topic matching
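
A condensed sketch of what the research node can look like, assuming a hypothetical node_research function and a generic state dict; the ChatDeepSeek settings mirror the low-temperature, JSON-validated approach described above:

# Sketch of the research node (hypothetical names); low temperature for
# factual output, JSON parsed and filtered before it enters the state.
import json
from langchain_deepseek import ChatDeepSeek
from langchain_core.messages import HumanMessage

async def node_research(state: dict) -> dict:
    llm = ChatDeepSeek(model="deepseek-chat", temperature=0.3)
    prompt = (
        f"Return a JSON array of research items for: {state['title']}. "
        'Each item needs "summary", "evidence_level" (A/B/C) and '
        '"relevance_score" (0-1).'
    )
    resp = await llm.ainvoke([HumanMessage(content=prompt)])

    try:
        items = json.loads(resp.content)
    except json.JSONDecodeError:
        return {**state, "research_items": [], "error": "invalid JSON from model"}

    # Keep only well-formed items with a recognized evidence level
    items = [i for i in items if i.get("evidence_level") in ("A", "B", "C")]
    return {**state, "research_items": items, "error": None}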

3. Long-Form Text Generation Pipeline

The most sophisticated workflow, supporting both full generation and audio-only modes.

Conditional Routing Logic:

def should_skip_text_generation(state: TextState) -> str:
    """Route to text generation or skip to audio."""
    if state.get("existing_content") and state["existing_content"].get("text"):
        return "chunk_text"  # Audio-only mode
    return "generate_text"  # Full generation

def should_generate_audio(state: TextState) -> str:
    """Route to audio generation or end."""
    if state.get("generate_audio", True):
        return "chunk_text"
    return END  # Text-only mode

4. Audio Generation Pipeline (Standalone)

A simplified pipeline for generic long-form narration.

Iterative Chunk Processing:

The system uses a recursive edge pattern for processing chunks:

g.add_conditional_edges(
    "tts_one_chunk",
    edge_should_continue,
    {
        "tts_one_chunk": "tts_one_chunk",  # Loop back
        "finalize": "finalize",            # Exit loop
    },
)

def edge_should_continue(state: JobState) -> str:
    if state["chunk_index"] < len(state["chunks"]):
        return "tts_one_chunk"
    return "finalize"

Deep Dive: Key Architectural Components

State Management

LangGraph uses typed state dictionaries for type safety and IDE support:

from typing import List, Optional, TypedDict

class TextState(TypedDict):
    # Input metadata
    content_id: int
    title: str
    content_type: str
    language: str
    target_duration_minutes: int | None

    # Generation data
    research_items: list[dict]
    existing_content: dict | None
    generated_text: str | None

    # TTS fields
    voice: str
    chunks: List[str]
    segment_urls: List[str]
    manifest_url: Optional[str]
    audio_url: Optional[str]

    # Control flow
    generate_audio: bool
    database_saved: bool
    error: str | None

Postgres Checkpointing: The Key to Long-Running Resilience

For long-running jobs, checkpointing is non-negotiable. Without it, a network glitch at minute 25 of a 30-minute generation means restarting from scratch.

How Checkpointing Works:

import os

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

async def run_pipeline(state: TextState, thread_id: str):
    db_url = os.getenv("DATABASE_URL")

    async with AsyncPostgresSaver.from_conn_string(db_url) as checkpointer:
        await checkpointer.setup()  # Creates checkpoint tables
        app = build_graph(checkpointer=checkpointer)
        config = {"configurable": {"thread_id": thread_id}}

        # LangGraph automatically saves state after each node execution
        final_state = await app.ainvoke(state, config=config)
        return final_state

What Gets Checkpointed:

  • Complete state dictionary after each node
  • Edge transitions and routing decisions
  • Timestamps and execution metadata
  • Partial results (generated text, uploaded segment URLs)

Recovery Example:

# Job crashes after generating 8 of 12 TTS segments
# Resume with same thread_id:
final_state = await run_pipeline(initial_state, thread_id="job-12345")

# LangGraph:
# 1. Loads last checkpoint from Postgres
# 2. Sees 8 segments already uploaded to R2
# 3. Continues from segment 9
# 4. Completes remaining 4 segments

Production Benefits:

  • Cost Savings: No wasted API calls on retry
  • Time Efficiency: Resume from 80% complete, not 0%
  • Reliability: Transient failures (rate limits, timeouts) don't kill long-form jobs
  • Observability: Query the checkpoint table to monitor progress (see the sketch after this list)
  • Parallel Execution: Multiple jobs with different thread_id values
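
For the observability point above, the same checkpointer can be queried for a job's latest state; a sketch assuming the build_graph helper from this article:

# Sketch: inspect a running job's progress from its latest checkpoint.
import os

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

async def get_job_progress(thread_id: str) -> dict:
    db_url = os.getenv("DATABASE_URL")
    async with AsyncPostgresSaver.from_conn_string(db_url) as checkpointer:
        app = build_graph(checkpointer=checkpointer)
        snapshot = await app.aget_state({"configurable": {"thread_id": thread_id}})
        state = snapshot.values
        return {
            "chunks_total": len(state.get("chunks", [])),
            "segments_done": len(state.get("segment_urls", [])),
            "error": state.get("error"),
        }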

Text Chunking Algorithm: Optimizing for Long-Form Narration

For 30-minute audio (4,500+ words), naive chunking creates jarring transitions. Our algorithm balances API constraints with narrative flow:

Constraints:

  • OpenAI TTS: 4,096 character limit per request
  • Target: ~4,000 chars per chunk (safety margin)
  • Goal: Natural pauses at paragraph/sentence boundaries

Strategy:

import re
from typing import List

def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """
    Multi-level chunking for long-form content:

    1. Split by paragraphs (blank lines) - natural topic boundaries
    2. Accumulate paragraphs until approaching the 4K limit
    3. If a single paragraph > 4K, split by sentences
    4. If a single sentence > 4K, split mid-sentence (rare edge case)

    Result: 10-15 chunks for 30-min audio, each ending at a natural pause
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    buf: List[str] = []

    for p in paragraphs:
        candidate = "\n\n".join(buf + [p]) if buf else p
        if len(candidate) <= max_chars:
            buf.append(p)
            continue

        # Flush accumulated paragraphs as one chunk
        if buf:
            chunks.append("\n\n".join(buf))
            buf = []

        if len(p) <= max_chars:
            buf = [p]
            continue

        # Paragraph too large - split by sentences with the same accumulation logic
        sentences = re.split(r"(?<=[.!?])\s+", p)
        sent_buf: List[str] = []
        for s in sentences:
            cand = " ".join(sent_buf + [s]) if sent_buf else s
            if len(cand) <= max_chars:
                sent_buf.append(s)
                continue
            if sent_buf:
                chunks.append(" ".join(sent_buf))
                sent_buf = []
            # Sentence itself too large - hard split mid-sentence (rare)
            while len(s) > max_chars:
                chunks.append(s[:max_chars])
                s = s[max_chars:]
            if s:
                sent_buf = [s]
        if sent_buf:
            chunks.append(" ".join(sent_buf))

    if buf:
        chunks.append("\n\n".join(buf))
    return chunks

Why This Matters for Long-Form:

  • Seamless Merging: Chunk boundaries at natural pauses prevent audio glitches
  • Even Distribution: Avoids tiny final chunks (better for progress tracking)
  • Memory Efficiency: Process one chunk at a time, not entire 4,500-word text
  • Resumability: Each chunk is independent; can resume mid-sequence

OpenAI TTS Streaming

Efficient audio generation using streaming responses:

from pathlib import Path

from openai import OpenAI

async def node_tts_one_chunk(state: JobState) -> JobState:
    chunk_text = state["chunks"][state["chunk_index"]]
    segment_path = Path(f"segment_{state['chunk_index']:04d}.mp3")

    client = OpenAI()

    # Stream directly to disk (memory efficient)
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice=state["voice"],
        input=chunk_text,
        response_format="mp3",
    ) as response:
        response.stream_to_file(segment_path)

    # Upload to R2
    r2_url = upload_to_r2(segment_path, state["job_id"])

    return {
        **state,
        "segment_urls": [*state["segment_urls"], r2_url],
        "chunk_index": state["chunk_index"] + 1,
    }

Audio Merging Strategy

The system uses ffmpeg for high-quality concatenation:

async def node_generate_audio(state: TextState) -> TextState:
    # Generate all segments...

    # Create concat list for ffmpeg
    file_list_path.write_text(
        "\n".join(f"file '{segment}'" for segment in segment_paths)
    )

    # Merge using ffmpeg (codec copy - fast and lossless)
    subprocess.run([
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(file_list_path),
        "-c", "copy",  # No re-encoding
        str(merged_path),
    ])

    # Fallback to binary concatenation if ffmpeg unavailable
    if not merged_path.exists():
        with open(merged_path, "wb") as merged:
            for segment in segment_paths:
                merged.write(segment.read_bytes())

Cloudflare R2 Integration

S3-compatible storage for globally distributed audio:

import boto3
from botocore.config import Config

def get_r2_client():
    return boto3.client(
        's3',
        endpoint_url=f'https://{R2_ACCOUNT_ID}.r2.cloudflarestorage.com',
        aws_access_key_id=R2_ACCESS_KEY_ID,
        aws_secret_access_key=R2_SECRET_ACCESS_KEY,
        config=Config(signature_version='s3v4'),
    )

def upload_to_r2(file_path: Path, job_id: str) -> str:
    key = f"{job_id}/{file_path.name}"

    client = get_r2_client()
    client.put_object(
        Bucket=R2_BUCKET_NAME,
        Key=key,
        Body=file_path.read_bytes(),
        ContentType='audio/mpeg',
    )

    return f"{R2_PUBLIC_DOMAIN}/{key}"

Structured Content Generation

Narrative Architecture Framework

The system implements a flexible content framework with customizable sections:

Key Components:

  1. Introduction (2-3 min): Hook the listener and set expectations
  2. Context: Background information and relevance
  3. Core Content: Main topic introduction with clear structure
  4. Examples: Concrete illustrations and case studies
  5. Deep Dive: Detailed exploration of key concepts
  6. Applications: Practical use cases and implementation
  7. Advanced Topics: Nuanced discussion for engaged learners
  8. Synthesis: Connect all concepts together
  9. Takeaways: Summary of key points
  10. Conclusion: Clear closing and next steps
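
The generate_content_architecture helper referenced in the next section can be as simple as rendering this framework into the prompt; a sketch (the real implementation may differ):

# Sketch of a content-architecture helper; returns a section outline the
# prompt builder can embed directly. Section descriptions follow the list above.
SECTIONS = [
    ("Introduction", "Hook the listener and set expectations (2-3 min)"),
    ("Context", "Background information and relevance"),
    ("Core Content", "Main topic introduction with clear structure"),
    ("Examples", "Concrete illustrations and case studies"),
    ("Deep Dive", "Detailed exploration of key concepts"),
    ("Applications", "Practical use cases and implementation"),
    ("Advanced Topics", "Nuanced discussion for engaged learners"),
    ("Synthesis", "Connect all concepts together"),
    ("Takeaways", "Summary of key points"),
    ("Conclusion", "Clear closing and next steps"),
]

def generate_content_architecture(content_type: str) -> str:
    lines = [f"STRUCTURE ({content_type}):"]
    lines += [f"{i}. {name}: {desc}" for i, (name, desc) in enumerate(SECTIONS, 1)]
    return "\n".join(lines)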

Dynamic Content Adaptation

def build_content_prompt(state: TextState) -> str:
    minutes = state.get("target_duration_minutes") or 5
    target_words = int(minutes * 150)  # ~150 words per minute of narration

    content_type = state.get("content_type")

    # Select architecture based on content type
    architecture = generate_content_architecture(content_type)

    return f"""
Create a {state['language']} narrative for audio:

TOPIC: {state['title']}
TYPE: {content_type}
TARGET: {target_words} words ({minutes} minutes)

{architecture}

RESEARCH CONTEXT:
{format_research_summary(state['research_items'])}

Requirements:
- Plain text only (no markdown)
- Natural paragraph breaks
- Engaging, clear tone
- Appropriate language for audio listening
"""

API Endpoints

FastAPI Service Layer

Endpoint Implementations:

@app.post("/api/research/generate")
async def research_endpoint(req: ResearchRequest):
"""Generate research context using LangGraph + DeepSeek."""
return await generate_research(req)

@app.post("/api/text/generate")
async def text_endpoint(req: TextGenerationRequest):
"""Generate long-form text content (text-only mode)."""
return await generate_text(req)

@app.post("/api/audio/generate")
async def audio_endpoint(req: TextGenerationRequest):
"""Generate audio from existing content (audio-only mode)."""
return await generate_audio(req)

@app.post("/api/tts/generate")
async def tts_endpoint(req: TTSRequest, background_tasks: BackgroundTasks):
"""Generic TTS generation (fire-and-forget)."""
return await generate_tts(req, background_tasks)
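
The request models are plain Pydantic classes; a sketch of what TextGenerationRequest and TTSRequest might look like, with field names inferred from the state dictionary (treat them as assumptions, not the exact production schema):

# Sketch of the request models (field names inferred from the state dict).
from typing import Optional
from pydantic import BaseModel

class TextGenerationRequest(BaseModel):
    content_id: int
    title: str
    content_type: str
    language: str = "en"
    target_duration_minutes: Optional[int] = None
    voice: str = "alloy"
    generate_audio: bool = True

class TTSRequest(BaseModel):
    text: str
    voice: str = "alloy"
    job_id: Optional[str] = None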

Deployment Architecture

Docker Containerization

FROM python:3.12-slim

WORKDIR /app

# Install ffmpeg for audio merging
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8080

# Run FastAPI server
CMD ["uvicorn", "langgraph_server:app", "--host", "0.0.0.0", "--port", "8080"]

Environment Configuration

# AI Services
DEEPSEEK_API_KEY=sk-...
OPENAI_API_KEY=sk-...

# Database (Neon Postgres)
DATABASE_URL=postgresql://user:pass@host/db?sslmode=require

# Cloudflare R2
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=longform-tts
R2_PUBLIC_DOMAIN=https://pub-longform-tts.r2.dev
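
Because a missing credential only surfaces minutes into a long job, it pays to validate this configuration at startup; a small sketch:

# Sketch: fail fast at startup if required configuration is missing.
import os

REQUIRED_ENV = [
    "DEEPSEEK_API_KEY", "OPENAI_API_KEY", "DATABASE_URL",
    "R2_ACCOUNT_ID", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY",
    "R2_BUCKET_NAME", "R2_PUBLIC_DOMAIN",
]

def validate_env() -> None:
    missing = [name for name in REQUIRED_ENV if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")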

Cloudflare Workers Deployment

# wrangler.toml
name = "langgraph-tts"
compatibility_date = "2024-01-01"

[build]
command = "docker build -t langgraph-tts ."

[[services]]
name = "langgraph-tts"
image = "langgraph-tts:latest"

[env]
PORT = "8080"

[[r2_buckets]]
binding = "LONGFORM_TTS"
bucket_name = "longform-tts"

Production Considerations

Performance Metrics for Long-Running Jobs

Benchmarks (30-minute audio generation):

| Stage | Duration | Checkpointed | Retryable |
|---|---|---|---|
| Text Generation (DeepSeek) | 30-60s | ✅ After completion | ✅ Full retry |
| Text Chunking | <1s | ✅ After completion | ✅ Instant |
| TTS Segments (1–12) | 10-20s each | ✅ After each segment | ✅ Per-segment |
| Audio Merging (ffmpeg) | 1–3s | ✅ After completion | ✅ Full retry |
| R2 Upload (merged) | 2-5s | ✅ After completion | ✅ Full retry |
| Total Pipeline | 3-5 minutes | 15+ checkpoints | Granular recovery |

Long-Running Job Profile:

# Example: 2-hour audiobook chapter
# text_length   = ~18,000 words
# chunks        = 45 (~4,000 chars each)
# tts_time      = 45 * ~15s ≈ 11.25 minutes
# text_gen_time = 2-3 minutes
# total_time    ≈ 15 minutes of processing for 2 hours of audio

# Checkpoint frequency:
# - 1 after text generation
# - 45, one after each TTS segment
# - 1 after merge
# Total: 47 recovery points

Failure Recovery Times:

  • Crash at 80% complete → Resume in 1–2 seconds, continue from segment 36/45
  • Network timeout on segment 20 → Retry only segment 20, not segments 1–19
  • Database connection loss → Reconnect and load last checkpoint (<500ms)

Error Handling & Resilience

async def node_generate_text(state: TextState) -> TextState:
    try:
        llm = ChatDeepSeek(model="deepseek-chat", temperature=0.7, max_tokens=2500)
        prompt = build_content_prompt(state)

        resp = await llm.ainvoke([HumanMessage(content=prompt)])
        text = clean_for_tts(resp.content)

        return {**state, "generated_text": text, "error": None}
    except Exception as e:
        print(f"❌ Text generation failed: {e}")
        return {**state, "error": str(e)}

Monitoring & Observability

Key metrics to track:

  1. Generation Metrics:

    • Text generation latency (DeepSeek)
    • TTS latency per chunk (OpenAI)
    • Total pipeline duration
  2. Quality Metrics:

    • Text length vs target duration
    • Chunk count and size distribution
    • Audio segment file sizes
  3. Infrastructure Metrics:

    • R2 upload success rate
    • Database checkpoint writes
    • ffmpeg merge success rate
  4. Cost Metrics:

    • DeepSeek token usage
    • OpenAI TTS character count
    • R2 storage and bandwidth
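
A lightweight way to capture the per-node latency metrics above is a decorator around node functions; a sketch that logs timings (any metrics backend can consume the same data):

# Sketch: time each node and emit a metric line via logging.
import logging
import time
from functools import wraps

logger = logging.getLogger("pipeline.metrics")

def timed_node(name: str):
    def decorator(fn):
        @wraps(fn)
        async def wrapper(state):
            start = time.perf_counter()
            try:
                return await fn(state)
            finally:
                elapsed = time.perf_counter() - start
                logger.info("node=%s duration_s=%.2f", name, elapsed)
        return wrapper
    return decorator

# usage: decorate node functions, e.g. @timed_node("generate_text")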

Scaling Patterns

Horizontal Scaling:

  • FastAPI instances behind load balancer
  • Stateless design (state in Postgres)
  • R2 for distributed storage

Batch Processing:

import asyncio

async def batch_generate_audio(goal_ids: List[int]):
    """Process multiple goals in parallel."""
    tasks = [run_pipeline(build_state(id), f"batch-{id}") for id in goal_ids]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results

Queue-Based Processing:

  • Use background tasks for long-running jobs
  • Celery/Redis for distributed task queue
  • Webhook callbacks for completion notifications
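
A sketch of the queue-based pattern using FastAPI background tasks with a completion webhook; the endpoint path, callback_url parameter, and build_state usage are illustrative assumptions:

# Sketch: fire-and-forget job with an optional completion webhook.
import httpx
from fastapi import BackgroundTasks

async def run_and_notify(state: dict, thread_id: str, callback_url: str | None):
    final_state = await run_pipeline(state, thread_id)
    if callback_url:
        async with httpx.AsyncClient() as http:
            await http.post(callback_url, json={
                "thread_id": thread_id,
                "audio_url": final_state.get("audio_url"),
                "error": final_state.get("error"),
            })

@app.post("/api/audio/generate-async")
async def generate_async(
    req: TextGenerationRequest,
    background_tasks: BackgroundTasks,
    callback_url: str | None = None,
):
    thread_id = f"audio-{req.content_id}"
    background_tasks.add_task(run_and_notify, build_state(req), thread_id, callback_url)
    return {"status": "started", "thread_id": thread_id}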

Performance Optimization

Chunking Optimization

# Optimize chunk size for TTS quality vs API limits
OPTIMAL_CHUNK_SIZE = 3000  # Sweet spot for natural pauses

# Parallel TTS generation (with rate limiting)
async def parallel_tts_generation(chunks: List[str], max_concurrent: int = 3):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def generate_with_limit(chunk, index):
        async with semaphore:
            return await generate_tts_segment(chunk, index)

    tasks = [generate_with_limit(c, i) for i, c in enumerate(chunks)]
    return await asyncio.gather(*tasks)

Caching Strategy

from datetime import datetime, timedelta
from functools import lru_cache

# Cache research results for similar goals
@lru_cache(maxsize=100)
def get_research_for_goal_type(therapeutic_type: str, age: int):
    """Cache research by type + age bracket."""
    return fetch_research(therapeutic_type, age)

# Cache generated text for re-use
async def get_or_generate_text(goal_id: int):
    existing = await db.fetch_story(goal_id)
    if existing and existing.created_at > datetime.now() - timedelta(days=7):
        return existing.text
    return await generate_new_text(goal_id)

Testing Strategy

Unit Tests

def test_chunk_text_respects_limit():
    long_text = "word " * 2000
    chunks = chunk_text(long_text, max_chars=4000)

    for chunk in chunks:
        assert len(chunk) <= 4000

def test_clean_for_tts_removes_markdown():
    text = "# Title\n\n**bold** and `code`"
    cleaned = clean_for_tts(text)
    assert "#" not in cleaned
    assert "**" not in cleaned
    assert "`" not in cleaned

Integration Tests

import pytest

@pytest.mark.asyncio
async def test_full_pipeline():
    state = {
        "goal_id": 1,
        "goal_title": "Test anxiety reduction",
        "therapeutic_goal_type": "anxiety_reduction",
        "age": 8,
        # ... other fields
    }

    result = await run_pipeline(state, "test-thread-1")

    assert result["generated_text"] is not None
    assert len(result["chunks"]) > 0
    assert result["audio_url"] is not None
    assert result["error"] is None

Lessons Learned

1. State Design is Critical

  • Use TypedDict for type safety
  • Keep state flat (avoid deep nesting)
  • Include metadata for debugging (timestamps, IDs)

2. Checkpoint Strategically

  • Not all workflows need checkpointing
  • Audio-only mode: disable checkpoints to avoid schema issues
  • Use thread_id conventions: {workflow}-{entity_id}-{timestamp}
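
That convention is easy to encode in a small helper; for example:

# Sketch: build resumable thread IDs following {workflow}-{entity_id}-{timestamp}.
import time

def make_thread_id(workflow: str, entity_id: int) -> str:
    return f"{workflow}-{entity_id}-{int(time.time())}"

# e.g. make_thread_id("audio", 12345) -> "audio-12345-1718000000"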

3. Error Recovery

  • Graceful degradation (segments work even if merge fails)
  • Fallback strategies (binary concat if ffmpeg unavailable)
  • Preserve partial results (individual segments in R2)

4. Cost Management

  • Monitor token usage (DeepSeek is cost-effective at roughly $0.14 per 1M input tokens and $0.28 per 1M output tokens)
  • OpenAI TTS: $15 per 1M characters
  • R2 storage: $0.015/GB/month (much cheaper than S3)

5. Content Quality

  • Structured frameworks improve consistency
  • Repetition aids retention and comprehension
  • Audience-appropriate language is crucial for engagement

Future Enhancements

1. Multi-Voice Narratives

# Support character dialogue with different voices
voices = {
    "narrator": "cedar",
    "child_character": "nova",
    "parent_character": "marin",
}

2. Emotion-Adaptive TTS

# Adjust voice parameters based on content emotion
def get_tts_params(text: str) -> dict:
    sentiment = analyze_sentiment(text)

    if sentiment == "calm":
        return {"speed": 0.9, "pitch": 0}
    elif sentiment == "energetic":
        return {"speed": 1.1, "pitch": 2}
    return {"speed": 1.0, "pitch": 0}  # neutral default

3. Real-Time Streaming

# Stream audio as it's generated (SSE)
async def stream_audio_generation(goal_id: int):
    async for chunk_url in generate_audio_stream(goal_id):
        yield f"data: {json.dumps({'chunk_url': chunk_url})}\n\n"

4. Multilingual Support

  • Expand beyond Romanian and English
  • Voice selection per language
  • Cultural adaptation of content frameworks

Conclusion

This LangGraph-based TTS architecture demonstrates several key patterns:

  1. Composable Workflows: Three distinct pipelines sharing common components
  2. Conditional Routing: Smart flow control based on state
  3. Durable Execution: PostgreSQL checkpointing for resilience
  4. Streaming Efficiency: Direct-to-disk TTS for memory optimization
  5. Distributed Storage: R2 for globally accessible audio

The system successfully processes 5-30+ minute long-form narratives (up to 7,000+ words), generating research-backed content, converting to high-quality audio, and delivering via CDN—all while maintaining resumability after failures and full observability.

Real-World Performance:

  • 30-minute generation: 12-15 TTS chunks, ~3-5 minutes total processing time
  • Failure recovery: Resume from any checkpoint in <1 second
  • Cost efficiency: $0.02-$0.07 per 30-minute audio (DeepSeek + OpenAI TTS)
  • Throughput: 10+ concurrent jobs on single instance

Key Takeaways for Long-Running Pipelines:

  • LangGraph + Postgres checkpointing is essential for long-form workflows
  • Streaming TTS to disk prevents memory exhaustion on long generations
  • Smart chunking (4K chars) balances API limits with narrative coherence
  • Immediate R2 uploads ensure partial results survive crashes
  • Async architecture enables fire-and-forget long operations
  • Thread-based recovery makes interrupted jobs trivial to resume

The architecture scales to long-form audio generation: audiobooks (10+ hours), comprehensive courses, documentary narration, or serialized storytelling—any use case where reliability and resumability are non-negotiable.


This architecture powers long-form audio generation, combining LangGraph orchestration, OpenAI TTS streaming, and distributed storage for production-ready AI audio systems.

How to Integrate OpenAI TTS with FFmpeg in a FastAPI Service

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

OpenAI offers powerful text-to-speech capabilities, enabling developers to generate spoken audio from raw text. Meanwhile, FFmpeg is the de facto standard tool for audio/video processing—used heavily for tasks like merging audio files, converting formats, and applying filters. Combining these two in a FastAPI application can produce a scalable, production-ready text-to-speech (TTS) workflow that merges and manipulates audio via FFmpeg under the hood.

This article demonstrates how to:

  1. Accept text input through a FastAPI endpoint
  2. Chunk text and use OpenAI to generate MP3 segments
  3. Merge generated segments with FFmpeg (through the pydub interface)
  4. Return or store a final MP3 file, ideal for streamlined TTS pipelines

By the end, you’ll understand how to build a simple but effective text-to-speech microservice that leverages the power of OpenAI and FFmpeg.


1. Why Combine OpenAI and FFmpeg

  • Chunked Processing: Long text might exceed certain API limits or timeouts. Splitting into smaller parts ensures each piece is handled reliably.
  • Post-processing: Merging segments, adding intros or outros, or applying custom filters (such as volume adjustments) becomes trivial with FFmpeg.
  • Scalability: A background task system (like FastAPI’s BackgroundTasks) can handle requests without blocking the main thread.
  • Automation: Minimizes manual involvement—one endpoint can receive text and produce a final merged MP3.

2. FastAPI Endpoint and Background Tasks

Below is the FastAPI code that implements a TTS service using the OpenAI API and pydub (which uses FFmpeg internally). It splits the input text into manageable chunks, generates MP3 files per chunk, then merges them:

import os
import time
import logging
from pathlib import Path

from dotenv import load_dotenv
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from openai import OpenAI
from pydub import AudioSegment

load_dotenv(".env.local")

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)

router = APIRouter()

logging.basicConfig(
    level=logging.DEBUG,  # Set root logger to debug level
    format='%(levelname)s | %(name)s | %(message)s'
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

class AudioRequest(BaseModel):
    input: str

def chunk_text(text: str, chunk_size: int = 4096):
    """
    Generator that yields `text` in chunks of `chunk_size`.
    """
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

@router.post("/speech")
async def generate_speech(request: Request, body: AudioRequest, background_tasks: BackgroundTasks):
"""
Fires off the TTS request in the background (fire-and-forget).
Logs are added to track progress. No zip file is created.
"""
model = "tts-1"
voice = "onyx"

if not body.input:
raise HTTPException(
status_code=400,
detail="Missing required field: input"
)

# Current time for folder naming or logging
timestamp = int(time.time() * 1000)

# Create a folder for storing output
output_folder = Path(".") / f"speech_{timestamp}"
output_folder.mkdir(exist_ok=True)

# Split the input into chunks
chunks = list(chunk_text(body.input, 4096))

# Schedule the actual speech generation in the background
background_tasks.add_task(
generate_audio_files,
chunks=chunks,
output_folder=output_folder,
model=model,
voice=voice,
timestamp=timestamp
)

# Log and return immediately
logger.info(f"Speech generation task started at {timestamp} with {len(chunks)} chunks.")
return JSONResponse({"detail": f"Speech generation started. Timestamp: {timestamp}"})

def generate_audio_files(chunks, output_folder, model, voice, timestamp):
    """
    Generates audio files for each chunk. Runs in the background.
    After all chunks are created, merges them into a single MP3 file.
    """
    try:
        # Generate individual chunk MP3s
        for index, chunk in enumerate(chunks):
            speech_filename = f"speech-chunk-{index + 1}.mp3"
            speech_file_path = output_folder / speech_filename

            logger.info(f"Generating audio for chunk {index + 1}/{len(chunks)}...")

            response = client.audio.speech.create(
                model=model,
                voice=voice,
                input=chunk,
                response_format="mp3",
            )

            response.stream_to_file(speech_file_path)
            logger.info(f"Chunk {index + 1} audio saved to {speech_file_path}")

        # Merge all generated MP3 files into a single file
        logger.info("Merging all audio chunks into one file...")
        merged_audio = AudioSegment.empty()

        def file_index(file_path: Path):
            # Expects file names like 'speech-chunk-1.mp3'
            return int(file_path.stem.split('-')[-1])

        sorted_audio_files = sorted(output_folder.glob("speech-chunk-*.mp3"), key=file_index)
        for audio_file in sorted_audio_files:
            chunk_audio = AudioSegment.from_file(audio_file, format="mp3")
            merged_audio += chunk_audio

        merged_output_file = output_folder / f"speech-merged-{timestamp}.mp3"
        merged_audio.export(merged_output_file, format="mp3")
        logger.info(f"Merged audio saved to {merged_output_file}")

        logger.info(f"All speech chunks generated and merged for timestamp {timestamp}.")
    except Exception as e:
        logger.error(f"OpenAI error (timestamp {timestamp}): {e}")

Key Takeaways

  • AudioRequest model enforces the presence of an input field.
  • chunk_text ensures no chunk exceeds 4096 characters (you can adjust this size).
  • BackgroundTasks offloads the TTS generation so the API can respond promptly.
  • pydub merges MP3 files (which in turn calls FFmpeg).

3. Using FFmpeg Under the Hood

pydub relies on FFmpeg at runtime, so make sure FFmpeg is installed and available in your PATH; otherwise you will get errors when merging or exporting MP3 files. For Linux (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install ffmpeg

For macOS (using Homebrew):

brew install ffmpeg

If you’re on Windows, install FFmpeg from FFmpeg’s official site or use a package manager like chocolatey or scoop.


4. Mermaid JS Diagram

Below is a Mermaid sequence diagram illustrating the workflow:
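
The diagram, sketched in Mermaid syntax from the five steps listed in the explanation that follows:

%% Sketch of the request flow described below (participant names simplified)
sequenceDiagram
    participant User
    participant API as FastAPI
    participant TTS as OpenAI TTS
    participant FF as pydub/FFmpeg

    User->>API: POST /speech (text)
    API-->>User: JSON acknowledgement (timestamp)
    API->>API: schedule background task
    loop for each text chunk
        API->>TTS: audio.speech.create(chunk)
        TTS-->>API: MP3 segment
    end
    API->>FF: merge speech-chunk-*.mp3
    FF-->>API: speech-merged-{timestamp}.mp3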

Explanation:

  1. User sends a POST request with text data.
  2. FastAPI quickly acknowledges the request, then spawns a background task.
  3. Chunks of text are processed via OpenAI TTS, saving individual MP3 files.
  4. pydub merges them (calling FFmpeg behind the scenes).
  5. Final merged file is ready in your output directory.

5. Conclusion

Integrating OpenAI text-to-speech with FFmpeg via pydub in a FastAPI application provides a robust, scalable way to automate TTS pipelines:

  • Reliability: Chunk-based processing handles large inputs without overloading the API.
  • Versatility: FFmpeg’s audio manipulation potential is nearly limitless.
  • Speed: Background tasks ensure the main API remains responsive.

With the sample code above, you can adapt chunk sizes, add authentication, or expand the pipeline to include more sophisticated post-processing (like watermarking, crossfading, or mixing in music). Enjoy building richer audio capabilities into your apps—OpenAI and FFmpeg make a powerful duo.

How to Set Up and Run DeepSeek-R1 Locally With Ollama and FastAPI

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

DeepSeek-R1 is a family of large language models (LLMs) known for advanced natural language capabilities. While hosting an LLM in the cloud can be convenient, local deployment provides greater control over latency, privacy, and resource utilization. Tools like Ollama simplify this process by handling model downloading and quantization. However, to truly scale or integrate these capabilities into other services, you often need a robust REST API layer—FastAPI is perfect for this.

This article covers the entire pipeline:

  1. Installing and configuring Ollama to serve DeepSeek-R1 locally
  2. Interacting with DeepSeek-R1 using the CLI, Python scripts, or a FastAPI endpoint for streaming responses
  3. Demonstrating a minimal FastAPI integration, so you can easily wrap your model in a web service

By the end, you’ll see how to run DeepSeek-R1 locally while benefiting from FastAPI’s scalability, logging, and integration features—all without sending your data to external servers.


1. Why Run DeepSeek-R1 Locally?

Running DeepSeek-R1 on your own machine has multiple advantages:

  • Privacy & Security: No data is sent to third-party services
  • Performance & Low Latency: Local inference avoids remote API calls
  • Customization: Fine-tune or adjust inference parameters as needed
  • No Rate Limits: In-house solution means no usage caps or unexpected cost spikes
  • Offline Availability: Once downloaded, the model runs even without internet access

2. Setting Up DeepSeek-R1 Locally With Ollama

2.1 Installing Ollama

  1. Download Ollama from the official website.
  2. Install it on your machine, just like any application.
Note: Check Ollama's documentation for platform-specific support. It's available on macOS and some Linux distributions.

2.2 Download and Test DeepSeek-R1

Ollama makes model retrieval simple:

ollama run deepseek-r1

This command automatically downloads DeepSeek-R1 (the default variant). If your hardware cannot handle the full 671B-parameter model, specify a smaller distilled version:

ollama run deepseek-r1:7b
Info: DeepSeek-R1 offers different parameter sizes (e.g., 1.5B, 7B, 14B, 70B, 671B) for various hardware setups.

2.3 Running DeepSeek-R1 in the Background

To serve the model continuously (useful for external services like FastAPI):

ollama serve

By default, Ollama listens on http://localhost:11434.


3. Using DeepSeek-R1 Locally

3.1 Command-Line (CLI) Inference

You can chat directly with DeepSeek-R1 in your terminal:

ollama run deepseek-r1

Type a question or prompt; responses stream back in real time.

3.2 Accessing DeepSeek-R1 via API

If you’re building an application, you can call Ollama’s REST API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve: 25 * 25" }],
  "stream": false
}'
Note: Set "stream": true to receive chunked streaming responses—a feature you can integrate easily into web apps or server frameworks like FastAPI.

3.3 Python Integration

Install the ollama Python package:

pip install ollama

Then use:

import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain Newton's second law of motion"},
    ],
)
print(response["message"]["content"])

4. FastAPI Integration and Streaming Responses

To wrap DeepSeek-R1 in a fully customizable FastAPI service, you can define streaming endpoints for advanced usage. Below is an example that sends chunked responses to the client:

import os
import json
from typing import List
from pydantic import BaseModel
from dotenv import load_dotenv
from fastapi import FastAPI, Query
from fastapi.responses import StreamingResponse
from openai import OpenAI

from .utils.prompt import ClientMessage, convert_to_openai_messages
from .utils.tools import get_current_weather  # example tool
from .utils.tools import available_tools  # hypothetical dict of tool funcs

load_dotenv(".env.local")

app = FastAPI()
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1/")

class Request(BaseModel):
    messages: List[ClientMessage]

def stream_text(messages: List[ClientMessage], protocol: str = 'data'):
    stream = client.chat.completions.create(
        messages=messages,
        model="deepseek-r1",
        stream=True,
    )

    if protocol == 'text':
        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    break
                else:
                    yield "{text}".format(text=choice.delta.content)

    elif protocol == 'data':
        draft_tool_calls = []
        draft_tool_calls_index = -1

        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    continue
                elif choice.finish_reason == "tool_calls":
                    for tool_call in draft_tool_calls:
                        yield f'9:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]}}}\n'

                    for tool_call in draft_tool_calls:
                        tool_result = available_tools[tool_call["name"]](**json.loads(tool_call["arguments"]))
                        yield (
                            f'a:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]},'
                            f'"result":{json.dumps(tool_result)}}}\n'
                        )
                elif choice.delta.tool_calls:
                    for tool_call in choice.delta.tool_calls:
                        id = tool_call.id
                        name = tool_call.function.name
                        arguments = tool_call.function.arguments
                        if id is not None:
                            draft_tool_calls_index += 1
                            draft_tool_calls.append({"id": id, "name": name, "arguments": ""})
                        else:
                            draft_tool_calls[draft_tool_calls_index]["arguments"] += arguments
                else:
                    yield f'0:{json.dumps(choice.delta.content)}\n'

            # usage
            if chunk.choices == []:
                usage = chunk.usage
                prompt_tokens = usage.prompt_tokens
                completion_tokens = usage.completion_tokens
                yield (
                    f'd:{{"finishReason":"{"tool-calls" if len(draft_tool_calls) > 0 else "stop"}",'
                    f'"usage":{{"promptTokens":{prompt_tokens},"completionTokens":{completion_tokens}}}}}\n'
                )

@app.post("/api/chat")
async def handle_chat_data(request: Request, protocol: str = Query('data')):
    messages = request.messages
    openai_messages = convert_to_openai_messages(messages)
    response = StreamingResponse(stream_text(openai_messages, protocol))
    response.headers['x-vercel-ai-data-stream'] = 'v1'
    return response

Key Points:

  • stream=True allows the server to stream content chunk by chunk.
  • The code handles optional “tool calls” logic—customizable for your own environment.
  • FastAPI’s StreamingResponse ensures the client receives partial output in real time.

With this setup, you can embed DeepSeek-R1 into more complex microservices or orchestrate multi-step workflows that rely on streaming LLM responses.


5. Conclusion

DeepSeek-R1 combined with Ollama and FastAPI gives you a powerful local LLM service. You can handle all aspects of data ingestion, retrieval, and inference in one place—without relying on third-party endpoints or paying subscription costs. Here’s a recap:

  • Ollama manages downloading and serving the DeepSeek-R1 models.
  • FastAPI provides a flexible web layer for streaming responses or building microservices.

Build your local AI solutions confidently and privately—DeepSeek-R1 is now at your fingertips.

Powering Quant Finance with Qlib’s PyTorch MLP on Alpha360

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Qlib is an AI-oriented, open-source platform from Microsoft that simplifies the entire quantitative finance process. By leveraging PyTorch, Qlib can seamlessly integrate modern neural networks—like Multi-Layer Perceptrons (MLPs)—to process large datasets, engineer alpha factors, and run flexible backtests. In this post, we focus on a PyTorch MLP pipeline for Alpha360 data in the US market, examining a single YAML configuration that unifies data ingestion, model training, and performance evaluation.

Harnessing AI for Quantitative Finance with Qlib and LightGBM

· 6 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the realm of quantitative finance, machine learning and deep learning are revolutionizing how researchers and traders discover alpha, manage portfolios, and adapt to market shifts. Qlib by Microsoft is a powerful open-source framework that merges AI techniques with end-to-end finance workflows.

This article demonstrates how Qlib automates an AI-driven quant workflow—from data ingestion and feature engineering to model training and backtesting—using a single YAML configuration for a LightGBM model. Specifically, we’ll explore the AI-centric aspects of how qrun orchestrates the entire pipeline and highlight best practices for leveraging advanced ML models in your quantitative strategies.

Correct Exchange Mapping in VeighNa to Resolve IB Security Definition Errors

· 14 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the intricate world of algorithmic trading, seamless integration between trading platforms and broker APIs is paramount.

One common issue when interfacing with Interactive Brokers (IB) API is encountering the error:

ERROR:root:Error - ReqId: 1, Code: 200, Message: No security definition has been found for the request

This error typically arises due to incorrect exchange mapping, preventing Interactive Brokers (IB) from recognizing the requested security. This article delves into the importance of accurate exchange mapping within the VeighNa trading platform, provides a detailed overview of IB's symbol rules, explains the updatePortfolio method, and offers guidance on implementing correct mappings to avoid such errors.

Understanding the Sniper Algorithm Implementation in Algorithmic Trading

· 8 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the realm of algorithmic trading, execution algorithms play a pivotal role in optimizing trade orders to minimize market impact and slippage. One such algorithm is the Sniper Algorithm, which is designed to execute trades discreetly and efficiently by capitalizing on favorable market conditions.

This article aims to review and understand the implementation of the Sniper Algorithm as provided in the VeighNa trading platform's open-source repository. By dissecting the code and explaining its components, we hope to provide clarity on how the algorithm functions and how it can be utilized in practical trading scenarios.

Backtesting NVIDIA Stock Strategies on VeighNa - Moving Average Crossover Strategy

· 15 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Backtesting is essential for validating trading strategies, especially in the high-frequency and volatile world of stocks like NVIDIA (NVDA). Using VeighNa, an open-source algorithmic trading system, provides traders with the flexibility to thoroughly test strategies and optimize for performance. In this guide, we'll walk through setting up VeighNa, backtesting a simple Moving Average Crossover strategy on NVIDIA, explaining the strategy in detail, troubleshooting common installation issues, and optimizing your strategy.

Automating Financial Data Collection and Uploading to Hugging Face for Algorithmic Trading

· 6 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the fast-paced world of algorithmic trading, accessing reliable and timely financial data is essential for backtesting strategies, optimizing models, and making data-driven trading decisions. Automating data collection can streamline your workflow and ensure that you have access to the most recent market information. In this guide, we’ll walk through how to automate the collection of stock data using Python and yfinance, and how to upload this data to Hugging Face for convenient access and future use.

Although this article uses NVIDIA stock data as an example, the process is applicable to any publicly traded company or financial instrument. By integrating data collection and storage into one automated pipeline, traders and analysts can focus on what matters most—developing strategies and maximizing returns.

Algorithmic Trading with VeighNa and Interactive Brokers - Installation Guide and Troubleshooting

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Algorithmic trading is transforming the financial landscape, and frameworks like VeighNa combined with Interactive Brokers (IB) offer traders the tools they need to optimize their trading strategies and automate execution across global markets. However, setting up these tools on macOS, particularly on Apple Silicon (M1/M2), can be tricky due to package compatibility issues. This guide will walk you through the installation process of VeighNa with IB on macOS, highlighting all the potential gotchas we encountered, along with their solutions.