
How to Integrate OpenAI TTS with FFmpeg in a FastAPI Service

5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

OpenAI offers powerful text-to-speech capabilities, enabling developers to generate spoken audio from raw text. Meanwhile, FFmpeg is the de facto standard tool for audio/video processing—used heavily for tasks like merging audio files, converting formats, and applying filters. Combining these two in a FastAPI application can produce a scalable, production-ready text-to-speech (TTS) workflow that merges and manipulates audio via FFmpeg under the hood.

This article demonstrates how to:

  1. Accept text input through a FastAPI endpoint
  2. Chunk text and use OpenAI to generate MP3 segments
  3. Merge generated segments with FFmpeg (through the pydub interface)
  4. Return or store a final MP3 file, ideal for streamlined TTS pipelines

By the end, you’ll understand how to build a simple but effective text-to-speech microservice that leverages the power of OpenAI and FFmpeg.


1. Why Combine OpenAI and FFmpeg?

  • Chunked Processing: Long text might exceed certain API limits or timeouts. Splitting into smaller parts ensures each piece is handled reliably.
  • Post-processing: Merging segments, adding intros or outros, or applying custom filters (such as volume adjustments) becomes trivial with FFmpeg.
  • Scalability: A background task system (like FastAPI’s BackgroundTasks) can handle requests without blocking the main thread.
  • Automation: Minimizes manual involvement—one endpoint can receive text and produce a final merged MP3.

2. FastAPI Endpoint and Background Tasks

Below is the FastAPI code that implements a TTS service using the OpenAI API and pydub (which uses FFmpeg internally). It splits the input text into manageable chunks, generates MP3 files per chunk, then merges them:

import os
import time
import logging
from pathlib import Path

from dotenv import load_dotenv
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from openai import OpenAI
from pydub import AudioSegment

load_dotenv(".env.local")

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)

router = APIRouter()

logging.basicConfig(
    level=logging.DEBUG,  # Set root logger to debug level
    format='%(levelname)s | %(name)s | %(message)s'
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)


class AudioRequest(BaseModel):
    input: str


def chunk_text(text: str, chunk_size: int = 4096):
    """
    Generator that yields `text` in chunks of `chunk_size`.
    """
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


@router.post("/speech")
async def generate_speech(request: Request, body: AudioRequest, background_tasks: BackgroundTasks):
    """
    Fires off the TTS request in the background (fire-and-forget).
    Logs are added to track progress. No zip file is created.
    """
    model = "tts-1"
    voice = "onyx"

    if not body.input:
        raise HTTPException(
            status_code=400,
            detail="Missing required field: input"
        )

    # Current time for folder naming or logging
    timestamp = int(time.time() * 1000)

    # Create a folder for storing output
    output_folder = Path(".") / f"speech_{timestamp}"
    output_folder.mkdir(exist_ok=True)

    # Split the input into chunks
    chunks = list(chunk_text(body.input, 4096))

    # Schedule the actual speech generation in the background
    background_tasks.add_task(
        generate_audio_files,
        chunks=chunks,
        output_folder=output_folder,
        model=model,
        voice=voice,
        timestamp=timestamp
    )

    # Log and return immediately
    logger.info(f"Speech generation task started at {timestamp} with {len(chunks)} chunks.")
    return JSONResponse({"detail": f"Speech generation started. Timestamp: {timestamp}"})


def generate_audio_files(chunks, output_folder, model, voice, timestamp):
    """
    Generates audio files for each chunk. Runs in the background.
    After all chunks are created, merges them into a single MP3 file.
    """
    try:
        # Generate individual chunk MP3s
        for index, chunk in enumerate(chunks):
            speech_filename = f"speech-chunk-{index + 1}.mp3"
            speech_file_path = output_folder / speech_filename

            logger.info(f"Generating audio for chunk {index + 1}/{len(chunks)}...")

            response = client.audio.speech.create(
                model=model,
                voice=voice,
                input=chunk,
                response_format="mp3",
            )

            response.stream_to_file(speech_file_path)
            logger.info(f"Chunk {index + 1} audio saved to {speech_file_path}")

        # Merge all generated MP3 files into a single file
        logger.info("Merging all audio chunks into one file...")
        merged_audio = AudioSegment.empty()

        def file_index(file_path: Path):
            # Expects file names like 'speech-chunk-1.mp3'
            return int(file_path.stem.split('-')[-1])

        sorted_audio_files = sorted(output_folder.glob("speech-chunk-*.mp3"), key=file_index)
        for audio_file in sorted_audio_files:
            chunk_audio = AudioSegment.from_file(audio_file, format="mp3")
            merged_audio += chunk_audio

        merged_output_file = output_folder / f"speech-merged-{timestamp}.mp3"
        merged_audio.export(merged_output_file, format="mp3")
        logger.info(f"Merged audio saved to {merged_output_file}")

        logger.info(f"All speech chunks generated and merged for timestamp {timestamp}.")
    except Exception as e:
        logger.error(f"OpenAI error (timestamp {timestamp}): {e}")

Key Takeaways

  • AudioRequest model enforces the presence of an input field.
  • chunk_text ensures no chunk exceeds 4096 characters (you can adjust this size).
  • BackgroundTasks offloads the TTS generation so the API can respond promptly.
  • pydub merges MP3 files (which in turn calls FFmpeg).
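
You can exercise the endpoint with a plain JSON POST. The sketch below uses the requests library and assumes the router is mounted at the application root and served by uvicorn on port 8000; adjust the URL for your deployment:

import requests

# Assumes the router is mounted at the app root and the service runs
# locally via uvicorn on the default port 8000.
resp = requests.post(
    "http://localhost:8000/speech",
    json={"input": "Hello from OpenAI TTS, merged with FFmpeg."},
)
resp.raise_for_status()
print(resp.json())  # {"detail": "Speech generation started. Timestamp: ..."}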

3. Using FFmpeg Under the Hood

pydub relies on FFmpeg being installed on your system. Ensure FFmpeg is on your PATH; otherwise you’ll get errors when merging or exporting MP3 files. For Linux (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install ffmpeg

For macOS (using Homebrew):

brew install ffmpeg

If you’re on Windows, install FFmpeg from FFmpeg’s official site or use a package manager like Chocolatey or Scoop.
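
To confirm that pydub can actually locate FFmpeg, you can query it from Python. The sketch below uses pydub’s which helper; the explicit converter path is only an example for setups where FFmpeg is installed outside your PATH:

from pydub import AudioSegment
from pydub.utils import which

# Prints the ffmpeg executable pydub will use, or None if it cannot be found.
print(which("ffmpeg"))

# If FFmpeg lives outside your PATH, point pydub at it explicitly
# (example path only; adjust for your system).
# AudioSegment.converter = "/usr/local/bin/ffmpeg"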


4. Mermaid JS Diagram

Below is a Mermaid sequence diagram illustrating the workflow:
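
sequenceDiagram
    participant User
    participant API as FastAPI endpoint
    participant Task as Background task
    participant OpenAI as OpenAI TTS
    participant Pydub as pydub + FFmpeg

    User->>API: POST /speech with text
    API-->>User: JSON acknowledgement (timestamp)
    API->>Task: schedule generate_audio_files
    loop for each text chunk
        Task->>OpenAI: audio.speech.create(chunk)
        OpenAI-->>Task: MP3 segment
    end
    Task->>Pydub: merge MP3 segments
    Pydub-->>Task: merged MP3 in output folder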

Explanation:

  1. User sends a POST request with text data.
  2. FastAPI quickly acknowledges the request, then spawns a background task.
  3. Chunks of text are processed via OpenAI TTS, saving individual MP3 files.
  4. pydub merges them (calling FFmpeg behind the scenes).
  5. Final merged file is ready in your output directory.

5. Conclusion

Integrating OpenAI text-to-speech with FFmpeg via pydub in a FastAPI application provides a robust, scalable way to automate TTS pipelines:

  • Reliability: Chunk-based processing handles large inputs without overloading the API.
  • Versatility: FFmpeg’s audio manipulation potential is nearly limitless.
  • Speed: Background tasks ensure the main API remains responsive.

With the sample code above, you can adapt chunk sizes, add authentication, or expand the pipeline to include more sophisticated post-processing (like watermarking, crossfading, or mixing in music). Enjoy building richer audio capabilities into your apps—OpenAI and FFmpeg make a powerful duo.

How to Set Up and Run DeepSeek-R1 Locally With Ollama and FastAPI

5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

DeepSeek-R1 is a family of large language models (LLMs) known for advanced natural language capabilities. While hosting an LLM in the cloud can be convenient, local deployment provides greater control over latency, privacy, and resource utilization. Tools like Ollama simplify this process by handling model downloading and quantization. However, to truly scale or integrate these capabilities into other services, you often need a robust REST API layer—FastAPI is perfect for this.

This article covers the entire pipeline:

  1. Installing and configuring Ollama to serve DeepSeek-R1 locally
  2. Interacting with DeepSeek-R1 using the CLI, Python scripts, or a FastAPI endpoint for streaming responses
  3. Demonstrating a minimal FastAPI integration, so you can easily wrap your model in a web service

By the end, you’ll see how to run DeepSeek-R1 locally while benefiting from FastAPI’s scalability, logging, and integration features—all without sending your data to external servers.


1. Why Run DeepSeek-R1 Locally?

Running DeepSeek-R1 on your own machine has multiple advantages:

  • Privacy & Security: No data is sent to third-party services
  • Performance & Low Latency: Local inference avoids remote API calls
  • Customization: Fine-tune or adjust inference parameters as needed
  • No Rate Limits: In-house solution means no usage caps or unexpected cost spikes
  • Offline Availability: Once downloaded, the model runs even without internet access

2. Setting Up DeepSeek-R1 Locally With Ollama

2.1 Installing Ollama

  1. Download Ollama from the official website.
  2. Install it on your machine, just like any application.

Note: Check Ollama’s documentation for platform-specific support. It’s available on macOS and some Linux distributions.

2.2 Download and Test DeepSeek-R1

Ollama makes model retrieval simple:

ollama run deepseek-r1

This command automatically downloads DeepSeek-R1 (the default variant). If your hardware cannot handle the full 671B-parameter model, specify a smaller distilled version:

ollama run deepseek-r1:7b

Info: DeepSeek-R1 offers different parameter sizes (e.g., 1.5B, 7B, 14B, 70B, 671B) for various hardware setups.

2.3 Running DeepSeek-R1 in the Background

To serve the model continuously (useful for external services like FastAPI):

ollama serve

By default, Ollama listens on http://localhost:11434.
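
To check that the server is reachable from code, you can hit Ollama’s REST API directly. The sketch below uses the requests library and the /api/tags route, which lists the models available locally:

import requests

# Lists locally available models; a 200 response means the Ollama server is up.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])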


3. Using DeepSeek-R1 Locally

3.1 Command-Line (CLI) Inference

You can chat directly with DeepSeek-R1 in your terminal:

ollama run deepseek-r1

Type a question or prompt; responses stream back in real time.

3.2 Accessing DeepSeek-R1 via API

If you’re building an application, you can call Ollama’s REST API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve: 25 * 25" }],
  "stream": false
}'

Note: Set "stream": true to receive chunked streaming responses, a feature you can integrate easily into web apps or server frameworks like FastAPI.

3.3 Python Integration

Install the ollama Python package:

pip install ollama

Then use:

import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain Newton's second law of motion"},
    ],
)
print(response["message"]["content"])
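
The same call also supports streaming: pass stream=True and ollama.chat yields partial messages you can print as they arrive. A minimal sketch:

import ollama

stream = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries the next piece of the assistant's reply.
    print(chunk["message"]["content"], end="", flush=True)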

4. FastAPI Integration and Streaming Responses

To wrap DeepSeek-R1 in a fully customizable FastAPI service, you can define streaming endpoints for advanced usage. Below is an example that sends chunked responses to the client:

import os
import json
from typing import List
from pydantic import BaseModel
from dotenv import load_dotenv
from fastapi import FastAPI, Query
from fastapi.responses import StreamingResponse
from openai import OpenAI

from .utils.prompt import ClientMessage, convert_to_openai_messages
from .utils.tools import get_current_weather # example tool
from .utils.tools import available_tools # hypothetical dict of tool funcs

load_dotenv(".env.local")

app = FastAPI()
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1/")

class Request(BaseModel):
    messages: List[ClientMessage]


def stream_text(messages: List[ClientMessage], protocol: str = 'data'):
    stream = client.chat.completions.create(
        messages=messages,
        model="deepseek-r1",
        stream=True,
    )

    if protocol == 'text':
        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    break
                else:
                    yield "{text}".format(text=choice.delta.content)

    elif protocol == 'data':
        draft_tool_calls = []
        draft_tool_calls_index = -1

        for chunk in stream:
            for choice in chunk.choices:
                if choice.finish_reason == "stop":
                    continue
                elif choice.finish_reason == "tool_calls":
                    for tool_call in draft_tool_calls:
                        yield f'9:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]}}}\n'

                    for tool_call in draft_tool_calls:
                        tool_result = available_tools[tool_call["name"]](**json.loads(tool_call["arguments"]))
                        yield (
                            f'a:{{"toolCallId":"{tool_call["id"]}","toolName":"{tool_call["name"]}","args":{tool_call["arguments"]},'
                            f'"result":{json.dumps(tool_result)}}}\n'
                        )
                elif choice.delta.tool_calls:
                    for tool_call in choice.delta.tool_calls:
                        id = tool_call.id
                        name = tool_call.function.name
                        arguments = tool_call.function.arguments
                        if id is not None:
                            draft_tool_calls_index += 1
                            draft_tool_calls.append({"id": id, "name": name, "arguments": ""})
                        else:
                            draft_tool_calls[draft_tool_calls_index]["arguments"] += arguments
                else:
                    yield f'0:{json.dumps(choice.delta.content)}\n'

            # usage
            if chunk.choices == []:
                usage = chunk.usage
                prompt_tokens = usage.prompt_tokens
                completion_tokens = usage.completion_tokens
                yield (
                    f'd:{{"finishReason":"{"tool-calls" if len(draft_tool_calls) > 0 else "stop"}",'
                    f'"usage":{{"promptTokens":{prompt_tokens},"completionTokens":{completion_tokens}}}}}\n'
                )


@app.post("/api/chat")
async def handle_chat_data(request: Request, protocol: str = Query('data')):
    messages = request.messages
    openai_messages = convert_to_openai_messages(messages)
    response = StreamingResponse(stream_text(openai_messages, protocol))
    response.headers['x-vercel-ai-data-stream'] = 'v1'
    return response

Key Points:

  • stream=True allows the server to stream content chunk by chunk.
  • The code handles optional “tool calls” logic—customizable for your own environment.
  • FastAPI’s StreamingResponse ensures the client receives partial output in real time.

With this setup, you can embed DeepSeek-R1 into more complex microservices or orchestrate multi-step workflows that rely on streaming LLM responses.
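
To consume the endpoint from a client, a streamed POST is enough. The sketch below uses the requests library and assumes the service runs on localhost:8000 and that ClientMessage accepts simple role/content objects; adjust both for your setup:

import requests

# Assumes the FastAPI app runs locally on port 8000 and that ClientMessage
# validates plain {"role": ..., "content": ...} objects.
payload = {"messages": [{"role": "user", "content": "Solve: 25 * 25"}]}

with requests.post(
    "http://localhost:8000/api/chat",
    json=payload,
    params={"protocol": "text"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for piece in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(piece, end="", flush=True)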


5. Conclusion

DeepSeek-R1 combined with Ollama and FastAPI gives you a powerful local LLM service. You can handle all aspects of data ingestion, retrieval, and inference in one place—without relying on third-party endpoints or paying subscription costs. Here’s a recap:

  • Ollama manages downloading and serving the DeepSeek-R1 models.
  • FastAPI provides a flexible web layer for streaming responses or building microservices.

Build your local AI solutions confidently and privately—DeepSeek-R1 is now at your fingertips.