Skip to main content

Automating Financial Data Collection and Uploading to Hugging Face for Algorithmic Trading

· 6 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

In the fast-paced world of algorithmic trading, accessing reliable and timely financial data is essential for backtesting strategies, optimizing models, and making data-driven trading decisions. Automating data collection can streamline your workflow and ensure that you have access to the most recent market information. In this guide, we’ll walk through how to automate the collection of stock data using Python and yfinance, and how to upload this data to Hugging Face for convenient access and future use.

Although this article uses NVIDIA stock data as an example, the process is applicable to any publicly traded company or financial instrument. By integrating data collection and storage into one automated pipeline, traders and analysts can focus on what matters most—developing strategies and maximizing returns.

Step 1: Collecting Financial Data

The first step in algorithmic trading is collecting historical financial data. In this use case, we will retrieve stock data using yfinance and save it to a CSV file for further analysis. This can be done for any stock, ETF, or index available on Yahoo Finance.

Data Collection Script

Below is the Python script for collecting stock data, which can easily be adapted for any ticker symbol.

import yfinance as yf
import pandas as pd
import os

# Define the stock symbol and date range for data collection
TICKER_SYMBOL = 'NVDA' # Example: NVIDIA
START_DATE = '2020-01-01' # Start date
END_DATE = '2024-09-28' # End date

# Define the file path to save the collected data
OUTPUT_FILE = 'financial_data.csv'

def collect_data(ticker_symbol, start_date, end_date, output_file):
"""Fetch and save stock data using yfinance."""

# Fetch historical market data using yfinance
print(f"Fetching data for {ticker_symbol} from {start_date} to {end_date}...")
stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)

if stock_data.empty:
print("No data fetched. Please check the symbol or date range.")
return

# Save the data to a CSV file
stock_data.to_csv(output_file)
print(f"Data saved to {output_file}")

if __name__ == "__main__":
# Check if the data file already exists
if not os.path.exists(OUTPUT_FILE):
collect_data(TICKER_SYMBOL, START_DATE, END_DATE, OUTPUT_FILE)
else:
print(f"{OUTPUT_FILE} already exists. Delete it if you want to fetch fresh data.")

Explanation

  • yfinance fetches daily OHLCV (open, high, low, close, volume) data for a specific stock symbol between the provided date range.
  • The data is saved in a CSV format to ensure it can be accessed and processed later.
  • This script is designed to avoid overwriting the data unless explicitly requested, preventing redundant API calls.

How to Run the Script

python data_collection_script.py

This script will generate a CSV file containing the financial data, such as stock prices, ready for backtesting and analysis. You can change the TICKER_SYMBOL to fetch data for different stocks, ETFs, or other financial instruments.

Step 2: Uploading Data to Hugging Face

Once you have your dataset, storing it in a shared location like Hugging Face is ideal for easy access and collaboration. Hugging Face's dataset repositories allow you to upload and share your data publicly or privately.

Upload Script

This Python script automates the process of uploading your data to Hugging Face.

import os
from dotenv import load_dotenv
from huggingface_hub import HfApi

# Load Hugging Face token from the environment variable
load_dotenv(".env.local")
huggingface_token = os.getenv("HUGGINGFACE_TOKEN")

# Instantiate the Hugging Face API
api = HfApi()

# Repository information
repo_name = "financial-data-collection" # Example repository name
username = "your_username" # Your Hugging Face username

# Create the repository (if it does not already exist)
try:
api.create_repo(repo_id=f"{username}/{repo_name}", repo_type="dataset", exist_ok=True)
print(f"Repository {repo_name} created successfully or already exists.")
except Exception as e:
print(f"Error creating repository: {e}")

# Define the local path to the data file
data_file = "financial_data.csv" # Local data file

# Upload the file to Hugging Face repository
try:
api.upload_file(
path_or_fileobj=data_file, # Path to your dataset file
path_in_repo="financial_data.csv", # Name of the file in the repository
repo_id=f"{username}/{repo_name}", # Repository ID on Hugging Face
repo_type="dataset", # Type of repository
token=huggingface_token # Hugging Face token for authentication
)
print(f"File {data_file} uploaded successfully.")
except Exception as e:
print(f"Error uploading file: {e}")

How It Works

  • The dotenv library loads the Hugging Face token from a .env.local file to securely authenticate API requests.
  • The script creates a repository if one doesn’t exist, making it easier to manage datasets.
  • Finally, the stock data CSV is uploaded to Hugging Face, where it is publicly accessible or private depending on your repository settings.

Running the Upload Script

Ensure your .env.local file contains the Hugging Face token:

HUGGINGFACE_TOKEN=your_huggingface_token

Then, run the script:

python huggingface_upload.py

This will upload your financial dataset to Hugging Face, where it can be shared with collaborators or accessed for further analysis.

Why Automate Financial Data Collection?

Automating the collection and storage of financial data allows for:

  1. Up-to-date Data: By automating this process, traders and analysts can ensure that they always have access to the latest stock market data, without manually pulling it.
  2. Efficient Backtesting: Having a reliable source of historical data enables traders to backtest and refine their trading strategies based on real-world performance.
  3. Collaboration: Sharing datasets via Hugging Face allows multiple stakeholders to access the same data, making collaboration on algorithmic trading strategies seamless.

In algorithmic trading, the ability to access and utilize real-time and historical financial data can significantly impact the performance of a strategy. Automating this pipeline eliminates manual intervention and ensures your data is always up-to-date and readily available.

Conclusion

Building a pipeline to automate the collection and upload of financial data is essential for any serious algorithmic trading strategy. This guide demonstrated how to fetch stock data using Python's yfinance library, and then upload it to Hugging Face using their API. By leveraging these tools, traders can streamline their data collection processes, allowing them to focus on refining and optimizing their trading strategies.

With this automated pipeline in place, you can collect data on any financial instrument, making it easy to update datasets regularly, share them with your team, and ensure that your models are trained and tested with the most accurate and current information.


This article was inspired by insights shared in a post titled Collecting Data for Backtesting Your Trading Strategies. It emphasizes the importance of high-quality data for systematic trading, a core principle that remains vital in the realm of algorithmic finance.