Polars — A Pandas alternative — load data to MySQL

giulio isernia
2 min readJan 9, 2024


Polars is a fast DataFrame library for Python designed to handle large datasets with ease. It supports the usual data-manipulation operations — filtering, grouping, aggregating, and joining — and integrates well with the rest of the Python data science ecosystem. Its performance comes from a native Rust implementation that enables parallel and vectorized processing, which makes it a strong choice when datasets grow beyond what Pandas handles comfortably.

To install the Polars library for Python, you can use the pip package manager. Open your terminal or command prompt and execute the following command:

pip install polars

This will download and install the latest stable version of Polars in your Python environment. Once the installation is complete, you can start using the library in your Python projects. If you are working within a virtual environment, make sure to activate the virtual environment before running the installation command.

Note: Ensure that your Python environment is properly configured and has internet access to download the package from PyPI.

Now, let's try writing some code with our library to test the loading into our database. One caveat first: Polars has no `write.mysql` method. The supported route is `DataFrame.write_database`, which delegates to SQLAlchemy, so you also need a MySQL driver such as PyMySQL installed (`pip install sqlalchemy pymysql`). Note that the `if_table_exists` parameter was named `if_exists` in Polars releases before 0.20, so adjust it to match your version.

import polars as pl
import time
import logging

# Logger configuration
logging.basicConfig(filename='load_data_log.log', level=logging.INFO)

# Function to read the CSV file using Polars
def read_csv(file_path):
    start_time = time.time()
    df = pl.read_csv(file_path)
    end_time = time.time()
    logging.info(f"File reading time: {end_time - start_time} seconds")
    return df

# Function to load data into MySQL using Polars' write_database
def load_to_mysql(dataframe, table_name, connection_params):
    start_time = time.time()

    # Build a SQLAlchemy-style connection URI for MySQL
    # (requires the sqlalchemy and pymysql packages)
    uri = (
        f"mysql+pymysql://{connection_params['user']}:{connection_params['password']}"
        f"@{connection_params['host']}/{connection_params['database']}"
    )

    # write_database creates the table and, with 'replace',
    # drops and recreates it if it already exists
    dataframe.write_database(
        table_name,
        uri,
        if_table_exists='replace'
    )

    end_time = time.time()
    logging.info(f"Database loading time: {end_time - start_time} seconds")

# Settings
file_path = 'path/to/your/file.csv'
table_name = 'your_table_name'
mysql_params = {
    'host': 'your_mysql_host',
    'user': 'your_mysql_user',
    'password': 'your_mysql_password',
    'database': 'your_mysql_database'
}

# Execute the operations
data = read_csv(file_path)
load_to_mysql(data, table_name, mysql_params)

Make sure to replace the placeholders ('your_...') with your actual information.

In a future post, we will compare the file-loading process in Pandas and Polars in detail, measuring their respective performance to get a clear picture of the differences and similarities between the two libraries.
