Python — Counting rows and columns in a large csv (or txt)
Let's count the number of rows in a large CSV file. Python can do this with a few lines of code:
import csv
import time
filename = "myfile.csv" # your CSV or TXT file
start_time = time.time()
with open(filename, "r", newline="") as file:
    reader = csv.DictReader(file)
    num_rows = sum(1 for row in reader)
end_time = time.time()
execution_time = end_time - start_time
print(f"File {filename} contains {num_rows} rows (not counting the header).")
print(f"Execution time: {execution_time:.4f} seconds.")
We use csv.DictReader here rather than csv.reader. DictReader treats the first line of the CSV file as the list of column headers and yields a dictionary for each following row, with the column headers as keys and the row values as values. Because the first line is consumed as the header, num_rows holds the number of data rows and does not include the header line.
This is the result with an 870 MB file:
File myfile.csv contains 526542 rows (not counting the header).
Execution time: 3.6944 seconds.
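A quick way to check what each reader counts is to run both on a tiny in-memory sample. This is only a sketch: the sample text and the helper names count_with_dictreader and count_with_reader are made up for illustration, and the semicolon delimiter is an assumption.

```python
import csv
import io

# Hypothetical sample: one header line followed by three data rows.
SAMPLE = "id;name\n1;Ann\n2;Bob\n3;Cid\n"


def count_with_dictreader(text, delimiter=";"):
    """Count the rows yielded by csv.DictReader (the header becomes the keys)."""
    return sum(1 for _ in csv.DictReader(io.StringIO(text), delimiter=delimiter))


def count_with_reader(text, delimiter=";"):
    """Count the rows yielded by csv.reader (the header is yielded like any row)."""
    return sum(1 for _ in csv.reader(io.StringIO(text), delimiter=delimiter))


dict_rows = count_with_dictreader(SAMPLE)
plain_rows = count_with_reader(SAMPLE)
print(dict_rows, plain_rows)  # 3 4
```

The difference of one is the header line, so add 1 to the DictReader count if you want the header included.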
Now let's count the number of columns in the same file. Python does the trick again:
import csv
import time
filename = "myfile.csv" # Enter the name of the CSV or TXT file to read here
delimiter = ";" # Enter the field separator used in the CSV file here
start_time = time.time()
with open(filename, "r", newline="") as file:
    reader = csv.DictReader(file, delimiter=delimiter)
    num_rows = 0
    num_cols = len(reader.fieldnames)  # reading fieldnames parses the header line
    for row in reader:
        num_rows += 1
end_time = time.time()
execution_time = end_time - start_time
print(f"The file {filename} contains {num_rows} rows (not counting the header) and {num_cols} columns.")
print(f"Execution time: {execution_time:.4f} seconds.")
Result:
The file myfile.csv contains 526542 rows (not counting the header) and 280 columns.
Execution time: 10.0990 seconds.
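If only the column count is needed, there is no need to walk the whole file, since the header line alone is enough. Here is a minimal sketch, assuming the first line is the header and the delimiter is known; the function name count_columns and the temporary sample file are hypothetical and used only for illustration.

```python
import csv
import os
import tempfile


def count_columns(path, delimiter=";"):
    """Return the number of fields in the first line of the file (0 if empty)."""
    with open(path, "r", newline="") as f:
        # next() with a default avoids StopIteration on an empty file.
        header = next(csv.reader(f, delimiter=delimiter), [])
    return len(header)


# Build a small throwaway file to try the function on.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a;b;c\n1;2;3\n")
    sample_path = tmp.name

num_cols = count_columns(sample_path)
print(num_cols)  # 3
os.remove(sample_path)
```

Because this reads a single line, its running time should not depend on the size of the file.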
Enjoy!!