Python Tutorials

Python CSV & JSON : how to parse, read, and write CSV and JSON files in Python

You have learned how to read and write basic text files, but what happens when you need to process a massive spreadsheet exported from Excel? Or what if you want to pull live stock market data from a web API?

In the real world, data rarely comes as a simple, unstructured paragraph of text. It comes highly structured. The two undisputed kings of structured data formats are CSV (used heavily by businesses, Excel, and databases) and JSON (used universally by web servers, mobile apps, and APIs).

To become a professional Python developer or data scientist, you must know how to seamlessly extract, manipulate, and save data using these two formats. Fortunately, Python provides incredibly powerful, built-in tools to make this seamless.

Working with CSV & JSON Data

  • CSV (Comma-Separated Values): A plain text file format that uses a comma to separate values. It is the universal standard for representing tabular data (rows and columns) like spreadsheets. Handled via Python’s built-in csv module.
  • JSON (JavaScript Object Notation): A lightweight, human-readable format for storing and transporting nested data. It looks nearly identical to a Python Dictionary. Handled via Python’s built-in json module.

Syntax & Basic Usage

Because csv and json are part of the Python Standard Library, you do not need to install anything. You simply import them at the top of your script and combine them with the with open() context manager we learned previously.

import csv
import json

print("CSV and JSON modules successfully imported and ready to use!")

# Expected Output:
# CSV and JSON modules successfully imported and ready to use!

Code language: PHP (php)

Python CSV and JSON Methods and Function Arguments

Let’s explore every major method used to parse, read, and write both CSV and JSON data. We will start with CSVs (spreadsheets) and move on to JSON (web data).

1. Writing to a CSV (csv.writer)

To create a spreadsheet, we open a file in write ('w') mode and use csv.writer(). The writer takes a list of data and formats it perfectly with commas.

import csv

# We use newline='' to prevent Windows from adding blank rows between entries
with open("employees.csv", "w", newline="", encoding="utf-8") as csv_file:
    # 1. Create a writer object
    data_writer = csv.writer(csv_file)
    
    # 2. Write the header row
    data_writer.writerow(["Name", "Department", "Salary"])
    
    # 3. Write multiple rows of data at once using writerows()
    employee_data = [
        ["Alice", "Engineering", 85000],
        ["Bob", "Sales", 60000],
        ["Charlie", "HR", 55000]
    ]
    data_writer.writerows(employee_data)

print("CSV file created successfully!")

# Expected Output:
# CSV file created successfully!
# (A file named 'employees.csv' now exists with structured spreadsheet data)

Code language: PHP (php)

2. Reading a CSV (csv.reader)

To read a standard CSV file, we use csv.reader(). This converts every row of the spreadsheet into a standard Python List.

import csv

with open("employees.csv", "r", encoding="utf-8") as csv_file:
    # 1. Create a reader object
    data_reader = csv.reader(csv_file)
    
    # 2. Iterate through every row
    for row in data_reader:
        print(row)

# Expected Output:
# ['Name', 'Department', 'Salary']
# ['Alice', 'Engineering', '85000']
# ['Bob', 'Sales', '60000']
# ['Charlie', 'HR', '55000']

Code language: PHP (php)

3. The Pythonic Way: csv.DictReader and csv.DictWriter

Working with lists (e.g., row[2] to get the salary) is error-prone. What if the columns change order? Professional developers prefer DictReader and DictWriter. These treat the first row of your CSV as “keys” and turn every subsequent row into a Dictionary!

import csv

# Reading the CSV as Dictionaries
with open("employees.csv", "r", encoding="utf-8") as csv_file:
    # DictReader automatically uses the first row as the dictionary keys
    dictionary_reader = csv.DictReader(csv_file)
    
    for row in dictionary_reader:
        # We can now extract data using clean, readable column names!
        name = row["Name"]
        salary = row["Salary"]
        print(f"{name} earns ${salary}.")

# Expected Output:
# Alice earns $85000.
# Bob earns $60000.
# Charlie earns $55000.

Code language: PHP (php)

4. Parsing JSON Strings (json.loads and json.dumps)

JSON data frequently arrives from the web as a massive, continuous string of text.

  • json.loads() (Load String): Converts a JSON text string into a Python Dictionary.
  • json.dumps() (Dump String): Converts a Python Dictionary back into a JSON text string.
import json

# Simulated JSON string response from a weather API
api_response_string = '{"city": "Seattle", "temperature": 68, "is_raining": false}'

# 1. Parse the string into a usable Python Dictionary (json.loads)
weather_data = json.loads(api_response_string)

print(f"City: {weather_data['city']}")
print(f"Type of weather_data: {type(weather_data)}")

# 2. Convert a dictionary back into a JSON string (json.dumps)
# We use 'indent=4' to make the string beautifully formatted and readable
formatted_json_string = json.dumps(weather_data, indent=4)

print("\n--- Formatted JSON String ---")
print(formatted_json_string)

# Expected Output:
# City: Seattle
# Type of weather_data: <class 'dict'>
#
# --- Formatted JSON String ---
# {
#     "city": "Seattle",
#     "temperature": 68,
#     "is_raining": false
# }

Code language: PHP (php)

5. Writing and Reading JSON Files (json.dump and json.load)

If you want to save or load JSON directly to/from a file (instead of a string), you drop the “s” from the method name. Use json.dump() and json.load().

import json

game_settings = {
    "volume": 75,
    "difficulty": "Hard",
    "subtitles_enabled": True
}

# 1. Save the dictionary directly to a physical JSON file
with open("config.json", "w", encoding="utf-8") as json_file:
    json.dump(game_settings, json_file, indent=4)

# 2. Read the physical JSON file back into a dictionary
with open("config.json", "r", encoding="utf-8") as json_file:
    loaded_settings = json.load(json_file)

print(f"Loaded difficulty setting: {loaded_settings['difficulty']}")

# Expected Output:
# Loaded difficulty setting: Hard
# (A file named 'config.json' was created and read successfully)

Code language: PHP (php)

Real-World Practical Examples

Scenario 1: Processing an E-Commerce Spreadsheet (CSV)

Imagine we receive a daily CSV file of sales, and we need to write a script that calculates the total revenue from all “Completed” orders.

import csv

# Step 1: Create dummy sales data for our example
with open("sales.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Order_ID", "Status", "Amount"])
    writer.writerow(["1001", "Completed", "45.50"])
    writer.writerow(["1002", "Pending", "120.00"])
    writer.writerow(["1003", "Completed", "85.25"])

# Step 2: The actual processing script
total_revenue = 0.0

with open("sales.csv", "r", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    
    for order in reader:
        # Check if the status is completed
        if order["Status"] == "Completed":
            # CSV data is ALWAYS extracted as strings. We must convert Amount to a float!
            order_value = float(order["Amount"])
            total_revenue += order_value

print(f"Total revenue from completed orders: ${total_revenue:.2f}")

# Expected Output:
# Total revenue from completed orders: $130.75

Code language: PHP (php)

Scenario 2: Parsing Complex API Data (JSON)

APIs usually return heavily nested JSON data. Here, we simulate fetching a list of users from a web server and extracting specific nested data.

import json

# Simulated complex JSON response from a web server
server_response = """
{
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "skills": ["Python", "SQL"]},
            {"id": 2, "name": "Bob", "skills": ["Design", "CSS"]},
            {"id": 3, "name": "Charlie", "skills": ["Python", "AWS"]}
        ]
    }
}
"""

# Parse the JSON string into a Python dictionary
parsed_response = json.loads(server_response)

# Ensure the server returned a success status before processing
if parsed_response.get("status") == "success":
    # Drill down through the nested dictionaries to get the user list
    user_list = parsed_response["data"]["users"]
    
    print("--- Users with Python Skills ---")
    for user in user_list:
        # Check if 'Python' is inside the user's skills list
        if "Python" in user["skills"]:
            print(f"Match found: {user['name']} (ID: {user['id']})")

# Expected Output:
# --- Users with Python Skills ---
# Match found: Alice (ID: 1)
# Match found: Charlie (ID: 3)

Code language: PHP (php)

Best Practices & Common Pitfalls

  • The newline='' Pitfall (CSV): When writing CSV files on Windows, Python will accidentally insert a blank, empty row between every single data row. Always include newline="" inside your open() function when writing CSVs to prevent this formatting bug.
  • Strings vs. Numbers (CSV): A CSV file is literally just a text file. When you extract a number using csv.reader, Python pulls it out as a String (e.g., "85000"). If you want to do math with it, you must cast it to an integer or float first (int(row["Salary"])). JSON, however, preserves data types natively (booleans remain booleans, numbers remain numbers).
  • The load vs loads Confusion: The biggest mistake beginners make with JSON is using the wrong function.
    • If you are working with a string variable, use load**s**() and dump**s**(). (The ‘s’ stands for string).
    • If you are working with a physical file object directly, use load() and dump().
  • Use indent=4 for Readability: When saving JSON data using dumps() or dump(), always pass the indent=4 parameter. Without it, your JSON file will be saved as one massive, unreadable, single-line block of text.

Summary

  • CSV files are used for tabular spreadsheet data. Use the csv module to process them.
  • Prefer csv.DictReader and csv.DictWriter to process CSV rows as dictionaries, making your code immune to column-order changes.
  • JSON files are used for structured, nested web data. Use the json module to process them.
  • Use json.loads() to convert a JSON text string into a Python dictionary, and json.dumps() to convert a dictionary back into a JSON string.
  • Use json.load() and json.dump() when reading from or saving directly to physical files.

Leave a Comment