Predicted Aligned Error (PAE)

Tutorial to Generate Predicted Aligned Error (PAE) Plots from AlphaFold Output Files

What is PAE?

The Predicted Aligned Error (PAE) is an AlphaFold output that estimates, for every pair of residues, the expected position error (in Å) of one residue when the predicted and true structures are aligned on the other. Unlike pLDDT, which scores the local confidence of each residue individually, the PAE matrix reports how reliably residues are positioned relative to one another. This makes it particularly useful for assessing domain packing, identifying flexibly linked regions, and judging inter-domain orientations in multi-domain proteins and complexes.
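To make the contrast with pLDDT concrete, here is a minimal sketch that builds a synthetic PAE-like matrix for a hypothetical two-domain protein. The function name toy_two_domain_pae and the error values (3 Å within a domain, 25 Å between domains) are illustrative assumptions, not AlphaFold output. The point is only the pattern: two confidently folded domains whose relative orientation is uncertain give low-error blocks along the diagonal and high-error blocks off the diagonal.

```python
import numpy as np

def toy_two_domain_pae(len_a, len_b, within=3.0, between=25.0):
    """Build a synthetic PAE-like matrix for two rigid domains.

    All values are made up for illustration; real PAE matrices come
    from the predicted_aligned_error entry of an AlphaFold result file.
    """
    n = len_a + len_b
    # Start with the high inter-domain error everywhere ...
    pae = np.full((n, n), between, dtype=float)
    # ... then overwrite the two intra-domain (diagonal) blocks with low error.
    pae[:len_a, :len_a] = within
    pae[len_a:, len_a:] = within
    return pae

toy = toy_two_domain_pae(len_a=3, len_b=2)
print(toy)
```

Plotting such a matrix as a heatmap gives two low-error squares on the diagonal, which is the signature usually read as "two well-predicted domains, uncertain relative placement".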

How AlphaFold Produces .pkl Files

When AlphaFold predicts protein structures, it outputs several result files, including .pkl files (e.g., result_model_1_pred_0.pkl). These files contain a nested dictionary of various data arrays, including:

• plddt: Per-residue confidence scores (0–100).

• distogram: Predicted pairwise distance distributions (histogram bin edges and logits).

• predicted_aligned_error: A matrix of pairwise alignment errors, essential for PAE visualization (only generated when using pTM models like monomer_ptm or multimer).

The PAE matrix is stored in the predicted_aligned_error key of the .pkl file and can be visualized to evaluate confidence in relative residue positioning.
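Before plotting, it can help to check what a given result file actually contains. The sketch below is one possible way to do that; the helper name summarize_result is hypothetical, and it assumes the pickle holds a dictionary of arrays and nested dictionaries as described above.

```python
import pickle
import numpy as np

def summarize_result(pkl_path):
    """Return {key: array shape, or a listing of nested keys} for a result pickle."""
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    summary = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested entries (for example the distogram) are listed by their keys.
            summary[key] = "dict with keys: " + ", ".join(sorted(value))
        else:
            # np.shape returns () for scalars and the array shape otherwise.
            summary[key] = np.shape(value)
    return summary

# Example usage (the path is a placeholder):
# for key, info in summarize_result("result_model_1_pred_0.pkl").items():
#     print(key, info)
```

If predicted_aligned_error is missing from the summary, the prediction was most likely run with a model preset that does not produce PAE.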

Steps to Generate PAE Plots:

1. Ensure Python is installed along with numpy and matplotlib. The pickle and os modules are part of the Python standard library and need no separate installation.

2. Set result_folder in the script below to the folder containing your AlphaFold .pkl files, then run the script.

3. The script will extract the PAE matrix from each file and save it as a 300 dpi PNG image.

Script to Generate PAE Plots

Below is a Python script that takes AlphaFold-generated .pkl files as input and generates PAE plots.

### Save the code below as PKL2PAE.py ###

import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# Path to the AlphaFold output directory containing the .pkl files
result_folder = "path_to_your_pkl_files"  # Update with the directory containing .pkl files

# Path to save the PAE plots
output_dir = os.path.join(result_folder, "PAE_Plots")
os.makedirs(output_dir, exist_ok=True)  # Create the output directory if it doesn't exist

# Function to plot the PAE matrix
def plot_pae(pae_matrix, model_name, output_path):
    plt.figure(figsize=(8, 6))
    plt.imshow(pae_matrix, cmap="bwr", interpolation="nearest", vmin=0, vmax=30)
    plt.colorbar(label="Predicted Aligned Error (Å)")
    plt.title(f"Predicted Aligned Error (PAE) - {model_name}")
    plt.xlabel("Residue Index")
    plt.ylabel("Residue Index")
    plt.savefig(output_path, dpi=300)
    plt.close()

# Process every result_model*.pkl file, in sorted order so the log is reproducible
for file in sorted(os.listdir(result_folder)):
    if file.startswith("result_model") and file.endswith(".pkl"):
        file_path = os.path.join(result_folder, file)

        # Load the .pkl file
        with open(file_path, "rb") as f:
            data = pickle.load(f)

        # Check if the "predicted_aligned_error" key exists
        if "predicted_aligned_error" in data:
            pae_matrix = np.array(data["predicted_aligned_error"])  # Extract the PAE matrix
            model_name = os.path.splitext(file)[0]  # Use the filename as the model name

            # Define the output path for the PAE plot
            output_path = os.path.join(output_dir, f"{model_name}_pae_plot.png")

            # Plot and save the PAE matrix
            plot_pae(pae_matrix, model_name, output_path)
            print(f"PAE plot saved: {output_path}")
        else:
            print(f"No PAE data found in: {file}")

print("PAE plot generation completed.")

###end###

Running the Script:
	1.	Replace path_to_your_pkl_files with the folder containing your AlphaFold .pkl files.
	2.	Run the script, using the filename you saved it under:

python PKL2PAE.py

Expected Output:
	•	A PAE_Plots folder will be created in the specified directory.
	•	High-resolution PAE plots (e.g., result_model_1_pred_0_pae_plot.png) will be saved for each .pkl file.

Interpretation of PAE Plots:
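	•	Blue regions indicate high confidence (low error, near 0 Å) in the relative position of a residue pair.
	•	White regions mark intermediate error (around 15 Å, the midpoint of the 0–30 Å color scale used by the script).
	•	Red regions indicate low confidence (high error, near 30 Å), as is typical between flexibly linked domains or for disordered segments.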
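The visual reading can be backed up with a number. The sketch below averages the PAE over the blocks connecting two residue ranges; the function names and the idea of passing domain boundaries as (start, stop) index pairs are assumptions for illustration, and the boundaries themselves have to come from your own knowledge of the protein. Both off-diagonal blocks are averaged because the PAE matrix is not symmetric.

```python
import numpy as np

def mean_block_pae(pae, rows, cols):
    """Mean PAE over pae[rows, cols]; rows and cols are 0-based (start, stop), stop exclusive."""
    r0, r1 = rows
    c0, c1 = cols
    return float(np.asarray(pae)[r0:r1, c0:c1].mean())

def interdomain_pae(pae, domain_a, domain_b):
    """Average the A-to-B and B-to-A blocks, since PAE[i, j] need not equal PAE[j, i]."""
    ab = mean_block_pae(pae, domain_a, domain_b)
    ba = mean_block_pae(pae, domain_b, domain_a)
    return 0.5 * (ab + ba)

# Example usage with hypothetical boundaries (residues 1-120 and 121-250):
# pae_matrix = data["predicted_aligned_error"]
# print(interdomain_pae(pae_matrix, (0, 120), (120, 250)))
```

A low within-domain mean combined with a high inter-domain mean corresponds to the blue-diagonal, red-off-diagonal pattern described above.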

This tutorial enables users to visualize PAE matrices from AlphaFold outputs, helping to validate structural predictions and identify areas of uncertainty.
