Parsing HAR File

I am working with a new application that doesn’t seem to like when a person has multiple roles assigned to them … however, I first need to prove that is the problem. Luckily, your browser gets the SAML response and you can actually see the Role entitlements that are being sent. Just need to parse them out of the big 80 meg file that a simple “go here and log on” generates!

To gather data to be parsed, open the Dev Tools for the browser tab. Click the settings gear icon and select “Persist Logs”. Reproduce the scenario – navigate to the site, log in. Then save the dev tools session as a HAR file. The following Python script will analyze the file, extract any SAML response tokens, and print them in a human-readable format.

################################################################################
# This script reads a HAR file, identifies HTTP requests and responses containing
# SAML tokens, and decodes "SAMLResponse" values.
#
# The decoded SAML assertions are printed out for inspection in a readable format.
#
# Usage:
# - Update the str_har_file_path with your HAR file
################################################################################
# Editable Variables
str_har_file_path = 'SumoLogin.har'

# Imports
import json
import base64
import urllib.parse
from xml.dom.minidom import parseString

################################################################################
#  This function decodes SAML responses found within the HAR capture
# Args: 
#   saml_response_encoded(str): URL encoded, base-64 encoded SAML response
# Returns:
#   string: decoded string
################################################################################
def decode_saml_response(saml_response_encoded):
    url_decoded = urllib.parse.unquote(saml_response_encoded)
    base64_decoded = base64.b64decode(url_decoded).decode('utf-8')
    return base64_decoded

################################################################################
#  This function finds and decodes SAML tokens from HAR entries.
#
# Args:
#   entries(list): A list of HTTP request and response entries from a HAR file.
#
# Returns:
#   list: List of decoded SAML assertion response strings.
################################################################################
def find_saml_tokens(entries):
    saml_tokens = []
    for entry in entries:
        request = entry['request']
        response = entry['response']
        
        if request['method'] == 'POST':
            request_body = request.get('postData', {}).get('text', '')
            
            if 'SAMLResponse=' in request_body:
                saml_response_encoded = request_body.split('SAMLResponse=')[1].split('&')[0]
                saml_tokens.append(decode_saml_response(saml_response_encoded))
        
        response_body = response.get('content', {}).get('text', '')
        
        if response.get('content', {}).get('encoding') == 'base64':
            response_body = base64.b64decode(response_body).decode('utf-8', errors='ignore')
        
        if 'SAMLResponse=' in response_body:
            saml_response_encoded = response_body.split('SAMLResponse=')[1].split('&')[0]
            saml_tokens.append(decode_saml_response(saml_response_encoded))
    
    return saml_tokens

################################################################################
#  This function converts XML string to an XML dom object formatted with
# multiple lines with heirarchital indentations
#
# Args:
#   xml_string (str): The XML string to be pretty-printed.
#
# Returns:
#   dom: A pretty-printed version of the XML string.
################################################################################
def pretty_print_xml(xml_string):
    dom = parseString(xml_string)
    return dom.toprettyxml(indent="  ")

# Load HAR file with UTF-8 encoding
with open(str_har_file_path, 'r', encoding='utf-8') as file:
    har_data = json.load(file)

entries = har_data['log']['entries']

saml_tokens = find_saml_tokens(entries)
for token in saml_tokens:
    print("Decoded SAML Token:")
    print(pretty_print_xml(token))
    print('-' * 80)

Leave a Reply

Your email address will not be published. Required fields are marked *