In the municipal bond market, analyzing offering documents is a complex, time-consuming endeavor: underwriters must manually evaluate bond issuance terms buried in intricate legalese, weighing a myriad of important data points along the way. In this post, I’ll share my journey as a tech lead in developing a Windows Presentation Foundation (WPF) client application that leverages .NET and OpenAI to simplify this painstaking process and offer underwriters a streamlined solution.
Architecture and Design Choices
The core challenge we faced was the nature of bond offering documents themselves—dense legal texts filled with essential but hard-to-extract information. Manually sifting through these documents to pinpoint the basic terms of a security offering is slow, nuanced work. The underwriter’s time is valuable, and existing solutions are either too slow or lack the required precision.
As the tech lead on the project, my role was to ensure a scalable and efficient architecture. We opted for a technology stack consisting of WPF for its rich UI capabilities, and .NET on the backend for its compatibility, extensive libraries, and seamless backend operations. The crux of our application, however, relied on OpenAI’s natural language processing (NLP) capabilities to parse through the complex language of municipal bond documents.
One of the major technical hurdles we encountered revolved around managing chunks (blocks of tokens derived from the document text) for queries. Given the verbose nature of municipal bond documents and the context-window limits of the OpenAI API, we often found ourselves hitting the token limit. This was a critical issue because naively breaking a document into smaller pieces risks losing the semantic connections between different sections.
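To make the limit concrete, here is a minimal sketch (in Python for brevity; our application is .NET) of the kind of budget check that has to happen before submitting text to the API. The 4-characters-per-token heuristic and the limit values are illustrative assumptions; an exact count requires the model's actual tokenizer (e.g. OpenAI's tiktoken library):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: English text averages ~4 characters per
    token on OpenAI models. Use a real tokenizer for exact counts."""
    return max(1, round(len(text) / chars_per_token))

def exceeds_context_limit(text: str, limit: int = 4096, reserve: int = 512) -> bool:
    """Check whether text would overflow the model's context window,
    reserving room for the prompt instructions and the response."""
    return estimate_tokens(text) > limit - reserve
```

A typical offering document runs to hundreds of pages, so this check fails for the whole document every time, which is what forced the chunking strategy described next.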
To overcome this, we implemented a two-fold strategy:
Text Segmentation: We developed an algorithm to identify logical breakpoints within the documents. This helped us divide the content into smaller, manageable chunks, while still maintaining their contextual relevance. By doing so, we were able to feed these chunks into the OpenAI API without hitting the size limitations.
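The idea can be sketched as follows (again in Python for illustration). This simplified version splits on paragraph boundaaries only and greedily packs segments under a token budget; our actual algorithm also treated section headings and cross-references as breakpoints:

```python
import re

def split_into_chunks(document: str, max_tokens: int = 3000,
                      chars_per_token: float = 4.0) -> list[str]:
    """Split a document at logical breakpoints (blank lines between
    paragraphs), then greedily pack the segments into chunks that
    stay under a rough character budget derived from max_tokens."""
    budget = int(max_tokens * chars_per_token)
    segments = [s.strip() for s in re.split(r"\n\s*\n", document) if s.strip()]
    chunks: list[str] = []
    current = ""
    for seg in segments:
        # Start a new chunk when adding this segment would bust the budget.
        if current and len(current) + len(seg) + 2 > budget:
            chunks.append(current)
            current = seg
        else:
            current = f"{current}\n\n{seg}" if current else seg
    if current:
        chunks.append(current)
    return chunks
```

Because chunks always end on a paragraph boundary, each one stays coherent enough for the model to parse on its own.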
Context Aggregation: After obtaining the parsed information from each chunk, we faced the challenge of aggregating this data in a way that maintains the document’s original meaning. We used a combination of keyword matching and semantic analysis to relate the output from one chunk to another, essentially stitching the document back together in a coherent manner.
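The keyword-matching half of that step can be sketched like this (the semantic-analysis half, which we layered on top, is beyond a few lines of code). The function and the sample key terms are illustrative, not our production API:

```python
def relate_chunks_by_keyword(chunk_outputs: list[str],
                             keywords: list[str]) -> dict[str, list[int]]:
    """Map each key term to the chunks whose parsed output mentions it,
    so passages about the same topic can be stitched back together."""
    index: dict[str, list[int]] = {k: [] for k in keywords}
    for i, text in enumerate(chunk_outputs):
        lowered = text.lower()
        for k in keywords:
            if k.lower() in lowered:
                index[k].append(i)
    return index
```

With this index, every chunk that discusses, say, "maturity date" or "call provisions" can be grouped and reconciled into a single document-level answer.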
By tackling these challenges through rounds of trial and error, we delivered a robust solution that navigates the intricacies of municipal bond documents while staying within API limitations.
Attaining 100% accuracy in parsing municipal bond offering documents was an ambitious goal. In phase 1 we achieved a 70-80% accuracy rate, and we have set a target of 90% for future versions. We are also looking at integrating additional data analytics tools, such as AWS Bedrock, to provide a more comprehensive solution for underwriters.
Developing a specialized application for parsing municipal bond offering documents was a journey filled with challenges and learning. While automation and AI can greatly reduce the workload, we found that the human element in interpreting unstructured, legal documents cannot be entirely replaced—at least not yet.