Surrogacy simplified is a chatbot that helps users understand complex legal documents by breaking them down into sections and answering questions about them. The chatbot uses two main components: section parsing and document content inquiry.
Section parsing is the process of displaying the document by sections and allowing the user to summarize or explain each paragraph in plain words. This helps the user grasp the main idea and the structure of the document. To implement section parsing, we use a natural language processing (NLP) model that can identify the boundaries and headings of each section in the document. We also use a summarization model that can generate a short summary of each paragraph based on the user’s input.
Document content inquiry is the process of answering the user’s questions or prompts about the document in a natural and conversational way. This helps the user clarify any doubts or ambiguities in the document. To implement document content inquiry, we use a combination of vector embeddings, database queries, and chatGPT models.
Vector embeddings are numerical representations of text that capture its semantic meaning and similarity. We use a vector embedding model that can generate a 1536-dimensional vector for any chunk of text. We split the entire document into chunks of approximately equal length and store them in Supabase, a cloud database service, along with their vector embeddings.
Whenever the user inputs a question or a prompt in the chat, we rephrase it with the entire chat history into a singleton question that has clearer information/context and avoids ambiguity. We also generate the vector embedding of that singleton question. Then we query the database based on the similarity (distance between vectors) between the question embedding and chunk embedding. This returns the most relevant chunks of text from the document that can answer the question.
After that, we input all the chat (with history context) and the document information by text chunks into the chatGPT model, a generative pre-trained transformer model that can generate natural language responses. The chatGPT model provides a response to the user’s input based on its understanding of the document and the conversation.
The benefits of this approach are:
It allows users to upload documents of any size and have a clearer and detailed/granularized visualization. It allows users to inquire the chatbot with consistent/continuous conversation context, from user’s message or bot’s message. It leverages state-of-the-art NLP models to provide accurate and natural responses to complex legal documents.