The PressW team developed an AI system built around a RAG (Retrieval Augmented Generation) chatbot. RAG is a technique that, based on the question being asked, pulls back only the most relevant data from a massive dataset to answer with. This helps mitigate hallucinations (more on that later) and produces a more accurate system, which was critical for an application like this one, where we were dealing with a large quantity of data and looking for pinpoint information throughout it. The system we created is fully managed and tailored to each school district: each district simply uploads its documentation, and our data pipelines automatically turn that information into queryable data for our chatbot.
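The core RAG loop can be sketched in a few lines. This is a minimal illustration, not our production code: real systems score relevance with vector embeddings, while simple keyword overlap stands in for that step here, and all data and function names are made up.

```python
# Toy RAG flow: retrieve the most relevant chunks for a question,
# then assemble a focused prompt for the LLM.

def score(question: str, chunk: str) -> int:
    """Count question words that appear in the chunk (toy relevance score)."""
    q_words = set(question.lower().split())
    return sum(1 for w in chunk.lower().split() if w in q_words)

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks, keeping only those with a nonzero score."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return [c for c in ranked[:top_k] if score(question, c) > 0]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model by restricting it to the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

chunks = [
    "Contact the procurement office to order new educational materials.",
    "The cafeteria menu rotates weekly.",
]
question = "Who do I contact for ordering new educational materials?"
context = retrieve(question, chunks)
prompt = build_prompt(question, context)
```

Only the procurement chunk survives retrieval here, which is the point of the technique: the model sees a small, relevant slice of the dataset instead of everything at once.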
We tested several techniques to create a reliable system that consistently and accurately answers questions such as “Who do I contact for ordering new educational materials?” and “What is the football schedule this year?”. One of the many challenges we faced was that our data and the users' questions don't always match. For example, if a teacher asks about the football schedule, the document section we actually need may not contain the words "football" or "schedule" at all, just dates and the names of the opposing schools. We overcame this with parent-document retrieval and query expansion (teaser: blog post incoming!), which let us widen the search area for information while keeping the focus narrow.
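Here is a rough sketch of how those two techniques combine. Query expansion rewrites the question into related phrasings (in production an LLM typically generates these; the hand-written map below stands in for that), and parent-document retrieval searches small chunks for precision but returns each match's larger parent section. All of the data, names, and schedules below are illustrative, not real district documents.

```python
# Hand-written expansions standing in for LLM-generated ones.
EXPANSIONS = {
    "football schedule": ["game dates", "opponents", "athletics calendar"],
}

# Larger parent sections that get handed to the LLM as context.
PARENTS = {
    "athletics": "Fall athletics calendar: Sep 5 vs Westlake, Sep 12 vs Del Valle.",
    "lunch": "Cafeteria lunch menu rotates on a two-week cycle.",
}

# Small searchable chunks, each tagged with its parent section's id.
CHUNKS = [
    ("Sep 5 vs Westlake", "athletics"),
    ("athletics calendar of game dates", "athletics"),
    ("lunch menu rotates", "lunch"),
]

def expand(query: str) -> list[str]:
    """Return the original query plus any related phrasings."""
    return [query] + EXPANSIONS.get(query.lower(), [])

def retrieve_parents(query: str) -> set[str]:
    """Match expanded queries against small chunks; return parent sections."""
    hits = set()
    for q in expand(query):
        q_words = set(q.lower().split())
        for chunk, parent_id in CHUNKS:
            if q_words & set(chunk.lower().split()):
                hits.add(PARENTS[parent_id])
    return hits

result = retrieve_parents("football schedule")
```

The literal question "football schedule" matches no chunk, but the expanded queries do, and the match is widened to the full athletics section, so the chatbot sees the dates and opponents it needs to answer.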
Trust and Hallucinations
A key constraint for this project was to gain and maintain the trust of school district staff, which meant avoiding hallucinations wherever possible. To that end, we added guardrails that short-circuit the AI to a different model and prompting structure whenever no relevant documents are retrieved. This lets us deterministically generate responses that guide users to other data sources when we can't find an answer, rather than hallucinating a response to the question, even one that might happen to be generally correct.
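The guardrail logic amounts to a check before the model ever answers. The sketch below is illustrative: the relevance threshold and fallback text are invented for the example, and a fixed string stands in for the constrained fallback model-and-prompt path described above.

```python
# Hypothetical fallback message; in practice this path would use a
# separate model and prompting structure constrained to redirecting users.
FALLBACK = (
    "I couldn't find this in the district's documentation. "
    "Please check the district website or contact the front office."
)

def answer(question: str, retrieved: list[tuple[str, float]],
           min_score: float = 0.5) -> str:
    """Answer from retrieved documents, or short-circuit to the fallback."""
    relevant = [doc for doc, score in retrieved if score >= min_score]
    if not relevant:
        # Short-circuit: deterministic guidance, no chance to hallucinate.
        return FALLBACK
    # Placeholder for the real LLM call grounded in the retrieved context.
    return f"Answering from {len(relevant)} relevant document(s)."

no_docs = answer("When is spring break?", [])
with_docs = answer("Who handles ordering?", [("Procurement doc", 0.9)])
```

The key design choice is that the fallback path never asks the primary model to guess; an empty retrieval result always produces guidance toward other sources instead of an invented answer.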
Initial Results
The system launched on August 6th with a pilot of 700 users in a local Texas school district. Feedback from the beta testers and staff using the system has been overwhelmingly positive, and on the strength of the initial client's response, it is now being trialed with state agencies and other school districts around the state. Over 1,600 questions were answered within the first two weeks of the beta, with the largest category of user queries covering school-related schedules, deadlines, events, and compensation details.

