We develop a query engine for unstructured documents. Our vision is to accept natural language
queries and execute them against arbitrary document repositories. Unlike Retrieval Augmented Generation (RAG),
however, PERC takes a fundamentally different approach and serves a different purpose.
Given a natural language query, we extract a schema from the data tailored to that query, and then populate
the schema with the data needed to answer it correctly. We then translate the question to SQL and execute the SQL
query against the collection of extracted tables. Because the answer is computed over these extracted tables rather
than generated from a few retrieved passages, we can fully support analytical queries.
Making this vision a reality requires addressing many challenges.
These include:
- Natural language to SQL generation
- Ad-hoc schema extraction given a query (possibly spanning multiple tables with constraints)
- Query optimization in this framework (e.g., data extraction versus code generation)
- Ability to abstain from providing an answer and ask for help
- Conversation frameworks to inject the right knowledge into the query during execution and steer it towards correctness
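A minimal sketch of the final stage of the pipeline described in the opening paragraph. It assumes that schema extraction, population, and natural-language-to-SQL translation have already produced their outputs. The example query, the `invoices` table and its columns, the rows, and the helper `run_query` are all hypothetical stand-ins, not PERC's actual components. Python's built-in `sqlite3` module is used only to show how an extracted table can be materialized and the generated SQL executed against it.

```python
import sqlite3

# Hypothetical output of query-tailored schema extraction for the question
# "Which vendor has the largest total invoice amount?" (names are illustrative).
SCHEMA = "CREATE TABLE invoices (doc_id TEXT, vendor TEXT, amount REAL)"

# Hypothetical output of the population step: one row per source document.
# In a real system these values would be extracted from the documents.
EXTRACTED_ROWS = [
    ("doc1.pdf", "Acme", 120.0),
    ("doc2.pdf", "Globex", 300.0),
    ("doc3.pdf", "Acme", 250.0),
]

# Hypothetical SQL that a natural-language-to-SQL step might emit for the question.
SQL = """
SELECT vendor, SUM(amount) AS total
FROM invoices
GROUP BY vendor
ORDER BY total DESC
LIMIT 1
"""


def run_query(schema_ddl, table, rows, sql):
    """Materialize the extracted table in memory and execute the SQL against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema_ddl)
        # One "?" placeholder per column, e.g. "?, ?, ?" for three columns.
        placeholders = ", ".join("?" * len(rows[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query(SCHEMA, "invoices", EXTRACTED_ROWS, SQL))
```

The query aggregates over every extracted row, which is the kind of analytical question the approach targets. With the rows above it prints `[('Acme', 370.0)]`.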