If you'd like to support me financially (this is totally optional and voluntary), you can buy me a coffee here: https://www.buymeacoffee.com/rithesh
Many documents contain a mixture of content types, including text and tables.
Semi-structured data can be challenging for conventional RAG for at least two reasons:
• Text splitting may break tables apart, so retrieved chunks contain corrupted or incomplete table data
• Embedding tables may pose challenges for semantic similarity search
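To make the first point concrete, here is a toy sketch of a naive fixed-size splitter. It is plain Python, not Unstructured or LangChain code. The 50-character chunk size and the sample table are arbitrary assumptions chosen for illustration.

```python
def split_fixed(text: str, chunk_size: int) -> list[str]:
    """Naive splitter: cut the text every chunk_size characters, ignoring structure."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Hypothetical document: one sentence of prose followed by a small table.
doc = (
    "Quarterly revenue is summarized below.\n"
    "| Quarter | Revenue |\n"
    "| Q1      | 1.2M    |\n"
    "| Q2      | 1.5M    |\n"
)

chunks = split_fixed(doc, 50)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

With these assumed values, the first chunk ends right after `| Quarter |`, so the table header is split across two chunks and neither chunk holds a complete, readable table.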
This video shows how to perform RAG on documents with semi-structured data:
• We will use Unstructured to parse both text and tables from documents (PDFs).
• We will use the multi-vector retriever to store raw tables and text along with table summaries, which are better suited for retrieval.
• We will use LCEL to implement the chains used.
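The multi-vector idea can be sketched without any libraries. Summaries are indexed for search, while the raw table or text is stored separately under a shared ID, and the raw element is what gets returned. This is a toy under stated assumptions, not LangChain's `MultiVectorRetriever` API. Word-count cosine similarity stands in for an embedding model, the class and method names (`ToyMultiVectorRetriever`, `add`, `retrieve`) are hypothetical, and the summaries are written by hand, whereas a real pipeline would typically generate them with a model.

```python
import math
import re
import uuid
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMultiVectorRetriever:
    """Hypothetical toy: search over summaries, return the raw element each one points to."""

    def __init__(self) -> None:
        self.index: list[tuple[Counter, str]] = []  # (summary vector, doc_id)
        self.docstore: dict[str, str] = {}  # doc_id -> raw text or table

    def add(self, raw: str, summary: str) -> None:
        doc_id = str(uuid.uuid4())  # shared key linking summary and raw content
        self.docstore[doc_id] = raw
        self.index.append((bag_of_words(summary), doc_id))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = bag_of_words(query)
        # Rank by similarity to the *summaries*, then look up the raw content.
        ranked = sorted(self.index, key=lambda item: cosine(q, item[0]), reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


raw_table = "| Quarter | Revenue |\n| Q1 | 1.2M |\n| Q2 | 1.5M |"
retriever = ToyMultiVectorRetriever()
retriever.add(raw_table, "table of quarterly revenue for q1 and q2")
retriever.add("The company was founded in 2010.", "company founding history")

print(retriever.retrieve("what was the revenue in q2?"))
```

Because only the summary is matched against the query, the table can be found by its description, while the downstream chain still receives the full raw table to answer from.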
Colab notebook: https://colab.research.google.com/dri...
GitHub: https://github.com/langchain-ai/langc...
If you like this kind of content, please subscribe to the channel here:
https://www.youtube.com/c/RitheshSree...