4 Comments
Dario

I enjoyed reading your article and found it quite insightful. I've been experiencing similar issues with RAG. Currently, I'm trying to optimize the formatting of PDFs before vectorizing them, which I believe will make it easier for the LLM to produce good responses. However, I haven't yet found a reliable way to use the LLM itself to generate the formatted file. Typically, the output truncates the document, and I'm dealing with PDFs that range from 25 to 35 pages.

Adam Gospodarczyk

> Typically, the output truncates the document, and I'm dealing with PDFs that range from 25 to 35 pages.

That's because the output token limit is rarely mentioned by providers, and it's usually very low.

I mainly deal with it by splitting the content on headers or, at the very least, on paragraphs of text (I work with Markdown, so they're easy to detect).
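
A minimal sketch of that kind of splitting, assuming Markdown input; the regex and the chunk-size threshold here are my own placeholders, not Adam's actual code:

```python
import re

def split_markdown(text: str, max_chars: int = 6000) -> list[str]:
    """Split Markdown on headers; fall back to paragraphs for oversized sections."""
    # Split just before each header line (#, ##, ...) while keeping the header text.
    sections = re.split(r"\n(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Section is too large for one completion: fall back to paragraph splits.
        buffer = ""
        for paragraph in section.split("\n\n"):
            if buffer and len(buffer) + len(paragraph) > max_chars:
                chunks.append(buffer)
                buffer = ""
            buffer += paragraph + "\n\n"
        if buffer.strip():
            chunks.append(buffer)
    return chunks
```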

There's also the option to send an additional request when the API stops completing because the finish reason indicates the token limit was reached. In that case, the follow-up prompt can be given some of the already-generated text and decide whether to continue it or not.
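
Roughly, with the OpenAI Python SDK that pattern could look like the sketch below; the model name and the continuation prompt are assumptions, not a prescribed implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def complete_with_continuation(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Keep asking the model to continue while it stops due to the output token limit."""
    messages = [{"role": "user", "content": prompt}]
    output = ""
    while True:
        response = client.chat.completions.create(model=model, messages=messages)
        choice = response.choices[0]
        output += choice.message.content or ""
        if choice.finish_reason != "length":
            break  # finished normally; no need to continue
        # Hit the output limit: feed the partial answer back and ask to continue.
        messages.append({"role": "assistant", "content": choice.message.content})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return output
```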

Dario

Thanks for the advice, Adam. Let me know if I can help you with anything in the future. I'll definitely try that method.

Work-Work Balance

For structured output from LLMs, you should have used https://github.com/boundaryml/baml to make your life easier.

Also, why do you send the entire conversation back and forth to the LLM, instead of just the most recent messages?
