I'll keep this short for now; we can scope it out during the interview and refine it further as we work through the project.
I have thousands of PDFs every month/quarter from which I want to extract relevant data and output it to Excel. These PDFs are quarterly/annual financial result announcements by listed companies across the globe.
I'm not sure whether this needs an LLM (e.g. GPT) or can be done with good Python NLP libraries. Most of the information will be in tables, but some will need to be harvested from text. However, I can't think of any rules at the moment that would tell us where in a PDF those tables or text passages are located. Each PDF will have a different layout, since they come from different companies.
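To illustrate the non-LLM route, here is a minimal sketch of location-independent extraction: it scans every row of every table for label keywords instead of relying on where a table sits in the PDF. It assumes the tables have already been pulled out of the PDF by some extraction library as lists of rows of strings. The metric names, the keyword lists, and the function names are all hypothetical, and it writes CSV (which Excel opens) only to stay within the standard library.

```python
import csv
import io
import re

# Hypothetical target metrics and the label keywords that identify them.
# Plain substring matching is crude: "revenue" would also match
# "Cost of revenue", so real keyword lists would need exclusions.
METRIC_KEYWORDS = {
    "revenue": ("revenue", "net sales", "turnover"),
    "net_income": ("net income", "net profit", "profit for the period"),
}

NUMBER_RE = re.compile(r"\(?-?\d[\d,]*(?:\.\d+)?\)?")


def parse_amount(cell):
    """Parse '1,234.5' or '(1,234.5)' (accounting negative); None if not numeric."""
    text = cell.strip()
    if not NUMBER_RE.fullmatch(text):
        return None
    negative = (text.startswith("(") and text.endswith(")")) or text.startswith("-")
    value = float(text.strip("()-").replace(",", ""))
    return -value if negative else value


def find_metrics(tables):
    """Scan all rows of all tables; the first cell of a row is treated as its label."""
    found = {}
    for table in tables:
        for row in table:
            if not row:
                continue
            label = row[0].lower()
            for metric, keywords in METRIC_KEYWORDS.items():
                if metric in found or not any(k in label for k in keywords):
                    continue
                # Keep the first numeric cell after the label (assumed: latest period).
                amounts = [a for a in map(parse_amount, row[1:]) if a is not None]
                if amounts:
                    found[metric] = amounts[0]
    return found


def to_csv(company, metrics):
    """Serialize one company's metrics as CSV text, one metric per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["company", "metric", "value"])
    for metric, value in sorted(metrics.items()):
        writer.writerow([company, metric, value])
    return buffer.getvalue()


# Two made-up tables with different layouts, as an extraction library might return them.
tables = [
    [["Item", "Q1 2024", "Q1 2023"], ["Total revenue", "1,250.0", "1,100.0"]],
    [["", "2024"], ["Net profit attributable", "(35.5)"]],
]
metrics = find_metrics(tables)
print(to_csv("ACME", metrics))
```

Keyword matching like this is likely to handle only the more regular announcements; figures buried in running text, or tables with unusual labels, are where an LLM-based step might be needed.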
So if you're up for a good Python/NLP/LLM challenge, please apply and we'll discuss the details of the project.