Upwork is hiring a Machine Learning Engineer for PDF Data Extraction and Structuring

Upwork  ·  US  ·  $31k/yr - $85k/yr
almost 2 years ago

We are seeking an experienced Machine Learning Engineer to develop an algorithm capable of reading unstructured PDF datasheets for integrated circuit (IC) components. The aim is to extract pinout (pin description) tables and other relevant data from these datasheets and convert them into a structured schema. Example datasheets are attached.

This will involve OCR for text extraction, text analysis tools to recognize key elements such as tables, sections, and specific terms, and methods for categorizing and structuring the extracted data.
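As a rough illustration of the structuring step only, the sketch below turns raw table rows into typed pin records. Everything in it is an assumption made for illustration: the `Pin` field names, the column headers, and the sample rows are hypothetical and do not come from the attached datasheets. The input shape (a list of rows, each a list of strings or `None`) is the kind of output that table-extraction tools such as pdfplumber typically produce.

```python
from dataclasses import dataclass, asdict
import json
import re


# Hypothetical target schema; field names are illustrative only.
@dataclass
class Pin:
    number: int
    name: str
    pin_type: str
    description: str


def rows_to_pins(rows):
    """Convert raw table rows (header row first) into Pin records.

    Assumes the header contains the columns "pin", "name", "type" and
    "description" (case-insensitive). Rows whose pin cell is not a
    number, such as repeated headers or footnotes, are skipped.
    """
    header = [(cell or "").strip().lower() for cell in rows[0]]
    col = {name: index for index, name in enumerate(header)}
    pins = []
    for row in rows[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not cells[col["pin"]].isdigit():
            continue  # not a data row
        pins.append(
            Pin(
                number=int(cells[col["pin"]]),
                name=cells[col["name"]],
                pin_type=cells[col["type"]],
                # Collapse line breaks left over from wrapped table cells.
                description=re.sub(r"\s+", " ", cells[col["description"]]),
            )
        )
    return pins


# Made-up sample rows standing in for an extracted pinout table.
sample_rows = [
    ["Pin", "Name", "Type", "Description"],
    ["1", "VCC", "Power", "Supply\nvoltage"],
    ["2", "GND", "Power", "Ground"],
    ["Pin", "Name", "Type", "Description"],  # header repeated after a page break
    ["3", "OUT", "Output", "Amplifier output"],
]

pins = rows_to_pins(sample_rows)
print(json.dumps([asdict(pin) for pin in pins], indent=2))
```

Real datasheets vary widely in column naming and layout, so a working solution would need header normalization and handling for multi-function pins well beyond this sketch.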

The development may also involve fine-tuning a local LLM on the structured data so that it can accurately interpret and respond to queries by interacting with the database.

Use of GPT-4 or similar is not permitted. This will be for a locally hosted application.

Responsibilities:

Data Analysis: Understand the typical format of integrated circuit datasheets to identify key data points.

Algorithm Design: Develop an ML algorithm that can read and understand unstructured PDFs.

Data Extraction: Implement techniques to extract tables, text, and other required data fields.

Data Structuring: Convert the extracted data into a structured schema for further use.

Testing: Rigorously test the algorithm to ensure high accuracy and reliability.

Skills Required:

Strong experience with Machine Learning algorithms, particularly in Natural Language Processing.

Proficiency in Python and libraries such as TensorFlow, PyTorch, or scikit-learn.

Experience with PDF data extraction tools like PyPDF2 or pdfplumber.

Strong understanding of data structuring and database management.

Excellent problem-solving skills.

Experience with version control systems like Git.

Strong documentation and communication skills.

Experience with the AutoVizFlow toolset.

Selection Criteria:

Relevant work experience and examples.

Technical interview.

Shortlisted candidates may be asked to perform a test task.

This job is already closed and no longer accepting applicants, sorry.