We are seeking an expert in machine learning and data processing to help us match products from our main products table (containing over 7.5 million entries) to our consolidated products table. The goal is to refine our product data, identify potential new products, and update our master table accordingly.
Tasks include:
Data Pre-processing:
- Normalize and clean text columns (e.g., converting to lowercase, removing special characters).
- Impute or handle missing values.
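A minimal sketch of the text clean-up step, using only the Python standard library. The function name `normalize_text` and the choice to map missing values to an empty string are assumptions for illustration; the actual handling of missing values would depend on the column.

```python
import re
import unicodedata


def normalize_text(value):
    """Lowercase, strip accents and special characters, collapse whitespace."""
    if value is None:
        # Assumption: a missing text value is represented as an empty string.
        return ""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Nestlé® KitKat, 4-Finger! "))
```

On a table of this size, the same logic would more likely be applied column-wise with vectorized pandas string methods rather than row by row.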
Feature Engineering:
- Compute similarity scores between products from both tables for relevant columns such as "brand", "name", and "category".
- Utilize string-matching algorithms and possibly word embeddings for semantic similarity.
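One possible shape for the per-column similarity features is sketched below. It assumes each product is a dict of already-normalized strings and uses two simple string measures from the standard library (a character-level ratio and a token-level Jaccard overlap). The helper names are hypothetical, and embedding-based semantic similarity is not shown.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level similarity in [0, 1]; 0.0 if either side is empty."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Share of distinct tokens the two strings have in common."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity_features(main_row, cons_row, columns=("brand", "name", "category")):
    """Build one feature per (column, measure) pair for a candidate pair of products."""
    features = {}
    for col in columns:
        a = main_row.get(col) or ""
        b = cons_row.get(col) or ""
        features[f"{col}_char"] = char_similarity(a, b)
        features[f"{col}_token"] = token_jaccard(a, b)
    return features


main_product = {"brand": "acme", "name": "dark chocolate bar", "category": None}
cons_product = {"brand": "acme", "name": "chocolate bar dark", "category": "snacks"}
print(similarity_features(main_product, cons_product))
```

Comparing every main-table product against every consolidated entry is unlikely to be feasible at 7.5 million rows, so a real pipeline would probably restrict comparisons to candidate pairs first (for example, products sharing a brand).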
Entity Resolution:
- Link products from the main table to entries in the consolidated table using either:
+ A threshold-based approach for similarity scores.
+ A supervised machine learning model leveraging labelled examples.
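The threshold-based option could look roughly like the sketch below. The column weights, the 0.85 threshold, and the function names are illustrative assumptions, not values from this project; they would need tuning against labelled examples. The supervised option would instead feed per-column similarity scores into a classifier and is not shown here.

```python
from difflib import SequenceMatcher

# Hypothetical column weights and threshold, for illustration only.
WEIGHTS = {"brand": 0.3, "name": 0.5, "category": 0.2}
MATCH_THRESHOLD = 0.85


def pair_score(main_row, cons_row):
    """Weighted average of per-column string similarity, in [0, 1]."""
    total = 0.0
    for col, weight in WEIGHTS.items():
        a = main_row.get(col) or ""
        b = cons_row.get(col) or ""
        sim = SequenceMatcher(None, a, b).ratio() if a and b else 0.0
        total += weight * sim
    return total


def link_product(main_row, consolidated, threshold=MATCH_THRESHOLD):
    """Return (consolidated_id, score) for the best candidate,
    or (None, score) when the best score is below the threshold."""
    best_id, best_score = None, 0.0
    for cons_id, cons_row in consolidated.items():
        score = pair_score(main_row, cons_row)
        if score > best_score:
            best_id, best_score = cons_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


consolidated = {
    "c1": {"brand": "acme", "name": "dark chocolate bar", "category": "snacks"},
    "c2": {"brand": "globex", "name": "sparkling water", "category": "drinks"},
}
print(link_product({"brand": "acme", "name": "dark chocolate bar", "category": "snacks"}, consolidated))
```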
Identification of Potential New Products:
- Highlight products with low similarity scores to all entries in the consolidated table for potential addition or review.
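As a sketch of how low-scoring products might be separated out, the function below sorts products into three buckets by their best similarity score against the consolidated table. Both cut-offs and the middle "review" bucket are assumptions added for illustration.

```python
def triage(best_scores, match_threshold=0.85, new_threshold=0.50):
    """Split products into linked, needs-review, and potential-new buckets.

    best_scores maps a main-table product id to its highest similarity
    score against any consolidated entry. Thresholds are hypothetical.
    """
    buckets = {"linked": [], "review": [], "potential_new": []}
    for product_id, score in best_scores.items():
        if score >= match_threshold:
            buckets["linked"].append(product_id)
        elif score < new_threshold:
            buckets["potential_new"].append(product_id)
        else:
            buckets["review"].append(product_id)
    return buckets


print(triage({"m1": 0.95, "m2": 0.70, "m3": 0.20}))
```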
Updating the Master Table:
- Populate linkages and add new entries to the consolidated table.
Evaluation:
- Verify the accuracy of matches against available ground truth or through manual review.
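Where ground truth is available, match quality can be summarized with precision and recall over the predicted links. The sketch below assumes both predictions and ground truth are dicts mapping a main-table id to a consolidated id, with `None` meaning "no link"; the function name and data layout are hypothetical.

```python
def evaluate_links(predicted, truth):
    """Compute precision, recall, and F1 for predicted product links."""
    # A true positive is a predicted link that agrees with the ground truth.
    true_positives = sum(
        1 for main_id, cons_id in predicted.items()
        if cons_id is not None and truth.get(main_id) == cons_id
    )
    predicted_links = sum(1 for cons_id in predicted.values() if cons_id is not None)
    true_links = sum(1 for cons_id in truth.values() if cons_id is not None)

    precision = true_positives / predicted_links if predicted_links else 0.0
    recall = true_positives / true_links if true_links else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = {"m1": "c1", "m2": "c2", "m3": None, "m4": "c9"}
truth = {"m1": "c1", "m2": "c3", "m3": "c4", "m4": None}
print(evaluate_links(predicted, truth))
```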
Iterative Refinement:
- Based on the evaluation findings, refine the earlier steps (pre-processing, similarity features, thresholds or model) to improve match accuracy.
Skills/Experience Required:
- Strong experience in Python and related data processing libraries (pandas, scikit-learn, etc.).
- Experience with string-matching algorithms and techniques.
- Familiarity with entity resolution or data deduplication projects.
- Strong communication skills and the ability to work iteratively based on project findings.
To Apply:
Please submit your application with:
- Your experience in similar projects.
- A brief strategy or approach based on the given tasks.
- Any questions or clarifications you might have about the project.
This job is already closed and no longer accepting applicants, sorry.