Upwork is hiring an Expert Data Scientist in Survival Analysis - Contract to Hire

Upwork  ·  US  ·  $67k/yr - $150k/yr
over 1 year ago

Position: Expert in survival analysis projects (predicting when an event may happen)

(NOT FOR AGENCIES)

Company Overview:

Our company is a cutting-edge insurtech-healthtech company that is revolutionizing fraud detection and diagnosis. We are dedicated to leveraging advanced machine learning, deep learning, and artificial intelligence techniques to solve complex problems in insurtech-healthtech. As we continue to grow, we are seeking a talented and experienced ML/DL/AI Expert Data Scientist to join our team and play a pivotal role in enhancing our survival analysis, which predicts when a patient will develop medical conditions.

We are looking for a hands-on expert with proven experience in survival analysis (time-series) to help us debug and, if needed, change a current project.

In this project, we converted the regression problem into a multi-class classification problem to mitigate the naturally imbalanced labels. This is likely one of the problems: in our project the time bins are ordinal, whereas a multi-class model treats them as unordered categories, which is incorrect.
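
One common way to keep the ordering of the time bins is to replace the single multi-class label with cumulative binary targets ("is the event later than bin j?"). The sketch below only illustrates that idea and is not the project's code; the function names `ordinal_targets` and `decode_ordinal` and the choice of four bins are assumptions made for the example.

```python
def ordinal_targets(bin_index, n_bins):
    """Encode ordinal bin k as n_bins - 1 cumulative binary targets.

    targets[j] is 1 when the true bin lies beyond threshold j, so
    neighbouring bins share most of their targets and the order is kept.
    """
    return [1 if bin_index > j else 0 for j in range(n_bins - 1)]


def decode_ordinal(probs, cutoff=0.5):
    """Turn per-threshold probabilities back into a predicted bin index."""
    return sum(1 for p in probs if p > cutoff)


# Hypothetical example: 4 time bins, labels for three patients.
labels = [0, 2, 3]
targets = [ordinal_targets(k, n_bins=4) for k in labels]
decoded = [decode_ordinal(t) for t in targets]
```

With this encoding, predicting bin 2 when the truth is bin 3 gets only one of the three binary targets wrong, whereas a plain multi-class loss treats every wrong bin as equally wrong.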

We get very good performance on the train set and test set, but on the validation set the results don't make much sense. The problem could also be related to the label bins we chose: their sizes differ, and the intervals between visits, which vary from one person to another, may affect the results as well.

Inputs:

• The data set contains millions of visits and patients

• Patients

• Visits

o Medical conditions, procedures (we can also use the categories of these: ICD-10 Medical Conditions Categories and Procedures PCS categories)

o Medications

o Demographics (age and gender; we can also use zip codes and provider specialty, to know which type of doctor they visited, which may help in assessing the risk. For example, if they visited a cardiologist a couple of times, they may be at high risk)

o Trigger list – list of medical conditions for which we want to predict the time to diagnosis

o Each of these has a time in hours attached

• Date_diff_hours = event_date – birth_date (result is in hours)
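
As an illustration of the Date_diff_hours definition above, a minimal sketch using Python's standard `datetime` module; the function name `date_diff_hours` and the sample dates are assumptions for the example, not values from the data set.

```python
from datetime import datetime


def date_diff_hours(event_date, birth_date):
    """Hours elapsed between birth and an event (visit, diagnosis, ...)."""
    return (event_date - birth_date).total_seconds() / 3600.0


# Hypothetical patient: born on 1 Jan 1980, seen at noon the next day.
birth = datetime(1980, 1, 1)
visit = datetime(1980, 1, 2, 12)
visit_hours = date_diff_hours(visit, birth)

# The time between two events is the difference of their hour offsets.
diagnosis = datetime(1980, 1, 4, 12)
hours_to_diagnosis = date_diff_hours(diagnosis, birth) - visit_hours
```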

Required outcomes:

• Predict the time to develop each disease, and the probability

• The models need to work in a real-time environment. In this space, real-time means that taking a couple of hours to provide predictions is fine. However, we need to strive to provide results in the shortest time possible.

• Training time cannot exceed 2-3 weeks

• We need it done ASAP, before the end of January

• Performance:

o If we use a classifier, F1 higher than 0.9; if we use a regression model, MSE/RMSE lower than 0.15

o Survival analysis models: C-index above 0.7 and Brier score or IBS close to zero
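
For reference, the C-index target above can be checked with a few lines of code. The sketch below is a simplified Harrell's concordance index in plain Python: a pair is comparable only when the patient with the shorter time actually had the event, and pairs with tied times are ignored. The function name `concordance_index` and the toy data are assumptions for the example.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    times       -- observed time (event or censoring) per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This quadratic loop is only meant to show the definition; with millions of visits, an optimized library implementation would be needed.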

Job is closed
