AI Engineer - Google Cloud Vertex AI Expert (Model Fine-Tuning)

Upwork · US · $210k/yr - $520k/yr

contractor temporary hourly-rate remote generative-ai

about 2 years ago

Job is closed

Hi developers, we are looking for an experienced AI engineer who is highly proficient in Google Cloud Vertex AI for a model fine-tuning project. Before we move on to discussing the job scope, we want to evaluate your expertise with a challenging question. Our technical team will review your answer to assess your familiarity with Google Cloud Vertex AI.

The Challenges:

Suppose our project involves deploying a model from Model Garden in Vertex AI using our custom Docker image. We sent requests to the endpoint in two ways: sequentially and in parallel.
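The two request patterns described here can be sketched in Python as follows. This is a minimal offline sketch, not the team's actual script: send is a hypothetical callable standing in for one authenticated POST to the endpoint's :predict URL (payload, authentication, and HTTP library are omitted). Errors are captured per request so that successes and failures can be counted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

# Hypothetical stand-in for one POST to the endpoint's :predict URL.
SendFn = Callable[[int], Any]


def _capture(send: SendFn, i: int) -> Any:
    """Run one request and return its exception instead of raising, so it can be counted."""
    try:
        return send(i)
    except Exception as exc:
        return exc


def run_sequential(send: SendFn, n: int) -> List[Any]:
    """Send n requests one after another; each waits for the previous response."""
    return [_capture(send, i) for i in range(n)]


def run_parallel(send: SendFn, n: int) -> List[Any]:
    """Send n requests at once: up to n worker threads, so all can be in flight together."""
    with ThreadPoolExecutor(max_workers=max(1, n)) as pool:
        futures = [pool.submit(_capture, send, i) for i in range(n)]
        # Results are collected in submission order, not completion order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    fake_send = lambda i: f"response-{i}"  # offline placeholder for the real HTTP call
    print(run_sequential(fake_send, 3))  # ['response-0', 'response-1', 'response-2']
    print(run_parallel(fake_send, 3))
```

In the parallel form there is no cap on concurrency, so every request reaches the endpoint at roughly the same time, which matches the pattern described in the two conditions.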

With sequential execution, our requests succeeded: we could run as many requests as we put in the for loop without encountering issues.

However, when we executed requests in parallel, with our Python script sending all of them to the endpoint simultaneously, we encountered different errors depending on how many requests were sent concurrently. There are two specific conditions:

1. When sending 1 to 10 requests in parallel, our Python script successfully sent all of them to the endpoint. According to the logs we examined on the Logs Explorer page for the Vertex AI endpoint, all 10 requests were executed and completed on time. However, among the responses our Python script received, only the first 1-3 requests (the exact number varied) returned success. The remaining requests produced the following error:

HTTP Error: 503 Server Error: Service Unavailable for url: https://us-central1-aiplatform.googleapis.com/v1/projects/XXX/locations/us-central1/endpoints/XXX:predict

This is puzzling: the Logs Explorer shows successful generation for every request, yet the responses for the remaining requests are never returned to our script, and the problem persists.

2. When sending a large number of requests simultaneously (for example, more than 300), we encountered a different error. The requests never reached the endpoint and do not appear in our Logs Explorer. The error message is as follows:

Error connecting: HTTPSConnectionPool(host='us-central1-aiplatform.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/XXX/locations/us-central1/endpoints/XXX:predict (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))
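One common client-side mitigation for both conditions above is to cap how many requests are in flight at once and to retry transient failures (such as a 503 or a dropped TLS connection) with capped exponential backoff and jitter. The Python sketch below shows that pattern under stated assumptions; it is not a diagnosis of the root cause. send is a hypothetical callable wrapping the real authenticated POST to the :predict URL, TransientError is a hypothetical exception that send would raise for retryable failures, and the defaults (max_concurrency=8, max_attempts=5, base_delay=0.5 s, max_delay=30 s) are illustrative, not tuned values.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

SendFn = Callable[[int], Any]  # hypothetical wrapper around one :predict POST


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 503 or a dropped TLS connection."""


def call_with_retry(send: SendFn, i: int, max_attempts: int = 5,
                    base_delay: float = 0.5, max_delay: float = 30.0) -> Any:
    """Call send(i), retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(i)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Sleep a random time in [0, min(max_delay, base_delay * 2**attempt)].
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


def run_bounded(send: SendFn, n: int, max_concurrency: int = 8,
                **retry_kwargs: Any) -> List[Any]:
    """Send n requests with at most max_concurrency in flight; failures return their exception."""

    def one(i: int) -> Any:
        try:
            return call_with_retry(send, i, **retry_kwargs)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # At most max_concurrency calls run at once; map yields results in input order.
        return list(pool.map(one, range(n)))


if __name__ == "__main__":
    calls = {}

    def flaky_send(i: int) -> str:
        # Offline placeholder: every request fails twice before succeeding.
        calls[i] = calls.get(i, 0) + 1
        if calls[i] < 3:
            raise TransientError("503 Service Unavailable")
        return f"response-{i}"

    print(run_bounded(flaky_send, 5, max_concurrency=2, base_delay=0.01))
```

Capping concurrency with a fixed-size pool also limits how many TLS connections the script opens at the same moment, which may be relevant to the 300-request case; in a real client, reusing one HTTP session across workers would further reduce connection churn.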

Please respond to this job post with a detailed explanation of the scenario and your recommended approach to the two challenges outlined above. We are looking for a clear, technically sound response that demonstrates your expertise with Google Cloud Vertex AI. Our technical team will review your answer carefully.

If you are interested in this project and have the necessary skills and experience, please submit a cover letter with your best proposed solution, writing "HEY JAY!" at the start of it.

Thank you in advance for your valuable insights, and we look forward to finding the right candidate to help us overcome this challenge.
