Stable Diffusion Based Virtual Try-On Inference Endpoint Deployment
Background and Context
Project Objective: Deploy a virtual try-on model based on Stable Diffusion for online inference while minimising costs.
Current Infrastructure: Our main cloud provider is AWS, but we are also set up with GCP and Azure. We are open to other solutions depending on cost and ease of use.
Model Details: The model is a modified version of Stable Diffusion 1.5, with a different conditioning network and a fine-tuned U-Net (the weights will be compatible with the original stable-diffusion-inpainting model after appropriate conversion). It uses the sd-vae-ft-mse-original VAE, which is available on Hugging Face. We also use an img2img upscaling step with a separate fine-tuned Stable Diffusion model available on Hugging Face; this should likewise be set up for online inference. We have code that handles the diffusion pipeline, so the solution could simply create inference endpoints for the U-Net, conditioning network, and VAE so that this code can call those endpoints rather than models loaded via torch.load. We are open to other solutions, e.g. wrapping the whole pipeline in a single inference endpoint if that makes technical sense.
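To illustrate the single-endpoint option, the sketch below assembles the converted components into a diffusers inpainting pipeline and serves it over HTTP with FastAPI. It is a minimal sketch under stated assumptions: the file paths and the /tryon route are placeholders, the .pt state dict is assumed to already match the diffusers key layout after conversion, and the custom conditioning network is elided because its interface is project-specific. stabilityai/sd-vae-ft-mse is the diffusers-format release of the same VAE named above.

```python
import base64
import io

import torch
from diffusers import (
    AutoencoderKL,
    StableDiffusionInpaintPipeline,
    UNet2DConditionModel,
)
from fastapi import FastAPI
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Fine-tuned U-Net: start from the inpainting config, then load the
# converted .pt state dict (assumed to already use diffusers key names).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-inpainting", subfolder="unet"
)
unet.load_state_dict(torch.load("unet.pt", map_location="cpu"))

# Diffusers-format release of the sd-vae-ft-mse VAE.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", unet=unet, vae=vae
).to(device)

app = FastAPI()

@app.post("/tryon")
def tryon(payload: dict):
    # Expects base64-encoded person image and garment mask.
    image = Image.open(io.BytesIO(base64.b64decode(payload["image"])))
    mask = Image.open(io.BytesIO(base64.b64decode(payload["mask"])))
    out = pipe(
        prompt=payload.get("prompt", ""), image=image, mask_image=mask
    ).images[0]
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    return {"image": base64.b64encode(buf.getvalue()).decode()}
```

The per-component alternative would expose the U-Net, conditioning network, and VAE as separate endpoints behind the same kind of handler, with the existing pipeline code calling them in sequence; that trades extra network hops for independent scaling of each component.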
Technical Requirements
Programming Language: Python 3.10 if applicable
Cloud Platform: AWS preferred, open to other suggestions
Model Framework: PyTorch; you may convert the models to other formats if necessary.
File Types: .pt files for the conditioning network and U-Net will be provided. We can also provide the original .ckpt files if necessary, as in the conversion sketch below.
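If the original .ckpt files are used, recent versions of diffusers can load them directly and re-save them in the folder layout used at serving time. A minimal sketch, with placeholder file paths and output directories:

```python
from diffusers import (
    StableDiffusionImg2ImgPipeline,
    StableDiffusionInpaintPipeline,
)

# Try-on base model (inpainting-compatible U-Net after conversion).
tryon = StableDiffusionInpaintPipeline.from_single_file("tryon.ckpt")
tryon.save_pretrained("converted/tryon")

# Fine-tuned Stable Diffusion model used for the img2img upscaling step.
upscaler = StableDiffusionImg2ImgPipeline.from_single_file("upscaler.ckpt")
upscaler.save_pretrained("converted/upscaler")
```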
Tasks to be Performed
Infrastructure Setup: Configure the cloud-based online inference environment (a deployment sketch follows this list).
Model Integration: Integrate the VAE, conditioning network, and U-Net with the inference system.
Optimization: Ensure the system can handle online/on-demand inference without compromising output quality or latency.
Testing: Conduct comprehensive testing to validate the deployment for real-world usage.
Documentation: Create detailed documentation for the setup and operation.
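On the preferred AWS path, one way to carry out the infrastructure setup is a SageMaker real-time endpoint. The sketch below is an assumption about the eventual design, not a fixed choice: the S3 URI, role ARN, and instance type are placeholders, and inference.py is the handler (implementing model_fn/predict_fn) that would wrap the pipeline code.

```python
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/tryon/model.tar.gz",  # weights + handler code
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    entry_point="inference.py",  # implements model_fn / predict_fn
    framework_version="2.0",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # single-GPU instance; size for latency/cost
)

# Example invocation once the endpoint is live:
# result = predictor.predict({"image": "<base64>", "mask": "<base64>"})
```

For the cost-minimisation goal, SageMaker Asynchronous Inference is worth evaluating alongside real-time endpoints, since it can scale instance count to zero when there is no traffic.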
Deliverables
A functional online inference system deployed in the cloud that operates with minimal latency.
Source code, with comments where applicable.
Documentation detailing the architecture, installation steps, and operational guidelines.
Skills and Experience Required
Expertise in MLOps, with hands-on experience in deploying machine learning models in real-time settings.
Strong knowledge of AWS services or other cloud platforms.
Familiarity with Stable Diffusion models and generative networks.