Upwork is hiring a Machine Learning Engineer

Upwork  ·  US  ·  $100k/yr - $210k/yr
almost 2 years ago

### Project ###

Read the GFlowNet paper [1] and train a GFlowNet model in a Google Colab (or AWS) environment using the torchgfn library [2] on the "Flatlander" toy problem (see below). The project budget includes the time for reading and understanding the paper as well as for training the model.

There are code samples in the torchgfn library that can be adapted to run on the Flatlander problem. Train a model that samples each flatlander with probability proportional to its reward (a fully trained GFlowNet has this property).
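To make the target property concrete: a fully trained GFlowNet samples terminal states with P(x) = R(x) / Σ R(x'). A minimal sketch of how you might empirically check a trained sampler against that target (the reward table and `sample` callable here are stand-ins, not the provided assets):

```python
import numpy as np

# Hypothetical reward table over terminal states (stand-in for the
# provided reward(flatlander) -> float function).
rewards = {"a": 1.0, "b": 2.0, "c": 5.0}

def target_probs(rewards):
    """A fully trained GFlowNet samples x with P(x) = R(x) / sum of all rewards."""
    z = sum(rewards.values())
    return {x: r / z for x, r in rewards.items()}

def check_sampler(sample, rewards, n=50_000, tol=0.02, seed=0):
    """Compare a sampler's empirical frequencies to the reward-proportional target."""
    rng = np.random.default_rng(seed)
    target = target_probs(rewards)
    counts = {x: 0 for x in rewards}
    for _ in range(n):
        counts[sample(rng)] += 1
    return all(abs(counts[x] / n - target[x]) < tol for x in rewards)

# Stand-in "perfectly trained" sampler: draws directly from the target
# distribution, so the check passes.
keys = list(rewards)
p = np.array([rewards[k] for k in keys])
p /= p.sum()
perfect = lambda rng: keys[rng.choice(len(keys), p=p)]
print(check_sampler(perfect, rewards))  # True
```

The same check applies to the real model: replace `perfect` with a function that rolls out the trained GFlowNet and returns the terminal flatlander it reaches.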

There are also many resources for understanding how GFlowNets work conceptually ([4], [5], [6]).

Once you've read the paper and trained the model, I'll set up a 1-2 hour session where you can show me how to run it and explain how it works.

A deterministic reward(flatlander) ⟶ float function will be provided, along with a dataset of flatlander samples (3x3 ndarrays representing binary images) generated by a hidden generative model.

Use the provided samples to restrict the action space at each GFlowNet node to only seen/valid actions (i.e. P(NextOnPixel | WorkingImage)), building up a flatlander creature *one pixel* at a time.
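One plausible reading of "restrict to seen/valid actions" (an assumption on my part, not a spec): a pixel is a valid next action from a partial image iff at least one dataset sample has that pixel on and contains every pixel already on in the working image. A sketch with a tiny illustrative dataset:

```python
import numpy as np

def valid_next_pixels(working, samples):
    """Pixels that may be turned on next, given the dataset.

    `working` is a 3x3 binary partial image; `samples` is an N x 3 x 3
    binary array of fully formed flatlanders. A pixel is allowed iff some
    sample turns it on AND is a superset of the working image. This is one
    interpretation of the constraint; the real rule may differ.
    """
    on = working.astype(bool)
    # Samples that are supersets of the working image: wherever `working`
    # is on, the sample must also be on.
    compatible = samples[(samples.astype(bool) | ~on[None]).all(axis=(1, 2))]
    # Candidate pixels: on in some compatible sample, not yet in `working`.
    candidates = compatible.astype(bool).any(axis=0) & ~on
    return np.argwhere(candidates)  # list of (row, col) actions

# Tiny stand-in dataset: two "full" flatlanders.
samples = np.zeros((2, 3, 3))
samples[0, 0, :] = 1          # top row
samples[1, :, 0] = 1          # left column
working = np.zeros((3, 3))
working[0, 0] = 1             # shared pixel; both samples stay compatible
print(valid_next_pixels(working, samples))  # (0,1), (0,2), (1,0), (2,0)
```

With the real 10K-sample dataset, `samples` would just be the provided N x 3 x 3 ndarray.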

You can determine whether a node is a terminal state (i.e. a fully formed flatlander) using the reward function. The reward function is a table of all possible fully formed flatlanders and their corresponding reward values; if an image isn't in the table, it's not fully formed.
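A minimal sketch of that terminal check, assuming the table can be keyed by the image's bytes (one convenient hashable encoding; the provided reward function may expose a different interface):

```python
import numpy as np

# Stand-in for the provided reward table: fully formed flatlander -> float.
reward_table = {}

def key(img):
    """Hashable key for a 3x3 binary image."""
    return np.asarray(img, dtype=np.uint8).tobytes()

def is_terminal(img):
    """A state is terminal (a fully formed flatlander) iff it's in the table."""
    return key(img) in reward_table

full = np.zeros((3, 3), dtype=np.uint8)
full[0, :] = 1
reward_table[key(full)] = 2.5        # pretend this flatlander has reward 2.5

partial = np.zeros((3, 3), dtype=np.uint8)
partial[0, 0] = 1

print(is_terminal(full), is_terminal(partial))  # True False
```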

I primarily work on AWS instances, so if it's easier for you to write reproducible code there rather than in a Google Colab notebook, feel free to do that instead.

[1] https://yoshuabengio.org/2022/03/05/generative-flow-networks/

[2] https://github.com/GFNOrg/torchgfn

[3] https://colab.research.google.com/drive/1fUMwgu2OhYpQagpzU5mhe9_Esib3Q2VR#scrollTo=hAh5ZpTVBHRn

[4] https://colab.research.google.com/drive/1fUMwgu2OhYpQagpzU5mhe9_Esib3Q2VR#scrollTo=ZyW_qffL-iR_

[5] https://arxiv.org/pdf/2111.09266.pdf

[6] https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3

### Flatlander Toy Problem ###

A flatlander is a 3x3 pixel image. Each pixel's value is either 0.0 ("off") or 1.0 ("on"). A flatlander has between 3 and 6 "on" pixels.

Each flatlander is generated by a hidden generative model that starts from a 3x3 image of all "off" pixels and turns its "on" pixels on one at a time.
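That construction process defines the trajectories a GFlowNet would traverse: from the empty grid to a terminal flatlander, one pixel per step. A sketch that enumerates one such trajectory (pixel order here is scan order, an arbitrary choice since the hidden model's actual order is unknown):

```python
import numpy as np

def trajectory(flatlander):
    """One possible construction trajectory for a flatlander: start from the
    all-off 3x3 grid and turn its 'on' pixels on one at a time."""
    state = np.zeros((3, 3))
    states = [state.copy()]
    for r, c in np.argwhere(flatlander > 0):  # scan order, chosen arbitrarily
        state[r, c] = 1.0
        states.append(state.copy())
    return states

x = np.zeros((3, 3))
x[1, :] = 1.0                 # a 3-pixel flatlander (middle row on)
traj = trajectory(x)
print(len(traj))              # 4 states: empty grid plus one per 'on' pixel
```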

Each flatlander is associated with a reward. The reward is deterministic: when a flatlander is sampled twice, it always has the same reward.

The following will be provided:

- 10K samples of flatlanders (an N x 3 x 3 ndarray)

- reward(flatlander) ⟶ float

### Desiderata ###

I'm looking for someone who ...

- Is excited by the idea of getting paid to learn about SOTA research topics, building a practical intuition for how they work, and training them on simple toy problems. I have other research papers whose methods I'd like to train on toy problems as well.

- Is an expert at gradient-based optimization (i.e. "Deep Learning"). You've implemented/trained/shipped models before and have the practical experience to prove it. I'm looking for someone who can level up my ability to design custom DL models. Ideally, someone comfortable with problems that involve discrete gradient-based optimization (e.g. NNs w/ a straight-through operation). The problems I'm working on have discrete actions and representations.

- Is good at distilling both the key ideas of the theory and also the salient engineering details of an implementation (latency, throughput, how training/runtime changes when switching to a new problem type or due to a change in model architecture, how complex of an environment a GFlowNet can handle).

- Is open to further consulting work on subsequent model implementations and also to advise and provide feedback on research project ideas.

- Prefers readable/adaptable code over "code golf" performance code when the two conflict. I care about minimizing human dev time (the make changes ⟶ run new experiment ⟶ get feedback loop) rather than minimizing computational run time. We can always speed things up later.
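For concreteness on the "straight-through operation" mentioned above: it's the standard trick for backpropagating through a discrete step by using the discrete value in the forward pass while letting gradients flow as if the step were the identity. A minimal PyTorch sketch:

```python
import torch

def ste_round(x):
    """Straight-through rounding: forward pass returns round(x); backward
    pass treats the op as the identity, so gradients flow through x."""
    return x + (torch.round(x) - x).detach()

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)
y = ste_round(x)        # discrete values: [0., 1., 1.]
y.sum().backward()
print(y.detach().tolist(), x.grad.tolist())  # [0.0, 1.0, 1.0] [1.0, 1.0, 1.0]
```

The `detach()` stops gradients through the rounding term, so `dy/dx = 1` everywhere even though the forward output is discrete.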

Job is closed
