Alignment Research Center is hiring an ML Researcher

ML Researcher

Introduction

The evaluations project at the Alignment Research Center is a new team building capability evaluations (and in the future, alignment evaluations) for advanced ML models. The goals of the project are to improve our understanding of what alignment danger is going to look like, understand how far away we are from dangerous AI, and create metrics that labs can make commitments around (e.g. 'If you hit capability threshold X, don't train a larger model until you've hit alignment threshold Y'). You can learn more about the project at this linked post on the Alignment Forum.

We expect to keep this posting up, but not necessarily evaluate applications actively at all times. It may be some time (months) before we get back to you.

Job Mission

- Effectively finetune LLMs to demonstrate different behaviors or capabilities (a minimal illustrative sketch follows this list)

- Make informed guesses about how much a finetuning run will improve model performance on various tasks

- Plan what data we need to collect, and prioritize collection given tradeoffs in how easily different types of data can be gathered

- Structure experiments to rapidly get information about how well our approach is working

- Anticipate and prevent costly mistakes in finetuning efforts

- Create and track metrics for finetuning success

- Keep up to date on the latest techniques, datasets, or tools for eliciting or measuring model performance and capabilities, and ensure we're getting an accurate picture of model capabilities

- Design and run experiments in a replicable, scientific manner, producing highly trustworthy results

- Identify novel hypotheses about the capabilities and behavior of large foundation models, and design experiments and data collection strategies to evaluate them

- Forecast future progress in ML, including how we'll need to change our evaluation strategies, and how we might incorporate future models to speed up our workflows
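
To make the finetuning item above concrete, here is a minimal sketch of a supervised finetuning run using Hugging Face Transformers. The model ("gpt2"), the data file (`behaviour_demos.txt`), and all hyperparameters are placeholder assumptions for illustration, not ARC's actual stack or setup.

```python
# A minimal causal-LM finetuning sketch with Hugging Face Transformers.
# Model, data file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; real runs would target far larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical file of demonstrations of the target behavior, one per line.
dataset = load_dataset("text", data_files={"train": "behaviour_demos.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives the standard next-token (causal LM) training objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="finetune-demo",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=10,  # training loss doubles as a crude success metric
    save_strategy="no",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The training loop itself is routine; the responsibilities above are about everything around it: deciding what data to collect, predicting how much a run will help, and tracking metrics that catch costly mistakes early.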

Skills

- Conceptual alignment thinking: help generate and evaluate ideas for how to probe alignment, deception, agency, and other conceptually slippery properties (a toy probing sketch follows this list)

- Dealing with the engineering details of large models

- Selecting or improving model architectures
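
As a concrete, if toy, illustration of what probing a capability can look like in code, here is a minimal sketch that elicits completions from a model and scores them against expected answers. The model, the probe items, and the substring-match scoring rule are all illustrative assumptions; real capability evaluations would be far more careful.

```python
# A toy capability probe: elicit greedy completions and score them.
# Model, items, and scoring rule are illustrative assumptions only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

# Hypothetical probe items: (prompt, substring expected in a correct answer).
probe_items = [
    ("Q: What is 2 + 2? A:", "4"),
    ("Q: What is the capital of France? A:", "Paris"),
]

hits = 0
for prompt, expected in probe_items:
    out = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
    completion = out[len(prompt):]  # strip the echoed prompt
    hits += int(expected in completion)

print(f"probe accuracy: {hits}/{len(probe_items)}")
```

Substring matching is the crudest possible scoring rule; part of the role is designing elicitation and scoring methods trustworthy enough to support the commitments described in the introduction.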

Job is closed

This job is closed and is no longer accepting applications.