This is a mathematics and reinforcement learning task. The goal is to update an agent's policy directly, without going through a reward function. We are currently working from two papers: one derives policy updates via a learned reward function (the reward-rational choice paper), and the other derives them without any reward function (the contrastive preference learning, CPL, paper). The task is to rewrite the equations for each feedback type in the reward-rational paper into the reward-free form used in the CPL paper. I have already started: in the third attached paper (formalism and feedback), I have stated each feedback type and its equations in terms of the reward function, and these now need to be rewritten without the reward function.
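For concreteness, here is a minimal sketch of the kind of substitution involved, using pairwise segment comparisons as one example feedback type. It assumes CPL's maximum-entropy setup, where the optimal advantage satisfies A^*(s,a) = \alpha \log \pi^*(a \mid s), so the reward inside the preference likelihood can be replaced by \alpha \log \pi. The symbols \alpha (temperature), \gamma (discount), and \sigma^+ / \sigma^- (preferred/rejected segments) follow CPL's notation; the exact form for each feedback type in the final write-up may differ.

% Reward-rational form: Bradley-Terry preference likelihood over rewards
P(\sigma^+ \succ \sigma^-)
  = \frac{\exp \sum_t \gamma^t \, r(s_t^+, a_t^+)}
         {\exp \sum_t \gamma^t \, r(s_t^+, a_t^+) + \exp \sum_t \gamma^t \, r(s_t^-, a_t^-)}

% CPL form: substitute r(s, a) -> \alpha \log \pi(a | s), removing the reward
P(\sigma^+ \succ \sigma^-)
  = \frac{\exp \sum_t \gamma^t \, \alpha \log \pi(a_t^+ \mid s_t^+)}
         {\exp \sum_t \gamma^t \, \alpha \log \pi(a_t^+ \mid s_t^+) + \exp \sum_t \gamma^t \, \alpha \log \pi(a_t^- \mid s_t^-)}

% The CPL objective is the negative log-likelihood of this model,
% optimized directly over the policy \pi with no reward model in the loop:
\mathcal{L}_{\mathrm{CPL}}(\pi)
  = \mathbb{E}_{(\sigma^+, \sigma^-) \sim \mathcal{D}}
    \left[ -\log P(\sigma^+ \succ \sigma^-) \right]

The other feedback types in the reward-rational formalism (demonstrations, corrections, and so on) would get the same treatment: write the feedback likelihood in terms of reward, then apply the same r -> \alpha \log \pi substitution to eliminate it.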