Discussion about this post

Felix Choussat

"They could also increase the willingness of governments to take costly action on lower-probability or further-future events. Government actors are often reluctant to agree to costly safeguards for risks they view as unlikely or far in the future. If-then commitments could make such agreements more feasible by ensuring that obligations only become burdensome if credible evidence of the relevant danger actually emerges. By structuring commitments so that meaningful action is triggered only when predefined indicators are met, states may be more willing to take costlier action in preparation for risks that are low-probability or longer-term (categories that might substantially overlap in practice)."

I think the main limitation of this approach is that it is reactive, and that there might be levels of AI capability or distribution at which empirical evidence of danger only emerges after it's already too late.

For example, misalignment might only become apparent once the misaligned AI is confident it has a decisive strategic advantage over the rest of humanity (if you endorse the sharp left turn model). The only ways to avoid this are to either slow your own AI project or preemptively sabotage those abroad, both of which carry large political costs. You might get similar problems with infosec requirements: if they're only imposed after the final model is tested and proven dangerous, there might be months, during testing or near the end of the training run, when the model is vulnerable to theft despite already being dual-use.

On the other hand, I think it might function quite well for some proliferation risks, since the choice to proliferate a powerful model necessarily comes after that model has been developed and its capabilities can be assessed empirically. Modulo certain scenarios like self-exfiltration, the existence of these commitments would force states to examine the offense-defense balance of a given model's capabilities instead of leaving that decision in the hands of private labs.
