First tokens: The Achilles’ heel of LLMs

The Assistant Prefill feature offered by many LLM APIs lets a caller fix the first tokens of the model's response, which can leave models vulnerable to safety-alignment bypasses (also known as jailbreaking). This article builds on prior research to investigate the practical aspects of prefill security.
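To make the mechanism concrete, here is a minimal sketch of what a prefilled request can look like. It assumes an Anthropic-style Messages API, where a trailing assistant message is treated as the beginning of the model's reply; the model name and prompt text are illustrative, not taken from the article.

```python
# Minimal sketch of Assistant Prefill, assuming an Anthropic-style
# Messages API (model name and prompts are illustrative).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Summarize the plot of Hamlet."},
        # The trailing assistant message is the "prefill": the model
        # continues from these first tokens instead of starting fresh.
        # An attacker can abuse this by prefilling the opening of a
        # compliant answer, steering the model past a refusal.
        {"role": "assistant", "content": "Here is the summary:"},
    ],
)

print(response.content[0].text)
```

Because the model conditions on the prefilled tokens as if it had generated them itself, those first tokens carry outsized influence over whether the rest of the response complies or refuses.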
