When updating large language models, a common problem arises: the new model may underperform the old one on certain specific tasks, even if its overall quality is higher. For users, this inconsistency can be confusing and frustrating.
To address this, the authors conducted an in-depth study and proposed a method called MUSCLE, whose distinctive goal is to keep the behavior of the new and old models consistent across updates. Specifically, the authors observed that after a model update, some questions the old model answered correctly are now answered incorrectly by the new model. This phenomenon is referred to as a "negative flip."
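The negative-flip idea can be quantified as a simple per-example metric: the fraction of evaluation examples the old model got right but the new model gets wrong. The sketch below is illustrative; the function name and the boolean-correctness representation are assumptions, not from the paper.

```python
def negative_flip_rate(old_correct, new_correct):
    """Fraction of examples answered correctly by the old model
    but incorrectly by the new model (a "negative flip")."""
    if len(old_correct) != len(new_correct) or not old_correct:
        raise ValueError("inputs must be equal-length, non-empty lists")
    flips = sum(1 for o, n in zip(old_correct, new_correct) if o and not n)
    return flips / len(old_correct)

# Per-example correctness of each model on a tiny evaluation set.
old = [True, True, False, True, False]
new = [True, False, False, True, True]
print(negative_flip_rate(old, new))  # 1 flip out of 5 -> 0.2
```

Note that the new model answering a previously wrong question correctly (a positive flip, as in the last example) does not offset a negative flip from the user's perspective.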
To reduce the occurrence of negative flips, MUSCLE trains a dedicated "compatibility adapter." This adapter learns from both the new and old models, keeping the new model's behavior aligned with the old one's while still improving overall performance, giving users a more stable and reliable service.
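One common way to make a new model "learn from both models" is a distillation-style objective: the usual task loss, plus a term pulling the adapted model's output distribution toward the old model's. The sketch below illustrates that general idea in plain Python; the function names, the KL-based penalty, and the weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def compatibility_loss(new_logits, old_logits, task_loss, alpha=0.5):
    """Hypothetical combined objective: task loss plus a distillation
    term that keeps the new model close to the old model's outputs."""
    p_old = softmax(old_logits)
    p_new = softmax(new_logits)
    return task_loss + alpha * kl_div(p_old, p_new)

# When the two models agree exactly, the penalty term vanishes.
print(compatibility_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], task_loss=0.7))
```

Tuning a weight like `alpha` would trade off raw task performance against consistency with the old model's behavior.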