Abstract: Adapting large foundation segmentation models such as SAM2 to video object segmentation (VOS), especially for long sequences, is often limited by the high training cost of full fine-tuning.