Meta has unveiled a batch of new AI research releases spanning several fronts. The tech giant’s research division, FAIR, lifted the lid on a multi-modal model, a more efficient approach to language model training, controllable music generation, audio watermarking for detecting AI-generated speech, and efforts to measure and improve geographic diversity in image generation.
FAIR has been publishing AI research openly for more than a decade, and this release signals Meta is ready to share some of its most promising recent work. “By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” a Meta spokesperson said.
First up is Chameleon, Meta’s multi-modal text-and-image model. Unlike AI assistants confined to words or pictures alone, Chameleon works in both media within a single model. “Just as humans can process the words and images simultaneously, Chameleon can process and deliver both image and text at the same time,” Meta explained.
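To make the “both at once” idea concrete, here is a minimal, purely illustrative sketch of early fusion: text and images are both converted to tokens and interleaved into one sequence that a single model consumes. All function names below are hypothetical stand-ins, not Meta’s released Chameleon interface.

```python
# Illustrative sketch only: an "early fusion" model treats text and images
# as tokens in one shared sequence. Names are hypothetical, not Meta's API.
from typing import List, Union

def tokenize_text(text: str) -> List[int]:
    # Placeholder: map characters to dummy token ids.
    return [ord(c) % 1000 for c in text]

def tokenize_image(image_path: str) -> List[int]:
    # Placeholder: a real system would run an image tokenizer (e.g. a VQ model)
    # to turn pixels into discrete codes; here we fake a fixed-length code.
    return [hash((image_path, i)) % 1000 for i in range(16)]

def build_interleaved_sequence(parts: List[Union[str, dict]]) -> List[int]:
    """Flatten a mixed prompt (text segments and image references)
    into one token sequence that the same model can consume."""
    sequence: List[int] = []
    for part in parts:
        if isinstance(part, str):
            sequence.extend(tokenize_text(part))
        else:  # {"image": path}
            sequence.extend(tokenize_image(part["image"]))
    return sequence

prompt = ["Describe this photo:", {"image": "beach.jpg"}, "in one sentence."]
tokens = build_interleaved_sequence(prompt)
print(len(tokens), "tokens in the fused sequence")
```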
Next, multi-token prediction aims to make language model training more efficient. Traditional training predicts one word at a time, while the revamped models predict several future words at once. Meta framed the problem with conventional next-word training bluntly: “It requires several orders of magnitude more text than what children need to learn the same degree of language fluency.”
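A hedged sketch of what that objective can look like, assuming the setup described in the announcement: a shared trunk feeds several output heads, each predicting a different future offset, and their losses are combined. Shapes, the toy trunk, and hyperparameters here are illustrative, not Meta’s implementation.

```python
# Sketch of a multi-token prediction loss: predict tokens t+1 .. t+k from the
# hidden state at position t via k separate heads. Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_dim, n_future = 32000, 512, 4

trunk = nn.Embedding(vocab_size, hidden_dim)          # stand-in for a transformer
heads = nn.ModuleList(nn.Linear(hidden_dim, vocab_size) for _ in range(n_future))

def multi_token_loss(tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (batch, seq_len). Average the cross-entropy of predicting
    token t+1, t+2, ..., t+n_future from the representation at position t."""
    hidden = trunk(tokens)                             # (batch, seq, hidden)
    loss = 0.0
    for offset, head in enumerate(heads, start=1):
        logits = head(hidden[:, :-offset])             # predictions for t+offset
        targets = tokens[:, offset:]                   # the actual future tokens
        loss = loss + F.cross_entropy(
            logits.reshape(-1, vocab_size), targets.reshape(-1)
        )
    return loss / n_future

batch = torch.randint(0, vocab_size, (2, 64))
print(multi_token_loss(batch).item())
```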
On the beat, JASCO gives music generation finer control. Where earlier text-to-music models rely on text prompts alone, JASCO also accepts chord progressions and rhythms as inputs. “JASCO is capable of accepting various inputs, such as chords or beat, to improve control over generated music outputs,” Meta declared.
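The gist is conditioning on more than text. The sketch below shows the shape of such a multi-condition request; the `MusicConditions` structure and `generate_music` call are hypothetical, not the actual JASCO interface.

```python
# Illustrative only: conditioning a music generator on text plus symbolic
# musical controls. Names and fields are hypothetical, not JASCO's API.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MusicConditions:
    text: str                                  # free-form description
    chords: Optional[List[str]] = None         # symbolic chord progression
    bpm: Optional[float] = None                # target tempo / beat grid
    drum_track: Optional[str] = None           # path to a reference rhythm

def generate_music(conditions: MusicConditions) -> str:
    # A real model would fuse each condition into its latent space;
    # here we simply report which conditions would constrain the output.
    active = [name for name, value in vars(conditions).items() if value is not None]
    return f"generating audio constrained by: {', '.join(active)}"

print(generate_music(MusicConditions(
    text="laid-back lo-fi groove",
    chords=["Am7", "D9", "Gmaj7", "Cmaj7"],
    bpm=82.0,
)))
```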
Meta also released AudioSeal, described as the first audio watermarking system designed to detect AI-generated speech within real recordings. It can flag AI-generated segments inside longer clips, and Meta says detection runs up to 485 times faster than previous methods.
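For orientation, here is a hedged sketch of the embed-then-detect workflow around an audio watermark, including the idea of scoring short frames so a generated segment inside real audio can be localized. The function names and the toy detector are hypothetical stand-ins, not the released AudioSeal API.

```python
# Hypothetical embed/detect workflow for an audio watermark; placeholder logic.
import numpy as np

def embed_watermark(audio: np.ndarray, strength: float = 0.01) -> np.ndarray:
    # Stand-in: a real watermarker adds an imperceptible learned signal.
    rng = np.random.default_rng(0)
    return audio + strength * rng.standard_normal(audio.shape)

def detect_watermark(audio: np.ndarray, frame: int = 16000):
    """Localized detection: score each one-second frame instead of the whole
    clip, so AI-generated segments inside real audio can be pinpointed."""
    results = []
    for start in range(0, len(audio) - frame + 1, frame):
        segment = audio[start:start + frame]
        score = float(np.abs(segment).mean())          # placeholder detector score
        results.append((start, score > 0.005))
    return results

speech = np.zeros(16000 * 3)                                 # three seconds of audio
speech[16000:32000] = embed_watermark(speech[16000:32000])   # watermarked middle second
print(detect_watermark(speech))
```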
Finally, Meta addressed calls for diversity by quantifying geographic bias in text-to-image systems. After a study involving more than 65,000 people, it open-sourced the resulting evaluation indicators and annotations so developers can measure and improve the geographic diversity of generated images.
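As a purely hypothetical illustration of what “quantifying geographic bias” can mean in practice, the snippet below compares the share of generated images depicting each region against a uniform target. This is not Meta’s released indicator, only a sketch of the general idea.

```python
# Hypothetical geographic-skew indicator: observed share minus uniform share.
from collections import Counter
from typing import Dict, List

def representation_gap(region_labels: List[str]) -> Dict[str, float]:
    """For each region, report (observed share - uniform share); positive
    values mean over-representation, negative mean under-representation."""
    counts = Counter(region_labels)
    total = len(region_labels)
    uniform = 1.0 / len(counts)
    return {region: counts[region] / total - uniform for region in counts}

labels = ["Europe", "Europe", "North America", "Africa", "Europe", "Asia"]
print(representation_gap(labels))
```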