What happens when we train the largest vision-language model and add in robot experiences?
The result is PaLM-E 🌴🤖, a 562-billion-parameter, general-purpose embodied visual-language generalist spanning robotics, vision, and language.
Website: palm-e.github.io