Real-time lip sync and avatar expressions must require a lot of compute, right? Also, does the pipeline go something like human speech (ffmpeg) => text (Whisper) => LLM => response => TTS (Dia, ElevenLabs, Sesame), with the avatar somehow driven from the output?
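
To make the question concrete, here is a rough sketch of that flow in Python. Everything in it is an assumption for illustration: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins rather than real Whisper, LLM, or TTS calls, and `naive_mouth_cues` is a toy heuristic (one mouth cue per word, spread evenly over the audio), not how a production lip-sync system works.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# NOTE: every stage below is a hypothetical stand-in, not a real library API.


@dataclass
class AvatarTurn:
    transcript: str
    reply: str
    audio: bytes
    mouth_cues: List[Tuple[float, str]]  # (start time in seconds, mouth shape)


def naive_mouth_cues(text: str, duration: float) -> List[Tuple[float, str]]:
    """Toy heuristic: one cue per word, evenly spaced over the audio duration."""
    words = text.split()
    if not words:
        return []
    step = duration / len(words)
    cues = []
    for i, word in enumerate(words):
        # Assumption: words containing a vowel open the mouth, others keep it closed.
        shape = "open" if any(c in "aeiou" for c in word.lower()) else "closed"
        cues.append((i * step, shape))
    return cues


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], Tuple[bytes, float]],
) -> AvatarTurn:
    """Run one conversational turn: speech -> text -> reply -> audio -> mouth cues."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    audio_out, duration = tts(reply)
    cues = naive_mouth_cues(reply, duration)
    return AvatarTurn(transcript, reply, audio_out, cues)


# Stub stages so the sketch runs without any models installed.
def fake_stt(audio: bytes) -> str:
    return "hello there"


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> Tuple[bytes, float]:
    # Pretend each word takes half a second of audio.
    return text.encode("utf-8"), 0.5 * len(text.split())


if __name__ == "__main__":
    turn = run_turn(b"\x00", fake_stt, fake_llm, fake_tts)
    print(turn.reply)
    print(turn.mouth_cues)
```

In this sketch the avatar only enters at the last step, driven by cues derived from the TTS output. Whether real systems work that way, or drive the face directly from the audio signal, is part of what the question is asking.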
submitted by /u/agamer60