1754577939... mp4
(2.43 MB, 1080x1920 h264)
>>/66363/
Ye, really impressive.
To keep the character coherent I need a more detailed prompt describing it, and probably a LoRA trained on the character.
After that, there are ControlNets with bone position estimation and facial expression extraction. The current theoretical limit of the Wan 2.1 model with all of that is this:
https://youtube.com/watch?v=HiDmMB5uiZY
It's not unattainable, but it's not as good as Runway Act 2. Pay2win certainly has an edge here.
https://youtube.com/watch?v=JW8PHlFD7HM