Voice-change, Lip-sync, Text-to-speech, Music/Audio tools for projects

mindsong · June 2019

A place to collect thoughts, tips, and tools on voice-changing, lip-sync, text-to-speech, and audio processing for our projects (esp. animations).

Add your own tips, links, resources, etc. below, or send them to me and we'll maintain a knowledgebase here...

Because of the overlap between tools and workflows, I'd say that anything you know of in this domain is useful, be it Carrara-specific, or using any one of the standard Carrara plugins (Poser, DS, iClone, etc. :)

Standalone utils and scripts are always relevant as well. Anything related to making a 3D mouth move, or generating the sound that goes with it is welcomed.

--ms

mindsong · June 2019

Voice Changing tools:

These are standalone tools that work in batch or realtime to alter incoming sound streams (usually voice) from one frequency range and/or timbre to another - e.g. male to female, or child to adult, etc.

Most also have settings that can be used to produce cartoony or robotic voices as well. In my experience, most of the outputs end up sounding a bit synthetic, but if you are willing to 'bend' your own voice on the microphone toward the target sound (e.g. a male trying to sound more female), and use the various applications' sound adjustments with restraint, some pretty compelling outputs can be produced, and presets can be saved for these settings. The results have no copyright constraints (assuming the inputs aren't copyrighted...). Once a preset is saved, some of these tools allow for batch conversion, allowing for consistent pre-recording and conversion of full animation voice sequences, using multiple characters.

The inputs and outputs are generally standard sound files (WAVE, MP3, AAC, etc.). Input can come from microphones, sound files, audio streams, etc., then be processed (in many ways), used with lip-sync tools for our 3D efforts, and inserted into the final video edits.

Any products/tools mentioned below are NOT endorsed, but simply available, and I have no affiliation with any of these products or companies other than possibly being an owner/user. YMMV

Commercial Voice Changing Software

Product: Screaming Bee's Morphvox Series

Source: https://screamingbee.com

Platform(s): Windows

Notes: Free and Payfor versions for both realtime and batch voice conversion. Some good multi-voice and script-writing utilities as well. Presets available for realistic and cartoon/fantasy voices.

Product: Audio4Fun Voice Changer Series

Source: https://www.audio4fun.com/voice-changer.htm

Platform(s): Windows

Notes: Various versions for realtime and batch conversion. Presets available for realistic and cartoon/fantasy voices.

mindsong · June 2019

text-to-speech (TTS) and speech-to-text:

These are tools that attempt to convert text to audio, and audio to text.

In our 3D domain (esp. animation), text-to-speech is probably the most relevant, as sound would typically be the most useful end-product. That said, any tool that lets us rework our data in ways that let creative folks work toward their/our target goals will enable creative spirits. At any rate, all tools and techniques are welcomed and encouraged.

Almost all mainstream computer environments have basic text-to-speech capabilities built in - usually as a tool to support users with disabilities, etc. Speech-to-text is similarly available in the form of Apple's 'Siri' and Microsoft's 'Cortana'.

As inexpensive computing capacity becomes available, these tools are becoming increasingly sophisticated, in that they're quickly becoming more sensitive to linguistic and idiomatic differences, but this also adds to the complexity of using these tools.

As we return (technologically) to our story-telling roots, these tools will become more prolific, capable, and interesting to us in our creative endeavors.

Text-to-speech tools:

Microsoft Windows 'Text-to-speech' (built-in, with extensions):

Speech-to-text tools:

Apple's Siri - native to current macOS/iOS devices

Microsoft's Cortana - native to Windows devices

Nuance: Dragon Dictate Series: https://www.nuance.com/dragon.html

IBM's speech-to-text: https://www.ibm.com/watson/services/speech-to-text/

Google's text-to-speech: https://cloud.google.com/text-to-speech/ - from interesting thread here

Related, from REIVAX:

DeepSpeech (Mozilla's open-source speech-to-text engine, built on Google's TensorFlow): https://github.com/mozilla/DeepSpeech
You can find the alpha release here: https://github.com/mozilla/DeepSpeech/releases

DNA Software (almost all Japanese) free TTS application: http://dnasoft.web.fc2.com/soft/texttowav/index.html (from the same discussion thread above)

mindsong · June 2019

Lip-Sync Resources:

mcjaudiomation by MCasual: (free, but donate!) https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation DAZ/Poser animation controlled by sound file contents. This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation map sounds to VU meters, lights, speaker movement, etc., it can also drive a cartoon mouth or emotion sliders with elegance. Windows w/ DS scripts.

DAZ Inc. sound-to-motion mapping tools for lip-sync:

These work with any figures that have an available '*.DMC' viseme/slider mapping file. Most DAZ figures have some form of DMC file available, and many non-DAZ figures have some that are available on sites like sharecg.com.

Mimic Pro for Carrara: microphone input to figure viseme (defined mouth shape) motions.

Mimic Live: (DAZ Studio, but can be exported to Carrara) microphone input to figure viseme (defined mouth shape) motions. Windows?

Mimic Lite: A no-longer-available 'lite' version of the standalone Mimic Pro utility (also no longer available? toolfarm.com?) for Poser/DAZ figures. Windows

Mimic Pro: A no-longer-available standalone sound-to-viseme mapping tool. Exports PZ2 files for conversion/import to other tools. Last known to be available at www.toolfarm.com. Windows

DAZ Studio 4.x 32bit - 'lip-sync' (built-in plugin) - only found in the 32-bit versions of DAZ Studio (a Carrara Plugin :), this plugin leverages the early DAZ lip-sync tool libraries to enable sound-to-viseme mapping in DAZ figures that have so-called DMC mapping files available. Results can be exported as PZ2 Pose presets, or duf files for importing into Carrara.

Papagayo lip-sync tool - http://www.lostmarble.com/papagayo/ and Python version: https://morevnaproject.org/papagayo-ng/ - don't know much about this one, but it's been around for a long time and might be useful in your workflow. Outputs to Moho (2D animation tool) and Blender. MacOS and Windows. Update: It looks like a DS script has been written to import Papagayo outputs to DS, available at sharecg: Papagayo to DS Importer. Forum thread with instructions: https://www.daz3d.com/forums/discussion/336526/alternative-audio-based-lipsinc-for-daz-studio
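
For anyone wanting to wire Papagayo output into their own pipeline, here is a rough Python sketch of reading its Moho switch export. It assumes the export is a plain-text file whose first line is 'MohoSwitch1', followed by one 'frame phoneme' pair per line; that layout, the function names, and the sample data are all assumptions for illustration, so check them against a real export before relying on this.

```python
def parse_moho_switch(text):
    """Parse Papagayo-style Moho switch text into (frame, phoneme) pairs.

    Assumed layout (verify against a real export): a 'MohoSwitch1'
    header line, then lines of '<frame number> <phoneme name>'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "MohoSwitch1":
        raise ValueError("missing MohoSwitch1 header")
    keys = []
    for ln in lines[1:]:
        frame, phoneme = ln.split(None, 1)
        keys.append((int(frame), phoneme))
    return keys


def frames_to_seconds(keys, fps=24):
    """Convert frame-numbered keys to (seconds, phoneme).

    Assumes frame 1 corresponds to time zero.
    """
    return [((frame - 1) / fps, phoneme) for frame, phoneme in keys]


# Hypothetical sample data, not taken from a real project.
SAMPLE = """MohoSwitch1
1 rest
5 MBP
8 AI
12 rest
"""

if __name__ == "__main__":
    for seconds, phoneme in frames_to_seconds(parse_moho_switch(SAMPLE)):
        print(f"{seconds:6.3f}s  {phoneme}")
```

Each (time, phoneme) pair could then be matched to whatever viseme slider your figure uses for that mouth shape.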

Relevant Links (forums/discussions/tutorials - anything that can eventually be used in Carrara):

https://www.daz3d.com/forums/discussion/336526/alternative-audio-based-lipsinc-for-daz-studio

which mentions:

https://www.sharecg.com/v/88621/view/8/Script/Script-For-Importer-Files-Lip-Sync-in-DAZ-Studio

mindsong · June 2019

Other Audio Tools (music, MIDI, sound editors, DAWs, video-sound, etc.):

Sound Editors: (there are zillions of these, but a few stand out for popularity/price/etc.)

Audacity - Free/Open Source sound editor: https://www.audacityteam.org/
Mature full-featured sound recorder, editor, and conversion tool.
Windows, MacOS, and Linux

Magix Music Maker (and other DAWs): https://www.magix.com/us/music/ - Free base software with lo-res loops, payfor add-on instruments and hi-res sound loop collections. Kind of like the DAZ Studio software/content model, but for music. Note that there are both personal-use and commercial-use license limitations on these sound loops, with prices to match...! Works like AniBlocks or NLA blocks but with sounds (MIDI and sound samples).

IK Multimedia's series, especially 'SampleTank': https://www.ikmultimedia.com/ - Full range of sample/MIDI composition tools with beginner->pro versions and sound sample collections for sale. I believe these samples are all assumed to be used as professional commercial outputs. (anyone know otherwise?)

Music Notation and lyrics to audio/sound files:

Myriad Software - Musical Notation to audio tools: https://www.myriad-online.com/en/products/virtualsinger.htm

REIVAX · June 2019

Hello mindsong

One fun thing in VBScript: this reads out the computer's current time. Save this text as xxxx.vbs

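' Speak the current time aloud using the Windows Speech API (SAPI)
Dim texte, lecture
Set lecture = CreateObject("SAPI.SpVoice")
texte = "Il est " & Time()   ' French for "It is <current time>"
lecture.Speak texte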

Then click on the file to run it.

P.S. Sorry, the spoken text is in French.

And now a speech-to-text one. You can try it; it's free.

https://www.ibm.com/watson/services/speech-to-text/

And one virtual singer. The creators are French, but the software is available in many languages, for Windows (32/64-bit) and Mac.

https://www.myriad-online.com/en/products/virtualsinger.htm

Persona Non Grata · June 2019

@REIVAX: Virtualsinger - the voice of Stephen Hawking singing "Strangers in the Night" - that's really funny!

I've never heard of virtual singer - but it seems like a lot of fun! The example of Strangers in the night sounds like Stephen Hawking!

WendyLuvsCatz · June 2019

Selinita said:

@REIVAX: Virtualsinger - the voice of Stephen Hawking singing "Strangers in the Night" - that's really funny!

I've never heard of virtual singer - but it seems like a lot of fun! The example of Strangers in the night sounds like Stephen Hawking!

I have used Virtual Singer in Myriad Melody Assistant for probably 10 years

but not as frequently now, as I tend to do music without lyrics

Persona Non Grata · June 2019

Here's a fun one to try...

MUSIC: g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)
WORDS: I ¦have to do a lit-tle ¦house-work ba-by, ¦when I feel an-gry at ¦you.

¦ - barline
(4) crotchet, quarter note
(8) quaver, eighth note
(2+4) dotted minim, 3 beats
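
As a sanity check on the notation, here is a small Python sketch that turns the MUSIC line into note names and beat counts, following the legend above: (4) is one beat, (8) is half a beat, and values joined with '+' add up. The function names are made up for this example, and it only handles the plain note letters used here (no rests, accidentals, or octaves), which may not cover everything Virtual Singer accepts.

```python
import re

BAR = "\u00a6"  # the broken-bar character used as a barline
NOTE_RE = re.compile(r"([a-g])\(([0-9+]+)\)")


def parse_bar(bar):
    """Return [(note, beats), ...] for one bar of 'note(duration)' tokens."""
    notes = []
    for name, spec in NOTE_RE.findall(bar):
        # A duration of N means a 1/N note; a quarter note (4) is one beat.
        # '+' joins values, so (2+4) = 2 beats + 1 beat = 3 beats.
        beats = sum(4 / int(part) for part in spec.split("+"))
        notes.append((name, beats))
    return notes


def parse_tune(music):
    """Split the tune at barlines and parse each bar."""
    return [parse_bar(bar) for bar in music.split(BAR)]


MUSIC = BAR.join([
    "g(4)",
    "c(8)b(8)f(8)g(8)a(4)f(4)",
    "e(4)f(4)c(4)e(4)",
    "f(8)e(8)d(4)a(4)g(8)a(8)",
    "e(2+4)",
])

if __name__ == "__main__":
    for number, bar in enumerate(parse_tune(MUSIC), start=1):
        total = sum(beats for _, beats in bar)
        print(f"bar {number}: {total} beats  {bar}")
```

The one-beat pickup and the three-beat final bar add up to a full bar of four, with every bar in between also totalling four beats.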

WendyLuvsCatz · June 2019

Selinita said:

Here's a fun one to try...

MUSIC: g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)
WORDS: I ¦have to do a lit-tle ¦house-work ba-by, ¦when I feel an-gry at ¦you.

¦ - barline
(4) crotchet, quarter note
(8) quaver, eighth note
(2+4) dotted minim, 3 beats

If that was a music score I could sight-read and sing it, but with letters and numbers I would really have to think about it. I'm not on my PC right now. I guess most people are used to piano rolls and numbers now; I have to use bar lines, having been raised with them learning piano from age 7, and just cannot cope with other DAW software at all.

Persona Non Grata · June 2019

The music is Troika from Lieutenant Kije

WendyLuvsCatz · June 2019

An early video

I literally have hundreds of them BTW

A more recent one

Persona Non Grata · June 2019

That's fab, Wendy - couldn't pick out a single word of what was being 'sung' but liked the note sliding. Maybe 'Mmmmmmm' would work better?

Persona Non Grata · June 2019

Like the Booty Fall Doll - very arty !

Persona Non Grata · June 2019

I downloaded the trial and here's my first attempt - straight out the box, seems pretty easy to use...

Persona Non Grata · June 2019

Now with piano accompaniment (takes me back to A-Level Music where we had to harmonise [Bach] Chorales) and continuing lyrics...

mindsong · June 2019

Thanks to all for the great inputs already. It looks like this thread is already striking a chord...

I'll try to coordinate the contents in the TOC/header notes as time goes on.

neato!

--ms

REIVAX · June 2019

hello all

Speech breathing

the pdf

http://www.arishapiro.com/SpeechBreathingwithstudy_VHCIE2019.pdf

Perhaps you don't know the dance work at arishapiro.com: dance character animation and simulation, with physics.

http://www.arishapiro.com/dance/

WendyLuvsCatz · June 2019

REIVAX said:

hello all

Speech breathing

the pdf

http://www.arishapiro.com/SpeechBreathingwithstudy_VHCIE2019.pdf

Somebody give the man a virtual inhaler

mindsong · June 2019

REIVAX said:

hello all

Speech breathing

the pdf

http://www.arishapiro.com/SpeechBreathingwithstudy_VHCIE2019.pdf

Perhaps you don't know the dance work at arishapiro.com: dance character animation and simulation, with physics.

http://www.arishapiro.com/dance/

Interesting when pointed out... I haven't thought about it, but it does have an impact on the continuity/realism of the speech.

Someone pointed out that humans generally blink before they change/redirect their gaze. Once you notice it, it's kind of distracting when you see it everywhere...

cool links!

--ms

REIVAX · June 2019

hi wendy

It uses Python and pygubu.

mindsong · June 2019

I added this tool to the reference posts (above), but it bears explicit mention:

MCasual, our local DS freebie script hero, wrote some scripts and a sound analysis utility (Windows) that binds soundfile characteristics (energy levels) to arbitrary Poser/DAZ sliders of any sort. It's called 'mcjaudiomation' (free, but donate!) from https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation.

This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation tie sounds to things like VU meters, lights, speaker movement, etc., it can also drive a figure's mouth or emotion sliders with a certain elegance - works really well for cartoon vocals.

The results can be imported into Carrara or be used in any other workflow that starts with PZ2 'pose preset' or duf files. I have mimic-pro, which does sophisticated sound analysis to map visemes and the like, and I find that this far more basic approach works pretty darned well in comparison.

I presume it could be used by someone to drive any motion by simply making well-timed sounds (vocal or otherwise) to drive arbitrary sliders. E.g. saying "tick tock tick tock" to control a clock pendulum, etc.
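
To make the 'energy levels drive a slider' idea concrete, here is a minimal Python sketch. To be clear, this is not MCasual's code or algorithm, just one simple way of doing the same kind of thing: it computes an RMS level for each animation frame of a WAV file and scales the results to 0..1. The function name, the 30 fps default, and the 16-bit PCM restriction are my own assumptions.

```python
import math
import struct
import wave


def frame_energies(path, fps=30):
    """Return one normalised RMS level (0..1) per animation frame.

    Assumes a 16-bit PCM WAV file; stereo channels are simply
    lumped together rather than analysed separately.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Unpack little-endian signed 16-bit samples (interleaved if stereo).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)

    # Number of interleaved samples covering one animation frame.
    step = max(1, rate // fps) * channels

    levels = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))

    # Scale so the loudest frame is 1.0; silence stays at 0.0.
    peak = max(levels, default=0.0)
    return [level / peak if peak else 0.0 for level in levels]
```

Each value in the returned list could then be written out as the keyframe value of a mouth-open (or pendulum) slider for the matching frame.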

--ms

Mistara · October 2019

after all this time, finally bought my microphone. went usb

Blue Microphones - Snowball USB Cardioid and Omnidirectional Electret Condenser Vocal Microphone

ahem mee mee mee maa maa maa moh moh moh muuu muu moo
doh ray mee fah soh lah tee dohh

WendyLuvsCatz · October 2019

Windows 10 has piles of text-to-speech voices you can get for Narrator, including Australian accents, but without a registry hack only 4 of them are accessible from other apps (e.g. Balabolka, iClone, etc.): the American and British English ones, one of each gender

the registry hack scares me too much to try

there is another hack for Cortana too

https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/

for now I have used Narrator

prepared text

Audacity set to use Stereo Mix as my microphone, and my non-existent motherboard sound output for playback, as I have the Nvidia one on my monitor speaker

Dartanbeck · October 2019

Cool stuff!

It's been a long time since I've used Mimic Pro for Carrara, and even longer since using Mimic Pro (standalone) for Poser.

The latter creates a PZ2 animated pose (or is it the Face files?) with the sound injected into it, so when you apply the pose (or is it expression FC2?), the sound comes with it. Pretty cool. It's also a workshop for tweaking the visemes, expressions, and extra motion to your liking before writing the final file.

The thing that I really love about Mimic Pro for Carrara (besides the fact that it works directly in Carrara and works really well) is that we can create our own viseme shapes as NLA poses individually for any given character - so I can make Rosie talk like Rosie (visually), Dart talk like... well... me, the bad guy talk like the bad guy, etc.

Okay, after saying all of that, I am eager to try mCasual's plugin!

Mistara · October 2019

I see there is an Audacity pro version; undecided on it

WendyLuvsCatz · October 2019

Mystarra said:

I see there is an Audacity pro version; undecided on it

Don't

it's a rip-off, like the people who sell a version of Blender 3D

it's open-source software

https://en.wikipedia.org/wiki/Audacity_(audio_editor)

Mistara · October 2019

the feature i really need is the remove background noise.

my place has a noisy refrigerator.

the stores don't plug in the refrigerators, can't hear them before buying.

Mistara · October 2019

SadKitty_Carrara said:

Windows 10 has piles of text-to-speech voices you can get for Narrator, including Australian accents, but without a registry hack only 4 of them are accessible from other apps (e.g. Balabolka, iClone, etc.): the American and British English ones, one of each gender

the registry hack scares me too much to try

there is another hack for Cortana too

https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/

for now I have used Narrator

prepared text

Audacity set to use Stereo Mix as my microphone, and my non-existent motherboard sound output for playback, as I have the Nvidia one on my monitor speaker

would love to give my actors Australian accents

Mistara · October 2019

what kind of accent would be good for a minotaur?

TangoAlpha · October 2019

Greek?
