Create and set up your IBM Watson API services
As defined in the ingredients section, it is assumed that you have an IBM Cloud account. Sign in to your account and select [Catalog] > [Watson]. The services that we are interested in are highlighted below:
(As you can see, there are many more Watson API services that you can investigate and integrate for version 2.0)
Select the [Speech to Text] service and select [Create]:
Once created, we need to take a copy of the [Service Credentials] as we’ll need them within the Unity3D app:
Now, repeat the same thing for the [Text to Speech] service:
Finally, we need to create a [Watson Assistant] service:
Now, we have all the [Service Credentials] that we shall need to include from the Unity3D Watson SDK.
For the sake of this article, we shall create some quick and simple conversation Intent/Entity and Dialog flows within the [Watson Assistant] service.
To do this, we click on [Manage] and then on [Open tool]:
Then select the [Workspaces] tab from the Introduction screen.
This will show the tiles of Workspaces. Create a new Workspace (or re-use an existing one). I am using a pre-existing Workspace:
We need to click on [View Details] in order to get the Workspace Id (which we need in order to connect to this Workspace):
Once we have that value, we can click on the Workspace to see the Intents/Entities/Dialogs:
As you can see, I have some pre-built Intents (some copied from the [Content Catalog]), but for this recipe, I just set up the [#General_Jokes] Intent.
I set up 17 examples, which is a reasonable number of examples for an Intent.
I also set up some Entities. I’ll include them here just to show the [@startConversation], as you’ll see that in the [Dialog] tab shortly:
Switch to the [Dialog] tab; by default you should have a [Welcome] node. This is the initial node that is executed when you first call the [Watson Assistant] API. As you can see below, this is the first node that gets executed on an initial connection [conversation_start]; it will respond with the text “HAL 9000 initiated and awaiting your command” and wait at this point for the user’s input:
We shall create a new top-level node and link it to the [#General_Jokes] intent. If the #General_Jokes intent is identified and triggered, the conversation follows this node into its child nodes; but first, it returns the response “Seriously, you want me to tell you a joke?” and waits for a response from the user:
If the user responds with “Yes” (or something positive), we respond with a “random” response that happens to be a joke (I didn’t say they were quality jokes… but you can change that). (Note the <waitX> tags within the responses; we’ll come back to those later on.)
We create a child node response for a non-Yes response. Here we’re catering for any non-Yes response rather than an exact “no” (but you can modify that if you like). As you can see, if you respond with a non-Yes response, we just reply with “Okay, not a problem, no funny time for you then”.
That’s pretty much all we really need to set up within the [Watson Assistant] service for now – you can extend it much more as you see fit.
After you’ve downloaded and installed Unity, when you start you’ll be given the option to create a [New] project or [Open] an existing one. Select [New].
Select [Add Asset Package]; you need to add [IBM Watson Unity SDK], [SALSA With RandomEyes] (assuming you have purchased it via the Unity Asset Store) and [UMA 2 – Unity Multipurpose Avatar]:
You will then be presented with an “empty” Unity3D project, like so:
Follow the instructions as defined on the SALSA website here
You need to go to the SALSA website and download the UMA_DCS asset, select [Assets], [Import Package], [Custom Package] and select the .unitypackage file that you downloaded:
This will then give you access to the Example/Scenes [SalsaUmaSync 1-Click Setup] – this has the GameObjects pre-selected and setup for us to use out-of-the-box:
Double-click this scene to open it in the Unity IDE.
Click on the [Main Camera] and make sure that the [Audio Listener] is selected (this is the Microphone input):
Just make sure that this has a tick so that it is active.
The only extra item we shall add is a Canvas/Text GameObject, like so:
This is purely so that we can output to the screen what has been detected by the [Audio Listener] and then converted via the [Speech to Text] service.
Ensure that you have all of the Components added to your GameObject like so:
As we’ll be enhancing this, we will add a new folder called [Scripts] and a new file called [WatsonTTS.cs].
As you can see, in the Inspector view we can add the [Service Credentials] from the IBM Watson API services that we captured earlier.
You’ll see a [Preview] of the file if you single-click it; if you double-click, it will open in the editor you have defined. I have chosen to use the MonoDevelop IDE, as we shall see in the next step.
The one modification that I have made is to add extra Wardrobe items to the UMA character, to do this you do the following:
I changed the [Active Race] to Female and added the [Wardrobe Recipes]:
One last modification is to change where the UMA avatar is viewed from the Camera perspective, so that we can zoom into just the head of the avatar:
By default, the UMA character will have the “Locomotion” animation assigned to it, which makes it look about randomly and is a little distracting – if I had more time, I would customise this to a smaller range; we’ll do that for version 2.0. For now, we’ll just remove the animation:
We have not covered the content of the [WatsonTTS.cs] file yet, but once you’ve created the content and you press the [>] run button you will see your 3d Digital Human, like so:
Due to using the SALSA plug-in, the Avatar will automatically move its eyes, blink, and when it speaks it will perform lip-syncing to the words that are spoken.
Not bad for spending less than 30 minutes getting this far! Imagine what you could do in “hours” or “days”, refining and getting to know more about the UMA modelling etc. As I say, I’ve used the most basic out-of-the-box example here so I could focus on the integration with the IBM Watson API services, but I will return to refining and enhancing the UMA and SALSA setup and configuration.
Yes! It was commented that the above female avatar looked a bit scary, so I switched her for the male avatar – very simple to do. I repeated the same exercise of adding pants / t-shirt / haircut and eyebrows, and in minutes we now have this avatar:
Okay, still not 100% perfect, but pretty good for something this quick and easy – we can iron out the finer details once we get it all working together.
Explanation of the WatsonTTS.cs C# file used to control everything
The code that was used as a baseline is already included in the .cs scripts within the Watson/Examples/ServiceExamples/Scripts folder.
As mentioned in the previous step, we shall create a new C# script file with the following contents.
To start with, we need to include all the libraries that we shall be using. Then you’ll notice that we have field declarations for the Watson APIs that we recorded earlier; we set them like this so we don’t have to hard-code them into the .cs file.
You’ll also notice that we have private variables declared that we’ll use within the script.
As we do not hardcode the Watson API values in the .cs script, you have to insert the values within the Unity IDE itself, like so:
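The field declarations might look something like the sketch below. Note that the field and variable names here are my own illustrative choices, not lifted from the SDK example scripts, and this fragment only compiles inside a Unity project:

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;

public class WatsonTTS : MonoBehaviour
{
    [Header("Watson service credentials (filled in via the Inspector)")]
    [SerializeField] private string speechToTextUsername;
    [SerializeField] private string speechToTextPassword;
    [SerializeField] private string speechToTextUrl;
    [SerializeField] private string textToSpeechUsername;
    [SerializeField] private string textToSpeechPassword;
    [SerializeField] private string textToSpeechUrl;
    [SerializeField] private string assistantUsername;
    [SerializeField] private string assistantPassword;
    [SerializeField] private string assistantUrl;
    [SerializeField] private string workspaceId;

    [SerializeField] private Text resultsField;   // the Canvas/Text on-screen transcript

    // Private state used throughout the script
    private string TTS_content;                   // text waiting to be spoken
    private bool play;                            // trigger TTS on the next Update() cycle
    private bool check;                           // true while a clip is playing
    private float wait;                           // seconds left in the playing clip
    private Dictionary<string, object> _context;  // Watson Assistant conversation context
}
```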
Now, back to the C# coding. The structure of a .cs file for Unity is to have a Start() method that is executed as an initialiser and an Update() method that is executed every frame (if you’ve ever coded for an Arduino, it’s a very similar setup).
The Start() method uses the credentials defined in the IDE and the Watson SDK to prepare the objects for later usage.
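As a rough sketch, Start() ends up along these lines. The namespaces and constructor/Message() signatures follow the Watson Unity SDK example scripts of the time and may well differ in newer SDK versions; OnFail is assumed to be a simple error-logging callback not shown here:

```csharp
using IBM.Watson.DeveloperCloud.Services.SpeechToText.v1;
using IBM.Watson.DeveloperCloud.Services.TextToSpeech.v1;
using IBM.Watson.DeveloperCloud.Services.Conversation.v1;
using IBM.Watson.DeveloperCloud.Utilities;

private SpeechToText _speechToText;
private TextToSpeech _textToSpeech;
private Conversation _conversation;

void Start()
{
    // One Credentials object per service, built from the Inspector values
    _speechToText = new SpeechToText(
        new Credentials(speechToTextUsername, speechToTextPassword, speechToTextUrl));
    _textToSpeech = new TextToSpeech(
        new Credentials(textToSpeechUsername, textToSpeechPassword, textToSpeechUrl));
    _conversation = new Conversation(
        new Credentials(assistantUsername, assistantPassword, assistantUrl));
    _conversation.VersionDate = "2017-05-26";   // version date required by the service

    // Open the conversation with an initial message; the reply (the
    // [Welcome] node's greeting) arrives in the OnCONVMessage callback
    _conversation.Message(OnCONVMessage, OnFail, workspaceId, "first hello");
}
```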
In the second part, we execute the code to make an initial connection to the Watson Assistant service, just passing the text “first hello” and the results will be returned to the OnCONVMessage callback method.
As you can see, the object “response” is passed to this method and will contain the JSON response from the Watson Assistant service.
In the response, we are passed the “context” variable. We copy this to the local _context variable so that we can pass it as an input each time we call the Watson Assistant service, to keep track of the “context” values of the conversation.
You can also see above, that we extract the output:text JSON value as this contains the text that is returned by the Watson Assistant Dialog node.
Just as an example, I have left in some custom action tags that are contained within the Dialog node response. As you can see above, we can detect these action tags within the conversation text itself and replace them with the values that the Text to Speech API service requires. The reason for these break pauses will become clearer later on. We store the text to be converted in the global variable TTS_content.
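The tag substitution itself can be as simple as a string replace. The <waitX> tag names are the ones I defined in my own Dialog nodes (they are not a Watson feature), and the SSML <break> element is what the Text to Speech service understands; a sketch:

```csharp
// Swap my custom Dialog action tags for the SSML pauses that the
// Text to Speech service understands, then store the result for speaking.
private void PrepareSpeech(string dialogText)
{
    dialogText = dialogText.Replace("<wait1>", "<break time=\"1s\"></break>");
    dialogText = dialogText.Replace("<wait3>", "<break time=\"3s\"></break>");

    TTS_content = dialogText;   // picked up later by GetTTS()
}
```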
As you can then see, we set the play variable to true. This will get picked up on the next cycle of the Update() method.
The first check we make in the Update() method is on the value of the play variable. Why do we do this? Well, if we are going to call the Text to Speech service and play the speech to the user, we need to stop the microphone from listening, otherwise we end up with a self-talking avatar that is speaking and listening to itself. Not what we want. We want to play the message and, when finished, start listening for the user’s input via the microphone.
There’s probably a better way to do this within Unity, but I found that the above code worked for me. We perform a check (we set the variable value in another method, as you’ll see shortly) and count down the length of the clip that is being played. This way, we can determine when the Avatar has finished speaking/playing the clip and then start listening via the microphone again.
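Putting the two checks together, the Update() method ends up roughly like this (StartRecording()/StopRecording() come from the Speech to Text example code; this is a sketch of my approach, not the only way to do it):

```csharp
void Update()
{
    if (play)
    {
        play = false;
        StopRecording();    // mute the microphone so the avatar
                            // does not listen to its own voice
        GetTTS();           // fetch and play the spoken reply
    }

    if (check)
    {
        wait -= Time.deltaTime;   // count down the clip's play time
        if (wait <= 0f)
        {
            check = false;
            StartRecording();     // clip finished: start listening again
        }
    }
}
```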
Going back to the check on the play variable – as we saw earlier, at the end of the OnCONVMessage() callback method we set play to true, so this will call the GetTTS() method.
The GetTTS() method calls the Watson Text to Speech API; the only thing we set here is the voice to use, and we pass the TTS_content variable that contains the text to convert. The callback goes to the HandleToSpeechCallback() method.
As you can see, the clip object is returned; we assign it to the Audio Source and Play() the clip. Here, we set the wait variable to the length of the clip and set the check variable to true – again, we use these values within the Update() method.
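That callback can be as small as the following sketch, assuming an AudioSource component sits on the same GameObject:

```csharp
void HandleToSpeechCallback(AudioClip clip)
{
    if (clip == null) return;

    AudioSource source = GetComponent<AudioSource>();
    source.clip = clip;
    source.Play();

    wait = clip.length;   // counted down in Update()
    check = true;         // tells Update() a clip is currently playing
}
```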
Going back up the file, we have the out-of-the-box content from the sample files for the Speech to Text. When the StartRecording() method is executed, it will call the RecordHandling() method as shown below:
This starts the microphone listening, takes the captured speech, and streams the content to the Speech to Text service.
As you are speaking, the Speech to Text service will attempt to convert the text “live” and show the output to the Canvas text variable on the screen.
Once the speech has finished (the result is determined to be .final rather than .interim), we take that text and call the Watson Assistant API via the Watson SDK, passing the input text and the Context variable (as this is the 2nd+ conversation call, we need to keep passing the growing Context value).
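The streaming example exposes an OnRecognize(SpeechRecognitionEvent result) callback; the .final check and the hand-off to the Assistant look roughly like this (SpeechRecognitionEvent and MessageRequest follow the SDK examples of the time and may differ in newer SDK versions):

```csharp
private void OnRecognize(SpeechRecognitionEvent result)
{
    if (result == null || result.results.Length == 0)
        return;

    foreach (var res in result.results)
    {
        foreach (var alt in res.alternatives)
        {
            // Show the live (interim or final) transcription on the Canvas
            resultsField.text = alt.transcript;

            if (res.final)
            {
                // Hand the finished utterance to Watson Assistant, passing
                // the context saved from the previous response so the
                // conversation state keeps accumulating
                MessageRequest request = new MessageRequest
                {
                    input = new Dictionary<string, object> { { "text", alt.transcript } },
                    context = _context
                };
                _conversation.Message(OnCONVMessage, OnFail, workspaceId, request);
            }
        }
    }
}
```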
That does seem like quite a lot, but it is actually pretty simple and does exactly what is required. Next, we’ll see what it actually does.
Preview and Debug within Unity
This is what your Unity IDE screen should now look like if you are viewing the “Scene” tab and have the “SALSA_UMA2_DCS” GameObject selected:
As you can see, I have the Active Race now set to [HumanMaleDCS] and I have added some Wardrobe Recipes from the examples folder.
When you press the [>] Run button, the Avatar will be displayed in the “scene” window within the IDE and you will see the Debug.Log() output values displayed underneath. This is where you can keep track of what is going on within the .cs code:
As you can see, I output when the “play” variable is set to true; this triggers the action in the Update() method. This is actually where the speech for the welcome/greeting message happens. The output “Server state is listening” is where the speech has finished and the microphone is now active and listening. The “[DEBUG] tell me a joke” output shows me what the Speech to Text service recognised and will then pass to the Watson Assistant service. As I say, this is a good way to see the output of each step and to analyse the information in more detail. If you select a line in the DEBUG output, there is a small window at the bottom of the panel that shows you more in-depth information – this is really useful for reviewing the contents of the JSON messages passed back and forth.
If you wish to “see” your avatar outside of the Unity IDE environment, then from the File menu, select Build Settings:
Here you will need to press [Add Open Scenes] if your scene is not in the list initially. Then select [PC, Mac & Linux Standalone] and choose the Target Platform you wish to output for. You can then press [Build] and it will output (in this case for Mac) a .app application that you can run by double-clicking. It will start up the Unity player, your avatar will initiate, and you can start talking and communicating as much or as little as you like!
If you select [Player Settings…], you will see in the IDE Inspector on the right that there are more specific details you can set about the output of the .app itself: you can change the Splash Image, the amount of RAM allocated to the app, your specific details, etc.
Running the app from a Mac
I made a few minor settings changes that I want to raise here – I’m sure if you are following along, you got this far and thought, “But when I view my UMA human avatar, it isn’t zoomed in on the head. How do I do that?”
First of all, select the “Main Camera” GameObject and look in the Inspector to see where I’ve set the main camera’s [X, Y, Z] values:
Now, click on the “SALSA_UMA2_DCS” GameObject – this is the actual human avatar:
You can see that I have modified the “Position” values. You might ask, “How did I know to set it to these values?” Well, good question!
If you press the [>] Run button in the Unity IDE and you can see the UMA human on the screen, you can directly modify the values in the Inspector and the changes happen in real-time. This way, you can play around with the values of the “Main Camera” and “SALSA_UMA2_DCS” GameObjects and get the view that you want. Be aware, though: write down the values you changed, because once you press the [>] Run button again to stop, those values will revert back to the previous values, and you will then have to modify them manually again.
One last change I made was to replace a default animation value – you may not want to do this, but I found it a bit distracting and I will attempt to write my own animation model in the future. If you do not change this value, your UMA human avatar will move about, rocking its head and body, swinging around a bit like it’s been in the bar for a few hours. I didn’t want this, so I set the animation to ‘none’; that is why my UMA human avatar is fixed, focused looking forward, and only its eyes and mouth move:
As you can see, there are some default UMA animations that you can use.
This is all great, but the ultimate goal is to see it actually running and working!
For that, I’ve captured a couple of videos that you can view below:
(If you’re really interested: yes, that is my custom car: https://www.lilmerc.co.uk/ )
As you can hear/see, it did not always behave as I expected. I need to work on adding more content to my Watson Assistant Intents / Entities and change my Dialog flow to include a reference to the Intents.confidence % level, so that when I am mis-heard saying “no” and it thinks I said “note”, it handles it more gracefully. Now that I have the baseline working, though, I can spend more time refining these areas.
I’m going to give this tutorial a little look too, as I think I might need to do this: https://developer.ibm.com/recipes/tutorials/improve-your-speech-to-text-accuracy-by-using-ibm-watson-speech-to-text-acoustic-model-customization-service/
As you can see above, I’ve spent more time writing this up than it actually took me to make. My goal now will be to enhance things further (when I get some time): looking more into what the SALSA components can do for me; making the lip-sync more realistic; perhaps adding more visual feature movements to the UMA human avatar; having key trigger words that perform certain reactions, such as having the head tilt to one side when listening, or having the UMA digital avatar squint and wrinkle its forehead slightly when responding to questions…
…and then there is the other side: I can look into tapping into the IBM Watson Tone Analyzer service to detect the tone of the user and change the UMA digital avatar’s responses… oh, and then there is the ability to build and deploy to WebGL… and to iOS and Android phones… and then there is the Virtual Reality output from Unity too…
Anyway, there is always scope for doing more; this is genuinely just the start. I hope you find it a useful starting point for your own projects. Good luck!