dev landscape for MS speech
Sun, 26 Nov 2006 15:45:28 GMT

the following is a list of what's out there, or will be shortly:

Speech Server 2004 - can be used to develop SALT-based telephony and multimodal applications. one of the later changes was the addition of language packs to support more languages. it's being replaced by Speech Server 2007.

Speech Server 2007 - it offers a workflow development model which is much easier than working with SALT or VoiceXML. this workflow model is excellent! it continues to support SALT for telephony apps and adds basic VoiceXML support. it also adds support for VoIP, but stops supporting multimodal apps. it will also have better language support. another thing is that it will be a part of Office Communications Server 2007. part of this effort is that speech is getting baked into Exchange. not sure if i like that yet ... it sounds a little too IT'ish, and less developer-y. last i heard, we won't see this until mid-2007.

SAPI 5.1 - is a COM library on XP for doing speech reco and synthesis. it can be wrapped with an RCW (runtime callable wrapper) to be called by .NET apps. it only supports about 3 different languages.

SAPI 5.3 - is a COM library on Vista for doing speech reco and synthesis. its main changes are added SRGS and SSML support, improved recognition accuracy, and more human-sounding speech synthesis. with Vista, it supports something like 8 different languages already. .NET developers can still wrap this library directly with an RCW, or they can just use System.Speech.
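to make the SRGS and SSML mention a bit more concrete, here is a minimal sketch (in python, standard library only) that builds a tiny SRGS grammar and a tiny SSML prompt as XML strings. the helper names build_srgs_grammar and build_ssml_prompt are made up for illustration and are not part of SAPI or System.Speech; the element names and namespaces come from the W3C SRGS 1.0 and SSML 1.0 specs. it only produces the markup, it doesn't talk to a speech engine.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the W3C SRGS 1.0 and SSML 1.0 specifications.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_srgs_grammar(phrases, rule_id="command", lang="en-US"):
    """Return an SRGS XML grammar that accepts exactly one of `phrases`.

    Hypothetical helper for illustration; not an API of any speech library.
    """
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "root": rule_id,  # the rule the recognizer starts from
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")  # alternatives: any single item matches
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def build_ssml_prompt(text, lang="en-US"):
    """Return a minimal SSML document that speaks `text` (also hypothetical)."""
    speak = ET.Element("speak", {
        "xmlns": SSML_NS,
        "version": "1.0",
        "xml:lang": lang,
    })
    speak.text = text  # ElementTree escapes <, > and & when serializing
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs_grammar(["open file", "close file"]))
    print(build_ssml_prompt("file opened"))
```

the point of both formats is that grammars and prompts become plain XML documents rather than engine-specific calls, so in principle the same files can be shared between engines that understand the standards.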

System.Speech - is a managed interface for speech recognition and synthesis. this allows developers to add rich speech integration into their applications. it is part of .NET 3.0, so it runs on XP and Vista. on Vista, its underlying speech tech is SAPI 5.3, and on XP it uses SAPI 5.1. it also has hooks into Vista's Speech UI to provide visual feedback to speech users. it's worth noting that System.Speech does not 'allow partially trusted callers', so it cannot be called from partially trusted code such as an internet sandbox.

Vista Speech UI - Vista speech recognition is baked directly into the shell. the integration is so well done that it allows you to control apps using just your speech, even apps (and web pages) that were never meant to be speech controlled. of course, with some design changes and accessibility hooks, you can make your app even easier to control with speech.

Voice Command 1.6 - is an end-user app for adding speech reco and synth to Pocket PCs, and now Smartphones. this latest release has also added Bluetooth support. there is no developer API ... but i keep asking, so maybe someday.

MS Agents - you can still do these, but consider them dead. the SDK hasn't been updated for years. by years, i mean forever.

... and now i'm holding my breath that some version of SAPI will make its way over to XNA on the Xbox 360. the idea is that you can control your squad or co-op NPC using your voice. e.g. i recently played the dogfighting game Blazing Angels. instead of having to press a directional key to tell the other pilots in my squad to attack or defend, it would have been more natural and immersive to just say 'attack' or 'defend'. the NPCs are already providing me with spoken prompts, so why can't i talk back?
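the game side of that idea is pretty small. here is a rough python sketch of it, assuming some recognizer hands back the recognized text plus a confidence score between 0 and 1. everything in it (Squad, COMMANDS, handle_utterance, the 0.6 threshold) is a made-up stand-in for illustration, not an XNA or SAPI API.

```python
from dataclasses import dataclass, field


@dataclass
class Squad:
    """Stand-in for the game's squad AI; it just records the orders it receives."""
    orders: list = field(default_factory=list)

    def attack(self):
        self.orders.append("attack")

    def defend(self):
        self.orders.append("defend")


# Spoken phrase -> name of the Squad method to call (hypothetical command set).
COMMANDS = {"attack": "attack", "defend": "defend"}


def handle_utterance(squad, text, confidence, min_confidence=0.6):
    """Route one recognition result to the squad.

    Returns True if an order was issued, False if the result was ignored
    (unknown phrase, or confidence below the assumed threshold).
    """
    phrase = " ".join(text.lower().split())  # normalize case and whitespace
    if confidence < min_confidence or phrase not in COMMANDS:
        return False
    getattr(squad, COMMANDS[phrase])()
    return True


if __name__ == "__main__":
    squad = Squad()
    handle_utterance(squad, "Attack", 0.92)
    handle_utterance(squad, "defend", 0.31)  # too uncertain, ignored
    print(squad.orders)
```

the confidence check matters more in a game than in a dictation app: a missed command is just annoying, but a misheard one sends your wingmen the wrong way, so dropping low-confidence results seems like the safer default.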