DotNet TTS Wrapper

A .NET NuGet package that provides a unified API for working with multiple cloud-based and local Text-to-Speech (TTS) services. Ported from js-tts-wrapper.

Repository: https://github.com/AACTools/dotnet-tts-wrapper
NuGet: dotnet add package DotNetTtsWrapper

Supported Engines

Engine	Word Events	Streaming	Offline	Notes
Azure	Real	Yes	No	Azure Speech SDK (WebSocket). Also has REST client.
Google	Real (timepoints)	Yes	No	Google Cloud TTS
ElevenLabs	Real (alignment)	Yes	No	Character-level alignment data
Polly	Estimated	Yes	No	AWS Polly with full Signature V4 auth
OpenAI	Estimated	Yes	No	Configurable model (tts-1 / tts-1-hd)
Cartesia	Estimated	Yes	No	Low-latency TTS
Deepgram	Estimated	Yes	No	Aura models
Watson	Estimated	Yes	No	IBM Watson TTS
SherpaOnnx	Estimated	Yes	Yes	Local VITS/Matcha/Kokoro/Piper/MMS models
SAPI	Estimated	No	N/A	Windows built-in system voices
PlayHT, WitAI, Gemini, Hume, xAI, FishAudio, Mistral, Murf, UnrealSpeech, Resemble, UpliftAI, ModelsLab	Estimated	Yes	No	Additional cloud engines

Word Timing Support

Type	Description
Real	Engine provides actual word boundary timestamps from the API
Estimated	Length-weighted heuristic based on speaking rate (150 WPM default, configurable). Automatically applied as fallback when an engine doesn't provide real timing data.

All engines return WordTimings on TtsSynthesisResult. Engines without native support get estimated timings automatically, with no configuration needed.
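To make the "Estimated" row concrete, here is a minimal, self-contained sketch of a length-weighted heuristic. It is not the library's WordTimingEstimator: the name `EstimateTimings` and the exact weighting (each word receives a share of the total duration proportional to its character count) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of DotNetTtsWrapper.
// Total duration comes from the speaking rate; each word's slice of that
// duration is proportional to its length in characters.
List<(string Text, double Start, double End)> EstimateTimings(
    string text, double wordsPerMinute = 150)
{
    var timings = new List<(string Text, double Start, double End)>();
    var words = text.Split(new[] { ' ', '\t', '\r', '\n' },
                           StringSplitOptions.RemoveEmptyEntries);
    if (words.Length == 0) return timings;

    double totalSeconds = words.Length * 60.0 / wordsPerMinute;
    double totalChars = words.Sum(x => x.Length);

    double cursor = 0;
    foreach (var w in words)
    {
        double duration = totalSeconds * w.Length / totalChars;
        timings.Add((w, cursor, cursor + duration));
        cursor += duration;
    }
    return timings;
}

foreach (var (word, start, end) in EstimateTimings("Hello wonderful world"))
    Console.WriteLine($"{word}: {start:F2}s - {end:F2}s");
```

Longer words receive longer slices, and the last word ends at the total duration implied by the speaking rate.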

Features

Unified API: Single interface for 20+ TTS engines via TtsFactory.CreateClient()
Streaming: IAsyncEnumerable<AudioChunkEventArgs> for real-time audio chunk streaming
Word Timings: Real word boundary events from Azure/Google/ElevenLabs; automatic estimated fallback for all other engines via WordTimingEstimator
SpeechMarkdown: Automatic conversion from SpeechMarkdown to SSML/plaintext per engine
Credential Validation: CheckCredentialsAsync() on every engine; engines with hardcoded voice lists fall back to a synthesis call to verify credentials
Cross-platform: Windows, Linux, macOS (engine-dependent)
Modern .NET: Built for .NET 8.0+ with RollForward=LatestMajor

Installation

dotnet add package DotNetTtsWrapper

Quick Start

using System;
using System.IO;
using DotNetTtsWrapper.Models;
using DotNetTtsWrapper.Engines;

// Create a client (factory handles all engine types)
var creds = new OpenAICredentials { ApiKey = "sk-...", Model = "tts-1-hd" };
var client = TtsFactory.CreateClient("openai", creds);

// List voices
var voices = await client.GetVoicesAsync();
client.SetVoice("alloy");

// Synthesize to bytes (with word timings)
var result = await client.SynthToBytesAsync("Hello world!");
File.WriteAllBytes("output.mp3", result.AudioData);

// Word timings are always available (real or estimated)
foreach (var t in result.WordTimings)
    Console.WriteLine($"{t.Text}: {t.StartTime:F2}s - {t.EndTime:F2}s");

Engine Configuration

Azure

var creds = new AzureCredentials { SubscriptionKey = "key", Region = "eastus" };
var client = TtsFactory.CreateClient("azure", creds);

OpenAI (configurable model)

var creds = new OpenAICredentials { ApiKey = "sk-...", Model = "tts-1-hd" };
// Model defaults to "tts-1"; set it to "tts-1-hd" for higher quality
// OrganizationId optional: creds.OrganizationId = "org-...";

ElevenLabs (configurable model + voice settings)

var creds = new ElevenLabsCredentials {
    ApiKey = "...",
    ModelId = "eleven_multilingual_v2",  // or "eleven_monolingual_v1"
    Stability = 0.5f,
    SimilarityBoost = 0.75f
};
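The engine table lists ElevenLabs word events as derived from character-level alignment data. The sketch below shows one way per-character timings could be grouped into per-word timings. `WordsFromCharacters` and the sample arrays are hypothetical, and the library's own conversion may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of DotNetTtsWrapper.
// Groups per-character timings into words, splitting on whitespace characters.
List<(string Text, double Start, double End)> WordsFromCharacters(
    IReadOnlyList<string> chars, IReadOnlyList<double> starts, IReadOnlyList<double> ends)
{
    var words = new List<(string Text, double Start, double End)>();
    var current = new StringBuilder();
    double wordStart = 0, wordEnd = 0;

    for (int i = 0; i < chars.Count; i++)
    {
        if (string.IsNullOrWhiteSpace(chars[i]))
        {
            // Whitespace closes the word in progress, if there is one.
            if (current.Length > 0) words.Add((current.ToString(), wordStart, wordEnd));
            current.Clear();
            continue;
        }
        if (current.Length == 0) wordStart = starts[i]; // first character of a new word
        current.Append(chars[i]);
        wordEnd = ends[i];
    }
    if (current.Length > 0) words.Add((current.ToString(), wordStart, wordEnd));
    return words;
}

// Made-up sample data for illustration only.
var sampleChars  = new[] { "H", "i", " ", "y", "o", "u" };
var sampleStarts = new[] { 0.00, 0.10, 0.20, 0.25, 0.35, 0.45 };
var sampleEnds   = new[] { 0.10, 0.20, 0.25, 0.35, 0.45, 0.60 };
foreach (var (text, start, end) in WordsFromCharacters(sampleChars, sampleStarts, sampleEnds))
    Console.WriteLine($"{text}: {start:F2}s - {end:F2}s");
```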

Google

var creds = new GoogleCredentials { ApiKey = "AIza..." };
// languageCode is derived from voice name automatically
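
Google Cloud TTS voice names usually begin with a language tag, as in en-US-Wavenet-D. As a rough sketch of how a language code could be derived from such a name, the following takes the first two hyphen-separated segments. `LanguageCodeFromVoice` and its fallback value are hypothetical, and the library's actual logic may differ.

```csharp
using System;

// Hypothetical helper, not part of DotNetTtsWrapper.
// Returns the first two hyphen-separated segments of the voice name,
// or the fallback when the name has no hyphen.
string LanguageCodeFromVoice(string voiceName, string fallback = "en-US")
{
    var parts = voiceName.Split('-');
    return parts.Length >= 2 ? $"{parts[0]}-{parts[1]}" : fallback;
}

Console.WriteLine(LanguageCodeFromVoice("en-US-Wavenet-D")); // prints "en-US"
```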

AWS Polly (full Signature V4 authentication)

var creds = new PollyCredentials {
    AccessKeyId = "AKIA...",
    SecretAccessKey = "...",
    Region = "us-east-1"
};
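
For readers unfamiliar with Signature V4, the core of it is a signing key derived by chaining HMAC-SHA256 over the date, region, and service name. The sketch below shows only that derivation step, using the standard .NET cryptography classes. `DeriveSigningKey` is a hypothetical name, the secret is a placeholder, and this is not the library's implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

byte[] Hmac(byte[] keyBytes, string data)
{
    using var hmac = new HMACSHA256(keyBytes);
    return hmac.ComputeHash(Encoding.UTF8.GetBytes(data));
}

// SigV4 signing-key derivation: each HMAC result becomes the key for the next step.
// dateStamp is the request date as yyyyMMdd (UTC).
byte[] DeriveSigningKey(string secretAccessKey, string dateStamp, string region, string service)
{
    byte[] kDate = Hmac(Encoding.UTF8.GetBytes("AWS4" + secretAccessKey), dateStamp);
    byte[] kRegion = Hmac(kDate, region);
    byte[] kService = Hmac(kRegion, service);
    return Hmac(kService, "aws4_request");
}

// Placeholder secret for illustration; never hardcode real credentials.
byte[] signingKey = DeriveSigningKey("EXAMPLE-SECRET", "20240101", "us-east-1", "polly");
Console.WriteLine(Convert.ToHexString(signingKey).ToLowerInvariant());
```

A full request additionally needs a canonical request and a string to sign, which are then signed with this key.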

SherpaOnnx (local offline TTS)

var creds = new SherpaOnnxCredentials {
    ModelFilePath = "/path/to/model.onnx",       // explicit paths
    TokensFilePath = "/path/to/tokens.txt",
    DataDirPath = "/path/to/espeak-ng-data",
    // OR use ModelPath directory convention:
    // ModelPath = "/path/to/model/directory",
    // ModelId = "vits-piper-en_US-amy-low"
};

Streaming

// `speaker` stands for an audio sink you supply (anything with a Stream-style Write method)
var streamResult = await client.SynthToStreamAsync("Long text to stream...");
await foreach (var chunk in streamResult.AudioStream)
{
    speaker.Write(chunk.AudioData, 0, chunk.AudioData.Length);
}
// streamResult.WordTimings available after completion
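
The consumption pattern above can be tried without any engine or credentials. The sketch below uses a stand-in producer in place of a real TTS stream and a MemoryStream in place of an audio device. `FakeAudioStream` and `DrainAsync` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Stand-in producer: yields fixed-size chunks with a small delay,
// simulating audio arriving over the network.
async IAsyncEnumerable<byte[]> FakeAudioStream(int chunks, int chunkSize)
{
    for (int i = 0; i < chunks; i++)
    {
        await Task.Delay(10);
        yield return new byte[chunkSize];
    }
}

// Consumer: writes each chunk to the sink as soon as it arrives.
// Reading sink.Length assumes a seekable sink such as MemoryStream.
async Task<long> DrainAsync(IAsyncEnumerable<byte[]> source, Stream sink)
{
    await foreach (var chunk in source)
        await sink.WriteAsync(chunk, 0, chunk.Length);
    return sink.Length;
}

using var buffer = new MemoryStream();
long written = await DrainAsync(FakeAudioStream(chunks: 4, chunkSize: 1024), buffer);
Console.WriteLine($"Wrote {written} bytes"); // prints "Wrote 4096 bytes"
```

Because chunks are handled one at a time, playback can begin before synthesis has finished.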

Word Boundary Events

// Real-time events during SpeakAsync
client.WordBoundary += (sender, e) => {
    Console.WriteLine($"Word: {e.Text}, Time: {e.StartTime:F2}s");
};
await client.SpeakAsync("Hello world!");

// Or access from synthesis result
var result = await client.SynthToBytesAsync("Hello world!");
var timings = result.WordTimings; // always populated (real or estimated)

Customizing Estimates

using DotNetTtsWrapper.Utils;

var text = "Hello world, this is a test.";

// Length-weighted estimate at 200 WPM (the default is 150 WPM)
var estimates = WordTimingEstimator.EstimateWordBoundaries(text, wordsPerMinute: 200);

// With known audio duration (scales proportionally)
var scaled = WordTimingEstimator.EstimateWordBoundaries(text, totalDurationSeconds: 5.2);

// Simple flat estimate (300ms per word)
var flat = WordTimingEstimator.EstimateWordBoundariesFlat(text);
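
Proportional scaling to a known duration can be pictured as multiplying every timestamp by one factor so that the last word ends exactly at the audio length. The following is a sketch under that assumption. `ScaleToDuration` is a hypothetical helper, not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of DotNetTtsWrapper.
// Stretches or compresses all timings so the final word ends at targetSeconds.
List<(string Text, double Start, double End)> ScaleToDuration(
    IReadOnlyList<(string Text, double Start, double End)> timings, double targetSeconds)
{
    if (timings.Count == 0 || timings[timings.Count - 1].End <= 0)
        return timings.ToList();

    double factor = targetSeconds / timings[timings.Count - 1].End;
    return timings.Select(t => (t.Text, t.Start * factor, t.End * factor)).ToList();
}

// Made-up rough estimate covering 0.8 s, rescaled to 5.2 s of audio.
var rough = new[] { ("Hello", 0.0, 0.4), ("world", 0.4, 0.8) };
foreach (var (word, start, end) in ScaleToDuration(rough, targetSeconds: 5.2))
    Console.WriteLine($"{word}: {start:F2}s - {end:F2}s");
```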

SpeechMarkdown

The wrapper automatically converts SpeechMarkdown to engine-appropriate format:

// SpeechMarkdown is auto-detected and converted
await client.SpeakAsync("Hello (speed:x-fast)world(/speed)");

Each engine gets the correct platform mapping (Azure → Microsoft Azure, Google → Google Assistant, Polly → Amazon Alexa, etc.).

Requirements

.NET 8.0+ runtime
Windows required for SAPI engine; SherpaOnnx works on all platforms
API keys/credentials for cloud engines

License

See the LICENSE file in the repository for license terms. Ported from js-tts-wrapper with .NET-specific enhancements.

Related Projects

VoiceGarden-SAPI — SAPI5 adapter using this library
js-tts-wrapper — JavaScript/TypeScript version
