TTS test

Whisper
/asr (Automatic Speech Recognition)
/detect-language

Curl example:


curl -X POST -H "content-type: multipart/form-data" -F "audio_file=@v21.mp3" "http://static.132.20.13.49.clients.your-server.de:9000/asr?output=json"
Answer:
{"text": "Test text creation, analysis through three models", "segments": [{"id": 0, "seek": 0, "start": 0.0, "end": 4.0, "text": "Test text creation, analysis through three models", "tokens": [50364, 39352, 8016, 11, 5215, 807, 1045, 5245, 13, 50564], "temperature": 0.0, "avg_logprob": -0.5402440591291948, "compression_ratio": 0.9019607843137255, "no_speech_prob": 0.03386480361223221}], "language": "en"}

  • transcribe: (default) task, transcribes the uploaded file.
  • translate: provides an English transcript no matter which language was spoken.
  • Files are automatically converted with FFmpeg (see FFmpeg's full list of supported audio and video formats).
  • Word-level timestamps in the output can be enabled with the word_timestamps parameter.
  • Voice activity detection (VAD), which filters out parts of the audio without speech, can be enabled with the vad_filter parameter (only with Faster Whisper for now).

Request URL Query Params

    Name             Values                               Description
    audio_file       File                                 Audio or video file to transcribe
    output           text (default), json, vtt, srt, tsv  Output format
    task             transcribe, translate                Task type - transcribe in source language or translate to English
    language         en (default is auto recognition)     Source language code (see supported languages)
    word_timestamps  false (default)                      Enable word-level timestamps (Faster Whisper only)
    vad_filter       false (default)                      Enable voice activity detection filtering (Faster Whisper only)
    encode           true (default)                       Encode audio through FFmpeg before processing
    diarize          false (default)                      Enable speaker diarization (WhisperX only)
    min_speakers     null (default)                       Minimum number of speakers for diarization (WhisperX only)
    max_speakers     null (default)                       Maximum number of speakers for diarization (WhisperX only)
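
As an illustration of how the query parameters above combine into a request URL, here is a small sketch using only the Python standard library. The helper `build_asr_url` and the base URL `http://localhost:9000` are assumptions made for the example; substitute the address of your own deployment. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the host and port of your own deployment.
BASE_URL = "http://localhost:9000"


def build_asr_url(base_url: str = BASE_URL, **params) -> str:
    """Return the /asr URL with the given query parameters appended."""
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # leave the parameter out so the server default applies
        if isinstance(value, bool):
            # the table lists booleans in lowercase (true / false)
            value = "true" if value else "false"
        query[name] = value
    url = f"{base_url}/asr"
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    print(build_asr_url(output="json", task="translate", word_timestamps=True))
```

The resulting URL can then be used as in the curl example, with the audio sent as the multipart form field audio_file. Parameters marked as Faster Whisper only or WhisperX only presumably take effect only when the service runs that engine.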