<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Running Whisper AI for Real-Time Speech-to-Text on Linux	</title>
	<atom:link href="https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/</link>
	<description>Tecmint - Linux Howtos, Tutorials, Guides, News, Tips and Tricks.</description>
	<lastBuildDate>Tue, 27 Jan 2026 06:13:00 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>
		By: Ravi Saive		</title>
		<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2384000</link>

		<dc:creator><![CDATA[Ravi Saive]]></dc:creator>
		<pubDate>Tue, 27 Jan 2026 06:13:00 +0000</pubDate>
		<guid isPermaLink="false">https://www.tecmint.com/?p=59850#comment-2384000</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2383785&quot;&gt;irmhild&lt;/a&gt;.

@irmhild,

Thanks for the update. You’re actually very close.

By “&lt;strong&gt;quasi live&lt;/strong&gt;,” I meant that &lt;strong&gt;Whisper&lt;/strong&gt; can’t transcribe speech word by word in real time. Instead, you record audio continuously, collect a few seconds of speech, send that chunk to &lt;strong&gt;Whisper&lt;/strong&gt;, show the text, and repeat. 
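
As a rough illustration of that record, collect, transcribe, repeat cycle, here is a minimal Python sketch. The names chunk_stream, run_quasi_live and CHUNK_SECONDS are made up for this example, and fake_transcribe merely stands in for a real call such as model.transcribe(). Audio capture is simulated with small lists of samples, so treat it as an outline rather than a working recorder.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16000      # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 5        # hypothetical chunk length; tune for your setup


def chunk_stream(frames: Iterable[List[float]],
                 chunk_size: int = SAMPLE_RATE * CHUNK_SECONDS) -> Iterator[List[float]]:
    """Collect small capture frames until a full chunk is available."""
    buffer: List[float] = []
    for frame in frames:
        buffer.extend(frame)
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:               # hand over whatever is left when recording stops
        yield buffer


def run_quasi_live(frames: Iterable[List[float]],
                   transcribe: Callable[[List[float]], str],
                   chunk_size: int = SAMPLE_RATE * CHUNK_SECONDS) -> List[str]:
    """Send each chunk to the transcriber, show the text, and repeat."""
    texts = []
    for chunk in chunk_stream(frames, chunk_size):
        text = transcribe(chunk)
        if text:             # skip empty results instead of printing blanks
            print(text)
            texts.append(text)
    return texts


def fake_transcribe(chunk: List[float]) -> str:
    # Placeholder: a real setup would pass the chunk to Whisper here.
    return f"{len(chunk)} samples"
```

In a real script, frames would come from the microphone callback, and the transcribe argument would wrap the Whisper model instead of fake_transcribe.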

Saving to a file was just to confirm that your microphone audio is good, and since that worked, &lt;strong&gt;Whisper&lt;/strong&gt; itself is fine.

The limitations mainly come from &lt;strong&gt;Whisper&lt;/strong&gt;, not &lt;strong&gt;Linux&lt;/strong&gt;. It’s designed for full audio segments, so if the audio chunks are too short or mostly silence, it often returns no text. 

The “&lt;strong&gt;streaming logic&lt;/strong&gt;” is just the code that handles recording and chunking audio before sending it to the model.

If you want something more naturally real-time, you could also try tools like &lt;code&gt;whisper.cpp&lt;/code&gt; (which includes a &lt;code&gt;stream&lt;/code&gt; example for microphone input) or &lt;strong&gt;Vosk&lt;/strong&gt;, which are built more for continuous speech recognition.

So your setup isn’t broken; it’s just the audio buffering part that needs adjustment.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a target="_blank" href="https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2383785">irmhild</a>.</p>
<p>@irmhild,</p>
<p>Thanks for the update. You’re actually very close.</p>
<p>By “<strong>quasi live</strong>,” I meant that <strong>Whisper</strong> can’t transcribe speech word by word in real time. Instead, you record audio continuously, collect a few seconds of speech, send that chunk to <strong>Whisper</strong>, show the text, and repeat. </p>
<p>Saving to a file was just to confirm that your microphone audio is good, and since that worked, <strong>Whisper</strong> itself is fine.</p>
<p>The limitations mainly come from <strong>Whisper</strong>, not <strong>Linux</strong>. It’s designed for full audio segments, so if the audio chunks are too short or mostly silence, it often returns no text. </p>
<p>The “<strong>streaming logic</strong>” is just the code that handles recording and chunking audio before sending it to the model.</p>
<p>If you want something more naturally real-time, you could also try tools like <code>whisper.cpp</code> (which includes a <code>stream</code> example for microphone input) or <strong>Vosk</strong>, which are built more for continuous speech recognition.</p>
<p>So your setup isn’t broken; it’s just the audio buffering part that needs adjustment.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: irmhild		</title>
		<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2383785</link>

		<dc:creator><![CDATA[irmhild]]></dc:creator>
		<pubDate>Mon, 26 Jan 2026 10:31:14 +0000</pubDate>
		<guid isPermaLink="false">https://www.tecmint.com/?p=59850#comment-2383785</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2377306&quot;&gt;Ravi Saive&lt;/a&gt;.

Dear Ravi,

Thank you for your reply. After returning from holiday, I tried again today. As you suggested, I saved a file, but I’m not sure what to do next to transcribe in a quasi “&lt;strong&gt;live&lt;/strong&gt;” mode. 

Could you clarify what you meant? It works in transcription mode, but I’m not sure that’s what you were referring to.

I also have another question. You mention several possible limitations of real-time transcription — are these limitations related to &lt;strong&gt;Whisper&lt;/strong&gt; itself, your script, Python, or &lt;strong&gt;Linux&lt;/strong&gt;? 

Where does the “&lt;strong&gt;streaming logic&lt;/strong&gt;” come from? Do you know of any alternative solutions for real-time transcription that I could try?]]></description>
			<content:encoded><![CDATA[<p>In reply to <a target="_blank" href="https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2377306">Ravi Saive</a>.</p>
<p>Dear Ravi,</p>
<p>Thank you for your reply. After returning from holiday, I tried again today. As you suggested, I saved a file, but I’m not sure what to do next to transcribe in a quasi “<strong>live</strong>” mode. </p>
<p>Could you clarify what you meant? It works in transcription mode, but I’m not sure that’s what you were referring to.</p>
<p>I also have another question. You mention several possible limitations of real-time transcription — are these limitations related to <strong>Whisper</strong> itself, your script, Python, or <strong>Linux</strong>? </p>
<p>Where does the “<strong>streaming logic</strong>” come from? Do you know of any alternative solutions for real-time transcription that I could try?</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Ravi Saive		</title>
		<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2377306</link>

		<dc:creator><![CDATA[Ravi Saive]]></dc:creator>
		<pubDate>Fri, 09 Jan 2026 06:15:05 +0000</pubDate>
		<guid isPermaLink="false">https://www.tecmint.com/?p=59850#comment-2377306</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2375410&quot;&gt;irmhild&lt;/a&gt;.

@irmhild,

Thank you for your kind words, and I’m glad to hear you were able to resolve the initial error and successfully transcribe audio files in German.

Regarding the real-time transcription issue: what you are seeing is a common limitation rather than a configuration mistake. &lt;code&gt;model.transcribe()&lt;/code&gt; is designed for complete audio segments, not for continuous real-time streams. If the incoming audio buffer is too short, contains mostly silence, or is not finalized, Whisper may simply return no text without raising an error.

A few points to check:

Make sure &lt;code&gt;audio_data&lt;/code&gt; actually contains speech and not just silence. Whisper often returns an empty result if the audio energy is too low.

Real-time transcription typically requires buffering audio into longer chunks (e.g., several seconds) before calling &lt;code&gt;transcribe()&lt;/code&gt;. Calling it too frequently on small frames often results in empty output.

Ensure the audio is sampled at 16 kHz (or properly resampled), mono, and normalized to the expected float range.

For real-time use, many implementations use a loop that accumulates audio, applies a voice-activity check, and only then calls &lt;code&gt;transcribe()&lt;/code&gt;.
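
To make the checks above concrete, here is a small Python sketch using only the standard library. The helper names (to_mono_float, rms, has_speech) and the 0.01 threshold are illustrative assumptions, not part of Whisper. The energy test is a crude stand-in for a proper voice-activity detector, and a real script would still convert the samples to a 16 kHz float32 NumPy array before calling transcribe().

```python
import math
from typing import List, Sequence

INT16_SCALE = 32768.0       # int16 PCM spans -32768..32767
SILENCE_RMS = 0.01          # hypothetical threshold; tune for your microphone


def to_mono_float(samples: Sequence[int], channels: int = 1) -> List[float]:
    """Average interleaved int16 channels and scale to the -1.0..1.0 range."""
    mono = []
    for i in range(0, len(samples) - channels + 1, channels):
        frame = samples[i:i + channels]
        mono.append(sum(frame) / (channels * INT16_SCALE))
    return mono


def rms(audio: Sequence[float]) -> float:
    """Root-mean-square energy; 0.0 for an empty buffer."""
    if not audio:
        return 0.0
    return math.sqrt(sum(x * x for x in audio) / len(audio))


def has_speech(audio: Sequence[float], min_seconds: float = 2.0,
               sample_rate: int = 16000, threshold: float = SILENCE_RMS) -> bool:
    """Crude gate: long enough and loud enough to be worth transcribing."""
    long_enough = len(audio) >= int(min_seconds * sample_rate)
    return long_enough and rms(audio) > threshold
```

In the capture loop, only buffers for which has_speech() returns True would be handed to transcribe(); everything else keeps accumulating or is dropped.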

Since file-based transcription works for you in German, language support is not the issue. The problem is almost certainly related to how the live audio is captured, buffered, or passed to the model.

I would recommend testing by saving a few seconds of your “real-time” audio to a file and transcribing that file. If that works, the issue is confirmed to be in the streaming logic rather than Whisper itself.

I hope this helps, and please feel free to share more details about your audio capture setup if you need further assistance.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a target="_blank" href="https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2375410">irmhild</a>.</p>
<p>@irmhild,</p>
<p>Thank you for your kind words, and I’m glad to hear you were able to resolve the initial error and successfully transcribe audio files in German.</p>
<p>Regarding the real-time transcription issue: what you are seeing is a common limitation rather than a configuration mistake. <code>model.transcribe()</code> is designed for complete audio segments, not for continuous real-time streams. If the incoming audio buffer is too short, contains mostly silence, or is not finalized, Whisper may simply return no text without raising an error.</p>
<p>A few points to check:</p>
<p>Make sure <code>audio_data</code> actually contains speech and not just silence. Whisper often returns an empty result if the audio energy is too low.</p>
<p>Real-time transcription typically requires buffering audio into longer chunks (e.g., several seconds) before calling <code>transcribe()</code>. Calling it too frequently on small frames often results in empty output.</p>
<p>Ensure the audio is sampled at 16 kHz (or properly resampled), mono, and normalized to the expected float range.</p>
<p>For real-time use, many implementations use a loop that accumulates audio, applies a voice-activity check, and only then calls <code>transcribe()</code>.</p>
<p>Since file-based transcription works for you in German, language support is not the issue. The problem is almost certainly related to how the live audio is captured, buffered, or passed to the model.</p>
<p>I would recommend testing by saving a few seconds of your “real-time” audio to a file and transcribing that file. If that works, the issue is confirmed to be in the streaming logic rather than Whisper itself.</p>
<p>I hope this helps, and please feel free to share more details about your audio capture setup if you need further assistance.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: irmhild		</title>
		<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2375410</link>

		<dc:creator><![CDATA[irmhild]]></dc:creator>
		<pubDate>Thu, 01 Jan 2026 12:22:57 +0000</pubDate>
		<guid isPermaLink="false">https://www.tecmint.com/?p=59850#comment-2375410</guid>

					<description><![CDATA[Dear Ravi,

First of all, thank you for this great work, and all the best to you in 2026.

My question: I am deaf and would like to use real-time transcription in German. After some initial trouble (specifically the &lt;strong&gt;ValueError: need at least one array to concatenate&lt;/strong&gt;, which I fixed using your suggested check), everything works as expected: &lt;code&gt;whisper --help&lt;/code&gt; works, and I can also get a transcript from an audio file in German.

However, when I try real-time transcription (using, of course, &lt;code&gt;result = model.transcribe(audio_data.flatten(), language=&quot;de&quot;)&lt;/code&gt;), I get no output at all: no text, nothing.

I have tried waiting for some time, but still nothing happens.

Do you have any idea what might be going wrong?

Thank you very much in advance!]]></description>
			<content:encoded><![CDATA[<p>Dear Ravi,</p>
<p>First of all, thank you for this great work, and all the best to you in 2026.</p>
<p>My question: I am deaf and would like to use real-time transcription in German. After some initial trouble (specifically the <strong>ValueError: need at least one array to concatenate</strong>, which I fixed using your suggested check), everything works as expected: <code>whisper --help</code> works, and I can also get a transcript from an audio file in German.</p>
<p>However, when I try real-time transcription (using, of course, <code>result = model.transcribe(audio_data.flatten(), language="de")</code>), I get no output at all: no text, nothing.</p>
<p>I have tried waiting for some time, but still nothing happens.</p>
<p>Do you have any idea what might be going wrong?</p>
<p>Thank you very much in advance!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Ravi Saive		</title>
		<link>https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2363186</link>

		<dc:creator><![CDATA[Ravi Saive]]></dc:creator>
		<pubDate>Thu, 20 Nov 2025 04:28:27 +0000</pubDate>
		<guid isPermaLink="false">https://www.tecmint.com/?p=59850#comment-2363186</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2362976&quot;&gt;Paolo&lt;/a&gt;.

@Paolo,

Thanks for the update! 

Really appreciate you adding all these options. I’ll give them a spin and let you know my thoughts.

Great work!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a target="_blank" href="https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/comment-page-1/#comment-2362976">Paolo</a>.</p>
<p>@Paolo,</p>
<p>Thanks for the update! </p>
<p>Really appreciate you adding all these options. I’ll give them a spin and let you know my thoughts.</p>
<p>Great work!</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
