Continuously record and recognize audio with pocketsphinx/ffmpeg


As the title says, I want to continuously record raw audio from my microphone.
The idea is to run a simple C program in the background as a service that creates chunks of audio and passes those files to the Sphinx speech recognizer.

After that I can do some processing with the recognized words.

The problem is the (continuous) recognition. I can't just record 10-second audio chunks of what I said, because chunk[33] and chunk[34] might belong together, and then Sphinx would output something like:

recognized chunk[33] -> ["enable light"]
recognized chunk[34] -> ["5 with 50 percent"]
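
To make the boundary problem concrete, here is a minimal sketch of the arithmetic behind it. It assumes 16 kHz mono 16-bit audio and 10-second chunks (the sample rate is my assumption, it is not fixed anywhere above), and the helper names `chunk_index` and `is_split` are made up for illustration only:

```c
#include <assert.h>

/* Assumed recording format: 16 kHz, mono, 16-bit samples. */
#define SAMPLE_RATE   16000L
#define CHUNK_SECONDS 10L

/* Hypothetical helper: index of the fixed-length chunk that contains
 * the sample recorded at time t_seconds. */
int chunk_index(double t_seconds)
{
    assert(t_seconds >= 0.0);
    long sample = (long)(t_seconds * SAMPLE_RATE);
    return (int)(sample / (SAMPLE_RATE * CHUNK_SECONDS));
}

/* Hypothetical helper: returns 1 if a spoken phrase that starts at
 * start_seconds and ends at end_seconds is cut by a chunk boundary,
 * 0 if it fits inside a single chunk. */
int is_split(double start_seconds, double end_seconds)
{
    assert(start_seconds <= end_seconds);
    return chunk_index(start_seconds) != chunk_index(end_seconds);
}
```

With these assumptions, a command spoken from 338.0 s to 342.5 s starts in chunk[33] and ends in chunk[34], so `is_split(338.0, 342.5)` returns 1 and the recognizer sees two unrelated fragments, which is exactly the situation shown above.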

Another approach would be to record audio continuously, but then I can't process the resulting large audio files with Sphinx.

I'm using the basic <a href="" rel="nofollow noreferrer">example</a> from pocketsphinx:

#include <pocketsphinx.h>

int main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;
    FILE *fh;
    char const *hyp, *uttid;
    int16 buf[512];
    int rv;
    int32 score;

    /* Load the acoustic model, language model, and dictionary. */
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                 "-hmm", MODELDIR "/en-us/en-us",
                 "-lm", MODELDIR "/en-us/en-us.lm.bin",
                 "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
                 NULL);
    if (config == NULL) {
        fprintf(stderr, "Failed to create config object, see log for details\n");
        return -1;
    }

    ps = ps_init(config);
    if (ps == NULL) {
        fprintf(stderr, "Failed to create recognizer, see log for details\n");
        return -1;
    }

    fh = fopen("audiochunk_33.raw", "rb");
    if (fh == NULL) {
        fprintf(stderr, "Unable to open input file audiochunk_33.raw\n");
        return -1;
    }

    rv = ps_start_utt(ps);

    /* Feed the whole file to the decoder, 512 samples at a time. */
    while (!feof(fh)) {
        size_t nsamp;
        nsamp = fread(buf, 2, 512, fh);
        rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    }

    rv = ps_end_utt(ps);
    hyp = ps_get_hyp(ps, &score);
    /* ps_get_hyp() may return NULL when nothing was recognized. */
    printf("Recognized: %s\n", hyp ? hyp : "(nothing)");

    fclose(fh);
    ps_free(ps);
    cmd_ln_free_r(config);

    return 0;
}

And <a href="" rel="nofollow noreferrer">here</a> is a basic example using ffmpeg to create a simple audio file/chunk:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define N 44100

int main(void)
{
    // Create audio buffer
    int16_t buf[N] = {0}; // buffer
    int n;                // buffer index
    double Fs = 44100.0;  // sampling frequency

    // Generate 1 second of audio data - it's just a 1 kHz sine wave
    for (n = 0; n < N; ++n)
        buf[n] = 16383.0 * sin(n * 1000.0 * 2.0 * M_PI / Fs);

    // Pipe the audio data to ffmpeg, which writes it to a wav file
    FILE *pipeout;
    pipeout = popen("ffmpeg -y -f s16le -ar 44100 -ac 1 -i - beep.wav", "w");
    if (pipeout == NULL) {
        perror("popen");
        return 1;
    }
    fwrite(buf, 2, N, pipeout);

    // Close the pipe so ffmpeg sees end of input and finishes the file
    pclose(pipeout);

    return 0;
}