Smart Speakers

KevinT (quoted):

@ejlane Oh, now I get your question. Text-to-speech is built into the Android OS on the phone: there is a synthesis engine that runs on the phone and generates the speech, although it likely has networked features too, for example loading different voices.

Project Alice looks quite interesting. I'll definitely be digging deeper. I see it runs on Raspberry Pis; maybe it will run on my Ubuntu server too.

ejlane (#5):

@KevinT Yes, that was my question, thanks. I didn't realize that Android had that built in - I thought the voices you hear were generated in the cloud and the WAV or MP3 was then sent to the phone (like when using GPS to get around or whatever). Or are these the same voices you hear when doing that? Are they more basic?

Anyway, thanks for the answer - that's interesting.

Yeah, Alice must be able to run on pretty much any server, but I don't know how much manual effort would be required to alter the settings. Maybe they even have an easy-to-use config change that sets up everything?

What I like the most is that the 'satellites' don't need much power, because all they really need to do is wake word detection and then piping audio back and forth. You just need one machine with enough grunt to do all the speech-to-text and back. With our usage patterns, that one wouldn't even be worked all that hard for the most part.

Actually, I'm really looking forward to this, because I have a 3D-printed Death Star to hold all the electronics, and an addressable 60-LED light ring to go around the outside. So with the ring it will also function as a clock and timer/stopwatch.

And I'm also adding a piezo mic to act as an impact sensor, so you can use them as targets for Nerf darts. I got that idea from someone who made an alarm clock that you turn off by shooting it. As soon as I saw that I knew I had to add it to my speaker.

Oh, and the depression for the laser beam on the Death Star is where the speaker mounts. I have all of that 3D modeled and printed; I just need to do all the electronics and programming... :) I still need to figure out a way to make some simulated lasers and light them up. That would also work as a bit of a speaker grille to protect the speaker. I haven't spent much time on that part, but I think it would be a nice addition, and I need to look into it...

ejlane (#6):

@CrankyCoder pointed me to another candidate for the speech-to-text part, Rhasspy, in a post in another thread.

Funnily enough, reading through the documentation, it states that it interoperates with Snips and provides a link. Even though going straight to the Snips website gives you nothing but a landing page, their docs link leads to all kinds of info. So it's still there, just hidden from a casual search.

rejoe2 (#7):

How about Rhasspy?
It's kind of a successor to Snips and uses the same (Hermes) protocol over MQTT.

For just some "regular" commands to any home automation system, anything from a Pi 3B+ upwards should have enough power for that.

In my setup (an HP T620 as a headless x86 server) it runs beside FHEM and deconz on my central machine and receives audio input from a mobile phone app (ESP32 satellites are possible as well), allowing full control of all my blinds, lights, ...
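Since it speaks Hermes over MQTT, any MQTT client can react to the recognized intents. A minimal sketch with paho-mqtt (the broker address, port and what you do with the result are just assumptions for illustration, not part of my setup):

```python
# Minimal sketch: listen for Rhasspy/Hermes intents on the MQTT broker.
# Assumes the paho-mqtt package and a broker on localhost:1883; what you do
# with the intent (forward it to FHEM, Home Assistant, ...) is up to you.
import json
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    # Hermes publishes recognized intents under hermes/intent/<intentName>
    client.subscribe("hermes/intent/#")

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    intent = payload["intent"]["intentName"]
    slots = {s["slotName"]: s["value"]["value"] for s in payload.get("slots", [])}
    print(f"Recognized intent {intent} with slots {slots}")

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```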

Controller: FHEM; MySensors: 2.3.1, RS485, nRF24, RFM69, serial gateways

CrankyCoder (#8):

I have been playing with Rhasspy on and off since the Snips debacle. I recently got a little more into it and have a neat little setup so far.

I have an instance of Rhasspy running in a Docker container on a Pi 3. That container will soon be moved to something more powerful, but for now it's working quite well. That Pi just has a set of simple speakers and an old PlayStation Eye webcam (for the microphone).

Then I have a small Pi Zero W with a ReSpeaker 2-Mic HAT and a tiny, maybe 2-inch, "mini loudspeaker". That Pi obviously has little power, but I have all the heavy lifting done by the other Pi. So the Zero is doing the wake word detection, audio recording, speech-to-text (for the moment) and audio playback.

The other Pi is doing the same things, but also intent analysis and intent handling. Both Pis share an MQTT broker on my network, so the Pi 3 is actually processing intents and handling them for itself AND my little satellite module.

My text-to-speech currently uses a MaryTTS Docker container running on yet another machine, which handles TTS for both Pis.
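If anyone wants to script against it: MaryTTS exposes a plain HTTP interface, so fetching a WAV file is roughly this (host, port and voice name are assumptions for your own install):

```python
# Sketch: request a WAV file from a MaryTTS server over HTTP.
# Assumes the server is reachable on localhost:59125 (MaryTTS' default port)
# and that the "cmu-slt-hsmm" voice is installed; adjust both for your setup.
import requests

params = {
    "INPUT_TEXT": "The living room lights are now on.",
    "INPUT_TYPE": "TEXT",
    "OUTPUT_TYPE": "AUDIO",
    "AUDIO": "WAVE_FILE",
    "LOCALE": "en_US",
    "VOICE": "cmu-slt-hsmm",
}
resp = requests.get("http://localhost:59125/process", params=params)
resp.raise_for_status()

with open("tts.wav", "wb") as f:
    f.write(resp.content)
```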

For anyone that knows me and my setup: I am a big Kubernetes guy, so things like MaryTTS, intent handling and intent analysis are all being moved into my cluster. Then all my Pis can basically share the same "brain", as it were.

My whole system for this, just like my home automation, is designed to run locally, which is why I am not using Google's speech-to-text or Amazon Polly or something like that. I know those sound more natural, and realistically, if I need to switch later it's easy.

Rhasspy has DEFINITELY made it nice to swap pieces in and out and made it very modular.

Home Automation Tinkerer
www.CrankyCoder.net

Controller: HomeAssistant in Kubernetes
Gateway: MQTTClientGateway
MySensors: 2.3

ejlane (#9):

@CrankyCoder Thank you so much for the detailed explanation of your setup! That sounds very much like how I want my final result to be. I still have no idea how I missed Rhasspy as a decent solution when Snips got bought out. It was just never on my radar, which in hindsight is ridiculous. I have no explanation for the huge oversight.

However, now I've been looking into it more, thanks to you, and it's looking great! I also like that there's been some collaboration between Rhasspy and the guy behind Project Alice, so it looks like there's no bad blood there. I was looking at the LED control code, and to my surprise I found out that the Raspberry Pi can even control WS2812 LEDs directly! I thought the timing constraints were too much for it, but I see that people have made clever use of some of its hardware to handle it, even with a non-realtime OS.

So it looks like I'll be able to do without a supporting MCU. I guess I'm running out of excuses and need to get started soon.
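For what it's worth, the usual approach is the rpi_ws281x library, which feeds the strip from the Pi's DMA-driven PWM/SPI hardware so the bit timing doesn't depend on the Linux scheduler. A rough sketch of the clock idea on a 60-LED ring (GPIO pin, brightness and colors are just placeholder choices):

```python
# Rough sketch: show the current second as a single lit pixel on a 60-LED ring.
# Assumes the rpi-ws281x Python bindings, data on GPIO18 (PWM-capable) and
# 60 LEDs; the library needs root access for DMA.
import time
from rpi_ws281x import PixelStrip, Color

LED_COUNT = 60   # number of LEDs in the ring
LED_PIN = 18     # GPIO18 carries the PWM signal the library uses

strip = PixelStrip(LED_COUNT, LED_PIN, brightness=64)
strip.begin()

while True:
    second = time.localtime().tm_sec
    for i in range(LED_COUNT):
        strip.setPixelColor(i, Color(0, 0, 32) if i == second else Color(0, 0, 0))
    strip.show()
    time.sleep(0.1)
```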

I was comparing the TTS modules, and although the online ones do sound better, as you say, I also want to keep everything local for privacy reasons. I did like the sound of MaryTTS, but when I was listening to samples online, PicoTTS sounded a bit better to my ear.

I haven't figured out my final configuration for sure, but it could end up being a Docker instance on an old repurposed desktop that has spare capacity, or maybe a dedicated Raspberry Pi 4. I think either one would be enough, because it would be relatively rare for more than one satellite to be interacted with at the same time.

Do you like Kubernetes much better than Docker for any particular reasons? I guess this is mostly curiosity on my part - I currently have 5-10 Docker containers running on a couple of computers and they're mostly hands-off as far as maintenance goes, so I would be hesitant to put in the time to start switching at this point.

CrankyCoder (#10):

If you want to play with some WS2812 stuff, I HIGHLY recommend looking at the WLED project from Aircoookie. It uses a Wemos D1 Mini: flash the firmware and off you go. It integrates into EVERYTHING, has tons of built-in lighting effects and some crazy extras like support for the E1.31 protocol, so you can add your LED strips to Christmas light shows using stuff like xLights, Vixen 2 and Falcon Pi Player.
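For hooking it into home automation, WLED also has a JSON API over HTTP; setting a strip's state looks roughly like this (the IP address and values are placeholders for your own device):

```python
# Sketch: turn a WLED strip on, set brightness and a solid color via its
# JSON API. 192.168.1.50 is a placeholder for the WLED device's address.
import requests

WLED_HOST = "http://192.168.1.50"

state = {
    "on": True,
    "bri": 128,                         # 0-255 master brightness
    "seg": [{"col": [[255, 160, 0]]}],  # first segment: warm orange
}
resp = requests.post(f"{WLED_HOST}/json/state", json=state, timeout=5)
resp.raise_for_status()
print(resp.json())
```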

I haven't really tested Pico, but I would guess there is a way to use it as a drop-in just like I have with my current MaryTTS.

As for Kubernetes, I use it for two main reasons:

1. I currently have a 7-node cluster, so if a container dies or a node needs to be patched or something, Kubernetes just gets it back up and running somewhere else quickly.
2. I am CKA certified and do a lot of Kubernetes for work, so it eventually bled over into my hobby. It's probably WAY overkill, but I even run my home automation software in it.

Home Automation Tinkerer
www.CrankyCoder.net

Controller: HomeAssistant in Kubernetes
Gateway: MQTTClientGateway
MySensors: 2.3

KevinT (#11):

@ejlane Your Death Star speaker sounds pretty impressive! LEDs, timer/stopwatch, impact sensor, and of course speaker & microphone - she'll be loaded. You'll have to share a few pictures. How big will it be? Which Pi fits inside it?

ejlane (#12):

@CrankyCoder As for the LED project, I'll look into that, thanks! I don't have any plans to go crazy enough to need the E1.31 stuff, but I guess support for features you don't use doesn't hurt... :)

Yeah, it looked like there was just a selection box and they were both choices. I haven't gone any deeper than that, or installed it on my own hardware yet. The day job and family stuff are keeping me too busy to spend much time on it other than just dreaming/planning.

Some of what you said about Kubernetes might have gone over my head. It makes sense that you would use what you're good at. But by saying you have a 7-node cluster, do you mean 7 physical machines that actively share the load? If so, that's pretty cool, but far more than what I'm needing any time soon. (I think. Unless I get really deep into some big project, but there's nothing on the horizon right now.)


ejlane (#13):

@KevinT It's sized around whatever diameter a 60-LED ring is; I bought them off AliExpress and sized it to that. A regular Pi could probably fit in there, though I'm not 100% sure about that. I planned for them to be Zeros. The only thing is, it's been a couple of years since I started thinking about it, and I just haven't gotten around to actually doing much of it yet. So it might still be a while before I get anything finished.

Though just today I got a marketing email about an ESP32-S3 product that is aimed at machine learning and might very well be able to handle all the needs of a satellite. It would still need software support, so it's not ready to go, but it would mean an even lower cost and power budget for the satellites.

https://www.hackster.io/news/espressif-launches-esp32-s3-box-an-all-in-one-esp32-s3-dev-system-for-tinyml-edge-ai-work-89421f602b2d

So I did a bit of searching, and others are also considering it: https://community.rhasspy.org/t/best-esp32-based-hardware-for-satellite/3012

It looks like the chips should have plenty of power for wake word detection, but it would have to be coded. So far, if I understood it correctly, they are just streaming everything to the Rhasspy server full-time. Not that it would be a ton of bandwidth, but I'd rather not be broadcasting all the audio from every room of my house 24/7. I think it's just that the very inelegant design hurts my engineer's brain... :)


CrankyCoder (#14):

@ejlane Correct, 7 nodes = 7 machines. BUT the 7 current machines are Raspberry Pi 4 (8 GB RAM) boards, and that's half of my goal. Eventually it will be a full 14-node cluster.

I had been doing something similar with the Pi satellites. I found that if you tell it to use a UDP broadcast to localhost for the wake word/recording, then it doesn't send the audio frames to the MQTT broker.
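To sketch the pattern (this isn't Rhasspy's actual configuration, just the idea: the satellite keeps the raw audio on the loopback interface and only publishes to MQTT once a wake word has been detected):

```python
# Concept sketch: stream raw microphone frames over UDP to a local port where
# a wake word listener would sit, instead of publishing them to the MQTT broker.
# Assumes the pyaudio package; the port and audio format are arbitrary choices.
import socket
import pyaudio

UDP_ADDR = ("127.0.0.1", 12202)   # loopback only - frames never leave the Pi
CHUNK = 1024                      # samples per UDP packet

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=CHUNK)

try:
    while True:
        sock.sendto(stream.read(CHUNK, exception_on_overflow=False), UDP_ADDR)
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()
```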

Home Automation Tinkerer
www.CrankyCoder.net

Controller: HomeAssistant in Kubernetes
Gateway: MQTTClientGateway
MySensors: 2.3
