20240820 Wifi Audio Streaming Client - Jim Rush's Home on the Web

Preface

This update will be a bit longer and even more meandering than other posts. There is a lot of content, but it is not a tutorial. It should still be enough for anybody interested in repeating my results.

Introduction

I like audio in my spaces: music, podcasts, noise/nature sounds, etc. As part of the Toon Tiki Patio build-out, I want audio to be part of the experience. I want to be able to play music, background sounds, and prop audio. The latter means audio tied to an object, with or without animation; for example, a voice to go with a mask.

I’ve been looking through commercial and DIY solutions and was stumped for some time, but I found an approach that appears to work. I’ve built it. The audio is so-so, but more on that later. I still need to finish the software and install it on the patio to figure out which aspects of it work and which don’t.

Requirements

- Reasonably cheap. I suspect I may want to replicate this approach for multiple use cases, so I might end up deploying 5-10 audio streaming devices over time. Since audio quality spans a wide range, ideally I can scale from cheaper, low-end setups up to music quality with higher-end speakers.
- Controllable from the same software framework I’m using for lights and, eventually, animations. I’ve currently settled on my own Node-RED-based framework.
- Streamed audio, which allows for more-or-less dynamic content and doesn’t require deploying SD cards.

Current Solution Summary

I ended up building a solution around an ESP32S3 and MAX98357A amplifiers. The ESP32S3 has two I2S peripherals, so it can generate two independent I2S audio streams. The ESPHome media player supports this out of the box with minimal effort. The media player can play MP3 audio from a URL, and it also exposes volume controls and playback state changes (stop, start, pause). The MAX98357A is a low-end I2S DAC and amplifier that can drive speakers up to ~3W. I suspect this will be underpowered for anything but prop audio. But the entire solution can be powered from 5V, so I can use my existing WLED patio setup in the Mai Tai Machine electrical box.
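To sanity-check the 5V claim, a rough worst-case power budget helps. The sketch below is illustrative only: the 3.2W figure is the MAX98357A’s headline rating into 4Ω at 5V, while the 85% amplifier efficiency, the 0.5A ESP32S3 draw, and the amplifier counts are assumptions, not measurements from this build.

```python
"""Rough worst-case 5V power budget for an ESP32S3 + MAX98357A audio node.

All constants except the 3.2 W amplifier rating are assumptions.
"""

SUPPLY_VOLTS = 5.0
AMP_MAX_OUTPUT_W = 3.2   # MAX98357A headline rating into 4 ohms at 5 V
AMP_EFFICIENCY = 0.85    # assumed class D efficiency
ESP32S3_PEAK_A = 0.5     # assumed worst-case MCU draw with wifi transmitting


def peak_current_amps(num_amps: int,
                      output_w: float = AMP_MAX_OUTPUT_W,
                      efficiency: float = AMP_EFFICIENCY,
                      mcu_amps: float = ESP32S3_PEAK_A,
                      volts: float = SUPPLY_VOLTS) -> float:
    """Supply current if every amplifier runs at full output at once."""
    amp_input_w = num_amps * output_w / efficiency  # power drawn, not delivered
    return amp_input_w / volts + mcu_amps


if __name__ == "__main__":
    for n in (2, 4):  # hypothetical amplifier counts
        print(f"{n} amps: ~{peak_current_amps(n):.1f} A at {SUPPLY_VOLTS:.0f} V")
```

Real program material averages far below full output, so this overstates typical draw, and it ignores whatever the LEDs sharing the same supply pull. Its purpose is only to show whether the supply has headroom if everything peaks at once.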

Circuit

The wiring is fairly simple:

The only complicated part is the SD_Mode resistor.

The MAX98357A is a single-channel I2S amplifier. The SD_MODE pin’s voltage selects which channel to play, or a mix of both. Setting it to 5V (anything above 1.4V) gives you the left channel, but the right channel needs between 0.77V and 1.4V. If the pin is left open, the voltage is set by built-in resistors (1M to VDD and 100K to GND), which land it in the 0.16V to 0.77V range, where the amplifier plays an average of both channels. Below 0.16V, the amplifier shuts down.
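The SD_MODE behavior can be captured in a few lines. This sketch uses the thresholds above and the open-pin resistor values from the post. How the chip behaves exactly at 0.77V or 1.4V is not something to rely on, so the choice of which side each boundary falls on here is arbitrary.

```python
"""MAX98357A SD_MODE channel selection, using the thresholds from the post."""


def sd_mode_channel(v_sd: float) -> str:
    """Map an SD_MODE voltage to the amplifier's output mode."""
    if v_sd < 0.16:
        return "shutdown"
    if v_sd < 0.77:
        return "(L+R)/2"
    if v_sd <= 1.4:
        return "right"
    return "left"


def divider_voltage(vdd: float, r_top: float, r_bottom: float) -> float:
    """Midpoint voltage of a two-resistor divider from vdd to ground."""
    return vdd * r_bottom / (r_top + r_bottom)


if __name__ == "__main__":
    # Pin left open: 1M to VDD and 100K to GND, as described above.
    for vdd in (5.0, 3.3):
        v_open = divider_voltage(vdd, 1_000_000, 100_000)
        print(f"VDD={vdd} V, open pin: {v_open:.2f} V -> {sd_mode_channel(v_open)}")
```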

I wasted too much time figuring out the resistor I needed for my breadboard setup. Then, when I designed the final circuit, I merged this voltage selector with the one for the other stream, but forgot that doing so puts both amplifiers’ SD_MODE resistors in parallel, so I needed a different resistor value to reach the right voltage. To make it even more annoying, when the voltage is close to the edge of a region, the related speaker distorts and picks up static.
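The parallel-resistor trap is easy to reproduce on paper. This hypothetical sketch assumes each amplifier contributes 1M to VDD and 100K to GND on its SD_MODE pin (as described above), a 5V supply, and one external pull-up from the shared node to VDD. It solves for the pull-up that centers the voltage in the right-channel window. Real resistors have tolerances, so the computed values are a starting point, not a prescription.

```python
"""External SD_MODE pull-up for the MAX98357A right-channel window
(0.77-1.4 V) when one or more amplifiers share the same SD_MODE node.

Assumes 1M to VDD and 100K to GND per amplifier, as described in the post.
"""

PULLUP_PER_AMP = 1_000_000
PULLDOWN_PER_AMP = 100_000


def parallel(*resistors: float) -> float:
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


def sd_voltage(vdd: float, r_external: float, num_amps: int) -> float:
    """Node voltage with r_external from the shared node to VDD."""
    r_top = parallel(*([PULLUP_PER_AMP] * num_amps + [r_external]))
    r_bottom = parallel(*([PULLDOWN_PER_AMP] * num_amps))
    return vdd * r_bottom / (r_top + r_bottom)


def external_for_target(vdd: float, v_target: float, num_amps: int) -> float:
    """Solve for the external pull-up that puts the node at v_target."""
    r_bottom = PULLDOWN_PER_AMP / num_amps
    r_top_needed = r_bottom * (vdd / v_target - 1.0)
    r_internal_top = PULLUP_PER_AMP / num_amps
    if r_top_needed >= r_internal_top:
        raise ValueError("target is at or below the open-pin voltage")
    # r_top_needed = r_internal_top || r_ext, solved for r_ext
    return 1.0 / (1.0 / r_top_needed - 1.0 / r_internal_top)


if __name__ == "__main__":
    target = (0.77 + 1.4) / 2  # middle of the right-channel window
    single = external_for_target(5.0, target, 1)
    shared = external_for_target(5.0, target, 2)
    print(f"1 amp:  {single / 1000:.0f}k pull-up")
    print(f"2 amps: {shared / 1000:.0f}k pull-up")
    print(f"1-amp value on a 2-amp node: {sd_voltage(5.0, single, 2):.2f} V")
```

With two amplifiers on one node, every built-in resistance halves, so the required pull-up halves too. Reusing the single-amplifier value on a shared node drags the voltage down toward the bottom of the right-channel window, which is exactly where the distortion and static show up.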

ESPHome and ESP32S3

The base Arduino libraries that come with ESPHome didn’t work for me, and I never worked out why. I ended up using an alternate audio pipeline, gnumpi’s esphome_audio external component (adf_pipeline), which builds on the ESP-IDF framework. I ran into a wide variety of problems getting it to work; some of the more notable:

- The ESP32S3 devboard I was using needs the wifi output_power reduced to 8.5dB for reliable wifi connections. Without that setting, I also seemed to get other inconsistent results beyond wifi.
- After any significant change to the YAML config, do a clean build from the ESPHome Dashboard. As a C programmer of old, this makes sense to me: the config contents are probably not fully tracked as dependencies of the build, so if you make a change without cleaning, you may get libraries built from an earlier setting. Or worse, a mixed build.

I had lots of inconsistent results getting this to work and never figured them all out, but those two issues were probably behind many of the problems. Of course, just working on a breadboard can also cause problems through bumped connections.
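Outside the Dashboard, the same clean-then-build cycle can be scripted with the ESPHome command-line tool, which provides `clean`, `compile`, and `run` subcommands. This is a minimal sketch assuming the `esphome` CLI is installed and on PATH; the helper names are hypothetical.

```python
"""Clean-then-build helper for an ESPHome config (sketch).

Assumes the `esphome` CLI is installed and on PATH.
"""
import subprocess


def clean_build_commands(config: str, upload: bool = False) -> list[list[str]]:
    """Return the ESPHome CLI invocations for a clean build of `config`."""
    # `clean` deletes the generated build files for this config;
    # `compile` rebuilds from scratch, `run` also uploads and shows logs.
    final_step = "run" if upload else "compile"
    return [["esphome", "clean", config], ["esphome", final_step, config]]


def clean_build(config: str, upload: bool = False) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in clean_build_commands(config, upload):
        subprocess.run(cmd, check=True)
```

For example, `clean_build("s3audio1.yaml", upload=True)` (a hypothetical file name matching the device name) would clean, rebuild, flash, and then stream logs; `run` may ask which port to upload to.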

ESPHome YAML file

esphome:
  name: s3audio1
  friendly_name: S3Audio1

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  framework:
# The rest of this section is required to use the gnumpi audio pipeline
    type: esp-idf
    version: recommended
    sdkconfig_options:
      COMPILER_OPTIMIZATION_SIZE: y
      CONFIG_ESP32_S3_BOX_BOARD: "y"
    advanced:
      ignore_efuse_mac_crc: false

psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:
  level: VERBOSE
#  level: VERY_VERBOSE

# Enable Home Assistant API
api:
  encryption:
    key: "****"

ota:
  - platform: esphome
    password: "****"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  output_power: 8.5dB # Needed for these S3 boards or weird stuff happens beyond wifi problems

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "S3Audio1 Fallback Hotspot"
    password: "****"

captive_portal:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: [ adf_pipeline, i2s_audio ]

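# Two I2S buses, one per audio stream. Each defines its word-select (LRCLK)
# and bit-clock (BCLK) pins; the data-out pin is set in adf_pipeline below.
i2s_audio: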
  - id: i2s_out_1
    i2s_lrclk_pin: GPIO38
    i2s_bclk_pin: GPIO39

  - id: i2s_out_2
    i2s_lrclk_pin: GPIO48
    i2s_bclk_pin: GPIO45

# One I2S output stage per bus, each with its own data-out (DOUT) pin
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out_1
    i2s_audio_id: i2s_out_1
    i2s_dout_pin: GPIO18

  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out_2
    i2s_audio_id: i2s_out_2
    i2s_dout_pin: GPIO17

# Two media players, each feeding a separate I2S output
media_player:
  - platform: adf_pipeline
    id: adf_media_player_1
    name: media_player_patio1
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - adf_i2s_out_1

  - platform: adf_pipeline
    id: adf_media_player_2
    name: media_player_patio2
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - adf_i2s_out_2

Home Assistant

Home Assistant can adopt the device, and it shows up as two media players. From the built-in interface you can select an audio source, but the choices are limited. My media collection is also published as a DLNA service on my network; I don’t use it, but it is a built-in feature of my NAS. For testing, I could select the player and browse to some audio. And since this is Home Assistant, I can also do the same with a REST API call:

POST http://myHAServer:8123/api/services/media_player/play_media
Authorization: Bearer <long-lived access token>
Content-Type: application/json

POST body
{
    "entity_id": "media_player.s3audio1_media_player_patio2",
    "media_content_type": "music",
    "media_content_id": "http://static_content_server/static/speaker-test.mp3"
}
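The same call can be made from a script. This is a minimal sketch using only Python’s standard library; the server URL and entity ID come from the example above, while the token is a placeholder for a Home Assistant long-lived access token. Besides `play_media`, it builds calls to the standard `volume_set`, `media_pause`, and `media_stop` services, covering the volume and playback controls mentioned earlier.

```python
"""Home Assistant REST calls for the S3Audio1 media players (sketch).

HA_URL and ENTITY come from the example above; TOKEN is a placeholder
for a long-lived access token created in your Home Assistant profile.
"""
import json
import urllib.request

HA_URL = "http://myHAServer:8123"
TOKEN = "<long-lived access token>"  # placeholder, not a real token
ENTITY = "media_player.s3audio1_media_player_patio2"


def service_request(service: str, data: dict) -> urllib.request.Request:
    """Build a POST to /api/services/media_player/<service>."""
    return urllib.request.Request(
        f"{HA_URL}/api/services/media_player/{service}",
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def play(url: str) -> urllib.request.Request:
    return service_request("play_media", {
        "entity_id": ENTITY,
        "media_content_type": "music",
        "media_content_id": url,
    })


def set_volume(level: float) -> urllib.request.Request:
    if not 0.0 <= level <= 1.0:
        raise ValueError("Home Assistant volume_level ranges from 0.0 to 1.0")
    return service_request("volume_set", {"entity_id": ENTITY, "volume_level": level})


def pause() -> urllib.request.Request:
    return service_request("media_pause", {"entity_id": ENTITY})


def stop() -> urllib.request.Request:
    return service_request("media_stop", {"entity_id": ENTITY})


def send(req: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `send(set_volume(0.4))` followed by `send(play("http://static_content_server/static/speaker-test.mp3"))` would set the volume and start the test clip. Building the requests separately from sending them keeps the payloads easy to inspect before anything touches the network.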

Audio Output

Given the limits of the amplifier, I went with some basic $10/pair speakers. They sound like you would expect for that size and price: very tinny. At this point, I wasn’t convinced the 5V/MAX amplifier approach was going to be viable. But I was surprised how much the audio changed when I held the speakers, which pushed me to look for an enclosure sooner. I ended up finding an OpenSCAD project for small speaker enclosures that fit my needs. I had to tweak the settings a bit for my speakers, but I’m actually happy with how it came out. I also added some spring terminal clips to the back.

I will probably redo these enclosures at a later date. Longer term, I want to hide any speakers on the patio and will design things to contain them. But if I wait until those side quests are done, it will be a while before I find out whether these will even meet my needs.

Once the speakers were in the enclosures, the audio improved significantly. It is still a bit tinny, and at full volume the audio distorts significantly, but they might just be good enough for background ambient sounds. They are good enough for any prop-type solution where the listener will be close by. So even if these don’t work for the patio, they will get reused in other projects.

Preparations for Patio Deployment

Once I had everything working on a breadboard, I transferred the circuit to a PCB. At some point in the near future, I’m going to learn enough KiCad to design my own circuit boards, either to etch myself or to send out to be made. I’m still figuring out the cost factors: these breadboard PCBs are super cheap at $4 each. A custom-made board will be more expensive, but it will be smaller and more reliable, and it will require less soldering.

I had already designed/found a model for connecting those boards to a DIN rail. I had to modify that one to provide a mount for all of the speaker terminals.

This will allow me to add them to the existing DIN rail in the Mai Tai Machine Box and use the same 5V power supply used by the LED solution for Blinky (and other future projects).

Next Steps

First, I want to get the Node-Red software written. To do that, I need a working solution where I’m coding, not outside. Given I’m not sure if this is the final solution for the patio, I don’t want to build another instance just for development. I suspect this will take a couple of weeks. It isn’t hard and will be a lot of cut and paste from the existing codebase, but I tend to get lazy.

Second, I want to mount the speakers to the patio. I’ll put one at each corner of the covered patio. I will also have to figure out how to hide the wires. I’ll use some of the existing PVC conduit, but I’m not convinced I want to continue to use that approach. I might break down and buy premade conduit.

Finally, I want to see how it sounds with different types of audio, in particular ambient sounds such as crickets and rain. I suspect any thunder in the latter won’t be deep enough, but as secondary audio it might be good enough. I also want to see how it works for prop audio. My next patio project might be a Tiki + Penguins of Madagascar + Disney Enchanted Tiki Room mashup. While there is no animation, can audio aimed at a specific speaker (e.g. putting the audio on only the left or right channel) sound good enough to understand a character’s voice? For example, I could break a conversation between the penguins into separate audio streams so that each line plays near the relevant penguin.
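One way to try the single-speaker idea is to hard-pan a mono voice clip into just the left or right channel of a stereo file, so only the amplifier whose SD_MODE selects that channel plays it. This standard-library sketch assumes 16-bit mono WAV input and a hypothetical helper name; the players above are fed MP3 URLs, so the output would still need converting to MP3 (not shown).

```python
"""Hard-pan a 16-bit mono WAV clip into one channel of a stereo WAV (sketch)."""
import wave


def mono_to_single_channel(src: str, dst: str, channel: str) -> None:
    """Write `dst` with the `src` audio in one channel and silence in the other."""
    if channel not in ("left", "right"):
        raise ValueError("channel must be 'left' or 'right'")
    with wave.open(src, "rb") as w_in:
        if w_in.getnchannels() != 1 or w_in.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono input")
        rate = w_in.getframerate()
        mono = w_in.readframes(w_in.getnframes())

    # Stereo PCM interleaves left then right in each 4-byte frame.
    # Start from all zeros (silence) and copy each sample into one slot.
    stereo = bytearray(len(mono) * 2)
    offset = 0 if channel == "left" else 2
    for i in range(0, len(mono), 2):
        start = 2 * i + offset
        stereo[start:start + 2] = mono[i:i + 2]

    with wave.open(dst, "wb") as w_out:
        w_out.setnchannels(2)
        w_out.setsampwidth(2)
        w_out.setframerate(rate)
        w_out.writeframes(bytes(stereo))
```

Whether a hard-panned line actually sounds localized to one speaker on the patio is exactly the open question above; this only produces the test material.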

If this audio approach doesn’t work, my next iteration will be to replace the MAX amplifiers with something a bit stronger. I’ve found a couple that might work, but all require 12V or more as a source. This isn’t a major concern, but does shift the electrical options. If I go down this path, I will probably also switch over to outdoor speakers where somebody has already created a balanced output with some decent sound quality.