Quest for Zoom Audio Perfection

Warning: This post is not for the faint of heart.

After Zoom released a recent update that finally enabled high-fidelity audio, I went on a quest to up my audio game in Zoom. But I was having trouble achieving the promised “high fidelity” despite having checked all the right boxes and selected the proper settings. So what follows is a description of my current setup, which is finally producing great quality audio, in stereo (provided the listener’s Zoom client supports it).

The Gear

  • 2017 MacBook Pro
  • Focusrite Scarlett 2i2 (2nd gen)…thanks Neil!
  • A Røde NT1 Condenser Mic (any mic will do)
  • Yamaha Piaggero NP-12 digital keyboard (any MIDI keyboard will do)

The Software

  • Zoom (running the latest version)
  • Soundflower
  • GarageBand (but any DAW that allows monitoring of multiple tracks simultaneously would work)

The Setup

With all of the gear plugged in and connected to the computer, here is the process of getting the software installed and configured.

Soundflower is a system audio extension that allows audio to be routed internally from one application to another. The installation process is pretty straightforward, but it won’t install an app on your computer. Instead, after installation, you’ll see two new sound options in the sound menu.

We don’t need to do anything else with this for now.

Next, you’ll need to head to your Audio MIDI Setup. This is a system application located in the Utilities folder in Applications.

There, we’re going to create two new virtual devices. Click the + icon in the lower left and choose “Create Aggregate Device.” This virtual device allows macOS (and Core Audio) to recognize the inputs from each selected device as one virtual interface. Each device’s inputs will be accessible simultaneously. We’ll use this ability in GarageBand later.

Using the checkboxes in the “Use” column, select “Soundflower (2ch)” first, then select whatever input you use for your microphone. In my case, it’s my Scarlett 2i2. The order in which you select the inputs is important, as it will affect the order of the inputs in Garageband.

The last thing we need to create is a “Multi-Output Device.” Create this the same way: click the + icon in the lower left and choose “Create Multi-Output Device.” This virtual device allows audio to be delivered to multiple outputs simultaneously. We need this so that we can monitor the audio from GarageBand. Without it, we wouldn’t be able to hear anything being played through Zoom. Here, the order doesn’t matter, but we want to select “Soundflower (64ch)” and the output you use to listen (which in my case is the Scarlett 2i2).

NOTE: There are two Soundflower devices, one with (2ch) and the other with (64ch). For the sake of your own sanity, be sure to use (2ch) in the Aggregate Device and (64ch) for the Multi-Output Device. If you do the opposite, GarageBand will present you with over 60 different inputs to choose from, making things much more complicated.

That’s it for the Audio/MIDI Setup.

Next, we turn our attention to Garageband. Start a new project, choose an empty project, then click Choose. Garageband will immediately ask you to choose a track type. Ignore this for now and click on the GarageBand menu and choose “Preferences.”

Click on the Audio/MIDI tab at the top of the Preferences window. For the Output Device, select “Multi-Output Device,” and for the Input Device, select “Aggregate Device.” Then close the Preferences window by clicking the red close button in the upper left corner.

Now, back to “Choose a track type.” We will ultimately be creating three different tracks. For this track, choose an audio track, the one with the microphone logo. For the input, choose “Input 1 + 2.” This track will carry all of your computer audio from Soundflower. So verify that your setup screen looks like this and click create.

Before you do anything else, go ahead and rename this track “Computer Audio” so you’ll remember which one is which later. You can do that by double-clicking on the default name “Audio 1” and then typing in a new name.

To create the second track, go to “Track” in the menu and select “New Track…”

We’ll again choose an audio track with the microphone, but this time, we want Input 3. This is the microphone input on my Scarlett 2i2. Remember, Garageband is seeing the Aggregate Device we set up earlier and recognizes all of the inputs from both Soundflower and the Scarlett 2i2. Make sure your screen looks like this and click create.

Rename this track “Microphone.”
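If the input numbering seems arbitrary, it isn’t: GarageBand numbers the Aggregate Device’s channels in the order the devices were checked in Audio MIDI Setup. Here is a minimal sketch of that arithmetic; the `map_inputs` function and the channel counts are illustrative assumptions of mine, not anything GarageBand actually exposes.

```python
# Sketch: how an Aggregate Device's channel order becomes GarageBand
# input numbers. Devices are listed in the order they were checked
# in Audio MIDI Setup; each device contributes its channels in turn.

def map_inputs(devices):
    """Given (name, channel_count) pairs, return {input_number: source}."""
    mapping = {}
    next_input = 1
    for name, channels in devices:
        for ch in range(1, channels + 1):
            mapping[next_input] = f"{name} channel {ch}"
            next_input += 1
    return mapping

# Soundflower (2ch) was checked first, then the two-input Scarlett 2i2.
aggregate = [("Soundflower (2ch)", 2), ("Scarlett 2i2", 2)]
inputs = map_inputs(aggregate)

# Inputs 1-2 carry system audio, so the mic plugged into the
# Scarlett's first channel shows up as Input 3.
print(inputs[3])  # Scarlett 2i2 channel 1
```

This is also why the (2ch)/(64ch) warning above matters: check “Soundflower (64ch)” in the Aggregate Device instead and the same numbering scheme hands GarageBand more than 60 inputs to wade through.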

This last track is optional, but if you want to play a MIDI keyboard and have the audio be heard in Zoom, create one more track. This time, we’ll choose a Software Instrument and click create.

Now that you have all of your tracks created, it’s time to set up how to monitor them in your headphones. First, we need to get your computer audio routed to your “Computer Audio” track. To do this, simply select “Soundflower (2ch)” as your system audio output.

Now, any audio created by your computer will be routed to the “Computer Audio” track in GarageBand. You can test this by playing some music and seeing whether the input levels begin lighting up on that track. To hear the sounds, we need to “Record Enable” all of the tracks. Right-click on any of the tracks and choose “Configure Track Header…”

In the menu that pops up, click the checkbox next to “Record Enable.” This will put a new button on each track that looks like the button you see in this menu.

When this button is pressed, GarageBand will route the sound from that track both to your headphones (so you can monitor the sounds) and to the “Soundflower (64ch)” output. You’ll need to record-enable all of the tracks you wish to use in your Zoom session.

You should now be able to hear both your computer audio and your microphone through your headphones or whatever device you’re using to monitor your computer sounds. For example, I have a pair of headphones plugged into my Scarlett 2i2 that I use for monitoring. Save this Garageband project so that you won’t have to do any of this setup next time.

The last step is to set up Zoom. Luckily, this is the easiest part. In Zoom’s settings, head to the Audio menu. Select “Soundflower (64ch)” as the microphone and your monitoring device as the speaker. For me, that is my Scarlett 2i2.

And that’s it! When you join a Zoom call, your microphone and system audio will come through as one signal. An advantage of this setup is that you no longer need to share your computer sound, since it automatically comes through all the time.
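To see why both sources arrive in Zoom as one signal, it can help to view the whole setup as a little routing graph. This is only a sketch: the node labels are my informal names for the pieces described above, not system identifiers, and `reaches` is just a plain depth-first search.

```python
# The signal path described above, modeled as a tiny directed graph.
# Edges point in the direction audio flows.

ROUTES = {
    "System audio":       ["Soundflower (2ch)"],
    "Microphone":         ["Scarlett 2i2 input"],
    "Soundflower (2ch)":  ["GarageBand (Aggregate Device)"],
    "Scarlett 2i2 input": ["GarageBand (Aggregate Device)"],
    "GarageBand (Aggregate Device)": ["Multi-Output Device"],
    "Multi-Output Device": ["Soundflower (64ch)", "Scarlett 2i2 headphones"],
    "Soundflower (64ch)": ["Zoom mic input"],
}

def reaches(src, dst, routes=ROUTES):
    """Depth-first search: does audio from src eventually arrive at dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(routes.get(node, []))
    return False

# Both sources merge in GarageBand and end up at Zoom's mic input...
assert reaches("System audio", "Zoom mic input")
assert reaches("Microphone", "Zoom mic input")
# ...and the Multi-Output Device also feeds the headphones for monitoring.
assert reaches("Microphone", "Scarlett 2i2 headphones")
```

The Multi-Output Device is the fork in the graph: remove the headphone edge and Zoom still gets audio, but you hear nothing, which is exactly the monitoring problem it exists to solve.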

To take things to the next level audio-wise, there are a couple of hidden options in Zoom. First, click on “Advanced” in the audio settings. There, click the checkbox for “Show in-meeting option to ‘Enable Original Sound’ from microphone.” Then, click the checkbox next to “High fidelity music mode.” You’ll see here that I also have a “Use stereo audio” option that you might not have available. Click here to learn how to enable the option.

In a meeting, when you want your audio to sound as good as possible, click the option in the upper left that says “Turn on Original Sound.” NOTE: This text changes to “Turn off Original Sound” when clicked. That means it is turned on and working.

So there you have it! Great quality audio from your computer during a Zoom call. My students have reported that the music sounds great, and I can confirm that this is as good as it can get for audio over Zoom.


This week has been an exciting one for students, faculty, and staff at the TCU School of Music. We had the pleasure of hosting John Corigliano in residency for a wonderful week of performances, lessons, masterclasses, and discussions. It was an exciting experience getting to know one of the greatest composers of our time.

Especially enjoyable was a performance of his third symphony, Circus Maximus, by the TCU Wind Ensemble. The work is a colossal undertaking, requiring a large stage band, a small marching band in the back of the audience, and a sax quartet and cadre of trumpets throughout the balconies. The piece is certainly a spectacle.

But the spectacle is all but impossible to describe. Sure, I can recount the experience I had as an audience member. Sure, you can listen to this recording on Spotify, or even purchase this 5-channel recording of the piece. Sure, you can look at the score. But being in the space is the only way I was able to fully understand the work. The physical locations of the players throughout the hall are so important and clearly not an afterthought for Corigliano. Their locations are critical to the piece’s success, and while a 5-channel recording does some justice to this, it is hard to truly capture the immersive experience of being in the hall during a performance.

So how does someone, especially this music theorist who takes pride in being able to describe the relationship between music and other domains, explain and/or analyze something like Circus Maximus? I had a similar problem upon attending a performance of Einstein on the Beach. Leaving the hall, I wondered to myself, “how am I ever going to adequately explain something that defies analysis and explanation?” A good reference video recording of Einstein helped, and I’m glad that the TCU performance of Circus Maximus was extensively filmed. But even then, I’m relying on a director’s interpretation of what should be on screen, possibly at the expense of other action on stage.

For Einstein, the solution lies with the fact that every audience member’s interaction with the piece is going to be radically different. In a 5+ hour performance, even the most disciplined audience member is going to drift in and out of the performance. Instead, large thematic ideas and repeated musical gestures (insert your minimalism joke here…but trust me…I’ve heard them all) can lead the way toward interpretation and meaning. I’ve done just that with several scenes in Einstein, and you can find slides to that presentation here. Crucial to those analyses, though, was an artifact, the full video recording of the work, that allowed me to reference more than just the music and my recollection of what happened visually.

I look forward to seeing the video recording of TCU’s Circus Maximus performance. Perhaps then I can start in on the question of how to analytically tackle the notion of space and music in a way similar to that of my approach to Einstein.

But first, Billy Joel anyone? I’ll be presenting a paper titled “Deceptive Love: The Impact of Deceptive Motion in Billy Joel’s ‘She’s Got A Way’ and ‘She’s Always A Woman,’” on Saturday, October 8 at 1pm (Mountain Time).


My graduate seminar on Music and Meaning has recently been reading Raymond Monelle’s chapter “The Temporal Image” from his The Sense of Music, which has me thinking about time. According to Monelle (and most other philosophers on the subject), time can broadly be considered in two modalities: moment-to-moment time and structural time. Essentially, no matter what you call it (linear vs. vertical time, cyclical vs. structural, earthly vs. eternal), we as humans tend to perceive time in smaller, moment-to-moment ephemera and in larger, more structured units at the same time. You got up this morning, got ready, went to work, but you did this because you know it’s a weekday and certain things happen on weekdays that might not happen on other days of the week.

In music, this dichotomy between local and large-scale time seems best epitomized by opera, specifically the so-called “numbers” opera typical of the Baroque, Classical, and early Romantic eras. These works alternate between recitative (moments which convey plot information and move the story along) and aria (moments of reflection on the actions that have just happened or are about to happen). Arias, in a sense, are out of time; a pause in the time of the opera. Of course, musical time presses forward no matter what, and our journey through literal time with the opera is unaffected, but our perception of time is altered by the way the music is presented. Indeed, the musical content of aria and recitative is so ingrained in our Western musical culture that they are now coded specifically in reference to time. In opera, once you hear that harpsichord and bass with speech-like vocal lines, you know where you are in the time of the opera, even if you can’t understand the words being sung.

The music in Pokémon Go, as well as in other role-playing games (RPGs), whether they be augmented reality or not, has a similar relationship with time. The overworld music serves as linear time, moving the story along. Battle music, on the other hand, slows things down. It’s an interruption in the cyclical nature of the game and requires more attention from the player. Naturally, the music reflects this shift in temporality, and often does so in a somewhat paradoxical manner. In Pokémon Go, the battle music is heard when a player attempts to capture a wild pokémon, but it isn’t slow at all. In fact, it is much quicker than the overworld music. Overall, the D major tonal center, a whole step higher than the overworld music, sets a tone of heightened excitement and a sense of forward momentum. Both of these attributes are bolstered by the opening ascending flourish and the consistent presence of a syncopated rhythm. This music is also much shorter (40 seconds), given that encounters with wild pokémon in the game are usually quick and, until reaching higher levels, are rarely prolonged events.

So what are we to make of this apparent contradiction in the way time is being represented? On the one hand, the main point of the game is to walk around and collect as many pokémon as possible. Thus, the occasions upon which a player has to stop and catch a pokémon interrupt the larger goal of traveling. Walking is linear time; battling is vertical time. But the musical characteristics of the themes present contrasting topics. The overworld music is slower and in a lower key than the faster, energetic, and much shorter battle music.

Perhaps this is less a problem of time and more an ontological issue. Let’s face it: the overworld/questing parts of RPGs are boring. A player directs a character across a large map, hoping either to make it to a destination quickly (i.e., not get interrupted by encounters with creatures) or to have as many encounters as possible to increase experience. In either case, the act itself is tedious. The appearance of a battle can therefore be met with a great sense of excitement. “Something changed and I can stop just walking around!” The battle music, then, seems appropriate given the context of its appearance.

Overworld music in the time of augmented reality gaming

It’s the age-old (well, “old” in terms of video games) question: Can the character I’m controlling in the game hear the music that I’m hearing? When you think of classic games like the original Super Mario Bros., the answer seems fairly obvious. The music is meant for the player, not the character, as the way the music changes (speeding up when time is running short, “star” music when the character is invincible, etc.) gives auditory clues to the person playing the game. Mario is just along for the ride.

But many games blur the line. Take, for example, The Legend of Zelda: Skyward Sword. When walking around the town of Skyloft, all seems to be normal with regard to the use of overworld music. Your character happily roams about while the cheery, lilting tune gives you, the player, a sense of calm and a push to explore. But at night, the music doesn’t change into an eerie, dark, and ominous soundtrack. Rather, it completely disappears. Was the music silenced so as not to wake the other characters in the game? Can Link and the other characters actually hear the music during the day?

The Legend of Zelda: Skyward Sword, Nighttime in Skyloft

The recent global phenomenon of Pokémon Go, a game which makes extensive use of augmented reality, further confuses the question of the diegesis of a game’s overworld music. For those who haven’t yet had the impulse to “catch ’em all,” Pokémon Go asks that players move their character not by pressing directional arrows on the screen, but by walking through the actual world we live in. The overworld, then, is our world, and as a player moves through it, they encounter small creatures (Pokémon) in an attempt to catch, evolve, and eventually battle with other players’ Pokémon. This is where the technological magic of AR comes in. When a player finds a Pokémon, the phone’s camera is turned on and the player sees the creature as if it were really there, sitting on a park bench or on the ground in front of them.

For a game in which the character is more closely connected to the player than ever before, and in which the game’s map is the real world, how does the traditional use of overworld music work? Pokémon Go has six main soundtracks: the overworld/walking music, the main title music, the catching Pokémon music, the Registered to Pokédex music, the gym music, and the gym battle music. Over the next few posts, I’ll examine each of these themes in turn, but today, let’s turn our focus to the overworld music and how it impacts gameplay.

Pokémon Go, Overworld Music

The opening repeated bass notes recall the ostinato patterns often found in the intro to training montage sequences (like this one from the 2015 Rocky spinoff Creed). And the connection to training montages seems particularly appropriate since the players of Pokémon Go are known as “trainers.” The low ostinato also elicits a march topic, creating military undertones that foreshadow the battles that will be fought throughout the game (and prepare the gym and gym battle music).


Step-up Sequence (starting at 0:13)

A recurring harmonic pattern throughout the overworld music consists of major triads ascending in parallel by major second. The first time this happens (from 13 seconds to 27 seconds in the above video), the goal is C Major (the overall key of the overworld music), ascending from A-flat major through B-flat major. But the next time this motion is heard (at 40 seconds), it begins on B-flat, quickly stepping through C and arriving on D major. The main theme is heard again in this key, a sequential step-up modulation that effectively increases the tension and intensity of the music.
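The mechanics of that sequence are easy to verify with a little pitch-class arithmetic. The sketch below is mine (the function names are not standard theory software), but the triads it spells out are exactly the ones described above: parallel major triads climbing by whole step, first arriving on C, then overshooting to D.

```python
# Pitch-class sketch of the step-up sequence: parallel major triads
# ascending by major second (two semitones per step).

NAMES = {0: "C", 2: "D", 8: "Ab", 10: "Bb"}  # only the roots we need

def major_triad(root_pc):
    """Pitch classes of a major triad: root, major third, perfect fifth."""
    return [(root_pc + i) % 12 for i in (0, 4, 7)]

def step_up_sequence(start_pc, steps=2):
    """Triads whose roots ascend in parallel by whole step."""
    return [major_triad((start_pc + 2 * i) % 12) for i in range(steps + 1)]

first = step_up_sequence(8)    # Ab -> Bb -> C: lands on the home key
second = step_up_sequence(10)  # Bb -> C -> D: one whole step higher

# Every chord keeps the same interval structure -- the "parallel" part --
# while each root sits two semitones above the last.
print([NAMES[t[0]] for t in first])   # ['Ab', 'Bb', 'C']
print([NAMES[t[0]] for t in second])  # ['Bb', 'C', 'D']
```

Starting the second pass a whole step higher than the first is what turns a cadential approach to C into a modulation to D: the same transposition scheme, shifted up, necessarily overshoots the old tonic.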


Return of step-up sequence (at 0:40)

A little over halfway through the overworld music, the driving intensity of the opening ostinato is replaced by a contrasting lyrical passage in the subdominant (F major) (at 1:34 in the video). This is similar in affect to the Registered to Pokédex music. But just as quickly as it arrived, the driving ostinato returns with yet another iteration of the ascending step-up sequence from before, leading to a return to the opening and thus completing the unbroken loop necessary for this kind of overworld music (at 1:54).


Final appearance of step-up sequence (at 1:54)

The musical devices here serve to foreshadow and increase tension and excitement, but in the context of the method of gameplay, how successful are these musical codes at achieving their goal? In my own experience playing the game, I typically have the music turned off. And from what I’ve seen in casual observation around TCU’s campus, other players don’t seem to have their sound on either. The music, then, only seems to serve the few who are playing with the sound turned up (and turned up loud enough to compete with the sounds in the real world). In addition, like many other mobile games, a player can choose to listen to the game’s music or provide their own soundtrack via iTunes, Spotify, or any other music app (Simon Morrison’s recent post at Musicology Now discusses, among other things, a crowd-sourced playlist to accompany Pokémon Go). But an augmented reality game presents an interesting question: what is the more authentic soundtrack for the game? Is it the one provided by the game, which hits on crucial musical codes to set the game’s mood? Or is it one provided by the player? After all, the game is augmented reality, so is not the player’s own musical choice more authentic? Or is the real soundtrack to the game life itself, the everyday noises and sounds all around us? Consider that if it were Ash (not us) walking around a fictional Pokémon land, he wouldn’t hear the driving ostinato or the step-up sequence. He’d hear the sounds of Pokémon rustling and the grass beneath his feet.

So can the characters hear the overworld music in a video game? It’s complicated, but the rise in popularity of augmented reality gaming will continue to obfuscate and complicate this question even further.

Stay tuned for more posts on the other music in Pokémon Go.

Directional Tonality and Wolf

My graduate music analysis class is discussing Wolf’s “Mir ward gesagt” in class today. The song starts in a most ambiguous way, seemingly signaling G major before adding a minor 3rd underneath to suggest E minor. Things continue to get weird as dominant 7th chords seem to “sort of” resolve throughout the song, but never quite land definitively where they should. Ultimately, melodic material is repeated for a second verse, this time a whole-step higher, adding even more confusion. Luckily, the song ends quite conclusively in D major, but what are we to make of the key until that happens?

The concept of directional tonality describes music that ends in a different key than the one in which it began. This is similar to an auxiliary cadence (off-tonic opening), but the starting key is typically well-defined rather than a fleeting hint. Wolf’s song presents a problem in that nothing about the song suggests D major until the end, leaving over half of the song in tonal limbo. I’ve armed the class with a couple of tools: we discussed the Schenkerian notion that the final key is the key for the whole piece, but we also briefly discussed transformational theory as a way to explain local chord progressions (or perhaps even global shifts in key areas). I’m curious as to what they come up with.

In-class Song Analysis with Nested Groups

Sometimes you get so excited about a teaching idea that you just have to share.

My sophomore theory class is embarking on the topic of mode mixture for the next few days, so I thought we’d jump right in with some song analysis, specifically Schubert’s “Der Neugierige.” SPOILER ALERT: This fairly early song in the cycle Die schöne Müllerin is one of the first to subtly hint at the tragic fate that awaits the protagonist, and Schubert accomplishes this through the use of mode mixture.

The plan is to have my class of 18 students divide initially into 6 groups of 3. Each group will be tasked with addressing some aspect of the song (5 groups for the music and 1 for the text). After 10 minutes or so, each student is now an “expert” on that particular part of the song. They’ll then regroup into 3 groups of 6, each group member coming from a different smaller group. Each bigger group can now discuss the song as a whole, interjecting their findings and coming up with a complete(ish) analysis of the song. Finally, we’ll reassemble as a class and debrief, making sure we all noticed the mode mixture and the ramifications when considered with the text.

We’ll see how it goes this morning.

UPDATE: I think it went really well! For those that may be interested, here is how I divided the 6 groups: