“Neural Audio,” or, “What I Did This Summer”

I’ve had a few people1 ask me what, exactly, I was doing all summer, off in Louisiana. As a programmer, being efficient is sort of the goal of everything I do; as such, doing a single write-up here and then sending that link to people makes more sense than answering the question over and over.2
I spent the summer working at a National Science Foundation-funded Research Experience for Undergraduates at the Center for Computation and Technology at Louisiana State University.3 It’s a pretty cool setup they’ve got at the CCT4 – it’s not an academic unit, it’s a research group only. The building has all sorts of handy resources – all of us in the program had access to both a shared workspace for the REU students and our own individual workrooms, which varied depending on our project.5 The exciting new thing for me was the server room, which I had access to.6 There were a few machines of interest in there – HIVE, a cluster-in-progress that was devoted entirely to art that required high-powered computation, and Titan, a machine designed for use with neural networks.
This is where I lead in to my specific research program, which wound up being titled “Neural Audio,” as above.7 The goal was basically an exploration of the use of deep neural networks for music information retrieval.
Whoops, went a bit jargon-heavy. Let’s break it down.

Deep Neural Networks

You may have heard about this one before – neural networks are the current big thing in artificial intelligence. Google uses them to power a lot of things, but the big one people have heard about is Google Photos, where deep neural networks provide the incredible search features.8 As you might guess from the name, they’re based off the structure of the human brain:9 a bunch of nodes, connected by weighted edges, which are the neurons and synapses of the artificial brain.10 Now, what’s cool about machine learning is the training: instead of sitting down and writing an algorithm to perform a task, you just build up a big data set of questions and their paired answers. Then you feed it into the system, and it learns11 how to answer the questions.
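To make the "feed it questions and answers, and it learns" idea a bit more concrete, here is a minimal sketch of training in the gradient-descent style. It is a toy, not the network I actually used: a single artificial neuron with no activation function, and every name in it (`predict`, `train`, the made-up data) is an assumption for illustration only.

```python
import random

def predict(weights, bias, inputs):
    """One artificial neuron: a weighted sum of its inputs, plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def train(examples, steps=2000, learning_rate=0.05, seed=0):
    """Learn weights from (inputs, right_answer) pairs, one random example at a time."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]  # randomized starting weights
    bias = 0.0
    for _ in range(steps):
        inputs, target = rng.choice(examples)  # the "stochastic" part
        error = predict(weights, bias, inputs) - target
        # For squared error, the slope with respect to each weight is error * input,
        # so stepping against it shrinks the error on this example.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error
    return weights, bias

# Hypothetical toy data set: the "right answer" is always 2*a + 3*b + 1.
examples = [((a, b), 2 * a + 3 * b + 1) for a in range(-2, 3) for b in range(-2, 3)]
weights, bias = train(examples)
print(weights, bias)
```

Nobody told the program the rule was "2 times a, plus 3 times b, plus 1"; it only ever saw questions and answers, and the weights drift toward those numbers on their own. Real networks stack up a lot of these nodes in layers, but the loop is the same basic shape.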
Of course, it’s not that open-ended – you can’t drop the works of Shakespeare in there and expect it to write a paper analyzing his writing style.12 They work best with categorization – you give them a set number of categories, and the network can tell you either which category something belongs to, or the percentage chance that thing falls into each category.13
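The "percentage chance for each category" trick is the softmax function. As a rough sketch of what it does, assuming a made-up set of three genres and made-up raw scores coming out of a network (none of these are my real categories or numbers):

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that add up to 1."""
    biggest = max(scores)
    # Subtracting the largest score first avoids overflow and doesn't change the result.
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

genres = ["jazz", "rock", "classical"]  # hypothetical categories
scores = [2.0, 1.0, 0.1]                # hypothetical raw network outputs

probs = softmax(scores)
best = genres[probs.index(max(probs))]  # "which category does it belong to?"

for genre, p in zip(genres, probs):
    print(f"{genre}: {p:.0%}")
print("best guess:", best)
```

Picking the single biggest probability gives you the "which category" answer; keeping the whole list gives you the "percentage chance" answer.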
Beyond that, there’s nothing fancy about neural networks – they’re just a software construct used to do a heck of a lot of math, the end result of which is an algorithm that no human could’ve designed. Cool stuff.

Music Information Retrieval

The field of MIR isn’t new; it’s been around for a while, doing cool things. It really does what it says on the tin: the idea is to be able to feed a piece of music into the software and receive useful information about the music out. Software that can recognize the key of a song being played, or identify the speed at which the piece is being performed, is a good example of this.14
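For a taste of the "speed of the piece" problem, here is a deliberately simplified sketch. It assumes somebody has already found the times (in seconds) at which each beat lands, which is the genuinely hard part in real MIR software; the function name and the example beat times are made up for illustration.

```python
from statistics import median

def estimate_bpm(beat_times):
    """Estimate tempo in beats per minute from a sorted list of beat times in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to measure a gap")
    gaps = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    # The median gap shrugs off the occasional missed or doubled beat.
    return 60.0 / median(gaps)

# Hypothetical beat times: one beat every half second.
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
bpm = estimate_bpm(beats)
print(bpm)
```

Going from raw audio to that list of beat times is where the actual research effort goes; the arithmetic at the end is the easy bit.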

Combining Them

My work was basically looking into combining these two fields. Machine learning can do some cool stuff, the idea went, so why not try applying it to music?
This took two forms: trying to identify the genre of a piece, and trying to identify the instruments playing in a piece.
It’s here that I’m going to hand off the explanation to another thing I was working on this summer, though as a test subject rather than a researcher: the digital poster. One of the other research groups at the CCT was working on a system to modernize the poster presentation, a staple of scientific conferences. I had the opportunity to be one of the trial-run students for the digital poster, and wound up putting together an online version as my way of wireframing what the final product would look like. Being me, I made my wireframe look just as good as the ‘official’ one, and wound up posting the whole thing online and providing a QR code on the paper poster15 that linked to the online site.

Wrap-Up

While the summer, and thus the time I had at LSU, came to an end, the work didn’t. I’m still16 trading emails with my mentor, and I’m hopefully going to be attending another conference at some point to talk about my work. In the interim, I hope to be able to get some additional work done, maybe get some more interesting data out of the machines. It’s a goal, and time will tell how well I’m able to accomplish it.
That’s about all I’m going to write here – if you want to know more, you can check out the digital poster, and if that doesn’t get you enough information, you can fire me an email: it’s grey (at) this site.17


  1. Reasonably 
  2. If I were teaching a computer science course, the first thing I’d say would be along the lines of “‘efficiency’ is just a codeword for ‘laziness that won’t get you fired.’” 
  3. Or “LSU CCT NSF REU” for out-of-order short. 
  4. I hope you read the last footnote, because I’m going to be using these short-forms of the names throughout. Efficiency! 
  5. One person had a few offices shared with graduate students working on the same program; another had a Mac lab to themselves; I was given the key to a media lab on another floor. 
  6. I found this oddly entertaining after I had to let one of the IT staff in there to reboot a server following a power failure. 
  7. I kept trying to make it “neural audio,” because I’m a millennial and thus hate capital letters, but I was overruled by my mentor. Probably for the best. 
  8. Seriously, the fact that I can search for someone’s name and have it accurately spit out a list of every photo I’ve taken with them in it is seriously impressive. The fact that I can ask for stuff like “mountain” or “car” and also get accurate results? Mind-blowing. 
  9. Though, it’s important to note that they’re not based off an accurate/current idea of how the human brain works; we’re computer scientists, not biologists. 
  10. The weighting of the edges is important, as that’s where all the magic happens. Each node, simplified down, is performing an averaging operation over all of its inputs. The output is then passed along the edges, and transformed by the weight of that edge, creating the new input for the next node. 
  11. Using a system called Stochastic Gradient Descent, which I find to be a very elegant solution to the problem. (I recommend reading the previous footnote before this one.) Learning, via training, goes like this: you feed the network an input, and the randomized initial weights do the processing and spit out an answer. That’s probably not the right answer, so the network measures how far off it was and works out which ‘direction’ to nudge each weight in to shrink that error (that’s the ‘gradient’ part; the ‘stochastic’ part is that each step only looks at a small random sample of the training data). It changes the weights a little in that direction, and then tries again. The process of training is just repeating that operation over and over and over again. 
  12. Although, entertainingly, you can drop the entire works of Shakespeare into a neural network and have it make a spirited attempt at creating a new work in the style of Shakespeare. 
  13. That’s called softmax, and it’s pretty handy. I looked at using changing softmax results over time as a way of extracting metadata from music. 
  14. Entertainingly, some of the best examples of MIR arguably aren’t MIR at all: Gracenote, for example, the system that allows the ‘smart’ stereo systems in cars to figure out what CD you’ve just put in, is based on a ‘CD fingerprint’ that looks at the length of the tracks and when each one starts. It is possible, with a lot of effort, to design a CD that will show up as being something entirely different than it actually is. 
  15. We were all required to make traditional paper posters, regardless of our use of digital posters. 
  16. Infrequently, because time zones. 
  17. I’m not dumb enough to put my email address up on the open web, c’mon. I already get way too much spam email.