Web Audio Editor (update)


Web Audio Editor: A simple audio editor that works in the browser, designed for kids age 7+ (URL // source code)


I teach music workshops at an afterschool program where kids build their own sample libraries from scratch. They find the default audio editor (Audacity) really frustrating, and I’m with them! So I decided to build an audio editor that does all the stuff that kids like to do (like reverse sounds, change the speed, and add reverb). I wanted an easy way to make good sounds, and to learn about sound in the process.

Even though it was designed for an audience that has never worked with digital sound before, the Web Audio Editor can be of use to anyone who wants effects or just a quick edit. I wanted to do this in the browser, rather than as a standalone application, after researching music education software’s transition to the web through my Technology for Music Education course (at Steinhardt). I also drew inspiration from Pixlr, the online photo editor that I choose over Photoshop for quick little edits. Web audio is still in its early stages, so the pieces necessary to build this weren’t available until very recently. But HTML5, the Web Audio API, and Canvas open up a lot of potential for audio playback and visualization.

Much of the Web Audio Editor’s functionality comes from SoX, a command-line utility that has been around for over two decades. Shawn Van Every introduced me to SoX in CommLab Web and helped me set it up on my server so that I can call it with PHP. The frontend is built on Wavesurfer, a JavaScript library that draws the waveform to an HTML5 Canvas. I also added a Canvas to visualize the frequency spectrum, inspired by ICM and Marius Watz’s Sound as Data workshop. I am continuing to develop this project and would be thrilled to present it at the ITP show.

Personal Statement

My primary inspiration was seeing kids struggle to use Audacity. I wanted to see if I could build something simpler that functions in the browser. I developed the idea in a course at Steinhardt called ‘Technological Trends in Music Education: Designing Technologies and Experiences for Making, Learning and Engagement.’ I made it a project in CommLab Web (just to get the PHP / SoX running) and ICM (to visualize the sound data and build a user interface).


My course at Steinhardt included readings from MIT’s Lifelong Kindergarten group, and I became very interested in Scratch, a visual programming environment for kids. Scratch very effectively teaches programming concepts by giving kids a space to create, along with readymade pieces, like Lego blocks. Scratch has an audio editor that has been inspirational for this project.

On a more technical level, I’ve found several attempts to build a web audio editor, including plucked.js, a thesis project called Audiee (http://www.stud.fit.vutbr.cz/~xmyler00/audiee/#), and a thesis project by Ryan Viglizzo here at ITP. I’ve been in touch with Ryan. I have dug deep into PHP, SoX, CSS, HTML5 / JavaScript / Web Audio / Canvas to figure out how to do a lot of things I had never done before.

The Web Audio Editor was designed for kids in my after-school workshop, so 2nd-8th graders who have short attention spans and haven’t necessarily worked with digital audio before. But I think it’s of use to anybody who needs to make a quick audio edit, or add some fun effects.
I’ve learned a ton! I had never heard of SoX and was new to PHP. Initially I thought I’d be able to use Processing.js and Minim to visualize and scrub through the audio. But I quickly learned that Minim uses JavaSound, and Java is going out of fashion for web browsers. I also learned more about how Processing.js works and why it won’t really be able to help me in this case. So I dug into HTML5/JavaScript, where I discovered that much of what works in Processing can be recreated in a Canvas, and the Web Audio API works a bit like Minim. I’ve also learned a lot about user testing with different audiences, including the kids in my after-school program as well as my classmates at ITP.
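To make the Minim analogy concrete: the Web Audio API's `AnalyserNode` plays roughly the role of Minim's FFT. A sketch is below; the browser-only wiring is commented out (it assumes a page with an `<audio>` element whose id is `"player"`, which is my invention), while `binFrequency()` is the one piece of pure math, mapping an FFT bin index to Hz.

```javascript
// Sketch: spectrum visualization with an AnalyserNode, much like Minim's FFT.
function binFrequency(binIndex, sampleRate, fftSize) {
  // each FFT bin spans sampleRate / fftSize Hz
  return binIndex * sampleRate / fftSize;
}

// In the browser:
// const ctx = new AudioContext();
// const source = ctx.createMediaElementSource(document.getElementById("player"));
// const analyser = ctx.createAnalyser();
// analyser.fftSize = 1024;
// source.connect(analyser);
// analyser.connect(ctx.destination);
// const bins = new Uint8Array(analyser.frequencyBinCount);
// function draw() {
//   analyser.getByteFrequencyData(bins);
//   // paint bins to a <canvas>; bin i covers binFrequency(i, ctx.sampleRate, analyser.fftSize) Hz
//   requestAnimationFrame(draw);
// }
// draw();
```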


Built with SoX and Wavesurfer.js, a library by Katspaugh. Other libraries include Bootstrap and jQuery. The frequency spectrum visualization draws from Boris Smus’ Web Audio API ebook (O’Reilly 2013). Many thanks to Shawn Van Every, Daniel Shiffman, Alex Ruthmann, Sam Brenner, Sam Lavigne, Luisa Pereira, Ryan Viglizzo, Danne Woo, Beam Center, and everyone else who has helped me figure out how to do this!

Audio Transformer update

Audio Transformer, the Web Audio Editor, is online in a functional demo mode. It’s not ready for public testing until I prepare my server, and there are many features (and bug fixes) yet to come. But if you’d like to check it out, here it is (source code here)…



I had the chance to user-test with my ICM class, and observed that graduate students tend to start by pressing the button on the left and moving to the right. Only those who really knew what they were doing with audio editing would jump straight to the fun stuff (effects). A few people also thought to press Spacebar for play/pause, but most didn’t think to do that, so I added a “spacebar” text hint to this button. At that point, there was a button to place markers at a spot on the waveform, and everyone wanted click’n’drag to select an area, so I’ve begun to implement this. I also added Loop mode, with a symbol that shows up when loop is on, though if I have the time I’d like to develop my own buttons that look like physical buttons.

“Speed Up/Down” has little effect on the waveform, so there needs to be a way to show that the length of the file is changing; otherwise it doesn’t look like those buttons do anything. I added a timer in the top-right, but I’d like to visualize this more clearly by showing the full waveform in a skinny timeline at the top, and the selected area at the bottom. As the file shortens, the zoom level stays the same, so the selected area will grow in proportion to the full file. This’ll make a lot more sense if I can just show you what I mean. Other comments were that the frequency spectrum colors didn’t seem to correlate to the waveform colors, and that the colors representing sound should be linked.
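The timeline idea reduces to one ratio. Assuming the bottom view keeps a fixed zoom (a fixed window of seconds), that window covers a growing fraction of the full file as speed-ups shorten it, which is exactly what the skinny top timeline would show. A sketch of that math (the function name is mine, not from the code):

```javascript
// How much of the full file the fixed-zoom view covers.
// windowSeconds is set by the zoom level; fileSeconds shrinks when the
// user speeds the file up, so the covered fraction grows.
function visibleFraction(windowSeconds, fileSeconds) {
  return Math.min(1, windowSeconds / fileSeconds);
}

// A 5 s window over a 20 s file covers a quarter of the top timeline;
// speed the file up 2x (now 10 s) and the same window covers half.
```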

Before presenting this to kids in my workshop, I need to indicate when WAE is processing audio, and grey out the buttons so that they can’t freeze the program by overloading the “speed up” button.

I am subbing for my friend’s Scratch video-game workshop, and I had the chance to work on sound effects using the Scratch interface:


Scratch has been a big influence on my approach to “designing for tinkerability,” as have many of the projects and research coming out of MIT Media Lab’s Lifelong Kindergarten Group. Its audio editor has a concise, efficient design. They don’t overload the user with too many parameters; for example, they offer “louder” and “softer” rather than a volume slider. This is the way I’ve implemented my effects, though I think that in the name of tinkerability, I should not only provide a preset starting point, but also an advanced mode for users who wish to dig deeper.

Scratch gives kids three options for sounds: 1. record your own, 2. choose from preset sounds, 3. upload a sound. The kids wanted to try each of these. One group wanted to record their own voices saying “crap!” Another went to YouTube trying to figure out a way to bring the Dr Who theme into their project. And others explored the library of existing sounds. I think that offering all of these starting points would strengthen the web audio editor.

Designing for kids as young as 2nd grade is difficult because they aren’t all able to read at the same level. This applies to words, but it also applies to symbols. For example, when I asked the kids to play a sound in Scratch, some didn’t know which button to press. They hadn’t all been exposed to a sideways triangle as a play symbol. Even if it said “play,” they might not know what it means to “play” a sound. I don’t know if there’s a better way to convey these abstract audio concepts, but I think that using the simplest, most conventional names and symbols will help establish meaning that will stick with them later in life.

As my Physical Computing teacher Tom Igoe says, there’s no such thing as ‘intuitive’, just learned behavior. So in an educational setting for kids who’ve never worked with audio before, it will be necessary to point out some things.

Just this morning, I had the opportunity to present this project to a 5-year-old. At first, after her guide pointed it out, she was more interested in picking up the foam chair than in working with the Audio Transformer. When she sat down, I gave a short explanation that this is a way to listen to sounds and change the way they sound. I showed her how to click and drag a file from a desktop folder into the browser, then pressed buttons to change the sound. She was much more interested in dragging the sounds than in modifying them. Click’n’drag is a difficult skill for novice computer users, but she told me she’s been working on it with her dad, and she seemed intent on mastering it now. The dragging distance proved too far for her to manage, so I helped load the sound and then encouraged her to try pressing the buttons. She didn’t understand which button to press to play the sound until I pointed it out, and from there she successfully slowed down and reversed the sound and played it back. She was on a tour of ITP, so my project had a lot of competition for her time, but afterwards she said that the project was “fun.” I asked if there was anything that wasn’t fun and she said no. I think this is a good sign, but I’d like to make it easier to load readymade sounds—perhaps within the browser itself the way Scratch does—without the need to click and drag.

As things stand, I have several features I hope to implement:

  • Don’t afford the ability to press buttons while audio is processing (because it causes errors; could be done more elegantly)
  • Allow Edits w/ better highlighting of selected area
  • Zoom mode w/ additional waveform view update, highlight selection
  • Spiff up interface with symbols that can help bridge a child’s current level of understanding with audio-related symbols that’ll carry meaning later on in life.
  • Allow Record (WebRTC?) https://github.com/muaz-khan/WebRTC-Experiment/tree/master/RecordRTC/RecordRTC-to-PHP   (but stops recording properly [gets glitchy] after ~three recording sessions or if a file is played until the end…why??)
  • More options for starting sounds (preload a range of cool sounds and waveforms)
  • Oscilloscope ( http://stuartmemo.com/wavy-jones/ ) because the wavesurfer plugin isn’t precise enough to illustrate the concept of a sine wave, triangle wave, etc.; they just look like big blocks of sound…
  • Better Undo/Redo (download page with all files at end of session, then delete them? On close, delete all of the files). File-size limit. These are important before making the website public, so as not to overload my server.
  • “Advanced Mode” allowing user to tweak effect parameters. Audacity has too many parameters, Scratch has too few, WAE should provide a simple starting point but allow tinkering for those who wish to dig deeper and explore

[Dec 7th update: crossed two items off the list]
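The “Advanced Mode” item above could be as simple as treating each effect as a preset whose parameters can be overridden. A sketch of that idea (the parameter names follow SoX's documented reverb options, but the object shape and function are my own assumptions, not the current code):

```javascript
// One effect, two audiences. Simple mode uses the preset as-is;
// a hypothetical advanced mode exposes the same fields as sliders.
const reverbPreset = {
  reverberance: 50, // SoX's documented reverb parameters (percentages)
  damping: 50,
  roomScale: 100,
};

function reverbParams(overrides = {}) {
  return { ...reverbPreset, ...overrides };
}

// Simple mode:   reverbParams()                 -> the preset
// Advanced mode: reverbParams({ roomScale: 25 }) tweaks one knob
```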

Birdveillance update

Yesterday Yiyang and I presented Birdveillance for our final Physical Computing class. We’re proud of our work, but feel like we still have more work to do on this project, and hope we’ll get a chance to develop it for the winter show.


** install flash in head, tweak head & eyes
** felt neck & tube
** will email work or do we need to switch to HTTP POST JSON API?
** beak
– figure out Speech to Text
– Beak
– eyes
– sew wings
– wings — motors?
– upload photos somewhere else to get around twitter photo upload limits

– feet?
– responds to sound threshold (goes to sleep if below threshold)
– slower motion to catch the faces
– clearer eye / photo signal (add a flash from a camera)
bird stand: box, label with twitter handle

Surveillance Bird update

Surveillance Bird is coming along after quite a lot of work. It’s not quite there yet, but we’re on the right track.

All of the tests we’ve done so far have indicated that the bird needs to look and act like a bird, inviting people to interact in a fun/fuzzy way. Otherwise people will think it’s a surveillance robot and they will run away. At first we thought we could find a bird toy, but the Angry Birds head is too much of a symbol, and looking online we were unable to find a big enough bird that would arrive in time. We decided it’d be fun to make our own, anyway. Yiyang is a painter, and I haven’t done papier-mâché since kindergarten, but we thought we’d give it a try. Then, walking around the city, everything I looked at started to seem like a potential part of the Bird. So I found some Christmas ornaments, a plastic apple, and a cookie jar at the dollar store, along with some fuzzy socks, and we’re going to make this our bird.


Today on my bike ride to ITP I was thinking about Surveillance Bird’s need for a base to hold the motor in place. I saw a wooden bedframe, snapped off a leg, took it to the shop and—with the help of Dan and Scott, who showed me how to use some new tools—managed to turn this wooden leg into a base for the motor. Meanwhile, the Christmas ornament proved too fragile for us to slice out a camera hole, so we used the plastic apple. Plastic is great!

We had a lot of trouble with serial communication between Processing (face detection) and Arduino. After hours of troubleshooting, it turned out that using delay() on the Arduino was the source of our problem. I had been using random delays to simulate noisy, birdlike movement—the steady scanning we presented in class was way too ominous. Moon showed me a way to simulate delay by creating a counter, and that’s doing the trick.
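The counter trick replaces a blocking delay with a check of elapsed time on every pass through the loop, so serial communication never stalls. Sketched here in JavaScript for illustration (on the Arduino the clock is millis(); a timestamp argument stands in, and the function names are mine):

```javascript
// Non-blocking "counter" timing: instead of delay(), schedule the next
// move and test for it each loop iteration.
function makeMover(intervalFn, startMs) {
  let nextMoveAt = startMs + intervalFn();
  return function shouldMove(nowMs) {
    if (nowMs >= nextMoveAt) {
      nextMoveAt = nowMs + intervalFn(); // schedule the next random move
      return true;  // time to nudge the servo
    }
    return false;   // keep looping; serial/face detection stays responsive
  };
}

// const shouldMove = makeMover(() => 200 + Math.random() * 800, Date.now());
// each loop(): if (shouldMove(Date.now())) { /* move the servo */ }
```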

One nagging issue is that the Pan/Tilt bracket for our two motors came with cheap little screws, and so far we’ve broken four of them. We really needed those. I’ve been to Radio Shack, Home Depot, and Ace Hardware in search of replacements, but none of them carry screws this small. On top of that, we’ve burned out two motors and had to borrow one. Each servo motor has a different shape to its connectors, and the connectors need to be spaced a certain way in order to screw properly into the Pan/Tilt bracket. This is the one piece we thought we wouldn’t have to worry about, but instead it feels like it could fall apart at any moment.

Another issue is audio. We want the bird to tweet when it moves, flash when it finds someone, and then say “follow @birdveillance on twitter.” There is only 32 KB of flash memory on the Arduino, so our options are limited. The PCMAudio library from High-Low Tech seemed perfect, but it conflicts with the Servo library. The Mozzi library seemed like it’d work, but regular samples are too large, and Huffman-compressed samples don’t seem to want to play in this sketch. They do play in an empty sketch, but if I add the Serial library to that sketch and start serial communication, it slows down the sample playback, so maybe the sound is actually playing but at such a slow speed that it is inaudible…
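The size problem is easy to see with a little arithmetic. At 8 kHz / 8-bit, the format the High-Low Tech PCM examples use, raw audio eats flash fast (figures below are standard specs, not measurements from our sketch):

```javascript
// Why raw samples barely fit in the ATmega328's 32 KB of flash:
const bytesPerSecond = 8000 * 1;      // 8 kHz sample rate, 1 byte per sample
const flashBytes = 32 * 1024;         // total flash, before any program code
const maxSeconds = flashBytes / bytesPerSecond; // ~4.1 s, with zero room left
```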

We’re still not sure what the bird is going to do when it sees people. So far we have the ability to take a photo of them and post it to twitter. We’ll hopefully figure out some sound. And we have lights installed in its head. We also have the ability to listen and post Speech To Text to twitter, but we need a good microphone if this is going to work in a bustling space like the ITP floor. Maybe we can solve our audio problem with the computer (and our own speaker?) since we’re already relying on it so much.

Audio Transformer

My final project, “Audio Transformer,” will be a simple audio editor that functions in the browser.

I’m designing for kids age 7 and up who have never worked with audio. The idea came from teaching afterschool music workshops and seeing how frustrated kids get with the de facto freeware program, Audacity. With so many features, Audacity can be overwhelming even for adults—here’s a screenshot.


I want to make this work in the browser because I’ve also been researching this topic for my class Technological Trends in Music Education: Designing Technologies and Experiences for Music Making, Learning and Engagement. So much of the technology we’ve looked at in that course is happening in the browser. Some technology, like Scratch, incorporates audio editing features. But I have yet to find any dedicated audio editor that works in the browser. I think there is a need for this sort of specialized tool, something like the Pixlr photo editor but for audio. I could even see myself using Audio Transformer in favor of pro audio programs when I just need to make a simple edit.

My goal is to keep things simple by focusing on the features that interest the kids in my workshop. They want to hear sounds backwards, slowed down, and sped up. They need to be able to delete, crop, cut, copy, and paste in order to make their edits. Fades, normalizing file volume, and determining the file’s starting point (i.e. cropping out initial silence) are not so exciting for kids, and often neglected, so these should all happen automatically by default. Undo and Redo are essential. I’m going to model these functions after Scratch’s audio editor, with two cursors for selecting a segment and dropdowns to choose effects/edits.
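The two-cursor model maps cleanly onto array edits. A sketch of cut and paste over an array of samples (the function names are mine, not Scratch's; copy is just the clipboard half of cut):

```javascript
// Edits between two selection cursors, modeled on an array of samples.
// cutSelection returns both the remaining audio and the removed clip.
function cutSelection(samples, startIdx, endIdx) {
  const clipboard = samples.slice(startIdx, endIdx);
  const edited = samples.slice(0, startIdx).concat(samples.slice(endIdx));
  return { edited, clipboard };
}

// pasteClip splices the clipboard back in at a cursor position.
function pasteClip(samples, atIdx, clip) {
  return samples.slice(0, atIdx).concat(clip, samples.slice(atIdx));
}
```

Keeping each edit as a pure function like this also makes Undo/Redo a matter of holding on to previous versions of the array.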


The sound data needs to be presented in a visually meaningful way. I think it’s important to introduce the concept of a waveform, and to visualize the frequency spectrum. I’ve begun to experiment with visualizations that can represent frequency and attack/decay in meaningful ways. For example, I’d like to have a visual method of changing the bass/treble frequencies, something like this:

I went to the Sound as Data workshop and was very inspired by the possibilities for visualizing sound, but all of the examples use Minim, which uses java.sound and therefore won’t work in the browser. Maxim is a sound library that has a JavaScript version, so I made the above using Maxim because I thought it would work in the browser…

…but it doesn’t work in the browser. There are a lot of issues with Processing, sound, and the web. So I may just have to use JavaScript. I’m excited about the idea of delving into the WebAudio API which has a lot of the stuff I’d need.

Here’s a javascript demo that visualizes the waveform and has the beginnings of the UI that I’m going for.

Here’s a functional mockup that actually edits an audio file (using PHP and SoX, running on a server)

Next, gotta figure out how to pull this all together! My big question is:

  • How can I get JavaScript talking with PHP? I need to send variables from one to the other. From my research and experiments so far, I think the answer involves AJAX, POST and html forms, but nothing is working so far.
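That hunch is right: the browser makes an AJAX POST whose body is form-encoded, and PHP reads the variables out of $_POST. A sketch of the idea follows; the endpoint name "edit.php" and the parameter names are hypothetical. encodeForm() is the pure piece, and the request itself (fetch() in modern browsers; XMLHttpRequest in 2013-era ones) is sketched in comments:

```javascript
// Build the "key=value&key=value" body that PHP parses into $_POST.
function encodeForm(params) {
  return Object.entries(params)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
}

// In the browser:
// fetch("edit.php", {
//   method: "POST",
//   headers: { "Content-Type": "application/x-www-form-urlencoded" },
//   body: encodeForm({ effect: "reverse", file: "in.wav" }),
// }).then(r => r.text()).then(console.log);
//
// And on the PHP side: $effect = $_POST["effect"]; $file = $_POST["file"];
```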

Creativity: Flow and the Psychology of Discovery and Invention

What is the key to happiness? How do we find meaning in the chaos of life? And why do our best ideas come when we least expect them? These questions are the domain of psychology theorist Mihaly Csikszentmihalyi. His answer derives from Flow, a concept he outlined in his 1990 book Flow: The Psychology of Optimal Experience. Flow basically means “getting in the zone.” I’m responding to one of his follow-ups from 1996, Creativity: Flow and the Psychology of Discovery and Invention, which examines how Flow impacts “Creativity.” Csikszentmihalyi distinguishes Creativity (with a capital “C”) from other forms of creativity in ways I find problematic, but the distinctions and theories are useful as I look for inspiration and happiness in life.

PComp Idea

Bird Surveillance System – a network of sensors hidden in tree-like areas, like birds, that observe you. When you rush by, they tweet one note and light up red. But when you stop to observe their presence, they cycle through a sequence/arpeggio of light and sound. There are three of them hidden throughout the space. Find them, get your friends to join in, and make music out of the surveillance network.

Are they birds, or surveillance cameras?

What is the interaction? How can they be quiet when nobody is around, beep fast when there’s a lot of movement (someone rushing by), but play music when somebody stops to interact?

Pixel Experiments

So much to play with this week! Here are my two favorite experiments.

Bearpocalypse – loops through each pixel in the bear image, compares it to every other pixel, and changes each pixel based on the RGB “distance.”

source: http://itp.jasonsigal.cc/icm/pixels/bearpocalypse/bear.pde

In Motion Dot-tector – built off of the Learning Processing Motion Sensor, and inspired by something I saw at the Exploratorium as a kid. Ellipse size decreases if there is a lot of motion, and ellipses only show up at pixels where there is motion (a difference between current capture and the most recent capture). This one took a lot of memory to run.

source: http://itp.jasonsigal.cc/icm/pixels/inmotion/inmotion2js.pde
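Both sketches hinge on the same per-pixel comparison: an RGB "distance." A sketch of that core, in JavaScript rather than Processing (the threshold default is arbitrary, and the function names are mine):

```javascript
// Euclidean distance between two [r, g, b] pixels: 0 for identical colors,
// up to ~441 for black vs. white.
function colorDistance(a, b) {
  const dr = a[0] - b[0], dg = a[1] - b[1], db = a[2] - b[2];
  return Math.sqrt(dr * dr + dg * dg + db * db);
}

// The Dot-tector's motion test: has this pixel changed enough since the
// previous capture frame?
function isMotion(prevPixel, currPixel, threshold = 50) {
  return colorDistance(prevPixel, currPixel) > threshold;
}
```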

I got caught up in all of the examples and libraries, but wasn’t sure where to take them. I’d like to come back to Pixels to make music out of shapes and/or movement.

Reading Notes: Music Education with Digital Technology

The four chapters we read this week were very inspiring as I reflect on the experience of my digital music workshop.

:: In “The DJ Factor,” Mike Challis describes a curriculum for at-risk 14-16 year olds in the UK who have not had any previous music experience. The key to his approach is to start from music that the students are interested in, which is novel for students who often experience a sharp divide between music in and out of school. In the UK, kids listen to electronic music, UK garage, and hip-hop. The first stage is to put music into their hands as DJs, and through a four-stage process of removing elements and replacing them with original ones, they produce a piece of original music.

In Stage 1, beatmatching is used as a way to learn about beats and bars. I’ve struggled a bit to teach this concept to the kids in my workshop, and despite the fact that they are much younger (some are as young as 7), I can’t think of a more hands-on approach than using two turntables and a mixer.

In Stage 2, Reason is introduced as a beat-making tool, and students create original beats to layer over their DJ mixes. The main lesson here for me is that the students aren’t given a comprehensive overview of how the program works, but instead they just dive in to figure it out on their own. Reason gives immediate feedback, so it may in fact be easier to learn from diving right in than by a general tutorial. The instructor’s role is to answer questions, monitor, intervene when necessary and provide feedback.

In Stage 3, students add a bassline, either with a live performance using MIDI or by sequencing each step of the melody. And in Stage 4, structure is introduced by listening to the music that the students brought to the program. At this point, the original mix can even be removed to leave each student with a completely original composition (not to say that remixed elements couldn’t also be used in original ways). The article mentioned that Dizzee Rascal had this type of opportunity when he was a teenager, which I found really inspiring.

:: The next chapter, “Composing & Graphical Technologies” by Kevin Jennings, gave me a lot to think about as I design my web-based audio editor. I’m working on the design, and the visual representation of sound will have a significant impact on how people use this tool.

Graphical interfaces can provide novel ways for young people to dive into composing without needing to learn notation or a sequencer. However, music is not visual, so any attempt to represent it visually biases and affords certain types of uses, whether it’s traditional notation or sequencers or Finale or Logic or the original programs Hyperscore and Drum Steps mentioned in this paper.

I’m really inspired by the idea of Hyperscore, in which ‘motives’ can be brought into a ‘sketch.’ The author believes that this particular interface is best suited to teaching rhythmic concepts, texture, and form, but despite his efforts, questions concerning pitch did not come up as naturally.

If you set a young person free to explore music composition in an environment with certain affordances, they will come to understand many different types of musical concepts on their own. After an initial stage of “bricolage” or “just messing around” to become familiar with the interface, Jennings observes the way students settle on a clearer idea to explore. This bricolage is an important part of the process that reminds me of Mike Challis’ approach in teaching Reason. The questions come up as part of the exploration process, and in the case of Hyperscore, concepts like inversion, repetition, and variation appear in students’ composition without any mention of these concepts by the teacher. In short, musical concepts don’t need to be explained as long as the system affords the potential to discover them through exploration.

In contrast to Hyperscore, Drum-Step is a graphical interface that students seem more likely to interpret in non-musical ways. Their motivations are more about making things that look cool than about musical inspiration. So graphical interfaces can have drawbacks. Interfaces need to incorporate non-musical elements, and there is a point where those might eclipse the musical ones. If the goal is music education, then it’s best to design an accessible path into musical understanding.

In the case of my web-based audio editor, there is a question of whether to follow the established ways of representing sound or invent my own. There is also a question of how to describe the various effects that can be used. The message of Jennings’ article is that if an effect is offered, it will be used. However, in the case of a “feature-bloated” program like Microsoft Word, only a small percentage of the program’s features are actually used on a regular basis by most users. That is the problem I’m trying to solve by creating a program that streamlines some of the audio editing features available in other programs like Audacity. But I need to think very carefully about how those features are represented visually.

:: Reading Steve Dillon and Andrew Brown’s 2005 chapter on networked collaborative music-making  here in the year 2013, I’m struck by the way they approach “computer as instrument,” “cyberspace as venue” and “network as ensemble” with such novelty. The trick for this type of program to work is to send data (i.e. MIDI notes, OSC messages) instead of the actual audio (which is also data, but a different kind that takes more memory).
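The data-versus-audio tradeoff is stark in numbers. Comparing one second of CD-quality stereo audio to a single MIDI note-on message (both figures are standard specs, not from the chapter):

```javascript
// One second of uncompressed CD audio: 44,100 samples/s, 16-bit (2 bytes), stereo.
const audioBytesPerSecond = 44100 * 2 * 2; // 176,400 bytes
// One MIDI note-on message: status byte + note number + velocity.
const midiNoteOnBytes = 3;
const ratio = audioBytesPerSecond / midiNoteOnBytes; // tens of thousands of times smaller
```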

One benefit of networked jam2jam is that this type of experience invites listening and creating at the same time. Participants need to use musical terms to communicate their ideas, and jam2jam provides chat boxes to facilitate that conversation.

Dillon & Brown identify three ways that students collaborate without being in the same physical space (or even the same country): Disputational collaboration involves individual decisions without agreement. Cumulative collaborators largely agree, but avoid confrontation. Exploratory collaboration is a give-and-take, modifying each other’s ideas to build something greater than the sum of its parts, and that seems like the goal here.