Category: Discussion & Process

Discussions about the state of music composition in the 21st century and, for the more technically inclined, research observations involving music informatics and related software.

  • 4 Channels Wide

    I spent the last three weeks in Florida working with a 10-speaker dome configuration. But for reasons both musical and practical, I won’t be adopting the dome in my current workflow. At least not today…

    Some of my requirements that prohibit use of the dome include:

    • ease in travel and setup
    • flexibility in performance
    • sharing and distribution (i.e. web playback)

    But most of all, the dome seems best suited to aesthetically-focused acousmatic music. By contrast, the music I create relies heavily on creating the illusion of real acoustic instruments and is otherwise deeply rooted in our vast musical heritage.

    That said, the dome has inspired a few important changes to my studio. Specifically, I’ve expanded the available stereo image with two additional speakers placed roughly at three and nine o’clock.

    I suspect that a more detailed multi-channel image originating from the concert stage can overcome the aforementioned obstacles while providing the desired benefits. So far, my studio experiments have been very promising, but I’ll need to develop some repertoire and try it out in real performance spaces to know for sure…

  • Atlantic Center for the Arts Residency

    I’ve just returned from a three-week residency at the Atlantic Center for the Arts in New Smyrna Beach, Florida — and what an INCREDIBLE experience it was!

    From the wonderful studio facilities to the glorious chef-prepared meals, the ACA exceeded all expectations. But the standout aspect of the residency was the people! Getting to know the staff and the other 23 artists was what made the whole experience worthwhile.

  • Composing Music with Isomer

    After graduating from Eastman in 2002, I got involved with music informatics research and spent several years developing music search and discovery technologies. During that time, I was exposed to emerging AI that led me to imagine potential solutions to existing challenges in computational creativity. Today, I’m working to develop methods for human and software collaboration with the goal of generating novel musical expression grounded in familiar perceptual structures.

    Unlike other approaches to computational creativity, I’m not interested in directly modeling the creative process in humans. Instead, I’m searching for a musical result that speaks emotionally through a context-aware, emergent grammar — a language that feels natural and engaging, even when its surface is unfamiliar. It’s a tall order, and something that I’ll be working toward for a long time to come.

    What is Isomer?

    Since early 2012, I’ve been developing my own software (called Isomer) which allows me to move flexibly between interpretations of fixed audio sources, symbolic performances, and detailed orchestrational renderings. (You can read more about how Isomer thinks about music here.)

    What makes working with Isomer unique is the advanced role the software plays in musically interpreting the raw input data. Over time, Isomer is learning how human-composed music creates and satisfies expectation (emotional tension) and applying this knowledge to the output it generates.

    Making Machines Musical

    Applying deep learning technologies to artistic expression is a popular trend. But even in the best cases, this approach generally results in simple, distorted copies of the original input. While it can provide uncanny effects, the results remain tightly bound to limitations within the input models. In my view, creativity requires the ability to forge new connections between existing contexts, and the missing ingredient with current machine learning approaches is the conscious application of context within the creative medium. Or put more accurately: the contextual classification of musical elements critical to defining musical intent and expectation.

    These critical elements do not readily reveal themselves to current deep learning methods, which is why Isomer must become capable of deciding whether or not to apply learned rules to guide musical expression within an ever-changing artistic context. A simple example of this might be the application of a crescendo to a rising melodic line. There isn’t a single, ideal crescendo that will work in every case. Determining whether it’s an appropriate addition, and working out exactly how it should be executed, depends heavily on the melodic, harmonic, and timbral environment. Even if Isomer can learn how to apply an appropriate solution, the contextually dependent decision of whether or not to do so still remains.
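
    To make the crescendo example concrete, here is a minimal, hypothetical sketch of such a contextually dependent decision; the function, its inputs, and its thresholds are mine for illustration and are not rules Isomer has learned.

    ```python
    # Hypothetical sketch: decide whether (and how much) to apply a crescendo
    # to a rising melodic line, using simple stand-ins for musical context.
    # All names and thresholds are illustrative, not Isomer's learned rules.
    def maybe_crescendo(pitches_midi, harmonic_tension, current_dynamic):
        rising = all(b >= a for a, b in zip(pitches_midi, pitches_midi[1:]))
        headroom = current_dynamic < 0.8            # dynamic room left to grow
        tension_building = harmonic_tension[-1] > harmonic_tension[0]
        if not (rising and headroom and tension_building):
            return None                             # decision: do not apply
        # Even when applied, the shape depends on context (here: size of the rise).
        span = (pitches_midi[-1] - pitches_midi[0]) / 24.0
        return min(1.0, current_dynamic + 0.2 + 0.3 * span)

    # e.g. maybe_crescendo([60, 62, 65, 67], [0.2, 0.6], 0.5) -> a target dynamic
    ```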

    More complex examples include: how to finalize a musical gesture (e.g. registral return, agogic accent, dynamic contrast, none/all of the above?), how to effect a harmonic transition (i.e. adjust harmonic tension within a specific context), and how to orchestrate evolving variations of a specific melodic idea. These examples show how important the interpretation of musical events within their near-term context can be, and it’s for this reason that context awareness is the primary focus of ongoing development. And so with that in mind, let’s take a look at how I collaborate with Isomer throughout the composition process.

    Creative Collaboration: Isomer’s Role

    Isomer’s first job in the process is to reverse engineer audio input to determine how its spectral components change over time. Additionally, Isomer searches for perceptually important transient attack points or “event onsets”. Each event onset defines an adaptive analysis window that provides rhythmic context and helps Isomer develop large-scale formal pacing.
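
    As a rough analogy for this first step (not Isomer's own analysis code), an onset-based windowing pass could be sketched with the librosa library:

    ```python
    import librosa

    # Rough sketch: find perceptually salient event onsets, then let each
    # consecutive pair of onsets bound one adaptive analysis window.
    # (Illustrative only; Isomer's onset detection is its own implementation.)
    y, sr = librosa.load("input_source.wav", sr=None)
    onset_frames = librosa.onset.onset_detect(y=y, sr=sr, backtrack=True)
    onset_times = librosa.frames_to_time(onset_frames, sr=sr)

    # One adaptive analysis window per span between neighboring onsets.
    windows = list(zip(onset_times[:-1], onset_times[1:]))
    ```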

    Within each window, partial data is extracted, normalized, and stored. A picture of the composite harmonic layers is assembled (from bottom to top), allowing Isomer to define a series of layered monophonic lines. These lines are then shaped into useful musical material using learned principles of musical expectation and universal construction.
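
    A minimal sketch of the layering idea (again, not Isomer's partial-tracking code) might take the strongest spectral peaks in each frame and stack them from bottom to top:

    ```python
    import numpy as np
    import librosa

    # Minimal sketch: within an analysis window, keep the strongest spectral
    # peaks per frame and order them low to high, yielding a handful of
    # layered monophonic frequency contours. (Illustrative only.)
    def layered_lines(y, sr, n_layers=4):
        S = np.abs(librosa.stft(y))                    # magnitude spectrogram
        freqs = librosa.fft_frequencies(sr=sr)
        layers = []
        for frame in S.T:                              # one time frame at a time
            peak_bins = np.argsort(frame)[-n_layers:]  # strongest partials
            layers.append(np.sort(freqs[peak_bins]))   # bottom to top
        # Row i is the i-th lowest partial over time: one monophonic line.
        return np.array(layers).T
    ```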

    Once the model is established (click here for details), Isomer can search for musically relevant patterns and modify the output based on composer input. For some applications, the composer may wish to keep the original analysis generally intact, while in other cases, it may be desirable to ask Isomer to add, remove, or modify events to create a result that is more predictable and universally appealing in its construction.

    Creative Collaboration: The Composer’s Role

    What’s left for the human composer to contribute, you ask? First, the composer must curate the input source(s). Whether it’s a series of short “found” sounds or a meticulously crafted concrète work, the input source has a significant impact on the harmonic content and dramatic pacing of the resulting work.

    Speaking of output, Isomer’s analysis often results in hundreds of tracks, but only a few (maybe 24 or so?) are likely to be used in the completed work. The composer must evaluate the potential of each of these output tracks and determine which will make it into the piece.

    In early versions of the software, the composer completely determined the orchestration. Today, Isomer sets out a basic orchestrational plan, which the composer then modifies. In future versions, Isomer will evaluate the timbral signatures of orchestral options to determine the most musically useful combinations of instruments to employ within the varying emotional states of the work.

    And finally, the composer must digitally realize and produce the resulting piece. Digital production isn’t traditionally considered part of the compositional process, but for me (as with many composers of electronic music) it is. Machine listening is still a long way from dealing with detailed production choices, and these decisions can have an enormous effect on the resulting piece. So for now, production must be up to the composer.

    A Final Thought

    Keep in mind that while exploring creative collaboration with computers is an interesting topic, these process details really don’t define anything too important. What’s truly important is the experience gained by listening. And that will always remain the biggest challenge for anyone involved in creative pursuits — computationally collaborative, or otherwise.

  • Representing Musical Models (the Isomer way)

    Isomer is a suite of software that produces an abstracted representation of the characteristic trends present in musical language through the analysis of symbolic or audio input. Isomer uses this abstracted representation to execute query and transformation algorithms, with the ultimate goal of generating new/hybrid materials.

    Isomer assembles a unified model from raw analysis data (symbolic and/or audio) and populates one or more databases with observed and interpreted feature-data representations. Multiple model abstractions can exist simultaneously in discrete databases, a key feature of Isomer that encourages the emergence of a desired representation schema for the musical model. Model data is separated into four musical components (melody, harmony, rhythm, and timbre), depending on source availability.

    Data Representation Overview

    Isomer provides normalized storage of feature data in a Cartesian coordinate (grid) format. This allows input from various sources to combine in a single model representation suitable for further analysis and abstraction.

    [Figure: Isomer-Model Data Representation]

    Cartesian Grid Format

    Horizontal Axis

    The horizontal (X) axis represents time with sampled data appearing at a regular, user-defined time interval (default = 2 ms). Each vertical column of data is called a slice.

    Why is 2 ms the default slice window?

    2 ms is the smallest time interval at which humans can detect multiple transients, and it allows for complete capture of MIDI files with a resolution of 30 bpm @ 960 ppq (or 60 bpm @ 480 ppq).

    [Figure: Isomer-Model Cartesian Grid]
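
    The MIDI side of that claim is easy to check; a quick illustrative calculation (not Isomer code):

    ```python
    # A MIDI tick lasts 60 / (bpm * ppq) seconds, so:
    def tick_ms(bpm, ppq):
        return 60.0 / (bpm * ppq) * 1000.0

    print(tick_ms(30, 960))   # ~2.083 ms
    print(tick_ms(60, 480))   # ~2.083 ms
    # A 2 ms slice is narrower than the ~2.083 ms tick spacing, so no two
    # distinct ticks can fall into the same slice.
    ```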

    Vertical Axis

    The vertical (Y) axis contains individual streams (type is dependent on input). Each stream forms a horizontal row that is further divided into four sub-stream categories: melody, rhythm, timbre and harmony.

    [Figure: Isomer-Model Sub-Streams]

    Each grid point contains multidimensional data for each of the stream sub-categories in time at a resolution defined by the user.

    Observed and Extrapolated Data Fields

    The source of input data (audio or symbolic) is determined at the time of model creation. Once the model is generated, the original format (audio or symbolic) of the raw data is no longer an issue, although the input format can limit the parameters available (e.g. symbolic input contains no texture data).

    Regardless of the input format, Isomer attempts to collect data without interpretation using the most accurate methods of analysis available. These data fields are treated as empirical observations. However, during model creation Isomer also calculates additional parameters to better prepare the data for use in a musical context. These are extrapolated data fields.

    The grid format described above requires two coordinates: stream/sub-stream (Y) and a series of analysis windows (X). Each point in the grid contains data for the following parameters (a rough sketch of one grid point follows the list):

    * indicates an extrapolated data field

    • Melody
      • Pitch (hertz, midi)
      • Pitch IR *
      • IR Equity *
      • Proximity Equity *
      • Registral Return *
    • Rhythm
      • Band (Bark-scale frequency range in which the onset is detected)
      • Intensity (percentage, velocity)
      • Onset Equity *
    • Timbre
      • RMS
      • Spectral Stability
      • Spectral Flatness
      • MFCC (13)
    • Harmony
      • Pitches (hertz, midi, intensity)
      • Average Chroma
      • Total Pitches *
      • Tension *
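
    As a rough sketch of what a single grid point might hold (field names follow the list above; this is illustrative and not Isomer's actual schema):

    ```python
    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    # Illustrative sketch of one grid point (one stream at one time slice).
    @dataclass
    class GridPoint:
        # Melody
        pitch_hz: Optional[float] = None
        pitch_midi: Optional[int] = None
        pitch_ir: Optional[float] = None            # extrapolated
        ir_equity: Optional[float] = None           # extrapolated
        proximity_equity: Optional[float] = None    # extrapolated
        registral_return: Optional[bool] = None     # extrapolated
        # Rhythm
        band: Optional[int] = None                  # Bark-scale band of the onset
        intensity: Optional[float] = None           # percentage / velocity
        onset_equity: Optional[float] = None        # extrapolated
        # Timbre
        rms: Optional[float] = None
        spectral_stability: Optional[float] = None
        spectral_flatness: Optional[float] = None
        mfcc: List[float] = field(default_factory=list)   # 13 coefficients
        # Harmony
        pitches: List[Tuple[float, int, float]] = field(default_factory=list)  # (hz, midi, intensity)
        average_chroma: Optional[float] = None
        total_pitches: Optional[int] = None         # extrapolated
        tension: Optional[float] = None             # extrapolated
    ```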

    Multi-Event Query Parameters

    Querying musical data in Isomer allows users to mine the models for musically relevant trends. The user supplies coordinates as time and stream/sub-stream ranges. This input is translated into the matrix coordinates (as defined by the model) before the query is executed.

    The interface returns the results as time points (X-axis) and streams/sub-streams (Y-axis) for a given model, and allows on-demand calculation of any or all of the following data for any grid coordinate or range of coordinates (a sketch of such a query follows the list):

    • Melody
      • Registral Direction
      • Registral Return
      • Proximity Ratio
      • Interval Range
      • Melodic Accent
    • Rhythm
      • Temporal Direction
      • Temporal Instability
      • IOI Range
      • NPVI IOI
      • NPVI Duration
      • Agogic Accent
      • Syncopation
    • Timbre
      • Texture Fingerprint
      • Dynamic (Timbre) Accent
    • Harmony
      • Tension Direction
      • Tension Instability
      • Average Tension
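
    A hypothetical sketch of what such a query might look like from the user's side (method and argument names are illustrative, not Isomer's actual interface):

    ```python
    # Illustrative sketch: translate a user-supplied time range into slice (X)
    # indices, pick a stream/sub-stream (Y), and request on-demand calculations
    # over that region. Names here are placeholders, not Isomer's API.
    SLICE_MS = 2  # the default slice width described above

    def to_slice_range(start_s, end_s, slice_ms=SLICE_MS):
        """Convert a time range in seconds to inclusive slice (X) indices."""
        return int(start_s * 1000 // slice_ms), int(end_s * 1000 // slice_ms)

    x_range = to_slice_range(12.5, 14.0)        # X: time slices
    y_range = ("stream_3", "melody")            # Y: stream / sub-stream

    # A query could then request any of the calculations listed above, e.g.:
    # results = model.query(x=x_range, y=y_range,
    #                       calculate=["registral_direction", "agogic_accent"])
    ```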

    Increasing Understanding Through Multiple Perspectives

    Isomer extrapolates parameters in addition to the observed data only after window-based quantization takes place. The interpretation that necessarily occurs during quantization can create desirable variations in the resulting model representation. By generating several models from the same source input with varying window sizes, Isomer can create multiple representations of a single input source.

    The power of implementing a series of interpreted abstractions is that the raw data can be represented multiple times at varying depths from the original (raw) observations, regardless of the model input format, giving the user a variety of ways to view and examine the model. For symbolic input, this provides a useful way to control the level of detail captured in the model. In the case of audio input, the user can find the most appropriate representation for a specific input file, addressing a problem inherent in the musical analysis of raw audio.

    For example, determining the location of event onsets, “beats”, or perceptual pulse points may require a high-resolution view of feature data, while the detection of more expansive musical elements like harmonic rhythm may benefit significantly from a more general reduction of the raw DSP data.
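
    A toy sketch of why window size matters (illustrative only, not Isomer's quantization routine): the same onsets quantized at two different widths yield two quite different representations.

    ```python
    # Toy sketch: quantize the same onset times at different window widths.
    # A fine window keeps nearby transients distinct; a coarse one merges
    # them into broader pulse points.
    def quantize(onset_times_s, window_ms):
        w = window_ms / 1000.0
        return sorted({round(t / w) * w for t in onset_times_s})

    onsets = [0.101, 0.104, 0.512, 0.998, 1.004]
    fine   = quantize(onsets, 2)     # nearly every transient survives
    coarse = quantize(onsets, 250)   # events collapse toward pulse points
    ```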

    In conjunction with Isomer’s query function, multiple versions of the abstraction layer can exist simultaneously in discrete databases. This design is a key feature of Isomer that encourages the emergence of a desired representation schema, as dictated by the specific task the user has in mind.

  • The Pleasures of Music

    Much of what currently passes as interdisciplinary work in computational creativity is profoundly lacking in artistic experience and understanding. This is a pervasive and critical flaw — after all, the creative side of the equation is our primary goal, is it not?

    To help us consider a more appropriate balance, let’s briefly examine how human composers and analysts approach their work.

    Music as Data

    Our first challenge is that musicians don’t generally think of raw musical materials as data. Immediately we find a conflict between the computational and creative — how can we compute without data? The answer lies with how we collect, represent, and ultimately select musical elements for processing. The key issue we need to consider is context.

    Interpreting the musical context around individual events is what allows us to develop a meaningful picture of how groups of musical events function. Understanding these musical effects requires computational systems to make aesthetic judgments based on experience (or training in this case). I’m developing my own approaches to this set of challenges, but suffice it to say…

    Any composer who deals with musical materials as anything but a means to achieve musical effect is not likely to produce results worth exploring or repeating.

    A Composer’s View

    Composers seek combinations of musical patterns that take on an irrepressible life of their own. As someone with decades of experience attempting this, I can say that it’s an intimate and perilous journey with few external guiding references. Every step forward simultaneously removes options and presents new opportunities, and when a combination of musical events expressing strong potential finally presents itself, there are no guarantees.

    Unfortunately, this first-hand description of the compositional process isn’t very helpful when designing creative algorithms. To do that, we need ways of defining and comparing the effect of musical ideas to learn what sinks and what floats. In recent years my work as composer, pianist, and researcher has focused on developing a computation-friendly approach to this problem and I believe reasonable solutions aren’t that far off.

    What About the Analyst?

    On the other hand, analysis (whether computational or theoretical) encourages the deconstruction and justification of existing musical ideas by developing a highly-relational network of connections between events with clearly definable attributes. Strange as it may seem, some of these relationships may be carefully planned and executed by the composer, but many are not.

    Ultimately, analysis aims to reveal a tightly woven fabric of methodically cataloged connections, regardless of the composer’s design or intent, and without the natural aesthetic judgments mandated by the compositional process.

    Analysis is a necessary component of model-based computational composition. But for analysis to become directly relevant to the compositional process, it must be capable of addressing a wide range of emotional potential to provide a meaningful musical experience. In other words, it must evolve to encompass the pleasures that music provides to us.

    Pleasures of Music

    What exactly do I mean by the “pleasures” of music?

    Well, I don’t mean the types of emotional associations that attach themselves to musical textures, forms, and ideas through cultural conditioning and repetition. No. I’m speaking of an aurally perceptible, inner geometry that attaches itself to our collective nervous system and forces us to repeat and examine it for generations upon generations.

    While it may be difficult to codify them, these designs are recognizable and valued because we are human. We find a pleasure in their architecture that universally speaks to us, regardless of time and cultural distance. And they did not come to exist in a vacuum — they have been under constant development for thousands of years through the myriad generations that precede us. Lucky for us, these designs can be found throughout the musical canon.

    To create music worthy of sustained exploration and repetition, computational creativity must seek out the patterns and trends that elucidate these universal pleasures. This process begins with our attempting to understand fundamental musical aspects like memory, identity, expectation, and surprise, and with developing a data-driven understanding of the expressive qualities of musical character.

  • Automating Musical Descriptions: A Case Study

    It’s widely accepted that music elicits similar emotional responses from culturally connected groups of human listeners. Less clear is how various aspects of musical language contribute to these effects.

    Funded by a Mellon Foundation Research Grant through Dickinson College Digital Humanities, our research into this question leverages cognition-based machine listening algorithms and network analysis of musical descriptors to identify the connections between musical affect and language.
