Chroma Conversion
Master’s Project Report
By Irvn Rynning
Advised by Dr. S. Semwal
GMI Program
UCCS
15 December 2015
Chroma Conversion
Signature Page
Dr. Sudhanshu Semwal – Committee Chair: ____________________________
Dr. Glen Whitehead – Committee Member: ___________________________
Dr. Jonathon Ventura – Committee Member: _____________________________
For human–computer interaction, most devices use the fingers, and sometimes the whole hand. Short of organ pedals, the fingers remain the primary means of triggering musical instrument digital interface (MIDI) synthesizers. This project offers an alternative: the hue and intensity of colored objects manipulated or worn as clothing or ornament by a performer, detected by a camera across the several zones of the video image, are used to drive MIDI synthesizers. The method is explained in this report, along with its pitfalls and perturbations, as well as its performance and paradigm.
This project was motivated by previous work that took red-green-blue (RGB) data from the pixel under the mouse cursor and derived MIDI note-on messages with the appropriate channel and velocity values. That earlier algorithm also included triggers for total luminosity (grey-scale) and for cyan-magenta-yellow (CMY). All work contained here is original to the best of the author's knowledge. The programming environment used is Cycling74's Max version 7.0, a visual environment similar to MathWorks' Simulink and Miller Puckette's Pure Data.
1.1 Objective
The goal of this project is to show how color hue and intensity can be used to create a soundscape with MIDI synthesizers and motion in real time, i.e., in a live performance.
1.2 Background
Color information from a USB digital video camera can be decoded easily with simple arithmetic. Because the values arrive as RGB triples with eight-bit resolution, conveniently rescaled to the inclusive range between zero and one, devising a recognition algorithm is relatively straightforward.
1.3 Prior Work
My own prior work was the development of a pixel-reader project that directly read the RGB values of a computer image at the cursor location and produced up to nine channels of MIDI triggers. It should be noted that one can just as easily trigger samples stored on a hard disk drive, although pitch information is then lost.
The algorithm essentially compared one channel of color information (i.e., R, G, or B) to the sum of the other two multiplied by a constant j or k. If the comparison were greater, e.g., R > j*(G+B), a note was triggered whose pitch was based upon the intensity of that channel. Another variant used the intensity to vary the velocity (volume). If the comparison went the other way, e.g., R < k*(G+B), the presence of red's complement cyan was inferred and handled similarly. The constants j and k were independent in either implementation. Initially the note values were mapped linearly: if R = 0, note 24 (a C, about 33 Hz) is triggered; if R = 1, note 84 (also a C, five octaves higher, about 1047 Hz).
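As an illustration, a minimal sketch of this comparison and mapping follows, written in Python rather than Max; the function name and the treatment of cyan as the mean of G and B are assumptions made only for this sketch.
# Sketch of the prior-work trigger for the red channel (illustrative).
# r, g, b are 8-bit colour values in the range 0..255.
def red_trigger(r, g, b, j=0.75, k=0.75):
    """Return (midi_note, is_complement) or None if nothing fires."""
    if r > j * (g + b):
        # Red dominates: map its intensity linearly onto MIDI notes 24..84.
        return int((r / 255.0) * 60.0 + 24), False
    if r < k * (g + b):
        # Red is weak: infer its complement, cyan (taken here as the
        # mean of G and B), and map it the same way.
        cyan = (g + b) / 2.0
        return int((cyan / 255.0) * 60.0 + 24), True
    return None

print(red_trigger(200, 30, 40))   # -> (71, False)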
This work is divided into several sections:
1. Camera selection, including frame rate and saturation (color bias).
2. Constant selection, with the capability of altering each constant individually.
3. Zone division: the video image is split into sections both vertically and horizontally.
4. The processing of the RGB and other parameters for each zone.
2.1 Cameras
The market is rather devoid of high-performance cameras that stream digital video over USB. Standard USB web cameras were chosen: both the Logitech C920 and the IPEVO Point 2 View were used in development, as well as the built-in cameras of an Apple MacBook Pro 17" and a Dell Latitude E6420. Modern digital single-lens reflex (SLR) cameras offer only a non-continuous option over USB: they can take stills or record video, but do not offer a real-time image. Because development usually proceeded on a laptop with a built-in camera, a camera-select function was necessary. Additionally, both the frame rate and color saturation were implemented with default values, discussed below.
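Outside of Max, an analogous camera-selection step can be sketched in Python with OpenCV; this is not how the project does it (it uses jit.grab, described in section 3.1), and the device indices and property values below are assumptions.
import cv2  # OpenCV, standing in for Max's jit.grab in this sketch

# Probe the first few device indices; OpenCV has no portable name
# listing, so indices play the role of getdevicelist.
available = []
for index in range(4):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        available.append(index)
    cap.release()
print("responding camera indices:", available)

# Open the chosen camera and apply default frame rate and saturation;
# whether a given property takes effect depends on the camera driver.
cam = cv2.VideoCapture(available[0])
cam.set(cv2.CAP_PROP_FPS, 30)
cam.set(cv2.CAP_PROP_SATURATION, 0.5)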
2.2 Constants
With the capability of up to nine zones, it was necessary to have both a default constant (here 0.75) and a method to alter it in real time so as to add or remove color triggering.
2.3 Zone definition
The video image is essentially a two-dimensional matrix, with each element holding a three-element RGB vector, not unlike a still image. The frame of the video was first split vertically, then horizontally. There was no provision for further division in the performance, only the option of turning off any of the zones.
2.4 Triggering
With the RGB values and the constant, the triggers can be generated, converted to MIDI notes, and sent out on a MIDI channel that is also adjustable in real time. The algorithm is discussed in detail below.
3.1 Video
The video object is created with the jit.grab() function in Max (jit stands for Jitter, Max's video-processing library), which offers several parameters that can be controlled while running, i.e., in real time. The first is getdevicelist, which when called returns a list of available cameras. Since development usually proceeded on a Windows 7 laptop or an iMac with a built-in video camera, it was necessary to select the proper camera. Additionally, the saturation parameter may be adjusted: a saturation value of zero gives greyscale, 1 is normal, and -1 inverts all colors. A default value of four was used to exaggerate the colors (loadmess 4), later lowered to two for the performance. Note that in this screen shot the HD Pro Webcam C920 is the camera shown. The timing shown in Let's Do It! is 16 milliseconds, about a 60 Hz/30 frames per second rate.
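The effect of the saturation control can be approximated as scaling each pixel away from its greyscale value; the sketch below only mimics the Jitter behaviour described above and is not taken from the patch.
import numpy as np

def adjust_saturation(rgb, s):
    """Scale colours about their greyscale value.

    rgb: float array of shape (height, width, 3), values in 0..1.
    s = 0 gives greyscale, 1 leaves the image unchanged, -1 pushes
    colours toward their complements about grey, and larger values
    exaggerate the colour content (e.g. the default of 4 above).
    """
    grey = rgb @ np.array([0.299, 0.587, 0.114])   # luminance per pixel
    grey = grey[..., np.newaxis]
    return np.clip(grey + s * (rgb - grey), 0.0, 1.0)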
3.2 Constants and other start-up parameters
An option for reversing the image is available with the Mirror Image button, which was used when the camera was directed at the operator. The ZOOM button was used to maximize the interface on the computer's display; a preset value of 40 resized the window to 1920 x 1200. The Tolerance is the global constant used in each detector, and preloads to 0.75. The qmetro is a vertical slider that gives an integer for the frame timing in milliseconds, 16 ms (62.5 Hz/31 frames per second) being the default.
3.3 Zone Division
The output of the jit.qt.grab function is the camera video. By invoking the scissors procedure (not shown), the video matrix is first divided into columns (three, shown without edges in the center of this shot) and then into three rows; both are shown in the lower black box of figure 1, as are the labels for each zone. These labels are used in the selection process for the RGB processing.
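A minimal sketch of this two-step split, and of the per-zone averaging used later in the RGB processing, is given below; the Max patch does the same with the scissors object, and the array shapes assumed here are only illustrative.
import numpy as np

def split_zones(frame, rows=3, cols=3):
    """Split a (height, width, 3) frame into rows*cols labelled zones.

    Pixels left over by the integer division are ignored at the right
    and bottom edges.
    """
    h, w, _ = frame.shape
    zh, zw = h // rows, w // cols
    zones = {}
    for y in range(rows):
        for x in range(cols):
            zones[(x + 1, y + 1)] = frame[y * zh:(y + 1) * zh,
                                          x * zw:(x + 1) * zw]
    return zones

def zone_average_rgb(zone):
    """Average R, G and B over every pixel of one zone."""
    return zone.reshape(-1, 3).mean(axis=0)

# e.g. zone_average_rgb(split_zones(frame)[(2, 1)]) for zone "p 21"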
Figure 1 Video importing— “open” and “close” must be done manually. A test image color wheel is shown.
3.4 Zone Processing
The output of the second scissors procedure is labeled set p xy, where x and y are the coordinates of the zone. Each of the nine zones has a preset, graphically illustrated, which corresponds to its location. Note that for several of them the preset video source can be altered, so that some zones can carry two different, parallel processing paths.
Figure 2 The nine zones
Figure 3 shows a close-up of one zone. The set buttons select the source, and an annunciator, p21, is shown in this image over the selected slice of the video. To the right is a switch, which enables either the processed video from the camera or a stack of fixed colors shown to its right. These colors are given as four-plane (alpha-red-green-blue) values, such as 0 0 0 0 and 255 255 0 0, along with a swatch of the actual color. They are used for testing; by replacing them with sliders one could perform from here, and an earlier version had the sliders.
The processed RGB values are the averages over the input matrix of the selected zone. Once the RGB source is selected, the function (named FuncFlush 3a in the figure) is called, with the following input parameters:
1. the three RGB values;
2. an emergency off button (called flush in Max terminology), which sends note-offs to all outputs;
3. the three channel sliders, here red "9", green "10", blue "11";
4. the tolerance, with two presets (0.99 and 0.44) and a floating-point slider; the three inputs are in parallel, with the last one set taking precedence;
5. the last parameter, shown as "port j", "port k", and "port m", which controls the output destination of the MIDI notes.
Three different destinations were implemented: "k" is a generic USB-to-MIDI converter cable; "m" is a Tascam US-200 USB MIDI/audio converter, with both MIDI connectors and analog audio out; and "j" is the internal synthesizers, such as Native Instruments' Kontakt Player, a collection of sampled instruments that respond to pitch and velocity information. Their audio is sent through USB to the Tascam US-200, converted to analog, and connected to the mixer.
Figure 3 A zone, close up. The bottom three buttons indicate a note on for that Red-Green-Blue channel
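For reference, sending the resulting note-on messages from outside Max could be sketched with the python-rtmidi package as below; the port index and channel are assumptions, and the project itself routes notes through Max's MIDI objects.
import rtmidi  # assumption: the python-rtmidi package is installed

midi_out = rtmidi.MidiOut()
print("available MIDI ports:", midi_out.get_ports())
midi_out.open_port(0)   # e.g. a USB-to-MIDI cable standing in for "port k"

# Note-on: status 0x90 plus (channel - 1), then note number and velocity.
channel, note, velocity = 9, 60, 100
midi_out.send_message([0x90 | (channel - 1), note, velocity])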
3.5 The Flush Functions
The flush functions (there are nine) implement the original algorithm developed in the project discussed in section 1.3, Prior Work. They are called flush functions because of the added Emergency parameter, which flushes the generated note buffer in the event of a "stuck on" MIDI note. With the aforementioned parameter list the RGB values are compared. Given the constant k, if R > k*(G+B), a note-on is triggered for the red output and a small red light is lit on the zone's processor (see figure 3). Similarly, the green or blue light is lit for the Green and Blue outputs. The note number is determined by the following arithmetic expression:
for the red value r in the interval {0 .. 255},
note number = (r/255.0) * 60.0 + 24; range is 24 through 84.
So if r = 0, the note is 24; if r = 255, the note is 84. Note that the conversion to integer must be done as the last operation.
Another expression set used for the flush functions was as follows:
red note number = (r/255.0) * 39.0 + 31; range is 31 through 70.
green note number = (g/255.0) * 59.0 + 25; range is 25 through 84.
blue note number = (b/255.0) * 18.0 + 52; range is 52 through 70.
A third one used a single instrument, but assigned the red, green, and blue notes to fixed, mutually exclusive ranges:
red note number = (r/255.0) *15.0 + 30; range is 30 through 45.
green note number = (g/255.0) *15.0 + 46; range is 46 through 61.
blue note number = (b/255.0) *15.0 + 62; range is 62 through 77.
Still a fourth one, which inverted the intensity/pitch relationship, was:
red note number = (1.0 - r/255.0) * 15.0 + 30; range is 45 down to 30.
green note number = (1.0 - g/255.0) * 15.0 + 46; range is 61 down to 46.
blue note number = (1.0 - b/255.0) * 15.0 + 62; range is 77 down to 62.
An internal variable is the length of the note, which is adjustable in real time. The unit of measurement is the millisecond, as seen in figure 5. Note that the value is the same for all three note channels.
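Putting the pieces together, one flush function can be sketched as follows, using the first mapping above and a python-rtmidi output like the one sketched earlier; the timer used for the note-off is only a stand-in for the patch's duration handling, not the Max implementation itself.
import threading

def flush_function(midi_out, r, g, b, k, channels, duration_ms,
                   emergency=False):
    """One zone's trigger logic (illustrative sketch, not the Max patch).

    r, g, b: averaged 8-bit colour values for the zone; k: tolerance;
    channels: (red_ch, green_ch, blue_ch), 1..16; duration_ms: note
    length, shared by all three colour channels.
    """
    if emergency:
        # "Flush": all-notes-off (controller 123) on each channel used.
        for ch in channels:
            midi_out.send_message([0xB0 | (ch - 1), 123, 0])
        return
    comparisons = [(r, g + b), (g, r + b), (b, r + g)]
    for (value, others), ch in zip(comparisons, channels):
        if value > k * others:
            note = int((value / 255.0) * 60.0 + 24)   # first mapping, 24..84
            velocity = int((value / 255.0) * 127)     # intensity as volume
            midi_out.send_message([0x90 | (ch - 1), note, velocity])
            # Schedule the matching note-off after duration_ms.
            threading.Timer(duration_ms / 1000.0, midi_out.send_message,
                            ([0x80 | (ch - 1), note, 0],)).start()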
Figure 4 Flush Function (typical) one of nine. The parameters are routed in the upper left.
Figure 5 Flush Function detail, showing "duration" to adjust note on length to 333 ms.
3.6 Analog Mixing
The MIDI notes were routed through either of the USB converters into a Yamaha MJC-8 switcher with standard DIN-5 connectors, which then passed the serial data to the several synthesizer modules. Typically "port m" went to the Yamaha TX-716 rack and the Korg EX-8000, MIDI channels 1 through 8. The "port k" connection went through the MJC-8 to the Emu Proteus 3 and the Roland D-110, with up to 15 channels. Both of the latter are multi-timbral, meaning they can receive and play notes on all channels, each channel going to an individual, separate patch/sound. The Yamaha and Korg units are one channel, one sound. All units are stereo except the Yamaha TX.
Figure 6 Synthesizer rack, including power amp (on bottom) and compressor (on top)
Figure 7 Mackie mixer
4.1 Testing
To test the design, a set of colored images was printed on paper. These images, with pure red, green, and blue, their CMY complements, and several mid-tones, were then placed in the zone under test. Tee shirts of suitable hue were also used. Note that the magenta and blue panels (in the center of figure 8, top and bottom) appear darker on the screen.
Figure 8 A collage of the several test images. Shown (clockwise) are 255-255-0, 255-0-255, 0-255-0, 255-0-0, 0-0-255, 0-128-128. Each
was on a separate sheet.
Each of the test images was exposed to a particular zone, and the output note numbers were created and passed to the synthesizers. Both a neutral background and a black one were used. Several audio recordings were made illustrating the pitch variation and the adjustment of the duration and tolerance parameters. It was decided that both the upper and lower zones would use longer note durations, 1000–2000 ms, with patches having slower attack times and longer sustain and release. The middle zones would use comparatively short durations, 200–400 ms, with quick-attack instruments.
With this implementation and all nine zones on, the frame rate was noticeably slower, about 5 frames per second, and a temperature monitor on the computer showed a rise to 90 °C when the camera was enabled. Even with only three zones the temperature stayed high. An additional cooler was placed underneath the laptop, which kept the temperature in the 80s.
4.2 Performance
The final performance took place in the UCCS Visual and Performing Arts (VAPA) space, the Osborne Theatre in University Hall, on 7 December 2015. Two dancers volunteered from the Modern Dance class taught by Tiffany Tinsley and, after a brief warmup, proceeded to test the design. The performance was recorded by Dr. S. Semwal, assembled in Adobe Premiere, and posted to YouTube as GMI: The Q Biall.
First, some glaring observations: the light intensity was insufficient compared with the testing environment, and the color spectrum was deficient in green. The field of the camera was too wide, which also did not match testing; to compensate for the width, the zones should have been more numerous. The camera image should also be visible to the dancers, so that they know their position with respect to the zone definitions. The note duration settings were perhaps too rhythmic: the three durations should have been relatively prime rather than identical. For example, RGB durations of 333 ms/400 ms/500 ms would have imparted a 6-over-5 or 5-over-4 feel, which is not bad, but 192/311/503 would have seemed pseudo-random (those three values are pairwise relatively prime). The note durations were adjustable in real time, but neither the color balance nor the intensity in the camera controls were. To effect the latter, consider the basic algorithm
note number = (color / 255.0) * x + n, where color ∊ {0 .. 255}, x is floating point, and n is an integer.
This needs to be altered so that calibration can be done beforehand. That is, with the stage black, verify that no sound is triggered; then, with the minimum color from the sample image, adjust the addend n such that the lowest (or highest, if one uses an inverted algorithm) note is produced; the color value will necessarily be above zero while the other colors read zero. Secondly, with maximum brightness from the sample image, set the multiplier x so that the highest (or lowest) note is made. And if the method is such that one instrument is triggered by all three colors, albeit with different, mutually exclusive pitch ranges, be prepared with the six color swatches of dimmest and brightest. Note that the constant used for the color comparison may also need alteration; this is already adjustable in real time, and its selection must consider the complexity, that is, the number of notes to be sounded given an admixture of two colors; a lower constant yields up to three notes per zone.
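A calibration along these lines might be sketched as below; the measured swatch values in the example are invented, and the target notes assume the 24-84 mapping.
def calibrate(min_colour, max_colour, lowest_note=24, highest_note=84):
    """Solve note = (colour / 255.0) * x + n for x and n so that the
    dimmest measured swatch maps to lowest_note and the brightest to
    highest_note (swap the targets for an inverted algorithm)."""
    lo, hi = min_colour / 255.0, max_colour / 255.0
    x = (highest_note - lowest_note) / (hi - lo)
    n = int(round(lowest_note - lo * x))
    return x, n

# Example: the dimmest swatch reads red = 40 under stage light and the
# brightest reads 230:
# x, n = calibrate(40, 230)   # x is about 80.5, n about 11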
Further development would include the above-mentioned calibration technique, as well as aesthetic judgements regarding synthesizer patches and timing variables, spatial arrangement (i.e., four-channel audio), and a more cohesive choreography, as opposed to improvisation. Additionally, CPU usage needs to be investigated to reduce the heating issues and to weigh the use of internal samples and sounds against outboard modules. A possible configuration might use a second and third computer to receive the MIDI note data and run internal synthesizers/samplers.
The camera selection could also be altered: other webcams might cause less processor overhead (which caused the heating in our experiments, as mentioned earlier). There is a class of cameras used in commercial security installations that are color, provide manual or automatic focus, and have zooming capability, with lenses using the so-called C/CS mount. Many have USB connections, though the environment used, Cycling74's Max/MSP, may not recognize them. The camera-control parameters (brightness/contrast/color bias, zoom, exposure/f-stop) were not implemented in this performance.
Were this a juxtaposition of 100 years ago with current technology, it would form the basis of a roadshow: add some opening acts and sideshow acts, and take a world tour. Another implementation could be an installation in a public venue, providing feedback to the movements of an unsuspecting visitor. The method could also be used as a quasi-composition tool or instrument, with a narrow camera angle at close range (as was used in testing), experimenting with well-defined color swatches; tee shirts and other fabrics offer less reflectivity than paper prints and thus give a better-defined performance.
i. https://youtu.be/cvHrMwpLOWI, YouTube video on the "irvn rynn" channel.
There are no other references.