
Chroma Conversion

Most human–computer interaction devices are driven by the fingers, sometimes the whole hand, and short of organ pedals the fingers remain the primary means of triggering musical instrument digital interface (MIDI) synthesizers. This project offers an alternative: color intensity and hue, controlled through objects manipulated or worn by the performer as clothing or ornament, are detected by a camera across several zones of the video image and converted into MIDI triggers. This report explains the method, along with its pitfalls, perturbations, performance, and paradigm.

Master's Project Report
By Irvn Rynning
Advised by Dr. S. Semwal
GMI Program, UCCS
15 December 2015

Signature Page
Dr. Sudhanshu Semwal – Committee Chair: ____________________________
Dr. Glen Whitehead – Committee Member: ____________________________
Dr. Jonathon Ventura – Committee Member: ____________________________

This project was motivated by a previous work that took red-green-blue (RGB) data from the mouse cursor and derived MIDI note-on messages with the appropriate channel and velocity values. The previous algorithm included triggers for total luminosity (grey-scale) as well as cyan-magenta-yellow (CMY). All work contained here is original to the best of the author's knowledge. The programming language used is Cycling74's Max version 7.0, a visual environment similar to MathWorks Simulink and Miller Puckette's Pure Data.

1.1 Objective
The goal of this project is to show how color hue and intensity can be used to create a soundscape using MIDI synthesizers and motion in real time, i.e., in a live performance.

1.2 Background
Color information from a USB digital video camera can be easily decoded through arithmetic techniques. Since the values arrive conveniently as RGB components with eight-bit resolution, converted to the inclusive range between zero and one, devising a recognition algorithm is relatively straightforward.

1.3 Prior Work
My own prior work was the development of a pixel-reader project that directly read the RGB values of a computer image at the cursor location and developed up to nine channels of MIDI triggers. One can also trigger samples stored on a hard disk drive, although pitch information is then lost. The algorithm essentially compared one channel of color information (i.e., R, G, or B) to the sum of the other two multiplied by a constant j or k. If the comparison were greater, e.g., R > j*(G+B), a note was triggered whose pitch was based on the intensity of the datum. Another algorithm was developed which used the intensity to vary the velocity (volume). If the correlation were the opposite, e.g., R < k*(G+B), the presence of red's complement, cyan, was inferred and handled similarly. The constants were independent in either implementation. Initially the range of note values was mapped linearly: if R = 0, note 24 (a C, about 33 Hz) is triggered; if R = 1, note 84, also a C, five octaves higher at about 1046 Hz.
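For illustration, a minimal Python sketch of this prior-work comparison and linear note mapping is given below. The function name, the normalized RGB inputs in [0, 1], and the default constant value are assumptions chosen for the example; the original was implemented as a Max patch rather than text code.

    def red_trigger(r, g, b, j=0.75):
        """Return a MIDI note number if the red channel dominates, else None.
        r, g, b are normalized to [0, 1]; j is the comparison constant."""
        if r > j * (g + b):
            # Linear map: r = 0 -> note 24 (C1, ~33 Hz), r = 1 -> note 84 (C6, ~1046 Hz).
            # Conversion to integer is done as the last operation.
            return int(r * 60.0 + 24)
        return None

    # Example: a strongly red pixel triggers a note near the top of the range.
    print(red_trigger(0.9, 0.1, 0.2))   # -> 78

The complementary case, R < k*(G+B) inferring cyan, would use its own independent constant k and a similar mapping, as described above.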
This work is divided into several sections:
1. Camera selection, including frame rate and saturation (color bias).
2. Constant selection, with the capability of altering each individually.
3. Zone division: the video image is split into sections both vertically and horizontally.
4. The processing of the RGB and other parameters for each zone.

2.1 Cameras
The market is rather devoid of high-performance cameras that stream digital video over a USB interface. A standard USB web camera was chosen; both the Logitech C920 and the IPEVO Point 2 View were used in development, as well as the built-in cameras in an Apple MacBook Pro 17" and a Dell Latitude E6420. Modern digital single-lens reflex (SLR) cameras offer only a non-continuous option over USB: they can take stills or record video, but do not offer a real-time image. As development usually proceeded on a laptop with a built-in camera, a camera-select function was necessary. Additionally, both the frame rate and the color saturation were implemented with default values, discussed below.

2.2 Constants
With the capability of up to nine zones, it was necessary to have both a default constant (here 0.75) and a method to alter it in real time to add or subtract color triggering.

2.3 Zone Definition
The video image is essentially a two-dimensional matrix, each element holding a three-element RGB vector, not unlike a still image. The frame of the video was first split vertically, then horizontally. There was no provision for further division in the performance, only the option of turning off any of the zones.

2.4 Triggering
With the RGB values and the constant, the triggers can be created, converted to MIDI notes, and sent out a MIDI channel, also adjustable in real time. The algorithm is discussed in detail below.

3.1 Video
To create the video object, the jit.grab() function in Max is used (jit stands for Jitter, the name of its video-processing library); it offers several parameters that can be controlled while running (in real time). The first is getdevicelist, which when called returns a list of the available cameras. Since development usually proceeded on a Windows 7 laptop or an iMac with a built-in video camera, it was necessary to select the proper camera. Additionally, the saturation parameter may be adjusted: a saturation value of zero is greyscale, 1 is normal, and -1 inverts all colors. A default value of four was used to exaggerate the colors (loadmess 4), and this was lowered to two for performance. In this screenshot the HD Pro Webcam C920 is shown. The timing shown in Let's Do It! is 16 milliseconds, about a 60 Hz rate.

3.2 Constants and Other Start-up Parameters
An option for reversing the image is available with the Mirror Image button, which was used when the camera was directed at the operator. The ZOOM button maximizes the interface to the computer's display screen; a preset value of 40 resized the window to 1920 x 1200. The Tolerance is the global constant used in each detector and preloads to 0.75. The qmetro is a vertical slider that gives an integer for the frame timing in milliseconds, with 16 ms (about 62.5 Hz) as the default.

3.3 Zone Division
The output of the jit.qt.grab function is the camera video. By invoking the scissors procedure (not shown), the video matrix is first divided into columns (three, without edges, shown in the center of this shot) and then into three rows; both are shown in the lower black box of figure 1, as are the labels for each zone. These labels are used in the selection process for the RGB processing.

Figure 1: Video importing. "Open" and "close" must be done manually. A test-image color wheel is shown.
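As a rough illustration of this zone division, and of the per-zone averaging used in section 3.4 below, the following Python/NumPy sketch splits a frame into a 3 x 3 grid and computes the mean RGB value of each zone. The array shapes and the function name are assumptions for illustration; in the project this is done with Jitter's scissors procedure and matrix operations inside Max.

    import numpy as np

    def zone_means(frame, rows=3, cols=3):
        """Split an H x W x 3 RGB frame into rows x cols zones and
        return a (rows, cols, 3) array of mean RGB values per zone."""
        h, w, _ = frame.shape
        means = np.zeros((rows, cols, 3))
        for i in range(rows):
            for j in range(cols):
                zone = frame[i*h//rows:(i+1)*h//rows, j*w//cols:(j+1)*w//cols]
                means[i, j] = zone.mean(axis=(0, 1))   # average R, G, B over the zone
        return means

    # Example with a random 480 x 640 frame of 8-bit RGB values.
    frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    print(zone_means(frame).round(1))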
3.4 Zone Processing
The output of the second scissors procedure is labeled set p xy, where x and y are the coordinates of the zone. Each of the nine zones has a preset, graphically illustrated, which corresponds to its location. For several of them the preset video source can be altered, so that some zones can run two different processes in parallel.

Figure 2: The nine zones.

Figure 3 shows a close-up of a zone. The set buttons select the source, and an annunciator p21 is shown in this image over the slice of the video selected. To the right is a switch, which enables either the processed video from the camera or a stack of fixed colors shown to its right. The RGB values of these colors are given (0 0 0 0, 255 255 0 0, et cetera), along with swatches of the actual colors. These are used for testing, though by replacing them with sliders one could perform from here; an earlier version had the sliders.

The processed RGB values are the averages over the input matrix of the selected zone. From the selection of the source of the RGB values, the function (named FuncFlush 3a in the figure) is called. Its input parameters are as follows:
1. the three RGB values;
2. an emergency off button (called flush in Max terminology), which sends note-off to all outputs;
3. the three channel sliders, here red "9", green "10", blue "11";
4. the tolerance, with two presets (0.99 and 0.44) and a floating-point slider; the three are in parallel, with the last one set taking precedence;
5. the last parameter, shown as "port j", "port k", "port m", which controls the output of the MIDI notes.

Three different destinations were implemented: "k" is a generic USB-to-MIDI converter cable; "m" is a Tascam US-200 USB MIDI/audio converter with both MIDI connectors and analog audio out; and "j" is the internal synthesizers such as Native Instruments' Kontakt Player, a group of sampled acoustic instrument sounds that respond to pitch and velocity information. Their audio is sent through USB to the Tascam US-200, converted to analog, and connected to the mixer.

Figure 3: A zone, close up. The bottom three buttons indicate a note-on for the Red, Green, and Blue channels.

3.5 The Flush Functions
The flush functions (there are nine) implement the original algorithm developed in the project discussed in section 1.3, Prior Work. They are called flush functions because of the added Emergency parameter, which flushes the generated note buffer in the event of a "stuck on" MIDI note. With the aforementioned parameter list, the RGB values are compared. Given the constant k, if R > k*(G+B), then a note-on is triggered for the red output and a small red light is lit on the zone's process; see figure 3. Similarly, the green or blue light is lit for the Green and Blue outputs. The note number is determined by the following arithmetic expression, for the red value r in the interval {0 .. 255}:

    red note number = (r/255.0) * 60.0 + 24;  range is 24 through 84.

So if r = 0, the note is 24; if r = 255, the note is 84. Note that the conversion to integer must be done as the last operation.

Another expression set used for the flush functions was:

    red note number   = (r/255.0) * 39.0 + 31;  range is 31 through 70.
    green note number = (g/255.0) * 59.0 + 25;  range is 25 through 84.
    blue note number  = (b/255.0) * 18.0 + 52;  range is 52 through 70.

A third set used a single instrument, but assigned the red, green, and blue notes to fixed, non-overlapping ranges:

    red note number   = (r/255.0) * 15.0 + 30;  range is 30 through 45.
    green note number = (g/255.0) * 15.0 + 46;  range is 46 through 61.
    blue note number  = (b/255.0) * 15.0 + 62;  range is 62 through 77.

A fourth set, which inverted the intensity/pitch relationship, was:

    red note number   = (1.0 - r/255.0) * 15.0 + 30;  range is 45 down to 30.
    green note number = (1.0 - g/255.0) * 15.0 + 46;  range is 61 down to 46.
    blue note number  = (1.0 - b/255.0) * 15.0 + 62;  range is 77 down to 62.
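As a compact illustration of one flush function's logic, here is a Python sketch using the third (fixed-range) mapping set above. The 0-255 inputs, the single tolerance constant applied to all three comparisons, and the returned event list are simplifications for the example; the actual implementation is a Max patch that sends real note-on/note-off messages and also handles the emergency flush.

    def flush_function(r, g, b, tolerance=0.75, duration_ms=333):
        """Compare each channel against the other two and return (note, duration) events.
        r, g, b are 8-bit values in 0..255."""
        events = []
        if r > tolerance * (g + b):
            events.append((int((r / 255.0) * 15.0 + 30), duration_ms))  # red: notes 30-45
        if g > tolerance * (r + b):
            events.append((int((g / 255.0) * 15.0 + 46), duration_ms))  # green: notes 46-61
        if b > tolerance * (r + g):
            events.append((int((b / 255.0) * 15.0 + 62), duration_ms))  # blue: notes 62-77
        return events

    # Example: a saturated red zone average triggers only the red range.
    print(flush_function(230, 40, 60))   # -> [(43, 333)]

Note that lowering the tolerance lets two or even three channels fire in the same zone, which is the complexity trade-off discussed later in this report.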
An internal variable is the length of the note, which is adjustable in real time. The unit of measurement is the millisecond, as seen in figure 5. The value is the same for all three note-channels.

Figure 4: Flush function (typical), one of nine. The parameters are routed in the upper left.

Figure 5: Flush function detail, showing "duration" adjusting the note-on length to 333 ms.

3.6 Analog Mixing
The MIDI notes were routed through either of the USB converters into a Yamaha MJC-8 switcher with standard DIN-5 connectors. The switcher then passed the serial data to the several synthesizer modules. Typically "port m" went to the Yamaha TX-716 rack and the Korg EX-8000, MIDI channels 1 through 8. The "port k" connection went through the MJC-8 to the Emu Proteus 3 and the Roland D-110, with up to 15 channels. Both of the latter are multi-timbral, meaning they can receive and play notes on all channels, each channel going to an individual, separate patch/sound. The Yamaha and Korg units are one channel, one sound. All units are stereo except for the Yamaha TX.

Figure 6: Synthesizer rack, including power amp (on bottom) and compressor (on top).

Figure 7: Mackie mixer.

4.1 Testing
To test the design, a set of colored images was printed on paper. These images, with absolute red, green, and blue, their CMY complements, and several mid-tones, were then placed in the zone to be tested. Tee shirts of suitable hue were also used. Note that the magenta and blue panels (in the center of figure 8, top and bottom) appear darker on the screen.

Figure 8: A collage of the several test images. Shown (clockwise) are 255-255-0, 255-0-255, 0-255-0, 255-0-0, 0-0-255, and 0-128-128. Each was on a separate sheet.

Each of the test images was exposed to a particular zone, and the output note numbers were created and passed to the synthesizers. Both a neutral background and a black one were used. Several audio recordings were made illustrating the pitch variation and the adjustment of the duration and tolerance parameters. It was decided that both the upper and the lower zones would use longer note durations, 1000 to 2000 ms, and patches with slower attack times and longer sustain and release. The middle zones would use comparatively short durations, 200 to 400 ms, with instruments having a quick attack. With this implementation and all nine zones on, the frame rate was noticeably slower, about 5 frames per second, and a temperature monitor on the computer showed a rise to 90 °C when the camera was enabled. Even with only three zones the temperature stayed high; an additional cooler placed underneath the laptop kept it in the 80s.

4.2 Performance
The final performance took place in the UCCS Visual and Performing Arts (VAPA) space, Osborne Theatre in University Hall, on 7 December 2015. Two dancers volunteered from the Modern Dance class taught by Tiffany Tinsley, and after a brief warm-up they proceeded to test the design, which was recorded by Dr. S. Semwal, assembled in Adobe Premiere, and posted to YouTube as GMI: The Q Biall.[i]
First, some glaring observations: the light intensity was insufficient compared with the testing environment, and the color spectrum was deficient in green. The field of view of the camera was too wide, which also did not match testing. With regard to intensity, the zones should have been more numerous so that the width was compensated. The camera image should also be visible to the dancers, so that they know their position with respect to the zone definitions.

The note duration settings were perhaps too rhythmic; they should have been relatively prime rather than all the same. For example, RGB durations of 333 ms/400 ms/500 ms respectively might have imparted a 6-over-5 or 5-over-4 feel, which is not bad, but 192/311/503 would have seemed pseudo-random (those three values are pairwise relatively prime).

The note durations were adjustable in real time, but neither the color balance nor the intensity in the camera controls were. To address the latter, consider the basic algorithm:

    note number = (color / 255.0) * x + n

where x is floating point, n is an integer, and color ∊ {0 .. 255}. This needs to be altered so that calibration can be done beforehand. That is, first set the stage black such that no sound is triggered; then, at minimum color from the sample image, adjust the addend n such that the lowest (or highest, if one has an inverted algorithm) note is produced (the measured color value will necessarily be above zero, and the other colors will be zero). Secondly, with maximum brightness from the sample image, set the multiplier x so that the highest (or lowest) note is made. And if the method is such that one instrument is triggered by all three colors, albeit with different, mutually exclusive pitch ranges, be prepared with the six color swatches of dimmest and brightest. Note that the constant used for color comparison may also need alteration; this is already provided for in real time, and its selection must consider the complexity, that is, the number of notes to be sounded given the admixture of two colors (a lower constant gives up to three notes per zone).
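One way this calibration could be computed is sketched below in Python. The two-point linear fit (solving for x and n simultaneously from the dim and bright swatch readings) is my reading of the procedure described above, which adjusts n and x in sequence; the function names and example readings are assumptions for illustration.

    def calibrate(c_min, c_max, note_low=24, note_high=84):
        """Fit note = (c/255)*x + n so that the dimmest measured swatch (c_min)
        maps to note_low and the brightest (c_max) maps to note_high."""
        x = (note_high - note_low) / ((c_max - c_min) / 255.0)
        n = note_low - (c_min / 255.0) * x
        return x, n

    def note_number(c, x, n):
        """Apply the calibrated mapping; rounding to an integer is the last operation."""
        return int(round((c / 255.0) * x + n))

    # Example: on the dim stage the red swatch reads 40; under full light it reads 210.
    x, n = calibrate(40, 210)
    print(note_number(40, x, n), note_number(210, x, n))   # -> 24 84

The same measurement would be repeated for each color channel (and reversed for the inverted mapping set) using the dim and bright swatches mentioned above.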
Further development would include the above-mentioned calibration technique, as well as aesthetic value judgements regarding synthesizer patches and timing variables, spatial arrangement (i.e., four-channel audio), and a more cohesive choreography, as opposed to improvisation. Additionally, CPU usage needs to be investigated to reduce the heating issues and to maximize the use of internal samples and sounds versus outboard modules. A possible configuration might use a second and third computer to receive the MIDI note data and run internal synthesizers/samplers.

The camera selection could also be altered: other webcams might cause less processor overhead (which caused the heating in our experiments, as mentioned earlier). There is a class of cameras used in commercial security installations which are color, provide manual or automatic focus, and have zooming capability, with lenses using what is called a C/CS mount. Many have USB connections, though the environment used, Cycling74's Max/MSP, may not recognize them. The camera-control parameters (brightness/contrast/color bias, zoom, exposure/f-stop) were not implemented in this performance.

Were this a juxtaposition of 100 years ago with current technology, it would form the basis of a roadshow: add some opening acts and sideshow acts, and take a world tour. Another implementation could be an installation in a public venue, providing feedback to the movements of an unsuspecting visitor.

This method could also be used as a quasi-composition tool or instrument, with a small camera angle and close-in framing (such as was used in testing) and well-defined color swatches; tee shirts and other fabrics offer less reflectivity than paper prints, and thus provided better-defined performance.

Endnotes
[i] https://youtu.be/cvHrMwpLOWI, YouTube video on the "irvn rynn" channel. There are no other references.

The following endnotes are reproduced from the project proposal, but were not referenced in this report.

J. A. Paradiso and F. Sparacino, "Optical Tracking for Music and Dance Performance", Media Laboratory, M.I.T., Fourth Conference on Optical 3D Measurement Techniques, ETH, Zurich, September 1997.
M. Rohs, G. Essl, and M. Roth, "CaMus: Live Music Performance using Camera Phones and Visual Grid Tracking", Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.
MIDI Manufacturers Association, http://www.midi.org/aboutmidi/index.php. See reference 7, below.
Soundflower, available for download at https://github.com/RogueAmoeba/Soundflower.
Y. Zhang, "Research on the Synesthesia-Based Conversion for Electronic Digital Image and Digital Audio", Lecture Notes in Electrical Engineering, Volume 178, 2013, pp. 481-486.
RGB to HSV conversion, http://www.rapidtables.com/convert/color/rgb-to-hsv.htm.
N. Osmanovic, N. Hrustemovic, and H. R. Myler, "A Testbed for Auralization of Graphic Art", IEEE Region 5 2003 Annual Technical Conference, pp. 45-49, 2003.
D. Payling, S. Mills, and T. Howle, "Hue Music - Creating Timbral Soundscapes from Coloured Pictures", International Conference on Auditory Display, 2007.
Max/MSP, https://cycling74.com/products/max/ (academic discount offered). The original author, Miller Puckette, offers a free version called Pure Data at https://puredata.info.
MetaSynth homepage, http://www.uisoftware.com/MetaSynth/index.php.
R. Takahashi and J. H. Miller, "Conversion of Amino-Acid Sequence in Proteins to Classical Music: Search for Auditory Patterns", Genome Biology 2007, 8:405, 2007.
Gene2music, http://www.mimg.ucla.edu/faculty/miller_jh/gene2music/home.html.
G. Bologna, B. Deville, and T. Pun, "On the use of the auditory pathway to represent image scenes in real-time", Neurocomputing 72.4 (2009): 839-849.
"General MIDI Specifications", http://www.midi.org/techspecs/gm.php, 2015.