DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

Abstract

Neural network models for guitar amplifier emulation, while effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called “DDSP guitar amp,” that models the four components of a guitar amp (i.e., preamp, tone stack, power amp, and output transformer) using specific DSP-inspired designs. With a set of time- and frequency-domain metrics, we demonstrate that DDSP guitar amp achieves performance comparable with that of black-box baselines while requiring less than 10% of the computational operations per audio sample, thereby holding greater potential for use in real-time applications.

Index Terms—  Guitar amplifier modeling, differentiable digital signal processing, virtual analog modeling

1 Introduction

A physical guitar amplifier often comprises four main components [1]: a) the preamp, which boosts the input signal and shapes its sound characteristics, often introducing deliberate distortion; b) the tone stack, allowing frequency-specific adjustments; c) the power amp, which nonlinearly amplifies the signal to drive speakers and can add further coloration; and d) the output transformer, matching impedance and contributing additional nonlinearity. Each component introduces unique sound traits, with user-adjustable control knobs (such as “gain,” “bass,” “mids,” “treble,” and “master” volume) influencing the overall sound. There are intricate interactions between these components and their control knobs.

Guitar amplifier modeling aims to digitally replicate the behavior of physical amps and create a similarly wide range of tone variations. The task is a type of virtual analog (VA) modeling, which aims to digitally replicate real-world devices. In general, approaches to VA modeling span a spectrum from white-box to black-box methods, based on their reliance on circuit knowledge. White-box approaches [1, 2, 3, 4] require comprehensive circuit understanding; grey-box methods [5, 6, 7, 8, 9] utilize partial device knowledge, balancing interpretability and flexibility; and black-box approaches [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], in particular those based on deep neural networks (NN), have recently demonstrated impressive abilities in capturing complex nonlinear behaviors of amps and beyond using only input-output data, without prior circuit knowledge.

Despite their prevalence, existing black-box NN-based approaches have notable downsides. For example, while large NN models tend to perform well [11, 18], the incurred computational cost limits their portability to low-power devices and they may suffer from system instability in real-time usage [23]. The lack of interpretability is also an important issue, making it hard for musicians to intuitively fine-tune parameters for specific sonic characteristics [23].

Drawing inspiration from existing grey-box VA modeling methods and the general idea of differentiable digital signal processing (DDSP) [24, 25], this paper proposes a novel grey-box DDSP-based approach that offers a solution for efficient and interpretable guitar amp modeling. Specifically, we follow closely the physical design of guitar amps and propose differentiable modules corresponding to the four main physical components of an amp. As depicted in Figure 1, the four components are cascaded in the same way as their physical counterparts, and are conditioned by parameters derived from human-readable control knobs through a few dense multilayer perceptron (MLP) layers. To the best of our knowledge, this represents the first attempt at neural guitar amp modeling that uses such a modularized and DSP-inspired design. We refer to our model as “DDSP guitar amp.”


Fig. 1: Illustration of the proposed DDSP-based model for guitar amplifier modeling, named DDSP guitar amp.

Two works are highly related to ours. Eichas et al. [5] presented a grey-box method for modeling guitar amps with the Wiener-Hammerstein (WH) model rooted in nonlinear system identification [26, 27], achieving better interpretability. However, this work is preliminary in that it only modeled the preamp, ignoring the other components of an amp. Moreover, it did not consider control knobs. Miklánek et al. [6] instead employed circuit methods to model the tone stack component of guitar amps, with a mechanism to account for the control knobs related to the tone stack. However, for the other components (e.g., preamp and power amp), they simply used a black-box NN. A comprehensive approach that remains interpretable for all four components of a guitar amp and that captures the full range of knob-induced variations, as presented here, has yet to be developed.

Our approach has several key innovations: 1) it captures intricate knob-output relations across all amp components via end-to-end training, overcoming the limited parameter configs of previous interpretable models; 2) it maintains interpretability through the DDSP principles, enabling intuitive parameter adjustments that correspond to physical amp characteristics; and 3) it achieves performance comparable to or surpassing current black-box NN models, with less than 10% of the operations per audio sample, making it suitable for real-time applications in music production and performance.

We provide audio examples at https://ytsrt66589.github.io/ddspGuitarAmp_Demo/.

2 Proposed methods

Our objective is to develop a model $g$ that accurately emulates a target guitar amplifier. For an input signal $x[n]$ and control parameters $\phi$ representing knob settings, the model produces an output $y[n] = g(x[n], \phi)$ that closely matches the acoustic characteristics of the output of the physical device. We provide model details below.

2.1 Differentiable Preamp

The preamp, the first stage in a guitar amplifier, boosts the input signal and determines its initial tone [1]. It exhibits dynamic nonlinear behavior significantly influenced by its “bias point” [1, 5], which is determined by the DC operating conditions of the circuit. The bias point varies with the input signal level, leading to dynamic variations in harmonic content and compression characteristics. The “gain” knob affects the preamp behavior by altering both the signal amplification and frequency response, resulting in diverse sound variation.

Inspired by [5], we adapt the Wiener-Hammerstein (WH) model [27, 26] to capture this complex behavior. Here, a “WH block” is composed of two linear filters $H_1$ and $H_2$, each implemented as a cascade of low-shelf, peak, and high-shelf filters, and a nonlinear function $f(\cdot)$:

$$z'[n] = H_2(f(H_1(z[n]))), \tag{1}$$

where $z[n]$ and $z'[n]$ denote the input and output of a block.

Different from [5], which uses a static nonlinear function such as $\tanh$ for $f(\cdot)$, we propose to use a gated recurrent unit (GRU) with hidden size 1 as $f(\cdot)$ to specifically account for the behavior of the bias point in the preamp. This innovative use of a GRU effectively models the signal level-dependent nonlinearity and captures the dynamic response of the preamp. By leveraging the ability of GRU to learn complex nonlinearities from data, we maintain interpretability by emulating the “memory” effect in tube circuits [1]. Crucially, we find additionally that our GRU-based approach accurately represents asymmetric distortion, a key characteristic of tube preamps that cannot be modeled by symmetric functions such as $\tanh$. While there are methods to introduce asymmetry to $\tanh$ functions (e.g., adding DC offsets or combining multiple $\tanh$), these often require careful tuning and may not capture the full complexity of the preamp’s behavior. Moreover, unlike GRU, $\tanh$ is not able to model the time-dependent behaviors of preamps, since the output of $\tanh$ is not affected by the previous output. Finally, we note that, despite harnessing neural network capabilities, the GRU maintains computational efficiency suitable for real-time applications.
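To make the role of the hidden-size-1 GRU concrete, the following is a minimal sketch of a scalar GRU cell in plain Python. It uses a common textbook form of the GRU update (reset gate, update gate, candidate state) and is not taken from the authors' code; the class name `ScalarGRU` and all weight values are hypothetical placeholders, whereas in the model they would be learned from data.

```python
import math
from dataclasses import dataclass


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class ScalarGRU:
    """GRU cell with input size 1 and hidden size 1, so every weight is a scalar.

    The default weights are hand-picked placeholders, not trained values.
    """
    w_ir: float = 1.0  # input  -> reset gate
    w_hr: float = 1.0  # hidden -> reset gate
    b_r: float = 0.0
    w_iz: float = 1.0  # input  -> update gate
    w_hz: float = 1.0  # hidden -> update gate
    b_z: float = 0.0
    w_in: float = 2.0  # input  -> candidate state
    w_hn: float = 1.0  # hidden -> candidate state
    b_n: float = 0.3   # nonzero bias, one source of asymmetry

    def step(self, x: float, h: float) -> float:
        """Advance one sample; the new hidden state is also the output."""
        r = sigmoid(self.w_ir * x + self.w_hr * h + self.b_r)  # reset gate
        z = sigmoid(self.w_iz * x + self.w_hz * h + self.b_z)  # update gate
        n = math.tanh(self.w_in * x + r * self.w_hn * h + self.b_n)  # candidate
        return (1.0 - z) * n + z * h

    def process(self, xs, h0: float = 0.0):
        """Run the cell over a sequence, carrying the state between samples."""
        h, out = h0, []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return out


cell = ScalarGRU()
# Same input sample, different history: the outputs differ (memory).
print(cell.step(0.5, 0.0), cell.step(0.5, 0.8))
# Opposite-sign inputs starting from rest: the magnitudes differ (asymmetry).
print(cell.step(0.5, 0.0), cell.step(-0.5, 0.0))
```

Even with a single hidden unit, the output for a given input sample depends on the previous state, which a memoryless $\tanh$ cannot express, and positive and negative inputs are not mirrored.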

To enhance the model’s versatility, we employ $N = 4$ cascaded WH blocks. Moreover, as shown in Figure 2a, we incorporate information from the “gain” knob both before and after the nonlinear stage $f(\cdot)$ in each WH block. The “pregain,” applied before the nonlinear stage, controls the amount of signal fed into the nonlinearity, effectively determining the level of distortion. The “postgain,” applied after the nonlinear stage, adjusts the overall output level without altering the distortion characteristics. This allows for fine-tuned control over the distortion amount and overall volume independently, enabling accurate reproduction of various distortion levels and acoustic characteristics across different gain settings.
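The structure of Eq. (1) with pregain and postgain can be sketched as follows. This is an illustrative simplification under several assumptions: each of $H_1$ and $H_2$ is reduced to a single peak biquad (using the widely known Audio EQ Cookbook formulas) rather than a full low-shelf/peak/high-shelf cascade, the nonlinearity defaults to a static $\tanh$ as a stand-in for the GRU, and the gains are placed immediately around the nonlinearity. All function names and filter settings are hypothetical.

```python
import math


def peak_biquad(f0, gain_db, q, fs=44100.0):
    """Audio EQ Cookbook peaking-filter coefficients, normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def biquad(x, b, a):
    """Direct-form I filtering of a list of samples (assumes a[0] == 1)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def wh_block(z, h1, h2, pregain, postgain, f=math.tanh):
    """One WH block: z' = H2(postgain * f(pregain * H1(z)))."""
    u = biquad(z, *h1)                          # linear filter H1
    v = [postgain * f(pregain * s) for s in u]  # gain-wrapped nonlinearity
    return biquad(v, *h2)                       # linear filter H2


def preamp(x, blocks):
    """Cascade of WH blocks; each entry of `blocks` holds one block's settings."""
    for blk in blocks:
        x = wh_block(x, **blk)
    return x


fs = 44100.0
x = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / fs) for n in range(512)]
h1 = peak_biquad(800.0, 6.0, 0.7)    # hypothetical filter settings
h2 = peak_biquad(3000.0, -3.0, 0.7)

clean = wh_block(x, h1, h2, pregain=1.0, postgain=1.0)
louder = wh_block(x, h1, h2, pregain=1.0, postgain=2.0)   # same shape, twice the level
driven = wh_block(x, h1, h2, pregain=10.0, postgain=1.0)  # more clipping, new shape
out = preamp(x, [dict(h1=h1, h2=h2, pregain=2.0, postgain=0.5)] * 4)  # N = 4 blocks
```

In this sketch, doubling the postgain scales the output exactly, whereas raising the pregain drives the nonlinearity harder and changes the waveform shape, which mirrors the separation of distortion amount and output level described above.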

2.2 Differentiable Tone Stack

The tone stack, positioned after the preamp, is crucial for shaping the amplifier’s frequency response. It typically features three interactive controls: “bass,” “mids,” and “treble,” allowing guitarists to tune and sculpt the preamp’s distorted output [28]. The complex interdependence of these controls creates a nuanced tone shaping system.

A differentiable tone stack was previously explored by Miklánek et al. [6]. Similarly, we model it as a series of three filters: low-shelf, peak, and high-shelf (LPH) [29], as shown in Figure 2b. This structure effectively replicates various tone stack configs by adjusting filter parameters based on knob settings. The low-shelf and high-shelf filters primarily modulate bass and treble responses, respectively, while the peak filter targets mid-range frequencies. Our digital representation not only mimics the frequency response of analog tone stacks but also preserves the critical interactive relationships between the control knobs, capturing the subtleties that characterize different guitar amplifiers.
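As an illustration of the LPH structure, the sketch below builds a low-shelf, a peak, and a high-shelf biquad from the widely used Audio EQ Cookbook formulas and evaluates the magnitude response of their cascade. The corner frequencies (150 Hz, 700 Hz, 3 kHz), the Q values, and the direct use of dB gains as inputs are assumptions made for this example; in the proposed model the filter parameters are predicted from the knob values by the knob controller.

```python
import cmath
import math

FS = 44100.0  # sample rate of the data used in the paper


def _prep(f0, gain_db, q):
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / FS
    return amp, math.cos(w0), math.sin(w0) / (2.0 * q)


def low_shelf(f0, gain_db, q=0.7071):
    A, c, al = _prep(f0, gain_db, q)
    s = 2.0 * math.sqrt(A) * al
    b = [A * ((A + 1) - (A - 1) * c + s), 2 * A * ((A - 1) - (A + 1) * c),
         A * ((A + 1) - (A - 1) * c - s)]
    a = [(A + 1) + (A - 1) * c + s, -2 * ((A - 1) + (A + 1) * c),
         (A + 1) + (A - 1) * c - s]
    return b, a


def peak(f0, gain_db, q=0.7071):
    A, c, al = _prep(f0, gain_db, q)
    return [1 + al * A, -2 * c, 1 - al * A], [1 + al / A, -2 * c, 1 - al / A]


def high_shelf(f0, gain_db, q=0.7071):
    A, c, al = _prep(f0, gain_db, q)
    s = 2.0 * math.sqrt(A) * al
    b = [A * ((A + 1) + (A - 1) * c + s), -2 * A * ((A - 1) + (A + 1) * c),
         A * ((A + 1) + (A - 1) * c - s)]
    a = [(A + 1) - (A - 1) * c + s, 2 * ((A - 1) - (A + 1) * c),
         (A + 1) - (A - 1) * c - s]
    return b, a


def response(b, a, f):
    """Complex frequency response of one biquad at frequency f in Hz."""
    zi = cmath.exp(-2j * math.pi * f / FS)  # z^-1 on the unit circle
    return (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)


def tone_stack_gain_db(bass_db, mid_db, treble_db, f):
    """Magnitude (dB) of the low-shelf -> peak -> high-shelf cascade at f."""
    stages = [low_shelf(150.0, bass_db), peak(700.0, mid_db), high_shelf(3000.0, treble_db)]
    h = complex(1.0, 0.0)
    for b, a in stages:
        h *= response(b, a, f)  # cascaded filters multiply in frequency
    return 20.0 * math.log10(abs(h))


# Boosting bass alone also lifts the response at 700 Hz slightly,
# a small example of overlap between the bands.
print(tone_stack_gain_db(6.0, 0.0, 0.0, 50.0), tone_stack_gain_db(6.0, 0.0, 0.0, 700.0))
```

Because the three responses overlap, a change in one band shifts the response seen by the others; the learned mapping from knobs to filter parameters is what lets the model capture the stronger interactions of an analog tone stack.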

2.3 Differentiable Power Amp

The power amp is the last amplification stage before the output transformer. Its behavior is affected by output tube type, class of operation, and negative feedback [30]. Little work, however, has been done to build a differentiable power amp and output transformer. We present such an endeavor below.

We pioneer the design of a differentiable power amp following the push/pull topology (POW) in the physical design of a power amp, which includes a phase splitter, a negative feedback loop, and a presence control [1]. Our DSP implementation of the phase splitter (PS) inverts one audio segment with phase inversion (PI) and applies a simplified soft-clipper nonlinear function, approximating the behavior of tube phase splitters while maintaining computational efficiency [31]. Implementing negative feedback in digital systems traditionally introduces echo artifacts due to the inherent delay. We address this challenge by using a linear filter to emulate negative feedback behavior, effectively reducing distortion and controlling frequency response before the phase splitter stage with light computational cost. As shown in Figure 2c, the signal path before PS includes a “master” volume control (M), a filter (F) emulating negative feedback and presence control, along with a gain control (G). After PS, both separate paths for each phase employ WH blocks with gain control and similarly a GRU with hidden size 1 for the nonlinear function.
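The push/pull idea can be illustrated with a deliberately simplified sketch. It covers only the phase splitter and the two signal paths: the input is split into an in-phase and a phase-inverted copy, each copy passes through the same asymmetric soft clipper, and the two halves are recombined by subtraction. The differential recombination, the `soft_clip` function, and its bias value are assumptions for illustration; the sketch omits the master volume, the feedback/presence filter, and the per-path WH blocks with GRUs of the proposed design.

```python
import math


def soft_clip(v, bias=0.2):
    """Asymmetric soft clipper standing in for one output tube (hypothetical)."""
    return math.tanh(v + bias) - math.tanh(bias)


def push_pull(x, drive=1.0):
    """Phase splitter followed by two clipped paths that recombine differentially."""
    out = []
    for s in x:
        pos = soft_clip(drive * s)     # in-phase path
        neg = soft_clip(-drive * s)    # phase-inverted path (PI)
        out.append(0.5 * (pos - neg))  # difference of the two halves
    return out


# Each half on its own is asymmetric ...
print(soft_clip(0.7), soft_clip(-0.7))
# ... but the recombined transfer curve is odd-symmetric.
print(push_pull([0.7, -0.7]))
```

With identical halves, the asymmetry of the individual paths cancels in the difference. In a learned model the two paths have separate parameters and need not match, so some asymmetry can remain in the output.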

2.4 Differentiable Output Transformer

The output transformer, which should not be confused with self-attention based transformers used in NNs [32], performs impedance matching between the power amp and speaker, nonlinearly amplifying the signal and adding tone coloration. This behavior is primarily due to magnetic domain hysteresis and frequency-dependent losses [33, 34]. Moreover, the output transformer exhibits a bandpass characteristic [34].

We propose to model the output transformer (TRANS) simply with a GRU, again with hidden size 1, complemented by gain control and filtering, as shown in Figure 2d. The GRU’s state-dependent output captures the history-dependent nature of hysteresis, while the additional modules model the frequency-shaping effects. This approach, chosen over traditional models like Jiles-Atherton [35], offers data-driven parameter learning ability with light computational cost, capturing the dynamic, nonlinear, and frequency-dependent characteristics of real transformers. Interestingly, as demonstrated on our project page, we find that the resulting distortion curve closely resembles a standard magnetic hysteresis curve.


Fig. 2: Details of DDSP Guitar Amp. Letters in the squares denote their respective types—F: filter, G: gain, NL: nonlinear function, L: low-shelf filter, P: peak filter, H: high-shelf filter, M: master, PS: phase splitter, PI: phase inversion. Orange blocks: knob controller-estimated parameters for multiplication. Pink blocks: designed operations with Knob Controller-predicted parameters. Green blocks: fixed operations without learnable parameters. Best viewed in color.

2.5 Knob Controller

To achieve full dynamic modeling, we implement MLPs as knob controllers. Each amp component has an independent set of MLP layers that map knob values to DSP parameters, enabling the model to learn complex, nonlinear relations between knob adjustments and sonic outcomes. This approach captures subtle interactions beyond simple scaling. For instance, the “gain” knob effect combines filtering and amplification, while the “bass” knob influences multiple frequency ranges. Learning these intricate relationships from data, our model has the opportunity to replicate the full spectrum of tone variations possible in physical guitar amps.
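A knob controller of this kind can be sketched in plain Python as below. The sketch follows the configuration reported in Section 3 (three layers, 32 hidden units, LeakyReLU with slope 0.1, sigmoid outputs mapped to predefined ranges, knob values normalized to $[-1, 1]$), reading "three layers" as three dense layers. The class name, the random initialization, the 0-10 dial scale, and the parameter ranges of plus or minus 12 dB are hypothetical choices made for the example.

```python
import math
import random


def leaky_relu(v, slope=0.1):
    return v if v >= 0.0 else slope * v


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, w, b):
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (placeholder for trained parameters)."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class KnobController:
    """Maps normalized knob values to DSP parameters within fixed bounds."""

    def __init__(self, n_knobs, ranges, hidden=32, seed=0):
        rng = random.Random(seed)
        self.ranges = ranges  # one (low, high) pair per DSP parameter
        sizes = [n_knobs, hidden, hidden, len(ranges)]
        self.layers = [init_layer(rng, i, o) for i, o in zip(sizes[:-1], sizes[1:])]

    def __call__(self, knobs):
        h = list(knobs)
        for k, (w, b) in enumerate(self.layers):
            h = dense(h, w, b)
            if k < len(self.layers) - 1:
                h = [leaky_relu(v) for v in h]  # hidden activations
        # Sigmoid squashes to (0, 1); then rescale into each parameter's range.
        return [lo + (hi - lo) * sigmoid(v) for v, (lo, hi) in zip(h, self.ranges)]


# Hypothetical tone-stack controller: three knobs in, three filter gains (dB) out.
tone_ctrl = KnobController(n_knobs=3, ranges=[(-12.0, 12.0)] * 3)
knobs = [2.0 * k / 10.0 - 1.0 for k in (7.0, 5.0, 3.0)]  # 0-10 dial -> [-1, 1]
params = tone_ctrl(knobs)
print(params)
```

Bounding the outputs with a sigmoid keeps every predicted parameter inside its range regardless of the network weights, and because each output depends on all knobs, one knob can influence several parameters at once.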

| Model | Seen knobs: MAE ↓ | Seen knobs: MR-STFT ↓ | Unseen knobs: MAE ↓ | Unseen knobs: MR-STFT ↓ | Ops/sample | Params |
|---|---|---|---|---|---|---|
| A. Small Concat-GRU-8 | 0.057 | 4.302 | 0.075 | 5.762 | 1,344 | 369 |
| B. Big Concat-GRU-48 | 0.013 | 1.214 | 0.023 | 1.851 | 19,872 | 7,969 |
| C. WH Only | 0.317 | 2.552 | 0.189 | 4.675 | 736 | 4,462 |
| D. WH+LPH+WH | 0.063 | 5.098 | 0.066 | 5.803 | 995 | 10,213 |
| E. WH+LPH+POW | 0.034 | 2.979 | 0.057 | 4.825 | 1,243 | 8,200 |
| F. WH+LPH+POW+TRANS | 0.024 | 2.161 | 0.043 | 3.972 | 1,352 | 10,126 |

Table 1: Evaluation results of (A-B) black-box baselines, (F) the proposed DDSP model, and (C-E) its ablations.

3 Experimental Setup

Our experiment targets the emulation of the Marshall JVM 410H, a tube-based guitar amplifier. We use the data from Miklánek et al. [6], comprising 6-minute audio files of unprocessed dry signals sampled at 44.1 kHz, featuring diverse guitar and bass playing styles. Target signals were recorded from the “OD1” channel, a high-distortion setting. We split the data into train, validation and test sets with a 6:1:3 ratio. Following [6], we use both seen and unseen knob conditions for testing.

For performance comparison, we establish two black-box NN baselines: Small Concat-GRU-8 and Big Concat-GRU-48, with hidden sizes of 8 and 48 respectively, using the concatenation method [22] for knob value conditioning. Moreover, we evaluate ablated versions of the proposed DDSP-based model, listed below with increasing complexity:

  • WH only: A single multi-block Wiener-Hammerstein model as described in Section 2.1 that emulates the entire guitar amp, which resembles the approach of [5].

  • WH+LPH+WH: A WH model for the preamp, followed by a series of low-shelf, peak, high-shelf filters (LPH) emulating the tone stack, and another WH model that emulates both the power amp and transformer together. This resembles the approach of [6], as the same NN structure is used before and after the tone stack.

  • WH+LPH+POW: Replace the WH after LPH by the proposed POW-based design described in Section 2.3, omitting the transformer modeling in Section 2.4.

  • WH+LPH+POW+TRANS: Our full proposed model.

Here are some implementation details. We implemented all the models using the PyNeuralFx library [36] and trained them on a single NVIDIA RTX 4090 GPU. Our training process starts with an initial learning rate of 2e-3 over 100 epochs, adaptively halving the learning rate after two consecutive non-improving validations and early stopping after four. The DDSP models were trained on 8,192-sample audio segments, while the GRU baselines used 2,048-sample segments due to gradient updating constraints in recurrent networks. We used mean absolute error (MAE) and multi-resolution STFT (MR-STFT) losses [13]. The STFT loss employs three FFT window sizes: 128, 512, and 2,048. For DDSP models, MLP controller outputs were sigmoid-activated and mapped to predefined parameter ranges, ensuring physically meaningful bounds. All MLPs in our model comprise three layers with LeakyReLU activations (slope 0.1) and 32 hidden units. Condition values were normalized to $[-1, 1]$ for consistent input scaling. Linear filters in DDSP models were implemented as IIR biquad filters using dasp-pytorch [37]. We employed the frequency sampling method for efficient biquad IIR filter updating as in [38, 39]. This balances model performance with computational efficiency, enabling fair comparison between our DDSP-based method and the baselines.
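The frequency sampling idea can be illustrated with a small self-contained sketch: instead of running the biquad recursion sample by sample, the filter's frequency response is evaluated on a DFT grid and multiplied with the spectrum of the zero-padded input. The code below uses a naive DFT for clarity and compares the result with the time-domain recursion. It is an independent illustration under these assumptions, not the dasp-pytorch implementation, and the coefficients are arbitrary values for a stable filter.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (for illustration only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def biquad_freq_response(b, a, n_fft):
    """Sample H = B / A at the n_fft points of the DFT grid."""
    resp = []
    for k in range(n_fft):
        zi = cmath.exp(-2j * math.pi * k / n_fft)  # z^-1 at bin k
        num = b[0] + b[1] * zi + b[2] * zi * zi
        den = a[0] + a[1] * zi + a[2] * zi * zi
        resp.append(num / den)
    return resp


def filter_freq_sampling(x, b, a, n_fft):
    """Apply the biquad by multiplication in the frequency domain."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    spec = dft(padded)
    h = biquad_freq_response(b, a, n_fft)
    y = dft([s * hk for s, hk in zip(spec, h)], inverse=True)
    return [v.real for v in y[: len(x)]]


def filter_recursive(x, b, a):
    """Reference: direct-form I recursion in the time domain."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


b, a = [0.5, 0.2, 0.1], [1.0, -0.3, 0.2]  # arbitrary stable biquad
x = [math.sin(0.3 * n) for n in range(16)]
y_fs = filter_freq_sampling(x, b, a, n_fft=64)
y_td = filter_recursive(x, b, a)
print(max(abs(p - q) for p, q in zip(y_fs, y_td)))  # small approximation error
```

The two outputs agree closely here because the impulse response of this filter decays well within the zero-padded length. For slowly decaying filters the frequency-sampled result is only an approximation, which is the usual trade-off for avoiding a sequential recursion during training.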

4 Objective Evaluation Results

We evaluate the accuracy of the implemented models for emulating the target guitar amp (for unseen content in test data) in MAE and MR-STFT [22]. Moreover, we assess model efficiency by operations (ops) per sample. For reference, sigmoid and tanh operations cost 30 floating point ops [40].

Table 1 presents the results, starting with the two baseline GRU models: (A) Small Concat-GRU-8 and (B) Big Concat-GRU-48. Config B, which is a strong baseline, attains the lowest MAE and MR-STFT for both seen and unseen conditions, but with significantly higher computational cost. Config A demonstrates that merely reducing GRU size is ineffective for balancing modeling accuracy and efficiency.

We then progressively evaluate four configs (C-F) to validate our design choices. Config C (WH model only) excels in frequency content reconstruction but struggles with time-domain aspects and lacks interpretability due to entangled components. Config D introduces a structured approach separating preamp, tone stack, and power amp, but its simple WH structure for the power amp yields suboptimal results. Config E improves upon config D with a more sophisticated power amp model, showing noticeable enhancements. Finally, the proposed method (config F), which incorporates an output transformer model, attains better accuracy than config A at a comparable cost (1,352 vs. 1,344 ops per sample) while adding interpretability. Notably, config F requires less than 10% of the operations per sample of config B.

MAE decreases steadily from C to F for both seen and unseen conditions, and config F attains the lowest MR-STFT among the four DDSP configs, which validates our design process. While config B has the lowest losses, config F exhibits a better balance of accuracy, computational efficiency, and interpretability. Config F outperforms config A, which has a similar cost in ops per sample, indicating the superiority of our physically-inspired approach over simply scaling down a black-box model.
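The efficiency comparison can be checked directly from the numbers in Table 1. The short snippet below only restates the table values for configs A, B, and F and computes the ratios of operations per sample; the variable names are chosen for this example.

```python
# (ops per sample, seen-knob MAE, unseen-knob MAE) copied from Table 1
table1 = {
    "A": (1344, 0.057, 0.075),   # Small Concat-GRU-8
    "B": (19872, 0.013, 0.023),  # Big Concat-GRU-48
    "F": (1352, 0.024, 0.043),   # WH+LPH+POW+TRANS (proposed)
}

ops_ratio_f_to_b = table1["F"][0] / table1["B"][0]
ops_ratio_f_to_a = table1["F"][0] / table1["A"][0]

print(f"F uses {100 * ops_ratio_f_to_b:.1f}% of B's ops per sample")
print(f"F uses {100 * ops_ratio_f_to_a:.1f}% of A's ops per sample")
```

Config F needs about 6.8% of the operations of config B, consistent with the "less than 10%" statement, and roughly the same number of operations as config A while having lower MAE in both knob conditions.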

5 Conclusion

In this paper, we have presented a DDSP-based method for guitar amplifier modeling, leveraging physical design principles to balance interpretability, computational efficiency, and accurate reproduction of nonlinear behavior. Our approach demonstrates advantages over black-box NN baselines in capturing component interactions and user control. Future work could explore addressing aliasing in nonlinear digital systems.

References

  • [1] Jyri Pakarinen and David T. Yeh, “A review of digital techniques for modeling vacuum-tube guitar amplifiers,” Computer Music Journal, vol. 33, no. 2, pp. 85–100, 2009.
  • [2] Fabián Esqueda et al., “Differentiable white-box virtual analog modeling,” in DAFx, 2021.
  • [3] Julian D. Parker, Sebastian J. Schlecht, Rudolf Rabenstein, and Maximilian Schäfer, “Physical modeling using recurrent neural networks with fast convolutional layers,” in DAFx, 2022.
  • [4] David T. Yeh, Jonathan S. Abel, and Julius O. Smith, “Automated physical modeling of nonlinear audio circuits for real-time audio effects,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 4, pp. 728–737, 2010.
  • [5] Felix Eichas and Udo Zölzer, “Gray-box modeling of guitar amplifiers,” Journal of the Audio Engineering Society, 2018.
  • [6] Stepan Miklanek et al., “Neural grey-box guitar amplifier modelling with limited data,” in DAFx, 2023.
  • [7] Shahan Nercessian et al., “Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads,” in ICASSP, 2021.
  • [8] Felix Eichas, Stephan Möller, and Udo Zölzer, “Block-oriented gray box modeling of guitar amplifiers,” in DAFx, 2017.
  • [9] Felix Eichas and Udo Zölzer, “Virtual analog modeling of guitar amplifiers with Wiener-Hammerstein models,” in Proc. Annual Convention on Acoustics, 2018.
  • [10] Marco A. Martínez Ramírez, Emmanouil Benetos, and Joshua D. Reiss, “Deep learning for black-box modeling of audio effects,” Applied Sciences, vol. 10, no. 2, pp. 638, 2020.
  • [11] Eero-Pekka Damskägg, Lauri Juvela, and Vesa Välimäki, “Real-time modeling of audio distortion circuits with deep learning,” in Proc. Sound and Music Computing Conf., 2019.
  • [12] Marco A. Martínez Ramírez and Joshua D. Reiss, “Modeling nonlinear audio effects with end-to-end deep neural networks,” in ICASSP, 2019.
  • [13] Christian J. Steinmetz and Joshua D. Reiss, “Efficient neural networks for real-time modeling of analog dynamic range compression,” Journal of the Audio Engineering Society, 2022.
  • [14] Eero-Pekka Damskägg et al., “Deep learning for tube amplifier emulation,” in ICASSP, 2019.
  • [15] John Covert and David L. Livingston, “A vacuum-tube guitar amplifier model using a recurrent neural network,” in Proc. IEEE Southeastcon, 2013.
  • [16] Thomas Schmitz and Jean-Jacques Embrechts, “Real time emulation of parametric guitar tube amplifier with long short term memory neural network,” arXiv preprint arXiv:1804.07145, 2018.
  • [17] Zhichen Zhang et al., “A vacuum-tube guitar amplifier model using long/short-term memory networks,” SoutheastCon, 2018.
  • [18] Alec Wright et al., “Real-time black-box modelling with recurrent neural networks,” in DAFx, 2019.
  • [19] Lauri Juvela et al., “End-to-end amp modeling: from data to controllable guitar amplifier models,” in ICASSP, 2023.
  • [20] Yu-Hua Chen et al., “Improving unsupervised clean-to-rendered guitar tone transformation using GANs and integrated unaligned clean data,” in DAFx, 2024.
  • [21] Yu-Hua Chen et al., “Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control,” in ISMIR, 2024.
  • [22] Yen-Tung Yeh, Wen-Yi Hsiao, and Yi-Hsuan Yang, “Hyper recurrent neural network: Condition mechanisms for black-box audio effect modeling,” in DAFx, 2024.
  • [23] Tara Vanhatalo et al., “A review of neural network-based emulation of guitar amplifiers,” Applied Sciences, vol. 12, no. 12, pp. 5894, 2022.
  • [24] Jesse Engel et al., “DDSP: Differentiable digital signal processing,” in ICLR, 2020.
  • [25] Lenny Renault et al., “DDSP-Piano: A neural sound synthesizer informed by instrument knowledge,” Journal of the Audio Engineering Society, vol. 71, no. 9, pp. 552–565, 2023.
  • [26] Afef Ghanmi, Samira Kamoun, and Sofien Hajji, “Structural estimation of Wiener-Hammerstein mathematical models,” in ICSC, 2015.
  • [27] The MathWorks Inc., “What are Hammerstein-Wiener models?,” 2022.
  • [28] David T. Yeh and Julius O Smith, “Simulating guitar distortion circuits using wave digital and nonlinear state-space formulations,” in DAFx, 2008.
  • [29] Joshua D Reiss and Andrew McPherson, Audio effects: theory, implementation and application, CRC Press, 2014.
  • [30] E. Barbour, “The cool sound of tubes [vacuum tube musical applications],” IEEE Spectrum, vol. 35, no. 8, pp. 24–35, 1998.
  • [31] Kristjan Dempwolf et al., “The influence of small variations in a simplified guitar amplifier model,” in DAFx, 2009.
  • [32] Ashish Vaswani et al., “Attention is all you need,” in NeurIPS, 2017, pp. 5998–6008.
  • [33] H. Neal Bertram, Theory of magnetic recording, Cambridge University Press, 1994.
  • [34] Rafael Cauduro Dias de Paiva et al., “Real-time audio transformer emulation for virtual tube amplifiers,” EURASIP Journal on Advances in Signal Processing, 2011.
  • [35] Martin Holters and Udo Zölzer, “Circuit simulation with inductors and transformers based on the Jiles-Atherton model of magnetization,” in DAFx, 2016.
  • [36] Yen-Tung Yeh, Wen-Yi Hsiao, and Yi-Hsuan Yang, “PyNeuralFx: A Python package for neural audio effect modeling,” arXiv preprint arXiv:2408.06053, 2024, [Online] https://github.com/ytsrt66589/pyneuralfx/.
  • [37] Christian J. Steinmetz, “dasp-pytorch,” [Online] https://github.com/csteinmetz1/dasp-pytorch/.
  • [38] Shahan Nercessian, “Neural parametric equalizer matching using differentiable biquads,” in DAFx, 2020.
  • [39] Joseph T. Colonel et al., “Direct design of biquad filter cascades with deep learning by sampling random polynomials,” in ICASSP, 2022.
  • [40] Julian D Parker, Fabián Esqueda, and André Bergner, “Modelling of nonlinear state-space systems using a deep neural network,” in DAFx, 2019.