-
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory
Authors:
Zexue He,
Leonid Karlinsky,
Donghyun Kim,
Julian McAuley,
Dmitry Krotov,
Rogerio Feris
Abstract:
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory capacity and require costly re-training to integrate with a new LLM. In this work, we introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training, enabling it to handle arbitrarily long input sequences. Unlike previous methods, our associative memory module consolidates representations of individual tokens into a non-parametric distribution model, dynamically managed by properly balancing the novelty and recency of the incoming data. By retrieving information from this consolidated associative memory, the base LLM can achieve significant (up to 29.7% on Arxiv) perplexity reduction in long-context modeling compared to other baselines evaluated on standard benchmarks. This architecture, which we call CAMELoT (Consolidated Associative Memory Enhanced Long Transformer), demonstrates superior performance even with a tiny context window of 128 tokens, and also enables improved in-context learning with a much larger set of demonstrations.
Submitted 20 February, 2024;
originally announced February 2024.
-
On the existence of some completely regular codes in Hamming graphs
Authors:
Denis S. Krotov
Abstract:
We solve several first questions in the table of small parameters of completely regular (CR) codes in Hamming graphs $H(n,q)$. The most uplifting result is the existence of a $\{13,6,1;1,6,9\}$-CR code in $H(n,2)$, $n\ge 13$. We also establish the non-existence of a $\{11,4;3,6\}$-code and a $\{10,3;4,7\}$-code in $H(12,2)$ and $H(13,2)$. A partition of the complement of the quaternary Hamming code of length~$5$ into $4$-cliques is found, which can be used to construct completely regular codes with covering radius $1$ by known constructions. Additionally we discuss the parameters $\{24,21,10;1,4,12\}$ of a putative completely regular code in $H(24,2)$ and show the nonexistence of such a code in $H(8,4)$.
Keywords: Hamming graph, equitable partition, completely regular code
Submitted 13 December, 2023;
originally announced December 2023.
-
Multispreads
Authors:
Denis S. Krotov,
Ivan Yu. Mogilnykh
Abstract:
Additive one-weight codes over a finite field of non-prime order are equivalent to special subspace coverings of the points of projective space, which we call multispreads. The current paper is devoted to the characterization of the parameters of multispreads, which is equivalent to the characterization of the parameters of additive one-weight codes. We characterize these parameters for the case of the prime-square order of the field and make a partial characterization for the prime-cube case and the case of the fourth degree of a prime (including a complete characterization for orders 8, 27, and 16).
Submitted 19 March, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
On degree-$3$ and $(n-4)$-correlation-immune perfect colorings of $n$-cubes
Authors:
Denis S. Krotov,
Alexandr A. Valyuzhenich
Abstract:
A perfect $k$-coloring of the Boolean hypercube $Q_n$ is a function from the set of binary words of length $n$ onto a $k$-set of colors such that for any colors $i$ and $j$ every word of color $i$ has exactly $S(i,j)$ neighbors (at Hamming distance $1$) of color $j$, where the coefficient $S(i,j)$ depends only on $i$ and $j$ but not on the particular choice of the word. The $k$-by-$k$ table of all coefficients $S(i,j)$ is called the quotient matrix. We characterize perfect colorings of $Q_n$ of degree at most $3$, that is, with a quotient matrix all of whose eigenvalues are not less than $n-6$, or, equivalently, such that every color corresponds to a Boolean function represented by a polynomial of degree at most $3$ over $R$. Additionally, we characterize $(n-4)$-correlation-immune perfect colorings of $Q_n$, all of whose colors correspond to $(n-4)$-correlation-immune Boolean functions, or, equivalently, all non-main (different from $n$) eigenvalues of the quotient matrix are not greater than $6-n$.
Keywords: perfect coloring, equitable partition, resilient function, correlation-immune function.
Submitted 23 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
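The definition of a perfect coloring in the abstract above can be checked by brute force for small $n$. The following is a minimal sketch, not the authors' method: the function name `quotient_matrix` and the example colorings are hypothetical, chosen only for illustration.

```python
from itertools import product

def quotient_matrix(n, color):
    """Return the quotient matrix S of a coloring of Q_n if it is perfect, else None.

    `color` maps a binary word (tuple of 0/1 of length n) to a color label.
    Rows and columns of S follow the sorted order of the labels that occur.
    """
    words = list(product((0, 1), repeat=n))
    labels = sorted({color(w) for w in words})
    index = {c: i for i, c in enumerate(labels)}
    rows = [None] * len(labels)  # rows[i]: neighbor counts seen for color i
    for w in words:
        counts = [0] * len(labels)
        for pos in range(n):
            # Flip one coordinate to get a neighbor at Hamming distance 1.
            neighbor = w[:pos] + (1 - w[pos],) + w[pos + 1:]
            counts[index[color(neighbor)]] += 1
        i = index[color(w)]
        if rows[i] is None:
            rows[i] = counts
        elif rows[i] != counts:
            return None  # two words of the same color disagree: not perfect
    return rows

# Illustrative colorings (assumptions, not taken from the paper).
by_parity = quotient_matrix(3, lambda w: sum(w) % 2)
by_first_bit = quotient_matrix(4, lambda w: w[0])
not_perfect = quotient_matrix(3, lambda w: w[0] & w[1])
```

Coloring $Q_3$ by weight parity gives a perfect $2$-coloring in which every word has all three neighbors in the other color, while coloring by the product of the first two coordinates is not perfect, so the function returns `None`.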
-
The classification of orthogonal arrays OA(2048,14,2,7) and some completely regular codes
Authors:
Denis S. Krotov
Abstract:
We describe the classification of orthogonal arrays OA$(2048,14,2,7)$, or, equivalently, completely regular $\{14;2\}$-codes in the $14$-cube ($30848$ equivalence classes). In particular, we find that there is exactly one almost-OA$(2048,14,2,7{+}1)$, up to equivalence. As derived objects, OA$(1024,13,2,6)$ ($202917$ classes) and completely regular $\{12,2;2,12\}$- and $\{14, 12, 2; 2, 12, 14\}$-codes in the $13$- and $14$-cubes, respectively, are also classified.
Keywords: binary orthogonal array, completely regular code, binary 1-perfect code.
Submitted 12 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models
Authors:
Benjamin Hoover,
Hendrik Strobelt,
Dmitry Krotov,
Judy Hoffman,
Zsolt Kira,
Duen Horng Chau
Abstract:
The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks. Though the generative process is traditionally understood as an "iterative denoiser", there is no universally accepted language to describe it. We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs), making efforts to keep our presentation approachable to newcomers to both of these fields. Unifying these two fields provides insight that DMs can be seen as a particular kind of AM where Lyapunov stability guarantees are bypassed by intelligently engineering the dynamics (i.e., the noise and step size schedules) of the denoising process. Finally, we present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
Submitted 28 May, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Long Sequence Hopfield Memory
Authors:
Hamza Tahir Chaudhry,
Jacob A. Zavatone-Veth,
Dmitry Krotov,
Cengiz Pehlevan
Abstract:
Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maximal length of the stored sequence) due to interference between the memories. Inspired by recent work on Dense Associative Memories, we expand the sequence capacity of these models by introducing a nonlinear interaction term, enhancing separation between the patterns. We derive novel scaling laws for sequence capacity with respect to network size, significantly outperforming existing scaling laws for models based on traditional Hopfield networks, and verify these theoretical results with numerical simulation. Moreover, we introduce a generalized pseudoinverse rule to recall sequences of highly correlated patterns. Finally, we extend this model to store sequences with variable timing between states' transitions and describe a biologically-plausible implementation, with connections to motor neuroscience.
Submitted 2 November, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
End-to-end Differentiable Clustering with Associative Memories
Authors:
Bishwajit Saha,
Dmitry Krotov,
Mohammed J. Zaki,
Parikshit Ram
Abstract:
Clustering is a widely used unsupervised learning technique involving an intensive discrete optimization problem. Associative Memory models or AMs are differentiable neural networks defining a recursive dynamical system, which have been integrated with various deep learning architectures. We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. Leveraging the pattern completion ability of AMs, we further develop a novel self-supervised clustering loss. Our evaluations on varied datasets demonstrate that ClAM benefits from the self-supervision, and significantly improves upon both the traditional Lloyd's k-means algorithm, and more recent continuous clustering relaxations (by up to 60% in terms of the Silhouette Coefficient).
Submitted 5 June, 2023;
originally announced June 2023.
-
Quasi-cyclic perfect codes in Doob graphs and special partitions of Galois rings
Authors:
Minjia Shi,
Xiaoxiao Li,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
The Galois ring GR$(4^Δ)$ is the residue ring $Z_4[x]/(h(x))$, where $h(x)$ is a basic primitive polynomial of degree $Δ$ over $Z_4$. For any odd $Δ$ larger than $1$, we construct a partition of GR$(4^Δ) \backslash \{0\}$ into $6$-subsets of type $\{a,b,-a-b,-a,-b,a+b\}$ and $3$-subsets of type $\{c,-c,2c\}$ such that the partition is invariant under the multiplication by a nonzero element of the Teichmuller set in GR$(4^Δ)$ and, if $Δ$ is not a multiple of $3$, under the action of the automorphism group of GR$(4^Δ)$.
As a corollary, this implies the existence of quasi-cyclic additive $1$-perfect codes of index $(2^Δ-1)$ in $D((2^Δ-1)(2^Δ-2)/{6}, 2^Δ-1 )$ where $D(m,n)$ is the Doob metric scheme on $Z^{2m+n}$.
Submitted 4 May, 2023;
originally announced May 2023.
-
Sparse Distributed Memory is a Continual Learner
Authors:
Trenton Bricken,
Xander Davies,
Deepak Singh,
Dmitry Krotov,
Gabriel Kreiman
Abstract:
Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.
Submitted 20 March, 2023;
originally announced March 2023.
-
Energy Transformer
Authors:
Benjamin Hoover,
Yuchen Liang,
Bao Pham,
Rameswar Panda,
Hendrik Strobelt,
Duen Horng Chau,
Mohammed J. Zaki,
Dmitry Krotov
Abstract:
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the power-house driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
Submitted 31 October, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
An upper bound on the number of frequency hypercubes
Authors:
Denis S. Krotov,
Vladimir N. Potapov
Abstract:
A frequency $n$-cube $F^n(q;l_0,...,l_{m-1})$ is an $n$-dimensional $q$-by-...-by-$q$ array, where $q = l_0+...+l_{m-1}$, filled by numbers $0,...,m-1$ with the property that each line contains exactly $l_i$ cells with symbol $i$, $i = 0,...,m-1$ (a line consists of $q$ cells of the array differing in one coordinate). The trivial upper bound on the number of frequency $n$-cubes is $m^{(q-1)^{n}}$. We improve that upper bound for $n>2$, replacing $q-1$ by a smaller value, by constructing a testing set of size $s^{n}$, $s<q-1$, for frequency $n$-cubes (a testing set is a collection of cells of an array the values in which uniquely determine the array with given parameters). We also construct new testing sets for generalized frequency $n$-cubes, which are essentially correlation-immune functions in $n$ $q$-valued arguments; the cardinalities of new testing sets are smaller than for testing sets known before.
Keywords: frequency hypercube, correlation-immune function, latin hypercube, testing set.
Submitted 12 June, 2024; v1 submitted 7 December, 2022;
originally announced December 2022.
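The defining property of a frequency $n$-cube stated in the abstract above (every line contains exactly $l_i$ cells with symbol $i$) can be verified directly for small parameters. This is a minimal sketch under stated assumptions: the function name `is_frequency_cube` and the example arrays are hypothetical and are not taken from the paper.

```python
from itertools import product

def is_frequency_cube(n, q, freqs, cell):
    """Check that every line of an n-dimensional q-by-...-by-q array contains
    exactly freqs[i] cells with symbol i.

    `cell` maps a coordinate tuple from range(q)**n to a symbol in
    range(len(freqs)).
    """
    if sum(freqs) != q:
        return False
    for axis in range(n):
        # A line is fixed by the other n-1 coordinates; `axis` is the free one.
        for rest in product(range(q), repeat=n - 1):
            counts = [0] * len(freqs)
            for x in range(q):
                point = rest[:axis] + (x,) + rest[axis:]
                counts[cell(point)] += 1
            if counts != list(freqs):
                return False
    return True

# Illustrative arrays (assumptions): a checkerboard, a latin square, stripes.
checkerboard = is_frequency_cube(2, 4, (2, 2), lambda c: (c[0] + c[1]) % 2)
latin_square = is_frequency_cube(2, 3, (1, 1, 1), lambda c: (c[0] + c[1]) % 3)
stripes = is_frequency_cube(2, 4, (2, 2), lambda c: c[0] % 2)
```

The cyclic latin square is the special case $l_0 = \dots = l_{m-1} = 1$; the striped array fails because the lines along the second coordinate are constant.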
-
Multifold 1-perfect codes
Authors:
Denis S. Krotov
Abstract:
A multifold $1$-perfect code ($1$-perfect code for list decoding) in any graph is a set $C$ of vertices such that every vertex of the graph is at distance not more than $1$ from exactly $μ$ elements of $C$. In $q$-ary Hamming graphs, where $q$ is a prime power, we characterize all parameters of multifold $1$-perfect codes and all parameters of additive multifold $1$-perfect codes. In particular, we show that additive multifold $1$-perfect codes are related to special multiset generalizations of spreads, multispreads, and that multispreads of parameters corresponding to multifold $1$-perfect codes always exist.
Keywords: perfect codes, multifold packing, multiple covering, list-decoding codes, additive codes, spreads, multispreads, completely regular codes, intriguing sets.
Submitted 15 December, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
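The definition in the abstract above (every vertex is within distance $1$ of exactly $μ$ codewords) lends itself to a brute-force check in a small Hamming graph $H(n,q)$. The sketch below is illustrative only; the function name `multiplicity` and the example codes are hypothetical.

```python
from itertools import product

def multiplicity(n, q, code):
    """Return mu if `code` is a mu-fold 1-perfect code in H(n, q), else None.

    `code` is an iterable of length-n tuples with entries in range(q).
    """
    code = set(code)
    mu = None
    for v in product(range(q), repeat=n):
        # Count codewords in the radius-1 ball around v (v itself included).
        count = 1 if v in code else 0
        for pos in range(n):
            for a in range(q):
                if a != v[pos]:
                    u = v[:pos] + (a,) + v[pos + 1:]
                    if u in code:
                        count += 1
        if mu is None:
            mu = count
        elif count != mu:
            return None  # ball sizes differ between vertices
    return mu

# Illustrative codes (assumptions): repetition code, whole space, even weights.
repetition = multiplicity(3, 2, [(0, 0, 0), (1, 1, 1)])
whole_space = multiplicity(2, 3, product(range(3), repeat=2))
even_weight = multiplicity(3, 2, [w for w in product((0, 1), repeat=3)
                                  if sum(w) % 2 == 0])
```

The binary repetition code of length $3$ is an ordinary ($μ=1$) perfect code, the whole space $H(2,3)$ is trivially $5$-fold since every ball has $1+2\cdot 2$ vertices, and the even-weight code in $H(3,2)$ is not multifold.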
-
A family of diameter perfect constant-weight codes from Steiner systems
Authors:
Minjia Shi,
Yuhong Xia,
Denis S. Krotov
Abstract:
If $S$ is a transitive metric space, then $|C|\cdot|A| \le |S|$ for any distance-$d$ code $C$ and a set $A$, ``anticode'', of diameter less than $d$. For every Steiner S$(t,k,n)$ system $S$, we show the existence of a $q$-ary constant-weight code $C$ of length~$n$, weight~$k$ (or $n-k$), and distance $d=2k-t+1$ (respectively, $d=n-t+1$) and an anticode $A$ of diameter $d-1$ such that the pair $(C,A)$ attains the code--anticode bound and the supports of the codewords of $C$ are the blocks of $S$ (respectively, the complements of the blocks of $S$). We study the problem of estimating the minimum value of $q$ for which such a code exists, and find that minimum for small values of $t$.
Keywords: diameter perfect codes, anticodes, constant-weight codes, code--anticode bound, Steiner systems.
Submitted 31 July, 2023; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Constructing MRD codes by switching
Authors:
Minjia Shi,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
MRD codes are maximum codes in the rank-distance metric space on $m$-by-$n$ matrices over the finite field of order $q$. They are diameter perfect and have the cardinality $q^{m(n-d+1)}$ if $m\ge n$. We define switching in MRD codes as replacing special MRD subcodes by other subcodes with the same parameters. We consider constructions of MRD codes admitting such switching, including punctured twisted Gabidulin codes and direct-product codes. Using switching, we construct a huge class of MRD codes whose cardinality grows doubly exponentially in $m$ if the other parameters ($n$, $q$, the code distance) are fixed. Moreover, we construct MRD codes with different affine ranks and aperiodic MRD codes.
Keywords: MRD codes, rank distance, bilinear forms graph, switching, diameter perfect codes
Submitted 1 November, 2022;
originally announced November 2022.
-
Associative Learning for Network Embedding
Authors:
Yuchen Liang,
Dmitry Krotov,
Mohammed J. Zaki
Abstract:
The network embedding task is to represent the node in the network as a low-dimensional vector while incorporating the topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different downstream tasks such as node classification and linkage prediction. The results show competitive performance compared to the common matrix factorization techniques and deep learning based methods.
Submitted 30 August, 2022;
originally announced August 2022.
-
Projective tilings and full-rank perfect codes
Authors:
Denis S. Krotov
Abstract:
A tiling of a vector space $S$ is the pair $(U,V)$ of its subsets such that every vector in $S$ is uniquely represented as the sum of a vector from $U$ and a vector from $V$. A tiling is connected to a perfect code if one of the sets, say $U$, is projective, i.e., the union of one-dimensional subspaces of $S$. A tiling $(U,V)$ is full-rank if the affine span of each of $U$, $V$ is $S$. For finite non-binary vector spaces of dimension at least $6$ (at least $10$), we construct full-rank tilings $(U,V)$ with projective $U$ (both $U$ and $V$, respectively). In particular, that construction gives a full-rank ternary $1$-perfect code of length $13$, solving a known problem. We also discuss the treatment of tilings with projective components as factorizations of projective spaces.
Keywords: perfect codes, tilings, group factorization, full-rank tilings, projective geometry
Submitted 12 June, 2024; v1 submitted 30 June, 2022;
originally announced July 2022.
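The tiling condition in the abstract above (every vector is uniquely a sum $u+v$) can be tested exhaustively in a small space. The following is a rough sketch that assumes, for simplicity, a space $F_p^n$ over a prime field; the function name `is_tiling` and the example sets are hypothetical. The first example pairs the radius-$1$ ball around zero with the binary repetition code of length $3$, which illustrates the standard link between $1$-perfect codes and tilings.

```python
def is_tiling(n, p, U, V):
    """Check that every vector of F_p^n (p prime) equals u + v for exactly one
    pair (u, v) with u in U and v in V.

    U and V are collections of distinct length-n tuples with entries in range(p).
    """
    sums = [tuple((a + b) % p for a, b in zip(u, v)) for u in U for v in V]
    # p**n pairwise distinct sums cover the whole space exactly once.
    return len(sums) == p ** n and len(set(sums)) == p ** n

# Illustrative sets (assumptions): a radius-1 ball and the repetition code.
ball = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
repetition_code = [(0, 0, 0), (1, 1, 1)]
good = is_tiling(3, 2, ball, repetition_code)
bad = is_tiling(3, 2, ball, [(0, 0, 0), (1, 0, 0)])
```

In the second example the vector $(1,0,0)$ arises both as $(1,0,0)+(0,0,0)$ and as $(0,0,0)+(1,0,0)$, so uniqueness fails.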
-
On the coset graph construction of distance-regular graphs
Authors:
Minjia Shi,
Denis S. Krotov,
Patrick Solé
Abstract:
We show that no more new distance-regular graphs in the tables of the book of (Brouwer, Cohen, Neumaier, 1989) can be produced by using the coset graph of additive completely regular codes over finite fields.
Submitted 31 May, 2022;
originally announced June 2022.
-
Self-dual Hadamard bent sequences
Authors:
Minjia Shi,
Yaya Li,
Wei Cheng,
Dean Crnković,
Denis Krotov,
Patrick Solé
Abstract:
A new notion of bent sequence related to Hadamard matrices was introduced recently, motivated by a security application (Solé et al., 2021). We study the self-dual class in length at most $196$. We use three competing methods of generation: Exhaustion, Linear Algebra and Groebner bases. Regular Hadamard matrices and Bush-type Hadamard matrices provide many examples. We conjecture that if $v$ is an even perfect square, a self-dual bent sequence of length $v$ always exists. We introduce the strong automorphism group of Hadamard matrices, which acts on their associated self-dual bent sequences. We give an efficient algorithm to compute that group.
Submitted 22 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
An enumeration of 1-perfect ternary codes
Authors:
Minjia Shi,
Denis S. Krotov
Abstract:
We study codes with parameters of the ternary Hamming $(n=(3^m-1)/2,3^{n-m},3)$ code, i.e., ternary $1$-perfect codes. The rank of the code is defined to be the dimension of its affine span. We characterize ternary $1$-perfect codes of rank $n-m+1$, count their number, and prove that all such codes can be obtained from each other by a sequence of two-coordinate switchings. We enumerate ternary $1$-perfect codes of length $13$ obtained by concatenation from codes of lengths $9$ and $4$; we find that there are $93241327$ equivalence classes of such codes.
Keywords: perfect codes, ternary codes, concatenation, switching.
Submitted 8 April, 2023; v1 submitted 12 October, 2021;
originally announced October 2021.
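As a quick sanity check of the parameters $(n=(3^m-1)/2,3^{n-m},3)$ quoted in the abstract above, the worked equation below confirms that radius-$1$ balls around the codewords exactly fill the ternary Hamming space. The symbols $C$ (the code) and $B$ (a radius-$1$ ball) are notation introduced here for illustration.

```latex
\[
  |B| = 1 + 2n = 3^m, \qquad |C|\cdot|B| = 3^{n-m}\cdot 3^m = 3^n .
\]
\[
  m = 3:\quad n = \frac{3^3-1}{2} = 13,\quad |C| = 3^{10} = 59049,\quad |B| = 27 .
\]
```

The case $m=3$ gives the length $13$ considered in the enumeration.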
-
On $q$-ary shortened-$1$-perfect-like codes
Authors:
Minjia Shi,
Rongsheng Wu,
Denis S. Krotov
Abstract:
We study codes with parameters of $q$-ary shortened Hamming codes, i.e., $(n=(q^m-q)/(q-1), q^{n-m}, 3)_q$. Firstly, we prove the fact mentioned in 1998 by Brouwer et al. that such codes are optimal, generalizing it to a bound for multifold packings of radius-$1$ balls, with a corollary for multiple coverings. In particular, we show that the punctured Hamming code is an optimal $q$-fold packing with minimum distance $2$. Secondly, for every admissible length starting from $n=20$, we show the existence of $4$-ary codes with parameters of shortened $1$-perfect codes that cannot be obtained by shortening a $1$-perfect code.
Keywords: Hamming graph, multifold packings, multiple coverings, perfect codes.
Submitted 28 June, 2023; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Hierarchical Associative Memory
Authors:
Dmitry Krotov
Abstract:
Dense Associative Memories or Modern Hopfield Networks have many appealing properties of associative memory. They can do pattern completion, store a large number of memories, and can be described using a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. At the same time, up until now all the models of this class have had only one hidden layer, and have only been formulated with densely connected network architectures, two aspects that hinder their machine learning applications. This paper tackles this gap and describes a fully recurrent model of associative memory with an arbitrarily large number of layers, some of which can be locally connected (convolutional), and a corresponding energy function that decreases on the dynamical trajectory of the neurons' activations. The memories of the full network are dynamically "assembled" using primitives encoded in the synaptic weights of the lower layers, with the "assembling rules" encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information, typical of commonly used feedforward neural networks, the model described has rich top-down feedback from higher layers that helps the lower-layer neurons to decide on their response to the input stimuli.
Submitted 13 July, 2021;
originally announced July 2021.
-
Can a Fruit Fly Learn Word Embeddings?
Authors:
Yuchen Liang,
Chaitanya K. Ryali,
Benjamin Hoover,
Leopold Grinberg,
Saket Navlakha,
Mohammed J. Zaki,
Dmitry Krotov
Abstract:
The mushroom body of the fruit fly brain is one of the best studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse high dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint).
Submitted 14 March, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
On extended 1-perfect bitrades
Authors:
Evgeny Bespalov,
Denis Krotov
Abstract:
We prove the equivalence of several definitions of extended $1$-perfect bitrades in the Hamming graph $H(n,q)$ and prove the nonexistence of such bitrades for odd $n$.
Submitted 3 December, 2020;
originally announced December 2020.
-
Equitable [[2,10],[6,6]]-partitions of the 12-cube
Authors:
Denis S. Krotov
Abstract:
We describe the computer-aided classification of equitable partitions of the $12$-cube with quotient matrix $[[2,10],[6,6]]$, or, equivalently, simple orthogonal arrays OA$(1536,12,2,7)$, or order-$7$ correlation-immune Boolean functions in $12$ variables with $1536$ ones (which completes the classification of unbalanced order-$7$ correlation-immune Boolean functions in $12$ variables). We find that there are $103$ equivalence classes of the considered objects, and there are only two almost-OA$(1536,12,2,8)$ among them. Additionally, we find that there are $40$ equivalence classes of pairs of disjoint simple OA$(1536,12,2,7)$ (equivalently, equitable partitions of the $12$-cube with quotient matrix $[[2,6,4], [6,2,4], [6,6,0]]$) and discuss the existence of a non-simple OA$(1536,12,2,7)$.
Keywords: orthogonal arrays, correlation-immune Boolean functions, equitable partitions, perfect colorings, intriguing sets.
Submitted 1 October, 2023; v1 submitted 30 November, 2020;
originally announced December 2020.
-
On minimal subspace Zp-null designs
Authors:
Denis S. Krotov
Abstract:
Let $q$ be a power of a prime $p$, and let $V$ be an $n$-dimensional space over the field GF$(q)$. A $Z_p$-valued function $C$ on the set of $k$-dimensional subspaces of $V$ is called a $k$-uniform $Z_p$-null design of strength $t$ if for every $t$-dimensional subspace $y$ of $V$ the sum of $C$ over the $k$-dimensional superspaces of $y$ equals $0$. For $q=p=2$ and $0\le t<k<n$, we prove that the minimum number of non-zeros of a non-void $k$-uniform $Z_p$-null design of strength $t$ equals $2^{t+1}$. For $q>2$, we give lower and upper bounds for that number.
Submitted 30 November, 2020;
originally announced December 2020.
-
Large Associative Memory Problem in Neurobiology and Machine Learning
Authors:
Dmitry Krotov,
John Hopfield
Abstract:
Dense Associative Memories or modern Hopfield networks permit storage and reliable retrieval of an exponentially large (in the dimension of feature space) number of memories. At the same time, their naive implementation is non-biological, since it seemingly requires the existence of many-body synaptic junctions between the neurons. We show that these models are effective descriptions of a more microscopic (written in terms of biological degrees of freedom) theory that has additional (hidden) neurons and only requires two-body interactions between them. For this reason our proposed microscopic theory is a valid model of large associative memory with a degree of biological plausibility. The dynamics of our network and its reduced dimensional equivalent both minimize energy (Lyapunov) functions. When certain dynamical variables (hidden neurons) are integrated out from our microscopic theory, one can recover many of the models that were previously discussed in the literature, e.g. the model presented in the "Hopfield Networks is All You Need" paper. We also provide an alternative derivation of the energy function and the update rule proposed in the aforementioned paper and clarify the relationships between various models of this class.
Submitted 27 April, 2021; v1 submitted 16 August, 2020;
originally announced August 2020.
-
On the number of frequency hypercubes $F^n(4;2,2)$
Authors:
Minjia Shi,
Shukai Wang,
Xiaoxiao Li,
Denis S. Krotov
Abstract:
A frequency $n$-cube $F^n(4;2,2)$ is an $n$-dimensional $4$-by-...-by-$4$ array filled by $0$s and $1$s such that each line contains exactly two $1$s. We classify the frequency $4$-cubes $F^4(4;2,2)$, find a testing set of size $25$ for $F^3(4;2,2)$, and derive an upper bound on the number of $F^n(4;2,2)$. Additionally, for any $n$ greater than $2$, we construct an $F^n(4;2,2)$ that cannot be refined to a latin hypercube, while each of its sub-$F^{n-1}(4;2,2)$ can.
Keywords: frequency hypercube, frequency square, latin hypercube, testing set, MDS code
Submitted 21 April, 2021; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Bio-Inspired Hashing for Unsupervised Similarity Search
Authors:
Chaitanya K. Ryali,
John J. Hopfield,
Leopold Grinberg,
Dmitry Krotov
Abstract:
The fruit fly Drosophila's olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm BioHash that produces sparse high dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant BioConvHash that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
Submitted 30 June, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
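The abstract above describes FlyHash only as a random-projection scheme that yields sparse high-dimensional hash codes. A minimal sketch of that idea follows, under assumptions not stated in the abstract: the projection is taken to be sparse and binary (each output unit sums a few randomly chosen inputs), and sparsity is enforced by a top-k winner-take-all step. The names `make_sparse_projection`, `fly_hash`, `fan_in`, and `k` are hypothetical, and this is not the data-driven BioHash algorithm.

```python
import random


def make_sparse_projection(input_dim, hash_dim, fan_in, seed=0):
    """For each of hash_dim output units, pick fan_in input indices at random.

    Assumption: a sparse binary random projection, fixed by the seed.
    """
    rng = random.Random(seed)
    return [rng.sample(range(input_dim), fan_in) for _ in range(hash_dim)]


def fly_hash(x, projection, k):
    """Return a binary code of length len(projection) with exactly k ones."""
    # Each output unit sums the inputs it is connected to.
    activations = [sum(x[j] for j in idx) for idx in projection]
    # Winner-take-all: keep the k most active units (ties broken by index).
    order = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    code = [0] * len(activations)
    for i in order[:k]:
        code[i] = 1
    return code


# Expand an 8-dimensional input into a 32-bit code with 4 active bits.
projection = make_sparse_projection(input_dim=8, hash_dim=32, fan_in=3)
x = [0.9, 0.1, 0.0, 0.4, 0.7, 0.2, 0.0, 0.3]
code = fly_hash(x, projection, k=4)
```

The code is longer than the input but mostly zeros, which is the "sparse expansive" property the abstract refers to. A learned variant would replace the random index sets with weights fitted to data.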
-
Local Unsupervised Learning for Image Analysis
Authors:
Leopold Grinberg,
John Hopfield,
Dmitry Krotov
Abstract:
Local Hebbian learning is believed to be inferior in performance to end-to-end training using a backpropagation algorithm. We question this popular belief by designing a local algorithm that can learn convolutional filters at scale on large image datasets. These filters combined with patch normalization and very steep non-linearities result in a good classification accuracy for shallow networks trained locally, as opposed to end-to-end. The filters learned by our algorithm contain both orientation selective units and unoriented color units, resembling the responses of pyramidal neurons located in the cytochrome oxidase 'interblob' and 'blob' regions in the primary visual cortex of primates. It is shown that convolutional networks with patch normalization significantly outperform standard convolutional networks on the task of recovering the original classes when shadows are superimposed on top of standard CIFAR-10 images. Patch normalization approximates the retinal adaptation to the mean light intensity, important for human vision. We also demonstrate a successful transfer of learned representations between CIFAR-10 and ImageNet 32x32 datasets. All these results taken together hint at the possibility that local unsupervised training might be a powerful tool for learning general representations (without specifying the task) directly from unlabeled data.
Submitted 14 August, 2019;
originally announced August 2019.
-
On the number of resolvable Steiner triple systems of small 3-rank
Authors:
Minjia Shi,
Li Xu,
Denis S. Krotov
Abstract:
In a recent work, Jungnickel, Magliveras, Tonchev, and Wassermann derived an overexponential lower bound on the number of nonisomorphic resolvable Steiner triple systems (STS) of order $v$, where $v=3^k$, and $3$-rank $v-k$. We develop an approach to generalize this bound and estimate the number of isomorphism classes of STS$(v)$ of rank $v-k-1$ for an arbitrary $v$ of form $3^kT$.
Submitted 29 June, 2019;
originally announced July 2019.
-
On the OA(1536,13,2,7) and related orthogonal arrays
Authors:
Denis S. Krotov
Abstract:
With a computer-aided approach based on the connection with equitable partitions, we establish the uniqueness of the orthogonal array OA$(1536,13,2,7)$, constructed in [D.G.Fon-Der-Flaass. Perfect $2$-Colorings of a Hypercube, Sib. Math. J. 48 (2007), 740-745] as an equitable partition of the $13$-cube with quotient matrix $[[0,13],[3,10]]$. By shortening the OA$(1536,13,2,7)$, we obtain $3$ inequivalent orthogonal arrays OA$(768,12,2,6)$, which is a complete classification for these parameters too. After our computations, the first parameters of unclassified binary orthogonal arrays OA$(N,n,2,t)$ attaining the Friedman bound $N\ge 2^n(1-n/(2(t+1)))$ are OA$(2048,14,2,7)$. Such an array can be obtained by puncturing any binary $1$-perfect code of length $15$. We construct orthogonal arrays with these and similar parameters OA$(N=2^{n-m+1},n=2^m-2,2,t=2^{m-1}-1)$, $m\ge 4$, that are not punctured $1$-perfect codes. Additionally, we prove that any orthogonal array OA$(N,n,2,t)$ with even $t$ attaining the bound $N \ge 2^n(1-(n+1)/(2(t+2)))$ induces an equitable $3$-partition of the $n$-cube.
Submitted 9 December, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
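The two bounds quoted in the abstract above can be checked numerically against the parameters it names. The sketch below reads the Friedman bound as $N \ge 2^n\left(1-\frac{n}{2(t+1)}\right)$ and the even-$t$ bound as $N \ge 2^n\left(1-\frac{n+1}{2(t+2)}\right)$; the function names are hypothetical, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction


def friedman_bound(n, t):
    """Lower bound on N for a binary OA(N, n, 2, t): 2^n * (1 - n / (2(t+1)))."""
    return Fraction(2 ** n) * (1 - Fraction(n, 2 * (t + 1)))


def even_t_bound(n, t):
    """Bound quoted for even t: 2^n * (1 - (n+1) / (2(t+2)))."""
    return Fraction(2 ** n) * (1 - Fraction(n + 1, 2 * (t + 2)))


def attains(N, bound):
    """True when the array size N meets the bound with equality."""
    return bound == N


# Parameters named in the abstract.
oa_1536 = attains(1536, friedman_bound(13, 7))   # OA(1536,13,2,7)
oa_2048 = attains(2048, friedman_bound(14, 7))   # OA(2048,14,2,7)
oa_768 = attains(768, even_t_bound(12, 6))       # OA(768,12,2,6), even t

# The family OA(2^(n-m+1), n = 2^m - 2, 2, t = 2^(m-1) - 1) for m >= 4.
family = [
    attains(2 ** (2 ** m - 2 - m + 1), friedman_bound(2 ** m - 2, 2 ** (m - 1) - 1))
    for m in range(4, 8)
]
```

Under this reading all three named arrays meet their bound with equality, as does every member of the family for the values of $m$ tried.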
-
The Steiner triple systems of order 21 with a transversal subdesign TD(3,6)
Authors:
Yue Guan,
Minjia Shi,
Denis S. Krotov
Abstract:
We prove several structural properties of Steiner triple systems (STS) of order 3w+3 that include one or more transversal subdesigns TD(3,w). Using an exhaustive search, we find that there are 2004720 isomorphism classes of STS(21) including a subdesign TD(3,6), or, equivalently, a 6-by-6 latin square.
Submitted 23 May, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
On the number of autotopies of an $n$-ary quasigroup of order $4$
Authors:
Denis S. Krotov,
Evgeny V. Gorkunov,
Vladimir N. Potapov
Abstract:
An algebraic system from a finite set $Σ$ of cardinality $k$ and an $n$-ary operation $f$ invertible in each argument is called an $n$-ary quasigroup of order $k$. An autotopy of an $n$-ary quasigroup $(Σ,f)$ is a collection $(θ_0,θ_1,...,θ_n)$ of $n+1$ permutations of $Σ$ such that $f(θ_1(x_1),...,θ_n(x_n))\equiv θ_0(f(x_1,\ldots,x_n))$. We show that every $n$-ary quasigroup of order $4$ has at least $2^{[n/2]+2}$ and not more than $6\cdot 4^n$ autotopies. We characterize the $n$-ary quasigroups of order $4$ with $2^{(n+3)/2}$, $2\cdot 4^n$, and $6\cdot 4^n$ autotopies.
Submitted 1 March, 2019;
originally announced March 2019.
-
On multifold packings of radius-1 balls in Hamming graphs
Authors:
Denis S. Krotov,
Vladimir N. Potapov
Abstract:
A $λ$-fold $r$-packing (multiple radius-$r$ covering) in a Hamming metric space is a code $C$ such that the radius-$r$ balls centered in $C$ cover each vertex of the space at most (at least, respectively) $λ$ times. The well-known $r$-error-correcting codes correspond to the case $λ=1$, while in general multifold $r$-packings are related to list-decodable codes. We (a) propose asymptotic bounds for the maximum size of a $q$-ary $2$-fold $1$-packing as $q$ grows; (b) prove that a $q$-ary distance-$2$ MDS code of length $n$ is an optimal $n$-fold $1$-packing if $q\ge 2n$; (c) derive an upper bound for the size of a binary $λ$-fold $1$-packing and a lower bound for the size of a binary multiple radius-$1$ covering (the latter bound allows us to update the small-parameters table); (d) classify all optimal binary $2$-fold $1$-packings up to length $9$, in particular, establish the maximum size $96$ of a binary $2$-fold $1$-packing of length $9$; (e) prove some properties of $1$-perfect unitrades, which are a special case of $2$-fold $1$-packings.
Keywords: Hamming graph, multifold ball packings, two-fold ball packings, list decodable codes, multiple coverings, completely regular codes, linear programming bound
Submitted 13 May, 2020; v1 submitted 31 January, 2019;
originally announced February 2019.
-
On $(2n/3-1)$-resilient $(n,2)$-functions
Authors:
Denis S. Krotov
Abstract:
A $\{00,01,10,11\}$-valued function on the vertices of the $n$-cube is called a $t$-resilient $(n,2)$-function if it has the same number of $00$s, $01$s, $10$s and $11$s among the vertices of every subcube of dimension $t$. The Friedman and Fon-Der-Flaass bounds on the correlation immunity order say that such a function must satisfy $t\le 2n/3-1$; moreover, the $(2n/3-1)$-resilient $(n,2)$-functions correspond to the equitable partitions of the $n$-cube with the quotient matrix $[[0,r,r,r],[r,0,r,r],[r,r,0,r],[r,r,r,0]]$, $r=n/3$. We suggest constructions of such functions and corresponding partitions, show connections with Latin hypercubes and binary $1$-perfect codes, characterize the non-full-rank and the reducible functions from the considered class, and discuss the possibility to make a complete characterization of the class.
Submitted 31 January, 2019;
originally announced February 2019.
-
On dual codes in the Doob schemes
Authors:
Denis S. Krotov
Abstract:
The Doob scheme $D(m,n'+n'')$ is a metric association scheme defined on $E_4^m \times F_4^{n'}\times Z_4^{n''}$, where $E_4=GR(4^2)$ or, alternatively, on $Z_4^{2m} \times Z_2^{2n'} \times Z_4^{n''}$. We prove the MacWilliams identities connecting the weight distributions of a linear or additive code and its dual. In particular, for each case, we determine the dual scheme, on the same set but with different metric, such that the weight distribution of an additive code $C$ in the Doob scheme $D(m,n'+n'')$ is related by the MacWilliams identities with the weight distribution of the dual code $C^\perp$ in the dual scheme. We note that in the case of a linear code $C$ in $E_4^m \times F_4^{n'}$, the weight distributions of $C$ and $C^\perp$ in the same scheme are also connected.
Submitted 31 January, 2019;
originally announced February 2019.
-
On completely regular codes of covering radius 1 in the halved hypercubes
Authors:
Denis S. Krotov,
Ivan Yu. Mogilnykh,
Anastasia Yu. Vasil'eva
Abstract:
We consider constructions of covering-radius-1 completely regular codes, or, equivalently, equitable 2-partitions (regular 2-partitions, perfect 2-colorings), of halved n-cubes.
Keywords: completely regular code, equitable partition, regular partition, partition design, perfect coloring, halved hypercube.
Submitted 7 December, 2018;
originally announced December 2018.
-
On the gaps of the spectrum of volumes of trades
Authors:
Denis S. Krotov
Abstract:
A pair $\{T_0,T_1\}$ of disjoint collections of $k$-subsets (blocks) of a set $V$ of cardinality $v$ is called a $t$-$(v,k)$ trade or simply a $t$-trade if every $t$-subset of $V$ is included in the same number of blocks of $T_0$ and $T_1$. The cardinality of $T_0$ is called the volume of the trade. Using the weight distribution of the Reed--Muller code, we prove the conjecture that for every $i$ from $2$ to $t$, there are no $t$-trades of volume greater than $2^{t+1}-2^i$ and less than $2^{t+1}-2^{i-1}$ and derive restrictions on the $t$-trade volumes that are less than $2^{t+1}+2^{t-1}$.
Submitted 31 October, 2018;
originally announced November 2018.
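To make the gap statement in the abstract above concrete, the following sketch enumerates the volumes it excludes for a given $t$: every integer strictly between $2^{t+1}-2^i$ and $2^{t+1}-2^{i-1}$, for $i$ from $2$ to $t$. The function name `forbidden_volumes` is hypothetical, and the sketch does not cover the further restrictions below $2^{t+1}+2^{t-1}$ that the abstract mentions without spelling out.

```python
def forbidden_volumes(t):
    """List the t-trade volumes excluded by the gap result, in increasing order."""
    gaps = set()
    for i in range(2, t + 1):
        low = 2 ** (t + 1) - 2 ** i          # lower end of the i-th gap (exclusive)
        high = 2 ** (t + 1) - 2 ** (i - 1)   # upper end of the i-th gap (exclusive)
        gaps.update(range(low + 1, high))
    return sorted(gaps)


gaps_t2 = forbidden_volumes(2)
gaps_t3 = forbidden_volumes(3)
```

For $t=3$ the gaps lie between $8$ and $12$ and between $12$ and $14$, so the excluded volumes are $9$, $10$, $11$, and $13$.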
-
The existence of perfect codes in Doob graphs
Authors:
Denis S. Krotov
Abstract:
We solve the problem of existence of perfect codes in the Doob graph. It is shown that 1-perfect codes in the Doob graph D(m,n) exist if and only if 6m+3n+1 is a power of 2; that is, if the size of a 1-ball divides the number of vertices.
Keywords: perfect codes, distance-regular graphs, Doob graphs, Eisenstein-Jacobi integers.
Submitted 17 February, 2022; v1 submitted 8 October, 2018;
originally announced October 2018.
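The existence criterion in the abstract above reduces to simple arithmetic, sketched below. Two standard facts are assumed beyond what the abstract states: D(m,n) has 4^(2m+n) vertices (the Shrikhande graph has 16, the complete graph of order 4 has 4), and every vertex has 6m+3n neighbours, so a 1-ball contains 6m+3n+1 vertices. The function names are hypothetical.

```python
def ball_size(m, n):
    """Number of vertices in a radius-1 ball of the Doob graph D(m, n)."""
    return 6 * m + 3 * n + 1


def is_power_of_two(x):
    """True for 1, 2, 4, 8, ..."""
    return x > 0 and x & (x - 1) == 0


def has_1_perfect_code(m, n):
    """Criterion from the abstract: a 1-perfect code exists iff 6m+3n+1 is a power of 2."""
    return is_power_of_two(ball_size(m, n))


def perfect_code_size(m, n):
    """Size of a 1-perfect code in D(m, n): vertex count divided by ball size."""
    if not has_1_perfect_code(m, n):
        raise ValueError("no 1-perfect code for these parameters")
    return 4 ** (2 * m + n) // ball_size(m, n)


small_cases = {(m, n): has_1_perfect_code(m, n) for m in (1, 2) for n in range(4)}
```

For example, D(1,3) and D(2,1) both have ball size 16 and pass the test, while D(1,0), the Shrikhande graph itself, has ball size 7 and does not. The parameters D(155,31) and D(2667,127), which appear elsewhere in this listing, give ball sizes 1024 and 16384.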
-
A new approach to the Kasami codes of type 2
Authors:
Minjia Shi,
Denis Krotov,
Patrick Solé
Abstract:
The dual of the Kasami code of length $q^2-1$, with $q$ a power of $2$, is constructed by concatenating a cyclic MDS code of length $q+1$ over $F_q$ with a Simplex code of length $q-1$. This yields a new derivation of the weight distribution of the Kasami code, a new description of its coset graph, and a new proof that the Kasami code is completely regular. The automorphism groups of the Kasami code and the related $q$-ary MDS code are determined. New cyclic completely regular codes over finite fields of order a power of $2$, generalized Kasami codes, are constructed; they have coset graphs isomorphic to that of the Kasami codes. Another wide class of completely regular codes, including additive codes, as well as unrestricted codes, is obtained by combining cosets of the Kasami or generalized Kasami code.
Submitted 26 June, 2023; v1 submitted 28 September, 2018;
originally announced October 2018.
-
On $Z_pZ_{p^k}$-additive codes and their duality
Authors:
Minjia Shi,
Rongsheng Wu,
Denis S. Krotov
Abstract:
In this paper, two different Gray-like maps from $Z_p^α\times Z_{p^k}^β$, where $p$ is prime, to $Z_p^n$, $n={α+βp^{k-1}}$, denoted by $φ$ and $Φ$, respectively, are presented. We have determined the connection between the weight enumerators among the image codes under these two mappings. We show that if $C$ is a $Z_p Z_{p^k}$-additive code, and $C^\bot$ is its dual, then the weight enumerators of the image $p$-ary codes $φ(C)$ and $Φ(C^\bot)$ are formally dual. This is a partial generalization of [On $Z_{2^k}$-dual binary codes, arXiv:math/0509325], and the result is generalized to odd characteristic $p$ and mixed alphabet. Additionally, a construction of $1$-perfect additive codes in the mixed $Z_p Z_{p^2} ... Z_{p^k}$ alphabet is given.
Submitted 5 January, 2019; v1 submitted 31 August, 2018;
originally announced September 2018.
-
Unsupervised Learning by Competing Hidden Units
Authors:
Dmitry Krotov,
John Hopfield
Abstract:
It is widely believed that the backpropagation algorithm is essential for learning good feature detectors in early layers of artificial neural networks, so that these detectors are useful for the task performed by the higher layers of that neural network. At the same time, the traditional form of backpropagation is biologically implausible. In the present paper we propose an unusual learning rule, which has a degree of biological plausibility, and which is motivated by Hebb's idea that change of the synapse strength should be local, i.e., should depend only on the activities of the pre- and postsynaptic neurons. We design a learning algorithm that utilizes global inhibition in the hidden layer, and is capable of learning early feature detectors in a completely unsupervised way. These learned lower layer feature detectors can be used to train higher layer weights in a usual supervised way so that the performance of the full network is comparable to the performance of standard feedforward networks trained end-to-end with a backpropagation algorithm.
Submitted 28 August, 2019; v1 submitted 26 June, 2018;
originally announced June 2018.
-
A new distance-regular graph of diameter 3 on 1024 vertices
Authors:
Minjia Shi,
Denis Krotov,
Patrick Solé
Abstract:
The dodecacode is a nonlinear additive quaternary code of length $12$. By puncturing it at any of the twelve coordinates, we obtain a uniformly packed code of distance $5$. In particular, this latter code is completely regular but not completely transitive. Its coset graph is distance-regular of diameter three on $2^{10}$ vertices, with new intersection array $\{33,30,15;1,2,15\}$. The automorphism groups of the code, and of the graph, are determined. Connecting the vertices at distance two gives a strongly regular graph of (previously known) parameters $(2^{10},495,238,240)$. Another strongly regular graph with the same parameters is constructed on the codewords of the dual code. A nontrivial completely regular binary code of length $33$ is constructed.
Submitted 5 November, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Additive perfect codes in Doob graphs
Authors:
Minjia Shi,
Daitao Huang,
Denis S. Krotov
Abstract:
The Doob graph $D(m,n)$ is the Cartesian product of $m>0$ copies of the Shrikhande graph and $n$ copies of the complete graph of order $4$. Naturally, $D(m,n)$ can be represented as a Cayley graph on the additive group $(Z_4^2)^m \times (Z_2^2)^{n'} \times Z_4^{n''}$, where $n'+n''=n$. A set of vertices of $D(m,n)$ is called an additive code if it forms a subgroup of this group. We construct a $3$-parameter class of additive perfect codes in Doob graphs and show that the known necessary conditions for the existence of additive $1$-perfect codes in $D(m,n'+n'')$ are sufficient. Additionally, two quasi-cyclic additive $1$-perfect codes are constructed in $D(155,0+31)$ and $D(2667,0+127)$.
Submitted 18 November, 2018; v1 submitted 12 June, 2018;
originally announced June 2018.
-
Dense Associative Memory is Robust to Adversarial Inputs
Authors:
Dmitry Krotov,
John J Hopfield
Abstract:
Deep neural networks (DNN) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNN and humans classify patterns, and raise the question of how to design learning algorithms that mimic human perception more accurately than existing methods.
Our paper examines these questions within the framework of Dense Associative Memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with a small power of the interaction vertex, which are equivalent to DNN with rectified linear units (ReLU), fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The presented results suggest that DAM with higher-order energy functions are closer to human visual perception than DNN with ReLUs.
Submitted 4 January, 2017;
originally announced January 2017.
-
On the number of maximum independent sets in Doob graphs
Authors:
Denis Krotov
Abstract:
The Doob graph $D(m,n)$ is a distance-regular graph with the same parameters as the Hamming graph $H(2m+n,4)$. The maximum independent sets in the Doob graphs are analogs of the distance-$2$ MDS codes in the Hamming graphs. We prove that the logarithm of the number of the maximum independent sets in $D(m,n)$ grows as $2^{2m+n-1}(1+o(1))$. The main tool for the upper bound is the construction of an injective map from the class of maximum independent sets in $D(m,n)$ to the class of distance-$2$ MDS codes in $H(2m+n,4)$.
Submitted 30 November, 2016;
originally announced December 2016.
-
Dense Associative Memory for Pattern Recognition
Authors:
Dmitry Krotov,
John J Hopfield
Abstract:
A model of associative memory is studied, which stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to feedforward neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes logistic functions, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions: the higher rectified polynomials, which until now have not been used in deep learning. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set.
Submitted 27 September, 2016; v1 submitted 3 June, 2016;
originally announced June 2016.
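The abstract above does not state the energy function, so the following is only a minimal sketch under assumptions: it takes the energy to be $E = -\sum_\mu F(\xi^\mu \cdot \sigma)$ with a rectified polynomial $F(x) = \max(x,0)^n$, and updates each binary neuron to whichever sign gives the lower energy. The pattern set, the probe, and all function names are hypothetical choices for illustration.

```python
def rectified_poly(x, n):
    """Rectified polynomial F(x) = max(x, 0)^n."""
    return max(x, 0) ** n


def energy(patterns, state, n=3):
    """Assumed energy: E = -sum over patterns of F(pattern . state)."""
    return -sum(
        rectified_poly(sum(a * b for a, b in zip(p, state)), n) for p in patterns
    )


def retrieve(patterns, state, n=3, max_sweeps=10):
    """Asynchronous updates: each neuron takes the sign that lowers the energy."""
    state = list(state)
    size = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(size):
            drive = 0
            for p in patterns:
                # Overlap with pattern p, excluding neuron i itself.
                h = sum(p[j] * state[j] for j in range(size) if j != i)
                # Energy gain of setting neuron i to +1 rather than -1.
                drive += rectified_poly(h + p[i], n) - rectified_poly(h - p[i], n)
            if drive > 0:
                new_value = 1
            elif drive < 0:
                new_value = -1
            else:
                new_value = state[i]  # tie: keep the current value
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


if __name__ == "__main__":
    # Three mutually orthogonal +/-1 patterns on 8 neurons (illustrative data).
    patterns = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, -1, 1, -1, 1, -1, 1, -1],
        [1, 1, -1, -1, 1, 1, -1, -1],
    ]
    # Second pattern with its first bit flipped.
    probe = [-1, -1, 1, -1, 1, -1, 1, -1]
    recovered = retrieve(patterns, probe)
    print(recovered == patterns[1], energy(patterns, probe), energy(patterns, recovered))
```

In this toy case the corrupted probe relaxes back to the stored pattern, and the energy drops as it does so. Larger values of $n$ sharpen the contribution of the best-matching pattern relative to the others.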
-
On the Automorphism Groups of the Z2Z4-Linear 1-Perfect and Preparata-Like Codes
Authors:
Denis Krotov
Abstract:
We consider the symmetry group of a $Z_2Z_4$-linear code with parameters of a $1$-perfect, extended $1$-perfect, or Preparata-like code. We show that, provided the code length is greater than $16$, this group consists only of symmetries that preserve the $Z_2Z_4$ structure. We find the orders of the symmetry groups of the $Z_2Z_4$-linear (extended) $1$-perfect codes. Keywords: additive codes, $Z_2Z_4$-linear codes, $1$-perfect codes, Preparata-like codes, automorphism group, symmetry group.
Submitted 29 January, 2016;
originally announced February 2016.
-
MDS codes in the Doob graphs
Authors:
Evgeny Bespalov,
Denis Krotov
Abstract:
The Doob graph $D(m,n)$, where $m>0$, is the direct product of $m$ copies of the Shrikhande graph and $n$ copies of the complete graph $K_4$ on $4$ vertices. The Doob graph $D(m,n)$ is a distance-regular graph with the same parameters as the Hamming graph $H(2m+n,4)$. In this paper we consider MDS codes in Doob graphs with code distance $d \ge 3$. We prove that if $2m+n>6$ and $2<d<2m+n$, then there are no MDS codes with code distance $d$. We characterize all MDS codes with code distance $d \ge 3$ in Doob graphs $D(m,n)$ when $2m+n \le 6$. We characterize all MDS codes in $D(m,n)$ with code distance $d=2m+n$ for all values of $m$ and $n$.
Submitted 10 December, 2015;
originally announced December 2015.