Search | arXiv e-print repository

The Length and the Width of the Human Brain Circuit Connections are Strongly Correlated

Abstract: The correlations of several fundamental properties of human brain connections are investigated in a consensus connectome, constructed from 1064 braingraphs, each on 1015 vertices, corresponding to 1015 anatomical brain areas. The properties examined include the edge length; the fiber number, or edge width, meaning the number of discovered axon bundles forming the edge; and the occurrence number of the edge, meaning the number of individual braingraphs where the edge exists. By using our previously published robust braingraphs at https://braingraph.org, we have prepared a single consensus graph from the data and compared the statistical similarity of the edge occurrence numbers, edge lengths, and fiber counts of the edges. We have found a strong positive Spearman correlation between the edge occurrence numbers and the fiber counts, showing that statistically, the most frequent cerebral connections have the largest widths, i.e., the largest fiber numbers. We have found a negative Spearman correlation between the edge lengths and fiber counts, showing that, typically, the shortest edges are the widest or strongest by their fiber counts. We have also found a negative Spearman correlation between the occurrence numbers and the edge lengths: it shows that typically, the long edges are infrequent, and the frequent edges are short.
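The three pairwise comparisons described in this abstract can be illustrated with a small, dependency-free sketch of the Spearman rank correlation. The edge values below are hypothetical toy numbers, not data from braingraph.org, and the helper names (`average_ranks`, `pearson`, `spearman`) are introduced here only for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy edges: (length, fiber count, occurrence number).
toy_edges = [
    (12.0, 95, 1060),
    (20.5, 60, 1010),
    (35.0, 41, 900),
    (58.0, 17, 640),
    (90.0, 5, 210),
]
lengths = [e[0] for e in toy_edges]
fibers = [e[1] for e in toy_edges]
occurrences = [e[2] for e in toy_edges]

rho_occ_fiber = spearman(occurrences, fibers)   # positive in this toy set
rho_len_fiber = spearman(lengths, fibers)       # negative in this toy set
rho_occ_len = spearman(occurrences, lengths)    # negative in this toy set
print(rho_occ_fiber, rho_len_fiber, rho_occ_len)
```

The toy data are perfectly monotone, so the coefficients come out at the extremes of the range; real connectome data would give values strictly between -1 and 1.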

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2312.05791 [pdf]

Gluing GAP to RAS Mutants: A New Approach to an Old Problem in Cancer Drug Development

Authors: Ivan Ranđelović, Kinga Nyíri, Gergely Koppány, Marcel Baranyi, József Tóvári, Attila Kigyós, József Timár, Beáta G. Vértessy, Vince Grolmusz

Abstract: Mutated genes may lead to cancer development in numerous tissues. While more than 600 cancer-causing genes are known today, some of the most widespread mutations are connected to the RAS gene: RAS mutations are found in approximately 25% of all human tumors. Specifically, KRAS mutations are involved in the three most lethal cancers in the U.S.: pancreatic ductal adenocarcinoma, colorectal adenocarcinoma, and lung adenocarcinoma. These cancers are among the most difficult to treat, and they are frequently excluded from chemotherapeutic attacks as hopeless cases. The mutated KRAS proteins have specific 3-dimensional conformations, which perturb functional interaction with the GAP protein on the GAP:RAS complex surface, leading to a signaling cascade and uncontrolled cell growth. Here we describe a gluing docking method for finding small molecules that bind to both the GAP and the mutated KRAS molecules. These small molecules glue together the GAP and the mutated KRAS molecules and may serve as new cancer drugs for the most lethal, most difficult-to-treat carcinomas. As a proof of concept, we identify two new, drug-like small molecules with the new method: these compounds specifically inhibit the growth of the PANC-1 cell line with KRAS mutation G12D in vitro and in vivo. Importantly, the two new compounds show significantly lower IC50 and higher specificity against the G12D KRAS mutant as compared to the recently described MRTX-1133 inhibitor against the G12D KRAS mutant.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2309.03624 [pdf, other]

Navigating Homogeneous Paths through Amyloidogenic and Non-Amyloidogenic Hexapeptides

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz

Abstract: Hexapeptides are increasingly applied as model systems for studying the amyloidogenicity properties of oligo- and polypeptides. It is possible to construct 64 million different hexapeptides from the twenty proteinogenic amino acid residues. Today's experimental amyloid databases contain only a fraction of these annotated hexapeptides. For labeling all the possible hexapeptides as "amyloidogenic" or "non-amyloidogenic", there exist several computational predictors with good accuracies. It may be of interest to define and study a simple graph structure with the 64 million hexapeptides as nodes, where two hexapeptides are connected by an edge if they differ by only a single residue. For example, in this graph, HIKKLM is connected to AIKKLM, or HIKKNM, or HIKKLC, but it is not connected with an edge to VVKKLM or HIKNPM. In the present contribution, we consider our previously published artificial intelligence-based tool, the Budapest Amyloid Predictor (BAP for short), and demonstrate a spectacular property of this predictor in the graph defined above. We show that for any two hexapeptides predicted to be "amyloidogenic" by the BAP predictor, there exists an easily constructible path of length at most 6 that passes through neighboring hexapeptides all predicted to be "amyloidogenic" by BAP.
For example, the predicted amyloidogenic ILVWIW and FWLCYL hexapeptides can be connected through the length-6 path ILVWIW-IWVWIW-IWVCIW-IWVCIL-FWVCIL-FWLCIL-FWLCYL in such a way that the neighbors differ in exactly one residue, and all hexapeptides on the path are predicted to be amyloidogenic by BAP. The symmetric statement also holds for non-amyloidogenic hexapeptides. It is noted that the mentioned property of the Budapest Amyloid Predictor (https://pitgroup.org/bap) is not proprietary; it is also true for any linear Support Vector Machine (SVM)-based predictor.
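One way such a path can be built for a linear predictor is sketched below. The sketch assumes, and this is an assumption not stated in the abstract, that the predictor's score is a bias plus a sum of independent per-position residue contributions, with a positive score meaning "amyloidogenic". The weights are random toy values, not those of the Budapest Amyloid Predictor, and all function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty proteinogenic residues

def make_toy_weights(seed=0):
    """Random per-position residue weights (a stand-in for a trained linear model)."""
    rng = random.Random(seed)
    return [{aa: rng.uniform(-1, 1) for aa in AMINO_ACIDS} for _ in range(6)]

def score(peptide, weights, bias):
    """Assumed linear score: bias plus one additive term per position."""
    return bias + sum(weights[i][aa] for i, aa in enumerate(peptide))

def is_neighbor(a, b):
    """Two hexapeptides are adjacent if they differ in exactly one residue."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def homogeneous_path(start, end, weights, bias):
    """Connect two same-label hexapeptides by single-residue substitutions.

    For a positive pair, substitutions are applied in order of decreasing
    score gain, so the score first rises and then falls toward score(end);
    it never drops below min(score(start), score(end)). For a negative pair
    the order is reversed.
    """
    positive = score(start, weights, bias) > 0
    if positive != (score(end, weights, bias) > 0):
        raise ValueError("both endpoints must have the same predicted label")
    differing = [i for i in range(6) if start[i] != end[i]]
    differing.sort(key=lambda i: weights[i][end[i]] - weights[i][start[i]],
                   reverse=positive)
    path, current = [start], list(start)
    for i in differing:
        current[i] = end[i]
        path.append("".join(current))
    return path

# Demo: draw two toy-positive hexapeptides and connect them.
weights, bias = make_toy_weights(), 0.0
rng = random.Random(1)
positives = []
while len(positives) < 2:
    candidate = "".join(rng.choice(AMINO_ACIDS) for _ in range(6))
    if score(candidate, weights, bias) > 0:
        positives.append(candidate)
path = homogeneous_path(positives[0], positives[1], weights, bias)
print(" - ".join(path))
```

The path has one step per differing position, so its length never exceeds 6. The ordering argument relies only on the additivity of the score, which is why it is plausible for linear models in general.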

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.15914 [pdf]

Novel enzymes for biodegradation of polycyclic aromatic hydrocarbons: metagenomics-linked identification followed by functional analysis

Authors: Kinga K. Nagy, Kristóf Takács, Imre Németh, Bálint Varga, Vince Grolmusz, Mónika Molnár, Beáta G. Vértessy

Abstract: Polycyclic aromatic hydrocarbons (PAHs) are highly toxic, carcinogenic substances. On soils contaminated with PAHs, crop cultivation, animal husbandry and even the survival of microflora in the soil are greatly perturbed, depending on the degree of contamination. Most microorganisms cannot tolerate PAH-contaminated soils; however, some microbial strains can adapt to these harsh conditions and survive on contaminated soils. Analysis of the metagenomes of contaminated environmental samples may lead to the discovery of PAH-degrading enzymes suitable for green biotechnology methodologies ranging from biocatalysis to pollution control. In the present study, our goal was to apply a metagenomic data search to identify novel enzymes that are efficient in the remediation of PAH-contaminated soils. The metagenomic hits were further analyzed using a set of bioinformatics tools to select protein sequences predicted to encode well-folded soluble enzymes. Three novel enzymes (two dioxygenases and one peroxidase) were cloned and used in soil remediation microcosm experiments. The novel enzymes were found to be efficient for degradation of naphthalene and phenanthrene. Adding the inorganic oxidant CaO2 further increased the degrading potential of the novel enzymes for anthracene and pyrene. We conclude that metagenome mining paired with bioinformatic predictions, structural modelling and functional assays constitutes a powerful approach towards novel enzymes for soil remediation.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2212.00168 [pdf, other]

Robust Circuitry-Based Scores of Structural Importance of Human Brain Areas

Authors: Daniel Hegedus, Vince Grolmusz

Abstract: We consider the 1015-vertex human consensus connectome computed from the diffusion MRI data of 1064 subjects. We define seven different orders on these 1015 graph vertices, where the orders depend on parameters derived from the brain circuitry, that is, from the properties of the edges (or connections) incident to the vertices ordered. We order the vertices according to their degree, the sum, the maximum, and the average of the fiber counts on the incident edges, and the sum, the maximum and the average length of the fibers in the incident edges. We analyze the similarities of these seven orders by the Spearman correlation coefficient and by their inversion numbers and have found that all of these seven orders have great similarities. In other words, if we interpret the orders as scoring of the importance of the vertices in the consensus connectome, then the scores of the vertices will be similar in all seven orderings. That is, important vertices of the human connectome typically have many neighbors, connected with long and thick axonal fibers (where thickness is measured by fiber numbers), and their incident edges have high maximum and average values of length and fiber-number parameters, too. Therefore, these parameters may yield robust ways of deciding which vertices are more important in the anatomy of our brain circuitry than the others.
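The inversion number mentioned here can be illustrated with a short sketch. It assumes the usual definition, namely the number of vertex pairs that two orderings rank in opposite relative order; the abstract does not spell out the definition used. The vertex names and parameter values are hypothetical.

```python
def inversion_number(order_a, order_b):
    """Count the pairs of items that the two orderings rank in opposite order."""
    position_in_b = {v: i for i, v in enumerate(order_b)}
    seq = [position_in_b[v] for v in order_a]

    def sort_count(items):
        # Merge sort that also returns the number of inversions in `items`.
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, inv_left = sort_count(items[:mid])
        right, inv_right = sort_count(items[mid:])
        merged, i, j, cross = [], 0, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                cross += len(left) - i  # right[j] jumps over the rest of left
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv_left + inv_right + cross

    return sort_count(seq)[1]

# Hypothetical per-vertex parameters for four toy vertices.
degree = {"v1": 40, "v2": 35, "v3": 20, "v4": 8}
fiber_sum = {"v1": 900, "v2": 950, "v3": 300, "v4": 100}

order_by_degree = sorted(degree, key=degree.get, reverse=True)
order_by_fiber_sum = sorted(fiber_sum, key=fiber_sum.get, reverse=True)

inversions = inversion_number(order_by_degree, order_by_fiber_sum)
max_inversions = len(degree) * (len(degree) - 1) // 2
print(inversions, "of", max_inversions, "pairs are ordered differently")
```

A small inversion number relative to the maximum n(n-1)/2 indicates that two orderings largely agree.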

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2210.11842 [pdf, other]

Opening Amyloid-Windows to the Secondary Structure of Proteins: The Amyloidogenecity Increases Tenfold Inside Beta-Sheets

Authors: Kristof Takacs, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz

Abstract: Methods from artificial intelligence (AI), in general, and machine learning, in particular, have kept conquering new territories in numerous areas of science. Most of the applications of these techniques are restricted to the classification of large data sets, but new scientific knowledge can seldom be inferred from these tools. Here we show that an AI-based amyloidogenicity predictor can strongly differentiate the border and the internal hexamers of β-pleated sheets when screening all the Protein Data Bank-deposited homology-filtered protein structures. Our main result shows that more than 30% of internal hexamers of β-sheets are predicted to be amyloidogenic, while just outside the border regions, only 3% are predicted as such. This result may elucidate a general protection mechanism of proteins against turning into amyloids: if the borders of β-sheets were amyloidogenic, then the whole β-sheet could turn more easily into an insoluble amyloid structure, characterized by periodically repeated parallel β-sheets. We also show that no analogous phenomenon exists on the borders of α-helices or randomly chosen subsequences of the studied protein structures.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2202.14031 [pdf, other]

Succinct Amyloid and Non-Amyloid Patterns in Hexapeptides

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz

Abstract: Hexapeptides are widely applied as a model system for studying amyloid-forming properties of polypeptides, including proteins. Recently, large experimental databases have become publicly available with amyloidogenic labels. Using these datasets for training and testing purposes, one may build artificial intelligence (AI)-based classifiers for predicting the amyloid state of peptides. In our previous work (Biomolecules, 11(4) 500, (2021)) we described the Support Vector Machine (SVM)-based Budapest Amyloid Predictor (https://pitgroup.org/bap). Here we apply the Budapest Amyloid Predictor for discovering numerous amyloidogenic and non-amyloidogenic hexapeptide patterns with accuracy between 80% and 84%, as surprising and succinct novel rules for further understanding the amyloid state of peptides. For example, we have shown that for any independently mutated residue (position marked by "x"), the patterns CxFLWx, FxFLFx, or xxIVIV are predicted to be amyloidogenic, while PxDxxx, xxKxEx, and xxPQxx are predicted to be non-amyloidogenic. We note that each amyloidogenic pattern with two x's (e.g., CxFLWx) succinctly describes $20^2=400$ hexapeptides, while each non-amyloidogenic pattern comprising four point mutations (e.g., PxDxxx) gives $20^4=160,000$ hexapeptides in total. To our knowledge, no similar applications of artificial intelligence tools or succinct amyloid patterns were described before the present work.
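The pattern counts quoted in this abstract follow from treating each "x" as a free position over the twenty residues. A minimal sketch, with hypothetical helper names, that counts, enumerates, and matches such patterns:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty proteinogenic residues

def pattern_size(pattern):
    """Number of hexapeptides described by a pattern: 20 ** (number of x's)."""
    return len(AMINO_ACIDS) ** pattern.count("x")

def expand_pattern(pattern):
    """Yield every hexapeptide obtained by filling the x positions."""
    free = [i for i, ch in enumerate(pattern) if ch == "x"]
    for combo in product(AMINO_ACIDS, repeat=len(free)):
        residues = list(pattern)
        for pos, aa in zip(free, combo):
            residues[pos] = aa
        yield "".join(residues)

def matches(pattern, peptide):
    """True if the peptide agrees with the pattern at every fixed position."""
    return len(pattern) == len(peptide) and all(
        p == "x" or p == a for p, a in zip(pattern, peptide)
    )

print(pattern_size("CxFLWx"))  # two free positions
print(pattern_size("PxDxxx"))  # four free positions
print(matches("xxIVIV", "AKIVIV"))
```

The lowercase "x" is safe as a wildcard here because the one-letter residue codes are uppercase.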

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2107.01699 [pdf, other]

Discovering Sex and Age Implicator Edges in the Human Connectome

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Vince Grolmusz

Abstract: Determining important vertices in large graphs (e.g., Google's PageRank in the case of the graph of the World Wide Web) facilitated the construction of excellent web search engines, returning the most important hits corresponding to the submitted user queries. Interestingly, finding important edges -- instead of vertices -- in large graphs has received much less attention until now. Here we examine the human structural braingraph (or connectome), identified by diffusion magnetic resonance imaging (dMRI) methods, with edges connecting cortical and subcortical gray matter areas and weighted by fiber strengths, measured by the number of the discovered fiber tracts along the edge. We identify several "single" important edges in these braingraphs, whose high or low weights imply the sex or the age of the subject observed. We call these edges implicator edges since solely from their weight, one can infer the sex of the subject with more than 67% accuracy or their age group with more than 62% accuracy. We argue that these brain connections are the most important ones characterizing the sex or the age of the subjects. Surprisingly, the edges implying the male sex are mostly located in the anterior parts of the brain, while those implying the female sex are mostly in the posterior regions. Additionally, most of the inter-hemispheric implicator edges are male ones, while the intra-hemispheric ones are predominantly female edges.
Our pioneering method for finding the sex or age implicator edges can also be applied for characterizing other biological and medical properties besides the sex or the age of the subject, including neurodegenerative and psychiatric diseases, if large and high-quality neuroimaging datasets become available.
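The idea of inferring a binary property from the weight of one edge alone can be sketched as a single-feature threshold rule. This is only an illustration under assumptions: the abstract does not state how the authors select thresholds or evaluate accuracy, and the weights and labels below are hypothetical toy values.

```python
def best_threshold_rule(weights, labels):
    """Find the single-edge threshold rule with the highest accuracy.

    weights: the weight of one fixed edge in each subject's braingraph.
    labels:  0 or 1 for each subject (a hypothetical binary property).
    Returns (accuracy, threshold, high_means_one), where the rule predicts
    label 1 for weight >= threshold if high_means_one is True, else label 0.
    """
    best = (0.0, None, True)
    for threshold in sorted(set(weights)):
        is_high = [w >= threshold for w in weights]
        hits = sum(h == bool(label) for h, label in zip(is_high, labels))
        for high_means_one, correct in ((True, hits), (False, len(labels) - hits)):
            accuracy = correct / len(labels)
            if accuracy > best[0]:
                best = (accuracy, threshold, high_means_one)
    return best

# Hypothetical fiber counts on one edge for eight subjects, with toy labels.
edge_weights = [5, 7, 9, 12, 14, 15, 18, 20]
subject_labels = [0, 0, 1, 0, 1, 1, 1, 1]

accuracy, threshold, high_means_one = best_threshold_rule(edge_weights, subject_labels)
print(accuracy, threshold, high_means_one)
```

In practice the accuracy of such a rule should be estimated on subjects not used for choosing the threshold; the sketch reports training accuracy only.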

Submitted 4 July, 2021; originally announced July 2021.

arXiv:2011.03759 [pdf, other]

The Budapest Amyloid Predictor and its Applications

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz

Abstract: The amyloid state of proteins is widely studied, with relevance to neurology, biochemistry, and biotechnology. In contrast with amorphous aggregation, the amyloid state has a well-defined structure, consisting of parallel and anti-parallel β-sheets in a periodically repeated formation. The understanding of the amyloid state is growing with the development of novel molecular imaging tools, like cryogenic electron microscopy. Sequence-based amyloid predictors were developed by using mostly artificial neural networks (ANNs) as the underlying computational techniques. From a good neural network-based predictor, it is a very difficult task to identify those attributes of the input amino acid sequence that led to the decision of the network. Here we present a Support Vector Machine (SVM)-based predictor for hexapeptides with correctness higher than 84%, i.e., it is at least as good as the published ANN-based tools. Unlike those of artificial neural networks, the decisions of SVMs are much easier to analyze, and from a good predictor, we can infer rich biochemical knowledge. Availability and Implementation: The Budapest Amyloid Predictor webserver is freely available at https://pitgroup.org/bap.

Submitted 7 November, 2020; originally announced November 2020.

arXiv:2010.09568 [pdf, other]

Introducing and Applying Newtonian Blurring: An Augmented Dataset of 126,000 Human Connectomes at braingraph.org

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Vince Grolmusz

Abstract: Gaussian blurring is a well-established method for image data augmentation: it may generate a large set of images from a small set of pictures for training and testing purposes for Artificial Intelligence (AI) applications. When we apply AI to non-image-like biological data, hardly any related method exists. Here we introduce the "Newtonian blurring" in human braingraph (or connectome) augmentation: Starting from a dataset of 1053 subjects, we first repeat a probabilistic weighted braingraph construction algorithm 10 times for describing the connections of distinct cerebral areas, then take 7 repetitions in every possible way, delete the lower and upper extremes, and average the remaining 7-2=5 edge-weights for the data of each subject. This way we augment the 1053 graph-set to 120 x 1053 = 126,360 graphs. In augmentation techniques, it is an important requirement that no artificial additions should be introduced into the dataset. Gaussian blurring and also this Newtonian blurring satisfy this goal. The resulting dataset of 126,360 graphs, each in 5 resolutions (i.e., 631,800 graphs in total), is freely available at the site https://braingraph.org/cms/download-pit-group-connectomes/. Augmenting with Newtonian blurring may also be applicable in other non-image related fields, where probabilistic processing and data averaging are implemented.
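The augmentation step described above (choose 7 of the 10 repetitions in every possible way, drop the lowest and highest value, average the remaining 5) can be sketched for a single edge as follows. The function name and the toy weights are hypothetical; the factor 120 is the number of 7-element subsets of 10 repetitions.

```python
from itertools import combinations
from math import comb

def newtonian_blur(repeated_weights, choose=7):
    """Return one averaged weight per `choose`-element subset of the repetitions.

    repeated_weights: the weights one edge received in the repeated
    probabilistic graph constructions (10 repetitions in the abstract).
    """
    blurred = []
    for subset in combinations(repeated_weights, choose):
        kept = sorted(subset)[1:-1]  # delete the lower and upper extremes
        blurred.append(sum(kept) / len(kept))
    return blurred

# Hypothetical weights of one edge over 10 repeated constructions.
toy_weights = [12, 15, 11, 14, 13, 40, 12, 16, 2, 14]
augmented = newtonian_blur(toy_weights)

print(len(augmented))        # number of augmented values per edge
print(comb(10, 7) * 1053)    # augmented graphs for 1053 subjects
```

Because `combinations` enumerates subsets in a fixed order, applying the same function to every edge of a subject yields 120 consistent graphs, where the k-th graph uses the k-th subset of repetitions for all of its edges.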

Submitted 21 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

arXiv:2008.13273 [pdf, other]

The braingraph.org Database with more than 1000 Robust Human Structural Connectomes in Five Resolutions

Authors: Balint Varga, Vince Grolmusz

Abstract: The human brain is the most complex object of study we encounter today. Mapping the neuronal-level connections between the more than 80 billion neurons in the brain is a hopeless task for science. By the recent advancement of magnetic resonance imaging (MRI), we are able to map the macroscopic connections between about 1000 brain areas. The MRI data acquisition and the subsequent algorithmic workflow contain several complex steps, where errors can occur. In the present contribution, we describe and publish 1064 human connectomes, computed from the public release of the Human Connectome Project. Each connectome is available in 5 resolutions, with 83, 129, 234, 463, and 1015 anatomically labeled nodes. For error correction, we follow an averaging and extreme value deleting strategy for each edge and for each connectome. The resulting 5320 braingraphs can be downloaded from the https://braingraph.org site. This dataset makes these graphs accessible to scientists unfamiliar with neuroimaging- and connectome-related tools: mathematicians, physicists, and engineers can use their expertise and ideas in the analysis of the connections of the human brain. Brain scientists also have a robust and large, multi-resolution set for connectomical studies.

Submitted 30 August, 2020; originally announced August 2020.

arXiv:2003.02942 [pdf, other]

On the Border of the Amyloidogenic Sequences: Prefix Analysis of the Parallel Beta Sheets in the PDB_Amyloid Collection

Authors: Kristof Takacs, Vince Grolmusz

Abstract: The Protein Data Bank (PDB) today contains more than 153,000 entries with the 3-dimensional structures of biological macromolecules. Using the rich resources of this repository, it is possible to identify subsets with specific, interesting properties for different applications. Our research group prepared an automatically updated list of amyloid and probably amyloidogenic molecules, the PDB_Amyloid collection, which is freely available at the address http://pitgroup.org/amyloid. This resource relies exclusively on the geometric properties of the steric structures for identifying amyloids. In the present contribution, we analyze the starting (i.e., prefix) subsequences of the characteristic, parallel beta-sheets of the structures in the PDB_Amyloid collection, and identify further appearances of these length-5 prefix subsequences in the whole PDB data set. In this way, we have identified numerous proteins whose normal or irregular functions involve amyloid formation, structural misfolding, or anti-coagulant properties, simply by containing these prefixes: including the T-cell receptor (TCR), bound with the major histocompatibility complexes MHC-1 and MHC-2; the p53 tumor suppressor protein; a mycobacterial RNA polymerase transcription initialization complex; the human bridging integrator protein BIN-1; and the tick anti-coagulant peptide TAP.

Submitted 5 March, 2020; originally announced March 2020.

arXiv:1912.02291 [pdf, other]

Identifying Super-Feminine, Super-Masculine and Sex-Defining Connections in the Human Braingraph

Authors: Laszlo Keresztes, Evelin Szogi, Balint Varga, Vince Grolmusz

Abstract: For more than a decade now, we have been able to discover and study thousands of cerebral connections with the application of diffusion magnetic resonance imaging (dMRI) techniques and the accompanying algorithmic workflow. While numerous connectomical results have been published enlightening the relation between the braingraph and certain biological, medical, and psychological properties, it is still a great challenge to identify a small number of brain connections closely related to those conditions. In the present contribution, by applying the 1200 Subjects Release of the Human Connectome Project (HCP), we identify just 102 connections out of the total number of 1950 connections in the 83-vertex graphs of 1065 subjects, which -- by a simple linear test -- determine the sex of the subject precisely, without any error. Very surprisingly, we were able to identify two graph edges out of these 102 such that if their weights, measured in fiber numbers, are both high, then the connectome always belongs to a female subject, independently of the other edges. Similarly, we have identified 3 edges from these 102, whose weights, if two of them are high and one is low, imply that the graph belongs to a male subject -- again, independently of the other edges. We call the former 2 edges superfeminine and the first two of the 3 edges supermasculine edges of the human connectome.
Even more interestingly, one of the edges, connecting the right Pars Triangularis and the right Superior Parietal areas, is one of the 2 superfeminine edges, and it is also the third edge, accompanying the two supermasculine connections, if its weight is low; therefore it is also a "switching" connection.

Submitted 4 December, 2019; originally announced December 2019.

arXiv:1907.09586 [pdf, ps, other]

Good Neighbors, Bad Neighbors: The Frequent Network Neighborhood Mapping of the Hippocampus Enlightens Several Structural Factors of the Human Intelligence on a 414-Subject Cohort

Authors: Mate Fellner, Balint Varga, Vince Grolmusz

Abstract: The human connectome has become a very frequent subject of study for brain scientists, psychologists, and imaging experts in the last decade. With diffusion magnetic resonance imaging techniques, unified with advanced data processing algorithms, today we are able to compute braingraphs with several hundred anatomically identified nodes and thousands of edges, corresponding to the anatomical connections of the brain. The analysis of these graphs without refined mathematical tools is hopeless. These tools need to address the high error rate of the MRI processing workflow, and need to find structural causes or at least correlations of psychological properties and cerebral connections. Until now, structural connectomics was only rarely able to identify such causes or correlations. In the present work, we study the frequent neighbor sets of the most deeply investigated brain area, the hippocampus. By applying the Frequent Network Neighborhood mapping method, we identified frequent neighbor sets of the hippocampus, which may influence numerous psychological parameters, including intelligence-related ones. We have found neighbor sets that have significantly higher frequency in subjects with high-scored Penn Matrix tests, and with low-scored Penn Word Memory tests. Our study utilizes the braingraphs, computed from the imaging data of the Human Connectome Project's 414 subjects, each with 463 anatomically identified nodes.

Submitted 22 July, 2019; originally announced July 2019.

arXiv:1903.05979 [pdf, other]

doi 10.1371/journal.pone.0236883

The Frequent Complete Subgraphs in the Human Connectome

Authors: Mate Fellner, Balint Varga, Vince Grolmusz

Abstract: While it is still not possible to describe the neural-level connections of the human brain, we can map the human connectome with several hundred vertices, by the application of diffusion-MRI based techniques. In these graphs, the nodes correspond to anatomically identified gray matter areas of the brain, while the edges correspond to the axonal fibers, connecting these areas. In our previous contributions, we have described numerous graph-theoretical phenomena of the human connectomes. Here we map the frequent complete subgraphs of the human brain networks: in these subgraphs, every pair of vertices is connected by an edge. We also examine sex differences in the results. The mapping of the frequent subgraphs gives robust substructures in the graph: if a subgraph is present in 80% of the graphs, then, most probably, it could not be an artifact of the measurement or the data processing workflow. We list here the frequent complete subgraphs of the human braingraphs of 414 subjects, each with 463 nodes, with a frequency threshold of 80%, and identify 812 complete subgraphs that are more frequent in male connectomes and 224 complete subgraphs that are more frequent in female connectomes.
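The notion of a frequent complete subgraph can be made concrete with a level-wise search over graphs that share one labeled vertex set. The sketch below is a generic Apriori-style enumeration written for illustration; it is not claimed to be the authors' algorithm, and the toy graphs and vertex labels are hypothetical.

```python
from itertools import combinations

def is_clique(vertices, edges):
    """True if every pair of the given vertices is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))

def frequent_cliques(graphs, threshold=0.8):
    """Map each frequent complete subgraph (sorted vertex tuple) to its support.

    graphs: list of edge sets; each edge is a frozenset({u, v}), and all
    graphs use the same vertex labels. A vertex set is frequent if it forms
    a complete subgraph in at least `threshold` of the graphs.
    """
    def support(vertices):
        return sum(is_clique(vertices, g) for g in graphs)

    def frequent(vertices):
        return support(vertices) / len(graphs) >= threshold

    all_edges = set().union(*graphs)
    level = {tuple(sorted(e)) for e in all_edges if frequent(tuple(e))}
    result = {}
    while level:
        for clique in level:
            result[clique] = support(clique)
        vertices = sorted({v for clique in level for v in clique})
        # Extend each frequent clique by one larger-labeled vertex; every
        # frequent (k+1)-clique has a frequent k-prefix, so none is missed.
        level = {
            clique + (v,)
            for clique in level
            for v in vertices
            if v > clique[-1] and frequent(clique + (v,))
        }
    return result

def edge_set(*pairs):
    return {frozenset(p) for p in pairs}

# Five toy graphs: triangle A-B-C appears in 4 of 5, edge C-D in all 5.
full = edge_set("AB", "AC", "BC", "CD")
toy_graphs = [full, full, full, full, edge_set("AB", "AC", "CD")]

cliques = frequent_cliques(toy_graphs, threshold=0.8)
for clique, count in sorted(cliques.items()):
    print(clique, count)
```

Counting the support separately in two groups of graphs, for example by sex, would then allow the per-group frequencies of each complete subgraph to be compared.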

Submitted 27 March, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

arXiv:1811.07423 [pdf, ps, other]

The Frequent Network Neighborhood Mapping of the Human Hippocampus Shows Much More Frequent Neighbor Sets in Males Than in Females

Authors: Mate Fellner, Balint Varga, Vince Grolmusz

Abstract: In the study of the human connectome, the vertices and the edges of the network of the human brain are analyzed: the vertices of the graphs are the anatomically identified gray matter areas of the subjects; this set is exactly the same for all the subjects. The edges of the graphs correspond to the axonal fibers, connecting these areas. In the biological applications of graph theory, it happens very rarely that scientists examine numerous large graphs on the very same, labeled vertex set. Exactly this is the case in the study of the connectomes. Because of the particularity of these sets of graphs, novel, robust methods need to be developed for their analysis. Here we introduce the new method of the Frequent Network Neighborhood Mapping for the connectome, which serves as a robust identification of the neighborhoods of given vertices of special interest in the graph. We apply the novel method for mapping the neighborhoods of the human hippocampus and discover strong statistical asymmetries between the connectomes of the sexes, computed from the Human Connectome Project. We analyze 413 braingraphs, each with 463 nodes. We show that the hippocampi of men have much more significantly frequent neighbor sets than women; therefore, in a sense, the connections of the hippocampi are more regularly distributed in men and more varied in women. Our results are in contrast to the volumetric studies of the human hippocampus, where it was shown that the relative volume of the hippocampus is the same in men and women.

Submitted 23 November, 2018; v1 submitted 18 November, 2018; originally announced November 2018.

arXiv:1805.09758 [pdf, other]

PDB_Amyloid: The Extended Live Amyloid Structure List from the PDB

Authors: Kristof Takacs, Balint Varga, Vince Grolmusz

Abstract: The Protein Data Bank (PDB) contains more than 135 000 entries today. From these, relatively few amyloid structures can be identified, since amyloids are insoluble in water. Therefore, mostly solid state NMR-recorded amyloid structures are deposited in the PDB. Based on the geometric analysis of these deposited structures we have prepared an automatically updated webserver, which generates the list of the deposited amyloid structures, and, additionally, those globular protein entries, which have amyloid-like substructures of a given size and characteristics. We have found that applying only the properly chosen geometric conditions, it is possible to identify the deposited amyloid structures, and a number of globular proteins with amyloid-like substructures. We have analyzed these globular proteins and have found that many of them are known to form amyloids more easily than many other globular proteins. Our results relate to the method of (Stankovic, I. et al. (2017): Construction of Amyloid PDB Files Database. Transactions on Internet Research. 13 (1): 47-51), who have applied a hybrid textual-search and geometric approach for finding amyloids in the PDB. If one intends to identify a subset of the PDB for some applications, the identification algorithm needs to be re-run periodically, since in 2017, on average, every day 30 new entries were deposited in the data bank.
Our webserver is updated regularly and automatically, and the identified amyloid- and partial amyloid structures can be viewed or their list can be downloaded from the site https://pitgroup.org/amyloid.

Submitted 25 May, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1711.11314 [pdf, other]

The Frequent Subgraphs of the Connectome of the Human Brain

Authors: Mate Fellner, Balint Varga, Vince Grolmusz

Abstract: In mapping the human structural connectome, we are in a very fortunate situation: one can compute and compare graphs, describing the cerebral connections between the very same, anatomically identified small regions of the gray matter among hundreds of human subjects. The comparison of these graphs has led to numerous recent results, such as (i) the discovery that women's connectomes have deeper and richer connectivity-related graph parameters than those of men, (ii) the description of more and less conservatively connected lobes and cerebral regions, and (iii) the discovery of the phenomenon of the Consensus Connectome Dynamics. Today one of the greatest challenges of brain science is the description and modeling of the circuitry of the human brain. For this goal, we need to identify sub-circuits that are present in almost all human subjects and those that are much less frequent: the former sub-circuits most probably have functions of general importance, the latter sub-circuits are probably related to the individual variability of the brain structure and functions. The present contribution describes the frequent connected subgraphs (instead of sub-circuits) of at most 6 edges in the human brain. We analyze these frequent graphs and also examine sex differences in these graphs: we demonstrate numerous connected sub-graphs that are more frequent in the female or the male connectome. While our results describe subgraphs, instead of sub-circuits, we need to note that all macroscopic sub-circuits correspond to an underlying connected subgraph.
Our data source is the public release of the Human Connectome Project, and we are applying the data of 426 human subjects in this study.

Submitted 30 November, 2017; originally announced November 2017.

arXiv:1710.10995 [pdf, ps, other]

MetaHMM: A Webserver for Identifying Novel Genes with Specified Functions in Metagenomic Samples

Authors: Balazs Szalkai, Vince Grolmusz

Abstract: The fast and affordable sequencing of large clinical and environmental metagenomic datasets opens up new horizons in medical and biotechnological applications. It is believed that today we have described only about 1% of the microorganisms on the Earth; therefore, metagenomic analysis mostly deals with unknown species in the samples. Microbial communities in extreme environments may contain genes with high biotechnological potential, and clinical metagenomes, related to diseases, may uncover still unknown pathogens and pathological mechanisms in known diseases. While the species-level identification and description of the taxa in the samples does not seem to be possible today, we can search for novel genes with known functions in these samples, using numerous techniques, including artificial intelligence tools, like the hidden Markov models (HMMs). Here we describe a simple-to-use webserver, the MetaHMM, which is capable of homology-based automatic model-building for the genes to be searched for, and it also finds the closest matches in the metagenome. The webserver uses already highly successful building blocks: it performs multiple alignment by applying Clustal Omega, builds a hidden Markov model with the HMMER component hmmbuild, and uses hmmsearch for finding sequences similar to the specified model in the metagenomes. The webserver is publicly available at https://metahmm.pitgroup.org.

Submitted 30 October, 2017; originally announced October 2017.

arXiv:1709.09850 [pdf, ps, other]

SCARF: A Biomedical Association Rule Finding Webserver

Authors: Balazs Szalkai, Vince Grolmusz

Abstract: The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are the typical examples of such datasets. These sets make possible the search for multi-parametric relations since from the plenty of the data one is likely to find a satisfying number of subjects with the required parameter ensembles. Specifically, finding combinatorial biomarkers for some given condition also needs a very large dataset to analyze. For this goal, statistical regression analysis is not the tool of choice, since (i) the a priori knowledge of the parameter-sets to analyze is missing, and (ii) typically relatively few subjects have the interesting parameter-value ensembles for the analysis. For fast and automatic multi-parametric relation discovery, association-rule finding tools have been used for more than two decades in the data-mining community. Here we present the SCARF webserver for generalized association rule mining. Association rules are of the form: $a$ AND $b$ AND ... AND $x \rightarrow y$, meaning that the presence of properties $a$ AND $b$ AND ... AND $x$ implies property $y$; our algorithm finds generalized association rules, since it also finds logical disjunctions (i.e., ORs) at the left-hand side, allowing the discovery of more complex rules in a more compressed form in the database.
This feature also helps reduce the typically very large result-tables of such studies, since allowing ORs in the left-hand side of a single rule could include dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated in mining the Alzheimer's database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300-307, 2017). Here we describe the webserver implementation of the algorithm.
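The rule form described in the abstract above can be illustrated with a minimal sketch. It assumes each record is a set of observed properties and that a generalized left-hand side is a list of OR-clauses joined by AND; the function name `rule_stats` and the support/confidence measures are illustrative assumptions taken from classical association-rule mining, not the SCARF implementation.

```python
def rule_stats(records, lhs, rhs):
    """Evaluate the rule (c1 AND c2 AND ...) -> rhs on a list of records.

    records: list of sets of properties (hypothetical representation).
    lhs: list of clauses; each clause is a set of properties joined by OR.
    rhs: a single property.
    Returns (support, confidence) as floats.
    """
    # A record matches the left-hand side if every OR-clause
    # shares at least one property with the record.
    matches = [r for r in records if all(clause & r for clause in lhs)]
    if not matches:
        return 0.0, 0.0
    hits = sum(1 for r in matches if rhs in r)
    support = hits / len(records)      # fraction of all records satisfying the whole rule
    confidence = hits / len(matches)   # fraction of LHS matches that also have rhs
    return support, confidence


records = [
    {"a", "b", "y"},
    {"a", "c", "y"},
    {"a", "d"},
    {"b", "y"},
]
# Generalized rule: a AND (b OR c) -> y
stats = rule_stats(records, [{"a"}, {"b", "c"}], "y")
```

A single rule with the clause `{"b", "c"}` stands in for the two classical rules `a AND b -> y` and `a AND c -> y`, which is the compression effect the abstract refers to.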

Submitted 28 September, 2017; originally announced September 2017.

arXiv:1709.04974 [pdf, other]

Comparing Advanced Graph-Theoretical Parameters of the Connectomes of the Lobes of the Human Brain

Authors: Balazs Szalkai, Balint Varga, Vince Grolmusz

Abstract: Deep, classical graph-theoretical parameters, like the size of the minimum vertex cover, the chromatic number, or the eigengap of the adjacency matrix of the graph were studied widely by mathematicians in the last century. Most researchers today study much simpler parameters of braingraphs or connectomes, which were defined in the last twenty years for enormous networks (like the graph of the World Wide Web) with hundreds of millions of nodes. Since the connectomes, describing the connections of the human brain, typically contain several hundred vertices today, one can compute and analyze the much deeper, harder-to-compute classical graph parameters for these relatively small graphs of the brain. This deeper approach has proven to be very successful in the comparison of the connectomes of the sexes in our earlier works: we have shown that graph parameters, deeply characterizing the graph connectivity, are significantly better in women's connectomes than in men's. In the present contribution we compare numerous graph parameters in the three largest lobes (frontal, parietal, temporal) and in both hemispheres of the brain. We apply the diffusion weighted imaging data of 423 subjects of the NIH-funded Human Connectome Project, and present some findings, never described before, including that the right parietal lobe contains significantly more edges, has higher average degree, density, larger minimum vertex cover and Hoffman bound than the left parietal lobe.
Similar advantages in the deep graph connectivity properties hold for the left frontal vs. the right frontal and for the right temporal vs. the left temporal lobes.

Submitted 14 September, 2017; originally announced September 2017.

arXiv:1708.04103 [pdf, ps, other]

SECLAF: A Webserver and Deep Neural Network Design Tool for Biological Sequence Classification

Authors: Balazs Szalkai, Vince Grolmusz

Abstract: Artificial intelligence (AI) tools are gaining more and more ground each year in bioinformatics. Learning algorithms can be taught easily by using the existing enormous biological databases, and the resulting models can be used for the high-quality classification of novel, un-categorized data in numerous areas, including biological sequence analysis. Here we introduce SECLAF, an artificial neural-net based biological sequence classifier framework, which uses the TensorFlow library of Google, Inc. By applying SECLAF for residue-sequences, we have reported (Methods (2017), https://doi.org/10.1016/j.ymeth.2017.06.034) the most accurate multi-label protein classifier to date (UniProt, into 698 classes: AUC 99.99%; Gene Ontology, into 983 classes: AUC 99.45%). Our framework SECLAF can be applied for other sequence classification tasks, as we describe in the present contribution. Availability and implementation: The program SECLAF is implemented in Python, and is available for download, with example datasets at the website https://pitgroup.org/seclaf/. For Gene Ontology and UniProt based classifications a webserver is also available at the address above.

Submitted 14 August, 2017; originally announced August 2017.

arXiv:1703.10663 [pdf, other]

Near Perfect Protein Multi-Label Classification with Deep Neural Networks

Authors: Balazs Szalkai, Vince Grolmusz

Abstract: Artificial neural networks (ANNs) have gained a well-deserved popularity among machine learning tools upon their recent successful applications in image- and sound processing and classification problems. ANNs have also been applied for predicting the family or function of a protein, knowing its residue sequence. Here we present two new ANNs with multi-label classification ability, showing impressive accuracy when classifying protein sequences into 698 UniProt families (AUC=99.99%) and 983 Gene Ontology classes (AUC=99.45%).

Submitted 30 March, 2017; originally announced March 2017.

arXiv:1610.04568 [pdf, other]

The Robustness and the Doubly-Preferential Attachment Simulation of the Consensus Connectome Dynamics of the Human Brain

Authors: Balázs Szalkai, Vince Grolmusz

Abstract: The increasing quantity and quality of the publicly available human cerebral diffusion MRI data make it possible to study the brain in ways that were unimaginable before. The Consensus Connectome Dynamics (CCD) is a remarkable phenomenon that was discovered by continuously decreasing the minimum confidence-parameter at the graphical interface of the Budapest Reference Connectome Server (http://connectome.pitgroup.org). The Budapest Reference Connectome Server depicts the cerebral connections of $n=418$ subjects with a frequency-parameter $k$: For any $k=1,2,...,n$ one can view the graph of the edges that are present in at least $k$ connectomes. If parameter $k$ is decreased one-by-one from $k=n$ down to $k=1$ then more and more edges appear in the graph, since the inclusion condition is relaxed. The surprising observation is that the appearance of the edges is far from random: it resembles a growing, complex structure, like a tree or a shrub (visualized on https://www.youtube.com/watch?v=yxlyudPaVUE). Here we examine the robustness of the CCD phenomenon, and we show that it is almost independent of the particular choice of the set of underlying individual connectomes, yielding the CCD phenomenon. This result shows that the CCD phenomenon is very likely a biological property of the human brain and not just a property of the data sets examined.
We also present a simulation that describes the growth of the CCD structure well: in our random graph model a doubly-preferential attachment distribution is found to mimic the CCD: a new edge appears with a probability proportional to the sum of the degrees of the endpoints of the new edge.
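The attachment rule stated in the abstract above (a new edge appears with probability proportional to the sum of the degrees of its endpoints) can be sketched as follows. Only that rule comes from the abstract; the fixed vertex set, the seed edges, the function name `grow_graph`, and the step count are assumptions made for illustration and may differ from the authors' simulation.

```python
import itertools
import random


def grow_graph(n, seed_edges, steps, rng):
    """Add up to `steps` edges to a graph on vertices 0..n-1.

    Each missing edge (u, v) is chosen with probability proportional to
    degree[u] + degree[v]. At least one seed edge is required so that
    some candidate has a positive weight (an assumption of this sketch).
    Returns the new edges in the order they were added.
    """
    edges = {tuple(sorted(e)) for e in seed_edges}
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    order = []
    for _ in range(steps):
        # All vertex pairs that are not yet connected.
        candidates = [p for p in itertools.combinations(range(n), 2)
                      if p not in edges]
        if not candidates:
            break  # the graph is complete
        # Doubly-preferential weight: sum of the endpoint degrees.
        weights = [degree[u] + degree[v] for u, v in candidates]
        u, v = rng.choices(candidates, weights=weights, k=1)[0]
        edges.add((u, v))
        degree[u] += 1
        degree[v] += 1
        order.append((u, v))
    return order


order = grow_graph(6, [(0, 1)], 5, random.Random(0))
```

Because pairs of isolated vertices have weight zero, every new edge in this sketch attaches to a vertex that already has a connection, which gives the tree-like or shrub-like growth pattern the abstract describes.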

Submitted 14 October, 2016; originally announced October 2016.

arXiv:1610.02016 [pdf, other]

The braingraph.org Database of High Resolution Structural Connectomes and the Brain Graph Tools

Authors: Csaba Kerepesi, Balazs Szalkai, Balint Varga, Vince Grolmusz

Abstract: Based on the data of the NIH-funded Human Connectome Project, we have computed structural connectomes of 426 human subjects in five different resolutions of 83, 129, 234, 463 and 1015 nodes and several edge weights. The graphs are given in anatomically annotated GraphML format that facilitates better further processing and visualization. For 96 subjects, the anatomically classified sub-graphs can also be accessed, formed from the vertices corresponding to distinct lobes or even smaller regions of interest of the brain. For example, one can easily download and study the connectomes, restricted to the frontal lobes or just to the left precuneus of 96 subjects using the data. Partially directed connectomes of 423 subjects are also available for download. We also present a GitHub-deposited set of tools, called the Brain Graph Tools, for several processing tasks of the connectomes on the site http://braingraph.org.

Submitted 6 October, 2016; originally announced October 2016.

arXiv:1609.09036 [pdf, ps, other]

High-Resolution Directed Human Connectomes and the Consensus Connectome Dynamics

Authors: Balázs Szalkai, Csaba Kerepesi, Bálint Varga, Vince Grolmusz

Abstract: Here we show a method of directing the edges of the connectomes, prepared from diffusion tensor imaging (DTI) datasets from the human brain. Before the present work, no high-definition directed braingraphs (or connectomes) were published, because the tractography methods in use are not capable of assigning directions to the neural tracts discovered. Previous work on the functional connectomes applied low-resolution functional MRI-detected statistical causality for the assignment of directions of connectomes of typically several dozens of vertices. Our method is based on the phenomenon of the "Consensus Connectome Dynamics" (CCD), described earlier by our research group. In this contribution, we apply the method to the 423 braingraphs, each with 1015 vertices, computed from the public release of the Human Connectome Project, and we also made the directed connectomes publicly available at the site http://braingraph.org. We also show the robustness of our edge directing method in four independently chosen connectome datasets: we have found that 86% of the edges, which were present in all four datasets, get the very same directions in all datasets; therefore the direction method is robust: it does not depend on the particular choice of the dataset. We think that our present contribution opens up new possibilities in the analysis of the high-definition human connectome: from now on we can work with a robust assignment of directions of the connections of the human brain.

Submitted 28 September, 2016; originally announced September 2016.

arXiv:1605.01441 [pdf, other]

The Dorsal Striatum and the Dynamics of the Consensus Connectomes in the Frontal Lobe of the Human Brain

Authors: Csaba Kerepesi, Balint Varga, Balazs Szalkai, Vince Grolmusz

Abstract: In the applications of graph theory it is unusual that one considers numerous, pairwise different graphs on the very same set of vertices. In the case of human braingraphs or connectomes, however, this is the standard situation: the nodes correspond to anatomically identified cerebral regions, and two vertices are connected by an edge if a diffusion MRI-based workflow identifies a fiber of axons, running between the two regions, corresponding to the two vertices. Therefore, if we examine the braingraphs of $n$ subjects, then we have $n$ graphs on the very same, anatomically identified vertex set. It is a natural idea to describe the $k$-frequently appearing edges in these graphs: the edges that are present between the same two vertices in at least $k$ out of the $n$ graphs. Based on the NIH-funded large Human Connectome Project's public data release, we have reported the construction of the Budapest Reference Connectome Server http://connectome.pitgroup.org that generates and visualizes these $k$-frequently appearing edges. We call the graphs of the $k$-frequently appearing edges "$k$-consensus connectomes" since an edge could be included only if it is present in at least $k$ graphs out of $n$. Considering the whole human brain, we have reported a surprising property of these consensus connectomes earlier.
In the present work we are focusing on the frontal lobe of the brain, and we report here a similarly surprising dynamical property of the consensus connectomes when $k$ is gradually changed from $k=n$ to $k=1$: the connections between the nodes of the frontal lobe are seemingly emanating from those nodes that were connected to sub-cortical structures of the dorsal striatum: the caudate nucleus, and the putamen. We hypothesize that this dynamic behavior copies the axonal fiber development of the frontal lobe.
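The definition of the $k$-consensus connectome given in the abstract above (the edges present in at least $k$ of the $n$ graphs) can be written down directly. This is a minimal sketch assuming each braingraph is given as a list of undirected edges on a shared, labeled vertex set; the function name `k_consensus_edges` and the toy data are hypothetical and unrelated to the server's code.

```python
from collections import Counter


def k_consensus_edges(graphs, k):
    """Return the set of edges present in at least k of the given graphs.

    graphs: iterable of edge lists; each edge is a pair (u, v) of vertex
    labels, treated as undirected.
    """
    counts = Counter()
    for edges in graphs:
        # Sort each pair so (u, v) and (v, u) are the same edge, and use a
        # set so an edge is counted at most once per graph.
        counts.update({tuple(sorted(e)) for e in edges})
    return {e for e, c in counts.items() if c >= k}


graphs = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("C", "D")],
]
strict = k_consensus_edges(graphs, 3)   # edges shared by all three graphs
relaxed = k_consensus_edges(graphs, 2)  # edges shared by at least two graphs
```

Decreasing `k` from `n` to 1 can only add edges, never remove them, so the sequence of consensus graphs is nested; this is what makes it meaningful to watch the structure grow as $k$ is lowered.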

Submitted 4 May, 2016; originally announced May 2016.

arXiv:1604.05992 [pdf, ps, other]

Human Sexual Dimorphism of the Relative Cerebral Area Volumes in the Data of the Human Connectome Project

Authors: Balázs Szalkai, Vince Grolmusz

Abstract: The average human brain volume of the males is larger than that of the females. Several MRI voxel-based morphometry studies show that the gray matter/white matter ratio is larger in females. Here we have analyzed the recent public release of the Human Connectome Project, and by using the diffusion MRI data of 511 subjects (209 men and 302 women), we have found that the relative volumes of numerous subcortical areas and the gray matter of most cortical areas are significantly larger in women than in men. Additionally, we have discovered differences of the strengths of the sexual correlations between the same structures in different hemispheres.

Submitted 20 April, 2016; originally announced April 2016.

arXiv:1603.00904 [pdf, other]

The Graph of Our Mind

Authors: Balázs Szalkai, Bálint Varga, Vince Grolmusz

Abstract: Graph theory in the last two decades has penetrated sociology, molecular biology, genetics, chemistry, computer engineering, and numerous other fields of science. One of the more recent areas of its applications is the study of the connections of the human brain. By the development of diffusion magnetic resonance imaging (diffusion MRI), it is possible today to map the connections between the 1-1.5 cm$^2$ regions of the gray matter of the human brain. These connections can be viewed as a graph: the vertices are the anatomically identified regions of the gray matter, and two vertices are connected by an edge if the diffusion MRI-based workflow finds neuronal fiber tracts between these areas. This way we can compute 1015-vertex graphs with tens of thousands of edges. In a previous work, we have analyzed the male and female braingraphs graph-theoretically, and we have found statistically significant differences in numerous parameters between the sexes: the female braingraphs are better expanders, have more edges, larger bipartition widths, and larger vertex cover than the braingraphs of the male subjects. Our previous study has applied the data of 96 subjects; here we present a much larger study of 426 subjects. Our data source is an NIH-funded project, the "Human Connectome Project (HCP)" public data release. As a service to the community, we have also made all of the braingraphs computed by us from the HCP data publicly available at http://braingraph.org for independent validation and further investigations.

Submitted 17 March, 2020; v1 submitted 2 March, 2016; originally announced March 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1512.01156, arXiv:1501.00727

arXiv:1602.04776 [pdf, other]

Parameterizable Consensus Connectomes from the Human Connectome Project: The Budapest Reference Connectome Server v3.0

Authors: Balázs Szalkai, Csaba Kerepesi, Bálint Varga, Vince Grolmusz

Abstract: Connections of the living human brain, on a macroscopic scale, can be mapped by a diffusion MR imaging based workflow. Since the same anatomic regions can be corresponded between distinct brains, one can compare the presence or the absence of the edges, connecting the very same two anatomic regions, among multiple cortices. Previously, we have constructed the consensus braingraphs on 1015 vertices first in five, then in 96 subjects in the Budapest Reference Connectome Server v1.0 and v2.0, respectively. Here we report the construction of the version 3.0 of the server, generating the common edges of the connectomes of variously parameterizable subsets of the 1015-vertex connectomes of 477 subjects of the Human Connectome Project's 500-subject release. The consensus connectomes are downloadable in csv and GraphML formats, and they are also visualized on the server's page. The consensus connectomes of the server can be considered as the "average, healthy" human connectome since all of their connections are present in at least $k$ subjects, where the default value is $k=209$, but it can also be modified freely at the web server. The webserver is available at http://connectome.pitgroup.org.

Submitted 15 February, 2016; originally announced February 2016.

arXiv:1602.03008 [pdf, other]

Mapping Correlations of Psychological and Connectomical Properties of the Dataset of the Human Connectome Project with the Maximum Spanning Tree Method

Authors: Balazs Szalkai, Balint Varga, Vince Grolmusz

Abstract: We analyzed correlations between more than 700 psychological, anatomical and connectome properties, originating from the Human Connectome Project's (HCP) 500-subject dataset. Apart from numerous natural correlations, which describe parameters computable or approximable from one another, we have discovered numerous significant correlations in the dataset, never described before. We have also found correlations described very recently, independently of the HCP dataset: e.g., between gambling behavior and the number of the connections leaving the insula.

Submitted 9 February, 2016; originally announced February 2016.

arXiv:1512.01156 [pdf, other]

The Advantage is at the Ladies: Brain Size Bias-Compensated Graph-Theoretical Parameters are Also Better in Women's Connectomes

Authors: Balázs Szalkai, Bálint Varga, Vince Grolmusz

Abstract: In our previous study we have shown that the female connectomes have significantly better, deep graph-theoretical parameters, related to superior "connectivity", than the connectomes of males. Since the average female brain is smaller than the average male brain, one cannot rule out that the significant advantages are due to the size differences and not to the sex differences in the data. To filter out the possible brain-volume related artifacts, we have chosen 36 small male and 36 large female brains such that all the brains in the female set are larger than all the brains in the male set. For the sets, we have computed the corresponding braingraphs and numerous graph-theoretical parameters. We have found that (i) the small male brains lack the better connectivity advantages shown in our previous study for female brains in general; (ii) in numerous parameters, the connectomes computed from the large-brain females still have the significant, deep connectivity advantages demonstrated in our previous study.

Submitted 3 December, 2015; originally announced December 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1501.00727

arXiv:1509.05703 [pdf, other]

doi 10.1371/journal.pone.0158680

How to Direct the Edges of the Connectomes: Dynamics of the Consensus Connectomes and the Development of the Connections in the Human Brain

Authors: Csaba Kerepesi, Balázs Szalkai, Bálint Varga, Vince Grolmusz

Abstract: The human connectome is the object of intensive research today. In these graphs, the vertices correspond to the small areas of the gray matter, and two vertices are connected by an edge if a diffusion-MRI based workflow finds connections between those areas. One main question of the field is discovering the directions of the edges. In a previous work we have reported the construction of the Budapest Reference Connectome Server http://connectome.pitgroup.org from the data recorded in the Human Connectome Project of the NIH. After the server had been published, we recognized a surprising and unforeseen property of it: The server can generate the braingraph of connections that are present in at least $k$ graphs out of the 418, for any value of $k=1,2,...,418$. When the value of $k$ is changed from $k=418$ through 1 by moving a slider at the webserver from right to left, more and more edges appear in the consensus graph. The astonishing observation is that the appearance of the new edges is not random: it is similar to a growing tree. We hypothesize that this movement of the slider in the webserver may copy the development of the connections in the human brain in the following sense: the connections that are present in all subjects are the oldest ones, and those that are present in a decreasing fraction of subjects are gradually the newer connections in the individual brain development. An animation of the phenomenon is available at https://youtu.be/EnWwIf_HNjw.
Based on this hypothesis, we can assign directions to the edges of the connectome as follows: Let $G_i$ denote the consensus connectome where each edge is present in at least $i$ graphs. Suppose that vertex $v$ is isolated in $G_{k+1}$, and becomes connected to a vertex $u$ in $G_k$, where $u$ was connected to other vertices already in $G_{k+1}$. Then we direct this $(v,u)$ edge from $v$ to $u$.
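The direction rule stated in this abstract can be illustrated with a minimal sketch. It assumes that each subject's braingraph is given as a set of undirected edges; the function names `consensus_edges` and `direct_new_edges` and the input representation are hypothetical choices made for illustration and are not taken from the paper or the webserver.

```python
from collections import Counter


def consensus_edges(graphs, k):
    """Return the edges present in at least k of the input graphs.

    Each graph is a set of frozenset({u, v}) edges (an assumed representation),
    so an edge is counted at most once per graph.
    """
    counts = Counter(edge for graph in graphs for edge in graph)
    return {edge for edge, count in counts.items() if count >= k}


def direct_new_edges(graphs, k):
    """Direct the edges that appear when the threshold drops from k+1 to k.

    An edge {v, u} is directed from v to u when v is isolated in G_{k+1}
    and u already has at least one edge in G_{k+1}.
    """
    g_high = consensus_edges(graphs, k + 1)  # G_{k+1}
    g_low = consensus_edges(graphs, k)       # G_k, a superset of G_{k+1}
    connected_high = {vertex for edge in g_high for vertex in edge}

    directed = []
    for edge in g_low - g_high:
        a, b = tuple(edge)
        if a not in connected_high and b in connected_high:
            directed.append((a, b))
        elif b not in connected_high and a in connected_high:
            directed.append((b, a))
        # Edges with both or neither endpoint connected are left undirected.
    return sorted(directed)


# Toy example with three subjects and integer vertex labels.
toy_graphs = [
    {frozenset({1, 2}), frozenset({2, 3})},
    {frozenset({1, 2}), frozenset({2, 3})},
    {frozenset({1, 2})},
]
toy_result = direct_new_edges(toy_graphs, 2)
```

In the toy example, vertex 3 is isolated in $G_3$ and joins vertex 2 in $G_2$, so the sketch directs that edge from 3 to 2. New edges whose endpoints are both already connected, or both isolated, are not covered by the rule as stated and are left undirected here.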

Submitted 13 March, 2016; v1 submitted 18 September, 2015; originally announced September 2015.

arXiv:1509.04850 [pdf]

Life without dUTPase

Authors: Csaba Kerepesi, Judit E. Szabó, Vince Grolmusz, Beáta G. Vértessy

Abstract: Fine-tuned regulation of the cellular nucleotide pools is indispensable for faithful replication of DNA. The genetic information is also safeguarded by DNA damage recognition and repair processes. Uracil is one of the most frequently occurring erroneous bases in DNA; it can arise from cytosine deamination or thymine-replacing incorporation. Two enzyme families are primarily involved in keeping DNA uracil-free: dUTPases that prevent thymine-replacing incorporation and uracil-DNA glycosylases that excise uracil from DNA and initiate uracil-excision repair. Both dUTPase and the most efficient uracil-DNA glycosylase, UNG, are thought to be ubiquitous in free-living organisms. In the present work, we have systematically investigated the genotype of deposited fully sequenced bacterial and Archaeal genomes. Surprisingly, we have found that, in contrast to the generally held opinion, a large number of bacterial and Archaeal species lack the dUTPase gene(s). The dut- genotype is present in diverse bacterial phyla, indicating that loss of this (or these) gene(s) has occurred multiple times during evolution. We have identified several survival strategies in the absence of dUTPases: i) simultaneous lack or inhibition of UNG, ii) acquisition of a less dUTP-specific sanitizing nucleotide pyrophosphatase, and iii) supply of dUTPase from bacteriophages. Our data indicate that several unicellular microorganisms may efficiently cope with a dut- genotype, potentially leading to an unusual uracil-enrichment in their genomic DNA.

Submitted 16 September, 2015; originally announced September 2015.

arXiv:1507.00327 [pdf, other]

Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

Authors: Csaba Kerepesi, Balázs Szalkai, Bálint Varga, Vince Grolmusz

Abstract: The human braingraph, or connectome, is a description of the connections of the brain: the nodes of the graph correspond to small areas of the gray matter, and two nodes are connected by an edge if a diffusion MRI-based workflow finds fibers between those brain areas. We have constructed 1015-vertex graphs from the diffusion MRI brain images of 395 human subjects and compared the individual graphs with respect to several different areas of the brain. The inter-individual variability of the graphs within different brain regions was discovered and described. We have found that the frontal and the limbic lobes are more conservative, while the edges in the temporal and occipital lobes are more diverse. Interestingly, a "hybrid" conservative and diverse distribution was found in the paracentral lobule and the fusiform gyrus. Smaller cortical areas were also evaluated: precentral gyri were found to be more conservative, and the postcentral and the superior temporal gyri to be very diverse.

Submitted 1 July, 2015; originally announced July 2015.

arXiv:1505.00476 [pdf, ps, other]

Nucleotide 9-mers Characterize the Type II Diabetic Gut Metagenome

Authors: Balázs Szalkai, Vince Grolmusz

Abstract: Discoveries of new biomarkers for frequently occurring diseases are of special importance in today's medicine. While fully developed type II diabetes (T2D) can be detected easily, the early identification of high-risk individuals is an area of interest in T2D, too. Metagenomic analysis of the human bacterial flora has shown subtle changes in diabetic patients, but no specific microbes are known to cause or promote the disease. Moderate changes were also detected in the microbial gene composition of the metagenomes of diabetic patients, but again, no specific gene was found that is present in disease-related and missing from healthy metagenomes. However, these fine differences in microbial taxon and gene composition are difficult to apply as quantitative biomarkers for diagnosing or predicting type II diabetes. In the present work we report some nucleotide 9-mers with significantly differing frequencies in diabetic and healthy intestinal flora. To our knowledge, it is the first time such short DNA fragments have been associated with T2D. The automated, quantitative analysis of the frequencies of short nucleotide sequences seems to be more feasible than accurate phylogenetic and functional analysis, and thus it might be a promising direction of diagnostic research.

Submitted 3 May, 2015; originally announced May 2015.

arXiv:1503.05575 [pdf, other]

The "Giant Virus Finder" Discovers an Abundance of Giant Viruses in the Antarctic Dry Valleys

Authors: Csaba Kerepesi, Vince Grolmusz

Abstract: The first giant virus was identified in 2003 from a biofilm of an industrial water-cooling tower in England. Later, numerous new giant viruses were found in oceans and freshwater habitats, some of them having as many as 2,500 genes. We have demonstrated their very likely presence in four soil samples taken from the Kutch Desert (Gujarat, India). Here we describe a bioinformatics workflow, called the "Giant Virus Finder", that is capable of discovering the very likely presence of the genomes of giant viruses in metagenomic shotgun-sequenced datasets. The new tool is applied to numerous hot and cold desert soil samples as well as some tundra and forest soils. We show that most of these samples contain giant viruses, and especially many were found in the Antarctic dry valleys. The results imply that giant viruses could be frequent not only in aqueous habitats, but in a wide spectrum of soils on our planet.

Submitted 24 November, 2015; v1 submitted 18 March, 2015; originally announced March 2015.

arXiv:1501.00727 [pdf, other]

doi 10.1371/journal.pone.0130045

Graph Theoretical Analysis Reveals: Women's Brains are Better Connected than Men's

Authors: Balazs Szalkai, Balint Varga, Vince Grolmusz

Abstract: Deep graph-theoretic ideas in the context of the graph of the World Wide Web led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph-theoretic concepts, as happened in the case of the World Wide Web, will lead to discoveries enlightening the structural and also the functional details of the animal and human brains. When scientists examine large networks of tens or hundreds of millions of vertices, only fast algorithms can be applied because of the size constraints. In the case of diffusion MRI-based structural human brain imaging, the effective vertex number of the connectomes, or brain graphs, derived from the data is on the scale of several hundred today. That size facilitates applying strict mathematical graph algorithms even for some hard-to-compute (or NP-hard) quantities like vertex cover or balanced minimum cut. In the present work we have examined brain graphs, computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35. Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome.
Since the average female brain weighs less than the brain of males, these properties show that the female brain is more "well-connected" or perhaps more "efficient", in a sense, than the brain of males.

Submitted 12 January, 2015; v1 submitted 4 January, 2015; originally announced January 2015.

arXiv:1412.3151 [pdf, other]

The Budapest Reference Connectome Server v2.0

Authors: Balazs Szalkai, Csaba Kerepesi, Balint Varga, Vince Grolmusz

Abstract: The connectomes of different human brains are pairwise distinct: we cannot talk about an abstract "graph of the brain". Two typical connectomes, however, have quite a few common graph edges that may describe the same connections between the same cortical areas. The Budapest Reference Connectome Server Ver. 2.0 (http://connectome.pitgroup.org) generates the common edges of the connectomes of 96 distinct cortices, each with 1015 vertices, computed from 96 MRI data sets of the Human Connectome Project. The user may set numerous parameters for the identification and filtering of common edges, and the graphs are downloadable in both csv and GraphML formats; both formats carry the anatomical annotations of the vertices, generated by the Freesurfer program. The resulting consensus graph is also automatically visualized in a 3D rotating brain model on the website. The consensus graphs, generated with various parameter settings, can be used as reference connectomes based on different, independent MRI images; therefore they may serve as reduced-error, low-noise, robust graph representations of the human brain.

Submitted 6 January, 2015; v1 submitted 9 December, 2014; originally announced December 2014.

arXiv:1410.1278 [pdf, ps, other]

Giant Viruses of the Kutch Desert

Authors: Csaba Kerepesi, Vince Grolmusz

Abstract: The Kutch desert (Great Rann of Kutch, Gujarat, India) is a unique ecosystem: for the larger part of the year it is a hot, salty desert that is flooded regularly in the Indian monsoon season. In the dry season, the crystallized salt deposits form the "white desert" in large regions. The first metagenomic analysis of the soil samples of Kutch was published in 2013, and the data was deposited in the NCBI Sequence Read Archive. The sequences were analyzed at the same time phylogenetically for prokaryotes, especially for bacterial taxa. In the present work, we are searching for the DNA sequences of the recently discovered giant viruses in the soil samples of the Kutch desert. Since most giant viruses were discovered in biofilms in industrial cooling towers, ocean water and freshwater ponds, we were surprised to find their DNA sequences in the soil samples of a seasonally very hot and arid, salty environment.

Submitted 7 October, 2014; v1 submitted 6 October, 2014; originally announced October 2014.

arXiv:1312.4660 [pdf, other]

An Intuitive Graphical Webserver for Multiple-Choice Protein Sequence Search

Authors: Dániel Bánky, Balázs Szalkai, Vince Grolmusz

Abstract: Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, the query formation on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most of the users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple-choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, then they will be nicely stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org.

Submitted 17 December, 2013; originally announced December 2013.

arXiv:1312.1876 [pdf, ps, other]

Identifying Combinatorial Biomarkers by Association Rule Mining in the CAMD Alzheimer's Database

Authors: Balazs Szalkai, Vince K. Grolmusz, Vince I. Grolmusz, Coalition Against Major Diseases

Abstract: Background: The concept of combinatorial biomarkers was conceived around 2010: it was noticed that simple biomarkers are often inadequate for recognizing and characterizing complex diseases. Methods: Here we present an algorithmic search method for complex biomarkers which may predict or indicate Alzheimer's disease (AD) and other kinds of dementia. We applied data mining techniques that are capable of uncovering implication-like logical schemes with detailed quality scoring. Our program SCARF is capable of finding multi-factor relevant association rules automatically. The new SCARF program was applied to the CAMD database of the Tucson, Arizona-based Critical Path Institute, containing laboratory and cognitive test data for more than 6000 patients from the placebo arm of clinical trials of large pharmaceutical companies, and consequently, the data is much more reliable than numerous other databases for dementia. Results: The results suggest connections between liver enzyme, B12 vitamin, sodium, and cholesterol levels and dementia, and also between some hematologic parameter levels and dementia.

Submitted 6 December, 2013; originally announced December 2013.

arXiv:1309.1895 [pdf, other]

Fast and Exact Sequence Alignment with the Smith-Waterman Algorithm: The SwissAlign Webserver

Authors: Gabor Ivan, Daniel Banky, Vince Grolmusz

Abstract: It has been demonstrated earlier that the exact Smith-Waterman algorithm yields more accurate results than the members of the heuristic BLAST family of algorithms. Unfortunately, the Smith-Waterman algorithm is much slower than BLAST and its clones. Here we present a technique and a webserver that uses the exact Smith-Waterman algorithm and is approximately as fast as the BLAST algorithm. The technique unites earlier methods of extensive preprocessing of the target sequence database and CPU-specific coding of the Smith-Waterman algorithm. The SwissAlign webserver is available at http://swissalign.pitgroup.org.

Submitted 7 September, 2013; originally announced September 2013.

arXiv:1309.1892 [pdf, other]

Dimension reduction of clustering results in bioinformatics

Authors: Gabor Ivan, Vince Grolmusz

Abstract: OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a so-called reachability plot that can be either used to produce cluster membership assignments, or interpreted itself as an expressive two-dimensional representation of the density-based clustering structure of the input set, even if the input set is embedded in higher dimensions. The main focus of this work is a visualization method that can be used to assign colours to all entries of the input database, based on hierarchically represented a priori knowledge available for each of these objects. Based on two different bioinformatics-related applications, we illustrate how the proposed method can be efficiently used to identify clusters with proven real-life relevance.

Submitted 7 September, 2013; originally announced September 2013.
