Banu Diri
    In this work we will introduce EmoSTAR as a new emotional database and perform cross-corpus tests between EmoSTAR and EmoDB (Berlin Emotional Database), using one of the two databases as the training set and the other as the test set. We will also investigate the performance of feature selectors in both databases. Feature extraction will be implemented with the openSMILE toolkit, employing the Emobase and Emo_large configurations. Classification and feature selection will be run with the WEKA tool. EmoSTAR is still under development for more samples and emotion types, and we welcome emotional speech sample donations from the speech community. EmoSTAR is available only for personal research purposes via email to the authors by signing an End User License Agreement.
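
    As a minimal sketch of the feature extraction step, assuming the audEERING `opensmile` Python wrapper (the paper drives the openSMILE toolkit through its Emobase/Emo_large configuration files directly; the file name below is a placeholder):

        # Extract Emobase functionals for one utterance (hypothetical file name).
        import opensmile

        smile = opensmile.Smile(
            feature_set=opensmile.FeatureSet.emobase,         # Emobase configuration
            feature_level=opensmile.FeatureLevel.Functionals,
        )
        features = smile.process_file("sample.wav")           # one-row pandas DataFrame
        print(features.shape)  # Emobase yields 988 functionals per utterance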
    Correcting the Turkish characters in a text that was written without them takes considerable time if done manually, due to various constraints (keyboard layout, editor restrictions). In addition, some words can be overlooked in many cases. Especially when the form without Turkish characters is itself a meaningful word (kir/kır), it is quite natural to overlook it. In such cases, word-by-word correction is inadequate; considering the word's place in the sentence, corrections must be made semantically. Consider the sentence "kir dugunlerini cok severim.". Although kir is a meaningful word, when the flow of the sentence is considered, kir must be corrected to kır. In this study, a semantic-web-based algorithm was designed, and a high success ratio was obtained when correcting Turkish characters without using morphological tools.
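
    A minimal sketch of the ambiguity this creates, assuming a simple ASCII-to-Turkish candidate generator (the paper's semantic-web algorithm itself is not reproduced):

        # Enumerate the Turkish-character variants of an ASCII-only word.
        from itertools import product

        ASCII_TO_TURKISH = {"c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü"}

        def candidates(word):
            options = [ASCII_TO_TURKISH.get(ch, ch) for ch in word]
            return {"".join(p) for p in product(*options)}

        # Both variants of "kir" are valid Turkish words ("kir" = dirt,
        # "kır" = countryside), so sentence-level semantics must break the tie.
        print(sorted(candidates("kir")))  # ['kir', 'kır']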
    Wikipedia pages associate key words and phrases in the text with other Wikipedia pages by linking them together, for the purpose of enabling the user to reach information more easily. In this study, Natural Language Processing techniques are used to automate this linking system. Initially, the approach was designed for the Turkish Wikipedia; in the second step, it was tried on the English Wikipedia and the results were compared. The evaluations are promising.
    Phishing attacks are one of the most preferred types of attacks for cybercriminals, who can easily contact a large number of victims through the use of social networks, particularly through email messages. To protect end users, most security mechanisms check Uniform Resource Locator (URL) addresses because of their simplicity of implementation and execution speed. However, against sophisticated attackers, this mechanism can miss some phishing attacks and has a relatively high false positive rate. In this research, a hybrid technique is proposed that uses not only URL features, but also content-based features as the second level of detection mechanism, thus improving the accuracy of the detection system while also minimizing the number of false positives. Additionally, most phishing detection algorithms use datasets that contain easily differentiated data pieces, either phishing or legitimate. However, in order to implement a more secure protection mechanism, we aimed to coll...
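
    As a hedged illustration of the first-level mechanism, here are a few cheap URL-based features of the kind such filters typically compute (the paper's actual feature set and its content-based second level are not reproduced):

        # Toy lexical features of a URL for a first-level phishing filter.
        from urllib.parse import urlparse

        def url_features(url: str) -> dict:
            parsed = urlparse(url)
            host = parsed.netloc
            return {
                "url_length": len(url),
                "num_dots": host.count("."),              # many subdomains look suspicious
                "num_hyphens": host.count("-"),
                "has_at_symbol": "@" in url,              # classic obfuscation trick
                "has_ip_host": host.replace(".", "").isdigit(),
                "uses_https": parsed.scheme == "https",
            }

        print(url_features("http://login-example.com.evil.tld/verify"))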
    This study aims to develop a system that classifies tweets by their subject and to implement it as a mobile application. Tweets from the user's home timeline are classified into topics taken from the user. Classification is done by using ontologies of the topics. Wikipedia pages and TDK definitions of the terms are used while creating the ontologies. Links referencing other Wikipedia pages that are found in a term's Wikipedia page are taken as ontological relations. The strength of these relations is calculated using the Wikipedia descriptions and TDK definitions. The ontology of a topic is created using terms with strong relations. The tweets' relations with the ontologies are then calculated, and tweets are placed under the topics to which they are most related. As a result of this study, it is seen that the proposed algorithm is only suitable for classifying tweets into a maximum of five classes. Three reasons could be observed. First, tweets don't contain enough distinctive words for this algorith...
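
    A toy sketch of the final assignment step, with invented relation weights (the paper derives them from Wikipedia and TDK definition overlap):

        # Score a tokenized tweet against each topic's weighted ontology terms.
        def classify(tweet_tokens, topic_ontologies):
            scores = {
                topic: sum(w for term, w in terms.items() if term in tweet_tokens)
                for topic, terms in topic_ontologies.items()
            }
            return max(scores, key=scores.get)

        ontologies = {  # hypothetical ontologies with relation strengths
            "football": {"gol": 0.9, "maç": 0.8, "hakem": 0.6},
            "politics": {"meclis": 0.9, "seçim": 0.8, "parti": 0.7},
        }
        print(classify({"dün", "maç", "gol"}, ontologies))  # football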
    Users often use online reviews to assess the quality of hotels according to their various attributes. In this study, a sentiment analysis of online reviews has been conducted using 11 predetermined attributes pertaining to hotels. Using this analysis, users’ overall assessments of hotels have been determined and summarized from reviews left for a group of various hotels. To identify words with similar meanings to the 11 predetermined hotel attributes, the Word2Vec method has been employed. Additionally, the FastText method has been used to detect words containing spelling errors. The sentiment analysis of the comments has been performed using three different methods belonging to two different approaches: the VADER method as a dictionary-based approach, and BERT and RoBERTa as machine learning approaches. Using these methods, the reviews have been evaluated in three categories as positive, negative, and neutral, and the quality score has been calculated. In addition, a softwar...
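
    A minimal sketch of the attribute-expansion idea with gensim's Word2Vec, on a toy corpus (the corpus and hyperparameters below are placeholders, not the paper's):

        # Find words used similarly to a seed attribute word such as "clean".
        from gensim.models import Word2Vec

        sentences = [  # in practice: tokenized hotel reviews
            ["room", "was", "clean", "and", "spacious"],
            ["hygiene", "and", "cleanliness", "were", "excellent"],
            ["staff", "friendly", "breakfast", "tasty"],
        ]
        model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)
        print(model.wv.most_similar("clean", topn=3))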
    In most widely used distance education platforms, known as MOOCs (Massive Open Online Courses), the language of lectures is English; even so, they have participants from many different countries. This situation causes differences in learners' usage behaviors and performance. In our previous studies, we tried to divide users into language groups according to their English language proficiency. In this study, using natural language processing techniques, we aimed to improve the division of students into language groups and to automatically generate datasets belonging to the language groups from a distance education platform named FutureLearn. On the FutureLearn platform (as on other distance education platforms), learners do not have to provide their country information while registering. Also, for some learners, the provided country information refers to where they currently live, which differs from their home country. In such situations, it is not possible to determine whether English is their first, official, or secondary language. Our study focused on using regex patterns to update learners' language-group labels, with the aim of using them in future studies such as predicting the learners' language groups. As a data source, the datasets of the "Understanding Language: Learning and Teaching-4" course on the FutureLearn platform are used. To update the language groups with natural language processing, we mostly used features such as learners' comments, ids, and country information. As a result of this study, by analyzing users' comments, we identified the language groups of 63.06% of all commenting users, grouped as English being the official and primary language, English being official but not primary, and English not being official. It is observed that 78.19% of these learners belong to the same language group as the country information they provided during registration, while for 21.81% of users the home country differs from the language group identified from their comments. When only the country information provided at registration is used, the number of learners whose English language group can be identified is lower, and the identified language groups can be wrong.
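
    An illustrative sketch of the regex idea, with example patterns (the study's actual pattern list is not reproduced):

        # Search learner comments for explicit statements about English proficiency.
        import re

        PATTERNS = {
            "english_primary": re.compile(
                r"english is my (first|native) language|i am a native speaker", re.I),
            "english_not_primary": re.compile(
                r"english is not my (first|native) language|i am learning english", re.I),
        }

        def label_comment(comment):
            for label, pattern in PATTERNS.items():
                if pattern.search(comment):
                    return label
            return None  # no explicit clue; fall back to country information

        print(label_comment("English is not my first language, sorry for mistakes"))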
    Abstract: The simplest approach applicable in software testing is to test all possibilities in a given piece of code. This is impossible in practice due to time and budget constraints. Software fault prediction methods are used by project managers during the testing phase to distribute limited resources effectively. Studies in this field have been increasing steadily, especially since 2005. In this study, we question whether the metrics existing in the literature are sufficient for web applications. Our experiments on web applications show that fault prediction is far from producing optimum results on web applications. Although the life cycle used in developing this type of application is the same as those used for other applications, there are some points where they diverge technically. Therefore, we propose that metrics specific to web applications be created in the field of software fault prediction.
    Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers who study software engineering, the content hosted on these platforms provides much valuable data. There are several alternatives for obtaining GitHub data, such as GitHub Archive, the GitHub API, or GHTorrent. Among these options, GHTorrent is the most widely known and used GitHub dataset in the literature. Although there are some review studies about software engineering challenges across the GitHub platform, no review of GHTorrent dataset-specific research is available. In this study, the 172 studies that use GHTorrent as a data source were categorized within the scope of software engineering challenges and a systematic mapping study was carried out. Moreover, the pros and cons of the dataset have been indicated and the focused issues of the literature on and ...
    In this study, the statistics of string matching performed on text data are compared with the statistics of compressed string matching performed on the same data. To make this comparison, an application we had previously developed* was improved, and the test results were obtained with this application. In line with the aim of the study, a compression method is also presented that allows the string matching algorithms existing in the literature to be used for compressed string matching without any modification. In the tests, while the compression algorithm based on 2-bit and 3-bit coding offered a compression factor of 30-35%, the resulting compressed string matching time was found to be lower than the string matching time on the uncompressed text. Furthermore, the number of character comparisons performed during string matching was found to be smaller on the compressed text than on the uncompressed text. Consequen...
    In the field of machine learning, artificial neural networks have frequently been used to solve many problems. However, during the period also known as the "AI Winter," studies in this field came to a halt, mainly due to hardware limitations and other problems. Artificial neural networks, which began to become a popular field again in the early 2000s, transitioned from shallow networks to deep networks along with advances in GPUs. This approach has come to be used successfully across a very wide range of areas, from image processing to natural language processing, and from medical applications to activity recognition. In this study, the history of deep learning, the methods used, and studies grouped by application area are described. Information is also given about the libraries used in recent years and the research groups focusing on deep learning. The aim of this study is both to inform researchers about developments in deep learning and to suggest possible topics to be studied with deep learning.
    The Sentimental Twitter software parses, analyzes, and reports Twitter data, serving individual and corporate users via its user-friendly graphical user interface. Each tweet is classified as positive, negative, or neutral in Sentimental Twitter. In this study, two different methods, a lexicon method and an n-gram method, have been implemented and compared. As a result, the lexicon method was measured to perform better than the n-gram method.
    This study aims to apply traditional text compression methods to the compression of DNA sequences. It was seen that random short repeats are vital, and their positive impact on compression was examined. A pipelined system with multiple algorithms running sequentially was used for compression. The contribution of each algorithm to the system was investigated, and in particular the effect of the BWT on compression was shown. The results show that the pipeline system was unable to match the compression success of Huffman coding alone.
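
    For reference, a minimal (naive, quadratic) sketch of the BWT whose effect the study measured; production pipelines use suffix-array based implementations:

        # Burrows-Wheeler Transform via sorted rotations of the input string.
        def bwt(s, sentinel="$"):
            s += sentinel
            rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
            return "".join(rot[-1] for rot in rotations)

        # The BWT groups equal characters into runs, which helps later coders.
        print(bwt("GATTACAGATTACA"))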
    In this study, the association estimators applied in network inference methods used to determine disease-related molecular interactions were examined on proteomic data of breast cancer, the most common type of cancer in women, and hub genes in the gene-gene interaction network related to the disease were identified. Proteomic data of 901 breast cancer patients, generated using reverse phase protein arrays and provided by The Cancer Proteome Atlas (TCPA), were used as the data set. Correlation-based and mutual information (MI)-based estimators used in the literature were compared in the study, and the WGCNA and minet R packages were used. As a result, the MI-based shrink estimator method gives more successful results than the correlation-based adjacency function used in the estimation of biological networks in the WGCNA package. Success rates ranged from 0.67 to 1.00 for the shrink estimator, while the adjacency functions ranged from 0.33 to 0.86 for different module counts. In ...
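
    A rough Python analogue of the two compared estimator families, on toy data (the study used the WGCNA and minet R packages on TCPA proteomics; the soft-threshold power and bin count below are assumptions):

        # Correlation-based adjacency (WGCNA-style) vs. a binned mutual information score.
        import numpy as np
        from sklearn.metrics import mutual_info_score

        rng = np.random.default_rng(0)
        x = rng.normal(size=200)
        y = 0.8 * x + rng.normal(scale=0.5, size=200)

        adjacency = abs(np.corrcoef(x, y)[0, 1]) ** 6     # soft threshold, beta = 6
        mi = mutual_info_score(None, None,
                               contingency=np.histogram2d(x, y, bins=10)[0])
        print(f"correlation adjacency: {adjacency:.3f}, MI: {mi:.3f}")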
    In this study, topics such as corpora, corpus types, existing Turkish corpora, and the annotation of a Turkish corpus are discussed. In addition, a tool was developed that enables the queries needed to make effective use of Turkish corpora and that visualizes the syntax trees of the sentences in Turkish corpora together with their parts of speech.
    In open-source software development environments, the textual, numerical, and relationship-based data generated are of interest to researchers. Various data sets are available for this data, which is frequently used in areas such as software engineering and natural language processing. However, since these data sets contain all the data in the environment, a problem arises in processing terabytes of data. For this reason, almost all studies using GitHub data use data filtered according to certain criteria. In this context, the use of a different data set in each study makes comparing the accuracy of the studies quite difficult. In order to solve this problem, a common dataset was created and shared with researchers, allowing work on many software engineering problems.
    In this paper, we aim to guide researchers interested in these topics through the latest developments and studies on student performance analysis and Learning Analytics in Massive Open Online Courses (MOOCs). For this purpose, a short review of the use of performance prediction and Learning Analytics in MOOCs is presented. In our study, to help readers get familiar with the topic, literature on the basic concepts is explained first. Then, to convey the importance of the features and their relationships in more detail, information about selected papers is provided. After that, findings about the use of student performance prediction and Learning Analytics in MOOCs are summarized.
    In this study, the temporal logics in the literature are surveyed. Some of these temporal logics are suitable for machine computation and some are suitable for natural language processing. An optimization over these computable and NLP-suitable temporal logics is suggested to cover Turkish temporal logic. Keywords: Event Ordering, Chronology Extraction, Temporal Logic, Computable Temporal Languages, TimeML
    Extraction of semantic relations from various resources (Wikipedia, the Web, corpora, etc.) is an important issue in natural language processing. In this paper, the automatic extraction of hyponym-hypernym pairs from a Turkish corpus is targeted. For the extraction of hyponym-hypernym pairs, pattern-based and semantic-similarity-based methods are used together. Patterns are extracted from initial hyponym-hypernym pairs, and using these patterns, hyponyms are extracted for various hypernyms. Incorrect candidate hyponyms are removed using document frequency and semantic-similarity-based elimination methods. After experiments on 14 hypernyms, an average accuracy of 77% was obtained.
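
    A toy sketch of the pattern-matching step, using a single Turkish Hearst-style pattern (the paper induces many patterns from seeds and filters candidates afterwards):

        # "<hyponym> gibi <hypernym>" matches phrases like "elma gibi meyveler".
        import re

        PATTERN = re.compile(r"(\w+) gibi (\w+)")

        def extract_pairs(sentences):
            pairs = []
            for sent in sentences:
                pairs.extend(PATTERN.findall(sent))
            return pairs

        print(extract_pairs(["elma gibi meyveler her yerde satılır"]))
        # [('elma', 'meyveler')]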
    Extraction of various semantic relation pairs from different sources (dictionary definitions, corpora, etc.) with high accuracy is one of the most popular topics in natural language processing (NLP). In this study, a hybrid method is proposed to extract Turkish part-whole pairs from a corpus. Corpus statistics, WordNet similarities, and Word2Vec word vector similarities are used together in this study. Firstly, initial part-whole seeds are prepared, and using these seeds, part-whole patterns are extracted from the corpus. For each pattern, a reliability score is calculated, and reliable patterns are selected to produce new pairs from the corpus. Various reliability scores are used for the new pairs. To measure the success of the method, 19 target whole words were selected, and average precisions of 83% (first 10 pairs), 74% (first 20 pairs), and 68% (first 30 pairs) were obtained, respectively.
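
    One simple way to score pattern reliability, shown as a sketch (the paper's exact reliability formulas are not reproduced): a pattern is reliable in proportion to how many known seed pairs it rediscovers among everything it extracts.

        # Precision of a pattern's extractions against the seed pair set.
        def pattern_reliability(extracted_pairs, seed_pairs):
            if not extracted_pairs:
                return 0.0
            hits = sum(1 for pair in extracted_pairs if pair in seed_pairs)
            return hits / len(extracted_pairs)

        seeds = {("tekerlek", "araba"), ("kapı", "ev")}       # part-whole seeds
        extracted = [("tekerlek", "araba"), ("motor", "araba"), ("kapı", "ev")]
        print(round(pattern_reliability(extracted, seeds), 2))  # 0.67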
    Text classification is one of the most important issues in natural language processing. In this study, texts belonging to different problems were classified using classical machine learning and deep learning methods. Additionally, transformer-based classifiers using transfer learning were also used, and the effects of transfer learning on classification success were examined. As a result of the experiments, it was seen that the transfer-learning-based BERT classifier achieved higher performance than the other methods. With this study, the effect of transfer learning on Turkish text classification was examined in detail.
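
    A hedged sketch of the transfer-learning setup with Hugging Face transformers; the model name and label count are placeholders, and the classification head still needs fine-tuning on the task data:

        # Load a pretrained Turkish BERT with a fresh 3-class classification head.
        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        model_name = "dbmdz/bert-base-turkish-cased"   # a public Turkish BERT
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

        inputs = tokenizer("Bu film gerçekten çok güzeldi.", return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits            # untrained head: fine-tune first
        print(logits.argmax(dim=-1).item())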
    In this paper, we applied lexico-syntactic patterns to extract the meronymy relation from a huge Turkish raw text. Once the system takes a huge raw corpus and extracts the cases matched by a given pattern, it proposes a list of whole-part pairs based on their co-occurrence frequencies. For this purpose, we exploited and compared a list of pattern clusters. The clusters examined fall into three types: general patterns, dictionary-based patterns, and bootstrapped patterns. We evaluated how these patterns improve system performance, especially within a corpus-based approach using the distributional features of words. Finally, we discuss all the experiments with a comparative analysis and show the advantages and disadvantages of the approaches, with promising results.
    Data analysis becomes difficult with the increase of large amounts of data. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on shared features without human intervention requires advanced methodologies. Topic modeling methods exist to overcome this problem in text analysis for downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers, namely Random Forest, AdaBoost, Naive Bayes, and Logistic Regression, using the classical LDA and n-stage LDA topic modeling methods for feature extraction in headline classification. We run our experiments on publicly available Turkish and English datasets with 3 and 5 classes. We demonstrate that n-stage LDA as a feature extractor obtains state-of-the-art performance for any downstream classifier. It should also be noted that Random Forest was the most successful algorithm on both datasets.
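
    A minimal sketch of the LDA-as-feature-extractor pipeline with scikit-learn, on toy headlines (only the classical LDA baseline; the n-stage refinement is not reproduced):

        # Bag-of-words -> LDA topic weights -> Random Forest classifier.
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.pipeline import make_pipeline

        headlines = ["stocks rally after fed decision", "team wins championship final",
                     "new phone unveiled at tech expo", "league match ends in draw"]
        labels = ["economy", "sports", "tech", "sports"]

        clf = make_pipeline(
            CountVectorizer(),
            LatentDirichletAllocation(n_components=3, random_state=0),
            RandomForestClassifier(random_state=0),
        )
        clf.fit(headlines, labels)
        print(clf.predict(["championship game tonight"]))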
    Feature selection is the process of selecting a highly distinguishing subset of features from a data set in order to obtain better, or at least equivalent, success rates. The Artificial Bee Colony (ABC) Algorithm is a swarm intelligence algorithm that models the food-seeking behavior of honey bees in nature and was developed to produce solutions in continuous space. BitABC is a bitwise-operator-based binary ABC algorithm that can produce fast results in binary space. In this study, BitABC was improved to increase its local search capacity and adapted to the feature selection problem to measure the success of the proposed method. The results obtained using 10 data sets from the UCI Machine Learning Repository indicate the success of the proposed method.
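
    A toy sketch of the bitwise neighbor step such a binary ABC can use (BitABC's exact operators and the fitness function are not reproduced):

        # Flip a sparse random subset of feature bits in a food source.
        import random

        def bitwise_neighbor(solution, n_bits):
            mask = random.getrandbits(n_bits) & random.getrandbits(n_bits)  # sparse mask
            return solution ^ mask        # XOR flips the masked feature bits

        n_features = 10
        source = random.getrandbits(n_features)        # bit i = 1: feature i selected
        candidate = bitwise_neighbor(source, n_features)
        print(f"{source:0{n_features}b} -> {candidate:0{n_features}b}")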
    With the rapid development of the Internet, thousands of different news reports from different channels are presented to us. Categorizing and archiving so much news without human effort, particularly in the media sector, is an important problem. In this study, the aim is to determine the topics to which news headlines collected from news sites belong. For this, a two-stage method is proposed, which is based on the classical Latent Dirichlet Allocation (LDA) algorithm. The developed two-stage LDA method was compared with conventional LDA. Then, by creating a file with an arff extension from the word weights of the topics, the success of the machine learning methods in Weka was measured.
    Finding the expansions of abbreviations commonly used in texts is an important requirement for obtaining and understanding information. If the abbreviations used in a text are assumed to be known by everyone, their expansions may not be included in the text. However, an abbreviation may sometimes have more than one expansion, which makes comprehension difficult. Producing the correct expansion for an abbreviation is still an actively studied topic, examined with different methods. Since no use of the Apriori algorithm for finding abbreviation expansions was encountered in the literature reviewed, in this study a method based on Association Rules is proposed to obtain the expansions of the abbreviations found in PubMed abstracts. Within the examined data set and abbreviations, even if an abbreviation has more than one expansion, the applied method achieved 87.5% with a common minimum support value, and with different minimum support value...
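
    A hedged sketch of the association-rule idea using mlxtend's apriori, treating each abstract as a transaction of tokens (toy transactions; the support threshold is an assumption):

        # Frequent itemsets that tie an abbreviation to its expansion words.
        import pandas as pd
        from mlxtend.frequent_patterns import apriori
        from mlxtend.preprocessing import TransactionEncoder

        transactions = [
            ["MRI", "magnetic", "resonance", "imaging", "brain"],
            ["MRI", "magnetic", "resonance", "imaging", "scan"],
            ["CT", "computed", "tomography", "scan"],
        ]
        te = TransactionEncoder()
        df = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
        itemsets = apriori(df, min_support=0.6, use_colnames=True)
        print(itemsets[itemsets["itemsets"].apply(lambda s: "MRI" in s)])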
    Users evaluate hotels through online reviews that depend on various hotel attributes. In this study, sentiment analysis of the reviews concerning these attributes has been carried out using 11 attributes determined beforehand. Thanks to this analysis, users' overall assessments have been determined and summarized from reviews regarding any aspect of the hotels. In order to determine words with similar meanings that represent the 11 attributes, the Word2Vec method has been employed. Additionally, the FastText method has been utilized in an effort to handle possible spelling errors. Since the data set employed is not labeled, VADER, which is a dictionary-based method, has been used to perform the sentiment analysis. The performance score has been calculated by evaluating the comments in three categories: positive, negative, and neutral.
    In recent years, different biological data sets obtained by next-generation sequencing techniques have enhanced the analysis of the molecular interactions underlying diseases. In our study, we apply the ARNetMiT, C3NET, WGCNA, and ARACNE algorithms on microRNA-target gene datasets to infer gene coexpression networks of breast, prostate, colon, and pancreatic cancers. The gene coexpression networks are evaluated according to their topological and biological features. WGCNA-based gene coexpression networks fit the scale-free network topology better than the other gene coexpression networks. In the biological assessment, no obvious difference is found between the gene coexpression networks derived from different algorithms.
    Gene co-expression networks (GCNs) present undirected relations between genes to aid understanding of the molecular structures behind diseases, including cancer. The utilization of various biological datasets and gene network inference (GNI) algorithms can reveal meaningful gene-gene interactions in GCNs. This study applies three GNI algorithms to mRNA gene expression, RNA-Seq, and miRNA-target gene datasets to infer GCNs of breast and prostate cancers. To evaluate the performance of the GCNs, we utilize overlap analysis via literature data, topological assessment, and Gene Ontology-based biological assessment. The results emphasize how the selection of biological datasets and GNI algorithms affects the performance results on different evaluation criteria. GCNs on microarray gene expression data slightly outperform in overlap analysis. Also, GCNs on RNA-Seq and gene expression datasets follow scale-free topology. The biological assessment results are close to each other on all biological datas...
    Recently, cyber-attacks have increased worldwide, especially during the pandemic period. The number of connected devices in the world and the anonymous structure of the internet enable this security deficit not only for computer networks but also for single computing devices. With computing devices connected anytime and anywhere, many real-world activities are transferred to the digital world by adapting them to new lifestyles. Thus, the concept of cybersecurity has become a stronger focus not only for security admins but also for academicians and researchers. Phishing attacks, which hackers have mostly preferred to use in the last decade, have become even more harmful because they focus on the weakest part of the security chain: the computer user. Therefore, it is extremely important to prevent these cyber-attacks before they reach users. Based on this idea, we aimed to implement a phishing detection system using a Convolutional Neural Network with n-gram features extracted from URLs. There are different n-gram feature extraction techniques, and in this work, we aim to determine which of them is more effective for our purposes. As a second goal, we aim to discover which parameters of the n-gram work best. In the experiments, it was discovered that unigrams have the highest accuracy rate. It was observed that, instead of all the characters obtained for unigrams, a specified set of 70 characters (regardless of case sensitivity) gives the highest accuracy rate of 88.90% on a High-Risk URL dataset. Experimental results also showed that a URL can be classified (either as legitimate or phishing) in about 0.008 seconds. These metrics can be considered very good in both accuracy and run-time efficiency.
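
    A sketch of the character-level featurization such a CNN consumes; the restricted character set below is an assumption standing in for the paper's specified 70 characters:

        # Map a lowercased URL to fixed-length character ids for a CNN input.
        import string

        VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
        CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}  # 0 = pad/unknown

        def encode_url(url, max_len=100):
            ids = [CHAR_TO_ID.get(ch, 0) for ch in url.lower()[:max_len]]
            return ids + [0] * (max_len - len(ids))    # pad to the CNN input length

        print(encode_url("http://login-example.com/verify")[:12])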
    Deep learning approaches are machine learning methods used in many application fields today. Some core mathematical operations performed in deep learning are suitable for parallelization, and parallel processing increases operating speed. Graphics Processing Units (GPUs) are frequently used for parallel processing. The parallelization capacities of GPUs are higher than those of CPUs, because GPUs have far more cores than Central Processing Units (CPUs). In this study, benchmarking tests were performed between a CPU and a GPU. A Tesla K80 GPU and an Intel Xeon Gold 6126 CPU were used during the tests. A system for classifying Web pages with a Recurrent Neural Network (RNN) architecture was used to compare performance during testing. CPUs and GPUs running on the cloud were used in the tests because the amount of hardware needed for the tests was high. During the tests, some hyperparameters were adjusted and the performance values were compared between the CPU and the GPU. It was observed that the GPU runs faster than the CPU in all tests performed. In some cases, the GPU is 4-5 times faster than the CPU, according to the tests performed on the GPU server and the CPU server. These values can be further increased by using a GPU server with more features.
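
    A minimal sketch of the measurement pattern in PyTorch, using a toy matrix multiply instead of the study's RNN classifier:

        # Time the same workload on CPU and, if available, GPU.
        import time
        import torch

        def bench(device, size=2048, reps=10):
            a = torch.randn(size, size, device=device)
            b = torch.randn(size, size, device=device)
            if device == "cuda":
                torch.cuda.synchronize()               # finish setup before timing
            start = time.perf_counter()
            for _ in range(reps):
                a @ b
            if device == "cuda":
                torch.cuda.synchronize()               # wait for queued kernels
            return time.perf_counter() - start

        print("cpu :", bench("cpu"))
        if torch.cuda.is_available():
            print("cuda:", bench("cuda"))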
