Computer Science

Permanent URI for this collectionhttps://hdl.handle.net/10464/6140

Browse

Recent Submissions

Now showing 1 - 20 of 25
  • ItemOpen Access
    Parameter selection for modeling of epidemic networks
    (IEEE, 2018-5) Dube, Michael; Houghten, Sheridan; Ashlock, Daniel
    The accurate modeling of epidemics on social contact networks is difficult due to the variation between different epidemics and the large number of parameters inherent to the problem. To reduce complexity, evolutionary computation is used to create a generative representation of the epidemic model. Previous gains from the use of local, verses global, operators are further explored to better balance exploration and exploitation of the genetic algorithm. A typical parameter study is conducted to test this new local operator and the new method of point packing is utilized as a proof of concept to perform a better search of the parameter space. All experiments from both approaches are tested against nine epidemic profiles. The point-packing driven parameter search demonstrates that the algorithm parameters interact substantially and in a non-linear fashion, and also shows that the good parameter settings are problem specific.
  • ItemOpen Access
    Edit metric decoding: Return of the side effect machines
    (IEEE, 2018-5) Houghten, Sheridan; Collins, Tyler K.; Hughes, James Alexander; Brown, Joseph Alexander
    Side Effect Machines (SEMs) are an extension of finite state machines which place a counter on each node that is incremented when that node is visited. Previous studies examined a genetic algorithm to discover node connections in SEMs for edit metric decoding for biological applications, namely to handle sequencing errors. Edit metric codes, while useful for decoding such biologically created errors, have a structure which significantly differentiates them from other codes based on Hamming distance. Further, the inclusion of biologically- motivated restrictions on allowed words makes development of decoders a bespoke process based on the exact code used. This study examines the use of evolutionary programming for the creation of such decoders, thus allowing for the number of states to be evolved directly, not witnessed in previous approaches which used genetic algorithms. Both direct and fuzzy decoding are used, obtaining correct decoding rates of up to 95% in some SEMs.
  • ItemOpen Access
    A deep learning pipeline to classify different stages of Alzheimer's disease from fMRI data
    (IEEE, 2018-5) Kazemi, Yosra; Houghten, Sheridan
    Alzheimer's disease (AD) is an irreversible, progressive neurological disorder that causes memory and thinking skill loss. Many different methods and algorithms have been applied to extract patterns from neuroimaging data in order to distinguish different stages of Alzheimer's disease (AD). However, the similarity of the brain patterns in older adults and in different stages makes the classification of different stages a challenge for researchers. In this paper, convolutional neuronal network architecture AlexNet was applied to fMRI datasets to classify different stages of the disease. We classified five different stages of Alzheimer's using a deep learning algorithm. The method successfully classified normal healthy control (NC), significant memory concern (SMC), early mild cognitive impair (EMCI), late cognitive mild impair (LMCI), and Alzheimer's disease (AD). The model was implemented using GPU high performance computing. Before applying any classification, the fMRI data were strictly preprocessed. Then, low to high level features were extracted and learned using the AlexNet model. Our experiments show significant improvement in classification. The average accuracy of the model was 97.63%. We then tested our model on test datasets to evaluate the accuracy of the model per class, obtaining an accuracy of 94.97% for AD, 95.64% for EMCI, 95.89% for LMCI, 98.34% for NC, and 94.55% for SMC.
  • ItemOpen Access
    Hierarchical clustering and tree stability
    (IEEE, 2018-5) Saunders, Amanda; Ashlock, Daniel; Houghten, Sheridan
    Hierarchical clustering via neighbor joining, widely used in biology, can be quite sensitive to the addition or deletion of single taxa. In an earlier study it was found that neighbor joining trees on random data were commonly quite unstable in the sense that large re-arrangements of the tree occurred when the tree was reconstructed after the deletion of a single data point. In this study, we use an evolutionary algorithm to evolve extremely stable and unstable data sets for a standard neighbor-joining algorithm and then check the stability using a novel type of clustering called bubble clustering. Bubble clustering is an instance of associator clustering. The stability measure used is based on the size of the subtree containing each pair of taxa, a quantity that provides an objective measure of a given trees hypothesis about the relatedness of taxa. It is shown experimentally that even in data sets evolved to be stable for a standard neighbor joining algorithm, bubble clustering is a significantly more stable algorithm.
  • ItemOpen Access
    Representation for Evolution of Epidemic Models
    (IEEE, 2019-6) Dube, Michael; Houghten, Sheridan; Ashlock, Daniel
    Creating a representation capable of generating personal contact networks that are most likely to exhibit specific epidemic behavior is difficult due to the inherit volatility of an epidemic and the numerous parameters accompanying the problem. To surpass these hurdles, evolutionary algorithms are used to create a generative solution which generates personal contact networks, modeling human populations, to satisfy the epidemic duration and epidemic profile matching problems. This representation is entitled the Local THADS-N representation. Two new operators are added to the original THADS-N system, and tested with a traditional parameter sweep and a parameter selection method known as point packing on nine epidemic profiles. Additionally, a new epidemic model is implemented in order to allow for lost immunity within a population thus increasing the length of an epidemic.
  • ItemOpen Access
    We Are Not Pontius Pilate: Acknowledging Ethics and Policy
    (IEEE, 2020-12-01) Hughes, James Alexander; Hannah, William; Kikkert, Peter; MacKenzie, Barry; Ashlock, Wendy; Houghten, Sheridan; Ashlock, Daniel; Stoodley, Matthew; Dube, Michael; Brown, Rachel; Saunders, Amanda
    A new AI system is being developed to optimize vaccination strategies based on the structure and shape of a community's social contact network. The technology is minimally constrained and not bound by preconceived notions or human biases. With this come novel outside the box strategies; however, the system is only capable of optimizing what it is instructed to optimize, and does not consider any ethical or political concerns. With the growing concern for systematic discrimination as a result of artificial intelligence, we acknowledge a number of relevant issues that may arise as a consequence of our new technology and categorize them into three classes. We also introduce four normative ethical approaches that are used as a framework for decision-making. Despite the focus on vaccination strategies, our goal is to improve the discussions surrounding public concern and trust over artificial intelligence and demonstrate that artificial intelligence practitioners are addressing these concerns.
  • ItemOpen Access
    Pandemic: A Graph Evolution Story
    (IEEE, 2019-07) Dube, Michael; Houghten, Sheridan; Ashlock, Daniel
    The Graph Evolution Tool (GET)was built to generate personal contact networks representing who can infect whom within a community. The tool is expanded in order to permit an infection scheme which divides the community into different districts, thus permitting within-district and between-district infections. The evolutionary algorithm comprising GET is expanded upon to simulate communities which include 512 individuals in up to eight districts, initially infecting one person in one district and spreading through a community. The overall goal is to generate communities that will maximize the length of an epidemic. The problem associated with adequately exploring the numerous parameters accompanying evolutionary algorithms is addressed using a point packing and insight from previous work. The Susceptible-Infected-Removed (SIR)model of infection was chosen as it provides a sufficient balance of simplicity and complexity for the problem.
  • ItemOpen Access
    Using Genetic Programming to Investigate a Novel Model of Resting Energy Expenditure for Bariatric Surgery Patients
    (IEEE, 2020-10-27) Hughes, James Alexander; Reid, Ryan E. R.; Houghten, Sheridan; Andersen, Ross E.
    Traditionally, models developed to estimate resting energy expenditure (REE) in the bariatric population have been limited to linear modelling based on data from `normal' or `overweight' individuals - not `obese'. This type of modelling can be restrictive and yield functions which poorly estimate this important physiological outcome.Linear and nonlinear models of REE for individuals after bariatric surgery are developed with linear regression and symbolic regression via genetic programming. Features not traditionally used in REE modelling were also incorporated and analyzed and genetic programming's intrinsic feature selection was used as a measure of feature importance.A collection of effective new linear and nonlinear models were generated. The linear models generated outperformed the nonlinear on testing data, although the nonlinear models fit the training data better. Ultimately, the newly developed linear models showed an improvement over existing models and the feature importance analysis suggested that the typically used features (age, weight, and height) were the most important.
  • ItemOpen Access
    Deep Learning for the Prediction of Stock Market Trends
    (IEEE, 2019-12) Fazeli, Arvand; Houghten, Sheridan
    In this study, deep learning will be used to test the predictability of stock trends. Stock markets are known to be volatile, prices fluctuate, and there are many complicated financial indicators involved. Various data including news or financial indicators can be used to predict stock prices. In this study, the focus will be on using past stock prices and using technical indicators to increase the performance of the results. The goal of this study is to measure the accuracy of predictions and evaluate the results. Historical data is gathered for Apple, Microsoft, Google and Intel stocks. A prediction model is created by using past data and technical indicators were used as features in the model. The experiments were performed by using long short-term memory networks. Different approaches and techniques were tested to boost the performance of the results. To prove the usability of the final model in the real world and measure the profitability of results backtesting was performed. The final results show that while it is not possible to predict the exact price of a stock in the future to gain profitable results, deep learning can be used to predict the trend of stock markets to generate buy and sell signals.
  • ItemOpen Access
    Compression of Biological Networks using a Genetic Algorithm with Localized Merge
    (IEEE, 2019-07) Houghten, Sheridan; Romualdo, Angelo; Collins, Tyler K.; Brown, Joseph Alexander
    Network graphs appear in a number of important biological data problems, recording information relating to protein-protein interactions, gene regulation, transcription regulation and much more. These graphs are of such a significant size that they are impossible for a human to understand. Furthermore, the ever-expanding quantity of such information means that there are storage issues. To help address these issues, it is common for applications to compress nodes to form supernodes of similarly connected components. In previous graph compression studies it was noted that such supernodes often contain points from disparate parts of the graph. This study aims to correct this flaw by only allowing merges to occur within a local neighbourhood rather than across the entire graph. This restriction was found to not only produce more meaningful compressions, but also to reduce the overall distortion created by the compression for two out of three biological networks studied.
  • ItemOpen Access
    Descriptive Symbolic Models of Gaits from Parkinson's Disease Patients
    (IEEE, 2019-07) Hughes, James Alexander; Houghten, Sheridan; Brown, Joseph Alexander
    Parkinson's disease (PD) is a degenerative disorder of the central nervous system that has many debilitating symptoms which affect the patient's motor system and can cause significant changes in their gait. By using genetic programming, we aim to develop descriptive symbolic nonlinear models of PD patient gait from time series data recorded from pressure sensors under subjects' feet. When compared to popular types of linear regression (OLS and LASSO), the nonlinear models fit their data better and generalize to unseen data significantly better. It was found that models developed for healthy control subjects generalized to other control subjects well, however the models trained on subjects with PD did not generalize well to other PD patients, which complicates the issue of being able to detect the progression of the disease. It is suspected that health care professionals can have difficulty classifying PD due to a lack of accurate data from patient reports; having individually trained models for active monitoring of patients would help in effectively diagnosing PD.
  • ItemOpen Access
    Cryptanalysis of RSA: Integer Prime Factorization Using Genetic Algorithms
    (IEEE, 2020-07) Rutkowski, Emilia; Houghten, Sheridan
    In recent years, researchers have been exploring alternative methods to solving Integer Prime Factorization, the decomposition of an integer into its prime factors. This has direct application to cryptanalysis of RSA, as one means of breaking such a cryptosystem requires factorization of a large number that is the product of two prime numbers. This paper applies three different genetic algorithms to solve this issue, utilizing mathematical knowledge concerning distribution of primes to improve the algorithms. The best of the three genetic algorithms has a chromosome that represents m in the equation prime = 6 m ± 1, and is able to factor a number of up to 22 decimal digits. This is a significantly larger number than the largest factored by comparable methods in earlier work. This leads to the conclusion that approaches such as genetic algorithms are a promising avenue of research into the problem of integer factorization.
  • ItemOpen Access
    Evolutionary Graph Compression and Diffusion Methods for City Discovery in Role Playing Games
    (IEEE, 2020-07) Brown, Joseph Alexander; Ashlock, Daniel; Houghten, Sheridan; Romualdo, Angelo
    Cities, while exciting in their visualization and permitting several layouts, do not take into account the placement of crucial characters which might be part of the narrative. Narrative graphs, a connected graph of all potential and existing relations within a game, can enable an ability to find a Nonplayer Character (NPC) who is likely to live nearby, under the assumption that those who interact most frequently are also close in distance. We examine the use of an evolutionary graph compression method and a method using simulated diffusion to cluster features based on relational information about players to generate relationally intimate groups. This clustering can be used to generate information about the game world and cities to inform PCG as to how the connectivity of these areas is, and should be, arranged. The algorithms are validated as being human competitive.
  • ItemOpen Access
    Gait Model Analysis of Parkinson’s Disease Patients under Cognitive Load
    (IEEE, 2020-07) Hughes, James Alexander; Houghten, Sheridan; Brown, Joseph Alexander
    Parkinson's disease is a neurodegenerative disease that affects close to 10 million with various symptoms including tremors and changes in gait. Observing differences or changes in an individual's manifestations of gait may provide a mechanism to identify Parkinson's disease and understand specific changes. In this study, timeseries data from both Control subjects and Parkinson's disease patients was modelled with symbolic regression and extreme gradient boosting. Model effectiveness was analyzed along with the differences in the models between modelling strategies, between Control subjects and Parkinson's disease patients, and between normal walking and walking while under a cognitive load. Both modelling strategies were found to effective. The symbolic regression models were more easily interpreted, while extreme gradient boosting had higher overall accuracy. Interpretation of the models identified certain characteristics that distinguished Control subjects from Parkinson's disease patients and normal walking conditions from walking while under a cognitive load.
  • ItemOpen Access
    Extracting Information from Weighted Contact Networks via Genetic Algorithms
    (IEEE, 2020-10-27) Rutkowski, Emilia; Houghten, Sheridan; Brown, Joseph Alexander
    Epidemic contact tracing examines the movement of infection through a population based upon links in a contact network, and weighted networks represent the potential of transfer of the contagion. Graph compression reduces the size of a network by merging groups of nodes into supernodes. This study considers the use of genetic algorithms to select the nodes to be merged, grouping together highly connected sections of the graphs. Examined is a dataset that is extracted from contacts that occurred during several days of the "Infectious: Stay Away" event. The incorporation of weights, to indicate the strength of interactions between individuals, is an important contribution of this work. The demonstrated outcomes are that by including weighted information on the edges, there is more effective detection of highly interacting subgroups when compared to the unweighted version of graphs. These methods not only compress the networks with a low rate of distortion, but also the identification of supernodes in the networks allows for better targeting of interventions by public health upon individuals in such groups. This is crucial because when one member becomes infected, all members of the group are exposed to the contagion.
  • ItemOpen Access
    Evolving the Curve
    (IEEE, 2020-10-27) Dube, Michael; Houghten, Sheridan; Ashlock, Daniel; Hughes, James Alexander
    Evolutionary algorithms are used to generate personal contact networks, modelling human populations, that are most likely to match a given epidemic profile. The Susceptible-Infected-Removed (SIR) model is used and also expanded upon to allow for an extended period of infection, termed the SIIR model. The networks generated for each of these models are thoroughly evaluated for their ability to match nine different epidemic profiles. The addition of the SIIR model showed that the model of infection has an impact on the networks generated. For the SIR and SIIR models, these differences were relatively minor in most cases.
  • ItemOpen Access
    Effective Side Effect Machines for Decoding
    (IEEE, 2020-10-27) Banik, Sharnendu; Houghten, Sheridan
    The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur when using error correcting codes in biological applications. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes.Several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding SEMs for edit metric decoding. Direct and fuzzy classification methods are compared while also changing some of the EP settings to observe how decoding accuracy is affected. Regardless of code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1 whereas distance 2 and 3 achieve up to 97.1% and 85.9%, respectively. The accuracy suffers for longer codes, as the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7% and 69.2% for distance 1, 2, and 3 respectively. Additionally, the SEMs are examined for potential bloat by comparing the number of reachable states against the total number of states. Bloat is seen more in larger machines than it is in smaller machines. Furthermore, the results are analyzed to find potential trends and relationships among the parameters, with the most consistent trend being that, when allowed, the longer codes generally show a propensity for larger machines.
  • ItemOpen Access
    Modelling of Vaccination Strategies for Epidemics using Evolutionary Computation
    (IEEE, 2020-07) Dube, Michael; Houghten, Sheridan; Ashlock, Daniel
    Personal contact networks that represent social interactions can be used to identify who can infect whom during the spread of an epidemic. The structure of a personal contact network has great impact upon both epidemic duration and the total number of infected individuals. A vaccine, with varying degrees of success, can reduce both the length and spread of an epidemic, but in the case of a limited supply of vaccine a vaccination strategy must be chosen, and this has a significant effect on epidemic behaviour.In this study we consider four different vaccination strategies and compare their effects upon epidemic duration and spread. These are random vaccination, high degree vaccination, ring vaccination, and the base case of no vaccination. All vaccinations are applied as the epidemic progresses, as opposed to in advance. The strategies are initially applied to static personal contact networks that are known ahead of time. They are then applied to personal contact networks that are evolved as the vaccination strategy is applied. When any form of vaccination is applied, all strategies reduce both duration and spread of the epidemic. When applied to a static network, random vaccination performs poorly in terms of reducing epidemic duration in comparison to strategies that take into account connectivity of the network. However, it performs surprisingly well when applied on the evolved networks, possibly because the evolutionary algorithm is unable to take advantage of a fixed strategy.
  • ItemOpen Access
    Genetic programming for improved cryptanalysis of elliptic curve cryptosystems
    (IEEE, 2017-06) Ribaric, Tim; Houghten, Sheridan
    Public-key cryptography is a fundamental compo- nent of modern electronic communication that can be constructed with many different mathematical processes. Presently, cryp- tosystems based on elliptic curves are becoming popular due to strong cryptographic strength per small key size. At the heart of these schemes is the intractability of the elliptic curve discrete logarithm problem (ECDLP). Pollard’s Rho algorithm is a well known method for solving the ECDLP and thereby breaking ciphers based on elliptic curves. It has the same time complexity as other known methods but is advantageous due to smaller memory requirements. This paper considers how to speed up the Rho process by modifying a key component: the iterating function, which is the part of the algorithm responsible for determining what point is considered next when looking for a collision. It is replaced with an alternative that is found through an evolutionary process. This alternative consistently and significantly decreases the number of iterations required by Pollard’s Rho Algorithm to successfully find a solution to the ECDLP.
  • ItemOpen Access
    Lossy Compression of Quality Values in Sequencing Data
    (Institute of Electrical and Electronics Engineers (IEEE), 2021-09-01) Morales, Veronica Suaste; Houghten, Sheridan
    The dropping cost of sequencing human DNA has allowed for fast development of several projects around the world generating huge amounts of DNA sequencing data. This deluge of data has run up against limited storage space, a problem that researchers are trying to solve through compression techniques. In this study we address the compression of SAM files, the standard output files for DNA alignment. We specifically study lossy compression techniques used for quality values reported in the SAM file and analyze the impact of such lossy techniques on the CRAM format. We present a series of experiments using a data set corresponding to individual NA12878 with three different fold coverages. We introduce a new lossy model, dynamic binning, and compare its performance to other lossy techniques, namely Illumina binning, LEON and QVZ. We analyze the compression ratio when using CRAM and also study the impact of the lossy techniques on SNP calling. Our results show that lossy techniques allow a better CRAM compression ratio. Furthermore, we show that SNP calling performance is not negatively affected and may even be boosted.