The document discusses global modeling versus local modeling approaches for regression and time series prediction problems. Global modeling fits a single analytical function to all input data, while local modeling performs separate fits to subsets of nearby data points. The document outlines the local modeling approach using lazy learning, which stores all training data and performs local fits when making predictions for new query points. It then applies lazy learning techniques to problems in regression, time series prediction, and feature selection.
Local modeling in regression and time series prediction
1. On the use of cross-validation for local
modeling in regression and time series
prediction
Gianluca Bontempi
gbonte@ulb.ac.be
Machine Learning Group
Département d'Informatique, ULB
Boulevard de Triomphe - CP 212
http://www.ulb.ac.be/di/mlg
2. Outline
The Machine Learning Group
A local learning algorithm: Lazy Learning.
Lazy Learning for multivariate regression modeling.
Lazy Learning for multi-step-ahead time series prediction.
Lazy Learning for feature selection.
Applications.
Future work.
3. Machine Learning: a definition
The field of machine learning is concerned with the question of how to
construct computer programs that automatically improve with
experience. [35]
4. The Machine Learning Group (MLG)
- 7 researchers (1 professor, 6 PhD students) and 4 graduate students.
- Research topics: Bioinformatics, Classification, Computational statistics, Data mining, Regression, Time series prediction, Sensor networks.
- Computing facilities: cluster of 16 processors, LEGO Robotics Lab.
- Website: www.ulb.ac.be/di/mlg.
- Scientific collaborations in ULB: IRIDIA (Sciences Appliquées), Physiologie Moléculaire de la Cellule (IBMM), Conformation des Macromolécules Biologiques et Bioinformatique (IBMM), CENOLI (Sciences), Microarray Unit (Hôpital Jules Bordet), Service d'Anesthésie (ERASME).
- Scientific collaborations outside ULB: UCL Machine Learning Group (B), Politecnico di Milano (I), Università del Sannio (I), George Mason University (US).
- The MLG is part of the "Groupe de Contact FNRS" on Machine Learning.
5. MLG: running projects
1. "Integrating experimental and theoretical approaches to decipher the molecular
networks of nitrogen utilisation in yeast": ARC (Action de Recherche Concertée)
funded by the Communauté Française de Belgique (2004-2009). Partners: IBMM
(Gosselies and La Plaine), CENOLI.
2. "COMP2SYS" (COMPutational intelligence methods for COMPlex SYStems)
MARIE CURIE Early Stage Research Training funded by the European Union
(2004-2008). Main contractor: IRIDIA (ULB).
3. "Predictive data mining techniques in anaesthesia": FIRST Europe Objectif 1
funded by the Région wallonne and the Fonds Social Européen (2004-2009).
Partners: Service d’anesthesie (ERASME).
4. "AIDAR - Adressage et Indexation de Documents Multimédias Assistés par des
techniques de Reconnaissance Vocale": funded by Région Bruxelles-Capitale
(2004-2006). Partners: Voice Insight, RTBF, Titan.
6. Machine learning and applied statistics
Reductionist attitude: ML is a modern buzzword which equates to
statistics plus marketing
Positive attitude: ML paved the way for the treatment of real problems
related to data analysis, sometimes overlooked by statisticians
(nonlinearity, classification, pattern recognition, missing variables,
adaptivity, optimization, massive datasets, data management,
causality, representation of knowledge, parallelisation)
Interdisciplinary attitude: ML should have its roots in statistics and
complement it by focusing on algorithmic issues, computational
efficiency, and data engineering.
7. Motivations
There exists a wide body of theoretical and practical results for
linear methods in statistics, forecasting and control.
However, in real settings we often encounter nonlinear problems.
Nonlinear methods are generally more difficult to analyze than
linear ones, rarely produce closed-form or analytically tractable
expressions, and are not easy to manipulate and implement.
Local learning techniques are a powerful way of re-using linear
techniques in a nonlinear setting.
8. Prediction models from data
[Block diagram: training data are used to build a prediction model; the model maps inputs to predicted outputs, which are compared with the target to produce the prediction error.]
10. The global modeling approach
[Figure: an input-output regression problem, with input x on the horizontal axis, output y on the vertical axis, and a query point q.]
11. The global modeling approach
[Figure: the same axes, now showing the training data set as a scatter of points.]
13. The global modeling approach
[Figure: prediction at the query point q obtained from the fitted global model.]
14. The global modeling approach
[Figure: another prediction, at a different query point, obtained from the same fitted global model.]
15. The local modeling approach
[Figure: the same input-output regression problem, now addressed by local modeling around the query point q.]
18. The local modeling approach
[Figure: a local fit in the neighborhood of another query point and the corresponding prediction.]
19. Global vs. local modeling
The traditional approach to supervised learning is global
modeling which describes the relationship between the input and
the output with an analytical function over the whole input domain.
Even for huge datasets, a parametric model can be stored in a
small memory. Also, the evaluation of the parametric model
requires a short program that can be executed in a reduced
amount of time.
Modeling complex input/output relations often requires the
adoption of global nonlinear models, whose learning procedures
are typically slow and analytically intractable. In particular,
validation methods, which address the problem of assessing a
global model on the basis of a finite amount of noisy samples, are
computationally prohibitive.
For these reasons, in recent years, interest has grown in pursuing
alternatives (divide-and-conquer) to global modeling techniques.
20. Global vs. local modeling
The divide-and-conquer strategy consists in attacking a complex
problem by dividing it into simpler problems whose solutions can
be combined to yield a solution to the original problem.
Instances of the divide-and-conquer approach are modular
techniques (e.g. local model networks [36], regression trees [19],
splines [45]) and local modeling (aka smoothing) techniques.
The principle underlying local modeling is that a smooth function
can be well approximated by a low degree polynomial in the
neighborhood of any query point.
Local modeling techniques do not return a global fit of the
available dataset but perform the prediction of the output for
specific test input values, also called queries.
The talk presents our contribution to local modeling techniques
and their application to a number of experimental problems.
21. Lazy vs. eager modeling
Eager techniques perform a large amount of computation to tune
the model before observing the new query.
An eager technique must then commit to a specific hypothesis
that covers all the future queries.
Lazy techniques [1] wait for the query to be defined before
starting the learning procedure.
For that purpose, the database of observed input/output data is
always kept in memory and the output prediction is obtained by
interpolating the samples in the neighborhood of the query point.
Lazy methods will generally require less computation during
training but more computation when they must predict the target
value for a new query.
22. Examples
The classical linear regression is an example of global, eager, and
linear approach.
Neural networks (NN) are instances of the global, eager, and
nonlinear approach: NN are global in the sense that a single
representation covers the whole input space. They are eager in
the sense that the examples are used for tuning the network and
then they are discarded without waiting for any query. Finally, NN
are nonlinear in the sense that the relation between the weights
and the output is nonlinear.
The technique we are going to discuss here is a lazy and local
approach.
Remark: we can imagine a local technique (e.g. a K-nearest
neighbor) where the most important parameter (i.e. the number of
neighbors) is defined in an eager fashion.
23. Some history
Local regression estimation was independently introduced in
several different fields in the late nineteenth [42] and early
twentieth century [28].
In the statistical literature, the method was independently
introduced from different viewpoints in the late 1970’s [20, 31, 43].
Reference books are Fan and Gijbels [26] and Loader [32].
In the machine learning literature, work on local techniques for
classification dates back to 1967 [24]. A more recent reference is
the special issue on Lazy Learning [1].
24. Local modeling procedure
The identification of a local model [3] can be summarized in these
steps:
1. Compute the distance between the query and the training
samples according to a predefined metric.
2. Rank the neighbors on the basis of their distance to the query.
3. Select a subset of the nearest neighbors according to the
bandwidth which measures the size of the neighborhood.
4. Fit a local model (e.g. constant, linear,...).
Each of the local approaches has one or more structural (or
smoothing) parameters that control the amount of smoothing
performed.
In this talk we will focus on the bandwidth selection.
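Purely as an illustration of steps 1-4 above, here is a minimal NumPy sketch assuming a Euclidean metric and a rectangular kernel (unit weights on the k nearest neighbors); the function name and signature are not from the slides:

    import numpy as np

    def local_fit_predict(X, y, x_q, k):
        """Steps 1-4: distances, ranking, neighbor selection, local linear fit."""
        d = np.linalg.norm(X - x_q, axis=1)          # 1. distance to each training sample
        idx = np.argsort(d)[:k]                      # 2-3. rank and keep the k nearest
        Z = np.column_stack([np.ones(k), X[idx]])    # 4. local linear model with intercept
        beta, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
        return np.concatenate(([1.0], x_q)) @ beta   # prediction at the query point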
27. Bandwidth and bias/variance trade-off
[Figure: mean squared error as a function of 1/bandwidth. With many neighbors (large bandwidth) bias dominates and the model underfits; with few neighbors (small bandwidth) variance dominates and the model overfits.]
28. Existing work on bandwidth selection
Rule of thumb methods. They provide a crude bandwidth selection which in some
situations may prove sufficient. Examples of rules of thumb are in [25], [27].
Plug-in techniques. The exact expression of the optimal bandwidth can be obtained from
the asymptotic expressions of bias and variance, which unfortunately depend on
unknown terms. The idea of the direct plug-in method is to replace these terms
with estimates. This method was first introduced by Woodroofe [47] in density
estimation. Examples of plug-in methods for nonparametric regression are
reported in Ruppert et al. [41].
Data-driven estimation. It is a selection procedure which estimates the generalization
error directly from data. Unlike the previous approach, this method does not rely
on asymptotic expressions but estimates the values directly from the finite
data set. To this group belong methods like cross-validation, Mallows' $C_p$,
Akaike's AIC and other extensions of methods used in classical parametric
modeling.
29. Existing work (II)
- The debate on the superiority of plug-in methods over data-driven methods is still
open and the experimental evidence is contrasting. Results on behalf of
plug-in methods come from [47, 41, 38].
- Loader [33] showed how the supposed superior performance of plug-in
approaches is a complete myth. The use of cross-validation for bandwidth
selection has been investigated in several papers, mainly in the case of density
estimation [30].
- In regression, an adaptation of Mallows' $C_p$ was introduced by Rice [40] for
constant fitting and by Cleveland and Devlin [21] in local polynomial regression.
Cleveland and Loader [22] suggested local $C_p$ and local PRESS for choosing
both the degree of local polynomial mixing and the bandwidth.
- We believe that plug-in methods are built on a series of assumptions about the
statistical process underlying the data set and on theoretical results which become
more reliable as the number of points tends to infinity.
- In a common black-box situation where no a priori information is available, the
adoption of data-driven techniques can be a promising approach to the problem.
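As an illustration of the data-driven route, a minimal sketch that selects the number of neighbors by leave-one-out cross-validation at a single query point (it anticipates the PRESS shortcut derived later; all names are illustrative):

    import numpy as np

    def select_k_by_loo(X, y, x_q, k_values):
        """Pick the number of neighbors minimizing the leave-one-out MSE at the query."""
        d = np.linalg.norm(X - x_q, axis=1)
        order = np.argsort(d)
        best_k, best_mse = None, np.inf
        for k in k_values:                               # k must exceed the parameter count
            idx = order[:k]
            Z = np.column_stack([np.ones(k), X[idx]])
            H = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T        # hat matrix of the local fit
            e_cv = (y[idx] - H @ y[idx]) / (1.0 - np.diag(H))
            mse = np.mean(e_cv ** 2)
            if mse < best_mse:
                best_k, best_mse = k, mse
        return best_k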
31. Original contributions
Problem 1: identifying a sequence of local models is expensive.
Solution 1: we propose recursive least squares (RLS) to speed up the
identification of a sequence of models with an increasing number of
neighbors [6, 13].
Problem 2: validating a local model by cross-validation is expensive.
Solution 2: we compute the leave-one-out cross-validation error by obtaining
the PRESS statistic through the terms of RLS [9].
Problem 3: choosing the best model is prone to errors.
Solution 3: we combine the best models [7].
33. PRESS statistic and leave-one-out
[Diagram: leave-one-out cross-validation puts the j-th sample aside, performs the parametric identification on the remaining N-1 samples and tests on the j-th sample, repeating this N times; the PRESS statistic yields the same errors from a single parametric identification on all N samples.]
PRESS was first introduced by Allen [2].
34. The regression task
Given two variables $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, let us consider the mapping $f : \mathbb{R}^n \to \mathbb{R}$, known only through a set of $N$ examples $\{(x_i, y_i)\}_{i=1}^{N}$ obtained as follows:

$$y_i = f(x_i) + \varepsilon_i,$$

where, for all $i$:
- $\varepsilon_i$ is a random variable such that $E[\varepsilon_i] = 0$ and $E[\varepsilon_i \varepsilon_j] = 0$ for every $j \neq i$,
- $E[\varepsilon_i^m] = \mu_m(x_i)$ for $m \geq 2$, where $\mu_m(\cdot)$ is the unknown $m$-th moment of the distribution of $\varepsilon_i$ and is defined as a function of $x_i$.

In particular, for $m = 2$ the last of the above mentioned properties implies that no assumption of global homoscedasticity is made.
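As a concrete illustration of these assumptions, a toy generator consistent with the model above; the choice of $f$ and of the input-dependent noise scale is purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200
    x = rng.uniform(-2, 2, size=(N, 1))
    f = lambda v: np.sin(2 * v[:, 0])            # the unknown regression function
    sigma = 0.05 + 0.1 * np.abs(x[:, 0])         # noise moments depend on x_i:
    y = f(x) + sigma * rng.standard_normal(N)    # no global homoscedasticity assumed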
35. Local Weighted Regression
- The problem of local regression can be stated as the problem of estimating the value that the regression function $f(x) = E[y \mid x]$ assumes for a specific query point $x$, using information pertaining only to a neighborhood of $x$.
- Given a query point $x_q$, and under the hypothesis of local homoscedasticity of $\varepsilon_i$, the parameter $\beta$ of a local linear approximation of $f(\cdot)$ in a neighborhood of $x_q$ can be obtained by solving the local polynomial regression

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\{ (y_i - x_i^T \beta)^2 \, K\!\left(\frac{d(x_i, x_q)}{h}\right) \right\},$$

where, given a metric on the space $\mathbb{R}^n$:
- $d(x_i, x_q)$ is the distance from the query point to the $i$-th example, $i = 1, \dots, N$,
- $K(\cdot)$ is a weight (aka kernel) function,
- $h$ is the bandwidth.
36. Local Weighted Regression (II)
- In matrix notation, the solution of the above stated weighted least squares problem is given by:

$$\hat{\beta} = (X^T W^T W X)^{-1} X^T W^T W y = (Z^T Z)^{-1} Z^T v = P Z^T v,$$

where $X$ is a matrix whose $i$-th row is $x_i^T$, $y$ is a vector whose $i$-th element is $y_i$, $W$ is a diagonal matrix whose $i$-th diagonal element is $w_{ii} = \sqrt{K(d(x_i, x_q)/h)}$, $Z = WX$, $v = Wy$, and the matrix $X^T W^T W X = Z^T Z$ is assumed to be non-singular so that its inverse $P = (Z^T Z)^{-1}$ is defined.
- Once the local linear polynomial approximation is obtained, a prediction of $y_q = f(x_q)$ is finally given by:

$$\hat{y}_q = x_q^T \hat{\beta}.$$
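The matrix-form solution translates almost literally into NumPy. A minimal sketch, where w holds the kernel values $K(d(x_i, x_q)/h)$ and all names are illustrative:

    import numpy as np

    def weighted_fit(X, y, w):
        """beta = (Z^T Z)^{-1} Z^T v with Z = W X, v = W y, W = diag(sqrt(w))."""
        sw = np.sqrt(w)                  # w[i] = K(d(x_i, x_q) / h)
        Z = X * sw[:, None]              # Z = W X
        v = y * sw                       # v = W y
        P = np.linalg.inv(Z.T @ Z)       # Z^T Z assumed non-singular
        beta = P @ Z.T @ v
        return beta, P, Z, v

The prediction at the query is then x_q @ beta.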
37. Linear Leave-one-out
- By exploiting the linearity of the local approximator, a leave-one-out cross-validation estimate of the mean squared error $E[(f(x_q) - \hat{y}_q)^2]$ can be obtained without any significant overload.
- In fact, using the PRESS statistic [2, 37], it is possible to calculate the error $e_{cv}(j) = y_j - x_j^T \hat{\beta}_{-j}$ without explicitly identifying the parameters $\hat{\beta}_{-j}$ from the examples available with the $j$-th removed.
- The formulation of the PRESS statistic for the case at hand is the following:

$$e_{cv}(j) = y_j - x_j^T \hat{\beta}_{-j} = \frac{y_j - x_j^T P Z^T v}{1 - z_j^T P z_j} = \frac{y_j - x_j^T \hat{\beta}}{1 - h_{jj}},$$

where $z_j^T$ is the $j$-th row of $Z$ (and therefore $z_j = w_{jj} x_j$), and where $h_{jj}$ is the $j$-th diagonal element of the Hat matrix $H = Z P Z^T = Z (Z^T Z)^{-1} Z^T$.
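A minimal sketch of the PRESS computation, reusing the beta, P and Z returned by the weighted fit sketched above; one pass over the diagonal of the Hat matrix yields all leave-one-out residuals without refitting:

    import numpy as np

    def press_residuals(X, y, beta, P, Z):
        """Leave-one-out residuals e_cv(j) from a single fit (PRESS), no refitting."""
        h = np.einsum('ij,jk,ik->i', Z, P, Z)   # h_jj: diagonal of H = Z P Z^T
        return (y - X @ beta) / (1.0 - h)       # e_cv(j) = (y_j - x_j^T beta)/(1 - h_jj)

The leave-one-out mean squared error is then np.mean(press_residuals(X, y, beta, P, Z) ** 2).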
38. Rectangular weight function
- In what follows, for the sake of simplicity, we will focus on the linear approximator. An extension to generic polynomial approximators of any degree is straightforward. We will also assume that a metric on the space $\mathbb{R}^n$ is given. All the attention will thus be centered on the problem of bandwidth selection.
- If the indicator function

$$K\!\left(\frac{d(x_i, x_q)}{h}\right) = \begin{cases} 1 & \text{if } d(x_i, x_q) \leq h, \\ 0 & \text{otherwise,} \end{cases}$$

is adopted as weight function, the optimization of the parameter $h$ can be conveniently reduced to the optimization of the number $k$ of neighbors to which a unitary weight is assigned in the local regression evaluation.
- In other words, we reduce the problem of bandwidth selection to a search in the space of $h(k) = d(x_{(k)}, x_q)$, where $x_{(k)}$ is the $k$-th nearest neighbor of the query point.
39. Recursive local regression
The main advantage deriving from the adoption of the rectangular weight function is that, simply by updating the parameter $\hat{\beta}(k)$ of the model identified using the $k$ nearest neighbors, it is straightforward and inexpensive to obtain $\hat{\beta}(k+1)$. In fact, performing a step of the standard recursive least squares algorithm [4], we have:

$$
\begin{aligned}
P(k+1) &= P(k) - \frac{P(k)\, x(k+1)\, x^T(k+1)\, P(k)}{1 + x^T(k+1)\, P(k)\, x(k+1)},\\
\gamma(k+1) &= P(k+1)\, x(k+1),\\
e(k+1) &= y(k+1) - x^T(k+1)\, \hat{\beta}(k),\\
\hat{\beta}(k+1) &= \hat{\beta}(k) + \gamma(k+1)\, e(k+1),
\end{aligned}
$$

where $P(k) = (Z^T Z)^{-1}$ when $h = h(k)$, and where $x(k+1)$ is the $(k+1)$-th nearest neighbor of the query point.
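A minimal sketch of one such RLS step, updating the local model when the (k+1)-th nearest neighbor (x_new, y_new) is added; names are illustrative:

    import numpy as np

    def rls_step(beta, P, x_new, y_new):
        """Recursive least squares step: add the (k+1)-th neighbor to the local fit."""
        Px = P @ x_new
        P_new = P - np.outer(Px, Px) / (1.0 + x_new @ Px)   # P(k+1)
        gamma = P_new @ x_new                               # gain vector gamma(k+1)
        e = y_new - x_new @ beta                            # a-priori error e(k+1)
        return beta + gamma * e, P_new                      # beta(k+1), P(k+1)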
42. Local Model combination
- As an alternative to the winner-takes-all paradigm, we also explored the effectiveness of local combinations of estimates [46].
- The final prediction of the value $y_q$ is obtained as a weighted average of the best $b$ models, where $b$ is a parameter of the algorithm.
- Suppose the predictions $\hat{y}_q(k)$ and the error vectors $e_{cv}(k)$ have been ordered, creating a sequence of integers $\{k_i\}$ so that $\widehat{\mathrm{MSE}}(k_i) \leq \widehat{\mathrm{MSE}}(k_j)$ for all $i < j$. The prediction of $\hat{y}_q$ is given by

$$\hat{y}_q = \frac{\sum_{i=1}^{b} \zeta_i\, \hat{y}_q(k_i)}{\sum_{i=1}^{b} \zeta_i},$$

where the weights are the inverse of the mean square errors: $\zeta_i = 1 / \widehat{\mathrm{MSE}}(k_i)$. This is an example of the generalized ensemble method [39].
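A minimal sketch of the combination step, given for each candidate number of neighbors its prediction and its leave-one-out MSE (b and all names are illustrative):

    import numpy as np

    def combine_best(preds, loo_mse, b):
        """Weighted average of the b best local models; weights = 1 / leave-one-out MSE."""
        order = np.argsort(loo_mse)[:b]          # indices of the b lowest-MSE models
        zeta = 1.0 / loo_mse[order]
        return float(np.sum(zeta * preds[order]) / np.sum(zeta))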
43. From local learning to Lazy Learning (LL)
By speeding up the local learning procedure, we can defer learning
to the moment when a prediction at a query point is required
(query-by-query learning).
The combination approach makes it possible to integrate local
models of different order (e.g. constant and linear) and different
bandwidths.
This method is called lazy since the whole learning procedure
(i.e. the parametric and the structural identification) is deferred
until a prediction is required.
44. Experimental setup for regression
Datasets: 23 real and artificial datasets from the ML repository.
Methods: Lazy Learning, Local modeling, Feed Forward Neural
Networks, Mixtures of Experts, Neuro Fuzzy, Regression Trees
(Cubist).
Experimental methodology: 10-fold cross-validation.
Results: Mean absolute error (Table 7.2), relative error (Table 7.3) and
paired t-test (Appendix C) [7].
45. Regression datasets
Dataset Number of examples Number of regressors
Housing 330 8
Cpu 506 13
Prices 209 6
Mpg 159 16
Servo 392 7
Ozone 167 8
Bodyfat 252 13
Pool 253 3
Energy 2444 5
Breast 699 9
Abalone 4177 10
Sonar 208 60
Bupa 345 6
Iono 351 34
Pima 768 8
Kin_8fh 8192 8
Kin_8nh 8192 8
Kin_8fm 8192 8
Kin_8nm 8192 8
Kin_32fh 8192 32
Kin_32nh 8192 32
Kin_32fm 8192 32
Kin_32nm 8192 32
46. Experimental results: paired comparison
Each method is statistically compared with all the others
(9 × 23 = 207 comparisons per method).

Method                          Times significantly worse than another
LL linear                       74
LL constant                     96
LL combination                  23
Local modeling linear           58
Local modeling constant         81
Cubist                          40
Feed Forward NN                 53
Mixtures of Experts             80
Local Model Network (fuzzy)     132
Local Model Network (k-mean)    145

The lower, the better!
47. Award in EUFIT competition
Data analysis competition on regression: awarded as runner-up among
21 participants at the Third International ERUDIT competition on
Protecting rivers and streams by monitoring chemical
concentrations and algae communities [10].
48. Lazy Learning for dynamic tasks
Multi-step-ahead prediction [12]:
long-horizon forecasting based on the iteration of a LL
one-step-ahead predictor.
Nonlinear control [11]:
1. Lazy Learning inverse/forward control.
2. Lazy Learning self-tuning control.
3. Lazy Learning optimal control.
50. Time series embedding
[Figure: a time series in its temporal representation is mapped to an embedding (input/output) representation: $\varphi_{t+1} = f(\varphi_t, \varphi_{t-1}, \ldots, \varphi_{t-n+1})$; each window of $n$ past values becomes an input paired with the next value as output.]
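A minimal sketch of this embedding step (the helper name embed and the most-recent-first ordering of the window are illustrative assumptions):

```python
import numpy as np

def embed(series, n):
    """Input/output form: X[i] = (phi_{t-1}, ..., phi_{t-n}), y[i] = phi_t."""
    s = np.asarray(series, float)
    X = np.array([s[i:i + n][::-1] for i in range(len(s) - n)])  # most recent first
    y = s[n:]                                                    # the next value
    return X, y
```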
51. One-step and multi-step-ahead prediction
One-step-ahead prediction: the $n$ previous values of the series are
assumed to be available for the prediction of the next value.
This is equivalent to a problem of supervised learning. LL was
used in this way in several prediction tasks: finance, economic
variables, environmental modeling [23].
Multi-step-ahead prediction: we predict the value of the series for the
next $H$ steps.
We can classify the methods for multiple-step prediction
according to two features: the horizon of the predictor and the
training criterion.
53. Iteration of a one-step-ahead predictor
[Figure: block diagram of the iterated predictor: the estimate $\hat{\varphi}_t = f(\varphi_{t-1}, \varphi_{t-2}, \ldots, \varphi_{t-n})$ is fed back through unit delays $z^{-1}$, so each prediction becomes an input for the next step.]
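A sketch of this iteration loop, reusing the hypothetical embed helper and a generic one-step predictor (for instance the local_linear_predict sketched earlier); the window handling is an illustrative assumption:

```python
import numpy as np

def iterate_prediction(one_step, history, n, H):
    """Multi-step-ahead forecast by iterating a one-step predictor:
    each new prediction is pushed into the input window."""
    window = list(np.asarray(history, float)[-n:][::-1])  # most recent value first
    preds = []
    for _ in range(H):
        y_next = one_step(np.array(window))               # phi_{t+1} = f(window)
        preds.append(y_next)
        window = [y_next] + window[:-1]                   # shift the window forward
    return np.array(preds)
```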
57. Conventional and iterated leave-one-out
[Figure: a) conventional leave-one-out: example 3 is removed and the one-step error $e_{cv}(3)$ is computed on it directly; b) iterated leave-one-out: example 3 is removed and the error $e_{it}(3)$ is computed after the prediction has been propagated through the iterated predictor.]
58. Iterated PRESS in the input/output space
[Figure: the training examples in the space $xy$ and in the space $yz$, with the local models $\beta_{xy}$ and $\beta_{yz}$ fitted with the third example removed; the conventional leave-one-out error $e_{loo}(3)$ is measured in the space $xy$, while the iterated error $e_{it}(3)$ is obtained by propagating the prediction $\hat{y}$ through the model of the space $yz$.]
Here $x$ represents the value of the time series (of order $n = 1$) at time $t - 1$,
$y$ represents the value of the time series at time $t$, and $z$ represents
the value of the time series at time $t + 1$.
59. From conventional to iterated PRESS
The PRESS statistic returns the leave-one-out errors as a by-product of the
local weighted regression.
We derived in [12] an analytical iterated formulation of the PRESS
statistic for long-horizon assessment.
The iterated assessment criterion improves stability and prediction
accuracy.
60. The iterated multi-step-ahead algorithm
1. The time series is embedded as an input/output mapping $f : \mathbb{R}^n \to \mathbb{R}$.
2. The one-step-ahead predictor is a local estimate of the mapping $f$.
3. The $H$-step-ahead prediction is performed by iterating the
one-step-ahead estimator.
4. Local structure identification is performed in a space of alternative
model configurations, each characterized by a different
bandwidth.
5. Prediction ability is assessed by the iterated formulation of the
cross-validation PRESS statistic ($H$-step-ahead criterion).
61. The Santa Fe time series
The iterated PRESS approach has been applied both to the
prediction of a real-world data set (A) and to a computer
generated time series (D) from the Santa Fe Time Series
Prediction and Analysis Competition.
The A time series has a training set of 1000 values and a test set
of 10000 samples: the task is to predict the continuation for 100
steps, starting from different points.
The D time series has a training set of 100000 values and a test
set of 500 samples: the task is to predict the continuation for 25
steps, starting from different points.
62. A series: training set
[Figure: the 1000-sample training set of the Santa Fe A series.]
63. A series: one-step criterion
[Figure: 100-step continuation of the A series predicted with the one-step (non-iterated) selection criterion.]
64. A series: multi-step criterion
[Figure: 100-step continuation of the A series predicted with the multi-step (iterated) selection criterion.]
65. Experiments: The Santa Fe Time Series A
Order n = 16. Training set: 1000 values. Test set: 100 steps.
Test data Non iter. PRESS Iter. PRESS Sauer Wan
1-100 0.350 0.029 0.077 0.055
1180-1280 0.379 0.131 0.174 0.065
2870-2970 0.793 0.055 0.183 0.487
3000-3100 0.003 0.003 0.006 0.023
4180-4280 1.134 0.051 0.111 0.160
Sauer: combination of iterated and direct local models.
Wan: recurrent network.
66. The Santa Fe Time Series D
Order n = 20. Training set: 100000 values. Test set: 25 steps.

Test data   Non iter. PRESS   Iter. PRESS   Zhang-Hutchinson
0-24        0.1255            0.0492        0.0665
100-124     0.0460            0.0363        0.0616
200-224     0.2635            0.1692        0.1475
300-324     0.0461            0.0405        0.0541
400-424     0.1610            0.0644        0.0720

Zhang-Hutchinson: combination of iterated and direct multilayer perceptron.
67. Award in Leuven Competition
Training set made of 2000 points.
Task: predict the continuation for the next 200 points.
[Figure: the predicted 200-point continuation of the Leuven competition series.]
Iterated Lazy Learning ranked second and fourth [8].
68. Lazy Learning for iterated prediction
Multi-step ahead by iteration of a one-step predictor.
Lazy learning to implement the one-step predictor.
Selection of the local structure by an iterated PRESS.
The iterated criterion avoids the accumulation of prediction errors and
improves performance.
69. Complexity in global and local modeling
Consider $N$ training samples, $n$ features and $Q$ query points.

                                      GLOBAL         LAZY
Parametric identification             C(NLS)         C(Nn) + C(LS)
Structural identification
  (K-fold cross-validation)           K · C(NLS)     small
Prediction for Q queries              negligible     Q · [C(Nn) + C(LS)]
TOTAL                                 K · C(NLS)     Q · [C(Nn) + C(LS)]

where C(NLS) stands for the cost of a nonlinear least-squares fit, C(LS) for
the cost of a linear least-squares fit, and C(Nn) for the cost of scanning the
N training samples of dimension n to rank the neighbors of a query.
70. Feature selection and LL
Local modeling techniques are known to be weak in high-dimensional
spaces.
A way to counter the curse of dimensionality is dimensionality
reduction, here in the form of feature selection.
It requires the assessment of an exponential number of
alternatives ($2^n$ subsets of the input variables) and the choice of the
best one.
Several techniques exist: we focus here on wrappers.
Wrappers rely on expensive cross-validation (e.g. leave-one-out
assessment).
Our idea: combine racing [34] and sub-sampling [29] to
accelerate the wrapper feature selection procedure in LL.
72. Racing for feature selection
Suppose we have several sets of different input variables.
The computational cost of making a selection results from the
cost of identification and the cost of validation.
The validation cost required by a global model is independent of
Q, while this is not the case for LL.
The idea of racing techniques consists in using blocking and paired
multiple tests to compare different models under similar conditions and
to discard the worst ones as soon as possible.
Racing reduces the number of tests to be made.
This makes the wrapper LL approach more competitive.
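A minimal sketch of the racing idea (the paired t-test, the significance level, and all names are illustrative assumptions; candidates must be hashable, e.g. tuples of feature indices):

```python
import numpy as np
from scipy import stats

def race(candidates, eval_on_block, n_blocks, alpha=0.01):
    """Evaluate all surviving candidates on the same blocks of test points
    (blocking) and drop any candidate whose paired errors are significantly
    worse than those of the current best."""
    alive = list(candidates)
    errors = {c: [] for c in alive}
    for b in range(n_blocks):
        for c in alive:
            errors[c].extend(eval_on_block(c, b))   # paired: same block for all
        best = min(alive, key=lambda c: np.mean(errors[c]))
        for c in [c for c in alive if c is not best]:
            _, p = stats.ttest_rel(errors[c], errors[best])
            if p < alpha and np.mean(errors[c]) > np.mean(errors[best]):
                alive.remove(c)                     # significantly worse: discard
                del errors[c]
        if len(alive) == 1:
            break
    return alive
```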
75. Sub-sampling and LL
The goal of model selection is to find the best hypothesis in a set
of alternatives.
What is relevant is the ordering of the different alternatives, e.g.
M2 > M3 > M5 > M1 > M4.
By reducing the training set size N, we expect to reduce the accuracy
of each single model but not necessarily to alter their ordering.
In LL, reducing the training set size directly reduces the cost of
each query.
The idea of sub-sampling is thus to reduce the size of the training set
without altering the ranking of the different models.
This makes the LL approach more competitive.
76. RACSAM for feature selection
We proposed the following algorithm [14] (sketched in code below):
1. Define an initial group of promising feature subsets.
2. Start with small training and test sets.
3. Discard by racing all the feature subsets that appear
significantly worse than the others.
4. Increase the training and test set sizes until at most a predefined
number of winning models remain.
5. Update the group with new candidates to be assessed and go
back to 3.
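A compact sketch of the RACSAM loop under the assumptions above. The race helper is the hypothetical sketch from the racing slide; the growth schedule, the stopping rule and the candidate generator are illustrative, not the exact procedure of [14].

```python
def racsam(initial_subsets, make_eval, new_candidates, n_max,
           n0=100, grow=2.0, max_winners=3):
    """Racing + sub-sampling for wrapper feature selection in LL:
    assess candidates on growing sub-samples, discarding losers early."""
    group = list(initial_subsets)
    n = n0
    while True:
        eval_on_block = make_eval(sample_size=int(n))  # LL assessed on a sub-sample
        group = race(group, eval_on_block, n_blocks=10)
        if len(group) <= max_winners and n >= n_max:
            return group                               # the winning feature subsets
        n = min(n * grow, n_max)                       # enlarge the sub-sample
        group += new_candidates(group)                 # e.g. forward extensions
```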
77. Experimental session
We compare the prediction accuracy of the LL algorithm
enhanced by the RACSAM procedure to the accuracy of two
state-of-the-art algorithms: a SVM for regression and a regression
tree (RTREE).
Two versions of the RACSAM algorithm were tested: the first
(LL-RAC1) takes as feature set the best one (in terms of estimated
Mean Absolute Error (MAE)) among the winning candidates;
the second (LL-RAC2) averages the predictions of the LL
predictors associated with the winning candidates.
The p-value of the racing tests is set to 0.01.
78. Experimental results
Five-fold cross-validation on six real datasets of high dimensionality:
Ailerons (N = 13750, n = 40), Pole (N = 15000, n = 48),
Elevators (N = 16599, n = 18), Triazines (N = 186, n = 60),
Wisconsin (N = 194, n = 32) and Census (N = 22784, n = 137).

Dataset   AIL      POL    ELE      TRI    WIS     CEN
LL-RAC1   9.7e-5   3.12   1.6e-3   0.21   27.39   0.17
LL-RAC2   9.0e-5   3.13   1.5e-3   0.12   27.41   0.16
SVM       1.3e-4   26.5   1.9e-3   0.11   29.91   0.21
RTREE     1.8e-4   8.80   3.1e-3   0.11   33.02   0.17
79. Applications
- Financial prediction of stock markets: in collaboration with Masterfood, Belgium.
- Prediction of yearly sales: in collaboration with Dieteren, Belgium, the leading Belgian car dealer.
- Nonlinear control and identification tasks in power systems: in collaboration with Università del Sannio (I) [44, 18].
- Modeling of industrial processes: in collaboration with the FaFer Usinor steel company (B) and the Honeywell Technology Center (US).
- Performance modelling of embedded systems: during my stay at Philips Research [16], Eindhoven (NL).
- Quality of service: during my stay at IMEC, Leuven (B) [17].
- Black-box simulators: in collaboration with CENEARO, Gosselies (B) [15].
- Environmental predictions: in collaboration with Politecnico di Milano (I) [23].
80. Software
MATLAB toolbox on Lazy Learning [5].
R contributed package lazy.
Joint work with Dr. Mauro Birattari (IRIDIA).
Web page: http://iridia.ulb.ac.be/~lazy.
About 5000 accesses since October 2002.
81. The importance of being Lazy
Fast data-driven design.
No global assumption on the noise.
Linear methods still effective in a multivariate non-linear setting
(LWR, PRESS).
An estimate of the variance is returned with each prediction.
Intrinsically adaptive.
82. Future work
Extension of the LL method to other local selection criteria (VC
dimension, GCV).
Classification applications.
Integration with powerful software and hardware devices.
From large to huge databases.
New applications: bioinformatics, text mining, medical data,
sensor networks, power systems.
83. References
[1] D. W. Aha. Editorial of special issue on lazy learning. Artificial
Intelligence Review, 11(1–5):1–6, 1997.
[2] D. M. Allen. The relationship between variable selection and data
augmentation and a method of prediction. Technometrics, 16:125–127,
1974.
[3] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted
learning. Artificial Intelligence Review, 11(1–5):11–73, 1997.
[4] G. J. Bierman. Factorization Methods for Discrete Sequential
Estimation. Academic Press, New York, NY, 1977.
[5] M. Birattari and G. Bontempi. The lazy learning toolbox, for use
with MATLAB. Technical Report TR/IRIDIA/99-7, IRIDIA-ULB,
Brussels, Belgium, 1999.
[6] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets
the recursive least-squares algorithm. In M. S. Kearns, S. A. Solla,
and D. A. Cohn, editors, NIPS 11, pages 375–381, Cambridge, 1999.
MIT Press.
[7] G. Bontempi. Local Learning Techniques for Modeling, Prediction
and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles, 1999.
[8] G. Bontempi, M. Birattari, and H. Bersini. Lazy learning for iterated
time series prediction. In J. A. K. Suykens and J. Vandewalle, editors,
Proceedings of the International Workshop on Advanced Black-Box
Techniques for Nonlinear Modeling, pages 62–68. Katholieke
Universiteit Leuven, Belgium, 1998.
[9] G. Bontempi, M. Birattari, and H. Bersini. Recursive lazy learning
for modeling and control. In Machine Learning: ECML-98 (10th
European Conference on Machine Learning), pages 292–303.
Springer, 1998.
[10] G. Bontempi, M. Birattari, and H. Bersini. Lazy learners at work:
the lazy learning toolbox. In Proceedings of the 7th European
Congress on Intelligent Techniques and Soft Computing EUFIT '99,
1999.
[11] G. Bontempi, M. Birattari, and H. Bersini. Lazy learning for
modeling and control design. International Journal of Control,
72(7/8):643–658, 1999.
[12] G. Bontempi, M. Birattari, and H. Bersini. Local learning for
iterated time-series prediction. In I. Bratko and S. Dzeroski, editors,
Machine Learning: Proceedings of the Sixteenth International
Conference, pages 32–38, San Francisco, CA, 1999. Morgan
Kaufmann Publishers.
[13] G. Bontempi, M. Birattari, and H. Bersini. A model selection
approach for local learning. Artificial Intelligence Communications,
13(1), 2000.
[14] G. Bontempi, M. Birattari, and P. E. Meyer. Combining lazy
learning, racing and subsampling for effective feature selection. In
Proceedings of the International Conference on Adaptive and
Natural Computing Algorithms. Springer Verlag, 2005. To appear.
[15] G. Bontempi, O. Caelen, S. Pierret, and C. Goffaux. On the use of
supervised learning techniques to speed up the design of aeronautics
components. WSEAS Transactions on Systems, 10(3):3098–3103,
2005.
[16] G. Bontempi and W. Kruijtzer. The use of intelligent data analysis
techniques for system-level design: a software estimation example.
Soft Computing, 8(7):477–490, 2004.
[17] G. Bontempi and G. Lafruit. Enabling multimedia QoS control with
black-box modeling. In D. Bustard, W. Liu, and R. Sterritt, editors,
Soft-Ware 2002: Computing in an Imperfect World, Lecture Notes in
Computer Science, pages 46–59, 2002.
[18] G. Bontempi, A. Vaccaro, and D. Villacci. A semi-physical
modelling architecture for dynamic assessment of power components
loading capability. IEE Proceedings of Generation, Transmission
and Distribution, 151(4):533–542, 2004.
[19] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone.
Classification and Regression Trees. Wadsworth International Group,
Belmont, CA, 1984.
[20] W. S. Cleveland. Robust locally weighted regression and smoothing
scatterplots. Journal of the American Statistical Association,
74:829–836, 1979.
[21] W. S. Cleveland and S. J. Devlin. Locally weighted regression: an
approach to regression analysis by local fitting. Journal of the
American Statistical Association, 83:596–610, 1988.
[22] W. S. Cleveland and C. Loader. Smoothing by local regression:
Principles and methods. Computational Statistics, 11, 1995.
[23] G. Corani. Air quality prediction in Milan: feed-forward neural
networks, pruned neural networks and lazy learning. Ecological
Modelling, 2005. In press.
[24] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13(1):21–27, 1967.
[25] J. Fan and I. Gijbels. Adaptive order polynomial fitting: bandwidth
robustification and bias reduction. J. Comp. Graph. Statist.,
4:213–227, 1995.
[26] J. Fan and I. Gijbels. Local Polynomial Modelling and Its
Applications. Chapman and Hall, 1996.
[27] W. Hardle and J. S. Marron. Fast and simple scatterplot smoothing.
Comp. Statist. Data Anal., 20:1–17, 1995.
[28] R. Henderson. Note on graduation by adjusted average. Transactions
of the Actuarial Society of America, 17:43–48, 1916.
[29] G. H. John and P. Langley. Static versus dynamic sampling for data
mining. In Proceedings of the Second International Conference on
Knowledge Discovery in Databases and Data Mining. AAAI/MIT
Press, 1996.
[30] M. C. Jones, J. S. Marron, and S. J. Sheather. A brief survey of
bandwidth selection for density estimation. Journal of the American
Statistical Association, 90, 1995.
[31] V. Y. Katkovnik. Linear and nonlinear methods of nonparametric
regression analysis. Soviet Automatic Control, 5:25–34, 1979.
[32] C. Loader. Local Regression and Likelihood. Springer, New York,
1999.
[33] C. R. Loader. Old faithful erupts: Bandwidth selection reviewed.
Technical report, Bell-Labs, 1987.
[34] O. Maron and A. Moore. The racing algorithm: Model selection
for lazy learners. Artificial Intelligence Review, 11(1–5):193–225,
1997.
[35] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
[36] R. Murray-Smith and T. A. Johansen. Local learning in local model
networks. In R. Murray-Smith and T. A. Johansen, editors, Multiple
Model Approaches to Modeling and Control, chapter 7, pages
185–210. Taylor and Francis, 1997.
[37] R. H. Myers. Classical and Modern Regression with Applications.
PWS-KENT Publishing Company, Boston, MA, second edition,
1994.
[38] B. U. Park and J. S. Marron. Comparison of data-driven bandwidth
selectors. Journal of the American Statistical Association, 85:66–72,
1990.
[39] M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble
methods for hybrid neural networks. In R. J. Mammone, editor,
Artificial Neural Networks for Speech and Vision, pages 126–142.
Chapman and Hall, 1993.
[40] J. Rice. Bandwidth choice for nonparametric regression. The Annals
of Statistics, 12:1215–1230, 1984.
[41] D. Ruppert, S. J. Sheather, and M. P. Wand. An effective bandwidth
selector for local least squares regression. Journal of the American
Statistical Association, 90:1257–1270, 1995.
[42] G. V. Schiaparelli. Sul modo di ricavare la vera espressione delle
leggi della natura dalle curve empiriche. Effemeridi Astronomiche di
Milano per l'Anno 1857:3–56, 1886.
[43] C. Stone. Consistent nonparametric regression. The Annals of
Statistics, 5:595–645, 1977.
[44] D. Villacci, G. Bontempi, A. Vaccaro, and M. Birattari. The role of
learning methods in the dynamic assessment of power components
loading capability. IEEE Transactions on Industrial Electronics,
52(1), 2005.
[45] G. Wahba and S. Wold. A completely automatic french curve:
Fitting spline functions by cross-validation. Communications in
Statistics, 4(1), 1975.
[46] D. Wolpert. Stacked generalization. Neural Networks, 5:241–259,
1992.
[47] M. Woodroofe. On choosing a delta-sequence. Ann. Math. Statist.,
41:1665–1671, 1970.