Abstracts DHC | Data Sciences

Stephen Boyd, Stanford University

Convex Optimization

Convex optimization has emerged as a useful tool for applications that include data analysis and model fitting, resource allocation, engineering design, network design and optimization, finance, control, and signal processing.
After an overview of the mathematics, algorithms, and software frameworks for convex optimization, we turn to common themes that arise across applications, such as sparsity and relaxation.
We describe recent work on real-time embedded convex optimization, in which small problems are solved repeatedly in millisecond or microsecond time frames, and large-scale distributed convex optimization, in which many solvers are coordinated to solve enormous problems.  
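As a minimal illustration of the sparsity theme, the following sketch solves an l1-regularized least-squares problem with the CVXPY modeling package, a standard convex heuristic for recovering a sparse vector. The problem data A, b and the weight lam are synthetic placeholders, not taken from the talk:

```python
import cvxpy as cp
import numpy as np

# Illustrative problem data (assumed): fit a sparse x with A @ x ~ b,
# using fewer measurements than unknowns.
np.random.seed(0)
m, n = 30, 100
A = np.random.randn(m, n)
x_true = np.zeros(n)
x_true[:5] = np.random.randn(5)          # only 5 nonzero coefficients
b = A @ x_true + 0.01 * np.random.randn(m)

lam = 0.1                                # l1 penalty weight (hypothetical choice)
x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))
prob = cp.Problem(objective)
prob.solve()                             # hands the problem to a convex solver

print("nonzeros recovered:", np.sum(np.abs(x.value) > 1e-4))
```

The l1 norm here acts as a convex relaxation of a cardinality constraint, which is one way the sparsity and relaxation themes interact.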


Peter Bühlmann, ETH Zürich

The Statistics-"Machine" in Data Science

Statistics plays a unique role in Data Science: quantifying uncertainty, addressing pressing needs on replicability, and contributing to the search for scientific findings. We will discuss recent developments in statistical inference for high-dimensional, large-scale data models, from regression to causality.


Jean-Charles Delvenne, UCL

Network Science: data science meets dynamical systems

Network science has emerged as a unifying framework able to formulate parallel questions arising in different communities (computer science, statistics, physics, control theory) pertaining to the pairwise interactions of many individual entities. We focus on the example of clustering, also known as community detection, intuitively defined as the search for densely connected subgraphs. In different contexts (image processing, bioinformatics, social analysis, multi-agent systems, etc.) it may be formulated as a variant of minimum cut in graphs, as a statistical inference problem, or as a model-reduction problem for the dynamics taking place on the network, such as opinion dynamics or epidemics. Although arising from different motivations, these mathematical formulations overlap in simple cases (undirected networks, memoryless dynamics, etc.), which they generalise in different directions. We discuss these formulations and their hidden pitfalls.
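To make the clustering discussion concrete, here is a minimal sketch of modularity-based community detection with NetworkX; the benchmark graph and the greedy modularity algorithm are illustrative choices, not the speaker's:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Zachary's karate club: a small social network commonly used as a
# community-detection benchmark.
G = nx.karate_club_graph()

# Greedy modularity maximization: one density-based formulation of the
# "densely connected subgraphs" intuition described above.
communities = greedy_modularity_communities(G)

for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
```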


Pierre Dupont, UCL

Machine learning methods for precision medicine: even small data can raise big challenges

Machine learning (ML) is the science of getting computers to act without being explicitly programmed. ML typically follows a data-driven methodology in which models are built from observed data before making predictions on new data.
This talk will present several ML applications in precision medicine, an area of medicine in which decisions, treatment, and follow-up are tailored to each individual patient.

We present prototypical examples, including breast cancer prognosis, early diagnosis of undifferentiated arthritis, and prediction of response to immunotherapy against melanoma.
Such examples illustrate core ML concepts including multi-class prediction, multitask or transfer learning, and feature selection.

At a time when big data is ubiquitous, we briefly discuss why scarce data can sometimes be even more challenging.
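One concrete way the small-data challenge manifests is selection bias: with few samples and many features, selecting features on the full dataset before cross-validation yields optimistically biased accuracy estimates. Below is a minimal sketch of the safer pattern with scikit-learn, keeping feature selection inside each cross-validation fold; the synthetic cohort and parameter choices are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a small omics-like cohort:
# 60 samples, 2000 features, only a few of them informative.
X, y = make_classification(n_samples=60, n_features=2000,
                           n_informative=10, random_state=0)

# Feature selection lives INSIDE the pipeline, so it is refit on each
# training fold; selecting once on all data would leak test information.
model = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```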


François Glineur, UCL

Performance estimation of first-order optimization methods

Optimization algorithms are widely used in a large variety of domains in engineering, computer science, economics, and management. Because of the ever-increasing amounts of data available and the growing demand for more accurate models, larger and larger optimization models have to be solved. This is one of the reasons for the renewed interest in first-order methods, which are particularly suited to large-scale models. In this talk, we describe the recently developed framework of performance estimation, which makes it possible to compute tight performance guarantees for a large class of first-order methods in a completely automated manner.
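As an example of the kind of result this framework produces, performance estimation yields the tight worst-case guarantee for N steps of the plain gradient method with step size 1/L applied to an L-smooth convex function f with minimizer x*; this specific bound is a known result from the performance estimation literature, quoted here for illustration:

```latex
f(x_N) - f(x_\star) \;\le\; \frac{L \, \| x_0 - x_\star \|^2}{4N + 2}
```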


Bernadette Govaerts, UCL

Integrating chemometrics and statistical methods in the analysis of spectral Omics data

The use of omics technologies is becoming common in a variety of health and pharmaceutical applications, with the aim of better understanding the link between the genetic, transcriptomic, proteomic, metabolomic, and other profiles of biological samples and outcomes of interest such as the presence of a disease or a treatment effect. Among these technologies, spectroscopic techniques (MS, NMR) produce targeted or untargeted proteomic or metabolomic fingerprints in the form of 1D or 2D high-dimensional spectra, which must be preprocessed by finely tuned algorithms and analyzed by advanced multivariate methods in order to extract the information relevant to the biological or medical question of interest.
The talk will present several places where the integration of statistical and chemometrics methods provides solutions for processing and analyzing spectral omics data, and how the UCL metabolomics team of ISBA/IMMAQ tries to contribute to the field. Chemometrics methods, traditionally applied in chemistry to the analysis of spectral data, indeed have limitations in the presence of (omics) biological samples affected by many sources of variability and issued from complex experimental studies, and in answering questions where interpretability and reliable statistical significance measures are necessary to validate the outcomes of the data analysis (e.g., biomarker discovery).
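By way of illustration, here is a minimal sketch of a chemometrics-style multivariate analysis (PLS regression, a workhorse method of the field) on synthetic spectra with scikit-learn; the data, component count, and validation scheme are placeholder assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for preprocessed 1D spectra:
# 40 biological samples, 500 spectral variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = X[:, :10].sum(axis=1) + 0.1 * rng.normal(size=40)  # outcome driven by a few bands

# Partial Least Squares projects the high-dimensional spectra onto a few
# latent components before regressing the outcome on them.
pls = PLSRegression(n_components=3)
scores = cross_val_score(pls, X, y, cv=5)  # default regressor score is R^2
print("cross-validated R^2: %.2f" % scores.mean())
```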


Isabelle Thomas, UCL

"Big data" in urban and economic geography: revolution or evolution?

BRUNET is a four-year multidisciplinary research project funded by Innoviris and running at CORE. It aims at revealing socio-economic communities in and around Brussels using big data.
We present a selection of results and demonstrate (1) that this new generation of data and methods undoubtedly offers opportunities for geographers and decision makers, but (2) that big data are not "the" panacea for solving most quantitative geography issues. There is still high potential for further multidisciplinary research. Big data are like statistics: "What they reveal is suggestive, but what they conceal is vital."