Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15


THANK YOU FOR AN AMAZING CONFERENCE!


THE 4th EDITION OF BIG DATA IN Oct 2015 WAS A RESOUNDING SUCCESS.

Thursday 15th

from 15:15 to 16:00

Room 25


Begin at the beginning: Feature selection for Big Data

Technical

Preprocessing data is one of the most effort-intensive tasks in Machine Learning (ML). In the Big Data context, models derived automatically from data should be as simple, interpretable and fast as possible. Achieving that requires using the best variables, that is, the best features of the data.


Although several libraries already address ML tasks on Big Data, the same is not yet true for feature selection (FS) algorithms or for other preprocessing techniques such as discretization. Moreover, existing FS methods do not scale well when dealing with Big Data. In this presentation, we show our efforts and new ideas for parallelizing standard FS methods for use in Big Data environments.
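The abstract does not say which FS methods are parallelized, so the following is only a minimal sketch under stated assumptions: a filter-style ranking by mutual information, where each feature is scored independently of the others and the scoring can therefore be mapped across workers. The function names (mutual_information, rank_features) and the toy data are hypothetical, and the thread pool merely stands in for whatever cluster framework would be used in a real Big Data setting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def mutual_information(column, labels):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    feature_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (feature_counts[x] * label_counts[y]))
    return mi


def rank_features(columns, labels, top_k, workers=4):
    """Score every feature independently, then keep the top_k highest scores."""
    # Assumption: the pool is a stand-in for distributed workers; each task
    # only needs one feature column plus the label vector.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda col: mutual_information(col, labels), columns))
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: feature 0 copies the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 1],
]
selected, scores = rank_features(columns, labels, top_k=2)
print(selected, scores)
```

Because the per-feature scores do not depend on one another, the same map-then-rank pattern carries over to a distributed setting; methods that account for redundancy between features need extra communication between workers and are not covered by this sketch.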


Amparo Alonso

University A Coruña, IA Lab Head


Building graphs to discover information

Technical

The basic challenge for a data scientist is to unveil information from raw data. Traditional machine learning algorithms address “pure” data analytics situations that comply with a set of restrictions, such as access to labels or a clear prediction objective. In practice, however, now that data science is so widespread, the exception is the norm: it is usual to encounter situations that depend on gathering information from raw data lacking the structure or objective that classic approaches assume.


In these situations, building a graph that encodes the information we are trying to unveil is the most intuitive place to start, or even the only feasible one when we lack domain knowledge or a previously stated aim. Unfortunately, building a graph from scratch when the number of nodes is huge is computationally challenging and requires some approximations to be feasible. In this review, we will talk about the most standard way of building those graphs in practice and how to exploit them to solve data science tasks.
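The abstract does not name the graph construction it reviews. As an illustration only, the sketch below assumes a k-nearest-neighbour graph over Euclidean points, built by brute force, and then exploits it for one simple task: grouping points by connected components. The helper names (knn_graph, connected_components) and the sample points are hypothetical. The brute-force construction needs a distance for every pair of nodes, which is the quadratic cost that makes approximations necessary at scale.

```python
from math import dist


def knn_graph(points, k):
    """Exact k-nearest-neighbour graph by brute force: O(n^2) distance evaluations."""
    graph = {}
    for i, p in enumerate(points):
        others = [(dist(p, q), j) for j, q in enumerate(points) if j != i]
        graph[i] = [j for _, j in sorted(others)[:k]]
    return graph


def connected_components(graph):
    """Group nodes that are linked by k-NN edges, ignoring edge direction."""
    undirected = {node: set(nbrs) for node, nbrs in graph.items()}
    for node, nbrs in graph.items():
        for nbr in nbrs:
            undirected[nbr].add(node)
    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in undirected[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return components


# Hypothetical unlabeled data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = knn_graph(points, k=2)
groups = connected_components(graph)
print(groups)
```

No labels or prediction objective are used here: the grouping comes only from the structure of the graph, which is the kind of situation the abstract describes.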


David Martínez

UCL, Researcher