Track: Big Data

Theme: CODE

Room: Moebius - Grand Auditorium

On: Oct 4, 2013, from 14:00 to 18:00

Track leader(s): Ori Pekelman (Founder, Constellation Matrix) / Florian Douetteau (CEO, Dataiku) / Vincent Heuschling (Founder, Affini-Tech)

Big ideas on Big Data. Data technologies are changing fast and reshaping the technology landscape in a big way. This track aims to give a broad overview of the current state of the art in big data infrastructures, analytics, search, machine learning and real-time processing, and to open future perspectives: how do you handle big data over a decade, and what social and ethical impact will it have? If you have a big data strategy, these talks may give you a new perspective; if you don't, you might very well learn where you are supposed to start.

The track is accompanied by two sister tracks that give you the lowdown on everything, with dedicated sessions on using NewSQL technologies to tackle huge volumes of data (with MariaDB, SkySQL and Postgres) and hands-on workshops on Hadoop orchestration and search technologies.

Talks


14:00 - Big Data Track presentation

Duration: 10 minutes

Speakers: Ori Pekelman (Founder, Constellation Matrix) / Florian Douetteau (CEO, Dataiku) / Vincent Heuschling (Founder, Affini-Tech)


14:10 - Bigdata is transforming the business intelligence landscape.

Duration: 15 minutes

Speakers: Vincent Heuschling (Founder, Affini-Tech)

It's not only a question of volume. The opportunity lies not only in cost efficiency but in the way you process your data. You'll be able to think about your data globally: you can mix every dataset from your organisation with external data and increase its value. Everything that made the large dotcoms successful can be integrated into your business, even if you're in a traditional business. Imagine your data becoming active through Machine Learning. You can create new value using technologies and advanced algorithms such as correlations, predictors, and recommender systems. In this talk we will cover all these facets of Bigdata.


14:25 - Data Science : Rebuild or not

Duration: 15 minutes

Speakers: Florian Douetteau (CEO, Dataiku)

Data scientists are supermen who do and redo the same jobs again and again: data cleansing, connecting back end with front end, orchestrating jobs, ... Is there an end to that?


14:40 - Hadoop and the Cloud : what works, what doesn't

Duration: 15 minutes

Speakers: Pierre Couzy (Senior program manager, Criteo)

More and more workloads are being pushed to the cloud every month. Hadoop seems the best example of a cloud application, but there's more than meets the eye. This session will focus on the main differences between on-premise and cloud Hadoop setups, notably the impact of storage and elasticity.


14:55 - Big Data Forever

Duration: 15 minutes

Speakers: Jonathan Winandy (BI platform Engineer, Viadeo)

In this talk, Jonathan will present the concept of intention-preserving, append-only data structures and their impact on large-scale data repositories.
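
As a rough illustration of the idea named in this abstract (one possible reading, not material from the talk itself), an append-only store can be sketched as a log of immutable events that record what was intended, with the state at any point in time rebuilt by replaying the log. The `Event` and `AppendOnlyLog` names, the intents and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """An immutable record of what was intended, not just the resulting value."""
    timestamp: int
    user: str
    intent: str      # hypothetical intents: "set_title" or "clear_title"
    value: str = ""


class AppendOnlyLog:
    """Events can be appended and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> Tuple[Event, ...]:
        # Hand out an immutable copy so callers cannot rewrite history.
        return tuple(self._events)

    def titles_as_of(self, timestamp: int) -> Dict[str, str]:
        """Replay the log up to `timestamp` to rebuild the state at that time."""
        titles: Dict[str, str] = {}
        for event in sorted(self._events, key=lambda e: e.timestamp):
            if event.timestamp > timestamp:
                break
            if event.intent == "set_title":
                titles[event.user] = event.value
            elif event.intent == "clear_title":
                titles.pop(event.user, None)
        return titles


# Hypothetical sample data.
log = AppendOnlyLog()
log.append(Event(1, "alice", "set_title", "Engineer"))
log.append(Event(2, "alice", "set_title", "BI Platform Engineer"))
log.append(Event(3, "alice", "clear_title"))

state_at_2 = log.titles_as_of(2)   # state before the title was cleared
state_at_3 = log.titles_as_of(3)   # latest state
```

Because nothing is overwritten, earlier states stay reproducible years later, and the log keeps the difference between "the title was cleared" and "the title was never set".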


15:10 - Introduction To Mahout

Duration: 45 minutes

Speakers: Ted Dunning (Chief Application Architect, MapR)

In this talk, Ted Dunning will present Apache Mahout, an Apache project that produces free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering and classification, and often leveraging, but not limited to, the Hadoop platform.
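
To give a feel for what collaborative filtering means, here is a minimal single-machine sketch of an item co-occurrence recommender. It is plain Python and does not use the Mahout API; the `history` data and the `cooccurrence` and `recommend` helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Set

# Hypothetical toy data: which items each user has interacted with.
history: Dict[str, Set[str]] = {
    "u1": {"hadoop", "mahout", "spark"},
    "u2": {"hadoop", "mahout"},
    "u3": {"hadoop", "spark"},
    "u4": {"mahout", "lucene"},
}


def cooccurrence(history: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count how often each ordered pair of items appears in the same user's history."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user: str,
              history: Dict[str, Set[str]],
              counts: Dict[str, Dict[str, int]],
              top_n: int = 2) -> List[str]:
    """Score unseen items by their co-occurrence with the items the user already has."""
    seen = history[user]
    scores: Dict[str, int] = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


counts = cooccurrence(history)
suggestions_for_u2 = recommend("u2", history, counts)
```

A distributed implementation would have to split the counting step across machines; this sketch only shows the scoring logic.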


15:55 - Our Big Analytics Stack

Duration: 15 minutes

Speakers: Charly Clairmont (Technical Director, Altic)

Altic has long been an active member of the French open source ecosystem. For a few years now, Altic has taken a deep interest in Big Data technologies. Today many vendors have added Big Data features to their offerings. Altic and its partners, Talend and Engineering Informatica (SpagoBI), have decided to create a Big Data stack using their own solutions. The magic thing: Altic hasn't changed the way its projects are done, but has only learnt how to store and compute Big Data. In this presentation we invite you to discover our Big Data stack, with:

* Hadoop and Spark to store and compute data
* Talend DI to create Big Data tasks published and scheduled in SpagoBI
* SpagoBI to manage security and give end users access to data visualizations

For a more user-friendly Big Data stack, we provide new components:

* tMahout, a Talend component that helps us create data mining jobs inside Hadoop without developing code in the MapReduce paradigm
* SpagoBID3Engine, which provides easy manipulation of data to develop beautiful data visualizations


16:10 - MLBASE : Distributed Machine Learning at Scale

Duration: 20 minutes

Speakers: Sam Bessalah (Software Engineer, .)

"Machine learning (ML) and statistical techniques are crucial to transforming Big Data into actionable knowledge. However, the complexity of existing ML algorithms is often overwhelming for non Machine Learning developers.

MLBase is a platform for non machine learning experts to implement complex machine learning tasks, select and dynamically adapt models and algorithms, in a simple, scalable and declarative way."


16:30 - Elastify your app: from SQL to NoSQL in less than 40 mn!

Duration: 40 minutes

Speakers: David Pilato (Technical Advocate, Elasticsearch) / Tugdual Grall (Technical Evangelist, Couchbase)

During this "live coding" talk, Tugdual and David will move an old-fashioned, fully SQL application to the NoSQL world.

Using Couchbase and Elasticsearch, they will show the gains you can get from this new architecture:
- Ease of use
- Elasticity (scalability)

The following points will be covered:
- Document-oriented model
- JSON
- REST
- Caching / Memcache
- Full-text search
- Building live dashboards with Kibana
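
As a hedged illustration of two of these points, the document-oriented model and full-text search, the sketch below denormalizes rows from two relational tables into self-contained JSON documents and builds a toy inverted index over them. It uses only the Python standard library and makes no Couchbase or Elasticsearch calls; the table contents, field names and the `person::<id>` key scheme are assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical rows as they might come out of two SQL tables.
person_rows = [
    {"id": 1, "name": "Alice Martin", "city_id": 10},
    {"id": 2, "name": "Bob Durand", "city_id": 11},
]
city_rows = [
    {"id": 10, "name": "Paris", "country": "France"},
    {"id": 11, "name": "Lyon", "country": "France"},
]


def to_documents(people, cities):
    """Denormalize the join into one self-contained document per person."""
    city_by_id = {c["id"]: c for c in cities}
    docs = {}
    for p in people:
        city = city_by_id[p["city_id"]]
        docs[f"person::{p['id']}"] = {
            "name": p["name"],
            "address": {"city": city["name"], "country": city["country"]},
        }
    return docs


def build_index(docs):
    """A toy inverted index: lowercase token -> set of document keys."""
    index = defaultdict(set)
    for key, doc in docs.items():
        text = " ".join(
            [doc["name"], doc["address"]["city"], doc["address"]["country"]]
        )
        for token in text.lower().split():
            index[token].add(key)
    return index


def search(index, query):
    """Return the keys of documents that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


documents = to_documents(person_rows, city_rows)
index = build_index(documents)
alice_json = json.dumps(documents["person::1"], sort_keys=True)
```

Each document carries its own address, so reading a person no longer needs a join, and the index answers word queries without scanning every document.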


17:10 - Big Data, Bad Data

Duration: 45 minutes

Speakers: Rand Hindi (CEO, Snips) / Romain Lacombe (Head of Innovation, Etalab) / Rayna Stamboliyska (Founder, RS Strategy)

This talk will explore the ethical boundaries and implications of big data strategies from marketing to science.


Our partners

Institutional partners

Diamond Sponsor

Inria

Platinum sponsors

Alter Way, Smile, Microsoft, Suse

Gold sponsors

Red Hat, enovance

Silver sponsors

Abilian, af83, blackduck, elasticsearch, HP, jamendo, La Poste, Palamida, StackOverflow

Organizer

Systematic

Co-organizers

af83, Alter Way, Inria, Smile

Community Partners

adullact, aful, cnll, ploss, OSDC.fr, OW2, Silicon Sentier

WebTV Partner

Intelli'N

Press partners

01business, 3Dnatives, l'atelier, channelbiz, CIO, Collaboratif-info, i-entreprise, frenchweb, l'informaticien, it espresso, it expert, i-solo, le journal du net, linuxfr.org, linux magazine, linux pratique, le monde informatique, reseaux telecoms, silicon, terra eco, ubergizmo