Date

May 30 – 31, 2015

Location

Boston Convention Center, Boston, MA

Speakers

70 Professional Speakers

GOLD

GOLD SPONSORS / ODSC Gold Level Sponsors

dr-logo3
mgh-logo-310
Booz Allen Hamilton

SILVER

SILVER SPONSORS / ODSC Silver Level Sponsors

Basis Technology
RStudio
Metis
Rifiniti

About The Conference / Here's What You Need To Know

The #ODSc Conference brings together the most influential practitioners, innovators, and thought leaders in the open source and data science fields to encourage the development and use of open source in data science.

Open source data science is revolutionizing the way we analyze information. To tap into the latest innovations and opportunities, be sure to join us.

  • Attend 21 workshops and 72 presentations by some of the best minds in data science
  • Be among the first to understand how you can leverage the promise of the #ODSc revolution
  • Discover the latest applications and breakthroughs
  • Attend book signings by well-known presenters and authors such as Allen Downey, Wes McKinney, and Josh Wills
  • Connect with other innovators across industries and disciplines

Keynote Speakers / ODSC 2015 Boston Keynotes

The Open Data Science Conference is very pleased to announce Anthony Goldbloom, CEO of Kaggle, and Josh Wills, Senior Director of Data Science at Cloudera, as our dual keynote speakers. Both Anthony and Josh epitomize the key goals of our conference, including making data science accessible and encouraging the open exchange of ideas.

Anthony Goldbloom, CEO of Kaggle

Forbes has twice named Anthony Goldbloom to its 30 Under 30 in technology. MIT Technology Review has named him one of its 35 Innovators Under 35, and the University of Melbourne has given him an Alumni of Distinction Award.

Kaggle is a platform for predictive modeling and analytics competitions, on which companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models.

Josh Wills, Senior Director of Data Science at Cloudera

Josh Wills is Cloudera’s Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines in Java, and lead developer of Cloudera ML, a set of open-source libraries and command-line tools for building machine learning models on Hadoop. Before joining Cloudera, Josh was at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+.

ODSC Networking Reception / Connect with your fellow data scientists!

Connect with your fellow data scientists! #ODSC is pleased to announce a Saturday evening networking reception sponsored by Basis Technology and Tamr from 5:30 pm to 7:00 pm. Light food and beverages will be served. Relax with a beer or a glass of wine as you rehash the day’s highlights!

Conference Schedule / 72 Presentations, 21 Workshops

We have a fantastic lineup of 72 presentations, 21 workshops, and keynotes. Please note that the talk schedule is subject to change. A printable PDF of the schedule is available, and the full schedule is listed below.

ODSC Conference App / Get connected

The ODSC app is a comprehensive event management mobile app. With the ODSC app, you will be able to:
  • See the event agenda.
  • Get information on which topics will be discussed.
  • Participate in polls and Q&A.
  • Create your own event profile if you wish.
  • Receive real-time notifications of any updates from ODSC.
  • Consult general event info (location, how to get there, and more).

Dallin Akagi & Brian Bell

Introduction to Python for Data Science

During the course of this workshop, we will cover a broad set of data science how-tos using Python and several popular industry-standard open source libraries, including pandas and scikit-learn. Starting from…
Rahul Dave

Introduction to Machine Learning Basics with Python

What you will learn:

You will learn the basics of data science and machine learning, especially the critical concepts of loss (or cost, or score), overfitting, regularization, and validation.

Todd Cioffi

Data Science 101

Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well, you've got to start somewhere, and this session is the place to do it. This session…
Pawel (Pavo) Paczuski

Introduction to R for Data Science

The goal of this workshop is to introduce those who are unfamiliar with R to:
  • The style of the language
  • Data manipulation
  • Graphical output, in particular the ggplot2 package
  • Basics of statistical models

The workshop will be divided into three parts:
  • The first part will introduce participants to the R language.
  • The second part will focus on data manipulation and basic graphical output.
  • The third part will focus on statistical modeling.

At the end of the workshop, participants will have a thorough understanding of R and can add another tool…
Chris Mack

Introduction to Text Analytics

Abstract

You will learn the fundamental building blocks used to address real-world text analytics problems, including tokenization, tagging, normalization, and disambiguation. We will also look…

Alex Johnson

Introduction to Open Source Tools for Data Science

From basic research to crowd-sourced statistics, anyone with an internet connection can access a staggering quantity and variety of data about the world around them. This workshop will describe…
James Powell

Introduction to Python for Data Science

Brief: This workshop serves as a starting point for attendees who are interested in attending ODSC presentations given in Python but who do not yet have experience with the use…
Daniel Gerlanc

An Introduction to Programming with Data using R

In this workshop we will learn about and use some of the fundamental data structures and statistical methods provided by R. Unlike most programming languages, R was designed specifically…
Max Kuhn

Predictive Modeling Workshop

The workshop is an overview of creating predictive models using R. An example data set will be used to demonstrate a typical workflow: data splitting, pre-processing, model tuning and…
Sponsored by Basis Technology and Tamr

ODSC Networking Reception

Connect with your fellow data scientists! #ODSC is pleased to announce a Saturday evening networking reception sponsored by Basis Technology and Tamr from 5:30 pm to 7:00 pm. Light food and beverages will be served.

Registration Desk Opens

We have a fantastic conference program today. We kick off with dual keynote speeches and follow with 26 presentations, 6 workshops, and 2 panel discussions over five tracks.
Dr. Fidan Boylu, Dr. Syed Fahad Allam Shah

Microsoft – Building a Predictive Analytics Solution with Azure ML

  • Create and operationalize a predictive model using Microsoft Azure Machine Learning.
  • Perform the typical steps involved in building a predictive analytics solution, such as data ingestion, data cleansing,…
Rahul Dave

Machine Learning for Suits: Making Decisions from Analytics

You will learn the basic concepts of machine learning (such as modeling, model selection, loss or profit, overfitting, and validation) in a non-mathematical way, so that…

Break

Alec Radford

Recurrent Neural Networks for text analysis

Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a promising tool for text analysis. However, outside of very specific use cases…
Kate Saenko

DIY Deep Learning with Caffe Workshop

Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center…
Paul Mineiro

The Vowpal Wabbit Machine Learning Toolkit: A Tutorial

Vowpal Wabbit is both an open-source machine learning toolkit and an active research platform. In this talk I introduce Vowpal Wabbit, discuss some of the design decisions, and the…
Break

David Epstein, Justin Kamerman

Feature Engineering

David Epstein, Justin Kamerman / David Epstein: Senior Data Scientist at Socure
Jared Lander

Making R Go Faster and Bigger

The features of R that make it easy to use (dynamic typing, in-memory analysis, the interpreter engine, and the REPL) can also slow it down. Fortunately the R Core Team has made…
Aaron Schumacher

Practical Mergic: How to Join Anything

Combining data sets can be a huge pain, with problems both obvious and insidious. Aaron will present practical approaches for detecting and avoiding potential pitfalls, as well as…
Break

Julie Steele

Understanding the CDO

As the importance of having a data strategy in place sinks in, many organizations have added a chief data officer (CDO) to their executive team to help create…
Mark Higgins, Kirat Singh

Crowd-Sourced Data Science Competitions

Mark Higgins, Kirat Singh / Mark Higgins: Co-Founder and CEO at Washington Square Technologies; Kirat Singh: Co-Founder, President and CTO at Washington Square Technologies
Lynn Root

Metric-Driven Development: See the Forest for the Trees

At Spotify, my team struggled to be awesome. We had a very loose understanding of what product/service our squad was responsible for, and even less so of the expectations…
Matthew Wills

Utilizing R in Sports Business Operations

This presentation will give an overview of how the Grizzlies use R in their sales and marketing business operations. From basic data manipulation to statistical modeling and enhanced visualization,…
Break

Max Kleiman-Weiner

Machine-in-the-loop for knowledge discovery

I'll present the new knowledge discovery tools we are building at Diffeo. Unlike traditional search engines that use keywords, Diffeo provides an in-browser knowledge base that accelerates information gathering…
Paul Bamberg

Adventures in using R to teach Mathematics

In 2014 I launched a new course, “Mathematical Foundations of Statistical Software,” in the Harvard Extension School, aimed at students with a solid background in calculus. Lectures were a…
TJ Houk, David Jaw

Introducing Machine Learning to a Pet Insurance Company

As an insurance company, we receive a monthly premium from policyholders and, in return, we pay claims on veterinary bills. Insurance risk for pet health is relatively uncharted…
Bill Disch

Modeling in the Healthcare Industry: A Collaborative Approach

Evariant has partnered with, and is using, DataRobot for multivariate predictive analytics because it is a flexible, robust, and extremely efficient tool for maximizing our modeling efforts, as well…
Gregor Stewart

Beyond Names

Finding and classifying the mentions of the things named in text, often called Named Entity Recognition or NER, is a fundamental task in many search and analysis applications.

Break

Ted Kwartler

Text Analytics & Sentiment Modeling Workshop

You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of…
Mark Schindler & Bang Wong

Hands-on Workshop: Data Visualization and UX

Data Visualization is about helping people gain knowledge from data. The focus of this workshop is on approaches to turn data into actionable insights, combining heuristics for visual analytics with techniques from user-experience design. Participants will learn how to choose and create data visualizations driven by user-oriented objectives, through presentations and an in-class exercise. The class exercise will be conducted in small groups. Workshop class size is limited to 60 participants on a first-come, first-served basis.
Break

Andreas Muller

Machine Learning – scikit-learn

scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry. scikit-learn provides easy-to-use interfaces to perform advanced analysis and…
Gilbert Benghiat, Christopher Bergh, Eric Estabrooks

Big Data Infrastructure – Introduction to Hadoop with MapReduce, Pig, and Hive

The main objective of this workshop is to give the audience hands-on experience with several Hadoop technologies and jump-start their Hadoop journey. In this workshop, you will…
Break

Kenneth Reitz

Real Data, Real Humans

An exposé on human-centered design as it relates to data science and “medium data”. Examples of great API design will be showcased, as well as other end-user facing tools that…
Gil Benghiat

Do agile data in just 5 shocking steps

To rephrase an old saying: ‘It takes a village to raise an Analyst.’ Data analysts and scientists are working in teams, delivering insight and analysis on an ongoing basis.
Owen Zhang

Open Source Tools and Data Science Competitions

This talk shares the presenter's experience with open source tools in data science competitions. In the past several years, Kaggle and other competitions have created a large online community…
Steve Cohen

10 Harbingers of the Textpocalypse

Human-generated text may be the next frontier for big data analysis, but we humans are complicated beasts and the text we generate is messy and complicated in ways that…
Break

Saul Diez-Guerra

What we learned while teaching Python and Data Science

Pedagogy and lessons learned from teaching online introductory Python and data science courses. This is how we approached the matter, what we learned, and where we want to go…
Break

Richard Robehr Bijjani

Jumping to Conclusions

Data Science is the study of the extraction of knowledge from data. What if we extract partial or inaccurate knowledge? This illusion of knowledge would lead us to make…
Nicole White

Graphs R Cool

Nicole White / Data Scientist at Neo4j  
Break

Speakers / We expect to have over 70 speakers, with confirmed speakers listed below

Wes McKinney

Creator of pandas, Author, Engineer at Cloudera

Josh Wills

Senior Director of Data Science at Cloudera

Andrew Odewahn

CTO at O'Reilly Media

Vivian S. Zhang

Founder at NYC Data Science Academy

Lynn Root

Founder of PyLadies SF, Engineer at Spotify

Owen Zhang

#1 Ranked Kaggle Data Scientist, Chief Product Officer at DataRobot

Usama Fayyad

Chief Data Officer at Barclays

Thomas Wiecki

Data Science Lead at Quantopian

Max Kuhn

Director of Statistics at Pfizer R&D

Christina Qi

Managing Partner at Domeyard LP

Andy Terrel

Chief Science Officer at Continuum Analytics

Anna Herlihy

Software Engineer at MongoDB, Core contributor to PyMongo and Monary

Kenneth Reitz

Python Overlord and Evangelist at Heroku

Allen Downey

Author of Think Python, Professor at Olin College

Tickets / Grab them soon!

We offer special discounts for groups of 5 or more. Please contact grouptix@opendatascicon.com

If you belong to one of the groups below, please click the link for our discount codes. Please provide documentation at the registration desk.

20% Non-profit

20% Academic/Educational Institution employee

20% Government Employee

20% Veteran

20% Startup employee

We have discounts available for many Meetups. Please check with your Meetup organizer or email us at info@opendatascicon.com

All-Access Two-Day Pass (Student ID)

$129

Expires May 25th
Discover career opportunities at top organizations & hottest startups
Explore data science with 70+ technical & non-technical talks
Connect with top data scientists
Get hired!

All-Access Two-Day Pass (Corporate Donation)

$499

$150 goes directly to the NumFOCUS foundation
Discover #ODSc innovators and top talent
Explore latest #ODSc applications with 70+ technical & non-technical talks

Data Science Community Supporters / We Thank Our Community For Their Support

Languages, Tools, and Topics / Workshops and talks

We’re heavily focused on applied data science featuring real-world applications. Here is a list of languages, tools, and topics we’re looking to cover in our workshops and talks:

Languages /

R
Python
Pig/Hive
SQL
D3.js

Tools /

Hadoop
Spark
MongoDB
Vowpal Wabbit
Elasticsearch

Topics /

Predictive Analytics
Machine Learning
Text Analytics
Deep Learning
Data Visualization

Attendees / Who You Will See

/ Data Scientists

/ Data Engineers

/ Data Analysts

/ Software Developers

/ Technical Leads

/ Researchers

/ System Architects

/ CEOs, CTOs, CIOs, etc.

/ Head Data Scientists

/ Head Researchers

/ IT Managers

/ Entrepreneurs

/ Business Strategists

/ Consultants