Workshops

SKILL BUILDING WORKSHOPS
Dates: January 9-13, 2017
Learn computational skills in a hands-on format with expert instructors.

REGISTRATION
Registration is now open for most workshops. Workshops are free and open to the public, but advance registration is required.

We regret that workshops will not be videotaped or livestreamed.

Search and register for workshops by clicking on a day below.  Or, click here to search by topic.

QUESTIONS? Email iacs-info@seas.harvard.edu

Data Science in Python (Intro)

Data Science in Python
Day 1: Introduction to Python, NumPy, Matplotlib and Bokeh
Presented by Rahul Dave, IACS and Ian Stokes-Rees, Continuum Analytics
Monday, January 9, 2017
8:30 AM - 11:30 AM

This is day one of a five-day, three-hours-per-day workshop that will take you from being a person with some idea of how to program to a person with some idea of how to do data science.

Pre-requisites:

Must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with Anaconda Python Distribution installed (https://www.continuum.io/downloads).  We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python. We'll then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels), including concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.
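
To give a flavor of two concepts named in the overview above, regularization and cross-validation, here is a minimal sketch in plain Python. It is not taken from the workshop materials: the toy data, the one-weight ridge model, and the helper names (fit_ridge, k_fold_indices, cross_val_error) are all illustrative assumptions.

```python
# Illustrative sketch only: pick a ridge penalty by k-fold cross-validation
# for a one-feature linear model y ~ w * x (no intercept term).

def fit_ridge(xs, ys, lam):
    """Closed-form ridge estimate of one weight: sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mean_squared_error(xs, ys, w):
    """Average squared prediction error of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / float(len(xs))

def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds (index i goes to fold i % k)."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]

def cross_val_error(xs, ys, lam, k=3):
    """Mean held-out error over k folds for a given penalty lam."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        test_x = [xs[i] for i in fold]
        test_y = [ys[i] for i in fold]
        w = fit_ridge(train_x, train_y, lam)
        errors.append(mean_squared_error(test_x, test_y, w))
    return sum(errors) / len(errors)

# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Hypothetical candidate penalties; keep the one with the lowest held-out error.
candidates = [0.0, 0.1, 1.0, 10.0]
scores = dict((lam, cross_val_error(xs, ys, lam)) for lam in candidates)
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

The key point is that each candidate penalty is scored only on data the model did not see while fitting. In practice one would likely reach for a library such as scikit-learn rather than hand-rolled helpers like these.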

Topics covered throughout the week include:

Day 1: Monday, January 9:
Intro to Python, NumPy, Matplotlib and Bokeh.

Day 2: Tuesday, January 10:
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11:
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12:
Classification and model comparison.

Day 5: Friday, January 13:
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.

REGISTER FOR DAY 1 HERE

Introduction to MATLAB

Introduction to MATLAB: Problem Solving and Programming
Presented by MathWorks
Monday, January 9, 2017
12:00 PM - 2:30 PM

MATLAB is a high-level language that allows you to quickly perform computation and visualization through easy-to-use programming constructs.  This hands-on lab presents the essentials you need to use MATLAB for your classes or research.

In this hands-on workshop, attendees will learn how to import data from an external file, plot the data over time, then perform some analysis to view the data trends.  You’ll learn how to write a MATLAB script and publish it to a format for sharing, such as HTML. You’ll also learn how to write your own MATLAB functions, use flow control, and create loops.  By the end of the session, you’ll have learned to create an application in MATLAB.

Key topics include:

  • Navigating the MATLAB desktop
  • Working with variables in MATLAB
  • Calling MATLAB functions
  • Importing and extracting data
  • Visualizing data
  • Conducting computational analysis
  • Fitting data to a curve
  • Automating analysis with scripts
  • Publishing MATLAB programs
  • Programming in MATLAB

Pre-requisites:
Attendees must bring a laptop to this hands-on workshop with MATLAB already installed.  A trial can be obtained here:  http://www.mathworks.com/products/matlab/

REGISTER HERE

MATLAB and the Internet of Things
Presented by MathWorks
Monday, January 9, 2017
3:00 PM - 5:00 PM

Internet of Things (IoT) describes an emerging trend where a large number of embedded devices (things) are connected to the Internet. These connected devices communicate with people and other things and often provide sensor data to cloud storage and cloud computing resources, where the data is processed and analyzed to gain important insights. MATLAB® and Simulink® products support IoT systems by helping you develop and test edge node devices, access and aggregate data, analyze IoT sensor data, and model communications channels. We will also show how ThingSpeak can be used as an IoT data aggregation platform.

Highlights:

  • Overview of MathWorks® IoT workflow
  • Demonstration of how MATLAB can be used to access and aggregate data, analyze and visualize data, and develop and test edge node devices
  • Hands-on: Explore MATLAB and ThingSpeak to create your own IoT system

Pre-requisites:

Attendees must bring a laptop to this hands-on workshop with MATLAB and ThingSpeak already installed.  Trials can be obtained here:
https://www.mathworks.com/products/matlab/
http://www.mathworks.com/hardware-support/thingspeak.html

REGISTER HERE

Data Science in Python (Visualization and Basics of Machine Learning)

Data Science in Python
Day 2: Exploratory analysis and visualization; the basics of machine learning
Presented by Rahul Dave, IACS and Ian Stokes-Rees, Continuum Analytics
Tuesday, January 10, 2017
8:30 AM - 11:30 AM

This is day two of a five-day, three-hours-per-day workshop that will take you from being a person with some idea of how to program to a person with some idea of how to do data science. Day two will cover the following topics: exploratory analysis and visualization; the basics of machine learning.

Pre-requisites:

Must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with Anaconda Python Distribution installed (https://www.continuum.io/downloads).  We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python. We'll then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels), including concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.

Topics covered throughout the week include:

Day 1: Monday, January 9:
Intro to Python, NumPy, Matplotlib and Bokeh.

Day 2: Tuesday, January 10:
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11:
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12:
Classification and model comparison.

Day 5: Friday, January 13:
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.

REGISTER FOR DAY 2 HERE

Image Processing, Machine Learning, Computer Vision and Deep Learning in MATLAB
Presented by MathWorks
Tuesday, January 10, 2017
12:00 PM - 2:30 PM

This seminar will be particularly valuable for anyone interested in using MATLAB to process, visualize, and quantify imagery. Rather than focus on extracting information from a few homogeneous images, we will introduce a typical real-world challenge, and discuss approaches to managing and exploring collections of widely heterogeneous images.  We will also describe approaches to implementing deep learning networks in MATLAB, and will compare and contrast those approaches with more traditional computer vision and machine learning techniques.

In this presentation, we will:

  • Explore and manage a range of real-world image sets
  • Solve challenging image processing problems with user interfaces
  • Classify images by content using machine learning techniques
  • Detect, recognize, and track objects and faces in images

Pre-requisites:

Attendees must bring a laptop to this hands-on workshop with MATLAB already installed.  A trial can be obtained here:  http://www.mathworks.com/products/matlab/

REGISTER HERE

Deep Learning Part I
Presented by Andrea Azzini and Giovanni Conserva, IACS / Politecnico di Milano
Tuesday, January 10, 2017
3:00 PM - 5:30 PM

Introduction to Deep Learning and TensorFlow basics

  • Theory about neural networks, backpropagation, optimization, etc.
  • TensorFlow basics (functions, graphs, etc.)
  • Dropout and/or other kinds of normalization (theory + TensorFlow)
  • Matplotlib visualization

Pre-requisites:

The workshop will be based on TensorFlow, a Python machine learning library by Google. Attendees should download and set up TensorFlow in advance of class:
https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html

Attendees should also install matplotlib, a visualization library:
http://matplotlib.org/faq/installing_faq.html

REGISTER HERE

Data Science in Python (Learning a Model and Regression)

Data Science in Python
Day 3: Learning a Model (complexity, regularization, cross-validation) and Regression.
Presented by Rahul Dave, IACS and Ian Stokes-Rees, Continuum Analytics
Wednesday, January 11, 2017
8:30 AM - 11:30 AM

This is day three of a five-day, three-hours-per-day workshop that will take you from being a person with some idea of how to program to a person with some idea of how to do data science. Day three will cover the following topics: learning a model (complexity, regularization, cross-validation) and regression.

Pre-requisites:

Must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with Anaconda Python Distribution installed (https://www.continuum.io/downloads).  We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python. We'll then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels), including concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.

Topics covered throughout the week include:

Day 1: Monday, January 9:
Intro to Python, NumPy, Matplotlib and Bokeh.

Day 2: Tuesday, January 10:
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11:
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12:
Classification and model comparison.

Day 5: Friday, January 13:
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.

REGISTER FOR DAY 3 HERE

Deep Learning Part II
Presented by Andrea Azzini and Giovanni Conserva, IACS / Politecnico di Milano
Wednesday, January 11, 2017
12:00 PM - 2:30 PM

Introduction to Computer Vision and Advanced TensorFlow

The following topics will be covered:

  • Theory of Convolutional Neural Networks and Computer Vision (image classification, pixel-wise semantic segmentation, edge detection, etc.)
  • TensorBoard visualization (graph, loss, accuracy, images, features)
  • How to save/restore a trained model
  • Tensor/Image manipulation (crop, resize, rotation, upscaling, etc.)

Pre-requisites:

The workshop will be based on TensorFlow, a Python machine learning library by Google. Attendees should download and set up TensorFlow in advance of class:
https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html

Attendees should also install matplotlib, a visualization library:
http://matplotlib.org/faq/installing_faq.html

REGISTER HERE

Introduction to Tableau 
Presented by Mary Kate Quigley, Tableau
Wednesday, January 11, 2017
12:00 PM - 2:30 PM

Learn how you can accelerate your analysis and create impactful insights with Tableau! Join a Tableau expert for a hands-on Desktop training session to help you see and understand your own data. Learn best practices for data presentation, and how to create interactive visualizations and dashboards in Tableau. This workshop is designed for all audiences and skill levels.

Pre-requisites:
Tableau must be downloaded and installed. Download instructions will be sent to registrants before the workshop.

REGISTER HERE

Introduction to R
Presented by Ista Zahn, Institute for Quantitative Social Science
Wednesday, January 11, 2017
3:00 PM - 5:30 PM

R has become the go-to language for statistics and is increasingly popular among data scientists. This workshop will gently guide you toward R mastery using real-world examples and hands-on exercises.  Specifically, you will learn how to install and manage packages, navigate the help system, import and manipulate data, conduct simple statistical analyses, and construct simple graphics in R.

No previous R experience is required.

Pre-requisites:
You must have a laptop with R (https://cran.r-project.org/) and RStudio (https://www.rstudio.com/products/rstudio/download/) installed.

REGISTER HERE

Data Science in Python (Classification and Model Comparison)

Data Science in Python
Day 4: Classification and model comparison.
Presented by Rahul Dave, IACS and Ian Stokes-Rees, Continuum Analytics
Thursday, January 12, 2017
8:30 AM - 11:30 AM

This is day four of a five-day, three-hours-per-day workshop that will take you from being a person with some idea of how to program to a person with some idea of how to do data science. Day four will cover the following topics: classification and model comparison.

Pre-requisites:

Must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with Anaconda Python Distribution installed (https://www.continuum.io/downloads).  We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python. We'll then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels), including concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.

Topics covered throughout the week include:

Day 1: Monday, January 9:
Intro to Python, NumPy, Matplotlib and Bokeh.

Day 2: Tuesday, January 10:
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11:
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12:
Classification and model comparison.

Day 5: Friday, January 13:
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.

REGISTER FOR DAY 4 HERE

NoSQL vs. NewSQL: Demystifying the Zoo of Contemporary Database Systems 
Presented by Niv Dayan, Institute for Applied Computational Science
Thursday, January 12, 2017
12:00 PM - 2:30 PM
Maxwell Dworkin, Room G115 (note that this location is across the street from other workshops)

Over the past 5-10 years, there has been an explosion in database system technologies. Today, businesses and researchers facing a data management task must choose between SQL, NoSQL, and NewSQL, as well as various sub-categories, e.g. row-stores, column-stores, document-stores, key-value-stores, etc. In this workshop, we will demystify and learn to navigate the landscape of contemporary database systems. You will gain the ability to decide “which database system is right for me?” In particular, we will compare and contrast MongoDB and VoltDB, two prominent examples of NoSQL and NewSQL systems respectively. We will take a crash course in MongoDB to give an impression of how a NoSQL system looks and feels in practice.

Pre-requisites:
We advise participants to install MongoDB ahead of the workshop.  Detailed instructions are easy to find online, for example:

OS X: https://www.youtube.com/watch?v=G--uGbOFF9E (you may need to add “sudo” as a prefix to every installation command)
Windows: https://www.youtube.com/watch?v=K_5mj3-_uJQ

REGISTER HERE

R Programming for Data Analysis
Presented by Ista Zahn, Institute for Quantitative Social Science
Thursday, January 12, 2017
12:00 PM - 2:30 PM

R is an excellent statistics platform, and a full programming language.  This workshop takes you beyond simple statistical analysis by introducing you to some of the more powerful programming techniques in R. Specifically, you will learn how to manipulate text using regular expressions, automate repetitive tasks, define your own functions and more.

Pre-requisites:
Basic familiarity with R, such as acquired from an introductory R workshop.

Participants should bring a laptop with a recent version of R (http://cran.r-project.org/) and RStudio (http://www.rstudio.com/products/rstudio/download/) installed.

REGISTER HERE

NVIDIA Pascal P100 and CUDA 8 Deep Dive
Presented by Jonathan Bentz, NVIDIA
Thursday, January 12, 2017
3:00 PM - 5:30 PM

The Tesla P100 delivers unprecedented performance for hyperscale and HPC applications. It offers 5.3 TeraFLOPS of peak double-precision performance—3X faster than the previous-generation Tesla K40 GPU. It also delivers 10.6 TeraFLOPS of peak single-precision performance to accelerate applications in machine learning, AI and other HPC workloads. This performance is realized in software applications around the world by developers building their applications using NVIDIA CUDA 8, the latest update to NVIDIA’s powerful parallel computing platform and programming model.

This session will present the Pascal hardware architecture, with an emphasis on how CUDA 8-enabled applications and frameworks are able to take advantage of the latest features found in the Tesla P100 and related GPUs.

REGISTER HERE

Data Science in Python (Large Scale Machine Learning)

Data Science in Python
Day 5: Large-scale machine learning with joblib, dask, and IPython Parallel.
Presented by Rahul Dave, IACS and Ian Stokes-Rees, Continuum Analytics
Friday, January 13, 2017
8:30 AM - 11:30 AM


This is day five of a five-day, three-hours-per-day workshop that will take you from being a person with some idea of how to program to a person with some idea of how to do data science. Day five will cover the following topics: large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, we will also cover ensembles.

Pre-requisites:

Must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with Anaconda Python Distribution installed (https://www.continuum.io/downloads).  We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python. We'll then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels), including concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.

Topics covered throughout the week include:

Day 1: Monday, January 9:
Intro to Python, NumPy, Matplotlib and Bokeh.

Day 2: Tuesday, January 10:
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11:
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12:
Classification and model comparison.

Day 5: Friday, January 13:
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.

REGISTER FOR DAY 5 HERE

Introduction to Image Recognition with Microsoft Cognitive Toolkit
Presented by Jonathan Bentz, NVIDIA
Friday, January 13, 2017
8:30 AM - 11:30 AM

Image recognition has been a very successful practical application of deep neural networks to date. This lab will showcase how to use the Microsoft Cognitive Toolkit (formerly CNTK) for training and testing neural networks to recognize handwritten digits.

We’ll work through a series of examples that will allow you to design, create, train and test a neural network to classify the MNIST handwritten digit dataset, illustrating the use of convolutional, pooling and fully connected layers as well as different types of activation functions.

By the end of the lab you’ll have a basic knowledge of convolutional neural networks which will prepare you to move to more advanced usage of Microsoft Cognitive Toolkit.

REGISTER HERE

Machine Learning in the Wolfram Language
Presented by Etienne Bernard, Wolfram Research
Friday, January 13, 2017
12:00 PM - 2:30 PM

During the last two years, the Wolfram Language has introduced a set of machine learning functionalities, including user-friendly functions for classification, feature extraction, clustering, etc., and a high-performance neural network framework. In this class we’ll present an overview of these functionalities and you’ll learn to use them through hands-on examples.

Pre-requisites:

Basic knowledge of Mathematica is helpful, but not required. Attendees should create a Wolfram ID and download a free 15-day trial of Mathematica BEFORE CLASS: https://www.wolfram.com/mathematica/trial/.

Hands-on activities will be performed in class online at: https://develop.open.wolframcloud.com/app/

REGISTER HERE

Deep Learning for Image Segmentation with TensorFlow
Presented by Jonathan Bentz, NVIDIA
Friday, January 13, 2017
12:00 PM - 2:30 PM

There are a variety of important applications that need to go beyond detecting individual objects within an image and instead segment the image into spatial regions of interest. For example, in medical imagery analysis it is often important to separate the pixels corresponding to different types of tissue, blood or abnormal cells so that we can isolate a particular organ.

In this lab we will use the TensorFlow deep learning framework to train and evaluate an image segmentation network using a medical imagery dataset.

REGISTER HERE


DATAFEST:
On January 17-18, the Institute for Quantitative Social Science (IQSS) at Harvard will host DataFest, a two-day workshop on working with data. The curriculum at DataFest complements the skills taught at ComputeFest workshops.