Workshops

SKILL BUILDING WORKSHOPS
Dates: January 11-15, 2016
Learn computational skills such as R, Amazon AWS, NVIDIA CUDA, cuDNN, deep learning, MATLAB, Python, Tableau, Microsoft Azure and more in a hands-on format with expert instructors.  
All workshops are free and open to the public. (Workshops will not be videotaped or live-streamed.)

REGISTRATION:
Click below on each day's workshop schedule to register.

DIRECTIONS, PARKING AND WI-FI:
Find parking, meals and additional workshop preparation information here.

Download the ComputeFest Program here, which includes a full workshop schedule.

INTRO TO PYTHON FOR SCIENTISTS & ENGINEERS (9:30am)

Introduction to Python for Scientists & Engineers - WORKSHOP IS CLOSED
Presented by Ian Stokes-Rees, Continuum Analytics
9:30am - 12:30pm

Using the latest version of Python, Python 3.5, we will cover core data types, functions, classes, modules, and scripts. We will also cover namespaces, scoping, and the standard library. We'll show you how to handle input/output and files. Finally, you will learn how to interact with Excel and databases.

This is a workshop for those with no previous programming experience.

Participants must bring a laptop with Anaconda Python 3.5 installed: https://www.continuum.io/downloads.

REGISTER HERE

INTRO TO MICROSOFT AZURE PART I (9:30am)

Introduction to Microsoft Azure - Part I
Presented by Brandon Rohrer, Microsoft
9:30am - 12:30pm 

In this workshop, Microsoft will demonstrate how to combine multiple cloud services into an end-to-end solution. The session walks through how a single processing pipeline can support both "hot" path (i.e. streaming data) and "cold" path (i.e. batch data) analytics, and illustrates how to operationalize predictive models to solve a real business problem.

Skill Level: Introductory
Pre-requisites: Free Live ID account is preferred, but not necessary.

REGISTER HERE

INTRODUCTION TO R (1:30pm)

Introduction to R 
Presented by Ista Zahn, Institute for Quantitative Social Science
1:30 - 4:30pm

Get an introduction to R, the open-source system for statistical computation and graphics. With hands-on exercises, learn how to import and manage datasets, create R objects, install and load R packages, conduct basic statistical analyses and create common graphical displays. Appropriate for those with little or no prior experience with R.

Participants should bring a laptop with a recent version of R (http://cran.r-project.org/) and RStudio (http://www.rstudio.com/products/rstudio/download/) installed and should download course materials from http://tutorials.iq.harvard.edu/R/Rintro.zip.

REGISTER HERE

INTRO TO MICROSOFT AZURE PART II (1:30pm)

Introduction to Microsoft Azure - Part II (Hands-On)
Presented by Microsoft
1:30 - 4:30pm

In this workshop, Microsoft will lead a set of hands-on labs that let participants explore some of the data processing features available in the Cortana Analytics Suite. Attending the morning session is not required, but the labs are set in the context of that session, so participants will get the most benefit from attending both.

Skill Level: Introductory
Pre-requisites: Free Live ID account.

REGISTER HERE

DATA VISUALIZATION WITH PYTHON (9:30am)

Data Visualization with Python
Presented by Ian Stokes-Rees, Continuum Analytics
9:30am - 12:30pm

This workshop is a tour of options for data visualization in Python. We will use Python 3.5 and look at multiple libraries, including matplotlib, seaborn, networkx, ggplot, and bokeh. A detailed introduction to matplotlib and its options will be provided.

Participants must bring a laptop with Anaconda Python 3.5 installed: https://www.continuum.io/downloads.

REGISTER HERE

INTRODUCTION TO THE WOLFRAM LANGUAGE (9:30am)

Introduction to the Wolfram Language
Presented by Etienne Bernard, Wolfram Research
9:30am - 12:00pm

Developed for more than 25 years as part of Mathematica, the Wolfram Language offers a vast yet coherent collection of built-in algorithms and knowledge. Through "hands-on" coding of simple programs, we will learn the basic principles of the language, how to deploy code, and how to use advanced functionality such as automated machine learning.

Registrants should bring a laptop. "Hands-on" activities can be performed in a web browser here: https://develop.open.wolframcloud.com/app/. For a better experience, registrants can create a Wolfram ID and download a free 15-day trial of Mathematica here: https://www.wolfram.com/mathematica/trial/

REGISTER HERE

BASIC R PROGRAMMING FOR DATA ANALYSIS (1:30pm)

Basic R Programming for Data Analysis
Presented by Ista Zahn, Institute for Quantitative Social Science
1:30 - 4:30pm

This hands-on, intermediate course will guide you through a variety of programming functions in R, the open-source statistical software program. It is intended for those already comfortable using R for data analysis who wish to move on to writing their own functions. To the extent possible, this workshop uses real-world examples, and concepts are introduced as they are needed for a realistic analysis task. In the course of working through a realistic project, we will learn about interacting with web services, regular expressions, iteration, functions, control flow, and more.

Prerequisite: basic familiarity with R, such as acquired from an introductory R workshop.

Participants should bring a laptop with a recent version of R (http://cran.r-project.org/) and RStudio (http://www.rstudio.com/products/rstudio/download/) installed and should download course materials from http://tutorials.iq.harvard.edu/R/RProgramming.zip.

REGISTER HERE

ADVANCED PYTHON (1:30pm)

Advanced Python
Presented by Ian Stokes-Rees, Continuum Analytics
1:30 - 4:30pm

Through live coding demonstrations and opportunities to complete short hands-on exercises, this workshop will cover a range of advanced Python topics in Python 3.5, including dunder methods, properties, generators, decorators, descriptors, function wrappers, metaclasses, standard library containers you may not know, and a mental model of how Python works that will give you the high ground in any debate about how to use Python properly.

Participants must bring a laptop with Anaconda Python 3.5 installed: https://www.continuum.io/downloads.

REGISTER HERE

DATA ANALYTICS WITH MATLAB (10:00am)

Data Analytics with MATLAB
Presented by Loren Shure, MathWorks
10:00am - 12:00pm

Using data analytics to turn large volumes of complex data into actionable information can help you improve design and decision-making processes. However, developing effective analytics and integrating them into business systems can be challenging. In this seminar, you will learn approaches and techniques available in MATLAB® to tackle these challenges.

Highlights include:
· Accessing, exploring, and analyzing data stored in files, the web, and data warehouses
· Techniques for cleaning, exploring, visualizing, and combining complex multivariate data sets
· Prototyping, testing, and refining predictive models using machine learning methods
· Integrating and running analytics within enterprise business systems and interactive web applications

REGISTER HERE

NoSQL VS. NEWSQL: DEMYSTIFYING THE ZOO OF CONTEMPORARY DATABASE SYSTEMS (10:00am)

NoSQL vs. NewSQL: Demystifying the Zoo of Contemporary Database Systems - WORKSHOP CLOSED
Presented by Niv Dayan, Institute for Applied Computational Science
10:00am - 12:30pm

Over the past 5-10 years, there has been an explosion in database system technologies. Today, businesses and researchers facing a data management task must choose between SQL, NoSQL, and NewSQL systems, as well as various sub-categories such as row-stores, column-stores, document-stores, and key-value-stores. In this workshop, we will demystify and learn to navigate the landscape of contemporary database systems, so that you can answer the question “which database system is right for me?” In particular, we will compare and contrast MongoDB and VoltDB, two prominent examples of NoSQL and NewSQL systems, respectively. We will also take a crash course in MongoDB to give you a sense of how a NoSQL system looks and feels in practice.

Pre-Requisites:
We advise participants to install MongoDB ahead of the workshop. Detailed instructions are easy to find online, for example:
OS X: https://www.youtube.com/watch?v=G--uGbOFF9E (you may need to add “sudo ” as a prefix to every installation command)
Windows: https://www.youtube.com/watch?v=K_5mj3-_uJQ

REGISTER HERE

INTRO TO MATLAB: PROBLEM SOLVING AND PROGRAMMING (1:00pm)

Introduction to MATLAB: Problem Solving and Programming
Presented by Matt Tearle, MathWorks
1:00 - 5:00pm

MATLAB is a high-level language that lets you quickly perform computation and visualization through easy-to-use programming constructs. This hands-on lab presents the essentials you need to use MATLAB for your classes or research. Attendees will learn how to import data from an external file, plot the data over time, and then perform some analysis to view the data trends. You’ll learn how to write a MATLAB script and publish it to a format for sharing, such as HTML. You’ll also learn how to write your own MATLAB functions, use flow control, and create loops. By the end of the session, you’ll have created an application in MATLAB.

Key topics include:
• Navigating the MATLAB desktop
• Working with variables in MATLAB
• Calling MATLAB functions
• Importing and extracting data
• Visualizing data
• Conducting computational analysis
• Fitting data to a curve
• Automating analysis with scripts
• Publishing MATLAB programs
• Programming in MATLAB

IMPORTANT:
Attendees must bring a laptop with MATLAB already installed to this hands-on workshop. In advance of the session, MathWorks will provide each registrant with a temporary MATLAB license, which attendees are required to install. Trials will not be issued on the day of the workshop. Please register for the hands-on workshop only if you are certain you can attend.

REGISTER HERE


DEEP LEARNING PART I (1:30pm)

Deep Learning Part I - WORKSHOP CLOSED 
Presented by Verena Kaynig, Institute for Applied Computational Science
1:30 - 4:30pm

Classical machine learning techniques often involve extracting manually designed features from data and then training a model for classification. Designing the features is an important task, often involving domain knowledge and manual tuning. Deep learning methods instead learn multiple levels of feature representations directly from the data. Learning the features has been shown to improve classification results and to make the same model applicable to different data modalities such as images, speech, and text. Theano is a Python library that makes it easier to use deep learning methods, including on the GPU. In this workshop, we will introduce the basics of deep learning and models such as deep networks, autoencoders, and convolutional networks, and how to train them. Part one (Wednesday, January 13) will focus on deep learning fundamentals and how to learn features in supervised and unsupervised settings, as well as best practices for training these complex models.

Prerequisites:
Python programming; laptop with Python 2.7, IPython, and Theano installed for in-class work.
[NOTE: Part two of this workshop is offered on Thursday, January 14 at 1:30pm and will introduce convolutional neural networks and cover advanced tips and tricks for training. Register separately for Thursday's workshop on the ComputeFest Workshop Page.]

INTRO TO GPU PROGRAMMING WITH NVIDIA CUDA & OPENACC PART 1 of 2 (9:00am)

Introduction to GPU Programming with NVIDIA CUDA and OpenACC Part 1 of 2
Presented by Jonathan Bentz, NVIDIA
Facilitated by Barton Fiske, NVIDIA
9:00am - 12:00pm

NVIDIA GPUs are the world’s fastest and most efficient accelerators delivering world record scientific application performance. NVIDIA CUDA is the most pervasive parallel computing model, used by over 250 scientific applications and over 150,000 developers worldwide. This workshop will focus on introducing scientific computing and programming concepts utilizing NVIDIA GPUs to accelerate applications. The workshop will introduce programming techniques using CUDA and OpenACC paradigms as well as optimization, profiling, and debugging methods for GPU programming.

Topics covered include: GPU Architecture, OpenACC, Introduction to CUDA, CUDA Libraries, and CUDA performance tools such as NVIDIA Visual Profiler, along with hands-on examples using NVIDIA-provided cloud-based GPU resources and development tools.

This Day 1 Morning Workshop will cover:  

  • Introduction to GPU programming with NVIDIA CUDA and OpenACC
  • High Level Overview of GPU Architecture
    • OpenACC 2.0 Update
    • Introduction to OpenACC pragma compiler directives
    • Specify loops and regions of code in standard C, C++ and Fortran
    • Offloading from a host CPU to an attached accelerator
  • Hands-on examples focusing on data locality
  • Basics of GPU programming: an introduction to the CUDA C/C++ language
  • Hands-on examples illustrating simple kernel launches and the use of threads

Suggested pre-requisites for GPU and CUDA sessions
Laptop with wireless access and SSH client installed
Basic Linux desktop and command line familiarity including use of a standard file editor such as VIM or Emacs.
Familiarity with software development tools and concepts: compiling, linking, and using GNU Make.
Rudimentary programming experience in C/C++ (memory management using malloc/free, using pointers, etc.)

REGISTER HERE

(Note: Part 2 of this workshop is offered on Friday, January 15 at 9:30am.  
Sign up here for the Friday workshop.)

TACKLING BIG DATA WITH MATLAB (9:30am)

Tackling Big Data with MATLAB
Presented by Loren Shure, MathWorks
9:30am - 11:30am (Note that the start time has changed from 10:00am to 9:30am) 

Are the data sets you need to analyze becoming uncomfortably large to work with in memory? Are they taking too long to compute? Are you finding it challenging to scale your algorithms to big data sets? In this seminar, you will learn strategies and techniques for handling large amounts of data in MATLAB. New big data capabilities in recent MATLAB releases will be highlighted.

Highlights include:
• Using best practices for memory use in MATLAB
• Accessing data in large text files, databases or from the Hadoop Distributed File System (HDFS)
• Leveraging distributed memory to work with large data sets
• Processing data using the MapReduce programming technique
• Developing algorithms on your desktop and scaling to a cluster, cloud or Hadoop using parallel computing paradigms

REGISTER HERE

SCIENTIFIC & RESEARCH COMPUTING WITH AWS (1:30pm)

Scientific and Research Computing on Amazon Web Services
Presented by Leo Zhadanovsky, Amazon Web Services
1:30 - 4:30pm

It is possible to bootstrap a personal compute cluster on Amazon Web Services (AWS) within minutes, but what does that really mean? Which services should you use, and how do those services affect the way you develop algorithms and data analysis pipelines? In this workshop, we'll cover the essential services for scientific computing on AWS. We'll also discuss some practical examples of using AWS for HPC workloads, with some hands-on exercises. Attendees who would like to complete the hands-on exercises on-site are expected to bring their own internet-connected laptop.

REGISTER HERE

DEEP LEARNING PART II (1:30pm)

Deep Learning Part II
Presented by Ray Jones, Institute for Applied Computational Science
1:30 - 4:30pm

Classical machine learning techniques often involve extracting manually designed features from data and then training a model for classification. Designing the features is an important task, often involving domain knowledge and manual tuning. Deep learning methods instead learn multiple levels of feature representations directly from the data. Learning the features has been shown to improve classification results and to make the same model applicable to different data modalities such as images, speech, and text. Theano is a Python library that makes it easier to use deep learning methods, including on the GPU. In this workshop, we will introduce convolutional neural networks and cover advanced tips and tricks for training.

Prerequisites: 
Python programming; laptop with Python 2.7, IPython, and Theano installed for in-class work.

REGISTER HERE

[NOTE: Part one of this workshop is offered on Wednesday, January 13 at 1:30pm and will introduce the basics of deep learning and models such as deep networks, autoencoders, and convolutional networks, and how to train them. Register separately for Wednesday's Part I workshop on the ComputeFest Workshop Page.]

 

INTRODUCTION TO PYTHON'S SCIKIT-LEARN LIBRARY (9:00am)

Introduction to Python's Scikit-Learn Library
Presented by Rahul Dave, Institute for Applied Computational Science
9:00am - 1:00pm 

In this workshop, we'll provide a practical introduction to Python's scikit-learn machine learning library. We'll use an example of predicting continuous outcomes, or "regression," to illustrate concepts, and then work through a complete example of predicting labels in data, or "classification."

Prerequisites: Python programming; laptop with Python 3.5, IPython, and scikit-learn installed for in-class work. The Anaconda distribution is highly recommended (https://www.continuum.io/downloads).

REGISTER HERE

GPU PROGRAMMING & OPTIMIZATION WITH NVIDIA CUDA PART 2 of 2 (9:30am)

GPU Programming and Optimization with NVIDIA CUDA Part 2 of 2
Presented by Jonathan Bentz, NVIDIA
Facilitated by Barton Fiske, NVIDIA
9:30am - 12:30pm

NVIDIA GPUs and NVIDIA CUDA provide the most pervasive parallel computing model in use today across a wide variety of scientific applications, which have been optimized for a multitude of workloads by over 150,000 developers worldwide. This workshop will focus on sharing some of the lessons learned from these optimization efforts for scientific programming with NVIDIA GPUs to accelerate domain-leading applications. The workshop will introduce programming techniques using CUDA, as well as developer tools and strategies for optimization, profiling, and debugging of GPU programs. Topics covered include GPU Architecture, Introduction to CUDA, CUDA Libraries, and CUDA performance tools such as NVIDIA Visual Profiler, along with hands-on examples using NVIDIA-provided cloud-based GPU resources and development tools.

This Day 2 Morning Workshop will cover:

  • GPU Programming & Optimization with NVIDIA CUDA
  • Parallel Performance and Optimization
  • Overview of Global and Shared Memory usage
  • Hands-On examples: 1D Stencil and Matrix Transpose
  • Using the NVIDIA profiler to identify performance bottlenecks
  • Using libraries such as CUB and Thrust

Suggested pre-requisites for GPU and CUDA sessions
Laptop with wireless access and SSH client installed
Basic Linux desktop and command line familiarity including use of a standard file editor such as VIM or Emacs.
Familiarity with software development tools and concepts: compiling, linking, and using GNU Make.
Rudimentary programming experience in C/C++ (memory management using malloc/free, using pointers, etc.)

REGISTER HERE

(Note: Part 1 of this workshop is offered on Thursday, January 14 at 9:00am. Sign up here for the Thursday workshop.)

 

DEEP LEARNING WITH GPUs USING NVIDIA DIGITS & CONTEMPORARY FRAMEWORKS (1:00pm)

Deep Learning with GPUs using NVIDIA DIGITS and Contemporary Frameworks
Presented by Jonathan Bentz, NVIDIA
Facilitated by Barton Fiske, NVIDIA
1:00 - 4:30pm

Deep learning is a rapidly growing segment born from the fields of artificial intelligence and machine learning. It is increasingly used to deliver near-human-level accuracy for image classification, voice recognition, natural language processing, sentiment analysis, recommendation engines, and more. Application areas include facial recognition, scene detection, advanced medical and pharmaceutical research, and autonomous, self-driving vehicles. NVIDIA GPUs are the world’s fastest and most efficient accelerators delivering world record scientific application performance. NVIDIA CUDA is the most pervasive parallel computing model, used by over 250 scientific applications and over 150,000 developers worldwide. This half-day programming workshop will introduce attendees to GPU-accelerated deep learning frameworks running on NVIDIA GPUs for performance and scalability.

  • Deep Learning with GPUs and NVIDIA DIGITS 
  • Intro to Deep Learning / Machine Learning 
  • Overview of 3 Contemporary Deep Learning Frameworks
    • Caffe
    • Torch
    • Theano
  • Live demo using NVIDIA DIGITS
  • Caffe Lab Examples (time permitting)

Pre-requisites:
Laptop with wireless access
Basic web based user interface familiarity
Knowledge of machine learning and/or deep learning use cases (e.g., image, speech, and text recognition; object detection)

REGISTER HERE

INTRODUCTION TO TABLEAU (1:30pm)

Introduction to Tableau 
Presented by Mary Kate Quigley, Tableau
1:30 - 4:30pm

Join a Tableau expert for a hands-on training session to help you see and understand your data. Learn best practices for data visualization, and how to create interactive visualizations and dashboards in Tableau. This workshop is designed for all audiences and skill levels.

Prerequisites: Tableau downloaded and installed. Download instructions will be sent to registrants before the workshop.

REGISTER HERE

Questions? Email iacs-info@seas.harvard.edu