Scientific Computing Fundamentals for CAMH Researchers

May 9th - 11th & 15th - 17th, 2017

A six-day, self-paced series of workshops on scientific computing fundamentals taught with ♥ by CAMH researcher-nerds
(an initiative of the CAMH Scientific Computing Working Group)



  • Arin Bakht (Co-op Student; Sibille Lab)
  • Nikhil Bhagwat (Graduate Student; Kimel Lab)
  • Navona Calarco (Research Analyst; Kimel Lab)
  • Qing Chang (Research Methods Specialist; Research IT)
  • Erin W. Dickie (Scientist; Kimel Lab)
  • Leon French (Scientist; Bioinformatics)
  • Ricardo Harripaul (Graduate Student; Vincent Lab)
  • Colin Hawco (Scientist; Kimel Lab)
  • Steve Hawley (Research Methods Specialist; Slaight Centre)
  • Sophie Lafaille (Research Coordinator; MRI Centre)
  • Kyle Lago (Research Analyst; Geriatrics)
  • Rachael Lyon (Research Analyst; Research Services)
  • Yuliya Nikolova (Postdoctoral Fellow; Sibille Lab)
  • Dawson Overton (Research Methods Specialist; Kimel Lab)
  • Natalia Potapova (Research Methods Specialist; CAMH IT)
  • Jacob Ritchie (Trainee; Computational Neurobiology Laboratory)
  • David Rotenberg (Manager, Scientific Computing; CAMH IT)
  • Dinara Salaeva (Research Methods Specialist; Temerty Centre)
  • Marcos Sanches (Statistician; Research IT)
  • Andy Wang (Research Methods Specialist; Research IT)
  • Tom Wright (Research Methods Specialist; Kimel Lab)

Why

Because …

… research is becoming more computational and you’ve probably never been formally trained in general computing skills.

That’s a problem.

Software is your experimental apparatus. Just like cleaning test tubes and pipetting, computing is a basic skill you need to be competent with.

These workshops will focus on the fundamental computing skills you’ll need to get your study data organized, do repeatable/reproducible analysis, and make use of existing CAMH computing resources to save time.

You should attend if you are doing any sort of scientific computing work, or work that involves repetition that could be automated.

This time around, we’re using a shared dataset - spanning demographic, cognitive, imaging, and genomic data - across all workshops so we can get started faster and increase attendees’ exposure to all the great work going on around the hospital.

If you have questions, send us an email at scwg@camh.ca.

Workshops

Tuesday, May 9th

Introduction to Programming Logic

9am-10am, RS 2022

Instructor: Leon
Helpers: Arin & Jacob

Description: Come here if you’d like to learn about programming, but have little to no prior experience. This will be an interactive workshop where you will play a game to learn new concepts.

You’ll learn:

  • What is programming, anyway?
  • Essential programming concepts, such as sequencing, loops, conditionals, functions, variables, and datatypes

Prerequisites: None

You’ll need: A laptop. We’ll be working with Python, but you don’t need it installed.
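
For a taste of what those concepts look like, here is a minimal sketch in Python. It is not taken from the workshop materials; the function name `describe_score`, the pass mark of 50, and the example scores are all made up for illustration.

```python
# Hypothetical example: every name and value here is invented for illustration.

def describe_score(score):
    """Function + conditional: turn one score into a label."""
    if score >= 50:
        return "pass"
    return "fail"


scores = [72, 45, 90]   # a variable holding a list (one kind of datatype)
labels = []             # an empty list we will fill in

for score in scores:    # loop: repeat the same steps for every item
    labels.append(describe_score(score))

print(labels)
```

The statements run top to bottom (sequencing), the `for` line repeats work (a loop), and the `if` line makes a decision (a conditional).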



Introduction to Linux and the Shell

10am-12pm, RS 2022

Instructor: Tom

Description: Computers aren’t scary (yet) and knowing how to use them will make doing science better/easier/quicker.

You’ll learn:

  • About Unix/Linux
  • What a terminal/shell is
  • Managing (making, moving, editing) files and folders
  • Remote access (SSH/FTP)
  • Where to find help online/etc.
  • Some super useful Linux commands

Prerequisites: None

You’ll need: A laptop. If you’re using Windows, install BASH and mobaXterm.



Introduction to SPSS

1-3pm, RS 2020

Instructor: Marcos
Helper: Rachael

Description: SPSS is useful for everything from simple data exploration to advanced statistics. It is widely used at CAMH and is a tool very much worth knowing.

You’ll learn:

  • What is SPSS
  • Types of data and how to enter them into SPSS
  • Data manipulation – Creating new variables
  • Data manipulation – Sorting
  • Basic data analysis – Means, Frequencies and Crosstabs

Prerequisites: Some familiarity with data in general. No previous knowledge of SPSS is required.

You’ll need: Nothing; a computer with SPSS will be provided in the lab. If you’d prefer to use your own laptop, see the ‘resources’ section below for a link to a free trial version of SPSS. If you have a laptop with SPSS, please bring it!

Advanced Statistics in SPSS

3-5pm, RS 2020

Instructor: Marcos
Helper: Rachael

Description: We will cover topics related to linear models often used with Normal dependent variables (correlation, linear regression, t-test, ANOVA, MANOVA). We will cover not only how to fit these models in SPSS, but also interpretation of output, analysis of residuals and influential observations, multicollinearity and standardization of coefficients, explanatory and predictive models, and confounders.

Prerequisites: Introduction to SPSS

You’ll need: Nothing; a computer with SPSS will be provided in the lab. If you’d prefer to use your own laptop, see the ‘resources’ section below for a link to a free trial version of SPSS. If you have a laptop with SPSS, please bring it!

Wednesday, May 10th

Preparing your data for analysis

9-11am, RS T4100

Instructor: Tom

Description: So you have been given that spreadsheet. This course will introduce OpenRefine, a tool for working with messy data. Along the way we will identify common errors people make when recording data and look at best practices for avoiding them.

You’ll learn:

  • The characteristics of a clean dataset.
  • Using OpenRefine to clean and format a spreadsheet.

What you’ll need: A laptop and a working copy of OpenRefine.

Prerequisites: None

Acknowledgments: This workshop is based on the “Open Refine for Ecology” workshop by DataCarpentry.




Introduction to the SCC

11am-12pm, RS 2062

Instructors: Andy & David
Helper: Ricardo

Description: The Specialized Computing Cluster (SCC) is CAMH’s own super-computer. You can use CAMH’s SCC to speed up your processing-intensive analyses by distributing your work across many computers at once.

You’ll learn:

  • How the SCC is organized into different types of nodes
  • How to load software using the SCC’s module system
  • How to get your data to and from the SCC using rsync
  • How to submit jobs on the SCC queue

Prerequisites: Familiarity with the Unix shell (e.g. ‘Intro to Linux and the Shell’ and ‘Automating in Linux’ workshops)

You’ll need: An account on the SCC, and a laptop. If you’re using a Windows laptop, install BASH and mobaXterm.

Programming in REDCap

1-3pm, RS 2062

Instructor: Dinara
Helper: Kyle

Description: REDCap is a web application for building and managing surveys and databases. It requires little programming knowledge, and it’s increasingly used at CAMH for multiple purposes.

You’ll learn:

  • How to set up a study in REDCap
  • Writing display logic
  • Writing formulas for all sorts of calculations
  • Setting up multiple arms, visits, and timepoints
  • Importing and exporting data
  • Incorporating HTML tags and cues for online decision-making
  • Cool tricks to customize your study

Prerequisites: None

You’ll need: Your own laptop and REDCap account.




REDCap Surveys

3-5pm, RS 2062

Instructor: Natalia

Description: REDCap has great survey functionality, which offers a secure, intuitive platform for collecting data from participants (including self-report), staff, and others!

You’ll learn:

  • Survey design
  • Inviting Participants by email
  • Types of surveys: anonymous and non-anonymous
  • Public link and individual links
  • Survey Queue and Autocontinue option
  • Survey notifications
  • Automatic survey invitations vs manual
  • REDCap Mobile

Prerequisites: Some familiarity with REDCap

You’ll need: Your own laptop and REDCap account.




Thursday, May 11th

Introduction to R

9-11am, RS 2015

Note: This workshop is also offered on Tuesday May 16th, 1:00pm-3:00pm

Instructor: Erin

Description: R is a free, featureful and sometimes magical language for doing statistical analysis. This workshop will introduce you to R and the RStudio environment.

You’ll learn:

  • Reading your data into an R dataframe
  • Sorting/merging/filtering your data tables
  • Data Cleaning
  • Getting summary statistics

Prerequisites: Some familiarity with another programming language

You’ll need: A laptop with R and RStudio installed, as well as the packages dplyr, rms, and ggplot2.



Introduction to Neuroimaging Tools

11am-12pm, RS 2015

Instructors: Erin & Sophie

Description: There are a lot of tools out there for the analysis of MR images. We’ll give you an overview so you can get started.

You’ll learn about tools for:

  • Anatomical (T1w) analysis
  • fMRI analysis
  • DTI analysis
  • PET

Prerequisites: None

You’ll need: A laptop (optional).



Exploring Data with R

1-3pm, RS 2022

Instructor: Erin

Description: You know the basic functionality of R, but the hardest part is getting your data in order. In this workshop you’ll learn some fundamental ways of organizing and manipulating datasets, as well as visualising your data.

You’ll learn:

  • Reorganize/Reshape your dataset with tidyr and dplyr
  • Quickly create stats tables
  • Plotting and visualization with ggplot2

Prerequisites: Basic familiarity with R (e.g. Introduction to R)

You’ll need: A laptop with R and RStudio installed.




Spreadsheets

3-5pm, RS 2022

Instructors: Yuliya & Arin
Helper: Rachael

Description: Everyone knows what a spreadsheet is, but most people use them terribly inefficiently. Sometimes, you need to clean things up and do some quick analyses without tools like R or OpenRefine; we’ll look at how to work smartly within your native spreadsheet software (e.g., Excel, OpenOffice).

You’ll learn:

  • How to set up a spreadsheet for research logs and data
  • Autofilling in sequences
  • Avoiding “copy and paste” by using cell references
  • What a “formula” is
  • How to auto-update cells
  • Helpful data cleaning tools

Prerequisites: Basic familiarity with spreadsheets (Excel/OpenOffice)

You’ll need: A laptop with Excel or OpenOffice installed.

Monday, May 15th

Introduction to Python

9-11am, RS 2015

Instructor: Tom

Description: This basic introduction is designed to make you familiar with programming in Python.

You’ll learn:

  • Data types (e.g. lists and dictionaries)
  • String manipulation
  • Reading and writing to files
  • Modules and packages
  • Basic plotting with matplotlib

Prerequisites: Familiarity with another programming language.

You’ll need: A laptop with Python installed.
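
As a rough preview of the topics listed above, the sketch below touches a dictionary, a list, some string manipulation, and reading and writing a file, using only the standard library. It is not from the workshop materials; the `participant` record, its values, and the file name `summary.txt` are invented for illustration.

```python
import os
import tempfile

# Hypothetical participant record stored in a dictionary (values are made up)
participant = {"id": "sub-001", "age": 34, "scores": [12, 15, 9]}

# String manipulation: swap the dash for an underscore and upper-case the ID
label = participant["id"].replace("-", "_").upper()

# List operations: compute the mean of the scores
mean_score = sum(participant["scores"]) / len(participant["scores"])

# Writing to a file, here inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "summary.txt")
with open(path, "w") as handle:
    handle.write(f"{label},{mean_score}\n")

# Reading the same file back in
with open(path) as handle:
    line = handle.read().strip()

print(line)
```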


Genetics on the SCC

11am-12pm, RS 2022

Instructor: Ricardo

Description: This course will introduce you to popular genomics software used on the CAMH Specialized Computing Cluster. This course will focus on currently used software and best practices.

You’ll learn about:

  • PLINK, SAMtools
  • BWA, GATK, Picard
  • Workflows for analyzing Next Generation Sequencing data in the Linux environment

Prerequisites: None.

You’ll need: Just you (laptop optional).




MATLAB

1-3pm, RS 2022

Instructor: Colin
Helper: Dawn

Description: MATLAB is a scripting and programming language paired with an interactive environment that focuses on manipulation, analysis, and visualization of numerical data.

You’ll learn:

  • Using the graphical MATLAB environment and workspace
  • MATLAB variables (double, string, cell, struct)
  • Basic syntax and operations (for loops, while, if, switch, try)
  • How to call a function
  • How to write scripts and functions

Prerequisites: Familiarity with another programming or scripting language.

You’ll need: A laptop with MATLAB installed.




Managing Code, Experiments, & Data (with help from git)

3-4:30pm, RS 2022

Instructors: Erin & Ricardo & Qing

Description: Research can get messy quickly, but there are some tried and true ways of organizing your experiments that work.

You’ll learn:

  • File and directory naming conventions
  • Version control with GitHub and GitLab (now at CAMH!)
  • Backup guidelines
  • Creating scripts and metadata to reproduce your work

Prerequisites: Familiarity with the Shell (e.g. cd, ls, mv, cp)

You’ll need: A laptop. If you’re using Windows, install BASH and mobaXterm.




Tuesday, May 16th

Automating in Linux

9-11am, RS 2062

Instructors: Nikhil & Ricardo
Helper: Dawn

Description: Doing work on lots of data by hand is boring and error prone. Learn how to use the shell to automate your work.

You’ll learn:

  • Running commands on many files (globbing, looping, if statements)
  • Reading and writing to files & sorting/filtering data in files
  • Writing scripts, chaining tools together (pipes, redirection)

Prerequisites: Familiarity with the Shell (e.g. cd, ls, mv, cp)

You’ll need: A laptop. If you’re using Windows, install BASH and mobaXterm.



Advanced SCC (Scientific Computing)

11am-12pm, RS 2062

Instructors: Andy & David
Helper: Ricardo

Description: The purpose of this course is to expose students to automating processes and working on the Specialized Computing Cluster.

You’ll learn:

  • How to run many samples in parallel using GNU Parallel
  • Debugging your jobs and Interactive Nodes
  • Best practices for using the cluster efficiently

Prerequisites: Familiarity with BASH scripting (e.g. ‘Intro to Linux and the Shell’ and ‘Automating in Linux’ workshops) and familiarity with the SCC (e.g. Introduction to the SCC).

You’ll need: An account on the SCC, and a laptop. If it’s a Windows computer, install BASH and mobaXterm.




Introduction to R

1-3pm, RS 2022

Note: This workshop is also offered on Thursday May 11th, 9:00-11:00am

Instructor: Tom

Description: R is a free, featureful and sometimes magical language for doing statistical analysis. This workshop will introduce you to R and the RStudio environment.

You’ll learn:

  • Reading your data into an R dataframe
  • Sorting/merging/filtering your data tables
  • Getting summary statistics

Prerequisites: Familiarity with another programming language

You’ll need: A laptop with R and RStudio installed.

Statistics with R

3-5pm, RS 2022

Instructor: Erin
Helper:

Description: You know the basic functionality of R, now let’s actually run some statistical tests.

You’ll learn:

  • Transforming variables to normality
  • Linear modeling and regression
  • Building tables and figures to publish your results

Prerequisites: Some knowledge of statistics. Basic familiarity with R (e.g. Introduction to R)

You’ll need: A laptop with R and RStudio installed. R packages dplyr, tidyr, cars and rms.


Wednesday, May 17th

SQL

9-11am, RS 2062

Instructor: Ricardo
Helper: Dawn

Description: SQL is a language crucial to databases (and underlies software like Access and REDCap). Come and get a glimpse of what SQL is and what it can do for you.

You’ll learn:

  • How to build a database
  • Entering data into a database
  • Constructing queries for selecting, filtering, and joining tables
  • Calculating basic statistics
  • Basic normalization principles
  • Basic SQL syntax

What you’ll need: A laptop
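
To give a flavour of those topics, here is a minimal sketch that runs SQL through Python’s built-in `sqlite3` module with an in-memory database. It is an illustration only, not the workshop’s own material or database system; the table names `participants` and `visits` and all of the values are made up.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Build a tiny database: two tables linked by participant ID (hypothetical schema)
cur.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, site TEXT)")
cur.execute("CREATE TABLE visits (participant_id INTEGER, score REAL)")

# Enter some made-up data
cur.executemany("INSERT INTO participants VALUES (?, ?)", [(1, "A"), (2, "B")])
cur.executemany(
    "INSERT INTO visits VALUES (?, ?)", [(1, 10.0), (1, 14.0), (2, 8.0)]
)

# Select, filter, join, and compute a basic statistic (mean score per site)
cur.execute(
    """
    SELECT p.site, AVG(v.score)
    FROM participants AS p
    JOIN visits AS v ON v.participant_id = p.id
    WHERE v.score > 5
    GROUP BY p.site
    ORDER BY p.site
    """
)
rows = cur.fetchall()
print(rows)

conn.close()
```

Keeping participants and visits in separate tables, joined on an ID, is a small example of the normalization idea: each fact is stored once.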



REDCap Data Management

1-3pm, RS 2020

Instructor: Steve

Description: You’ve built a project in REDCap, and now would like to learn more about this software.

You’ll learn:

  • Customized Data Export with Reports tool
  • User Rights, multi-site access and DAGs
  • Randomization module
  • Data quality module
  • Data resolution workflow
  • Project audit with Logging tool

Prerequisites: Some knowledge of REDCap.

You’ll need: A REDCap account. This course is held in a computer lab, but if you’d prefer your own laptop, feel free to bring it.




REDCap API

3-5pm, RS 2062

Instructors: Navona & Tom & Dawn

Description: REDCap has a REST API (Application Programming Interface) based on simple HTTP requests, and all REDCap users at CAMH are able to use it.

Any general-purpose programming language with an HTTP library can use the REDCap API, but we’ll be using Python (via IPython/Jupyter Notebooks) for this workshop.

You’ll learn:

  • use cases: when is the API useful / necessary?
  • how to get data in/out/manipulated
  • how to print HTTP status codes
  • how to debug when things go wrong
  • overview of helpful third-party tools/applications/libraries
  • pairing API requests with cron jobs for automation

Prerequisites: Familiarity with Python and REDCap.

You’ll need: Your own laptop with IPython/Jupyter notebooks installed, a REDCap account and a (test) REDCap project with an API token.
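
As a rough idea of what “simple HTTP requests” means here, the sketch below builds (but does not send) a record-export request using only Python’s standard library. It is a sketch under assumptions, not the workshop’s code: `API_URL` and `API_TOKEN` are placeholders you would replace with your own REDCap API address and token, and the form fields shown are assumed to match the “export records” call, so check them against your REDCap instance’s API documentation.

```python
import urllib.parse
import urllib.request

# Placeholders (assumptions): replace with your real API address and token
API_URL = "https://redcap.example.org/api/"
API_TOKEN = "YOUR_API_TOKEN"

# Form fields for a record export, sent in the body of an HTTP POST
fields = {
    "token": API_TOKEN,
    "content": "record",
    "format": "json",
    "type": "flat",
}
body = urllib.parse.urlencode(fields).encode("utf-8")

# Build the request object; nothing is sent over the network yet
request = urllib.request.Request(API_URL, data=body, method="POST")
print(request.get_method(), request.full_url)

# To actually send it (needs a real URL and token), something like:
# with urllib.request.urlopen(request) as response:
#     print(response.status)   # the HTTP status code
#     print(response.read())   # the exported records as JSON text
```

Building the request separately from sending it makes it easy to inspect what will go out before any data moves, which helps when debugging.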


Register

Register with CAMH email

The workshop series is FREE for all CAMH students, staff, and trainees.

Please sign up with a CAMH email address, so that we know who you are!


Register

Non-CAMH Registration

All others are welcome to join us for $15-30 per course.

Register & Pay

Preparation

Install the following bits of software before you come to the workshops. For more detailed installation instructions go here.

Send us an email if you are having trouble.

Linux/Shell (Windows users only)

MATLAB

MS Access

Python

R

REDCap

SCC

Spreadsheets

SPSS

Version Control with Git

More Help

Want more help/instruction? There are lots of things going on at CAMH and U of T that you can get involved in: