Table of Contents

General Information

The Summer Institute of 2018 brings together nine science teams for a short course on data and software skills in socio-environmental synthesis. Through lectures, hands-on computer labs, and project consultation, SESYNC staff will aim to accelerate your team’s adoption of cyber resources in all phases of data-driven research and dissemination.

Participants should expect to:

  • learn new scientific computing skills
  • overcome specific or conceptual project hurdles
  • gain coding confidence
  • have fun

Instructors:

  • Ian Carroll, Data Scientist
  • Mary Shelly, Associate Director for Synthesis
  • Benoit Parmentier, Data Scientist
  • Kelly Hondula, rOpenSci Fellow

Assistants:

  • Elizabeth Green, Research Assistant
  • Sigfried Gold, Graduate Assistant

When:

Tuesday, July 24, 2018 to Friday, July 27, 2018

Optional day for basic R training: Monday, July 23

Where:

1 Park Place, Suite 300
Annapolis, MD 21401

Get directions with OpenStreetMap or Google Maps.

Contact:

Please email icarroll@sesync.org with any questions or for information not covered here.

Requirements

  • Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) with a full-featured browser (not Microsft Edge).
  • At least one team member must bring data for the mini-project; a sample/incomplete data is okay.
  • After the course, participants must complete a reimbursement form to recover allowed travel expenses.

Schedule

Nourishment will arrive at the 10:45 am break, the on-site lunch provided by SESYNC at 12:30 pm, and afternoon snacks. Participants are responsible for their own breakfast and dinner arrangements (we can make recommendations).

[Monday] 9:00 Introduction to the RStudio IDE    
  9:15 Pseudo-coding Exercise    
  9:45 Base R Ian R
  10:45 Coffee + Tea Break    
  11:00 Base R (continued) Ian R
  12:00 Pair-coding Exercise    
  12:30 Lunch    
  13:30 Visualizing Tabular Data Elizabeth R > ggplot2
  15:30 Snack Break    
  15:45 Scripting Exercise    
Tuesday 9:00 Welcome and Overview of SESYNC Mary  
  9:10 Collaborative Workflows & Reproducible Pipelines Ian git, GitHub
  10:45 Coffee + Tea Break    
  11:00 Introduce Coaches & ‘data2doc’    
  11:45 Meet the Teams    
  12:30 Lunch    
  13:15 Manipulating Tabular Data Mary R > dplyr
  15:00 Challenge Exercise    
  15:30 ‘data2doc’ + Snacks!    
  17:00 Reception (with tasty beverages, etc.)    
Wednesday 9:00 Vector and Raster Geospatial Data Benoit R > sf, raster
  10:45 Coffee + Tea Break    
  11:00 ‘data2doc’    
  12:30 Lunch    
  13:15 Regression Ian R > nlme
  15:00 Challenge Exercise    
  15:30 ‘data2doc’ + Snacks!    
Thursday 9:00 Classification Kelly Python > scikit-learn
  10:45 Coffee + Tea Break    
  11:00 ‘data2doc’    
  12:30 Lunch    
  13:15 Online Data Ian Python > requests
  15:00 Challenge Exercise | ‘data2doc’    
  15:30 ‘data2doc’ + Snacks!    
Friday 9:00 Smart and Interactive Documents Kelly R > rmarkdown, shiny
  10:45 Coffee + Tea Break    
  11:00 Data Provenance and Publishing Ian  
  12:30 Lunch    
  13:15 ‘data2doc’    
  15:00 Team Presentations (4 x 10 min)    
  15:45 Snack Break    
  16:00 Team Presentations (5 x 10 min)    

Software

The workshop uses RStudio and Jupyter, as well as many packages and dependencies associated with these two Integrated Development Environments (IDEs). SESYNC provides a cloud platform capable of supporting the software needs for the short course, so there is nothing for you to install in advance. During and after the course, you will be able to install any and or all of this software—it is all free and open source—on your own machines. Feel free to request assistance any time during the course with installing the listed software on your laptop.

The table and lists below should help you find the right way to install the software, depending on your operating system. Both Windows and macOS users can install from the listed “Download Site”, or by following instructions given there. Linux (and optionally macOS) users should use a package manager—your Linux distro’s native one, or Homebrew on macOS—where possible. The GDAL/OGR downloads are not essential for using spatial libraries with R installed through the given download site.

Software Download Site Homebrew Package(s) Aptitude Package(s)
git https://git-scm.com/downloads git git
R https://cran.rstudio.com/ r r-base
RStudio https://www.rstudio.com/products/rstudio/download2/    
Python 3.x https://www.python.org/downloads/ python3 python3
Jupyter Lab http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html    
GDAL/OGR https://trac.osgeo.org/osgeo4w/ gdal, geos gdal-bin1

1: Ubuntu users will need to add the UbuntuGIS repository prior to running apt-get install gdal-bin

The following R packages need to be installed. Open RStudio and, for each package below, type install.packages(%package%) at the prompt and press return. Follow all prompts.

  • tidyr
  • ggplot2
  • dplyr
  • raster
  • sf
  • sp
  • shiny
  • leaflet
  • rmarkdown
  • lme4
  • rstanarm
  • data.table

The following Python packaged need to be installed. From a command prompt, type pip3 install %package% and press return. Follow all prompts.

  • jupyterlab
  • numpy
  • scipy
  • pandas
  • beautifulsoup4
  • census
  • lxml
  • requests
  • sqlalchemy
  • scikit-learn
  • mlxtend
  • seaborn

Acknowledgments

Portions of the instructional materials are adopted from Data Carpentry and Software Carpentry. The structure of the curriculum as well as the teaching style are informed by Software Carpentry.