Data Science at the Command Line

Obtain, scrub, explore, and model data with Unix power tools.

Description

The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.

This hands-on workshop is based on the O’Reilly book Data Science at the Command Line, written by our CEO Jeroen Janssens. You’ll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualise data. No prior knowledge of the Unix command line is required.

By the end of this workshop you will have a solid understanding of how to integrate the command line into your data science workflow. Even if you’re already comfortable processing data with, for example, R or Python, being able to leverage the power of the command line as well can make you a more effective and efficient data scientist.

Schedule

  • Introduction
  • Case study
  • Setting up the Data Science Toolbox
  • Essential concepts of the Unix command line
  • Classical filters such as cut, grep, sed, and awk
  • Accessing APIs using curl
  • Processing CSV with csvkit
  • Processing JSON with jq
  • Running R from the command line
  • Basic data visualisation using R and Bash
  • Executing SQL queries directly on CSV data
  • Parallelising data-intensive pipelines using GNU Parallel
  • Where to go from here
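
To give a flavour of how the classical filters in the schedule combine into a pipeline, here is a minimal sketch. It is not taken from the workshop material: the sample data, column layout, and the function names sample_csv and total_per_city are hypothetical, chosen only for illustration. It uses only standard tools (tail, cut, awk, sort) and assumes simple CSV without quoted fields.

```shell
#!/bin/sh
# Hypothetical sample data with columns: name,city,amount
sample_csv() {
  printf '%s\n' \
    'name,city,amount' \
    'alice,Utrecht,12' \
    'bob,Amsterdam,7' \
    'carol,Utrecht,30' \
    'dave,Amsterdam,5' \
    'erin,Rotterdam,9'
}

# Read CSV on stdin and print "city,total" lines, largest total first.
# Assumes no quoted fields or embedded commas.
total_per_city() {
  tail -n +2 |        # drop the header line
    cut -d, -f2,3 |   # keep only the city and amount columns
    awk -F, '{ sum[$1] += $2 } END { for (c in sum) print c "," sum[c] }' |
    sort -t, -k2,2nr  # sort numerically on the total, descending
}

sample_csv | total_per_city
```

With the sample data above this prints Utrecht,42 then Amsterdam,12 then Rotterdam,9. Each stage does one small job and passes plain text to the next, so any stage can be swapped out (for example, replacing the sample data with a real file) without touching the others.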

Clients

We have previously delivered this workshop (or a custom training based on this workshop) at the following organisations:

  • Teradata
  • SURFnet
  • Snow
  • Amazon
  • Accenture
  • Social Point
  • Container Solutions
  • The New York Times
  • Prezi

Photos and Testimonials

Sanne Bouwman
Data Scientist, Teradata

Great workshop! Very well done and very useful information delivered in an excellent and interactive manner. Jeroen anticipated very well on the different knowledge levels within the group. I would highly recommend the Data Science at the Command Line workshop to anyone that is interested in either kickstarting their command-line experiences or improving their data science with Unix power tools.

Joost van Dijk
Manager Middleware Services, SURFnet

As a seasoned UNIX command line adept, I didn’t expect to learn much from a Data Science at the Command Line workshop. I was wrong! Over the years, many new tools have become available that I didn’t know about, and that can be combined with traditional tools in new ways.

Since attending the workshop, I have been able to simplify and improve the efficiency of many of the scripts I use on a daily basis. Recommended for anyone working from the command line, newbies and ninjas alike!

Marc Canaleta

Besides demonstrating a good knowledge and experience in command-line tools for data science, the instructor had very good training skills, clear communication, and managed to adapt the level of the training to the level of the audience, which is not always easy!
