My Spark Adventure

Overview

This project was inspired by conversations with Matei Zaharia while I was in his operating systems class at UC Berkeley. The code lives in the GitHub project repo.

Goals

The concepts and tools I’m getting hands-on experience with in this project:

  • Distributed systems
  • SSH
  • Docker
  • Apache Spark (PySpark)

Filetree

├── README.md
├── compose.yml
├── conf
│   └── spark-env.sh
├── master
│   ├── Dockerfile
│   └── start-master.sh
├── scripts
│   ├── git-autopilot.sh
│   ├── pyspark.sh
│   └── start_spark_cluster.sh
├── spark
│   ├── bin
│   ├── include
│   ├── lib
│   ├── pyvenv.cfg
│   └── share
└── worker
    ├── Dockerfile
    └── start-worker.sh