Big Data

Labs

Preparation for using Hadoop

Preparation for using SSH

Preparation for using AWS

Lab1: WordCount

Lab2: Bigrams

Lab3: Relative frequencies (Order Inversion)

Lab4: InvertedIndex

Preparation for using Spark

Lab5: Exploring full-text Wikipedia articles

Acknowledgements

Most of the labs and exercises have been heavily inspired by the following resources:

We use the Amazon Web Services (AWS) cloud for some of the labs. Amazon supports HES-SO with a generous grant through its AWS Educate program.

We also use the Databricks cloud for some of the Spark labs.