Big Data

Big Data Analytics (BDA)

The Big Data Analytics course is an Advanced Development Module (MA) open to students pursuing a Master of Science in Engineering. It corresponds to 3 ECTS credits.

Professors: Nastaran Fatemi and Marcel Graf
Assistant: Fatemeh Borran
Master Research Unit: TIC / HEIG-VD
Eligible in these specialisations: TIC / Software Engineering; TIC / Distributed information systems and multimedia
Time constraints: None
Capacity: Min. 5, max. 40
Location: Yverdon-les-Bains
Summary

This class will be taught by Prof. Nastaran Fatemi and Prof. Marcel Graf.

For many years, structured data, typically stored in relational databases, has been analyzed with data warehousing technologies to support marketing and financial decision-making. The rapid development of social networks and the ubiquity of computing in everyday life have led to the creation of large volumes of data (Big Data), mostly unstructured: web logs, videos, audio files, photos, emails, tweets, etc. At the same time, following Moore's law, CPU power has increased and storage space has become cheaper.

Today it is possible to reliably store huge amounts of data at an almost negligible cost. This data can be analyzed efficiently to extract insights useful for business and social life.

This course presents techniques to manipulate, store and analyze large volumes of data: Hadoop, tools for accessing unstructured data such as Pig and Hive, NoSQL databases, and data mining techniques together with their implementation for Big Data.

Content

Subjects treated in this course and their allocated time:
  • Recent paradigms for distributed programming (5%)
  • Principles of programming using MapReduce (10%)
  • MapReduce applications (natural language processing, image processing) (10%)
  • NoSQL databases such as HBase, Cassandra and Oracle NoSQL (important concepts such as read and write performance, data consistency and scaling to large volumes of data) (15%)
  • In-memory frameworks for data processing allowing on-the-fly analysis (10%)
  • Data mining algorithms and systems and their implementation for Big Data (graph analysis, clustering, frequent itemset mining, near-duplicate detection) (25%)
  • Project (25%)
Prerequisites: None
Evaluation: Written exam
Teaching methods
             Periods in class   Student work in hours
Lectures     21                 45
Exercises    0                  0
Labs         21                 45
(Total)      42                 90
ECTS credits: 3