Course Syllabus
Data-Intensive Computing Paradigms (types, need, and use): Supercomputing, Grid Computing, Cloud Computing, Many-Core Computing. Parallel Programming Systems: MapReduce (Hadoop), Workflows (Swift), MPI (MPICH), OpenMP, Multi-Threading (PThreads). Job Management Systems: Batch Scheduling, Light-Weight Task Scheduling. Storage Systems: File Systems (EXT3), Shared File Systems (NFS), Distributed File Systems (HDFS, FusionFS), Parallel File Systems (GPFS, PVFS, Lustre), Distributed NoSQL Key/Value Stores (Cassandra, MongoDB, ZHT), Relational Databases (MySQL).
Data-Intensive Computing with GPUs and Databases; the Many-Core Computing Era and Its New Challenges; Case Studies on Open Research Questions in Data-Intensive Computing.