The rise of Internet commerce and social networks brings both opportunities and challenges: extremely large data sets from which vital information must be extracted by data mining. This course introduces the background, algorithms, and techniques for data mining, specifically targeting very large data sets. It begins with an introduction to the critical concepts of data mining. It then moves on to map-reduce frameworks for parallelizing algorithms, which are key to mining massive data sets. Algorithms for locality-sensitive hashing and for mining streaming data follow. The course then covers techniques for finding frequent item sets and for clustering. Upon completion of the course, the student will have a solid foundation in how to efficiently and effectively extract information from massive data sets drawn from myriad sources.
MSCS3020: Mining Massive Data Sets
Class Program