DSCI 553: Foundations and Applications of Data Mining (Spring 2023)

Algorithms and techniques of Data Mining and Machine Learning for analyzing massive datasets. Emphasis on system building with Spark. Case studies and applications.

Course Description

Data mining is a fundamental skill for massive data analysis. At a high level, it allows the analyst to discover patterns in data and transform them into usable products. The course teaches data mining algorithms for analyzing very large data sets. It has an applied focus: it prepares students to use data mining techniques to build systems and solve real-world problems.

Homework

Environment: Python 3.6, Scala 2.12, JDK 1.8 and Spark 3.1.2

Most assignments may use only the Python standard library and Spark RDDs.

| # | Topic | Language | Tags |
|---|-------|----------|------|
| 1 | Spark Operation | Python | `Spark` `Pyspark` |
| 2 | Frequent Itemset | Python | `SON` `A-Priori` `MultiHash` `PCY` |
| 3 | Recommendation System | Python | `LSH` `Jaccard similarity` `Pearson similarity` `Collaborative filtering` `Recommendation system` |
| 4 | Community Detection | Python | `Girvan-Newman Algorithm` `GraphFrames` |
| 5 | Data Stream | Python | `Bloom Filter` `Flajolet-Martin Algorithm` `Reservoir sampling` |
| 6 | Clustering | Python | `Bradley-Fayyad-Reina (BFR) Algorithm` `K-Means` |
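
As an illustration of the kind of technique covered in the Data Stream assignment, here is a minimal sketch of reservoir sampling using only the Python standard library. It is not the code from this repository; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Hypothetical helper for illustration; not taken from the homework code.
    The stream is read once and its length does not need to be known.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, n); the new item is kept with probability k/n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 10 user ids from a simulated stream of 1000 ids.
    print(reservoir_sample(range(1000), 10, seed=553))
```

Each arriving item replaces a random slot with probability `k/n`, so every item seen so far is equally likely to be in the sample, and memory use stays fixed at `k` items.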

Competition

Environment: Python 3.6, Scala 2.12, JDK 1.8 and Spark 3.1.2

Any external Python library may be used as long as it is available on Vocareum. Data pre-processing and post-processing must use only Spark RDDs.

| Topic | Language | Tags | RMSE |
|-------|----------|------|------|
| Hybrid Recommendation System | Python | `XGBoost` `Yelp Data` `Model-based recommendation system` | 0.979346 |
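
The RMSE column is the root mean squared error between predicted and actual ratings; lower is better. As a rough sketch of how such a score could be computed (the function name `rmse` and the sample ratings are hypothetical, not taken from the competition code or data):

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


if __name__ == "__main__":
    # Made-up ratings, for illustration only.
    predictions = [4.0, 3.5, 5.0]
    ground_truth = [5.0, 3.0, 4.0]
    print(round(rmse(predictions, ground_truth), 3))  # 0.866
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.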

icharleschen/DSCI-553
