In this project we experimented with Apache Spark queries on large datasets such as the MovieLens dataset ("https://grouplens.org/datasets/movielens/") and tried to optimise their performance both on a local cluster and in cloud/server scenarios such as an Apache Livy server ("https://livy.apache.org/").