Here I will be sharing my journey, comprising code, documentation, and algorithms.
Task1: Visualise how data is distributed as an RDD across a cluster of 5 machines with 4 cores each. Describe the complete picture of RDDs, partitions, the lambda applied to each partition, and how they correlate with the executor JVMs.
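To make Task1 concrete, here is a minimal, library-free sketch of the idea: data is chunked into partitions (one per core slot in this toy setup), and the lambda from an operation like `rdd.map(...)` is applied element-by-element within each partition. This is plain Python illustrating the concept, not Spark itself; real Spark chooses partition counts and boundaries with its own partitioners.

```python
# Sketch: how an RDD's data might be split into partitions across a
# 5-machine x 4-core cluster (20 slots), and how a mapped lambda runs
# per partition. Illustrative only -- not actual Spark behaviour.

def make_partitions(data, num_partitions):
    """Chunk data into roughly equal partitions (Spark decides this split)."""
    size, rem = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts

machines, cores_per_machine = 5, 4
num_partitions = machines * cores_per_machine  # one partition per core slot

data = list(range(100))
partitions = make_partitions(data, num_partitions)

# The lambda (e.g. from rdd.map(lambda x: x * 2)) is shipped to each
# executor JVM and applied to every element of its partition.
doubled = [[(lambda x: x * 2)(x) for x in p] for p in partitions]

print(len(partitions))  # 20
print(partitions[0])    # [0, 1, 2, 3, 4]
print(doubled[0])       # [0, 2, 4, 6, 8]
```

In real Spark, each of the 5 worker machines runs an executor JVM; with 4 cores each, up to 20 partition tasks execute in parallel, and the driver JVM only ships the serialized lambda, never the partition data.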
Task2: Learn about Scala and Java Future objects for achieving concurrency in Spark.
Task3: Practically implement and code Scala and Java Future objects to achieve concurrency in Spark.
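Tasks 2-3 target Scala/Java Futures specifically; as a language-neutral sketch of the same submit-then-await pattern, here is a stdlib `concurrent.futures` analog in Python. The `spark_job` function and its job names are placeholders standing in for independent Spark actions (e.g. counts on different DataFrames) fired concurrently from the driver.

```python
# Sketch: the submit-then-await pattern behind Scala/Java Futures, shown
# with Python's stdlib concurrent.futures. In Spark, several independent
# jobs can be submitted this way and their results collected as they finish.
from concurrent.futures import ThreadPoolExecutor, as_completed

def spark_job(name, n):
    """Placeholder for an independent Spark action (e.g. df.count())."""
    return name, sum(range(n))

with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns immediately with a Future, like Scala's Future { ... }
    futures = [pool.submit(spark_job, f"job{i}", i * 10) for i in range(1, 4)]
    # as_completed() yields futures in finish order, like Scala's onComplete
    results = dict(f.result() for f in as_completed(futures))

print(results)  # completion order varies; job1=45, job2=190, job3=435
```

The Scala equivalent wraps each action in `Future { ... }` with an implicit `ExecutionContext` and gathers them with `Future.sequence`; the design point is identical: the driver thread is not blocked while independent jobs run.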
Task4: Work on analysis using Presidio. For example, given an email address in the text, it should automatically detect the PII and encrypt/mark it.
Task5: Work on anonymization using Presidio. For example, given the statement "hello I am taking disprin and my age is 25 and i am having bipolar disorder", it should automatically detect the sensitive information and encrypt/mask it.
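As a library-free sketch of the detect-and-mask idea behind Tasks 4-5: Presidio's `AnalyzerEngine` and `AnonymizerEngine` do this properly with NER models and pattern recognizers; here two toy regexes (my own, not Presidio's) stand in for its recognizers, and detected spans are replaced with entity-type placeholders.

```python
# Minimal stand-in for Presidio-style detect-and-anonymize.
# The two regexes below are illustrative toys, not Presidio recognizers.
import re

PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "AGE": re.compile(r"(?<=age is )\d{1,3}\b"),
}

def anonymize(text):
    """Replace each detected PII span with an <ENTITY_TYPE> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

sample = "Hello, I am john.doe@example.com and my age is 25."
print(anonymize(sample))
# -> Hello, I am <EMAIL_ADDRESS> and my age is <AGE>.
```

Presidio generalizes this: the analyzer returns typed spans with confidence scores, and the anonymizer applies a configurable operator per entity type (replace, mask, hash, or encrypt), which is what the "encrypt/mark" requirement maps to.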
Task6: Create a model using spaCy and train it to detect the medicine, age, and disease in a given statement.
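Training a custom spaCy NER model starts from annotated examples of the form `(text, {"entities": [(start, end, LABEL)]})`. The helper below builds those character offsets from substrings; the label names `MEDICINE`, `AGE`, and `DISEASE` are my own choices for this task, and spaCy itself is not imported here, only the training tuples its `Example.from_dict` pipeline consumes.

```python
# Sketch: building spaCy-style NER training annotations for Task6.
# Only stdlib is used; labels MEDICINE/AGE/DISEASE are assumed names.

def annotate(text, spans):
    """spans: list of (substring, label); returns a spaCy-style training tuple
    (text, {"entities": [(start_char, end_char, label), ...]})."""
    entities = []
    for phrase, label in spans:
        start = text.index(phrase)  # first occurrence; fine for this sketch
        entities.append((start, start + len(phrase), label))
    return (text, {"entities": entities})

example = annotate(
    "hello I am taking disprin and my age is 25 and i am having bipolar disorder",
    [("disprin", "MEDICINE"), ("25", "AGE"), ("bipolar disorder", "DISEASE")],
)
print(example[1]["entities"])
```

From here, the usual spaCy workflow would add the three labels to an `ner` pipeline component and loop over such examples with `nlp.update`, so the model learns to tag unseen statements with the same entity types.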