Credit Score Classification with PySpark Machine Learning
Sep 16, 2024
big-data pyspark machine-learning decision-tree-classifier random-forest-classifier multilayer-perceptron python

This project builds a credit score classification model with PySpark's machine learning library. Using distributed computing, it compares the performance of a Multilayer Perceptron, a Decision Tree Classifier, and a Random Forest Classifier at predicting creditworthiness from selected customer and financial features.

Big Data Analysis - Using Hadoop for MapReduce, Cluster Analysis, and Image Classification
Jul 08, 2024
big-data mapreduce apache-mahout machine-learning image-classification python

This project explores distributed computing across three analytical domains: descriptive statistics, clustering, and image classification on large datasets. The implementation includes Hadoop MapReduce jobs that analyze a dataset of hourly weather observations, and unsupervised learning with Apache Mahout, using several distance metrics, on a dataset of French plays. It also showcases a scalable cat and dog classifier built on the CLIP model within a Hadoop Streaming framework.
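
To give a flavor of the MapReduce part, here is a minimal sketch of a streaming-style map and reduce step that computes descriptive statistics per weather station. It is not the project's actual code: the three-column CSV layout (`station,timestamp,temperature`) and the function names `map_lines` and `reduce_lines` are assumptions made for illustration only.

```python
from itertools import groupby


def map_lines(lines):
    """Emit 'station<TAB>temperature' for each well-formed CSV record.

    Assumes a hypothetical 'station,timestamp,temperature' layout.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed rows
        station, _timestamp, temp = parts
        try:
            value = float(temp)
        except ValueError:
            continue  # skip the header row or non-numeric readings
        yield f"{station}\t{value}"


def reduce_lines(lines):
    """Aggregate key-sorted 'station<TAB>temperature' pairs.

    Yields 'station<TAB>count<TAB>mean<TAB>min<TAB>max' per station.
    The input must already be sorted by key, as it is after the shuffle phase.
    """
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for station, group in groupby(pairs, key=lambda kv: kv[0]):
        temps = [float(value) for _, value in group]
        mean = sum(temps) / len(temps)
        yield f"{station}\t{len(temps)}\t{mean:.2f}\t{min(temps):.2f}\t{max(temps):.2f}"


# Local simulation of map -> sort (shuffle) -> reduce on a few made-up rows.
sample = [
    "station,timestamp,temperature\n",
    "A,2020-01-01T00,1.0\n",
    "B,2020-01-01T00,10.0\n",
    "A,2020-01-01T01,3.0\n",
]
for row in reduce_lines(sorted(map_lines(sample))):
    print(row)
```

In a Hadoop Streaming job, each function would typically live in its own script that reads `sys.stdin` and prints to standard output, with the framework handling the sort between the two steps.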