ML Engineering
Duration:
01.2024 - 02.2024
Client:
Anonymous client in public administration
Technologies:
Java, Scala, Python, PySpark (Cloudera & Stackable), MLflow
Situation
A client in public administration developed multiple distributed AI applications using PySpark. However, these applications were scaling poorly across the cluster due to misconfigurations of Apache Spark and Kubernetes.
Task
The client requested optimization of the existing AI applications to improve runtime efficiency and reduce compute resource consumption.
Action
I optimized the PySpark AI applications through several approaches:
Adjusting various Spark distributed computing settings for optimal parallelization and resource utilization.
Refactoring the application code for better performance.
Introducing data partitioning to enhance processing efficiency.
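As an illustration of the kind of settings involved, the sketch below derives a starting set of Spark properties from the cluster size. It is not the client's actual configuration: the function name `suggest_spark_conf`, its parameters, and the sizing heuristic (one core and 1 GB reserved per node, about 10% of each executor's share held back for memory overhead, three tasks per core) are assumptions chosen for the example. Only the property keys are standard Spark settings.

```python
import math


def suggest_spark_conf(worker_nodes, cores_per_node, memory_gb_per_node,
                       cores_per_executor=4, tasks_per_core=3,
                       overhead_fraction=0.10):
    """Return illustrative Spark properties for a cluster (hypothetical heuristic)."""
    # Assumption: leave one core and 1 GB per node for the OS and node daemons.
    usable_cores = max(1, cores_per_node - 1)
    usable_memory_gb = max(1, memory_gb_per_node - 1)

    # Never request more cores per executor than a node can offer.
    executor_cores = max(1, min(cores_per_executor, usable_cores))
    executors_per_node = max(1, usable_cores // executor_cores)
    executor_instances = executors_per_node * worker_nodes

    # Split node memory across executors, then hold back a share for
    # off-heap overhead so the container is not killed for exceeding its limit.
    share_gb = usable_memory_gb / executors_per_node
    heap_gb = max(1, math.floor(share_gb * (1 - overhead_fraction)))

    # Aim for a few tasks per core so that slow tasks do not leave cores idle.
    partitions = executor_instances * executor_cores * tasks_per_core

    return {
        "spark.executor.instances": str(executor_instances),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{heap_gb}g",
        "spark.sql.shuffle.partitions": str(partitions),
        "spark.default.parallelism": str(partitions),
    }


if __name__ == "__main__":
    # Hypothetical cluster: 4 worker nodes with 16 cores and 64 GB each.
    for key, value in suggest_spark_conf(4, 16, 64).items():
        print(f"{key}={value}")
```

The resulting pairs could be passed to `SparkSession.builder.config(key, value)` or as `--conf` arguments to `spark-submit`. Values like these are only a starting point and would need to be validated against measured runtimes on the actual workload.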
Result
Optimizations led to a 5x reduction in batch run times and a 3x reduction in resource consumption, significantly improving the scalability and efficiency of the client's AI applications.