LISA18 has ended
Monday, October 29 • 11:45am - 12:30pm
Serverless Data Processing and Machine Learning

Serverless computing reduces infrastructure complexity and provides fine-grained billing and easy scalability. Setting up concurrent data processing pipelines that support many users is a complex task, and utilization, cost, and performance are hard to tune for these pipelines. Machine learning (ML) workloads are on the rise, and the likes of Apache Spark aim to provide an end-to-end data-to-ML story but run into the same complexities. Data and ML workflows are not disjoint; they share a lot in common.

In this talk, I will present a serverless data and machine learning pipeline that includes a MapReduce framework built on Amazon S3 and AWS Lambda. We'll see how it can help alleviate issues with concurrent processing, cost, and scaling. I will also showcase how machine learning algorithms like K-Means clustering can be built on top of this framework, exploiting its inherently distributed architecture. We'll then discuss the benefits and challenges of the framework, with a focus on production deployments of ML models.
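
As a rough illustration of how K-Means can fit a map/reduce shape, here is a minimal in-process sketch. It is not the speaker's framework and uses no AWS calls; the function names (map_chunk, reduce_partials, kmeans) are hypothetical. The assumption is that each map_chunk call would correspond to one function invocation over one chunk of data, with the reduce step merging the partial results into new centroids.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_chunk(chunk, centroids):
    """Map step (hypothetical): assign each point in one chunk to its nearest
    centroid and return per-centroid partial sums and point counts."""
    partial = {}  # centroid index -> (coordinate sums, point count)
    for point in chunk:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        sums, count = partial.get(nearest, ([0.0] * len(point), 0))
        partial[nearest] = ([s + x for s, x in zip(sums, point)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    """Reduce step (hypothetical): merge partial sums from all chunks and
    compute the new centroid positions."""
    totals = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            acc_sums, acc_count = totals.get(idx, ([0.0] * len(sums), 0))
            totals[idx] = ([a + s for a, s in zip(acc_sums, sums)],
                           acc_count + count)
    # A centroid with no assigned points keeps its previous position.
    return [
        tuple(s / totals[i][1] for s in totals[i][0]) if i in totals else tuple(c)
        for i, c in enumerate(centroids)
    ]


def kmeans(chunks, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds over the data chunks."""
    for _ in range(iterations):
        # In a serverless setting, each map call could run concurrently.
        partials = [map_chunk(chunk, centroids) for chunk in chunks]
        centroids = reduce_partials(partials, centroids)
    return centroids
```

Because the map step only emits sums and counts, the amount of data passed to the reduce step depends on the number of centroids rather than the number of points, which is what makes the algorithm a natural fit for this style of processing.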

Speakers
Sunil Shah

Engineering Manager, Airbnb
Sunil Shah is an Engineering Manager at Airbnb. His team builds and maintains the Kubernetes-based platform that powers Airbnb.com. Prior to Airbnb, Sunil managed compute for Yelp, helped commercialise Apache Mesos at Mesosphere, studied robotics at UC Berkeley, and built ingestion...


Monday October 29, 2018 11:45am - 12:30pm CDT
Legends Ballroom D