This talk tells the story of implementing and optimizing a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I took to improve the execution speed and convergence of my initial naive implementation. The goal is not to convince the audience that logistic regression is great or that my implementation is awesome; rather, the talk explains how the algorithm works under the hood and offers general tips for implementing an iterative parallel machine learning algorithm in Spark.

The talk is structured as a sequence of “lessons learned”, shown as code examples that build on the initial naive implementation. The impact of each lesson on execution time and speed of convergence is measured on benchmark datasets. You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code will be made available for each step in the walkthrough.

Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand developed highly scalable low-latency machine learning algorithms for real-time bidding in online advertising.

Time

19:00 - 20:00 GMT+1

Organizer

Business Intelligence and Analytics