Vinod Bakthavachalam is a data scientist working with the Content Strategy and Enterprise teams, where his recent work has focused on understanding the global skills landscape using Coursera data (see Coursera's recently published Global Skills Index for some of his work). Prior to Coursera, he majored in Economics, Statistics, and Molecular and Cell Biology at UC Berkeley and worked in quantitative finance.
Scott J Haines
Scott Haines is a distributed systems engineer focused on real-time, highly available, trustworthy analytics systems. He is a Principal Software Engineer on the Voice Insights team at Twilio, where he helped drive Spark adoption, streaming pipeline architecture best practices, as well as a massive stream processing platform. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.
Jane Adams is an emergent media artist, working at the intersection of visual expression and scientific inquiry. As the Data Visualization Artist in Residence at the University of Vermont Complex Systems Center, Jane builds engaging, interactive, web-based visualizations of high-dimensional data for exploratory analysis. Her visualization research topics include social network lexical analysis, healthcare morbidity and mortality modeling, and geospatial temporal dynamics, all through a lens of complexity science. In her spare time, Jane experiments with music-color synesthesia, machine learning for computational creativity, self-sustaining aquaponic sculpture, and citizen science. She is the lead community organizer of Vermont Women in Machine Learning and Data Science (WiMLDS), and holds an MFA in Emergent Media. Stay in touch on Twitter @artistjaneadams.
Andrew Long, PhD
Andrew Long is a Senior Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA in 2017 after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building, piloting, and deploying predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He currently has multiple models in production to predict which patients are at the highest risk of negative outcomes.
Causal Inference & Machine Learning
Speaker: Vinod Bakthavachalam, Data Scientist at Coursera
Many data science problems, especially those that inform business and product strategy, involve understanding causal relationships. The standard way to measure these is through A/B testing, but that is often infeasible, requiring alternative techniques from causal inference that are an essential component of any data scientist's toolkit. The talk will walk through these techniques, some applications, and recent work at the intersection of causal inference and machine learning to handle large data sets.
Real-ish Time Predictive Analytics with Spark Structured Streaming
Speaker: Scott J Haines, Principal Software Engineer at Twilio
In 20 short minutes, learn what becomes possible when you add Spark to your analytics pipeline. Learn how to effectively solve common data engineering problems with compile-time guarantees, such as how to ingest, normalize, transform, and join datasets in real time. Learn how to add insights on top of your streaming data with simple filters and pre-trained models.
Visualizing Complexity: Dimensionality Reduction and Network Science
Speaker: Jane Adams, Data Visualization Artist at University of Vermont Complex Systems Center
Working with mathematicians, data scientists, and domain experts at the University of Vermont Complex Systems Center, data visualization artist Jane Adams has developed strategies for prototyping exploratory graphs of high-dimensional data. In this 90-minute workshop, Adams shares some of these methods for data discovery and interaction, navigating a creative workflow from paper prototypes of visual hypotheses through web-based interactive slices, offering critical insight for clustering, interpolation, and feature engineering.
Healthcare NLP with a doctor's bag of notes
Speaker: Andrew Long, PhD, Data Scientist at Fresenius Medical Care
Nausea, vomiting, and diarrhea are words you would not frequently find in a natural language processing (NLP) project for tweets or product reviews. However, these words are common in healthcare. In fact, many clinical signs and patient symptoms (e.g. shortness of breath, fever, or chest pain) are only present in free-text notes and are not captured with structured numerical data. As a result, it is important for healthcare data scientists to be able to extract insight from unstructured clinical notes in electronic medical records. In this 20-minute warm-up, the audience will have the opportunity to learn about an NLP concept known as bag-of-words. The audience will also get a preview of the outline for the 90-minute workshop held at the upcoming ODSC West 2019.
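For readers new to the bag-of-words idea mentioned in this abstract, here is a minimal sketch in Python. It is not taken from the speaker's material: the two notes are invented for illustration, and the function name `bag_of_words` and the simple letters-only tokenizer are assumptions. The idea is that each note is reduced to a count of the words it contains, ignoring word order.

```python
import re
from collections import Counter


def bag_of_words(note: str) -> Counter:
    """Count how often each lowercase word token appears in a note."""
    # Assumption: a token is any run of ASCII letters; real clinical
    # text would likely need a more careful tokenizer.
    tokens = re.findall(r"[a-z]+", note.lower())
    return Counter(tokens)


# Hypothetical clinical notes, invented for illustration only.
notes = [
    "Patient reports nausea and vomiting.",
    "No fever. Patient denies chest pain, reports nausea.",
]

bags = [bag_of_words(note) for note in notes]

# Shared vocabulary across all notes, in alphabetical order.
vocabulary = sorted(set().union(*bags))

# One row per note, one column per vocabulary word; a Counter returns 0
# for words that do not occur in a note.
matrix = [[bag[word] for word in vocabulary] for bag in bags]

for note, row in zip(notes, matrix):
    print(note)
    print(dict(zip(vocabulary, row)))
```

Each row of `matrix` is a fixed-length numeric vector, which is what lets free-text notes be fed to a standard model alongside structured data. Note that word order is discarded, so "denies chest pain" still counts "pain" as present.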