ACM SIGKDD
Seattle Chapter

Our goal is to build a community of data scientists, statisticians, machine learning experts/researchers, and practitioners from both industry and academia by organizing talks from leaders in the field, regular meetups with fun data science activities, and workshops in the greater Seattle area.

Discussed at KDD 2013 and established in March 2014

Great support from the local community and help from ACM and SIGKDD

Special thanks to Ying Li, Johannes Gehrke, Raghu Ramakrishnan, Bing Liu, and Zarina Strakhan

Events

Recent Activities / Presentations / Talks

The Science Behind Predicting Voice Elicited Emotions

Wednesday, May 18th, 2016

(5:30 PM - 7:30 PM)

Location: Madrona Venture Group
999 3rd Ave, Seattle

34th floor

It’s not what you say; it’s how you say it!

Meet local data scientists, data enthusiasts, developers, and otherwise cool people while learning about the science behind voice analytics and how it is being applied at Jobaline Inc. in Kirkland.

Jobaline Inc.’s Chief Data Scientist, Dr. Ying Li, presents the latest research in her talk "The Science Behind Predicting Voice Elicited Emotions," hosted at the Madrona Venture Group offices in Seattle.

5:30 PM: Doors open; enjoy pizza and refreshments and socialize before the talk
6:15 - 7:00 PM: Dr. Li presents
7:00 PM: Q&A and post-presentation social

Dr. Li will present the research, product development, and eventual deployment of Voice Analyzer, a system developed at Jobaline that analyzes voice data and predicts the human emotions elicited by the paralinguistic elements of voices. She will give an overview of the raw data, the data processing steps, the prediction algorithms the team experimented with, and the deployed system.

She will present case studies where, given a voice clip, models predict the degree to which a listener will feel “engaged” or “soothed”. The technology is deployed in Jobaline products to help companies hire workers in service industries, where customers’ emotional responses to workers’ voices may affect the service outcome.

Message from the Speaker:

Dr. Ying Li

Drawing on my dedicated practice of data science in multiple industries since 1998, and in the spirit of sharing with the community, I will devote the last quarter of this talk to a set of principles I have learned for the practice of data science. I will describe the current state of practice through examples and anticipate an optimal future that practitioners of data science should prepare for and contribute to, in the hope that a disciplined practice of data science will truly deserve the hyped social and economic attention and, more importantly, will scale and reach its full potential.

Data Club Meetup - Data Science from Scratch (Gradient Descent and Logistic Regression)

Wednesday, September 2nd, 2015

(6:30 PM - 8:30 PM)

Location: Bellevue City Hall

Room: 1E-120

Abstract:

Continuing from our last meetup, we will cover two more chapters from Joel Grus' book, Data Science from Scratch: First Principles with Python. Following Joel’s format, we will first go over a brief theoretical description of each algorithm and then code it collaboratively in Python. The two chapters we will work through are Gradient Descent and Logistic Regression.

Regardless of one's level of programming expertise, one should gain a good understanding of these two algorithms after this meetup.
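To give a flavor of the session, here is a minimal pure-Python sketch of logistic regression fit by batch gradient descent on the average negative log-likelihood. It is an illustration written for this page, not code from the book or the meetup; the function names, learning rate, step count, and toy data are assumptions.

```python
import math
from typing import List

Vector = List[float]

def sigmoid(t: float) -> float:
    """Logistic function 1 / (1 + e^(-t)), mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def gradient(xs: List[Vector], ys: List[float], beta: Vector) -> Vector:
    """Gradient of the average negative log-likelihood with respect to beta."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        error = sigmoid(dot(x, beta)) - y  # predicted probability minus 0/1 label
        for j, x_j in enumerate(x):
            grad[j] += error * x_j / len(xs)
    return grad

def fit(xs: List[Vector], ys: List[float],
        learning_rate: float = 0.5, steps: int = 2000) -> Vector:
    """Batch gradient descent: repeatedly take a small step against the gradient."""
    beta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = gradient(xs, ys, beta)
        beta = [b - learning_rate * g for b, g in zip(beta, grad)]
    return beta

# Toy data: each x starts with a constant 1.0 so beta[0] acts as the intercept.
xs = [[1.0, x] for x in (-2.0, -1.0, 1.0, 2.0)]
ys = [0.0, 0.0, 1.0, 1.0]
beta = fit(xs, ys)
print(sigmoid(dot([1.0, 1.5], beta)))  # estimated probability of label 1 at x = 1.5
```

On perfectly separable toy data like this, the weights keep growing slowly as training continues, which is one reason regularization or early stopping is common in practice.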

About Kushal Lakhotia:

Kushal is an engineer in Bing's Web Search Relevance team at Microsoft where he works on ranking. He tweets at @hikushalhere.

Data Club Meetup - Data Science from Scratch (Naive Bayes and Neural Networks)

Wednesday, August 5th, 2015

(6:30 PM - 8:30 PM)

Location: Bellevue City Hall

Room: 1E-120

Abstract

Continuing from our last meetup, we will work through more chapters from Joel Grus' book, Data Science from Scratch: First Principles with Python. Following Joel’s format, we will first go over a brief theoretical description of each algorithm and then code it collaboratively in pure Python. The two chapters we will work through are Naïve Bayes and Neural Networks.

Regardless of one's level of programming expertise, one should gain a deeper understanding of these two algorithms after this meetup.
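As an illustration of the first topic, here is a minimal sketch of a two-class naive Bayes text classifier in pure Python. It treats each message as a set of words, assumes word presence is conditionally independent given the class, and uses add-k smoothing. The class name, the smoothing constant, and the toy messages are assumptions made for this example, not material from the book or the session.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return the set of distinct words."""
    return set(text.lower().split())

class NaiveBayesClassifier:
    """Two-class (spam vs. ham) naive Bayes over word presence, with add-k smoothing."""

    def __init__(self, k: float = 0.5) -> None:
        self.k = k
        # word -> [number of spam messages containing it, number of ham messages containing it]
        self.counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
        self.n_spam = 0
        self.n_ham = 0

    def train(self, messages: List[Tuple[str, bool]]) -> None:
        for text, is_spam in messages:
            if is_spam:
                self.n_spam += 1
            else:
                self.n_ham += 1
            for word in tokenize(text):
                self.counts[word][0 if is_spam else 1] += 1

    def prob_spam(self, text: str) -> float:
        """P(spam | words), accumulated in log space to avoid underflow."""
        words = tokenize(text)
        total = self.n_spam + self.n_ham
        log_spam = math.log(self.n_spam / total)  # class priors
        log_ham = math.log(self.n_ham / total)
        for word, (spam_count, ham_count) in self.counts.items():
            p_spam = (spam_count + self.k) / (self.n_spam + 2 * self.k)
            p_ham = (ham_count + self.k) / (self.n_ham + 2 * self.k)
            if word in words:  # word present: multiply by P(word | class)
                log_spam += math.log(p_spam)
                log_ham += math.log(p_ham)
            else:              # word absent: multiply by 1 - P(word | class)
                log_spam += math.log(1.0 - p_spam)
                log_ham += math.log(1.0 - p_ham)
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))

classifier = NaiveBayesClassifier()
classifier.train([
    ("win money now", True),
    ("cheap money offer", True),
    ("meeting at noon", False),
    ("lunch meeting tomorrow", False),
])
print(classifier.prob_spam("money offer"), classifier.prob_spam("meeting tomorrow"))
```

Words that never appeared in training are simply ignored by this sketch; the smoothing constant k keeps a word seen in only one class from driving the other class's probability to zero.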

About Kevin Mueller:

Kevin is a graduate student at the University of Washington studying applied mathematics. He is currently interning at Jobaline, where he assists Dr. Ying Li with developing Jobaline’s voice analyzer.

Please bring your laptop if you want to code along. You should also have Python and matplotlib installed.

Data Club Meetup #5 - Data Science from Scratch and Clustering Application

Wednesday, June 24th, 2015

(6:30 PM - 8:30 PM)

Location: Jobaline Headquarters

620 Kirkland Way

Suite 208

Kirkland, WA

Abstract

Everyone wants to either be a data scientist or hire a data scientist. Yet we spend very little time thinking about the best way to teach (or learn) data science. Should one start with math and stats? Or instead, should they just dive right into machine learning? Do they need to learn all the tools? I've tried them all and more. During this meetup, I'll give examples of what's worked and what hasn't and share some broader thoughts about tech education.

In particular, we will work through the following example: K-means clustering is a popular machine learning technique for identifying “clusters” in data sets. It’s also pretty simple to understand and implement. In this meetup, we’ll learn how the algorithm works, implement it in Python, and use it to “posterize” pictures.
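A minimal pure-Python sketch of the idea, assuming pixels are given as (r, g, b) tuples: Lloyd's version of k-means alternates between assigning each point to its nearest mean and recomputing each mean, and posterizing replaces every pixel with its cluster's mean color. The function names, the seeded random initialization, and the toy pixel list are assumptions; loading and saving real images is left out.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))

def mean(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points: List[Point], k: int, seed: int = 0, max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and mean-update steps until assignments stop changing."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # start from k randomly chosen input points
    assignments: Optional[List[int]] = None
    for _ in range(max_iter):
        new_assignments = [min(range(k), key=lambda i: squared_distance(p, means[i]))
                           for p in points]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old mean if a cluster ends up empty
                means[i] = mean(members)
    return means

def posterize(pixels: List[Point], k: int) -> List[Point]:
    """Replace every pixel with the nearest of k cluster means (its cluster's color)."""
    means = k_means(pixels, k)
    return [min(means, key=lambda m: squared_distance(p, m)) for p in pixels]

pixels = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (5.0, 5.0, 5.0),
          (250.0, 250.0, 250.0), (240.0, 240.0, 240.0), (255.0, 255.0, 255.0)]
print(posterize(pixels, 2))
```

K-means can settle in a poor local optimum on harder data, so it is common to run it from several random starts and keep the clustering with the lowest total squared distance.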

About Joel

Joel is the author of "Data Science from Scratch: First Principles with Python". He works as a software engineer at Google. Before that he was a data scientist at several startups, where he first learned and then taught data science. He spends more time than is healthy thinking about pedagogy.

Please bring your laptop if you want to code along. You should also have Python and matplotlib installed.

Pizza and soft drinks will be sponsored by Jobaline.

Data Club Meetup #4

Tuesday, June 9th, 2015

(6:30 PM - 8:30 PM)

Location: Bellevue City Hall

450 110th Ave NE

Bellevue, WA 98004

Topics for This Session

In the world of Big Data, analytics systems have benefited greatly from the ability to scale horizontally. Systems like Hadoop have been widely used to perform distributed batch processing on massive data sets, but there is a growing need in industry to process data at the same scale in a real-time, streaming fashion. Apache Storm is one framework that enables this kind of processing. In this session, Brandon will introduce the core concepts of distributed stream processing with Storm and the architecture of a Storm cluster, and will show you what it takes to build your first Storm topology.

About Storm

Apache Storm is an open-source distributed realtime computation system used in the industry by companies like Twitter, Spotify, Expedia and others. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.

About Brandon

Brandon O’Brien is a data engineer at Expedia, where he is using Storm to build a real-time travel market analytics platform called Expedia Insights. Contact: https://www.linkedin.com/in/brandonjobrien

Please bring your laptop if you want to write code.

Data Club Meetup #3

Wednesday, April 22nd, 2015

(6:30 PM - 8:30 PM)

Location: Bellevue City Hall (Room: 1E-120)

450 110th Ave NE

Bellevue, WA 98004

Objectives

We will meet to discuss/share data mining and machine learning (ML) techniques/tools.

We will also analyze public datasets, and build data mining and ML models/applications.

Topics for This Session

We will cover the following topics:

Public medical survey data (at patient level after treatment)
A demo of supervised learning applied to the above data, with detailed steps

Please bring your laptop if you want to write code.

Directions and Parking: http://www.ci.bellevue.wa.us/parking-directions.htm

Bellevue City Hall provides complimentary parking; however, the visitor parking lot fills quickly. There are several “pay for parking” lots in the immediate vicinity should the lot be full.

David Kasik: Boeing's Use of Visualization and Visual Analytics

Tuesday, April 14th, 2015

Speaker Bio

Dave Kasik is Boeing's Senior Technical Fellow in visualization and interactive techniques and is pioneering the use of visual analytics to help extract more information from complex non-geometric data. Visual analytics supplements more traditional analytic techniques (like statistics and data mining) with a human’s ability to use vision to find anomalies and detect trends. He is exploring emerging visual analytics tools in areas as diverse as safety and marketing.

Dave earned his Master’s in Computer Science from the University of Colorado in 1972 and a Bachelor’s in Quantitative Studies from the Johns Hopkins University in 1970. He’s an ACM Fellow and involved in professional activities with both ACM and IEEE.

Abstract:

The talk will center on the impact of the increasing amount of data on visualization, the difference between data analysis and data analytics, motivation, trends, desired skills, and more, similar to what Dave discussed in his interview with KDnuggets:

http://www.kdnuggets.com/2015/02/interview-david-kasik-boeing-data-analytics.html

Data Club Meetup #1 & #2

Thursday, March 19, 2015

Monday, March 30th, 2015

We will meet to discuss/share data mining and machine learning (ML) techniques/tools. We will also analyze public datasets, and build data mining and ML models/applications.

Participants will be able to build up a portfolio of data science work. A portfolio is most valuable when it is displayed publicly. For this reason, and for the benefit of Data Club participants, we consider the work we do in the Data Club to be in the public domain.

Sum-Product Networks: Deep Models with Tractable Inference by Dr. Pedro Domingos

Tuesday, March 3, 2015
Abstract:
Big data makes it possible in principle to learn very rich probabilistic models, but inference in them is prohibitively expensive. Since inference is typically a subroutine of learning, in practice learning such models is very hard. Sum-product networks (SPNs) are a new model class that squares this circle by providing maximum flexibility while guaranteeing tractability. In contrast to Bayesian networks and Markov random fields, SPNs can remain tractable even in the absence of conditional independence. SPNs are defined recursively: an SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. It's easy to show that the partition function, all marginals and all conditional MAP states of an SPN can be computed in time linear in its size. SPNs have most tractable distributions as special cases, including hierarchical mixture models, thin junction trees, and nonrecursive probabilistic context-free grammars. I will present generative and discriminative algorithms for learning SPN weights, and an algorithm for learning SPN structure. SPNs have achieved impressive results in a wide variety of domains, including object recognition, image completion, collaborative filtering, and click prediction. Our algorithms can easily learn SPNs with many layers of latent variables, making them arguably the most powerful type of deep learning to date. (Joint work with Rob Gens and Hoifung Poon.)
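To make the recursive definition concrete, here is a toy sketch in Python, not the speakers' code. Leaves are Bernoulli distributions over single binary variables (an assumption; any univariate distribution works), products combine children over disjoint variables, and sums take weighted mixtures over the same variables. Marginalizing a variable amounts to setting its leaves to 1, so the partition function and any marginal come from one bottom-up pass. The class and function names are hypothetical, this sketch does not cover MAP inference or learning, and it assumes the network is a tree; an SPN that shares sub-networks (a DAG) would need each node's value cached to keep evaluation linear in its size.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

@dataclass
class Leaf:
    var: str
    p_true: float           # univariate Bernoulli: P(var = True)

@dataclass
class Product:
    children: List["Node"]  # children must cover disjoint sets of variables

@dataclass
class Sum:
    weights: List[float]    # non-negative mixture weights
    children: List["Node"]  # children must cover the same variables

Node = Union[Leaf, Product, Sum]

def value(node: Node, evidence: Dict[str, Optional[bool]]) -> float:
    """Evaluate the SPN bottom-up. A variable absent from `evidence` (or None) is
    marginalized out: its leaf returns 1, the total mass of a univariate distribution."""
    if isinstance(node, Leaf):
        x = evidence.get(node.var)
        if x is None:
            return 1.0
        return node.p_true if x else 1.0 - node.p_true
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= value(child, evidence)
        return result
    return sum(w * value(c, evidence) for w, c in zip(node.weights, node.children))

# A mixture of two fully factorized distributions over binary variables A and B.
spn = Sum([0.3, 0.7], [
    Product([Leaf("A", 0.9), Leaf("B", 0.2)]),
    Product([Leaf("A", 0.1), Leaf("B", 0.6)]),
])
z = value(spn, {})                               # partition function
p_a = value(spn, {"A": True})                    # marginal P(A = True)
p_a_not_b = value(spn, {"A": True, "B": False})  # joint P(A = True, B = False)
print(z, p_a, p_a_not_b / p_a)                   # last value: P(B = False | A = True)
```

Each node is visited once per query in this tree-shaped example, which is the linear-time property the abstract refers to.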

Dr. Domingos received an undergraduate degree (1988) and an M.S. in Electrical Engineering and Computer Science (1992) from IST in Lisbon. He received an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He spent two years as an assistant professor at IST before joining the faculty of the University of Washington in 1999. He’s the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He’s a winner of the SIGKDD Innovation Award, the highest honor in data science. He’s an AAAI Fellow, and he’s received a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. He’s a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and he’s served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others.

The Future of Data Mining Talk by Dr. Oren Etzioni

Tuesday, October 28, 2014
Abstract:
Deep learning has catapulted to the front page of the New York Times, formed the core of the so-called 'Google brain,' and achieved impressive results in vision, speech recognition, and elsewhere. Yet building intelligent systems requires us to go way beyond the capabilities of deep learning and today's data-mining systems. The future of the Big Data paradigm lies in extending these powerful methods to acquire knowledge from text, databases, diagrams, images, and video. We also need to reason tractably using this acquired knowledge to make sense of the world, and to draw novel conclusions. My talk will describe research at the new Allen Institute for AI aimed at building this next generation of intelligent systems. This will be a more in-depth version of my KDD 2014 keynote talk.