Link

Data-Intensive Text Processing with MapReduce

Link

MapReduce Patterns, Algorithms, and Use Cases

Quote
"Data Journalism Handbook 1.0 BETA"

Welcome - The Data Journalism Handbook

Link

Exploratory data analysis (EDA)
Learning to explore data with plots

Quote
"The machine learning toolbox’s focus is on large scale kernel methods and especially on Support Vector Machines (SVM)"

shogun | A Large Scale Machine Learning Toolbox

Link

This opinionated guide exists to provide both novice and expert Python developers a best-practice handbook to the installation, configuration, and usage of Python on a daily basis.

Link

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.

Quote
"GGobi is an open source visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots"

GGobi data visualization system.

Tags: ml viz dm book
Link

DataKind (formerly known as Data Without Borders) brings together leading data scientists with high impact social organizations through a comprehensive, collaborative approach that leads to shared insights, greater understanding, and positive action through data in the service of humanity.

Link

Videos of some data and machine learning talks from PyCon US 2012

Tags: ml dm python
Text

Data Scientists to follow

Joseph Adler

LinkedIn

Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with a B.Sc. and an M.Eng. in Computer Science and Electrical Engineering. He is an inventor on several patents in computer security and cryptography, and the author of “Baseball Hacks” and “R in a Nutshell”. Currently, he is a senior data scientist at LinkedIn.


www.oreillynet.com/pub/au/2033

***

Hilary Mason

bitly

Hilary Mason is the Chief Scientist at bit.ly, where she finds sense in vast data sets. Her work involves both pure research and development of product-focused features.

She’s also a co-founder of HackNY, a non-profit organization that connects talented student hackers from around the world with startups in NYC.

Hilary recently started the data science blog Dataists and is a member of hacker collective NYC Resistor.

She has discovered two new species, loves to bake cookies, and asks way too many questions.

http://www.hilarymason.com

***

Drew Conway

New York University

Drew Conway is a PhD student in political science at New York University. Drew studies terrorism and armed conflict, using tools from mathematics and computer science to gain a deeper understanding of these phenomena.

http://www.drewconway.com/zia

***

Jake Hofman

Yahoo!

Jake Hofman is a member of the Human Social Dynamics group at Yahoo! Research. His work involves data-driven modeling of social data, focusing on applications of machine learning and statistical inference to large-scale data. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.

http://jakehofman.com

Tags: dm ml