Data-Intensive Text Processing with MapReduce
MapReduce Patterns, Algorithms, and Use Cases
Exploratory data analysis (EDA)
Learning to explore data with plots
This opinionated guide exists to provide both novice and expert Python developers with a best-practice handbook for the installation, configuration, and usage of Python on a daily basis.
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
DataKind (formerly known as Data Without Borders) brings together leading data scientists with high impact social organizations through a comprehensive, collaborative approach that leads to shared insights, greater understanding, and positive action through data in the service of humanity.
Some data and machine learning talk videos from PyCon US 2012
Data Scientists to follow

Joseph Adler
Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with a B.Sc. and an M.Eng. in Computer Science and Electrical Engineering. He is the inventor of several patents for computer security and cryptography, and the author of “Baseball Hacks” and “R in a Nutshell”. Currently, he is a senior data scientist at LinkedIn.
www.oreillynet.com/pub/au/2033
***

Hilary Mason
bitly
Hilary Mason is the Chief Scientist at bit.ly, where she finds sense in vast data sets. Her work involves both pure research and development of product-focused features.
She’s also a co-founder of HackNY, a non-profit organization that connects talented student hackers from around the world with startups in NYC.
Hilary recently started the data science blog Dataists and is a member of hacker collective NYC Resistor.
She has discovered two new species, loves to bake cookies, and asks way too many questions.
***

Drew Conway
New York University
Drew Conway is a PhD student in political science at New York University. Drew studies terrorism and armed conflict, using tools from mathematics and computer science to gain a deeper understanding of these phenomena.
***

Jake Hofman
Yahoo!
Jake Hofman is a member of the Human Social Dynamics group at Yahoo! Research. His work involves data-driven modeling of social data, focusing on applications of machine learning and statistical inference to large-scale data. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.