I started learning data science about a year ago. This post is mostly geared towards people who are in the same position I was in.
A lot of advice around learning data science starts with "first learn Python" or "first take a linear algebra course". This advice is fine, but if I had followed it, I never would have learned any data science. Being a data scientist typically requires a solid foundation in computer science and applications, modelling, statistics, analytics and math.
What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems; they will pick the right problems, the ones that have the most value to the organization.
Here is a list of books on doing machine learning and data science in R and Python which I’ve come across in the last year. Since reading helps you stay close to the topic, the list also works as a reference guide.
Disclosure: The Amazon links in this article are affiliate links. If you buy a book through one of these links, we get paid through Amazon. This is one of the ways for us to cover our costs while we continue to create these articles. The list reflects our recommendations based on the content of each book and is in no way influenced by the commission.
Python and R
1) Python for Data Analysis
I was new to Python and to data analysis when I started working through this book. I found the chapters on NumPy particularly useful. It was also helpful to have everything in one place, rather than having to scratch around for it online.
The book covers the basics of Python, as well as IPython, NumPy and pandas. I still use it now as a reference. If you’re well versed in Python and data analysis, it’s probably worth the purchase price as a reference; and if you’re new to it all, I would definitely recommend it.
However, Think Python is a book I'd recommend again and again to anyone who seeks a gentle introduction to the good parts of the Python language.
The book, as I've found, is often recommended by professionals everywhere, for example on Quora and Stack Overflow. Personally, it's one of the few books I've managed to go through cover to cover.
If you've used `R` in the past but mainly use base functions, then this will be a great refresher for you. If you're new to the world of `R`, then this book will give you a solid foundation for getting started. It centers on a collection of R packages designed to work together to make data science fast and fluent.
Statistics and Math for data analysis
1) Think Stats (2nd edition)
This book covers the statistics you need to get your hands dirty with practical work. It works perfectly well for beginners too.
2) Introduction to Probability
An intuitive, yet precise introduction to probability theory, stochastic processes, statistical inference, and probabilistic models used in science, engineering, economics, and related fields. This is the currently used textbook for "Probabilistic Systems Analysis," an introductory probability course at the Massachusetts Institute of Technology, attended by a large number of undergraduate and graduate students.
“An Introduction to Statistical Learning With Applications in R” gives you an overview of analyzing, organizing, and leveraging data using the powerful and popular R programming language. Written by Gareth James, professor of data sciences at USC; Daniela Witten, professor of biostatistics at University of Washington; and Robert Tibshirani and Trevor Hastie, professors of statistics at Stanford, it is ideal for both statisticians and non-technical professionals who are looking to understand data management, analysis, and presentation techniques.
Big data
In “Predictive Analytics”, Eric Siegel, a renowned expert in data analytics and former professor at Columbia University, explains how scientists use big data to help predict, well, anything: from what you will buy, to where you will travel, to when you will quit your job, and more. The Seattle Post-Intelligencer called the book “mesmerizing,” and also praised its relevance to multiple business departments.
Apache Hadoop is a framework used to process large amounts of data. Tom White is an expert Hadoop consultant, trainer, and member of the Apache Software Foundation. His guide, “Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale,” will help you understand how to build and manage scalable systems using Hadoop. It’s a good reference for programmers, and for IT managers tasked with running Hadoop clusters.
This book, written by Kenneth Cukier and Viktor Mayer-Schönberger, takes you on a world tour of the value added by big data across all industries. It will help you stay ahead of the key trends defining businesses in the coming years. Jeff Jonas, Chief Scientist, IBM Entity Analytics, said, ‘The book teems with great insights on the new ways of harnessing information, and offers a convincing vision of the future. It is essential reading for anyone who uses — or is affected by — big data.’
We will keep updating the list with a few more resources and books. Feel free to share your views and suggestions on what has helped you become a better data scientist.