Clean Data Science: Evaluating The Cleanliness of NYC Craft Beer Bar Kitchens

An analysis of NYC Open Data health inspections showing that craft beer bar kitchens in Manhattan are cleaner than the average establishment by a statistically significant margin. An encouraging finding for Dry January.
Original Post: Clean Data Science: Evaluating The Cleanliness of NYC Craft Beer Bar Kitchens
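
For readers wondering what "a statistically significant margin" could mean in practice, here is a minimal sketch of one way to compare two groups of inspection scores: a two-sided permutation test on the difference in means. This is not the method used in the original post, which is not shown in this excerpt. The function name, the scores, and the assumption that a lower score means a cleaner kitchen are all made up for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis, group membership
        # carries no information about the score.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical inspection scores (NOT real NYC Open Data values).
craft_beer_bars = [7, 9, 10, 12, 8, 11, 9, 10]
other_places = [13, 15, 12, 18, 14, 20, 11, 16]

observed, p_value = permutation_p_value(craft_beer_bars, other_places)
print(f"difference in means: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value says that a gap this large would rarely appear if the two groups were drawn from the same distribution. It does not by itself say how large or important the gap is.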

The Most Popular Language For Machine Learning and Data Science Is …

By Jean-Francois Puget, IBM. What programming language should one learn to get a machine learning or data science job? That’s the silver bullet question. It is debated in many forums. I could provide here my own answer to it and explain why, but I’d rather look at some data first. After all, this is what machine learners and data scientists should do: look at data, not opinions. So, let’s look at some data. I will use the trend search available on indeed.com. It looks for occurrences over time of selected terms in job offers. It gives an indication of what skills employers are seeking. Note however that it is not a poll on which skills are actually in use. It is rather an advanced indicator of how skill popularity evolves (more formally, it is probably close to the first order…
Original Post: The Most Popular Language For Machine Learning and Data Science Is …

Creating Data Visualization in Matplotlib

By DataScience.com Sponsored Post.
Prerequisites
Experience with the specific topic: Novice
Professional experience: No industry experience
The reader should be familiar with basic data analysis concepts and have some experience with a programming language (Python is ideal but not required). The dataset used can be downloaded here. You will only need day.csv after unzipping the dataset.
Introduction to Data Visualization
Data visualization is a key part of any data science workflow, but it is frequently treated as an afterthought or an inconvenient extra step in reporting the results of an analysis. Taking such a stance is a mistake; as the cliché goes, a picture is worth a thousand words. Data visualization should really be part of your workflow from the very beginning, as there is a lot of value and insight to be gained from just looking at your data.
Summary…
Original Post: Creating Data Visualization in Matplotlib

Tidying Data in Python

By Jean-Nicholas Hould, JeanNicholasHould.com. I recently came across a paper named Tidy Data by Hadley Wickham. Published back in 2014, the paper focuses on one aspect of cleaning up data, tidying data: structuring datasets to facilitate analysis. Throughout the paper, Wickham demonstrates how any dataset can be structured in a standardized way prior to analysis. He presents in detail the different types of datasets and how to wrangle them into a standard format. As a data scientist, I think you should get very familiar with this standardized structure of a dataset. Data cleaning is one of the most frequent tasks in data science. No matter what kind of data you are dealing with or what kind of analysis you are performing, you will have to clean the data at some point. Tidying your data in a standard format makes things easier down…
Original Post: Tidying Data in Python
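
To make the idea of a standardized structure concrete, here is a minimal sketch of one common tidying step: reshaping a wide table, where column headers hold values such as years, into a long table with one row per observation. It is not taken from the original post. The `melt` helper and the sample rows are hypothetical, and only the standard library is used so the example runs anywhere.

```python
def melt(rows, id_column, var_name, value_name):
    """Reshape wide rows (one per entity) into long rows (one per observation)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append({
                id_column: row[id_column],
                var_name: column,   # former column header becomes a value
                value_name: value,  # former cell becomes the measurement
            })
    return tidy


# Hypothetical wide table: the year is stored in the column headers.
wide = [
    {"name": "A", "2015": 3, "2016": 5},
    {"name": "B", "2015": 4, "2016": 1},
]

tidy = melt(wide, id_column="name", var_name="year", value_name="count")
for record in tidy:
    print(record)
```

With pandas, `pandas.melt` performs the same kind of reshaping on a DataFrame through its `id_vars`, `var_name`, and `value_name` parameters.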

Supercharge Your Data Science Team with AnacondaCON Team Discount, till Jan 16

2017: THE YEAR OF OPEN DATA SCIENCE
2016 was a year of staggering growth for Anaconda, with total downloads topping 11M. Whether you’re a new user or a longstanding member of the #AnacondaCREW, AnacondaCON ’17 will help you conquer your biggest data science challenges. Learn from more than 20 presentations by industry experts sharing what #OpenDataScienceMeans and their best practices for leveraging Anaconda.
BUSINESS TRACK
From discovering cures for rare genetic diseases to city planning and tax policy analyses, data scientists are using Anaconda for advanced machine learning and creating interactive dashboards and apps with rich visualizations. Hear from the leaders defining what #OpenDataScienceMeans for business.
TECHNOLOGY TRACK
Open Data Science is advancing at exponential speeds. How does machine learning address business problems? What’s the role of Open Data Science in the new data-driven culture? What does Artificial Intelligence mean…
Original Post: Supercharge Your Data Science Team with AnacondaCON Team Discount, till Jan 16

5 Machine Learning Projects You Can No Longer Overlook, January

Previous instalments of “5 Machine Learning Projects You Can No Longer Overlook” brought to light a number of lesser-known machine learning projects, and included both general purpose and specialized machine learning libraries and deep learning libraries, along with auxiliary support, data cleaning, and automation tools. After a hiatus, we thought the idea deserved another follow-up. This post will showcase 5 machine learning projects that you may not yet have heard of, including those from across a number of different ecosystems and programming languages. You may find that, even if you have no requirement for any of these particular tools, inspecting their broad implementation details or their specific code may help in generating some ideas of your own. Like the previous iteration, there are no formal criteria for inclusion beyond projects that have caught my eye over time spent online, and…
Original Post: 5 Machine Learning Projects You Can No Longer Overlook, January

Over 600 data science, machine learning, Big Data eBooks/videos for only $5 (until Jan 9)

Packt have more than 600 data science, analysis, machine learning and Big Data eBooks and video courses. And right now every single one is available for just $5. Why? Because until January 9th 2017, every single eBook and video on their website is $5. That makes it the perfect time to prepare for 2017 and build your own personal library of tech resources. Whether you want to hone your skills and reset your focus, or expand your horizons from data to Docker to DevOps to design, with Packt you can. Start your $5 search with Packt now. Packt have also put together a range of handpicked bundles – so you can grab 5 related eBooks for $25. That means you can be confident you’re getting a truly comprehensive set of content for one awesome price.
Original Post: Over 600 data science, machine learning, Big Data eBooks/videos for only $5 (until Jan 9)

3 ways to learn Data Science at Statistics.com

Get the personal touch that you need to deepen your learning. Statistics.com classes are small, with rich and engaging content that includes readings, videos, quizzes, homework, projects, and practical work with software. All courses are taught online by well-respected instructors (most are authors of the text you will use, or practitioners in the field) who will answer all your questions on a private discussion forum. Use promo code holidaykdn16 for $116 off each course until Dec 31, 2016.
1. Certificates in Data Science
Want to expand your knowledge in data science while remaining in your current job? These two online certificates have 10 courses each (each course is 4 weeks long) and are taught in small cohorts by leading text authors or experts in each subject. The cost is $5,000 and each course takes about 10-15 hours per week to…
Original Post: 3 ways to learn Data Science at Statistics.com

Top KDnuggets tweets, Dec 7-13: Want to learn Numpy? A Github repo of Numpy learning exercises

Top 10 most engaging Tweets
Want to learn Numpy? A Github repo of Numpy learning exercises #Python https://t.co/YWXoMNPSRV https://t.co/nYmVByFw4y
Deep Learning Papers Reading Roadmap: “Which paper should I start reading from?” https://t.co/idorZM8RC0 https://t.co/W9l4eqiJ0I
#ICYMI Free ebooks: #MachineLearning with #Python and Practical Data Analysis https://t.co/KUAAu2eyRz https://t.co/16O9NMBiG3
Great resource! A complete daily plan for studying to become a Google software engineer https://t.co/AO7e3N4g9y https://t.co/kfQ99CnTSk
Free #ebooks: #MachineLearning with #Python, Practical #Data #Analysis #DataScience @PacktPub https://t.co/2HKrEQCnW2 https://t.co/Slqr8F13Bm
4 Cognitive Bias Key Points Data Scientists Need to Know https://t.co/ucE4QWtnPR #DataScience https://t.co/d9pXB0STQm
Top tweets: Great collection of clean implementations of #MachineLearning algorithms; ML Yearning free download https://t.co/E21MXfTZNp https://t.co/BmwOqvaJs3
Bayesian Basics, Explained https://t.co/5DqetmKbLn #Bayes https://t.co/5Tj1tOszuV
Open sourcing the Embedding Projector: a tool for visualizing high dimensional data https://t.co/J09SF3L7kw https://t.co/Sq8t8ivben
Google #OpenSources its #DataVisualization tool, Embedding Projector – researchers can see data w/out #TensorFlow https://t.co/jVSF2mHicA https://t.co/tCPNQ0skWA
Top Stories…
Original Post: Top KDnuggets tweets, Dec 7-13: Want to learn Numpy? A Github repo of Numpy learning exercises

50+ Data Science, Machine Learning Cheat Sheets, updated

This post updates a previous, very popular post, 50+ Data Science, Machine Learning Cheat Sheets. If we missed some popular cheat sheets, add them in the comments below.
Cheatsheets on Python, R and Numpy, Scipy, Pandas
Data science is a multi-disciplinary field. Thus, there are thousands of packages and hundreds of programming functions out there in the data science world! An aspiring data enthusiast need not know them all. A cheat sheet or reference card is a compilation of the most commonly used commands that helps you learn a language’s syntax faster. Here are the most important ones that have been brainstormed and captured in a few compact pages. Mastering data science involves an understanding of statistics, mathematics, and programming (especially in R, Python & SQL), and then deploying a combination of all these to derive insights using the business understanding &…
Original Post: 50+ Data Science, Machine Learning Cheat Sheets, updated