Data Science Primer: Basic Concepts for Beginners

This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.
Original Post: Data Science Primer: Basic Concepts for Beginners

Analytically Speaking Featuring Pedro Saraiva, July 12

Former academic and now Portuguese MP Pedro Saraiva says that parliaments and societies will improve if more people with a good statistical background become MPs. Learn about the paradoxes and issues in statistics and politics.
Original Post: Analytically Speaking Featuring Pedro Saraiva, July 12

Is Regression Analysis Really Machine Learning?

What separates “traditional” applied statistics from machine learning? Is statistics the foundation on top of which machine learning is built? Is machine learning a superset of “traditional” statistics? Do these two concepts have a third unifying concept in common? So, in that vein… is regression analysis actually a form of machine learning?
Original Post: Is Regression Analysis Really Machine Learning?

Who is the caretaker? Evidence-based probability estimation with the bnlearn package

by Juan M. Lavista Ferres, Senior Director of Data Science at Microsoft

In what was one of the most viral episodes of 2017, political science professor Robert E. Kelly was live on BBC World News talking about the South Korean president being forced out of office when both his kids decided to take an easy path to fame by showing up in their dad’s interview. The video immediately went viral, and the BBC reported that within five days more than 100 million people from all over the world had watched it. Many people around the globe, on Facebook and Twitter, as well as reporters from reliable sources like Time.com, thought the woman who went after the children was their nanny, when in fact the woman in the video was Robert’s wife, Jung-a Kim, who is Korean. The confusion over this episode caused…
Original Post: Who is the caretaker? Evidence-based probability estimation with the bnlearn package

An Introduction to Spatial Data Analysis and Visualization in R

The Consumer Data Research Centre, the UK-based organization that works with consumer-related organisations to open up their data resources, recently published a new course online: An Introduction to Spatial Data Analysis and Visualization in R. Created by James Cheshire (whose blog Spatial.ly regularly features interesting R-based data visualizations) and Guy Lansley, both of University College London Department of Geography, this practical series is designed to provide an accessible introduction to techniques for handling, analysing and visualising spatial data in R. In addition to a basic introduction to R, the course covers specialized topics around handling spatial and geographic data in R, including: making maps in R; mapping point data in R; using R to create, explore and interact with data maps (like the one shown below); and performing statistical analysis on spatial data (interpolation and kriging, spatial autocorrelation, geographically weighted regression and more). The course, tutorials…
Original Post: An Introduction to Spatial Data Analysis and Visualization in R

Madrid UPM Advanced Statistics and Data Mining Summer School, June 26 – July 7

The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series. Each course has theoretical as well as practical classes, taught with R or Python. Early bird until June 5.
Original Post: Madrid UPM Advanced Statistics and Data Mining Summer School, June 26 – July 7

Because it's Friday: Bayesian Trap

If you get a blood test to diagnose a rare disease, and the test (which is very accurate) comes back positive, what’s the chance you have the disease? Well, if “rare” means only 1 in a thousand people have the disease, and “very accurate” means the test returns the correct result 99% of the time, the answer is … just 9%. There’s less than a 1 in 10 chance you actually have the disease (which is why your doctor will likely have you tested a second time). Now that result might seem surprising, but it makes sense if you apply Bayes’ Theorem. (A simple way to think of it is that in a population of 1000 people, about 10 healthy people will get a false positive test result, plus the one who actually has the disease. One in eleven of the positive results, or 9%,…
Original Post: Because it's Friday: Bayesian Trap
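
The 9% figure in the Bayesian Trap excerpt can be checked with a few lines of Python. This is a minimal sketch, not code from the original post: it assumes that "returns the correct result 99% of the time" means the test has both 99% sensitivity and 99% specificity, and the function and parameter names are hypothetical, chosen for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Probability of having the disease given a positive test (Bayes' Theorem)."""
    # Share of the population that is sick AND tests positive.
    true_positive = prevalence * sensitivity
    # Share of the population that is healthy AND tests positive anyway.
    false_positive = (1 - prevalence) * (1 - specificity)
    # Of everyone who tests positive, the fraction who are actually sick.
    return true_positive / (true_positive + false_positive)


# Assumed reading of the excerpt: 1 in 1000 prevalence, 99% accuracy both ways.
posterior = posterior_given_positive(
    prevalence=0.001, sensitivity=0.99, specificity=0.99
)
print(f"P(disease | positive test) = {posterior:.1%}")
```

Under these assumptions the result lands at roughly 9%, matching the "one in eleven" shortcut in the excerpt: about ten false positives for every true positive.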