Probability functions beginner

In this set of exercises, we are going to explore some of the probability functions in R with practical applications. Basic probability knowledge is required. Note: we are going to use random number and random process functions in R such as runif. A quirk of these functions is that every time you run them you will obtain a different value. To make your results reproducible, you can set the seed with set.seed(n), where n is any integer, before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random numbers.) For this set of exercises we will use set.seed(1); don't forget to specify it before every random exercise. Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please…
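To see what seed-based reproducibility looks like in practice, here is a minimal sketch of our own (not part of the exercise set):

set.seed(1)
runif(3)  # three uniform draws on [0, 1]
set.seed(1)
runif(3)  # resetting the seed reproduces the same three draws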
Original Post: Probability functions beginner

Working with air quality and meteorological data Exercises (Part-1)

Atmospheric air pollution is one of the most important environmental concerns in many countries around the world, and it is strongly affected by meteorological conditions. Accordingly, in this set of exercises we use the openair package to work with and analyze air quality and meteorological data. This package provides tools to directly import data from air quality measurement networks across the UK, as well as tools to analyse the data and produce reports. In this exercise set we will import and analyze data from the MY1 station, which is located on Marylebone Road in London, UK. Answers to the exercises are available here. Please install and load the package openair before starting the exercises. Exercise 1: Import the MY1 data for the year 2016 and save it into a dataframe called my1data. Exercise 2: Get basic statistical summaries of the my1data dataframe. Exercise 3: Calculate monthly means of: a. pm10, b. pm2.5, …
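As a hint for Exercises 1 and 2, openair can pull AURN network data directly; a minimal sketch (assuming the "my1" site code and a working connection):

# install.packages("openair")  # if not already installed
library(openair)
# Import hourly data for the Marylebone Road (MY1) site for 2016.
my1data <- importAURN(site = "my1", year = 2016)
summary(my1data)  # basic statistical summaries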
Original Post: Working with air quality and meteorological data Exercises (Part-1)

Shinydashboards from right to left (localizing a shinydashboard to Hebrew)

Post by Adi Sarid (Sarid Institute for Research Services LTD.) Lately I've been working a lot with the shinydashboard library. Like shiny, it allows any R programmer to harness the power of R and create professional-looking interactive apps. The thing about shinydashboard is that it makes wonderful-looking dashboards. What I've been doing is creating dedicated dashboards for my customers. Since most of my customers speak, read, and write in Hebrew, I needed to fit that into my shinydashboard apps (i.e., fully localize the app). See an example of such a localized dashboard I made here. Localizing a shinydashboard turned out to be simpler than I thought. Since the average R programmer doesn't necessarily know and understand CSS, I thought I'd post my solution. This should fit any Hebrew or Arabic dashboard to work from right to left,…
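The author's actual CSS is behind the link above; as a rough sketch of the general idea (our own assumption, not his exact stylesheet), you can inject a right-to-left rule into the dashboard body:

library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Dashboard"),
  dashboardSidebar(),
  dashboardBody(
    # Flip the main layout to right-to-left; a real app will need more rules.
    tags$style(HTML("body, .wrapper { direction: rtl; }")),
    h3("Hebrew or Arabic content renders right to left")
  )
)

server <- function(input, output) {}
shinyApp(ui, server)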
Original Post: Shinydashboards from right to left (localizing a shinydashboard to Hebrew)

Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-6)

Statistics is often taught in school by and for people who like mathematics. As a consequence, in those classes the emphasis is put on learning equations, solving calculus problems and creating mathematical models instead of building an intuition for probabilistic problems. But if you are reading this, you know a bit of R programming and have access to a computer that is really good at computing stuff! So let's learn how we can tackle useful statistics problems by writing simple R queries, and how to think in probabilistic terms. In previous sets, we've seen how to compute probabilities based on certain density distributions, how to simulate situations to compute their probability, and how to use that knowledge to make decisions in obvious situations. But what is a probability? Is there a more scientific way to make those decisions? What is the p-value xkcd keeps talking…
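As a flavor of what "simulate situations to compute their probability" means, here is a minimal sketch of our own (not from the exercise set), estimating a tail probability by simulation and checking it against the exact distribution function:

set.seed(1)
# Probability of 60 or more heads in 100 fair coin flips.
flips <- replicate(10000, sum(sample(0:1, 100, replace = TRUE)))
mean(flips >= 60)                                        # simulated estimate
pbinom(59, size = 100, prob = 0.5, lower.tail = FALSE)   # exact value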
Original Post: Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-6)

ICML 2017 Thoughts

ICML 2017 has just ended. While Sydney is remote for those in Europe and North America, the conference center is a wonderful venue (with good coffee!), and the city is a lot of fun. Everything went smoothly and the organizers did a great job. You can get a list of papers that I liked from my Twitter feed, so instead I'd like to discuss some broad themes I sensed. Multitask regularization to mitigate sample complexity in RL: both in video games and in dialog, it is useful to add extra (auxiliary) tasks in order to accelerate learning. Leveraging knowledge and memory: our current models are powerful function approximators, but in NLP especially we need to go beyond "the current example" in order to exhibit competence. Gradient descent as inference: whether it's inpainting with a GAN or BLEU score maximization with an RNN, gradient descent is an…
Original Post: ICML 2017 Thoughts

New Course – Supervised Learning in R: Regression

Hello R users, a new course is hot off the press today by Nina Zumel – Supervised Learning in R: Regression! From a machine learning perspective, regression is the task of predicting numerical outcomes from various inputs. In this course, you'll learn about different regression models, how to train these models in R, how to evaluate the models you train, and how to use them to make predictions. Take me to chapter 1! Supervised Learning in R: Regression features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you a master in supervised learning with R! What you'll learn: Chapter 1: What is Regression? In this chapter you are introduced to the concept of regression from a machine learning point of view. We present the fundamental regression method: linear regression. You will learn how to fit a linear regression model and…
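For a taste of what fitting a linear regression in R looks like, here is a minimal sketch of our own (not taken from the course materials):

# Fit a linear model predicting fuel economy from weight and horsepower.
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)                                  # coefficients and fit diagnostics
predict(model, newdata = data.frame(wt = 3, hp = 120))  # a point prediction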
Original Post: New Course – Supervised Learning in R: Regression

Understanding the Math of Correspondence Analysis with Examples in R

Correspondence analysis is a popular tool for visualizing the patterns in large tables. To many practitioners it is probably a black box. Table goes in, chart comes out. In this post I explain the mathematics of correspondence analysis. I show each step of the calculation, and I illustrate all of the steps using R. If you've ever wanted a deeper understanding of what's going on behind the scenes of correspondence analysis, then this post is for you. The data that I analyze shows the relationship between thoroughness of newspaper readership and education level. It is a contingency table, which is to say that each number in the table represents the number of people in each pair of categories. For example, the cell in the top-left corner tells us that 5 people with some primary education glanced at the newspaper. The table shows the…
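If you just want the black-box version to compare against the hand calculations, a minimal sketch using the ca package (with its bundled smoke dataset rather than the post's newspaper table) looks like this:

library(ca)        # assumes install.packages("ca") has been run
data(smoke)        # a small contingency table shipped with the package
fit <- ca(smoke)   # run the correspondence analysis
summary(fit)       # inertia, coordinates, contributions
plot(fit)          # the familiar two-dimensional map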
Original Post: Understanding the Math of Correspondence Analysis with Examples in R

Data visualization with googleVis exercises part 10

Timeline, Merging & Flash charts This is part 10 of our series and we are going to explore the features of some interesting types of charts that googleVis provides like Timeline, Flash and learn how to merge two googleVis charts to one. Read the examples below to understand the logic of what we are going to do and then test yous skills with the exercise set we prepared for you. Lets begin! Answers to the exercises are available here. Package & Data frame As you already know, the first thing you have to do is install and load the googleVis package with:install.packages(“googleVis”)library(googleVis) Secondly we will create an experimental data frame which will be used for our charts’ plotting. You can create it with:datTLc <- data.frame(Position=c(rep(“President”, 3), rep(“Vice”, 3)),Name=c(“Washington”, “Adams”, “Jefferson”,”Adams”, “Jefferson”, “Burr”),start=as.Date(x=rep(c(“1789-03-29”, “1797-02-03″,”1801-02-03”),2)),end=as.Date(x=rep(c(“1797-02-03”, “1801-02-03″,”1809-02-03”),2))) You can explore the “datTLC” data…
Original Post: Data visualization with googleVis exercises part 10

Because it's Friday: People remain awesome

The People are Awesome people are making the rounds again: the Best of 2017 so far video is popping up all over the place. But that made me realize I'd missed the 2016 video late last year; if you did too, it has some amazing stunts (and editing!). We're all done for the week. Have a great weekend, and we'll be back with more here on the blog on Monday. Enjoy!
Original Post: Because it's Friday: People remain awesome

Dallas Animal Services: Shelter Intake Types vs. Outcomes Analysis

Thanks to Dallas OpenData, anyone has access to the city animal shelter records. If you lost or found a pet, it could be that he or she spent some time in a shelter – I personally took lost dogs there twice. It's unfortunate, but every year tens of thousands of animals find their way to shelters, with a significant fraction never finding a way out. The City of Dallas animal shelter dataset (I will share a detailed account of how the data is consumed and processed in R in another post) contains 5 types of animals, but dogs hold a solid lead:

[Chart: Admissions by Animal Types]

For consistency and plausibility of analysis, we will focus only on the records with dogs. More exactly, each animal gets admitted to a shelter with a certain intake type and gets discharged with a certain outcome:

[Chart: Dogs Admitted by Intake Types]

Top 3 reasons…
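The import details are deferred to the follow-up post; as a hypothetical sketch of the kind of filtering and counting involved (the Socrata endpoint and column names below are placeholders, not the post's actual code):

library(RSocrata)  # Dallas OpenData is a Socrata portal
library(dplyr)
# Placeholder endpoint; substitute the real dataset identifier.
shelter <- read.socrata("https://www.dallasopendata.com/resource/xxxx-xxxx.json")
shelter %>%
  filter(animal_type == "DOG") %>%   # column names are assumed
  count(intake_type, sort = TRUE)    # admissions by intake type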
Original Post: Dallas Animal Services: Shelter Intake Types vs. Outcomes Analysis