When tackling machine learning problems in R, most of the time you rely on existing libraries. This speeds up the analysis, but do you really understand what is behind the algorithms? Could you implement a logistic regression from scratch in R? The goal of this post is to create our own basic machine learning library from scratch with R, using only the linear algebra tools available in R. There will be three posts: linear and logistic regression (this one); PCA and k-nearest neighbors classifiers and regressors; tree-based methods and SVM. Linear Regression (Least Squares): the goal of linear regression is to estimate a continuous variable given a matrix of observations. Before dealing with the code, we need to derive the solution of the linear regression. Solution derivation of linear regression: given a matrix of observations and the…

Original Post: Create your Machine Learning library from scratch with R ! (1/3)
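
As a rough illustration of the least-squares step introduced in the excerpt above, here is a minimal sketch that uses only base R linear algebra. It assumes the derivation ends at the usual normal equations, (XᵀX)β = Xᵀy; the excerpt is cut off before the derivation, so this is the standard closed form, not necessarily the post's exact code. The function name `fit_linear_regression` and the simulated data are hypothetical.

```r
# Hypothetical helper (not from the original post): least-squares fit
# obtained by solving the normal equations (X'X) beta = X'y.
fit_linear_regression <- function(X, y, intercept = TRUE) {
  X <- as.matrix(X)
  # Prepend a column of ones so the first coefficient is the intercept
  if (intercept) X <- cbind(1, X)
  # solve(A, b) solves the linear system directly instead of inverting X'X
  beta <- solve(crossprod(X), crossprod(X, y))
  drop(beta)
}

# Simulated data, for illustration only
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
y <- 1 + 2 * X[, 1] - 3 * X[, 2] + rnorm(100, sd = 0.1)

beta_hat <- fit_linear_regression(X, y)
# Reference coefficients from R's built-in lm(), used as a sanity check
beta_ref <- unname(coef(lm(y ~ X)))
```

Comparing `beta_hat` with `beta_ref` is a quick way to check the hand-written solver against `lm()`.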

# Posts by Antoine Guillot

## Machine Learning Explained: Kmeans

Kmeans is one of the simplest and most popular algorithms for discovering underlying structure in your data. The goal of kmeans is simple: split your data into k different groups, each represented by its mean. The mean of each group is assumed to be a good summary of every observation in that cluster. Kmeans Algorithm: we assume that we want to split the data into k groups, so we need to find and assign k centers. How do we define and find these centers? They are the solution to the equation $(c_1, \dots, c_k) = \arg\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \lVert x_i - c_j \rVert^2$, where $a_{ij} = 1$ if the observation i is assigned to the center j and 0 otherwise. Basically, this equation means that we are looking for the k centers that minimize the distance between them and the points of their cluster. This is an optimization problem, but since the function we want to…

Original Post: Machine Learning Explained: Kmeans
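
The center-finding problem described in the excerpt above is commonly solved by alternating two steps: assign each observation to its nearest center, then move each center to the mean of its assigned observations. Below is a minimal base R sketch of that iteration (often called Lloyd's algorithm). The excerpt is cut off before it names a solving method, so this is a standard approach and not necessarily the post's own code. The function name `simple_kmeans`, its parameters, and the simulated data are hypothetical.

```r
# Hypothetical helper (not from the original post): basic k-means using
# alternating assignment and update steps. Assumes X has at least 2 rows.
simple_kmeans <- function(X, k, max_iter = 100) {
  X <- as.matrix(X)
  # Initialise the centers with k distinct observations chosen at random
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  cluster <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    # n x k matrix of squared Euclidean distances to each center
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    # Assignment step: index of the nearest center for every observation
    new_cluster <- max.col(-d2, ties.method = "first")
    # Stop once the assignments no longer change
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: each center becomes the mean of its assigned observations
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(X[members, , drop = FALSE])
    }
  }
  list(centers = centers, cluster = cluster)
}

# Two well-separated simulated groups, for illustration only
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(X, k = 2)
```

The result depends on the random initial centers, so in practice the procedure is often run from several starting points and the best solution kept.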

## Explore your McDonalds Meal with Shiny and D3partitionR

Have you ever wondered what is in your McDonald’s menu? Or in your Double Cheeseburger (well, it’s my favorite)? A wonderful dataset was released a few months ago; it contains all the nutrition facts for McDonald’s items. You can find the dataset here. In addition, I released a new version of D3partitionR a few weeks ago and was looking for use cases. Hierarchical charts like sunbursts or treemaps are very useful for splitting and analyzing the composition of categories and items. Hence, I decided to make a small Shiny application to analyze the composition and nutritional value of a McDonald’s menu. And here is the app! Application functionalities: the application has four main tabs: Menu selection, Calories explorer, Nutrients explorer, and Daily Value explorer. Menu Selection: the menu selection is used to … select the items you want to…

Original Post: Explore your McDonalds Meal with Shiny and D3partitionR

## Major update of D3partitionR: Interactive viz’ of nested data with R and D3.js

D3partitionR is an R package for interactively visualizing nested and hierarchical data using D3.js and HTML widgets. These last few weeks I’ve been working on a major D3partitionR update, which is now available on GitHub. As soon as enough feedback is collected, the package will be uploaded to CRAN. Until then, you can install it using devtools: `library(devtools)`, then `install_github("AntoineGuillot2/D3partitionR")`. Here is a quick overview of the possibilities using the Titanic data. A major update: this is a major update from the previous version and will break code written for 0.3.1. New functionalities: Additional data for nodes: additional data can be added for given nodes. For instance, if a comment or a link needs to be shown in the tooltip or label of some nodes, it can be added through the `add_nodes_data` function. You can easily add specific…

Original Post: Major update of D3partitionR: Interactive viz’ of nested data with R and D3.js