A new version, 0.0.9, of RcppAnnoy, our Rcpp-based R integration of Erik Bernhardsson’s nifty Annoy library, is now on CRAN. Annoy is a small and lightweight C++ template header library for very fast approximate nearest neighbours. This release corrects an issue for Windows users discovered by GitHub user ‘khoran’, who later also suggested the fix of using binary mode. It upgrades to Annoy release 1.9.1 and brings its new Manhattan distance to RcppAnnoy. A number of unit tests were added as well, and we updated some packaging internals such as symbol registration. And I presume I was on a good streak emailing with Uwe’s robots, as the package made it onto CRAN rather smoothly within ten minutes of submission. Changes in this version are summarized here:

Changes in version 0.0.9 (2017-08-31)

- Synchronized with Annoy upstream version 1.9.1
- Minor updates in…
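A minimal sketch of trying the new Manhattan distance from R. This assumes the `AnnoyManhattan` class exposed by RcppAnnoy as of this release; the dimension and random data are made up for illustration:

```r
library(RcppAnnoy)

f <- 3                       # number of dimensions (illustrative)
a <- new(AnnoyManhattan, f)  # index using the new Manhattan (L1) distance
set.seed(42)
for (i in 0:9) a$addItem(i, runif(f))  # Annoy items are 0-indexed
a$build(10)                  # build a forest of 10 trees
a$getNNsByItem(0, 3)         # 3 approximate nearest neighbours of item 0
```

The other distance classes (`AnnoyAngular`, `AnnoyEuclidean`) follow the same `addItem` / `build` / `getNNsByItem` pattern.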

Original Post: RcppAnnoy 0.0.9

# Posts for August 2017

## OpenML Workshop 2017

What is OpenML? The field of Machine Learning has grown tremendously in recent years and is a key component of data-driven science. Data analysis algorithms are being invented and used every day, but their results and experiments are published almost exclusively in journals or separate repositories. However, data by itself has no value. It’s the ever-changing ecosystem surrounding data that gives it meaning. OpenML is a networked science platform that aims to connect and organize all this knowledge online, linking data, algorithms, results and people into a coherent whole so that scientists and practitioners can easily build on prior work and collaborate in real time online. OpenML has an online interface on openml.org and is integrated into the most popular machine learning tools and statistical environments such as R, Python, WEKA, MOA and RapidMiner. This allows researchers and students…

Original Post: OpenML Workshop 2017

## Mapping to a ‘t'(map)

tmap <- Easy & Interactive – More maps of the Highlands? Yep, same as last time, but no need to install dev versions of anything; we can get awesome maps courtesy of the tmap package. Get the shapefile from the last post:

```r
library(tmap)
library(tmaptools)
library(viridis)

scot <- read_shape("SG_SIMD_2016.shp", as.sf = TRUE)
highland <- scot[scot$LAName == "Highland", ]

# replicate plot from previous blog post:
quint <- tm_shape(highland) +
  tm_fill(col = "Quintile",
          palette = viridis(n = 5, direction = -1, option = "C"),
          title = "SIMD 2016 - Highland Council Area by Quintile")
quint  # plot

ttm()  # switch between static and interactive - this will use interactive
quint  # or use last_map(); in RStudio you will find the leaflet map in your Viewer tab
ttm()  # return to plotting
```

The results: One really nice thing is that because the polygons don’t have outlines,…

Original Post: Mapping to a ‘t'(map)

## Transformer: A Novel Neural Network Architecture for Language Understanding

## The Proof-Calculation Ping Pong

Abstract The proof-calculation ping-pong is the process of iteratively improving a statistical analysis by comparing results from two independent analysis approaches until agreement. We use the daff package to simplify the comparison of the two results. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Public License (GPL v3) license from . Introduction If you are a statistician working in climate science, data-driven journalism, official statistics, public health, economics or any related field working with real data, chances are that you have to perform analyses where you know there is zero tolerance for errors. The easiest way to ensure the correctness of such an analysis is to check your results over and over again (the iterated 2-eye principle). A better approach is to…
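The comparison step can be sketched with daff’s tabular diff. The two small result tables below are hypothetical stand-ins for the outputs of two independent analysts:

```r
library(daff)

# Hypothetical results from two independent analyses of the same data
res_a <- data.frame(id = 1:3, estimate = c(1.20, 2.50, 3.10))
res_b <- data.frame(id = 1:3, estimate = c(1.20, 2.55, 3.10))

d <- diff_data(res_a, res_b)  # tabular diff: which cells disagree?
render_diff(d)                # view the highlighted differences
```

Here the diff flags the single disagreeing cell (row 2 of `estimate`), which is exactly the starting point for the next round of the ping-pong.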

Original Post: The Proof-Calculation Ping Pong

## What we learned labeling 1 million images

In this guide you’ll learn how to scope a computer vision project, what kind of source data you need to make it successful, what kind of tools fit your project best, and a whole lot more.

Original Post: What we learned labeling 1 million images

## Multiplicative Congruential Generators in R

Multiplicative congruential generators, also known as Lehmer random number generators, are a type of linear congruential generator for generating pseudorandom numbers in (0, 1). The multiplicative congruential generator, often abbreviated as MLCG or MCG, is defined by a recurrence relation similar to the LCG but with increment c = 0: X_{n+1} = a X_n mod m. Unlike the LCG, the parameters a and m for multiplicative congruential generators are more restricted, and the initial seed X_0 must be relatively prime to the modulus m (the greatest common divisor between X_0 and m is 1). The parameters in common use are m = 2^31 – 1 and a = 16807. However, in a correspondence from the Communications of the ACM, Park, Miller and Stockmeyer changed the value of the parameter a, stating: The minimal standard Lehmer generator we advocated had a modulus of m = 2^31 – 1 and a multiplier of a = 16807. Relative to this particular choice of multiplier, we wrote “……
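The recurrence above is short enough to sketch directly in base R. This is a minimal illustration with the Park–Miller parameters m = 2^31 – 1 and a = 16807, not a replacement for R’s built-in generators (the product a·X_n stays below 2^53, so plain double arithmetic is exact here):

```r
# Minimal sketch of a multiplicative congruential (Lehmer) generator:
# X_{n+1} = a * X_n mod m, scaled into (0, 1).
mcg <- function(n, seed = 1, a = 16807, m = 2^31 - 1) {
  x <- numeric(n)
  x[1] <- (a * seed) %% m
  for (i in seq_len(n - 1)) {
    x[i + 1] <- (a * x[i]) %% m
  }
  x / m  # scale the integer states into (0, 1)
}

mcg(5, seed = 1)
```

With seed 1 the raw integer states begin 16807, 282475249, …, the well-known minimal-standard sequence.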

Original Post: Multiplicative Congruential Generators in R

## Top 10 Machine Learning Use Cases: Part 1

This post is the first in a series whose aim is to shake up our intuitions about what machine learning is making possible in specific sectors — to look beyond the set of use cases that always come to mind.

Original Post: Top 10 Machine Learning Use Cases: Part 1

## Probability functions intermediate

In this set of exercises, we are going to explore some of the probability functions in R through practical applications. Basic probability knowledge is required. In case you are not familiar with the apply function, check the R documentation. Note: We are going to use random number functions and random process functions in R, such as runif. A problem with these functions is that every time you run them, you will obtain a different value. To make your results reproducible, you can specify the value of the seed using set.seed() with any number before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random number process.) For this set of exercises, we will use set.seed(1). Don’t forget to specify it before every exercise that includes random numbers. Answers to the exercises…
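The reproducibility point is easy to see in a couple of lines: setting the same seed before each call makes a random function return the same values.

```r
set.seed(1)
a <- runif(3)  # three uniform draws

set.seed(1)
b <- runif(3)  # same seed, so the same three draws

identical(a, b)  # TRUE
```

This is why each exercise asks you to call set.seed(1) immediately before generating random numbers: the seed pins down the whole sequence that follows.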

Original Post: Probability functions intermediate