Using Microsoft’s Azure Face API to analyze videos (in R)

Microsoft had a cool API called “Emotion API”. With it you could submit a URL of a video, and the API would return a JSON file with the faces and emotions expressed in the video (per frame). However, that API never matured out of preview mode and was in fact deprecated on October 30th, 2017 (it no longer works). I stumbled upon the Emotion API when I read a post by Kan Nishida from exploratory.io, which analyzed the facial expressions of Trump and Clinton during a presidential debate last year. However, that tutorial (like many others) no longer works, since it used the old Emotion API. Recently I needed a tool for an analysis I did at work on the facial expressions in TV commercials. I had a list of videos showing faces, and I needed to code these faces into…
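The post's video workflow is truncated above, but the core building block is a single-image call to the Face API's detect endpoint. A minimal sketch with httr, assuming a `westus` region and using placeholder values for the key and image URL (the exact endpoint for your subscription may differ):

```r
# Hedged sketch: one Face API detect call requesting emotion attributes.
# The endpoint region, key and image URL below are placeholders.
library(httr)

detect_faces <- function(image_url, key,
                         endpoint = "https://westus.api.cognitive.microsoft.com/face/v1.0/detect") {
  res <- POST(
    endpoint,
    query = list(returnFaceAttributes = "emotion"),
    add_headers(`Ocp-Apim-Subscription-Key` = key),
    body = list(url = image_url),
    encode = "json"
  )
  stop_for_status(res)
  content(res)  # parsed JSON: one list element per detected face
}
```

Analyzing a video then amounts to extracting frames, calling a function like this per frame, and binding the per-frame results into one data frame.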
Original Post: Using Microsoft’s Azure Face API to analyze videos (in R)

A Workaround For When Anti-DDoS Also Means Anti-Data

More sites are turning to services like Cloudflare due to just how stupid-easy it is to DDoS a site. Sometimes the DDoS is intentional (malicious). Sometimes it’s because your bot didn’t play nice (stop that, btw). Sadly, at some point, most of us with “vital” sites are going to have to pay protection money to one of these services unless law enforcement or ISPs do a better job stopping DDoS attacks (killing the plethora of pwnd IoT devices that make up one of the largest for-rent DDoS services out there would be a good start). Soapbox aside, sites like this one — https://www.bitmarket.pl/docs.php?file=api_public.html — (which was giving an SO poster trouble) have DDoS protection enabled, but they also want you to be able to automate the downloads (this one even calls it an “API”). However, try to grab one of the…
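The post's actual workaround is cut off above, so this is only one generic approach to the problem it describes, not necessarily the author's: present browser-like headers and reuse the same httr handle so that any cookies set by the anti-DDoS interstitial carry over to later requests.

```r
# Hedged sketch of a generic approach (not necessarily the post's method):
# send a browser-like User-Agent and reuse one handle so cookies persist.
library(httr)

ua <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
h  <- handle("https://www.bitmarket.pl")

# First request may return the interstitial page and set cookies:
first <- GET(handle = h, path = "docs.php",
             query = list(file = "api_public.html"), ua)

# Subsequent requests on the same handle reuse those cookies, which is
# sometimes enough to pass the check (JavaScript challenges are not).
second <- GET(handle = h, path = "docs.php",
              query = list(file = "api_public.html"), ua)
```

When the protection requires executing JavaScript, a headless browser is usually the fallback.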
Original Post: A Workaround For When Anti-DDoS Also Means Anti-Data

Naming things is hard

Prefixing R function names – ‘There are only two hard things in Computer Science: cache invalidation and naming things.’ The above quip by Phil Karlton is fairly well known and often quoted, sometimes with amusing extensions: “There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.” — Jeff Atwood (@codinghorror) August 31, 2014. “There are only 2 hard things in computer science: 0. Cache invalidation, 1. Naming things, 7. Asynchronous callbacks, 2. Off-by-one errors” — Paweł Zajączkowski (@gvaireth) September 18, 2017. These are funny, but they also convey some truth: amid all the technicalities and abstractions we can get caught up with in the world of programming, it’s surprising how often something as seemingly ‘simple’ as naming things trips us up. I was recently reminded of this when a difference of opinions about function names…
Original Post: Naming things is hard

Diffusion/Wiener Model Analysis with brms – Part I: Introduction and Estimation

Stan is probably the most interesting development in computational statistics in the last few years, at least for me. The version of Hamiltonian Monte Carlo (HMC) implemented in Stan (NUTS) is extremely efficient, and the range of probability distributions implemented in the Stan language makes it possible to fit an extremely wide range of models. Stan has considerably changed which models I think can be realistically estimated, both in terms of model complexity and data size. It is not an overstatement to say that Stan (and particularly rstan) has considerably changed the way I analyze data. One of the R packages that lets you implement Stan models in a very convenient manner, and which has created a lot of buzz recently, is brms. It lets you specify a wide range of models using the R formula interface. Based on the formula…
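As a taste of the formula interface the excerpt mentions, here is a hedged sketch of fitting a Wiener diffusion model with brms; the data frame and variable names are invented for illustration, and fitting it for real requires a working Stan installation:

```r
# Hedged sketch: a Wiener diffusion model in brms.
# 'rt' is response time in seconds; 'response' codes the boundary (0/1).
# The toy data below are placeholders, not real response-time data.
library(brms)

d <- data.frame(rt        = runif(200, 0.4, 1.5),
                response  = rbinom(200, 1, 0.5),
                condition = rep(c("easy", "hard"), 100))

fit <- brm(
  rt | dec(response) ~ condition,   # drift rate varies by condition
  family = wiener(link_bs = "identity",
                  link_ndt = "identity",
                  link_bias = "identity"),
  data = d,
  chains = 4, cores = 4
)
summary(fit)
```

The `dec()` addition term tells brms which boundary each trial terminated at, while `rt` on the left-hand side is the modeled response time.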
Original Post: Diffusion/Wiener Model Analysis with brms – Part I: Introduction and Estimation

Scoring Multiple Variables, Too Many Variables and Too Few Observations: Data Reduction

This post will grow to cover questions about data reduction methods, also known as unsupervised learning methods. These are intended primarily for two purposes: (1) collapsing correlated variables into an overall score, so that one does not have to disentangle correlated effects, which is a difficult statistical task; and (2) reducing the effective number of variables to use in a regression or other predictive model, so that fewer parameters need to be estimated. The latter is the “too many variables, too few subjects” problem. Data reduction methods are covered in Chapter 4 of my book Regression Modeling Strategies, and in some of the book’s case studies. Sacha Varin writes 2017-11-19: I want to add/sum some variables having different units. I decide to standardize (Z-scores) the values and then, once transformed into Z-scores, I can sum them. The problem is that my variables’ distributions are…
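The z-score-then-sum approach Sacha describes can be sketched in a few lines on toy data; note that for skewed distributions (the concern the excerpt is leading into), scaling by a robust measure of spread rather than the standard deviation may be preferable:

```r
# Minimal sketch of the z-score approach: standardize each variable
# (subtract the mean, divide by the SD), then sum across columns.
# The data are a toy stand-in for variables measured in different units.
set.seed(1)
d <- data.frame(x1 = rnorm(100, 50, 10),   # e.g. kilograms
                x2 = runif(100, 0, 1),     # e.g. a proportion
                x3 = rexp(100, 2))         # e.g. a skewed duration

z     <- scale(d)        # column-wise (x - mean(x)) / sd(x)
score <- rowSums(z)      # combined score on a common, unit-free scale
```

After `scale()`, every column has mean 0 and SD 1, so no single variable dominates the sum simply because of its units.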
Original Post: Scoring Multiple Variables, Too Many Variables and Too Few Observations: Data Reduction

Using an R ‘template’ package to enhance reproducible research or the ‘R package syndrome’

Motivation Have you ever had the feeling that creating your data analysis reports came down to looking up, copy-pasting and reusing code from previous analyses? This approach is time consuming and prone to errors. If you frequently analyze similar data (or data types), e.g. from a standardized analysis workflow or different experiments on the same platform, automating your report creation via an R ‘template’ package can be a very useful and time-saving step. It also lets you focus on the important part of the analysis (i.e. the experiment- or analysis-specific part). Imagine that you need to analyze tens or hundreds of runs of data in the same format: an R ‘template’ package can save you hours, days or even weeks. Along the way, reports can be adjusted, errors corrected and extensions added without much effort. A bit of…
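One common way to structure such a template package (a sketch; the package name, file paths and parameter names here are all hypothetical, not from the post): ship a parameterized R Markdown file under `inst/`, plus a wrapper that renders it for a given run.

```r
# Hypothetical layout: a package 'myreports' ships a parameterized
# R Markdown template in inst/rmd/run_report.Rmd and exports this wrapper.
render_run_report <- function(run_csv, out_file = "report.html") {
  template <- system.file("rmd", "run_report.Rmd", package = "myreports")
  rmarkdown::render(template,
                    params      = list(data_file = run_csv),
                    output_file = out_file,
                    output_dir  = getwd())
}

# A whole batch of runs then becomes a one-liner:
# lapply(list.files("runs", pattern = "\\.csv$", full.names = TRUE),
#        render_run_report)
```

Because the template lives in one place, a fix or extension propagates to every future report automatically.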
Original Post: Using an R ‘template’ package to enhance reproducible research or the ‘R package syndrome’

intsvy: PISA for research and PISA for teaching

The Programme for International Student Assessment (PISA) is a worldwide study of 15-year-old school pupils’ scholastic performance in mathematics, science, and reading. Every three years, more than 500,000 pupils from 60+ countries are surveyed along with their parents and school representatives. The study yields more than 1,000 variables concerning the performance, attitudes and context of the pupils that can be cross-analyzed. A lot of data. OECD prepared manuals and tools for SAS and SPSS that show how to use and analyze this data. What about R? Just a few days ago, the Journal of Statistical Software published the article “intsvy: An R Package for Analyzing International Large-Scale Assessment Data”. It describes the intsvy package and gives instructions on how to download, analyze and visualize data from various international assessments with R. The package was developed by Daniel Caro and me.…
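As a flavor of the package's interface (a sketch from memory of its documented functions; consult the JSS article for exact signatures and for how to import the data): PISA scores are stored as plausible values, and intsvy provides estimators that handle them correctly.

```r
# Hedged sketch: mean of the mathematics plausible values by country.
# 'pisa_students' stands for a student-level PISA dataset imported
# beforehand with the package's import helpers.
library(intsvy)

pisa.mean.pv(pvlabel = "MATH", by = "CNT", data = pisa_students)
```

The point of functions like this is that they combine the plausible values and replicate weights for you, rather than treating a single score column as if it were an ordinary variable.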
Original Post: intsvy: PISA for research and PISA for teaching

Make memorable plots with memery. v0.3.0 now on CRAN.

memery is an R package that generates internet memes, including superimposed inset graphs and other atypical features, combining the visual impact of an attention-grabbing meme with the graphic results of data analysis. Version 0.3.0 of memery is now on CRAN. The latest development version and a package vignette are available on GitHub. Below is an example interleaving a semi-transparent ggplot2 graph between a meme image backdrop and overlying meme text labels. The meme function will produce basic memes without the need to specify a number of additional arguments, but this is not the main purpose of the package. Adding a plot is then as simple as passing the plot to inset. memery offers sensible defaults as well as a variety of basic templates for controlling how the meme and graph are spliced together. The example…
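Based on the description above, a call might look like the following sketch; the image path and output filename are placeholders, and argument names beyond `inset` (which the excerpt itself mentions) should be checked against the package vignette:

```r
# Hedged sketch: meme backdrop + text labels + an inset ggplot2 graph.
# 'my_image.jpg' is a placeholder for any local image or URL.
library(memery)
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

meme(img   = "my_image.jpg",
     lab   = "When the data speak for themselves",
     out   = "meme.jpg",
     inset = p)   # the plot is layered between image and text
```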
Original Post: Make memorable plots with memery. v0.3.0 now on CRAN.

How Happy is Your Country? — Happy Planet Index Visualized

The Happy Planet Index (HPI) is an index of human well-being and environmental impact introduced by NEF, a UK-based economic think tank promoting social, economic and environmental justice. It ranks 140 countries according to “what matters most — sustainable wellbeing for all”. The index tells us “how well nations are doing at achieving long, happy, sustainable lives”, and it is weighted to give progressively higher scores to nations with lower ecological footprints. I downloaded the 2016 dataset from the HPI website. Inspired by “Web Scraping and Applied Clustering Global Happiness and Social Progress Index” by Dr. Mesfin Gebeyaw, I wanted to find correlations among happiness, wealth, life expectancy, footprint and so on, and then put these 140 countries into different clusters according to those measures. I wonder whether the findings will surprise…
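The clustering idea can be sketched on stand-in data (the column names and distributions below are mine, not the HPI dataset's): standardize the measures so they share a scale, then run k-means.

```r
# Toy stand-in for the 2016 HPI measures across 140 countries.
# Column names and values are invented for illustration only.
set.seed(2016)
hpi <- data.frame(life_expectancy = rnorm(140, 71, 8),
                  wellbeing       = rnorm(140, 5.4, 1),
                  footprint       = rexp(140, 1/3))

km <- kmeans(scale(hpi), centers = 3, nstart = 25)
table(km$cluster)   # how many "countries" fall in each cluster
```

Scaling first matters because k-means uses Euclidean distance: without it, the variable with the largest raw variance would dominate the clustering.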
Original Post: How Happy is Your Country? — Happy Planet Index Visualized

Creating Reporting Template with Glue in R

In case you missed them, here are some articles from October of particular interest to R users. A recent survey of competitors on the Kaggle platform reveals that Python… MilanoR is a free event, open to all R users and enthusiasts or those who wish to learn more about R. The meeting consists of two talks (this time… I am working on a super-secret project for which I am harvesting a highly confidential source of data: twitter 🙂 The idea is to gather a small amount of… First off, here are the previous posts in my Bayesian sampling series: Bayesian Simple Linear Regression with Gibbs Sampling in R, Blocked Gibbs Sampling in R for Bayesian Multiple… The fourth update of the 3.6.x release of simmer, the Discrete-Event Simulator for…
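The excerpt above does not show the glue technique named in the post's title, so here is that technique in miniature: glue() interpolates R expressions inside braces, which makes lightweight reporting templates easy (the summary values below are invented for illustration).

```r
# Sketch of glue-based templating: expressions in {braces} are evaluated
# in R and interpolated into the string.
library(glue)

stats <- list(n = 150, mean_mpg = round(mean(mtcars$mpg), 1))
out <- glue("Analyzed {stats$n} records; mean mpg was {stats$mean_mpg}.")
out
# -> Analyzed 150 records; mean mpg was 20.1.
```

Because the template is an ordinary string, it can be stored in a file or a package and reused across reports with different inputs.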
Original Post: Creating Reporting Template with Glue in R