Because it's Friday: Olive Garden Bot

Comedy writer Keaton Patti claims this commercial script for a US Italian restaurant chain was generated by a bot: "I forced a bot to watch over 1,000 hours of Olive Garden commercials and then asked it to write an Olive Garden commercial of its own. Here is the first page." pic.twitter.com/CKiDQTmLeH — Keaton Patti (@KeatonPatti) June 13, 2018. Of course this wasn't bot-generated, but "what a bot might write" is fertile ground for comedy: "I forced a bot to read over 1,000 tweets claiming to be scripts written by bots online, then asked it to write a script itself. It wrote these two pages, then hung itself." pic.twitter.com/KbFXStJuAb — Christine Love (@christinelove) June 1, 2018. That's all from us here at the blog for this week. Have a great weekend, and we'll be back next week.
Original Post: Because it's Friday: Olive Garden Bot

Interpreting machine learning models with the lime package for R

Many types of machine learning classifiers, not least commonly used techniques such as ensemble models and neural networks, are notoriously difficult to interpret. If the model produces a surprising label for any given case, it's difficult to answer the question, "why that label, and not one of the others?". One approach to this dilemma is the technique known as LIME (Local Interpretable Model-Agnostic Explanations). The basic idea is that while for highly non-linear models it's impossible to give a simple explanation of the relationship between any one variable and the predicted classes at a global level, it might be possible to assess which variables are most influential on the classification at a local level, in the neighborhood of a particular data point. A procedure for doing so is described in this 2016 paper by Ribeiro et al., and implemented in the R…
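As a rough sketch of what the LIME workflow looks like in practice, here's a minimal example using the lime package with a caret random forest on the built-in iris data (the model and data are stand-ins for illustration, not the example from the post):

```r
library(caret)   # for train()
library(lime)

# Train a black-box classifier; lime has built-in support for caret models
set.seed(1)
idx   <- sample(nrow(iris), 130)
model <- train(Species ~ ., data = iris[idx, ], method = "rf")

# Build an explainer from the training features, then explain two held-out cases
explainer   <- lime(iris[idx, 1:4], model)
explanation <- explain(iris[-idx, 1:4][1:2, ], explainer,
                       n_labels = 1, n_features = 3)

# Show which variables pushed each prediction toward (or away from) its label
plot_features(explanation)
```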
Original Post: Interpreting machine learning models with the lime package for R

Detecting unconscious bias in models, with R

There's growing awareness that the data we collect, and in particular the variables we include as factors in our predictive models, can lead to unwanted bias in outcomes, in areas ranging from loan applications to law enforcement and beyond. In some instances, such bias is even directly regulated by laws like the Fair Housing Act in the US. But even if we explicitly remove "obvious" variables like sex, age or ethnicity from predictive models, unconscious bias might still be a factor in our predictions as a result of highly correlated proxy variables that remain in the model. As a result, we need to be aware of the biases in our model and take steps to address them. For an excellent general overview of the topic, I highly recommend watching the recent presentation by Rachel Thomas, "Analyzing and Preventing Bias in ML".…
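To make the proxy-variable point concrete, here's a small simulated sketch (hypothetical variable names and made-up data): the model never sees the protected attribute, yet a correlated proxy lets its predictions diverge sharply between groups.

```r
set.seed(42)
n     <- 5000
group <- factor(sample(c("A", "B"), n, replace = TRUE))   # protected attribute
proxy <- rnorm(n, mean = ifelse(group == "A", 0, 1))      # correlated proxy (e.g. a zip-code score)
y     <- rbinom(n, 1, plogis(-0.5 + 1.2 * proxy))         # simulated outcome

# The model is trained without the protected attribute, only the proxy
fit <- glm(y ~ proxy, family = binomial)

# Yet predicted "approval" rates still differ markedly by group
tapply(predict(fit, type = "response"), group, mean)
```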
Original Post: Detecting unconscious bias in models, with R

Hotfix for Microsoft R Open 3.5.0 on Linux

On Monday, we learned about a serious issue with the installer for Microsoft R Open on Linux-based systems. (Thanks to Norbert Preining for reporting the problem.) The issue was that the installation and de-installation scripts would modify the system shell, and did not use the standard practices for creating and restoring symlinks to system applications. The Microsoft R team developed a solution to the problem with the help of some Debian experts at Microsoft, and last night issued a hotfix for Microsoft R Open 3.5.0, which is now available for download. With this fix, the MRO installer no longer relinks /bin/sh to /bin/bash, and instead uses dpkg-divert on Debian-based platforms and update-alternatives on RPM-based platforms. We will also request a discussion with the Debian maintainers of R to further review our installation process. Finally, with the next release, MRO 3.5.1, scheduled for…
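If you'd like to verify the fix on your own machine, a quick sanity check from R might look like the sketch below (Debian/Ubuntu assumed; the expected output is an assumption on my part, not something stated in the post):

```r
# /bin/sh should still resolve to dash, not bash, after installing MRO
Sys.readlink("/bin/sh")

# List any files the package manager has diverted (requires dpkg-divert on the PATH)
system2("dpkg-divert", "--list")
```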
Original Post: Hotfix for Microsoft R Open 3.5.0 on Linux

Microsoft R Open 3.5.0 now available

Microsoft R Open 3.5.0 is now available for download for Windows, Mac and Linux. This update includes the open-source R 3.5.0 engine, a major release with many new capabilities and improvements to R. In particular, it includes a major new framework for handling data in R, with some significant behind-the-scenes performance and memory-use benefits (and with further improvements expected in the future). Microsoft R Open 3.5.0 points to a fixed CRAN snapshot taken on June 1, 2018. This provides a reproducible experience when installing CRAN packages by default, but you can always change the default CRAN repository or use the built-in checkpoint package to access snapshots of packages from an earlier or later date. Relatedly, many new packages have been released since the last release of Microsoft R Open, and you can browse a curated list of some interesting ones on the Microsoft…
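As a quick illustration of both options, here's a minimal sketch (the 2018-07-01 date is purely illustrative):

```r
# Option 1: point the default repository at a different MRAN snapshot
options(repos = c(CRAN = "https://mran.microsoft.com/snapshot/2018-07-01"))

# Option 2: use the checkpoint package to install packages as of a given date
library(checkpoint)
checkpoint("2018-07-01")
```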
Original Post: Microsoft R Open 3.5.0 now available

In case you missed it: May 2018 roundup

In case you missed them, here are some articles from May of particular interest to R users. The R Consortium has announced a new round of grants for projects proposed by the R community. A look back at the rOpenSci unconference held in Seattle. Video of my European R Users Meeting talk, "Speeding up R with Parallel Programming in the Cloud". Slides from my talk at the Microsoft Build conference, "Open-Source Machine Learning in Azure". Discussions on Twitter: R packages by stage of data analysis; thinking differently about AI development; and why package management is harder in Python than in R. Our May 2018 roundup of AI and data science news. Panelist Francesca Lazzeri reviews the Mind Bytes AI conference in Chicago. And some general interest stories (not necessarily related to R): a really bad road in Nepal; the definitive answer…
Original Post: In case you missed it: May 2018 roundup

What's new in Azure for Machine Learning and AI

There were a lot of big announcements at last month's Build conference, and many of them were related to machine learning and artificial intelligence. My colleague Tim Heuer and I summarized some of the big announcements, along with a few you may have missed, in a recent webinar. The slides are embedded below, and include links to recordings of the Build sessions where you can find in-depth details. Unfortunately, you can't see the videos or demos in the slides; my favorite is a demo of using Microsoft Translator, trained by a hearing-impaired user, to accurately transcribe "deaf voice". But you can find the videos and discussion from Tim and me in the on-demand recording available at the link below. Azure Webinar Series: Top Azure Takeaways from Microsoft Build
Original Post: What's new in Azure for Machine Learning and AI

Because it's Friday: Buildings shake

In 1978, a 59-story skyscraper in New York City was at risk of collapse. An engineering flaw, serendipitously discovered by an architecture student studying the Citigroup Center as a thesis project, meant the building was unexpectedly susceptible to winds, just as Hurricane Ella was bearing down on the eastern seaboard. Meanwhile, 2,500 Red Cross volunteers were on standby to execute a 10-block-radius evacuation plan should the building topple (and possibly cause a domino-like chain reaction), while engineers worked to reinforce the structural integrity of the building. And all of this happened in secret. (Image credit: Joel Werner) A recent tweet from a former resident of the building reminded me of this remarkable story. To learn more about the unusual stilt-based design of the building and how the design flaw was discovered, check out the article from 99% Invisible or listen to the accompanying…
Original Post: Because it's Friday: Buildings shake

StatCheck the Game

If you don’t get enough joy from publishing scientific papers in your day job, or simply want to experience what it’s like to be in a publish-or-perish environment where the P-value is the only important part of a paper, you might want to try StatCheck: the board game where the object is to publish two papers before any of your opponents. As the game progresses, players combine “Test”, “Statistic” and “P-value” cards to form the statistical test featured in the paper (and of course, significant tests are worth more than non-significant ones). Opponents may then have the opportunity to play a “StatCheck” card to challenge the validity of the test, which can then be verified using a companion R package or online Shiny application. Other modifier cards include “Bayes Factor” (which can be used to boost the value of your…
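As a rough sketch of how such a challenge might be settled with the companion R package, the statcheck package recomputes the p-value from a reported test statistic and flags any mismatch (the quoted sentence below is invented for illustration):

```r
library(statcheck)

# A deliberately inconsistent APA-style result: t(28) = 2.20 implies a
# two-tailed p of roughly .036, not the reported p = .01
res <- statcheck("The effect was significant, t(28) = 2.20, p = .01.")
res   # the output reports the recomputed p-value and flags the inconsistency
```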
Original Post: StatCheck the Game