Highlights from the Connect(); conference

Connect();, the annual Microsoft developer conference, is wrapping up now in New York. The conference was the venue for a number of major announcements and talks. Here are some highlights related to data science, machine learning, and artificial intelligence: Lastly, I wanted to share this video presented at the conference from Stack Overflow. Keep an eye out for R community luminary David Robinson programming in R! You can find more from the Connect conference, including on-demand replays of the talks and keynotes, at the link below. Microsoft: Connect(); November 15-17, 2017
Original Post: Highlights from the Connect(); conference

The City of Chicago uses R to issue beach safety alerts

Among the many interesting talks I saw at the Domino Data Science Pop-Up in Chicago earlier this week was the presentation by Gene Lynes and Nick Lucius from the City of Chicago. The City of Chicago Tech Plan encourages smart communities and open government, and as part of that initiative the city has undertaken dozens of open-source, open-data projects in areas such as food safety inspections, preventing the spread of West Nile virus, and keeping sidewalks clear of snow. This talk was on the Clear Water initiative, a project to monitor the water quality of Chicago’s many public beaches on Lake Michigan, and to issue safety alerts (or in serious cases, beach closures) when E. coli levels in the water get too high. The problem is that E. coli levels can change rapidly: water levels can be normal for weeks,…
Original Post: The City of Chicago uses R to issue beach safety alerts

Good practices for sharing data in spreadsheets

Spreadsheets are powerful tools with many applications: collecting data, sharing data, visualizing data, analyzing data, reporting on data. Sometimes, the temptation to do all of these things in a single workbook is irresistible. But if your goal is to provide data to others for analysis, then features that are useful for, say, reporting are downright detrimental to the task of data analysis. To make things easier on your downstream analysts, and to reduce the risk of inadvertent errors that can be caused by spreadsheets, Karl Broman and Kara Woo have published a paper Data organization in spreadsheets chock-full of useful advice. To reiterate from their introduction: Spreadsheets are often used as a multipurpose tool for data entry, storage, analysis, and visualization. Most spreadsheet programs allow users to perform all of these tasks, however we believe that spreadsheets are best suited…
Original Post: Good practices for sharing data in spreadsheets

Updated curl package provides additional security for R on Windows

There are many R packages that connect to the internet, whether it’s to import data (readr), install packages from GitHub (devtools), connect with cloud services (AzureML), or many other web-connected tasks. There’s one R package in particular that provides the underlying connection between R and the Web: curl, by Jeroen Ooms, who is also the new maintainer for R for Windows. (The name comes from curl, a command-line utility and interface library for connecting to web-based services.) The curl package provides replacements for the standard url and download.file functions in R with support for encryption, and the package was recently updated to enhance its security, particularly on Windows. To implement secure communications, the curl package needs to connect with a library that handles the SSL (Secure Sockets Layer) encryption. On Linux and Macs, curl has always used the OpenSSL library, which is…
Original Post: Updated curl package provides additional security for R on Windows

Developing AI applications on Azure: learning plans at three levels

If you’re looking to expand your skills as an AI developer, or just getting started, these learning plans for AI Developers on Azure provide a wealth of information to get you up to speed. The beginner, intermediate and advanced tracks all provide step-by-step guides to setting up the tools and data in Azure, along with worked examples in IPython notebooks. The Beginner AI Developer Learning Plan provides an introduction to artificial intelligence and cognitive systems. It begins with an overview of Cognitive Services, and then works through several examples of using those APIs in applications: handwriting comprehension, speech comprehension, and face detection. The Intermediate AI Developer Learning Plan walks through the process of creating an AI application that understands voice input. It begins with an overview of LUIS, the Language Understanding Intelligent Service, and walks through the process of defining intents, entities,…
Original Post: Developing AI applications on Azure: learning plans at three levels

Recap: EARL Boston 2017

By Emmanuel Awa, Francesca Lazzeri and Jaya Mathew, data scientists at Microsoft. A few of us got to attend the EARL conference in Boston last week, which brought together a group of talented users of R from academia and industry. The conference highlighted various Enterprise Applications of R. Despite this being a small conference, the quality of the talks was great, and they showcased various innovative ways of using some of the newer packages available in the R language. Some of the attendees were veteran R users while others were newcomers to the R language, so there was a mix of proficiency levels. R currently has a vibrant community of users and there are over 11,000 open source packages. The conference also encouraged women to join their local chapter for R Ladies…
Original Post: Recap: EARL Boston 2017

Calculating the house edge of a slot machine, with R

Modern slot machines (fruit machines, pokies, or whatever those electronic gambling devices are called in your part of the world) are designed to be addictive. They’re also usually quite complicated, with a bunch of features that affect the payout of a spin: multiple symbols with different pay scales, wildcards, scatter symbols, free spins, jackpots … the list goes on. Many machines also let you play multiple combinations at the same time (20 lines, or 80, or even more with just one spin). All of this complexity is designed to make it hard for you, the player, to judge the real odds of success. But rest assured: in the long run, you always lose. All slot machines are designed to have a “house edge” (the percentage of player bets retained by the machine in the long run) greater than…
Original Post: Calculating the house edge of a slot machine, with R
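
To make the "house edge" idea from the slot machine post concrete, here is a minimal sketch of the arithmetic. It is not the method from the original post (which uses R and a far more complicated machine). It is written in Python purely for illustration, and the reel layout, symbol names, and payouts are all invented. It assumes a three-reel machine that pays only on three of a kind, with the payout counted as the total amount returned per one-unit bet.

```python
from fractions import Fraction
from itertools import product

# Hypothetical machine: three identical reels with 10 stops each.
# Symbol counts and payouts are made up for this example.
REEL = {"cherry": 5, "bar": 3, "seven": 2}        # stops per symbol
PAYOUT = {"cherry": 2, "bar": 10, "seven": 50}    # returned per 1-unit bet

def house_edge(reel, payout, n_reels=3):
    """Exact house edge: 1 minus the expected return per unit bet."""
    stops = sum(reel.values())
    expected = Fraction(0)
    # Enumerate every combination of symbols across the reels.
    for combo in product(reel, repeat=n_reels):
        if len(set(combo)) == 1:  # only three of a kind pays
            prob = Fraction(1)
            for symbol in combo:
                prob *= Fraction(reel[symbol], stops)
            expected += prob * payout[combo[0]]
    return 1 - expected

edge = house_edge(REEL, PAYOUT)
print(f"House edge: {float(edge):.1%}")
```

With these invented numbers the expected return per unit bet is 0.25 + 0.27 + 0.40 = 0.92, so the machine keeps 8% of bets in the long run. A real machine with wildcards, scatter symbols, and free spins is usually too complex to enumerate this way, which is why simulation is a common alternative.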