Statistical Machine Learning with Microsoft ML

MicrosoftML is an R package for machine learning that works in tandem with the RevoScaleR package. (In order to use the MicrosoftML and RevoScaleR libraries, you need an installation of Microsoft Machine Learning Server or Microsoft R Client.) A great way to see what MicrosoftML can do is to take a look at the online book Machine Learning with the MicrosoftML Package by Ali Zaidi. The book includes worked examples on several topics: exploratory data analysis and feature engineering; regression models; classification models for computer vision; convolutional neural networks for computer vision; natural language processing; and transfer learning with pre-trained DNNs. The book is part of Ali’s in-person workshop “Statistical Machine Learning with MicrosoftML”, and you can find further materials including data and scripts at this GitHub repository. If you’d like to experience the workshop in person, Ali will be presenting it…
Original Post: Statistical Machine Learning with Microsoft ML

An Updated History of R

Here’s a refresher on the history of the R project:
1992: R development begins as a research project in Auckland, NZ by Robert Gentleman and Ross Ihaka
1993: First binary versions of R published at Statlib
1995: R first distributed as open-source software, under GPL2 license
1997: R core group formed
1997: CRAN founded (by Kurt Hornik and Fritz Leisch)
1999: The R website, r-project.org, founded
2000: R 1.0.0 released (February 29)
2001: R News founded (later to become the R Journal)
2003: R Foundation founded
2004: First UseR! conference (in Vienna)
2004: R 2.0.0 released
2009: First edition of the R Journal
2013: R 3.0.0 released
2015: R Consortium founded, with R Foundation participation
2016: New R logo adopted
I’ve added some additional dates gleaned from the r-announce mailing list archives and a 1998 paper on the history of R written by co-founder…
Original Post: An Updated History of R

The R manuals in bookdown format

While there are hundreds of excellent books and websites devoted to R, the canonical source of truth regarding the R system remains the R manuals. You can find the manuals at your local CRAN mirror and on your laptop as part of the R distribution (try Help > Manuals in RGui, or Help > R Help in RStudio to find them). Unlike books, the R manuals are updated by the R Core Team with every new release, so if you’re not sure how the base R system is supposed to work, this is the place to check. Note that the manuals don’t cover any R packages other than the base and recommended packages, so if you want to learn about the wider R ecosystem, well, that’s what all those books and websites are for. (MRAN is one place to…
Original Post: The R manuals in bookdown format

Is it faster to take a bike or taxi in NYC?

Taxis are plentiful and convenient in New York City, but the city is also served by a wide network of commuter bicycles (Citi Bikes). If you need to get from, say, the West Village to the Garment District, are you better off time-wise hailing a cab, or heading over to the nearest Citi Bike station? Data scientist Todd W. Schneider crunched the numbers on travel times for both taxis and Citi Bikes to figure out which was better. Neither is universally the best: for some trips taxis are most often the fastest, and for others bikes are faster. An interactive map (created with R) allows you to select the time of day and an origin neighborhood, and the map will then tell you the fraction of the time (according to the historical data) that a Citi Bike will outpace…
Original Post: Is it faster to take a bike or taxi in NYC?
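The "fraction of the time a bike is faster" idea from the post above can be illustrated with a minimal Python sketch. It is not the method or the data used in the original analysis: the trip records, the `bike_win_fraction` helper, and the choice to compare every bike trip against every taxi trip on the same route and hour are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical trip records, invented for this example:
# (origin, destination, hour_of_day, mode, minutes)
TRIPS = [
    ("West Village", "Garment District", 9, "taxi", 14.0),
    ("West Village", "Garment District", 9, "taxi", 19.0),
    ("West Village", "Garment District", 9, "bike", 12.0),
    ("West Village", "Garment District", 9, "bike", 16.0),
    ("West Village", "Upper East Side", 9, "taxi", 22.0),
    ("West Village", "Upper East Side", 9, "bike", 35.0),
]


def bike_win_fraction(trips, origin, hour):
    """Per destination, return the share of (bike, taxi) trip pairings
    from `origin` at `hour` in which the bike trip was strictly shorter."""
    # destination -> mode -> list of observed durations
    durations = defaultdict(lambda: defaultdict(list))
    for trip_origin, dest, trip_hour, mode, minutes in trips:
        if trip_origin == origin and trip_hour == hour:
            durations[dest][mode].append(minutes)

    result = {}
    for dest, by_mode in durations.items():
        # Compare every observed bike trip with every observed taxi trip.
        pairs = list(product(by_mode["bike"], by_mode["taxi"]))
        if not pairs:
            continue  # no comparison possible without both modes
        wins = sum(1 for bike, taxi in pairs if bike < taxi)
        result[dest] = wins / len(pairs)
    return result


fractions = bike_win_fraction(TRIPS, "West Village", 9)
for dest, frac in sorted(fractions.items()):
    print(f"{dest}: bike faster in {frac:.0%} of pairings")
```

Comparing all pairings treats the historical bike and taxi trips as independent samples of each mode's travel time. With real data, the set of pairings can get very large, so randomly sampling pairs would be a reasonable alternative.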

My interview with ROpenSci

The ROpenSci team has started publishing a new series of interviews with the goal of “demystifying the creative and development processes of R community members”. I had the great pleasure of being interviewed by Kelly O’Briant earlier this year, and the interview was published on Friday. Thanks for being a great interviewer, Kelly! I’m looking forward to hearing from other R community members as the rest of the series is published. ROpenSci blog: .rprofile: David Smith
Original Post: My interview with ROpenSci

An AI pitches startup ideas

Take a look at these 13 hot startups, from a list compiled by Alex Bresler. Perhaps one of them is the next Juicero?
FAR ATHERA: A CLINICAL AI PLATFORM THAT CAN BE ACCESSED ON DEMAND.
ZAPSY: TRY-AT-HOME SERVICE FOR CONSUMER ELECTRONICS.
MADESS: ON-DEMAND ACCESS TO CLEAN WATER.
DEERG: AI RADIOLOGIST IN A HOME
SPER: THE FASTEST, EASIEST WAY TO BUY A HOME WITHOUT THE USER HAVING TO WEAR ANYTHING.
PLILUO: VENMO FOR B2B SAAS.
LANTR: WE HELP DOCTORS COLLECT 2X MORE ON CANDLES AND KEROSENE.
ABS: WE PROVIDE FULL-SERVICE SUPPORT FOR SUBLIME, VIM, AND EMACS.
INSTABLE DUGIT: GITHUB FOR YOUR LOVED ONES.
CREDITAY: BY REPLACING MECHANICAL PARTS WITH AIR, WE ELIMINATE INSTALLATION COMPLEXITY AND MAINTENANCE HEADACHES, LEADING TO SIGNIFICANT EFFICIENCY GAINS IN PRODUCTION COSTS AND HARVESTING TIME.
CREDITANO: WE BUILD SOFTWARE TO ENABLE HIGH FUNCTIONALITY BIONICS FOR TREATING HEALTH…
Original Post: An AI pitches startup ideas

A cRyptic crossword with an R twist

Last week’s R-themed crossword from R-Ladies DC was popular, so here’s another R-related crossword, this time by Barry Rowlingson and published on page 39 of the June 2003 issue of R News (now known as the R Journal). Unlike the last crossword, this one follows the conventions of a British cryptic crossword: the grid is symmetrical, and eschews 4×4 blocks of white or black squares. Most importantly, the clues are in the cryptic style: rather than being a direct definition, cryptic clues pair wordplay (homonyms, anagrams, etc.) with a hidden definition. (Wikipedia has a good introduction to the types of clues you’re likely to find.) Cryptic crosswords can be frustrating for the uninitiated, but are fun and rewarding once you get into them. In fact, if you’re unfamiliar with cryptic crosswords, this one is a great place to start. Not only…
Original Post: A cRyptic crossword with an R twist

Data Science Bootcamp in Zurich, Switzerland, January 15 – April 6, 2018

Come to the land of chocolate and data science, where the local tech scene is booming and jobs are aplenty. Learn the most important concepts from top instructors through hands-on practice and projects. Use code KDNUGGETS to save.
Original Post: Data Science Bootcamp in Zurich, Switzerland, January 15 – April 6, 2018

Tutorial: Azure Data Lake analytics with R

The Azure Data Lake store is an Apache Hadoop file system compatible with HDFS, hosted and managed in the Azure Cloud. You can store and access the data within directly via the API, by connecting the filesystem directly to Azure HDInsight services, or via HDFS-compatible open-source applications. And for data science applications, you can also access the data directly from R, as this tutorial explains. To interface with Azure Data Lake, you’ll use U-SQL, a SQL-like language extensible using C#. The R Extensions for U-SQL allow you to reference an R script from a U-SQL statement, and pass data from Data Lake into the R script. There’s a 500 MB limit for the data passed to R, but the basic idea is that you perform the main data munging tasks in U-SQL, and then pass the prepared data to R for analysis. With this…
Original Post: Tutorial: Azure Data Lake analytics with R