RcppArmadillo 0.7.960.1.0

The bi-monthly RcppArmadillo release is out with a new version 0.7.960.1.0, which is now on CRAN and will get to Debian in due course. And it is a big one: lots of nice upstream changes from Armadillo, lots of work on our end as part of the Google Summer of Code project by Binxiang Ni, plus a few smaller enhancements; see below for details. Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 379 other packages on CRAN, an increase of 49 since the last CRAN release in June! Changes in this release relative to the previous CRAN release are as follows: Changes in…
Original Post: RcppArmadillo 0.7.960.1.0
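For readers new to the package, here is a minimal sketch (not taken from the release notes) of what calling Armadillo from R via RcppArmadillo looks like; the function name colMeansArma is made up for illustration.

```r
# Minimal sketch: compile a small Armadillo routine inline and call it from R.
# (Illustrative only; colMeansArma is a made-up name, not part of the release.)
library(Rcpp)

cppFunction(depends = "RcppArmadillo", code = '
  arma::vec colMeansArma(const arma::mat& X) {
    // arma::mean(X, 0) gives column means as a row vector; transpose to a column
    return arma::mean(X, 0).t();
  }
')

colMeansArma(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))  # matches colMeans() on the same matrix
```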

2017 App Update

As you may have noticed, we have made a few changes to our apps for the 2017 season to bring you a smoother and quicker experience while also adding more advanced and customizable views. Most visibly, we moved the apps to Shiny so we can continue to build on our use of R and add new features and improvements throughout the season. We expect the apps to better handle the heavy load of draft season and other peak-traffic periods. In addition to the ability to create and save custom settings, you can also choose the columns you view in our Projections tool. We have also added more advanced metrics such as weekly VOR and Projected Points Per Dollar (ROI) for those of you in auction leagues. With a free account, you’ll be able to create and save one custom setting. If…
Original Post: 2017 App Update

20 years of the R Core Group

The first “official” version of R, version 1.0.0, was released on February 29, 2000. But the R Project had already been underway for several years before then. Yesterday, R Core member Peter Dalgaard shared this tweet: It was twenty years ago today, Ross Ihaka got the band to play…. #rstats pic.twitter.com/msSpPz2kyA — Peter Dalgaard (@pdalgd) August 16, 2017 Twenty years ago, on August 16, 1997, the R Core Group was formed. Before that date, the committers to R were the project’s founders Ross Ihaka and Robert Gentleman, along with Luke Tierney, Heiner Schwarte and Paul Murrell. The email above was the invitation for Kurt Hornik, Peter Dalgaard and Thomas Lumley to join as well. With the sole exception of Schwarte, all of the above remain members of the R Core Group, which has since expanded to 21 members.…
Original Post: 20 years of the R Core Group

Tesseract and Magick: High Quality OCR in R

Last week we released an update of the tesseract package to CRAN. This package provides R bindings to Google's OCR library Tesseract: install.packages("tesseract"). The new version ships with the latest libtesseract 3.05.01 on Windows and MacOS. Furthermore, it includes enhancements for managing language data and using tesseract together with the magick package. Installing Language Data: the new version has several improvements for installing additional language data. On Windows and MacOS you use the tesseract_download() function to install additional languages: tesseract_download("fra"). Language data are now stored in rappdirs::user_data_dir('tesseract'), which makes them persist across updates of the package. To OCR French text: french <- tesseract("fra"); text <- ocr("https://jeroen.github.io/images/french_text.png", engine = french); cat(text). Très Bien! Note that on Linux you should not use tesseract_download but instead install languages using apt-get (e.g. tesseract-ocr-fra) or yum (e.g. tesseract-langpack-fra). Tesseract and Magick: the tesseract developers recommend…
Original Post: Tesseract and Magick: High Quality OCR in R
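Collected from the excerpt above into one runnable block (same functions and URL as in the post; straight quotes substituted for the smart quotes in the extracted text):

```r
# Install the package, fetch French language data, and OCR a French test image.
install.packages("tesseract")
library(tesseract)

# On Windows and MacOS; on Linux install e.g. the tesseract-ocr-fra system package instead
tesseract_download("fra")

french <- tesseract("fra")
text <- ocr("https://jeroen.github.io/images/french_text.png", engine = french)
cat(text)
```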

Update on Our ‘revisit’ Package

On May 31, I made a post here about our R package revisit, which is designed to help remedy the reproducibility crisis in science. The intended user audience includes reviewers of research manuscripts submitted for publication; scientists who wish to confirm the results in a published paper or explore alternate analyses; and members of the original research team itself, collaborating during the course of the research. The package is documented mainly in the README file, but we now also have a paper on arXiv.org, which explains the reproducibility crisis in detail and how our package addresses it. Reed Davis and I, the authors of the software, are joined in the paper by Prof. Laurel Beckett of the UC Davis Medical School and Dr. Paul Thompson of Sanford Research.
Original Post: Update on Our ‘revisit’ Package

Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create …
Original Post: Visualising Water Consumption using a Geographic Bubble Chart
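The excerpt stops before the how-to, but a geographic bubble chart of this kind can be sketched in a few lines of ggplot2. This is a hypothetical illustration, not the post's code; the data frame, coordinates and usage figures are made up.

```r
# Hypothetical sketch of a geographic bubble chart: point size encodes consumption.
library(ggplot2)

consumption <- data.frame(
  town  = c("Town A", "Town B", "Town C"),     # made-up locations
  lon   = c(105.13, 105.21, 105.40),
  lat   = c(21.30, 21.42, 21.32),
  usage = c(1200, 850, 2300)                   # made-up daily consumption (m3)
)

ggplot(consumption, aes(x = lon, y = lat, size = usage)) +
  geom_point(alpha = 0.6, colour = "steelblue") +
  scale_size_area(max_size = 15) +             # bubble area proportional to usage
  coord_quickmap() +                           # keep the lon/lat aspect ratio sensible
  labs(title = "Water consumption by town", size = "Usage")
```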

Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected. Here is an example involving the built-in data set SASHELP.CLASS. Here is the code:

    data c1;
        set sashelp.class;
        * define a new character variable to classify someone as tall or short;
        if height > 60 then height_class = 'Tall';
        else height_class = 'Short';
    run;

    * print the results for the first 5 rows;
    proc print data = c1 (obs = 5);
    run;

Here is the result:

    Alfred     M    14    69.0    112.5    Tall
    Alice      F    13    56.5     84.0    Shor
    Barbara    F    13    65.3     98.0    Tall
    Carol      F    14    62.8    102.5    Tall
    Henry      M    14    63.5    102.5    Tall

What happened? Why does the word “Short” render as “Shor”? This occurred because SAS sets the length of a new character…
Original Post: Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R
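The excerpt is cut off before the SAS fix, but for the R side of the comparison promised in the title, here is a small hypothetical recreation (not the post's code) showing that R character vectors carry no pre-set width, so "Short" is never truncated:

```r
# Hypothetical recreation in R: no LENGTH-style declaration is needed,
# because character vectors are not fixed-width.
class_data <- data.frame(
  name   = c("Alfred", "Alice", "Barbara", "Carol", "Henry"),
  height = c(69.0, 56.5, 65.3, 62.8, 63.5)
)

class_data$height_class <- ifelse(class_data$height > 60, "Tall", "Short")
head(class_data, 5)   # "Short" is stored in full; R does not truncate it to "Shor"
```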

How to build an image recognizer in R using just a few images

Microsoft Cognitive Services provides several APIs for image recognition, but if you want to build your own recognizer (or create one that works offline), you can use the new Image Featurizer capabilities of Microsoft R Server. Training an image recognition system requires LOTS of images — millions and millions of them. The process involves feeding those images into a deep neural network, which generates “features” from each image along the way. These features might be versions of the image including just the outlines, or maybe the image with only the green parts. You could further boil those features down into a single number, say the length of the outline or the percentage of the image that is green. With enough of these “features”, you could use them in a traditional machine learning model to classify…
Original Post: How to build an image recognizer in R using just a few images
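The excerpt describes the general pattern (deep features in, traditional model out) rather than the Microsoft R Server API itself, so here is a generic, hypothetical sketch of that pattern in plain R; the random matrix merely stands in for the features a deep network would produce.

```r
# Hypothetical sketch of "featurize, then fit a traditional model":
# the random numbers below stand in for features produced by a deep network.
set.seed(42)
n_images   <- 60
n_features <- 10
features   <- matrix(rnorm(n_images * n_features), nrow = n_images)
labels     <- factor(rep(c("cat", "dog"), length.out = n_images))

train_df <- data.frame(label = labels, features)

# Any traditional classifier can now be trained on the feature columns,
# e.g. a logistic regression:
fit <- glm(label ~ ., data = train_df, family = binomial)
head(predict(fit, type = "response"))   # predicted probability of the "dog" class
```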

Thank You For The Very Nice Comment

Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course: “Thanks for a wonderful course on DataCamp on XGBoost and Random forest. I was struggling with Xgboost earlier and Vtreat has made my life easy now :)” Supervised Learning in R: Regression covers a lot as it treats predicting probabilities as a type of regression. Nina and I are very proud of this course and think it is very much worth your time (for the beginning through advanced R user). vtreat is a statistically sound data cleaning and preparation tool introduced towards the end of the course. R users who try vtreat find it makes training and applying models much easier. vtreat is distributed as a free open-source package available on CRAN. If you are doing predictive modeling in…
Original Post: Thank You For The Very Nice Comment
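Since the excerpt mentions vtreat only in passing, here is a minimal, hypothetical sketch of its basic workflow on made-up data (not course material): design a treatment plan, then prepare data frames for modeling.

```r
# Minimal vtreat sketch on made-up data: messy predictors with NAs and a
# categorical column become clean numeric columns ready for any model.
library(vtreat)

d <- data.frame(
  x1 = c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10),
  x2 = c("a", "b", "a", NA, "b", "a", "b", "b", "a", "a"),
  y  = c(1.1, 2.3, 1.9, 4.2, 5.0, 6.1, 6.8, 7.7, 9.2, 10.3)
)

# Design a treatment plan for the numeric outcome y
plan <- designTreatmentsN(d, varlist = c("x1", "x2"), outcomename = "y")

# Apply the plan (normally to held-out data): NAs become indicator columns,
# categorical levels become numeric codes safe to feed to a model
d_treated <- prepare(plan, d)
head(d_treated)
```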

Data wrangling : Cleansing – Regular expressions (1/3)

Data wrangling is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process, estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series whose goal is to sharpen the reader’s skills at data wrangling. This is the fourth part of the series, and it covers cleaning the data used. In the previous parts we learned how to import, reshape and transform data. The rest of the series will be dedicated to the data cleansing process. In this post we will go through regular expressions: sequences of characters that define a search pattern, mainly for use in pattern matching with text strings. In particular, we will cover the foundations of regular expression syntax. Before…
Original Post: Data wrangling : Cleansing – Regular expressions (1/3)
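As a taste of what the series covers, here is a small base-R sketch (not from the post) of pattern matching with regular expressions; the email strings are made up.

```r
# Small base-R regex sketch: test strings against a pattern and extract a piece of it.
emails <- c("ana@example.com", "not-an-email", "bob@mail.org")   # made-up strings

# Which strings look like a simple email address?
grepl("^[[:alnum:]._-]+@[[:alnum:].-]+\\.[a-z]{2,}$", emails)
#> [1]  TRUE FALSE  TRUE

# Strip everything up to and including the "@" to keep only the domain
sub("^.*@", "", emails[grepl("@", emails)])
#> [1] "example.com" "mail.org"
```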