Dear data scientists, how to ease your job

You have got the best job of the 21st century. Like a treasure hunter, you search for a chest of data treasure while sailing through data lakes. In many companies you are a digital maker, armed with skills that turn you into a modern-day polymath and a toolset so comprehensive and complex that even astronauts would feel dizzy. However, there is still something you carry to work every day: the burden of high expectations and the demands of others, whether from a customer, your supervisor or your colleagues. Being a data scientist is a dream job, but it is also a stressful one, because it requires creative approaches and new solutions every day. Wouldn't it be great if there were something that made your daily work easier? Many requirements and your need for a solution: R, Python and Julia – does…
Original Post: Dear data scientists, how to ease your job

Copy/paste t-tests Directly to Manuscripts

One of the most time-consuming parts of data analysis in psychology is copy-pasting specific values from R output into a manuscript or report. This task is frustrating, error-prone, and increases the variability of statistical reporting. That matters, because standardizing what is reported and how it is reported could be key to overcoming psychology's reproducibility crisis. The psycho package was designed specifically for this job: at first for complex Bayesian mixed models, but it is now compatible with basic methods, such as t-tests.

    # Load packages
    library(tidyverse)
    # devtools::install_github("neuropsychology/psycho.R")  # Install the latest psycho version
    library(psycho)

    df <- psycho::affective                 # Load the data
    results <- t.test(df$Age ~ df$Sex)      # Perform a simple t-test

You simply run the analyze() function on the t-test object.

    psycho::analyze(results)

The Welch Two Sample…
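If you want the formatted sentence in a file that you can paste into a manuscript, one option is to capture the printed output. This is a minimal sketch using base R only; the file name is just an example, not something prescribed by the package.

    # Write the APA-style summary produced by analyze() to a text file
    apa_text <- capture.output(print(psycho::analyze(results)))
    writeLines(apa_text, "t_test_report.txt")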
Original Post: Copy/paste t-tests Directly to Manuscripts

Chat with the rOpenSci team at upcoming meetings

You can find members of the rOpenSci team at various meetings and workshops around the world. Come say 'hi', learn how our software packages can enable your research, hear about our process for open peer software review and onboarding, find out how you can get connected with the community, or tell us how we can help you do open and reproducible research. Where's rOpenSci?

When                 Who                          Where            What
June 23, 2018        Maëlle Salmon                Cardiff, UK      satRday Cardiff
June 27-28, 2018     Scott Chamberlain            Portland, OR     Bioinformatics Open Source Conference 2018 (BOSC)
July 4-6, 2018       Maëlle Salmon                Rennes, FR       French R conference
July 10-13, 2018     Jenny Bryan                  Brisbane, AU     UseR!
July 28-Aug 2, 2018  Jenny Bryan                  Vancouver, CA    Joint Statistical Meetings (JSM)
Aug 6-10, 2018       Carl Boettiger, Dan Sholler  New Orleans, LA  Ecological Society of America (ESA)
Aug 15-16, 2018      Stefanie Butland             Cambridge,…
Original Post: Chat with the rOpenSci team at upcoming meetings

RStudio Connect v1.6.4

RStudio Connect version 1.6.4 is now available! There are a few breaking changes and a handful of new features that are highlighted below. We encourage you to upgrade as soon as possible! Breaking: Please take note of important breaking changes before upgrading. Pandoc 2: RStudio Connect includes Pandoc 1 and will now also include Pandoc 2. Admins do not need to install either. If you have deployed content with rmarkdown version 1.9 or higher, then that content will now use Pandoc 2 at runtime. This brings in several bug fixes and enables some new functionality, but does introduce some backwards incompatibilities. To protect older versions of rmarkdown, Pandoc 1 will still be used for content deployed with any rmarkdown version prior to 1.9. Content not using the rmarkdown package will have Pandoc 2 available. Pandoc is dynamically made available to content when it is executed, so content using…
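To check locally which rmarkdown version your content would be deployed with, and which Pandoc version rmarkdown currently finds on your machine, a minimal sketch using standard rmarkdown helpers (this is a local check, not Connect-specific tooling) might look like:

    # rmarkdown version recorded at deploy time (>= 1.9 means Pandoc 2 on Connect)
    packageVersion("rmarkdown")
    # Pandoc version rmarkdown finds locally, and whether it satisfies >= 2.0
    rmarkdown::pandoc_version()
    rmarkdown::pandoc_available("2.0")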
Original Post: RStudio Connect v1.6.4

R vs Python: Image Classification with Keras

Libraries for calling R from Python, or Python from R, have existed for years. There was also the recent announcement of the Ursa Labs foundation by Wes McKinney, who aims to join forces with RStudio, and Hadley Wickham in particular (find more here), to improve data scientists' workflows and to unify libraries that can be used not only from Python but from any programming language data scientists work with. Despite all this, some data professionals are still very strict about the language to be used for ANN models, limiting their development environment exclusively to Python. As a continuation of my R vs. Python comparison, I decided to test the performance of both languages in terms of the time required to train a convolutional neural network based model for image recognition. As the starting point, I took the blog post by Dr. Shirin Glander on…
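For context, here is a minimal sketch of what such a model looks like with the keras R package. The layer sizes, input shape, and data are illustrative assumptions, not the setup benchmarked in the post.

    library(keras)

    # A small convolutional network for 28x28 grayscale images (illustrative only)
    model <- keras_model_sequential() %>%
      layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                    input_shape = c(28, 28, 1)) %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_flatten() %>%
      layer_dense(units = 64, activation = "relu") %>%
      layer_dense(units = 10, activation = "softmax")

    model %>% compile(
      loss = "categorical_crossentropy",
      optimizer = "adam",
      metrics = "accuracy"
    )

    # Timing the training run is then a matter of wrapping fit() in system.time(),
    # e.g. system.time(model %>% fit(x_train, y_train, epochs = 5, batch_size = 128))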
Original Post: R vs Python: Image Classification with Keras

Statistics Sunday: Accessing the YouTube API with tuber

I haven't had a lot of time to play with this, but yesterday I discovered the tuber R package, which allows you to interact with the YouTube API. To use the tuber package, not only do you need to install it in R – you'll also need a Google account and will have to authorize 4 APIs through the Developer Console: all 3 YouTube APIs (though the Data API will be doing the heavy lifting) and the Freebase API. Before you authorize the first API, Google will have you create a project to tie the APIs to. Then, you'll find the APIs in the API library to add to this project. Click on each API and, on the next screen, select Enable. You'll need to create credentials for each of the YouTube APIs. When asked to identify the type of app that…
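Once the credentials exist, authenticating from R amounts to passing the OAuth client ID and secret to tuber. A minimal sketch, where the client ID, secret, and video ID are placeholders you would replace with your own values:

    library(tuber)

    # Authenticate with the OAuth client created in the Google Developer Console
    yt_oauth(app_id = "YOUR_CLIENT_ID", app_secret = "YOUR_CLIENT_SECRET")

    # Example call: view, like, and comment counts for a single video (placeholder ID)
    get_stats(video_id = "VIDEO_ID")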
Original Post: Statistics Sunday: Accessing the YouTube API with tuber

Effectively scaling Shiny in enterprise

James Blair, RStudio

Scalability is a hot word these days, and for good reason. As data continues to grow in volume and importance, the ability to reliably access and reason about that data increases in importance. Enterprises expect data analysis and reporting solutions that are robust and allow several hundred, even thousands, of concurrent users while offering up-to-date security options. Shiny is a highly flexible and widely used framework for creating web applications using R. It enables data scientists and analysts to create dynamic content that provides straightforward access to their work for those with no working knowledge of R. While Shiny has been around for quite some time, recent introductions to the Shiny ecosystem make Shiny simpler and safer to deploy in an enterprise environment where security and scalability are paramount. These new tools in connection with RStudio Connect…
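For readers new to Shiny, here is a minimal sketch of the kind of application being discussed: a toy app invented for illustration, not an example from the talk.

    library(shiny)

    # A tiny Shiny app: a slider drives a reactive histogram
    ui <- fluidPage(
      sliderInput("n", "Number of observations", min = 10, max = 1000, value = 100),
      plotOutput("hist")
    )

    server <- function(input, output, session) {
      output$hist <- renderPlot({
        hist(rnorm(input$n), main = "Sample from N(0, 1)")
      })
    }

    shinyApp(ui, server)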
Original Post: Effectively scaling Shiny in enterprise

Not only LIME

I've heard about a number of consulting companies that decided to use a simple linear model instead of a black-box model with higher performance, because "the client wants to understand the factors that drive the prediction". And usually the discussion goes as follows: "We have tried LIME for our black-box model; it is great, but it is not working in our case." "Have you tried other explainers?" "What other explainers?" So here you have a map of different visual explanations for black-box models. Choose one in (on average) less than three simple steps. These are available in the DALEX package. Feel free to propose other visual explainers that should be added to this map (and the package).
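As a minimal sketch of the entry point to these explainers: every DALEX visual explanation starts from an explainer object created with explain(). The linear model and the package's built-in apartments data below are illustrative choices, and model_performance() is just one of the explanations the post's map covers.

    library(DALEX)

    # Fit any model; a linear model on DALEX's built-in apartments data is used here
    model <- lm(m2.price ~ construction.year + surface + floor + no.rooms,
                data = apartments)

    # Wrap the model in an explainer; all DALEX visual explanations start from this object
    explainer <- explain(model,
                         data = apartments,
                         y = apartments$m2.price)

    # One example of a visual explanation: residual-based model performance
    plot(model_performance(explainer))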
Original Post: Not only LIME

bounceR 0.1.2: Automated Feature Selection

New Features As promised, we kept on working on our bounceR package. For one, we changed the interface: users no longer have to choose a number of tuning parameters that – thanks to my somewhat cryptic documentation – sound more complicated than they are. Inspired by the H2O.ai feature that lets the user set the time he or she wants to wait, instead of a number of cryptic tuning parameters, we added a similar function. Further, we changed the source code quite a bit. Henrik Bengtsson gave a very inspiring talk on parallelization using the genius future package at this year's eRum conference. A couple of days later, Davis Vaughan released furrr, an incredibly smart – kudos – wrapper on top of the no-less genius purrr package. Davis' package combines purrr's mapping functions with future's parallelization madness. As you can tell, I…
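To illustrate the furrr pattern referred to here, independent of bounceR's internals, a minimal sketch (the toy function is just an example of an expensive computation):

    library(future)
    library(furrr)

    # Run the mapping in parallel across local R sessions
    plan(multisession)

    slow_square <- function(x) {
      Sys.sleep(0.5)  # pretend this is an expensive computation
      x^2
    }

    # future_map() has the same interface as purrr::map(),
    # but evaluates the calls under the chosen future plan
    future_map(1:8, slow_square)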
Original Post: bounceR 0.1.2: Automated Feature Selection

Sketchnotes from TWiML&AI: Practical Deep Learning with Rachel Thomas

These are my sketchnotes for Sam Charrington's podcast This Week in Machine Learning and AI about Practical Deep Learning with Rachel Thomas: Sketchnotes from TWiMLAI talk: Practical Deep Learning with Rachel Thomas You can listen to the podcast here. In this episode, I'm joined by Rachel Thomas, founder and researcher at Fast AI. If you're not familiar with Fast AI, the company offers a series of courses including Practical Deep Learning for Coders, Cutting Edge Deep Learning for Coders, and Rachel's Computational Linear Algebra course. The courses are designed to make deep learning more accessible to those without the extensive math backgrounds some other courses assume. Rachel and I cover a lot of ground in this conversation, starting with the philosophy and goals behind the Fast AI courses. We also cover Fast AI's recent decision to switch to their courses…
Original Post: Sketchnotes from TWiML&AI: Practical Deep Learning with Rachel Thomas