Interactive R visuals in Power BI

Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn’t possible to interact with an R chart on the screen (to display tool-tips, for example). With the latest update to Power BI, however, you can create R custom visuals that embed interactive R charts. The example chart in the original post was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the…
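
To make the HTML requirement concrete, here is a minimal sketch of producing an interactive plotly chart in R and saving it as a self-contained HTML file. This illustrates the general idea only; Power BI's custom-visual packaging step itself is not shown here.

# Minimal sketch: build an interactive chart with plotly and save it as HTML.
# Power BI's custom-visual tooling wraps HTML output like this; the packaging
# step itself is not part of this sketch.
library(plotly)

p <- plot_ly(mtcars, x = ~wt, y = ~mpg,
             type = "scatter", mode = "markers",
             text = ~rownames(mtcars))  # hover tool-tips

# Save as a self-contained HTML widget
htmlwidgets::saveWidget(p, "interactive-chart.html", selfcontained = TRUE)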
Original Post: Interactive R visuals in Power BI

Two years as a Data Scientist at Stack Overflow

Last Friday marked my two year anniversary working as a data scientist at Stack Overflow. At the end of my first year I wrote a blog post about my experience, both to share some of what I’d learned and as a form of self-reflection. After another year, I’d like to revisit the topic. While my first post focused mostly on the transition from my PhD to an industry position, here I’ll be sharing what has changed for me in my job in the last year, and what I hope the next year will bring. Hiring a Second Data Scientist: In last year’s blog post, I noted how difficult it could be to be the only data scientist on a team: Most of my current statistical education has to be self-driven, and I need to be very cautious about my work:…
Original Post: Two years as a Data Scientist at Stack Overflow

All the fake data that’s fit to print

charlatan makes fake data. Excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python library called faker. charlatan is inspired by and ports many things from Python’s https://github.com/joke2k/faker library. In turn, faker was inspired by PHP’s faker, Perl’s Faker, and Ruby’s faker. It appears that the PHP library was the original – nice work, PHP. Use cases What could you do with this package? Here are some use cases:

- Students in a classroom setting learning any task that needs a dataset
- People doing simulations/modeling that need some fake data
- Generate a fake dataset of users for a database before actual users exist
- Complete missing spots in a dataset
- Generate fake data to replace sensitive real data before public release
- Create a random set of colors for visualization
- Generate random coordinates for a map
- Get a set of randomly…
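
As a quick taste of the package, here is a minimal sketch. The ch_* function names below follow charlatan's naming convention as I understand it; check the package documentation to confirm the exact arguments.

# Minimal sketch of generating fake records with charlatan
# (function names assumed from the package's ch_* convention;
# consult the charlatan docs to confirm arguments).
library(charlatan)

fake_people <- data.frame(
  name  = ch_name(n = 5),        # fake person names
  job   = ch_job(n = 5),         # fake job titles
  phone = ch_phone_number(n = 5) # fake phone numbers
)
fake_people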
Original Post: All the fake data that’s fit to print

EARL London agenda – top picks

Nic Crane, Data Scientist. The agenda for EARL London has just been released, and with three streams running in parallel, I always like to pick which talks I’ll be going to in advance, to make sure I get the most out of the conference. Here are my top picks so far: Derek Norton, Microsoft: “SAS to R: How to, and is it enough?” – Day 1, 11am, Stream 3. I’m a strong supporter of the promotion of free open-source software, so SAS to R is a topic that has instant appeal to me. Cards on the table – I’m co-presenting on a similar theme immediately afterwards, and so I’m looking forward to seeing the common themes that both Mango and Microsoft have identified in helping clients make the move to R. Dr Troy Hernandez, IBM: “Using Twitter sentiment to predict stock…
Original Post: EARL London agenda – top picks

Visualising Twitter coverage of recent bioinformatics conferences

Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting. So without further ado, here are the reports in markdown format, which display quite nicely when pushed to Github, and you can dig around in the repository for the Rmarkdown, HTML and image files, if you like.
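
The excerpt doesn't show the code itself; as a rough illustration of the kind of analysis involved, here is a minimal sketch using the rtweet package, which is a hypothetical stand-in for whatever collection method the original code used.

# Minimal sketch: collect and summarise conference tweets with rtweet.
# rtweet is a stand-in here; the original post's collection code isn't
# shown in this excerpt.
library(rtweet)
library(dplyr)

tweets <- search_tweets("#vizbi2017", n = 1000, include_rts = FALSE)

# Tweets per day, a typical first summary for conference coverage
tweets %>%
  mutate(day = as.Date(created_at)) %>%
  count(day)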
Original Post: Visualising Twitter coverage of recent bioinformatics conferences

Using Partial Least Squares to Conduct Relative Importance analysis in R

Partial Least Squares (PLS) is a popular method for relative importance analysis in fields where the data typically includes more predictors than observations. Relative importance analysis is a general term applied to any technique used for estimating the importance of predictor variables in a regression model. The output is a set of scores which enable the predictor variables to be ranked based upon how strongly each influences the outcome variable. There are a number of different approaches to relative importance analysis, including Relative Weights and Shapley Regression, as described here and here. In this blog post I briefly describe how to use an alternative method, Partial Least Squares, in R. Because it effectively compresses the data before regression, PLS is particularly useful when the number of predictor variables is greater than the number of observations. PLS is a dimension reduction technique with some similarity to principal…
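
The excerpt doesn't show the author's code. As a minimal sketch, fitting a PLS regression in R with the pls package (one common choice; the post may well use a different package) looks like this:

# Minimal sketch: PLS regression with the pls package, then inspecting
# coefficients as a rough indication of predictor importance.
library(pls)

fit <- plsr(mpg ~ ., data = mtcars, ncomp = 2,
            scale = TRUE, validation = "CV")

summary(fit)            # explained variance per component
coef(fit, ncomp = 2)    # coefficients at 2 components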
Original Post: Using Partial Least Squares to Conduct Relative Importance analysis in R

My set of packages for (daily) data analysis #rstats

I started writing my first package as a collection of various functions that I needed for (almost) daily work. Over time the package grew, and bit by bit I split functions out into new packages. Although this means more work for the CRAN maintainers, who have more packages to manage on their network, from a user’s perspective it is much better if packages have a clear focus and a well-defined set of functions. That’s why I have now released a new package on CRAN, sjlabelled, which contains all functions that deal with labelled data. These functions used to live in the sjmisc package, where they are now deprecated and will be removed in a future update. My aim is not only to provide packages with a clear focus, but also with a consistent design and philosophy, making it easier and…
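
As a small illustration of working with labelled data, here is a sketch using sjlabelled's get_*/set_* functions. The argument names are assumptions based on the package's documented conventions; verify against the package docs.

# Minimal sketch: attaching and reading variable/value labels with sjlabelled
# (argument names assumed; see the package documentation to confirm).
library(sjlabelled)

x <- c(1, 2, 2, 3, 1)
x <- set_label(x, label = "Satisfaction rating")
x <- set_labels(x, labels = c("low" = 1, "medium" = 2, "high" = 3))

get_label(x)   # variable label
get_labels(x)  # value labels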
Original Post: My set of packages for (daily) data analysis #rstats

Automatic tools for improving R packages

On Tuesday I gave a talk at a meetup of the R users group of Barcelona. I got to choose the topic of my talk, and decided I’d like to expand a bit on a recent tweet of mine. There are tools that help you improve your R packages; in my opinion some of them are not famous enough yet, so I was happy to help spread the word! I published my slides online but thought that a blog post would be nice as well. During my talk at RUG BCN, for each tool I gave a short introduction and then applied it to a small package I had created for the occasion. In this post I’ll just briefly present each tool. Most of them are only automatic in the sense that they automatically provide you with a list of things to…
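
The excerpt doesn't name the tools. Two typical examples of such automated package-checking helpers are goodpractice and lintr; whether these specific packages are the ones covered in the talk is an assumption.

# Hypothetical sketch: two commonly used automated package-checking tools.
# Whether these specific packages appear in the talk is an assumption.
library(goodpractice)
library(lintr)

# Run a battery of checks (code style, common pitfalls, test coverage hints)
gp("path/to/your/package")

# Static analysis of the package's R code for style and syntax issues
lint_package("path/to/your/package")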
Original Post: Automatic tools for improving R packages

Using purrr with APIs – revamping my code

I wrote a little while back about using Microsoft Cognitive Services APIs with R to first of all detect the language of pieces of text and then do sentiment analysis on them. I wasn’t too happy with some of the code as it was very inelegant. I knew I could code better than I had, especially as I’ve been doing a lot more work with purrr recently. However, it had sat in drafts for a while. Then David Smith kindly posted about the process I used, which meant I had to get this nicer version of my code out ASAP! Get the complete code in this gist. Prerequisites: API access and the following packages. Setup:

library(httr)
library(jsonlite)
library(dplyr)
library(purrr)

cogapikey <- "..."
... %>% mutate(id = row_number()) -> textdf
textdf %>% list(documents = .) -> mydata

Language detection: we need to identify the most likely language for each bit…
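
To make the flow concrete, here is a rough sketch of what the language-detection request could look like with httr. The endpoint URL, region and API version here are assumptions, not taken from the post.

# Rough sketch of a language-detection call against the Text Analytics API.
# The endpoint URL, region and API version are assumptions; check the
# Cognitive Services documentation for the current values.
library(httr)

response <- POST(
  "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages",
  add_headers(`Ocp-Apim-Subscription-Key` = cogapikey),
  body = mydata,
  encode = "json"
)

content(response)  # parsed list with detected languages per document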
Original Post: Using purrr with APIs – revamping my code

New rOpenSci Packages for Text Processing in R

Textual data and natural language processing are still a niche domain within the R ecosystem. The NLP task view gives an overview of existing work, however a lot of basic infrastructure is still missing. At the rOpenSci text workshop in April we discussed many ideas for improving text processing in R, which revealed several core areas that need improvement:

- Reading: better tools for extracting text and metadata from documents in various formats (doc, rtf, pdf, etc.)
- Encoding: many text packages work well for ASCII text but rapidly break down when text contains Hungarian, Korean or emojis
- Interchange: packages don’t work well together due to lack of data classes or conventions for textual data (see also ropensci/tif)

Participants also had many good suggestions for C/C++ libraries that text researchers in R might benefit from. Over the past weeks I was able to…
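
On the Reading point, one example of this kind of low-level infrastructure is the pdftools package; whether it is among the packages announced in this post is an assumption.

# Minimal sketch: extracting text and metadata from a PDF with pdftools.
# pdftools is one example of text-extraction infrastructure; whether it is
# among the packages announced in this post is an assumption.
library(pdftools)

pages <- pdf_text("paper.pdf")   # character vector, one element per page
info  <- pdf_info("paper.pdf")   # document metadata (title, author, dates)

cat(substr(pages[1], 1, 200))    # peek at the first page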
Original Post: New rOpenSci Packages for Text Processing in R