Just use a scatterplot. Also, Sydney sprawls.

Dual-axes at tipping point. “Sydney’s congestion at ‘tipping point’” blares the headline and, to illustrate, an interactive chart with bars for city population densities, points for commute times and, of course, dual axes. Yuck. OK, I guess it does show that Sydney is one of three cities that are low density but have average commute times comparable to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon. Let’s explore. First: do we even believe the numbers? Comments on the article point out that the population density for Phoenix was corrected after publication, and question the precise meaning of city area. Hovering over the graphic to obtain the values, then visiting Wikipedia’s city pages, we can create a Google spreadsheet which I hope is publicly visible at this link.…
Original Post: Just use a scatterplot. Also, Sydney sprawls.
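The chart the post argues for is a one-liner in ggplot2. A minimal sketch, using a hypothetical `cities` data frame with invented values standing in for the figures in the spreadsheet:

```r
library(ggplot2)

# Hypothetical values for illustration only; the real numbers come from
# the article and Wikipedia, collected in the Google spreadsheet
cities <- data.frame(
  city    = c("Sydney", "Phoenix", "Toronto"),
  density = c(407, 1200, 4334),   # people per sq km (illustrative)
  commute = c(71, 50, 65)         # average commute, minutes (illustrative)
)

# y versus x: commute time against population density
ggplot(cities, aes(x = density, y = commute, label = city)) +
  geom_point() +
  geom_text(vjust = -0.5) +
  labs(x = "Population density (people per sq km)",
       y = "Average commute time (minutes)")
```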

Using leaflet, just because

I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials. One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely. I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alternative. You might use Leaflet if you want: an interactive map that you can drag, zoom and click for popup information; a “fancier” static map with geographical features of interest; concise and clean…
Original Post: Using leaflet, just because
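The Leaflet for R API is pipe-based: start with `leaflet()`, then layer on tiles and markers. A minimal sketch, where the coordinates are illustrative rather than the tutorial’s actual study sites:

```r
library(leaflet)

# A minimal interactive map: default OpenStreetMap tiles plus one marker.
# Coordinates here are illustrative, not the tutorial's study sites.
m <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng = 151.2313, lat = -33.9173,
             popup = "UNSW Sydney (illustrative)")
m
```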

Twitter coverage of the useR! 2018 conference

In summary: The code that generated the report (which I’ve used heavily and written about before) is at Github too. A few changes were required compared with previous reports, due to changes in the rtweet package and a weird issue with kable tables breaking markdown headers. I love that the most popular media attachment is a screenshot of a Github repo.
Original Post: Twitter coverage of the useR! 2018 conference
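For context, the kable step mentioned above is just a data frame rendered as a markdown table; a minimal sketch using a built-in dataset rather than the tweet data:

```r
library(knitr)

# Render a small data frame as a markdown table, as the report does
# (mtcars stands in here for the summarised tweet data)
kable(head(mtcars[, 1:3]), format = "markdown")
```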

Idle thoughts lead to R internals: how to count function arguments

“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?” It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there. There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they? What packages load on starting R? Start a new R session and type search(). Here’s the result on my machine: search() [1] “.GlobalEnv” “tools:rstudio” “package:stats” “package:graphics” “package:grDevices” “package:utils” “package:datasets” “package:methods” “Autoloads” “package:base” We’re interested in the packages with priority = base. Next question: How can I see and filter for package priority? You don’t need dplyr for this, but it helps. library(tidyverse) installed.packages()…
Original Post: Idle thoughts lead to R internals: how to count function arguments
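Putting those pieces together, one way to sketch the whole exercise: filter installed packages by priority, then count formal arguments with `formals()`. This counts only functions in the base package itself; primitives have `NULL` formals, which counts as zero.

```r
# Packages that ship with R and have priority "base"
base_pkgs <- rownames(installed.packages(priority = "base"))

# Count the formal arguments of every function in the base package
objs   <- mget(ls("package:base"), envir = as.environment("package:base"))
funs   <- Filter(is.function, objs)
n_args <- vapply(funs, function(f) length(formals(f)), integer(1))

# The functions with the most arguments
head(sort(n_args, decreasing = TRUE), 3)
```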

PubMed retractions report has moved

A brief message for anyone who uses my PubMed retractions report. It’s no longer available at RPubs; instead, you will find it here at Github. Github pages hosting is great, once you figure out that docs/ corresponds to your web root 🙂 Now I really must update the code and try to make it more interesting than a bunch of bar charts.
Original Post: PubMed retractions report has moved

Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017

You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to Github. Publish to RPubs. This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide. A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that’s been discarded as a bad idea 🙂 Second, the hashtag search returned some results that did not contain said hashtags, but were definitely from the meeting. Yet to get to the bottom of that one. I treated ABACBS and COMBINE as one event for this analysis. Third, given that most Twitter users have had 280 characters since about November 7, is this reflected in the conference…
Original Post: Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017

Mapping data using R and leaflet

The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here. Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state. Here’s the code at Github. The procedure is pretty straightforward: obtain a shapefile of New South Wales electorates (from here) and read it into R; read the data from the ABC News web page into a data frame (very easy using rvest); match the electorate names from the two sources (they match perfectly,…
Original Post: Mapping data using R and leaflet
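The final step of that procedure is drawing the matched polygons with leaflet. A sketch of the idea, where one made-up triangle stands in for an electorate (in the post itself, the polygons come from the NSW shapefile and the popup text from the scraped ABC News table):

```r
library(leaflet)

# One made-up triangle standing in for an electorate polygon;
# real coordinates would come from the NSW electorate shapefile
m <- leaflet() %>%
  addTiles() %>%
  addPolygons(lng = c(151.0, 151.3, 151.2),
              lat = c(-33.9, -33.9, -33.7),
              popup = "Hypothetical electorate: intends to vote Yes")
m
```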

XML parsing made easy: is that podcast getting longer?

Sometime in 2009, I began listening to a science podcast titled This Week in Virology, or TWiV for short. I thought it was pretty good and listened regularly up until sometime in 2016, when it seemed that most episodes were approaching two hours in duration. I listen to several podcasts when commuting to/from work, which takes up about 10 hours of my week, so I found it hard to justify two hours for one podcast, no matter how good. Were the episodes really getting longer over time? Let’s find out using R. One thing I’ve learned as a data scientist: management want to see the key points first. So here it is (the chart is in the original post). Technical people want to see how we got there. It turns out that the podcast has an RSS feed (in XML format), containing detailed information about every episode…
Original Post: XML parsing made easy: is that podcast getting longer?
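Parsing an RSS feed like that is straightforward with xml2. A sketch using a tiny inline fragment standing in for the TWiV feed; the real feed would be fetched by passing its URL to `read_xml()`, and its element names differ from this simplified version:

```r
library(xml2)

# Simplified stand-in for the TWiV RSS feed; element names are invented
rss <- read_xml("<rss><channel>
  <item><title>TWiV 1</title><duration>3600</duration></item>
  <item><title>TWiV 2</title><duration>7200</duration></item>
</channel></rss>")

# Pull out one vector per field with XPath, ready for a data frame
titles    <- xml_text(xml_find_all(rss, "//item/title"))
durations <- as.numeric(xml_text(xml_find_all(rss, "//item/duration")))
```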

Feels like a dry winter – but what does the data say?

A reminder that when idle queries pop into your head, the answer can often be found using R + online data. And a brief excursion into accessing the Weather Underground. One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.” Thankfully in our data-rich world, it’s relatively easy to find out whether the dry spell is really as long as it feels. In Australia, meteorological data is readily available via the Bureau of Meteorology (known as BoM). Another source is the Weather Underground (WU), which has the benefit that there may be data from…
Original Post: Feels like a dry winter – but what does the data say?
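Once you have daily rainfall observations from BoM or WU, checking the dry spell is a quick aggregation. A sketch with made-up data (two wet days across a whole winter) rather than a real download:

```r
library(dplyr)

# Made-up daily observations standing in for a BoM/WU rainfall download
rain <- data.frame(
  date     = seq(as.Date("2018-05-01"), as.Date("2018-08-31"), by = "day"),
  rainfall = 0
)
rain$rainfall[c(10, 45)] <- c(2.4, 0.8)   # only two wet days all winter

# Monthly totals and wet-day counts make the dry spell obvious
rain %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(total_mm = sum(rainfall), wet_days = sum(rainfall > 0))
```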

Infographic-style charts using the R waffle package

Infographics. I’ve seen good examples. I’ve seen more bad examples. In general, I prefer a good chart to an infographic. That said, there’s a “genre” of infographic that I do think is useful, which I’ll call “if X were 100 Y”. A good example: if the world were 100 people. That method of showing proportions has been called a waffle chart and for extra “infographic-i-ness”, the squares can be replaced by icons. You want to do this using R? Of course you do. Here’s how. There’s not much more here than you’ll find at the Github home of the R packages, waffle and extrafont. I’ve just made it a little more step-by-step. 1. Install the R packages. You need waffle to create waffle charts and extrafont to use icons in the charts. install.packages(c("waffle", "extrafont")) library(waffle) library(extrafont) 2. Install Font Awesome. The icons that…
Original Post: Infographic-style charts using the R waffle package
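Once the packages are installed, a basic “if X were 100 Y” chart is one call to `waffle()` with a named vector of parts; the categories and proportions below are invented for illustration:

```r
library(waffle)

# "If the world were 100 people": invented proportions for illustration
parts <- c(Urban = 55, Rural = 45)
waffle(parts, rows = 5,
       title = "If the world were 100 people (illustrative values)")
```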