Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017

You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to GitHub. Publish to RPubs. This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide. A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that’s been discarded as a bad idea 🙂 Second, a hashtag search returned results that did not contain said hashtags, but were definitely from the meeting. Yet to get to the bottom of that one. I treated ABACBS and COMBINE as one event for this analysis. Third, given that most Twitter users have had 280 characters since about November 7, is this reflected in the conference…
Original Post: Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017
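
Two of the quirks in the excerpt above (search results missing the hashtag, and tweets longer than the old 140-character limit) are easy to check once the tweets are in a data frame. What follows is only a minimal base R sketch: the `tweets` data frame, its `text` column and the hashtag pattern are made up for illustration and are not the data or code from the original report.

```r
# Hypothetical stand-in for search results; the real report used tweets
# retrieved with rtweet, which are not reproduced here.
tweets <- data.frame(
  text = c(
    "Great keynote at #ABACBS2017",
    "Poster session starting now in Adelaide",
    paste("#COMBINE", paste(rep("long talk summary", 10), collapse = " "))
  ),
  stringsAsFactors = FALSE
)

# Assumed hashtag pattern: anything starting with #abacbs or #combine
has_tag <- grepl("#(abacbs|combine)", tweets$text, ignore.case = TRUE)

# Tweets that only fit under the newer 280-character limit
is_long <- nchar(tweets$text) > 140

n_without_tag <- sum(!has_tag)
n_long <- sum(is_long)
```

Rows where `has_tag` is `FALSE` are the ones worth a closer look, since the search returned them even though the text does not contain the hashtag.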

Mapping data using R and leaflet

The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here. Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state. Here’s the code at GitHub. The procedure is pretty straightforward: (1) obtain a shapefile of New South Wales electorates (from here) and read it into R; (2) read the data from the ABC News web page into a data frame (very easy using rvest); (3) match the electorate names from the two sources (they match perfectly,…
Original Post: Mapping data using R and leaflet
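
The name-matching step in the procedure above can be sketched in base R. This is an illustration only, not the code from the original post: the electorate names and voting intentions below are placeholders, and `normalise()` is a hypothetical helper.

```r
# Placeholder names standing in for electorates read from the shapefile
shapefile_names <- c("Alpha", "Beta", "Gamma")

# Placeholder table standing in for the data scraped from the web page
abc_table <- data.frame(
  electorate = c("alpha ", "Beta", "Gamma"),
  intention  = c("yes", "no", "undeclared"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare names ignoring case and stray whitespace
normalise <- function(x) tolower(trimws(x))

# Names present in the shapefile but missing from the table
unmatched <- setdiff(normalise(shapefile_names), normalise(abc_table$electorate))

# Join the two sources on the normalised name
shapes <- data.frame(
  electorate = shapefile_names,
  key = normalise(shapefile_names),
  stringsAsFactors = FALSE
)
abc_table$key <- normalise(abc_table$electorate)
joined <- merge(shapes, abc_table[, c("key", "intention")], by = "key")
```

An empty `unmatched` vector means every shapefile electorate has a row in the table, so the join is safe to use for colouring the map.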

XML parsing made easy: is that podcast getting longer?

Sometime in 2009, I began listening to a science podcast titled This Week in Virology, or TWiV for short. I thought it was pretty good and listened regularly up until sometime in 2016, when it seemed that most episodes were approaching two hours in duration. I listen to several podcasts when commuting to/from work, which takes up about 10 hours of my week, so I found it hard to justify two hours for one podcast, no matter how good. Were the episodes really getting longer over time? Let’s find out using R. One thing I’ve learned as a data scientist: management want to see the key points first. So here it is: Technical people want to see how we got there. It turns out that the podcast has an RSS feed (in XML format), containing detailed information about every episode…
Original Post: XML parsing made easy: is that podcast getting longer?
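
As a rough idea of what parsing episode durations from an RSS feed can look like, here is a small sketch using the xml2 package. It is not the code from the original post: the inline feed is invented, and I am assuming that durations live in an `itunes:duration` element, which is common in podcast feeds but not confirmed by the excerpt. `to_minutes()` is a hypothetical helper.

```r
library(xml2)

# Invented three-episode feed; a real feed would be read from its URL
feed <- read_xml('<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item><title>Episode 1</title><itunes:duration>58:12</itunes:duration></item>
    <item><title>Episode 2</title><itunes:duration>1:15:00</itunes:duration></item>
    <item><title>Episode 3</title><itunes:duration>1:45:30</itunes:duration></item>
  </channel>
</rss>')

# Hypothetical helper: convert "H:MM:SS" or "MM:SS" to minutes
to_minutes <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  parts <- c(rep(0, 3 - length(parts)), parts)  # pad missing hours
  sum(parts * c(60, 1, 1 / 60))
}

# One row per <item>; the itunes prefix resolves via the document namespaces
items <- xml_find_all(feed, ".//item")
episodes <- data.frame(
  title    = xml_text(xml_find_first(items, "./title")),
  duration = xml_text(xml_find_first(items, "./itunes:duration")),
  stringsAsFactors = FALSE
)
episodes$minutes <- vapply(episodes$duration, to_minutes, numeric(1),
                           USE.NAMES = FALSE)
```

With one row per episode and a numeric `minutes` column, plotting duration against episode order is enough to see whether episodes are getting longer.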

Feels like a dry winter – but what does the data say?

A reminder that when idle queries pop into your head, the answer can often be found using R + online data. And a brief excursion into accessing the Weather Underground. One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.” Thankfully in our data-rich world, it’s relatively easy to find out whether the dry spell is really as long as it feels. In Australia, meteorological data is readily available via the Bureau of Meteorology (known as BoM). Another source is the Weather Underground (WU), which has the benefit that there may be data from…
Original Post: Feels like a dry winter – but what does the data say?
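
Whatever the data source, the question "how long is this dry spell?" comes down to counting consecutive days without rain. Here is a minimal base R sketch using run-length encoding. The rainfall values are made up, and treating exactly 0 mm as "dry" is my assumption, not necessarily the definition used in the original post.

```r
# Made-up daily rainfall totals in mm, oldest first
rain_mm <- c(0, 0, 3.2, 0, 0, 0, 0, 1.0, 0, 0)

# Assumption: a day counts as dry only if no rain at all was recorded
dry <- rain_mm == 0

# Run-length encoding collapses consecutive dry/wet days into runs
runs <- rle(dry)

# Longest run of dry days anywhere in the record
longest_dry <- max(runs$lengths[runs$values])

# Length of the dry spell still in progress on the last day (0 if it rained)
current_dry <- if (tail(runs$values, 1)) tail(runs$lengths, 1) else 0
```

Note that `max()` warns and returns `-Inf` if the record contains no dry days at all, so a real script would want to guard against that case.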

Infographic-style charts using the R waffle package

Infographics. I’ve seen good examples. I’ve seen more bad examples. In general, I prefer a good chart to an infographic. That said, there’s a “genre” of infographic that I do think is useful, which I’ll call “if X were 100 Y”. A good example: if the world were 100 people. That method of showing proportions has been called a waffle chart and for extra “infographic-i-ness”, the squares can be replaced by icons. You want to do this using R? Of course you do. Here’s how. There’s not much more here than you’ll find at the GitHub home of the R packages, waffle and extrafont. I’ve just made it a little more step-by-step. 1. Install the R packages. You need waffle to create waffle charts and extrafont to use icons in the charts. install.packages(c("waffle", "extrafont")) library(waffle) library(extrafont) 2. Install Font Awesome. The icons that…
Original Post: Infographic-style charts using the R waffle package

Years as coloured bars

I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today. I get what they are trying to do – illustrate trends within categories over time – but I don’t think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so that the eye moves along the data from one end to the other without interruption. What I want to see is categories over time, not time within categories. So what is the way to go? Let’s ask “what would ggplot2 do?” The following charts illustrate different ways to visualise the same data using ggplot2. My motivation here is…
Original Post: Years as coloured bars
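
To make "categories over time, not time within categories" concrete, here is one possible ggplot2 sketch with year on the x-axis and one line per category. The data frame is invented and this is not one of the charts from the original post, just an illustration of the general idea.

```r
library(ggplot2)

# Invented data: two categories measured over five years
sales <- data.frame(
  year     = rep(2013:2017, times = 2),
  category = rep(c("A", "B"), each = 5),
  value    = c(10, 12, 15, 14, 18, 8, 9, 9, 11, 13)
)

# Time runs along the x-axis; colour distinguishes categories, not years
p <- ggplot(sales, aes(x = year, y = value, colour = category)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Value", colour = "Category")
```

Printing `p` draws the chart. Swapping `colour = category` for `facet_wrap(~ category)` would give one small panel per category instead, which can work better when there are many categories.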

Twitter Coverage of the ISMB/ECCB Conference 2017

Search all the hashtags. ISMB (Intelligent Systems for Molecular Biology – which sounds rather old-fashioned now, doesn’t it?) is the largest conference for bioinformatics and computational biology. It is held annually and, when in Europe, jointly with the European Conference on Computational Biology (ECCB). I’ve had the good fortune to attend twice: in Brisbane 2003 (very enjoyable early in my bioinformatics career, but unfortunately the seed for the “no more southern hemisphere meetings” decision), and in Toronto 2008. The latter was notable for its online coverage and for me, the pleasure of finally meeting in person many members of the online bioinformatics community. The 2017 meeting (and its satellite meetings) was covered quite extensively on Twitter. My search using a variety of hashtags based on “ismb”, “eccb”, “17” and “2017” retrieved 9052 tweets, which form the basis of this…
Original Post: Twitter Coverage of the ISMB/ECCB Conference 2017

Hacking Highcharter: observations per group in boxplots

Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R. Highcharter has a nice simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots and there are a few solutions such as: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot, or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to get the number of observations, which could then be displayed in a tool-tip on mouse-over. There wasn’t, so I wrote one like this. First, you’ll need to install highcharter…
Original Post: Hacking Highcharter: observations per group in boxplots
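
The Highcharter-specific code is in the original post. As a library-independent sketch of the underlying idea, base R can already report the number of observations behind each box: `boxplot()` called with `plot = FALSE` returns the per-group counts alongside the box statistics. The data frame below is invented for illustration.

```r
# Invented observations in three groups of different sizes
obs <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
  value = c(1.2, 2.3, 1.9, 4.1, 3.8, 2.2, 2.9, 3.1, 2.5)
)

# plot = FALSE computes the boxplot statistics without drawing anything
bp <- boxplot(value ~ group, data = obs, plot = FALSE)

# bp$n holds the number of observations per group, bp$names the group labels
counts <- setNames(bp$n, bp$names)
```

A named vector like `counts` is the kind of thing that could then be attached to each box, for example as text shown in a tool-tip.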

Chart golf: the “demographic tsunami”

“‘Demographic tsunami’ will keep Sydney, Melbourne property prices high” screams the headline. While the census showed Australia overall is aging, there’s been a noticeable lift in the number of people aged between 25 to 32. As the accompanying graph shows… Whoa, that is one ugly chart. First thought: let’s not be too hard on Fairfax Media, they’ve sacked most of their real journalists and they took the chart from someone else. Second thought: if you want to visualise change over time, time as an axis rather than a coloured bar is generally a good idea. Can we do better? As usual, you can find this project at GitHub, with an accompanying published document at RPubs. I rarely copy/paste/format code here any more so if you want the details, take a look at the Rmd file. Some of the charts in…
Original Post: Chart golf: the “demographic tsunami”