Just use a scatterplot. Also, Sydney sprawls.

Dual-axes at tipping-point
“Sydney’s congestion at ‘tipping point’” blares the headline and, to illustrate, there is an interactive chart with bars for city population densities, points for commute times and, of course, dual axes. Yuck. OK, I guess it does show that Sydney is one of three cities that are low density but have average commute times comparable to those of higher-density cities. But if you’re plotting commute time versus population density… doesn’t a different kind of chart come to mind first? y versus x. C’mon. Let’s explore.
First: do we even believe the numbers? Comments on the article point out that the population density for Phoenix was corrected after publication, and question the precise meaning of city area. Hovering over the graphic to obtain the values, then visiting Wikipedia’s city pages, we can create a Google spreadsheet which I hope is publicly visible at this link.…
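The "y versus x" idea can be sketched in a few lines of base R. This is only a minimal sketch: the data frame `cities`, its column names and every value in it are hypothetical placeholders, not the figures from the article or the spreadsheet.

```r
# Hypothetical placeholder values for illustration only; not the article's data.
cities <- data.frame(
  city    = c("City A", "City B", "City C", "City D", "City E"),
  density = c(400, 1200, 2500, 4800, 9000),  # people per square km (assumed unit)
  commute = c(35, 28, 30, 33, 36)            # average commute time in minutes
)

# Commute time (y) versus population density (x), one point per city.
# A log scale on x keeps sparse and dense cities readable on the same axis.
plot(cities$density, cities$commute, log = "x", pch = 19,
     xlab = "Population density (people per sq km, log scale)",
     ylab = "Average commute time (minutes)")

# Label each point with its city name, placed just above the point.
text(cities$density, cities$commute, labels = cities$city, pos = 3, cex = 0.8)
```

With a single shared pair of axes, a low-density city with a long commute would show up directly as a point in the upper left of the plot.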

Video: R for AI, and the Not Hotdog workshop

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

How to add Trend Lines to Visualizations in Displayr

In Displayr, Visualizations of chart type Column, Bar, Area, Line and Scatter all support trend lines. Trend lines can be linear or non-parametric (cubic spline, Friedman’s super-smoother or LOESS).
Adding a linear trend line
Linear trend lines can be added to a chart by fitting a regression to each series in the data source. In the chart below, the linear trends are shown as dotted lines in the color corresponding to the data series. We see there is considerable fluctuation in the frequency of each search term. But the trend lines clarify that the overall trend for SPSS is downward, whereas the trend for Stata is increasing. The data for this chart was generated by clicking Insert > More > Data > Google Trends. In the textbox for Topic(s) we typed in a comma-separated list of search terms (i.e., “SPSS,…
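For readers who want to see what these trend types look like outside the Displayr interface, here is a rough sketch in base R. It is not Displayr's implementation; the series names `term_a` and `term_b` and all values are simulated for illustration and are not Google Trends data.

```r
set.seed(42)
weeks <- 1:104

# Simulated search-interest series (hypothetical values, not Google Trends data).
trends <- data.frame(
  week     = rep(weeks, times = 2),
  term     = rep(c("term_a", "term_b"), each = length(weeks)),
  interest = c(60 - 0.20 * weeks + rnorm(length(weeks), sd = 5),
               30 + 0.15 * weeks + rnorm(length(weeks), sd = 5))
)

# Linear trend: fit one regression per series.
linear_fits <- lapply(split(trends, trends$term),
                      function(d) lm(interest ~ week, data = d))
slopes <- vapply(linear_fits, function(fit) coef(fit)[["week"]], numeric(1))

# Non-parametric alternatives for one series, all from the stats package.
one <- trends[trends$term == "term_a", ]
spline_fit <- smooth.spline(one$week, one$interest)  # cubic smoothing spline
supsmu_fit <- supsmu(one$week, one$interest)         # Friedman's super-smoother
loess_fit  <- loess(interest ~ week, data = one)     # LOESS

# Overlay the four trend lines on the raw series.
plot(one$week, one$interest, type = "l", col = "grey60",
     xlab = "Week", ylab = "Interest")
abline(linear_fits[["term_a"]], lty = 3)
lines(spline_fit, col = "blue")
lines(supsmu_fit, col = "darkgreen")
lines(one$week, fitted(loess_fit), col = "red")
```

The sign of each entry in `slopes` gives the direction of the linear trend for that series, which is the same reading the dotted lines provide on the chart.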

Clean Your Data in Seconds with This R Function

All data needs to be clean before you can explore it and create models. Common sense, right? Cleaning data can be tedious, but I created a function that will help. The function does the following: cleans the data of NAs and blanks; separates the clean data into integer, double, factor, numeric, and factor-and-numeric dataframes; lets you view the new dataframes; creates a view of the summary and describe output from the clean data; creates histograms of the dataframes; and saves all the objects. This will happen in seconds.
Package
First, load the Hmisc package. I always save the original file. The code below is the engine that cleans the data file. cleandata <- dataname[complete.cases(dataname),]
The function
The function is below. You need to copy the code and save it in an R file. Run the code and the function cleanme will…
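Since the excerpt stops before the function itself, here is a small base R sketch of the first two steps described above (dropping incomplete rows, then splitting by column type). It is not the author's cleanme function: the name `clean_and_split`, its return structure and the example data are assumptions made for illustration.

```r
# Hypothetical helper, not the cleanme function from the post.
clean_and_split <- function(df) {
  # Treat blank or whitespace-only entries in character/factor columns as missing.
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col) || is.factor(col)) {
      blank <- !is.na(col) & trimws(as.character(col)) == ""
      col[blank] <- NA
      if (is.factor(col)) col <- droplevels(col)
      df[[nm]] <- col
    }
  }

  # Same engine as in the post: keep only rows with no missing values.
  clean <- df[complete.cases(df), , drop = FALSE]

  # Split the clean data by column type.
  list(
    clean   = clean,
    numeric = clean[vapply(clean, is.numeric, logical(1))],
    factor  = clean[vapply(clean, is.factor, logical(1))]
  )
}

# Made-up example data: row 2 has a missing score and a blank group.
raw <- data.frame(
  id    = c(1L, 2L, 3L, 4L),
  score = c(2.5, NA, 3.1, 4.0),
  group = factor(c("a", "", "b", "a"))
)

parts <- clean_and_split(raw)
summary(parts$clean)
```

Returning a list keeps the pieces together instead of writing several objects into the global environment, which is a design choice of this sketch rather than something the post specifies.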

pinp 0.0.6: Two new options

A small feature release of our pinp package for snazzier one or two column vignettes got onto CRAN a little earlier. It offers two new options. Saghir Bashir addressed a longer-standing help needed! issue and contributed code to select papersize options via the YAML header. And I added support for the collapse option of knitr, also via YAML header selection. A screenshot of the package vignette can be seen below. Additional screenshots are at the pinp page. The NEWS entry for this release follows.
Changes in pinp version 0.0.6 (2018-07-16)
Added YAML header option ‘papersize’ (Saghir Bashir in #54 and #58 fixing #24).
Added YAML header option ‘collapse’ with default ‘false’ (#59).
Courtesy of CRANberries, there is a comparison to the previous release. More information is on the pinp page. For questions or comments use the issue tracker off…

10 Jobs for R users from around the world (2018-07-17)

To post your R job on the next post, just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure).
Current R jobs
Job seekers: please follow the links below to learn more and apply for your R job of interest:
Featured Jobs
Freelance: Data Analytics Instructor Level Education From Northeastern University – Posted by LilyMeyer; Boston Massachusetts, United States; 16 Jul 2018
Full-Time: Financial Systems Analyst National Audit Office – Posted by ahsinebadian; London England, United Kingdom; 13 Jul 2018
Full-Time: Data Scientist National Audit Office – Posted by ahsinebadian; London England, United Kingdom; 13 Jul 2018
Freelance: Senior Data Scientist Data Science Talent – Posted by damiendeighan; Frankfurt am Main Hessen, Germany; 6 Jul 2018
Full-Time: Lead Quantitative Developer The Millburn Corporation – Posted by The Millburn Corporation; New York New York,…

Using leaflet, just because

I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials. One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely. I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alternative. You might use Leaflet if you want:
- An interactive map that you can drag, zoom, click for popup information
- A “fancier” static map with geographical features of interest
- concise and clean…
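As a taste of what an interactive study-site map involves, here is a minimal sketch using the leaflet package. The data frame `sites`, the site names and the coordinates are hypothetical placeholders near Sydney, not the study sites from the tutorial or the post.

```r
library(leaflet)

# Hypothetical study sites; names and coordinates are illustrative only.
sites <- data.frame(
  name = c("Site 1", "Site 2", "Site 3"),
  lat  = c(-33.92, -33.80, -34.05),
  lon  = c(151.23, 151.29, 151.15)
)

# Default OpenStreetMap tiles plus one clickable marker per site.
site_map <- leaflet(sites) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, popup = ~name, radius = 6)

site_map
```

Printing `site_map` in an interactive session opens the map in the viewer, where it can be dragged and zoomed and each marker clicked to show its popup.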

A quick #WorldEmojiDay exploration

Let’s celebrate #WorldEmojiDay with a quick exploration of my own Twitter account.
The 📦
We’ll need:
From GitHub: remotes::install_github("hadley/emo")
From CRAN: {dplyr} {tidyr} {rtweet} {tidytext}
Note: This page was created at:
Sys.time()
## [1] "2018-07-17 17:22:29 CEST"
The 🔍
Let’s get my last 3200 tweets:
library(emo)
library(rtweet)
library(dplyr)
##
## Attaching package: ‘dplyr’
## The following objects are masked from ‘package:stats’:
##
## filter, lag
## The following objects are masked from ‘package:base’:
##
## intersect, setdiff, setequal, union
res <- get_timeline("_ColinFay", n = 3200)
names(res)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.co"
## [19] "urls_expanded_url" "media_url"
## [21] "media_t.co" "media_expanded_url"
## [23] "media_type" "ext_media_url"
## [25] "ext_media_t.co" "ext_media_expanded_url"…

What’s inside? pkginspector provides helpful tools for inspecting package contents

R packages are widely used in science, yet the code behind them often does not come under scrutiny. To address this lack, rOpenSci has been a pioneer in developing a peer review process for R packages. The goal of pkginspector is to help that process by providing a means to better understand the internal structure of R packages. It offers tools to analyze and visualize the relationship among functions within a package, and to report whether or not functions’ interfaces are consistent. If you are reviewing an R package (maybe your own!), pkginspector is for you. We began building pkginspector during unconf18, with support from rOpenSci and guidance from Noam Ross. The package focuses on facilitating a few of the many tasks involved in reviewing a package; it is one of a collection of packages, including pkgreviewr (rOpenSci) and goodpractice,…

Hamiltonian tails

“We demonstrate HMC’s sensitivity to these parameters by sampling from a bivariate Gaussian with correlation coefficient 0.99. We consider three settings (ε,L) = {(0.16; 40); (0.16; 50); (0.15; 50)}” Ziyu Wang, Shakir Mohamed, and Nando De Freitas. 2013
In an experiment with my PhD student Changye Wu (who wrote all the R code used below), we looked back at a strange feature in a 2013 ICML paper by Wang, Mohamed, and De Freitas: namely, the rather poor performance of a Hamiltonian Monte Carlo (leapfrog) algorithm on a two-dimensional, strongly correlated Gaussian target for very specific values of the parameters (ε,L) of the algorithm. The Gaussian target associated with this sample stands right in the middle of the two clouds, as identified by Wang et al. And the leapfrog integration path for (ε,L)=(0.15,50) keeps jumping between the two ridges (or tails),…
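To make the setting concrete, here is a minimal base R sketch of leapfrog HMC on a bivariate Gaussian with correlation 0.99, run at one of the quoted settings, (ε,L) = (0.15, 50). It is not Changye Wu's code. The zero mean, unit variances, identity mass matrix, starting point, seed and number of iterations are all assumptions of this sketch.

```r
rho <- 0.99
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)  # assumed unit variances
Sigma_inv <- solve(Sigma)

# Potential energy U(q) = -log density (up to a constant) and its gradient.
U <- function(q) 0.5 * drop(t(q) %*% Sigma_inv %*% q)
grad_U <- function(q) drop(Sigma_inv %*% q)

# Leapfrog integrator: half step in momentum, L full steps in position
# interleaved with full momentum steps, then a closing half step in momentum.
leapfrog <- function(q, p, eps, L) {
  p <- p - 0.5 * eps * grad_U(q)
  for (l in seq_len(L)) {
    q <- q + eps * p
    if (l < L) p <- p - eps * grad_U(q)
  }
  p <- p - 0.5 * eps * grad_U(q)
  list(q = q, p = p)
}

# HMC with standard normal momenta and a Metropolis accept/reject step.
hmc <- function(n_iter, eps, L, q0 = c(0, 0)) {
  draws <- matrix(NA_real_, nrow = n_iter, ncol = 2)
  q <- q0
  for (i in seq_len(n_iter)) {
    p0 <- rnorm(2)
    prop <- leapfrog(q, p0, eps, L)
    h_old <- U(q) + 0.5 * sum(p0^2)
    h_new <- U(prop$q) + 0.5 * sum(prop$p^2)
    if (log(runif(1)) < h_old - h_new) q <- prop$q
    draws[i, ] <- q
  }
  draws
}

set.seed(1)
draws <- hmc(n_iter = 2000, eps = 0.15, L = 50)
plot(draws, pch = ".", xlab = "x1", ylab = "x2")
```

Rerunning `hmc()` with the other quoted pairs, (0.16, 40) and (0.16, 50), and comparing the scatterplots is one way to look at the sensitivity to (ε,L) described in the quote.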