Because it's Friday: Language and Thought

Does the language we speak change the way we think? This TED talk by Lera Boroditsky looks at how language structures like gendered nouns, or the way directions are described, might shape the way speakers of those languages think about things. This talk was cited by Bill Venables in his excellent keynote talk at the useR!2018 conference, where he also presented the results of an elegant designed experiment that looked at the influence of gendered nouns by comparing languages without gendered nouns (like Hungarian) with those that have multiple genders for nouns, like Spanish (which has feminine and masculine nouns) and German (which adds neuter nouns, for three genders total). That’s all from us for this week. Have a great weekend, and we’ll be back next week. Enjoy!
Original Post: Because it's Friday: Language and Thought

New open data sets from Microsoft Research

Microsoft has released a number of data sets produced by Microsoft Research and made them available for download at Microsoft Research Open Data. The datasets in Microsoft Research Open Data are categorized by their primary research area, such as Physics, Social Science, Environmental Science, and Information Science. Many of the data sets have not been previously available to the public, and many are large and useful for research in AI and Machine Learning techniques. Many of the datasets also include links to associated papers from Microsoft Research. For example, the 10GB DESM Word Embeddings dataset provides the IN and the OUT word2vec embeddings for 2.7M words trained on a Bing query corpus of 600M+ queries. Other data sets of note include: a collection of 38M tweets related to the 2012 US election; 3-D capture data from individuals performing a variety of hand gestures…
Original Post: New open data sets from Microsoft Research

In case you missed it: June 2018 roundup

In case you missed them, here are some articles from June of particular interest to R users. An animated visualization of global migration, created in R by Guy Abel. My take on the question, Should you learn R or Python for data science? The BBC and Financial Times use R — without post-processing — for publication graphics. “Handling Strings in R”, a free e-book by Gaston Sanchez, has been updated. The AI, Machine Learning and Data Science roundup for June 2018. The PYPL Popularity of Languages Index ranks R as the 7th most popular programming language. The “lime” package for R provides tools for interpreting machine learning models. An R vignette by Paige Bailey on detecting unconscious bias in predictive models. Microsoft R Open 3.5.0 has been released (with a subsequent fix for Debian systems). Slides from the webinar, What’s…
Original Post: In case you missed it: June 2018 roundup

R 3.5.1 update now available

Last week the R Core Team released the latest update to the R statistical data analysis environment, R version 3.5.1. This update (codenamed “Feather Spray” — a Peanuts reference) makes no user-visible changes and fixes a few bugs. It is backwards-compatible with R 3.5.0, and users can find updates for Windows, Linux and Mac systems at their local CRAN mirror. (The update to Microsoft R Open featuring the R 3.5.1 engine is scheduled for release on August 29.) The complete list of fixes in R 3.5.1 is included in the release announcement, found at the link below. R-announce mailing list: R 3.5.1 is released
Original Post: R 3.5.1 update now available

Because it's Friday: Wavy Lines Illusion

Another fine illusion: in this one, the pairs of horizontal lines are all smooth sine curves, despite the appearance of jagged zig-zags. It’s really hard, for me at least, to tell that the zig-zags in the light grey region are actually curved. Zooming all the way in may help if you want to check for yourself. That’s all from us at the blog for this week. Next week, we’ll be taking a break for the week of the US Independence Day holiday, but we’ll be back the week of Monday July 9, reporting from the useR! conference in Brisbane. In the meantime, have a great weekend!
Original Post: Because it's Friday: Wavy Lines Illusion

Global Migration, animated with R

The animation below, by Shanghai University professor Guy Abel, shows migration within and between regions of the world from 1960 to 2015. The data and the methodology behind the chart are described in this paper. The curved bars around the outside represent the peak migrant flows for each region; globally, migration peaked during the 2005-2010 period and then declined in 2010-2015, the latest data available. This animated chord chart was created entirely using the R language. The chord plot showing the flows between regions was created using the circlize package; the tweenr package created the smooth transitions between time periods, and the magick package created the animated GIF you see above. You can find a tutorial on making this animation, including the complete R code, at the link below. Guy Abel: Animated Directional Chord Diagrams (via Cal Carrie)
Original Post: Global Migration, animated with R

Should you learn R or Python for data science?

One of the most common questions I get asked is, “Should I learn R or Python?”. My general response is: it’s up to you! Both are popular open source data platforms with active, growing communities; both are highly sought after by employers, and both have a rich set of capabilities for working with data. It really depends most on your interests and the kind of employer you want to work for. If your interests lean more towards traditional statistical analysis and inference as used within industries like manufacturing, finance, and the life sciences, I’d lean towards R. If you’re more interested in machine learning and artificial intelligence applications, I’d lean towards Python. But even that’s not a hard-and-fast rule: R has excellent support for machine learning and deep learning frameworks, and Python is often used for traditional data science…
Original Post: Should you learn R or Python for data science?

The Financial Times and BBC use R for publication graphics

While graphics guru Edward Tufte recently claimed that “R coders and users just can’t do words on graphics and typography” and need additional tools to make graphics that aren’t “clunky”, data journalists at major publications beg to differ. The BBC has been creating graphics “purely in R” for some time, with a typography style matching that of the BBC website. Senior BBC Data Journalist Christine Jeavans offers several examples, including this chart of life expectancy differences between men and women: … and this chart on gender pay gaps at large British banks: Meanwhile, the chart below was made for the Financial Times using just R and the ggplot2 package, “down to the custom FT font and the white bar in the top left”, according to data journalist John Burn-Murdoch. There are also entire collections devoted to recreating Tufte’s own visualizations…
Original Post: The Financial Times and BBC use R for publication graphics

Because it's Friday: The lioness sleeps tonight

Handlers for the lion enclosure at San Antonio Zoo have developed a novel way to provide stimulation for their big cats: let them play tug-of-war with people outside. People plural, that is — it turns out that even a trio of pro wrestlers is no match for a young lioness: 🤼‍♂️ How many #NXT #WWE superstar wrestlers does it take to win in tug of war with a 2 1/2 year old lion cub? Apparently more than 3! #NXTSanAntonio #SAZoo pic.twitter.com/avyPVwRYjN — San Antonio Zoo & Zoo School🦏 (@SanAntonioZoo) May 19, 2018 That’s all for this week. Have a great weekend (and a very happy Pride!) and we’ll be back next week.
Original Post: Because it's Friday: The lioness sleeps tonight