Color palettes derived from the Dutch masters

Alongside tulip fields, canals and sampling cheese, the museums of the Netherlands are one of its biggest tourist attractions. And for very good reasons! During the seventeenth century, known as the Dutch Golden Age, there was an abundance of talented painters. If you ever have the chance to visit the Rijksmuseum, you will be in awe of the landscapes, households and portraits, painted with incredible detail and beautiful colors.

The dutchmasters color palette

Rembrandt van Rijn and Johannes Vermeer are the most famous of the seventeenth-century Dutch masters. Both are renowned for their use of light and color, making an encounter with their subjects feel like being in the room with them. Recently, during the OzUnconference, the beautiful ochRe package was developed. This package contains color palettes of the wonderful Australian landscape (which my wife got to witness during our honeymoon last year). Drawing colors from both works of art and…
Original Post: Color palettes derived from the Dutch masters

Dummy Variable for Examining Structural Instability in Regression: An Alternative to Chow Test

One of the fastest-growing economies in the era of globalization is the Ethiopian economy. Among lower-income countries, it has emerged as one of the rare ones to achieve a double-digit growth rate in Gross Domestic Product (GDP). However, there is a great deal of debate regarding this double-digit growth rate, especially during the recent global recession. So it becomes a question for empirical research whether there has been a structural change in the relationship between Ethiopia’s GDP and the regressor (time). How do we find out whether a structural change has in fact occurred? To answer this question, we consider the GDP of Ethiopia (measured in constant 2010 US$) over the period 1981 to 2015. Like many other countries in the world, Ethiopia has adopted the policy of regulated globalization during…
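The dummy-variable approach can be sketched in R with lm(). A common textbook formulation regresses the outcome on time t, a dummy D that is 0 before a candidate break and 1 after it, and their interaction: Y_t = a1 + a2·D_t + b1·t + b2·(D_t·t) + u_t, where a2 measures a shift in the intercept and b2 a shift in the slope. Unlike a single Chow test, the separate tests on a2 and b2 indicate whether the intercept, the slope, or both changed. This is only an illustration: the data below are synthetic and noise-free, and the break year 2000 is hypothetical; neither comes from the post’s GDP series.

```r
# Synthetic, noise-free illustration; NOT Ethiopia's GDP series.
year <- 1981:2015
break_year <- 2000                      # hypothetical break point
trend <- year - min(year) + 1           # time regressor t = 1, 2, ..., 35
d <- as.numeric(year >= break_year)     # dummy: 0 before the break, 1 from it on

# Generating model: intercept 10 and slope 0.5 before the break;
# afterwards the intercept shifts by -4 and the slope by +1.5.
y <- 10 + 0.5 * trend + d * (-4 + 1.5 * trend)

# Y_t = a1 + a2*D_t + b1*t + b2*(D_t * t) + u_t
fit <- lm(y ~ d + trend + d:trend)
coef(fit)   # (Intercept) = 10, d = -4, trend = 0.5, d:trend = 1.5
```

With real, noisy data, summary(fit) reports a t-test for the d and d:trend coefficients; a significant estimate for either suggests a structural change in the intercept or the slope, respectively.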
Original Post: Dummy Variable for Examining Structural Instability in Regression: An Alternative to Chow Test

Serving shiny apps in the internet with your own server

In this post I’ll share my experience of setting up my own virtual server for hosting shiny applications on DigitalOcean. First, some context. I’m working on an academic project where we build a package for accessing financial data and corporate events directly from B3, the Brazilian financial exchange. The objective is to set a reproducible standard and facilitate data acquisition of a large, and very interesting, dataset. The result is GetDFPData. Since many researchers and students in Brazil are not knowledgeable in R, we needed to make it easier for people to use the software. A shiny app hosted on the internet is perfect for that. The app is available at http://www.msperlin.com/shiny/GetDFPData/. You can host your own shiny app for free at www.shinyapps.io, but that comes with some usage limitations. While searching for alternatives, I found this great post by Dean Attali that clearly explains the steps for setting up a web server in a virtual…
Original Post: Serving shiny apps in the internet with your own server

A minimal Project Tree in R

You can also check this post, written in #blogdown, here: minimal-project-tree-r. Over the last two days, some discussions reached my Twitter feed about how bad the following lines are at the beginning of an R script or notebook, sparked by @JennyBryan’s slides at the IASC-ARS/NZSA Conference: setwd() and rm(list = ls()). Jenny Bryan offered a detailed explanation for this, as well as some fixes, in her tidyverse blog post. The main ideas were: to ensure reproducibility within a stable working directory tree (she proposes the very concise here::here(), but other methods are available, such as the template package), and to avoid wreaking havoc on other people’s computers with rm(list = ls()). All of this buzz around project self-containment and reproducibility motivated me to finish a minimal directory tree that (with some variations) I have been using for this year’s data analysis endeavours. It is an extremely simple tree which separates a /data,…
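As a hedged sketch of the idea, the base-R code below creates a small directory tree under a project root and builds file paths relative to that root, so the script needs neither setwd() nor rm(list = ls()). Only the data folder is named in the excerpt; the R and output folders, the helper name make_project_tree() and the temporary root are hypothetical. In a real project, here::here("data", "raw.csv") finds the root automatically (for example from an .Rproj or .here file) instead of hard-coding it.

```r
# Hypothetical helper: create a minimal project tree under `root`.
# "data" comes from the post; "R" and "output" are illustrative extras.
make_project_tree <- function(root, dirs = c("data", "R", "output")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Use a temporary root so the example does not touch the working directory.
root <- file.path(tempdir(), "my-project")
make_project_tree(root)

# Build paths relative to the project root rather than calling setwd().
raw_file <- file.path(root, "data", "raw.csv")
write.csv(data.frame(x = 1:3), raw_file, row.names = FALSE)
raw <- read.csv(raw_file)
```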
Original Post: A minimal Project Tree in R

A chart of Bechdel Test scores

A movie is said to satisfy the Bechdel Test if it satisfies the following three criteria: (1) the movie has at least two named female characters, (2) who have a conversation with each other, (3) about something other than a man. The website BechdelTest.com scores movies accordingly, granting one point for each of the criteria above for a maximum of three points. The recent Wonder Woman movie scores the full three points (Diana and her mother discuss war, for example), while Dunkirk, which features no named female characters, gets zero. (It was still a great film, however.) The website also offers an API, which enabled data and analytics director Austin Wehrwein to create a time series chart of Bechdel scores for movies listed on BechdelTest.com. This chart only includes ratings for the subset of movies listed on BechdelTest.com, so it’s not clear whether this…
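The scoring rule can be written as a small R function. This is a sketch assuming the three criteria are cumulative (each builds on the previous one, as the wording “who … about …” implies), so the score counts how many criteria hold in sequence; bechdel_score() is a hypothetical name and not part of the BechdelTest.com API.

```r
# Hypothetical helper: score a movie on the Bechdel Test (0 to 3).
# `criteria` is a logical vector, in order:
#   1. at least two named female characters
#   2. who have a conversation with each other
#   3. about something other than a man
bechdel_score <- function(criteria) {
  stopifnot(is.logical(criteria), length(criteria) == 3, !anyNA(criteria))
  # cumprod() drops to 0 at the first unmet criterion, so a later
  # criterion cannot earn a point unless all earlier ones are met.
  sum(cumprod(criteria))
}

bechdel_score(c(TRUE, TRUE, TRUE))     # Wonder Woman, per the excerpt: 3
bechdel_score(c(FALSE, FALSE, FALSE))  # Dunkirk, per the excerpt: 0
```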
Original Post: A chart of Bechdel Test scores

Leveraging pipeline in Spark through scala and Sparklyr

As mentioned, we’ll show how to use the Scala API to access pipelines in MLlib, so we need to import the classes we plan to use in our example and start a Spark session. Then we’ll read the dataset and start manipulating the data in order to prepare it for the pipeline. In our example, we’ll get the data from a local repository (instead of referring to, e.g., an HDFS or data lake repository; there are APIs, for both Scala and R, that allow access to these repositories as well). We’ll also use this loading step to rename some columns; in particular, we’ll rename the “income” column to “label”, since we’ll use it as the label column in our logistic regression algorithm.

//load data source from local repository
val csv = spark.read.option("inferSchema", "true").option("header", "true").csv("…yyyyxxxxadult.csv")
val data_start = csv.select(($"workclass"), ($"gender"), ($"education"), ($"age"), ($"marital-status").alias("marital"),…
Original Post: Leveraging pipeline in Spark through scala and Sparklyr

New R Course: Introduction to the Tidyverse!

Hi! Big announcement today: we just launched the Introduction to the Tidyverse R course by David Robinson! This is an introduction to the programming language R, focused on a powerful set of tools known as the “tidyverse”. In the course you’ll learn the intertwined processes of data manipulation and visualization through the tools dplyr and ggplot2. You’ll learn to manipulate data by filtering, sorting and summarizing a real dataset of historical country data in order to answer exploratory questions. You’ll then learn to turn this processed data into informative line plots, bar plots, histograms, and more with the ggplot2 package. This gives a taste both of the value of exploratory data analysis and of the power of tidyverse tools. This is a suitable introduction for people who have no previous experience in R and are interested in learning to perform data…
Original Post: New R Course: Introduction to the Tidyverse!

Be a BigBallR in #rstats : Stayeth or Switcheth

If you’re a Big Baller, you know when to stayeth in your lane but also when to switcheth lanes.

“@LangstonKerman four score and seven years ago our fathers brought forth on this continent, a new lane to stayeth in” (MikeJackTzen, @MKJCKTZN, November 19, 2017)

The Big Baller Brand brothers just inked a deal to play professional basketball in Europe. They switched into a different lane to achieve their original goal.

“It’s not about the money for the Ball Brothers. They have a passion to play Basketball and to experience playing as… twitter.com/i/web/status/9…” (Big Baller Brand, @bigballerbrand, December 12, 2017)

The knee-jerk reaction from the non-globally minded is that this spells doom for any NBA hoop dreams. Not so.

“@bigballerbrand @MELOD1P @LiAngeloBall shoutouts to @bigballerbrand for pursuing other ways 2 achieve goals. maybe… twitter.com/i/web/status/9…” (MikeJackTzen, @MKJCKTZN, December 12, 2017)

Four score and ten years ago, your only…
Original Post: Be a BigBallR in #rstats : Stayeth or Switcheth

When a Tweet Turns Into an R Package

Reproduced with the kind permission of our Head of Data Engineering, Mark Sellors, and first published on his blog.

Boy, that escalated quickly

I just wanted to write up a brief post about the power of R and its community, and tell the story of how putting stuff out into the world can have amazing consequences. About 24 hours ago I was going to tweet something like this: Hey Mac #rstats users – system('say "hello rstats user"') I’d been playing with the macOS command-line tool ‘say’ with the kids, and just figured it would be funny to make R say stuff. Before I tweeted, though, I thought I’d better check that it worked as intended. While I was doing that, I decided it would be fun to expose the say command’s different voices and figured I’d make a gist…
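A minimal sketch of wrapping the macOS say command from R. The voice argument maps to say’s -v option (running say -v '?' in a terminal lists the installed voices). The function name say_it(), its arguments and the example voice “Alex” are assumptions for illustration, not the API of the package the post goes on to describe. The command only runs on macOS; elsewhere the function just returns the arguments it would pass.

```r
# Hypothetical wrapper around the macOS `say` command-line tool.
say_it <- function(text, voice = NULL,
                   run = Sys.info()[["sysname"]] == "Darwin") {
  # Build the argument vector; shQuote() protects spaces and quotes.
  args <- c(if (!is.null(voice)) c("-v", shQuote(voice)), shQuote(text))
  if (run) {
    system2("say", args)   # speaks the text aloud on macOS
  }
  invisible(args)
}

say_it("hello rstats user")
say_it("hello rstats user", voice = "Alex")  # "Alex" is an assumed voice name
```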
Original Post: When a Tweet Turns Into an R Package

Point Pattern Analysis using Ecological Methods in R

Here is a quick example of how to get started with some of the more sophisticated point pattern analysis tools that have been developed for ecologists (principally the adehabitatHR package) but that are also very useful for human data. Ecologists deploy point pattern analysis to establish the “home range” of a particular animal based on the known locations where it has been sighted (either directly or remotely via camera traps). Essentially, it is where the animal spends most of its time. In the case of human datasets, the analogy can be extended to identify areas where most crimes are committed (hotspots), or to identify the activity spaces of individuals or the catchment areas of services such as schools and hospitals. This tutorial offers a rough analysis of crime data in London, so the maps should not be taken…
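To make the home-range idea concrete, the base-R sketch below computes the simplest home-range estimator, the minimum convex polygon (MCP): the convex hull around the sightings, optionally after discarding the locations farthest from their centroid. This is an illustrative re-implementation, not adehabitatHR’s own code (the package provides mcp() and kernel-based estimators such as kernelUD() for this); the function name mcp_area() and the toy coordinates are hypothetical.

```r
# Hypothetical helper: area of a minimum convex polygon (MCP) home range.
# x, y: coordinates of sightings (e.g. crime locations, in projected units).
# percent: share of locations kept, dropping those farthest from the centroid.
mcp_area <- function(x, y, percent = 100) {
  stopifnot(length(x) == length(y), length(x) >= 3,
            percent > 0, percent <= 100)
  # Distance of each location from the centroid of all locations
  d_centroid <- sqrt((x - mean(x))^2 + (y - mean(y))^2)
  keep <- d_centroid <= quantile(d_centroid, percent / 100)
  xk <- x[keep]
  yk <- y[keep]
  # Indices of the convex hull vertices, in order around the polygon
  hull <- chull(xk, yk)
  hx <- xk[hull]
  hy <- yk[hull]
  # Shoelace formula for the area of the hull polygon
  abs(sum(hx * c(hy[-1], hy[1]) - c(hx[-1], hx[1]) * hy)) / 2
}

# Toy sightings: the corners of a unit square plus one interior point
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)
mcp_area(x, y)   # 1: the interior point does not enlarge the hull
```

Setting percent below 100 (95 is a common choice) excludes the outermost locations, which makes the estimate less sensitive to occasional far-away sightings.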
Original Post: Point Pattern Analysis using Ecological Methods in R