The union and intersection set operations were introduced in a previous post using two sets. These set operations can be generalized to accept any number of sets. Arbitrary Set Unions Operation: Consider a set of infinitely many sets. It would be tedious and unnecessary to write out the union statement repeatedly for any non-trivial number of sets, so a more general operation for performing unions is needed. This operation is denoted by the ⋃ symbol. With the new notation, the set above and the desired unions of its member sets can be written as a single expression, which leads to a definition of the union of a set of sets. For example, consider the three sets: The union of the three sets is…
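The generalization described in this excerpt can be sketched in base R by folding the binary `union()` and `intersect()` functions over a list of sets. This is a minimal sketch rather than the original post's code: the helper names `big_union` and `big_intersect` and the three example sets are assumptions made for illustration, and only finitely many sets can be represented this way.

```r
# Three example sets (hypothetical values, not taken from the original post)
a <- c(1, 2, 3)
b <- c(3, 4, 5)
c_set <- c(5, 6, 7)

# Arbitrary union: fold the binary union() over a list of sets
big_union <- function(sets) Reduce(union, sets)

# Arbitrary intersection: the same fold with intersect()
big_intersect <- function(sets) Reduce(intersect, sets)

sets <- list(a, b, c_set)
u <- big_union(sets)      # every element that appears in at least one set
i <- big_intersect(sets)  # every element that appears in all of the sets

print(u)
print(i)
```

Because `Reduce()` applies the binary operation pairwise from left to right, the same two helpers work for a list of any finite length.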

Original Post: Set Theory Arbitrary Union and Intersection Operations with R

# R bloggers

## RTutor: Emission Certificates and Green Innovation

Which policy instruments should we use to cost-effectively reduce greenhouse gas emissions? For a given technological level there are many economic arguments in favour of tradeable emission certificates or a carbon tax: they generate static efficiency by inducing emission reductions in those sectors and for those technologies where it is most cost-effective. Specialized subsidies, like the originally extremely high subsidies on solar energy in Germany and other countries, are often much more costly. Yet, we have seen a tremendous cost reduction for photovoltaics, which might not have been achieved on such a scale without those subsidies. And maybe in a world where the current president of a major polluting country seems not to care much about the risks of climate change, the development of cheap green technology that can cost-effectively substitute fossil fuels even absent government support is…

Original Post: RTutor: Emission Certificates and Green Innovation

## Face Recognition in R

OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven’t done much beyond searching Google, but it seems that “imager” and “videoplayR” provide a lot of the functionality, though not all of it. I have never actually called Python functions from R before. Initially, I tried the “rPython” library – it has a lot of advantages, but it was completely unnecessary for me, so system() worked absolutely fine. While this example is extremely simple, it should help to illustrate how easy it is to utilize the power of Python from within R. I need to give credit to Harrison Kinsley for all of his efforts and work at PythonProgramming.net – I used a lot…

Original Post: Face Recognition in R

## Online portfolio allocation with a very simple algorithm

By Yuri Resende. Today we will use an online convex optimization technique to build a very simple algorithm for portfolio allocation. Of course this is just an illustrative post and we are going to make some simplifying assumptions. The objective is to point out an interesting direction to approach the problem of portfolio allocation. The algorithm used here is Online Gradient Descent (OGD), and we are going to compare the performance of the portfolio with the Uniform Constant Rebalanced Portfolio and the Dow Jones Industrial Average index. You can skip directly to Implementation and Example if you already know what an online algorithm is. For those who don’t know what Online Convex Optimization is… From now on, an allocation is represented as a point in a space whose dimension is the number of possible stocks to invest in. Each of…
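The general idea of online gradient descent for portfolio weights can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the post's implementation: the loss is assumed to be the negative log of the one-period portfolio return, the step size `eta` is arbitrary, the returns are simulated, and the helper names `project_simplex` and `ogd_portfolio` are hypothetical.

```r
# Euclidean projection onto the probability simplex (weights >= 0, sum = 1),
# using the standard sort-based algorithm.
project_simplex <- function(v) {
  n <- length(v)
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  # Largest index k for which u[k] - (css[k] - 1) / k is still positive
  k <- max(which(u - (css - 1) / seq_len(n) > 0))
  theta <- (css[k] - 1) / k
  pmax(v - theta, 0)
}

# Online gradient descent on the assumed loss f_t(x) = -log(sum(r_t * x)),
# where r_t holds the gross returns (price relatives) observed in period t.
ogd_portfolio <- function(returns, eta = 0.05) {
  n_periods <- nrow(returns)
  n_assets <- ncol(returns)
  x <- rep(1 / n_assets, n_assets)        # start from uniform weights
  weights <- matrix(NA_real_, n_periods, n_assets)
  wealth <- numeric(n_periods)
  total <- 1
  for (t in seq_len(n_periods)) {
    weights[t, ] <- x                     # allocation chosen before r_t is seen
    r <- returns[t, ]
    gain <- sum(r * x)
    total <- total * gain
    wealth[t] <- total
    grad <- -r / gain                     # gradient of -log(sum(r * x)) in x
    x <- project_simplex(x - eta * grad)  # gradient step, then project back
  }
  list(weights = weights, wealth = wealth)
}

# Simulated gross returns for 3 hypothetical assets over 250 periods
set.seed(1)
returns <- matrix(exp(rnorm(750, mean = 0.0005, sd = 0.01)), ncol = 3)
res <- ogd_portfolio(returns)
```

The projection step is what keeps each allocation a valid portfolio: after every gradient step the weights are pulled back to non-negative values that sum to one.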

Original Post: Online portfolio allocation with a very simple algorithm

## Data wrangling : Reshaping

Data wrangling is a task of great importance in data analysis. Data wrangling is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process which is estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series with the goal of sharpening the reader’s skills in the data wrangling task. This is the second part of the series, and it covers reshaping data into a tidy form. By tidy form, we mean that each feature forms a column and each observation forms a row. Before proceeding, it might be helpful to look over the help pages for spread, gather, unite, separate, replace_na, fill and extract_numeric. Moreover, please load the following libraries.
install.packages("magrittr")
library(magrittr)
install.packages("tidyr")
library(tidyr)
…
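To make the idea of a tidy form concrete, here is a small sketch of a round trip with `gather()` and `spread()` from tidyr, two of the functions named above. The data frame and its column names are hypothetical, and the example assumes tidyr is already installed.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per exam
wide <- data.frame(
  student = c("A", "B"),
  exam1 = c(90, 75),
  exam2 = c(85, 80),
  stringsAsFactors = FALSE
)

# gather(): wide -> long, one row per (student, exam) observation
long <- gather(wide, key = "exam", value = "score", exam1, exam2)

# spread(): long -> wide, restoring one column per exam
wide_again <- spread(long, key = exam, value = score)

print(long)
print(wide_again)
```

In the long table each observation (one score for one student on one exam) occupies its own row, which matches the description of tidy form in the excerpt.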

Original Post: Data wrangling : Reshaping

## nanotime 0.2.0

A new version of the nanotime package for working with nanosecond timestamps just arrived on CRAN. nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Thanks to a metric ton of work by Leonardo Silvestri, the package now uses S4 classes internally, allowing for greater consistency of operations on nanotime objects.

Changes in version 0.2.0 (2017-06-22)

- Rewritten in S4 to provide more robust operations (#17 by Leonardo)
- Ensure tz="" is treated as unset (Leonardo in #20)
- Added format and tz arguments to nanotime, format, print (#22 by Leonardo and Dirk)
- Ensure printing respects options()$max.print, ensure names are kept with vector (#23 by Leonardo)
- Correct summary() by defining names (Leonardo in #25 fixing #24)
- Report error on operations that are meaningful for type;…

Original Post: nanotime 0.2.0

## Plotting partial pooling in mixed-effects models

In this post, I demonstrate a few techniques for plotting information from a relatively simple mixed-effects model fit in R. These plots can help us develop intuitions about what these models are doing and what “partial pooling” means. The sleepstudy dataset: For these examples, I’m going to use the sleepstudy dataset from the lme4 package. The outcome measure is reaction time, the predictor measure is days of sleep deprivation, and these measurements are nested within participants; we have 10 observations per participant. I am also going to add two fake participants with incomplete data to illustrate partial pooling.
library(lme4)
#> Loading required package: Matrix
#> Loading required package: methods
library(dplyr)
library(tibble)
# Convert to tibble for better printing. Convert factors to strings
sleepstudy <- sleepstudy %>%
  as_tibble() %>%
  mutate(Subject = as.character(Subject))
# Add two fake participants
df_sleep <- bind_rows(
  sleepstudy,
  data_frame(Reaction = c(286, 288), Days = 0:1, Subject…

Original Post: Plotting partial pooling in mixed-effects models

## RcppCCTZ 0.2.3 (and 0.2.2)

A new minor version 0.2.3 of RcppCCTZ is now on CRAN. RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries: one for dealing with civil time (human-readable dates and times), and one for converting between absolute and civil times via time zones. The RcppCCTZ page has a few usage examples and details. This version ensures that we set the TZDIR environment variable correctly on the old dreaded OS that does not come with proper timezone information, an issue which had come up while preparing the next (and awesome, trust me) release of nanotime. It also appears that I failed to blog about 0.2.2, another maintenance release, so changes for both are summarised next. Changes in…

Original Post: RcppCCTZ 0.2.3 (and 0.2.2)

## Large integers in R: Fibonacci number with 1000 digits, Euler Problem 25

The Fibonacci Sequence occurs in nature: The nautilus shell. Euler Problem 25 takes us back to the Fibonacci sequence and the problems related to working with very large integers. The Fibonacci sequence follows a simple mathematical rule but it can create things of great beauty. This pattern occurs quite often in nature, like the nautilus shell shown in the image. The video by Arthur Benjamin at the end of this post illustrates some of the magic of this sequence. Large Integers in R: By default, numbers with more than 7 digits are shown in scientific notation in R, which reduces the accuracy of the displayed result. You can change the precision of large integers with the options function, but R struggles with integers with more than 22 digits. This example illustrates this issue.
> factorial(24)
[1] 6.204484e+23
> options(digits=22)
> factorial(24)
[1] 620448401733239409999872
…
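The trailing digits in the output above are rounding noise, because R stores these values as double-precision numbers, which represent integers exactly only up to 2^53. The sketch below first shows that limit and then works around it in base R by storing a number as a vector of decimal digits. It is an illustrative alternative, not the approach of the original post (which is truncated here); the helper names `big_add` and `first_fib_with_digits` are hypothetical.

```r
# Doubles carry a 53-bit significand, so integers are exact only up to 2^53
limit <- 2^53
exact_below <- (limit - 1) + 1 == limit  # TRUE: arithmetic is still exact
lost_above <- limit + 1 == limit         # TRUE: the added 1 is rounded away

# Add two non-negative integers stored as vectors of decimal digits,
# least significant digit first.
big_add <- function(a, b) {
  n <- max(length(a), length(b))
  a <- c(a, rep(0, n - length(a)))  # pad the shorter number with zeros
  b <- c(b, rep(0, n - length(b)))
  out <- numeric(n)
  carry <- 0
  for (i in seq_len(n)) {
    s <- a[i] + b[i] + carry
    out[i] <- s %% 10
    carry <- s %/% 10
  }
  if (carry > 0) out <- c(out, carry)
  out
}

# Index of the first Fibonacci number with at least `digits` digits,
# counting F1 = F2 = 1.
first_fib_with_digits <- function(digits) {
  if (digits <= 1) return(1)
  prev <- 1
  curr <- 1
  index <- 2
  while (length(curr) < digits) {
    nxt <- big_add(prev, curr)
    prev <- curr
    curr <- nxt
    index <- index + 1
  }
  index
}

# F12 = 144 is the first Fibonacci number with three digits
print(first_fib_with_digits(3))
```

Calling `first_fib_with_digits(1000)` applies the same loop to the size asked for in Euler Problem 25; each number is then a vector of up to 1000 digits, so no precision is lost.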

Original Post: Large integers in R: Fibonacci number with 1000 digits, Euler Problem 25

## Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support

The Windows edition of the Data Science Virtual Machine (DSVM), the all-in-one virtual machine image with a wide collection of open-source and Microsoft data science tools, has been updated to the Windows Server 2016 platform. This update brings built-in support for Docker containers and GPU-based deep learning. GPU-based Deep Learning: While prior editions of the DSVM could access GPU-based capabilities by installing additional components, everything is now configured and ready at launch. The DSVM now includes GPU-enabled builds of popular deep learning frameworks including CNTK, TensorFlow, and MXNet. It also includes Microsoft R Server 9.1, and several machine-learning functions in the MicrosoftML package can take advantage of GPUs. Note that you will need to use an N-series Azure instance to benefit from GPU acceleration, but all of the tools in the DSVM will also work on regular CPU-based instances.…

Original Post: Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support