This class will introduce Apache Spark 2, focusing on using it for data analysis. Taught by Sujee Maniyam on behalf of the local ACM chapter, SFbayACM.

Original Post: Spark with Scala – ACM Professional Development Seminar, Santa Clara, Aug 5

## Set Theory Arbitrary Union and Intersection Operations with R

The union and intersection set operations were introduced in a previous post using two sets. These set operations can be generalized to accept any number of sets.

### Arbitrary Set Unions Operation

Consider a set of infinitely many sets. It would be tedious and unnecessary to write out the union statement again and again for any non-trivial number of sets, so a more general operation for performing unions is needed. This operation is denoted by the symbol ⋃, which collapses the chain of pairwise unions of the member sets into a single expression. We can then state the following definition: for a set of sets, the union is the set of all elements that belong to at least one of the member sets. For example, consider three sets. The union of the three sets is…
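The generalized operations can be sketched in a few lines. The example below is written in Python rather than the R used in the original post, and the three sets `A`, `B`, and `C` are hypothetical values chosen for illustration; they are not the sets from the post. It is a minimal sketch for finite families of sets only.

```python
def arbitrary_union(family):
    """Union of every set in `family`: elements found in at least one member."""
    return set().union(*family)


def arbitrary_intersection(family):
    """Intersection of every set in `family`: elements found in all members."""
    members = [set(s) for s in family]
    if not members:
        # The intersection of an empty family is left undefined in this sketch.
        raise ValueError("family must contain at least one set")
    return set.intersection(*members)


# Hypothetical example sets (assumed for illustration only).
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}

union_result = arbitrary_union([A, B, C])
intersection_result = arbitrary_intersection([A, B, C])
```

Passing the family as a list means the same two functions work for two sets or two hundred, which is the point of the generalized notation.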

Original Post: Set Theory Arbitrary Union and Intersection Operations with R

## RTutor: Emission Certificates and Green Innovation

Which policy instruments should we use to cost-effectively reduce greenhouse gas emissions? For a given technological level there are many economic arguments in favour of tradeable emission certificates or a carbon tax: they generate static efficiency by inducing emission reductions in those sectors and for those technologies where it is most cost-effective. Specialized subsidies, like the originally extremely high subsidies on solar energy in Germany and other countries, are often much more costly. Yet we have seen a tremendous cost reduction for photovoltaics, which might not have been achieved on such a scale without those subsidies. And maybe in a world where the current president of a major polluting country seems not to care much about the risks of climate change, the development of cheap green technology that can cost-effectively substitute for fossil fuels even absent government support is…

Original Post: RTutor: Emission Certificates and Green Innovation

## Interactive R visuals in Power BI

Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn’t possible to interact with an R chart on the screen (to display tool-tips, for example). But in the latest update to Power BI, you can create R custom visuals that embed interactive R charts, like this: The above chart was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the online…

Original Post: Interactive R visuals in Power BI

## Two years as a Data Scientist at Stack Overflow

Last Friday marked my two-year anniversary working as a data scientist at Stack Overflow. At the end of my first year I wrote a blog post about my experience, both to share some of what I’d learned and as a form of self-reflection. After another year, I’d like to revisit the topic. While my first post focused mostly on the transition from my PhD to an industry position, here I’ll be sharing what has changed for me in my job in the last year, and what I hope the next year will bring.

### Hiring a Second Data Scientist

In last year’s blog post, I noted how difficult it could be to be the only data scientist on a team: Most of my current statistical education has to be self-driven, and I need to be very cautious about my work:…

Original Post: Two years as a Data Scientist at Stack Overflow

## Face Recognition in R

OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven’t done much beyond searching Google, but it seems that “imager” and “videoplayR” provide a lot of the functionality, though not all of it. I have never actually called Python functions from R before. Initially, I tried the “rPython” library. It has a lot of advantages, but it was completely unnecessary for me, so `system()` worked absolutely fine. While this example is extremely simple, it should help to illustrate how easy it is to utilize the power of Python from within R. I need to give credit to Harrison Kinsley for all of his efforts and work at PythonProgramming.net – I used a lot…

Original Post: Face Recognition in R

## Online portfolio allocation with a very simple algorithm

By Yuri Resende

Today we will use an online convex optimization technique to build a very simple algorithm for portfolio allocation. Of course, this is just an illustrative post and we are going to make some simplifying assumptions. The objective is to point out an interesting direction from which to approach the problem of portfolio allocation. The algorithm used here is Online Gradient Descent (OGD), and we are going to compare the performance of its portfolio with the Uniform Constant Rebalanced Portfolio and the Dow Jones Industrial Average index. You can skip directly to Implementation and Example if you already know what an online algorithm is.

### For those who don’t know what Online Convex Optimization is…

From now on, we work with points in a space whose dimension is the number of stocks available to invest in. Each of…
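To make the idea concrete, here is a minimal Python sketch of online gradient descent for portfolio weights. It is not the post's implementation. The sketch assumes a log-loss on each period's portfolio growth, a fixed step size `eta`, and a sort-based Euclidean projection onto the probability simplex; these choices, the function names, and the price relatives in `returns` are all assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold of the last index that stays positive
    return [max(x - theta, 0.0) for x in v]


def ogd_portfolio(price_relatives, eta=0.05):
    """Run OGD on the assumed loss -log(x . r); return final wealth and weights used."""
    n = len(price_relatives[0])
    x = [1.0 / n] * n  # start from the uniform portfolio
    wealth = 1.0
    weights_used = []
    for r in price_relatives:
        weights_used.append(x)
        growth = sum(xi * ri for xi, ri in zip(x, r))
        wealth *= growth
        grad = [-ri / growth for ri in r]  # gradient of -log(x . r) w.r.t. x
        x = project_to_simplex([xi - eta * gi for xi, gi in zip(x, grad)])
    return wealth, weights_used


def ucrp_wealth(price_relatives):
    """Wealth of the Uniform Constant Rebalanced Portfolio (equal weights each period)."""
    wealth = 1.0
    for r in price_relatives:
        wealth *= sum(r) / len(r)
    return wealth


# Hypothetical price relatives (end price / start price) for two stocks.
returns = [[1.1, 0.9]] * 20
ogd_wealth, weights = ogd_portfolio(returns)
benchmark_wealth = ucrp_wealth(returns)
```

The projection step is what keeps each update a valid portfolio: after the gradient step the weights may no longer sum to one, so they are mapped back to the nearest point on the simplex before the next period.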

Original Post: Online portfolio allocation with a very simple algorithm

## Mott Community College: Institutional Research Analyst

Seeking a candidate to play a lead role in support of such initiatives by providing data and analysis to core teams as requested. This position provides advanced analytical support regarding a broad array of subjects.

Original Post: Mott Community College: Institutional Research Analyst

## Data wrangling : Reshaping

Data wrangling is a task of great importance in data analysis. It is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process which is estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series with the goal of building the reader’s data wrangling skills. This is the second part of the series, and it covers reshaping data to turn it into a tidy form. By tidy form, we mean that each feature forms a column and each observation forms a row. Before proceeding, it might be helpful to look over the help pages for `spread`, `gather`, `unite`, `separate`, `replace_na`, `fill`, and `extract_numeric`. Moreover, please load the following libraries: `install.packages("magrittr")`, `library(magrittr)`, `install.packages("tidyr")`, `library(tidyr)`…
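The tidy form described above can be illustrated with a small language-neutral sketch. The code below is plain Python, not tidyr. The helper names `gather` and `spread` are borrowed from the tidyr functions listed above but are hypothetical re-implementations with simplified arguments, and the table contents are made up for illustration.

```python
def gather(rows, id_cols, key, value):
    """Wide -> long: emit one row per (observation, measured column) pair."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col in id_cols:
                continue
            new_row = {c: row[c] for c in id_cols}
            new_row[key] = col    # the former column name becomes a value
            new_row[value] = val  # the former cell becomes a value
            long_rows.append(new_row)
    return long_rows


def spread(rows, id_cols, key, value):
    """Long -> wide: the inverse reshaping, one row per combination of id columns."""
    wide = {}
    for row in rows:
        ident = tuple(row[c] for c in id_cols)
        wide_row = wide.setdefault(ident, {c: row[c] for c in id_cols})
        wide_row[row[key]] = row[value]
    return list(wide.values())


# Hypothetical wide table: the year columns are values, not features.
wide_table = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

# Tidy (long) form: each row is one country-year observation.
long_table = gather(wide_table, id_cols=["country"], key="year", value="cases")
round_trip = spread(long_table, id_cols=["country"], key="year", value="cases")
```

In the wide table the year is hidden in the column names; after gathering, `year` and `cases` are ordinary columns and every row is a single observation, which matches the tidy definition given above.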

Original Post: Data wrangling : Reshaping