Dimensionality Reduction : Does PCA really improve classification outcome?

In this post, I am going to verify this statement by using Principal Component Analysis (PCA) to try to improve the classification performance of a neural network over a dataset.
Original Post: Dimensionality Reduction : Does PCA really improve classification outcome?

New open data sets from Microsoft Research

Microsoft has released a number of data sets produced by Microsoft Research and made them available for download at Microsoft Research Open Data. The data sets in Microsoft Research Open Data are categorized by their primary research area, such as Physics, Social Science, Environmental Science, and Information Science. Many of the data sets have not previously been available to the public, and many are large and useful for research in AI and Machine Learning techniques. Many also include links to associated papers from Microsoft Research. For example, the 10GB DESM Word Embeddings data set provides the IN and the OUT word2vec embeddings for 2.7M words trained on a Bing query corpus of 600M+ queries. Other data sets of note include: a collection of 38M tweets related to the 2012 US election; 3-D capture data from individuals performing a variety of hand gestures…
Original Post: New open data sets from Microsoft Research

In case you missed it: June 2018 roundup

In case you missed them, here are some articles from June of particular interest to R users. An animated visualization of global migration, created in R by Guy Abel. My take on the question, Should you learn R or Python for data science? The BBC and Financial Times use R — without post-processing — for publication graphics. “Handling Strings in R”, a free e-book by Gaston Sanchez, has been updated. The AI, Machine Learning and Data Science roundup for June 2018. The PYPL Popularity of Languages Index ranks R as the 7th most popular programming language. The “lime” package for R provides tools for interpreting machine learning models. An R vignette by Paige Bailey on detecting unconscious bias in predictive models. Microsoft R Open 3.5.0 has been released (with a subsequent fix for Debian systems). Slides from the webinar, What’s…
Original Post: In case you missed it: June 2018 roundup

R 3.5.1 update now available

Last week the R Core Team released the latest update to the R statistical data analysis environment, R version 3.5.1. This update (codenamed “Feather Spray”, a Peanuts reference) makes no user-visible changes and fixes a few bugs. It is backwards-compatible with R 3.5.0, and users can find updates for Windows, Linux and Mac systems at their local CRAN mirror. (The update to Microsoft R Open featuring the R 3.5.1 engine is scheduled for release on August 29.) The complete list of fixes in R 3.5.1 is included in the release announcement, found at the link below. R-announce mailing list: R 3.5.1 is released
Original Post: R 3.5.1 update now available

Global Migration, animated with R

The animation below, by Shanghai University professor Guy Abel, shows migration within and between regions of the world from 1960 to 2015. The data and the methodology behind the chart are described in this paper. The curved bars around the outside represent the peak migrant flows for each region; globally, migration peaked during the 2005-2010 period and then declined in 2010-2015, the latest period for which data is available. This animated chord chart was created entirely using the R language. The chord plot showing the flows between regions was created using the circlize package; the tweenr package created the smooth transitions between time periods, and the magick package created the animated GIF itself. You can find a tutorial on making this animation, including the complete R code, at the link below. Guy Abel: Animated Directional Chord Diagrams (via Cal Carrie)
Original Post: Global Migration, animated with R

The Financial Times and BBC use R for publication graphics

While graphics guru Edward Tufte recently claimed that “R coders and users just can’t do words on graphics and typography” and need additional tools to make graphics that aren’t “clunky”, data journalists at major publications beg to differ. The BBC has been creating graphics “purely in R” for some time, with a typography style matching that of the BBC website. Senior BBC Data Journalist Christine Jeavans offers several examples, including this chart of life expectancy differences between men and women: … and this chart on gender pay gaps at large British banks: Meanwhile, the chart below was made for the Financial Times using just R and the ggplot2 package, “down to the custom FT font and the white bar in the top left”, according to data journalist John Burn-Murdoch. There are also entire collections devoted to recreating Tufte’s own visualizations…
Original Post: The Financial Times and BBC use R for publication graphics

KDnuggets™ News 18:n25, Jun 27: 5 Clustering Algorithms Data Scientists Need to Know; Detecting Sarcasm with Deep Convolutional Neural Networks?

Also: 30 Free Resources for Machine Learning, Deep Learning, NLP; 7 Simple Data Visualizations You Should Know in R.
Original Post: KDnuggets™ News 18:n25, Jun 27: 5 Clustering Algorithms Data Scientists Need to Know; Detecting Sarcasm with Deep Convolutional Neural Networks?