Emerging Ecosystem: Data Science and Machine Learning Software, Analyzed

We examine which top tools are “friends”, their Python vs R bias, and which work well with Spark/Hadoop and Deep Learning, and identify an emerging Big Data Deep Learning ecosystem.
Original Post: Emerging Ecosystem: Data Science and Machine Learning Software, Analyzed

Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, which lets you combine the data-preparation capabilities of sparklyr with the data-analysis capabilities of Microsoft R Server in the same environment. In a presentation at the Spark Summit (embedded below, and you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep neural network and a boosted trees classifier). He also shows how…
Original Post: Using sparklyr with Microsoft R Server

How HR Managers Use Data Science to Manage Talent for Their Companies

Data science can also help HR managers produce estimates such as investment in the talent pool, cost per hire, training cost, and cost per employee. It provides better techniques for optimization, forecasting, and reporting.
Original Post: How HR Managers Use Data Science to Manage Talent for Their Companies

Must-Know: What are common data quality issues for Big Data and how to handle them?

Let’s have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
Original Post: Must-Know: What are common data quality issues for Big Data and how to handle them?