Global Big Data Conference, Santa Clara, Aug 29-31 – KDnuggets Offer

Global Big Data Conference, a leading vendor-agnostic conference for the Big Data community, will hold its 5th conference in Santa Clara. Use code KDnuggets to save.
Original Post: Global Big Data Conference, Santa Clara, Aug 29-31 – KDnuggets Offer

dplyrXdf 0.10.0 beta prerelease

I’m happy to announce that version 0.10.0 beta of the dplyrXdf package is now available. You can get it from Github:

    install_github("RevolutionAnalytics/dplyrXdf", build_vignettes=FALSE)

This is a major update to dplyrXdf that adds the following features:

- Support for the tidyeval framework that powers the latest version of dplyr
- Works with Spark and Hadoop clusters and files in HDFS
- Several utility functions to ease working with files and datasets
- Many bugfixes and workarounds for issues with the underlying RevoScaleR functions

This (pre-)release of dplyrXdf requires Microsoft R Server or Client version 8.0 or higher, and dplyr 0.7 or higher. If you’re using R Server, dplyr 0.7 won’t be in the MRAN snapshot that is your default repo, but you can get it from CRAN:

    install.packages("dplyr", repos="https://cloud.r-project.org")

The tidyeval framework

This completely changes the way in which dplyr handles standard evaluation. Previously, if…
Original Post: dplyrXdf 0.10.0 beta prerelease
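
For readers unfamiliar with tidyeval, the following is a minimal sketch of the style introduced in dplyr 0.7, shown on an ordinary in-memory data frame rather than an Xdf file. The function name mean_by and the use of the built-in mtcars dataset are assumptions made for illustration and do not come from the announcement; whether dplyrXdf verbs accept exactly the same code is not confirmed by the excerpt above.

```r
library(dplyr)

# Hypothetical helper: group a data frame by one column and average another.
# enquo() captures the bare column names passed by the caller, and !!
# unquotes them inside the dplyr verbs.
mean_by <- function(df, group_var, value_var) {
  group_var <- enquo(group_var)
  value_var <- enquo(value_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(mean_value = mean(!!value_var))
}

# Column names are passed unquoted, as in interactive dplyr code.
result <- mean_by(mtcars, cyl, mpg)
print(result)
```

The point of the pattern is that a wrapper function can accept unquoted column names from its caller and forward them to the dplyr verbs.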

Why Apache Arrow is the future for open source-columnar memory analytics

Apache Arrow is a de facto standard for columnar in-memory analytics. In the coming years we can expect all the big data platforms to adopt Apache Arrow as their columnar in-memory layer.
Original Post: Why Apache Arrow is the future for open source-columnar memory analytics

KDnuggets™ News 17:n29, Aug 2: Machine Learning Exercises in Python; 8 Reasons Why Many Big Data Analytics Solutions Fail

Machine Learning Exercises in Python: An Introductory Tutorial Series; The BI & Data Analysis Conundrum: 8 Reasons Why Many Big Data Analytics Solutions Fail to Deliver Value; The Internet of Things: An Introductory Tutorial Series; How to squeeze the most from your training data
Original Post: KDnuggets™ News 17:n29, Aug 2: Machine Learning Exercises in Python; 8 Reasons Why Many Big Data Analytics Solutions Fail

How to use H2O with R on HDInsight

H2O.ai is an open-source AI platform that provides a number of machine-learning algorithms that run on the Spark distributed computing framework. Azure HDInsight is Microsoft’s fully-managed Apache Hadoop platform in the cloud, which makes it easy to spin up and manage Azure clusters of any size. It’s also easy to run H2O on HDInsight: H2O AI Platform is available as an application on HDInsight, which pre-installs everything you need as the cluster is created. You can also drive H2O from R, but the R packages don’t come auto-installed on HDInsight. To make this easy, the Azure HDInsight team has provided a couple of scripts that will install the necessary components on the cluster for you. These include RStudio (to provide an R IDE on the cluster) and the rsparkling package. With these components installed, from R you can: Query data…
Original Post: How to use H2O with R on HDInsight