DevOps and DVC tools can help reduce the time data scientists spend on mundane data preparation, freeing them to focus on the machine learning algorithms and interesting data analysis they'd rather be doing.
Original Post: Data Version Control in Analytics DevOps Paradigm
I'm happy to announce that version 0.10.0 beta of the dplyrXdf package is now available. You can get it from GitHub: install_github("RevolutionAnalytics/dplyrXdf", build_vignettes=FALSE). This is a major update to dplyrXdf that adds the following features:

- Support for the tidyeval framework that powers the latest version of dplyr
- Support for Spark and Hadoop clusters and files in HDFS
- Several utility functions to ease working with files and datasets
- Many bugfixes and workarounds for issues with the underlying RevoScaleR functions

This (pre-)release of dplyrXdf requires Microsoft R Server or Client version 8.0 or higher, and dplyr 0.7 or higher. If you're using R Server, dplyr 0.7 won't be in the MRAN snapshot that is your default repo, but you can get it from CRAN: install.packages("dplyr", repos="https://cloud.r-project.org"). The tidyeval framework: this completely changes the way in which dplyr handles standard evaluation. Previously, if…
Original Post: dplyrXdf 0.10.0 beta prerelease
Apache Arrow is a de facto standard for columnar in-memory analytics. In the coming years we can expect the major big data platforms to adopt Apache Arrow as their columnar in-memory layer.
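The core idea behind a columnar in-memory layout can be sketched in plain Python. This is an illustration of the concept only, not Arrow's actual memory format (Arrow uses contiguous typed buffers, not Python lists):

```python
# Row-oriented layout: one record per object, typical of OLTP and JSON.
rows = [
    {"id": 1, "value": 0.1},
    {"id": 2, "value": 0.2},
    {"id": 3, "value": 0.3},
]

# Column-oriented layout (the Arrow idea): one contiguous array per field.
# Analytic queries that touch only some fields scan only those arrays,
# which is cache-friendly and vectorizable.
columns = {
    "id": [1, 2, 3],
    "value": [0.1, 0.2, 0.3],
}

# An aggregation reads a single column and nothing else:
total = sum(columns["value"])
```

The same aggregation over `rows` would have to walk every record and pull the `value` field out of each one, which is the access pattern columnar formats are designed to avoid.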
Original Post: Why Apache Arrow is the future for open source-columnar memory analytics
Toolkits for standard neural network visualizations exist, along with tools for monitoring the training process, but they are often tied to a particular deep learning framework. Could a general, easy-to-set-up tool for generating standard visualizations provide a sanity check on the learning process?
Original Post: Visualizing Convolutional Neural Networks with Open-source Picasso
Posted by James Wexler, Senior Software Engineer, Google Big Picture Team (Cross-posted on the Google Open Source Blog). Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more. Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of…
Original Post: Facets: An Open Source Visualization Tool for Machine Learning Training Data
Posted by Thang Luong, Research Scientist, and Eugene Brevdo, Staff Software Engineer, Google Brain Team. Machine translation – the task of automatically translating between languages – is one of the most active research areas in the machine learning community. Among the many approaches to machine translation, sequence-to-sequence (“seq2seq”) models [1, 2] have recently enjoyed great success and have become the de facto standard in most commercial translation systems, such as Google Translate, thanks to their ability to use deep neural networks to capture sentence meanings. However, while there is an abundance of material on seq2seq models such as OpenNMT or tf-seq2seq, there is a lack of material that teaches people both the knowledge and the skills to easily build high-quality translation systems. Today we are happy to announce a new Neural Machine Translation (NMT) tutorial for TensorFlow that gives readers a full…
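The encode-then-decode pattern at the heart of seq2seq can be illustrated with a deliberately tiny sketch. This is pure Python with no neural network: a lookup table stands in for the learned model, and all names here are hypothetical, chosen only to show the shape of the pipeline:

```python
# In a real NMT system the encoder maps source tokens to hidden states
# and the decoder generates target tokens autoregressively, conditioned
# on those states. Here both halves are trivial stand-ins.

TOY_LEXICON = {"hello": "bonjour", "world": "monde"}  # hypothetical "model"

def encode(source_tokens):
    # Stand-in for an encoder network producing a context representation.
    return list(source_tokens)

def decode(context):
    # Stand-in for a decoder network emitting target tokens one at a time.
    return [TOY_LEXICON.get(tok, tok) for tok in context]

def translate(sentence):
    return " ".join(decode(encode(sentence.split())))

print(translate("hello world"))  # → bonjour monde
```

The point of the real architecture is that both `encode` and `decode` are learned jointly from parallel text, so the "lexicon" is never written by hand.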
Original Post: Building Your Own Neural Machine Translation System in TensorFlow
Posted by Jonathan Huang, Research Scientist and Vivek Rathod, Software Engineer (Cross-posted on the Google Open Source Blog). At Google, we develop flexible state-of-the-art machine learning (ML) systems for computer vision that not only can be used to improve our products and services, but also spur progress in the research community. Creating accurate ML models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field, and we invest a significant amount of time training and experimenting with these systems. Last October, our in-house object detection system achieved new state-of-the-art results, and placed first in the COCO detection challenge. Since then, this system has generated results for a number of research publications [1, 2, 3, 4, 5, 6, 7] and has been put to work in Google products such as NestCam, the similar items and style ideas feature in Image Search and…
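As a rough illustration of what "localizing and identifying multiple objects" means in practice, here is a hypothetical sketch of detector output: one bounding box (localization), one class label (identification), and one confidence score per object. The names below are illustrative and are not the TensorFlow Object Detection API's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple    # (ymin, xmin, ymax, xmax), normalized to [0, 1]
    label: str    # predicted object class
    score: float  # model confidence

# A detector returns one Detection per object found in the image.
detections = [
    Detection(box=(0.10, 0.20, 0.50, 0.60), label="dog", score=0.94),
    Detection(box=(0.40, 0.50, 0.90, 0.95), label="ball", score=0.81),
]

# A common post-processing step: keep only high-confidence detections.
confident = [d for d in detections if d.score >= 0.9]
```

Real systems add a non-maximum-suppression step as well, to drop overlapping boxes that describe the same object.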
Original Post: Supercharge your Computer Vision models with the TensorFlow Object Detection API