Agile Data Science: Building Data Analytics Applications with Hadoop

By Russell Jurney

Mining big data requires a deep investment in people and time. How can you be sure you're building the right models? With this hands-on book, you'll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.

Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

Create analytics applications by using the agile big data development methodology
Build value from your data in a series of agile sprints, using the data-value stack
Gain insight by using several data structures to extract multiple features from a single dataset
Visualize data with charts, and expose different aspects through interactive reports
Use historical data to predict the future, and translate predictions into action
Get feedback from users after each sprint to keep your project on track


Best nonfiction books

The Streets Were Paved With Gold

How, and why, did one of the world's greatest cities come to be teetering on the edge of bankruptcy? Ken Auletta, writer for The New Yorker and columnist for the Daily News, shows how the decline of New York City was partly inevitable, the result of shifting migration patterns and rapid technological innovations, and partly the result of anarchic political and economic factions, each angling for its own advantage.

Still Life: Adventures in Taxidermy

It's easy to dismiss taxidermy as a kitschy or morbid sideline, the realm of trophy fish and jackalopes or an anachronistic throwback to the dusty diorama. Yet theirs is a world of intrepid hunter-explorers, eccentric naturalists, and gifted museum artisans, all devoted to the paradoxical pursuit of creating the illusion of life.

Face to Face: Amazing New Looks and Inspiration from the Top Celebrity Makeup Artist

Face to Face, the follow-up to the bestselling beauty primer About Face, is the everyday style guide for every woman. Sought-after celebrity makeup artist Scott Barnes helps change up the usual go-to makeup routine with techniques for getting the right look during every transition of the day. What's a sleek, work-appropriate face for the boardroom and client meetings? How do you take your work face up a notch for a dinner out? And what does it take to wow the all-night crowd and create your own red carpet glam? Scott shows readers a variety of looks for all types and complexions. Step-by-step instructions make application simple, and before-and-after photos show just what's possible with some makeup magic from the go-to makeup artist of Kim Kardashian, Jennifer Lopez, and more.

Mrs. Dunwoody's Excellent Instructions for Homekeeping

Mrs. Dunwoody is a character based on the author's great-grandmother and other traditional Southern women who believe in the importance of making a house a home.

Extra resources for Agile Data Science: Building Data Analytics Applications with Hadoop

Example text

So our initial schema might look very simple, like this:

    {
      "type": "record",
      "name": "RawEmail",
      "fields": [
        { "name": "thread_id", "type": ["string", "null"], "doc": "" },
        { "name": "raw_email", "type": ["string", "null"] }
      ]
    }

We might extract only a thread_id as a unique identifier, and then store the entire raw email string in a field on its own. If a unique identifier is not easy to extract from raw records, we can generate a UUID (universally unique identifier) and add it as a field. Our job as we process data, then, is to add fields to our schema as we extract them, all the while retaining the raw data in its own field if we can.
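The record-building step described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the book's own code: the function name `to_raw_email_record` is hypothetical, and using the `Message-ID` header as the identifier is an assumption made purely for the example. When no identifier can be extracted, a UUID is generated instead, and the raw email is always kept in its own field.

```python
import json
import uuid
from email import message_from_string

# The schema from the text, held as a plain dict for reference.
RAW_EMAIL_SCHEMA = {
    "type": "record",
    "name": "RawEmail",
    "fields": [
        {"name": "thread_id", "type": ["string", "null"], "doc": ""},
        {"name": "raw_email", "type": ["string", "null"]},
    ],
}


def to_raw_email_record(raw_email: str) -> dict:
    """Build a record matching RAW_EMAIL_SCHEMA (hypothetical helper)."""
    message = message_from_string(raw_email)
    # Assumption: the Message-ID header stands in for the thread identifier.
    thread_id = message.get("Message-ID")
    if not thread_id:
        # No identifier could be extracted, so generate a UUID instead.
        thread_id = str(uuid.uuid4())
    # Retain the untouched raw email alongside the extracted field.
    return {"thread_id": thread_id, "raw_email": raw_email}


if __name__ == "__main__":
    sample = "Message-ID: <abc@example.com>\nSubject: hello\n\nBody text"
    print(json.dumps(to_raw_email_record(sample), indent=2))
```

As more fields are extracted in later passes, they would be added both to the schema and to the returned dict, while `raw_email` stays unchanged.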

We can represent this as a simple social network, as shown in Figure 2-4.

Figure 2-4. Social network dyad

Figure 2-5 depicts a more complex social network.

Figure 2-5. Social network

Figure 2-6 shows a social network of some 200 megabytes of emails from Enron.

Figure 2-6. Enron corpus viewer, by Jeffrey Heer and Andrew Fiore

Social network analysis, or SNA, is the scientific study and analysis of social networks. By modeling our inbox as a social network, we can draw on the methods of SNA (like PageRank) to reach a deeper understanding of the data and of our interpersonal network. Figure 2-7 shows such an analysis applied to the Enron network.
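To make the idea of modeling an inbox as a social network concrete, here is a small sketch that treats each email as a directed edge from sender to recipient and ranks people with a basic power-iteration PageRank. It is an illustration under stated assumptions, not the analysis from the book: the `pagerank` function, the damping factor of 0.85, the fixed iteration count, and the sample names are all made up for the example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as (sender, recipient) pairs."""
    nodes = sorted({person for edge in edges for person in edge})
    out_links = defaultdict(set)
    for sender, recipient in edges:
        out_links[sender].add(recipient)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by people who never send mail is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                # Each sender splits its rank equally among recipients.
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical emails: (sender, recipient)
    emails = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
    for person, score in sorted(pagerank(emails).items(), key=lambda kv: -kv[1]):
        print(f"{person}: {score:.3f}")
```

In this toy network the person who receives mail from the most well-connected senders ends up with the highest score, which is the intuition behind applying PageRank to an inbox.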
