Why use Hadoop Clusters with Salesforce Wave Analytics?

Self-Service Hitches

Although the NoSQL-based platform was well received for its schema-less data storage and improved self-service, it was not perfect, as might be expected of a first-generation product, even one introduced after years of research and development.

One persistent complaint from users was the complex process of uploading data to the platform, particularly from Hadoop and Bigtable. Although Wave Analytics was promoted as a self-service platform, importing data from Hadoop required difficult ETL processes well beyond the capabilities of non-technical users.

Plugging the Gaps

This growing frustration led to the recent launch of data connectors. The Wave platform can still access big data held in on-premises Hadoop clusters only after it is uploaded to the cloud, but partnerships with various vendors have produced automated procedures for extracting that data and uploading it to Wave. Some data will probably still need to be transformed before it can be extracted and uploaded; even so, the connectors have made life far easier for non-technical users, who until now had access only to change management tools like Flosum.com.

Bridging the Gap between BI and Hadoop Users

According to Salesforce, its approach to going native on Hadoop is a Java program that is installed on the Hadoop cluster and transports the data to the cloud. As the senior vice president and general manager of Salesforce Analytics Cloud puts it, this is essentially a ‘last mile’ connection. According to him, the real challenge lies in delivering the huge amounts of data residing in big data platforms to marketing personnel and customer service representatives, who can then use it to interact with customers in a more meaningful, value-added way. This is exactly what the newly announced big data platform connectivity promises users.

The Java program, developed jointly by Salesforce and its partners, gives users access to the data in their Hadoop clusters from vendors such as Hortonworks and Cloudera. A few customers are already working with this connector, and Salesforce continues to work on a more efficient integration with Google, although another integration method is already available.

Partnerships for Data Transformation

The $5-billion CRM market leader is simultaneously collaborating with data transformation software developers such as Trifacta and Informatica, as well as New Relic, a developer of hosted analytics software, to make external data accessible to Wave users. Instead of carrying out tedious, time-consuming data transformation and cleanup by hand, Wave customers can use the transformations prebuilt by Trifacta and Informatica to clean up their data residing on Hadoop before uploading it to the Wave platform in the cloud for analysis.
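To give a rough sense of what "cleaning up data before uploading" can involve, here is a minimal Python sketch. It is not how Trifacta, Informatica, or the Salesforce connector work; the field names (`id`, `region`, `revenue`), the cleanup rules, and the choice of CSV as the upload format are all assumptions made purely for illustration.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace, drop rows with no id, and de-duplicate on id.

    The 'id' key is a hypothetical field chosen for this example.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Treat missing values (None) as empty strings, then strip padding.
        record = {key: (value or "").strip() for key, value in row.items()}
        if not record.get("id") or record["id"] in seen:
            continue
        seen.add(record["id"])
        cleaned.append(record)
    return cleaned


def to_csv(rows, fieldnames):
    """Serialize cleaned rows to a CSV string (assumed upload format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical raw records, as they might look after extraction from a cluster.
raw = [
    {"id": " 1 ", "region": "EMEA ", "revenue": "1200"},
    {"id": "1", "region": "EMEA", "revenue": "1200"},   # duplicate id
    {"id": "", "region": "APAC", "revenue": "300"},     # missing id
    {"id": "2", "region": None, "revenue": " 450"},     # missing region
]

cleaned = clean_rows(raw)
csv_text = to_csv(cleaned, ["id", "region", "revenue"])
print(csv_text)
```

Even in this toy form, the rules (what counts as a duplicate, how to treat missing values) have to be decided by someone who understands the data, which is the kind of work the prebuilt transformations are meant to take off the hands of non-technical users.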

Conclusion

It is clear that data integration facilities of this sort are the result of a growing realization that big data is not the sole domain of IT specialists: when accessed by customer-facing personnel with less technical expertise, it can boost company productivity and improve marketing and customer service. As big data analytics becomes ubiquitous, a new class of data-driven professionals is coming into existence.

David Wicks

Blog Author

David Wicks is a senior Salesforce developer working at the forefront of new technologies. An articulate speaker and writer on big data analytics, David has also written a series of insightful articles on Flosum.com capabilities and advantages.
