Understanding Big Data: Best Big Data Frameworks

The massive world of 'BIG DATA'

If one strolls around any IT office, every decade or so (nowadays the span is even shorter, almost every 3-4 years) one overhears professionals discussing new jargon from the hottest trends in technology. Around 5-6 years ago, one such term that started ruling IT services is 'Big Data', and it is still interpreted in various ways by everyone from laypeople to tech geeks.

Although the services industry started talking widely about big data solutions only 5-6 years ago, the term is believed to have been in use since the 1990s by John Mashey of Silicon Graphics, while credit for coining the term 'big data' in its modern sense goes to Roger Mougalas of O'Reilly Media in 2005.

Let's first understand why everyone is going gaga about 'Big Data' and what real-world problems it is supposed to solve, and then we will answer the what and how of it.

Why is BIG DATA essential for today's digital world?

In the pre-smartphone era, the internet and the web had been around for many years, but smartphones made them mobile, with on-the-go usage. Social media and mobile apps started generating tons of data. At the same time, smart bands and wearable devices (IoT, M2M) gave newer dimensions to data generation. This newly generated data became the new oil to the world. If this data is stored and analyzed, it has the potential to give tremendous insights that can be put to use in numerous ways.

You will be amazed to see the real-world use cases of big data. Every industry has a unique use case, often unique even to each client implementing the solution, ranging from data-driven personalized campaigning (you do see that item you browsed on some 'xyz' site while scrolling Facebook; ever wondered how?) to predictive maintenance of huge pipelines carrying oil across countries, where manual monitoring is practically impossible. To relate this to our day-to-day life: every click, every swipe, every share and every like we casually make on social media helps today's industries take calculated business decisions about the future. How do you think Netflix predicted the success of 'House of Cards' and spent $100 million on it? Big data analytics is the simple answer.

The biggest challenge in the past was that the traditional methods used to store, curate and analyze data could not process data generated from newer sources: data that was huge in volume, came from heterogeneous sources, and was being generated really fast. (To give you an idea, roughly 2.5 quintillion bytes of data are generated per day as of today; refer to the infographic released by Domo called "Data Never Sleeps 5.0".) This is what gave rise to the term Big Data and its related solutions.

Understanding BIG DATA: Experts' viewpoint

Big Data literally means massive data (loosely, more than 1 TB), but that's not the only aspect of it.
Distributed data, or even complex datasets that cannot be analyzed through traditional methods, can be categorized as 'big data', and with this background the theoretical definition of big data makes a lot of sense:

"Gartner (2012) defines: Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."

Data possessing the characteristics of big data exhibits the 3 Vs, namely Variety, Velocity, and Volume. But due to the changing nature of data in today's world, and to gain the most insight from it, 3 more Vs have been added to the definition of big data: Variability, Veracity and Value. The diagram below illustrates each V in detail.

Diagram: 6 V's of Big Data

These 6 Vs help in understanding the characteristics of big data, but let's also understand the types of data in big data processing. The "Variety" characteristic above refers to the different types of data that can be processed through big data tools and technologies. Let's drill down a bit to understand what those are:

- Structured, e.g. mainframes and traditional databases like Teradata, Netezza, Oracle, etc.
- Unstructured, e.g. tweets, Facebook posts, emails, etc.
- Semi/multi-structured or hybrid, e.g. e-commerce, demographic, weather data, etc.

As technology advances, this variety of data is available, and its storage, processing and analysis are made possible by big data. Traditional data processing techniques were able to process only structured data.

Now that we understand what big data is, and the limitations of old, traditional techniques in handling such data, we can safely say that we need new technology to handle this data and gain insights from it. Before going further, do you know what the traditional data management techniques were?

Traditional techniques of data processing are:

- RDBMS (Relational Database Management Systems)
- Data warehousing and data marts

At a high level, RDBMS catered to OLTP needs, and data warehousing/data marts facilitated OLAP needs. But both systems work with structured data.

I hope one can now answer 'what is big data?' both conceptually and theoretically. So, it's time to understand how it is done in actual implementations. Merely storing "big data" will not help organizations; what's important is to turn data into insights and business value. To do so, the following are the key infrastructure elements:

- Data collection
- Data storage
- Data analysis
- Data visualization/output

All major big data processing framework offerings are based on these building blocks. In alignment with the above building blocks, the following are the top 5 big data processing frameworks currently used in the market:

1. Apache Hadoop: a software library framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. First up is the all-time classic, and one of the top frameworks in use today; so prevalent is it that it has almost become synonymous with big data.

2. Apache Spark: a unified analytics engine for large-scale data processing. Apache Spark and Hadoop are often contrasted as an "either/or" choice, but that isn't really the case.

The above two frameworks are the most popular, but the following three are available and comparable:

3. Apache Storm: a free and open-source distributed real-time computation system. You can also take up Apache Storm training to learn more about it.
4. Apache Flink: a streaming dataflow engine, aiming to provide facilities for distributed computation over streams of data. Treating batch processing as a special case of streaming data, Flink is effectively both a batch and a real-time processing framework, but one which clearly puts streaming first.

5. Apache Samza: a distributed stream processing framework.

Frameworks help process data through these building blocks and generate the required insights, and each framework is supported by a whopping number of tools providing the required functionality.

BIG DATA processing framework and technology landscape

The big data tools and technology landscape can be better understood through a layered big data architecture. Give a good read to the great article by Navdeep Singh Gill on XenonStack for understanding the layered architecture of big data. Taking inspiration from the layered architecture, the different tools available in the market can be mapped to layers to understand the big data technology landscape in depth. Note that the layered architecture fits very well with the infrastructure elements/building blocks discussed in the section above. A few of the tools are briefly described below:

1. Data Collection / Ingestion Layer
- Cassandra: a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
- Kafka: an event streaming platform used for building real-time data pipelines and streaming apps.
- Flume: a log collector in Hadoop.
- HBase: a columnar database in Hadoop.

2. Processing Layer
- Pig: a scripting language in the Hadoop framework.
- MapReduce: the processing model in Hadoop.

3. Data Query Layer
- Impala: Cloudera Impala is a modern, open-source, distributed SQL query engine for Apache Hadoop (often compared with Hive).
- Hive: data warehouse software for data query and analysis.
- Presto: a high-performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Apache Kafka, and MongoDB.

4. Analytical Engine
- TensorFlow: an open-source machine learning library for research and production.

5. Data Storage Layer
- Ignite: an open-source distributed database, caching and processing platform designed to store and compute on large volumes of data across a cluster of nodes.
- Phoenix (Hortonworks): Apache Phoenix is an open-source, massively parallel, relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store.
- PolyBase: a feature introduced in SQL Server 2016, used to query relational and non-relational (NoSQL) databases. You can use PolyBase to query tables and files in Hadoop or in Azure Blob Storage, and also to import or export data to and from Hadoop.
- Sqoop: an ETL tool.
- Big data in Excel: a few people like to process big datasets with current Excel capabilities, known as Big Data in Excel.

6. Data Visualization Layer
- Microsoft HDInsight: Azure HDInsight is a Hadoop service offering hosted in Azure that enables clusters of managed Hadoop instances. It deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data with high reliability and availability.
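Before moving on, it may help to see what the "simple programming models" these frameworks offer actually look like. Below is a minimal word-count sketch in Scala against the Spark API, the classic introductory example for such frameworks; it is an illustration rather than code from any specific project, and the input path, app name and local master setting are assumptions.

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Build a local Spark session; on a real cluster the master is supplied by the submitter.
        val spark = SparkSession.builder()
          .appName("WordCount")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical input path: replace with a real local file or HDFS location.
        val lines = spark.sparkContext.textFile("input.txt")

        // Split lines into words, pair each word with a count of 1, then sum counts per word.
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)  // print a small sample of the results
        spark.stop()
      }
    }

The same few lines run unchanged whether the data is a small local file or terabytes spread across a cluster, which is exactly the appeal of these frameworks.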
Hadoop administration training will give you all the technical understanding required to manage a Hadoop cluster, either in a development or a production environment.

BIG Data best practices

Every organization, industry and business, be it small or big, wants to benefit from "big data", but it's essential to understand that big data can reach its maximum potential only if the organization adheres to best practices before adopting it. Answering five basic questions helps clients recognize the need for adopting big data in their organization:

- Try to answer why big data is required for the organization. What problem would it help solve?
- Ask the right questions.
- Foster collaboration between business and technology teams.
- Analyze only what is required to be used.
- Start small and grow incrementally.

Big Data industry use cases

We have talked about everything in the big data world except real use cases. We discussed a few at the start, but let me give you insights into real-world, interesting big data use cases; for a few of them, it's no longer a secret. In fact, big data is penetrating to the extent that you can name any industry and plenty of use cases can be cited. Let's begin.

Streaming platforms

As the 'House of Cards' example at the start of this article suggests, it's no secret that Netflix uses big data analytics. Netflix spent $100 million on 26 episodes of 'House of Cards' because they knew the show would appeal to viewers of the original British House of Cards, and they built in director David Fincher and actor Kevin Spacey. Netflix typically collects behavioral data and then uses it to create a better experience for the user. But Netflix uses big data for more than that: they monitor and analyze traffic details for various devices, spot problem areas and adjust network infrastructure to prepare for future demand (the latter being action taken on big data analytics, i.e., how big data analysis is put to use). They also try to get insights into the types of content viewers prefer, which helps them make informed decisions. Apart from Netflix, Spotify is also a well-known use case.

Advertising, media, campaigning and entertainment

For decades, marketers were forced to launch campaigns while blindly relying on gut instinct and hoping for the best. That all changed with digitization and the big data world. Nowadays, data-driven campaigns and marketing are on the rise, and to be successful in this landscape a modern marketing campaign must integrate a range of intelligent approaches to identify customers, segment them, measure results, analyze data and build upon feedback in real time. All of this needs to be done in real time, using the customer's profile, history, purchasing patterns and other relevant information, and big data solutions are the perfect fit. Event-driven marketing can also be achieved through big data, which is another route to successful marketing in today's world. That basically means keeping track of events customers are directly or indirectly involved in, and campaigning exactly when a customer would need it, rather than running random campaigns. For example, if you have searched for a product on Amazon or Flipkart, you will see related advertisements on other social media apps you casually browse through. Bang on: you end up purchasing it, as you needed the best options to choose from anyway.

Healthcare industry

Healthcare is one of the classic use-case industries for big data applications.
The industry generates a huge amount of data: patients' medical histories, past records, treatments given, available and latest medicines, the latest medical research; the list of raw data is endless. All this data can yield insights, and big data can contribute to the industry in the following ways:

- Diagnosis time can be reduced, and exactly the required treatment can be started immediately. Most illnesses can be treated if the diagnosis is accurate and treatment is started in time. This can be achieved through evidence-based past medical data for similar treatments being available to the doctor treating the illness, the patient's available history, and symptoms being fed into the system in real time.
- A government health department can monitor whether a group of people in one geography is reporting similar symptoms; predictive measures can then be taken in nearby locations to avoid an outbreak, as the cause of such illness could be the same.

The list is long; the above were a few representative examples.

Security

With the rise of social media, personal information is at stake today. Almost everything is digital, and the majority of personal information is available in the public domain, so privacy and security are major concerns. The following are a few such applications of big data:

- Cybercrimes are common nowadays, and big data can help to detect and predict crimes.
- Threat analysis and detection can be done with big data.

Travel and tourism

Flight booking sites and IRCTC track clicks and hits along with IP addresses, login information and other details, and can apply dynamic pricing to flights and trains as per demand. Big data helps in dynamic pricing, and mind you, it's real time. I am sure each one of us has experienced this; now you know who is doing it.

Telecommunications, the public sector, education, social media and gaming, energy and utilities: every industry has implemented, or is implementing, several of these big data use cases day in and day out. If you look around, I am sure you will find them on the rise. Big data is helping industries, consumers and clients alike to make informed decisions, and wherever there is such a need, big data can come in handy.

Challenges faced by Big Data in real-world adoption

Although the world is going gaga about big data, there are still a few challenges to implementing and adopting it, and hence service industries are still striving to resolve those challenges in order to implement flawless big data solutions. An October 2016 report from Gartner found that organizations were getting stuck at the pilot stage of their big data initiatives. "Only 15 percent of businesses reported deploying their big data project to production, effectively unchanged from last year (14 per cent)," the firm said. Let's discuss a few of these challenges to understand what they are.

1. Understanding big data, and answering 'why' for the organization one is working with

As I said at the start of the article, there are many versions of big data, and understanding the real use cases for the organization that decision makers are working with is still a challenge. Everyone wants to ride the wave, but not knowing the right path is still a struggle. As every organization is unique, it is of utmost importance to answer 'why big data' for each organization. This remains a major challenge for decision makers adopting big data.
2. Understanding the organization's data sources

In today's world there are hundreds of thousands of ways information is generated, and being aware of all these sources and ingesting all of them into big data platforms is essential for accurate insight. Identifying sources is a challenge to address. It's no surprise, then, that the IDG report found, "Managing unstructured data is growing as a challenge – rising from 31 per cent in 2015 to 45 per cent in 2016." Different tools and technologies are on the rise to address this challenge.

3. Shortage of big data talent, and retaining it

Big data is a fast-changing technology area, and there is a whopping number of tools in the big data technology landscape. Big data professionals are expected to excel in the current tools and keep themselves up to date with ever-changing needs. This makes it difficult for employees and employers alike to create and retain talent within the organization. The solution is constant upskilling, re-skilling and cross-skilling, along with increasing the organization's budget for retaining talent and helping it train.

4. The Veracity V

This V is a challenge because it means inconsistent, incomplete data. To gain insights through a big data model, a key step is to predict and fill in missing information. This is tricky, as filling in missing information can reduce the accuracy of the resulting insights and analytics. There are a bunch of tools to address this concern; data curation is an important step in big data and should follow a proper model. Keep in mind, though, that big data is never 100% accurate, and one must deal with that.

5. Security

This aspect is often given low priority during the design and build phases of big data implementations, and security loopholes can cost an organization dearly. It is essential to put security first while designing and developing big data solutions, and equally important to implement responsibly with respect to regulatory requirements like GDPR.

6. Gaining valuable insights

Machine learning models go through multiple iterations to converge on insights, as they also face issues like missing data that reduce accuracy. To increase accuracy, a lot of re-processing is required, which has its own lifecycle. Increasing the accuracy of insights is a challenge that relates back to the missing-data piece, and can most likely be addressed by addressing the missing-data challenge. It can also be caused by the unavailability of information from all data sources; incomplete information leads to incomplete insights, which may not deliver the required potential.

Addressing these challenges will help gain valuable insights through the available solutions. With big data, opportunities are endless. Once understood, the world is yours!

Also, now that you understand big data, it's worth understanding the next steps. Gary King, a professor at Harvard, says: "Big data is not about the data. It is about the analytics." You can also take up Big Data and Hadoop training to enhance your skills further.

Did this article help you understand today's massive world of big data and get a sneak peek into it? Do let us know through the comments section below.

Guide to Installation of Spark on Ubuntu

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. In this article, we will cover the installation procedure of Apache Spark on the Ubuntu operating system.

Audience

This document can be referred to by anyone who wants to install the latest version of Apache Spark on Ubuntu.

System requirements

- Ubuntu OS installed.
- Minimum of 8 GB RAM.
- 20 GB free space.

Prerequisites

- Java 8 should be installed on your machine.
- Hadoop should be installed on your machine (this guide assumes Hadoop 2.7).

Installation procedure

1. Before installing Spark, ensure that Java 8 is installed on your Ubuntu machine. If not, follow the process below to install it.

a. Install Java 8 using the command below.

    sudo apt-get install oracle-java8-installer

The above command creates a java-8-oracle directory under /usr/lib/jvm/ on your machine. Now we need to configure the JAVA_HOME path in the .bashrc file (.bashrc executes whenever we open a terminal).

b. Configure JAVA_HOME and PATH in the .bashrc file and save. To edit the .bashrc file, use the command below.

    vi .bashrc

Press i (for insert), then add the lines below at the bottom of the file.

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
    export PATH=$PATH:$JAVA_HOME/bin

Then press Esc, type :wq! (to save the changes) and press Enter.

c. Now test whether Java was installed properly by checking its version. The command below should show the Java version.

    java -version

2. Now we will install Spark on the system. Go to the official Apache Spark download page and choose the latest release. For the package type, choose 'Pre-built for Apache Hadoop'.

    https://spark.apache.org/downloads.html

Or you can use a direct link to download:

    https://www.apache.org/dyn/closer.lua/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

3. Create a directory called spark under /usr/, using the command below.

    sudo mkdir /usr/spark

The above command asks for a password to create the spark directory under /usr; provide it. Then check whether the spark directory was created under /usr using the command below.

    ll /usr/

It should list a 'spark' directory. Go to the /usr/spark directory using the command below.

    cd /usr/spark

4. Download Spark 2.4.0 into the spark directory using the command below.

    wget https://www.apache.org/dyn/closer.lua/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

If you use the ll or ls command, you can see spark-2.4.0-bin-hadoop2.7.tgz in the spark directory.

5. Then extract spark-2.4.0-bin-hadoop2.7.tgz using the command below.

    sudo tar xvzf spark-2.4.0-bin-hadoop2.7.tgz

Now the spark-2.4.0-bin-hadoop2.7.tgz file is extracted as spark-2.4.0-bin-hadoop2.7. Check whether it extracted or not using the ll command.
6. Configure the SPARK_HOME path in the .bashrc file by following the steps below. Go to the home directory:

    cd ~

Open the .bashrc file:

    vi .bashrc

Now we will configure SPARK_HOME and PATH. Press i for insert, then enter SPARK_HOME and PATH as below.

    export SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin

Then save and exit: press Esc, type :wq! and press Enter.

Test the installation

Now we can verify whether Spark was successfully installed on our Ubuntu machine. To verify, run the command below; it should start the Spark shell.

    spark-shell

We have now successfully installed Spark on the Ubuntu system. Let's create an RDD and a DataFrame, and then we will wrap up.

a. We can create an RDD in three ways; we will use one of them here. Define a list, then parallelize it to create an RDD. Below is the code; paste it line by line into the spark-shell.

    val nums = Array(1, 2, 3, 5, 6)
    val rdd = sc.parallelize(nums)

The above creates an RDD.

b. Now we will create a DataFrame from the RDD. Follow the steps below.

    import spark.implicits._
    val df = rdd.toDF("num")

The above code creates a DataFrame with num as a column. To display the data in the DataFrame, use the command below.

    df.show()

How to uninstall Spark from an Ubuntu system

Please follow the steps below to uninstall Spark from Ubuntu.

First, remove SPARK_HOME from the .bashrc file. Go to the home directory:

    cd ~

Open the .bashrc file:

    vi .bashrc

Press i to edit, find and delete the SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7 line (and the matching PATH line) from the .bashrc file, then save: press Esc, type :wq! and press Enter.

We will also delete the downloaded and extracted Spark installers from the system, using the command below.

    sudo rm -r /usr/spark

The above command deletes the spark directory from the system. Open a terminal, type spark-shell and press Enter; we now get an error. This confirms that Spark has been successfully uninstalled from the Ubuntu system. You can also learn more about Apache Spark and Scala here.
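As a closing aside, the spark-shell session above can also be packaged as a small standalone application. This is a minimal sketch under the assumption of an sbt project with the spark-sql dependency on the classpath; the object name is illustrative. Note that spark-shell predefines sc and spark, whereas a standalone program must create its own SparkSession.

    import org.apache.spark.sql.SparkSession

    object RddToDataFrame {
      def main(args: Array[String]): Unit = {
        // Standalone programs create their own session (spark-shell does this for you).
        val spark = SparkSession.builder()
          .appName("RddToDataFrame")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Same steps as the shell session: parallelize an array into an RDD...
        val nums = Array(1, 2, 3, 5, 6)
        val rdd = spark.sparkContext.parallelize(nums)

        // ...then convert the RDD into a DataFrame with a single "num" column.
        val df = rdd.toDF("num")
        df.show()

        spark.stop()
      }
    }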

Differences Between Power BI and Tableau: Power BI vs Tableau Comparison

Power BI

Power BI is a data visualization and business intelligence tool provided by Microsoft. It can collect data from different data sources, like Excel spreadsheets, on-premise databases and cloud databases, and convert them into meaningful reports and dashboards. Features such as quick insights, Q&A, embedded reports, and self-service BI have put it at the top among BI tools. It is also robust and always ready for extensive modeling and real-time analytics, as well as custom visual development.

Tableau

Tableau enables business analysts to take business decisions through its core feature: data visualization that is accessible to business users of any background. It can establish a connection with any data source (Excel, local/on-premise databases, cloud databases). Tableau is the fastest-growing data visualization tool among all visualization tools. Its visualizations are created as worksheets and dashboards. The beauty of Tableau is that it does not require any technical or programming knowledge to create or develop reports and dashboards.

What is Power BI?

Power BI is a data visualization and business intelligence tool that lets us connect to single or multiple data sources, convert the connected raw data into impressive visuals, and share insights across an organization. It also lets us embed reports into our own application or website.

Product suite of Power BI:

Power BI Desktop
- Free to download and install.
- Connects to and accesses a variety of on-premise and cloud sources, like Excel, CSV/text files, Azure, SharePoint, Dynamics CRM, etc.
- Prepares data by mashing it up and creating a data model using Power Query, which uses the M query language.
- After loading data into Power BI Desktop, you can establish relationships between tables.
- Create calculated measures, columns, and tables using Data Analysis eXpressions (DAX).
- Drag and drop interactive visuals onto pages using calculated measures and columns.
- Publish to the Power BI web service.

Power BI Service
- This is one of the ways to embed reports within a website under an organization.
- The Power BI Service portal contains a collection of sections, like Workspaces, Dashboards, Reports, and Datasets.
- You can create your own workspace, such as My Workspace, which helps maintain personal work in the Power BI Service.
- You can pin a number of reports to a dashboard to bring together a number of meaningful datasets for clear insight.
- You can interact with your data with the help of Q&A (natural language queries).

Power BI Report Server
- This product allows businesses to host Power BI reports on an on-premise report server.
- You can use the server to host paginated reports, KPIs, mobile reports and Excel workbooks.
- Shared datasets and shared data sources live in their own folders, to use as building blocks for reports.

Power BI Mobile
- Power BI provides mobile apps for iOS, Android and Windows 10 mobile devices.
- In the mobile app, you can connect to and interact with your cloud and on-premise data.
- It is very convenient for managing dashboards and reports on the go, staying connected and on the same page with the organization.

On-Premises Data Gateway
- This is a bridge to connect your on-premise data to online services like Power BI, Microsoft Flow, Logic Apps and Power Apps; a single gateway can be used with different services at the same time. For example, if you use both Power BI and Power Apps, a single gateway can serve both, depending on the account you signed in with.
- The on-premises data gateway implements data compression and
transport encryption in all modes.
- The on-premises data gateway is supported only on 64-bit Windows operating systems.
- Multiple users can share and reuse a gateway in this mode.
- For Power BI, this includes support for scheduled refresh and DirectQuery.

What is Tableau?

Tableau is a business intelligence and data visualization tool used to analyze data visually. Users can create and share interactive reports and dashboards with it. It offers data blending, letting users connect multiple data sources.

Product suite of Tableau:

Tableau Server
- Tableau Server is an enterprise-wide visual analytics platform for creating interactive dashboards.
- It is essentially an online hosting platform to hold all your Tableau workbooks, data sources and more.
- Being a Tableau product, it lets you use the functionality of Tableau without always needing to download and open workbooks with Tableau Desktop.
- You can give security-level permissions to different work in an organization, to determine who can access and interact with what.
- As a Tableau Server user, you will be able to access up-to-date content and gain quick insight without relying on static distributed content.

Tableau Desktop
- This is a downloadable on-premise application for computers, used for developing visualizations in the form of sheets, dashboards, and stories.
- Some useful functionalities of Tableau Desktop are data transformation, creating data sources, creating extracts, and publishing visualizations on Tableau Server.
- Tableau Desktop produces files with the extensions twb and twbx.
- It is a licensed product, but comes with a two-week trial period.
- From creating reports and charts to combining them into a dashboard, all this work is done in Tableau Desktop.

Tableau Prep
- Tableau Prep is a personal data preparation tool that empowers users to cleanse, aggregate, merge or otherwise prepare their data for analysis in Tableau.
- Tableau Prep has a simple and clean user interface that looks and feels like the data sources screen of Tableau Desktop.
- In Tableau Prep, data is organized in a flow pane and tagged with a universally unique identifier (UUID), which lets it handle big data sets in a secure way.

Tableau Reader
- Tableau Reader is a free desktop application that you can use to open and interact with data visualizations built in Tableau Desktop.
- It is required to read and interact with Tableau packaged workbooks.
- Tableau Reader retains interaction with visualizations created in Tableau Desktop, but does not allow connections to data that can be refreshed.
- It only supports reading Tableau data files; without the Reader, you may need to share your work publicly or convert the workbook into a PDF.

Tableau Online
- Tableau Online is an analytics platform fully hosted in the cloud.
- It can publish dashboards and share your discoveries with anyone.
- It empowers your organization to ask any question of any published data source using natural language.
- It can connect to any cloud database at any time, anywhere, and can automatically refresh data from web apps like Google Analytics and Salesforce.
- It empowers site admins to easily manage authentication and permissions for users, content, and data.

Tableau Public
- This is a free service that lets anyone publish interactive data visualizations to the web.
- Visualizations are created in the accompanying app, Tableau Desktop Public Edition, which requires no programming skills.
- It is for anyone interested in understanding data and sharing those
discoveries as data visualizations with the world.
- Feature highlights include heat maps, transparent sheets, automatic mobile layouts, and Google Sheets support.
- As visualizations are public, anyone can access the data and make changes by downloading the workbook, so it is totally unsecured.
- It has a limit of 15,000,000 rows of data per workbook.
- It offers 10 GB of storage space for your workbooks, which is something of a storage limitation.
- It supports Python through 'TabPy', an API that enables evaluation of Python code within a Tableau workbook.

Strengths and weaknesses of Power BI

Strengths:
- Free Power BI Desktop application for authors to develop reports.
- Uses DAX expressions for data calculations.
- Free training modules available for users.
- Composite models (DirectQuery, Dual, and Import) to connect dispersed multiple data sources and create a model.
- Multiple visuals on a single page.
- Drill down/drill up in visuals, drill-through pages, and toggling pages or visuals using bookmarks, the selection pane and buttons.
- Ability to connect to multiple data sources.
- Affordable: Desktop is free, and Pro (the Power BI Service, to share and collaborate with other users in the organization) is $9.99.
- Can integrate with Cortana, the Windows personal voice assistant.
- Power BI is integrated with all Microsoft products (Azure, SharePoint, Office 365, Microsoft Dynamics, Power Apps, Microsoft Flow).
- Dataflows in the Power BI Service connect to Azure Data Lake Storage Gen2 and other online services.

Weaknesses:
- It is difficult for users who do not have knowledge of Excel.
- Clients who use large datasets must opt for Premium Capacity services to avoid an unpleasant experience with performance and timeouts for datasets and their users.
- The Power BI Service is compatible with only a few database drivers.
- Power BI has a large set of product options, which makes it complex to understand which option is best suited for a business.

Strengths and weaknesses of Tableau

Strengths:
- Tableau provides very beautiful visualizations, for which it stands at the top of the market among BI tools.
- Quickly combine, shape and clean data for analysis.
- Provides data blending.
- Capable of drill down/drill up in visuals, drill-through pages and filters.
- Can handle a large amount of data.
- Uses scripting languages such as R and Python for complex table calculations and to avoid performance issues.
- Can build reports, dashboards, and stories using Tableau Desktop.

Weaknesses:
- Tableau is expensive when compared to other tools.
- Scheduling and notification of reports and dashboards is lacking.
- Importing custom visualizations is a bit difficult.
- Embedding reports in other applications is complex.
- Tableau is suitable for huge organizations that can pay the licensing cost.

Benefits of Power BI

Microsoft is a brand. I hope everyone remembers school or college days, the time when we started learning and using Microsoft products, which are very simple to understand and user-friendly.
Hence, it is obvious that our eyes and brains are trained on Microsoft products:
- Anyone with working experience in Excel can get comfortable with Power BI Desktop and Mobile in no time.
- You can pin visuals available in Excel to the Power BI Service using the Excel add-on.
- You can build swift and reliable reports by simply dragging and dropping both built-in and custom visuals, following the documented best practices for optimal report performance.
- A colossal set of learning assets is available through Microsoft's guided learning resources.
- As Power BI belongs to the Microsoft family, it is privileged with Single Sign-On (SSO) and tight integration with Microsoft products like Dynamics 365, Office 365, SharePoint Online, Power Apps, MS Flow, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, etc.
- Power Query offers many options for wrangling and cleaning data to bring it into shape as a proper data model.
- After publishing data to the Power BI web service, you can schedule refreshes without manual intervention.
- Power BI is backed by the superpowers of artificial intelligence and machine learning.
- Microsoft introduced the Power Platform (Power BI to measure, Power Apps to act, and Microsoft Flow to automate), and publishes a forthcoming roadmap for Power BI.
- Power BI is integrated with both Python and R for creating visualizations.
- Pricing: Power BI Desktop is free ($0.00), and the Power BI web service (Azure) Pro is $9.99 monthly.

Disadvantages of Power BI

Power BI Desktop is at its best when you analyze your data over DirectQuery or live connections; it might struggle to handle huge data if you import it into the application, and at times it might hang or simply crash. However, in future monthly updates the Microsoft product team will surely resolve this problem.

Benefits of Tableau

- Tableau can connect to various sources, can effortlessly handle huge data, and is a very good tool for visualizing data and creating dashboards by simple drag and drop.
- Tableau supports the Python and R languages for creating visuals.
- Tableau spent its term as a Leader in Gartner's report from 2012 to 2018, and has now moved to second place.

Disadvantages of Tableau

- Pricing: Tableau Creator is $70.00 and Tableau Online is $35 monthly.
- The Tableau product team has not concentrated on advanced technologies, and has missed integrating artificial intelligence and machine learning.
- Once reports are pushed to Tableau Online, scheduled refresh is not supported and one must refresh the data manually.
- Analysts must use only the built-in visuals available in Tableau; there is no option to import custom visuals from a portal. Instead, developers need to create custom visuals themselves according to the requirement.
- Data preparation options in Tableau for creating a data model are limited.
Benefits of Tableau
- Tableau can connect to various sources, effortlessly handles huge data, and is a very good tool for data visualization and creating dashboards by simple drag and drop.
- Tableau supports the Python and R languages for creating visuals.
- Tableau held the Leader position in Gartner's Magic Quadrant report from 2012 to 2018 and has now moved to second place.

Disadvantages of Tableau
- Pricing: Tableau Creator costs $70.00 and Tableau Online $35.00 monthly.
- The Tableau product team has not concentrated on advanced technologies and has missed integrating artificial intelligence and machine learning.
- Once reports are pushed to Tableau Online, scheduled refresh is not supported and the data must be refreshed manually.
- Analysts must use only the inbuilt visuals available in Tableau; there is no option to import custom visuals from a portal. Instead, developers need to create custom visuals themselves as the requirement demands.
- Data preparation options for building a data model are limited. For advanced data wrangling and cleaning, one must take the help of other tools like Excel, Python, R, or Tableau Prep.
- There is no integration with Microsoft products like Dynamics 365, Office 365, Power Apps, and Microsoft Flow that use Single Sign-On (SSO).

Difference between Power BI vs Tableau
Let's put the difference between Power BI and Tableau in tabular form:

Power BI | Tableau
It is provided by Microsoft. | It is provided by Tableau.
It is available at an affordable price. | It is more expensive than Power BI.
Uses DAX for measures and calculated columns. | Uses MDX for measures and dimensions.
Connects to limited data sources, but new connectors are added in monthly updates. | It can connect to numerous data sources.
Can handle large datasets using Premium Capacity. | Can handle large datasets.
Provides account-based subscription. | Provides key-based subscription.
Embedding a report is easy. | Embedding a report is a real challenge in Tableau.

Why are Power BI and Tableau the most popular tools in business intelligence and data visualization?
Power BI and Tableau are the most happening BI tools among all business intelligence tools because of features and capabilities like embedded BI, data blending, and multi-data-source connections to both cloud and on-premise databases. They make sharing reports and dashboards with users easy: business analysts can access reports and dashboards and take critical business decisions without even having direct access to these tools. These two tools stand at the top of the BI market because of their attractive visualizations, and Power BI additionally offers the import and creation of custom visuals, which is its beauty. These facts have made them the most happening BI tools in the market to date. According to Gartner's Magic Quadrant for Analytics and Business Intelligence Platforms report, Power BI is the first choice and Tableau the second among BI tools in the present market.

Which one to choose, Power BI or Tableau?
The data analytics field has changed over time from traditional BI to embedded and collaborative BI. Initially, data analytics was led by companies like IBM, Oracle, and SAP, but that is no longer the situation. It is now led by companies like Microsoft and Tableau because of features like embedded BI, collaborative BI, data blending, and multi-data-source connection. Both Power BI and Tableau have their own pros and cons; the right product can be chosen based on the following touchstones and your priorities.

Touchstone | Power BI | Tableau
Description | A cloud-based business intelligence platform which offers an overview of critical data | A collection of intuitive business intelligence tools used for data discovery
Visualization | Provides various visualizations | Provides a larger set of visualizations than Power BI
OS support | Only Windows | Windows and Macintosh OS
Graphical features | Regular charts, graphs, and maps | Any category of charts, bars, and graphs
Cost | Cheaper | Costly
Organization | Suitable for small, medium, and large organizations | Suitable for medium and large organizations

UFT vs Selenium

Why Automation?
Manual testing of any web-based application or desktop/standalone application takes time, resources, and money. Also, it is not possible to quickly test applications at any random time without manual intervention. Automation comes into the picture here to reduce or eliminate manual testing as much as possible. We have many tools to automate web-based and desktop applications, Selenium and QTP being two of them. In this article, we will compare Selenium and QTP so that you can choose the best one for your needs.

QTP
Introduction: QTP stands for QuickTest Professional, a product of Micro Focus. It is intelligent test automation for web, mobile, API, hybrid, RPA, and enterprise apps. QTP is now known as UFT (Unified Functional Testing); hence, we will refer to UFT instead of QTP in the rest of this post.

Functionalities: UFT provides functional, regression, and API test automation for software applications and environments for enterprise quality assurance. We can test all three layers of an application, the interface, the service layer, and the database layer, from a single UFT console, as it provides a graphical user interface. Automation actions performed by an end user on a web-based or desktop application can be reproduced by UFT: we can emulate user actions like clicking on GUI elements, entering keyboard inputs, and much more. In fact, UFT can do it much faster than a human if scripted efficiently.

Brief History of UFT: UFT has a long history. It was originally developed by Mercury Interactive in May 1998, and the first version was known as Astra QuickTest. In 2006, HP acquired Mercury Interactive and the product became known as HP QTP. In 2011, HP merged two tools, "HP Service Test" and "HP QuickTest Professional", and released the result under a new name, HP Unified Functional Testing 11.5, made available from the HP Software Division. In 2016, the entire division was sold to Micro Focus; since then, UFT has been designed, supported, and maintained by Micro Focus. The latest stable version of UFT is 14.03 as of March 2018.

Career Aspects: UFT is a more powerful tool compared to Selenium, but because of its high license cost, many organisations do not adopt it for automation. Moreover, most of the tools that can be integrated with UFT are paid as well, reducing its demand and popularity. If you are a beginner in automation, UFT is not preferable, as the chances of getting hired are lower and you cannot learn and practice it well within the limited 60-day free trial. There are very few efficient and up-to-date tutorials available on UFT online, which makes it harder to learn.

Selenium
Selenium is a suite of tools to automate web browsers, which means it can be used to automate only web-based applications, and more precisely, only the front end of a web application, across different browsers and platforms. It can be used with various programming languages and testing frameworks.

Different tools in Selenium: Selenium consists of four tools:
- Selenium IDE
- Selenium RC
- Selenium WebDriver
- Selenium Grid
Currently, Selenium 4 is about to be launched, with an alpha version already available as this article is being written. A minimal WebDriver sketch follows below.
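To make the WebDriver idea concrete before comparing the tools, here is a minimal sketch using Selenium's Python bindings; the target URL and the element looked up are placeholder assumptions, not part of any real test suite.

    # A minimal Selenium WebDriver sketch in Python. Assumes Chrome is
    # installed; recent Selenium releases locate a matching driver
    # automatically.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get('https://example.com')              # placeholder URL
        heading = driver.find_element(By.TAG_NAME, 'h1')
        print(heading.text)                            # front-end check only
    finally:
        driver.quit()                                  # release the browser

Note that the script can only see and drive the rendered page, which is exactly the front-end-only limitation described above.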
Brief History of Selenium: Selenium was developed by Jason Huggins in 2004 at ThoughtWorks in Chicago. Later, it was made open source, and it has become an increasingly powerful and popular tool for web application automation.

Career Aspects: Selenium is an open-source tool, which increases its popularity in organisations, mainly service-based ones. Its flexibility of integration with major programming languages makes it more adaptable. Selenium has more future growth and a higher number of job openings, and you can easily learn and practice it, as many free online tutorials and help communities are available.

Let's Compare UFT and Selenium WebDriver
Having seen the basic introductions to UFT and Selenium, you may be confused about which tool is best and how to select the one that suits your requirements. Don't worry, we compare both tools here to solve that problem.

Comparison Basis | UFT (QTP) | Selenium WebDriver
Software Type | A desktop-based application. | A set of APIs.
Cost | Paid tool; you need to purchase a license, available as seat-based and concurrent (the latter costs more), with pricing published by Micro Focus. | Open source/free; you can download and use it at no cost.
Application Type | Supports web, mobile, API, hybrid, RPA, and enterprise apps. | Can be used only for web-based applications; this is a major disadvantage of Selenium over UFT.
Application Layer | Can test all three layers of an application: the interface, the service layer, and the database layer. | Can test only the front end or interface layer.
Supported Languages | VBScript (Visual Basic Script). | Java, C#, Ruby, Python, Perl, PHP, JavaScript, R, etc.
Supported Browsers | Chrome, Firefox, Safari, IE, and Edge. | IE, Firefox, Chrome, Safari, Opera, and headless browsers.
Supported Operating Systems | Only Microsoft Windows. | Microsoft Windows, Apple OS X, and Linux.
Supported IDEs | Only the inbuilt UFT IDE. | Eclipse, IntelliJ, and any other IDE that supports Java.
Browser Area Accessibility | Can control the menus and toolbars of the browser. | Can control only the visible area of the browser where the web page is loaded.
Supported Technology | Supports nearly every major software application and environment, including SAP, Oracle, Salesforce, mainframes, embedded frameworks, headless browsers, and much more. | Struggles while automating SAP, Salesforce, and mainframe applications.
Test Execution | Combine it with Micro Focus ALM (also a paid product) to execute tests in parallel. | Can run tests in parallel through test frameworks such as TestNG, which is free of cost.
Performance Testing | User actions scripted in UFT can serve as the basis for performance tests in LoadRunner, a paid tool. | Selenium is not meant for performance testing, but you can integrate JMeter, WebLOAD, or a paid tool to run Selenium scripts for performance testing.
Supported Virtual Environments | Deploy UFT on provisioned Citrix, AWS, and Azure virtual environments, or run web and mobile tests from Docker containers. | Selenium can also be integrated with these environments.
Required Coding Skills | Less programming knowledge is required, as it offers keyword-driven testing that simplifies test creation and maintenance; flows can be captured directly from application screens using UFT's robust record/replay technology. | You need good knowledge of a programming language, specifically the language of whichever Selenium binding you choose.
Test Execution Performance | Requires more system resources; runs on Windows VMs, which use more resources and need more maintenance. | Requires fewer system resources and can run on Windows or Linux VMs; a Linux VM is more lightweight than a Windows VM.
Object Repository | Comes with a built-in Object Repository. | Has no such feature.
Data-Driven Testing | Provides built-in support through data tables. | Needs good programming knowledge and integration with test frameworks to achieve it (see the sketch after this table).
Tools Integration | Limited tools, and mostly paid. | Can be integrated with many paid or free tools.
Test Reports | Default test reports are generated. | Needs to be integrated with reporting tools to generate test reports.
Support | As a paid tool, it comes with proper vendor support. | As an open-source tool, no professional support is provided.
Career Growth | Fewer jobs, limited scope. | More jobs, more growth.
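As a minimal sketch of the framework integration mentioned in the data-driven testing row above, the snippet below drives one Selenium login test from a table of credentials using pytest's parametrize; the URL, element locators, and credential rows are hypothetical.

    # Data-driven testing sketch: pytest runs the same Selenium test once
    # per data row. URL, locators, and credentials are hypothetical.
    import pytest
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    @pytest.fixture
    def driver():
        d = webdriver.Chrome()
        yield d
        d.quit()  # clean up the browser after each test

    @pytest.mark.parametrize('username,password,expected', [
        ('valid_user', 'right_pass', 'Dashboard'),
        ('valid_user', 'wrong_pass', 'Login failed'),
        ('', '', 'Login failed'),
    ])
    def test_login(driver, username, password, expected):
        driver.get('https://example.com/login')        # hypothetical URL
        driver.find_element(By.NAME, 'username').send_keys(username)
        driver.find_element(By.NAME, 'password').send_keys(password)
        driver.find_element(By.ID, 'submit').click()
        assert expected in driver.page_source          # one check per row

UFT gets equivalent behaviour out of the box with its built-in data tables, which is exactly the contrast the table draws.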
Conclusion
We have seen many differences between UFT and Selenium. The main driving factor is the cost of automation. If your project has the budget and can afford UFT, it is the better choice, as it provides many features out of the box. If your project has a smaller budget, go for Selenium, but be prepared to put in more effort.

Selenium is restricted to the browser. If your test scenarios need to interact with the desktop in between, such as uploading a file or downloading a file and verifying it, Selenium may not work properly, while UFT can automate those scenarios easily. Integrating Selenium with test management tools is also harder than with UFT: Selenium needs to be integrated with different tools for reporting and managing test data, which are available by default in UFT. UFT scripts tend to be more stable than Selenium scripts, and UFT is richer in features overall.

UFT can reduce the number of resources required, as it has many ready-made features that help in writing scripts. In Selenium, you may need a few more resources and will write more lines of code. On the other hand, you will find less help for UFT on public online forums, though it comes with proper vendor support as a paid tool.

Getting Familiar with PRINCE2®

PRINCE2® is the abbreviation of Projects IN Controlled Environments, and it is a de facto process-based method for effective project management. It is used extensively by the United Kingdom government and is widely recognized and utilized in the private sector. The PRINCE2® method lies in the public domain and offers non-proprietary best-practice guidance on project management. Numerous people and organizations from different sectors prefer PRINCE2® because it is flexible, innovative, and practical. Irrespective of the project, whether it is developing software, constructing a flyover, or launching an advertising campaign, the processes, themes, and principles of PRINCE2® will help you realize it.

Vital features
The chief features of PRINCE2® are:
- A product-based planning approach.
- A focus on business justification.
- A well-organized structure for the project management team.
- Flexibility that can be applied at a level suitable to the project.
- An emphasis on dividing the project into controllable and manageable stages.

History of PRINCE2®
PRINCE2® was established in 1989 by the Central Computer and Telecommunications Agency (CCTA), since renamed the Office of Government Commerce (OGC). In June 2010, the OGC's Best Practice Management functions moved into the Cabinet Office. Originally, PRINCE was based on PROMPT, a project management method developed in 1975 by Simpact Systems Ltd; in 1979, PROMPT was adopted by the CCTA as the standard to be used in all government information system projects. When PRINCE was launched in 1989, it effectively superseded PROMPT within government projects. PRINCE2® remains in the public domain, with copyright retained by the Crown. PRINCE2® itself was published in 1996, with contributions from a consortium of nearly 150 European organizations.

Remarkable benefits of PRINCE2® for people
When you decide to use PRINCE2®, you gain stronger control of resources alongside the capability to manage project and business risks. This benefits:
- People looking to build project management skills and improve their employment prospects.
- Executives or directors of projects and organizations.
- Project managers.
For individuals, PRINCE2® certification becomes a precious career asset, as it improves employment prospects and helps people do their jobs more effectively. According to a survey of people in project management and other roles, such as senior management, IT, and operations, candidates found that PRINCE2® is highly valued in their careers.

How does PRINCE2® benefit an organization?
For an organization, PRINCE2®'s formal recognition of responsibilities within a project, along with its concentration on what a project needs to deliver, gives an organization's projects the following:
- An organized and controlled start, middle, and end.
- Regular reviews of progress against the plan.
- A consistent and common approach.
- Assurance that the project continues to have a business justification.
According to a 2007 survey by PricewaterhouseCoopers, higher-performing projects are more likely to be operated by specialized project managers.
According to Arras People's 2011 survey, PRINCE2® is the most well-known of the certifications associated with higher-performing projects. Another report on PRINCE2® found that adopting it is useful irrespective of the size of the organization, particularly because it is compatible and adaptable with agile.

The method of PRINCE2®
The PRINCE2® method comprises four integrated components: principles, themes, processes, and finally, the project environment.

Principles
Principles are the guiding requirements and good practices that determine whether the project is genuinely being managed using PRINCE2®. There are seven principles, and all of them must be applied; otherwise, it cannot be viewed as a PRINCE2® project.
The seven principles of PRINCE2® are:
- Continued business justification – There must be an acceptable reason for running and managing the project; when that is no longer the case, it is better to close the project.
- Learning from experience – PRINCE2® project teams ought to continually seek out and draw on lessons learned from earlier work.
- Defined roles and responsibilities – A PRINCE2® project team ought to have a distinct organizational structure, with the right individuals in the right jobs.
- Manage by stages – PRINCE2® projects must be planned, monitored, and controlled on a stage-by-stage basis.
- Manage by exception – Individuals working on the project must be given the appropriate amount of authority to work effectively within their setting.
- Focus on products – PRINCE2® projects focus on the product definition, its delivery, and its quality requirements.
- Tailor to suit the project environment – PRINCE2® must be tailored to suit the environment, importance, size, risk, capability, and complexity of the project.

Themes
Themes describe aspects of project management that ought to be addressed in parallel throughout the project. There are seven PRINCE2® themes, and they explain the particular treatment needed for the various project management disciplines.
The themes also explain why they are important. PRINCE2® helps people apply the themes by stating the minimum requirement for each theme, and it provides particular guidance on how to tailor them to specific settings.
The seven themes of PRINCE2® are:
- Business case – Creates and maintains a record of the business justification for the project.
- Quality – Defines the quality requirements and measures, and explains how the project will deliver them.
- Organization – Defines the individual roles and duties of the entire project team.
- Plans – The steps needed to develop the plans, and the PRINCE2® techniques that ought to be used.
- Change – How the project manager will assess and act on changes to the project.
- Risk – Identifies the risks and opportunities that could influence the project.
- Progress – The ongoing viability and performance of the plans, and how and whether the project ought to proceed.

Processes
Processes describe the steps of the project lifecycle, from the initial idea to project closure. Every process proposes checklists of suggested activities, linked responsibilities and, of course, guidance on how to tailor it to a particular setting.
The seven processes of PRINCE2® are:
- Starting up a project
- Directing a project
- Initiating a project
- Controlling a stage
- Managing product delivery
- Managing stage boundaries
- Closing a project

The management products of PRINCE2®
The PRINCE2® manual comprises twenty-six recommended templates for project documentation, termed "management products." They are divided into records, baselines, and reports. Some instances of management products are the following:
- Benefits Management Approach – In the 2009 edition, it was known as the Benefits Review Plan. It defines how and when the measurement of the project's benefits can be made.
- Business case – Used to capture the financial justification for the project. It is a PRINCE2® principle that a project ought to have continued business justification; when the business case no longer makes sense, the project should be altered or halted.
- Checkpoint Report – A progress report produced by the Team Manager and sent to the Project Manager regularly to report the status of the Work Package.
- Communications Management Approach – Known in the 2009 edition as the Communications Management Strategy. It is an account of the means and frequency of communication with stakeholders, covering the flow of information in both directions: to the stakeholders and from the stakeholders.
- Change Control Approach – Named the Configuration Management Strategy in the 2009 edition. It is used to define how the products of the project will be identified, controlled, and protected.
This document is developed by the Project Manager during the Initiating a Project process.
- Configuration Item Record – A record consisting of the product's history, version, status, details, variant, the relationships between products/items, and the product copyholders or Product Owner.
- End Project Report – Reviews the performance of the project against the original PID (Project Initiation Documentation).
- Daily Log – Used to record informal issues.
- Lessons Log – A set of notes on lessons learned, which become vital to future projects.
- Issue Register – An issue log containing notes on change requests, complaints, and problems raised by all project members.
- Quality Register – Contains details of every planned quality control activity, with dates and the personnel involved.
- Project Brief – Used by the Project Board to authorize the Initiation stage. During the Initiating a Project process, the components of the Project Brief are extended and refined; the Project Brief grows to form the PID (Project Initiation Documentation).
- Risk Register – A record of identified risks, including opportunities and threats, related to the project.

How PRINCE2® is different from PMP®
PMP® (Project Management Professional) might be seen as PRINCE2®'s competitor. Generally, Europe, Australia, and the UK opt for PRINCE2®, whereas the Americas, led by the US, prefer PMP®. Africa, the Middle East, and Asia do not have a strong preference for either. The vital point, however, is that PMP® can be used alongside PRINCE2®. In fact, PMP® and PRINCE2® each acknowledge the other's existence in their marketing material and position themselves as complementary: PRINCE2® as a methodology and PMP® as a standard that can be used alongside it. In practice, practitioners and companies select one system or both based on the project environment, costs, and geographical location.

The certifications
The PRINCE2® certifications, awarded by AXELOS, require candidates to undergo a training course with an ATO (Accredited Training Organization), followed by an examination. Both the training and the exam can be conducted in person or online. AXELOS requires that any organization proposing to offer certified PRINCE2® training go through an accreditation process to validate the quality of the course's content and delivery. Once approved, the organization can use the title "ATO". A trainer must be re-accredited every three years and undergo a surveillance check every year.
The four levels of PRINCE2® certification are:
- PRINCE2® 2017 Foundation – Confirms that the holder has sufficient understanding and knowledge of the PRINCE2® method and can work in a project management team that uses it.
- PRINCE2® 2017 Practitioner – Confirms that the holder has ample understanding of how to apply PRINCE2® in a given situation and can begin applying the method to real projects.
- PRINCE2® Agile Foundation – Confirms that the holder has adequate understanding and knowledge of the PRINCE2® method and agile ways of working,
as well as how agile can be combined with PRINCE2®.
- PRINCE2® Agile Practitioner – Confirms that the holder can apply the project management principles of PRINCE2® while combining them with agile concepts such as Kanban and Scrum.
After candidates complete the above certifications, AXELOS publishes a register of successful candidates that can be checked on the web.

PRINCE2® Qualifications (Levels Explained)

Competition brings out the best in us; we strive to become the best, outperform our peers and colleagues, and prove that we are better. This drive to excel is a good trait to have, especially in a competitive corporate world where standards keep rising and quality and excellence are rewarded. So what is it that will distinguish you from the countless others? What is the X factor that will push your resume to the top of the heap? Experience is one thing, but a major factor is the qualifications you possess.

With several project management methods, tools, and techniques being introduced in the market for streamlining businesses, a qualification that reflects your knowledge of these methods and tools is what endears you to employers, because at the end of the day, businesses thrive on these methodologies. A qualification also shows your knowledge, skills, and dedication to the profession. It suggests that you have more refined experience, making you an optimal choice for recruiters.

What is PRINCE2®? What qualifications does it provide?
PRINCE2® is a structured project management methodology, recognized as a common standard for project management across the world. It is an acronym for Projects IN Controlled Environments. It is a flexible and user-friendly project management methodology that eases the process of selecting optimal resources and minimizes the risks involved in handling a project. Its versatility makes it easy to apply to projects from different sectors, such as engineering, construction, IT, business, and even finance. While it was initially designed for IT projects, its scope has since expanded to other sectors.

PRINCE2® facilitates three qualifications:
- PRINCE2® Foundation
- PRINCE2® Practitioner
- PRINCE2® Agile
Both PRINCE2® Foundation and PRINCE2® Practitioner follow a process-driven approach that emphasizes fragmenting projects into manageable and controllable stages and portions; this fragmentation does not depend on the type or size of the project. PRINCE2® Agile focuses on teaching the fundamentals and purpose of combining PRINCE2® with the agile methodology, another form of project management.

Benefits of PRINCE2®
Acquiring a qualification is a tedious and costly process, and you must ensure that the qualification you are investing in is beneficial and caters to your needs. The qualification you hope to achieve should also be relevant to the demands of current times. An individual spends a great deal of money and time on qualifications, including tuition, resource materials, questionnaires, and examinations; a qualification is therefore not a joke and requires serious research and investment. Hence, you must be aware of the benefits offered by the PRINCE2® methodology before deciding to pursue it.
Here are some benefits offered by PRINCE2®:
- It is very flexible and versatile and can be applied to various kinds of projects, irrespective of their size.
- It is the most used and recognized project management methodology across the globe.
- It offers segregation of projects and efficiently fragments the work, further improving the process by simplifying it.
- PRINCE2® is known for delivering customer satisfaction, as practitioners are well-versed and skillful thanks to the training provided during the course, and they are kept informed and up to date on the improvements and progress made in their respective fields.
The benefits mentioned above show that the PRINCE2® methodology provides globally recognized, best-in-class project management techniques. Hence, PRINCE2® has become a prominent certification that recruiters look for while hiring a professional to manage a project.

The PRINCE2® Certification or Qualification Explained
So, you have finally decided that the PRINCE2® methodology is great and could be really useful to you. You have decided to expand your field of expertise and want to pave the way for several lucrative job opportunities by opting for the PRINCE2® qualifications. But there are certain prerequisites to be met before taking up these qualifications.
As mentioned above, PRINCE2® certification is offered at three levels:
- PRINCE2® Foundation
- PRINCE2® Practitioner
- PRINCE2® Agile/Professional

Three Levels of PRINCE2® Qualification or Certification
PRINCE2® Foundation
It is the basic level of certification and the first of the two PRINCE2® qualifications required to become a Registered PRINCE2® Practitioner. The PRINCE2® Foundation certification is designed to teach you the must-know PRINCE2® principles, themes, and terminology. It is developed to ensure that, once qualified, you will be able to act as a well-informed member of a project management team operating within an environment based on the PRINCE2® methodology. PRINCE2® Foundation is a training program that can be studied on its own or as a prerequisite for the PRINCE2® Practitioner course.

SYLLABUS
This qualification aims to develop the candidate into a well-informed member of a project management team in a PRINCE2® environment. To reach this goal, the candidate should understand the principles and terminology of the methodology. In particular, the candidate must be capable of:
- Describing the purpose and major content of all the roles, the seven principles, the seven themes, and the seven processes.
- Stating which management products are inputs to, and outputs from, the seven processes.
- Explaining the main purpose and key contents of the major management products.
- Describing the relationships between processes, deliverables, roles, and the management dimensions of a project.
WHO SHOULD ATTEND THIS QUALIFICATION?
This qualification is suitable for:
- Both new and experienced project staff
- Consultants and contract staff operating in a PRINCE2® environment
- Project Managers
- Programme Managers
- Support staff and Team Managers associated with PRINCE2®-based projects
- Staff who will have a defined role within a project team responsible for operating in a PRINCE2® environment

REQUIREMENTS
The candidate should have some understanding of project management; other than that, there is no formal prerequisite for the Foundation certification.

EXAMINATION FORMAT
- The exam lasts one hour
- There are 60 questions
- It is a closed-book examination
- You must correctly answer 33 out of 60 questions to pass (55% is the passing score)

PRINCE2® Practitioner
It is the second of the two PRINCE2® examinations you are required to clear to become a Registered PRINCE2® Practitioner. The PRINCE2® Practitioner qualification teaches you the basics of applying PRINCE2® to run and manage a project within an environment supporting PRINCE2®. After successfully completing this qualification, you will be able to apply and tune PRINCE2® to address the requirements and problems of a specific project scenario. You will gain a comprehensive understanding of the relationships between the PRINCE2® principles, themes, processes, and products, and will understand these elements in detail.

SYLLABUS
The curriculum designed for the PRINCE2® Practitioner enables the candidate to:
- Produce in-depth explanations of all the principles, themes, and processes, and worked examples of all PRINCE2® products, as they might be applied to address the particular circumstances of a given project scenario.
- Show that they understand the relationships between principles, themes, processes, and PRINCE2® products, and be able to apply this understanding.
- Demonstrate that they understand the reasons behind the principles, themes, and processes of PRINCE2®, and the principles underpinning these elements.
- Showcase their ability to tune PRINCE2® to different project circumstances.

WHO SHOULD ATTEND THIS QUALIFICATION?
This qualification is suitable for:
- Project Managers
- General Managers
- Programme Managers
- Team Managers and support staff
- Staff who have a defined role in a project
- Project management consultants

REQUIREMENTS
You are required to possess one of the following qualifications before appearing for the PRINCE2® Practitioner exam:
- PRINCE2® Foundation
- Project Management Professional (PMP)®
- Certified Associate in Project Management (CAPM)®
- IPMA Level A (Certified Projects Director)
- IPMA Level B (Certified Senior Project Manager)
- IPMA Level C (Certified Project Manager)
- IPMA Level D (Certified Project Management Associate)

EXAMINATION FORMAT
- The examination lasts 150 minutes
- It contains 68 questions
- You must correctly answer 38 out of 68 questions (55% is the passing score)
- Open book: only the official printed hard copy of Managing Successful Projects with PRINCE2® 2017 Edition is allowed

PRINCE2® Agile
The PRINCE2® Agile qualification consists of Foundation and Practitioner levels. PRINCE2® Agile primarily focuses on teaching you the fundamentals and purpose of merging PRINCE2® with the agile methodology.
It provides extensibility along with the capability of working together with corporate management processes.

PRINCE2® Agile Foundation
Similar to the PRINCE2® Foundation, the Agile Foundation course and certification can be taken by anyone, without any prior qualification. The course introduces PRINCE2® governance and offers a wide range of agile concepts and techniques. Once qualified, you will be able to act as a well-informed member of an agile project team and understand how PRINCE2® operates alongside agile concepts.

PRINCE2® Agile Practitioner
This qualification focuses on teaching certified project managers how to blend the structure, governance, and control of PRINCE2® with agile methods, techniques, and approaches. It has the same prerequisites as the PRINCE2® Practitioner, so to appear for the exam you must possess one of the following qualifications:
- PRINCE2® Foundation
- Project Management Professional (PMP)®
- Certified Associate in Project Management (CAPM)®
- IPMA Level A (Certified Projects Director)
- IPMA Level B (Certified Senior Project Manager)
- IPMA Level C (Certified Project Manager)
- IPMA Level D (Certified Project Management Associate)

Who manages the PRINCE2® Certification or Qualification?
All major certifications and qualifications are managed by a sponsor that develops, manages, and operates them, and PRINCE2® is no exception. It is owned and managed by AXELOS, a joint venture set up in 2014 by the government of the United Kingdom and Capita. AXELOS is responsible for developing, managing, and operating best-practice qualifications using the methodologies formerly owned by the Office of Government Commerce (OGC), and it supervises the entire PRINCE2® certification process.

Importance of PRINCE2® Certification or Qualification
The PRINCE2® certification ensures that the candidate has the following capabilities:
- The PRINCE2® Certification or Qualification primarily ensures that a candidate has sufficient knowledge and expertise of the PRINCE2® methodology to work effectively with, or as a member of, a project management team operating in an environment that supports PRINCE2®.
- The PRINCE2® Professional Certification or Qualification also tests the candidate's capability to manage a non-complex PRINCE2® project across all aspects of the project in its entirety.

PRINCE2® re-registration
Like every other certification or qualification, a PRINCE2® qualification must be renewed after a certain period of time. Any PRINCE2® Practitioner is required to re-register within three to five calendar years of their original certification. Re-registration involves an hour-long test that the candidate must pass to retain the certification; the examination is based on similar standards to the original Practitioner examination.