- The Apache Hadoop project actively supports multiple projects intended to extend Hadoop’s capabilities and make it easier to use.
- Several top-level projects provide development tools, and others manage Hadoop data flow and processing.
Data Ingestion
Flume: A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.
Kafka: A messaging broker that is often used in place of traditional brokers in the Hadoop environment because it is designed for higher throughput and provides replication and greater fault tolerance.
Sqoop: A tool designed to transfer data between Hadoop and relational database servers such as MySQL or Oracle.
Storage
HDFS (Hadoop Distributed File System): A distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems.
HBase: A distributed, scalable, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.
Data Formats
Avro: An opinionated format which understands that data stored in HDFS is usually not a simple key/value combination such as Int/String. The format encodes the schema of its contents directly in the file, which allows you to store complex objects natively.
Parquet (columnar file format): Stores rows together in groups and, within each group, stores the values of each column adjacent to each other, so datasets are partitioned both horizontally and vertically. This is particularly useful if your data processing
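To make the "partitioned both horizontally and vertically" idea concrete, here is a minimal sketch in plain Python. It is not the Parquet library or its on-disk format; the function names `to_row_groups` and `read_column` are hypothetical, and the sketch assumes every row has the same keys as the first row.

```python
from typing import Any


def to_row_groups(rows: list[dict[str, Any]], group_size: int) -> list[dict[str, list[Any]]]:
    """Split rows into groups (horizontal), then store each group column by column (vertical)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not rows:
        return []
    column_names = list(rows[0])  # assumption: all rows share these keys
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Values of one column sit next to each other inside the group.
        groups.append({name: [row[name] for row in chunk] for name in column_names})
    return groups


def read_column(groups: list[dict[str, list[Any]]], name: str) -> list[Any]:
    """Read a single column without touching the other columns."""
    return [value for group in groups for value in group[name]]
```

A query that needs only one column can call `read_column` and skip the rest of the data, which is the kind of access pattern a columnar layout is meant to serve.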
Processing
MapReduce: The programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.
Resource Management
YARN: The architectural center of Hadoop, which allows multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored in a single platform, unlocking an entirely new approach to analytics.
Analysis
Pig: A procedural language for developing parallel processing applications for large data sets in the Hadoop environment. Pig is an alternative to Java programming for MapReduce and automatically generates MapReduce functions. It was originally developed at Yahoo.
Hive: Data warehousing software that addresses how data is structured and queried in distributed Hadoop clusters. It provides tools for ETL operations and brings some SQL-like capabilities to the environment.
Spark SQL: Apache Spark's module for working with structured data.
Spark MLlib: Apache Spark's scalable machine learning library.
GraphX: Apache Spark's API for graphs and graph-parallel computation.
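Several of the tools above either implement or generate MapReduce jobs, so a small illustration of the paradigm may help. The following is a single-process sketch of the map, shuffle, and reduce phases for a word count. It does not use the Hadoop API, and the function names are hypothetical; on a real cluster each phase would run in parallel across many servers.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each `map_phase` call looks at one line only and each `reduce_phase` call looks at one key only, the work can be split across machines, which is where the scalability of the paradigm comes from.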
Search
Solr: A standalone enterprise search server with a REST-like API.
Elasticsearch: A distributed, RESTful search and analytics engine capable of solving a growing number of use cases.
Visualization
Hue: A self-service analytics workbench that helps in browsing, querying, and visualizing data.
Removes the need for users to have advanced knowledge of query languages by providing a clean visual analysis interface that makes working with big data more manageable for more stakeholders.
Coordination
ZooKeeper: A high-performance coordination service for distributed applications.
Cluster Management
Hadoop is an open source project, and several vendors have stepped in to develop their own distributions on top of the Hadoop framework to make it enterprise ready. Some of the best-known companies are Hortonworks, Cloudera, and MapR.
Other Apache Hadoop-related open source projects
Ambari : A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.
Cassandra : A scalable multi-master database with no single points of failure.
Chukwa : A data collection system for managing large distributed systems.
Impala : The open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.
Mahout : A scalable machine learning and data mining library.
Tajo : A robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL on large data sets stored on HDFS and other data sources.
Tez : A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases.
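The phrase "an arbitrary DAG of tasks" in the Tez entry can be illustrated with a short sketch. This is not the Tez API; it only shows the general idea of running tasks in dependency order, using Python's standard `graphlib` module (Python 3.9 or later). The helper `run_dag` and the example task names are hypothetical, and the sketch assumes every node named in `deps` has an entry in `tasks`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(
    tasks: dict[str, Callable[[dict[str, Any]], Any]],
    deps: dict[str, set[str]],
) -> dict[str, Any]:
    """Run each task after all of its dependencies, passing earlier results along."""
    results: dict[str, Any] = {}
    # static_order() yields every node only after all of its predecessors;
    # it raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name](results)
    return results


# Hypothetical four-task pipeline: two branches share one input and join at the end.
example_tasks = {
    "load": lambda results: [3, 1, 2],
    "sort": lambda results: sorted(results["load"]),
    "total": lambda results: sum(results["load"]),
    "report": lambda results: (results["sort"], results["total"]),
}
example_deps = {
    "load": set(),
    "sort": {"load"},
    "total": {"load"},
    "report": {"sort", "total"},
}
example_results = run_dag(example_tasks, example_deps)
```

Unlike a fixed map-then-reduce pipeline, the graph here can branch and join freely, and a distributed engine is free to run independent branches such as `sort` and `total` at the same time.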