The “Internet Celebrity” of Industrial Big Data Processing: Apache Spark


Just as life is inseparable from water, it is now inseparable from data. We are surrounded by data and live in it, and as data keeps accumulating, it becomes big data.

The “Made in China 2025” technology roadmap identifies industrial big data as an important breakthrough point. Over the next ten years, intelligent systems built around data will become the core driving force behind intelligent manufacturing and the industrial Internet. Understanding big data means understanding the tasks that surround it: querying, processing, machine learning, graph computation, and statistical analysis. Apache Spark is a new-generation, lightweight platform for fast big data processing. Because it integrates all of these capabilities, it is a natural first choice for getting to know big data.

Put simply, Spark is a fast, general-purpose engine for large-scale data processing. With Spark, applications such as real-time stream processing, machine learning, and interactive queries can be built on top of many different storage and runtime systems. In this installment, Gewuhui introduces Spark, a rapidly rising star of big data processing.

1 Spark’s Growth

Spark was born in 2009 as a research project at UC Berkeley’s AMPLab. At first it was purely experimental. The codebase was tiny, only about 3,900 lines, and the framework was lightweight.

In 2010, UC Berkeley officially open-sourced Spark.

In June 2013, Spark entered the Apache Software Foundation as an incubator project and began a period of rapid development. Third-party developers contributed large amounts of code, and the community was very active.

In February 2014, Spark quickly became an Apache top-level project.

Spark 1.0.0 was released at the end of May 2014.

Spark 2.0.0 was released in July 2016.

Spark 2.4.0 was released in November 2018.

As an important member of the Hadoop ecosystem, Spark has grown remarkably fast. It took less than five years to go from birth to an Apache top-level project. In an era when data volumes are growing rapidly, it is no accident that Spark, as an efficient computing framework, has attracted so much attention.

2 Spark’s Features

Fast

Spark delivers high performance for both batch and streaming workloads. It does this with a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. For iterative computation such as the logistic regression algorithm, Spark can run more than 100 times faster than Hadoop MapReduce.
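The DAG scheduler splits a job into stages. Narrow operations such as `map` and `filter` are pipelined within one stage, and a new stage starts wherever data has to be shuffled between partitions. The sketch below models this for a linear chain of operations in plain Python. It is a simplified model under stated assumptions:

- The `WIDE_OPS` set is an assumed list of common shuffle operations.
- The operation that reads shuffled data is placed at the start of the next stage.
- Real Spark builds stages from the dependency graph between RDDs, not from operation names.

```python
from typing import List

# Assumption for this sketch: operations that require a shuffle.
# In Spark, join avoids a shuffle only when both inputs are already co-partitioned.
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "distinct", "repartition", "sortByKey"}


def split_into_stages(ops: List[str]) -> List[List[str]]:
    """Group a linear chain of operations into stages.

    Narrow operations are pipelined into the current stage; each wide
    operation starts a new stage because its input must be shuffled first.
    """
    stages: List[List[str]] = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: close the current stage
        stages[-1].append(op)
    return stages


if __name__ == "__main__":
    job = ["textFile", "flatMap", "map", "reduceByKey", "filter", "sortByKey", "map"]
    for n, stage in enumerate(split_into_stages(job)):
        print(f"stage {n}: {' -> '.join(stage)}")
    # stage 0: textFile -> flatMap -> map
    # stage 1: reduceByKey -> filter
    # stage 2: sortByKey -> map
```

Each stage boundary implies a shuffle: intermediate data is written to local disk and sent over the network. That is why reducing the number of wide operations is a common lever when tuning a Spark job.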

Easy to use

Spark currently supports several programming languages, including Java, Scala, Python, and R. Anyone familiar with one of them can start writing Spark programs right away. Spark also provides more than 80 high-level operators, so users can build different applications quickly. In addition, Spark offers interactive Python and Scala shells. These make it easy to try out a solution directly against a cluster instead of packaging the program, uploading it to the cluster, and then verifying it, as was required before. This is very important for prototype development.

General-purpose

Spark currently consists of four major components:

Spark SQL: SQL on Hadoop, providing interactive and reporting queries that can be invoked through interfaces such as JDBC;

Spark Streaming: stream computing engine;

Spark MLlib: Machine learning library;

Spark GraphX: Graph computing engine.

With these four components, Spark covers the most important tasks in the big data field: offline batch processing, interactive queries, real-time stream computing, machine learning, and graph computation. These different types of processing can be combined seamlessly within the same application. Spark’s unified solution is very attractive. After all, every company wants a single platform that solves its problems while reducing both the labor cost of development and maintenance and the hardware cost of deployment. Being a unified solution does not cost Spark any performance. On the contrary, Spark has a large performance advantage.

Compatible

Spark can run under its own standalone cluster manager, on YARN, Mesos, or Kubernetes, and on EC2. Standalone mode does not depend on a third-party resource manager or scheduler. This lowers the barrier to entry and makes Spark easy for anyone to deploy and use.

Spark can process any data supported by Hadoop, including HDFS, Apache HBase, Apache Kudu, and Apache Cassandra. This matters especially for users who already run Hadoop clusters, because they can use Spark’s processing power without any data migration.

3 Spark’s Advantages over MapReduce

Spark and MapReduce are both computing frameworks. Spark, the newer of the two, learned from MapReduce and improved on it, so algorithms run significantly faster on Spark. The main differences between the two can be roughly summarized as follows:

1) Spark keeps the core data of a computation in memory, which makes iterative computation much more efficient. MapReduce writes its intermediate results to disk, and the resulting heavy disk I/O hurts performance.

2) Spark is highly fault tolerant. It achieves efficient fault tolerance for RDD operations through its lineage mechanism. If part of a dataset is lost or corrupted, Spark rebuilds it by replaying the recorded chain of computations (its lineage) that produced it. In MapReduce, recovering from a failure may require recomputation, which is costly.

3) Spark is more versatile. It provides many operators in two categories, transformations and actions, which makes computations more convenient to express. MapReduce provides only two operations: map and reduce.

4) Spark’s framework and ecosystem are more complex. They involve concepts such as RDDs, lineage, the directed acyclic graph (DAG) built at execution time, and stage division. Spark jobs often need tuning for different business scenarios to meet performance requirements. The MapReduce framework and its ecosystem are comparatively simple, and their performance demands are lower. MapReduce runs stably, however, and is well suited to long-running background jobs.
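The first three differences above rest on one idea: an RDD records how it was derived (its lineage) rather than its data. This plays out in four ways:

- Transformations only extend that record.
- An action triggers the actual computation.
- `cache()` keeps computed partitions in memory so later passes reuse them.
- A lost partition is recomputed from the lineage.

The sketch below is a single-process toy model of that idea in plain Python, not Spark's implementation. The `ToyRDD` class, `compute`, `lose_partition` and the `source_reads` counter are hypothetical. `map`, `filter`, `cache`, `collect` and `count` borrow the names of real RDD operations. The model covers only narrow dependencies, where each output partition depends on exactly one parent partition.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy model of lazy transformations, caching and lineage recovery (not Spark's API)."""

    def __init__(self, name: str, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None,
                 source: Optional[List[list]] = None):
        self.name, self.parent, self.fn, self.source = name, parent, fn, source
        self.source_reads = 0  # incremented only on the base dataset
        self._cache: Optional[Dict[int, list]] = None

    # Transformations: lazy, they only extend the lineage.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD("map", self, lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD("filter", self, lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        self._cache = {}  # keep partitions in memory once they are computed
        return self

    def lineage(self) -> List[str]:
        return ([] if self.parent is None else self.parent.lineage()) + [self.name]

    def num_partitions(self) -> int:
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def compute(self, i: int) -> list:
        """Return partition i from the cache if present, else by replaying the lineage."""
        if self._cache is not None and i in self._cache:
            return self._cache[i]
        if self.parent is None:
            self.source_reads += 1  # stands in for reading a block from disk/HDFS
            part = list(self.source[i])
        else:
            part = self.fn(self.parent.compute(i))
        if self._cache is not None:
            self._cache[i] = part
        return part

    def lose_partition(self, i: int) -> None:
        """Simulate an executor failure that drops one cached partition."""
        if self._cache is not None:
            self._cache.pop(i, None)

    # Actions: trigger the actual computation.
    def collect(self) -> list:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    base = ToyRDD("source", source=[[1, 2, 3], [4, 5, 6]])
    scaled = base.map(lambda x: x * 10).filter(lambda x: x > 20).cache()
    print(scaled.lineage())    # ['source', 'map', 'filter']
    print(base.source_reads)   # 0: no action has run yet
    for _ in range(5):         # five passes, as in an iterative algorithm
        scaled.count()
    print(base.source_reads)   # 2: one read per partition, later passes hit the cache
    scaled.lose_partition(1)
    print(scaled.collect())    # [30, 40, 50, 60]: partition 1 rebuilt from lineage
    print(base.source_reads)   # 3
```

Without `cache()`, five passes would read each source partition five times. That is roughly the position of an iterative MapReduce job, which rereads its input from HDFS on every iteration. With wide (shuffle) dependencies, rebuilding one partition may require many parent partitions, which is one reason Spark also offers checkpointing for long lineages.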

4 Spark and the Industrial Internet Platform

The industrial Internet has brought explosive growth in industrial data. Traditional single machines, limited by their own software and hardware, cannot handle the processing, analysis, and deep exploration of such massive data. Spark, as a distributed computing framework, copes with these scenarios easily. On an industrial Internet platform, Spark can:

- quickly process and transform the massive data streams coming from industrial sites;
- handle fast batch processing and analysis of massive data in the industrial big data platform;
- through its integrated machine learning framework, mine massive industrial data in depth to support managers’ decision-making.
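Processing stream data from an industrial site often comes down to aggregating sensor readings over time windows. The plain-Python sketch below computes per-device averages over fixed, non-overlapping (tumbling) windows for a finite batch of readings. The reading format, device names and window size are hypothetical.

In Spark, the equivalent is typically expressed with Structured Streaming by grouping on `window(...)` from `pyspark.sql.functions` together with a device column. Spark then distributes that aggregation across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical reading format: (device_id, timestamp in seconds, measured value)
Reading = Tuple[str, float, float]


def tumbling_window_avg(readings: Iterable[Reading],
                        window_s: float) -> Dict[Tuple[str, float], float]:
    """Average each device's readings over fixed, non-overlapping windows.

    Returns a dict keyed by (device_id, window_start_seconds).
    """
    sums: Dict[Tuple[str, float], float] = defaultdict(float)
    counts: Dict[Tuple[str, float], int] = defaultdict(int)
    for device, ts, value in readings:
        start = (ts // window_s) * window_s  # align the timestamp to its window start
        sums[(device, start)] += value
        counts[(device, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    readings = [
        ("press-01", 0.5, 10.0), ("press-01", 3.0, 14.0),
        ("press-01", 6.0, 20.0), ("press-02", 1.0, 5.0),
    ]
    for (device, start), avg in sorted(tumbling_window_avg(readings, 5.0).items()):
        print(device, start, avg)
    # press-01 0.0 12.0
    # press-01 5.0 20.0
    # press-02 0.0 5.0
```

A real streaming engine must also decide when a window is complete, because late readings can still arrive. Structured Streaming handles this with watermarks (`withWatermark`), which this batch sketch does not model.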

Given the excellent design of the Spark framework itself and the vigorous development of its community, we believe Spark will play an increasingly important role in industrial Internet platforms in the future.

Author: Huang Huan, Big Data Engineer at Gechuang Dongzhi (please credit the source and author when reprinting).
