Written by Miguel Perez Pasalodos. 

Not long ago, data volumes grew so large that processing them on a single computer or server, whatever its size, in a reasonable time became virtually impossible. This was a turning point in computing: engineers saw the need to design the first distributed computing models. In other words, they divided the computing work among several machines and then combined the partial results, finishing much faster than a single machine would have.
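To make the idea of dividing the work and putting it back together concrete, here is a minimal sketch in plain Python. It is only an illustration: threads stand in for separate machines, and the function names (`split_into_chunks`, `count_words`, `distributed_word_count`) are made up for this example rather than taken from any real framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Work done by one 'machine': count the words in its share of the lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_workers=3):
    """Divide the lines among workers, then merge their partial counts."""
    chunks = split_into_chunks(lines, n_workers)
    # Each thread plays the role of one machine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Combine step: add up the partial results from every worker.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Calling `distributed_word_count(lines, n_workers=2)` gives the same counts as running `count_words(lines)` on one worker; only the way the work is split changes.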

The amount of data has grown so fast that what started as a problem for big corporations ended up spreading to many other sectors with fewer resources. At that point, it became necessary to build a community to share knowledge in this field.

Distributed computing models are complex. They require, for example, fault tolerance, which means that the system keeps working properly even if a certain number of machines crash. Sending data from one machine to another is also expensive, so it has to be optimized as much as possible to keep processing times from skyrocketing. Long story short, it takes the hard work of the whole data engineering community to make progress.
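As a rough illustration of fault tolerance, the sketch below hands a task to another worker whenever the first one crashes. Everything in it is hypothetical: the workers are simple Python functions, a crash is simulated with an exception, and real systems rely on far more elaborate mechanisms than this retry loop.

```python
class WorkerCrashed(Exception):
    """Raised by a simulated worker that is down."""


def make_worker(name, healthy=True):
    """Return a hypothetical worker: a function that squares a number or crashes."""
    def worker(task):
        if not healthy:
            raise WorkerCrashed(f"{name} is down")
        return task * task
    return worker


def run_with_failover(tasks, workers):
    """Run every task, moving on to the next worker whenever one crashes."""
    results = []
    for index, task in enumerate(tasks):
        for attempt in range(len(workers)):
            # Start from a different worker for each task to spread the load.
            worker = workers[(index + attempt) % len(workers)]
            try:
                results.append(worker(task))
                break
            except WorkerCrashed:
                continue  # this worker is down, try the next one
        else:
            # Every worker crashed: the job cannot be completed.
            raise RuntimeError(f"no healthy worker left for task {task!r}")
    return results
```

With three workers of which one is down, `run_with_failover([1, 2, 3, 4], workers)` still returns all four results, because the tasks assigned to the crashed worker are picked up by the next one.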

That’s where Spark came to life, as a way to bring distributed computing to as many people as possible. It lets you run the same code on a single computer or on a cluster of machines spread across different continents. Because it is so easy to use, many people worldwide became interested in the topic and gained access to this technology. Behind the scenes, though, the technology itself is very complex, and new problems appear daily. When this happens, there are two possibilities: 1. The problem has already been solved in an optimal way. 2. It hasn’t been tackled yet, but somebody else might have the same issue.

That’s why at Trovit we frequently host gatherings of the Barcelona Spark meetup, like the one on the 27th of April, when Stratio came to speak about their long-term relationship with Spark, and the one on the 21st of March, when we had some speed talks about frequent day-to-day issues, how to solve them, and tricks to make the job easier. We think that in this way we help maintain a community that provides an important service to the current system.