Load distribution of big data request and reports

A. Savenko, A. Gavrilenko

Description: An approach is proposed for redistributing the load of web, mobile and desktop client-server applications in order to speed up big data reports and queries, using PHP, RabbitMQ, Redis and a relational database. The proposed redistribution scheme is as follows. The API, which was initially responsible for receiving information, processing it and issuing the result, retains only the function of receiving information. On receiving any request, the API passes the request parameters to a RabbitMQ queue. Each worker, when launched, subscribes to the switch, which distributes the load among subscribers as follows: if a subscriber is busy at the moment, the switch sends the message to one that is free, or waits for whichever subscriber is released first. Thus the subscribers perform all the actions that were previously performed by the API alone. Whenever any page, request, action or function of the project is used, the work is routed to the worker with the least load. A worker that has performed a specific algorithm of actions sends the answer to a specific queue, intended for the execution of algorithms, that is indicated in the message.
Workers may at times be loaded with large, long-running tasks that interfere with the execution of small tasks. To prevent this, the workers must be separated: small tasks and tasks with large amounts of data are identified, and the workers are divided into several groups accordingly. The first group of workers performs big data processing tasks, the second group performs small tasks, and a third group is needed to process reports. This separation can achieve maximum system acceleration, since small tasks will not wait for workers that are processing big data.
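The grouping and dispatch rule described above might be sketched as follows. This is an illustration in Python with in-memory stand-ins, not the PHP and RabbitMQ code of the proposed system. The group names, the `kind` and `rows` request fields, the row threshold, and the names `classify`, `WorkerGroup` and `handle` are all assumptions introduced for the example.

```python
import heapq

# Hypothetical group names; the text only says there are three groups.
SMALL, BIG_DATA, REPORT = "small", "big_data", "report"


def classify(request, big_rows=100_000):
    """Choose a worker group. The "kind" and "rows" fields and the
    row threshold are assumptions made for this sketch."""
    if request.get("kind") == "report":
        return REPORT
    if request.get("rows", 0) >= big_rows:
        return BIG_DATA
    return SMALL


class WorkerGroup:
    """In-memory stand-in for the subscribers of one group's queue."""

    def __init__(self, size):
        # Min-heap of (time at which the worker becomes free, worker id).
        self.free_at = [(0.0, worker) for worker in range(size)]
        heapq.heapify(self.free_at)

    def dispatch(self, now, duration):
        """Hand a task to a free worker, or else to the worker that will
        be released first. Returns (worker id, start time)."""
        released, worker = heapq.heappop(self.free_at)
        start = max(now, released)  # wait only if nobody is free yet
        heapq.heappush(self.free_at, (start + duration, worker))
        return worker, start


def handle(groups, request, now, duration):
    """Classify a request and dispatch it within its own group."""
    group = classify(request)
    worker, start = groups[group].dispatch(now, duration)
    return group, worker, start


groups = {SMALL: WorkerGroup(1), BIG_DATA: WorkerGroup(2), REPORT: WorkerGroup(1)}
print(handle(groups, {"rows": 500_000}, now=0.0, duration=60.0))
print(handle(groups, {"rows": 10}, now=1.0, duration=0.1))
```

In this sketch the small request arriving at time 1.0 starts at once on its own group's worker, even though a 60-second big data task is already running, which is the effect the separation into groups is meant to produce.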
If the groups are given equal numbers of workers, no real gain in speed can be expected, because workers in some groups will stand idle part of the time. To avoid this, the number of workers must be distributed unevenly: a small number of workers for small tasks (since these are performed quickly and need little time to answer) and a larger number of workers for more complex tasks (since these take much longer to process). So that the servers running the workers are used only when really needed, a Redis server should also be used to cache the processed results. The proposed approach makes it possible to significantly increase the speed of existing applications that were not originally intended for processing big data, and to develop new client-server applications designed to process big data (a large number of requests).
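A minimal sketch of these two ideas, uneven worker allocation and caching of processed results, is given below in Python. It rests on assumptions not stated in the text: workers are split in proportion to expected busy time (request rate multiplied by mean task duration), every group keeps at least one worker, and `ResultCache` is a hypothetical in-memory stand-in for the Redis server. The names `allocate_workers`, `cache_key` and `answer` are likewise illustrative.

```python
import hashlib
import json
import time


def allocate_workers(total, load):
    """Split `total` workers across groups in proportion to expected busy
    time. `load` maps group -> (requests per second, mean seconds per task).
    Each group keeps at least one worker (an assumption of this sketch)."""
    demand = {group: rate * secs for group, (rate, secs) in load.items()}
    spare = total - len(demand)
    if spare < 0:
        raise ValueError("need at least one worker per group")
    whole = sum(demand.values())
    shares = {group: spare * d / whole for group, d in demand.items()}
    counts = {group: 1 + int(share) for group, share in shares.items()}
    # Hand out workers lost to rounding, largest fractional part first.
    leftover = total - sum(counts.values())
    by_fraction = sorted(shares, key=lambda g: shares[g] - int(shares[g]), reverse=True)
    for group in by_fraction[:leftover]:
        counts[group] += 1
    return counts


class ResultCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl, self.clock, self._data = ttl, clock, {}

    def get(self, key):
        hit = self._data.get(key)
        if hit is None or hit[0] <= self.clock():
            self._data.pop(key, None)  # drop missing or expired entries
            return None
        return hit[1]

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)


def cache_key(request):
    """Build a key that does not depend on parameter order."""
    canonical = json.dumps(request, sort_keys=True)
    return "result:" + hashlib.sha256(canonical.encode()).hexdigest()


def answer(request, cache, compute):
    """Return a cached result if present; otherwise compute and store it."""
    key = cache_key(request)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = compute(request)  # only now is a worker actually needed
    cache.set(key, result)
    return result
```

With this allocation rule, a group of fast tasks receives few workers even at a high request rate, while slow big data and report tasks receive more. A repeated identical request is answered from the cache without occupying a worker until the entry expires.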


Keywords: load distribution, big data, requests, API, client-server applications

Reference:
Savenko, A.H. and Havrylenko, A.S. (2019), “Raspredelenye nahruzky pry postroenyy otchetov y zaprosov s bolshym obemom dannykh” [Load distribution of big data request and reports], Information Processing Systems, Vol. 2(157), pp. 71-75. https://doi.org/10.30748/soi.2019.157.09.