The 14F on Instagram. A proposal for articulation of web scraping and network analysis techniques
Abstract
In this article we analyse the campaign for the Catalan Parliament election of 14 February 2021 (14F) through the parties’ conversation on Instagram, one of the digital platforms with the most registered users and one of the least studied in sociological research on social media. We acquired the data using ethical and legal web scraping techniques, then retrieved, processed and stored them in a relational database. We then applied data mining techniques and unsupervised learning algorithms aimed, on the one hand, at a descriptive and exploratory analysis of the conversation and, on the other, at building networks of lexical co-occurrences that let us analyse the discourse articulated by the parties. Using an inductive and structural methodology, we characterise several aspects of the narrative constructed by the Catalan parties during the campaign: their content publication practices, their reception by audiences, and their different uses of hashtags and words. Beyond this specific case and the characterisation of the parties’ narratives and their internal differences, the article also proposes a model of analysis based on big data techniques that is applicable and replicable for any data acquired through ethical and legal web scraping, and that safeguards the research autonomy of social scientists.
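As a rough illustration of what a network of lexical co-occurrences involves, the sketch below counts how often two terms appear in the same post caption. It is only a minimal example under stated assumptions: the captions, the stop-word list and the function names are hypothetical and are not taken from the study's data or tooling.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "for", "and", "of", "in", "on"}


def tokenize(text):
    """Lowercase a caption and return its distinct words and hashtags, minus stop words."""
    tokens = re.findall(r"#?\w+", text.lower())
    return sorted({t for t in tokens if t not in STOP_WORDS})


def cooccurrence_edges(captions):
    """Count, for each unordered pair of terms, how many captions contain both."""
    edges = Counter()
    for caption in captions:
        for pair in combinations(tokenize(caption), 2):
            edges[pair] += 1
    return edges


# Invented captions standing in for scraped post texts.
captions = [
    "Vote for change #14F",
    "Change the country #14F",
    "A vote for the country",
]
edges = cooccurrence_edges(captions)
```

Each key of `edges` is a pair of terms (a weighted, undirected edge) and each value is the number of captions in which both terms occur. An edge list of this kind could then be loaded into network analysis software for community detection or centrality measures.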
Keywords
social network analysis, Instagram, web scraping, big data
Copyright (c) 2022 Jordi Morales-i-Gras, Oriol Sánchez-i-Vallès
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.