Automated Readability Assessment for Spanish e-Government Information

Jorge Morato¹*, Ana Iglesias¹, Adrián Campillo¹, Sonia Sanchez-Cuadrado²

doi:10.29333/jisem/9620

¹ Computer Science Department, Universidad Carlos III de Madrid, Leganés, SPAIN
² Library and Information Science Department, Universidad Complutense de Madrid, Madrid, SPAIN
* Corresponding author

Research Article

Journal of Information Systems Engineering and Management, 2021 - Volume 6 Issue 2, Article No: em0137
https://doi.org/10.29333/jisem/9620

Published Online: 21 Jan 2021

Open Access

ABSTRACT

This paper automatically evaluates the readability of Spanish e-government websites, specifically those that explain administrative procedures. The evaluation analyses several linguistic characteristics that are presumed to be associated with better comprehension of these resources. To this end, texts were collected from non-government websites that explain the procedures published on the Spanish Government’s websites; these texts form the part of the corpus treated as easy documents. The rest of the corpus consists of the counterpart documents from the government websites. The documents were processed, and their difficulty was assessed with several classic readability metrics. Machine learning algorithms were then applied to predict text difficulty. The results show that government web pages score high on comprehension difficulty. This work contributes a new Spanish-language corpus of official e-government websites. In addition, it applies a large number of linguistic attributes in combination, which identify a text’s level of comprehensibility better than the classic metrics do.
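
The abstract does not name the specific readability metrics or learning algorithms used. As an illustration only, the following Python sketch computes one classic Spanish formula, Fernández-Huerta (1959): L = 206.84 - 0.60P - 1.02F, where P is the number of syllables per 100 words and F the number of sentences per 100 words (higher L means easier text). It also derives three surface features and trains an L2-regularised logistic regression that separates easy texts (label 0) from government texts (label 1). The syllable counter is a simplified heuristic (two adjacent strong vowels form a hiatus; other vowel sequences are treated as diphthongs), and all function names, the feature set, the classifier, and the toy sentences are assumptions for demonstration, not the authors' pipeline.

```python
# Illustrative sketch only: function names, features, classifier and toy
# sentences are hypothetical and do not reproduce the paper's pipeline.
import math
import re

STRONG = set("aeoáéíóú")  # strong vowels; accented í/ú also force a hiatus
WEAK = set("iuü")         # weak vowels; they merge with a neighbouring vowel
VOWELS = STRONG | WEAK


def count_syllables(word: str) -> int:
    """Heuristic: a syllable starts at a vowel that follows a non-vowel, or at
    a strong vowel that follows another strong vowel (hiatus)."""
    count, prev = 0, ""
    for ch in word.lower():
        if ch in VOWELS and (prev not in VOWELS or (prev in STRONG and ch in STRONG)):
            count += 1
        prev = ch
    return max(1, count)  # vowel-less tokens such as "y" or "dni" count as one


def tokenize(text: str):
    """Return (lower-cased words, number of sentences ended by . ! or ?)."""
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(r"\w", s)]
    return words, max(1, len(sentences))


def fernandez_huerta(text: str) -> float:
    """L = 206.84 - 0.60*P - 1.02*F; higher values mean easier text."""
    words, n_sent = tokenize(text)
    p = 100 * sum(count_syllables(w) for w in words) / len(words)  # syllables per 100 words
    f = 100 * n_sent / len(words)                                   # sentences per 100 words
    return 206.84 - 0.60 * p - 1.02 * f


def features(text: str) -> list:
    words, n_sent = tokenize(text)
    syl = [count_syllables(w) for w in words]
    return [
        len(words) / n_sent,                    # mean sentence length in words
        sum(syl) / len(words),                  # mean syllables per word
        sum(s >= 4 for s in syl) / len(words),  # share of words with 4+ syllables
    ]


def sigmoid(v: float) -> float:
    if v >= 0:
        return 1 / (1 + math.exp(-v))
    e = math.exp(v)  # stable branch: avoids overflow for large negative v
    return e / (1 + e)


def train(texts, labels, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent for L2-regularised logistic regression on
    standardised features. Label 1 = difficult, 0 = easy."""
    X = [features(t) for t in texts]
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    Z = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]
    w, b, n = [0.0] * len(means), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [l2 * wi for wi in w], 0.0  # start from the L2 penalty gradient
        for z, t in zip(Z, labels):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi / n for g, zi in zip(gw, z)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b, means, stds


def prob_difficult(model, text: str) -> float:
    """Probability that `text` belongs to the difficult (label 1) class."""
    w, b, means, stds = model
    z = [(x - m) / s for x, m, s in zip(features(text), means, stds)]
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


if __name__ == "__main__":
    print(round(fernandez_huerta("El perro come. La casa es roja."), 2))  # 83.41
    easy = ["Pide cita. Lleva tu DNI. Paga la tasa.",
            "Entra en la web. Rellena el impreso. Firma y envía."]
    hard = ["La presentación de la solicitud deberá efectuarse electrónicamente "
            "mediante el procedimiento administrativo habilitado en la sede correspondiente.",
            "Los interesados acreditarán documentalmente la concurrencia de los requisitos "
            "establecidos reglamentariamente para la obtención de la autorización."]
    model = train(easy + hard, [0, 0, 1, 1])
    print(prob_difficult(model, "Ve a la oficina. Pide el papel.") < 0.5)  # True
```

With these invented sentences, short sentences of short words receive a low probability of being difficult, while long, polysyllabic administrative sentences receive a high one. Logistic regression is used here because it is simple and transparent; any classifier trained on the same feature vectors could take its place, and a real evaluation would need the full corpus and held-out testing.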

KEYWORDS

readability, e-government, information assessment, web pages, accessibility, authoring tools

LICENSE

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.