CLARIAH-EUS
Browsing CLARIAH-EUS by Issue Date
Showing 1 - 4 out of 4 results
Item: Garalex (2011-10-13)

Item: ANALHITZA: a tool to extract linguistic information from large corpora in Humanities research (2017-03-01)
Arantza Otegi; Oier Imaz; Arantza Diaz de Ilarraza; Larraitz Uria; Mikel Iruskieta
Corpora in some areas of research remain small because there are no tools that can easily process the language under study at scale. In this article, we present ANALHITZA, a tool being developed within the Clarink project, whose aim is to create linguistic technologies that are useful for research in the Social Sciences and Humanities. ANALHITZA has been designed to extract linguistic information online from large corpora in an easy way. It is a multilingual tool that can process texts written in three languages: Basque, Spanish and English. We also present three real case studies in which ANALHITZA has been used. The tool can be redesigned or changed according to the needs of the scientific community in the field of Humanities.

Item: IGARRITZ: adapted web environment for text prediction in Basque (2024-07-24)
Mikel Iruskieta; Iker de la Iglesia; Unai Atutxa
Students with limited mobility, for example as a result of cerebral palsy, have adapted tools for writing texts: they can use eye-tracking hardware to select letters and to choose words predicted by the system. These systems offer resources for writing in Basque, and predictions can be customized by inputting Basque word lists. The primary aim of text prediction is to reduce the effort involved in typing and to enable faster or more extensive text production. However, writing with Iris is slower and more challenging than conventional ten-finger typing. Furthermore, predictive text functionality in Basque is less effective than in other languages and produces low-quality output.
Thus, the objective of this study is to develop an adapted web environment for Basque text prediction using artificial intelligence techniques. To achieve this goal, we have developed a web interface named IGARRITZ, based on the HiTZ/roberta-eus-euscrawl-base-cased language model, which uses a Transformer architecture. The model was re-trained with an educational Basque corpus drawn from student texts, educational texts from Gizapedia, Wikipedia, and Berria. Finally, we evaluated the tool and compared it with another currently available system, using texts produced by a secondary school student with cerebral palsy. The results indicate that IGARRITZ improves text prediction in Basque. The student, who writes using eye-tracking technology, reported that the writing process has become significantly easier and more efficient. In addition, our automatic evaluation showed improved results compared to the existing system.

Item: Corperrore: a Basque corpus enriched with errors (2024-09-27)
Mikel Iruskieta; Unai Atutxa; Mikel Osinalde; Xabier Goenaga; Xabier Arregi; Esther Miranda; Kike Fernandez
CORPerror is a Basque corpus enriched with error annotations. This work was supported by the UPV/EHU Directorate for the Promotion of Basque (Euskara Sustatzeko Zuzendaritza). The errors were annotated on the HABE-IXA corpus using the INCEPTION tool. For levels B1 through C2 of the Common European Framework of Reference for Languages (CEFR), between 3,205 and 3,887 errors were annotated per level; in total, more than 14,000 errors were annotated across 480 texts. The CORPerror website supports several operations: searching by error and sub-error type, consulting errors by level, and searching over the text.
Team: Mikel Osinalde, Unai Atutxa, Xabier Goenaga, Esther Miranda, Kike Fernandez, Xabier Arregi, Mikel Iruskieta.
This project was supported by the Vice-Rectorate for Basque, Culture and Internationalisation (Euskara, Kultura eta Nazioartekotzearen arloko Errektoreordetza).