[GALA Valencia 2024] Exploiting LLMs for what they weren’t designed for: the Use Case at EFE, the Largest News Agency in the Spanish Language

23 Apr 2024

This event has expired, video available


LLMs have taken the language industry by surprise with their ability to generate very plausible translations in high-resource languages. However, challenges remain around terminology management, customization and, above all, speed and cost. Decision-makers are nonetheless dropping NMT, still a very viable solution, in favor of AI-hyped translation solutions.

Pangeanic co-presents one of its key solutions for successful full-MT with EFE, the largest news agency in the Spanish-speaking world and the 4th largest in the world. EFE has moved from passive MT usage to a more NLP-driven solution that adds value through the things an LLM can do well at scale: acting as a human post-editor after fine-tuning, one capable of running quality estimation on the NMT input, providing valuable metrics from domain, NER analysis and data classification, and smoothing out full-document generation using LLM capabilities. The presentation will focus on EFE’s needs, its growing multilingual requirements for news processing and publication, the lessons learnt in fine-tuning the open-source Llama2 for a task it was not primarily designed for, and the use of side systems (RAG).

Host organization: Globalization and Localization Association

Event Speakers

Manuel Herranz
Pangeanic

MIT in Entrepreneurship, Manuel worked for major automobile manufacturers and in power co-generation in the UK in the 1990s, with postings in Argentina, Mexico and his native Spain. His background in machine translation comes from his mission to automate language processes for B.I Corp., the Japanese corporation for which he was European Director from 1998 to 2005. He has traveled extensively to Japan and China. Since 2009, he has focused on the development of Natural Language Processing technologies to provide process automation and true value to clients. A frequent speaker at industry events, Manuel’s areas of interest cover statistics, deep neural networks, adaptive technologies, pattern recognition and deep learning applied to Natural Language Processing. His interest in data acquisition led him to make Pangeanic a founding member of TAUS and data-sharing initiatives. Manuel is also committed to supporting NGO actions like the Malima Project for primary education in Central Africa, as well as Translators Without Borders, medical research into rare diseases and sports events. Manuel is a double graduate from Manchester University.