Thursday, September 02, 2004

Can an android ever translate any sentence from Language X to Language Y?

Abstract
Across the globe, humans use languages to communicate with each other. Some languages are related in one way or another; others are not closely related and differ in major ways. The reasons include cultural diversification, geographical location and the influence of other nations. The world’s population speaks 3,000 to 8,000 different languages, and we still don’t know whether other civilizations exist in the universe. The android C3PO in the Star Wars film series primly claims that he is familiar with 6 million known languages, making himself invaluable to the protagonists in their journeys across the universe. Is C3PO really able to translate from any language to any other language he is familiar with? Has the huge amount of research in the field of machine translation finally paid off? Or is this one of those science fiction stories of the kind written by Brian Aldiss, with a long way still to go before we come up with an autonomous robot like C3PO? The question I pose is whether an android will ever be able to translate any sentence in Language X into a sentence in any Language Y.

Introduction
‘I see translation as the attempt to produce a text so transparent that it does not seem to be translated. A good translation is like a pane of glass. You only notice that it’s there when there are little imperfections: scratches, bubbles. Ideally, there shouldn’t be any. It should never call attention to itself’ (Lawrence Venuti, 1995).

The field of Machine Translation goes back to the 19th century, when Charles Babbage persuaded the British government to finance research on a “computing machine” by promising that it would lead to automated translation of spoken languages.
Then, in 1948, Claude Shannon of Bell Telephone Laboratories and Warren Weaver, an American mathematician, proposed to work on translating Russian into English. Weaver claimed: “All I need to do is strip off the code in order to retrieve the information.” In 1954 they revealed the program, and it was hailed as a revolution, since it could translate several sentences from Russian into English. Later, however, it came under heavy criticism from Church, who called it “a canned demo of the worst kind”. The reasons behind Church’s criticism were:
1. The vocabulary was limited to only 250 words.
2. There were only six rules of grammar, and no random input data was used.
After the failure of Shannon and Weaver’s attempt to produce a machine translator, work on machine translation stopped; all of a sudden this subfield of Artificial Intelligence was regarded by many scientists in the field as an impossible task. Later, in 1959, the Israeli linguist Yehoshua Bar-Hillel reported that machines produced an average translation that could be useful only in special contexts or in collaboration with humans. Once again the field was reborn, and it is now considered one of the major areas of Artificial Intelligence.

The field of Artificial Intelligence (A.I.) has traditionally had two aims. One is to automate tasks which, when performed by humans, are considered to require intelligence. The other is to create an agent capable of interacting with the world. There are many other aims in addition to these, but these two are central to my essay. Their importance is evident from the fact that providing such automated translation requires the robot to be able to ‘think’ about the tasks required of it and to adapt to the individual properties a language can have.
This process requires the computer to be aware of the semantic, syntactic, pragmatic, metaphorical and cultural issues of all the languages that exist in our universe. Once all these issues have been dealt with, the robot can begin to provide a level of translation comparable to, if not better than, that offered by a human translator.

Theory 1: Androids can translate any given source language X to any target language Y, since there are many machine translators currently in use with up to 98% accuracy.
The advanced research carried out by specialist research groups in Machine Translation has led to accurate MT systems. An example is the Delta Translator, which works with up to 98% accuracy. Nobody can deny the existence of other MT systems that work with the same accuracy as Delta but with different languages (e.g. Babel). The question now is how we can put all these MT systems together to form a single Universal Translator that can be run on an android. There are also debates and ethical issues about how the systems mentioned above have been evaluated, and the performance of such a system depends on how ambiguous the input sentence is.

A Universal Translator can be described as a translator that uses an intermediate language (i.e. a universal canonical representation, or in other words a universal language of thought that only the MT system can understand). The failure of the first attempts to construct MT systems made scientists realise the need for a universal language. Bar-Hillel stated in 1951 that developing MT required the discovery of a “stock of concepts held in common by humanity,” while others, like Warren Weaver, believed they could find a “great open basement, common to all the towers” of language. However, the publication of Aspects of the Theory of Syntax by Noam Chomsky offered an answer to Bar-Hillel’s and Weaver’s suggestions about how to achieve perfect machine translation.
He focused on relating the deep structure and the surface structure of a sentence by transformations. “The Surface Structure defines the intuitions about constituent structure by giving a tree structure representation, and Deep Structure defines the intuitions about what a complete representation of the logical semantic relations is”. However, Noam Chomsky’s transformational model didn’t fully solve the problem of finding the universal interlingua, since the model he constructed works in only one direction, “from depth to surface”. What linguists realised is that working in the other direction, “from surface to depth”, would be crucial for finding the universal interlingua. Alan Melby later carried out further research; he worked on finding ways to reduce language to sememes that could be translated into many other languages. However, in 1978 he came to the conclusion that the universal sememes he was looking for do not exist. He now strongly believes that the Sapir-Whorf hypothesis, which states that “languages are not interchangeable and one’s world view is influenced by one’s language”, is correct. This is important in the context of my theory: it leads to the conclusion that constructing a Universal Translator that can be run on an android is not feasible.

No, since there are some difficulties in the area of Computational Linguistics that will never be overcome by machines (e.g. metaphor, pragmatics, cultural differences, etc.).
Leading linguists and computer scientists classify the task of Machine Translation as an AI-complete problem. Research in Artificial Intelligence has been taking place for the past fifty years, and there have been major advances in areas such as Evolutionary Computation, Machine Learning and Neural Networks.
Usually, A.I. goes about simulating the human behaviours that solve specific problems. Those of us working in the field know that the only way to write a program that can simulate such behaviours is to know the exact mental states our brains go through, so that we can simulate them on machines. Collaborative work with researchers in Psychology has given us a basic understanding of the mental states our brains go through when performing some complex tasks. However, the collaboration and efforts of researchers in Linguistics, Psychology and A.I. have not paid off. Hence the question that needs to be asked here: how can we simulate such a task on machines if we humans don’t know exactly how we perform it ourselves? Human beings can infer the exact meaning of a word even if the word has more than one meaning (i.e. a homonym). This is one of the hardest areas in Natural Language Processing and is known as Word Sense Disambiguation. Typically, in order to design an MT system, we need a database that includes all the lexical entries of the languages the MT system deals with. This database contains additional information, such as the syntactic and semantic properties of each entry. Some languages are considered to have a very large vocabulary; English is a good example, and researchers at Princeton came up with a useful resource called “WordNet”. Every year, linguists and A.I. researchers publish hundreds of papers that criticise the weaknesses of WordNet; significant work has been done towards addressing these weaknesses, but there is still a long way to go. The question to pose here is this: if we humans have been working on one of the best-known languages in the world, English, for the past 60 years and still have a long way to go to achieve a global dictionary of English, how can we achieve a global dictionary for the six million languages C3PO speaks? I believe it is a task beyond human abilities.
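To make the lexicon database and the word-sense problem a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the tiny LEXICON, the example word "bank" and the function name disambiguate are all invented for this post, and the overlap heuristic is a much simplified cousin of the Lesk algorithm, not how WordNet or any real MT system works.

```python
# Toy lexicon: each word maps to its possible senses. Every sense stores a
# part of speech (syntactic property) and a few "signature" words that tend
# to appear near it (a crude semantic property).
# All entries are invented for illustration; this is not WordNet data.
LEXICON = {
    "bank": [
        {"sense": "financial institution", "pos": "noun",
         "signature": {"money", "loan", "account", "deposit"}},
        {"sense": "edge of a river", "pos": "noun",
         "signature": {"river", "water", "fishing", "shore"}},
    ],
}


def disambiguate(word, sentence):
    """Return the sense whose signature shares the most words with the sentence.

    Returns None when the word is not in the lexicon. Ties go to the
    first sense listed.
    """
    senses = LEXICON.get(word.lower())
    if not senses:
        return None
    # Lower-case the sentence and strip simple punctuation from each token.
    context = {token.strip(".,!?").lower() for token in sentence.split()}
    best = max(senses, key=lambda s: len(s["signature"] & context))
    return best["sense"]


print(disambiguate("bank", "She opened an account at the bank to deposit money."))
print(disambiguate("bank", "We went fishing on the bank of the river."))
```

Even this toy needs a hand-written signature for every sense of every word, which gives a feel for why a complete dictionary of a single language is such a large job, let alone one for six million languages.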
Last week, I came across an article in which a Moroccan player at Aston Villa Football Club claimed that the club’s manager treated him like a dog. The manager gave a rather surprising response: “I have two dogs, and I would love it if anyone treated me the way I treat my dogs”. In other words, he was being sarcastic. This incident raises the question: “Can machines understand the right meaning of a sentence even when the speaker is using indirect speech?” Word order is another issue: Arabic, for example, is a verb-initial language, and English is not. So far, machines do not have the capability to deal adequately with the complexities of language that humans cope with naturally: syntax, multiple word meanings, the influence of context, irregularity and ambiguity. This is illustrated by the following two sentences:
Time flies like an arrow.
Fruit flies like an apple.
The first sentence is metaphorical, while the second is a literal description. Note that the words “flies” and “like” belong to different grammatical categories in the two sentences. Machines can be programmed to understand the above two sentences, but it is impossible for them to distinguish between them.

There are some myths about other advanced civilizations in the universe; I am one of the people who believe in the existence of these civilizations. The question to ask here is how well developed these civilizations are. Are they smarter than us? Or are they perhaps still in their early days? With all the effort NASA and other space agencies have put into establishing contact with those aliens, they still haven’t come up with results. We humans have barely reached the moon and are only now starting our missions to Mars; I don’t believe we can reach anywhere else in our solar system within the next hundred years.
The question to ask here is: “Are they going to find us?” If they do find us, then they are a far more developed civilization than ours; hence the arrival of those aliens could solve the mystery, and might give us great understanding and ways to overcome the difficulties that have made the MT field seem impossible.

Steve Silberman thinks that MT with 100% accuracy will arrive in less than 10 years, as humans co-evolve with the technology and adapt to its inherent weaknesses (Steve Silberman, "Hello, World," Wired, May 2000). Physicists are converging on a "theory of everything," probing the 11th dimension, developing computers for the next generation of robots, and speculating about civilizations millions of years ahead of ours, says Dr. Michio Kaku, author of the best-sellers Hyperspace and Visions and co-founder of String Field Theory. Maybe in thirty years’ time, with advances in Artificial Intelligence and Computer Science, we might see more intelligent machines based on a different architecture from the machines we currently use (e.g. machines based on biological neurons, quantum machines, etc.). Ray Kurzweil predicts that advances in Artificial Intelligence research will result in non-biological units that exceed human intelligence within the next couple of decades. He stated: “By 2019, a $1,000 computer will match the processing power of the human brain: about 20 million billion calculations per second. This level of processing power is a necessary but not sufficient condition for achieving human-level intelligence in a machine. Organizing these resources, the “software” of intelligence, will take us to 2029, by which time your average personal computer will be equivalent to a thousand human brains”. One method of designing an intelligent machine will be to copy the human brain; such machines will seem human, helped by nanotechnology, which will give us the ability to create physical objects atom by atom.
Other methods can be considered, such as neural implants: embedding a chip in our brain with the knowledge we wish to learn. This would improve our sensory experiences, perception, memory and logical thinking. It seems the future is more promising than we expected, although the question to raise here is: “Are all these predictions feasible?” The answer from many scientists is “No”.

Maybe on the weekends only, since the total number of languages in our universe is estimated to be about 6 million, and such a task requires a great universal computer based on a parallel network of machines; the use of the internet on weekdays might slow our universal machine.
Last year, IBM announced a new product, the IBM WebSphere Translation Server. It can translate English into Spanish, French, German, Italian, traditional Chinese, Japanese and Korean, and work is still going on to include another set of languages, the modern Semitic languages, which include Arabic and Hebrew. So, another couple of years and we will have accomplished our desired task? Sadly not, since the product focuses on business translation rather than general translation. The idea can, however, lead us to another good and valid idea: by running our desired Universal Translator on every existing server on earth, we can solve the mystery with the super speed we get from those servers. In fact, that would solve another major issue, namely cultural differences across languages. So how do we actually go about solving this? Traditional MT systems use approaches such as the Interlingua Approach or the Transfer Approach, although these approaches have proven to have their limitations. A new approach has since been born and has proved its worth: the Example-Based Approach. It is based on the idea of learning and inferring from an existing corpus that has been translated into many languages.
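The core of the example-based idea can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the three-sentence English-Spanish example base, the function names and the 0.5 threshold are all made up for this post, and the sketch does only the retrieval step (find the closest stored example and reuse its translation), whereas real example-based systems also recombine fragments of several examples.

```python
from difflib import SequenceMatcher

# Tiny English-Spanish example base, invented for illustration.
# A real system would need a large, well-translated parallel corpus.
EXAMPLES = [
    ("good night", "buenas noches"),
    ("thank you very much", "muchas gracias"),
    ("the house is big", "la casa es grande"),
]


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate_by_example(sentence, examples=EXAMPLES, threshold=0.5):
    """Return (translation, score) of the closest stored example.

    The translation is None when no example is similar enough,
    i.e. when the corpus simply does not cover the input.
    """
    query = sentence.lower().strip(" .?!")
    best_source, best_target = max(
        examples, key=lambda pair: similarity(query, pair[0])
    )
    score = similarity(query, best_source)
    if score < threshold:
        return None, score
    return best_target, score


print(translate_by_example("The house is big."))
print(translate_by_example("Thank you"))
print(translate_by_example("Time flies like an arrow"))
```

The last call finds nothing usable in the example base, so the function returns None for the translation: the system can only be as good as the examples it has been given.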
Improving the accuracy of Example-Based systems can require a huge effort, and it is an endless task, since we have to provide well-translated corpora for all languages. However, if we are distributing our Universal Translator across every server on earth, and with a growing number of people spending hours each day on local chat servers, why don’t we monitor those chat servers and treat them as the corpus that the Example-Based system needs for each specific language? With the avalanche of new people getting online still growing, and with business use on weekdays taking up a huge share of the servers’ speed, the Universal Translator would work only on the weekend, since internet traffic slows down then, as many businesses are closed on the weekend.

Depends on the input language and the target language.
For example, machine translators between Spanish and Portuguese already exist with great accuracy, since these two languages have a lot in common. It would be a hard task, however, to translate from English to Arabic, since the differences between the two languages far outnumber the similarities they share. Languages differ from one another, and the differences can be syntactic, semantic and so on. However, some languages share properties, perhaps because they originated from one original language and evolved over the years. Linguists have worked on grouping languages that originated from the same original language; examples are the Indo-European and Semitic groups. MT systems claim high accuracy when the input and target languages come from the same group (e.g. Indo-European), since there are fewer issues for the system to deal with (such as the dual in Arabic or word order in Chinese). So we know that MT systems perform better when dealing with languages that belong to the same group. What if we want to translate between languages from different groups?
The evolution of languages can be influenced by culture, geographic location and so on. Hence some languages from one group can influence languages from other groups; an example is the influence of Arabic on Persian. Suppose we want to translate a sentence in English into a sentence in Arabic with 100% accuracy. This could be done by first translating the English sentence into an intermediate language within the Indo-European group that has been influenced by, or has had a major influence on, Arabic, which is Persian in our case.

Conclusion:
From the above discussion, it seems clear that, even with current capabilities, machines will never be able to provide us with a Universal Machine Translator. The reason may be the major issues such as semantics, cultural diversification, word sense disambiguation and so on. Still, a hundred years ago humans wouldn’t have dreamed of reaching the moon, and maybe the same is true of the dream of having a universal machine translator. The predictions of some scientists in the field seem more optimistic, and if any of these predictions comes true, then the chances of fulfilling our dream will be more realistic.