HUMAN INTERFACE TRANSLATOR FOR MACHINES
Field of the invention
This invention relates to the field of providing input/output translation of human language for human-machine interaction. Such machines include, but are not limited to, industrial machines and similar computer-controlled devices that receive textual input and provide textual output to display devices such as video screens.
Background
A great many machines, such as industrial machines, from a large number of manufacturers and countries, are in service and arrive on the global market each year. Typically, the user (i.e., operator) interface of such a machine is presented in the language of the country of origin (or intended sale) of such machine and perhaps a secondary language. For example, the control screen outputs and control menus of machines that are made in (or for sale in) the United States of America are predominantly presented in English and, increasingly, Spanish. Additionally, a manufacturer may provide versions of a machine with user interfaces in a variety of other languages commonly used in countries where the manufacturer expects sizeable sales. In an increasingly global economy, manufacturers find themselves having to support a large number of languages with vastly different structures and requirements. This is costly and problematic. Industrial machines are not made in large quantities. Consequently, the design and development resources of even the largest of manufacturers can easily be strained if a large number of languages need to be supported. In addition, multi-lingual support is typically well beyond the core competency of most industrial machine manufacturers. More acutely, the user interface must be continuously updated and maintained by each manufacturer for up to its full line of products for each of the supported languages. The reverse of this problem is faced by clients in countries that do not generate sufficient sales to a specific manufacturer to make it cost-effective for the manufacturer to support the local languages of these clients. While white collar workers, perhaps, can be expected to be functional in foreign languages other than their own, such is not typically the case for highly skilled blue collar workers that operate industrial machines, even in well
developed countries. So there may not be a skilled work force available in the language of the machine or workers may command a premium wage. Additionally, when a machine owner desires to sell the machine, the potential buyers are typically limited to those that will have operators fluent in the language(s) available in the machine's installed interface. Though there may be a better market in another country where other languages are spoken, the owner cannot take advantage of those markets. A need thus exists for an approach that will permit cost-effective post-manufacture
(i.e., so-called "after market") adaptation of industrial machines to provide human-machine interfaces in languages other than those originally built in. Preferably the target languages will include a wide range of languages having different characteristics (e.g., direction in which text is read).
Summary of the invention
As more particularly discussed below, there is shown and described a "universal" human language translator that can be retrofitted to many devices, including a wide range of industrial machines that employ video interfaces. The function of this translator is to provide human language translation of the user interface of the machine from the native language of the machine to nearly any other target language. The translation of user input and output is done in real time, is transparent to the user and does not require modification of the machine. This translator can be retrofitted to a large base of devices (including industrial machines) (generically called "machines" or "devices" herein) even without the active participation of the original manufacturer. The broader usefulness of this solution is that it provides a way to standardize multi-language support and centralize it. This allows each manufacturer to advertise availability of its products in any language without the manufacturer itself supporting any language other than the manufacturer's native language. Language support can be a third-party function. The solution is not specific to any source or any target languages, their fonts or characters. The cost of the added hardware to retrofit a machine usually will be an insignificant percentage of the typical value of the majority of industrial machines. Thus, one aspect of the invention is a method of providing, from a video output signal of a machine which supplies (i.e., will generate on a display device) video output screens displaying in one or more defined fields text in a first language, corresponding video output screens displaying in like fields corresponding text in a selected second language,
comprising: capturing the video output signal of the machine; identifying each screen in said video output signal; detecting in each screen each textual (i.e., containing alpha, numeric or alpha-numeric data) field thereof; for each said textual field thereof, identifying textual contents of said field; and generating video output screens corresponding to each of the identified screens of the machine video output signal, each video output screen having substituted therein corresponding text in the selected second language in place of the original text in the first language. In such a method, identifying textual contents may include identifying, from among a predetermined set of possible textual contents, the textual contents by detecting and matching identification marks thereof with identification marks of the set of possible textual contents. Other aspects of the invention include apparatus for carrying out such a method. In still other aspects, the invention involves methods and corresponding apparatus similar to the foregoing except that instead of identifying and substituting each textual field, only certain selected field(s) is(are) identified and substituted. A further aspect is a method of providing, from a video output signal of a machine which supplies video output screens displaying in one or more defined fields text in a first language, corresponding video output screens displaying in like fields corresponding text in a selected second language, comprising: detecting in a screen, of said video output signal, a textual field thereof; for said textual field thereof, identifying textual contents of said field; and generating a video output screen corresponding to the detected screen of the video output signal, said video output screen having substituted therein corresponding text in the selected second language in place of the text in the first language. Identifying textual contents may include identifying, from among a predetermined set of possible textual contents, the textual contents by detecting and matching identification indicia thereof with identification indicia of each member of a set of possible textual contents. Such method may be performed for each screen of the video output signal or for a subset of screens of the video output signal. Each textual field may be identified according to a field typing classification system. The field typing classification system may include one or more of Static Fields comprising text or graphics that are fixed, Dynamic Fields comprising text or graphics which are changeable in response to an operating state of the machine, Pass-Through Fields comprising text or graphics on the screen which need not be translated or altered from their original appearance, and Numerical Fields comprising a number of single-digit Dynamic Fields. A
Dynamic Field may have a constant location in the screen. The video output signal may be of digital or analog format. Implementations of such methods may further include receiving input key commands in the second language, translating received input key commands from the second language to the first language, and supplying the translated input key commands to the machine. The translation of input key commands may use translations of video output screen text that are also used for translating video output screen content. Generating a video output screen in the foregoing methods may include receiving a static replacement screen, in a selected second language, for each screen of the video signal identified with a predetermined set of indicia, identifying indicia in the video signal and substituting a corresponding replacement screen when the predetermined set of indicia is identified. Detecting in a screen, of said video output signal, a textual field thereof may comprise identifying a minimal set of indicia identifying a unique screen. The minimal set of indicia comprises as few as a single pixel identifying one or more fields uniquely. Identifying a minimal set of indicia identifying a unique screen may comprise calculating a hash function for lines of the screen and comparing the calculated hash functions to hash functions for lines of screens from a database. Such methods may further include confirming the validity of a text field identification prior to generating a video output screen having substituted therein text corresponding to the text in said text field. Confirming the validity of the text field identification may comprise computing a checksum on the content of the field, determining whether said checksum matches a stored checksum for possible field contents, and generating a video output screen having substituted text only if the computed checksum matches a stored checksum, whereby inaccurate substitution is prevented. According to yet another aspect, there is shown a method of providing, from a video output signal of a machine which supplies video output screens displaying in one or more defined fields a graphics image in a first language, corresponding video output screens displaying in like fields a corresponding graphics image in a selected second language, comprising: detecting in a screen, of said video output signal, a predetermined graphics image; and generating a video output screen corresponding to the detected screen of the video output signal, said video output screen having substituted therein a corresponding graphics image in the selected second language in place of the graphics image in the first language.
The corresponding graphics image may comprise the predetermined graphics image but in a different location than the predetermined graphics image occupied in the detected screen. According to a still further aspect, there is provided an output translation module comprising: means for receiving a video signal of a predetermined format, the video signal containing a representation of textual information for display; means for identifying in the video signal textual information in a first language; means for identifying corresponding, pre-stored textual information in a second language; and means for generating a second video signal corresponding to the first video signal and having the representation of textual information in the first language replaced by a representation of textual information in the second language. In the output translation module, the means for receiving a video signal may comprise an interface to a video cable which supplies video signals according to an industry-standard signaling specification. The module may further include means for receiving input key commands in the second language and outputting corresponding input key commands in the first language. Each of the foregoing "means" may be implemented in any manner apparent to one skilled in the art of electronics. The means for identifying textual information in a first language may uniquely identify screens by locating identifying indicia, wherein said indicia identify fields on the screens. The means for identifying corresponding textual information in a second language may identify a static replacement screen comprising all Static Fields of an identified screen in the video output signal, wherein the corresponding Static Fields of the replacement screen are translated in the second language. The means for generating a second video signal may write replacement text, in the second language, in a replacement screen, wherein the replacement text is written in the same location as that of the original screen. The video signal may include information describing a screen in a screen description language, such as OpenGL. According to another aspect, the invention includes, in combination, a display device and an output translation module as set forth above, the output translation module providing video output signals to the display device. According to yet one more aspect, there is shown an output translation system comprising: a video interface which receives and decodes video signals in a predetermined format; a memory device comprising at least one memory buffer storing at least a portion of a video screen frame; a processor unit that processes the decoded video signals, stores in the memory device corresponding signals wherein at least selected contents of the decoded video
signals provided in a first language are translated into a second language, and reads out of the memory device said corresponding signals so as to provide translated video signals; and an output device which receives and displays the translated video signals. Such a system may further include an input device interface which receives and decodes operator input data and wherein the processor also processes the decoded operator input data and translates at least a portion thereof from the second language to the first language, and the data in the first language is supplied as input data output from the output translation system. The video interface may digitize received video signals and temporarily store all or part of input screen frames in the at least one memory buffer. The output processor unit may read, translate and overwrite data in the at least one memory buffer. The processing unit may comprise one or a plurality of processors. The video signal received by the interface may include an expression of a screen image in a screen description language which is decoded by the video interface. In yet another aspect, the invention comprises a method for translating output screens from a machine, comprising identifying a field in a video output signal by comparing field-identifying information from an offline process with features extracted in an online scan process. The offline operation may comprise execution of an offline algorithm which generates indicia uniquely identifying fields. The fields identified by the offline algorithm preferably are identified by a field type according to a field typing classification system. The field types according to the field typing system may include one or more of Static Fields comprising text or graphics that are fixed, Dynamic Fields comprising text or graphics which are changeable in response to an operating state of the machine, Pass-Through Fields comprising text or graphics on the screen which need not be translated or altered from their original appearance, and Numerical Fields comprising a number of single-digit Dynamic Fields. At least one textual field in a screen may be a Dynamic Field comprising text or graphics which are changeable in response to the operating state of the machine. The Dynamic Field may be located at a constant location in the screen. At least one textual field in a screen may be a Pass-Through Field comprising text or graphics on the screen which need not be translated or altered from its original appearance. At least one textual field in a screen may be a Numerical Field comprising a number of single-digit Dynamic Fields. The offline process may generate a processing map, wherein the map contains locations of fields within an identified screen template that appear the same within all screens based on the template.
The processing unit may replace with a static replacement field, in a selected second language, each field of a video signal in a first language identified with a predetermined set of indicia. According to one more aspect, there is provided a method of providing, from a machine that generates video output signals to a display device via a cable, the video signals containing data in a first language, corresponding video signals containing corresponding data in a second language, comprising: connecting the cable from the machine to an input interface of a translator module; and connecting an output interface of the translator module to a display device.
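By way of illustration only, the following sketch (in Python, for exposition; the hash values, checksum scheme and table contents are hypothetical assumptions, not part of the invention as such) suggests how line hashes might identify a unique screen, and how a field checksum might be confirmed before substitution, as recited above:

import zlib

# Hypothetical offline database: CRC of a distinctive scan line per
# screen, and the checksums of every allowed content of one field.
SCREEN_DB = {0x1A2B3C4D: "SETTING/GENERAL"}       # illustrative value
FIELD_CHECKSUMS = {0x0F0F, 0x00FF}                # illustrative values

def identify_screen(scan_line: bytes):
    # Compare the hash of a captured line with hashes from the
    # database; None means the screen is unknown and is passed
    # through untouched.
    return SCREEN_DB.get(zlib.crc32(scan_line))

def field_identification_is_valid(field_bytes: bytes) -> bool:
    # Substitute translated text only if the computed checksum matches
    # a stored checksum, whereby inaccurate substitution is prevented.
    return (sum(field_bytes) & 0xFFFF) in FIELD_CHECKSUMS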
Detailed description of the drawings
In the drawing, Figure 1 is a high level block diagram of a prior art machine having a video interface, showing typical connections of output, input and processor elements thereof; Figure 2 is a high level block diagram showing the machine of Figure 1 as adapted to employ a translation module as taught herein, for output translation; Figure 3 is a high level block diagram showing the machine of Figure 1 as adapted to employ a translation module as taught herein, for both input and output translation; Figs. 4 and 5 are block diagrams depicting processing and memory hardware for use with certain aspects and embodiments for practicing the present invention; Figure 6 is a block diagram depicting an example of hardware for practicing certain embodiments for practicing the present invention; Figs. 7 and 8 are illustrations of examples of output screens which may be translated according to certain aspects of and embodiments for practicing the present invention; Figs. 9 and 10 are illustrations depicting examples of a text field which may be identified and translated according to certain aspects of and embodiments for practicing the present invention; Figure 11 is a high level block diagram of another example of an implementation of certain aspects of the present invention, wherein the video output signal is provided in a page definition language such as OpenGL; and Figure 12 is a block diagram of a further embodiment for practicing certain aspects of the present invention.
Detailed description
Industrial machines fall into one of two categories. The first is the older type of manually controlled machines with a number of rotary handles. The second category is the newer Computer-Numerically-Controlled machines, referred to in the trade as CNC Machines. The term "CNC Machine" refers to any machine that has built within it a computer-driven controller that automates part or all of the functionality of the machine. An example of a CNC Machine is a CNC Milling Machine that cuts and profiles metal parts and molds. Other examples of CNC Machines, to name a few, include wood cutting and shaping machines, turning lathes, pick and place machines and other electronics products production line machines as well as ovens, x-ray machines and a number of types of medical equipment. This invention is applicable to any device or machine with an electronic control panel and/or a video output, including, but not limited to, CNC machines. All such machines contain a user input device and a user output device. The user input device could be (for example) a keyboard or a panel with switches, knobs and buttons. The user input device could also be a touch screen interface. A user output device (for example) could be a cathode ray tube (CRT) monitor or a liquid crystal display (LCD) or other image screen. The operator operates the machines by selecting functions and modes and entering parameters using the input device. The operator monitors the status and function of the machine by viewing the output device. For such machines, the user input is predominantly menu selection driven (as opposed, for example, to unlimited context input). More importantly, the screen output of such machines, regardless of the sophistication of the machine, is quite limited in context and is also fully determined at the time of the manufacture of the machine. This greatly simplifies the translation task since all of the text variations of the output device, and the text locations, are known in advance. To make use of this situation, one needs a standardized way to interface the translator to the target machine. Without a standardized interface for the translator to plug into, it would be necessary to design a special version of the translator for each manufacturer's machine. That is certainly possible, but less commercially desirable than connecting via a standardized interface so that the translator (i.e., translation module) is usable with a larger group of machines. Fortunately, there is a standardized place to plug the translation module into, that exists on the majority of these machines today. This place is the interface to the video cable that leads to the output display device (e.g., CRT or LCD).
As mentioned earlier, the target machines typically have a CRT or an LCD screen for output. Given the realities of economies of scale, it is probably not practical for most machine manufacturers to make in-house the CRT or LCD monitors that are used in their machines. Predominantly, machine manufacturers purchase such displays and install them into their machines. Hence, the electrical interfaces to these displays are not unique to each manufacturer but nearly always conform to a limited number of standards that are supported by the major manufacturers of display devices and computerized systems. The limited number of standard electrical interfaces is limited by the same realities of economies of scale. Figure 1 shows a typical arrangement 100, found in most of the machines today, of the input device (110), the output device (120) and the machine's processor board (130). An example of such a standard interface is the VGA standard interface that is familiar to most personal computer (PC) users and allows the user to plug a monitor (LCD or CRT) from a whole host of manufacturers into any PC. The VGA specification defines the signals to be communicated via a cable (interface) 160 between the monitor and the computer. The same holds true inside these industrial machines where the interface between the CNC controller within the machine and the screen conforms to one of these standard interfaces. Industrial machines, however, have little need for high resolution and large color variation in their displays; therefore, one finds them lagging behind PCs in their use of the latest video interfaces. For example, a large number of machines on the market today still use the old IBM monochrome video standard. Referring to Figure 2, a Translator Module 210 according to some embodiments of the invention sits "in-line," in the path of the cable 160 between the video driver circuit 170 from the processor board 130 of the machine and the CRT or LCD monitor 120. The video cable 220 (like cable 160) that comes from the controller (i.e., processor board) of the machine and that conventionally was plugged into the output monitor is interrupted and the video signal is diverted to the input of the Translator Module. In turn, the Translator Module has a video output which is used to drive the CRT or LCD monitor of the machine via cable 230. Assuming that the display technology employs raster scanning, then functionally, the Translator Module captures each raster frame (from the video driver circuit) that was intended for the CRT or LCD monitor. (Note that a "frame" may comprise one or more video fields. For interlaced video, typically a frame comprises two fields. For non-interlaced video, a frame typically has only one field. The terms "screen" and "frame" are typically
used interchangeably herein unless context indicates otherwise.) It then produces a substitute raster frame that corresponds to the captured frame but displays text content in the targeted (i.e., previously selected) new user language. The mapping from the native language of the machine, English for example, to the operator language, Chinese for example, is performed within the Translator Module in real time and transparently to the user. Translation of the input device can be performed in any suitable way, such as by silk screening, or affixing decals, in the operator language on, or near, the input device keys. In the case of a touch screen input interface, the translation of the output screen automatically covers translation of the input interface. For the majority of machines, the arrangement in Figure 2 is sufficient to provide a functional, translated user interface to the operator in his or her language. For some machines, the need may arise to also translate the input key commands. In this situation, the cable 310 from the input device 110 is also routed through a Translator Module; as illustrated in Figure 3, this may be the same Translator Module 210. Of course, separate Translator Modules may be employed for input and output. Input translation involves intercepting text, keystrokes, touch screen button selection, etc. and producing corresponding output. Output translation usually involves transformation of video signal content. Of course, if the video driver is in the display device, then the output of the processor will be a text or digital output and the same circuitry may be usable for both input and output translation. The need to route the input device cable through a Translator Module arises in fairly limited situations. Fortunately, the arrangement in Figure 2 will achieve the goal of transparent translation for the overwhelming majority of machines and their applications.
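By way of illustration only, input-side translation can be as simple as a lookup that maps the codes emitted by a relabeled key panel back to the codes the machine's native firmware expects. The following sketch is in Python purely for exposition; all key codes shown are hypothetical assumptions:

# Hypothetical map from operator-language key codes to the native
# key codes of the machine; built during offline preparation.
INPUT_KEY_MAP = {
    0xB2: 0x2F,   # e.g., a relabeled soft key -> native "MENU" key
    0xC7: 0x31,   # e.g., a relabeled key -> native "SETTING" key
}

def translate_key(code: int) -> int:
    # Unmapped codes pass through unchanged, so keys that need no
    # translation continue to work exactly as before.
    return INPUT_KEY_MAP.get(code, code)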
Raster Image
Assuming the video driver circuit of the processor does not provide a digital signal, then regardless of the video interface standard employed, the electrical signal in the video cable likely carries information that describes a raster or bit map image that is to be displayed by the video CRT or LCD monitor. The raster or bit map image on the monitor basically consists of a rectangular grid of "dots", either monochrome or colored, at various intensities. These dots are known in the computer industry as picture elements, or pixels. The resolution of a monitor reflects the number of pixels in the horizontal and vertical directions, stated as (H x V). There are a number of current and dated video interface standards. Examples of such interfaces include the old IBM monochrome and the IBM CGA, the more recent VGA,
SVGA, XGA and WXGA. Fundamentally, all of these interfaces share the same underlying operating principle, varying in parameters such as resolution, color space, scanning rate and screen refresh rate. As widely known, the video controller generates a sequence of frames that are then sequentially displayed on the target monitor. The number of sequential frames displayed per second is known as the refresh rate. Typically refresh rates start at 50-60 frames per second and can go up to as much as 100 frames per second or higher for high end displays and video drivers. To explain the implementation of the Translator Module, it is helpful to describe in more detail what one encounters at video interfaces. Consider first the monochrome interface. The simplest monochrome video interface to explain is one that contains three separate signals encoded on three separate wires. The signals are Video, HSync, and VSync. The voltage on the Video signal controls the intensity of the electron gun within the CRT monitor and hence controls the brightness of each pixel as it is drawn on the screen. Predominantly, the higher the voltage on this line at the time a certain pixel is drawn, the brighter the pixel would be. The simplest version of the video signal is when the Video signal can take one of two values only. If the Video signal is at the high voltage value at the time a certain pixel is drawn on the screen, that pixel is drawn in full brightness. If the Video signal is at the low voltage value, the pixel is drawn at full darkness. This type of video is bi-level monochrome video. In a more versatile version of this interface, the Video signal can take on any of various voltage values within a full darkness and full brightness range. This allows the pixels on the screen to be rendered in gray-scale tones depending on the Video signal voltage. This type of video is gray-scale monochrome video. Color CRT displays, on the other hand, have three electron guns within them, usually the Red gun, the Green gun, and the Blue gun. Therefore, video interfaces that drive color displays predominantly have three Video signals. As in the case for monochrome, these video signals could be of the bi-level type or of the continuous type. Color video using three separate signals is usually referred to as RGB video. The HSync signal is the horizontal synchronization signal for the CRT monitor. It takes on one of two voltage levels at any given time. It is normally held at the high voltage level while the electron gun is horizontally traversing the visible part of the screen. At the end of a horizontal line scan, the HSync goes low signaling to the CRT to retrace back to the beginning of the next line. HSync returning high again starts the scanning of the new line.
As the gun scans the line, its intensity is modulated according to the voltage level on the Video signal or signals as mentioned earlier. The VSync signal is the vertical synchronization signal for the CRT monitor. It takes on one of two voltage levels at any given time. All throughout the scanning of all the lines on the screen, the VSync signal is normally held at its high level. At the end of the horizontal scan of the most bottom line on the screen the VSync signal goes low, signaling to the CRT to retrace the electron gun vertically from the bottom to the top of the screen. The VSync going high again signals the beginning of the scan of a new frame. There are two main variations on the above. The first variation combines the Video signal, the HSync signal and the VSync signal on the same waveform. This is referred to as Composite Video and is what is found in the majority of household video players and recorders. In addition, composite video (under the most widely used standard) encodes the color signal as phase shifts of a color-burst sub-carrier on the same wire. Few if any industrial machine manufacturers still use composite video since it results in comparatively poor text and graphics rendering. However, in case the machine uses composite video, there are a number of well known techniques and also commercially available integrated circuits that convert composite video into RGB video, where the vertical and horizontal synchronization signals are extracted and output as distinct VSync and HSync signals. The second variation encodes the Video signal, RGB or monochrome, as a digital signal rather than the analog signal discussed so far. Regardless of whether it is encoded in analog or digital, both formats have to be turned into digital format before processing in the Translator Module. This is accomplished in a conventional manner by appropriate circuitry and processor elements in the video interface which receives the Translator Module's input video signal (i.e., the machine's video output signal). After processing by the processor(s) in the Translation Module, the signal needs to be converted to a video format which is compatible with the CRT monitor in use; this may be the input video format or another format. On the video input side, for a bi-level video signal, converting to digital preferably is done using a 1-bit level comparator. For gray-scale video, a multi-bit Analog-to-Digital Converter (ADC) may be used. RGB color may be converted using three Analog-to-Digital Converters (ADCs) in parallel, one for each of the Red, Green, and Blue signals. On the video output side, for a bi-level video signal, converting to analog from digital is done using voltage shifters. For gray-scale video, a multi-bit Digital-to-Analog Converter
(DAC) is used. RGB color is converted using three Digital-to-Analog Converters (DACs) in parallel, one for each of the Red, Green, and Blue signals. Similar approaches will be apparent for other display technologies such as plasma displays and LED matrix displays. All of the above techniques are widely known in the field and commercial products are available to perform the conversion operation. Please reference, for example, "Video Demystified", ISBN 1-878707-23-X, which is hereby incorporated by reference. Also refer to datasheets from Analog Devices, Inc. of Norwood, Massachusetts, USA, for a number of examples of high-speed ADCs and DACs, such as the ADC with part number AD9224 and the DAC with part number ADV7125. While the above discussion referred to CRT monitors, the discussion applies equally to LCD monitors. Even though LCD screens need not scan the frame in the same fashion as a CRT does, most LCD displays still expect their video input signals to be the same as for a CRT, to maintain compatibility. The video input interfaces of some LCD screens receive a Video Clock input in addition to the Video, HSync and VSync input signals. This simplifies the task of the
Translator Module. For most video interfaces today, the pixel clock signal is not provided on the interface. In order to sample the signal correctly, this pixel clock needs to be generated within the Translator Module. In the event that the video interface does not provide this clock, the Translator Module may use the HSync signal as a reference to drive a Phase-Locked-Loop (PLL) that generates the desired pixel clock. In one implementation of the Translator Module, the pixel clock was deduced to be 14.161 MHz but was not made available to the Translator Module on the video interface. It was also deduced that the HSync frequency was exactly (14.161 MHz / 768). The Translator Module had a PLL with two inputs. The first input of the PLL was driven with a tunable 14.161 MHz Crystal Oscillator. The reference input of the PLL was driven by the HSync signal. Internally the PLL used 768 as a divider on the 14.161 MHz to lock it to the HSync reference, thus generating a 14.161 MHz clock locked in frequency and phase to the pixel clock. This technique has been widely used in the past and is widely documented in the field.
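By way of illustration only, the clock-recovery arithmetic of the preceding example can be restated in a few lines (Python is used here purely for exposition):

# The pixel clock (14.161 MHz) is absent from the interface, but the
# HSync rate is known to be exactly the pixel clock divided by 768
# (pixel periods per line, including blanking). The PLL multiplies
# the HSync reference back up by that divider to regenerate the clock.
PIXEL_CLOCK_HZ = 14.161e6
DIVIDER = 768

hsync_hz = PIXEL_CLOCK_HZ / DIVIDER
print(f"HSync reference: {hsync_hz:.1f} Hz")                  # ~18438.8 Hz
print(f"Recovered clock: {hsync_hz * DIVIDER / 1e6:.3f} MHz") # 14.161 MHz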
Translator Module Architecture
Figure 4 shows a high level architectural diagram of the Translator Module 210. This architecture is for the case when only the video needs to be routed through the Translator Module, as shown in Figure 2. The diagram represents a high level description and ignores
for now a number of details, including the processing of the Vertical and Horizontal Sync Signals. A description of the processing of the Horizontal Sync and Vertical Sync signals is covered in the "Detailed Block Diagram" section below. In Figure 4, the diagram shows an input signal 410 labeled Video Input. This input signal comes from the video generator (driver) circuit of the industrial machine and is the signal that, but for the Translator Module, would drive the monitor. The output signal 420 of the module 210 is labeled Video Output and is connected (though the connection is not shown in Figure 4) to drive the display device (e.g., CRT or LCD monitor) of the machine. M1 and M2 are memory buffers, with each capable of storing one screen frame. This arrangement is referred to as double buffering and is widely known in the computer industry. A processor 430 writes to and reads from memory buffers M1 and M2. (Note that the processor need not be limited to any particular kind of implementation. For example, it may be a programmed general purpose digital central processing unit - CPU, a programmed special purpose processing unit, an application-specific integrated circuit (ASIC), or any of a variety of other kinds of processing elements or combinations of elements suitable for manipulating data.) The video frames received as input are alternately stored in the M1 and M2 buffers. At the same time, the Video Output signal alternately streams the contents of buffers M1 and M2 to the CRT or LCD monitor. Chronologically, as buffer M1 is being filled, the processor reads, translates and overwrites the data in buffer M2. At the same time, buffer M1 content is sent out to the Video Output signal. The bytes that are output from buffer M1 to the Video Output signal come from locations just ahead of locations that are currently being filled by a new frame from the Video Input signal. After one frame period, the Video Input and Video Output signals switch to streaming in and out of buffer M2 while the processor starts working on buffer M1. The cycle repeats after each frame.
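By way of illustration only, the double-buffered cycle just described can be sketched as follows. Python is used purely for exposition; capture_frame, translate_in_place and stream_out are hypothetical stand-ins for the hardware paths, and in the actual module the three operations run concurrently within one frame period rather than sequentially:

FRAME_BYTES = 720 * 350 // 8   # e.g., one bi-level IBM monochrome frame

def run_double_buffered(capture_frame, translate_in_place, stream_out):
    buffers = [bytearray(FRAME_BYTES), bytearray(FRAME_BYTES)]
    fill, work = 0, 1          # roles of M1 and M2 in this frame period
    while True:
        # The incoming frame fills one buffer while the previously
        # translated frame streams out of that same buffer, just ahead
        # of the locations being overwritten by the new frame...
        capture_frame(buffers[fill])
        stream_out(buffers[fill])
        # ...and meanwhile the processor translates the other buffer
        # in place, overwriting source-language text with the target.
        translate_in_place(buffers[work])
        fill, work = work, fill   # swap roles after each frame period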
Latency and Throughput
To guarantee satisfactory user experience, value limits should be specified for the latency and throughput of the Translator Module. Latency is the length of lapsed time from the moment the beginning of a new frame enters the Translator Module to the moment the beginning of the translated version of this frame is output to the screen. Throughput, on the other hand, is the maximum number of frames per second that the module is capable of processing.
Ideally, latency would be kept at a minimum. Driving latency down, however, drives up the required hardware performance and cost in the Translator Module and limits the algorithmic complexity that can be accommodated within this latency period. The most straightforward design is to set latency to roughly two frame periods of the double buffering architecture. For example, if the refresh rate of the screen is 50 Hz (50 frames per second), then two-frame-period latency would be 2/50 = 40 milliseconds. (Please note that in the double buffered architecture, one frame period is available to process each frame even though the latency is two frame periods. This is because one frame period is always taken to store the new frame and output the newly translated one.) It is possible to overlap the storing and output streaming operations with the translation and mapping operations. This has the effect of reducing the latency to one frame period. However, the processor has to always process that part of the buffer between the locations that just got loaded with a new frame and the locations where the output is being read out. The upper limit on latency, however, is motivated by the user experience and can be much longer than two frame periods. In fact, latency can be equal to several frame periods so long as the user continues to perceive the screen output to be instantaneous in response to his key presses and commands. When the latency increases beyond two frame periods, it is really as a result of the processor not being able to process one frame in one frame period. Please remember that one of the frame periods is used up by the streaming and storing operations. If the processor takes more than one frame period to process a frame, at least two options are available. The first is to increase the number of buffers and processors (or processor capabilities). Figure 5 shows an example of a high level architecture to achieve this approach. The extra buffers (from M3 through Mn) will be needed to store the frames waiting to be processed while the extra processors 430-2 through 430-k are needed to bring the aggregate throughput to more than one processed frame per one frame period. Since this arrangement would be used when the computing power of one processor is less than that which is needed to process one frame in one frame period, the number of processors used would naturally exceed the number of memory frames used. A second option is to not try to translate every input frame. Rather, the Translator Module would translate one frame and skip a set number of subsequent frames. For example, the Translator Module can translate one frame of each three input frames. If the input frame rate is 60 Hz, the throughput now would be 20 Hz. The other two frames are simply
discarded. This would give the processor three times the time to complete its task. The Translator Module output refresh rate is kept equal to the input refresh rate by repeating the output of the translated frame three times. This assures that the screen does not flicker unpleasantly despite the fact that two of each three input frames are discarded. A high level architecture for this option would be very similar to the one shown in Figure 4. An artifact of periodically dropping input frames and repeating the translated ones is that any motion that is depicted on the CRT or LCD monitor of the machine would appear jerky. Fortunately, the output of industrial machines hardly ever contains rich high-speed animation. Predominantly, the screen is filled with static text and static graphics, or at most a blinking field of text and/or graphics. So, skipping frames will go unnoticed by the operator. Fortunately, the processing power of affordable hardware available on the open market today makes it possible for the Translator Module to have a throughput equal to the input refresh rate. Finally, one might get the impression from the foregoing discussion that the buffer size must be an integer multiple of the area needed to store a single frame. This is not necessarily true. As is widely known, correct translation of text from one language to another relies on the correct detection and deduction of the context. Fortunately the context on the screen output of a machine is fairly limited and hence the context is predominantly known a priori. Moreover, the text output for a location on the screen for a given screen shot comes from a very limited choice of words, phrases or sentences. Loaded with this machine specific knowledge, the Translator Module is able at many times to translate a full field of text on the screen (not to be confused with a video field, and meaning an area with related text) after only reading the first character or even a fraction of a character. With this in mind, one can see that the video buffer storage space of the Translator Module need only be large enough to accommodate a fraction of the full frame for the Translator Module to be able to perform its translation operation correctly. This attribute reduces the memory size requirement and thus reduces the hardware cost of the Translator Module. This is important, as the resolution and color capability of the output devices used on industrial machines keep improving and hence requiring larger and larger frame buffers. A corollary to using a fractional frame buffer is that latency is reduced below one frame period as well, thus improving performance. Thus far, there have been described Translator Module architectures in which the memory buffer that stores an input video frame is equal in size to the memory buffer that is used to store a video output frame. They do not have to be equal. The Translator Module
can still function even if the space allocated to the video input frame is made smaller. The Translator Module needs only to examine a few places within the input video frame for it to perform its function correctly. For example, the Translator Module can still function if only every alternate horizontal scan line of the input video frame is stored, rather than the whole frame. Furthermore, the Translator Module still functions correctly if only every alternate pixel within the same line is stored, rather than each and every pixel. Storing every other line saves half of the required input memory. Storing every other pixel saves an additional half. Of course this technique can be extended to save one line or pixel every three lines or three pixels, and so on; the limit is the point at which throwing away more of the input video frame results in insufficient data being stored in the input video frame to allow proper generation of the translated output frame. The above optimizations exploit the current characteristics of industrial machines, namely that the context is limited and predetermined and that the output is somewhat static. The double buffered architecture shown in Figure 4 will be used as the example in the remainder of this discussion while explaining the other aspects of examples for practicing the invention and its principles. This choice is made arbitrarily since the ideas to follow apply equally to any of the buffering, latency and throughput configurations mentioned previously, so the use of the Figure 4 architecture is intended to be seen by way of example only. To recap, the double buffering architecture of Figure 4 has a latency of two frame periods. The processor has one frame period to translate a frame. Where no frames are dropped from translation, the throughput of the Translator Module is equal to the input frame rate.
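By way of illustration only, the frame-skipping option discussed above might be sketched as follows (Python for exposition; capture_frame, translate and stream_out are hypothetical stand-ins for the hardware paths):

SKIP = 3   # translate one of every three frames: 60 Hz in -> 20 Hz

def run_with_frame_skipping(capture_frame, translate, stream_out):
    translated = None
    for n, frame in enumerate(iter(capture_frame, None)):
        if n % SKIP == 0:
            # The processor now has SKIP frame periods per frame.
            translated = translate(frame)
        # The output refresh rate stays equal to the input rate: the
        # latest translated frame is repeated for every input frame,
        # including the ones that were discarded, so the screen does
        # not flicker.
        if translated is not None:
            stream_out(translated)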
Detailed Block Diagram of an Embodiment of a Translator Module
Figure 6 is a more detailed block diagram of an exemplary embodiment for the Translator Module hardware. ADC block 602 takes in the analog video input signal and digitizes it. In the case of a bi-level monochrome video, this block may be a simple comparator. Furthermore, if the bi-levels of the input signal conform to the digital logic threshold requirements, the level comparator operation becomes implicit and the ADC may be implemented as a simple flip-flop gate. At the other extreme, when multi-level color is used, the input ADC 602 consists of three high-speed ADCs, one for each of the Red, Green and Blue Video signals, with each ADC producing several bits of digital output for each sample of the Video signals.
On the output side, the DAC 604 (if required) converts the video signal from digital to analog format to conform to the requirements of the driven CRT or LCD monitor. The DAC may be a simple resistor network for bi-level monochrome Video. For high resolution color Video, the DAC may comprise three high-speed multi-bit DACs producing the required three Red, Green and Blue Video signals. The central sub-block in the diagram is the Memory 606. Input video data is continually, and sequentially, being written into this Memory and output video data is continually, and sequentially, being read out of it to generate the output Video signal. The architecture in Figure 6 is general enough to implement the functionality of the double-buffering architectures mentioned above as well as the many variations on input and output frame buffer sizes and counts. In addition, this Memory 606 is also tied to a processor 608 that can examine and modify any word within this memory. The Memory 606 serves the input video circuit, the output video circuit and the processor through time-slot multiplexing. A Translator Module designed for bi-level monochrome video may use 16-bit-wide memory, but any other practical memory word width can be used, too. At the input side in Figure 6, a data Packer 610 preferably sits between the ADC 602 and the Memory 606. Correspondingly, at the output side an Unpacker 612 preferably sits between the Memory 606 and the DAC 604. The data Packer 610 samples the digitized pixels from the ADC and packs them into a format that is more efficient for processing within the main memory. For example, bi-level monochrome video needs one bit per pixel. If the Memory is 16 bits wide, the Packer packs every 16 incoming pixel samples, or bits, into one 16-bit word before writing the word into the memory. This packing makes memory usage more efficient. At the output side, the data Unpacker 612 reverses this packing into the format required by the output video DAC. The Packer and Unpacker also serve a different function. For most video interfaces, the Video signal is blanked during the time that the electron gun (for a CRT) is retracing back horizontally or vertically. For example, at the end of every horizontal line scan, the electron gun is retraced back to the starting point of the next line. During the time that the electron gun is retracing the screen backwards, the Video signal is blanked to make sure that the gun does not draw anything on the screen while traversing backwards. During the time that the Video signal is blanked, there is no Video information that needs to be stored or processed in the Translator Module. Using the HSync and VSync signals, the Translator Module determines the times and durations of these Video blanking intervals and makes the Packer
stop packing and storing the incoming samples of the Video signal. At the output side, the Unpacker outputs blanked Video during the blanking intervals instead of reading out from the Memory. This reduces the amount of Memory space required to store and process an incoming Video Frame. Data written into the Memory 606 comes from two sources. The first source is the data Packer 610. The second source is the Processor 608. Data read from the Memory goes to one of two destinations. The first is the data Unpacker 612. The second is the Processor 608. For either of the Memory reads or writes, the proper memory address needs to be generated. When data is written into the Memory from the Packer, the Memory address is provided by the input Data Address Generator (IN DAG) 614. When data is read from the Memory to the data Unpacker the address is generated by the output Data Address Generator (Out DAG) 616. Both the IN DAG and the Out DAG are synchronized to the HSync and VSync signals such that data corresponding to a given video frame is always aligned in the Memory in a predetermined way in order for the Processor to process the frame correctly. The DAGs are primarily simple counters that continuously increment and wrap around at the end of the predetermined Memory address space. The processor 608 desirably runs at a faster clock rate than the Packer and Unpacker and hence performs multiple Memory reads and Memory writes in between the times that the Packer and Unpacker are writing to or reading from the Memory. Whenever the processor is reading or writing to the Memory, the address bus of the processor is made to drive the address pins of the Memory. Data is then written to or read from the Memory on the processor data bus. To maintain compatibility with the CRT or LCD monitor being used, the frequency and pulse widths of the output HSync and output VSync signals must be the same as the input HSync and VSync signals. However, the timing of these signals needs to be realigned with the timing of the output Video signal. Given that the output Video signal is shifted in time from the input Video signal, due to Translator Module processing, the function of the HSync and VSync delay sub-blocks is to provide adjustable timing delays to restore the correct inter-signal timing among the Video, HSync and VSync signals. Finally, the processor 608 is alerted every time a new frame is captured and stored in the Memory 606. In the case of the double buffering architecture, the processor then reads the input frame buffer, translates it to the target language using the algorithm that shall be described below and writes out the image of the translated frame to the output frame buffer
space in the Memory 606. After processing the frame, the processor idles until alerted again by the control hardware that a new frame has been captured and stored in the Memory and is ready for processing, at which time the processor starts processing this new frame and the cycle repeats.
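By way of illustration only, the packing and blanking behavior of the Packer, and the inverse operation of the Unpacker, might be sketched as follows for bi-level monochrome video with a 16-bit memory word (Python for exposition; the sample and blanking streams are hypothetical stand-ins for the digitized Video, HSync and VSync signals):

def pack_line(samples, blanking):
    # samples: 0/1 pixel bits from the ADC; blanking: parallel stream
    # of booleans derived from HSync/VSync, True during retrace.
    words, word, nbits = [], 0, 0
    for bit, blank in zip(samples, blanking):
        if blank:
            continue              # blanked Video is simply not stored
        word = (word << 1) | bit
        nbits += 1
        if nbits == 16:           # one full 16-bit Memory word
            words.append(word)    # written at the IN DAG address
            word, nbits = 0, 0
    return words

def unpack_words(words):
    # The Unpacker restores one bit per pixel for the output DAC.
    for word in words:
        for shift in range(15, -1, -1):
            yield (word >> shift) & 1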
On-line and Off-line Optimal Translation Partitioning
Significantly reduced cost and complexity, and improved performance, are obtained by partitioning the translation operation into two components. One component is performed off-line while the other component is performed on-line. The off-line component of the translation operation is carried out using specially developed software running on standard desktop computers and is not done in real time. The on-line component of the operation is carried out by the Translator Module in real time. Accurate human language translation is a complex and very hard operation to automate. By shifting most of the complex and time-consuming part of the operation to the off-line component, the on-line component is greatly simplified.
On-line Translation Algorithm
In a later section, the details of the offline algorithm shall be provided. A quick preview is provided here to aid in the discussion of the on-line algorithm. During the offline processing step, the offline processing generates a list of Identification
Marks (i.e., indicia) for the Translator Module to use in uniquely identifying each screen. For example, the offline processing would have generated the locations in which the Translator Module should look for the words "SETTING" and "GENERAL" on the screen to uniquely identify the screen in Figure 7. The offline processing would have generated similar marks for all the possible screens of the machine. In addition, for each identified screen, the offline processing generated a processing map for the Translator Module to follow when doing online processing for each uniquely identified screen. The map contains locations of fields within each identified screen that always appear the same within each uniquely identified screen. It also contains the locations of fields that can have varying content depending on the machine operation. Fields within a given screen that always appear the same are called Static Fields. On the other hand, fields on the screen that can have varying content depending on the operation of the machine are categorized into Dynamic Fields, Numeric Fields, Pass-Through Fields, Scrolling Fields and Floating Fields. The offline processing identifies the locations
and types of these content-varying fields and incorporates them into the screen map. This information within the map allows the online processing to select the appropriate algorithm that correctly handles each type of field. The offline processing also generates the bit-mapped images of the translation of every possible field value that can appear on the screen in the target language. These images are downloaded into the Translator Module at the time of manufacture or installation into the machine.
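By way of illustration only, the information handed from offline processing to the Translator Module might be organized as follows (Python for exposition; the structure, field names and example entries are illustrative assumptions rather than a required format):

from dataclasses import dataclass, field

@dataclass
class FieldEntry:
    kind: str          # "static", "dynamic", "numeric", "pass-through",
                       # "scrolling" or "floating"
    location: tuple    # (row, col, width, height) within the screen
    # For content-varying fields: each possible source value mapped to
    # the pre-rendered bitmap of its target-language translation.
    translations: dict = field(default_factory=dict)

@dataclass
class ScreenMap:
    id_marks: list             # [(location, expected pixels), ...]
    replacement_screen: bytes  # all Static Fields, pre-translated
    fields: list               # FieldEntry objects for varying fields

# Illustrative entry for the screen of Figure 7:
SETTING_GENERAL = ScreenMap(
    id_marks=[((0, 8), b"SETTING"), ((0, 40), b"GENERAL")],
    replacement_screen=b"",        # bitmap built offline; elided here
    fields=[FieldEntry("dynamic", (12, 30, 48, 8),
                       {b"INCH": b"", b"MM": b""})],  # bitmaps elided
)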
On-line Screen Detection
To explain the on-line operations of the Translator Module, a snapshot is shown of two screens from a typical Milling Machine, in Figures 7 and 8. Examining the screen shot in Figure 7, and with some knowledge of the machine operation provided in the list of Identification Marks provided by offline processing, one finds that the word "SETTING" located at the upper left corner of the screen identifies this screen as one of the screens that are used to alter some of the settings of the machine. Furthermore, the word "GENERAL" identifies the screen as the screen used to alter the subcategory GENERAL of the SETTINGS. For this machine, the combination of these two words is enough to uniquely identify this screen. No other screen that is output by this machine has this combination. The same can be said for the words "PARAMETERS" and phrase "COMMON SW 1" for the screen in Figure 8. As mentioned before, the Translator Module is provided with the unique
Identification Marks of each screen during the offline processing step. It is also provided with the static replacement screen during offline processing, in the target language, for each screen identified with an Identification Mark. These static replacement screens contain all the Static Fields of the original screens but translated into the target language. In addition to the static replacement screens, the Translator Module is also given the information about the locations and alternate content of all the variable fields within each screen and the replacement texts for each of the possible values of these variable fields. Armed with this information, the Translator Module goes about its task as follows. Each new input frame, once it is captured into the buffer, is examined by the processor to detect its unique Identification Marks. Once the screen is identified, the values for each of the Static Fields are written into the output video buffer using the translation image that was constructed by offline processing. Then, the processor determines the content of each of the variable Fields using the list of possible values that was constructed by offline processing for
each field. Once the content is uniquely identified, the replacement text in the target language is written in the location of those fields within the buffer. At this moment, the translated screen is then ready to be sent to the video monitor of the machine. Thus the operation of the Translator Module is one of pattern matching and pattern replacement once the information outlined above is preloaded into the Translator Module. The Translator Module may also be preloaded with additional information to handle some special conditions as shall be described below. In the example of the screen in Figure 7, the processor, having detected the words "SETTING" and "GENERAL", would uniquely identify the screen. The processor then loads the processing map that is associated with this uniquely identified screen from the list of screen maps that were constructed and provided during the offline processing and continues the processing using this map.
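By way of illustration only, the per-frame pattern-match-and-replace cycle just described might be sketched as follows, reusing the hypothetical ScreenMap structure from the earlier sketch (read_region and write_region stand in for frame-buffer access and are assumptions, not part of the invention as such):

def translate_frame(frame, screen_maps, read_region, write_region):
    # 1. Identify the screen by its unique Identification Marks.
    for smap in screen_maps:
        if all(read_region(frame, loc) == pixels
               for loc, pixels in smap.id_marks):
            break
    else:
        return frame   # unknown screen: pass it through untranslated

    # 2. Start from the static replacement screen, which already
    #    carries all Static Fields translated during offline work.
    out = bytearray(smap.replacement_screen)

    # 3. Resolve each content-varying field against its short list of
    #    possible values and write the pre-rendered translation.
    for f in smap.fields:
        current = read_region(frame, f.location)
        for value, bitmap in f.translations.items():
            if current == value:
                write_region(out, f.location, bitmap)
                break
    return bytes(out)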
Static Fields and Dynamic Screen Fields Looking at the screen shot in Figure 7, the text fields of the screens can be categorized into Static Fields and Dynamic Fields. Static Fields contain text or graphics that are fixed (perhaps for the life of the machine) in format, location and appearance within their respective screens. Dynamic Fields contain text or graphics that can change in response to the operating state of the machine, but their location on the screen remains constant. Examples of Static Fields in the screen in Figure 7 are the phrase "SERIAL NUMBER" and the phrase "LANGUAGE". With the screen identified by the words "SETTING" and "GENERAL", these Static Fields always have the same content and format. Examples of Dynamic Fields in the screen in Figure 7 are the word "INCH" and the word "OFF". The operator can change the units to SI, in which case the word "INCH" on the screen is replaced with the word (or abbreviation, to be more precise) "MM". Also, the operator may turn on a certain feature or function of the machine, in which case the word "OFF" is replaced on the screen with "ON" next to that feature field. Once the screen is uniquely identified and the map for the screen is loaded from the memory of the Translator Module, the locations of all the Static Fields are identified along with their associated images of the translations of those Static Fields. The Translator Module is now in a position to write out the pre-translated images, constructed during offline processing, of all of the Static Fields on this screen to the output frame buffer.
Having translated the Static Fields, the processor then uses the information in the screen map to locate and process the Dynamic Fields. One Dynamic Field is the "INCH" field. From the screen map generated for this screen during offline processing, the Translator Module already knows that it is a Dynamic Field, knows its location, and also knows that it can only be filled with either the word "INCH" or the word "MM". It quickly and easily determines which of the two values is present and proceeds to write the pre-canned translation of that value, in the new language, to the video output buffer.
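To make the screen map concrete, the following is a minimal Python sketch of one way such a map might be organized. All screen names, coordinates and image identifiers here are hypothetical, and the read_field and write_image helpers merely stand in for the detection and image-writing machinery described elsewhere in this document; an actual Translator Module is not limited to this representation.

# Sketch of per-screen processing maps (hypothetical names and values).
SCREEN_MAPS = {
    "SETTING/GENERAL": {
        # Static Fields: fixed text, replaced by a pre-translated image.
        "static": [
            {"location": (40, 120), "image": "img_serial_number"},
            {"location": (40, 140), "image": "img_language"},
        ],
        # Dynamic Fields: fixed location, content drawn from a known set.
        "dynamic": [
            {"location": (200, 120),
             "values": {"INCH": "img_inch", "MM": "img_mm"}},
            {"location": (200, 160),
             "values": {"ON": "img_on", "OFF": "img_off"}},
        ],
    },
}

def translate_screen(screen_id, read_field, write_image):
    """Replace every field of an identified screen in the output buffer."""
    screen = SCREEN_MAPS[screen_id]
    for field in screen["static"]:
        write_image(field["location"], field["image"])
    for field in screen["dynamic"]:
        value = read_field(field["location"])   # e.g. returns "INCH"
        write_image(field["location"], field["values"][value])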
Pass Through Fields Pass through fields are text or graphic fields on the screen that should not or need not be translated or altered from the original appearance on the output device. Examples of these are company logos or static or animated graphics. These fields should pass through the Translator Module unaltered in content, but possibly reformatted. For an example of reformatting, let us assume that the machine-user interface is originally in English and the end-user target language is Arabic. In addition to translating the content there is also the issue of reversing the layout of the screen from a left-to-right centric direction to a right-to-left centric direction. Left justified text and graphics need to be right justified. Other reformatting needs can be dictated by the special needs of other languages. (As used below, the term "translate" is used with respect to non-textual graphics to refer to reformatting, repositioning, resizing and any other operations performed on the graphics data to effectuate presentation of that data along with one or more screens of translated text.)
Numerical Fields Some fields on the input screen would include a single or multi-digit number. The position and text format of these numbers are known in advance for a given machine.
However, the value of the number is operation-dependent. One can treat them as Dynamic
Fields with one optimization. Instead of treating the whole Numerical Field as one Dynamic
Field, the Numerical Field is first broken down into a number of single-digit Dynamic Fields.
Having done so, the content of a single-digit field is limited to the range 0-9 for decimal systems. Digit by digit, each single-digit Numerical Field is translated and the digit representation in the target language is substituted for the representation in the original language.
The issue is further complicated by the possible use of non-decimal numerical systems in either the original language or the target language. In either case, having detected the digits of the input fields individually and knowing their order, the Translator Module can figure out the value of the displayed number using an appropriate stored algorithm. The Translator Module can then proceed to output the number in the correct numerical system and format required by the target language and culture. An example of a non-decimal numerical system is the Roman system of numbers.
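The digit-by-digit treatment just described lends itself to a short illustration. The following Python sketch assumes a detect_digit helper (standing in for the bitmap-based digit detection described in the text) and shows the value reconstruction and, as one example of a non-decimal target system, Roman numeral output.

def read_digits(digit_fields, detect_digit):
    # Each single-digit Dynamic Field is detected independently (0-9).
    return [detect_digit(d) for d in digit_fields]

def digits_to_value(digits):
    # Combine the individually detected decimal digits into one value.
    value = 0
    for d in digits:
        value = value * 10 + d
    return value

def to_roman(value):
    # Example rendering in a non-decimal numerical system.
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for base, symbol in numerals:
        while value >= base:
            out.append(symbol)
            value -= base
    return "".join(out)

# Detected digits [1, 4] combine to the value 14, rendered as "XIV".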
Encoding the Replacement Language Output Screens There are a number of options for encoding the replacement language output within the Translator Module. The most compact form is to encode the messages in the new language using the well-known Unicode standard. These Unicode-encoded messages are processed in real time to produce the bitmap rendering that is needed for the output video buffer. Another approach is to encode the replacement language messages as compressed bitmap images along with their intended locations on the screen. The Translator Module processor then decompresses the needed message and positions it at the required location within the output buffer. This option consumes more permanent storage area in the Translator Module but results in more flexibility. It also moves more of the complexity from on-line processing to off-line processing.
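As an illustration of the compressed-bitmap option, the following Python sketch stores each replacement message with its screen location and decompresses it on demand. zlib is used here purely as a stand-in for whatever compression scheme a real Translator Module might employ, and the frame representation (one bytearray per scan line) is hypothetical.

import zlib

MESSAGES = {}  # message id -> (x, y, width, compressed pixel rows)

def store_message(message_id, x, y, width, rows):
    # Off-line: pack the rendered message rows and compress them.
    packed = b"".join(bytes(row) for row in rows)
    MESSAGES[message_id] = (x, y, width, zlib.compress(packed))

def blit_message(message_id, frame):
    # On-line: decompress the message and write it at its location.
    x, y, width, blob = MESSAGES[message_id]
    data = zlib.decompress(blob)
    for row, offset in enumerate(range(0, len(data), width)):
        frame[y + row][x:x + width] = data[offset:offset + width]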
Floating and Scrolling Text Fields Floating text is text that is dynamic both in content and location. In contrast, Dynamic Fields are those fields that vary in content but are always fixed in their location on the screen. For example, take the "ALARM" screen that lists the current alarm conditions that the user must address. Possible alarms are numerous, of variable length, and they could be listed in any order. However numerous they are, all their texts are known in advance and their horizontal justification on the screen is also known in advance. To process this type of Floating Field, the Translator Module first completes translating the Static Fields. Previously, the Dynamic Fields were processed easily since their exact positions on the screen were known. For the first listed Floating Field, such as an alarm, the location on the screen is static. Encoded within the Translator Module is the vertical length of this alarm
message, and hence the Translator Module is able to accurately determine the position of the next alarm message and can proceed to process it as a normal Dynamic Field. Scrolling Text is text that the user can partially or fully scroll on the screen in increments of one or more lines or paragraphs. This leads to the situation where the screen can be filled with text that is vertically non-deterministically positioned on the screen. The algorithms discussed above have relied on the ability of the Translator Module to detect the beginning of the field it needs to translate because it only had to deal with two situations. The first was for the Dynamic Fields, where the location was fixed and included in the screen map. The second situation was for Floating Fields, where, at least for the first vertically occurring Floating Field on the screen, the location is also known in the screen map, and hence these were processed as Dynamic Fields with the algorithm modification mentioned in the preceding paragraph. An example of a Scrolling Field is an area on the screen that contains a scrolling help text. The displayed message could be scrolled in such a way that neither its beginning nor its end is visible on the screen. In these situations, the processor picks a line from the screen and performs a hash function computation on the line to produce an entry into a hash table for this line. Hashing techniques are well known in the computer industry for storing and restoring unstructured information. Here, hashing techniques are used to solve the problem of processing Floating or Scrolling Fields. The resulting hash key is used to look up what body of text the line came from. Once that is uniquely identified, all the techniques previously mentioned are applied. What is implicitly assumed in the above-mentioned technique for processing scrolling text is that the horizontal positioning and format of each line of text always appears the same way on the screen. Even though the vertical positioning of a line within a paragraph is not static, the horizontal appearance of each line from any of the possible texts typically is fixed. ALARM warnings spanning more than one screen can be treated as a more general case of the algorithm for Scrolling Fields instead of the algorithm previously mentioned. There is one main difference between Floating Fields and Scrolling Fields. For Scrolling Fields, once one line on the screen within the Scrolling Field is identified, the content of the full field is determined. For example, if the Scrolling Field contains Help text, once a line within this text is identified, the full text, including its first and last appearing lines on the screen, is determined. Floating Fields are more general. Taking the ALARM screen as an example, identifying one line within a Floating Field identifies the content of an area within
the Floating Field that contains the ALARM message containing this line. This single line alone is not sufficient to determine the lines that are part of a subsequent ALARM message visible within the Floating Field. Identifying one line determines the text of the ALARM message the line belongs to and how it appears on the screen. Once this message is processed and its vertical extent on the screen is known, another line needs to be identified to determine the text in a subsequent ALARM message within the Floating Field. The cycle repeats until the field is fully determined. The discussion above describes two ways to process Floating Fields. The more general method of the two is to use hash functions. In addition, one can see that Scrolling Fields are a simpler special case of Floating Field processing. One can therefore process Floating Fields and Scrolling Fields with the hash function method, bearing in mind that for Floating Fields, the hashing needs to be done more than once for a given field.
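The line-hashing lookup can be sketched in a few lines of Python. MD5 is used here only as a convenient stand-in for whatever fast hash function a real Translator Module would use; the text identifiers and pixel representation are illustrative.

import hashlib

LINE_TABLE = {}  # hash key -> (text id, line number within that text)

def register_text(text_id, rendered_lines):
    # Off-line: hash every line of every text that could ever scroll
    # or float on the screen.
    for n, line_pixels in enumerate(rendered_lines):
        key = hashlib.md5(bytes(line_pixels)).hexdigest()
        LINE_TABLE[key] = (text_id, n)

def identify_line(line_pixels):
    # On-line: recover which known text, and which line of it, a
    # visible screen line belongs to. None signals unrecognized text.
    key = hashlib.md5(bytes(line_pixels)).hexdigest()
    return LINE_TABLE.get(key)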
Screen Saver Mode In order to protect the phosphor coating within a CRT screen from burn-in, many machines switch their video outputs to Screen Saver Mode after a predetermined prolonged period during which the displayed image does not change. In Screen Saver Mode, the CRT is made to display a video image the content of which keeps changing in a pseudo-random fashion. Rather than process Screen Saver Mode screens, the Translator Module determines that the input frames reflect Screen Saver Mode and proceeds to do one of two user-selectable options. The first option is to pass through all Screen Saver Mode frames unchanged. The second option would be for the Translator Module to output its own internally generated Screen Saver Mode animation frames with target language content. The Translator Module continually checks the type of the input frames and immediately reverts to normal processing of input frames as soon as the Screen Saver Mode is exited by the machine. A Translator Module may be constructed with one or both options provided. Or, for some CRTs, the screen may simply be blanked during a Screen Saver Mode.
Dependency Exploiting Translation The above-discussed illustrative embodiments do not exploit any history of the screens being displayed. Each input screen is translated totally independently from all the screens that preceded it. No state is carried over from translating one frame to aid in the processing of a subsequent frame. Historical information can sometimes be useful, however,
especially for processing the floating or scrolling texts mentioned above. By exploiting knowledge of the operation of the machine, it is possible to move from one screen type to another in one step. This can limit the detection scope of the present frame to only the screens reachable from the previous frame. This significantly reduces the on-line processing requirements of the Translator Module and is a desirable option, though not a system requirement. Minimally, one can exploit the fact that for the vast majority of operations, the current frame is an identical copy of the last processed frame. For the few times that it is not, the highest probability is that it will differ in very few places from the last processed one. This dependency realization leads us to optimize the pattern matching and detection engine inside the Translator Module to always look first for the option most recently encountered. Additionally, one can exploit the inter-screen dependency in the machine. Given a screen that is currently output by the machine, from knowledge of the machine operation one can tell what other screens of the machine are reachable within one operational step. All the other screens then can be eliminated from the detection search. Machine knowledge information can also be used to exploit inter-field dependency.
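A minimal sketch of this dependency exploitation follows. The reachability table is hypothetical and would be derived from knowledge of the machine's menu structure; note that the previous screen is listed first, reflecting the observation that the current frame most often repeats the last one.

# Hypothetical one-step reachability graph between screen types.
REACHABLE = {
    "SETTING/GENERAL": ["SETTING/GENERAL", "SETTING/USER", "MAIN"],
    "PARAMETERS/COMMON_SW_1": ["PARAMETERS/COMMON_SW_1", "MAIN"],
}

def candidate_screens(last_screen, all_screens):
    # Restrict the Identification Mark search to screens reachable
    # from the last identified screen, most likely candidate first.
    if last_screen is None:
        return list(all_screens)
    return REACHABLE.get(last_screen, list(all_screens))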
Variable Area Replacement The assumption so far is that the replacement text area on the screen is equal to or smaller than the area of the original text being replaced. Given this constraint, one is always able to translate the screen while fully maintaining the original formatted look or a direct counterpart. If a text string on the screen takes an area of 40 character spaces, for example, and if the translation text is restricted to occupy 40 character spaces or less, then the original format can be preserved (or mirrored) with little additional effort. Some human languages are more expressive with less text than others, under certain contexts. This can create a situation where the translation text needs to occupy an area that is larger than the area of the original language text. Given that the Translator Module sits in-line on the video cable, the output screen can be reformatted to allow for such conditions. If the target language consistently leads to larger text area requirements that might span multiple screens, one of two solutions is recognized. The first solution is to design the Translator Module to drive an upgraded larger-area CRT or LCD monitor and also to reformat the screen for this new language and new monitor such that the content of an original frame would fit on one output frame. This solution is
straightforward and does not require additional knowledge of the operation of the machine beyond what the Translator Module knows at this point. The second solution is to reformat the output in the target language to span multiple screens on the original target monitor. This breaks the one-to-one correspondence of the input frame and the output frames and requires more detailed knowledge of the operation of the machine. While feasible, it is typically not needed for the majority of target languages.
Blinking Cursor and Inverted or Flashing Fields For some machines, while the operator is navigating through the control panel, some text on the screen may be highlighted or made to blink. The highlighted text could be highlighted in a plurality of ways. Most common is inverted video. For example, if the text of a message normally appears on the screen as bright text on a dark background, inverting that text displays it as dark text on a bright background. In other places, the highlighting could be done by blinking the text or inverting it in an intermittent fashion. One method to treat fields that might be inverted or highlighted is to treat them as dynamic text fields. In essence, even though the text content of a field that is highlighted is the same as when it was not highlighted, one can treat them as being two different text values and provide translation accordingly while preserving the state of highlight or not. A side effect of this strategy is the doubling, and possibly quadrupling, of the possible combinations of Dynamic Fields, requiring larger and larger storage in the Translator Module. An alternative is to indicate to the Translator Module that a certain field could be inverted. The Translator Module processor tests the field against the possible dynamic values in both normal and highlighted format. Having detected both the value and the format of the field, highlighted or not, the non-highlighted translation image is deposited in place of the original text. If the original was highlighted, the Translator Module then highlights the replacement text. In that way, only the non-highlighted images of the translated text need to be stored in the Translator Module. The various formats in which this text might appear are handled algorithmically. This increases the required processing by the Translator Module but reduces the memory requirement. Alternately, the presence of highlighted fields can be detected algorithmically. Prior to processing a field for Identification Marks, the Translator Module applies an algorithm to detect whether the field is highlighted or not. If it is highlighted, the field is de-highlighted first and then processed for non-highlighted Identification Marks. After the proper
replacement text image is determined for the field, that image is highlighted as the original text was before being written to the output video. The algorithm for detecting highlighting is as simple as adding some of the values of pixels within the field. If the text is normal, being bright text on a dark background, this sum will have a low value. If the text is highlighted, the sum will have a high value. It would not take summing more than a few values before the highlight state of the text could be accurately determined.
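The pixel-summing test can be written directly, as in the following Python sketch; the sample count and threshold below are illustrative values only, and a real module would pick them from knowledge of the machine's video levels.

def is_highlighted(field_pixels, samples=16, threshold=128):
    # Bright-on-dark text sums low; inverted (highlighted) text sums
    # high. field_pixels is assumed to hold grayscale values 0-255.
    n = min(samples, len(field_pixels))
    return sum(field_pixels[:n]) / n > threshold

def de_highlight(field_pixels, max_value=255):
    # Invert a highlighted field so the normal (non-highlighted)
    # Identification Marks can be applied to it.
    return [max_value - p for p in field_pixels]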
Color Fields One fortuitous aspect of color is that it need not be translated but can simply be passed on. The preferred strategy for the Translator Module is to detect the text regardless of the foreground and background colors, translate the text, and then send the translation image to the output video in the same foreground and background colors as the original input field. To do this correctly, the Translator Module needs to make a correct determination of the foreground and background colors of the input fields. Color, like text, can be either Static or Dynamic. Looking at a typical color screen on an industrial machine, one notes fields that always appear in the same color while others change color based on operating conditions. The Translator Module may be preloaded with this information at installation time. The Translator Module does not try to detect the color of static color fields since it already has that information. Fields of dynamic color are handled similarly to dynamic text fields. While the choices of colors on the screens are numerous, the colors in which a given text field can appear within a given screen's context are limited. The limited set of possible color choices for each field is made known to the Translator Module at installation time. On-line, the Translator Module need only distinguish between the expected possible colors to accurately determine the color of a given field.
Fast Field and Color Detection All the algorithms (i.e., processes) outlined above rely on the Translator Module's ability to recognize input text and patterns from a number of pre-stored possible values. If one examines the Dynamic Field that could contain the text "INCH" or the text "MM", one notes that the Translator Module is able to make an accurate determination of the value by examining the first character only. "INCH" starts with "I" while "MM" starts with an "M". Extending this further, it will be seen that an accurate determination can be made of the value
currently present in the field by examining a single pixel location within the field known to have different values for "INCH" than for "MM". In Figure 9, a close-up is shown of the word "SETTING" that appears in the top left corner of the screen shot in Figure 7. In Figure 10, a close-up is shown of the word "PARAMETERS" that appears in the top left corner of the screen shot in Figure 8. Typical of probably all video standards, the pixel pattern is rendered on the screen through a raster scan pattern. The scan starts at the top of the screen and proceeds downwards line by line. Each line is drawn on the screen in the left-to-right direction. After drawing a line, the next subsequent line is drawn immediately following it. Because of this scanning method, the Translator Module receives the word "SETTING" shown in Figure 9 as a stack of horizontal slices. When there is a need to distinguish between the words "SETTING" in Figure 9 and "PARAMETERS" in Figure 10 to determine which screen is being received, it is only necessary to look at the pixel that is at the top left-most position of the first character of the word. If the pixel is dark, then the word must be "SETTING". If the pixel is bright, then the word must be "PARAMETERS" because the letter "P" has that pixel brightened. This pixel location on the screen and its value associated with each of the two screens is called an "Identification Mark." At each step of processing a new screen, the Translator Module uses its pre-canned knowledge of these Identification Marks (stored in memory) to make a determination between the possible screen options that might be present. A database of Identification Marks for the Translator Module to use on-line can be generated off-line by any suitable means. In one embodiment, this list contains a number of screen entries. Each entry in the list is associated with a unique screen. Each entry consists of one or more pairs of information. Each pair contains a location on the screen and a pixel value at that location for this screen. The list is constructed during offline processing such that no two screens can have the same values within the list of pairs in their screen entries. In a brute-force algorithm implementation, the Translator Module would pick the first location-value pair from the first entry in the screen Identification Marks list. If the location in the pair contains a value on the screen other than the one indicated in the associated value, then this screen is discarded as a possible screen candidate and the processor proceeds to the second entry in the screen ID list. If all the location-value pairs within a screen entry in the list match what is in the frame buffer, then that screen is positively identified and the processing map for this screen is loaded.
Armed with such information, the determination of screens and their content is made very efficient and fast, and requires little on-line hardware or software resources. That is, the processor cycles through the preloaded Identification Marks until an Identification Mark matches the value at its corresponding location on the screen. A match provides an identification of the screen type or the field value. Note that the location of these pixels could be anywhere within the field and need not be the top left-most. This is especially true in situations where there is a need to distinguish between words with identical initial characters or between phrases with identical initial words. For example, distinguishing between the phrase "CHECK PUMP COOLANT" and the phrase "CHECK PUMP FUSE" highlights the need for locating the distinguishing pixel at a later part of the phrase. An astute reader might suggest making the "CHECK PUMP" part of the phrase a Static Field. However, this might not be possible if the same field can take on a third value such as "PUMP FAILED", for example. In cases where the Translator Module needs to make a selection among a number of possible field values, the distinguishing pixels preferably are chosen so as to result in a binary search strategy. For example, if there are 32 possible text values for a given Dynamic Field, the first distinguishing pixel preferably is chosen such that its outcome eliminates 16 of the 32 possible values from the search. The second pixel location should be chosen to eliminate 8 possible values from the search, and so on. With this strategy, the Translator Module can reach the correct outcome in 5 tests. Binary searching is the most efficient search algorithm given that the values are equally likely. Employing knowledge of the machine, it can be determined whether some values are more likely to appear than others. Under such conditions, the binary search algorithm may be modified to result in the least number of tests to make an accurate determination of input text. As mentioned earlier, Numerical Fields may be broken down into single-digit
Dynamic Fields. The Identification Mark(s) for any digit within a Numeric Field is (are) the same and is (are) extracted based on the bitmap of the numerals 0-9 used in the machine. The case for detecting color is very similar. Assume that the text can appear as either pure red or pure green. In 24-bit RGB, this corresponds to the color values (255,0,0) for Red and (0,255,0) for Green. In advance, the Translator Module is loaded with information about the location of a foreground pixel within each variable field and the possible colors. In binary format, the Translator Module need only examine one bit of the 8 bits used for RED to
determine if the color of the pixel is Red or Green. Based on the value of that bit, the Translator Module is able to deduce the color of the whole field.
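The Identification Mark test itself reduces to comparing a few stored location-value pairs against the frame, as in the following Python sketch. The coordinates and values are illustrative, and the same pair-testing serves equally to distinguish the values of a Dynamic Field or the colors of a pixel.

# Each screen entry pairs pixel locations with expected values; all
# pairs must match for the screen to be positively identified.
ID_MARKS = [
    ("SETTING/GENERAL",        [((12, 8), 0), ((30, 8), 1)]),
    ("PARAMETERS/COMMON_SW_1", [((12, 8), 1), ((30, 8), 0)]),
]

def identify_screen(frame):
    # frame maps an (x, y) location to its pixel value.
    for screen_id, pairs in ID_MARKS:
        if all(frame[(x, y)] == value for (x, y), value in pairs):
            return screen_id
    return None  # unrecognized screen; see the recovery discussion below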
Color Storage Optimization In the screen shots in Figures 7 and 8, the screens were of the monochrome type.
Furthermore, each pixel took on one of two values, black or white. The screen was captured and processed in the Translator Module using 1 bit for each pixel. The storage requirements for 1-bit-per-pixel frames are modest. If the screen were of the full 32-bit color type, each pixel would need 32 bits of storage and the frame storage requirements in the Translator Module would become 32 times larger than for monochrome. In cases of static color, text can be stored without the color. The correct color will then be produced at the output of the Translator Module based on the color knowledge that is preinstalled in the Translator Module. Color information is stripped and the information is turned into 1-bit monochrome using thresholding techniques. For the dynamic color fields, in addition to the stripped monochrome information mentioned above, some information is stored that will aid in determining which color was used. Using thresholding and minimal color encoding, the storage requirement is significantly reduced. The additional information that is needed depends on the knowledge of the operation of the industrial machine and what color palette it might be using within the context of each of its output screens. Using 1 bit for Static Fields and minimally encoded color information based on the knowledge of the machine for the dynamic color fields, the frame buffer storage requirements of the Translator Module can be reduced while still transparently translating rich color screens.
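The thresholding step that strips color down to 1 bit per pixel can be sketched as follows; the threshold value is illustrative, since a real module would derive it from the machine's palette.

def to_monochrome_bits(pixels, threshold=128):
    # Pack grayscale pixel values into 1 bit per pixel, giving the
    # large storage reduction discussed above for full-color frames.
    bits = bytearray((len(pixels) + 7) // 8)
    for i, p in enumerate(pixels):
        if p >= threshold:
            bits[i // 8] |= 1 << (7 - (i % 8))
    return bits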
Unrecognized Text Recovery In a perfect world, the Translator Module would recognize and translate any combination of text and graphics that the machine can throw at it, relying heavily on its preinstalled knowledge of the output screens of the machine that was constructed during offline processing. In the real world, it is possible, due to staff programming or communication errors, to miss providing the Translator Module with one or more screen conditions. The same is the case when a new feature is added to the machine in the field
without upgrading the Translator Module software to accommodate those changes. Assume that after some period of operation in the field, the Translator Module encounters a screen that contains content it does not recognize. The Translator Module may be configured to do one of two things. The first is to pass on the unrecognized text unaltered in the original language of the machine. Alternatively, the Translator Module may write a warning message to the operator in the target language, where the unrecognized text was encountered, alerting the operator to the presence of unknown non-translated text. The choice between the two options preferably is selectable by the user, but a Translator Module may be designed to include only one option. In either case, the Translator Module will also store the offending original language text, along with other context-identifying markers such as screen type and location within the screen, into its non-volatile memory. The manufacturer of the Translator Module can at a later date download this text and can then produce a software upgrade to be loaded into the Translator Module to address this previously unrecognized condition.
Checksum on Static and Dynamic Fields Conceptually, the proper operation of the Translator Module relies on the fact that its knowledge of the machine and the output of the machine match. In case they do not match, minimally the Translator Module must detect that there is a discrepancy. From a user point of view, it is acceptable that on rare occasions the Translator Module informs the operator that it has encountered unrecognized text and proceeds to highlight its location. It is less acceptable, however, to provide the wrong translation even on rare occasions. As mentioned above, a situation could arise in which the Translator Module would encounter text that is unrecognized. The difficulty is that this text could show up anywhere on the screen. If the new text is due to a newer partial replacement, in the original language, of a sub-area of a Static Field, then it is likely that this change will go undetected by the Translator Module and the old, non-matching translation would be given. To guard against this situation, checksums preferably are employed. A checksum is a running sum of all of the 16-bit words in the frame, modulo 65536. (16-bit words are used here as an example, but checksums could be done on words of any width.) As the input image is captured, it is packed into 16-bit words. The Translator Module starts to process the frame buffer to determine which machine screen it holds. It then processes the Static Fields and goes on to detect and process the content of the Dynamic Fields.
For each possible screen, the Translator Module is provided with pre-computed checksum values for each of the Static Fields of the screen. Just before replacing each of these Static Fields with its replacement image in the target language, a checksum is computed on the Static Field area within the input frame. This checksum is compared to the value that was pre-computed for this field and pre-stored in the Translator Module. If the checksums match, the replacement image is written out to the output frame. If they do not match, the Static Field is not replaced but is instead highlighted to indicate to the user the presence of untranslated, unrecognized text. In general, the operation of the Translator Module can be divided into two high-level tasks. The first is pattern detection and the second is image replacement. The pattern detection part comprises all the operations that were described above and that were performed to identify the text value at a certain location on the screen from a number of possible alternatives. Once identified, the Translator Module proceeds to replace the detected text with the pre-canned image of its translation. The checksum step is inserted between the detection and the replacement. Once the detection process of a given field yields a possible candidate, the replacement image of the translation is located in the memory. Linked with this image in the data structure is the checksum value of the original text that this image replaces. Prior to fetching the image and replacing the original text, the processor compares the checksum of the original text on the screen with the checksum value that is stored linked to the replacement image. If a checksum match occurs, then the replacement proceeds. If the checksum comparison fails, then the detection is not accurate and the replacement does not occur. This feature is important to insure the safety of operating the machine. The Translator Module is also provided with a checksum value for each of the possible values of the Dynamic Fields in the screen. After determining the values of these Dynamic Fields, and given the provided pre-computed checksums, the Translator Module computes the checksum on the Dynamic Field from the input frame that is about to be replaced and compares it to the pre-stored checksum that is associated with the replacement image. In case of a match, the replacement proceeds. If they do not match, the Dynamic Field is passed to the output frame without translation but is highlighted to the user. Floating and Scrolling Fields require special attention. In both cases, a given paragraph of text can be partially visible on the screen. It is possible that the first few lines or the last few lines of the paragraph may be clipped and not visible on the screen. Because of this fact, checksums for text that is displayed either in Floating or Scrolling Fields may be
calculated separately for each line of the text. During processing, the Translator Module then would determine which lines of the paragraph are visible. The Translator Module would have been supplied with checksums for each line of the text. This data is not very large. If it is assumed that the machine contains 100 pages of scrolling and floating texts, each page containing 66 lines, then the storage requirement for these checksums is equal to 100 pages times 66 lines times 2 bytes, or 13,200 bytes. Given that affordable storage nowadays is in the hundreds of millions of bytes or more, this amount of data is reasonable to store. Just before replacing the Floating or Scrolling Fields, the Translator Module computes a checksum of each line in the input frame that is within the area to be replaced. These checksums are then compared with the pre-stored line-by-line checksums. If any checksum pair does not match, the corresponding line of text is not translated and is highlighted to indicate to the user that this area was not translated. This mechanism ensures that the operator is either given an accurate translation of input screens or is alerted to unrecognized and untranslated input screen content. Whenever the Translator Module detects unknown text, it makes a note of the screen and state under which this condition occurred. This information is stored in the Translator Module (preferably non-volatile) memory and can be downloaded at a later time to aid in upgrading the data within the Translator Module to handle this unrecognized condition correctly.
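The checksum described above is simple enough to state exactly. The following Python sketch computes the running 16-bit word sum modulo 65536 and shows the verify-then-replace decision; the write_replacement and highlight callables are placeholders for the frame-buffer operations described in the text.

def checksum16(data):
    # Running sum of 16-bit words, modulo 65536 (kept to 16 bits).
    if len(data) % 2:
        data = data + b"\x00"  # pad an odd trailing byte
    total = 0
    for i in range(0, len(data), 2):
        total = (total + ((data[i] << 8) | data[i + 1])) & 0xFFFF
    return total

def verify_and_replace(field_bytes, expected, write_replacement, highlight):
    # Replace only when the checksum confirms the detected original
    # text; otherwise leave the field untranslated and highlight it.
    if checksum16(field_bytes) == expected:
        write_replacement()
    else:
        highlight()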
One-Touch Original or Translated Switch The Translator Module is intended to provide transparent real-time translation of the user interface of industrial and similar machines in order to facilitate the easy use of such machines by non-native language operators. Technology development is fast paced and new devices and methods are invented all the time. The inventor of a new method or device coins a new word in his language to refer to this new item. The makers of the Translator Module preferably would attempt to coin a corresponding new word in the target language to best substitute for this new item. Regardless of how formal, enlightened or scholarly the process by which a new word is coined in a language, in the end "the street" rules. The result is that the street and trade name prevails in usage and the original term in the Translator Module may fade out of use. An operator familiar with the street and trade terminology would be confused by the original term used by the Translator Module. Eventually the Translator Module should be downloaded with a software upgrade to reflect the more widely understood
terminology. In the meantime, however, the Translator Module preferably will be provided with a one-touch button, accessible to the operator, that can in real time switch between the unaltered original language screens and the translated output of the Translator Module. The reverse is also true. If an operator encounters a term in his native language that he has never heard before, switching to the original language of the machine sometimes gives clues to the meaning of this new word. The single-touch switch is also very useful when an operator, used to operating the machine in its native language, is sent to train operators that would use the machine in a different language.
Standardization The initial introduction of the Translator Module into the market needs to assume that the manufacturers of the industrial machines will not change or adapt the design of their machines to better accommodate the installation or operation of the Translator Module. As the use of the Translator Module becomes more widespread, one can expect that the manufacturers will be open to some modifications to gain faster time to market with their machines in new target languages. Eventually, the makers of the machines and the makers of the Translator Module may converge on a standard of communication to better serve them and to better serve the end users of the machines. Some of these possible adaptations and their advantages are discussed below.
Integrated Translator Modules The first optimization is for the makers of the CRT or LCD monitors that are installed in the industrial machines to integrate the functionality of the Translator Module within their products. Since the Translator Module hardware is not specific to an input or an output language, and since the hardware part is not specific to a target machine either, it becomes cost-effective to build the hardware functionality of the Translator Module into these monitors. The Translator Module hardware built into these monitors preferably would be downloaded with the information about the target machine and the target language during installation into these machines. Another similar situation arises when the interface between the CRT or LCD monitor and the industrial machine processing is not raster video but a serial, parallel, Ethernet or other interface. Under such arrangements, the video generator that generates the image on
the screen is built into the display. Instead of processing the video image and producing an output video image, the Translator Module now processes this higher-level description of the screen, detects the original language text, replaces it with the target language text, and sends a new high-level description of the screen, in the target language, to the CRT or LCD module. Other than replacing the raster video processing with processing of this higher-level description of the screen, the remainder of the techniques outlined above still apply.
Screen Information Compression Sitting in-line on the video cable, the Translator Module receives a relatively high data bandwidth on its input video cable. The data describes the value of each pixel on the screen, and this full screen image is sent at the screen refresh rate, typically 50 or 60 frames per second. The data volume received is in the millions of bytes each second. The actual information content that the Translator Module distills from this large data volume can fit within a few bytes. The Translator Module only needs to know which of the numerous screens is being displayed, and which of the possible values of the Dynamic Fields is present. For example, the Translator Module does not need the full text of a screen to know which screen is displayed or which values are in the Dynamic Fields. The values of a relatively few pixels are all that it needs to build a full screen output image. This is because the Translator Module already has fairly detailed knowledge about the screen outputs of the machine. In the future, there could be more collaboration between the builders of the industrial machines and the manufacturers of the Translator Modules such that only the information that is minimally needed by the Translator Module would be sent by the machine, instead of the video image. This significantly lowers the data rate being sent from the machine to the Translator Module and also lowers the memory storage and processing power needs in the Translator Module. The Translator Module would then likely be used even for controlling the display for the machine in its native language.
OpenGL-like Interfaces Ultimately, the input to the Translator Module probably will conform to one of the high level, platform-independent graphical description languages. The Translator Module will interpret such input according to the standard graphical language specification. It will then translate any human language content from the original language to the target language and finally will generate the video image at its output video cable.
Such an architecture will allow translation of screen outputs with great flexibility. For example, it can accommodate the translation of animated text messages. This is because standards like OpenGL describe the physical view of the world instead of specifying the pixel-by-pixel raster view. The processing required to turn an OpenGL description of a screen into the raster video signal needed for CRT or LCD monitors is quite sophisticated. Fortunately, and driven by the recent explosion of PC gaming, very affordable dedicated graphics integrated circuits are available on the market that can perform OpenGL rendering effortlessly. The proposed architecture follows the diagram shown in Figure 11. An OpenGL description 1110 of a screen would be sent to the Translator Module 210 from the industrial machine processor, conforming to the OpenGL standard. This data stream is first searched for original language text, and those texts are translated into the target language by process 1120. The OpenGL description, combined with the text messages in the target language, is passed on to the OpenGL rendering engine 1130 within the Translator Module. This graphics rendering engine generates the desired video signal for the CRT or LCD monitor. (The process 1120 and engine 1130 may be implemented with appropriate software running on the processor(s) of the Translator Module and in dedicated integrated circuits, as will be familiar to electronics and computer engineers.) Note that OpenGL is mentioned here only as an example of a high-level graphical description language. Other similar languages could be used. The adoption of such interfaces as OpenGL helps in the off-line automation and on-line processing of the screen images. The adoption also helps enrich the graphics capability of the screen outputs during the design of a new industrial machine. However, once the machine user interface is defined and implemented, the content and context, even when expressed in OpenGL, are still of limited scope, exactly as was the case with raster video. With the structure and content of the expected screens for a given machine downloaded into it, the Translator Module can provide accurate rendering of any screen in the target language without needing to receive the full, detailed OpenGL description of each screen. Identification Marks, Static Fields, Dynamic Field techniques, etc., can be used equally as before, except that with OpenGL-like interfaces the techniques are applied to the OpenGL screen description data stream instead of to the raster video data stream. All the benefits of the Translator Module invention when applied to raster video interfaces carry over to operation in an OpenGL-like configuration.
Software Emulation The operation of the Translator Module can also be achieved in software using a general purpose or special purpose processor or processors. If the processor(s) within the CNC industrial machine is (are) fast enough to perform all of its (their) functions for the machine and still have sufficient processing bandwidth left over, this extra bandwidth can be used to implement the function of the Translator Module within the machine. Most likely, if the function of the Translator Module is performed in software by the machine processor, the Translator Module will be of the type that expects the standardized lower data rate input signal, such as the terminal emulator or OpenGL standard, and likely will not be of the type that processes the raw video signal. Regardless of whether the Translator Module is implemented in hardware or emulated in software, and whether it processes raw raster video or a high-level language, one thing remains common to all of these different implementations. The Translator Module will always need to have knowledge about the screen outputs of the industrial machine that it will be installed within. Such information is needed in order to limit the context of the required translation and hence make the on-line translation feasible within the limited resources of the Translator Module.
On-line Processor Algorithm Below is a step-by-step listing of an algorithm which may be run on the processor of the Translator Module and which performs the On-line Processing. The listing is in pseudocode format.
STEP 1: WAIT UNTIL {a New Frame is captured and stored}
STEP 2: IF {One-Touch button is Pressed} THEN {WRITE OUT the input frame to output frame without processing, GOTO STEP 1} ELSE {GOTO STEP 3}
STEP 3: IF {in Screen Saver Mode} THEN {WRITE OUT internally generated screen, update counter, GOTO STEP 1} ELSE {GOTO STEP 4}
STEP 4: DETERMINE {Input Frame Type}
STEP 5: LOAD {Processing map associated with this identified Frame Type}
STEP 6: COPY {Pass-Through Fields from Input to Output}
STEP 7: WRITE OUT {Static Field Images}
STEP 8: PRE-PROCESS {Highlighted, Flashing and Color Fields}
STEP 9: DETERMINE {Dynamic Field Values}
STEP 10: WRITE OUT {Dynamic Field Images}
STEP 11: DETERMINE {Value of each digit within Numeric Fields}
STEP 12: WRITE OUT {Image of each digit within Numeric Fields}
STEP 13: DETERMINE {Floating Field Content}
STEP 14: WRITE OUT {Floating Field Images}
STEP 15: DETERMINE {Scrolling Field Content}
STEP 16: WRITE OUT {Scrolling Field Images}
STEP 17: POST-PROCESS {Highlighted, Flashing and Color Fields}
STEP 18: GOTO STEP 1
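For readers who prefer a conventional programming language, the same loop can be rendered compactly in Python. Every helper invoked below is a placeholder for machinery described in the step-by-step details that follow; none of these names is drawn from an actual implementation.

def online_loop(module):
    while True:
        frame = module.capture_frame()                     # STEP 1
        if module.one_touch_pressed():                     # STEP 2
            module.write_output(frame)
            continue
        if module.in_screen_saver(frame):                  # STEP 3
            module.write_output(module.next_saver_frame())
            continue
        screen_id = module.identify_frame(frame)           # STEP 4
        fmap = module.load_map(screen_id)                  # STEP 5
        module.copy_pass_through(frame, fmap)              # STEP 6
        module.write_static_images(fmap)                   # STEP 7
        notes = module.preprocess_highlights(frame, fmap)  # STEP 8
        values = module.read_dynamic(frame, fmap)          # STEP 9
        module.write_dynamic_images(values, fmap)          # STEP 10
        digits = module.read_numeric(frame, fmap)          # STEP 11
        module.write_numeric_images(digits, fmap)          # STEP 12
        floating = module.read_floating(frame, fmap)       # STEP 13
        module.write_floating_images(floating, fmap)       # STEP 14
        scrolling = module.read_scrolling(frame, fmap)     # STEP 15
        module.write_scrolling_images(scrolling, fmap)     # STEP 16
        module.postprocess_highlights(notes, fmap)         # STEP 17
        # STEP 18: loop back and wait for the next frame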
Algorithm Details Below is a description of each step of the algorithm outlined above.
STEP 1 In this step, the processor waits until a new frame has been digitized and stored in the Memory and is ready for processing. When the processor gets notification to this effect from the IN DAC block and the controlling circuit, the processor proceeds to STEP 2.
STEP 2 In this step, the processor checks the status of the Original/Translated One-Touch button. This is the button that is added to the control panel of the machine and allows the user, in one touch, to bypass the operation of the Translator Module and display the original frames. If the processor detects that the button is pressed, the processor does not process the frame; it copies the input frame to the output frame area in the Memory and goes to STEP 1 to wait for a new frame. If the button is not pressed, the processor proceeds to STEP 3.
STEP 3 In this step, the processor determines whether the frame indicates that the monitor is in Screen Saver Mode. If in Screen Saver Mode, the processor generates a Screen Saver Mode frame in the target language. When in Screen Saver Mode, the machine outputs a repeating loop of frames that results in simple animation on the screen. The main purpose is to present to the monitor a sequence of varying frames to prevent phosphor burn-in. (Other techniques, discussed above, may be substituted, and Screen Saver Mode can be omitted entirely with some display technologies.) The Translator Module is preloaded with a sequence of frames that results in simple animation on the screen but with target language flavor. Each time a frame is displayed, an internal counter is incremented to indicate to the processor the frame number, from the animated sequence, to display if the next input frame indicates that the machine is still in Screen Saver Mode. After writing out to the output buffer the selected frame from the built-in animation sequence, the processor goes to STEP 1 to wait for another frame. If it is determined in this step that the input frame does not reflect Screen Saver Mode, then the counter is zeroed and the processor proceeds to STEP 4.
STEP 4 In this step, the processor determines the type of the input frame. The processor makes this determination by examining the input frame against the various Identification Marks that are associated with every known frame type. Once a match is found, the frame type is determined and the processor proceeds to STEP 5. During Off-line Processing, the various video screen frames are categorized into a number of types. The categorization is based on the similarity of processing of frames within a given type. Taking the example in Figure 7, the Off-line Processing software established the "SETTING" frame type. Frames of the "SETTING" type all have the same format and also have very similar locations of Static and Dynamic Fields. They also have similar traits with regard to Pass-Through Fields and
Scrolling and Floating Fields. For example, the "SETTING" frame type does not have any Scrolling or Floating Fields. Also, the locations of the Dynamic Fields are the same for all frames belonging to the "SETTING" frame type.
STEP 5 Having identified the frame type, in this step the processor loads the pre-stored map for this type of frame along with the locations, types and Identification Marks for each type of Field within the identified frame type. This loaded map guides the processor in processing the frame. For example, the map identifies the locations and replacement images for all the Static Fields within the frame. The map also lists the locations of all the Dynamic Fields within this type of frame. For each of the Dynamic Fields listed, the map lists the Identification Marks for each possible value within a given Dynamic Field and the replacement image for each value. It also indicates areas on the frame for Numeric, Scrolling and Floating texts along with the data needed to process each Field. The map also indicates Pass-Through areas on the screen if the frame type contains them.
STEP 6 In this step, the processor examines the loaded map for the presence of Pass-Through Fields. For each Pass-Through area on the screen, the processor copies this area from the input area to the output area within the Memory, without processing. As explained earlier, examples of Pass-Through Fields are graphics that are not translated. Bear in mind that even though the Pass-Through Fields are copied verbatim from the input frame to the output frame, their location within the frame may be altered, driven by requirements of the target language.
STEP 7 In this step, the processor writes out the images of all the Static Fields within the identified frame to the output frame buffer. These images were produced during Off-line Processing and loaded into the Translator Module. Each image was created in the target language to replace a specific area on the screen in the original language. When these target language replacement images were produced, by processing the screens in the original language, checksums were computed on the content of each of these areas and the results were stored along with the target language replacement images. Before replacing a Static Field from the input frame with the image in the target language, the area of the Static Field is defined and a checksum on the content of this area in the input frame is computed. This computed checksum is compared with the checksum that was computed in the Off-line Processing when the replacement image was generated. If the two checksums do not match, then the replacement image in the target language may not be an accurate translation of the Field in the original language. If a checksum mismatch occurs, then the processor does not replace the Static Field but instead copies this field from the input frame to the output frame with special highlighting. This indicates to the user that the area with special highlighting is unknown to the Translator Module and could not be translated. If the checksums match, the area of the Static Field is replaced with the pre-stored replacement image. The replacement images preferably are stored in the Translator Module in compressed bit-map format. Therefore, the required image is decompressed just prior to writing it out to the selected area on the output frame buffer. Also note that the replacement image contains the information about what color to use, if applicable.
STEP 8 In this step, the processor detects highlighted and flashing areas on the screen. The processor is aided by the map that was loaded in STEP 5. The map identifies to the processor the areas on the screen that could be highlighted. When the processor detects a highlighted area, it makes a note of it and processes the area so that it is no longer highlighted. In this way, the subsequent processing steps do not have to worry about whether an area is highlighted or not, and all the processing and replacement is done in normal non-highlighted format. In a
later step, the highlighted areas that were noted in this STEP 8 will be highlighted as in the input frame. Flashing Fields are created by alternately highlighting and not highlighting the same area on the screen within subsequent frames. Since the Translator Module processes each new frame independently, one can see that the algorithm for dealing with highlighting will automatically and correctly deal with flashing areas on the screen. For color frames, the size of the frame buffer is much larger than for monochrome. Even though color screens may look more pleasing, the informational content in a color screen is not much more than that of monochrome screens, for industrial machines. To minimize the amount of data to process, the color space of the screen is compressed. For example, text appearing in color can have up to 24 bits of storage for each pixel on the screen. In this step, the color of that text field is noted and the screen is turned into monochrome. This allows the subsequent steps of the algorithm to always assume that the text areas within the input frame appear in monochrome. Once the whole input frame has been processed and the texts have been replaced, a later step takes the information about color that was stored in this STEP 8 and reverts the corresponding replaced areas on the screen from monochrome to the original color.
STEP 9 In this step, the processor, using the map, locates the Dynamic Fields within the input frame. Using the Identification Marks provided in the map for each possible value in a given Dynamic Field, the processor determines the actual values currently present in each Dynamic Field on the screen. The processor stores this information and proceeds to STEP 10.
STEP 10 In this step, the processor uses the information determined in STEP 9 and locates the compressed replacement images for each of the Dynamic Fields. It then decompresses these images and writes them out to their corresponding areas on the output frame. In this step, as for STEP 7, a checksum verification algorithm is performed before any field is replaced to insure that only accurate translation of Dynamic Fields is performed. Just as in STEP 7, areas that do not match the pre-stored checksum of the replacement image are not replaced but are indicated with special highlighting.
STEP 11 In this step, the processor determines the location of each Numeric Field by using the map. The map also indicates the maximum number of digits for each Numeric Field. The processor then uses the Identification Marks for the numerals 0-9 on each digit within a Numeric Field to determine the value present in the field. This information is stored for use in the following step of the algorithm. In case the Numeric Field is not in simple decimal format, the map also indicates which sub-algorithm to use to deduce the decimal digit equivalent, as in the case with Roman numerals.
STEP 12 In this step, the Numeric Field is translated. Using the information deduced in STEP 11, each digit in the original language is replaced by the bitmap image in the target language at the corresponding area on the screen. Checksum verification is also performed.
STEP 13 In this step, the processor identifies the content and vertical justification of text within the Floating Fields on the screen. For each Floating Field, the processor picks the first line within the field and computes the hash function of this line. The processor then uses the hash value to determine the text that belongs to this line. Once the text is identified, the location of this line within the text becomes known and therefore the area within the Floating Field belonging to the same text message becomes determined. The processor then jumps to the area within the same Floating Field that comes after the last line of the identified text and performs the hash function again. The cycle repeats until all the text messages and their vertical locations on the screen are identified.
STEP 14 In this step, the contents of the Floating Fields are replaced with their replacement images in the target language. The images are selected and written out to the output frame at the correct vertical justification using the information that was deduced in Step 13. Checksum processing, like that in previous steps, is also performed to ensure accuracy. The difference here is that the checksum comparison is made on a line-by-line basis rather than on the full replacement Field. This is because the vertical justification is variable, so the pre-computed checksums need to be provided on a line-by-line basis.
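The line-by-line checksum comparison described above might look like the following sketch, using the same assumed 16-bit additive checksum as in the earlier sketches.

    # Illustrative sketch of Step 14's per-line verification: because the
    # vertical position varies, checksums are compared per line rather
    # than over the full field.

    def verify_lines(frame, top, left, width, line_h, expected_checksums):
        """expected_checksums: pre-computed 16-bit checksum per line."""
        for i, expected in enumerate(expected_checksums):
            y = top + i * line_h
            line = [frame[y + dy][left + x]
                    for dy in range(line_h) for x in range(width)]
            if sum(line) & 0xFFFF != expected:
                return False  # mismatching line gets special highlighting
        return True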
STEP 15 In this step, the processor identifies the content and vertical justification of text within the Scrolling Fields on the input frame. The processing is identical to that in Step 13, except that only one line is examined within a given Scrolling Field.
STEP 16 In this step, the contents of the Scrolling Fields are replaced with their replacement images in the target language. The images are selected and written out to the output frame at the correct vertical justification using the information that was deduced in Step 15. Checksum processing, like that in Step 14, is also performed to ensure accuracy.
STEP 17 At this stage in the processing, the input frame has been fully translated. The processing was carried out in monochrome. In this step, the information detected and noted in Step 8 is used to redraw the replaced text with the desired state of highlighting, or with the desired color, reflecting the information present in the input frame.
STEP 18 This is the last step of processing. At this stage, the processor has fully produced the output frame and the hardware will start to output the pixels to the Video Out signal. Having completed the processing of this frame, the processor jumps to Step 1 in the algorithm and waits for the arrival of a new frame to start the algorithm again.
Some notable optimizations of the above-outlined algorithm are discussed in the next section.
Generalization of Checksum, Identification Marks and Hashing In the discussion above, checksum computation was used extensively to ensure that the replacement text reflects an accurate translation of the original text. The main property of the checksum function, and the reason it is used, is that checksums are easy to compute and the computed result can readily detect a change between the predicted and the actual original text. The checksum function is not unique in this property, and a number of other functions can be used for the same purpose; these are well documented in the literature on Error Detecting and Correcting Codes. An example of a reference is the book titled "Error Control Coding: Fundamentals and Applications", ISBN 013283796X. The hash function technique mentioned earlier, on the other hand, was used because hash functions are designed to be easy to compute and to produce a number that can be used to look up an entry from a list of numerous possible entries. Examples of hash functions can be found in the book "Introduction to Algorithms", ISBN 0262032937. The algorithm of Steps 1-18 above used Identification Marks and hash function values to determine the original language text presented on the screen. Once the text was identified, a checksum was computed to ensure that the determination of the original text was accurate. Instead of treating the identification and verification as two separate operations, one function may be used to carry out both the determination and the verification. Consider the Dynamic Field processing, for example. The Translator Module first performs the checksum computation on this field and then uses the result of this computation to look for the correct replacement image from among the possible candidates. The Translator Module looks up the replacement image by comparing the computed checksum to the pre-computed checksums that were stored with each replacement image for this Dynamic Field. A match between the computed checksum and one of the pre-computed checksums associated with a possible replacement image for this Dynamic Field accomplishes both detection and verification in one step. On the rare occasions that more than one possible value of a given Dynamic Field shares an identical checksum, the ambiguity about the input text is resolved by resorting to the Identification Mark or hash function techniques that were outlined above.
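A sketch of this combined lookup-and-verify operation follows; the 16-bit additive checksum and the table layout are assumptions for illustration.

    # Illustrative sketch of the combined optimization: one checksum
    # computation both selects the replacement image and verifies the
    # original text.

    def checksum16(pixels):
        return sum(pixels) & 0xFFFF

    def lookup_and_verify(region, candidates):
        """candidates: pre-computed checksum -> replacement image.
        Returns the replacement image, or None if nothing matches."""
        return candidates.get(checksum16(region))

In the rare event that two candidate values share a checksum, the candidates table would map that checksum to a sentinel, and the module would fall back to the Identification Mark or hash function techniques to disambiguate, as described above.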
Off-line Preprocessing So far, the function and implementation of the Translator Module have been described. As mentioned earlier, in order for the Translator Module to perform the on-line processing of the screen output to generate screens in the target language, it needs a great deal of information about the screen outputs of the target industrial machine. This information is generated off-line, once for each new industrial machine model. The information is loaded into the Translator Module during installation or manufacturing and can be updated in the field if needed.
The following sections describe in more detail how this information may be generated, and describe one method by which the generation of this information may be automated. The following sections are more specific to a hardware implementation of the Translator Module that accepts a raster video signal at its input and generates a video signal in the target language at its output. Extension to other implementations should become evident.
Required Information by the Translator Module In order to perform its function, the Translator Module needs to know substantially the following information:
1- Number of different screen types expected.
2- Identification Mark locations and values for each different screen type.
3- Location and size of Static Fields within each screen type.
4- The replacement image in the target language for the Static Fields within each screen.
5- The checksum value for the Static Fields for each screen.
6- Number and locations of the Dynamic Fields within each screen.
7- Identification Marks for each of the values within Dynamic Fields for each screen.
8- Replacement image in the target language for each value of the Dynamic Fields within each screen.
9- Checksum value for each of the values of the Dynamic Fields within each screen.
10- Alternate Identification Marks for Static and Dynamic Fields if fields can appear highlighted.
11- Location and dimensions of Pass-Through Field types within each screen.
12- Location and value of Color Identification Marks for each screen and dynamic value, if applicable.
13- Floating and Scrolling Field locations, line-by-line hash values, line locations within the text body, and line-by-line checksums.
14- Location and format of Numeric Fields, Identification Marks for the numerals 0-9, and the image of each numeral in the target language.
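One hypothetical way to organize this information in software is sketched below; every type and field name is an illustrative assumption rather than a required layout.

    # Hypothetical per-screen database for the Translator Module.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    Rect = Tuple[int, int, int, int]  # top, left, height, width

    @dataclass
    class StaticField:
        rect: Rect
        replacement_image: bytes      # compressed target-language bitmap
        checksum: int
        # Alternate marks used when the field can appear highlighted.
        highlighted_marks: Dict[int, int] = field(default_factory=dict)

    @dataclass
    class DynamicField:
        rect: Rect
        # value -> (identification marks, checksum, replacement image)
        values: Dict[str, Tuple[Dict[int, int], int, bytes]] = \
            field(default_factory=dict)

    @dataclass
    class ScreenType:
        name: str
        identification_marks: Dict[int, int]  # pixel offset -> value
        static_fields: List[StaticField] = field(default_factory=list)
        dynamic_fields: List[DynamicField] = field(default_factory=list)
        pass_through: List[Rect] = field(default_factory=list)
        # Floating/Scrolling, Numeric and color data would be added
        # similarly, per items 12-14 of the list above.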
Below, the off-line methods used to produce this information will be described. The first method assumes little or no cooperation from the industrial machine manufacturer. This is called the Third Party Method. A second method improves on the first method but requires
some cooperation on the part of the industrial machine manufacturer. This is called the Cooperative Method. Regardless of the method used, the offline information extraction encompasses two activities. The first is screen format extraction, while the second is textual content extraction. Screen format extraction builds a database that indicates the locations at which each of the various Static, Dynamic, Numeric, Pass-Through, Floating and Scrolling Fields resides within each screen or related class of screens. Textual extraction involves determining the possible text messages that can occupy each Field. It is worth noting that even for the Cooperative Method, where the manufacturer of the machine is willing to fully cooperate with the translation effort, the screen format logic is generally embedded throughout the control program of the machine and is not available in a conveniently exportable format. So the step of screen format extraction remains the same for both methods.
Third Party Method In this method an electronic frame digitizing board is installed in-line with the video cable of the machine, in the same place as the Translator Module is installed. The electronic digitizing board is also connected to a Personal Computer. The arrangement is shown in Figure 12. The digitizer is used to capture the video images on their way to the monitor and send them to the external Personal Computer. The Personal Computer is used to store the pixel-by-pixel image values into a bit-map file. Each different frame capture is given a different file name. Some video standards provide a clock that aids the digitizer in synchronously digitizing the pixels on the screen. Other, more prevalent video standards do not provide this clock. In case the clock is not provided, the digitizer uses the Vertical or Horizontal Sync signal as a reference clock for a Phase-Locked Loop (PLL). Using the reference signal, the PLL generates the synchronized clock needed to digitize the pixels. The technique of using a PLL to generate the proper digitizing clock is well known in the electronic engineering and video capture community. An operator of the machine runs the machine through all the screen types and cycles through all possible values of options and parameters by means of key presses and selections. The digitizer board captures each frame and sends it to the external Personal
Computer. The Personal Computer stores each frame for later processing. The digitizer board continuously compares each screen at its input with the last screen. If a screen differs from the previous one, then the new screen is sent to the external PC. If the screen is identical to the previous screen, it is not sent to the PC. If the digitizer module detects a difference for every new input frame, then the video contains animated graphics. Under such conditions, the digitizer stops the automatic PC dumping operation and reverts to a manual mode in which the user clicks for each frame dump needed. The digitizer can resume automatic dumping when this condition disappears.
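The dump policy just described might be sketched as follows; the callables are assumed stand-ins for the board's actual capture and transfer operations, and the threshold for declaring animated graphics is an assumption.

    # Illustrative sketch of the digitizer's dump policy: send a frame to
    # the PC only when it differs from the previous one, and fall back to
    # manual mode if every frame differs (animated graphics).

    def capture_loop(grab_frame, send_to_pc, animation_window=30):
        prev = None
        consecutive_changes = 0
        manual_mode = False
        while True:
            frame = grab_frame()
            if frame == prev:
                consecutive_changes = 0
                manual_mode = False   # animation stopped; resume auto dump
                continue
            consecutive_changes += 1
            if consecutive_changes >= animation_window:
                manual_mode = True    # every frame differs: animation
            if not manual_mode:       # in manual mode the user clicks
                send_to_pc(frame)     # for each frame dump instead
            prev = frame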
After capturing all of the variations of screens, the Personal Computer operator runs specially developed software that simplifies extracting the needed information during this offline processing. The operator imports a representative screen from each class of the digitized frames. This screen is then redrawn on the Personal Computer screen. Using a mouse and field-type icons, the operator colors each Field on the appearing screen with a color that corresponds to the type of the field. For example, if Static Fields are assigned the color yellow, then the operator clicks on the Static Field icon and identifies Static Fields by drawing a rectangle around each one. The software then marks the selected areas on the screen as Static Fields and automatically enters the text within each into a database to be translated into the target language at a later stage. Similarly, the operator examines each frame and identifies to the software the Pass-Through Fields within each of the screens. The operator also identifies the Dynamic Fields, Floating or Scrolling Fields, as well as the Numeric Fields. This operation identifies the screen format logic for this class of screens. This format extraction is applicable to a number of the digitized screens and as such need not be repeated for all of the digitized screens belonging to the same screen type. For example, a screen may have been digitized more than once to capture the various possible values of the Dynamic Fields within it. Therefore, identifying the location and types of the Fields within this screen need be done only once for all the digitized instances of the same screen, despite the differing text content in the Dynamic Fields of the different frame capture files. The operator then labels each screen with a name. Screens differing only in their Dynamic Fields but having identical Static Fields are all labeled with the same name. With all the screens stored and labeled, the computer software is now in a position to deduce the locations of Static Fields and the locations of Dynamic Fields. It can also list the various possible values of the Dynamic Fields. The software is also now in a position to compute all of the required Identification Marks and checksum values. To compute the checksums of the lines in the Floating or Scrolling Fields, the operator has at least two options. The first option is to attempt to scroll through all of the possible texts that can be output from the machine. This is tedious and prone to error, but in some cases it is the only available method. The second option involves cooperation from the machine manufacturer. The manufacturer would make available to a developer of the Translator Module for this manufacturer's machine all of the possible texts that can appear in the Floating or Scrolling Fields on the screen. This information can be loaded into the external Personal Computer, where the information that is needed by the Translator Module for on-line processing is extracted. Some text, such as ALARM messages, can be found in the publicly available manuals of the machines. Either way, once the possible contents of these fields are known, the software can compute the line-by-line checksum and the line-by-line hash function value for each line of this text. It also determines the vertical spacing of each contiguous text message in order to correctly determine the placement of the replacement text and to aid in the processing of Floating and Scrolling Fields as mentioned in the On-line Processing section above. Given a hint about how the target machine highlights a text field, the software can automatically determine places of possible text highlighting and produce the required information needed by the Translator Module. Color information extraction can be performed in the same way.
Text Translation In addition to the off-line operations already mentioned, one important operation remains. That operation is the generation of the compressed images of the replacement text in the target language. This operation is broken down into three high-level steps. The first step is the extraction of the original language text from the captured screen images. The second is to translate these text messages into the target language. The third step is to take the text messages in the target language and produce compressed bitmap images of them to be loaded into the Translator Module.
Original Language Text and Font Extraction Extraction of the text from the captured images can use any number of character recognition techniques, including those available in a multitude of commercial software packages. However, the character recognition needed here is considerably simpler. This is because the digitized screen images being processed are perfectly accurate, containing no scanning errors, and the alignment of the captured screen is also perfect. In contrast to this perfection, optical scans of text always have some scanning noise in them and are seldom perfectly aligned. Due to errors in scanning and alignment, two separate optical scans of the same text image always differ in a number of pixel locations. Since the digitizer, and likewise the Translator Module, synchronously process the electronic image from the video cable, any two snapshots of the same output screen will produce identical data. This reduces the character recognition to a straightforward pattern matching operation. First, the font of the original language that is used in the machine is determined. This operation can be done manually, semi-manually, or completely automatically by techniques well known in the art. Once the font has been determined, the software is given some hints about the positions of characters on the screen. For example, one machine may display text as 80 columns on 24 lines. Armed with this information, and knowing the locations and dimensions of each of the Static, Dynamic, Numeric, Pass-Through, Floating and Scrolling Fields, the software can extract the screen text messages and store them in a word processing file by examining the digitized images of the machine screens that were captured earlier. It is possible that the word processing file of text in the original language will be incomplete. For example, it may have some but not all of the possible error messages that the machine can produce. In collaboration with the manufacturer of the machine, the missing text messages are merged into the original language word processing file, either manually or automatically, before proceeding to translate it. Bear in mind that even in the event that the text files cannot be obtained from the manufacturer, translating the majority of the text that the machine operator encounters on a daily basis still yields a highly useful product. The fact that some infrequent messages, such as alarms, might appear in the original language, while imperfect, is still very acceptable to machine operators.
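Under the assumed 80-column by 24-line character grid, the exact-match recognition described above might be sketched as follows; the cell dimensions and font-table layout are illustrative assumptions.

    # Illustrative sketch of the simplified, noise-free recognition:
    # because two captures of the same screen are pixel-identical, each
    # character cell can be matched exactly against the extracted font.

    def extract_text(frame, font, cols=80, rows=24, cell_w=8, cell_h=16):
        """frame: list of pixel rows.
        font: {glyph bitmap (flat tuple of pixels) -> character}."""
        lines = []
        for r in range(rows):
            chars = []
            for c in range(cols):
                cell = tuple(frame[r * cell_h + y][c * cell_w + x]
                             for y in range(cell_h) for x in range(cell_w))
                # Exact lookup: no fuzzy matching is needed for
                # synchronously digitized frames.
                chars.append(font.get(cell, '?'))
            lines.append(''.join(chars))
        return lines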
Mapping into the Target Language Once the text is extracted and stored in the word processing file, manual, automatic or semi-automatic translation is performed and the new text in the target language is entered into a target language word processing file. Automated human language translation software has greatly advanced in features over the last few years. However, even given today's state of the art in automatic translation, it is still advisable for a competent human translator to review the translation of automated translators before accepting it as correct.
Generating Target Language Compressed Replacement Images The word processing file in the target language is then processed by the software to produce the compressed images of each of the text phrases in the target language. The images are also tagged with the corresponding situation under which they should be injected into the output screen, such as in what screen, at what location and replacing original text carrying which Identification Marks.
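A hypothetical record for one such tagged, compressed replacement image is sketched below; zlib stands in for whatever compression the Translator Module actually uses, and all names are assumptions for illustration.

    # Illustrative record tying a compressed replacement image to the
    # context under which it is injected into the output screen.

    import zlib
    from dataclasses import dataclass
    from typing import Dict, Tuple

    @dataclass
    class ReplacementImage:
        screen_name: str                 # in what screen it is used
        rect: Tuple[int, int, int, int]  # at what location it is written
        marks: Dict[int, int]            # Identification Marks of the
                                         # original text it replaces
        data: bytes                      # compressed target-language bitmap

    def compress_bitmap(pixels: bytes) -> bytes:
        # zlib stands in for whatever compression is actually used.
        return zlib.compress(pixels)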
Cooperative Method The Cooperative Method differs from the Third Party Method in the degree of machine manufacturer cooperation and support. In the Third Party Method, reliance on the cooperation of the industrial machine manufacturer was minimal. Hence it was necessary to extract the required information about the structure and logic of each output screen by analyzing a number of digitized screen snapshots of the machine. A cooperative manufacturer can provide the Translator Module manufacturer with all the necessary information in electronic format. This reduces the need for the digitization and analysis of screens to be exhaustive. It also results in more robust operation. As mentioned earlier, this option is not always available, hence the need for the techniques outlined in the Third Party Method. In the summary above and in the various embodiments discussed, only certain combinations of elements are shown. It should be understood, however, that the invention encompasses other combinations of the disclosed elements even if a combination is not expressly identified. Having thus presented the invention, its operating principles and examples of possible embodiments for practicing it, it will be apparent that one could not set forth all possible embodiments that will readily occur to those skilled in the art. It is thus contemplated that those skilled in the art will readily devise variations and alterations of the disclosed embodiments and that they will also readily conceive entirely new embodiments. Accordingly, the invention is not intended to be limited to the disclosed embodiments, but rather to encompass a considerable range of alternatives, being limited not by the foregoing disclosure but only as required by the claims appended hereto. What is claimed is: