
WO2014045304A2 - Method and apparatus for designing vision based software applications - Google Patents


Info

Publication number
WO2014045304A2
Authority
WO
WIPO (PCT)
Prior art keywords
based software
vision based
software application
media objects
vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2013/000548
Other languages
French (fr)
Other versions
WO2014045304A3 (en)
Inventor
Vinay G VAIDYA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KPIT Technologies Ltd
Original Assignee
KPIT Cummins Infosystems Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by KPIT Cummins Infosystems Ltd filed Critical KPIT Cummins Infosystems Ltd
Priority to US14/426,976 (US9858165B2)
Publication of WO2014045304A2
Publication of WO2014045304A3
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/35 Creation or generation of source code model driven
    • G06F 8/355 Round-trip engineering


Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Description

METHOD AND APPARATUS FOR DESIGNING VISION BASED SOFTWARE APPLICATIONS

RELATED APPLICATION

Benefit is claimed to Indian Provisional Application No. 2620/U/2012, titled "A SYSTEM AND METHOD FOR PERFORMANCE CHARACTERIZATION", by KPIT Cummins Infosystems Private Limited, filed on 10th September 2012, which is herein incorporated in its entirety by reference for all purposes.

FIELD OF THE INVENTION
The present invention generally relates to the field of computer vision, and more particularly to designing vision based software applications.

BACKGROUND OF THE INVENTION
A software application is designed to perform a specific task and is expected to perform that task with maximum accuracy. If the input data for the software application were perfect, the output of the software application would be accurate. In practice, however, the input data has random variations and imperfections. Any software application deployed in a real life environment is subject to variations in environmental parameters, and such variations may severely impact the performance of the software application. For example, environmental parameters such as rain, snow, fog, dust, low light and camera parameters can affect a pedestrian detection software application.

Prior to deployment, a software application needs to be tested to check whether it meets its requirements. However, testing a software application is a time consuming and cumbersome activity. The current test environment can test the software application with standard input data, which differs from the input data in a real life environment. Consequently, a software application that has passed the test in the testing environment may not necessarily perform accurately when deployed in a real life environment.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
Figure 1 is a block diagram illustrating an exemplary computing device configured for designing a robust vision based software application, according to one embodiment.
Figure 2 is a process flowchart illustrating an exemplary method of designing a robust vision based software application, according to one embodiment.
Figure 3 is a schematic representation depicting a characterization graph of accuracy of a vision based software application versus the set of parameters.

Figure 4 is a flow diagram illustrating a process of evaluating performance of various modules of the vision based software application.

Figure 5 is a process flowchart illustrating an exemplary method of evaluating performance of a vision based software application, according to another embodiment.

Figure 6 is a process flowchart illustrating an exemplary method of determining a vision based software application based on optimal performance, according to yet another embodiment.

Figure 7 is a schematic representation depicting a characterization graph of accuracy versus set of parameters for a first vision based software application and a second vision based software application.
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.

DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method and system for designing vision based software applications. In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
Figure 1 is a block diagram illustrating an exemplary computing device 102 configured for designing a robust vision based software application, according to one embodiment. The computing device 102 includes a processor 104, a memory unit 106, a storage unit 114, an output device 116 and a communication interface 118 coupled to a bus 120.
The processor 104 may be configured to implement functionality and/or process instructions for execution within the computing device 102. The processor 104 may be capable of processing instructions stored in the memory unit 106 or instructions stored on the storage unit 114. The processor 104 may include any one or more of a processor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or equivalent discrete or integrated logic circuitry. Additionally, the functions attributed to the processor 104, in this disclosure, may be embodied as software, firmware, hardware or any combination thereof.
The memory unit 106 may be configured to store information within the computing device 102 during operation. The memory unit 106 may, in some examples, be described as a computer-readable storage medium. The memory unit 106 may be described as a volatile memory, meaning that the memory does not maintain stored contents when the computing device 102 is turned off. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories known in the art. In some examples, the memory unit 106 may be used to store program instructions for execution by the processor 104.
For example, the memory unit 106 includes a media object generation module 108, a performance evaluation module 110, an output module 112 and an application re-designing module 126 stored in the form of program instructions for execution by the processor 104. The media object generation module 108, the performance evaluation module 110, the output module 112 and/or the application re-designing module 126 can be implemented as software, hardware, or some combination of software and hardware. For example, these modules could be implemented as part of an application specific integrated circuit (ASIC). The memory unit 106 also contains one or more vision based software applications 124 whose performance is to be evaluated under real life scenarios and environmental conditions.

According to one embodiment, the media object generation module 108 is configured for automatically generating a plurality of media objects (images or videos) from input media content by applying different values of a set of parameters to the input media content (pre-stored image(s)/video or streaming video). It can be noted that the plurality of media objects contains information representing distinct real life scenarios and environmental conditions. For example, the distinct real life scenarios and environmental conditions may include, but are not limited to, different lighting conditions, different times of the day and night, different weather conditions, and various naturally occurring disturbances (e.g., radiation, magnetic fields, etc.). In one embodiment, the performance evaluation module 110 is configured for processing each of the plurality of media objects using modules of the vision based software application 124 stored in the memory unit 106 and evaluating the performance of each of the modules of the vision based software application 124 based on the processing of the plurality of media objects by the respective modules. The output module 112 is configured for outputting the evaluated performance of the modules of the vision based software application. The application re-designing module 126 is configured for redesigning one or more of the modules of the vision based software application based on the evaluated performance of the modules such that the performance of the vision based software application becomes optimal for all the distinct real life scenarios and environmental conditions, thereby making the vision based software application robust against all the distinct real life scenarios and environmental conditions.
In another embodiment, the computing device 102 is configured for evaluating the performance of a plurality of vision based software applications 124. In such a case, the performance evaluation module 110 is configured for processing each of the plurality of media objects using each of the vision based software applications 124 stored in the memory unit 106 and evaluating the performance of each of the vision based software applications 124 based on the processing of the plurality of media objects by the respective vision based software applications 124. The performance evaluation module 110 is further configured for determining a vision based software application whose performance in all the distinct real life scenarios and environmental conditions is evaluated as optimal among the plurality of vision based software applications 124. The output module 112 is configured for outputting the identified vision based software application on a display of the computing device 102. The detailed functionalities of the modules 108, 110, 112 and 126 are explained in the descriptions of Figures 2, 5 and 6.
The storage unit 114 may include one or more computer-readable storage media. The storage unit 114 may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the storage unit 114 may, in some examples, be considered a non-transitory storage medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term "non-transitory" should not be interpreted to mean that the storage unit 114 is non-movable. In some examples, the storage unit 114 may be configured to store larger amounts of information than the memory unit 106. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache). As shown, the storage unit 114 includes the media content 122, which may be an image or video. Alternatively, the media content 122 may be streaming video, in which case the video is streamed over the Internet.
The communication interface 118 may be used to transmit or receive instructions over a network using the bus 120 coupled to the communication interface 118. The communication interface 118 may be a part of the processor 104 or may be a separate component. The communication interface 118 may be created in software or may be a physical connection in hardware. The communication interface 118 may be configured to connect with a network, external media, a display or any other component in the system, or combinations thereof.

Figure 2 is a process flowchart 200 illustrating an exemplary method of designing a robust vision based software application, according to one embodiment. At step 202, multiple media objects are automatically generated from input media content by applying different values of a set of parameters to the input media content. For example, the media content can be pre-stored image(s)/video or streaming video. The set of parameters may include noise, brightness, contrast, blurriness and the like. In real time, these parameters severely affect the quality of the image/video captured by a camera fitted on an automobile. By varying the values of the set of parameters, multiple media objects containing information representing different real life scenarios and environmental conditions (e.g., rain, snowfall, fog, dust, low light, etc.) are generated from the input media content (standard video/image). For example, the input media content is synthetically changed to obtain multiple media objects by adding noise of different distributions (e.g., Gaussian, salt and pepper, uniform, and speckle), changing brightness and contrast, and introducing motion blur. For example, in case the input media content is an RGB image, parameters such as contrast can be varied by changing the RGB values of the pixels of the image using the following equations:
Rout = (Rin − 0.5) * tan((contrast + 1) * (π/4)) + 0.5
Gout = (Gin − 0.5) * tan((contrast + 1) * (π/4)) + 0.5
Bout = (Bin − 0.5) * tan((contrast + 1) * (π/4)) + 0.5

where Rin, Gin and Bin represent the input intensity values of the R, G and B channels, and Rout, Gout and Bout represent the intensity values of R, G and B after varying the contrast. For example, the contrast in the RGB image can be varied from −0.75 to +0.75.
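As an illustration of how these equations can generate media objects, the following Python sketch applies them with NumPy. The function name, the normalization of intensities to [0, 1] and the stand-in input image are assumptions for illustration, not details taken from the patent.

```python
import numpy as np

def vary_contrast(rgb, contrast):
    """Apply the contrast equations above to an RGB image.

    rgb: uint8 array of shape (H, W, 3); contrast: value in [-0.75, 0.75].
    Each channel is transformed as out = (in - 0.5) * tan((contrast + 1) * pi/4) + 0.5
    on intensities normalized to [0, 1]; contrast = 0 leaves the image unchanged.
    """
    x = rgb.astype(np.float64) / 255.0                # normalize (assumed convention)
    gain = np.tan((contrast + 1.0) * (np.pi / 4.0))   # same gain for R, G and B
    out = (x - 0.5) * gain + 0.5
    return (np.clip(out, 0.0, 1.0) * 255.0).astype(np.uint8)

# Sweep the parameter to generate multiple media objects from one input image.
image = np.zeros((480, 640, 3), dtype=np.uint8)       # stand-in for media content 122
media_objects = [vary_contrast(image, c) for c in np.linspace(-0.75, 0.75, 7)]
```

Analogous transforms (additive Gaussian or salt-and-pepper noise, brightness shifts, motion blur kernels) would produce the other degradations named above.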
In some embodiments, various models associated with specific real life scenarios and environmental conditions are pre-stored in a database (e.g., the storage unit 114 of Figure 1). In these embodiments, when a particular model is selected by a user via a graphical user interface of the computing device 102, values of parameters such as noise, blurriness, brightness and contrast corresponding to the model are applied to the standard media content to obtain the real life scenario and environmental conditions for which the performance of the vision based software application is to be characterized. For example, when a model for rain and/or snow conditions with variable density, orientation and length of rain and size of snowflakes is selected, the computing device 102 may synthetically change the media content to obtain rain and/or snow like conditions with the corresponding density, orientation and length of rain and size of snowflakes. In this manner, any media object depicting a real life scenario and environmental condition is obtained. Hence, the need and effort of capturing images/videos of real life scenarios and environmental conditions, or of evaluating the performance of a vision based software application in a road test, is eliminated, thereby saving the time and cost involved in capturing real life images/videos.

At step 204, each of the multiple media objects is processed using various modules of a vision based software application. In a preferred embodiment, the multiple media objects are iteratively run through the modules during processing. The vision based software application is a software program, developed for use in various domains, whose performance needs to be evaluated under different real life scenarios and environmental conditions in the test environment. For example, a vision based software application developed for use in automobiles may be a driver assistance software application. The vision based software application may behave differently under different scenarios and different environmental conditions. Since each of the multiple media objects represents a different real life scenario and environmental condition, processing of the media objects by the various modules of the vision based software application assists in performance characterization and, hence, evaluation of the vision based software application. One skilled in the art would understand that the vision based software application processes the media objects in the test environment in a manner similar to the processing of media objects when deployed in advanced driver safety systems mounted in automobiles.

At step 206, the performance of each of the modules of the vision based software application against each real life scenario and environmental condition is evaluated based on the processing of each of the input media objects. In some embodiments, a characterization graph of the accuracy of each module of the vision based software application versus the set of parameters is plotted based on the processing of each of the multiple media objects. An exemplary characterization graph for a vision based software application is shown in Figure 3. Typically, the accuracy of each module of the vision based application depends on the number of true detections and the number of false detections obtained after processing each of the multiple media objects by the respective modules.
For example, for a pedestrian detection vision based software application, the term 'true detection' refers to the number of objects in a media object (image/video) that are accurately identified as pedestrians by the vision based software application, while the term 'false detection' refers to the number of objects in the media object that are erroneously identified as pedestrians by the vision based software application during processing of the media objects. In order to determine whether the vision based software application performs optimally in all the distinct real life scenarios and environmental conditions, the ratio of false rejections to true detections outputted by each module is plotted against the variation in the set of parameter values (e.g., brightness, noise, blurriness, contrast) of the corresponding media objects in the characterization graph. Thus, the characterization graph provides a precise idea of the operating region of the vision based software application with respect to the values of the pre-defined set of parameters.
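A minimal sketch of how such a characterization curve could be computed and plotted follows. The detector callable, the degradation callable (such as the vary_contrast sketch above), the known ground-truth count and the crude split of reported detections into true and false are all assumed interfaces, not part of the patent.

```python
import matplotlib.pyplot as plt

def characterize(detect, degrade, media, truth_count, values):
    """Sweep one parameter and plot a characterization curve of accuracy.

    detect(media) -> number of reported detections (assumed interface);
    degrade(media, v) -> media object for parameter value v;
    truth_count -> number of objects actually present in the media.
    """
    accuracy = []
    for v in values:
        reported = detect(degrade(media, v))
        true_det = min(reported, truth_count)   # crude true/false split (assumption)
        accuracy.append(true_det / truth_count)
    plt.plot(values, accuracy)
    plt.xlabel('parameter value (e.g., contrast)')
    plt.ylabel('accuracy (true detections / objects present)')
    plt.show()
```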
At step 208, one or more modules whose accuracy is below a predetermined threshold are identified from the characterization graph. Here, the accuracy refers to the number of elements detected in the distinct real life scenarios and environmental conditions, and the predetermined threshold is the minimum number of elements desired to be detected by each of the modules in those scenarios and conditions.

At step 210, the one or more identified modules are re-designed so that the vision based software application performs optimally in all the distinct real life scenarios and environmental conditions. In one exemplary implementation, the software code logic and the values of code parameters associated with the one or more modules are modified until the accuracy of the respective modules becomes equal to or greater than the predetermined threshold. The one or more modules are re-designed such that the performance of the vision based software application becomes optimal for all the distinct real life scenarios and environmental conditions, thereby making the vision based software application robust. A robust vision based software application is a software application providing optimal output in distinct real life scenarios and environmental conditions.

Figure 3 is a schematic representation depicting a characterization graph 300 of the accuracy of a vision based software application versus the set of parameters. In Figure 3, the X axis represents the number of objects detected and the Y axis represents the range of values of the parameters. The number of objects detected varies from 0 to 1000, and the parameter values are varied over a certain range. It can be seen from the characterization graph 300 that the performance of the vision based software application varies and does not remain constant over the range of parameters. The desired output of the vision based software application (i.e., the number of objects detected in the image/video after processing) needs to be determined for a robust vision based software application. In the instant example, the number of objects desired to be detected by the vision based software application is 900. It is important to determine whether the vision based software application is robust over the range of parameter values (−0.5 to +0.5). For example, the performance of the various modules of the vision based software application needs to be evaluated in distinct real life scenarios and environmental conditions, and then one or more modules are to be re-designed based on the evaluation of the performance of the modules, as illustrated in Figure 4.
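Read as pseudocode, step 208 reduces to a threshold test over per-module accuracies. The module names and counts below are a made-up example (they reuse the figures from the Figure 4 walkthrough that follows), not data from the patent.

```python
def modules_below_threshold(module_accuracy, threshold):
    """Step 208, sketched: flag modules whose accuracy (number of elements
    detected) falls below the predetermined threshold."""
    return [name for name, acc in module_accuracy.items() if acc < threshold]

# Hypothetical per-module detection counts, with a desired minimum of 900.
accuracy = {'M1': 995, 'M2': 940, 'Mk': 600, 'Mn': 500}
print(modules_below_threshold(accuracy, 900))  # -> ['Mk', 'Mn']
```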
Figure 4 is a flow diagram 400 illustrating a process of evaluating the performance of various modules of the vision based software application. Typically, a vision based software application consists of a number of modules for carrying out different tasks and functions to achieve the desired output (to detect objects in the image(s)/video) with maximum accuracy. For example, a typical image processing and pedestrian detection application includes an image enhancement module, a scene recognition module, a segmentation module, a classification module and a tracking module, and the objects to be detected in the image/video are pedestrians. It is to be noted that the desired output, and hence the desired objects to be identified in the image/video, would vary based on the type of vision based software application.
Consider that the vision based software application consists of 'N' modules, as shown in Figure 4. A media object (image/video) corresponding to a real life scenario and environmental condition is processed through the 'N' modules of the vision based software application. Consider that the number of objects detected (visible to the naked eye) in an input image/video is 1000. When the input media object is run through the module M1, the number of objects detected in the input image/video drops to 995; thus, the accuracy of the module M1 corresponds to this drop, i.e., 995 objects detected. The output of the module M1 is then given as an input to the module M2. At the output of the module M2, the number of objects detected further drops to 940, and so on up to the module Mk. At the input of the module Mk, the number of objects detected is 900, while at the output of the module Mk, the number of objects detected drops to 600.
By the time the input file has been run through all the modules, at the final output of the module Mn, the number of objects detected is reduced to 500. As can be seen, the maximum drop in the number of detected objects occurs across the module Mk. Hence, the module Mk is identified as the target module and needs to be redesigned to reduce the drop in the number of objects detected in the image/video, thus increasing the accuracy and robustness of the vision based software application.
Accordingly, the module Mk is redesigned by modifying the software code logic and the values of the code parameters of the module Mk, and the whole evaluation process of Figure 4 is rerun. This process is repeated until the drop in the number of objects detected before and after processing the image/video by the module is minimized. Thus, the evaluation process of Figure 4 identifies one or more modules for which the number of objects detected decreases significantly (beyond a certain set threshold), leading to a decrease in the accuracy and robustness of the vision based software application. All such modules are then redesigned until the drop is reduced to as little as possible (i.e., below a pre-determined threshold). When the drop in accuracy across all the identified modules is minimized, the vision based software application becomes robust. It is to be noted that the pre-determined threshold may be varied based on the nature of the vision based software application.
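The Figure 4 walkthrough suggests the following sketch. The ordered list of module callables and the counter helper (something that counts the objects still detectable in a module's output) are assumed interfaces invented for illustration.

```python
def find_weakest_module(modules, media_object, counter):
    """Run a media object through the pipeline M1..Mn and return the index of
    the module with the largest drop in detected objects.

    modules: ordered list of callables, each feeding its output to the next;
    counter(x) -> number of objects detectable in x (assumed helper).
    """
    x = media_object
    counts = [counter(x)]                  # e.g., 1000 for the raw input
    for module in modules:
        x = module(x)                      # output of M_i is input to M_{i+1}
        counts.append(counter(x))          # e.g., 995, 940, ..., 600, ..., 500
    drops = [counts[i] - counts[i + 1] for i in range(len(modules))]
    worst = max(range(len(modules)), key=lambda i: drops[i])
    return worst, drops[worst]             # target module to redesign, and its drop
```

Rerunning this after each redesign, as the text describes, terminates once no module's drop exceeds the pre-determined threshold.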
Figure 5 is a process flowchart 500 illustrating an exemplary method of evaluating the performance of a vision based software application, according to another embodiment. At step 502, multiple media objects are automatically generated from input media content by applying different values of a set of parameters to the input media content. At step 504, each of the multiple media objects is processed using a vision based software application. The vision based software application may behave differently under different scenarios and different environmental conditions. Since each of the multiple media objects represents a different real life scenario and environmental condition, processing of the media objects by the vision based software application in the test environment assists in the performance characterization of the vision based software application. One skilled in the art would understand that the vision based software application processes the media objects in the test environment in a manner similar to the processing of media objects when deployed in advanced driver safety systems mounted in automobiles.
At step 506, the performance of the vision based software application in each real life scenario and environmental condition is evaluated based on the processing of each of the input media objects. In some embodiments, a characterization graph of the accuracy of the vision based software application versus the set of parameters is plotted based on the processing of each of the multiple media objects. Typically, the accuracy of the vision based application depends on the number of true detections and the number of false detections obtained after processing each of the multiple media objects. For example, for a pedestrian detection vision based software application, the term 'true detection' refers to the number of objects in the media object (e.g., image/video) that are accurately identified as pedestrians by the vision based software application, while the term 'false detection' refers to the number of objects in the media object that are erroneously identified as pedestrians by the vision based software application during processing of the media objects. In order to determine whether the vision based software application performs optimally across all the distinct real life scenarios and environmental conditions, the ratio of false rejections to true detections is plotted against the variation in the set of parameter values (e.g., brightness, noise, blurriness, contrast) of the corresponding media objects in the characterization graph. Thus, the characterization graph provides a precise idea of the operating region of the vision based software application with respect to the values of the pre-defined set of parameters, and the values of the set of parameters for which the accuracy (i.e., a higher number of true detections compared to false detections) of the vision based software application is maximum are determined based on the characterization graph, where the values of the set of parameters correspond to real life scenarios and environmental conditions.

At step 508, the performance of the vision based software application in the distinct real life scenarios and environmental conditions is outputted. In some embodiments, the performance of the vision based software application is outputted on a graphical user interface of the computing device 102. For example, the characterization graph may be displayed on the graphical user interface of the computing device 102.
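Reading the operating region off the characterization graph can be sketched as a simple threshold query. The arrays, the threshold and the example curve below are illustrative assumptions.

```python
import numpy as np

def operating_region(param_values, accuracies, threshold):
    """Return the parameter values whose accuracy meets the threshold,
    i.e., the operating region read off the characterization graph."""
    param_values = np.asarray(param_values, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    return param_values[accuracies >= threshold]

# Hypothetical curve: accuracy holds up only between -0.5 and +0.5.
params = np.linspace(-0.75, 0.75, 7)
acc = [0.40, 0.85, 0.92, 0.95, 0.91, 0.84, 0.38]
print(operating_region(params, acc, 0.8))   # -> [-0.5 -0.25 0. 0.25 0.5]
```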
Figure 6 is a process flowchart 600 illustrating an exemplary method of determining a vision based software application based on optimal performance, according to yet another embodiment. At step 602, multiple media objects are automatically generated from input media content by applying different values of a set of parameters to the input media content. At step 604, each of the multiple media objects is processed using different vision based software applications. One skilled in the art would understand that each vision based software application processes the media objects in the test environment in a manner similar to the processing of media objects when deployed in advanced driver safety systems mounted in automobiles.
At step 606, performance of each of the vision based software applications in all the distinct real life scenarios and environmental conditions is evaluated based on the processing of each of the generated media objects by the respective vision based software applications. In some embodiments, a characterization graph of accuracy of each of the vision based software applications versus the set of parameters is plotted based on the processing of said each of the multiple media objects by the respective vision based software applications. An exemplary characterization graph for two vision based software applications is depicted in Figure 7. Typically, the accuracy of each of the vision based software applications depends on the number of true detections and the number of false detections obtained after processing each of the multiple media objects. For example, for a pedestrian detection vision based software application, the term 'true detection' refers to the number of objects in a media object (e.g., image/video) that are accurately identified as pedestrians by the vision based software application, while the term 'false detection' refers to the number of objects in the media object that are erroneously identified as pedestrians by the vision based software application during processing of the media objects. In order to determine which of the vision based software applications performs optimally across all the distinct real life scenarios and environmental conditions, the ratio of false detections to true detections is plotted against the variation in the set of parameter values (e.g., brightness, noise, blurriness, contrast) of the corresponding media objects in the characterization graph for each of the vision based software applications. Thus, the characterization graph provides a precise idea of the operating region of the respective vision based software applications with respect to the values of the pre-defined set of parameters. At step 608, a vision based software application whose performance is evaluated as optimal in all the distinct real life scenarios and environmental conditions is identified among the plurality of vision based software applications based on the characterization graph. In one exemplary implementation, each of the vision based software applications may have a unique identifier assigned for identification among the vision based software applications. The identifier associated with the vision based software application whose performance is optimal under all the distinct real life scenarios and environmental conditions is displayed in a graphical user interface of the computing device 102. In this manner, the best possible vision based software application can be identified for deployment in Advanced Driver Safety Systems (ADAS) mounted in automobiles.
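One hypothetical selection rule, ranking applications by the area under their accuracy curves, is sketched below; the disclosure does not prescribe this particular scoring, and the dictionary layout of curves is an assumption for the example.

    import numpy as np

    def pick_optimal_application(curves):
        """curves: {app_id: [(parameter_value, accuracy), ...]} built from
        the characterization graphs of the candidate applications."""
        def score(curve):
            xs, ys = zip(*sorted(curve))
            return np.trapz(ys, xs)  # rewards staying accurate across the sweep
        return max(curves, key=lambda app_id: score(curves[app_id]))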
Figure 7 is a schematic representation depicting a characterization graph 700 of accuracy versus the set of parameters for a first vision based software application and a second vision based software application. The characterization graph 700 depicts a plot 702 of accuracy (i.e., percentage detections) versus the set of parameters for the first vision based software application and a plot 704 of accuracy (i.e., percentage detections) versus the set of parameters for the second vision based software application. It can be seen from the plots 702 and 704 that the accuracy of the first vision based software application remains fairly constant in the region from -0.5 to +0.5, whereas the accuracy of the second vision based software application falls off drastically on both sides of its peak value of approximately 70 percent. Based on the characterization graph 700, the computing device 102 evaluates the performance of the first vision based software application and the second vision based software application. Since the performance of the first vision based software application is better than that of the second vision based software application, the computing device 102 identifies the first vision based software application as suitable for the real life scenarios and environmental conditions for which the first vision based software application and the second vision based software application were tested. One skilled in the art can envision that the present invention can be modified to characterize performance of software applications in various domains. For example, for a serial to parallel code converter, multiple models are created in each domain by varying a set of parameters such as the number of threads or cores, hyper-threads enabled or disabled, granularity and complexity level of a code, parallelizability of the code, number of loops, etc. These parameters are then quantified to characterize performance of the serial to parallel code converter application. One skilled in the art will understand that the present invention provides convenient and faster design of robust software applications by dynamically subjecting input media content to changing real life scenarios and conditions. It is to be noted that the set of parameters is not limited to the examples provided in the disclosure and that the embodiments of the present invention may use a range of parameters based on various vision based software applications. As mentioned above, multiple sets of parameters may be configured for applications of different domains. Additionally, a single parameter or multiple sets of parameters may be synthetically varied based on requirement.
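The flat region described for plot 702 can be made concrete with a small helper such as the following; the accuracy floor of 0.65 is an assumed value, not taken from the disclosure.

    def operating_region(curve, floor=0.65):
        """curve: sorted (parameter_value, accuracy) pairs for one application.
        Return the parameter interval over which accuracy stays at or above
        the floor, e.g. roughly (-0.5, 0.5) for a curve shaped like plot 702."""
        good = [value for value, accuracy in curve if accuracy >= floor]
        return (min(good), max(good)) if good else None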
The present embodiments have been described with reference to specific example embodiments; it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. Furthermore, the various devices, modules, and the like described herein may be enabled and operated using hardware circuitry, for example, complementary metal oxide semiconductor based logic circuitry, firmware, software and/or any combination of hardware, firmware, and/or software embodied in a machine readable medium. For example, the various electrical structures and methods may be embodied using transistors, logic gates, and electrical circuits, such as an application specific integrated circuit.

Claims

WE CLAIM:
1. A computer-implemented method of designing a robust vision based software application, comprising:
automatically generating, using a processor, a plurality of media objects from input media content by varying a set of parameters, wherein the plurality of media objects contain information representing distinct real life scenarios and environmental conditions;
processing each of the plurality of media objects using modules of a vision based software application;
evaluating performance of each of the modules of the vision based software application based on processing of the plurality of media objects; and
re-designing at least one module of the vision based software application based on the evaluated performance of the modules of the vision based software application such that the performance of the vision based software application becomes optimal for all the distinct real life scenarios and environmental conditions, thereby making the vision based software application robust.
2. The method of claim 1, wherein automatically generating the plurality of media objects from the input media content by varying the set of parameters comprises:
generating the plurality of media objects by applying different values of the set of parameters to the input media content, wherein the different values of the set of parameters correspond to the distinct real life scenarios and environmental conditions.
3. The method of claim 2, wherein processing each of the plurality of media objects using the modules of the vision based software application comprises:
iteratively processing the plurality of media objects through each of the modules of the vision based software application.
4. The method of claim 3, wherein evaluating the performance of the modules of the vision based software application based on processing of the plurality of media objects comprises:
plotting a characterization graph of accuracy of each of the modules versus the set of parameters based on the processing of said each of the plurality of media objects by the respective modules of the vision based software application; and
identifying at least one module whose accuracy is below a pre-determined threshold based on the characterization graph.
5. The method of claim 4, wherein re-designing the at least one module of the vision based software application based on the evaluated performance of the modules of the vision based software application comprises:
modifying software code logic and values of code parameters associated with the at least one module so that the vision based software application performs optimally in all the distinct real life scenarios and environments.
6. The method of claim 5, wherein modifying the software code logic and values of code parameters associated with the at least one module comprises:
modifying the software code logic and values of code parameters associated with the at least one module until the accuracy of the at least one module becomes equal to or greater than the pre-determined threshold.
7. The method of claim 1, wherein the input media content comprises one of an image, a set of images, a pre-recorded video, and streaming video.
8. A computer-implemented method of evaluating performance of a vision based software application, comprising:
automatically generating, using a processor, a plurality of media objects from input media content by varying a set of parameters, wherein the plurality of media objects contain information representing distinct real life scenarios and environmental conditions;
processing each of the plurality of media objects using a vision based software application;
evaluating performance of the vision based software application based on processing of the plurality of media objects; and
outputting the evaluated performance of the vision based software application based on the processing of the plurality of media objects.
9. The method of claim 8, wherein automatically generating the plurality of media objects from the input media content by varying the set of parameters comprises:
generating the plurality of media objects by applying different values of the set of parameters to the input media content, wherein the different values of the set of parameters correspond to the distinct real life scenarios and environmental conditions.
10. The method of claim 9, wherein evaluating the performance of the vision based software application based on the processing of the plurality of media objects comprises:
plotting a characterization graph of accuracy of the vision based software application versus the set of parameters based on the processing of said each of the plurality of media objects.
11. The method of claim 10, wherein outputting the evaluated performance of the vision based software application based on the processing of the plurality of media objects comprises:
displaying the characterization graph of accuracy of the vision based software application versus the set of parameters based on the processing of said each of the plurality of media objects.
12. A computer-implemented method of identifying a robust vision based software application, comprising:
automatically generating, using a processor, a plurality of media objects from input media content by varying a set of parameters, wherein the plurality of media objects contain information representing distinct real life scenarios and environmental conditions;
processing each of the plurality of media objects using a plurality of vision based software applications;
evaluating performance of each of the vision based software applications based on processing of the plurality of media objects by said each of the vision based software applications; and
identifying a robust vision based software application from the plurality of vision based software applications, wherein the performance of the identified vision based software application in the distinct real life scenarios and environmental conditions is evaluated as optimal.
13. The method of claim 12, wherein automatically generating the plurality of media objects from the input media content by varying the set of parameters comprises:
generating the plurality of media objects by applying different values of the set of parameters to the input media content, wherein the different values of the set of parameters correspond to the distinct real life scenarios and environmental conditions.
14. The method of claim 13, wherein evaluating the performance of each of the vision based software applications based on the processing of the plurality of media objects comprises:
plotting a characterization graph of accuracy of each of the vision based software applications versus the set of parameters based on the processing of said each of the plurality of media objects by the respective vision based software applications.
15. The method of claim 14, wherein identifying the robust vision based software application from the plurality of vision based software applications comprises:
identifying the vision based software application among the plurality of vision based software applications based on the characterization graph of the plurality of vision based software applications.
16. An apparatus comprising:
a processor; and
a memory unit coupled to the processor, wherein the memory unit comprises:
a media object generation module configured for automatically generating a plurality of media objects from input media content by varying a set of parameters, wherein the plurality of media objects contain information representing distinct real life scenarios and environmental conditions; and
a performance evaluation module configured for processing each of the plurality of media objects using a vision based software application, and evaluating performance of the vision based software application based on processing of the plurality of media objects.
17. The apparatus of claim 16, wherein the memory unit comprises an output module configured for outputting the performance of the vision based software application.
18. The apparatus of claim 16, wherein in evaluating performance of the vision based software application based on processing of the plurality of media objects, the performance evaluation module is configured for:
plotting a characterization graph of accuracy of the vision based software application versus the set of parameters based on the processing of said each of the plurality of media objects by the vision based software application; and
identifying at least one module whose accuracy is below a pre-determined threshold based on the characterization graph.
19. The apparatus of claim 18, wherein the memory unit comprises an application re-designing module configured for re-designing the at least one module of the vision based software application based on the evaluated performance of the vision based software application such that the performance of the vision based software application becomes optimal for all the distinct real life scenarios and environmental conditions, thereby making the vision based software application robust.
20. The apparatus of claim 19, wherein in re-designing the at least one module of the vision based software application, the application re-designing module is configured for:
modifying software code logic and values of code parameters associated with the at least one module so that the vision based software application performs optimally in all the distinct real life scenarios and environments.
21. An apparatus comprising:
a processor; and
a memory unit coupled to the processor, wherein the memory unit comprises:
a media object generation module configured for automatically generating a plurality of media objects from input media content by varying a set of parameters, wherein the plurality of media objects contain information representing distinct real life scenarios and environmental conditions; and
a performance evaluation module configured for:
processing each of the plurality of media objects using a plurality of vision based software applications;
evaluating performance of each of the vision based software applications based on processing of the plurality of media objects by said each of the vision based software applications; and
determining a vision based software application from the plurality of vision based software applications whose performance in the distinct real life scenarios and environmental conditions is evaluated as optimal.
22. The apparatus of claim 21, wherein in evaluating performance of each of the vision based software applications based on processing of the plurality of media objects, the performance evaluation module is configured for:
plotting a characterization graph of accuracy of each of the vision based software applications versus the set of parameters based on the processing of said each of the plurality of media objects by the respective vision based software applications.
23. The apparatus of claim 22, wherein in determining the vision based software application from the plurality of vision based software applications, the performance evaluation module is configured for:
identifying the vision based software application among the plurality of vision based software applications based on the characterization graph of the plurality of vision based software applications.