WO2010055159A2

WO2010055159A2 - Completely automatic public turing test to tell computers and humans apart (captcha) based on a multiple of different captcha methods

Info

Publication number: WO2010055159A2
Application number: PCT/EP2009/065231
Authority: WO
Inventors: Hans Christian Meyer; Knut Tharald Fosseide
Original assignee: Lumex AS
Current assignee: Lumex AS
Priority date: 2008-11-14
Filing date: 2009-11-16
Publication date: 2010-05-20
Anticipated expiration: 2011-05-14
Also published as: WO2010055159A3

Abstract

The present invention relates to the field of Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA) methods, comprising selecting CAPTCHAs based on the linguistically coherent content of sentences, or using graphics-based CAPTCHAs conveyed to a user terminal. In addition, a random delay may be introduced when a user responds to the CAPTCHA on the user terminal.

Description

Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA) based on multiple different CAPTCHA methods.

The present invention relates to the field of Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA) methods, comprising selecting CAPTCHAs based on the linguistically coherent content of sentences, or using graphics-based CAPTCHAs conveyed to a user terminal. In addition, a random delay may be introduced when a user responds to the CAPTCHA on the user terminal.

Modern computer technology has provided humanity with an extremely versatile tool that can be used in many human activities. However, the great flexibility inherent in modern computer technology also provides the means for very effective malicious use of attacking software on computer systems. Well-known examples are SPAM, computer virus infections, worms, etc. The prior art contains many types of countermeasures (e.g. anti-SPAM and anti-virus programs). One interesting and intriguing problem in itself is how a computer system can distinguish between human users and computers. If a remote computer system that tries to connect to a local computer system while pretending to be (simulating) a human user can be revealed, many types of malicious attacks, such as SPAM, can be avoided, as known to a person skilled in the art. An example of such a technique is called Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA). Similar prior art techniques are called Human Interactive Proofs (HIP). There are many examples of CAPTCHA solutions in the prior art. However, the CAPTCHAs normally used are based on distorting an image containing a word such that only a human can interpret the distorted image, while, for example, a computer-based Optical Character Recognition (OCR) system will have problems identifying the word in the image.
Whenever a user initiates a session with a computer system, the system presents a CAPTCHA to the user on a display and instructs the user to enter the word shown in the image as an ASCII text string on a keyboard connected to the computer the user is using. This type of "handshake" or feedback is intended to ensure that it is actually a person, and not a computer system, that is establishing the session. If it is decided that a person is participating in the session, access is granted; otherwise, access is denied.

One of the challenges with this technique is, of course, that OCR systems and similar tools keep improving and may become able to identify the content of a given type of distorted CAPTCHA image. For example, Greg Mori and Jitendra Malik, "Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA", Conference on Computer Vision and Pattern Recognition (CVPR 03), volume I, 2003, disclose such a technique.

Another problem is that the distortion of an image must follow some rules to ensure that the distorted text can still be interpreted by a human. In other words, CAPTCHAs based on image distortion cannot introduce large-scale random noise in the image. The distortion must follow a "secret" algorithm, or at least be constrained by an algorithm. Such an algorithm can be detected and revealed by others, and once it is, the security provided by this CAPTCHA solution is lost.

The publication "Towards Human Interactive Proofs in the Text-Domain - Using the Problem of Sense-Ambiguity for Security" by Richard Bergmair and Stefan Katzenbeisser, Lecture Notes, Springer Berlin/Heidelberg, ISSN 0302-9743, 2004, discloses the general problems related to sense ambiguity that arise because natural language contains word synonyms. The proposed method comprises randomly replacing a word with another word, thereby providing a probability that a human would identify the resulting sentence as being without any linguistic meaning. The main goal of this publication is to identify word-sense ambiguity as a very promising linguistic phenomenon upon which to build a secure text-based HIP.

Another publication, "Text-based CAPTCHA Algorithms" by Philip Brighton Godfrey, dated December 15, 2001, discloses an example of a text-based CAPTCHA comprising an algorithm that selects a noun, verb, or adjective from a source text. The algorithm then replaces the chosen word with another "bogus" word selected randomly, either from a set of words of the same part of speech as the chosen word or, more generally, according to some probability distribution. However, as the author himself points out, it is difficult to obtain a robust CAPTCHA system based only on these principles. For example, a trigram model was used to estimate the probability that the bogus word occurs in natural language together with its neighbours in the source sentence, i.e. the word preceding the bogus word and the word following it (the trigram). By using such a trigram model, a program was able to defeat the CAPTCHA. Another problem was that humans would often have difficulty detecting the sentence with the bogus word, since the trigram model used to design the CAPTCHA gave a certain level of confidence that the combination of the bogus word and its neighbouring words actually appears in natural language.
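To make the weakness concrete, the Python sketch below scores each word of a sentence by the smoothed log-probability of the three trigrams that cover it and flags the least probable word as the likely bogus word. It is not the model used in Godfrey's paper: the function names, the add-alpha smoothing with its `alpha` and `vocab` values, and the toy training sentences are assumptions for illustration.

```python
import math
from collections import Counter

PAD_L, PAD_R = "<s>", "</s>"

def train(sentences):
    """Count trigrams and their two-word contexts in a toy training corpus."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        w = [PAD_L, PAD_L] + s.lower().split() + [PAD_R, PAD_R]
        for i in range(2, len(w)):
            tri[tuple(w[i - 2:i + 1])] += 1
            bi[tuple(w[i - 2:i])] += 1
    return tri, bi

def logp(t, tri, bi, alpha=0.1, vocab=10_000):
    """Add-alpha smoothed log P(t[2] | t[0], t[1]) (assumed smoothing scheme)."""
    return math.log((tri[t] + alpha) / (bi[t[:2]] + alpha * vocab))

def suspect_word(sentence, tri, bi):
    """Index of the word whose three covering trigrams are least probable."""
    w = [PAD_L, PAD_L] + sentence.lower().split() + [PAD_R, PAD_R]
    scores = {}
    for i in range(2, len(w) - 2):  # positions of real words only
        covering = (tuple(w[j - 2:j + 1]) for j in (i, i + 1, i + 2))
        scores[i - 2] = sum(logp(t, tri, bi) for t in covering)
    return min(scores, key=scores.get)
```

With a realistic training corpus, the same idea lets an attacker rank the words or sentences of a CAPTCHA by plausibility, which is the attack described above.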

Another problem related to image-based CAPTCHAs is that visually impaired persons cannot access a computer system that requires a visual CAPTCHA to be identified before access is granted. In many countries, the law requires that handicapped persons be given the same opportunities as other persons. For this reason, a visual CAPTCHA system could be banned from public use in many countries.

Therefore, there is a need for an improved type of CAPTCHA method, in addition or as an alternative to graphically based CAPTCHAs, that provides better protection against being revealed and that at the same time can be used by visually impaired persons. According to an aspect of the present invention, this goal can be met by a CAPTCHA system that can switch between graphics-based CAPTCHAs and linguistically based CAPTCHAs, either using both methods at the same time or using one of them at a time, selected for example randomly, or at the request of a user, for example a visually impaired person contacting a computer system via a Braille terminal. The CAPTCHA method based on the linguistic coherence of the content of sentences relies on the fact that only a human can understand the meaning of the content (the linguistically coherent content) of a sentence, and therefore a human can also identify an otherwise grammatically correct sentence as nonsense, i.e. as having linguistically incoherent content. Such sentences can be presented to visually impaired persons by converting them to sound or by outputting them on a Braille printer or Braille input/output device, as known to a person skilled in the art. If the user of a computer system is presented, on an appropriate connected output device, with a collection of sentences in which all are coherent except one that is linguistically incoherent, the user can be granted access if the user points out which sentence is the incoherent one, for example by moving a cursor on a screen displaying the sentences and highlighting the incoherent sentence, or by pushing a certain button on a Braille input/output device. If the user fails to identify it (the user could, of course, be a computer program simulating a human user), a new set of sentences is generated and the user is again prompted to make a selection. This scheme continues until the user makes a correct selection.
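The challenge-and-retry scheme above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: `make_challenge`, `run_session`, the default of four sentences per challenge and the `max_rounds` cap (the text itself simply repeats until a correct selection is made) are assumptions, and `ask_user` stands in for whatever screen, audio or Braille interface collects the user's choice.

```python
import random
from dataclasses import dataclass

@dataclass
class Challenge:
    sentences: list          # sentences shown to the user, in display order
    incoherent_index: int    # position of the single incoherent sentence

def make_challenge(coherent, incoherent, n=4, rng=random):
    """Mix n - 1 coherent sentences with one incoherent sentence at a random position."""
    shown = rng.sample(coherent, n - 1)
    pos = rng.randrange(n)
    shown.insert(pos, rng.choice(incoherent))
    return Challenge(shown, pos)

def run_session(coherent, incoherent, ask_user, max_rounds=10, rng=random):
    """Issue fresh challenges until the user points out the incoherent sentence.

    ask_user receives the list of sentences and returns the chosen index.
    Returns (granted, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        ch = make_challenge(coherent, incoherent, rng=rng)
        if ask_user(ch.sentences) == ch.incoherent_index:
            return True, round_no
    return False, max_rounds
```

A new challenge is built on every round, so a failed guess gives the respondent no second look at the same sentences.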

According to yet another example of embodiment of the present invention, a linguistically based CAPTCHA according to the present invention may be presented to a user as a graphics-based CAPTCHA, as known to a person skilled in the art. The response to the CAPTCHA can then be to tell whether the sentence shown in the CAPTCHA graphics is true or false; alternatively, there can be multiple graphics-based CAPTCHAs comprising linguistically based CAPTCHAs, and the response can be to enter a sequence number identifying which CAPTCHA image is not linguistically coherent. In another example of embodiment, each graphically based CAPTCHA comprises, in addition to the linguistically based CAPTCHA sentence, a keyword that the user must type as the response. This keyword can be the last word of the sentence, or a keyword highlighted with a certain colour, for example.

It is further an aspect of the present invention that the method comprises randomly selecting a graphics-based CAPTCHA, a linguistically based CAPTCHA, or both together, as the CAPTCHA the user must respond to. By introducing several types of CAPTCHA mechanisms in the same system, CAPTCHA defeating software faces an increased challenge.
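A minimal sketch of the per-session choice between mechanisms, assuming three equally likely modes and treating an accessibility request as forcing the linguistic CAPTCHA (a combination suggested by the earlier discussion of visually impaired users). The mode names and the function `choose_mode` are hypothetical.

```python
import random

# Hypothetical labels for the three options described above.
MODES = ("graphics", "linguistic", "graphics+linguistic")

def choose_mode(rng=random, accessible=False):
    """Pick the CAPTCHA mechanism(s) for one session.

    accessible=True models a request from a visually impaired user (e.g. via a
    Braille terminal), for whom only the linguistic CAPTCHA is suitable.
    """
    if accessible:
        return "linguistic"
    return rng.choice(MODES)
```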

According to another aspect of the present invention, all CAPTCHA sentences are screened with known grammar tools and other word processing tools, as known in the prior art from word processing technology in computer systems. Therefore, a computer system will not be able to tell which sentence is wrong by identifying grammatical flaws, etc. According to yet another aspect of the present invention, the sentences used when designing the CAPTCHA are collected from a secret corpus of text. Otherwise, all the correct sentences, i.e. all except the incorrect sentence, could be identified via an Internet search, for example. However, the incoherent sentences designed according to the present invention may be built from text retrieved from a public corpus, for example via Internet searches. Since the sentences in the CAPTCHA either come from a secret corpus or are incoherent sentences, the scores or hit rates obtained for each sentence in an Internet search would leave the search results inconclusive.

According to an example of embodiment of the present invention, a private corpus can be used as the source text for linguistically coherent sentences, while linguistically incoherent sentences can be constructed from the private corpus and/or in combination with sentences from the public domain, for example from Internet searches. An example of a private corpus is archive material that is no longer part of the on-line, Internet-accessible documents, for example older newspaper articles archived off-line by the newspaper publisher.

According to an example of embodiment of the present invention, a complete sentence is combined with another complete sentence by identifying an n-gram common to both sentences (a common phrase, either detected, or chosen by selecting a phrase and then searching the private corpus for sentences comprising the selected phrase or n-gram). A linguistically incoherent sentence is then produced by taking one of the sentences up to and including the common n-gram, and continuing with the part of the other sentence that follows the same n-gram. According to yet another example of embodiment of the present invention, a plurality of sentences can be combined through n-grams.
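The n-gram splice can be illustrated with a short Python sketch, assuming whitespace tokenisation and a fixed n of 3. The helper names are hypothetical, and picking the lexicographically smallest shared n-gram is only a way to keep the example deterministic.

```python
def ngrams(tokens, n):
    """All n-grams of a token list, as tuples, in order of position."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def splice(first, second, n=3):
    """Head of `first` up to and including a shared n-gram, followed by the
    tail of `second` after the same n-gram. Returns None if none is shared."""
    a, b = first.split(), second.split()
    shared = set(ngrams(a, n)) & set(ngrams(b, n))
    if not shared:
        return None
    gram = min(shared)  # deterministic pick; a real system might choose randomly
    ia = ngrams(a, n).index(gram)
    ib = ngrams(b, n).index(gram)
    return " ".join(a[:ia + n] + b[ib + n:])
```

For example, splicing "the farmer fed the cows before the storm arrived" with "the orchestra tuned their instruments before the storm of applause" joins at "before the storm" and yields "the farmer fed the cows before the storm of applause", which is grammatical but incoherent.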

According to another example of embodiment of the present invention, a plurality of sentences is combined from different sentences, wherein the different sentences are picked from the private corpus subject to a minimum number of words per sentence. According to this example of embodiment, the center word of a first sentence, together with its neighboring words, is checked through an n-gram search for its frequency of occurrence in natural language. If the returned value is not acceptable (below a preset threshold), another n-gram is selected by shifting the n-gram towards the beginning (or the end) of the sentence. When an acceptable n-gram is identified, it is used to join the sentences.
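The center-out search for an acceptable n-gram might look like the sketch below. It assumes a frequency lookup `freq` (hypothetical; in practice n-gram statistics for the language), a preset `threshold`, and a `margin` that keeps the n-gram away from the sentence ends. The text allows shifting towards either the beginning or the end; alternating between the two directions is a choice made here for illustration.

```python
def find_acceptable_ngram(words, freq, n=3, threshold=1, margin=1):
    """Center-out search for an n-gram whose frequency reaches `threshold`.

    freq(gram) -> int is a hypothetical frequency lookup.
    N-grams within `margin` words of either sentence end are never tried.
    Returns (start_index, gram) or None.
    """
    lo, hi = margin, len(words) - margin - n      # allowed start positions
    if hi < lo:
        return None
    center = min(max(len(words) // 2 - n // 2, lo), hi)
    order = [center]
    for step in range(1, hi - lo + 1):           # shift one word at a time
        order += [s for s in (center - step, center + step) if lo <= s <= hi]
    for s in order:
        gram = tuple(words[s:s + n])
        if freq(gram) >= threshold:
            return s, gram
    return None
```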

According to yet another example of embodiment of the present invention, the wording joining the parts of a sentence is chosen to be words that express a deduction or deductive context between the parts. The words and phrases that can be searched for in a sentence as the basis for joining sentences include, but are not limited to: "because of this", "consequently", "implicating", "thus", "therefore", etc.

According to yet another example of embodiment of the present invention, words indicating a natural joining, such as "and" and/or "or", can be used to join sentences. If a first sentence retrieved from a first section of a private corpus covering a first subject is joined with a second sentence from a second section covering a second subject, the result is an incoherent sentence.
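Joining on a connective can be sketched as follows, assuming the two input sentences were drawn from sections of the private corpus covering different subjects. The connective list mirrors the examples given above, and the matching is deliberately simple (lower-cased whole words, trailing commas ignored). The function names are hypothetical.

```python
CONNECTIVES = ("because of this", "consequently", "implicating",
               "therefore", "thus", "and", "or")

def split_at_connective(sentence, connectives=CONNECTIVES):
    """Return (head, connective, tail) for the first listed connective that
    occurs strictly inside the sentence, or None."""
    words = sentence.rstrip(".").split()
    low = [w.lower().strip(",") for w in words]
    for c in connectives:
        cw = c.split()
        for i in range(1, len(words) - len(cw)):   # never at the very start or end
            if low[i:i + len(cw)] == cw:
                return (" ".join(words[:i]), " ".join(words[i:i + len(cw)]),
                        " ".join(words[i + len(cw):]))
    return None

def join_across_subjects(sentence_a, sentence_b):
    """Head of sentence_a (subject 1) + its connective + tail of sentence_b (subject 2)."""
    pa, pb = split_at_connective(sentence_a), split_at_connective(sentence_b)
    if pa is None or pb is None:
        return None
    head, conn, _ = pa
    return f"{head} {conn} {pb[2]}."
```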

According to another aspect of the present invention, there is always a possibility that any CAPTCHA method produces results that can to some extent be disclosed, and malicious software can be constructed that circumvents some of the generated CAPTCHAs. For prior art CAPTCHA solutions, publicly available software exists that circumvents many CAPTCHAs, or automatically identifies the content of a CAPTCHA. According to an example of embodiment of the present invention, any such CAPTCHA defeating software can be used to verify a CAPTCHA generated by embodiments of the present invention. Whenever new defeating software enters the marketplace, it is added to a system according to the present invention.

Only CAPTCHAs that survive such a test are used. However, since "spammers", for example, may still be able to defeat a CAPTCHA system, the present invention additionally introduces, besides the CAPTCHA itself, a random delay in the response sequence when a user responds to the CAPTCHA. The rationale for this additional step is that such delays make it impractical for CAPTCHA defeating software to reveal the CAPTCHA, because of the time spent doing so. The usual SPAM tactic is to attack a mail server system, for example, and try to become registered as a legitimate user of the mail server. This is often done in parallel, but the number of attempts needed to defeat a CAPTCHA is very high. If a random delay is introduced, for example by an algorithm providing a randomly generated delay, by a puzzle, or by additional questions when a user is trying to register, the total time the SPAM software needs to statistically defeat the CAPTCHA becomes too long. For an ordinary human user, this imposes no problems: registering as a user usually takes minutes anyway, because of the amount of information that must be entered to create, for example, an e-mail account.
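The effect of the delay can be estimated with simple arithmetic. The sketch below assumes each automated attempt succeeds independently with probability p, so the expected number of attempts per account is 1/p, and each attempt must wait out a delay with mean d; the expected time per account is then d/p. The parameter values in the usage note are illustrative assumptions, not figures from the source.

```python
def expected_attack_seconds(p_success, mean_delay_s, accounts=1, parallel=1):
    """Expected wall-clock time for an automated attacker.

    Assumes each attempt succeeds independently with probability p_success,
    so attempts per account follow a geometric distribution with mean
    1 / p_success, and every attempt must sit out a random delay with mean
    mean_delay_s. Solver compute time and network latency are ignored.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    attempts_per_account = 1 / p_success
    return accounts * attempts_per_account * mean_delay_s / parallel
```

For instance, with p = 0.01 and a 30 s mean delay, `expected_attack_seconds(0.01, 30)` gives 3000 s (50 minutes) per account, and 10 000 accounts over 100 parallel connections take about 300 000 s, roughly 83 hours, while a human who answers correctly once waits only about 30 s.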

According to an example of embodiment of the present invention, the CAPTCHA is presented as part of an information element, wherein the information element initiates actions in the computer system connected to the terminal used by the user and/or interactive actions performed by the user on the user terminal, and wherein these initiated actions provide a random delay before the response is transmitted back to, for example, the server that presented the CAPTCHA to the user.

Figure 1 illustrates examples of CAPTCHA sentences with randomly exchanged words.

Figure 2 illustrates examples of CAPTCHA sentences comprising a sentence with multiple endings according to the present invention.

Figure 3 illustrates another example of CAPTCHA sentences according to the present invention.

Figure 4 illustrates a flow diagram depicting method steps for providing a random delay before a user responds to a CAPTCHA.

The Internet makes more and more information instantly available to both human users and machines. Even though the point is to be able to distinguish between humans and machines, most documents available through Internet searches are directly available to both. The task of distinguishing between a human user and a machine mainly concerns services initiated over the Internet, for example e-mail services and the related SPAM problem, as known to a person skilled in the art.

Text-based CAPTCHA methods according to the present invention need a private corpus as a source of correct sentences. If the correct sentences used in the CAPTCHA method did not all come from a private or secret text collection, they could be revealed by an Internet search. However, according to an aspect of the present invention, a private corpus of text can be provided at any instant as the basis for text-based CAPTCHA methods, since Internet on-line documents are in many instances archived and taken off-line after a period of time. For example, a newspaper publisher archives older newspaper articles; otherwise, the publisher's server system would rapidly be overloaded. To provide a reasonable response time over networks for the many users accessing the on-line newspaper editions, the server must be restricted to handling a limited amount of information. It is also in the nature of a newspaper operation to provide only the latest, "hot" news. However, most publishers keep records of older newspapers off-line, which can be made available to a user on request. Therefore, at any arbitrary instant it is possible to retrieve a private corpus comprising, for example, older newspaper articles. According to yet another aspect of the present invention, the private corpus needs to be private only for a short time period, i.e. the period during which the method according to the present invention is actually using that specific private corpus. The content of the private corpus can be changed at any time, or periodically or randomly updated or replaced, for example at the beginning of each normal working day. Any malicious CAPTCHA defeating software that relies on identifying the private corpus must therefore request access to huge amounts of off-line documents, making them on-line or otherwise available, before it can identify the private corpus.
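One way to rotate the private corpus, sketched under stated assumptions: the publisher's off-line archive is split into named partitions (the names here are hypothetical), and each day's partition is chosen with an HMAC of the date under a server-side secret, so the choice changes daily but cannot be predicted from outside. The source does not specify how the corpus is selected; this keyed-hash scheme is an illustrative choice.

```python
import datetime
import hashlib
import hmac

def corpus_for_day(archives, day, secret):
    """Pick which off-line archive partition serves as the private corpus on `day`.

    Keyed by a server-side secret (HMAC-SHA256), so the choice is stable for
    the whole day but cannot be predicted without the secret.
    """
    msg = day.isoformat().encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return archives[int.from_bytes(digest[:8], "big") % len(archives)]

# Usage with hypothetical partition names:
# corpus_for_day([f"archive-{y}" for y in range(1990, 2000)],
#                datetime.date.today(), b"server-side secret")
```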
The time the CAPTCHA defeating software would need to achieve this goal would be too long to be of any practical use, since the method according to the present invention can change the private corpus at any arbitrary point in time. It is further an aspect of the present invention that CAPTCHA sentences can be constructed or designed from sentences from the private corpus and/or publicly available documents, i.e. documents retrieved from the Internet; a valid CAPTCHA sentence should, of course, not be found by an Internet search. Therefore, in an example of embodiment of the present invention, every constructed or designed CAPTCHA sentence is checked through Internet searches before being used. Further, to validate a CAPTCHA sentence as an acceptable linguistic construction, any known word processing tool, such as a grammar checker, is used. Therefore, no CAPTCHA defeating software can rely on defeating the CAPTCHA sentence just by checking the grammar, for example. It is further an aspect of the present invention to also use CAPTCHA defeating algorithms as part of the validation of a CAPTCHA according to the present invention. If a constructed or designed CAPTCHA is actually defeated by one or more defeating algorithms, that particular CAPTCHA is not used. In this manner, the present invention provides tools for verifying the validity of constructed or designed CAPTCHA sentences at any time before use. It is further an aspect of the present invention that a system embodying the method can be updated with any further improvements of existing or new CAPTCHA defeating software.
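The three validation gates described above (grammar check, Internet search, and known defeating algorithms) can be composed as in the sketch below. `grammar_ok`, `found_online` and the solver callables are placeholders for a real grammar checker, a search-engine query and real CAPTCHA defeating software; none of them is a specific library API.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]
Solver = Callable[[Sequence[str]], int]   # returns the index it thinks is incoherent

def sentence_ok(s: str, grammar_ok: Check, found_online: Check) -> bool:
    # Grammatical flaws would give the answer away; verbatim web hits would
    # let an attacker look the sentence up.
    return grammar_ok(s) and not found_online(s)

def challenge_survives(sentences: Sequence[str], incoherent_index: int,
                       solvers: Sequence[Solver]) -> bool:
    # Reject the challenge if any known defeating algorithm picks the right answer.
    return all(solve(sentences) != incoherent_index for solve in solvers)

def validate(sentences, incoherent_index, grammar_ok, found_online, solvers) -> bool:
    """A challenge is used only if every sentence and the whole set pass."""
    return (all(sentence_ok(s, grammar_ok, found_online) for s in sentences)
            and challenge_survives(sentences, incoherent_index, solvers))
```

Because the solver list is just a sequence of callables, new defeating software can be added to the validation step without changing the rest of the pipeline.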

According to an example of embodiment of the present invention, a CAPTCHA can be designed from any sentence from the private corpus and/or publicly known sentences. A first step of this example of embodiment is to provide a set of phrases, for example from n-gram statistics, wherein each phrase in the set must contain a certain number of words and have a certain n-gram probability. This example of embodiment then randomly selects a phrase from the set of phrases (for example a combination of three words, four words, etc., which may for example be required to comprise the word "and"). According to an example of embodiment of the present invention, the method steps can be:

1. Identify sentences in a private corpus that include at least one selected phrase, with possible restrictions, e.g. a minimum number of characters before the first phrase, between phrases and after the last phrase.

2. Filter the identified sentences, e.g. remove sentences that have an uncommon word on both sides of the at least one phrase or that contain too many special characters.

3. Identify more sentences, in either a private or a public corpus, comprising the same at least one phrase and satisfying the possible restrictions as defined in step 2.

4. Split the sentences from step 3 into parts at the at least one phrase.

5. Join parts from different split sentences at the at least one phrase, creating new sentences that are linguistically incoherent.

6. Run an automated grammar check on the incoherent sentences, and remove sentences with detected grammatical errors.

The outcome of this example method is a set of coherent sentences (the unmodified sentences retained from the earlier steps) and a set of incoherent sentences from step 5 or 6.
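Steps 1 to 6 can be approximated in Python as follows. This is a simplified sketch: the character limits, the special-character pattern and the default grammar check (which accepts everything) are assumptions, and the uncommon-word filter of step 2 is omitted. `cross_join` expects sentences that already contain the phrase, i.e. the output of `candidates`.

```python
import itertools
import re

def candidates(sentences, phrase, min_before=10, min_after=10, max_special=2):
    """Steps 1-3: keep sentences that contain `phrase` with enough text on
    both sides and few special characters."""
    keep = []
    for s in sentences:
        i = s.find(phrase)                       # -1 if the phrase is absent
        if i < min_before or len(s) - (i + len(phrase)) < min_after:
            continue
        if len(re.findall(r"[^\w\s.,]", s)) > max_special:
            continue
        keep.append(s)
    return keep

def cross_join(sentences, phrase, grammar_ok=lambda s: True):
    """Steps 4-6: split each candidate at the phrase and join the head of one
    sentence with the tail of another; keep results passing the grammar check."""
    parts = [s.split(phrase, 1) for s in sentences]
    out = []
    for (head, _), (_, tail) in itertools.permutations(parts, 2):
        new = head + phrase + tail
        if new not in sentences and grammar_ok(new):
            out.append(new)
    return out
```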

According to another example of embodiment of the present invention, joining of two sentences comprises the steps of:

1. Selecting sentences from a private corpus.

2. Filtering the selected sentences, e.g. removing sentences that have two or more occurrences of an uncommon word or have too many special characters in them.

3. Selecting sentences from a private or public corpus.

4. Selecting at least one acceptable n-gram from the sentences selected in step 3. An acceptable n-gram can be identified by either a. or b.:

a. Starting from the center word of the sentence, the frequency of n-grams formed from that word and its neighboring words in the sentence is tested. If no acceptable frequency is detected, n-grams shifted one word towards either the beginning or the end of the sentence are tested. This is repeated until an acceptable n-gram is found, or until all n-grams that exclude the words closest to the sentence endpoints have been tried.

b. All n-grams are generated from the words that lie at least a given distance from the sentence endpoints, and the most frequent one is used, if its frequency is acceptable.

5. The sentences from step 3 that do not have acceptable n-grams are dropped from further use.

6. Identifying sentences in a private or public corpus that include the n-grams from step 4 (with restrictions on their placement inside the sentence, away from the ends).

7. Splitting the sentences from steps 4 and 6, which then form pairs sharing an n-gram, around the shared n-gram. Either the first part of the sentence from step 4 and the last part of the sentence from step 6, or vice versa, can be used as false sentences.

8. Running an automated grammar check on the false sentences, and removing sentences with detected grammatical errors.
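Option b of step 4 and the two-way join of step 7 are sketched below, assuming whitespace tokens, n = 3, a hypothetical frequency lookup `freq`, and a `margin` of two words from each sentence end.

```python
def best_ngram(words, freq, n=3, margin=2, threshold=1):
    """Option b: score every n-gram lying at least `margin` words from both
    sentence ends; return (start, gram) of the most frequent one if its
    frequency reaches `threshold`, else None."""
    starts = range(margin, len(words) - margin - n + 1)
    scored = [(freq(tuple(words[s:s + n])), s) for s in starts]
    if not scored:
        return None
    best_f, best_s = max(scored)
    if best_f < threshold:
        return None
    return best_s, tuple(words[best_s:best_s + n])

def false_sentences(words_a, start_a, words_b, start_b, n=3):
    """Step 7: split both sentences around their shared n-gram and join the
    head of each with the tail of the other."""
    head_a, tail_a = words_a[:start_a + n], words_a[start_a + n:]
    head_b, tail_b = words_b[:start_b + n], words_b[start_b + n:]
    return [" ".join(head_a + tail_b), " ".join(head_b + tail_a)]
```

For example, a sentence about a farmer selling "his best horse" at the market and a sentence about a knight riding "his best horse" into battle yield "the old farmer sold his best horse into the final battle" and "the knight rode his best horse at the market yesterday", both of which would then go to the grammar check of step 8.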

Another example of embodiment of the present invention, comprising iteration, is listed below:

1. Select sentences from a private corpus.

2. Filter the results, e.g. remove sentences that have two or more occurrences of an uncommon word or that have too many special characters.

3. Select sentences from a private or public corpus.

4. Select acceptable n-grams from the sentences in step 3. This can be done in at least two ways:

a. Starting from the center word of the sentence, test the frequency of n-grams formed from that word and its neighboring words in the sentence. If the frequency is not acceptable, test n-grams shifted one word towards either the beginning or the end of the sentence. Repeat until an acceptable n-gram is found, or until all n-grams that exclude the words closest to the sentence ends have been tried.

b. Generate all n-grams from the words that lie at least a given distance from the sentence ends, and use the most frequent one, if its frequency is acceptable.

5. Drop the sentences from step 3 that do not have acceptable n-grams.

6. Identify sentences in a private or public corpus that include the n-grams from step 4 (with restrictions on their placement inside the sentence, away from the ends).

7. Repeat steps 4 to 6 for the part of the sentence after the n-gram, i.e. select a new n-gram in the sentence from step 6, after the n-gram for which that sentence was selected, and find sentences that include this new n-gram.

8. Repeat step 7 until the required number of sentences has been generated.

9. For the false sentences, start with the sentence from step 3, use its part before the n-gram, and add the last part of the sentence from step 6, with the replacements given by the iteratively identified n-grams and sentences.
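The iterative variant can be sketched as a chain of hops between sentences that share n-grams. Assumptions: sentences are pre-tokenised lists, n = 2 by default, `freq` is a hypothetical frequency lookup, the leftmost acceptable n-gram after the previous join point is used (a simplification of the center-out search of step 4), and `hops` stands in for the required number of sentences in step 8. Each hop keeps the text up to and including the shared n-gram.

```python
def find_ngram(words, start, freq, n, threshold):
    """Leftmost acceptable n-gram at or after `start` that is followed by at least one word."""
    for s in range(start, len(words) - n):
        gram = tuple(words[s:s + n])
        if freq(gram) >= threshold:
            return s, gram
    return None

def find_sentence(corpus, gram, used):
    """First unused corpus sentence containing `gram` away from both ends."""
    n = len(gram)
    for k, w in enumerate(corpus):
        if k in used:
            continue
        for s in range(1, len(w) - n):
            if tuple(w[s:s + n]) == gram:
                return k, s
    return None

def chain(corpus, start, freq, n=2, threshold=1, hops=2):
    """Build a false sentence by hopping between sentences via shared n-grams.

    corpus is a list of token lists; corpus[start] is the step-3 sentence.
    Returns the joined sentence, or None if no hop was possible.
    """
    used = {start}
    words, begin, search_from = corpus[start], 0, 1
    pieces = []
    for _ in range(hops):
        hit = find_ngram(words, search_from, freq, n, threshold)
        found = hit and find_sentence(corpus, hit[1], used)
        if not found:
            break
        s, _gram = hit
        pieces += words[begin:s + n]       # text up to and including the n-gram
        k, t = found
        used.add(k)
        words, begin, search_from = corpus[k], t + n, t + n
    pieces += words[begin:]
    return " ".join(pieces) if len(used) > 1 else None
```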

There is another alternative for combining false and true sentences: mixing a true sentence with versions in which the last part of the sentence is given as a multiple choice between the true ending and endings taken from other sentences. For example, the steps can then be:

1. Select sentences from a private corpus that fulfill predetermined criteria (containing a phrase, having a sufficient length, etc.).

3. Use the predetermined n-gram, or find a suitable n-gram in the sentences.

4. Identify sentences in a private or public corpus comprising the n-gram and the required length after the n-gram.

5. Make sentences having the first part (before the n-gram) from step 1 or 2 and the last part from step 4.

6. Run an automated grammar check on the false sentences, and remove sentences with detected grammatical errors.
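A sketch of the multiple-choice variant, assuming the user's task is to pick the true ending among the offered endings (the text does not fix the exact response format). `endings_challenge`, `k` (number of false endings) and `min_tail` are hypothetical parameters; the grammar check of step 6 is left out.

```python
import random

def _split_at(words, gram, min_tail):
    """(head including gram, tail) at the first occurrence of gram that is not
    at the start and leaves at least min_tail words after it; None if absent."""
    n = len(gram)
    for s in range(1, len(words) - n - min_tail + 1):
        if words[s:s + n] == gram:
            return words[:s + n], words[s + n:]
    return None

def endings_challenge(true_sentence, gram, others, k=3, min_tail=2, rng=random):
    """Return (shared head, shuffled endings, index of the true ending) or None."""
    g = gram.split()
    hit = _split_at(true_sentence.split(), g, min_tail)
    if hit is None:
        return None
    head, true_tail = hit
    tails = []
    for o in others:
        other = _split_at(o.split(), g, min_tail)
        if other and other[1] != true_tail:
            tails.append(other[1])
    if len(tails) < k:
        return None
    options = [true_tail] + rng.sample(tails, k)
    rng.shuffle(options)
    return " ".join(head), [" ".join(t) for t in options], options.index(true_tail)
```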

According to another example of embodiment of the present invention, sentences are clipped somewhere, preferably around the middle, and the parts are then recombined in a different order before a grammar test is run on the resulting CAPTCHA sentences; sentences that do not pass the test are removed.

According to another example of embodiment of the present invention, false sentences are generated by substituting words and/or n-grams.

According to another example of embodiment of the present invention, n-grams are changed while keeping the word classes, in order to preserve the grammatical structure of the sentence.

According to another example of embodiment of the present invention, fixed substitutions are used, e.g. substituting a phrase like "a good boy" with "a fat elephant". A substitution table of n-grams may be set up manually, and can then be used many times.

According to another example of embodiment of the present invention, at least one word in the n-gram is randomly replaced with another word of the same class.
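The fixed substitution table and the random same-class replacement might be sketched as follows. The table entries are taken from the examples in this section, while the toy lexicon is invented for illustration; word classes are assumed to come from an external part-of-speech tagger that is not shown.

```python
import random
import re

# Substitution table built from the examples in the text; set up once, reused many times.
SUBSTITUTIONS = {
    "a good boy": "a fat elephant",
    "hot": "cold",
    "big": "small",
}

# Hypothetical toy lexicon grouping replacement words by word class.
LEXICON = {
    "noun": ["teacher", "bicycle", "volcano", "sandwich"],
    "adj": ["green", "loud", "ancient", "frozen"],
}

def apply_table(sentence, table=SUBSTITUTIONS):
    """Apply the first table entry that matches as whole words; None if none match."""
    for old, new in table.items():
        pattern = r"\b" + re.escape(old) + r"\b"
        if re.search(pattern, sentence):
            return re.sub(pattern, lambda m: new, sentence, count=1)
    return None

def replace_same_class(words, tags, rng=random):
    """Replace one randomly chosen word whose class is in LEXICON with a
    different word of the same class. tags are word classes supplied by the
    caller (hypothetically, from a part-of-speech tagger)."""
    slots = [i for i, t in enumerate(tags) if t in LEXICON]
    if not slots:
        return None
    i = rng.choice(slots)
    out = list(words)
    out[i] = rng.choice([w for w in LEXICON[tags[i]] if w != words[i]])
    return out
```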

According to another example of embodiment of the present invention, one word in the n-gram is selected, and all n-grams containing this word with the correct word classes are then generated. Either the n-gram with the highest frequency is selected, or one is selected randomly from the n-grams whose frequencies are within a given fraction of the highest frequency.
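The frequency-guided variant can be sketched as below: one word of the n-gram is kept, every combination of same-class words is generated for the other positions, and either the most frequent candidate or a random candidate within a fraction of the top frequency is returned. The lexicon, the frequency lookup and the exclusion of unseen (zero-frequency) n-grams are assumptions.

```python
import itertools
import random

def substitute_ngram(gram, tags, keep, lexicon, freq, fraction=None, rng=random):
    """Keep gram[keep], try every same-class word in the other positions, and
    choose a replacement n-gram by frequency.

    fraction=None -> return the most frequent candidate; otherwise choose
    randomly among candidates with frequency >= fraction * highest frequency.
    freq(gram) -> int is a hypothetical n-gram frequency lookup.
    """
    pools = [[gram[i]] if i == keep else lexicon.get(tags[i], [gram[i]])
             for i in range(len(gram))]
    scored = [(freq(c), c) for c in itertools.product(*pools)
              if c != tuple(gram)]                   # never return the original
    scored = [(f, c) for f, c in scored if f > 0]   # unseen n-grams are dropped
    if not scored:
        return None
    if fraction is None:
        return max(scored)[1]
    top = max(f for f, _ in scored)
    return rng.choice([c for f, c in scored if f >= fraction * top])
```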

Typical examples would be adding or removing a negation, e.g. changing "because of the" to "not because of the", or removing the word "not" from an existing negation.

According to another example of embodiment of the present invention, names of famous people (or at least people known to the users of the CAPTCHAs) could be substituted in sentences; e.g. substituting "Hitler" with "Gandhi", or "Albert Einstein" with "Donald Duck", would normally lead to sentences that are obviously false.

According to another example of embodiment of the present invention, substitutions use antonyms (opposites, e.g. replacing "hot" with "cold" or "big" with "small"), as this makes the falsehood of the generated sentences clearer.

A CAPTCHA according to the present invention can be presented to a user as one sentence among a plurality of sentences. The server providing the CAPTCHA to the user terminal expects the user to point out which sentence is the linguistically incoherent one, for example by highlighting the sentence or pressing a certain key on the keyboard. Such interactive "handshake" methods are well known in the prior art. It is also an aspect of the present invention that the sentences, including the CAPTCHA sentence, are conveyed to the user as a computer-generated voice. Such equipment is well known in the prior art. Another example of a terminal is a Braille terminal. In any case, the user's response to the CAPTCHA can be the pressing of a certain key on a keyboard, either a standard keyboard or a Braille keyboard.

It is further an aspect of the present invention that CAPTCHA sentences can be presented to a user as part of a graphically based CAPTCHA, wherein any known CAPTCHA based on graphical distortion of an image can be applied to an image containing the text-based CAPTCHA. The image with the text-based CAPTCHA can be presented as one image among a plurality of images. The expected response from the user can be to input a number associated with the image comprising the linguistically incoherent sentence, or to type a certain keyword, for example the last word of the sentence.

This combination of CAPTCHA methods increases the challenge for CAPTCHA defeating software.

It is further an aspect of the present invention to combine a pure graphics-based CAPTCHA as known to a person skilled in the art, a text-based CAPTCHA according to the present invention, or a combination of these CAPTCHA methods, with an information element that is part of the CAPTCHA and that initiates actions in the computer system connected to the terminal used by the user and/or demands interactive actions by the user on the terminal, wherein these initiated actions provide a random delay of the expected response from the user to the presented CAPTCHA. These actions must be performed before the user's response to the CAPTCHA is transmitted from the computer system used by the user to the server issuing the CAPTCHA.

This example of embodiment of the present invention is especially suited for cases where users register their personal information, for example when opening an e-mail account. This process is known in the prior art to comprise a CAPTCHA to prevent SPAM from being sent from the user account. The delay introduced by this information element does not delay the registration process as such, but any CAPTCHA defeating software will experience a considerable delay in its trial responses. As known to a person skilled in the art, CAPTCHA defeating software has to try many alternatives to be able to pass the CAPTCHA test, and it has to wait for the response from the server to establish whether each CAPTCHA response was correct. The aspect of the present invention that the CAPTCHA is either a graphics-based CAPTCHA, a linguistically based CAPTCHA, or a combination of these methods places further processing requirements on the CAPTCHA defeating software. The CAPTCHA defeating software must first carry out an analysis to determine what kind of CAPTCHA has to be defeated. Even after a successful identification, each trial must comprise a new identification of the CAPTCHA method, since the CAPTCHA in each session can be different. According to another example of embodiment of the present invention, the time delay is provided by an algorithm that generates a volume of data that occupies the network during the randomly generated delay.

It is within the scope of the present invention that each CAPTCHA method mentioned above can be embodied alone or in any combination of the above embodiments. For example, a graphics based CAPTCHA as known in the prior art can be combined with the information element providing the delay; a sentence based CAPTCHA can be provided alone or in combination with the delay, or be part of a graphics based CAPTCHA as described above. This combined CAPTCHA can also be combined with the delay.

It is within the scope of the present invention that the delay introduced in the computer system used by the user can be provided by any known type of algorithm, puzzle, questionnaire etc. that can provide a random delay. This algorithm must terminate before the user's response to the CAPTCHA is transmitted back to the server that provided the CAPTCHA.
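As one illustration of a "known type of algorithm or puzzle", the sketch below swaps in a hashcash-style proof-of-work puzzle whose difficulty is chosen at random, so the expected solving time (and thus the delay) varies from session to session. This technique is not taken from the patent; the functions `issue_puzzle`, `solve_puzzle` and `verify_puzzle` and the use of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
import random
from typing import Optional, Tuple


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def issue_puzzle(min_bits: int = 8, max_bits: int = 16,
                 rng: Optional[random.Random] = None) -> Tuple[bytes, int]:
    """Server side (hypothetical): random difficulty gives a random expected delay."""
    rng = rng or random.Random()
    challenge = os.urandom(16)
    difficulty = rng.randint(min_bits, max_bits)
    return challenge, difficulty


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve_puzzle(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; must finish before the CAPTCHA answer is sent."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify_puzzle(challenge: bytes, difficulty: int, nonce: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

Verification is cheap for the server while solving costs on average about 2**difficulty hashes on the client, so the work terminates by itself and can be required before the response is accepted.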

Figure 4 illustrates an embodiment of the present invention comprising the introduction of a random delay in the CAPTCHA process. A Web mail host 1 is requested by a user 7 to open an account. The Web mail host 1 then transfers a CAPTCHA 6 comprising an information element 5 that comprises, for example, a data generating procedure which occupies the network during a time slot whose length is randomly generated, as defined in step 3. The user 4 can be either a legitimate user or a Spammer. When the random time period from step 3 has elapsed, a CAPTCHA response 2 from the user 4 is transferred back to the Web mail host 1, which decides whether the user 4 is a legitimate user or not.
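The Figure 4 flow can be sketched from the host's side as follows, assuming the host remembers the randomly generated slot length of step 3 and, in addition to checking the answer, rejects a response that arrives before that slot has elapsed. The class `WebMailHost`, its methods and the token scheme are hypothetical; the description itself only states that the response is sent after the delay and that the host decides on legitimacy.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class IssuedCaptcha:
    answer: str
    min_delay_s: float                      # random slot length from step 3
    issued_at: float = field(default_factory=time.monotonic)


class WebMailHost:
    """Hypothetical stand-in for the Web mail host 1 in Figure 4."""

    def __init__(self, max_delay_s: float = 5.0):
        self.max_delay_s = max_delay_s
        self._pending: Dict[str, IssuedCaptcha] = {}

    def open_account_request(self, answer: str) -> Tuple[str, float]:
        """Issue CAPTCHA 6 with information element 5 carrying a random delay."""
        delay = secrets.SystemRandom().uniform(0.0, self.max_delay_s)
        token = secrets.token_hex(8)
        self._pending[token] = IssuedCaptcha(answer, delay)
        return token, delay

    def receive_response(self, token: str, response: str,
                         now: Optional[float] = None) -> bool:
        """Decide on CAPTCHA response 2; each token can be answered only once."""
        issued = self._pending.pop(token, None)
        if issued is None:
            return False
        now = time.monotonic() if now is None else now
        too_early = now - issued.issued_at < issued.min_delay_s
        return not too_early and response == issued.answer
```

Popping the token on the first response means defeating software cannot retry the same CAPTCHA; every new guess requires a new CAPTCHA and therefore a new random delay.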

It is further within the scope of the present invention that any server/client configuration in a network, and any type of software configuration, can be used. For example, the computer system that provides the CAPTCHA can be different from the computer system that receives the response from the user.

Claims


1.

A method for generating a Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA), wherein a server computer system transmits the CAPTCHA to a client computer system, wherein the CAPTCHA is conveyed to a user on a terminal connected to the client computer system, wherein the user responds to the CAPTCHA via terminal interactions, wherein the method comprises:

selecting to use either a graphical image based CAPTCHA, a linguistic based CAPTCHA, or a combination of the graphical image based CAPTCHA and the linguistic based CAPTCHA in which sentences of the linguistic based CAPTCHA are provided as part of the image used in the graphical image based CAPTCHA.

2.

The method according to claim 1, wherein the linguistic based CAPTCHA is provided by using a CAPTCHA sentence to be used among a plurality of other linguistically correct sentences in a multiple choice application in a computer system, wherein sentences are provided as computer coded text in at least one dataset, wherein both the sentences and the respective characters and words in each respective sentence in the at least one dataset are computer readable, the method comprising:

identifying at least a first sentence and at least a second sentence from the at least one dataset having a phrase or n-gram in common,

keeping a first subset of words comprised in the first sentence, wherein the words are from the start of this first sentence up to the beginning of the common phrase or n-gram in this first sentence,

keeping a second subset of words comprised in the second sentence, wherein the words are from the end of the common phrase or n-gram in this second sentence to the end of this second sentence,

composing a CAPTCHA sentence by concatenating, in order, the first subset of words, the common phrase or n-gram and then the second subset of words.
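The composition steps of claim 2 can be sketched as below, assuming whitespace tokenisation without punctuation handling and a fixed n-gram length `n`. The helper names are hypothetical. The sketch keeps the words of the first sentence up to the common n-gram, then the n-gram, then the words after the n-gram in the second sentence, and skips splices that would reproduce either input unchanged.

```python
from typing import Dict, List, Optional, Tuple


def find_common_ngrams(first: List[str], second: List[str], n: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where first[i:i+n] == second[j:j+n]."""
    index: Dict[Tuple[str, ...], List[int]] = {}
    for j in range(len(second) - n + 1):
        index.setdefault(tuple(second[j:j + n]), []).append(j)
    matches = []
    for i in range(len(first) - n + 1):
        for j in index.get(tuple(first[i:i + n]), []):
            matches.append((i, j))
    return matches


def compose_captcha_sentence(first: str, second: str, n: int = 2) -> Optional[str]:
    a, b = first.split(), second.split()
    for i, j in find_common_ngrams(a, b, n):
        head = a[:i]          # first subset: start of first sentence up to the n-gram
        common = a[i:i + n]   # the common phrase or n-gram
        tail = b[j + n:]      # second subset: after the n-gram to the end of second sentence
        candidate = head + common + tail
        if candidate != a and candidate != b:
            return " ".join(candidate)
    return None
```

For example, splicing "the old bridge was closed because of the heavy storm last night" with "the farmers lost their harvest because of the heavy rain in spring" on the trigram "because of the" gives "the old bridge was closed because of the heavy rain in spring".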

3.

The method according to claim 2, wherein the first sentence is selected from a private corpus of computer coded text while the second sentence is selected either from a private corpus of computer coded text or a public corpus of computer coded text.

4.

The method according to claims 2 and 3, wherein a plurality of phrases or n-grams are selected in the first sentence and are then combined with a plurality of second sentences, each comprising a respective phrase or n-gram selected in the first sentence.

5.

The method according to claim 2, wherein the first sentence has a number of words above a predefined threshold level.

6.

The method according to claim 2, wherein a phrase or n-gram is selected in the first sentence as being a predefined number of words located in the center of the first sentence.

7.

The method according to claim 6, wherein the selected phrase or n-gram is tested to identify if the frequency of this phrase or n-gram is above a predefined threshold level in a natural language corpus, and if the selected phrase or n-gram is identified to be below the predefined threshold, a new phrase or n-gram is selected by shifting the position of the first selected phrase or n-gram one position either towards the beginning of the sentence or towards the end of the sentence, and then testing the frequency of this selection, continuing to shift the location of the phrase or n-gram until a phrase or n-gram that is above the predefined threshold level is identified.
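A minimal sketch of claims 6 and 7, assuming n-gram frequencies are counted over a small tokenised corpus and that the shift alternates between one position towards the beginning and one towards the end at each growing distance from the centre. The claim allows shifting in either direction, so the alternating order is an assumption, as are the helper names.

```python
from collections import Counter
from typing import List, Optional


def ngram_counts(corpus: List[List[str]], n: int) -> Counter:
    """Frequency of every n-gram in a tokenised corpus."""
    counts: Counter = Counter()
    for tokens in corpus:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def select_center_ngram(tokens: List[str], n: int, counts: Counter,
                        threshold: int) -> Optional[int]:
    """Return the start index of the n-gram nearest the centre whose corpus
    frequency is above the threshold, or None if no position qualifies."""
    last_start = len(tokens) - n
    if last_start < 0:
        return None
    center = last_start // 2
    for offset in range(last_start + 1):
        # Try the centre first, then shift one step further out on each side.
        starts = (center,) if offset == 0 else (center - offset, center + offset)
        for start in starts:
            if 0 <= start <= last_start and counts[tuple(tokens[start:start + n])] > threshold:
                return start
    return None
```

With the corpus from the test below and the sentence "yesterday we finally left as a group", the centre bigram "finally left" is unknown, so the search shifts outwards until it reaches "as a" at index 4.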

8.

The method according to claim 2, wherein the identified phrase or n-gram is from a list of preselected phrases or n-grams.

9. The method according to claim 2, wherein the list of preselected phrases or n-grams comprises words that imply deduction or a deductive context in the sentence.

10.

The method according to claim 9, wherein the list of preselected phrases or n-grams comprises: "because of this", "consequently", "implicating", "thus", "therefore", etc.
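The use of a preselected list (claims 8 to 10) can be sketched as splicing two sentences only where they share one of the listed deductive phrases. The function name is hypothetical and the matching is case-insensitive on whitespace tokens, so a phrase followed by punctuation such as "therefore," would not match in this simplified version.

```python
from typing import List, Optional

# Phrases listed in claim 10; the claim's "etc." means the real list may be longer.
DEDUCTIVE_PHRASES = ["because of this", "consequently", "implicating", "thus", "therefore"]


def _find(tokens: List[str], words: List[str]) -> Optional[int]:
    """Start index of the first occurrence of words in tokens, or None."""
    n = len(words)
    return next((k for k in range(len(tokens) - n + 1) if tokens[k:k + n] == words), None)


def splice_on_listed_phrase(first: str, second: str, phrases: List[str]) -> Optional[str]:
    a, b = first.split(), second.split()
    la, lb = [t.lower() for t in a], [t.lower() for t in b]
    for phrase in phrases:
        words = phrase.split()
        i, j = _find(la, words), _find(lb, words)
        if i is not None and j is not None:
            # Premise from the first sentence, conclusion from the second.
            return " ".join(a[:i + len(words)] + b[j + len(words):])
    return None
```

Splicing at a deductive connective tends to produce a grammatical sentence whose premise and conclusion do not follow from each other, such as "the road was icy therefore she passed the exam", which a human reader can recognise as incoherent.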

11.

The method according to claim 2, wherein the first sentence is selected from a first corpus of text related to a subject matter that is different from a subject matter of a second corpus used to select the second sentence from.

12.

The method according to claim 11, wherein natural joining words such as "and" and "or" are used as a phrase or n-gram.

13. The method according to claim 2, further comprising filtering the CAPTCHA sentence with a grammar checker, and discarding any CAPTCHA sentence not passing the grammar checker.

14. The method according to claim 2, further comprising substituting a selected phrase or n-gram with words from a predefined list of substitutions.

15.

The method according to claim 2, further comprising substituting at least one word in a selected phrase or n-gram with a randomly selected word.

16.

The method according to claim 15, comprising substituting a phrase or n-gram with its antonym.
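The three substitution variants of claims 14 to 16 can be sketched on a tokenised sentence and a selected n-gram span `[start, start + n)`. The substitution table, antonym dictionary and vocabulary below are hypothetical placeholders, since the claims do not specify their contents, and antonyms are applied word by word as a simplification of substituting the phrase with its antonym.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical tables; the claims do not specify their contents.
SUBSTITUTIONS: Dict[Tuple[str, ...], List[Tuple[str, ...]]] = {
    ("because", "of", "this"): [("as", "a", "result"), ("for", "this", "reason")],
}
ANTONYMS: Dict[str, str] = {"increased": "decreased", "early": "late", "open": "closed"}


def substitute_from_list(tokens: List[str], start: int, n: int,
                         table: Dict[Tuple[str, ...], List[Tuple[str, ...]]],
                         rng: random.Random) -> List[str]:
    """Claim 14: replace the selected n-gram with an entry from a predefined list."""
    key = tuple(w.lower() for w in tokens[start:start + n])
    options = table.get(key)
    if not options:
        return list(tokens)
    return tokens[:start] + list(rng.choice(options)) + tokens[start + n:]


def substitute_random_word(tokens: List[str], start: int, n: int,
                           vocabulary: List[str], rng: random.Random) -> List[str]:
    """Claim 15: replace one word inside the selected n-gram with a random word."""
    out = list(tokens)
    out[rng.randrange(start, start + n)] = rng.choice(vocabulary)
    return out


def substitute_antonym(tokens: List[str], start: int, n: int,
                       antonyms: Dict[str, str]) -> List[str]:
    """Claim 16 (word-level simplification): swap words in the n-gram for antonyms."""
    out = list(tokens)
    for k in range(start, start + n):
        out[k] = antonyms.get(out[k].lower(), out[k])
    return out
```

All three functions return a new list and leave the input sentence untouched, so several variants can be generated from the same source sentence.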

17.

The method according to claim 2, wherein the step of composing the CAPTCHA sentence comprises using a previously composed CAPTCHA sentence, and then composing a new CAPTCHA sentence based on the previously composed CAPTCHA sentence by either substituting the first subset of words with another subset of words, or by substituting the second subset of words with another subset of words, or by substituting the common phrase or n-gram with another phrase or n-gram.

18.

The method according to claim 2, wherein the step of composing a CAPTCHA sentence further comprises keeping a composed CAPTCHA sentence and then identifying another common phrase or n-gram in the second subset of words with other sentences in the at least one dataset, and then using the other steps, wherein the kept composed CAPTCHA sentence is treated as the first sentence while at least one of the other sentences that has the other selected common phrase or n-gram is treated as the second sentence.

19.

The method according to claim 18, wherein the steps of composing a CAPTCHA sentence are iterated a predefined number of times.
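The iterative chaining of claims 18 and 19 can be sketched as repeatedly splicing the current sentence with another dataset sentence, where each new common n-gram must lie in the second subset of words added by the previous splice. The function name, the whitespace tokenisation and the random choice among candidate splices are assumptions; the first pass searches the whole seed sentence, which corresponds to the composition of claim 2.

```python
import random
from typing import List, Optional, Tuple


def chain_compose(seed: str, dataset: List[str], n: int = 2, iterations: int = 3,
                  rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    current = seed.split()
    tail_start = 0  # start of the second subset of words in the current sentence
    for _ in range(iterations):  # claim 19: predefined number of iterations
        candidates: List[Tuple[int, List[str], int]] = []
        for other in dataset:
            b = other.split()
            for i in range(tail_start, len(current) - n + 1):
                for j in range(len(b) - n + 1):
                    # Common n-gram inside the second subset, and the splice must change the tail.
                    if current[i:i + n] == b[j:j + n] and b[j + n:] != current[i + n:]:
                        candidates.append((i, b, j))
        if not candidates:
            break
        i, b, j = rng.choice(candidates)
        current = current[:i + n] + b[j + n:]  # kept sentence acts as the first sentence
        tail_start = i + n
    return " ".join(current)
```

Restricting each search to the newly added tail keeps earlier splices intact, so every iteration extends the sentence with material from a further dataset sentence.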

20.

The method according to claim 1, further comprising the steps of presenting the CAPTCHA to a user on a terminal used by the user on a computer system, wherein the CAPTCHA is presented as part of an information element, wherein the information element initiates actions in the computer system connected to the terminal used by the user and/or demands interactive actions performed by the user on the terminal, wherein these initiated actions provide a random delay for transmitting a response from the user to the CAPTCHA.

21. The method according to claim 1, wherein the step of selecting to use a graphical image based CAPTCHA, a linguistic based CAPTCHA or a combination of these CAPTCHAs is provided as a random selection, as a selection of one method after the other, or as a choice made by a user via terminal interactions.
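The three selection strategies named in claim 21 can be sketched as interchangeable selector functions. The method labels and function names are hypothetical; each selector simply returns which CAPTCHA type to generate for the next session.

```python
import itertools
import random
from typing import Callable, Optional

# Hypothetical labels for the three CAPTCHA types of claim 1.
METHODS = ("graphical", "linguistic", "combined")


def random_selector(rng: Optional[random.Random] = None) -> Callable[[], str]:
    """Random selection for each new session."""
    rng = rng or random.Random()
    return lambda: rng.choice(METHODS)


def sequential_selector() -> Callable[[], str]:
    """One method after the other, repeating the cycle."""
    cycle = itertools.cycle(METHODS)
    return lambda: next(cycle)


def user_selector(choice: str) -> Callable[[], str]:
    """A method chosen by the user via terminal interactions."""
    if choice not in METHODS:
        raise ValueError(f"unknown CAPTCHA method: {choice}")
    return lambda: choice
```

With the random or sequential selector, defeating software cannot assume the CAPTCHA type of the next session and must identify it again on every attempt.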

22.

Use of a CAPTCHA method according to any one of claims 1 to 21 for visually impaired persons, wherein the terminal used by the user is a Braille type terminal.