Dynamic-content web crawling through traffic monitoring

Info

Publication number
AU2002230824A1
Authority
AU
Australia
Prior art keywords
dynamic
traffic monitoring
web crawling
content web
content
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2002230824A
Inventor
Jacob Green
John Schultz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Johns Hopkins University
Original Assignee
Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-12-15
Filing date
2001-12-14
Publication date
2002-07-01
Application filed by Johns Hopkins University
Publication of AU2002230824A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/564 Enhancement of application control based on intercepted application data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

AU2002230824A (priority date 2000-12-15, filing date 2001-12-14): Dynamic-content web crawling through traffic monitoring. Abandoned. Published as AU2002230824A1 (en).

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US25539200P 2000-12-15 2000-12-15
US60255392 2000-12-15
PCT/US2001/048291 WO2002050703A1 (en) 2000-12-15 2001-12-14 Dynamic-content web crawling through traffic monitoring

Publications (1)

Publication Number Publication Date
AU2002230824A1 (en) 2002-07-01

Family

Family ID: 22968117

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
AU2002230824A Abandoned AU2002230824A1 (en) 2000-12-15 2001-12-14 Dynamic-content web crawling through traffic monitoring

Country Status (3)

Country Link
US (1) US7143088B2 (en)
AU (1) AU2002230824A1 (en)
WO (1) WO2002050703A1 (en)

Families Citing this family (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7370120B2 (en) * 2001-12-07 2008-05-06 Propel Software Corporation Method and system for reducing network latency in data communication
US7194464B2 (en) 2001-12-07 2007-03-20 Websense, Inc. System and method for adapting an internet filter
US8527495B2 (en) * 2002-02-19 2013-09-03 International Business Machines Corporation Plug-in parsers for configuring search engine crawler
GB0315155D0 (en) * 2003-06-28 2003-08-06 Ibm Improvements to hypertext request integrity and user experience
GB0315154D0 (en) * 2003-06-28 2003-08-06 Ibm Improvements to hypertext integrity
US7725452B1 (en) * 2003-07-03 2010-05-25 Google Inc. Scheduler for search engine crawler
US8042112B1 (en) 2003-07-03 2011-10-18 Google Inc. Scheduler for search engine crawler
US8655755B2 (en) * 2003-10-22 2014-02-18 Scottrade, Inc. System and method for the automated brokerage of financial instruments
US8954420B1 (en) 2003-12-31 2015-02-10 Google Inc. Methods and systems for improving a search ranking using article information
US20050149498A1 (en) * 2003-12-31 2005-07-07 Stephen Lawrence Methods and systems for improving a search ranking using article information
US7373524B2 (en) 2004-02-24 2008-05-13 Covelight Systems, Inc. Methods, systems and computer program products for monitoring user behavior for a server application
US9558042B2 (en) 2004-03-13 2017-01-31 Iii Holdings 12, Llc System and method providing object messages in a compute environment
US8782654B2 (en) 2004-03-13 2014-07-15 Adaptive Computing Enterprises, Inc. Co-allocating a reservation spanning different compute resources types
US7581227B1 (en) 2004-03-31 2009-08-25 Google Inc. Systems and methods of synchronizing indexes
US7680888B1 (en) 2004-03-31 2010-03-16 Google Inc. Methods and systems for processing instant messenger messages
US8386728B1 (en) 2004-03-31 2013-02-26 Google Inc. Methods and systems for prioritizing a crawl
US8346777B1 (en) 2004-03-31 2013-01-01 Google Inc. Systems and methods for selectively storing event data
US8275839B2 (en) 2004-03-31 2012-09-25 Google Inc. Methods and systems for processing email messages
US8631076B1 (en) 2004-03-31 2014-01-14 Google Inc. Methods and systems for associating instant messenger events
US7412708B1 (en) 2004-03-31 2008-08-12 Google Inc. Methods and systems for capturing information
US8161053B1 (en) 2004-03-31 2012-04-17 Google Inc. Methods and systems for eliminating duplicate events
US7941439B1 (en) 2004-03-31 2011-05-10 Google Inc. Methods and systems for information capture
US7333976B1 (en) 2004-03-31 2008-02-19 Google Inc. Methods and systems for processing contact information
US8099407B2 (en) 2004-03-31 2012-01-17 Google Inc. Methods and systems for processing media files
US7725508B2 (en) 2004-03-31 2010-05-25 Google Inc. Methods and systems for information capture and retrieval
US20070266388A1 (en) 2004-06-18 2007-11-15 Cluster Resources, Inc. System and method for providing advanced reservations in a compute environment
GB2417342A (en) * 2004-08-19 2006-02-22 Fujitsu Serv Ltd Indexing system for a computer file store
US8176490B1 (en) 2004-08-20 2012-05-08 Adaptive Computing Enterprises, Inc. System and method of interfacing a workload manager and scheduler with an identity manager
US7987172B1 (en) * 2004-08-30 2011-07-26 Google Inc. Minimizing visibility of stale content in web searching including revising web crawl intervals of documents
GB2418108B (en) 2004-09-09 2007-06-27 Surfcontrol Plc System, method and apparatus for use in monitoring or controlling internet access
GB2418037B (en) * 2004-09-09 2007-02-28 Surfcontrol Plc System, method and apparatus for use in monitoring or controlling internet access
GB2418999A (en) * 2004-09-09 2006-04-12 Surfcontrol Plc Categorizing uniform resource locators
CA2586763C (en) 2004-11-08 2013-12-17 Cluster Resources, Inc. System and method of providing system jobs within a compute environment
US8863143B2 (en) 2006-03-16 2014-10-14 Adaptive Computing Enterprises, Inc. System and method for managing a hybrid compute environment
WO2006107531A2 (en) 2005-03-16 2006-10-12 Cluster Resources, Inc. Simple integration of an on-demand compute environment
US9015324B2 (en) 2005-03-16 2015-04-21 Adaptive Computing Enterprises, Inc. System and method of brokering cloud computing resources
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
US8782120B2 (en) 2005-04-07 2014-07-15 Adaptive Computing Enterprises, Inc. Elastic management of compute resources between a web server and an on-demand compute environment
EP3203374B1 (en) 2005-04-07 2021-11-24 III Holdings 12, LLC On-demand access to compute resources
US8538969B2 (en) * 2005-06-03 2013-09-17 Adobe Systems Incorporated Data format for website traffic statistics
US20070055768A1 (en) * 2005-08-23 2007-03-08 Cisco Technology, Inc. Method and system for monitoring a server
GB2430507A (en) * 2005-09-21 2007-03-28 Stephen Robert Ives System for managing the display of sponsored links together with search results on a mobile/wireless device
WO2007042840A1 (en) * 2005-10-11 2007-04-19 Taptu Ltd Search using changes in prevalence of content items on the web
US9262446B1 (en) 2005-12-29 2016-02-16 Google Inc. Dynamically ranking entries in a personal data book
US8615800B2 (en) 2006-07-10 2013-12-24 Websense, Inc. System and method for analyzing web content
US8020206B2 (en) 2006-07-10 2011-09-13 Websense, Inc. System and method of analyzing web content
US8595612B1 (en) 2006-10-26 2013-11-26 Hewlett-Packard Development, L.P. Display of web page with available data
US7672943B2 (en) * 2006-10-26 2010-03-02 Microsoft Corporation Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling
US9654495B2 (en) 2006-12-01 2017-05-16 Websense, Llc System and method of analyzing web addresses
GB2458094A (en) 2007-01-09 2009-09-09 Surfcontrol On Demand Ltd URL interception and categorization in firewalls
GB2445764A (en) 2007-01-22 2008-07-23 Surfcontrol Plc Resource access filtering system and database structure for use therewith
US9189561B2 (en) * 2007-02-10 2015-11-17 Adobe Systems Incorporated Bridge event analytics tools and techniques
US8015174B2 (en) * 2007-02-28 2011-09-06 Websense, Inc. System and method of controlling access to the internet
US7945849B2 (en) * 2007-03-20 2011-05-17 Microsoft Corporation Identifying appropriate client-side script references
US20080235163A1 (en) * 2007-03-22 2008-09-25 Srinivasan Balasubramanian System and method for online duplicate detection and elimination in a web crawler
GB0709527D0 (en) 2007-05-18 2007-06-27 Surfcontrol Plc Electronic messaging system, message processing apparatus and message processing method
US10176258B2 (en) * 2007-06-28 2019-01-08 International Business Machines Corporation Hierarchical seedlists for application data
US20090024556A1 (en) * 2007-07-16 2009-01-22 Semgine, Gmbh Semantic crawler
US7921097B1 (en) * 2007-08-30 2011-04-05 Pranav Dandekar Systems and methods for generating a descriptive uniform resource locator (URL)
US8560692B1 (en) 2007-09-05 2013-10-15 Trend Micro Incorporated User-specific cache for URL filtering
US8838741B1 (en) 2007-09-05 2014-09-16 Trend Micro Incorporated Pre-emptive URL filtering technique
US8041773B2 (en) 2007-09-24 2011-10-18 The Research Foundation Of State University Of New York Automatic clustering for self-organizing grids
US7672938B2 (en) * 2007-10-05 2010-03-02 Microsoft Corporation Creating search enabled web pages
US8572065B2 (en) * 2007-11-09 2013-10-29 Microsoft Corporation Link discovery from web scripts
US8990173B2 (en) * 2008-03-27 2015-03-24 International Business Machines Corporation Method and apparatus for selecting an optimal delete-safe compression method on list of delta encoded integers
AU2009267107A1 (en) 2008-06-30 2010-01-07 Websense, Inc. System and method for dynamic and real-time categorization of webpages
US8296722B2 (en) * 2008-10-06 2012-10-23 International Business Machines Corporation Crawling of object model using transformation graph
US8244224B2 (en) * 2008-11-20 2012-08-14 Research In Motion Limited Providing customized information to a user based on identifying a trend
US9292612B2 (en) 2009-04-22 2016-03-22 Verisign, Inc. Internet profile service
US8521908B2 (en) * 2009-04-07 2013-08-27 Verisign, Inc. Existent domain name DNS traffic capture and analysis
AU2010254269A1 (en) 2009-05-26 2011-12-22 Websense, Inc. Systems and methods for efficient detection of fingerprinted data and information
US10877695B2 (en) 2009-10-30 2020-12-29 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US8407247B2 (en) * 2010-02-22 2013-03-26 Kenshoo Ltd. Universal resource locator watchdog
US20110320715A1 (en) * 2010-06-23 2011-12-29 Microsoft Corporation Identifying trending content items using content item histograms
US8914603B2 (en) 2010-07-30 2014-12-16 Motorola Mobility Llc System and method for synching Portable Media Player content with storage space optimization
US9043306B2 (en) 2010-08-23 2015-05-26 Microsoft Technology Licensing, Llc Content signature notification
TW201222315A (en) * 2010-11-22 2012-06-01 Inst Information Industry Web page crawling method, web page crawling device and computer program product thereof
US8880996B1 (en) 2011-07-20 2014-11-04 Google Inc. System for reconfiguring a web site or web page based on real-time analytics data
US9100205B1 (en) 2011-07-20 2015-08-04 Google Inc. System for validating site configuration based on real-time analytics data
US8869036B1 (en) * 2011-07-20 2014-10-21 Google Inc. System for troubleshooting site configuration based on real-time analytics data
US10154076B2 (en) * 2011-10-11 2018-12-11 Entit Software Llc Identifying users through a proxy
US9432444B1 (en) * 2011-11-22 2016-08-30 The Directv Group, Inc. MSS headend caching strategies
US11023536B2 (en) * 2012-05-01 2021-06-01 Oracle International Corporation Social network system with relevance searching
US9330419B2 (en) 2012-05-01 2016-05-03 Oracle International Corporation Social network system with social objects
US8990183B2 (en) 2012-06-06 2015-03-24 Microsoft Technology Licensing, Llc Deep application crawling
US9619845B2 (en) 2012-12-17 2017-04-11 Oracle International Corporation Social network system with correlation of business results and relationships
US9117054B2 (en) 2012-12-21 2015-08-25 Websense, Inc. Method and aparatus for presence based resource management
US9659058B2 (en) 2013-03-22 2017-05-23 X1 Discovery, Inc. Methods and systems for federation of results from search indexing
US9880983B2 (en) * 2013-06-04 2018-01-30 X1 Discovery, Inc. Methods and systems for uniquely identifying digital content for eDiscovery
US9508360B2 (en) 2014-05-28 2016-11-29 International Business Machines Corporation Semantic-free text analysis for identifying traits
US10346550B1 (en) 2014-08-28 2019-07-09 X1 Discovery, Inc. Methods and systems for searching and indexing virtual environments
US9431003B1 (en) 2015-03-27 2016-08-30 International Business Machines Corporation Imbuing artificial intelligence systems with idiomatic traits
CN105956069A (en) * 2016-04-28 2016-09-21 优品财富管理有限公司 Network information collection and analysis method and network information collection and analysis system
CN105956070A (en) * 2016-04-28 2016-09-21 优品财富管理有限公司 Method and system for integrating repetitive records
US10452723B2 (en) 2016-10-27 2019-10-22 Micro Focus Llc Detecting malformed application screens
CN107066576B (en) * 2017-04-12 2019-11-12 成都四方伟业软件股份有限公司 A big data web crawler page selection method and system
US11361076B2 (en) * 2018-10-26 2022-06-14 ThreatWatch Inc. Vulnerability-detection crawler
US12019691B2 (en) 2021-04-02 2024-06-25 Trackstreet, Inc. System and method for reducing crawl frequency and memory usage for an autonomous internet crawler
CN114154043B (en) * 2021-12-07 2025-05-06 深信服科技股份有限公司 Website fingerprint calculation method, system, storage medium and terminal

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061682A (en) * 1997-08-12 2000-05-09 International Business Machine Corporation Method and apparatus for mining association rules having item constraints
US5987515A (en) * 1997-08-29 1999-11-16 International Business Machines Corporation Internet protocol assists using multi-path channel protocol
US5987471A (en) * 1997-11-13 1999-11-16 Novell, Inc. Sub-foldering system in a directory-service-based launcher

Also Published As

Publication number Publication date
US7143088B2 (en) 2006-11-28
US20040128285A1 (en) 2004-07-01
WO2002050703A1 (en) 2002-06-27

Similar Documents

Publication Title
AU2002230824A1 (en) Dynamic-content web crawling through traffic monitoring
AU2001218750A1 (en) Traffic monitoring
AU2001236770A1 (en) Web site for glucose monitoring
AU2001290889A1 (en) Wireless network monitoring
AUPR187100A0 (en) Slope monitoring system
AU2002230906A1 (en) Receiver-autonomous vertical integrity monitoring
AU2001263887A1 (en) Monitoring method
DE60133859D1 (en) monitoring system
AU2001279105A1 (en) Meltblown web
DE10196023T1 (en) Monitoring System
AU2001288989A1 (en) Cardiopulmonary monitoring
AU2001259720A1 (en) Meltblown web
AU3354700A (en) Spunbond web formation
AU2002213656A1 (en) A monitoring system
AU2002212669A1 (en) Sids monitoring system
GB9914812D0 (en) Traffic monitoring
NO20023413D0 (en) Clogging monitoring
AU2001279143A1 (en) Metlblown web
AU2002211376A1 (en) Meltblown web
AU3754101A (en) Monitoring system
AU2001263701A1 (en) Server monitoring
AU2001279939A1 (en) Monitoring system
AU2001267731A1 (en) Monitoring structures
AU2001282337A1 (en) A monitoring system
AU2002211150A1 (en) Traffic signalling