EA201991908A1 - Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors - Google Patents

Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors

Info

Publication number
EA201991908A1
Authority
EA
Eurasian Patent Office
Prior art keywords
data
compact representation
multiple genomic
descriptors
reads
Prior art date
Application number
EA201991908A
Other languages
English (en)
Inventor
Claudio Alberti
Giorgio Zoia
Daniele Renzi
Mohamed Khoso Baluch
Original Assignee
Genomsys SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2017/017842 (WO2018071055A1)
Application filed by Genomsys SA
Publication of EA201991908A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0866 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/30 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
    • H04L9/3066 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves
    • H04L9/3073 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves involving pairings, e.g. identity based encryption [IBE], bilinear mappings or bilinear pairings, e.g. Weil or Tate pairing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/30 Compression, e.g. Merkle-Damgard construction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/34 Encoding or coding, e.g. Huffman coding or error correction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/88 Medical equipments

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Pure & Applied Mathematics (AREA)
  • Signal Processing (AREA)
  • Genetics & Genomics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)

Abstract

A method and apparatus for compressing genomic sequence data produced by genome sequencing machines. Sequence reads are encoded by aligning them against pre-existing or constructed reference sequences. The encoding process classifies the reads into data classes and then encodes each class by means of a multiplicity of descriptor blocks. Specific source models and entropy coders are used for each data class into which the data is partitioned and for each associated descriptor block.
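
The pipeline summarized in the abstract (align, classify into data classes, split each class into descriptor blocks, compress each block on its own) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the three class names (`perfect`, `mismatch`, `unmapped`), the descriptor names (`position`, `length`, `substitutions`, `sequence`), and the helper functions. The sketch assumes reads are already aligned (a position is given, or `None` for unmapped reads), handles substitutions only, and uses `zlib` as a generic stand-in where the abstract speaks of source models and entropy coders tailored to each class and block.

```python
import zlib
from collections import defaultdict
from typing import Optional


def classify(read: str, ref: str, pos: Optional[int]) -> str:
    """Assign a read to a hypothetical data class based on how it matches the reference."""
    if pos is None or len(ref[pos:pos + len(read)]) < len(read):
        return "unmapped"
    return "perfect" if read == ref[pos:pos + len(read)] else "mismatch"


def encode(reads, ref):
    """Group descriptors into per-class blocks, then compress every block separately."""
    blocks = defaultdict(lambda: defaultdict(list))
    for read, pos in reads:
        cls = classify(read, ref, pos)
        if cls == "unmapped":
            # Nothing to predict from the reference: store the bases verbatim.
            blocks[cls]["sequence"].append(read)
            continue
        blocks[cls]["position"].append(str(pos))
        blocks[cls]["length"].append(str(len(read)))
        if cls == "mismatch":
            window = ref[pos:pos + len(read)]
            # Record each substitution as "<offset><base>".
            subs = [f"{i}{b}" for i, (a, b) in enumerate(zip(window, read)) if a != b]
            blocks[cls]["substitutions"].append(",".join(subs))
    # One independent compressed stream per (class, descriptor) pair.
    return {
        cls: {name: zlib.compress("\n".join(vals).encode()) for name, vals in descs.items()}
        for cls, descs in blocks.items()
    }


def decode(encoded, ref):
    """Rebuild (read, position) pairs from the compressed descriptor blocks."""
    reads = []
    for cls, descs in encoded.items():
        cols = {n: zlib.decompress(b).decode().split("\n") for n, b in descs.items()}
        if cls == "unmapped":
            reads += [(seq, None) for seq in cols["sequence"]]
            continue
        for i, (p, length) in enumerate(zip(cols["position"], cols["length"])):
            pos = int(p)
            bases = list(ref[pos:pos + int(length)])
            if cls == "mismatch":
                for sub in cols["substitutions"][i].split(","):
                    bases[int(sub[:-1])] = sub[-1]
            reads.append(("".join(bases), pos))
    return reads
```

Keeping each descriptor of each class in its own stream is what would let a real codec pick a different statistical model per stream; in this sketch every stream simply goes through the same general-purpose compressor.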
EA201991908A 2017-02-14 2018-02-14 Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors EA201991908A1 (ru)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/US2017/017842 WO2018071055A1 (en) 2016-10-11 2017-02-14 Method and apparatus for the compact representation of bioinformatics data
PCT/US2017/041591 WO2018071080A2 (en) 2016-10-11 2017-07-11 Method and systems for the representation and processing of bioinformatics data using reference sequences
PCT/US2018/018092 WO2018152143A1 (en) 2017-02-14 2018-02-14 Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors

Publications (1)

Publication Number Publication Date
EA201991908A1 (ru) 2020-01-21

Family

ID=68609803

Family Applications (1)

Application Number Title Priority Date Filing Date
EA201991908A EA201991908A1 (ru) 2017-02-14 2018-02-14 Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors

Country Status (10)

Country Link
EP (1) EP3583500A4 (ru)
KR (1) KR102733786B1 (ru)
AU (1) AU2018221458B2 (ru)
CA (1) CA3052824A1 (ru)
EA (1) EA201991908A1 (ru)
IL (1) IL268651A (ru)
MX (1) MX2019009680A (ru)
SG (1) SG11201907418YA (ru)
WO (1) WO2018152143A1 (ru)
ZA (1) ZA201905921B (ru)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189830B (zh) * 2019-05-24 2021-06-08 杭州火树科技有限公司 Machine learning-based training method for an electronic medical record lexicon
EP3896698A1 (en) 2020-04-15 2021-10-20 Genomsys SA Method and system for the efficient data compression in mpeg-g
KR102497634B1 (ko) * 2020-12-21 2023-02-08 부산대학교 산학협력단 Method and apparatus for FASTQ data compression through character frequency-based sequence reordering
CN116206687A (zh) * 2022-12-30 2023-06-02 深圳百人科技有限公司 A fuzzy-matching k-mer encoding method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1383911A4 (en) * 2001-04-02 2004-12-15 Cytoprint Inc METHOD AND APPARATUS FOR DISCOVERING, IDENTIFYING AND COMPARING BIOLOGICAL ACTIVITY MECHANISMS
US7698067B2 (en) * 2002-02-12 2010-04-13 International Business Machines Corporation Sequence pattern descriptors for transmembrane structural details
US7809765B2 (en) * 2007-08-24 2010-10-05 General Electric Company Sequence identification and analysis
KR101922129B1 (ko) * 2011-12-05 2018-11-26 삼성전자주식회사 Method and apparatus for compressing and decompressing genetic information obtained using next-generation sequencing
US9679104B2 (en) * 2013-01-17 2017-06-13 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
CN103336916B (zh) * 2013-07-05 2016-04-06 中国科学院数学与系统科学研究院 A sequencing read mapping method and system
US10902937B2 (en) * 2014-02-12 2021-01-26 International Business Machines Corporation Lossless compression of DNA sequences

Also Published As

Publication number Publication date
KR102733786B1 (ko) 2024-11-26
AU2018221458A1 (en) 2019-10-03
EP3583500A1 (en) 2019-12-25
AU2018221458B2 (en) 2022-12-08
KR20190113971A (ko) 2019-10-08
EP3583500A4 (en) 2020-12-16
CA3052824A1 (en) 2018-08-23
SG11201907418YA (en) 2019-09-27
WO2018152143A1 (en) 2018-08-23
ZA201905921B (en) 2021-05-26
IL268651A (en) 2019-10-31
MX2019009680A (es) 2019-10-09
NZ757185A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
PH12019501879A1 (en) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
EA201991908A1 (ru) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
EP3944195A4 (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
MX2017012060A (es) Motion information derivation for sub-blocks in video coding
MX2024009494A (es) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
MY189223A (en) Apparatus and method for encoding or decoding a multi-channel signal using a broadband alignment parameter and a plurality of narrowband alignment parameters
ZA201906992B (en) A communication method and apparatus
MX2019004125A (es) Efficient data structures for the representation of bioinformatics information
EP3985613A4 (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
EP4325727A3 (en) Data processing method and device
SA519401514B1 (ar) Method and apparatus for the compact representation of bioinformatics data
MX364028B (es) Image processing apparatus and method
MY190014A (en) Data compression
EA201991906A1 (ru) Method and systems for the reconstruction of genomic reference sequences from compressed genomic sequence reads
MX2019004131A (es) Method and apparatus for accessing bioinformatics data structured in access units
SG11201906107QA (en) Data processing method, and terminal device, and network device
EA201991907A1 (ru) Method and systems for the efficient compression of genomic sequence reads
RU2014145618A (ru) Image encoding device, image encoding method, and image encoding program, and image decoding device, image decoding method, and image decoding program
TW201612895A (en) Method and apparatus for coding or decoding subband configuration data for subband groups
AR110436A1 (es) Video encoding method, video decoding method, video encoding device, and video decoding device
AR107411A1 (es) Apparatus and method for encoding or decoding a multi-channel signal using spectral-domain resampling
MY189399A (en) Method and device for encoding video having block size set for each block shape, and method and device for decoding video
TH1901007951A (th) Polar encoding method and apparatus, wireless device, and computer-readable medium
TH1701007730A (th) Apparatus and methods for encoding or decoding a multi-channel audio signal using spectral-domain resampling
PE20191228A1 (es) Method and apparatus for the compact representation of bioinformatics data