WO2025090677A1 - Biological computing methods and systems for analyzing biological units - Google Patents

Biological computing methods and systems for analyzing biological units

Info

Publication number
WO2025090677A1
Authority
WO
WIPO (PCT)
Prior art keywords
cus
sample
biological molecules
sbe
sample objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/052668
Other languages
French (fr)
Other versions
WO2025090677A9 (en)
Inventor
Daniel GEORGIEV
Hynek KASL
Martin CIENCIALA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xeno Cell Innovations SRO
Original Assignee
Xeno Cell Innovations SRO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xeno Cell Innovations SRO
Publication of WO2025090677A1
Publication of WO2025090677A9

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/09: Supervised learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • the present disclosure provides innovative methods for detecting specific characteristics or markers within an input sample.
  • the methods, systems, and kits provided herein involve the use of computing units (CUs) engineered to interact with biological molecules within the sample. These CUs capture and concentrate the target molecules by displaying surface-bound entities (SBEs) upon interaction, and the subsequent measurement of the amount of bound biological molecules enables the precise detection of target-specific output signals.
  • CUs computing units
  • SBEs surface-bound entities
  • a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of computing units (CUs), wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one surface-bound entity (SBE) on its surface only upon interacting with the plurality of sample objects, and wherein the first set of at least one SBE can bind to the plurality of biological molecules; (b) lysing the plurality of sample objects such that the plurality of biological molecules can be released from the plurality of sample objects and bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic of the
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times.
  • the plurality of biological molecules can be assigned to one or more hash groups using the first set of at least one SBE.
  • the first number, the second number, the third number, and the fourth number of interactions can be utilized to evaluate the one or more target specific output signals associated with the plurality of sample objects.
  • Also provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample via multiplexed hashing, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of CUs, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one SBE on its surface, wherein the plurality of CUs can be partitioned, such that a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample
  • At least two of the plurality of sample objects can be bound to each other.
  • the interaction of each CU with the plurality of sample objects can depend on the binding of the at least two of the plurality of sample objects.
  • the plurality of biological molecules bound to the first set of at least one SBE can be washed to remove any unbound biological molecules before step (c).
  • the first CU can display the first set of at least one SBE that is different from what the second CU displays on its surface.
  • at least one of the first number, the second number, the third number, and the fourth number can be one.
  • At least one of the first number, the second number, the third number, and the fourth number can be greater than one. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be the same. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be different. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
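The four interaction counts described above can be pictured as a small table indexed by CU and sample object. The following is a minimal sketch only: the count values and the names `interactions` and `object_signature` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical interaction counts (illustrative values only).
# interactions[cu][obj] = number of times that CU interacted with that object.
interactions = {
    "CU1": {"object1": 3, "object2": 0},  # first number, second number
    "CU2": {"object1": 1, "object2": 2},  # third number, fourth number
}


def object_signature(interactions, obj):
    """Return the per-CU interaction counts recorded for one sample object."""
    return {cu: counts[obj] for cu, counts in interactions.items()}


# The first and third numbers together describe the first sample object;
# the second and fourth numbers describe the second sample object.
sig1 = object_signature(interactions, "object1")
sig2 = object_signature(interactions, "object2")
print(sig1, sig2)
```

Reading the table column by column gives one count vector per sample object, which is the sense in which a pair of numbers can be indicative of a characteristic of that object.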
  • a plurality of molecular tags can be added after lysing in step (b).
  • each CU of the plurality of CUs can further display a second set of at least one SBE.
  • the plurality of molecular tags can associate with the second set of at least one SBE.
  • the second set of at least one SBE can be incapable of binding to the plurality of biological molecules.
  • the plurality of molecular tags that do not associate with the second set of at least one SBE can be removed by washing.
  • a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE.
  • the plurality of molecular tags and the second set of at least one SBE can be associated either before or after each CU interacts with the sample object.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the hash element can include a unique molecular identifier (UMI).
  • the molecular tag can further include a recognition element.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain.
  • the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
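The tag composition described above (a hash element that can include a UMI, plus a priming element that can be a random N-mer) can be sketched as a simple record. This is an illustrative data structure only: the class name `MolecularTag`, the split of the hash element into `hash_barcode` plus `umi`, the default lengths, and the example sequence are all assumptions, and the optional recognition, sequencing, and protein binding elements are omitted.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


def random_nmer(n, rng):
    """Draw a random N-mer of length n from the four DNA bases."""
    return "".join(rng.choice(BASES) for _ in range(n))


@dataclass(frozen=True)
class MolecularTag:
    hash_barcode: str     # hypothetical fixed part of the hash element
    umi: str              # unique molecular identifier carried in the hash element
    priming_element: str  # random N-mer used for priming

    @property
    def hash_element(self):
        # Assumption: the hash element is the barcode followed by the UMI.
        return self.hash_barcode + self.umi


def make_tag(hash_barcode, rng, umi_len=8, nmer_len=6):
    """Build one tag with a fresh random UMI and a fresh random priming N-mer."""
    return MolecularTag(
        hash_barcode=hash_barcode,
        umi=random_nmer(umi_len, rng),
        priming_element=random_nmer(nmer_len, rng),
    )


rng = random.Random(0)          # seeded so the sketch is repeatable
tag = make_tag("ACGTAC", rng)   # "ACGTAC" is a made-up barcode
print(tag.hash_element, tag.priming_element)
```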
  • the plurality of biological molecules can be RNA or protein.
  • the plurality of biological molecules can be RNA.
  • any of the methods provided herein in which the plurality of biological molecules is RNA can further include, after step (b) and before step (c): (1) poly(A) priming of the RNA to the priming element of the molecular tag, and (2) reverse transcribing the RNA.
  • the plurality of biological molecules can be protein.
  • the protein can include a barcode.
  • the protein can be an antibody with the barcode.
  • any of the methods provided herein in which the plurality of biological molecules is protein can further include, after step (b) and before step (c): (1) capturing the protein via the sequencing element or the unique protein binding domain, (2) priming the barcode via the priming element, and (3) extending the priming element with a strand displacing polymerase.
  • any of the methods provided herein in which the plurality of biological molecules is protein can further include, after step (b) and before step (c): (i) capturing the protein via the unique protein binding domain, and (ii) releasing the protein from the molecular tag.
  • a protein-molecular tag complex can be optionally stabilized via crosslinking.
  • the cross-linking can be via an amine-reactive cross-linker.
  • the measuring in step (c) can further include determining an amount of the molecular tag associated with the plurality of biological molecules.
  • the determining can be performed using qPCR, sequencing, gel electrophoresis, isothermal amplification, ELISA, or mass spectrometry.
  • the sequencing can be Next-Generation sequencing or Sanger sequencing.
  • any of the methods provided herein can further comprise (3) computing the amount of the molecular tag such that the plurality of biological molecules in the first sample object can be differentiated from the plurality of biological molecules in the second sample object.
  • the plurality of sample objects in contact with the plurality of CUs in step (a) can be incubated for a sufficient amount of time in an incubator for each CU to display the first set of at least one SBE and the second set of at least one SBE on its surface.
  • the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent.
  • the crowding agent can be a hydrogel.
  • the interaction between the plurality of CUs and the plurality of sample objects can include at least one logical operator module.
  • a logical operator module of the at least one logical operator module can include the sample object and two or more CUs of the plurality of CUs.
  • the at least one logical operator module can generate one or more output signals.
  • the at least one logical operator module can include a YES gate, an AND gate, a NAND gate, an OR gate, a NOR gate, a XOR gate, a XNOR gate, a NOT gate, or any combination thereof, wherein the two or more CUs comprise a first CU and a second CU.
  • the YES gate can include generating the one or more output signals only when both the first CU and the second CU can be bound to the sample object.
  • the AND gate can include generating the one or more output signals only when both the first CU and the second CU can be bound to the sample object.
  • the NAND gate can include suppressing or diminishing the one or more output signals when both the first CU and the second CU can be bound to the sample object.
  • the OR gate can include generating the one or more output signals when either the first CU or the second CU or when both the first CU and the second CU can be bound to the sample object.
  • the NOR gate can include generating the one or more output signals when neither the first CU nor the second CU is bound to the sample object.
  • the XOR gate can include generating the one or more output signals when either the first CU or the second CU but not both CUs can be bound to the sample object.
  • the XNOR gate can include generating the one or more output signals when either both the first CU and the second CU are bound to the sample object or neither the first CU nor the second CU is bound to the sample object.
  • the NOT gate can include suppressing or diminishing the one or more output signals when the first CU can be bound to the sample object.
  • the one or more output signals can be the display of the first set of at least one SBE.
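The gate behaviours listed above can be summarized as truth tables over two binding states (first CU bound, second CU bound). The sketch below uses the standard Boolean definitions of each gate, which is an assumption about the intended behaviour; the names `GATES` and `truth_table` are illustrative, the NOT gate is modelled on the first CU only, and the YES gate is left out.

```python
from itertools import product

# Each gate maps (first_cu_bound, second_cu_bound) -> output signal generated?
# Standard Boolean definitions are assumed here.
GATES = {
    "AND": lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
    "OR": lambda a, b: a or b,
    "NOR": lambda a, b: not (a or b),
    "XOR": lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
    "NOT": lambda a, b: not a,  # depends on the first CU only
}


def truth_table(gate):
    """Return {(first_bound, second_bound): output} for the named gate."""
    fn = GATES[gate]
    return {
        (a, b): bool(fn(a, b))
        for a, b in product((False, True), repeat=2)
    }


for name in GATES:
    print(name, truth_table(name))
```

In this picture a True output corresponds to the output signal being generated, for example by display of the first set of at least one SBE.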
  • a system for biological computing including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object.
  • a system for biological computing including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE.
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times.
  • the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
  • the CU of the plurality of CUs can further display a second set of at least one SBE.
  • any of the systems provided herein can further include (c) a plurality of molecular tags.
  • the plurality of molecular tags can associate with the second set of at least one SBE.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the molecular tag can further include a recognition element.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
  • any of the systems provided herein can further include a strand displacing polymerase.
  • the CU of the plurality of CUs can be engineered to display the first set of at least one SBE and the second set of at least one SBE.
  • a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE.
  • a kit for biological computing, the kit including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; and (b) an instruction for use of the kit.
  • a kit for biological computing, the kit including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE; and (b) an instruction for use of the kit.
  • any of the kits provided herein can further include (c) a plurality of molecular tags.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the molecular tag can further include a recognition element.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain.
  • the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
  • any of the kits provided herein can further include a strand displacing polymerase.
  • a computer-implemented method including: (a) compiling a measurement matrix Ml from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix Ml can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix Ml represents measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix Ml.
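The row and column layout of the measurement matrix can be illustrated with a short sketch that assembles the matrix from individual measurements. The record format `(biological_molecule, molecular_tag, count)`, the function name `build_measurement_matrix`, and the example values are assumptions made for illustration.

```python
def build_measurement_matrix(records):
    """Assemble a measurement matrix from a list of measurement records.

    records: list of (biological_molecule, molecular_tag, count) tuples.
    Rows correspond to biological molecules, columns to molecular tags.
    """
    molecules = sorted({m for m, _, _ in records})
    tags = sorted({t for _, t, _ in records})
    row = {m: i for i, m in enumerate(molecules)}
    col = {t: j for j, t in enumerate(tags)}
    matrix = [[0] * len(tags) for _ in molecules]
    for m, t, count in records:
        matrix[row[m]][col[t]] += count  # repeated records are summed
    return molecules, tags, matrix


# Made-up example records: two molecules measured under two tags.
records = [
    ("GeneA", "tag1", 5),
    ("GeneA", "tag2", 1),
    ("GeneB", "tag2", 7),
    ("GeneA", "tag1", 2),
]
molecules, tags, matrix = build_measurement_matrix(records)
print(molecules, tags, matrix)
```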
  • the set of sample objects comprises a plurality of classes of sample objects, wherein the plurality of classes is denoted by Bi ∈ {B1, B2, ...
  • the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm.
  • the machine learning algorithm comprises a neural network algorithm.
  • computing the profile of a subset of sample objects further comprises: (a) computing proportions in which each molecular tag of the molecular tags can be assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by an optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix Ml and the optimal transformation matrix A.
  • computing the optimal transformation matrix A comprises an operation that utilizes an optimization algorithm to minimize the absolute difference between the matrix B transformed by the transformation matrix A and a truncated measurement matrix M.
  • computing the matrix A comprises computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that is defined as: minimize $\lVert M - BA \rVert$ subject to,
  • matrix A can represent the optimization variable denoting the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi, and (d) optimization constraints can be indicative of physical limitations, with the constrain
  • x c being optional and can correspond to a case where a number of measured sample objects can be known.
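The objective described above, the absolute difference between the matrix B transformed by A and the measurement matrix M, can be evaluated directly for a candidate A. The sketch below only computes that objective; it does not solve the linear program, and the constraints are not reproduced because they are not fully stated here. The shapes (B as molecules by classes, A as classes by tags), the use of an entrywise sum of absolute differences, and all numeric values are assumptions.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
        for row in X
    ]


def l1_residual(M, B, A):
    """Sum of absolute entrywise differences between M and the product BA."""
    BA = matmul(B, A)
    return sum(
        abs(m - p)
        for m_row, p_row in zip(M, BA)
        for m, p in zip(m_row, p_row)
    )


# Hypothetical class profiles: 3 biological molecules x 2 sample object classes.
B = [[10, 0], [0, 8], [4, 4]]
# Hypothetical tag proportions: 2 classes x 2 molecular tags (rows sum to 1).
A_true = [[0.75, 0.25], [0.5, 0.5]]
# Noise-free measurements generated from the assumed model M = BA.
M = matmul(B, A_true)

A_guess = [[0.5, 0.5], [0.5, 0.5]]
print(l1_residual(M, B, A_true), l1_residual(M, B, A_guess))
```

An optimizer would search over A for the smallest residual subject to the physical constraints; in this toy case the generating matrix gives a residual of zero and the uniform guess does not.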
  • the profile can be transcriptomic profile, proteomic profile, or multiomic profile.
  • the profile can include probabilities that specific sample objects are bound to each other.
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data comprises multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements derive in part from a sample object described by a set of vectors {b1, ..., bm}; and (c) training the numerical classification engine using supervised learning to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or synthesis from single cell molecular data.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors {b1, ..., bm} that in part generated the accessed measurements; and (c) training the numerical regression engine using supervised learning to perform a regression on the multiplexed hashed measurements; wherein the training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or synthesis from single cell molecular data.
  • a method of spatially profiling an input sample including a plurality of sample objects including a plurality of biological molecules without establishing a priori spatial relationship between a plurality of molecular tags and the plurality of sample objects including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial
  • a method of single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects including a plurality of biological molecules including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of molecular tags and
  • the plurality of biological molecules can be RNA.
  • the sequencing can be Next-Generation sequencing or Sanger sequencing.
  • each CU can be associated with the plurality of molecular tags.
  • each CU can be associated with the plurality of molecular tags prior to the contacting the plurality of sample objects with the plurality of CUs.
  • each CU can be associated with the plurality of molecular tags following the contacting the plurality of sample objects with the plurality of CUs.
  • each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and can be associated with the second set of at least one SBE.
  • each molecular tag can further include a sequencing element, a release element, and/or a linker.
  • the release element can release each molecular tag from each CU.
  • the linker can prevent extension.
  • the barcode can be unique to each CU.
  • each molecular tag can be single-stranded.
  • each molecular tag can include a hairpin structure.
  • each molecular tag can be double-stranded.
  • the second set of at least one SBE can be poly(dT).
  • the barcode can be uniquely assigned to each CU of the plurality of CUs.
  • the UMI can be uniquely assigned to each molecular tag.
  • each molecular tag can be associated with at least two CUs of the plurality of CUs.
  • each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects.
  • the second set of at least one SBE can further include a blocking element. In some embodiments, the blocking element can prevent reverse transcription.
  • the blocking element can be removed when the at least one sample object interacts with each CU.
  • the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE.
  • the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE.
  • the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects {a1} number of times, the first CU can interact with a second sample object of the plurality of sample objects {a2} number of times, a second CU of the plurality of CUs can interact with the first sample object {a3} number of times, and the second CU can interact with the second sample object {a4} number of times.
  • the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface.
  • At least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be zero. In some embodiments, at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be one. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be the same. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be different.
  • the evaluating proximity of each CU of the plurality of CUs can include evaluating proximity between the first CU and the second CU. In some embodiments, the evaluating proximity between the first CU and the second CU can include identifying the plurality of molecular tags associated with the first CU and the second CU. In some embodiments, the evaluating proximity between the first CU and the second CU can include identifying numbers of UMIs linked to the barcode and the plurality of biological molecules.
  • {a1'} number of UMIs can be linked to a first barcode and a first biological molecule of the plurality of the biological molecules
  • {a2'} number of UMIs can be linked to the first barcode and a second biological molecule of the plurality of the biological molecules
  • {a3'} number of UMIs can be linked to a second barcode and a third biological molecule of the plurality of the biological molecules
  • {a4'} number of UMIs can be linked to the second barcode and a fourth biological molecule of the plurality of the biological molecules.
  • the first barcode can be unique to the first CU
  • the second barcode can be unique to the second CU.
  • the a1' number, the a2' number, the a3' number, and the a4' number can be input into the machine learning algorithm for the proximity analysis.
  • the machine learning algorithm can output an adjacency matrix.
  • at least two of the a1' number, the a2' number, the a3' number, and the a4' number can give an output value of 1 in the adjacency matrix.
  • the output value of 1 can indicate the first CU and the second CU are in spatial proximity.
  • At least two of the a1' number, the a2' number, the a3' number, and the a4' number can give an output value of 0 in the adjacency matrix.
  • the output value of 0 can indicate the first CU and the second CU are not in spatial proximity.
  • the adjacency matrix can be used to establish a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects.
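  • As a non-limiting illustration of the UMI tallying described above, the following minimal Python sketch counts the distinct UMIs linked to each pair of barcode and biological molecule. The record layout (barcode, molecule, UMI) and the function name are assumptions made for illustration only and are not taken from the present disclosure.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (barcode, molecule) pair.

    `reads` is an iterable of (barcode, molecule, umi) tuples. This layout is
    a hypothetical simplification of decoded sequencing reads.
    """
    seen = defaultdict(set)
    for barcode, molecule, umi in reads:
        # Repeated reads carrying the same UMI collapse into one count.
        seen[(barcode, molecule)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("BC1", "geneA", "AAT"),
    ("BC1", "geneA", "AAT"),  # duplicate read of the same UMI
    ("BC1", "geneA", "CGT"),
    ("BC1", "geneB", "TTG"),
    ("BC2", "geneC", "GGA"),
]
counts = count_umis(reads)
```

  • Counts of this kind, keyed by a CU-specific barcode, correspond to the numbers that can be input into the machine learning algorithm for the proximity analysis.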
  • a machine learning algorithm that can be trained by a computer implemented method for training a model, wherein the computer implemented method can include: (a) maintaining a dataset including Unique Molecular Identifier (UMI) vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples; (b) generating a plurality of training inputs from the dataset, wherein each training input can include a UMI matrix representing UMI counts for number of interactions between a predetermined number of computing units (CUs) and a plurality of biological molecules, and each training input can be associated with an adjacency matrix representing known proximal relationships among CUs; and (c) training the model by adjusting model parameters to match model outputs to the adjacency matrix associated with the plurality of training inputs, wherein the trained model can generate a model output representing spatial relationships among CUs based on input where a priori spatial relationships between the plurality of biological molecules and the sample objects are not present.
  • generating the plurality of training inputs can include: (a) randomly assigning the predetermined number of CUs to the plurality of sample objects to generate an assignment profile, wherein each of the predetermined number of CUs can be assigned to one or more sample objects, and each of the plurality of the sample objects can be assigned with one or more CUs; and (b) sampling the predetermined number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate training inputs.
  • the assignment profile can correspond with the adjacency matrix associated with the training inputs.
  • each row of the UMI matrix can represent a biological molecule of the plurality of biological molecules
  • each column of the UMI matrix can represent a CU of the predetermined number of CUs.
  • the known proximal relationships among CUs can include known proximal relationships between each pair of the CUs.
  • each training input can be generated from one scRNA-seq sample.
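  • As a non-limiting illustration of generating a training input, the following minimal Python sketch randomly assigns CUs to sample objects, samples a UMI matrix from the UMI vectors of those sample objects, and derives the associated adjacency matrix from the assignment profile. Several simplifications are assumptions of this sketch: each CU is assigned to exactly one sample object, a sample object may receive no CU, and UMIs are sampled by keeping each one with a fixed probability (`capture_fraction`). All names are hypothetical.

```python
import random


def make_training_input(cell_umi_vectors, n_cus, capture_fraction=0.1, seed=0):
    """Return (X, Y, assignment) for one simulated training input."""
    rng = random.Random(seed)
    n_cells = len(cell_umi_vectors)
    n_genes = len(cell_umi_vectors[0])

    # Assignment profile: each CU is linked to one randomly chosen sample object.
    assignment = [rng.randrange(n_cells) for _ in range(n_cus)]

    # UMI matrix X: rows are biological molecules, columns are CUs.
    X = [[0] * n_cus for _ in range(n_genes)]
    for cu, cell in enumerate(assignment):
        for gene, count in enumerate(cell_umi_vectors[cell]):
            # Keep each UMI of the sample object with probability capture_fraction.
            X[gene][cu] = sum(rng.random() < capture_fraction for _ in range(count))

    # Adjacency matrix Y: 1 when two CUs share a sample object, else 0.
    Y = [[int(assignment[i] == assignment[j]) for j in range(n_cus)]
         for i in range(n_cus)]
    return X, Y, assignment


cell_umi_vectors = [[10, 0, 5], [0, 20, 1]]  # two sample objects, three molecules
X, Y, assignment = make_training_input(cell_umi_vectors, n_cus=6)
```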
  • the model does not include positional embeddings to allow the trained model to handle input data with varying lengths and structures.
  • the model can include a plurality of transformer blocks, and wherein a last transformer block of the plurality of transformer blocks can feature a single attention head.
  • the model can include a sigmoid activation function to allow the trained model to treat each proximal relationship between each pair of the CUs as an independent probability.
  • training the model can further include using a binary cross entropy as a loss function.
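  • As a non-limiting illustration of the sigmoid activation and binary cross entropy loss described above, the following minimal Python sketch scores each pair of CUs independently and compares the resulting probabilities with a target adjacency matrix. Deriving the pairwise score from a dot product of per-CU embeddings is an assumption of this sketch. The present disclosure does not specify how the scores are formed, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_probabilities(embeddings):
    """Map per-CU embeddings to independent pairwise proximity probabilities."""
    n = len(embeddings)
    return [
        [sigmoid(sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
         for j in range(n)]
        for i in range(n)
    ]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross entropy over all entries of the adjacency matrix."""
    total, n = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            n += 1
    return total / n


embeddings = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]  # CUs 0 and 1 are alike
probs = pairwise_probabilities(embeddings)
loss = binary_cross_entropy(probs, [[1, 1, 0], [1, 1, 0], [0, 0, 1]])
```

  • Because each entry passes through its own sigmoid, the probability for one pair of CUs does not constrain the probability for any other pair.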
  • the computer-implemented method of the present disclosure can further include (a) partitioning the predetermined number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of CUs comprises CUs that are proximal to one another; and (b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate resultant UMI vector for each sample object.
  • the model output representing spatial relationships among CUs can be indicative of proximal relationship among sample objects of the plurality of sample objects.
  • the proximal relationship among sample objects can be derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs can be derived from the model output representing spatial relationships among CUs.
  • the resultant UMI vector can represent a distribution of the plurality of biological molecules within each sample object.
  • the partitioning the predetermined number of CUs into a plurality of clusters can include agglomerative clustering with full linkage.
  • the number of clusters can be known a priori.
  • the number of clusters can be determined by optimizing a secondary criterion.
  • a relative reconstruction error can be calculated to measure reconstruction quality of the model.
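  • As a non-limiting illustration of the partitioning, aggregation, and error calculation described above, the following minimal Python sketch clusters CUs from pairwise proximity probabilities, sums the UMI vectors within each cluster, and computes a relative error against a reference vector. The sketch assumes that "full linkage" corresponds to complete linkage, that the distance between two CUs is one minus their predicted probability, that the number of clusters is known a priori, and that the relative reconstruction error is an L1 error normalized by the reference total. None of these choices is stated in the present disclosure, and all names are hypothetical.

```python
def complete_linkage_clusters(prob, n_clusters):
    """Merge CUs until n_clusters remain, using distance = 1 - probability."""
    clusters = [[i] for i in range(len(prob))]

    def dist(a, b):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(1.0 - prob[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)  # closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def aggregate(X, clusters):
    """Sum the UMI columns of the CUs in each cluster (rows of X are molecules)."""
    return [[sum(row[cu] for cu in cluster) for row in X] for cluster in clusters]


def relative_reconstruction_error(estimate, reference):
    """Assumed definition: L1 error divided by the reference total (nonzero)."""
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / sum(reference)


prob = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
]
X = [[1, 2, 0, 0], [0, 0, 3, 4]]  # two molecules, four CUs
clusters = complete_linkage_clusters(prob, n_clusters=2)
resultant = aggregate(X, clusters)
```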
  • the model can include a predetermined number of stacked transformer blocks, wherein the predetermined number can be configurable.
  • each block of a subset of the predetermined number of stacked transformer blocks can have a predefined number of attention heads, wherein the predefined number of attention heads can be configurable.
  • an Adam optimizer with a predetermined learning rate can be used to train the model.
  • the model can be trained for a predetermined number of steps with a predefined batch size.
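  • As a non-limiting illustration of the configurable quantities described above, the following minimal Python sketch collects them in one configuration object. Every default value shown is a placeholder chosen for illustration and is not a value from the present disclosure. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    num_blocks: int = 4            # stacked transformer blocks (placeholder)
    heads_per_block: int = 8       # attention heads in all but the last block
    last_block_heads: int = 1      # the last block features a single head
    learning_rate: float = 1e-4    # Adam learning rate (placeholder)
    train_steps: int = 10_000      # number of training steps (placeholder)
    batch_size: int = 32           # batch size (placeholder)
    use_positional_embeddings: bool = False

    def heads(self):
        """Attention heads per block, from first block to last."""
        return [self.heads_per_block] * (self.num_blocks - 1) + [self.last_block_heads]


config = TrainingConfig()
```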
  • the machine learning algorithm can be a computer implemented method for training a model, including: (a) maintaining a dataset including Unique Molecular Identifier (UMI) vectors corresponding to each sample object of K number of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples, wherein for sample object k, G number of biological molecules can be comprised in the UMI vector; (b) generating a plurality of training inputs from the dataset, wherein each training input comprises a UMI matrix X, wherein each row of the UMI matrix X can represent the G number of biological molecules, and each column of the UMI matrix X can represent N number of computing units (CUs), wherein values in the UMI matrix X indicate the UMI counts that can represent a number of interactions between the ith CU and the jth biological molecules, and each training input can be associated with an adjacency matrix Y representing known proximal relationships among CUs, wherein rows and columns of the adjacency matrix Y can represent the N number of CUs, and wherein values in the adjacency matrix Y can represent a pairwise proximal relationship between the ith CU and the jth CU; and (c) training the model by adjusting model parameters to match model outputs to the adjacency matrix Y associated with the plurality of training inputs, wherein
  • generating the plurality of training inputs can include: (a) randomly assigning the N number of CUs to the K number of the sample objects to generate an assignment profile, wherein each of the N number of CUs can be assigned to one or more sample objects, and each of the K number of the sample objects can be assigned with one or more CUs; and (b) sampling the N number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate the training inputs.
  • the assignment profile can correspond with the adjacency matrix Y associated with the training inputs.
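  • As a non-limiting illustration of how an assignment profile can correspond with the adjacency matrix Y, one possible formalization is given below. The symbol A for the assignment profile and the rule that two CUs are proximal when they share at least one sample object are assumptions of this sketch and are not stated in the present disclosure.

```latex
% A is the assumed assignment profile: A_{ik} = 1 when the i-th CU is
% assigned to the k-th sample object, and 0 otherwise.
A \in \{0,1\}^{N \times K},
\qquad
Y_{ij} = \mathbb{1}\!\left[\, \sum_{k=1}^{K} A_{ik} A_{jk} > 0 \,\right],
\qquad i, j = 1, \dots, N .
```

  • Under this assumed rule, Y is symmetric and has ones on its diagonal whenever every CU is assigned to at least one sample object.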
  • the number G can be a constant.
  • the number N can be an arbitrary integer.
  • each training input can be generated from one scRNA-seq sample.
  • the model does not include positional embeddings to allow the trained model to handle input data with varying lengths and structures.
  • the model can include a plurality of transformer blocks, and wherein a last transformer block of the plurality of transformer blocks can feature a single attention head.
  • the model can include a sigmoid activation function to allow the trained model to treat the pairwise proximal relationship between the ith CU and the jth CU as an independent probability.
  • training the model can further include using a binary cross entropy as a loss function.
  • the computer-implemented method can further include: (a) partitioning the N number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of the CUs can include CUs that are proximal to one another; and (b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate resultant UMI vector for each sample object.
  • the model output representing spatial relationships among CUs can be indicative of proximal relationship among sample objects of the K number of sample objects.
  • the proximal relationship among sample objects can be derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs can be derived from the model output representing spatial relationships among CUs.
  • the resultant UMI vector can represent a distribution of the G number of biological molecules within each sample object.
  • the partitioning the N number of CUs into a plurality of clusters can include agglomerative clustering with full linkage.
  • the number of clusters can be known a priori.
  • the number of clusters can be determined by optimizing a secondary criterion.
  • a relative reconstruction error can be calculated to measure reconstruction quality of the model.
  • the model can include a predetermined number of stacked transformer blocks, wherein the predetermined number can be configurable.
  • each block of a subset of the predetermined number of stacked transformer blocks can have a predefined number of attention heads.
  • an Adam optimizer with a predetermined learning rate can be used to train the model.
  • the model can be trained for a predetermined number of steps with a predefined batch size.
  • a computer implemented method for spatially profiling an input sample including a plurality of sample objects including a plurality of biological molecules including: (a) generating a measurement matrix for experimental measurement data of the input sample by aid of computing units (CUs), wherein each row of the measurement matrix represents G number of biological molecules, and each column of the measurement matrix can represent N number of CUs, wherein values in the measurement matrix indicate Unique Molecular Identifier (UMI) counts that can represent a number of interactions between the ith CU and the jth biological molecules; (b) feeding the measurement matrix into a trained AI model, wherein the AI model can be trained by: (1) obtaining a dataset comprising UMI vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples; (2) generating a plurality of training inputs from the dataset, wherein each training input can include a UMI matrix representing UMI counts for number of interactions between a predetermined number of CUs and a
  • the computer program product can include a non-transient machine-readable medium storing instructions that, when executed by at least one programmable processor, can cause the at least one programmable processor to perform any of the computer-implemented methods disclosed herein.
  • a computer-implemented system for training a model including: (a) at least one programmable processor; and (b) a machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one programmable processor to perform any of the computer-implemented methods disclosed herein.
  • a system for spatial profiling an input sample including: (a) the input sample including a plurality of sample objects including a plurality of biological molecules; (b) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (c) a plurality of molecular tags; and (d) any of the presently described computer implemented methods.
  • the system can further include a sequencing method.
  • kits for spatial profiling an input sample including a plurality of sample objects including a plurality of biological molecules including: (a) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (b) a plurality of molecular tags; and (c) an instruction for use of the kit.
  • the kit can further include an instruction for analysis using any of the presently described computer implemented methods.
  • FIG. 1 is a non-limiting schematic illustration of targeted sequestering of nucleic acids for multiomic data collection using the presently disclosed systems, kits, and methods.
  • FIG. 2 is a non-limiting schematic illustration of inducible targeted sequestering of RNA on the presently disclosed computing units (CUs).
  • FIG. 3A is a schematic illustration of non-limiting object types of FIG. 3E workflow.
  • FIG. 3B is a schematic illustration of non-limiting CU types of FIG. 3E workflow.
  • FIG. 3C is a schematic illustration of non-limiting SBE types of FIG. 3E workflow.
  • FIG. 3D is a schematic illustration of non-limiting other components of FIG. 3E workflow.
  • FIG. 3E is a non-limiting schematic illustration of hashed transcriptomic readout of target objects.
  • the steps of this schematic can be performed as a continuation of the lysis of target cell & incubation step of FIG. 1 with an addition of molecular tags.
  • FIG. 4A is a non-limiting schematic illustration of object types and varying elements.
  • FIG. 4B is a non-limiting schematic illustration of multiplex hashing on mRNA biological molecules. Each sample object interacting with CUs is interacting with a unique combination of CUs.
  • FIG. 4C is a non-limiting schematic illustration of spatial profiling with RNA.
  • FIG. 4D is a non-limiting schematic illustration of object types of FIG. 4E workflow.
  • FIG. 4E is a non-limiting schematic illustration of multiplex hashing on protein biological molecules. Each sample object interacting with CUs is interacting with a unique combination of CUs.
  • FIG. 4F is a non-limiting schematic illustration of spatial profiling with protein.
  • FIG. 5 is a non-limiting schematic illustration of computational analysis on multiplex hashing of the presently disclosed systems, kits, and methods.
  • FIG. 6 is a model architecture schematic for the design and training of transformer models towards the analysis of single cell transcriptomes.
  • the significantly lower number of UMIs per computing unit (95% of computing units have less than 240 UMIs) in comparison to the number of UMIs per sample object is indicative of the low informational content and sparsity of the computing unit UMI vectors.
  • Sample objects exhibiting RRE greater than 50% can, for practical purposes, be considered as failed reconstructions. In this case, the proportion of failed reconstructions was approximately 3.5%.
  • FIG. 8 is a visualization of model variables.
  • A1-3: Example model input X projected by tSNE, with cell types listed in the inset. The plot shows 150 points corresponding to 150 computing units assigned to 10 sample objects as indicated by the color coding. No separated clusters are visible. Some colocalization of computing units from the same sample objects is present, with most sample objects spanning the full support.
  • B1-3: Example embeddings of the last transformer layer projected by tSNE. The embeddings exhibit strong spatial separation that closely mirrors the sample objects.
  • C1-3: Visualization of the estimated adjacency matrix Y" following agglomerative clustering. Force layout is used to indicate relationships between computing units. Computing units appearing close together are predicted to be assigned to the same sample object.
  • FIG. 9 is a cell type confusion matrix.
  • the dataset GSE158055 further comprises a broad set of cell annotations.
  • Confusion matrix indicates the probability a computing unit from a sample object on the y-axis is estimated to be from a sample object on the x-axis.
  • Diagonal components thereby list probabilities that the origin of computing units is estimated correctly within the given cell type.
  • Non-zero, off-diagonal components illustrate mismatches between similar cell types like NK and CD8 T cells.
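  • As a non-limiting illustration of how such a confusion matrix can be tabulated, the following minimal Python sketch normalizes each row so that an entry gives the probability that a computing unit from the cell type on the y-axis is estimated to originate from the cell type on the x-axis. The function name and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def confusion_probabilities(true_types, estimated_types, labels):
    """Row-normalized confusion matrix: rows are true types, columns estimated."""
    counts = Counter(zip(true_types, estimated_types))
    matrix = []
    for t in labels:
        row_total = sum(counts[(t, e)] for e in labels)
        # An empty row (no CU of that true type) is reported as all zeros.
        matrix.append(
            [counts[(t, e)] / row_total if row_total else 0.0 for e in labels]
        )
    return matrix


labels = ["NK", "CD8 T"]
true_types = ["NK", "NK", "NK", "CD8 T"]
estimated_types = ["NK", "NK", "CD8 T", "CD8 T"]
matrix = confusion_probabilities(true_types, estimated_types, labels)
```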
  • FIG. 10 is the ROC curve of the prediction of edges between Sample Objects. Mark shows the optimal value of P maximizing the accuracy of predicting whether a given pair of sample objects are connected or not.
  • FIG. 11 is the ROC curve of the prediction of edges between Sample Objects. Mark shows the optimal value of P maximizing the accuracy of predicting whether a given pair of sample objects are connected or not. Examples of estimated Sample Object connections comprising various Sample Object Adjacency matrix variants. Left column includes a graph representation of the model output Y. Center and right columns include graph representations of the matrices W and W, respectively.
  • FIG. 12 is a graph illustrating the results of the association described in Example 7.
  • Nodes indicate computing units
  • edges indicate computed associations between computing units
  • the colors correspond to original associations with cells.
  • FIG. 13 is a confusion matrix.
  • FIG. 14 is a diagram of associations, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells.
  • the bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
  • FIG. 15 is a diagram of associations, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells.
  • the bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
  • FIG. 16A is schematics illustrating non-limiting exemplary structures of the RNA molecular tags on CUs.
  • NGS denotes sequencing handles;
  • VS denotes variable regions containing CU specific barcode;
  • UMI denotes a randomized sequence;
  • dT_N denotes a poly(dT) capture sequence;
  • TSO denotes a template switching oligo sequence.
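  • As a non-limiting illustration of how the elements of such a molecular tag can be read out, the following minimal Python sketch splits a read into a CU-specific barcode and a UMI. The element order (sequencing handle, barcode, UMI, poly(dT)), the handle sequence, and all lengths are assumptions of this sketch and are not values from the present disclosure. The template switching oligo is not modeled.

```python
def parse_tag(read, handle="ACGTAC", barcode_len=8, umi_len=10, min_poly_t=12):
    """Split a read laid out as handle + barcode (VS) + UMI + poly(dT).

    Returns (barcode, umi), or None when the read does not match the layout.
    """
    if not read.startswith(handle):
        return None
    start = len(handle)
    barcode = read[start:start + barcode_len]
    umi = read[start + barcode_len:start + barcode_len + umi_len]
    tail = read[start + barcode_len + umi_len:]
    # Require a full-length UMI followed by the poly(dT) capture sequence.
    if len(umi) < umi_len or not tail.startswith("T" * min_poly_t):
        return None
    return barcode, umi


read = "ACGTAC" + "AAAACCCC" + "GGGGTTTTAA" + "T" * 15
parsed = parse_tag(read)
```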
  • FIG. 16B is schematics illustrating non-limiting exemplary techniques of capturing mRNAs and synthesizing cDNA using the RNA molecular tags on CUs.
  • FIG. 17A is a graph illustrating the number of genes expressed above threshold.
  • HPA denotes the reference data set obtained from the Human Protein Atlas.
  • Test Data denotes the data set obtained from the presently disclosed system.
  • FIG. 17B is pie charts illustrating detected or not detected genes of varying degrees of abundance.
  • FIG. 17C is a graph illustrating the number of genes detected per read pairs per cell.
  • the presently described systems, kits, and methods contain, inter alia, engineered or synthetic cells that can perform tasks normally executed by complex instruments.
  • the engineered or synthetic cells of the present disclosure can bind to cells in an input sample and evaluate their profiles, allowing analysis of as many cells in a day as currently available technologies do in a year, with unprecedented levels of sensitivity and specificity.
  • Also provided herein are systems, kits, and methods for randomized multiplex hashing of discrete biological units, which allow targeted analysis of gene expression, genotype, haplotype, epigenome, and/or proteome in discrete biological units.
  • the present disclosure can reduce resources required to analyze discrete biological units by targeting capture units to biological units that exhibit specific features, increasing specificity of analyses relative to bulk analysis.
  • the presently described systems, kits, and methods are highly modular. By mixing and matching the types of engineered or synthetic cells, the target profile can be modified or customized as desired by users by using logic gates, details of which are described further below.
  • the adaptability of the present disclosure allows for analysis of anything from, e.g., rare epithelial cells in samples of lysed blood to, e.g., apoptotic T-cells in primary cell cultures. Various aspects and embodiments of the present disclosure are described in greater detail below.
  • the present disclosure leverages natural variations in biological molecule patterns by capturing these biological molecules on computing units (CUs) and using machine learning algorithms to establish spatial relationships between CUs.
  • barcodes can be efficiently partitioned to CUs, which can interact freely with single cells without requiring a priori barcode-to-cell assignments.
  • This method achieves near 100% cell capture due to the absence of stoichiometric constraints, unlike traditional methods that avoid overloading to prevent artifacts.
  • This is also applicable to multicellular clusters, where CUs can interact freely with suspended clusters, and relationships can be established a posteriori.
  • This approach can provide 3D spatial information about biological molecules and corresponding sample objects (e.g., cells) without needing organized barcode sequences.
  • Technical advantages include the ability to analyze partially dissociated tissues, measure 3D spatial organization, and assess intercellular relationships in a massively parallel manner, surpassing the capabilities of current methods.
  • the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. It is contemplated that any embodiment discussed in this specification can be implemented with respect to any method or composition of the disclosure, and vice versa. Furthermore, compositions of the present disclosure can be used to achieve methods of the present disclosure.
  • fragment refers to any functional fragment, variant, derivative or analog of a polynucleotide, polypeptide or biomolecule that possesses an in vivo or in vitro activity that is characteristic of the polynucleotide, polypeptide or biomolecule.
  • the fragment, variant or analog has a length equal to about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80% or about 90% or greater of the length of the polynucleotide, polypeptide, or biomolecule.
  • Functional expression of the fragment or variant can be easily assayed by the person of ordinary skill in the art by testing activity and the ability to manufacture products as described herein.
  • the present disclosure relates to, inter alia, methods for detecting target-specific output signals within an input sample, which enhance accuracy and efficiency in single-cell research.
  • the methods provided herein can consist of contacting sample objects with computing units (CUs) engineered to display surface-bound entities (SBEs) capable of binding biological molecules.
  • the methods provided herein can utilize multiplexed hashing, which involves partitioning CUs to interact with specific sample objects multiple times. The SBEs on CUs can bind biological molecules within the sample objects. The presence of target-specific output signals can then be determined by assigning biological molecules to hash groups based on interactions and using interaction frequencies to evaluate signals associated with sample objects.
  • the present disclosure further relates to methods for targeted analysis of gene expression, genotype, haplotype, epigenome, and/or proteome in discrete biological units.
  • the present disclosure also relates to systems and kits for implementing the methods of the present disclosure.
  • the methods of the present disclosure can reduce resources required to analyze discrete biological units by targeting capture units to biological units that exhibit specific features.
  • the methods of the present disclosure can increase specificity of analyses relative to bulk analysis.
  • the present disclosure can enable analysis of individual biological units by multiplexed hashing.
  • the target profile can be modified and customized in any way the user desires.
  • Modular logical operator modules can also be customized based on the target profile.
  • the logical operator modules can receive one or more input signals (e.g., sample objects) and can generate one or more output signals (e.g., output signal objects (SOs)) based on the specified modules.
  • the sample objects (derived from the input sample) can be mixed with the Reaction Reagent and Reaction Medium to allow the sample objects to interact with the CUs of the present disclosure, thereby forming one or more computational clusters and generating output signals.
  • the engineered cells exit their dormant state and enter an activated state, wherein they can produce easily measurable output signals (e.g., reporter signals).
  • the presence or absence of output signals can be read and/or quantified by use of the readout reagent.
  • the target profile can be modified in any way the user desires by using the logical operator module of the present disclosure.
  • a two-input “AND” gate requires that two specific features (e.g., antigens) both be present on the bound cell to trigger a positive readout.
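  • As a non-limiting illustration of the logic of such a gate, the following minimal Python sketch returns a positive readout only when every required antigen is present on the bound cell. The antigen names and the function name are placeholders chosen for illustration and do not describe the biological implementation of the gate.

```python
def and_gate(bound_cell_antigens, required=("antigenA", "antigenB")):
    """Positive readout only when all required antigens are on the bound cell."""
    return all(antigen in bound_cell_antigens for antigen in required)


readout_both = and_gate({"antigenA", "antigenB", "antigenC"})
readout_one = and_gate({"antigenA"})
```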
  • a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of computing units (CUs), wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and can display a first set of at least one surface-bound entity (SBE) on its surface only upon interacting with the plurality of sample objects, and wherein the first set of at least one SBE is capable of binding to the plurality of biological molecules; (b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules can be released from the plurality of sample objects and can bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times.
  • the plurality of biological molecules can be assigned to one or more hash groups using the first set of at least one SBE.
  • the first number, the second number, the third number, and the fourth number of interactions can be utilized to evaluate the one or more target specific output signals associated with the plurality of sample objects.
  • Also provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample via multiplexed hashing, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of CUs, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one SBE on its surface, wherein the plurality of CUs can be partitioned, such that a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample
  • At least two of the plurality of sample objects can be bound to each other.
  • the interaction of each CU with the plurality of sample objects can depend on the binding of the at least two of the plurality of sample objects.
  • the plurality of biological molecules bound to the first set of at least one SBE can be washed to remove any unbound biological molecules before step (c).
  • the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface.
  • at least one of the first number, the second number, the third number, and the fourth number can be one.
  • At least one of the first number, the second number, the third number, and the fourth number can be greater than one. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be the same. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be different. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
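  • As a non-limiting illustration of how interaction counts can characterize sample objects, the following minimal Python sketch derives, for each sample object, the combination of CUs that interacted with it at least once, and checks whether each combination is unique. The nested dictionary layout and all names are assumptions made for illustration only.

```python
def hash_signatures(interactions):
    """Map each sample object to the set of CUs that interacted with it.

    `interactions[cu][obj]` is the number of times a CU interacted with a
    sample object (the first through fourth numbers in the two-by-two case).
    """
    objects = sorted({obj for per_cu in interactions.values() for obj in per_cu})
    return {
        obj: frozenset(
            cu for cu, per_cu in interactions.items() if per_cu.get(obj, 0) > 0
        )
        for obj in objects
    }


def signatures_are_unique(signatures):
    """True when no two sample objects share the same combination of CUs."""
    return len(set(signatures.values())) == len(signatures)


interactions = {
    "CU1": {"object1": 2, "object2": 0},
    "CU2": {"object1": 1, "object2": 3},
}
signatures = hash_signatures(interactions)
unique = signatures_are_unique(signatures)
```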
  • Non-limiting embodiments of the presently described methods are illustrated in FIGS. 1-10.
  • the plurality of biological molecules can be RNA. In some embodiments, the plurality of biological molecules can be protein or peptide.
  • the plurality of sample objects in contact with the plurality of CUs in the contacting step (a) can be incubated for a sufficient amount of time in an incubator for each CU to display the at least one SBE on its surface.
  • the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent.
  • the crowing agent can be a hydrogel.
  • the crowding agent can include, but not limited to, polyethylene glycol (PEG), sucrose, urea, Ficoll, dextran, cellulose, chitosan, poly(lactic-co-glycolic acid), hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL), composite synthetic/proteinaceus hydrogel, such as PEG/PVA/PVP+gelatin, biopolymer hydrogel, such as chitosan, hyaluronic acid, silk fibroin, and their functionalized variants, and/or protein, such as BSA.
  • the crowding agent can be temperature responsive, such as, but not limited to, hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL).
  • the crowding agent can be temperature, pH, and/or osmolarity responsive, such as, but not limited to, a composite synthetic/proteinaceous hydrogel (e.g., PEG/PVA/PVP+gelatin) and/or a biopolymer hydrogel (e.g., chitosan, hyaluronic acid, silk fibroin, and their functionalized variants).
  • a plurality of sample objects which include various biological molecules, can be contacted with computing units (CUs).
  • Each CU displays surface-bound entities (SBEs) capable of binding to the sample objects and biological molecules.
  • the sample objects can be permeabilized to release the biological molecules, which then bind to the SBEs associated with molecular tags.
  • molecular tags can facilitate the establishment of spatial relationships between the molecular tags and the sample objects by evaluating the proximity of CUs using machine learning algorithms.
  • the presently described methods can include steps for reverse transcription of the biological molecules for sequencing, and various aspects of the molecular tags and computing units, such as the use of barcodes, unique molecular identifiers (UMIs), release elements, and linkers are described in detail infra.
  • machine learning algorithms can be employed to train models that analyze the proximity of CUs and derive spatial relationships among the biological molecules and sample objects.
  • the algorithms can utilize UMI vectors from single-cell RNA sequencing data to generate training inputs, which are used to train the model by matching outputs to known adjacency matrices.
  • the trained model can then be used to process experimental data, generating output matrices that represent spatial relationships among CUs, and aggregating biological molecules bound to proximal CUs to achieve spatial profiling.
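The aggregation step described above can be sketched in a few lines. The sketch below does not reproduce the trained model: as a loudly labeled stand-in, it thresholds the cosine similarity between CU UMI vectors to obtain an adjacency matrix, then pools molecule counts over connected groups of proximal CUs. The function names, the similarity measure, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def adjacency_from_umi_vectors(umi_vectors, threshold=0.8):
    """Stand-in for the trained model (assumption): two CUs are treated as
    proximal when the cosine similarity of their UMI vectors >= threshold."""
    X = np.asarray(umi_vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    adj = (unit @ unit.T) >= threshold
    np.fill_diagonal(adj, True)  # every CU is proximal to itself
    return adj

def aggregate_proximal(counts, adj):
    """Group CUs by connected components of `adj` and sum their molecule counts."""
    n = adj.shape[0]
    group = -np.ones(n, dtype=int)
    next_id = 0
    for start in range(n):
        if group[start] >= 0:
            continue
        group[start] = next_id
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if group[v] < 0:
                    group[v] = next_id
                    stack.append(v)
        next_id += 1
    counts = np.asarray(counts, dtype=float)
    pooled = np.zeros((next_id, counts.shape[1]))
    np.add.at(pooled, group, counts)  # row-wise sum per group
    return group, pooled

# Toy data: four CUs; CUs 0-1 and CUs 2-3 share UMI content.
umi_vectors = [[5, 0, 1], [4, 0, 1], [0, 6, 0], [0, 5, 1]]
molecule_counts = [[1, 0], [2, 1], [0, 3], [1, 1]]
adj = adjacency_from_umi_vectors(umi_vectors)
group, pooled = aggregate_proximal(molecule_counts, adj)
```

In a real workflow the adjacency matrix would come from the trained model's output rather than from a fixed threshold; only the pooling step carries over unchanged.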
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times.
  • a non-limiting aspect of sample objects interacting with CUs is illustrated in FIGS. 1, 2, 3A-3E, and 4A-4D.
  • the first CU can display the at least one SBE that is different from what the second CU displays on its surface.
  • at least one of the first number, the second number, the third number, and the fourth number can be one.
  • at least one of the first number, the second number, the third number, and the fourth number can be greater than one.
  • at least two of the first number, the second number, the third number, and the fourth number can be the same.
  • at least two of the first number, the second number, the third number, and the fourth number can be different.
  • the first number and the third number can be indicative of a characteristic of the first sample object.
  • the second number and the fourth number can be indicative of a characteristic of the second sample object.
  • the at least one SBE can be two or more SBEs.
  • a plurality of molecular tags can be added after the lysing step.
  • each CU of the plurality of CUs can further display a second set of at least one SBE.
  • the plurality of molecular tags can associate with the second set of at least one SBE.
  • the second set of at least one SBE can be incapable of binding to the plurality of biological molecules.
  • the plurality of molecular tags that do not associate with the second set of at least one SBE can be removed by washing.
  • a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE.
  • the plurality of molecular tags and the second set of at least one SBE can be associated either before or after each CU interacts with the sample object.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the hash element can include a unique molecular identifier (UMI), a molecular barcode that can provide error correction and increased accuracy during sequencing.
  • the molecular tag can further include a recognition element, a unique sequence for recognizing a specific SBE.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain.
  • the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
  • the plurality of biological molecules can be DNA, RNA, or protein. In some embodiments, the plurality of biological molecules can be DNA.
  • the plurality of biological molecules can be RNA.
  • any of the methods described herein, wherein the biological molecule is RNA, can further include (1) poly(A) priming of the RNA to the priming element of the molecular tag, and (2) reverse transcribing the RNA, after the lysing step and before the detecting step.
  • the plurality of biological molecules can be protein.
  • the protein can include a barcode.
  • the protein can be an antibody with the barcode.
  • any of the methods described herein, wherein the biological molecule is protein, can further include (1) capturing the protein via the sequencing element or the unique protein binding domain, (2) priming the barcode via the priming element, and (3) extending the priming element with a strand-displacing polymerase, after the lysing step and before the detecting step.
  • any of the methods described herein, wherein the biological molecule is protein, can further include (1) capturing the protein via the unique protein binding domain, and (2) releasing the protein from the molecular tag, after the lysing step and before the detecting step.
  • a protein-molecular tag complex can be optionally stabilized via crosslinking.
  • the cross-linking can be via an amine-reactive cross-linker.
  • the measuring can further comprise determining an amount of the molecular tag associated with the plurality of biological molecules. In some embodiments, the determining can be performed using qPCR, sequencing, gel electrophoresis, isothermal amplification, ELISA, or mass spectrometry.
  • the sequencing can be Next-Generation sequencing or Sanger sequencing.
  • any of the methods described herein can further include (3) computing the amount of the molecular tag such that the plurality of biological molecules in the first sample object can be differentiated from the plurality of biological molecules in the second sample object.
  • the plurality of sample objects in contact with the plurality of CUs in the contacting step can be incubated for a sufficient amount of time in an incubator for each CU to display the first set of at least one SBE and the second set of at least one SBE on its surface.
  • the sufficient amount of time can be about 2 hours, about 2.1 hours, about 2.2 hours, about 2.3 hours, about 2.4 hours, about 2.5 hours, about 2.6 hours, about 2.7 hours, about 2.8 hours, about 2.9 hours, about 3 hours, about 3.1 hours, about 3.2 hours, about 3.3 hours, about 3.4 hours, about 3.5 hours, about 3.6 hours, about 3.7 hours, about 3.8 hours, about 3.9 hours, about 4 hours, about 4.1 hours, about 4.2 hours, about 4.3 hours, about 4.4 hours, about 4.5 hours, about 4.6 hours, about 4.7 hours, about 4.8 hours, about 4.9 hours, about 5 hours, about 5.1 hours, about 5.2 hours, about 5.3 hours, about 5.4 hours, about 5.5 hours, about 5.6 hours, about 5.7 hours, about 5.8 hours, about 5.9 hours, or about 6 hours.
  • the sufficient amount of time can be about 3 hours. In some embodiments, the sufficient amount of time can be about 3.5 hours. In some embodiments, the sufficient amount of time can be about 4 hours. In some embodiments, the sufficient amount of time can be about 4.5 hours. In some embodiments, the sufficient amount of time can be about 5 hours.
  • the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent. In some embodiments, the crowding agent is a hydrogel.
  • Also provided herein for multiplex hashing is a computer-implemented method, the method including: (a) compiling a measurement matrix Ml from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix Ml can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix Ml can represent measurements associated with a same molecular tag, and each row can represent measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix Ml.
  • the set of sample objects can include a plurality of classes of sample objects, wherein the plurality of classes can be denoted by $B_i \in \{B_1, B_2, \ldots, B_k\}$.
  • the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm.
  • the machine learning algorithm can include a neural network algorithm.
  • computing the profile of a subset of sample objects can further include: (a) computing proportions in which each molecular tag of the molecular tags is assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by the optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix Ml and the optimal transformation matrix A.
  • computing the optimal transformation matrix A can include an operation that can utilize an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M.
  • computing the matrix A can include computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that can be defined as: minimize $\lVert M - BA \rVert$ subject to $A > 0$, wherein:
  • matrix M can be compiled from truncated hashed measurements, wherein the ith column of M can be a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule;
  • matrix B can be compiled from the class vectors $b_{ij}$, wherein the columns of B correspond to the vectors and the jth row of B corresponds to the jth biological molecule;
  • matrix A can represent the optimization variable denoting the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi; and
  • the optimization constraints can be indicative of physical limitations, with the constraint X c being optional and corresponding to a case where the number of measured sample objects is known.
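The linear program above can be sketched with SciPy's `linprog`. This is one possible reading, offered as an assumption: the norm is taken to be the entrywise L1 norm (the "absolute difference"), the sign constraint is relaxed to $A \ge 0$ because LP solvers work with non-strict bounds, and the optional constraint on the number of sample objects is omitted. The function name `fit_tag_proportions` is hypothetical. Because the objective separates by column, each tag column of $M$ is solved independently with auxiliary variables $t \ge |m_i - B a_i|$.

```python
import numpy as np
from scipy.optimize import linprog

def fit_tag_proportions(M, B):
    """Solve min ||M - B A||_1 subject to A >= 0, one column of A at a time.

    M: (m molecules x n tags) measurement matrix.
    B: (m molecules x k class vectors) matrix of class vectors.
    Returns A: (k x n) nonnegative proportions.
    """
    M = np.asarray(M, dtype=float)
    B = np.asarray(B, dtype=float)
    m, k = B.shape
    n_tags = M.shape[1]
    # Variables per column: [a (k entries), t (m entries)]; minimize sum(t).
    cost = np.concatenate([np.zeros(k), np.ones(m)])
    #  B a - t <= m_i   and   -B a - t <= -m_i   together give |m_i - B a| <= t
    A_ub = np.block([[B, -np.eye(m)], [-B, -np.eye(m)]])
    A = np.zeros((k, n_tags))
    for i in range(n_tags):
        b_ub = np.concatenate([M[:, i], -M[:, i]])
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                      bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        A[:, i] = res.x[:k]
    return A

# Toy check: noise-free measurements generated from known proportions.
rng = np.random.default_rng(0)
B_demo = rng.uniform(1.0, 10.0, size=(6, 2))
A_true = np.array([[0.6, 0.3, 0.0],
                   [0.4, 0.7, 1.0]])
A_hat = fit_tag_proportions(B_demo @ A_true, B_demo)
```

With noisy data the recovered proportions are only approximate, and a different norm or additional constraints may be more appropriate than this sketch.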
  • the profile can be transcriptomic profile, proteomic profile, or multiomic profile.
  • the profile can include probabilities that specific sample objects are bound to each other.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data can include multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules including: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data can include multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements are generated in part from a sample object described by a set of vectors $\{b_1, \ldots, b_m\}$; and (c) training the numerical classification engine using supervised learning to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors $\{b_1, \ldots, b_m\}$ that in part generated the accessed measurements; and (c) training the numerical classification engine using supervised learning to perform a regression on the multiplexed hashed measurements; wherein the training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
  • a computer-implemented method including: (a) compiling a measurement matrix Ml from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix Ml can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix Ml represents measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix Ml.
  • the set of sample objects comprises a plurality of classes of sample objects, wherein the plurality of classes is denoted by $B_i \in \{B_1, B_2, \ldots, B_k\}$.
  • the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm.
  • the machine learning algorithm comprises a neural network algorithm.
  • computing the profile of a subset of sample objects further comprises: (a) computing proportions in which each molecular tag of the molecular tags can be assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by the optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix Ml and the optimal transformation matrix A.
  • computing the optimal transformation matrix A comprises an operation that utilizes an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M.
  • computing the matrix A comprises computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that is defined as: minimize $\lVert M - BA \rVert$ subject to, wherein: (a) matrix M can be compiled from truncated hashed measurements, wherein the ith column of M is a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule, (b) matrix B
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data comprises multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements are generated in part from a sample object described by a set of vectors $\{b_1, \ldots, b_m\}$; and (c) training the numerical classification engine using supervised learning to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
  • Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors $\{b_1, \ldots, b_m\}$ that in part generated the accessed measurements; and (c) training the numerical classification engine using supervised learning to perform a regression on the multiplexed hashed measurements; wherein the training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
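The methods above allow training data to be synthesized from single cell molecular data. One way this might look is sketched below: a few single-cell profiles are drawn at random, each contributes a random fraction of its molecules to one hashed measurement, and the per-class object counts are kept as the label. The function name, the fraction range, and the cap on objects per measurement are assumptions for illustration only.

```python
import numpy as np

def synthesize_hashed_example(profiles, classes, n_classes,
                              max_objects=3, rng=None):
    """Mix a few single-cell profiles into one synthetic hashed measurement.

    profiles: (n_cells x n_molecules) single-cell counts.
    classes:  (n_cells,) integer class label of each cell.
    Returns (measurement, class_counts); class_counts > 0 can serve as a
    classification label, class_counts itself as a regression target.
    """
    rng = rng or np.random.default_rng()
    profiles = np.asarray(profiles, dtype=float)
    classes = np.asarray(classes)
    n_objects = int(rng.integers(1, max_objects + 1))
    picked = rng.choice(len(profiles), size=n_objects, replace=False)
    # Assumption: each object gives a random share of its molecules to this tag.
    fractions = rng.uniform(0.1, 1.0, size=n_objects)
    measurement = fractions @ profiles[picked]
    class_counts = np.bincount(classes[picked], minlength=n_classes)
    return measurement, class_counts

# Toy single-cell data: four cells, three molecules, two classes.
cells = [[9, 1, 0], [8, 2, 0], [0, 1, 7], [1, 0, 9]]
labels = [0, 0, 1, 1]
x, y = synthesize_hashed_example(cells, labels, n_classes=2,
                                 rng=np.random.default_rng(1))
```

Calling the function repeatedly yields a labeled training set of arbitrary size; directly measured data can be mixed in without changing the format.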
  • an aspect of presently disclosed methods pertains to a novel method of analyzing sequencing data.
  • the sequencing is single-cell RNA sequencing (scRNAseq).
  • Multiplex hashing and transformation can encompass a series of data processing steps to refine, cluster, and derive meaningful insights from e.g., scRNAseq datasets.
  • the present disclosure further includes a neural network model for training and classification, as well as a transformation matrix computation process that can facilitate the extraction of biological molecule vectors from the data.
  • An aspect of multiplex hashing can comprise several interconnected processes, each contributing to the comprehensive analysis of, e.g., scRNAseq data.
  • Neural network processes can begin with the training phase, where the model learns from labeled data, gaining an understanding of patterns that correspond to specific cell clusters or other relevant characteristics. These patterns can encompass gene expression profiles associated with distinct cell types or conditions.
  • the neural network can perform classification tasks on new, unlabeled data points. Leveraging the patterns it has learned, it can assign cells to specific clusters or categories, facilitating the identification of cell types, states, or conditions. This supervised learning approach underpins the neural network's ability to make predictions on new data, even when labels are absent.
  • neural networks can be designed as deep learning models, featuring multiple hidden layers. This deep architecture can be particularly advantageous for e.g., scRNAseq data analysis, which often involves high-dimensional, intricate data patterns.
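The training and classification phases described above can be illustrated with a deliberately small network. The sketch below trains a one-hidden-layer ReLU network with a softmax output by full-batch gradient descent on toy expression-like data. The architecture, hyperparameters, and synthetic data are assumptions chosen for brevity; the disclosure does not specify them, and a practical model would likely be deeper and built with a dedicated framework.

```python
import numpy as np

def make_toy_data(n_per_class=50, seed=1):
    """Two synthetic 'cell types' that differ in which molecule is abundant."""
    rng = np.random.default_rng(seed)
    means = np.array([[5.0, 1.0, 1.0, 1.0], [1.0, 1.0, 5.0, 1.0]])
    X = np.vstack([rng.normal(m, 0.5, (n_per_class, 4)) for m in means])
    y = np.repeat([0, 1], n_per_class)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    return X, y

def train_mlp(X, y, hidden=8, lr=0.1, epochs=300, seed=0):
    """Training phase: fit weights to labeled data with cross-entropy loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, k)); b2 = np.zeros(k)
    Y = np.eye(k)[y]  # one-hot labels
    for _ in range(epochs):
        H = np.maximum(0.0, X @ W1 + b1)             # hidden activations
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        d_logits = (P - Y) / n                       # cross-entropy gradient
        dW2 = H.T @ d_logits; db2 = d_logits.sum(axis=0)
        dH = d_logits @ W2.T
        dH[H <= 0] = 0.0                             # ReLU gradient
        dW1 = X.T @ dH; db1 = dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    """Classification phase: assign each row of X to the most likely class."""
    W1, b1, W2, b2 = params
    return np.argmax(np.maximum(0.0, X @ W1 + b1) @ W2 + b2, axis=1)

X_toy, y_toy = make_toy_data()
params = train_mlp(X_toy, y_toy)
train_accuracy = float((predict(params, X_toy) == y_toy).mean())
```

Accuracy on the training set says nothing about generalization; a held-out split would be needed for that.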
  • the disclosed method can be designed for the analysis of multiplexed hashed measurements of biological molecules.
  • the method can include compiling a measurement matrix, denoted as Ml, from measurements of biological molecules.
  • these biological molecules can be derived from a set of sample objects that interact with CUs.
  • the measurement matrix Ml can be partitioned by molecular tags that can be assigned to the biological molecules. These molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects.
  • each column of the matrix Ml can represent measurements associated with a same molecular tag, and each row can represent measurements associated with a same biological molecule.
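The row and column convention just described can be shown with a small sketch that compiles the matrix from a list of records. The record format `(molecule_id, tag_id, count)` and the function name are hypothetical; the gene and tag names in the toy data are placeholders.

```python
import numpy as np

def compile_measurement_matrix(records):
    """Build a measurement matrix from (molecule_id, tag_id, count) records.

    Rows correspond to biological molecules and columns to molecular tags,
    both in sorted order; repeated records are accumulated.
    """
    records = list(records)
    molecules = sorted({r[0] for r in records})
    tags = sorted({r[1] for r in records})
    row = {m: i for i, m in enumerate(molecules)}
    col = {t: j for j, t in enumerate(tags)}
    M = np.zeros((len(molecules), len(tags)))
    for molecule, tag, count in records:
        M[row[molecule], col[tag]] += count
    return M, molecules, tags

demo_records = [
    ("GAPDH", "tag1", 3),
    ("GAPDH", "tag2", 1),
    ("ACTB", "tag1", 2),
    ("GAPDH", "tag1", 1),  # second observation of the same pair
]
M_demo, molecule_ids, tag_ids = compile_measurement_matrix(demo_records)
```

For realistic data sizes a sparse matrix would probably be preferable to the dense array used here.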
  • the method can further include generating a profile of a subset of sample objects. This profile generation is based at least in part on the measurement matrix Ml.
  • the profile can provide a comprehensive representation of the subset of sample objects.
  • the profile can be used to gain insights into the biological characteristics of the sample objects, facilitating further analysis and interpretation of the data.
  • the measurement matrix Ml can be divided based on molecular tags. These tags can be allocated to the biological molecules in a way that guarantees adequate (i.e., sufficient) distinction and/or sufficiently different proportions between molecules originating from different sample objects.
  • the process of assigning molecular tags can facilitate the differentiation and recognition of the biological molecules according to their source and/or origin. These sufficiently different proportions between molecules from different sample objects can mathematically enable the eventual generation of a profile for a subset of sample objects, as described elsewhere herein.
  • the sufficiently different proportions can include where, for example, molecular tags can be assigned to biological molecules derived from sample object A in a proportion of about 60%, and to biological molecules derived from sample object B in a proportion of about 40%.
  • This difference in proportions can allow for the differentiation of the biological molecules based on their source, and subsequently, the generation of distinct profiles for subsets of sample objects.
  • molecular tags can be assigned to biological molecules derived from two different sample objects: A and B.
  • the proportions of molecular tags assigned can vary, for instance, about 70% for A and about 30% for B, or about 65% for A and about 35% for B, or even about 78% for A and about 22% for B. It’s worth noting that these proportions are merely illustrative and do not limit the scope of the technology. The actual proportions can be adjusted based on the specific requirements of the biological analysis.
  • the threshold and/or definition for sufficiently different proportions can vary in different situations and/or cells. As discussed elsewhere herein, these proportions can then be used in the computation of the transformation matrix A, which can encapsulate the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class. The transformation matrix A can then be applied to the measurement matrix Ml to generate the profile of a subset of sample objects, providing a comprehensive representation of the subset of sample objects. The details for the computation of the transformation matrix A are described elsewhere herein.
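A small worked example may help show why the proportions need to differ. Assume, purely for illustration, two sample objects A and B and two molecular tags, where the first tag carries 60% of A's molecules and 40% of B's, and the second tag carries the complementary shares. The measurements are then the profiles multiplied by a 2 x 2 proportion matrix, and the profiles can be recovered by inverting it. As the proportions approach 50/50 the matrix approaches singularity and recovery becomes unstable. The profile values and the two-tag setup are assumptions.

```python
import numpy as np

# Columns are the (unknown) molecule profiles of sample objects A and B.
profiles = np.array([[10.0, 0.0],
                     [2.0, 8.0],
                     [0.0, 6.0]])

# Rows: sample objects A, B; columns: tags 1, 2 (assumed proportions).
proportions = np.array([[0.6, 0.4],
                        [0.4, 0.6]])

# Each tag column of the measurements mixes both objects.
measurements = profiles @ proportions

# Distinct proportions make the matrix invertible, so unmixing is possible.
recovered = measurements @ np.linalg.inv(proportions)

# Nearly equal proportions are much worse conditioned (about 50 versus 5).
nearly_equal = np.array([[0.51, 0.49],
                         [0.49, 0.51]])
cond_distinct = np.linalg.cond(proportions)
cond_nearly_equal = np.linalg.cond(nearly_equal)
```

The condition number gives one possible, informal way to quantify "sufficiently different": the larger it is, the more measurement noise is amplified when the profiles are unmixed.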
  • each row of the matrix Ml can represent measurements associated with a same biological molecule. This can indicate that all measurements within a single row are associated with the same biological molecule, measured across the different molecular tags. This row-wise organization of the measurements can facilitate the analysis of the data, as it allows for the efficient comparison and differentiation of measurements associated with the same biological molecule.
  • the method can include generating a profile of a subset of sample objects. This profile generation can be based at least in part on the measurement matrix Ml . The profile can provide a comprehensive representation of the subset of sample objects.
  • a subset of sample objects can be generated, specifically class B sample objects.
  • these class B sample objects can be represented by a matrix denoted as B.
  • Each vector in this set, $b_i$, is a mathematical construct that can encapsulate a series of values, each value represented as $r_{1i}, r_{2i}, \ldots, r_{mi}$.
  • each vector $b_i$ in the set can be a representation of a class B sample object.
  • the elements of the vector, $r_{1i}, r_{2i}, \ldots, r_{mi}$, can represent measurements associated with specific biological molecules derived from the class B sample object. These measurements can capture the quantities or proportions of the biological molecules within the sample object, providing a comprehensive representation of its molecular composition.
  • the set of sample objects can comprise a plurality of classes of sample objects.
  • Each class of sample objects can be denoted by $B_i \in \{B_1, B_2, \ldots, B_k\}$. This notation can signify that there can be multiple classes of sample objects, each class being distinct and represented by a different $B_i$.
  • Each class $B_i$ of sample objects can be represented by a set of vectors, denoted as $\{b_{i1}, b_{i2}, \ldots, b_{ik_i}\}$. This set of vectors can provide a mathematical representation of the class $B_i$ sample objects.
  • Each vector in the set, $b_{ij}$, is a mathematical construct that can encapsulate a series of values, each value represented as $r_{1ij}, r_{2ij}, \ldots, r_{mij}$. These values within the vector can be indicative of typical amounts for some of the biological molecules for class $B_i$ sample objects.
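The class vectors described above can be arranged into a single matrix whose columns are the vectors and whose rows are the biological molecules. A minimal sketch follows; the dictionary input format, the function name, and the bookkeeping list that records which class each column belongs to are assumptions for illustration.

```python
import numpy as np

def build_class_matrix(class_vectors):
    """Stack class vectors as columns of a matrix.

    class_vectors: dict mapping a class label to a list of length-m vectors,
    each holding typical amounts of m biological molecules for that class.
    Returns (B, owners), where owners[c] is the class label of column c.
    """
    columns, owners = [], []
    for label in sorted(class_vectors):
        for vec in class_vectors[label]:
            columns.append(np.asarray(vec, dtype=float))
            owners.append(label)
    B = np.column_stack(columns)  # rows: molecules, columns: class vectors
    return B, owners

# Toy example: class B1 has two representative vectors, class B2 has one.
B_demo, owners = build_class_matrix({
    "B1": [[1, 0, 2], [2, 0, 1]],
    "B2": [[0, 3, 0]],
})
```

Keeping the `owners` list makes it possible to sum the rows of a fitted proportion matrix per class afterwards.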
  • the computation of the profile of a subset of sample objects can include the computation of proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi. This computation can enable the differentiation and identification of the biological molecules based on their origin. As discussed herein elsewhere, the assignment of molecular tags can ensure sufficient differentiation between the molecules derived from different sample objects.
  • the proportions and/or optimal proportions in which each molecular tag is assigned to the biological molecules can be denoted by the transformation matrix, referred to as the optimal transformation matrix A or transformation matrix A.
  • the transformation matrix A can be a mathematical construct that can encapsulate the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi.
  • Each element of the matrix A can represent a proportion, indicating the extent to which a particular molecular tag can be assigned to the biological molecules derived from a specific sample object class Bi.
  • This matrix A can provide a comprehensive representation of the assignment of molecular tags, capturing the intricate details of the molecular composition and interactions within the sample objects.
  • the computation of the profile of a subset of sample objects can be based at least in part on the measurement matrix Ml and the transformation matrix A.
  • the computation of the profile can include applying the transformation matrix A to the measurement matrix Ml. This application can transform the measurements into a form that aligns with the proportions in which each molecular tag can be assigned to the biological molecules.
  • an optimization algorithm can be employed to compute the transformation matrix A.
  • This algorithm can align the basis vectors with the corresponding measurements obtained from the scRNAseq data.
  • Optimization algorithms can be mathematical techniques used to find the optimum solution from a set of possible solutions.
  • the optimization algorithm can be employed to compute the transformation matrix A that can minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements.
  • One of the objectives of this optimization can be to minimize the absolute difference between the transformation matrix-transformed matrix B and a truncated measurement matrix M.
  • This objective function can quantify the dissimilarity between these two matrices, and the algorithm can seek to find the transformation matrix A that minimizes this dissimilarity.
  • the choice of optimization algorithm can vary depending on the specific problem and the mathematical characteristics of the data. In the subject matter discussed herein, the optimization algorithm can be designed to handle the high-dimensional and complex nature of the biological data, thereby enhancing the accuracy and efficiency of the analysis.
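  • As a hedged illustration of the objective described above, the optimization can be written as an entrywise absolute-difference minimization. The orientation of the product (B multiplied on the right by A), the symbol \tilde{M} for the truncated measurement matrix, and the index names g (molecule), i (sample object class), and t (molecular tag) are assumptions made for this sketch; the text does not fix them.

```latex
\hat{A} \;=\; \arg\min_{A} \,\bigl\lVert B A - \tilde{M} \bigr\rVert_{1}
       \;=\; \arg\min_{A} \sum_{g}\sum_{t}
             \Bigl|\, \sum_{i} B_{g i}\, A_{i t} \;-\; \tilde{M}_{g t} \Bigr|
```

  • Under this assumed convention, column i of B is the basis vector of sample object class Bi, and the entry A_{it} is the proportion in which molecular tag t is assigned to the biological molecules derived from that class.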
  • the profile generated can provide a comprehensive representation of a subset of sample objects, capturing the intricate details of their molecular composition and interactions.
  • This profile can be of various types, including a transcriptomic profile, a proteomic profile, or a multiomic profile, depending on the specific characteristics of the biological data and the objectives of the analysis.
  • the profile can include probabilities that specific sample objects are bound to each other. These probabilities can provide insights into the likelihood of interactions between different sample objects, facilitating the understanding of their complex biological relationships. These probabilities can be derived from the measurements of biological molecules and the assignment of molecular tags, providing a quantitative measure of the interactions between the sample objects.
  • the method can include a process for analyzing multiplexed hashed measurements of biological molecules.
  • the method can include accessing the multiplexed hashed measurements of biological molecules.
  • the measurements can be derived from a variety of sources, including but not limited to, biological experiments, simulations, and databases.
  • the access to these measurements can be facilitated by various data retrieval techniques, such as database queries, file reading operations, and network communications.
  • the method can further include utilizing a numerical classification or regression engine.
  • This engine can be stored in one or more memories of the one or more computing devices.
  • the classification or regression engine can be a computational model that can be designed to perform classification or regression tasks on the accessed measurements.
  • the classification task can involve assigning each measurement to one of a set of predefined classes, while the regression task involves predicting a continuous output value based on the measurements.
  • the classification or regression engine can utilize various mathematical and statistical techniques to perform these tasks, including but not limited to, decision trees, support vector machines, and neural networks.
  • the numerical classification or regression engine can be trained via supervised learning.
  • Supervised learning is a type of machine learning where the model learns from a set of labeled training data.
  • the training data can comprise multiplexed hashed measurements and the classes or output values of corresponding sample objects from which the measured biological molecules originate.
  • the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
  • the engine can learn to map the input measurements to the output classes or values by adjusting its internal parameters. This learning process can be typically guided by a loss function, which can quantify the difference between the predicted outputs and the actual outputs. The goal of the training process can be to minimize this loss function, thereby improving the accuracy of the engine’s predictions.
  • the trained numerical classification or regression engine can be used to analyze new multiplexed hashed measurements of biological molecules. For example, the new measurements can be fed into the trained engine, which then outputs a predicted class or value.
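  • The supervised training loop described above can be sketched as follows. Logistic regression stands in here for the classification engine (the text names decision trees, support vector machines, and neural networks as options), and the synthetic two-class measurements, learning rate, and function names are assumptions made only for illustration.

```python
import numpy as np

def train_logistic_classifier(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by gradient descent on the binary cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted class probabilities
        w -= lr * (X.T @ (p - y)) / n_samples    # gradient of the loss w.r.t. w
        b -= lr * np.mean(p - y)                 # gradient of the loss w.r.t. b
    return w, b

def predict(X, w, b):
    """Return the predicted class (0 or 1) for each measurement vector."""
    return (X @ w + b > 0).astype(int)

# Hypothetical labeled training data: two classes with different mean measurements.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 1.0, size=(100, 5)),
                     rng.normal(2.0, 1.0, size=(100, 5))])
y_train = np.concatenate([np.zeros(100), np.ones(100)])

w, b = train_logistic_classifier(X_train, y_train)
train_accuracy = np.mean(predict(X_train, w, b) == y_train)
```

  • New measurements would be passed to `predict` in the same way; a regression engine would replace the sigmoid and cross-entropy with a continuous output and, for example, a squared-error loss.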
  • the initial step in the multiplex hashing and transformation process can involve the refinement of publicly available datasets, e.g., scRNAseq datasets.
  • This stage can ensure that the subsequent procedures can be conducted on a dataset that is both manageable and meaningful.
  • the refinement process aims to extract only the most relevant cells and genes from the raw scRNAseq data.
  • scRNAseq datasets often contain a multitude of cells, each expressing a vast number of genes. However, not all of these cells and genes contribute significantly to the objectives at hand. Some cells can be of lesser interest due to various factors, such as low-quality data or cell populations that do not pertain to the specific biological question being addressed.
  • a selection process can be employed. This process can involve various criteria, including the quality of individual cells, gene expression levels, and biological relevance. For instance, cells that exhibit poor data quality or high levels of technical noise can be excluded. Similarly, genes that are not involved in the biological processes under investigation or exhibit minimal variation across cells can be omitted from the refined dataset.
  • the result of this refinement process can be a curated scRNAseq dataset that includes only the cells and genes deemed relevant to the objectives. This refined dataset can serve as the foundation for subsequent analyses, ensuring that computational resources can be allocated efficiently and that the insights gained can be directly pertinent to the specific biological questions being addressed.
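  • A minimal sketch of such a refinement step is shown below, assuming a cells-by-genes count matrix. The two criteria used (a minimum total count per cell and the most variable genes) and their thresholds are illustrative assumptions; the text leaves the actual selection criteria open.

```python
import numpy as np

def refine_counts(counts, min_counts_per_cell=500, n_top_genes=2000):
    """Filter a cells x genes count matrix.

    Keeps cells whose total count reaches min_counts_per_cell, then keeps the
    n_top_genes genes with the highest variance across the remaining cells.
    Returns the refined matrix plus the indices of kept cells and genes.
    """
    counts = np.asarray(counts, dtype=float)
    keep_cells = np.flatnonzero(counts.sum(axis=1) >= min_counts_per_cell)
    filtered = counts[keep_cells]
    gene_variance = filtered.var(axis=0)
    n_top = min(n_top_genes, filtered.shape[1])
    # Highest-variance genes, reported in their original column order.
    keep_genes = np.sort(np.argsort(gene_variance)[::-1][:n_top])
    return filtered[:, keep_genes], keep_cells, keep_genes

# Toy example: cell 1 has too few counts, gene 1 never varies.
toy_counts = [[10, 0, 5],
              [0, 0, 1],
              [20, 0, 1],
              [5, 0, 9]]
refined, kept_cells, kept_genes = refine_counts(toy_counts, min_counts_per_cell=5, n_top_genes=2)
```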
  • Cluster analysis can involve grouping cells into clusters, for example, based on various characteristics, including cell gene expression profiles. This can be achieved through unsupervised machine learning algorithms, such as k-means clustering or hierarchical clustering, which identify patterns of similarity or dissimilarity among the cells. Cells that share similar gene expression profiles can be grouped together within the same cluster, while those with distinct profiles can be placed in separate clusters.
  • the outcome of the cluster analysis can be a collection of cell clusters, each cluster representing a unique population of cells with similar gene expression patterns. These clusters can serve as a critical foundation for subsequent analyses and provide insights into the heterogeneity of the cellular population within the scRNAseq dataset.
  • Unsupervised machine learning algorithms can be integral to data analysis, particularly in situations where predefined outcomes or labeled data can be absent. These algorithms can be employed to uncover hidden patterns and structures within a dataset, providing valuable insights into the inherent organization of the data.
  • the unsupervised machine learning algorithm can be a clustering algorithm.
  • cluster analysis can be a key application of unsupervised learning. It can involve the grouping of data points into clusters based on similarities or dissimilarities within the data.
  • Unsupervised clustering algorithms such as, but not limited to, k-means clustering and hierarchical clustering, can be utilized. These algorithms can identify patterns of likeness or disparity among cells in the context of scRNAseq data analysis. Cells with analogous gene expression profiles can be assigned to the same cluster, while those with distinct profiles can be grouped separately.
  • the result of cluster analysis can be a set of cell clusters, each representing a distinct population of cells sharing similar gene expression patterns. These clusters can provide a foundational understanding of the cellular heterogeneity present within the scRNAseq dataset.
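  • The k-means variant of this cluster analysis can be sketched as below. The farthest-point initialization is an assumption chosen so that the example is deterministic; it is not described in the text, and the toy expression profiles are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means on the rows of X with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    # Initialization (assumption): start from the first row, then repeatedly add
    # the row that is farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist_to_nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each cell joins the nearest center.
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Update step: each center moves to the mean profile of its cells.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1), centers

# Toy expression profiles: two clearly separated groups of three cells.
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(profiles, k=2)
```

  • Each returned center is the mean expression profile of one cluster, so the cells of a cluster share the center they are closest to.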
  • the unsupervised machine learning algorithm can be a dimensionality reduction algorithm.
  • High-dimensional datasets, like scRNAseq data, can be challenging to work with directly.
  • Dimensionality reduction algorithms including, but not limited to, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), can be employed to reduce the complexity of the dataset while retaining crucial information. These algorithms can enable visualization and analysis of data in lower-dimensional spaces, making it easier to discern underlying patterns and relationships.
  • Basis vectors can be mathematical representations that can capture the characteristic mRNA transcript patterns within each cell cluster. They can serve as a concise way to describe the gene expression profiles specific to each cluster. Each basis vector can correspond to one of the identified clusters.
  • the basis vector for the i-th cluster can be a mathematical representation that can encapsulate the numbers of mRNA transcripts characteristic of cells within that cluster. These basis vectors can be instrumental in reducing the dimensionality of the data while retaining crucial information about the gene expression patterns of each cluster.
  • Basis vectors can be computed using various techniques, such as principal component analysis (PCA) or singular value decomposition (SVD). PCA can serve as a linear dimensionality reduction method that can transform high-dimensional data into a lower-dimensional representation while preserving data variance.
  • This technique can identify principal components, which can be linear combinations of the original variables (e.g., gene expression levels) determined by the eigenvalues and eigenvectors of the covariance matrix of the data. Principal components can be ordered by their variance, with the first component explaining the most variance. By selecting a subset of these components, PCA can effectively reduce data dimensionality while retaining most variability.
  • SVD is a versatile matrix factorization technique employed for dimensionality reduction, data compression, and feature extraction. SVD can decompose a data matrix into three matrices: U, S (sigma), and V^T (the transpose of V), each capturing different data characteristics. S can contain singular values, akin to eigenvalues in PCA, representing component importance.
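  • The relationship between PCA and SVD described above can be sketched as follows: the principal components are the rows of V^T from the SVD of the mean-centered data, and the squared singular values divided by (n - 1) are the variances explained. The function name and the toy data are assumptions made for illustration.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """PCA of the rows of X (samples x variables) computed through the SVD."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # X_centered = U @ diag(S) @ Vt; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values / (n - 1) equal the covariance-matrix eigenvalues.
    explained_variance = (S ** 2 / (X.shape[0] - 1))[:n_components]
    scores = X_centered @ components.T   # data expressed in the reduced space
    return scores, components, explained_variance

# Toy data lying exactly on the line y = 2x: one component carries all variance.
data = [[0, 0], [1, 2], [2, 4], [3, 6]]
scores, components, explained_variance = pca_via_svd(data, n_components=2)
```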
  • Each basis vector, representing a distinct cell cluster identified during the prior cluster analysis, can be subjected to the scrutiny of the neural network.
  • the primary objective can be to classify each basis vector as either TRUE or FALSE. This classification can be guided by the neural network’s learning and predictive capabilities.
  • the model can leverage the information contained within the truncated multiplexed hashed measurement and the relationships learned during its training to make informed determinations regarding the validity of each basis vector.
  • the neural network can serve as an intelligent classifier, capable of recognizing and validating basis vectors that can be indicative of distinct cell clusters within the scRNAseq dataset.
  • Generation of input data can be characterized by the creation of data points, each comprising a triplet of fundamental components.
  • Simulated multiplex hashed measurements can serve as a condensed representation of the molecular activity within individual cells present in scRNAseq datasets.
  • random linear combinations of mRNA vectors can be calculated, where each vector can represent the expression levels of specific genes.
  • the nominal values of these mRNA vectors can be selected from random subsamples of the refined scRNAseq dataset. This approach can ensure that the simulated measurements can be diverse and representative of the gene expression profiles within the cellular population.
  • the simulated multiplex hashed measurements can capture details while efficiently compressing the data, thereby preserving its biological information.
  • the second component in each data point is the random selection of a basis vector.
  • basis vectors, previously computed during the cluster analysis stage, encapsulate the characteristic mRNA transcript counts within distinct cell clusters.
  • the random selection process can introduce variability into the input data, enabling the neural network model to recognize and comprehend patterns across various cell populations.
  • the final element is the computed label, essential for the supervised learning process during neural network training. Determination of the label can hinge on whether the selected basis vector corresponds to a cell cluster associated with one of the selected cells. When such correspondence exists, the label can be designated as TRUE; otherwise, it can be marked as FALSE. This label can serve as the ground truth for the neural network model, enabling it to learn, generalize, and make predictions with accuracy.
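  • The construction of one training data point (simulated measurement, randomly selected basis vector, TRUE/FALSE label) can be sketched as below. The uniform random mixing weights, the use of cluster means as basis vectors, and all names are assumptions made for this sketch.

```python
import numpy as np

def make_training_example(expr, cluster_of_cell, basis, n_cells, rng):
    """Build one (measurement, basis vector, label) triplet.

    expr:            cells x genes expression matrix (refined dataset)
    cluster_of_cell: cluster index of every cell
    basis:           clusters x genes matrix, one basis vector per cluster
    """
    # Random subsample of cells and a random linear combination of their mRNA vectors.
    chosen = rng.choice(expr.shape[0], size=n_cells, replace=False)
    weights = rng.random(n_cells)            # assumption: weights uniform in [0, 1)
    measurement = weights @ expr[chosen]
    # Random selection of a basis vector.
    basis_index = int(rng.integers(basis.shape[0]))
    # Label is TRUE when the basis vector's cluster contains one of the chosen cells.
    label = bool(np.isin(basis_index, cluster_of_cell[chosen]))
    return measurement, basis[basis_index], label, chosen, basis_index

rng = np.random.default_rng(0)
expr = rng.poisson(5.0, size=(20, 8)).astype(float)     # hypothetical data
cluster_of_cell = np.arange(20) % 4                     # hypothetical cluster labels
basis = np.stack([expr[cluster_of_cell == c].mean(axis=0) for c in range(4)])
measurement, basis_vector, label, chosen, basis_index = make_training_example(
    expr, cluster_of_cell, basis, n_cells=3, rng=rng)
```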
  • all basis vectors classified as TRUE can be systematically grouped into a matrix of basis vectors. This grouping can consolidate the basis vectors associated with distinct cell clusters. The resulting matrix can provide a coherent representation of the basis vectors, which can be integral for subsequent data transformation.
  • the truncated multiplexed hashed measurement, comprising entries corresponding exclusively to genes used for model training, can be organized into a matrix of measurements. This matrix structure can enhance data organization and prepare the measurements for alignment with the basis vectors.
  • an optimization algorithm can be employed to compute the optimal transformation matrix. The primary objective of this optimization is to minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements. This transformation matrix, once computed, can serve as a bridge between the basis vectors and multiplexed hashed measurements. Through an optimization algorithm, the transformation matrix can be tailored to align the basis vectors with the corresponding measurements, facilitating accurate data transformation.
  • the optimization algorithm can play a pivotal role in aligning the basis vectors with the corresponding measurements obtained from scRNAseq data.
  • Optimization algorithms can be mathematical techniques used to find the best solution from a set of possible solutions.
  • the optimization algorithm can be employed to compute the optimal transformation matrix. This matrix can transform basis vectors into a form that can align with the matrix of measurements, facilitating subsequent data analysis.
  • the optimization algorithm can minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements. This objective function can quantify the dissimilarity between these two matrices, and the algorithm can find the transformation matrix that can minimize this dissimilarity.
  • the choice of optimization algorithm can vary depending on the specific problem and the mathematical characteristics of the data.
  • the optimization algorithm can be gradient descent, which is an iterative algorithm that can adjust the transformation matrix in small steps to minimize the objective function.
  • the optimization algorithm can be Newton’s method, which is an iterative method that can use second-order derivatives to find the optimal solution.
  • the optimization algorithm can be a quasi-Newton method, which is a variation of Newton’s method that can approximate the Hessian matrix to reduce computational complexity.
  • the optimization algorithm can be conjugate gradient, which is an iterative method suitable for large-scale optimization problems, often used when the objective function is quadratic.
  • the optimization algorithm can be a genetic algorithm, which is an evolutionary algorithm that can mimic the process of natural selection to search for optimal solutions.
  • the optimization algorithm can be simulated annealing, which is a probabilistic optimization algorithm inspired by the annealing process in metallurgy.
  • the optimization algorithm can tailor the transformation matrix to minimize the absolute difference between the transformed basis vectors and the matrix of measurements. By adjusting the elements of the transformation matrix, it can optimize the alignment of these two matrices, ensuring that the data transformation is accurate and meaningful.
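  • A gradient-descent version of this fit can be sketched as follows. Two assumptions are made loudly here: the measurements are modeled as M ≈ B A (B holds one basis vector per column), and the squared difference is swapped in for the absolute difference as a smooth surrogate, because plain gradient descent needs a differentiable objective. Names and toy matrices are hypothetical.

```python
import numpy as np

def fit_transformation_matrix(B, M, n_iter=500):
    """Gradient descent on 0.5 * ||B @ A - M||_F^2 with respect to A.

    B: molecules x clusters matrix of basis vectors (one per column)
    M: molecules x tags matrix of measurements
    Note: the squared error is a surrogate for the absolute difference.
    """
    B = np.asarray(B, dtype=float)
    M = np.asarray(M, dtype=float)
    A = np.zeros((B.shape[1], M.shape[1]))
    # Step size 1/L, where L is the largest eigenvalue of B.T @ B.
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    for _ in range(n_iter):
        residual = B @ A - M
        A -= step * (B.T @ residual)     # gradient of the surrogate objective
    return A

# Toy check: recover a known transformation matrix from noise-free measurements.
B = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
A_true = np.array([[0.5, 0.5], [0.2, 0.8], [1.0, 0.0]])
A_hat = fit_transformation_matrix(B, B @ A_true)
```

  • Minimizing the absolute difference itself would call for a subgradient or linear-programming method; Newton-type, conjugate gradient, genetic, or annealing methods listed above could be substituted for the update rule.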
  • the process of computation of the biological molecule vectors can constitute an important phase within the multiplex hashing and neural network training methodology.
  • This process can involve the computation of biological molecule vectors, offering a comprehensive description of cellular composition and gene expression within the scRNAseq dataset.
  • the non-truncated multiplexed hashed measurements can be systematically organized into a matrix of measurements. This organization can transform the raw data into a structured format conducive to subsequent computations. The resulting matrix can serve as the input data for further analysis.
  • the next step involves the computation of the Moore-Penrose pseudo inverse of the optimal transformation matrix. This mathematical operation can play a pivotal role in data transformation and can ensure the robustness of the subsequent analysis.
  • the Moore-Penrose pseudo inverse can be calculated to facilitate the subsequent steps in the process.
  • the computed Moore-Penrose pseudo inverse can be applied to the matrix of measurements. This step can provide insights into the biological molecules contained within a specified number of cells.
  • the inverse transformation can enable the reconstruction of the biological molecule vectors, which can represent the gene expression profiles and cellular composition within the scRNAseq dataset.
  • the Moore-Penrose pseudo inverse is a mathematical concept used primarily in linear algebra and matrix computations. It can be a generalization of the matrix inverse for matrices that may not have an exact inverse, such as rectangular matrices or matrices that are not full rank. In essence, the pseudo inverse can provide a way to approximate an inverse for a matrix that might not be invertible in the traditional sense. It can be denoted as A^+, where A is the matrix whose pseudo inverse is sought.
  • the formula for the Moore-Penrose pseudo inverse can depend on the specific method used for its computation, but it can typically involve the singular value decomposition (SVD) of the matrix.
  • the pseudo inverse can be a valuable tool in various fields, including machine learning, signal processing, and scientific computing, where it can be used to handle various matrix-related problems, especially when dealing with non-square or singular matrices.
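  • The SVD-based computation of the pseudo inverse and its application to a matrix of measurements can be sketched as below. The convention M = X A (X holding one biological molecule vector per column, A being classes x tags) and the toy values are assumptions; under that convention the estimate is X_hat = M A^+.

```python
import numpy as np

def pinv_via_svd(A, rcond=1e-12):
    """Moore-Penrose pseudo inverse: A^+ = V @ diag(1/s) @ U^T, skipping tiny singular values."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    s_inv = np.zeros_like(s)
    significant = s > rcond * s.max()
    s_inv[significant] = 1.0 / s[significant]
    return Vt.T @ (s_inv[:, None] * U.T)

# Hypothetical transformation matrix: 2 classes x 3 molecular tags (full row rank).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
# Hypothetical molecule vectors: 4 molecules x 2 classes.
X = np.array([[3.0, 1.0], [0.0, 2.0], [5.0, 5.0], [1.0, 0.0]])
M = X @ A                    # multiplexed measurements under the assumed model

A_plus = pinv_via_svd(A)
X_hat = M @ A_plus           # reconstructed molecule vectors
```

  • Because this A has full row rank, A A^+ is the identity and the reconstruction is exact; with a rank-deficient A the result is the least-squares estimate. NumPy's built-in `np.linalg.pinv` returns the same matrix.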
  • the present disclosure provides methods for single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects comprising biological molecules.
  • the presently described method can begin by contacting the sample objects with a plurality of computing units (CUs).
  • Each CU can be engineered to display two sets of surface-bound entities (SBEs): a first set capable of binding to the sample objects and a second set associated with molecular tags, capable of binding to biological molecules.
  • the sample objects can then be permeabilized, allowing the biological molecules to be released and bind to the SBEs associated with the molecular tags.
  • a posteriori spatial relationships between the molecular tags and the sample objects can be established. This can be achieved by evaluating the proximity of each CU with the assistance of a machine learning algorithm, which aggregates the biological molecules bound to proximal CUs.
  • the method can further include reverse transcribing the biological molecules bound to the molecular tags before establishing the spatial relationship.
  • This reverse transcription can be crucial for sequencing, which can be performed using, for example, but not limited to, Next-Generation Sequencing or Sanger Sequencing techniques.
  • each CU can be associated with the molecular tags either before or after contacting the sample objects.
  • the molecular tags can typically include a barcode and a unique molecular identifier (UMI) and can include additional elements such as sequencing elements, release elements, and linkers. These tags can be single-stranded, featuring a hairpin structure, or double-stranded, with the barcode uniquely assigned to each CU and the UMI uniquely assigned to each molecular tag.
  • the presently described method can include evaluating the proximity between CUs, where proximity can be inferred from the interaction frequency of CUs with sample objects and the molecular tags.
  • the evaluation can involve identifying UMIs linked to barcodes and biological molecules, with the resulting data input into a machine learning algorithm.
  • the algorithm can output an adjacency matrix, indicating spatial proximity based on UMI linkage data. This matrix can then be used to establish a posteriori spatial relationships between molecular tags and sample objects, enabling detailed single-cell sequencing analysis.
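  • The step of identifying UMIs linked to barcodes and biological molecules can be sketched as a simple counting routine. The read format, a (barcode, molecule, UMI) triple per sequenced read, and the example identifiers are assumptions made for illustration.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs for every (barcode, biological molecule) pair.

    reads: iterable of (barcode, molecule, umi) triples, one per sequenced read.
    Returns {barcode: {molecule: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for barcode, molecule, umi in reads:
        # Reads sharing barcode, molecule, and UMI are amplification duplicates.
        umis_seen[(barcode, molecule)].add(umi)
    counts = defaultdict(dict)
    for (barcode, molecule), umis in umis_seen.items():
        counts[barcode][molecule] = len(umis)
    return dict(counts)

# Hypothetical reads: the first two are duplicates of the same tagged molecule.
reads = [
    ("BC1", "geneA", "u1"),
    ("BC1", "geneA", "u1"),
    ("BC1", "geneA", "u2"),
    ("BC1", "geneB", "u3"),
    ("BC2", "geneA", "u4"),
]
umi_counts = count_umis(reads)
```

  • Since each barcode identifies one CU, the per-barcode counts form that CU's UMI vector, which is the kind of input the proximity model consumes.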
  • a method of single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects comprising a plurality of biological molecules including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and is capable of binding to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of mo
  • the presently described methods can further include reverse transcribing the plurality of biological molecules bound to the plurality of molecular tags for sequencing prior to establishing a posteriori spatial relationship.
  • the plurality of biological molecules can be RNA.
  • the sequencing can be Next-Generation sequencing or Sanger sequencing.
  • each CU can be associated with the plurality of molecular tags. In some embodiments, each CU can be associated with the plurality of molecular tags prior to contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each CU can be associated with the plurality of molecular tags following contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and is associated with the second set of at least one SBE. In some embodiments, each molecular tag can further include a sequencing element, a release element, and/or a linker.
  • the release element can release each molecular tag from each CU.
  • the linker can prevent extension.
  • the barcode can be unique to each CU.
  • each molecular tag can be single-stranded.
  • each molecular tag can include a hairpin structure.
  • each molecular tag can be double-stranded.
  • the second set of at least one SBE can be poly(dT).
  • the barcode can be uniquely assigned to each CU of the plurality of CUs.
  • the UMI can be uniquely assigned to each molecular tag.
  • each molecular tag can be associated with at least two CUs of the plurality of CUs.
  • each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects.
  • the second set of at least one SBE can further include a blocking element.
  • the blocking element can prevent reverse transcription.
  • the blocking element can be removed when the at least one sample object interacts with each CU.
  • the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE.
  • the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE.
  • the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
  • the presently described methods leverage spatial fluctuations in the number of biological molecules, such as those caused by Brownian motion and cell signaling, within subcellular portions of a sample to evaluate computing unit (CU) association by proximity. This proximity information can then be utilized to compute target-specific output signals associated with various sample objects, including interactions between sample objects (e.g., cell-cell interactions) and the distribution of biological molecules within individual sample objects (e.g., subcellular resolution of mRNA transcripts).
  • a dataset of UMI vectors can be used for model training.
  • This dataset can comprise millions of cells and thousands of features across samples measured from various patients.
  • the dataset can be split into training, validation, and test sets with no intersection of patients between sets. Top genes with the least sparsity can be retained for further processing.
  • Training inputs can be constructed from UMI vectors of cells randomly selected from the training dataset.
  • Each training input can include UMI vectors of K cells, a one-to-one assignment of N computing units to the K cells, and UMI vectors generated for each computing unit.
  • the adjacency matrix can be used to describe the assignment of computing units to sample objects, ensuring equal probability of selection and independent attribution of UMIs.
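  • The construction of one training input can be sketched as below: every computing unit is assigned to one of the K cells with equal probability, each UMI of a cell is attributed independently and with equal probability to one of that cell's computing units, and the adjacency matrix marks computing units that share a cell. Function and variable names, and the handling of cells that receive no computing unit, are assumptions.

```python
import numpy as np

def make_training_input(cell_umis, n_units, rng):
    """Simulate computing-unit UMI vectors and the matching adjacency matrix.

    cell_umis: K x G integer matrix of UMI counts (K cells, G genes)
    n_units:   number N of computing units
    """
    n_cells, n_genes = cell_umis.shape
    # Each computing unit is assigned to exactly one cell, chosen uniformly.
    unit_to_cell = rng.integers(n_cells, size=n_units)
    unit_umis = np.zeros((n_units, n_genes), dtype=int)
    for cell in range(n_cells):
        units = np.flatnonzero(unit_to_cell == cell)
        if units.size == 0:
            continue  # assumption: a cell without computing units stays unobserved
        equal_p = np.full(units.size, 1.0 / units.size)
        for gene in range(n_genes):
            # Independent, equal-probability attribution of every UMI to one unit.
            unit_umis[units, gene] = rng.multinomial(cell_umis[cell, gene], equal_p)
    # adjacency[i, j] = 1 when units i and j are attached to the same cell.
    adjacency = (unit_to_cell[:, None] == unit_to_cell[None, :]).astype(int)
    return unit_umis, unit_to_cell, adjacency

rng = np.random.default_rng(1)
cell_umis = rng.poisson(20, size=(3, 5))        # hypothetical UMI vectors of K = 3 cells
unit_umis, unit_to_cell, adjacency = make_training_input(cell_umis, n_units=8, rng=rng)
```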
  • the model architecture can involve inputting the measurement matrix, applying a log1p transformation, and reducing dimensionality through a dense layer.
  • the transformed data can be processed through stacked transformer blocks with attention heads and feed-forward layers.
  • the final transformer block’s attention logits can be scaled and subjected to a sigmoid function, producing an output matrix indicating the likelihood of computing unit interactions.
  • the model can be trained using binary cross-entropy loss and an Adam optimizer.
  • the training process can aim to match the output matrix to the training input adjacency matrix.
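  • A heavily simplified forward pass of such a model is sketched below: log1p transform, one dense projection, a single attention head whose scaled logits pass through a sigmoid, and a binary cross-entropy loss against the adjacency matrix. The single head, the random untrained weights, the omission of feed-forward layers and of the Adam update, and all names are assumptions; the sketch only shows how the pieces connect.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interaction_likelihood(M, W_in, W_q, W_k, logit_scale=1.0):
    """Map a units x genes measurement matrix to a units x units likelihood matrix."""
    X = np.log1p(M)                       # log1p transformation of the counts
    H = X @ W_in                          # dense layer reducing dimensionality
    Q, K = H @ W_q, H @ W_k               # queries and keys of one attention head
    logits = (Q @ K.T) / np.sqrt(Q.shape[1])
    return sigmoid(logit_scale * logits)  # scaled attention logits -> likelihoods

def binary_cross_entropy(P, Y, eps=1e-7):
    """Mean binary cross-entropy between predicted likelihoods P and targets Y."""
    P = np.clip(P, eps, 1.0 - eps)
    return float(-np.mean(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P)))

rng = np.random.default_rng(0)
n_units, n_genes, d_model = 6, 10, 4
M = rng.poisson(3.0, size=(n_units, n_genes))              # hypothetical measurements
W_in = rng.normal(scale=0.1, size=(n_genes, d_model))      # untrained weights
W_q = rng.normal(scale=0.1, size=(d_model, d_model))
W_k = rng.normal(scale=0.1, size=(d_model, d_model))

unit_to_cell = np.array([0, 0, 1, 1, 2, 2])                # hypothetical ground truth
Y = (unit_to_cell[:, None] == unit_to_cell[None, :]).astype(float)

P = interaction_likelihood(M, W_in, W_q, W_k)
loss = binary_cross_entropy(P, Y)
```

  • Training would adjust the weight matrices to lower `loss`, for example with an Adam optimizer in an automatic-differentiation framework.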
  • the model output can further be processed to estimate sample object UMI vectors by clustering the computing units and summing the UMI vectors within each cluster.
  • the quality of reconstruction can be assessed using relative reconstruction error (RRE).
  • Additional outputs of the model can include the estimated number of cells, achieved through regression and predictive clustering techniques.
  • the final output can comprise the interaction likelihood matrix and the estimated number of cells.
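  • The post-processing of the model output can be sketched as follows: computing units are clustered by thresholding the interaction likelihood matrix and taking connected components, UMI vectors are summed within each cluster, and the reconstruction is scored. The 0.5 threshold, the connected-components clustering, and the definition of RRE as a ratio of Frobenius norms are assumptions, since the text does not define them.

```python
import numpy as np

def cluster_units(likelihood, threshold=0.5):
    """Label computing units by connected components of the thresholded likelihood matrix."""
    likelihood = np.asarray(likelihood, dtype=float)
    linked = (likelihood >= threshold) | (likelihood.T >= threshold)
    labels = -np.ones(likelihood.shape[0], dtype=int)
    current = 0
    for start in range(len(labels)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                                  # depth-first traversal
            unit = stack.pop()
            for neighbor in np.flatnonzero(linked[unit]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels

def estimate_object_umis(unit_umis, labels):
    """Sum the UMI vectors of all computing units that share a cluster label."""
    estimate = np.zeros((labels.max() + 1, unit_umis.shape[1]))
    np.add.at(estimate, labels, unit_umis)
    return estimate

def relative_reconstruction_error(estimate, truth):
    """Assumed definition: ||estimate - truth||_F / ||truth||_F."""
    return float(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))

# Hypothetical model output for four computing units on two sample objects.
likelihood = np.array([[1.0, 0.9, 0.1, 0.0],
                       [0.9, 1.0, 0.2, 0.1],
                       [0.1, 0.2, 1.0, 0.8],
                       [0.0, 0.1, 0.8, 1.0]])
unit_umis = np.array([[1, 2], [3, 0], [0, 5], [2, 2]])
labels = cluster_units(likelihood)
estimate = estimate_object_umis(unit_umis, labels)
rre = relative_reconstruction_error(estimate, np.array([[4.0, 2.0], [2.0, 8.0]]))
```

  • The number of distinct labels gives one simple estimate of the number of cells; the text also mentions regression-based estimates, which are not shown here.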
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a1 number of times, the first CU can interact with a second sample object of the plurality of sample objects a2 number of times, a second CU of the plurality of CUs can interact with the first sample object a3 number of times, and the second CU can interact with the second sample object a4 number of times.
  • the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface.
  • At least one of the a1 number, the a2 number, the a3 number, and the a4 number can be zero. In some embodiments, at least one of the a1 number, the a2 number, the a3 number, and the a4 number can be one. In some embodiments, at least two of the a1 number, the a2 number, the a3 number, and the a4 number can be the same. In some embodiments, at least two of the a1 number, the a2 number, the a3 number, and the a4 number can be different.
  • evaluating proximity of each CU of the plurality of CUs comprises evaluating proximity between the first CU and the second CU. In some embodiments, evaluating proximity between the first CU and the second CU can include identifying the plurality of molecular tags associated with the first CU and the second CU. In some embodiments, evaluating proximity between the first CU and the second CU can include identifying numbers of UMIs linked to the barcode and the plurality of biological molecules.
  • a1’ number of UMIs can be linked to a first barcode and a first biological molecule of the plurality of the biological molecules
  • a2’ number of UMIs can be linked to the first barcode and a second biological molecule of the plurality of the biological molecules
  • a3’ number of UMIs can be linked to a second barcode and a third biological molecule of the plurality of the biological molecules
  • a4’ number of UMIs can be linked to the second barcode and a fourth biological molecule of the plurality of the biological molecules.
  • the first barcode is unique to the first CU
  • the second barcode can be unique to the second CU.
  • the a1’ number, the a2’ number, the a3’ number, and the a4’ number can be input into the machine learning algorithm for the proximity analysis.
  • the machine learning algorithm can output an adjacency matrix.
  • at least two of the a1’ number, the a2’ number, the a3’ number, and the a4’ number can give an output value of 1 in the adjacency matrix.
  • the output value of 1 can indicate the first CU and the second CU are in spatial proximity.
  • At least two of the a1’ number, the a2’ number, the a3’ number, and the a4’ number can give an output value of 0 in the adjacency matrix.
  • the output value of 0 can indicate the first CU and the second CU are not in spatial proximity.
  • the adjacency matrix can be used to establish a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects.
Spatial Profiling
  • the present disclosure provides methods of spatial profiling of an input sample comprising a plurality of sample objects.
  • the presently described methods allow for a comprehensive spatial profiling of biological samples, providing valuable insights into the interactions and distribution of biological molecules at a subcellular level.
  • the integration of machine learning algorithms further enhances the accuracy and utility of the spatial profiling process, enabling the generation of adjacency matrices to map spatial relationships and facilitate downstream analysis.
  • spatial profiling can be achieved without establishing a priori spatial relationships between a plurality of molecular tags and the sample objects.
  • the presently described method can involve contacting the sample objects with a plurality of computing units (CUs), each engineered to display at least one first set and one second set of surface-bound entities (SBEs) on its surface.
  • the first set of SBEs can bind to the sample objects, while the second set of SBEs can be associated with the molecular tags and capable of binding to the biological molecules.
  • the biological molecules can be released and subsequently bind to the second set of SBEs associated with the molecular tags.
  • This interaction can facilitate the establishment of a posteriori spatial relationships between the molecular tags and the sample objects.
  • the spatial profiling process can utilize the proximity of each CU with the aid of a machine learning algorithm to evaluate and aggregate the biological molecules bound to proximal CUs.
  • the presently described method can enable the detailed spatial profiling of cell-cell interactions and the distribution of biological molecules within individual sample objects, providing subcellular resolution of molecular entities, such as mRNA transcripts.
  • each CU can be associated with molecular tags either prior to or following the contacting of sample objects with CUs.
  • the molecular tags can include barcodes and unique molecular identifiers (UMIs), which can be associated with the second set of SBEs.
  • the molecular tags can include sequencing elements, release elements, and/or linkers. The release element can facilitate the detachment of the molecular tags from the CUs, while the linker can prevent extension.
  • the barcodes can be unique to each CU, and the UMIs can be unique to each molecular tag, ensuring precise tracking and analysis.
  • the second set of SBEs can be engineered to display poly(dT) sequences, enhancing the binding specificity to biological molecules.
  • the interaction between the CUs and sample objects via the first set of SBEs, and between the biological molecules and CUs via the second set of SBEs, can establish a robust system for spatial profiling.
  • Proximity analysis can be conducted by evaluating the interaction frequencies between CUs and sample objects, as well as identifying molecular tags and UMIs linked to specific barcodes and biological molecules.
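The aggregation of biological molecules bound to proximal CUs, described above, can be illustrated with a minimal sketch. It assumes that proximity has already been summarized as a symmetric 0/1 adjacency matrix between CUs (0 meaning not in spatial proximity) and that "proximal" CUs are those in the same connected component. The function names, the example genes, and the connected-component rule are illustrative assumptions; the disclosure itself relies on a machine learning algorithm for this evaluation.

```python
from collections import Counter

def proximal_groups(adjacency):
    """Group CU indices into connected components of a symmetric 0/1 matrix."""
    n = len(adjacency)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                # A value of 0 means CU i and CU j are not in proximity.
                if adjacency[i][j] != 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

def aggregate_counts(cu_counts, adjacency):
    """Sum per-CU molecule counts over each group of proximal CUs."""
    pooled = []
    for group in proximal_groups(adjacency):
        total = Counter()
        for i in group:
            total.update(cu_counts[i])
        pooled.append((group, dict(total)))
    return pooled

# Hypothetical example: CUs 0 and 1 are proximal; CU 2 is isolated.
adjacency = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
cu_counts = [{"GAPDH": 3, "ACTB": 1}, {"GAPDH": 2}, {"CD3E": 4}]
pooled = aggregate_counts(cu_counts, adjacency)
```

In this toy input the counts of CUs 0 and 1 are pooled into one profile, while CU 2 keeps its own.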
  • a method of spatially profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules without establishing a priori spatial relationship between a plurality of molecular tags and the plurality of sample objects
  • the method including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a
  • the spatial profiling can include information on cell-cell interactions of the plurality of sample objects and information regarding the distribution of the plurality of biological molecules within each sample object.
  • each CU can be associated with the plurality of molecular tags. In some embodiments, each CU can be associated with the plurality of molecular tags prior to contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each CU can be associated with the plurality of molecular tags following contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and is associated with the second set of at least one SBE. In some embodiments, each molecular tag can further include a sequencing element, a release element, and/or a linker.
  • the release element can release each molecular tag from each CU.
  • the linker can prevent extension.
  • the barcode can be unique to each CU.
  • each molecular tag can be single-stranded.
  • each molecular tag can include a hairpin structure.
  • each molecular tag can be double-stranded.
  • the second set of at least one SBE can be poly(dT).
  • the barcode can be uniquely assigned to each CU of the plurality of CUs.
  • the UMI can be uniquely assigned to each molecular tag.
  • each molecular tag can be associated with at least two CUs of the plurality of CUs.
  • each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects.
  • the second set of at least one SBE can further include a blocking element.
  • the blocking element can prevent reverse transcription.
  • the blocking element can be removed when the at least one sample object interacts with each CU.
  • the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE.
  • the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE.
  • the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
  • the construction of the training dataset can include UMI vectors of K cells (sample objects), assignment of N computing units to K sample objects, and UMI vectors generated for each computing unit.
  • the assignment of CUs to sample objects can no longer be surjective.
  • Sample objects can be randomly connected, described by an adjacency matrix where off-diagonal elements can indicate the probability of connection.
  • the assignment of CUs to sample objects can be detailed by an adjacency matrix, calculated in two steps: selecting one sample object per CU and partitioning CUs assigned to the same sample object into sets, assigning them to neighboring sample objects.
  • the UMI vectors can be partitioned among the CUs assigned to the same sample object, ensuring each UMI can be attributed to a single CU independently and with equal probability adjusted by a normalization factor.
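The partitioning of a sample object's UMI vector among its assigned CUs can be sketched as follows. This is a minimal illustration that assumes the "equal probability adjusted by a normalization factor" means each UMI goes to one of the n assigned CUs with probability 1/n; the function name and the example counts are hypothetical.

```python
import random

def partition_umi_vector(umi_vector, n_cus, rng):
    """Split a per-gene UMI count vector among n_cus computing units.

    Each individual UMI is attributed to exactly one CU, independently
    and with equal probability 1 / n_cus (assumed normalization).
    """
    parts = [[0] * len(umi_vector) for _ in range(n_cus)]
    for gene, count in enumerate(umi_vector):
        for _ in range(count):
            parts[rng.randrange(n_cus)][gene] += 1
    return parts

rng = random.Random(0)          # fixed seed only for reproducibility
cell_umis = [5, 0, 12, 3]       # hypothetical UMI counts for four genes
cu_vectors = partition_umi_vector(cell_umis, 3, rng)
```

Because every UMI is assigned to exactly one CU, the per-gene sums over the CU vectors reproduce the original vector.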
  • the same model architecture can be used with specific parameters.
  • the model output can be processed to estimate the sample object adjacency matrix.
  • the output matrix can be used to partition the set of CUs into a number of disjoint sets, represented by an estimated adjacency matrix.
  • Agglomerative clustering with full linkage can be used for partitioning, with the number of clusters determined a priori or by optimizing a secondary criterion, such as the number of CUs per cluster.
  • a heuristic can compute the estimated sample object adjacency matrix, where sets of disjoint CUs can be compared based on summations over the CUs to determine adjacency.
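The two post-processing steps above can be sketched in plain Python. This illustration makes several assumptions that are not stated in the disclosure: the model output is treated as a symmetric CU-by-CU similarity matrix with values in [0, 1], "full linkage" is read as the standard complete-linkage criterion, the number of clusters is given a priori, and two CU sets are called adjacent when their mean cross-set similarity reaches a hypothetical threshold.

```python
def complete_linkage(similarity, n_clusters):
    """Agglomerative clustering; cluster distance = max pairwise (1 - similarity)."""
    clusters = [[i] for i in range(len(similarity))]

    def dist(a, b):
        return max(1.0 - similarity[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]   # merge the closest pair
        del clusters[y]
    return [sorted(c) for c in clusters]

def estimate_adjacency(similarity, clusters, threshold):
    """Compare disjoint CU sets by summing similarities over their members."""
    k = len(clusters)
    adj = [[0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            if a == b:
                continue
            total = sum(similarity[i][j] for i in clusters[a] for j in clusters[b])
            mean = total / (len(clusters[a]) * len(clusters[b]))
            adj[a][b] = 1 if mean >= threshold else 0
    return adj

# Hypothetical model output for four CUs.
S = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.4, 0.5, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
clusters = complete_linkage(S, n_clusters=2)
adjacency = estimate_adjacency(S, clusters, threshold=0.25)
```

Here CUs 0 and 1 form one set and CUs 2 and 3 another; the moderate cross-set similarity marks the two estimated sample objects as adjacent under the chosen threshold.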
  • the presently described method can enable the analysis of physical interactions between cells in scenarios where sample objects are not uniquely assignable to computing units, providing a robust approach for spatial profiling in single-cell transcriptomics.
  • a molecular tag refers to any molecule capable of (directly or indirectly) capturing and/or labeling a plurality of biological molecules.
  • the molecular tag can be a nucleic acid or a polypeptide.
  • the molecular tag can be a conjugate, for example, but not limited to, an oligonucleotide-antibody conjugate.
  • a non-limiting exemplary structure of a molecular tag is shown in FIG. 3D.
  • the molecular tag can include a unique sequence for recognizing a specific biological molecule.
  • the molecular tag can include a recognition sequence, which can be a unique sequence for recognizing a specific SBE.
  • the molecular tag can also include a hash element and a priming element (e.g., a random N-mer, such as dT_N).
  • the molecular tag can include a unique sequencing element for e.g., Next Generation Sequencing (NGS).
  • the molecular tag can include an oligonucleotide-antibody conjugate and a hash element.
  • Non-limiting exemplary structures of a molecular tag are illustrated in FIG. 3D, FIG. 4D, and FIGS. 16A-16B.
  • sequenced polynucleotides can be, for example, nucleic acid molecules such as DNA or RNA, including variants or derivatives thereof (e.g., single stranded DNA or DNA/RNA hybrids, and nucleic acid molecules with a nucleotide analog).
  • individual biological molecules e.g., cells or cellular contents following lysis of cells
  • partitioning/hashing using the hash element, which is described further below.
  • Sequencing can be performed by various commercial systems. More generally, sequencing can be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR and droplet digital PCR (ddPCR), quantitative PCR, real time PCR, multiplex PCR, PCR-based singleplex methods, emulsion PCR), and/or isothermal amplification.
  • methods for sequencing can include, but are not limited to, DNA hybridization methods (e.g., Southern blotting), restriction enzyme digestion methods, Sanger sequencing methods, next-generation sequencing methods (e.g., single-molecule real-time sequencing, nanopore sequencing, and Polony sequencing), ligation methods, and microarray methods.
  • sequencing methods include targeted sequencing, single molecule real-time sequencing, exon sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, co-amplification at lower denaturation temperature-PCR (COLD-PCR), sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and any combinations thereof.
  • each molecular tag can include at least one priming element.
  • the priming element can be an oligonucleotide, a polypeptide, a small molecule, or any combination thereof, that can bind specifically to a biological molecule to be analyzed.
  • a molecular tag can be used to capture or detect a biological molecule to be analyzed.
  • a molecular tag can be a functional nucleic acid sequence configured to interact with one or more biological molecules, such as one or more different types of nucleic acids (e.g., RNA molecules and DNA molecules).
  • the functional nucleic acid sequence can include an N-mer sequence (e.g., a random N-mer sequence), which N-mer sequences can be configured to interact with a plurality of DNA molecules and/or a plurality of RNA molecules.
  • the functional sequence can include a poly(N) sequence, which poly(N) sequences can be configured to interact with messenger RNA (mRNA) molecules via the poly(A) tail of an mRNA transcript.
  • the poly(N) sequence can be configured to interact with a DNA molecule comprising a single stranded 3’ terminus.
  • the functional sequence can include a poly(T) sequence, which poly(T) sequences can be configured to interact with messenger RNA (mRNA) molecules via the poly(A) tail of an mRNA transcript.
  • the functional nucleic acid sequence can be the binding target of a protein (e.g., a transcription factor, a DNA binding protein, or an RNA binding protein), where the protein is a desired biological molecule to be analyzed.
  • a molecular tag can include ribonucleotides and/or deoxyribonucleotides as well as synthetic nucleotide residues that can be capable of participating in Watson-Crick type or analogous base pair interactions.
  • the molecular tag is capable of priming a reverse transcription reaction to generate cDNA that is complementary to the captured RNA biological molecules.
  • the priming element of the molecular tag can prime a DNA extension (polymerase) reaction to generate DNA that is complementary to the captured DNA biological molecules.
  • the priming element can be located at the 3’ end of the molecular tag and can include a free 3’ end that can be extended, e.g., by template dependent polymerization, to form an extended molecular tag.
  • the priming element can include a nucleotide sequence that is capable of hybridizing to nucleic acids, e.g., RNA, DNA, or other analyte, present in the sample object interacting with one or more CUs.
  • the priming element can be selected or designed to bind selectively or specifically to a target nucleic acid.
  • the capture domain can be selected or designed to capture mRNA by way of hybridization to the mRNA poly(A) tail.
  • the capture domain can include a poly(T) DNA or poly(N) oligonucleotide, which is capable of hybridizing to a poly(A) tail of mRNA or a single stranded 3’ terminus of DNA.
  • the priming element can include nucleotides that can be functionally or structurally analogous to a poly(T) tail.
  • a molecular tag can include a priming element having a sequence that is capable of binding to mRNA and/or genomic DNA.
  • the molecular tag can include a priming element that includes a nucleic acid sequence (e.g., a poly(T) or poly(N) sequence) capable of binding to a poly(A) tail of an mRNA and/or to a poly(A) homopolymeric sequence present in genomic DNA.
  • a homopolymeric sequence can be added to an mRNA molecule or a genomic DNA molecule using a terminal transferase enzyme in order to produce a DNA or RNA biological molecule that has a poly(A) or poly(T) sequence.
  • a poly(A) sequence can be added to a biological molecule (e.g., a fragment of genomic DNA) thereby making the biological molecule capable of capture by a poly(T) priming element.
  • random sequences can be used to form all or a part of the priming element.
  • random sequences can be used in conjunction with poly(T) (or poly(T) analogue) sequences.
  • where a priming element includes a poly(T) (or a “poly(T)-like”) oligonucleotide, it can also include a random oligonucleotide sequence (e.g., a “poly(T)-random sequence” probe). This can, for example, be located 5’ or 3’ of the poly(T) sequence, e.g., at the 3’ end of the priming element.
  • the poly(T)-random sequence probe can facilitate the capture of the mRNA poly(A) tail.
  • the priming element can be an entirely random sequence.
  • a degenerate priming element can be used.
  • the priming element can be based on a particular gene sequence or particular motif sequence or common/conserved sequence, that it is designed to capture (i.e., a sequence-specific priming element).
  • the priming element is capable of binding selectively to a desired sub-type or subset of nucleic acid, for example a particular type of RNA, such as mRNA, rRNA, tRNA, SRP RNA, tmRNA, snRNA, snoRNA, SmY RNA, scaRNA, gRNA, RNase P, RNase MRP, TERC, SL RNA, aRNA, cis-NAT, crRNA, lncRNA, miRNA, piRNA, siRNA, shRNA, tasiRNA, rasiRNA, 7SK, eRNA, ncRNA or other types of RNA.
  • the priming element can be capable of binding selectively to a desired subset of ribonucleic acids, for
  • the priming element of the molecular tag can be a non-nucleic acid domain.
  • suitable priming elements that are not exclusively nucleic-acid based can include, but are not limited to, proteins, peptides, aptamers, antigens, antibodies, and molecular analogs that mimic the functionality of any of the priming elements described herein.
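A nucleic-acid priming element of the poly(T)-random type described above can be sketched as a string-level model. The lengths, the function names, and the simple exact-complement rule for poly(A) capture are assumptions chosen for illustration; real hybridization depends on conditions that this sketch ignores. Transcripts are written in the DNA alphabet for simplicity.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def make_priming_element(poly_t_len, random_len, rng):
    """Build a poly(T) stretch followed by a random N-mer at the 3' end."""
    random_part = "".join(rng.choice("ACGT") for _ in range(random_len))
    return "T" * poly_t_len + random_part

def can_capture_poly_a(priming_element, transcript, min_run):
    """True if the last min_run bases of the transcript are all A and their
    reverse complement (a run of T) occurs in the priming element."""
    if min_run <= 0 or len(transcript) < min_run:
        return False
    tail = transcript[-min_run:]
    return set(tail) == {"A"} and reverse_complement(tail) in priming_element

rng = random.Random(1)                      # seed fixed for reproducibility
primer = make_priming_element(poly_t_len=20, random_len=2, rng=rng)
mrna_like = "GATTACAGGC" + "A" * 25         # hypothetical polyadenylated transcript
no_tail = "GATTACAGGC"                      # hypothetical transcript without a tail
```

Under this model the primer pairs with the polyadenylated transcript but not with the tailless one.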
  • a molecular tag can include a recognition element, which can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that can identify and bind to an SBE (e.g., an ssDNA binding protein) on a CU.
  • a recognition element can be unique to a SBE to which it binds.
  • a recognition element can include one or more specific polynucleotide sequences, one or more random nucleic acid and/or amino acid sequences, and/or one or more synthetic nucleic acid and/or amino acid sequences.
  • a recognition element can be a nucleic acid sequence that does not substantially hybridize to the biological molecules (e.g., DNA and/or RNA) to be analyzed.
  • a recognition element can bind to a SBE in a reversible or irreversible manner.
  • a recognition element can be a chemical moiety that can bind to thiol groups on CUs.
  • the chemical moiety can be, but is not limited to, a maleimide, an iodoacetamide, a disulfide, a haloacetyl group (e.g., chloroacetyl, bromoacetyl), an aziridine, or a vinyl sulfone.
  • a recognition element can include a sequence that can bind covalently to a single stranded DNA binding protein, e.g., a HUH endonuclease from the family of replication initiator domains or relaxases.
  • a recognition element can include a sequence that can form double stranded structures recognized by a double stranded DNA binding protein, e.g., a protein from the family of transcription factors.
  • a molecular tag can include one or more hash elements.
  • a hash element can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that function as a label or an identifier that can convey the origin of the biological molecules.
  • a hash element can be uniquely assigned to a computing unit and can thereby be associated with sample objects via the computing unit, such that it can allow for accurate detection of a plurality of biological molecules originating from different sample objects but captured in the same computing unit.
  • a hash element can have a variety of different formats.
  • a hash element can include random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences.
  • a hash element can be attached to a biological molecule in a reversible or irreversible manner.
  • a hash element can be added to, for example, a fragment of a DNA or RNA biological molecule before, during, and/or after sequencing of the sample.
  • a hash element can allow for identification and/or quantification of individual sequencing-reads.
  • a hash element can be a nucleic acid sequence that does not substantially hybridize to a biological molecule (e.g., DNA and/or RNA).
  • a hash element can be a non-nucleic-acid entity, such as, but not limited to, a polypeptide tag or an affinity tag, e.g., a FLAG-tag, HA-tag, His-tag, or Myc-tag.
  • a hash element can permit partitioning of captured biological molecules by the affinity tag and quantification of biological molecules associated with each tag by methods other than sequencing, e.g., sandwich ELISA.
  • a hash element can include a heavy metal, which can allow partitioning of the captured biological molecules and quantification by mass spectrometry.
  • the molecular tags can be divided into one or more subsets, wherein the hash elements of the multiple molecular tags can include sequences that can be the same within a subset of the molecular tags, while the sequences of the hash elements of another subset of the multiple molecular tags can be different from the sequences of the hash elements of the first subset.
  • a hash element can be associated with the origin of the biological molecule within the multiplex analysis.
  • a hash element can be associated with a quantity of the biological molecule present within a sample object.
  • a mixed but known set of hash elements can provide a stronger address or attribution of the biological molecules to a given sample object, by providing duplicate or independent confirmation of the identity of the biological molecule.
  • multiple hash elements can represent increasing specificity of the origin of biological molecules.
  • the hash element can include a unique molecular identifier.
  • a unique molecular identifier can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that can function as a label or identifier for a particular biological molecule to be analyzed.
  • a UMI can be unique.
  • a UMI can include one or more specific polynucleotide sequences, one or more random nucleic acid and/or amino acid sequences, and/or one or more synthetic nucleic acid and/or amino acid sequences.
  • the UMI can be a nucleic acid sequence that does not substantially hybridize to analyte biological molecules in the sample object.
  • the UMI can have less than 80% sequence identity (e.g., less than 70%, 60%, 50%, or less than 40% sequence identity) to the nucleic acid sequences across a substantial part (e.g., 80% or more) of the nucleic acid molecules in the sample object.
  • the UMI can include from about 6 to about 20 or more nucleotides within the sequence of the capture probes.
  • the length of a UMI sequence can be about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or longer.
  • the length of a UMI sequence can be at least about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or longer.
  • the length of a UMI sequence is at most about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or shorter.
  • these nucleotides can be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they can be separated into two or more separate subsequences that can be separated by 1 or more nucleotides.
  • Separated UMI subsequences can be from about 4 to about 16 nucleotides in length. In some embodiments, the UMI subsequence can be about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or longer. In some embodiments, the UMI subsequence can be at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or longer. In some embodiments, the UMI subsequence can be at most about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or shorter.
  • a UMI can be attached to a molecular tag in a reversible or irreversible manner.
  • a UMI can be added to, for example, a fragment of a DNA or RNA sample before, during, and/or after sequencing of the biological molecule.
  • a UMI can allow for identification and/or quantification of individual sequencing reads.
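The use of UMIs for quantifying individual sequencing reads can be illustrated with a small deduplication sketch. It assumes each read has already been parsed into a hypothetical (barcode, UMI, gene) triple and that reads sharing all three values derive from the same captured molecule; sequencing-error correction of UMIs is deliberately left out.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct UMIs per (barcode, gene) pair.

    reads: iterable of (barcode, umi, gene) triples. Duplicate reads that
    share barcode, UMI, and gene are counted as one molecule.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        umis[(barcode, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}

# Hypothetical parsed reads; the second read duplicates the first.
reads = [
    ("CU1", "AAAC", "GAPDH"),
    ("CU1", "AAAC", "GAPDH"),
    ("CU1", "GGTA", "GAPDH"),
    ("CU2", "AAAC", "GAPDH"),
    ("CU2", "TTGC", "ACTB"),
]
molecule_counts = count_molecules(reads)
```

The same UMI seen under two different barcodes (here "AAAC" in CU1 and CU2) is counted separately, since the barcode distinguishes the computing units.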
  • a UMI can be used as a fluorescent barcode for which fluorescently labeled oligonucleotide probes hybridize to the UMI.
  • sequences of the molecular tag can generally be selected for compatibility with any of a variety of different sequencing systems, e.g., NGS, 454 Sequencing, Ion Torrent Proton or PGM, Illumina X10, PacBio, Nanopore, etc., and the requirements thereof.
  • functional sequences can be selected for compatibility with non-commercialized sequencing systems. Examples of such sequencing systems and techniques, for which suitable functional sequences can be used, include (but are not limited to) Roche 454 sequencing, Ion Torrent Proton or PGM sequencing, Illumina X10 sequencing, PacBio SMRT sequencing, and Oxford Nanopore sequencing. Further, in some embodiments, functional sequences can be selected for compatibility with other sequencing systems, including non-commercialized sequencing systems.
  • a barcode can be a label, or identifier, that can convey or is capable of conveying information (e.g., information about a computing unit).
  • a barcode can be part of a CU.
  • a barcode can be attached to a CU.
  • a particular barcode can be unique to a particular CU relative to other barcodes.
  • Barcodes can have a variety of different formats. For example, barcodes can include polynucleotide barcodes, random nucleic acid and/or amino acid sequences, and synthetic nucleic acid and/or amino acid sequences.
  • a barcode can be attached to a CU or to another moiety or structure in a reversible or irreversible manner. Barcodes can allow for identification and/or quantification of individual sequencing reads (e.g., a barcode can be or can include a unique molecular identifier (UMI)).
  • Barcodes can spatially resolve biological molecules found in sample objects, for example, at single-cell resolution (e.g., a barcode can be or can include a spatial barcode).
  • a barcode can include both a UMI and a spatial barcode.
  • a barcode can include two or more sub-barcodes that together can function as a single barcode.
  • a polynucleotide barcode can include two or more polynucleotide sequences (e.g., sub-barcodes) that can be separated by one or more non-barcode sequences.
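A barcode built from sub-barcodes separated by non-barcode sequences can be reassembled as sketched below. The segment layout, segment lengths, and the example read are hypothetical; an actual tag design would define its own layout.

```python
def join_sub_barcodes(read, layout):
    """Concatenate the sub-barcodes found at the start of a read.

    layout: list of (kind, length) pairs in 5'->3' order, where kind is
    "barcode" for a sub-barcode or "spacer" for a non-barcode sequence.
    """
    position = 0
    parts = []
    for kind, length in layout:
        segment = read[position:position + length]
        if len(segment) != length:
            raise ValueError("read is shorter than the declared layout")
        if kind == "barcode":
            parts.append(segment)
        position += length
    return "".join(parts)

# Hypothetical layout: 4-nt sub-barcode, 3-nt spacer, 4-nt sub-barcode.
layout = [("barcode", 4), ("spacer", 3), ("barcode", 4)]
read = "ACGT" + "GGG" + "TTCA" + "TTTTTTTT"
barcode = join_sub_barcodes(read, layout)
```

The two sub-barcodes function together as the single barcode "ACGTTTCA", and the spacer is discarded.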
  • a molecular tag can optionally include a release element.
  • the release element can represent a portion of a molecular tag that can be used to reversibly attach the molecular tag to a CU.
  • the barcode and/or UMIs can be released by cleavage of the release element.
  • the release element can link the molecular tag to the CU via a disulfide bond.
  • a reducing agent can be added to break the disulfide bonds, resulting in release of the molecular tag from the CU.
  • the release element can be a photosensitive chemical bond (e.g., a chemical bond that dissociates when exposed to light such as ultraviolet light).
  • the release element can be an ultrasonic cleavage domain. For example, ultrasonic cleavage can depend on nucleotide sequence, length, pH, ionic strength, temperature, and the ultrasonic frequency.
  • Oligonucleotides with photo-sensitive chemical bonds have various advantages. They can be cleaved efficiently and rapidly (e.g., in nanoseconds and milliseconds). When a photo-cleavable release element is used, the cleavage reaction can be triggered by light, and can be highly selective to the linker and consequently bioorthogonal. Typically, wavelength absorption for the photocleavable release element can be located in the near-UV range of the spectrum. In some embodiments, λmax of the photocleavable linker can be from about 300 nm to about 400 nm, or from about 310 nm to about 365 nm.
  • λmax of the photocleavable linker can be about 300 nm, about 312 nm, about 325 nm, about 330 nm, about 340 nm, about 345 nm, about 355 nm, about 365 nm, or about 400 nm.
  • Non-limiting examples of a photo-sensitive chemical bond that can be used in a release element can include those described in Leriche et al. Bioorg Med Chem. 2012 Jan 15;20(2):571-82, which is incorporated by reference herein in its entirety.
  • linkers that comprise photo-sensitive chemical bonds can include 3-amino-3-(2-nitrophenyl)propionic acid (ANP), phenacyl ester derivatives, 8-quinolinyl benzenesulfonate, dicoumarin, 6-bromo-7-alkoxycoumarin-4-ylmethoxycarbonyl, a bimane-based linker, and a bis-arylhydrazone based linker.
  • the photo-sensitive bond can be part of a release element, such as an ortho-nitrobenzyl (ONB) linker.
  • release element can include labile chemical bonds such as, but not limited to, ester linkages (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), an abasic or apurinic/apyrimidinic (AP) site (e.g., cleavable with an alkali or an AP endonuclease), or
  • the release element can include a sequence that can be recognized by one or more enzymes capable of cleaving a nucleic acid molecule, e.g., capable of breaking the phosphodiester linkage between two or more nucleotides.
  • a bond can be cleavable via other nucleic acid molecule targeting enzymes, such as restriction enzymes (e.g., restriction endonucleases).
  • the release element can include a restriction endonuclease (restriction enzyme) recognition sequence. Restriction enzymes cut double-stranded or single stranded DNA at specific recognition nucleotide sequences known as restriction sites.
  • a rare-cutting restriction enzyme e.g., enzymes with a long recognition site (at least 8 base pairs in length), can be used to reduce the possibility of cleaving elsewhere in the capture probe.
  • the release element can include a poly(U) sequence which can be cleaved by a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII, commercially known as the USER™ enzyme.
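The rationale for a rare-cutting restriction enzyme in a release element can be checked with a short sketch: a longer recognition site is expected to occur less often by chance, so it is less likely to appear elsewhere in the tag. The tag sequence below is hypothetical, and the 8-bp site GCGGCCGC (the NotI recognition sequence) is used only as a familiar example that the source does not name. The expectation formula assumes a uniformly random sequence.

```python
def count_sites(sequence, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    n = len(site)
    return sum(1 for i in range(len(sequence) - n + 1) if sequence[i:i + n] == site)

def expected_random_sites(site_len, seq_len):
    """Expected occurrences in a uniformly random sequence: (L - n + 1) / 4**n."""
    return max(seq_len - site_len + 1, 0) / 4 ** site_len

RARE_SITE = "GCGGCCGC"   # 8-bp palindromic site, illustrative only
tag = "ACGTACGT" + RARE_SITE + "T" * 20   # hypothetical molecular tag

site_count = count_sites(tag, RARE_SITE)
rare_expectation = expected_random_sites(8, 100)     # about 0.0014
common_expectation = expected_random_sites(4, 100)   # about 0.38
```

A design check along these lines would accept a tag only if the release site occurs exactly once, so that cleavage cannot occur elsewhere in the tag.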
  • a molecular tag can include a linker, which can prevent extension.
  • the linker can be dideoxynucleotides (ddNTPs), which are modified nucleotides that lack the 3’ hydroxyl group necessary for forming phosphodiester bonds, thereby preventing further nucleotide addition.
  • the linker can be spacer phosphoramidites, which are synthetic linkers that can be incorporated into oligonucleotides to create a gap or block extension.
  • spacer phosphoramidites can include spacer C3, C6, C12, and HEG (hexaethylene glycol) spacers, as well as abasic sites, which are nucleotides with no base attached and which can prevent extension by causing a pause in the DNA polymerase activity.
  • the linker can be peptide nucleic acids (PNAs), which are synthetic polymers structurally similar to DNA but with a peptide backbone, which can hybridize to DNA and block polymerase extension.
  • the linker can be methylphosphonate linkers, which are phosphoramidite linkers with a methyl group attached to the phosphorus, preventing normal base pairing and extension.
  • the linker can be LNA (locked nucleic acid) probes, which are modified RNA nucleotides with a methylene bridge connecting the 2’ oxygen and the 4’ carbon, which can enhance binding affinity and block extension.
  • the linker can be thioate linkers, which are sulfur-containing linkers that can interfere with the polymerase activity and prevent extension.
  • the linker can be PEG (polyethylene glycol) linkers, which are flexible, hydrophilic linkers that can create a physical barrier to polymerase extension.
  • the linker can be phosphate or phosphorothioate groups at the 3’ end, wherein adding a phosphate or phosphorothioate group at the 3’ end of an oligonucleotide can prevent extension by blocking the 3’ hydroxyl group.
  • a molecular tag can further include a blocking element.
  • the blocking element can prevent reverse transcription.
  • the blocking element can be removed when the at least one sample object interacts with each CU.
  • the blocking element can be C3 Spacer, which is a three-carbon spacer that acts as a blocking group to prevent the extension of nucleic acids during reverse transcription.
  • the blocking element can be 3’ phosphate group. Addition of a phosphate group at the 3’ end of an oligonucleotide can prevent reverse transcription by blocking the addition of nucleotides.
  • the blocking element can be 3’ amino modifier, which can prevent reverse transcription by blocking the polymerase from extending the nucleotide chain.
  • the blocking element can be dideoxynucleotides (ddNTPs). Incorporation of dideoxynucleotides at the end of an oligonucleotide can block reverse transcription by preventing the addition of further nucleotides.
  • the blocking element can be locked nucleic acids (LNAs), which are modified RNA nucleotides that can form highly stable duplexes with complementary DNA or RNA, blocking reverse transcriptase from accessing and extending the template.
  • the blocking element can be a thioate linker, which is a sulfur-containing linker that can interfere with reverse transcriptase activity and prevent the enzyme from synthesizing complementary DNA.
  • the blocking element can be a hairpin loop structure. Designing oligonucleotides with hairpin loop structures can create physical barriers to reverse transcription, preventing the enzyme from extending the nucleic acid strand.
  • the blocking element can be a PEG (polyethylene glycol) linker, which can create a steric hindrance that blocks the reverse transcriptase from binding and extending the nucleic acid.
  • the blocking element can be an aptamer or other secondary structure (e.g., a G-quadruplex) that can fold in such a way that it blocks the binding or activity of reverse transcriptase.
  • the blocking element can be a moiety that includes a ssDNA sequence further fused to a domain that can prevent extension (e.g., C3, C6 etc.).
  • the ssDNA sequence can be recognized by an endonuclease. Upon cleavage, extension by reverse transcription can become enabled.
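The cleavable blocking element described above can be pictured as a simple gate: extension stays disabled while the blocking domain is attached and becomes possible once an endonuclease recognizes the ssDNA sequence and cleaves it. The following is a minimal sketch under stated assumptions, not the disclosed implementation. The class `MolecularTag`, the function `cleave`, and the placeholder recognition site `ACGTAC` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MolecularTag:
    """Hypothetical model of a molecular tag carrying a cleavable blocking element."""
    ssdna: str                     # ssDNA sequence fused to the blocking domain
    blocker: Optional[str] = "C3"  # domain that prevents extension; None once removed

    def can_be_extended(self) -> bool:
        # Extension by reverse transcription is possible only without a blocker.
        return self.blocker is None


def cleave(tag: MolecularTag, recognition_site: str) -> MolecularTag:
    """Return a copy of the tag with the blocker removed if the site is present."""
    if tag.blocker is not None and recognition_site in tag.ssdna:
        return replace(tag, blocker=None)
    return tag


SITE = "ACGTAC"  # placeholder recognition site, not a real enzyme's site
tag = MolecularTag(ssdna="TTTT" + SITE + "TTTT")
cleaved = cleave(tag, SITE)
untouched = cleave(MolecularTag(ssdna="TTTTTTTT"), SITE)

print(tag.can_be_extended(), cleaved.can_be_extended(), untouched.can_be_extended())
```

A tag lacking the recognition site stays blocked, which mirrors the idea that extension is enabled only by the specific cleavage event.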
  • Input samples can be collected from various origins, e.g., biological origin.
  • the input sample can be biological.
  • the biological input sample can comprise a biological fluid (e.g., whole blood, serum, plasma, sputum, urine, saliva, nipple aspirate, ductal lavage, vaginal fluid, nasal fluid, ear fluid, gastric fluid, cerebrospinal fluid, sweat, pericrevicular fluid, semen, prostatic fluid, feces, cell lysate, or tears).
  • the biological input sample can comprise a tissue sample (e.g., hair, skin, or biopsy material).
  • the biological input sample can comprise an enriched biological material (e.g., various cell types or exosomes).
  • the input sample can be a biological sample derived from a subject or an individual.
  • the input sample can be primary cell cultures.
  • the input sample can be cell line cultures.
  • the input sample can be other complex cultures, such as, but not limited to, organoids, 2D/3D cultures, mixed-cell cultures, genetically modified cultures, cell lines modified using CRISPR, and the like.
  • the presently described methods can be compatible with whole blood, stabilized leukocyte fractions, isolated peripheral blood mononuclear cells (PBMCs), cryogenically stored PBMCs, and primary cell cultures.
  • the recommended cell resuspension solution can be 1x DPBS containing 0.1% gelatin w/v.
  • standard cultivation media can also be compatible.
  • sample volume can be about 10 µL, about 20 µL, about 30 µL, about 40 µL, about 50 µL, about 60 µL, about 70 µL, about 80 µL, about 90 µL, about 100 µL, about 110 µL, about 120 µL, about 130 µL, about 140 µL, about 150 µL, about 160 µL, about 170 µL, about 180 µL, about 190 µL, about 200 µL, about 210 µL, about 220 µL, about 230 µL, about 240 µL, about 250 µL, about 260 µL, about 270 µL, about 280 µL, about 290 µL, about 300 µL, about 310 µL, about 320 µL, about 330 µL, about 340 µL, about 350 µL
  • the total number of cells (passive background) of the input sample can be about 100,000 cells, about 200,000 cells, about 300,000 cells, about 400,000 cells, about 500,000 cells, about 600,000 cells, about 700,000 cells, about 800,000 cells, about 900,000 cells, or about 1,000,000 cells. In some embodiments, the total number of cells (passive background) of the input sample can be about 1 million cells, about 1.5 million cells, about 2 million cells, about 2.5 million cells, or about 3 million cells. In some embodiments, the total number of cells (passive background) of the input sample can be about 1 million cells.
  • cells expressing at least one target antigen can be about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50% of the passive background. In some embodiments, cells expressing at least one target antigen (active background) can be about 10% of the passive background.
  • cells expressing all target antigens can be about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50% of the active background. In some embodiments, cells expressing all target antigens (target cells) can be about 10% of the active background.
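The nested percentages above can be turned into absolute cell counts. The following is a minimal sketch using the example values stated in the text (a passive background of about 1 million cells, an active background of about 10% of the passive background, and target cells at about 10% of the active background). The function name `background_counts` is a hypothetical helper introduced only for this illustration.

```python
def background_counts(passive_cells: int,
                      active_fraction: float,
                      target_fraction: float) -> dict:
    """Return cell counts for the passive background, active background, and target cells.

    active_fraction is the share of the passive background expressing at least
    one target antigen; target_fraction is the share of the active background
    expressing all target antigens.
    """
    if not (0.0 <= active_fraction <= 1.0 and 0.0 <= target_fraction <= 1.0):
        raise ValueError("fractions must lie between 0 and 1")
    active = round(passive_cells * active_fraction)
    target = round(active * target_fraction)
    return {"passive": passive_cells, "active": active, "target": target}


counts = background_counts(1_000_000, 0.10, 0.10)
print(counts)  # {'passive': 1000000, 'active': 100000, 'target': 10000}
```

Under these example values, target cells make up about 1% of the total input (10,000 of 1,000,000 cells).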
  • the sample objects can be entities derived from the input sample, such that the CUs of the present disclosure, configured to interact with the sample objects, can generate an output signal indicative of a characteristic of the input sample.
  • a sample object can include a plurality of biological molecules.
  • the plurality of biological molecules can be DNA.
  • the plurality of biological molecules can be RNA.
  • the plurality of biological molecules can be protein or peptide.
  • a sample object can comprise a cell.
  • the cell can comprise a plurality of biological molecules.
  • the sample object can be lysed in order to release the plurality of biological molecules, which can then subsequently interact with one or more SBEs on CUs interacting with the sample object.
  • the sample object comprises a cell, which can associate with a surface-bound entity (SBE), which is further explained below.
  • the sample object can associate with an SBE of a CU.
  • a sample object can produce a signal object (SO), which is further explained below.
  • a sample object can degrade a signal object (SO).
  • a sample object can recognize an SO. In some embodiments, such recognition can lead to the sample object modulating its ability to produce or degrade an SO.
  • sample objects can be individual entities or clusters formed through aggregation of entities.
  • sample objects can be cells, and cell clusters can be formed due to specific cell-cell interactions, wherein the formation of cell clusters can signify cytotoxicity, adherence, and/or differentiation of cells or cell lineage.
  • a sample object can comprise a molecule.
  • the molecule can comprise a plurality of biological molecules.
  • the molecule can comprise a wide array of molecules produced by the input sample.
  • the molecule can be a metabolite and/or its related compounds.
  • the molecule can be a protein or peptide and/or its derivatives.
  • the molecule can be a pheromone and/or its derivative compounds.
  • the molecule can be a signaling molecule, which can include, but is not limited to, a wide array of mammalian hormones, cytokines, interleukins, and/or chemokines.
  • the molecule can be one or more oligonucleotides.
  • the molecule can be one or more DNA.
  • the molecule can be one or more RNA.
  • sample objects from the input sample or within the set of computing entities can be pre-treated prior to any of the presently described methods.
  • pre-treatment can slow or promote metabolic processes through external influence (e.g., temperature change) or chemical treatment (e.g., metabolite or inducer supplement).
  • pre-treatment can also expose or conceal object surfaces.
  • SBEs can be occluded by nonspecific layers (e.g., polysaccharide, glycoprotein layers). Such layers can be removed by appropriate enzymatic or chemical pre-treatment.
  • sample objects can comprise background entities that can hinder recognition of target objects.
  • Erroneous clustering with background objects can be minimized by pretreatment of the objects with elements that can interact non-specifically with surface entities and thereby fill the residual binding capacity.
  • blocking entities can inhibit non-specific binding (passive and covalent) between SBEs or between surfaces. Such blocking entities do not exhibit cross-reactivity with SBEs, and thus should not disrupt the sample objects.
  • the Reaction Reagent is a composition comprising at least one computing unit (CU), wherein each CU of the at least one CU can be configured to interact with a sample object derived from an input sample such that an output signal indicative of a characteristic of the input sample can be generated.
  • the Reaction Reagent can further comprise one or more additional components, such as yeast extract, peptone, D-glucose, Dulbecco’s phosphate-buffered saline, D-(+)-Trehalose dihydrate, skim milk, and/or gelatin.
  • the components of the Reaction Reagent can be configured such that one or more computational clusters can be formed upon coming in contact with the at least one sample object, wherein each computational cluster comprises, independently, the at least one CUs.
  • the target profile can be modified as desired by users.
  • any components of the Reaction Reagent described herein can be formulated with acceptable excipients, such as carriers, solvents, stabilizers, diluents, etc., depending upon a customized combination of CUs and target profile.
  • acceptable excipients can include, for example, carrier molecules that include large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive viral particles.
  • exemplary excipients can include antioxidants (for example and without limitation, ascorbic acid), chelating agents (for example and without limitation, EDTA), carbohydrates (for example and without limitation, dextrin, hydroxyalkylcellulose, and hydroxyalkylmethylcellulose), stearic acid, liquids (for example and without limitation, oils, water, saline, glycerol and ethanol), wetting or emulsifying agents, pH buffering substances, and the like.
  • the Reaction Reagent can be dehydrated and rehydrated prior to or upon contacting the at least one sample object from the input sample.
  • the Reaction Reagent can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, or up to about 10 weeks at room temperature without loss of signal. In some embodiments, the Reaction Reagent can be stored up to about 2 weeks at room temperature without loss of signal.
  • the Reaction Reagent can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, up to about 10 weeks, up to about 11 weeks, up to about 12 weeks, up to about 13 weeks, up to about 14 weeks, up to about 15 weeks, or up to about 16 weeks at 4°C without loss of signal. In some embodiments, the Reaction Reagent can be stored up to about 8 weeks at 4°C without loss of signal.
  • Computing Units (CUs)
  • a computing unit is an object that has the potential to affect the informational output (e.g., output signal) of the presently described methods.
  • Informational output can be affected in a number of different ways.
  • a CU can be an object with a classified surface-bound entity (SBE) profile, thereby mediating object association.
  • a CU can be an object that can produce other CUs or signal objects (SOs), thereby affecting the informational output of a customized method.
  • a CU can be an object that can recognize SOs, thereby affecting the memory capacity of the presently described system.
  • a CU can express one or more surface-bound entities (SBEs) upon interacting with one or more sample objects.
  • a CU of the at least one CU can be, independently, (i) associated with one or more surface-bound entities (SBEs); (ii) capable of recognizing a signal object (SO); (iii) capable of producing an SO; (iv) capable of degrading an SO; (v) capable of producing a change in a material property of the reaction medium; (vi) capable of producing another CU; or (vii) capable of changing its state following signal recognition or external influence.
  • a CU can be capable of producing a change in a material property of the Reaction Medium, which is further described below, of the present disclosure upon coming in contact with the Reaction Reagent comprising CUs.
  • a CU can be capable of producing a reporter entity, such as, but not limited to, fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferases, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), and Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), or any functional fragments or variants thereof.
  • the material property of the Reaction Medium can be
  • a CU can comprise non-native elements, native elements in non-native locations, or other alterations to native elements.
  • Introduced elements can include, but are not limited to, signal object generators, SBEs, reporter molecules, regulatory sequences, genetic selection markers, other types of non-genetic markers (e.g., magnetic, immunological), reference reporter molecules, enzyme coding genes (e.g., protease, kinase, phosphatase), etc.
  • modifications can be performed by introducing genetic changes at select chromosomal regions. Chromosomal modifications can either introduce new genetic elements at one or more desired loci or modify native elements (promoters, degradation tags, termini tags, transposon sites, etc.) at native loci.
  • modifications can be performed by introducing additional genetic material, e.g., plasmids or synthetic chromosomes.
  • a CU can be a wildtype cell (e.g., of bacterial, yeast, mammalian origin) or an engineered or synthetic cell (e.g., any genetically engineered cell).
  • a CU can be an engineered or synthetic cell based on or derived from a yeast cell.
  • suitable yeasts can include, but are not limited to, Pichia pastoris, Saccharomyces cerevisiae, Arxula adeninivorans (Blastobotrys adeninivorans), Candida boidinii, Hansenula polymorpha (Pichia angusta), Kluyveromyces lactis, Yarrowia lipolytica, etc.
  • a CU can be a monomeric or multimeric molecule (e.g., a polypeptide, polypeptide derivative, nucleic acid).
  • a CU can be a cell comprising an SBE covalently attached to an anchor present within the cell.
  • the SBE can be covalently attached to the anchor by a linker.
  • the linker can comprise a repeat motif.
  • the linker can facilitate accessibility of the SBE and/or increase an effective contact area between a sample object and the CU upon coming in contact with the Reaction Medium.
  • a CU can be a cell displaying SBEs on its outermost surface (e.g. a cell wall or membrane).
  • the SBEs and the cell can be produced separately and subsequently associated by standard practices, e.g., through immunological labeling.
  • SBEs can be produced by the CU itself and tethered or anchored on the surface.
  • a SBE can be tethered to the surface by interaction with surface anchored proteins.
  • a SBE can be anchored to the surface as a fusion polypeptide with a surface anchor moiety that interacts either with the membrane (e.g., by hydrophobic interaction with membrane lipids) or the cell wall (e.g., covalent bonding to cell wall polysaccharides or polypeptides).
  • a CU can be a molecule (e.g., a monomeric or multimeric molecule).
  • the molecule can include, without limitation, a polypeptide, a polypeptide derivative, a nucleic acid, and/or a solid support.
  • a CU can comprise a polypeptide.
  • the polypeptide can be an antibody, e.g., a bispecific antibody (BsAb).
  • the polypeptide can be an enzyme.
  • the method can further comprise an agent, which the enzyme can convert into a signal object (SO).
  • a CU can comprise a solid support.
  • the solid support can be a functionalized bead.
  • a CU can be a molecule with a single SBE or multiple SBE moieties of the same or different specificities.
  • This can include a collection of immunoglobulins, their derivatives (e.g., scFv, Fab, diabody, etc.), or similar binding entities with some level of molecular specificity (e.g., DARPins, TALENS, antigens, nucleic acids).
  • a first CU of the at least one CU can be capable of producing a second CU, wherein the second CU can be of the same type as the first CU or of a different type than the first CU.
  • a first CU of the at least one CU can interact with a second CU of the at least one CU to influence SBE profiles of the first CU and/or the second CU.
  • a first CU of the at least one CU can be associated with a first SBE and a second CU of the at least one CU can be associated with a second SBE.
  • the first SBE and second SBE can be capable of forming a complex comprising the first SBE and second SBE.
  • CUs can be classified by the types of SBEs they expose to the Reaction Medium. Each SBE can be typed by how it can be recognized (e.g., by van der Waals forces, hydrogen bonds, hydrophobic and/or ionic interactions). Presentation of SBEs can be either constant, spontaneous, or induced (e.g., initiated following detection of signals transmitted in the medium or following direct interaction with objects).
  • CUs can be classified at each point in time.
  • a CU’s class can change in time as the CU’s SBE profile changes.
  • a CU can be called a target object at a given point in time if its current SBE profile belongs to one of the predetermined classes. If two target objects have statistically similar SBE profiles, they can be assigned to the same target object class.
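The time-dependent classification described above can be sketched as a lookup of a CU's current SBE profile against a set of predetermined classes. This is a minimal illustration under stated assumptions. The class labels, the SBE names, and the use of exact set equality in place of the "statistically similar" comparison mentioned in the text are all simplifications introduced here, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical predetermined target object classes, keyed by label.
TARGET_CLASSES = {
    "class_A": frozenset({"SBE_1"}),
    "class_B": frozenset({"SBE_1", "SBE_2"}),
}


@dataclass
class ComputingUnit:
    name: str
    sbe_profile: FrozenSet[str]  # SBEs currently exposed to the Reaction Medium


def classify(cu: ComputingUnit) -> Optional[str]:
    """Return the target object class matching the CU's current SBE profile.

    Exact set equality stands in for statistical similarity (an assumption).
    Returns None if the CU is not a target object at this point in time.
    """
    for label, profile in TARGET_CLASSES.items():
        if cu.sbe_profile == profile:
            return label
    return None


cu = ComputingUnit("cu_1", frozenset({"SBE_1"}))
class_before = classify(cu)

# Induced presentation of a second SBE changes the profile, and with it the class.
cu.sbe_profile = cu.sbe_profile | {"SBE_2"}
class_after = classify(cu)

print(class_before, class_after)  # class_A class_B
```

Because the class is recomputed from the current profile, the same CU can belong to different classes at different points in time.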
  • Surface-Bound Entities (SBEs)
  • SBEs can mediate object association either directly or through intermediate objects.
  • a SBE profile can define a CU class that can subsequently serve in constructing the methods and systems that perform particular computing functions.
  • SBEs can be expressed on the surface of a CU upon the CU interacting with one or more sample objects.
  • a first CU interacting with a second CU can influence the SBE profile(s) of the first CU and/or the second CU.
  • the SBE profile can be constant in time, change spontaneously (e.g., by random change of internal state), change as a result of an internal state change (e.g.
  • an SBE can have a dual purpose of (1) binding a CU to a sample object; and (2) recognizing an SO.
  • a SBE can interact with a plurality of biological molecules released from one or more sample objects.
  • the one or more sample objects can interact with one or more CUs, and the one or more CUs can express SBEs upon interacting with the one or more sample objects.
  • the one or more sample objects interacting with the one or more CUs can be lysed in order to release the plurality of biological molecules from the one or more sample objects.
  • the plurality of biological molecules can interact with at least one SBE on the surface of the one or more CUs.
  • the plurality of biological molecules can also interact with one or more molecular tags, which is explained further below.
  • a SBE can be a single-stranded DNA (ssDNA) binding protein.
  • a SBE can be an mRNA cap binding protein, such as, but not limited to, eIF4E.
  • a SBE can be an oligonucleotide anchored to the CU surface, e.g., poly(dT).
  • a SBE can be an antigen on the cell surface (e.g., cell surface receptor).
  • the antigen can be a disease-associated antigen (e.g., cancer-associated antigen).
  • the SBE can be a marker indicating a state of the sample object (e.g., differentiation factors or clusters of differentiation (CDs)).
  • the SBE can be a biological signal (e.g., MHC, MHC epitope complex, or glycocalyx).
  • the SBE can be a marker indicating an activity of the sample object (e.g., receptors, receptor ligand complexes, or ion channels, and their modified forms).
  • the SBE can be a pathogenic marker (e.g., glycoproteins or lectins).
  • the SBE can be a synthetically produced molecule (e.g., surface tags, displayed epitopes, or conjugated molecules).
  • a SBE can be covalently attached to a surface of a CU.
  • the SBE can be covalently attached to the surface of the CU by a linker.
  • the linker can comprise a repeat motif.
  • the linker can be a polypeptide linker.
  • the CU can be a cell, and the SBE can be covalently attached to an antigen present on the surface of the cell.
  • the SBE can be a polypeptide covalently attached to a surface receptor of the cell.
  • the CU can be a solid support, and the SBE can be covalently attached to the surface of the solid support.
  • the SBE (e.g., a polypeptide SBE) can be covalently attached to the surface of a bead (e.g., a functionalized bead).
  • a SBE can be non-covalently attached to a surface of the CU.
  • the SBE can comprise a binding moiety that can bind to a surface of the CU.
  • the CU can be a cell, and the SBE can comprise a binding moiety that can bind to an antigen present on the surface of the cell.
  • the SBE can comprise a ligand that binds to a surface receptor of the cell.
  • the SBE binding moiety can be an antibody moiety.
  • the CU can be a solid support, and the SBE can comprise a binding moiety that can bind to a surface of the solid support.
  • the SBE (e.g., a polypeptide SBE) can comprise a binding moiety that can bind to a surface of a bead (e.g., a functionalized bead).
  • SBEs can be displayed as a part of glycosylphosphatidylinositol (GPI) anchored fusion protein.
  • the recombinant structure of the fusion protein can be optimized to promote interaction between objects and, in particular, between cellular CUs and target objects.
  • the fusion protein construct can include domains of common yeast flocculation proteins, e.g., S. cerevisiae Flo1, Flo5, Flo9, Flo10, and Flo11.
  • the domains can include putative GPI associated moieties.
  • the domains can also include truncated fragments of the extracellular domains to increase the area of contact between adjoining objects, thereby increasing the strength of association between objects displaying complementary SBEs.
  • a SBE can bind to a cognate binding partner.
  • the cognate binding partner can be a signal object (SO).
  • the cognate binding partner can be another SBE.
  • binding of the SBE to the other SBE can modulate an activity of the CU.
  • binding of the SBE to the other SBE can modulate the ability of the CU to produce an SO.
  • binding of the SBE to the other SBE can modulate the ability of the CU to degrade an SO.
  • the SBE can comprise an SO.
  • binding of the SBE to the SO can modulate an activity of the CU.
  • binding of the SBE to the SO can modulate the ability of the CU to produce an SO.
  • binding of the SBE to the SO can modulate the ability of the CU to degrade an SO.
  • a SBE can be covalently attached to the surface of a sample object. In some embodiments, a SBE can be non-covalently attached to the surface of a sample object.
  • Engineered display of SBEs in various cellular systems has been disclosed in a number of publications, addressing a wide range of organisms ranging from phage and E. coli to yeast and general eukaryotic systems and also addressing protein folding, secretion, surface capture, and anchoring or tethering mechanisms.
  • an SBE can be a cellular receptor.
  • receptor SBEs can be localized to the cellular membrane or cytoplasm.
  • receptor SBEs can include a transcription factor or a transmembrane receptor.
  • a SBE can be located on the surface of a cellular CU with a signaling moiety located on the cytoplasmic side of the cellular membrane.
  • a cytoplasmic receptor can be any SO binding entity that can directly or indirectly lead to metabolic modulation, such as, but not limited to, a transcription factor or an aptamer that can change the transcription or translation rate of one or more genes.
  • a SBE can be a surface receptor or a transmembrane receptor protein.
  • a SBE can be a yeast receptor or its derivative (e.g., the signaling moiety of a yeast receptor fused to an engineered segment).
  • a SBE can include a heterologous signaling moiety fused to a yeast signaling moiety and can incorporate other modifications for improved activity.
  • the SBE can incorporate homologous and/or heterologous segments of G protein-coupled receptors (GPCRs) in yeast derived CUs.
  • Incorporated GPCRs can include modifications (e.g., compositions of extracellular, transmembrane, and cytoplasmic domains of GPCRs from different organisms and mutations improving their signaling properties) that can alter the receptors’ specificities, sensitivities, and signaling activities. Incorporated GPCRs can also include modifications (e.g., mutations or truncations in cytoplasmic domains) that can alter post-translational regulation of receptor activity (e.g., degradation, molecular interaction).
  • Non-limiting exemplary yeast GPCRs can include S. cerevisiae pheromone receptors STE2 and STE3 or their derivatives (e.g., mutants with altered stability).
  • Non-limiting exemplary bacterial pheromone transcription factor proteins can include LuxR proteins that can sense bacterial pheromones N-acyl homoserine lactones.
  • a CU can comprise one or more SBEs.
  • a CU can comprise a first set of SBEs and a second set of SBEs.
  • a CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects.
  • the plurality of sample objects can interact with the plurality of CUs via a first set of at least one SBE.
  • the plurality of biological molecules can interact with the plurality of CUs via a second set of at least one SBE.
  • the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
  • a CU can store information regarding past interactions with objects and external influences in its internal state.
  • the internal state of the CU is not necessarily identical to the full state of the physical object as defined by dynamical systems theory. Instead, the internal state contains the information necessary to support and execute future computing actions.
  • the internal state can include continuous variables (e.g., ionic concentrations, permittivities, permeabilities, internal pressures, absorbances, rigidities), discrete variables (e.g., entity copy numbers, degrees of polymerization, set of molecular conformations, molecular modifications), as well as spatial distributions (e.g., compartmentalization of entities, polarization).
  • it can be convenient to define the state through probability distributions.
  • Internal states can be defined for both molecular and cellular CUs. Internal states of cellular CUs can be aptly described by biochemical reaction network models. Physical manifestations of the states, as related to the present disclosure, can be copy numbers of certain molecular species, in particular regulatory molecular species (e.g., transcription factors, regulatory RNAs, transferases) that contribute to maintaining cellular homeostasis. In some embodiments, the physical manifestations of the state can include copy numbers of active or inactive transcription factors in select cellular compartments. Wildtype transcription factors or their regulators can be considered. Heterologous transcription factors modified for the given host organism can also be considered. In addition, novel transcription factors can be considered.
  • the internal state of the molecular CU can be set at time of production or through later interaction with objects or CUs.
  • a CU can implement mechanisms that change its state following signal recognition or external influence.
  • the effect of a mechanism can be predicted precisely (e.g., rapid and stable conformational change) or can have a stochastic nature (e.g., a change in an entity’s time averaged copy number).
  • the state change can happen immediately following interaction of the SBE with its cognate ligand or following external influence.
  • the SO can interact directly with a transcription factor or the transcription factor can undergo conformational changes as a result of a shift in e.g., temperature or illumination.
  • the state change can follow a transient internal process during which the state change can permeate but the effector process can be reset upon signal removal.
  • transient modification of e.g., transcription factor complexes following signal recognition can also serve in changing the internal state of a cellular CU.
  • a transcription factor can be modified directly by the SO.
  • a transcription factor can be modified as part of the transient process that can ensue following a signal recognition event.
  • activated transcription factors or associated elements can yield changes in transcription that can subsequently alter many other processes.
  • novel transcription factors can be used to transform signal recognition events into transcriptional changes.
  • novel transcription factors can allow pathway rerouting to promoters that can be independent of a native response.
  • novel transcription factors can enable further modulation or signal processing.
  • the internal state of the CU can affect intracellular entities that are not themselves part of the state.
  • the current state of the system can determine the copy numbers of the corresponding gene products and thereby the states of any entity those products can affect.
  • the affected entities can include SBEs.
  • the SBE profile of a CU can change as a result of a state change.
  • the affected entities can include SBEs and any elements that can relay signal recognition events to other parts of the cell.
  • the affected entities can include produced CUs or signal objects.
  • any of the entities affected through state change can be equally affected by signal recognition events or external influences.
  • entities can be introduced that are not affected by signal recognition events, external influences, or internal state changes. These entities can be taken from the same family of entities as the reporter entities (i.e., fluorescent proteins, luminescent proteins, enzymes, etc.) and can be used to generate control measurements to which other measurements can be compared.
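The relationship between signal recognition, internal state, and SBE profile described above can be sketched as a small state model. This is a minimal illustration under stated assumptions, not the disclosed mechanism. The class `CellularCU`, the threshold, the copy-number increments, the decay fraction, and the SBE names are hypothetical values chosen only to show a discrete state variable (an active transcription factor copy number) that rises on SO recognition, resets after signal removal, and determines which SBEs are displayed.

```python
from dataclasses import dataclass


@dataclass
class CellularCU:
    """Hypothetical cellular CU whose internal state is one discrete variable."""
    active_tf_copies: int = 0  # copy number of an active transcription factor
    threshold: int = 50        # copies needed to induce display of the second SBE

    def recognize_signal(self, so_molecules: int, copies_per_so: int = 10) -> None:
        # Signal recognition raises the active transcription factor copy number.
        self.active_tf_copies += so_molecules * copies_per_so

    def decay(self, fraction: float = 0.5) -> None:
        # After signal removal the transient effect fades and the state resets.
        self.active_tf_copies = int(self.active_tf_copies * (1.0 - fraction))

    @property
    def sbe_profile(self) -> set:
        # The SBE profile is derived from the state rather than stored in it.
        profile = {"SBE_first"}
        if self.active_tf_copies >= self.threshold:
            profile.add("SBE_second")
        return profile


cu = CellularCU()
profile_initial = cu.sbe_profile   # only the first SBE is displayed

cu.recognize_signal(so_molecules=6)
profile_induced = cu.sbe_profile   # 60 copies >= 50, second SBE displayed

cu.decay()
profile_reset = cu.sbe_profile     # 30 copies < 50, back to the first SBE only

print(sorted(profile_initial), sorted(profile_induced), sorted(profile_reset))
```

Deriving the SBE profile from the state, instead of storing it, reflects the statement that the internal state can affect entities that are not themselves part of the state.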
  • the CU state change can detect cell state changes, such as, but not limited to, cell specific apoptosis, cell specific activation, cell specific suppression, or stem cell differentiation.
  • the cell state change is cell specific apoptosis.
  • the Apoptosis Reaction Tubes comprise Reaction Reagent.
  • the Apoptosis Reaction Tubes comprise strain Annexin, strain EGFR, and strain BARI (SEQ ID NOs: 1-3). Apoptosis detection is further described in Example 3.
  • the Reaction Medium is a composition comprising a semi-permeable nutritive medium.
  • a material property of the Reaction Medium can be an optical property, an electrical property, or a thermal property.
  • the Reaction Medium has thermoresponsive properties.
  • the components of the Reaction Medium can be configured such that one or more computational clusters can be formed upon coming in contact with the at least one sample object, wherein each computational cluster comprises, independently, the at least one CUs.
  • any components of the Reaction Medium described herein can be formulated with acceptable excipients, such as carriers, solvents, stabilizers, diluents, etc., depending upon a customized combination of CUs and target profile.
  • acceptable excipients can include, for example, carrier molecules that include large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive viral particles.
  • exemplary excipients can include antioxidants (for example and without limitation, ascorbic acid), chelating agents (for example and without limitation, EDTA), carbohydrates (for example and without limitation, dextrin, hydroxyalkylcellulose, and hydroxyalkylmethylcellulose), stearic acid, liquids (for example and without limitation, oils, water, saline, glycerol and ethanol), wetting or emulsifying agents, pH buffering substances, and the like.
  • the Reaction Medium can comprise gelatin, alginic acid sodium salt, yeast extract, peptone, D-glucose, and/or an antibiotic (e.g., tetracycline, doxycycline, ampicillin, etc.).
  • the Reaction Medium can comprise gelatin, alginic acid sodium salt, yeast extract, peptone, D-glucose, and ampicillin.
  • the Reaction Medium can include a crowding agent.
  • the crowding agent can be a hydrogel.
  • the crowding agent can include, but is not limited to, polyethylene glycol (PEG), sucrose, urea, Ficoll, dextran, cellulose, chitosan, poly(lactic-co-glycolic acid), hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL), composite synthetic/proteinaceous hydrogel, such as PEG/PVA/PVP+gelatin, biopolymer hydrogel, such as chitosan, hyaluronic acid, silk fibroin, and their functionalized variants, and/or protein, such as BSA.
  • the crowding agent can be temperature responsive, such as, but not limited to, hydroxypropyl methyl cellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL).
  • the crowding agent can be temperature, pH, and/or osmolarity responsive, such as, but not limited to, composite synthetic/proteinaceous hydrogel (e.g., PEG/PVA/PVP+gelatin) and/or biopolymer hydrogel (e.g., chitosan, hyaluronic acid, silk fibroin, and their functionalized variants).
  • the Reaction Medium can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, or up to about 10 weeks at room temperature without loss of signal. In some embodiments, the Reaction Medium can be stored up to about 2 weeks at room temperature without loss of signal.
  • the Reaction Medium can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, up to about 10 weeks, up to about 11 weeks, up to about 12 weeks, up to about 13 weeks, up to about 14 weeks, up to about 15 weeks, or up to about 16 weeks at 4°C without loss of signal. In some embodiments, the Reaction Medium can be stored up to about 8 weeks at 4°C without loss of signal.
  • the readout reagent is a composition comprising a reporter entity.
  • the readout reagent further comprises a buffer.
  • a reporter entity can be any suitable molecular entity that can affect quantitative or qualitative measurements.
  • Non-limiting exemplary reporter entities can include fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferase, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), and Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), and any functional fragments or variants thereof.
  • the origin of these reporters and the coding sequences are fully disclosed in the current state of the art.
  • signal recognition events, external influences, or state changes in the presently disclosed system can lead to production of reporter entities.
  • the produced reporter entities can be cytoplasmic.
  • the produced reporter entities can be secreted and linked to the surface.
  • the produced reporter entities can be secreted and released into the medium of the system described herein.
  • secretion of reporter entities can increase their accessibility or increase their reporting function.
  • entities can be introduced that are not affected by signal recognition events, external influences, or internal state changes. These entities can be taken from the same family of entities as the reporter entities (i.e., fluorescent proteins, luminescent proteins, enzymes, etc.) and can be used to generate control measurements to which other measurements are compared.
  • a readout reagent comprises a luminescence substrate and a luminescence buffer.
  • the luminescence substrate and luminescence buffer can be mixed at 1:10 prior to assay readout.
  • the luminescence substrate and luminescence buffer can be mixed at 1:20 prior to assay readout.
  • the luminescence substrate and luminescence buffer can be mixed at 1:30 prior to assay readout.
  • the luminescence substrate and luminescence buffer can be mixed at 1:40 prior to assay readout.
  • the luminescence substrate and luminescence buffer can be mixed at 1:50 prior to assay readout.
  • the luminescence substrate and luminescence buffer can be mixed at 1:60 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:70 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:80 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:90 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:100 prior to assay readout.
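As a non-limiting illustration, the volumes needed for a given substrate-to-buffer ratio can be computed as sketched below. The sketch assumes that a ratio such as 1:10 is read as one part substrate to ten parts buffer by volume; the source does not state whether the ratio is parts-to-parts or a dilution into the total volume. The function name `readout_reagent_volumes` and its parameters are hypothetical.

```python
from fractions import Fraction


def readout_reagent_volumes(total_ul, buffer_parts, substrate_parts=1):
    """Split a total readout-reagent volume (in microliters) into
    substrate and buffer volumes for a substrate:buffer ratio.

    Assumption: the ratio is parts substrate to parts buffer (v/v).
    """
    if total_ul <= 0 or buffer_parts <= 0 or substrate_parts <= 0:
        raise ValueError("volumes and ratio parts must be positive")
    parts = substrate_parts + buffer_parts
    # Exact rational arithmetic avoids rounding drift in the split.
    substrate_ul = Fraction(total_ul) * substrate_parts / parts
    buffer_ul = Fraction(total_ul) - substrate_ul
    return float(substrate_ul), float(buffer_ul)


if __name__ == "__main__":
    for buffer_parts in (10, 20, 50, 100):
        substrate, buffer = readout_reagent_volumes(1000, buffer_parts)
        print(f"1:{buffer_parts} -> substrate {substrate:.1f} uL, buffer {buffer:.1f} uL")
```

If the ratio is instead intended as a dilution into the total volume, `buffer_parts` would be reduced by one (for example, 9 rather than 10 for 1:10).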
  • SOs, unlike CUs, which can be intended to generate new information, do not generate new information.
  • SOs can be intended to complement CUs by transferring information between entities.
  • a SO can be any object that can interact with a SBE.
  • an object can be both a SO and a CU.
  • a SO can be fused to a molecular entity that itself has signal processing functions. This can result in a SO that can interact with other SOs or CUs that can be recognized by SBEs.
  • SOs can be suspended in the medium before association with their respective SBEs.
  • the SO can be produced by a CU in the system.
  • the SO can be degraded by a CU in the system.
  • the SO can be produced by a sample object in the system.
  • SOs can be added to a medium.
  • SOs can be generated by system entities (e.g., CUs and/or sample objects).
  • SOs can be generated in the medium from other SOs.
  • SOs can be generated within CUs from elementary metabolic precursors (e.g., by protein expression).
  • SOs can be secreted.
  • secreted SOs can belong to families of yeast pheromones. Expression of yeast pheromones can then rely on the presence of either wildtype coding sequences or their recombinant derivatives introduced in tandem with appropriate regulatory sequences (e.g., promoters, untranslated regions, transcription factor binding sites).
  • yeast pheromones can belong to a family of lipidated pheromones of the a-factor type that can be expressed as precursors and secreted using non-traditional pathways.
  • yeast pheromones can belong to a family of peptide pheromones of the alpha-factor type that can also be expressed as precursors but can be secreted using traditional pathways.
  • several modes of regulation can be used to control and activate biogenesis of both pheromone types.
  • Suspended SOs can have various properties that can affect their recognition.
  • a SO can be membrane permeable and hence free to interact with cytoplasmic SBEs.
  • a SO can be membrane impermeable and hence can require SBEs that can be exposed to the medium.
  • recognition of SOs can be inhibited by other mechanisms (e.g., hydrophobic sequestration) and can require additional treatments.
  • SOs can be metabolites (e.g., amino acids, carbohydrates, ions, etc.), antibiotics (e.g., tetracycline, doxycycline, ampicillin, etc.), and various synthetic compounds (e.g., isopropyl beta-D-1-thiogalactopyranoside (IPTG), nitrocefin, anhydrotetracycline (aTc), toxins, etc.).
  • SOs can be biogenerated entities, e.g., signaling molecules of viral, bacterial, or mammalian origin.
  • Biogenerated signaling molecules can include, but are not limited to, bacterial acylated homoserine lactones (AHL) and type 2 autoinducers, yeast mating pheromones, plant hormones, animal morphogens, and a wide array of mammalian hormones, cytokines, interleukins, chemokines, etc. Biogenerated signaling molecules can also be engineered for altered specificity or function.
  • SOs produced by CUs can be referred to as internal signals.
  • an SO can be an internal SO.
  • internal signals can be produced from other signals, e.g., by cleavage of existing signals or from metabolic precursors within cellular CUs.
  • internal signals can be engineered for altered behavior.
  • SOs or enzymes contributing to their production can be expressed from recombined genes that comprise specific regulatory sequences (e.g., promoters, operators, untranslated regions).
  • signal behavior can be engineered through addition of synthetic open reading frame (ORF) elements.
  • single codon changes can redirect the signal to non-native SBEs.
  • peptide signal degradation can be modulated by extending the terminal domains with a peptidase recognition tag.
  • peptide signals can be translated as pre-pro-peptides, where a series of additional cytoplasmic, periplasmic, or extracellular processing steps can be required to produce mature pheromones.
  • the pre-pro-peptide format can provide a platform for engineering signal activation, strength, and specificity and can therefore increase informational content of a single internal SO.
  • a SO can have multiple activity states, each of which can be characterized by its affinity for SBEs.
  • cleavage of pro-peptide sequences can transition SOs between the multiple activity states.
  • putative protease recognition sites within the propeptide sequence can be used to encode transitions between the multiple activity states.
  • the peptide signals can belong to or be derived from a family of lipidated or non-lipidated yeast pheromones (e.g., S. cerevisiae alpha mating factor).
  • yeast pheromone pro-peptide sequences can be engineered to increase or decrease the signal strength of a single pre-pro-peptide by varying the number of mature peptides encoded within a single ORF.
  • pheromone coding sequences can be flanked by protease recognition sites and repeated within a single ORF.
  • the yeast pheromone alpha factor can be translated as a pre-pro-peptide with up to about four mature peptide coding sequences, the number of which can be increased or decreased using the same flanking motifs established in the wildtype sequence.
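As a non-limiting illustration, the repeat architecture described above (mature peptide coding sequences flanked by protease recognition sites and repeated within a single ORF) can be sketched as simple string assembly. The bracketed tokens are placeholders, not real sequences, and the function names are hypothetical; actual pre, pro, site, and mature sequences would be taken from the relevant wildtype or engineered construct.

```python
def build_pre_pro_peptide(pre, pro, mature, n_copies, site_before, site_after=""):
    """Assemble a pre-pro-peptide with n_copies of the mature peptide,
    each flanked by protease recognition sites.

    All sequence arguments are plain strings; in this sketch they are
    placeholder tokens rather than real amino acid sequences.
    """
    if n_copies < 1:
        raise ValueError("at least one mature peptide copy is required")
    repeat_unit = site_before + mature + site_after
    return pre + pro + repeat_unit * n_copies


def count_mature_copies(precursor, mature):
    """Count non-overlapping occurrences of the mature peptide."""
    return precursor.count(mature)


if __name__ == "__main__":
    # Placeholder tokens (hypothetical, for illustration only).
    PRE, PRO, SITE, MATURE = "[PRE]", "[PRO]", "[SITE]", "[MATURE]"
    for copies in (1, 2, 4):
        precursor = build_pre_pro_peptide(PRE, PRO, MATURE, copies, SITE, SITE)
        print(copies, count_mature_copies(precursor, MATURE), precursor)
```

Varying `n_copies` corresponds to increasing or decreasing the number of mature peptides released per precursor, which the text above associates with signal strength.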
  • the pro-peptide sequence can also be extended to increase the number of activity states.
  • additional sequences that can include recognition sites for non-native proteases can be inserted into the wildtype sequence. By such modification, intermediate activity states can be produced, wherein the SO recognizes SBEs that can be different from those recognized by the fully mature pheromone.
  • the mature pheromone coding sequence can also be altered.
  • mature pheromone sequences from one species can be exchanged for pheromone sequences from another species. Previous work has shown that crosstalk between pheromone-receptor pairs from different species can be negligible.
  • internal signals can also include signaling molecules from unrelated species. Such internal signals can exhibit properties desirable for some applications.
  • internal signals can be membrane permeable and hence detectable by simple mechanisms.
  • internal signals can provide complete orthogonality with respect to other signal objects in a medium (e.g., plant hormones).
  • SOs can be targeted towards SBEs of target objects in a medium.
  • signals can include mammalian cytokines, chemokines, interleukins, growth hormones, neuropeptides, etc.
  • yeast cells can provide a cellular platform wherein signals of various origins (e.g., bacterial, mammalian, or other higher eukaryotes) can be produced. In some embodiments, production of such signals can require additional metabolic engineering to enable necessary post-translational modifications and metabolic processing.
  • internal signals can also exhibit properties that can enable their easy measurement by external devices.
  • Measurable signals can include molecules with high specificity that can be quantified directly (e.g., immunosorbent assay or polymerase chain reaction).
  • Measurable signals can also include molecules that can alter bulk properties (e.g., absorbance, fluorescence, etc.) of a medium comprising the presently described system.
  • easily measurable internal signals can primarily be used for readout of device states.
  • Non-limiting exemplary internal signals can include fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferase, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), and Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), and any functional fragments or variants thereof.
  • the internal signal classification does not bar a signal from also being classified as external. For instance, signals that can be produced within a medium comprising the presently described system, isolated, and manually returned to the medium at a later time can be both internal and external signals.
  • SOs that are not produced by CUs can be referred to as external signals.
  • an SO can be an external SO.
  • external signals can include synthesized molecules or purified biogenerated products.
  • external signals can include metabolites and related compounds or short peptides.
  • SOs can be added to a medium comprising the presently described system, and the added SOs can include transcriptional inducers (e.g., IPTG, aTc, AHL), classes of amino acids (e.g., aromatic amino acids), pheromones (e.g., alpha factor), etc.
  • external signals can include optical signals that can illuminate a medium comprising the presently described system.
  • external signals can include magnetic signals that can magnetize system objects.
  • external signals can include electrical signals that can polarize object charge.
  • external signals can also be produced by entities in the input sample.
  • such signals can include a wide array of molecules that can be well-mixed throughout a medium comprising the presently described system.
  • such signals can be localized to specific target objects.
  • Well-mixed signals can serve to coordinate system wide computing functions. Localized signals can affect only those CUs that recognize the target object (i.e., CUs with SBEs that interact with SBEs displayed by the target object class).
  • Production of external signals can be constant in time or time-varying according to some predetermined pattern (e.g., exponentially decaying in time).
  • the logical operator modules can receive one or more input signals (e.g., sample objects) and generate one or more output signals (e.g., output SOs) based on the specified modules.
  • a CU can affect internal SOs or external SOs.
  • a CU can affect a single or multiple signals directly (e.g., by producing, degrading, or transforming signals) or indirectly (e.g., by producing CUs).
  • effects on SOs can be constant or time-dependent, where changes can occur spontaneously, following a change in internal state, or during signal induction.
  • Signals can be affected while at least partially exposed to the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium).
  • a CU can catalyze processing of signals.
  • Such a catalyst can be an enzyme that can nullify SOs by affecting their degradation or an enzyme that can change the specificity of the SO towards SBEs.
  • an inactive SO can be split into one or more active SOs recognized by specific SBEs.
  • in the case of peptide signals or their derivatives, the enzyme can be a protease that can specifically or nonspecifically recognize and cleave the signal amino acid sequence.
  • purified proteases can be added to the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium) in an active or inactive form that can be cis or trans activated.
  • a wide variety of commercially available proteases can be used for this purpose.
  • heterologous, native, or engineered proteases can also be secreted by the CUs.
  • CUs can affect signals directly by secreting SOs into the medium.
  • SOs and cellular CUs can be secreted and linked to the CU surface.
  • SOs and cellular CUs can be secreted and released into the medium. Such secretion can be constant in time, induced by recognized signals, or induced following an internal state change.
  • intracellular SOs can be loaded in the CU any time prior to use (e.g., by electroporation or other disclosed methods for peptide transfection); however, in most cases, intracellular SOs can be synthesized by the CUs using available or engineered metabolic processes.
  • SOs can be produced from available metabolites by appropriate synthesizing enzymes.
  • polypeptide SOs can be produced by either heterologous or homologous gene expression, where the expression itself can be either constant in time or time-dependent with changes occurring either upon signal induction or internal state change.
  • SOs or potential CUs can be stored by the current CU for a period of time prior to secretion or secreted directly.
  • Storage of SOs and their conditional secretion can be accomplished by synthesis of object precursors and rapid secretion following final processing (e.g., addition of functional groups or release by cleavage of pre-domains), where the processing itself can be induced through regulated gene expression or post-translational activation of catalyzing agents.
  • biogeneration of SOs can require heterologous metabolic mechanisms.
  • the secreted SOs can be recognized by entities in the input sample, rather than being recognized by other CUs.
  • the CU can be metabolically engineered to produce and regulate the various metabolic factors that can be necessary to produce the SOs from the native metabolic precursors.
  • CUs can affect signals indirectly by producing CUs of the same or different type.
  • the capability of CUs to produce or degrade SOs can be enhanced or attenuated by binding of the SBE to its cognate binding partner.
  • the cognate binding partner can be a SBE associated with at least one CU.
  • the cognate binding partner can be a SBE associated with at least one sample object.
  • the cognate binding partner can be another SO.
  • a sample object can be associated with an SBE and can produce or degrade an SO.
  • the capability of the sample object to produce the SO can be enhanced or attenuated by binding of the SBE to its cognate binding partner.
  • the cognate binding partner can be a SBE associated with at least one CU.
  • the cognate binding partner can be a SBE associated with at least one sample object.
  • the cognate binding partner can be another SO.
  • a CU can be affected by internal or external SOs through interaction with a SBE.
  • a SBE can be specific for a single signal or multi-specific for a subset of signals with variable sensitivity for each member of the subset.
  • a SBE can recognize a signal through physical interactions, e.g., hydrogen bonds, van der Waals forces, hydrophobic interactions, and/or ionic interactions.
  • a SBE can display some change in activity once an interaction is initiated.
  • signal interactions can activate or inhibit, e.g., an enzymatic process of a molecular CU with a SBE moiety and an enzymatic function.
  • a CU can be a protease with a SBE located near its active site, where signal interactions can lead to protease inhibition.
  • signal interactions with a SBE do not necessarily prevent processing of the SO. For example, further processing of the signal interaction by another protease can reverse an inhibitory effect of the original signal.
  • a SBE can be located on the surface of a cellular CU with a signaling moiety located on the cytoplasmic side of the cellular membrane.
  • a SBE can be a surface receptor or a transmembrane receptor protein.
  • a SBE can be a cytoplasmic protein with a binding moiety.
  • a SBE can be a transcription factor with a sensory domain that can recognize membrane permeable SOs.
  • a CU can be configured to respond to an SO in the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium).
  • the CU can be configured to respond to an SO in the medium by enhancing the level of the SO in the medium.
  • the SO can be an internal SO that can be recognized by at least one CU.
  • the SO can be an internal SO that can be recognized by at least one sample object.
  • the SO can be an output SO, and the output signal can comprise the SO level.
  • a CU can produce an SO, and interaction of the SO with the CU can enhance the capability of the CU to produce the SO.
  • the CU can be configured to respond to the SO by producing another SO, wherein interaction of the other SO with another CU can enhance the capability of the other CU to produce the SO.
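As a non-limiting illustration, the self-enhancing behavior described above (an SO whose interaction with the CU increases the CU's capability to produce that SO) can be sketched with a simple discrete-time model. The model form (basal production, saturating feedback, first-order loss) and every parameter value are assumptions chosen for illustration; they are not taken from the disclosure.

```python
def simulate_so_level(steps, basal=0.1, gain=1.0, half_max=0.5,
                      decay=0.5, dt=0.1, s0=0.0):
    """Forward-Euler simulation of an SO level under positive feedback.

    production = basal + gain * s / (half_max + s)   (assumed form)
    loss       = decay * s
    Setting gain=0 removes the feedback and leaves basal production only.
    """
    s = s0
    for _ in range(steps):
        production = basal + gain * s / (half_max + s)
        s += dt * (production - decay * s)
    return s


if __name__ == "__main__":
    without_feedback = simulate_so_level(2000, gain=0.0)
    with_feedback = simulate_so_level(2000, gain=1.0)
    print(f"no feedback: {without_feedback:.3f}")
    print(f"with feedback: {with_feedback:.3f}")
```

Under these assumed parameters, the SO level with feedback settles well above the level reached by basal production alone, which is the qualitative behavior the embodiment describes.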
  • the CUs can form one or more clusters, also referred to herein as a computational cluster, in a medium (i.e., Reaction Reagent + Reaction Medium) comprising the presently described system, upon coming in contact with at least one sample object derived from an input sample.
  • the computational cluster can further comprise at least one sample object derived from the input sample.
  • the computational cluster can comprise one or more CUs, and optionally one or more sample objects.
  • the computational clusters can be formed after a sufficient amount of time in an incubator.
  • the sufficient amount of time can be about 2 hours, about 2.1 hours, about 2.2 hours, about 2.3 hours, about 2.4 hours, about 2.5 hours, about 2.6 hours, about 2.7 hours, about 2.8 hours, about 2.9 hours, about 3 hours, about 3.1 hours, about 3.2 hours, about 3.3 hours, about 3.4 hours, about 3.5 hours, about 3.6 hours, about 3.7 hours, about 3.8 hours, about 3.9 hours, about 4 hours, about 4.1 hours, about 4.2 hours, about 4.3 hours, about 4.4 hours, about 4.5 hours, about 4.6 hours, about 4.7 hours, about 4.8 hours, about 4.9 hours, about 5 hours, about 5.1 hours, about 5.2 hours, about 5.3 hours, about 5.4 hours, about 5.5 hours, about 5.6 hours, about 5.7 hours, about 5.8 hours, about 5.9 hours, or about 6 hours.
  • the sufficient amount of time can be about 3 hours. In some embodiments, the sufficient amount of time can be about 3.5 hours. In some embodiments, the sufficient amount of time can be about 4 hours. In some embodiments, the sufficient amount of time can be about 4.5 hours. In some embodiments, the sufficient amount of time can be about 5 hours.
  • the computational cluster can be formed when the CUs can associate with sample objects by diffusion of the CUs and the sample object in a medium comprising the presently described system. In some embodiments, the computational cluster can be formed when a force can be applied to a medium comprising the presently described system, such that the CUs and the sample objects can be placed in close proximity. In some embodiments, the force can be a centrifugal force. In some embodiments, the centrifugal force can be continuous.
  • the continuous centrifugal force can be applied by twisting the reaction tube around its z-axis between spin-downs, wherein such twisting of the reaction tubes can create a sliding effect along the reaction tube wall, promoting better interactions between Sample Objects and CUs.
  • the force can be a magnetic force.
  • the force can be an electrostatic force.
  • the computational cluster can be formed when the CU is configured to respond to an SO present in a boundary layer around the cluster by producing another SO that is not restricted to the boundary layer.
  • the computational cluster can comprise a first CU, and the presently described system can further comprise a second CU that is not localized to the cluster, wherein the second CU can be configured to degrade an SO produced by the first CU.
  • the CUs of the present disclosure can enable a rich set of behaviors.
  • Standard mathematical notation can be introduced to precisely describe certain orchestrated functionalities of CU compositions.
  • Functions, which can be made up in part of arithmetic operators and logical operators, can be encoded and evaluated by the CUs and signals that interact and self-organize within the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium).
  • the input that drives the computation can be the sample objects derived from an input sample.
  • the computation of the present disclosure can implement a rich set of operations that can include all forms of combinational logic and sequential logic.
  • the fundamental configurations provided below can be the building blocks that can form a functionally complete system of Boolean operations.
  • the internal states of the CUs jointly make up the composite system state that can condition future Boolean operations for sequential logic functions.
  • the presently described systems and the methods of using the systems utilize Boolean logical operations.
  • the logical operator modules of the present disclosure can act as logical circuits that can perform logical operations.
  • a non-limiting exemplary logical operator module can receive one or more input signals (e.g., sample objects) and can generate one or more output signals (e.g., output SOs).
  • a logical operator module can comprise one or more CUs.
  • a logical operator module can comprise at least one sample object and at least one CU.
  • a logical operator module can comprise at least one sample object and two or more CUs.
  • a logical operator module can generate one or more output signals.
  • the methods of the present disclosure can comprise at least one logical operator module.
  • a logical operator module can comprise a YES gate, an AND gate, a NAND gate, an OR gate, a NOR gate, a XOR gate, a XNOR gate, a NOT gate, or any combination thereof.
  • a logical operator module can operate as a YES gate, wherein the YES gate can comprise generating one or more output signals when a first CU can interact with at least one sample object.
  • a logical operator module can operate as an AND gate, wherein the AND gate can comprise generating one or more output signals only when both a first CU and a second CU can interact with at least one sample object.
  • a logical operator module can operate as a NAND gate, wherein the NAND gate can comprise suppressing or diminishing one or more output signals when both a first CU and a second CU can interact with at least one sample object.
  • a logical operator module can operate as an OR gate, wherein the OR gate can comprise generating one or more output signals when either a first CU or a second CU or when both the first CU and the second CU can interact with at least one sample object.
  • a logical operator module can operate as a NOR gate, wherein the NOR gate can comprise generating one or more output signals when neither a first CU nor a second CU interacts with at least one sample object.
  • a logical operator module can operate as a XOR gate, wherein the XOR gate can comprise generating one or more output signals when either a first CU or a second CU but not both CUs can interact with at least one sample object.
  • a logical operator module can operate as a XNOR gate, wherein the XNOR gate can comprise generating one or more output signals when either both a first CU and a second CU can interact with at least one sample object or neither the first CU nor the second CU interacts with at least one sample object.
  • a logical operator module can operate as a NOT gate, wherein the NOT gate can comprise suppressing or diminishing one or more output signals when a first CU can interact with at least one sample object.
  • Negation can be an essential aspect of any general Boolean network since it can allow for recognition of absent SBEs. Negated implication can produce an output that is true only if the first input is true and the second input is false. To compute a negated implication, the configurations can be further extended with signal degrading objects. In some embodiments, signal degradation can be achieved by target associated CUs. This can comprise secretion of signal degrading enzymes or display of signal degrading enzymes by target associated CUs.
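As a non-limiting illustration, the gate behaviors listed above can be summarized as truth tables. In this sketch, input `a` stands for "a first CU interacts with at least one sample object" and input `b` for "a second CU interacts with at least one sample object"; treating YES and NOT as single-input gates that read only `a`, and the name `NIMPLY` for negated implication, are assumptions made for the example. The sketch models only the Boolean relationship, not the underlying biochemistry.

```python
from itertools import product

# Each gate maps (a, b) to True when an output signal is generated.
GATES = {
    "YES": lambda a, b: a,
    "NOT": lambda a, b: not a,
    "AND": lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
    "OR": lambda a, b: a or b,
    "NOR": lambda a, b: not (a or b),
    "XOR": lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
    # Negated implication: true only if a is true and b is false.
    "NIMPLY": lambda a, b: a and not b,
}


def truth_table(name):
    """Return {(a, b): output} for the named gate over all input pairs."""
    gate = GATES[name]
    return {(a, b): bool(gate(a, b))
            for a, b in product([False, True], repeat=2)}


if __name__ == "__main__":
    for name in GATES:
        row = ", ".join(f"{int(a)}{int(b)}->{int(out)}"
                        for (a, b), out in truth_table(name).items())
        print(f"{name:7s} {row}")
```

Composite modules can be modeled by feeding one gate's output into another gate's input, which mirrors how an output SO of one module can serve as an input signal to another.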
  • FIG. 6 is a non-limiting model architecture schematic for the design and training of transformer models for omics analysis, according to one or more embodiments herein.
  • a transformer-based model can be selected and trained for spatial profiling of single-cell transcriptomes.
  • other models can be used, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or graph neural networks (GNNs), depending on the specific requirements and characteristics of the data.
  • the training data construction process for the transformer-based model can involve several steps.
  • the GSE158055 dataset, comprising Unique Molecular Identifier (UMI) vectors from blood samples of COVID-19 patients, can be utilized to generate the training dataset.
  • each training input can be generated from one scRNA-seq sample to focus on biological differences. Generating training inputs from individual scRNA-seq samples can ensure that the model learns from the inherent biological variability present within each sample rather than confounding batch effects that can arise if multiple samples were mixed. This approach can allow the model to identify meaningful patterns and relationships between cells based on their gene expression profiles.
  • Batches of training inputs can be created, each including UMI vectors of K cells, assignments of N computing units (CUs) to these cells, and UMI vectors generated for each CU.
  • the assignment of CUs to cells can be represented by an adjacency matrix, ensuring random, independent, and equal probability assignment of UMIs to CUs.
  • different methods of CU assignment and sampling can be employed to explore alternative training dynamics.
  • Validation and test inputs can be similarly constructed to maintain consistency.
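As a non-limiting illustration, one training input of the kind described above can be sketched as follows. The sketch assumes that each of N CUs is assigned to one of K cells uniformly at random, that each UMI of a cell is handed independently and with equal probability to one of the CUs assigned to that cell, and that two CUs are marked adjacent (binary value 1) when they share a cell. The function name, data layout, and toy counts are hypothetical, and only the Python standard library is used.

```python
import random


def build_training_input(cell_umis, n_cus, seed=0):
    """Build CU-level UMI vectors and a CU-by-CU adjacency matrix.

    cell_umis: list of K per-cell UMI count vectors (equal length).
    Returns (cu_umis, cu_to_cell, adjacency).
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(cell_umis), len(cell_umis[0])

    # Assign every CU to a cell uniformly at random.
    cu_to_cell = [rng.randrange(n_cells) for _ in range(n_cus)]
    cus_of_cell = {}
    for cu, cell in enumerate(cu_to_cell):
        cus_of_cell.setdefault(cell, []).append(cu)

    # Distribute each UMI of a cell to one of its CUs, independently
    # and with equal probability. Cells with no CU contribute nothing.
    cu_umis = [[0] * n_genes for _ in range(n_cus)]
    for cell, cus in cus_of_cell.items():
        for gene, count in enumerate(cell_umis[cell]):
            for _ in range(count):
                cu_umis[rng.choice(cus)][gene] += 1

    # Binary adjacency: 1 when two distinct CUs share the same cell.
    adjacency = [[int(i != j and cu_to_cell[i] == cu_to_cell[j])
                  for j in range(n_cus)] for i in range(n_cus)]
    return cu_umis, cu_to_cell, adjacency


if __name__ == "__main__":
    toy_cells = [[3, 0, 2], [1, 4, 0]]  # made-up counts, K=2 cells, 3 genes
    cu_umis, cu_to_cell, adjacency = build_training_input(toy_cells, n_cus=5, seed=1)
    print(cu_to_cell)
    print(cu_umis)
    print(adjacency)
```

Validation and test inputs could be produced by calling the same function on held-out samples, which keeps the structure of the UMI matrices and adjacency matrices identical across splits.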
  • the construction of training data can involve generating training inputs that can include UMI matrices and can be associated with adjacency matrices to represent the assignment of computing units (CUs) to cells.
  • each training input can include a UMI matrix representing UMI counts for the number of interactions between a predetermined number of computing units (CUs) and a plurality of biological molecules.
  • the training input can be associated with an adjacency matrix that represents known proximal relationships among CUs.
  • the UMI matrices representing UMI counts for CUs can be used as inputs.
  • the adjacency matrix can indicate spatial relationships between CUs, which can then be used to derive spatial relationship between sample objects (e.g., cells).
  • the use of binary values in the adjacency matrix can ensure clarity in defining these interactions.
  • the adjacency matrix can be constructed to maintain consistency across training, validation, and test inputs. This consistency can be crucial for ensuring that the model training, validation, and testing processes can be comparable and reliable.
  • the random and independent assignment of UMIs to CUs can be employed to prevent bias and ensure that the training inputs are representative of the underlying biological data.
  • different methods of CU assignment and sampling can be explored to investigate alternative training dynamics. For example, weighted random assignment based on cell size or type, or stratified sampling to ensure diverse representation of cell types and states, can be utilized. These alternative methods can provide insights into the impact of different training dynamics on the model’s performance.
  • validation and test inputs can be constructed similarly to the training inputs to maintain consistency. This can include using the same pre-processing steps for the raw data, ensuring random and independent assignment of UMIs to CUs, and maintaining the same structure for UMI matrices and the associated adjacency matrices. This consistency can be essential for accurately evaluating the model’s performance and ensuring its generalizability.
  • post-processing and quality control measures can be applied to validate the quality of the constructed training inputs. For example, checking the adjacency matrices for correctness in representing cell interactions, verifying the randomness and independence of UMI assignments, and ensuring the training inputs accurately reflect the biological and spatial characteristics of the dataset can be part of these measures.
  • the constructed training, validation, and test datasets can be maintained in a structured format for easy access and reproducibility.
  • the numbers K and N can be predetermined.
  • K represents the number of sample objects.
  • N represents the number of computing units (CUs).
  • the UMI vector $X_i \in \mathbb{R}^{G}$, where $X_{ij}$ can be equal to the number of UMIs corresponding to gene $j$ attributed to computing unit $i$, can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned. This can be done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that:
  • each UMI can be attributed to a single computing unit
  • each UMI can have equal probability of being attributed to each relevant computing unit.
  • the UMI vectors of sample objects and computing units can also be expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively.
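  • The random partitioning of sample object UMIs among computing units described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names `assign_cus` and `partition_umis` are hypothetical, NumPy is assumed, and surjectivity is enforced here by first giving every sample object one CU, which is only one of several possible ways to do it.

```python
import numpy as np

def assign_cus(n_cus, n_cells, rng):
    """Assign each CU to exactly one cell so that every cell receives at least one CU.

    Assumption: surjectivity is obtained by seeding one CU per cell and placing
    the remaining CUs uniformly at random.
    """
    if n_cus < n_cells:
        raise ValueError("need at least as many CUs as cells")
    extra = rng.integers(0, n_cells, n_cus - n_cells)
    return rng.permutation(np.concatenate([np.arange(n_cells), extra]))

def partition_umis(T, assignment, rng):
    """Split the UMI counts T (K x G) of each cell among its CUs, giving X (N x G).

    Each UMI goes to exactly one CU, independently of the others, with equal
    probability for every CU assigned to that cell (a multinomial draw).
    """
    n_cus, n_genes = len(assignment), T.shape[1]
    X = np.zeros((n_cus, n_genes), dtype=np.int64)
    for k in range(T.shape[0]):
        cus = np.flatnonzero(assignment == k)
        if cus.size == 0:
            continue
        p = np.full(cus.size, 1.0 / cus.size)
        for g in range(n_genes):
            X[cus, g] = rng.multinomial(int(T[k, g]), p)
    return X
```

  • Because every UMI is attributed to exactly one CU, summing the rows of X that belong to one sample object recovers that sample object's row of T.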
  • a training input can include three elements: (a) UMI vectors of K cells (sample objects) randomly selected from the training dataset, (b) the assignment of N computing units to K sample objects, and (c) UMI vectors generated for each computing unit.
  • the assignment of computing units to sample objects can differ from that described above, in which case the assignment need no longer be surjective.
  • the numbers K and N can be predetermined.
  • the sample objects can be randomly connected with each other, described by an adjacency matrix representing these connections.
  • the assignment of computing units to sample objects can be done in two steps: first, selecting one sample object for each computing unit with equal probability; and second, partitioning the computing units assigned to the same sample object into multiple sets, which can also be assigned to neighboring sample objects.
  • the UMI vector for each computing unit can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned, ensuring each UMI can be attributed to a single computing unit independently and with equal probability.
  • the UMI vectors of sample objects and computing units can be also expressed as matrices.
  • validation and test inputs can be constructed from their respective datasets to maintain consistency and reliability in evaluating the model’s performance.
  • the numbers K and N can be fixed.
  • the assignment of computing units to sample objects can be described by the adjacency matrix $Y \in \mathbb{R}^{N \times N}$, where $Y_{ij} = 1$ if computing unit $i$ and computing unit $j$ are assigned to the same sample object and $Y_{ij} = 0$ otherwise.
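  • As an illustration of this definition, the CU adjacency matrix can be built directly from a vector that lists, for each computing unit, the sample object it is assigned to. The function name `cu_adjacency` and the vector representation are assumptions made for this sketch.

```python
import numpy as np

def cu_adjacency(assignment):
    """Return Y (N x N) with Y[i, j] = 1 when CUs i and j share a sample object."""
    a = np.asarray(assignment)
    return (a[:, None] == a[None, :]).astype(np.int8)

# Five CUs: CUs 0 and 1 on sample object 0, CUs 2 and 4 on object 1, CU 3 on object 2.
Y = cu_adjacency([0, 0, 1, 2, 1])
```

  • Under this definition Y is symmetric and its diagonal entries equal 1, since every CU trivially shares a sample object with itself.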
  • the assignment can be calculated in two steps. First, with equal probability, exactly 1 sample object can be selected for each computing unit.
  • the UMI vector $X_i \in \mathbb{R}^{G}$ can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned. This can be done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that (a) each UMI can be attributed to a single computing unit, (b) the attribution of any one UMI can be independent of all other UMIs, and (c) the probability of a UMI being attributed to a CU can be equal to , where c can be the number of sample objects to which the CU can be assigned and a can be the normalization factor.
  • the UMI vectors of sample objects and CUs can also be expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively.
  • the model architecture for subcellular resolution by associative lattice can include multiple components designed to process and analyze the UMI matrices and adjacency matrices associated with the training inputs.
  • the architecture can utilize transformer blocks, sigmoid activation functions, and binary cross-entropy loss functions to achieve high accuracy in recognizing biological molecule patterns and spatial relationships.
  • the input layer can accept UMI matrices X and optionally estimated adjacency matrices Y as inputs.
  • the adjacency matrix estimate can be omitted.
  • the UMI matrix $X \in \mathbb{R}^{N \times G}$ can represent the UMI counts for the interactions between computing units (CUs) and biological molecules, while the adjacency matrix Y can capture the proximal relationships among CUs.
  • the model can include an embedding layer to transform the high-dimensional UMI vectors into lower-dimensional representations. This step can involve the use of linear transformations or other dimensionality reduction techniques.
  • the core of the model architecture can consist of multiple transformer blocks.
  • Each transformer block can include self-attention mechanisms and feed-forward neural networks.
  • the self-attention mechanisms can allow the model to weigh the importance of each element in the input sequence, capturing the complex dependencies between CUs and their associated biological molecules.
  • each transformer block can compute attention scores based on the UMI matrix and the adjacency matrix, facilitating the recognition of spatial relationships and molecular interactions.
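  • One way such attention scores could combine CU embeddings with an adjacency estimate is sketched below as single-head scaled dot-product self-attention with an additive adjacency bias. The additive bias, the `bias_scale` parameter, the projection matrices, and the function names are all assumptions for illustration; the disclosure does not specify how the adjacency matrix enters the attention computation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adjacency_biased_attention(H, Wq, Wk, Wv, Y_est=None, bias_scale=1.0):
    """Single-head self-attention over N CU embeddings H (N x d).

    Y_est is an optional N x N adjacency estimate; when given, it is added to
    the attention scores so that CUs believed to be proximal attend to each
    other more strongly (an assumed design choice).
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if Y_est is not None:
        scores = scores + bias_scale * Y_est
    A = softmax(scores, axis=-1)   # each row sums to 1
    return A @ V, A
```

  • When the adjacency estimate is omitted, the function reduces to ordinary scaled dot-product self-attention over the embedded UMI vectors.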
  • the model can utilize sigmoid activation functions to introduce non-linearity into the network.
  • the sigmoid function can map the input values to the range (0, 1), making it suitable for binary classification tasks, such as determining the presence or absence of specific molecular interactions.
  • the output layer can produce the final predictions based on the processed input data.
  • the output can be a binary classification indicating whether a specific interaction or spatial relationship exists.
  • the output layer can use a sigmoid activation function to generate probabilities, which can then be thresholded to produce binary outcomes.
  • the model can be trained using a binary cross-entropy loss function. This loss function can measure the difference between the predicted probabilities and the actual binary labels, guiding the optimization process to minimize prediction errors.
  • the loss function can be defined as:
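  • For reference, the standard form of binary cross-entropy over the entries of an $N \times N$ adjacency matrix is given below. The notation is an assumption made for this sketch and may differ from the definition intended in this disclosure: $Y_{ij} \in \{0, 1\}$ denotes the true label for the pair of computing units $(i, j)$, $\hat{Y}_{ij} \in (0, 1)$ denotes the sigmoid output of the model for that pair, and all pairs are weighted equally.

```latex
\mathcal{L}(Y, \hat{Y}) = -\frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N}
\Big[ Y_{ij} \log \hat{Y}_{ij} + (1 - Y_{ij}) \log\big(1 - \hat{Y}_{ij}\big) \Big]
```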
  • post-processing steps can be applied to the output of the machine learning models to improve the accuracy and interpretability of the results. These steps can validate the model’s predictions and make the findings accessible and useful for further biological analysis.
  • the model output can be further processed to obtain estimates of sample object UMI vectors.
  • the output matrix can be used to partition the set of computing units into a number of disjoint sets. This partition can be represented by the estimated adjacency matrix. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster.
  • Post-processing steps can include verification of the accuracy of the model’s predictions. This can involve comparing the predicted interactions with known biological data or experimental results to assess the model’s performance. In some embodiments, statistical measures such as precision and recall can be used to evaluate the accuracy of the predictions. These metrics can help in identifying any discrepancies and refining the model further. In some embodiments, post-processing can involve consistency checks to verify that the model’s predictions can be reproducible and stable across different datasets or experimental conditions. Post-processing can culminate in the generation of comprehensive reports that document the findings.
  • reports can include detailed descriptions of the predicted interactions and clusters.
  • the reports can also include annotations and interpretations of the findings, providing context and insights into the biological significance of the results.
  • the processed data can be stored in a structured format to ensure easy access and reproducibility. This can include maintaining records of the post-processing steps, parameters used, and the resulting outputs.
  • the data can be made accessible through online platforms or databases, facilitating further analysis and collaboration among researchers.
  • the framework can enhance the accuracy and interpretability of subcellular interactions and spatial relationships, providing valuable insights into the underlying biological processes.
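  • The precision and recall checks mentioned among the post-processing steps can be sketched for pairwise CU predictions as follows. The function name `pairwise_precision_recall` is hypothetical, and counting each unordered CU pair once (upper triangle, diagonal excluded) is an assumption of this sketch.

```python
import numpy as np

def pairwise_precision_recall(Y_true, Y_pred):
    """Precision and recall of predicted CU-CU links against known links.

    Both inputs are binary N x N adjacency matrices. Only pairs i < j are
    counted, so that symmetric entries and the diagonal do not inflate the result.
    """
    iu = np.triu_indices_from(Y_true, k=1)
    t = Y_true[iu].astype(bool)
    p = Y_pred[iu].astype(bool)
    tp = int(np.sum(t & p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```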
  • the purpose of the clustering can be to obtain more relevant biological information regarding the sample objects.
  • relative gene expression of two or more genes can change when UMIs from computing units belonging to a single cluster can be considered in aggregate.
  • other ways of interpreting the data can include predicting the number of cells, as the number of cells can also be an output of the model. Training the model can provide both regression for the number of clusters and predictions for the number of clusters. Therefore, the output can include the model output and the number of cells. This post-processing approach is illustrated in Example 6.
  • the model output $Y$ can be further processed to obtain estimates of sample object UMI vectors $\hat{T} \in \mathbb{R}^{K \times G}$.
  • the output matrix $Y$ can be used to partition the set of computing units into $K$ disjoint sets. This partition can be represented by the estimated adjacency matrix $Y \in \mathbb{R}^{N \times N}$.
  • the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can either be known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster.
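  • The partitioning step can be sketched with SciPy's hierarchical clustering. Several points are assumptions of this sketch rather than statements of the disclosure: "full linkage" is read as complete linkage, the model output is converted to a dissimilarity as one minus the symmetrized output, the number of clusters is taken as known a priori, and the function name `cluster_cus` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_cus(Y_out, n_clusters):
    """Partition N CUs into at most n_clusters disjoint sets from an N x N output matrix.

    Returns the cluster label of each CU and the binary adjacency matrix
    that represents the resulting partition.
    """
    S = 0.5 * (Y_out + Y_out.T)          # symmetrize the pairwise scores
    D = 1.0 - S                          # assumed dissimilarity: high score = close
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="complete")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust") - 1
    Y_est = (labels[:, None] == labels[None, :]).astype(np.int8)
    return labels, Y_est
```

  • If the number of clusters is not known, the same linkage tree can be cut at several levels and the cut chosen by a secondary criterion such as the average number of CUs per cluster.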
  • the UMI vectors of computing units in cluster $i$ can be summed into a single vector $\hat{T}_i$.
  • the relative reconstruction error (RRE) can be calculated as a measure of the reconstruction quality.
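  • The aggregation of CU UMI vectors per cluster and a reconstruction error can be sketched as follows. The disclosure does not give the RRE formula in this passage, so the definition used here (Euclidean norm of the difference divided by the norm of the true vector) is an assumption, as are the function names and the premise that cluster $i$ has already been matched to sample object $i$.

```python
import numpy as np

def aggregate_clusters(X, labels, n_clusters):
    """Sum the UMI vectors X (N x G) of the CUs in each cluster into T_hat (K x G)."""
    T_hat = np.zeros((n_clusters, X.shape[1]), dtype=X.dtype)
    np.add.at(T_hat, labels, X)          # unbuffered add handles repeated labels
    return T_hat

def relative_reconstruction_error(T_true, T_hat):
    """Assumed RRE per sample object: ||T_hat_i - T_i|| / ||T_i||."""
    num = np.linalg.norm(T_hat - T_true, axis=1)
    den = np.linalg.norm(T_true, axis=1)
    return num / np.maximum(den, 1e-12)  # guard against empty sample objects
```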
  • the model output can be further processed to estimate the relationships between sample objects.
  • the set of computing units (CUs) can be partitioned into a predetermined number of disjoint sets using clustering techniques. This partitioning can be represented by an estimated adjacency matrix.
  • the clustering can be performed by agglomerative clustering, where the number of clusters can be determined by a predefined criterion such as the number of computing units per cluster.
  • a heuristic approach can then be used to compute the estimated sample object adjacency matrix, which can indicate the relationships between these disjoint sets.
  • the adjacency matrix can represent the likelihood that sets of computing units are assigned to the same sample object based on their relationships.
  • a ROC curve can be generated, indicating the model’s performance in predicting connections between sample objects.
  • the optimal threshold for maximizing the accuracy of these predictions can be identified, enhancing the model’s ability to accurately reflect the underlying biological interactions.
  • a model output Y can be further processed to obtain the estimated Sample Object adjacency matrix.
  • the output matrix $Y$ can be used to partition the set of computing units into $K$ disjoint sets. This partition can be represented by the estimated adjacency matrix $Y \in \mathbb{R}^{N \times N}$.
  • the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster.
  • a simple heuristic can then be used to compute the estimated Sample Object adjacency matrix $W \in \mathbb{R}^{K \times K}$.
  • $W_{ij} = 1$ where the summation can be over all Computing Units $l_i$ in the $i$th set and Computing Units $m_j$ in the $j$th set, and $\beta > 0$.
  • the ROC curve of the prediction of edges between sample objects can then be analyzed, with a mark showing the optimal value of $\beta$ maximizing the accuracy of predicting whether a given pair of sample objects are connected or not.
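  • A threshold heuristic of this kind, and the search for the accuracy-maximizing threshold, can be sketched as follows. The exact expression is not recoverable from this passage, so the rule used here is an assumption: two sets are declared connected when the model output summed over all cross-set CU pairs exceeds the threshold. The function names and the choice of candidate thresholds (midpoints between observed scores) are also hypothetical.

```python
import numpy as np

def set_scores(Y_out, labels, n_sets):
    """S[i, j] = sum of Y_out[l, m] over CUs l in set i and CUs m in set j."""
    M = np.zeros((len(labels), n_sets))
    M[np.arange(len(labels)), labels] = 1.0      # one-hot set membership
    return M.T @ Y_out @ M

def best_threshold(S, W_true):
    """Sweep the threshold and return (beta, accuracy) maximizing edge accuracy.

    Only pairs of distinct sets (i < j) are scored. Assumed rule: an edge is
    predicted when the summed score is greater than beta.
    """
    iu = np.triu_indices_from(S, k=1)
    s, w = S[iu], W_true[iu].astype(bool)
    u = np.unique(s)
    candidates = (u[:-1] + u[1:]) / 2.0 if u.size > 1 else u
    accs = np.array([np.mean((s > b) == w) for b in candidates])
    best = int(np.argmax(accs))
    return float(candidates[best]), float(accs[best])
```

  • Recording the true positive rate and false positive rate at each candidate threshold, instead of the accuracy alone, would yield the points of the ROC curve.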
  • a model output Y can be further processed to obtain the estimated sample object adjacency matrix.
  • the output matrix Y can be used to partition the set of computing units into K disjoint sets. This partition can be represented by the estimated adjacency matrix.
  • the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster.
  • a simple heuristic can then be used to compute the estimated sample object adjacency matrix.
  • the ROC curve of the prediction of edges between sample objects can then be analyzed, with a mark showing the optimal threshold value that maximizes the accuracy of predicting whether a given pair of sample objects are connected or not.
  • data validation steps can be incorporated to ensure the accuracy and reliability of the machine learning model’s performance.
  • Data validation can involve checking the integrity, consistency, and accuracy of the data used for training, validation, and testing the model.
  • the distribution of UMI counts per sample object across the validation dataset can be examined, with a median UMI count per sample object and a total number of cells. This distribution can provide insight into the overall molecular content captured within the dataset, which is essential for assessing the robustness and representativeness of the model.
  • the distribution of UMIs allocated to each computing unit across the validation inputs can be analyzed.
  • the significantly lower number of UMIs per computing unit (with 95% of computing units having less than 240 UMIs) compared to the number of UMIs per sample object can indicate the low informational content and sparsity of the computing unit UMI vectors. This sparsity can impact the model’s ability to accurately capture and reconstruct the underlying biological information.
  • the distribution of the relative reconstruction error (RRE) per sample object across the validation inputs can also be evaluated.
  • Example 5 provides a visual representation of the data validation results, illustrating the distribution of UMI counts, UMIs per computing unit, and relative reconstruction error (RRE) across the validation dataset.
  • FIG. 9 provides an exemplary visualization of the cell type confusion matrix associated with the input data from, for example, the dataset GSE158055, which includes a broad set of cell annotations, allowing for a comprehensive analysis of cell type classification accuracy.
  • the confusion matrix can indicate the probability that a computing unit from a sample object on the y-axis is estimated to be from a sample object on the x-axis.
  • Diagonal components of the matrix can list the probabilities that the origin of computing units is correctly estimated within the given cell type. High values along the diagonal indicate a high accuracy of correctly identifying the cell type of the computing units, reflecting the model’s ability to accurately classify cell types based on the input data.
  • Non-zero, off-diagonal components of the confusion matrix illustrate mismatches between cell types.
  • NK cells and CD8 T cells can lead to misclassifications, due to errors in the estimated CU adjacency matrix Y, which can be represented by the off-diagonal values. These mismatches highlight areas where the model can need improvement in distinguishing between similar cell types.
  • the matrix can provide a clear and quantifiable measure of the model’s performance in classifying cell types, which is crucial for validating the accuracy and reliability of the model’s predictions.
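  • A row-normalized confusion matrix of the kind described can be computed as sketched below. The function name `cell_type_confusion`, the integer encoding of cell types, and the per-CU inputs (the cell type of the sample object a CU truly came from, and the cell type of the sample object it was estimated to come from) are assumptions made for illustration.

```python
import numpy as np

def cell_type_confusion(true_types, est_types, n_types):
    """Return P (n_types x n_types), where P[a, b] is the fraction of CUs from
    cell type a (rows, y-axis) estimated to come from cell type b (columns, x-axis)."""
    C = np.zeros((n_types, n_types))
    np.add.at(C, (true_types, est_types), 1.0)   # count CUs per (true, estimated) pair
    row = C.sum(axis=1, keepdims=True)
    # Rows with no CUs stay at zero instead of dividing by zero.
    return np.divide(C, row, out=np.zeros_like(C), where=row > 0)
```

  • Each non-empty row sums to 1, so the diagonal entries can be read directly as per-cell-type probabilities of correct estimation.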
  • FIG. 13 is another exemplary confusion matrix, listing statistics of computing unit associations for samples comprising B lymphocytes.
  • the results shown in Example 7 illustrate that computing units in a highly connected cluster can be likely associated with the same cell, validating the model’s accuracy in predicting cell-cell interactions. They also demonstrate how the model can use mRNA transcript data to associate computing units with their respective sample objects, thereby providing insights into cell-cell interactions and improving the estimation of mRNA transcripts for individual cells in the sample.
  • a system for biological computing including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object.
  • a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times.
  • the first number and the third number can be indicative of a characteristic of the first sample object
  • the second number and the fourth number can be indicative of a characteristic of the second sample object.
  • the CU of the plurality of CUs can further display a second set of at least one SBE.
  • the system can further include (c) a plurality of molecular tags, which are described in detail above.
  • the plurality of molecular tags can associate with the second set of at least one SBE.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the molecular tag can further include a recognition element.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain.
  • the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
  • the system can further include a strand displacing polymerase.
  • the CU of the plurality of CUs can be engineered to display the first set of at least one SBE and the second set of at least one SBE.
  • a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE
  • a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE
  • a system for biological computing of the presently disclosed methods comprising: (a) at least one sample object derived from an input sample; (b) a Reaction Reagent comprising at least one computing unit (CU) configured to interact with the at least one sample object; (c) a Reaction Medium; and (d) a readout reagent comprising a reporter entity and a reporter buffer.
  • the system can further comprise one or more evaluation strips and a plate adapter. In some embodiments, the system can further comprise one or more 96-well plates and one or more microcentrifuge tubes. In some embodiments, the system can further comprise a thermocycler. In some embodiments, the system can further comprise a plate reader.
  • reactions can be analyzed separately or in bulk on any plate reader with readout reagent (e.g., luminescence) functionality.
  • the plate reader with luminescence functionality can have a sensitivity of about 100 amol ATP.
  • the read setting of the plate reader with luminescence functionality can be endpoint/kinetic read type, luminescence fiber optics type, 255 gain, 1 s integration time, autoadjust or maximum read height, custom (defined by user) plate type, and 5 reads (average used as the final readout values).
  • the presently described systems can maintain single target sensitivity in the presence of high background. In some embodiments, the presently described systems can exhibit exceptional false positive rate characteristics. In some embodiments, the presently described system can be applied in targeted sequestering of nucleic acids for multiomic data collection. In some embodiments, the presently described system can be used for inducible targeted sequestering of RNA on computing units. See Example 2. In some embodiments, the presently described system can be used for hashed transcriptomic readout of target objects. See Example 3.
  • the present disclosure implements a computing device.
  • the presently disclosed methods can be organized in a computing architecture that can cast broad terms from computer science into the device domain.
  • Standard computer architectures e.g., von Neumann or Harvard architectures
  • More detailed information on the biological computing architecture and its implementation comprising the presently described entities is described in International Patent Application No. PCT/US2019/50068 and U.S. Patent Publication No. US2021-0319279, each of which is incorporated by reference in its entirety for all purposes.
  • a system for spatial profiling an input sample including: (a) the input sample comprising a plurality of sample objects comprising a plurality of biological molecules; (b) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (c) a plurality of molecular tags; and (d) a computer implemented method, which is described in detail supra.
  • the system can further include a sequencing method.
  • a clustering reaction can be used to ensure SBE recognition can be completed for biological computing.
  • object sizes e.g., cellular CUs
  • exogenous forces can be necessary to achieve sufficient mixing of objects. For example, a 1 μm particle can diffuse 1000 times slower than a 1 nm particle.
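  • The thousand-fold difference follows from the Stokes-Einstein relation, in which the diffusion coefficient of a spherical particle is inversely proportional to its radius. The short calculation below illustrates this; the use of the Stokes-Einstein relation, the spherical-particle assumption, and the default temperature and viscosity (water at about 25 °C) are assumptions of this sketch and are not stated in the disclosure.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def diffusion_coefficient(radius_m, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein diffusion coefficient (m^2/s) of a sphere in a viscous fluid."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)

# Particles of 1 nm and 1 um diameter (radii 0.5 nm and 0.5 um).
d_small = diffusion_coefficient(0.5e-9)
d_large = diffusion_coefficient(0.5e-6)
ratio = d_small / d_large   # depends only on the size ratio, here 1000
```

  • The ratio is independent of temperature and viscosity, which is why the size difference alone motivates forced mixing for cell-sized objects.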
  • the clustering reaction can use other mechanisms besides diffusion to promote interaction between SBEs.
  • the clustering reaction must minimize the effects that these mechanisms have on signaling. Forced interaction between objects can resemble object co-localization, which can be the basis for most computational operations. Hence, the clustering reaction must also implement mechanisms that prevent signaling between objects.
  • the clustering method of the present disclosure can simultaneously force interaction and block signaling.
  • system objects i.e., sample objects and CUs
  • Combination of objects can be done all at once or progressively.
  • the reaction can commence with resuspension of objects in the medium of the present disclosure, which can be optimized for specific clustering (e.g., low viscosity, high ion content, containing blocking agents or solubility agents).
  • the medium can also contain signal blockers (e.g., binding compounds or signal degrading enzymes) or metabolic suppressors (e.g., translation inhibitors).
  • the medium can be depleted of essential metabolites (e.g., amino acids, carbon sources) required for high levels of signal production.
  • the reaction can proceed by forcing object interaction (e.g., hydrodynamic mixing, electromagnetic manipulation, mechanical compression).
  • object interaction e.g., hydrodynamic mixing, electromagnetic manipulation, mechanical compression.
  • object properties e.g., hydrodynamic mixing, electromagnetic manipulation, mechanical compression.
  • Hydrodynamic mixing can be appropriate in mixtures comprising objects of various masses or size.
  • Electromagnetic manipulation can be applicable in mixtures where target objects or CUs can be magnetizable.
  • Objects can be magnetized by linkage (e.g., covalent or non-covalent bonding) to magnetizable entities or by loading (e.g., by electroporation, diffusion, carrier particles) the objects with magnetizable particles.
  • the clustering reaction can be performed in a one-time batch process, continuously throughout the computing process (e.g., in a reactor), or repeated at multiple times during the computing process. In each execution, the results of the reaction can be clusters that can co-localize CUs.
  • the signal production can be binary, that is, either a signal can be transmitted or signal can be not transmitted.
  • the signaling processes e.g., signal production, signal degradation, signal recognition, state transition, etc.
  • the evaluation reaction can initiate or terminate signal production and regulate chemical or spatial conditions. The primary goal can be to protect cluster integrity and to ensure completion of relevant physiological and enzymatic processes.
  • the evaluation and clustering reactions can occur simultaneously.
  • the evaluation reaction can be explicitly initiated following the clustering reaction.
  • the two reactions can share the same medium but remain separate through changes in temperature, object density, or illumination.
  • separation of the two stages can include a medium exchange.
  • the medium of the clustering reaction can be optimized for interaction (e.g., low viscosity, increased ion content, added blocking entities and solubility agents)
  • the medium of the evaluation reaction can include enriched signaling precursors.
  • the medium can be enriched in metabolites that can support signal production and signal recognition.
  • synthetic growth media can be optimized for CU metabolisms and enriched with inducers of wild type or recombinant cellular systems.
  • the properties of the media (e.g., pH, salt content) can be optimized for buffering and for any extracellular enzymatic processes (e.g., proteolysis, hydrolysis).
  • Spatial arrangement of the evaluation reaction can affect diffusion. Convective diffusion can weaken signal strength, requiring more sensitive SBEs that can be more error prone. Hence, the evaluation reaction can take precautions to limit object motion. For instance, medium viscosity can be increased by addition of certain polysaccharides or other polymers. In addition, the additives can be cross-linked, forming a structured matrix that can trap and immobilize system objects. The result can be greater signal accumulation and decreased signal transmission between clusters.
  • the system can further comprise an agent that can increase the viscosity of a medium comprising the system.
  • the agent can be a polymer.
  • the polymer can be a polysaccharide.
  • the agent can be cross-linked to form a matrix configured to immobilize one or more of the system components.
  • kits for biological computing including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; and (b) an instruction for use of the kit.
  • the kit can further include (c) a plurality of molecular tags, which are explained in greater detail above.
  • kits for biological computing including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; (b) one or more reaction tubes including Reaction Reagent and one or more master tubes including Reaction Medium; and (c) an instruction for use of the kit.
  • the kit can also include Reaction Reagent and one or more master tubes containing Reaction Medium.
  • the CU of the plurality of CUs can be further engineered to display a second set of at least one SBE.
  • a molecular tag of the plurality of molecular tags can include a hash element and a priming element.
  • the priming element can be a random N-mer.
  • the molecular tag can further include a recognition element.
  • the molecular tag can further include a sequencing element.
  • the molecular tag can further include a unique protein binding domain.
  • the unique protein binding domain can be an antibody binding domain.
  • the antibody binding domain can be protein G.
  • the kit can further include a strand displacing polymerase.
  • kits that contain any of the above-described system components and compositions described herein.
  • a kit for biological computing comprising: (a) one or more reaction tubes comprising Reaction Reagent; (b) one or more master tubes comprising Reaction Medium; (c) a readout reagent comprising a reporter entity and a reporter buffer; and (d) an instruction for use of the kit.
  • the kit can further comprise one or more evaluation strips and a plate adapter.
  • the kit can further comprise a blank sample.
  • the blank sample can be prepared with the Reaction Medium and the Reaction Reagent of the present disclosure and can constitute a reaction tube with resuspension buffer used in place of a test sample.
  • sample readout signal can be normalized using the following formula:
  • the plate adapter can accommodate up to 12 evaluation strips at once.
  • prior to first use, the plate adapter can be defined as a standard 96-well plate with adjusted parameters of 22000 μm plate height and 5000 μm well diameter.
  • quantity of Reaction Reagent in the reaction tube can determine the processing power of a reaction.
  • the active component of the presently described kit is supplied in dried form in the reaction tube.
  • the kit can contain evaluation strips.
  • the evaluation strips can be a set of standard PCR strips.
  • up to 8 tubes can be used to analyze a reaction.
  • black and white PCR strips can alternatively be used, but the read settings may need to be adjusted to account for the shift in signal intensity.
  • kits for spatial profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules
  • the kit comprising: (a) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (b) a plurality of molecular tags; and (c) an instruction for use of the kit.
  • the kit can further include an instruction for analysis using a computer implemented method, which is described in detail supra.
  • the instructions for practicing the presently described methods can be generally recorded on a suitable recording medium.
  • the instructions can be printed on a substrate, such as paper or plastic, etc.
  • the instructions can be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging), etc.
  • the instructions can be present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc.
  • the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source (e.g. via the Internet), can be provided.
  • An example of this embodiment can be a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions can be recorded on a suitable substrate.
  • This experiment is performed for targeted transcriptome readout with the presently described systems and methods.
  • a non-limiting exemplary work-flow of this experiment is described in FIG. 1.
  • Yeast cells are engineered to comprise at least one logical operator module as presently described herein, wherein the at least one logical operator module recognizes a given marker combination.
  • the yeast cells are further engineered to conditionally display either an RNA or DNA binding protein based on the activation of the at least one logical operator module.
  • One or more reaction tubes comprising Reaction Reagent is prepared with the engineered yeast cells and an input sample (e.g., human cells) with spiked-in targets (with a known marker combination).
  • the one or more reaction tubes are incubated with Reaction Medium, as shown in FIG. 1 (see Incubation step), to enable activation of the at least one logical operator module on the target.
  • Activation of the at least one logical operator module triggers expression and display of RNA/DNA binding protein on the engineered yeast cells interacting with the targets, as shown in FIG. 1 (see Target specific display step).
  • the input sample cells are then lysed to release nucleic acid into the medium.
  • Genomic DNA is further enzymatically fragmented, and the reaction is incubated in a crowding agent rich environment that enables local binding of nucleic acids to the respective binding proteins, as shown in FIG. 1 (see Lysis of target cell & Incubation step).
  • the reaction is washed to dilute out unbound nucleic acids, as shown in FIG. 1 (see Wash out unbound step).
  • the RNA molecules are reverse transcribed into DNA, as shown in FIG. 1 (see Reverse transcription step).
  • Multiple different readouts can be produced with a plate reader, such as qPCR or NGS, as shown in FIG. 1 (see Readout step).
  • Yeast cells are engineered to conditionally express and display an RNA binding protein tethered to the cell wall. Induced cells are compared against non-induced cells. Parent strains are used as controls throughout the experiment. The cells are grown according to standard laboratory practices in a rich medium and diluted to exponential phase. The cells are standardized to a known concentration and mixed with RNA substrate containing recognized epitope. The mixtures are incubated in a binding buffer so that RNA substrate is able to attach to the binding protein. The cells are washed multiple times with a washing buffer to dilute out unbound RNA substrate. The samples are spun down so that the cells are retained while molecules are diluted. The samples that are not spun down are used as a dilution control.
  • CUs are engineered to comprise at least one logical operator module recognizing a given marker combination, as described above. CUs are further engineered to conditionally display an mRNA cap binding protein (RBP) (e.g., an IF4E) based on the logical operator module activation and to conditionally display a specific ssDNA binding protein (DBP) (e.g., a HUH-tag).
  • a reaction with the Reaction Reagent is prepared with the CUs and an input sample with spiked-in sample objects (with the recognized marker combination). The reaction is incubated in Reaction Medium to enable activation of the logical operator modules on sample objects. The activation triggers expression and display of SBEs, such as RNA binding proteins and DNA binding proteins, on the CUs interacting with the sample objects. The sample objects are then lysed, which leads to localized mRNA release into the Reaction Medium, where localization is promoted by crowding agent (e.g., a hydrogel) in the Reaction Medium.
  • each molecular tag includes a unique recognition element (e.g., ssDNA binding sequence), a unique hash element, and a priming element (e.g., dT_N).
  • the priming element is optimized for hybridization of polyA tails of mRNA molecules for non-specific retention.
  • the reaction is washed to dilute out unbound biological molecules.
  • Co-localization of mRNAs and molecular tags on the CU surface results in hybridization of polyA tail to the dT_N priming element.
  • First strand synthesis proceeds using standard methods.
  • the synthesized tagged-cDNA is subsequently released and analyzed/ sequenced using NGS or similar methods.
  • Computational algorithms are used to estimate transcriptomic profile of individual target objects as well as distributions of CU types on individual target objects.
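The first computational step, turning sequenced tagged-cDNA into per-CU counts, might look roughly like the sketch below. It groups reads by hash element and collapses duplicate reads of the same molecule. The read layout `(hash_element, gene, umi)`, the `hash_to_cu` lookup, and all names are assumptions made for illustration; the source does not specify a read format.

```python
from collections import defaultdict

def count_matrix(reads, hash_to_cu, genes):
    """Build a CU-by-gene count table from tagged reads.

    reads: iterable of (hash_element, gene, umi) tuples (hypothetical layout).
    hash_to_cu: maps each hash element to a CU identifier.
    genes: ordered list of gene names defining the columns.
    """
    gene_index = {g: j for j, g in enumerate(genes)}
    seen = set()  # collapse duplicate reads of one molecule
    counts = defaultdict(lambda: [0] * len(genes))
    for hash_element, gene, umi in reads:
        if hash_element not in hash_to_cu or gene not in gene_index:
            continue  # skip unknown tags or genes
        key = (hash_element, gene, umi)
        if key in seen:
            continue
        seen.add(key)
        counts[hash_to_cu[hash_element]][gene_index[gene]] += 1
    return dict(counts)

# Toy reads, not real data.
reads = [
    ("H1", "ACTB", "u1"), ("H1", "ACTB", "u1"),  # duplicate read
    ("H1", "GAPDH", "u2"), ("H2", "ACTB", "u3"),
    ("H9", "ACTB", "u4"),                        # unknown hash element
]
X = count_matrix(reads, {"H1": "cu_1", "H2": "cu_2"}, ["ACTB", "GAPDH"])
```

Each row of `X` then plays the role of one computing unit's count vector in the downstream estimation.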
  • This Example employed spatial fluctuations in the number of biological molecules, such as those caused by Brownian motion and cell signaling, within subcellular portions of the sample to evaluate CU association by proximity. This proximity information was subsequently utilized to compute target-specific output signals associated with a plurality of sample objects. This included analyzing sample object interactions, such as cell-cell interactions, and determining the distribution of biological molecules within specific sample objects, achieving subcellular resolution of entities like mRNA transcripts. By leveraging the spatial fluctuations, this Example provides enhanced accuracy and utility in computing these output signals across various configurations and disease contexts.
  • the GSE158055 dataset was split into individual scRNA-seq samples and then organized randomly into train, dev and test datasets with zero intersection of patients between individual sets.
  • the split ratio was chosen to be roughly 80/10/10, resulting in 1,168,236 train cells, 128,084 dev cells and 116,358 test cells. Across all samples, the top G genes with the least sparsity were retained for further processing as described below.
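A patient-disjoint split and the gene filter could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `sample_to_patient` mapping, and the reading of "least sparsity" as "fewest zero entries across cells" are all assumptions. Because whole patients are assigned to a set, the cell-level ratio only approximates 80/10/10.

```python
import random

def split_by_patient(sample_to_patient, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/dev/test so no patient spans two sets."""
    patients = sorted(set(sample_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = round(ratios[0] * len(patients))
    n_dev = round(ratios[1] * len(patients))
    group = {}
    for idx, p in enumerate(patients):
        if idx < n_train:
            group[p] = "train"
        elif idx < n_train + n_dev:
            group[p] = "dev"
        else:
            group[p] = "test"
    return {s: group[p] for s, p in sample_to_patient.items()}

def top_genes_by_density(counts, g):
    """Return indices of the g genes with the fewest zero entries across cells."""
    n_genes = len(counts[0])
    nonzero = [sum(1 for row in counts if row[j] > 0) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: -nonzero[j])[:g]

# Toy example: 20 samples from 10 hypothetical patients.
sample_to_patient = {f"s{i}": f"p{i % 10}" for i in range(20)}
splits = split_by_patient(sample_to_patient)
kept = top_genes_by_density([[1, 0, 2], [0, 0, 3], [4, 0, 5]], 2)
```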
  • the UMI vector for cell $k$ in the training dataset was denoted by $T_k \in \mathbb{R}^G$, where $T_{k,i}$ was equal to the number of UMIs corresponding to gene $i$ detected in cell $k$. From the UMI vectors, a batch of training inputs was generated, forming a single training epoch. Each input was generated from a single scRNA-seq sample to ensure computation was based on biological differences and not batch-to-batch differences.
  • the distribution of the relative reconstruction error (RRE) per sample object across the validation inputs was also evaluated.
  • the median RRE per sample object was 0.
  • Sample objects exhibiting RRE greater than 50% can be considered failed reconstructions for practical purposes. In this case, the proportion of failed reconstructions was approximately 3.5%. This metric is vital for assessing the reconstruction quality and identifying potential issues in the model’s performance.
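The failure criterion above can be made concrete with a short sketch. The source does not give the RRE formula, so the L1-relative form used here is an assumption, as are the function names and the toy vectors.

```python
def relative_reconstruction_error(true_vec, est_vec):
    """L1 distance between estimate and truth, relative to the true total.

    The L1 form is an assumption; the source does not state the RRE formula.
    """
    total = sum(true_vec)
    if total == 0:
        raise ValueError("true UMI vector has no counts")
    return sum(abs(e - t) for t, e in zip(true_vec, est_vec)) / total

def failed_fraction(true_vecs, est_vecs, threshold=0.5):
    """Share of sample objects whose RRE exceeds the threshold (50% here)."""
    errors = [relative_reconstruction_error(t, e)
              for t, e in zip(true_vecs, est_vecs)]
    return sum(1 for r in errors if r > threshold) / len(errors)

# Toy vectors: the first is reconstructed exactly, the second poorly.
true_vecs = [[10, 0, 5], [4, 4, 2]]
est_vecs = [[10, 0, 5], [0, 4, 0]]
fraction = failed_fraction(true_vecs, est_vecs)
```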
  • FIG. 7 provides a visual representation of the data validation results, illustrating the distribution of UMI counts, UMIs per computing unit, and relative reconstruction error (RRE) across the validation dataset.
  • Panel A shows the distribution of UMI counts per sample object, with a median of 1312 UMIs and a total of 128,000 cells, highlighting the variability and range of molecular content captured within each sample object.
  • Panel B depicts the distribution of UMIs allocated to each computing unit, with a median of 85 UMIs per computing unit and 95% of computing units having fewer than 240 UMIs, illustrating the sparsity and low informational content of the computing unit UMI vectors compared to the sample objects.
  • This panel provides a clear visualization of the reconstruction quality and highlights areas where the model’s performance can be improved.
  • the model architecture is shown in FIG. 6.
  • the input to the model was the measurement matrix $X \in \mathbb{R}^{N \times G}$, where $N$ was the number of computing units and $G$ was the number of genes. $G$ was a constant for any given model while $N$ could be arbitrarily chosen in inference mode.
  • the model first applied a log1p transformation to the input data. Then a linear dense layer was applied to reduce the dimensionality to $\mathbb{R}^{N \times E}$, where $E$ was an embedding dimension. After dimension reduction, the tensor was passed through $L$ stacked transformer blocks, each with $H$ attention heads, a feed-forward dimension of $F$, and a ReLU activation function. Since no positional embedding was used, the model could handle inputs $X \in \mathbb{R}^{N \times G}$ with arbitrary integer $N$. The last transformer block featured a single attention head. Its scaled attention logits were extracted, and, instead of applying softmax as in a standard transformer block, a sigmoid function was applied to each element. This resulted in an output matrix $Y \in [0,1]^{N \times N}$, where each element $Y_{i,j}$ indicated the relative likelihood that computing unit $i$ and computing unit $j$ were interacting with the same sample object.
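The output stage described above can be illustrated with a small sketch in plain Python. It covers only the log1p transform, the dense embedding, and a single attention head whose scaled logits pass through a sigmoid; the stacked transformer blocks are omitted, and the random weights stand in for trained parameters. All names and dimensions are illustrative assumptions.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def pairwise_likelihood(x, w_embed, w_query, w_key):
    """Map an N x G count matrix to an N x N matrix with entries in [0, 1]."""
    logged = [[math.log1p(v) for v in row] for row in x]   # log1p transform
    h = matmul(logged, w_embed)                             # N x E embedding
    q, k = matmul(h, w_query), matmul(h, w_key)             # single head
    scale = math.sqrt(len(q[0]))
    logits = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    # Sigmoid per element instead of a row-wise softmax.
    return [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in logits]

rng = random.Random(0)
G, E = 5, 4
w_embed = random_matrix(G, E, rng)
w_query = random_matrix(E, E, rng)
w_key = random_matrix(E, E, rng)
for n in (3, 7):  # N can vary because nothing depends on position
    x = [[rng.randint(0, 20) for _ in range(G)] for _ in range(n)]
    y = pairwise_likelihood(x, w_embed, w_query, w_key)
```

Because the sigmoid acts on each logit independently, rows of the output need not sum to one, which suits a pairwise "same sample object" score better than a softmax distribution would.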
  • the model output $Y$ was further processed to obtain estimates of sample object UMI vectors $T \in \mathbb{R}^{K \times G}$.
  • the output matrix $Y$ was used to partition the set of computing units into $K$ disjoint sets. This partition was represented by the estimated adjacency matrix $\hat{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters is either known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster.
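One way to picture the partitioning step is the sketch below, which runs a naive agglomerative clustering with full (complete) linkage on a toy likelihood matrix and then builds the binary adjacency matrix. Converting likelihood to distance as one minus the symmetrized value is an assumption, as are the function names; a known number of clusters is passed in directly.

```python
def complete_linkage_clusters(y, k):
    """Greedy agglomerative clustering of a likelihood matrix into k groups."""
    n = len(y)
    # Symmetrize, then turn likelihood into a distance (assumed: 1 - likelihood).
    dist = [[1.0 - 0.5 * (y[i][j] + y[j][i]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete ("full") linkage: the farthest pair defines the gap.
                gap = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def adjacency_from_clusters(clusters, n):
    """Binary N x N matrix: 1 where two computing units share a cluster."""
    adj = [[0] * n for _ in range(n)]
    for c in clusters:
        for i in c:
            for j in c:
                adj[i][j] = 1
    return adj

# Toy likelihood matrix for four computing units.
y = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.6],
    [0.2, 0.1, 0.7, 0.9],
]
clusters = complete_linkage_clusters(y, 2)
y_hat = adjacency_from_clusters(clusters, len(y))
```

The nested loops are only suitable for small inputs; a library implementation would be the practical choice at scale.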
  • FIG. 9 provides a detailed visualization of the cell type confusion matrix associated with the input data from the dataset GSE158055.
  • the confusion matrix (FIG. 9) indicates the probability that a computing unit from a sample object on the y-axis was estimated to be from a sample object on the x-axis.
  • Diagonal components of the matrix list the probabilities that the origin of computing units was correctly estimated within the given cell type. High values along the diagonal indicate a high accuracy of correctly identifying the cell type of the computing units, reflecting the model’s ability to accurately classify cell types based on the input data.
  • Non-zero, off-diagonal components of the confusion matrix illustrate mismatches between cell types.
  • For example, similarities between NK cells and CD8 T cells can lead to misclassifications, which are represented by the off-diagonal values. These mismatches highlight areas where the model needs improvement in distinguishing between similar cell types.
  • the matrix provides a clear and quantifiable measure of the model’s performance in classifying cell types, which is crucial for validating the accuracy and reliability of the model’s predictions.
  • FIG. 9 effectively illustrates the classification performance of the model, providing insights into the strengths and weaknesses of the cell type classification.
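A row-normalized confusion matrix of the kind described for FIG. 9 can be computed as in the sketch below. The labels and assignments are toy values chosen for illustration and are not taken from the dataset; the function name is hypothetical.

```python
from collections import Counter

def confusion_matrix(true_types, estimated_types, labels):
    """Row-normalized matrix: rows are true cell types, columns are estimates."""
    pairs = Counter(zip(true_types, estimated_types))
    matrix = []
    for t in labels:
        row_total = sum(pairs[(t, e)] for e in labels)
        matrix.append([pairs[(t, e)] / row_total if row_total else 0.0
                       for e in labels])
    return matrix

# One entry per computing unit (toy values).
labels = ["NK", "CD8 T", "B"]
true_types = ["NK", "NK", "NK", "NK", "CD8 T", "CD8 T", "B", "B"]
estimated = ["NK", "NK", "NK", "CD8 T", "CD8 T", "NK", "B", "B"]
cm = confusion_matrix(true_types, estimated, labels)
```

Each row sums to one, so the diagonal entry is the fraction of computing units of that type whose origin was estimated within the same type.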
  • Example 5 assumed that each Computing Unit was assigned to a unique Sample Object. In cases where some Sample Objects are bound to each other, such assignment is not possible. In practice, most cases fall in this category, for example, as a result of incomplete dissociation or cell-cell interactions. For this purpose, the large language model was extended to consider Computing Units assigned to 2 or more Sample Objects.
  • the notable difference was in the assignment of Computing Units to Sample Objects, which was no longer surjective.
  • the UMI vector $X_i \in \mathbb{R}^G$, where $X_{i,j}$ was equal to the number of UMIs corresponding to gene $j$ attributed to computing unit $i$, was constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned.
  • validation and test inputs were constructed from the validation and test datasets.
  • the model output $Y$ was further processed to obtain estimates of sample object UMI vectors $T \in \mathbb{R}^{K \times G}$.
  • the output matrix $Y$ was used to partition the set of computing units into $K$ disjoint sets. This partition was represented by the estimated adjacency matrix $\hat{Y} \in \mathbb{R}^{N \times N}$.
  • the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can either be known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster.
  • the UMI vectors of computing units in cluster $i$ can be summed into a single vector $T_i$.
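The summation over a cluster is simple enough to show directly. The sketch below assumes the computing unit vectors are rows of a list-of-lists `x` and that `clusters` holds row indices; both names are illustrative.

```python
def sum_by_cluster(x, clusters):
    """Sum the UMI vectors of the computing units in each cluster."""
    n_genes = len(x[0])
    return [[sum(x[i][g] for i in c) for g in range(n_genes)] for c in clusters]

# Four computing units, three genes, two clusters (toy values).
x = [[1, 0, 2], [0, 3, 1], [4, 0, 0], [1, 1, 1]]
clusters = [[0, 1], [2, 3]]
t_est = sum_by_cluster(x, clusters)
```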
  • the relative reconstruction error (RRE) was calculated as a measure of the reconstruction quality.
  • the model output Y was further processed to obtain the estimated Sample Object adjacency matrix.
  • the output matrix $Y$ was used to partition the set of computing units into $K$ disjoint sets. This partition was represented by the estimated adjacency matrix $\hat{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster.
  • a simple heuristic can then be used to compute the estimated Sample Object adjacency matrix $\hat{W} \in \mathbb{R}^{K \times K}$:
  • $\hat{W}_{i,j} = 1$ if $\sum_{l_i, m_j} Y_{l_i, m_j} > \beta$, where the summation was over all Computing Units $l_i$ in the $i$th set and Computing Units $m_j$ in the $j$th set, and $\beta > 0$.
  • the ROC curve of the prediction of edges between sample objects was then analyzed, with a mark showing the optimal value of $\beta$ maximizing the accuracy of predicting whether a given pair of sample objects are connected or not.
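The thresholding heuristic and the choice of the threshold can be sketched as follows. Setting entries to 0 when the sum does not exceed the threshold, leaving the diagonal at 0, and picking the threshold from a fixed candidate list by accuracy are assumptions made for this illustration; the function names and toy matrix are hypothetical.

```python
def sample_object_adjacency(y, clusters, beta):
    """Link clusters i and j when the summed cross-cluster likelihood exceeds beta."""
    k = len(clusters)
    w = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue  # no self-edges (assumption)
            total = sum(y[l][m] for l in clusters[i] for m in clusters[j])
            w[i][j] = 1 if total > beta else 0
    return w

def best_beta(y, clusters, w_true, candidates):
    """Pick the candidate beta with the highest edge-prediction accuracy."""
    k = len(clusters)
    pairs = [(i, j) for i in range(k) for j in range(k) if i != j]

    def accuracy(beta):
        w = sample_object_adjacency(y, clusters, beta)
        return sum(w[i][j] == w_true[i][j] for i, j in pairs) / len(pairs)

    return max(candidates, key=accuracy)

# Toy model output for four computing units in three clusters.
y = [
    [1.0, 0.9, 0.4, 0.0],
    [0.9, 1.0, 0.3, 0.1],
    [0.4, 0.3, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]
clusters = [[0, 1], [2], [3]]
w_true = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
beta = best_beta(y, clusters, w_true, [0.05, 0.5, 0.9])
w_hat = sample_object_adjacency(y, clusters, beta)
```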
  • left column includes a graph representation of the model output Y.
  • Center and right columns include graph representations of the matrices $W$ and $\hat{W}$, respectively.
  • This post-processing approach corresponds to FIG. 8, panels C1-C3.
  • the visualization of the computing units in C1-C3 in FIG. 8 appears clearer and more organized compared to B1-B3, providing additional value by offering a more coherent representation of the biological data. This improved clarity in visualization can enhance the understanding of biological processes and the relationships between computing units and sample objects, leading to more accurate and insightful analyses.
  • FIG. 8 shows the example model input X projected by t-SNE, with cell types listed in the inset.
  • This plot displays 150 points corresponding to 150 computing units assigned to 10 sample objects, indicated by the color coding.
  • the projection reveals that no separated clusters are visible, indicating that the initial model input does not exhibit clear spatial separation.
  • some colocalization of computing units from the same sample objects is present, with most sample objects spanning the full support of the plot.
  • This visualization can help in understanding the initial distribution and relationship of computing units before any transformation can be applied by the model.
  • Panels B1-B3 in FIG. 8 present the example embeddings of the last transformer layer projected by t-SNE.
  • the embeddings exhibit strong spatial separation that closely mirrors the sample objects.
  • the t-SNE projection reveals distinct clusters, each corresponding to a different sample object, demonstrating the model’s ability to learn and represent the underlying structure and relationships within the data. This spatial separation indicates that the transformer layers effectively capture and encode the relevant features, facilitating accurate downstream analysis and predictions.
  • Panels C1-C3 in FIG. 8 visualize the estimated adjacency matrix $\hat{Y}$ following agglomerative clustering.
  • a force layout can be used to indicate the relationships between computing units, where computing units appearing close together can be predicted to be assigned to the same sample object.
  • This visualization can provide a clear representation of the clustering results, showing how the model groups computing units based on their learned relationships and interactions.
  • the force layout can help in intuitively understanding the predicted assignments and the structural organization of the computing units within the sample objects.
  • FIG. 8 effectively illustrates the progression and transformation of model variables from the initial input through to the final clustering.
  • the t-SNE projections and force layout can provide valuable insights into the model’s performance and its ability to capture and represent complex biological relationships.
  • EXAMPLE 7 Estimating mRNA Transcripts of a Cell in an Input Sample
  • This Example considers samples of 6 cells taken from patient derived tissue. The number of mRNA transcripts for each cell was taken from a publicly available dataset.
  • the number of mRNA transcripts bound to each computing unit was simulated by first randomly associating each of the 300 computing units with one of the 6 cells, and second associating each transcript from each of the 6 cells with one associated computing unit. mRNA transcripts from one cell were thereby randomly distributed amongst the computing units associated with that cell. Lastly, each computing unit was associated with a unique molecular tag.
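The two-step simulation can be sketched in a few lines. The per-cell transcript counts below are made-up toy values rather than the public dataset, and the function name and seed are illustrative assumptions.

```python
import random

def simulate_capture(cell_transcripts, n_units, seed=0):
    """Randomly split each cell's transcripts among its computing units.

    cell_transcripts: list of per-cell gene count vectors.
    Returns (unit_to_cell, unit_counts).
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(cell_transcripts), len(cell_transcripts[0])
    # Step 1: associate each computing unit with one cell.
    unit_to_cell = [rng.randrange(n_cells) for _ in range(n_units)]
    units_of = {c: [u for u, cu in enumerate(unit_to_cell) if cu == c]
                for c in range(n_cells)}
    # Step 2: hand each transcript to one unit associated with its cell.
    unit_counts = [[0] * n_genes for _ in range(n_units)]
    for c, counts in enumerate(cell_transcripts):
        if not units_of[c]:
            continue  # a cell with no computing unit contributes nothing
        for g, n in enumerate(counts):
            for _ in range(n):
                unit_counts[rng.choice(units_of[c])][g] += 1
    return unit_to_cell, unit_counts

cells = [[30 + 5 * c, 10, c] for c in range(6)]  # toy counts for 6 cells
unit_to_cell, unit_counts = simulate_capture(cells, 300)
```

The index of each computing unit can serve as its unique molecular tag in this toy setting.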
  • FIG. 12 is the confusion matrix, listing statistics of computing unit associations for samples comprising B lymphocytes. These results show that computing units in a highly connected cluster were likely associated with the same cell, validating the model’s accuracy in predicting cell-cell interactions.
  • This example demonstrates how the model can use mRNA transcript data to associate computing units with their respective sample objects, thereby providing insights into cell-cell interactions and improving the estimation of mRNA transcripts for individual cells in the sample.
  • This Example is performed to demonstrate how the model can effectively use mRNA transcript data to associate computing units with their respective cells, both before and after postprocessing. It highlights the model’s ability to reveal cell-cell interactions and the distribution of mRNA transcripts within cells, thereby providing valuable insights into the complex relationships within the sample.
  • This Example considered three cells from a peripheral blood mononuclear cell (PBMC) sample bound to each other in different configurations.
  • the number of mRNA transcripts for each cell was taken from a publicly available dataset.
  • the number of mRNA transcripts bound to each computing unit was simulated by, first randomly associating each of the 300 computing units with a subset of the 3 cells, and second associating each transcript from each of the 3 cells with one associated computing unit.
  • Computing units associated with singleton cells were not associated with other cells.
  • Computing units associated with non-singleton cells, i.e., cells bound to other cells, were with some probability also associated with one of the bound cells.
  • mRNA transcripts from one cell were thereby randomly distributed amongst the computing units associated with that cell.
  • each computing unit was associated with a unique molecular tag.
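The assignment rule for bound cells differs from the singleton case only in that a computing unit may also pick up a neighbor. A minimal sketch, assuming a hypothetical sharing probability `p_shared` and a `bound_to` adjacency dictionary (neither is specified in the source):

```python
import random

def assign_units(n_units, bound_to, p_shared=0.3, seed=0):
    """Assign each computing unit to one cell, and sometimes to a bound neighbor.

    bound_to: dict mapping each cell to the list of cells it is bound to
    (empty for singleton cells). p_shared is a hypothetical probability.
    """
    rng = random.Random(seed)
    cells = sorted(bound_to)
    assignment = []
    for _ in range(n_units):
        primary = rng.choice(cells)
        members = {primary}
        if bound_to[primary] and rng.random() < p_shared:
            members.add(rng.choice(bound_to[primary]))
        assignment.append(members)
    return assignment

# Cells 0 and 1 are bound to each other; cell 2 is a singleton.
bound_to = {0: [1], 1: [0], 2: []}
assignment = assign_units(300, bound_to)
```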
  • FIGS. 14-15 illustrate the associations between the computing units and the cells. Then, mRNA transcript numbers associated with each computing unit were used to associate computing units with each other. For this purpose, the large language model described in Examples 5-6 was used. Select results of these associations are illustrated in FIGS. 14-15, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells. In FIGS. 14-15, the bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
  • FIG. 14 shows the cell-cell visualization before post-processing, illustrating the initial associations computed by the model, which indicate the relationships between computing units based on the mRNA transcript data.
  • FIG. 15 shows the cell-cell visualization after postprocessing, illustrating the refined associations after applying the large language model and additional post-processing steps.
  • This Example demonstrates isolation of mRNA from leukocytes and synthesis of cDNA from the isolated mRNA.
  • thermoblock was preheated to 37 °C.
  • the Encapsulating Reagent (a buffer containing computing units of the present disclosure, engineered to display human CD45 and covalently labeled with unique molecular tags comprising poly(dT) such that each computing unit is denoted by a molecular tag with a unique barcode) was equilibrated to room temperature for 5 minutes.
  • the Encapsulating Reagent was briefly centrifuged at 100 × g for 5 seconds.
  • 100 µl of the Resuspension Buffer (H2O, NaCl, KCl, Na2HPO4, KH2PO4, P407) was added and gently mixed by pipetting.
  • the Encapsulating Reagent was placed on a thermoshaker at 1400 RPM, 37 °C for 5 minutes. 10 µl of the Encapsulating Reagent was transferred to each reaction for analysis. Each reaction was gently vortexed and placed into a fixed-angle centrifuge, orienting the hinge of the tube away from the center, and spun down at 200 × g for 1 minute. Without removing the tubes from the centrifuge, the tubes were twisted 180° until the hinge faced inwards to the center. The tubes were spun down nine more times in the same manner, rotating the tube 180° between each spin-down. At the end of the spinning, a small pellet was visible in each reaction tube. The pellets were dispersed by gentle pipetting in each tube.
  • Reverse Transcription (RT) and Template Switching: The RT Buffer was vortexed briefly, and then the RT mix was prepared in a separate tube as shown in Table 1, adding the RT Enzyme Mix last.
  • Each RT reaction was mixed thoroughly by pipetting several times and centrifuged briefly to collect the solution at the bottom of the tube. Each RT reaction mix was incubated in a thermal cycler with the following steps: (1) heated lid set to 105 °C; (2) 90 minutes at 42 °C; (3) 10 minutes at 70 °C; and (4) hold at 4 °C.
  • EXAMPLE 10 Estimating mRNA Transcripts of a Cell in an Input Sample Using Sequencing Data
  • This Example demonstrates how the large language model described in Examples 5-6 can use mRNA sequencing data to associate computing units with their respective sample objects, thereby providing insights into the distributions of biological molecules at the sub-cellular level and cell-cell interactions, improving the estimation of mRNA transcripts for individual cells in the sample.
  • EXAMPLE 11 Estimating mRNA Transcripts of a Cell in an Input Sample Using Allele Specific Sequencing Data
  • This Example demonstrates how the large language model of Examples 5-6 can use allele specific mRNA sequencing data to associate computing units with their respective sample objects, thereby providing insights into distribution of biological molecules at the sub-cellular level and cell-cell interactions, improving the estimation of mRNA transcripts for individual cells in the sample.
  • Samples of approximately 1,000 cells are taken from patient derived tissues.
  • the number of mRNA transcripts bound to each computing unit is obtained by Next Generation Sequencing (NGS) of cDNA obtained following the steps of Example 9, where mRNA transcripts from one cell or a cluster of cells are captured by molecular tags bound to computing units associated with that cell or cluster of cells.
  • mRNA transcript numbers associated with each computing unit and with each allele are used to associate computing units associated with the same cell or cluster of cells.
  • the large language model described in Examples 5-6 is used.
  • the number of genes G is equal to the number of allele assignments.
  • G is three times the number of genes, corresponding to the first allele genes, second allele genes, and indeterminate allele genes.
  • the transcript is by default assigned to the indeterminate allele. All other mRNA transcripts are assigned to alleles using a priori information regarding allele sequences.
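The allele-expanded feature vector can be illustrated with a small indexing sketch. The block layout (all first-allele genes, then second-allele, then indeterminate), the `classify` callback, and the toy sequences are assumptions for illustration; the source only states that G is three times the number of genes and that unassignable transcripts default to the indeterminate allele.

```python
ALLELES = ("first", "second", "indeterminate")

def feature_index(gene_idx, allele, n_genes):
    """Column of a (gene, allele) pair in a vector of length 3 * n_genes."""
    return ALLELES.index(allele) * n_genes + gene_idx

def allele_counts(transcripts, genes, classify):
    """Count transcripts per (gene, allele); unclassified ones go to 'indeterminate'.

    classify: hypothetical callable returning 'first', 'second', or None.
    """
    gene_idx = {g: i for i, g in enumerate(genes)}
    vec = [0] * (len(ALLELES) * len(genes))
    for gene, sequence in transcripts:
        allele = classify(gene, sequence) or "indeterminate"
        vec[feature_index(gene_idx[gene], allele, len(genes))] += 1
    return vec

# Toy a priori allele table and transcripts.
genes = ["A", "B"]
known = {("A", "ACGT"): "first", ("A", "ACGA"): "second"}
transcripts = [("A", "ACGT"), ("A", "ACGA"), ("B", "TTTT"), ("A", "ACGT")]
vec = allele_counts(transcripts, genes, lambda g, s: known.get((g, s)))
```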

Abstract

Provided herein are methods, systems, and kits for detecting target-specific output signals signifying a characteristic of an input sample. The methods, inter alia, involve the interaction of computing units (CUs) engineered to display surface-bound entities (SBEs) upon contact with sample objects, which comprise biological molecules. CUs can include cells and/or molecules that can convert biological signals into output signals that can provide information about the input sample.

Description

BIOLOGICAL COMPUTING METHODS AND SYSTEMS FOR ANALYZING
BIOLOGICAL UNITS
RELATED APPLICATION
[0001] This application claims priority to U.S. Provisional Patent Application Serial Nos. 63/592,687, filed on October 24, 2023, and 63/666,556, filed on July 1, 2024, which are incorporated herein by reference in their entireties for all purposes.
BACKGROUND
[0002] To gain a comprehensive understanding of cells, they must be studied on both multi- and single-cell levels. However, a significant challenge is posed by the inherent heterogeneity within the smallest cell populations. This heterogeneity presents substantial challenges in drug and biomarker discovery and development processes. Consequently, the emerging field of single-cell research is faced with a crucial limitation in terms of processing power. The analysis of each individual cell within a population is deemed essential for uncovering the intricacies of cellular heterogeneity. However, the processing of hundreds of thousands to millions of single cells is necessitated. Unfortunately, existing techniques, such as molecule-based enzymatic methods (e.g., enzyme-linked immunosorbent assay (ELISA)) and instrument-based brute force methods (e.g., flow cytometry), are restricted by their processing power. For example, when analyzing a group of cells within a sample the size of a human body using either ELISA or flow cytometry, an impractical amount of time would be required, potentially extending to years. The emerging field of single-cell research is thus confronted with a bottleneck that hinders the transition from the discovery of new drugs and biomarkers to their translation into accessible assays and diagnostics. There is an urgent need for a revolutionary single-cell technology that can swiftly and comprehensively analyze numerous cells, addressing this limitation by completing analyses in a matter of hours or days, rather than years. Such a breakthrough technology would not only expedite the development of novel therapeutics and diagnostic tools but would also significantly advance our understanding of cellular biology on both multi- and single-cell levels.
SUMMARY
[0003] The present disclosure provides innovative methods for detecting specific characteristics or markers within an input sample. The methods, systems, and kits provided herein involve the use of computing units (CUs) engineered to interact with biological molecules within the sample. These CUs capture and concentrate the target molecules by displaying surface-bound entities (SBEs) upon interaction, and the subsequent measurement of the amount of bound biological molecules enables the precise detection of target-specific output signals. The methods, systems, and kits provided herein offer a significant advancement in the field of single-cell research, addressing the limitations associated with existing techniques and providing a faster and more efficient means of analyzing numerous cells in a short amount of time, which is crucial for accelerating drug discovery and diagnostic development processes.
[0004] Accordingly, in one aspect, provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of computing units (CUs), wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one surface-bound entity (SBE) on its surface only upon interacting with the plurality of sample objects, and wherein the first set of at least one SBE can bind to the plurality of biological molecules; (b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules can be released from the plurality of sample objects and bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by measuring an amount of the plurality of biological molecules bound to the first set of at least one SBE, wherein the amount can be indicative of the characteristic of the input sample.
[0005] In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times. In some embodiments, the plurality of biological molecules can be assigned to one or more hash groups using the first set of at least one SBE. In some embodiments, the first number, the second number, the third number, and the fourth number of interactions can be utilized to evaluate the one or more target specific output signals associated with the plurality of sample objects.
[0006] Also provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample via multiplexed hashing, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of CUs, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one SBE on its surface, wherein the plurality of CUs can be partitioned, such that a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times, and wherein the first set of at least one SBE can bind to the plurality of biological molecules; (b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules can be released from the sample object and bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by (i) assigning the plurality of biological molecules to one or more hash groups using the first set of at least one SBE, and (ii) utilizing the first number, the second number, the third number, and the fourth number of interactions to evaluate the one or more target specific output signals associated with the plurality of sample objects.
[0007] In some embodiments, at least two of the plurality of sample objects can be bound to each other. In some embodiments, the interaction of the each CU with the plurality of sample objects can depend on the binding of the at least two of the plurality of sample objects. In some embodiments, the plurality of biological molecules bound to the first set of at least one SBE can be washed to remove any unbound biological molecules before step (c). In some embodiments, the first CU can display the first set of at least one SBE that is different from what the second CU displays on its surface. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be one. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be greater than one. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be the same. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be different. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
[0008] In some embodiments, a plurality of molecular tags can be added after lysing in step (b). In some embodiments, the each CU of the plurality of CUs can further display a second set of at least one SBE. In some embodiments, the plurality of molecular tags can associate with the second set of at least one SBE. In some embodiments, the second set of at least one SBE can be incapable of binding to the plurality of biological molecules. In some embodiments, the plurality of molecular tags that do not associate with the second set of at least one SBE can be removed by washing. In some embodiments, a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE. In some embodiments, the plurality of molecular tags and the second set of at least one SBE can be associated either before or after the each CU interacts with the sample object. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the hash element can include a unique molecular identifier (UMI). In some embodiments, the molecular tag can further include a recognition element. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, the plurality of biological molecules can be RNA or protein.
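As a purely illustrative sketch, the composition of a molecular tag described above (a hash element that includes a UMI, plus a random N-mer priming element) could be modelled in software as follows. The class name `MolecularTag`, the field `hash_group_code`, the helper `make_tags`, and all sequence lengths are assumptions introduced for illustration and are not taken from the disclosure.

```python
import random
from dataclasses import dataclass

DNA_ALPHABET = "ACGT"


def random_nmer(n, rng):
    """Return a random N-mer over the DNA alphabet."""
    return "".join(rng.choice(DNA_ALPHABET) for _ in range(n))


@dataclass(frozen=True)
class MolecularTag:
    hash_group_code: str   # hypothetical sequence identifying the hash group
    umi: str               # unique molecular identifier
    priming_element: str   # random N-mer used for priming

    @property
    def hash_element(self):
        # In this sketch the hash element is the group code followed by the UMI.
        return self.hash_group_code + self.umi

    def sequence(self):
        """Full tag sequence: hash element followed by the priming element."""
        return self.hash_element + self.priming_element


def make_tags(hash_group_code, count, umi_len=8, primer_len=6, seed=0):
    """Generate `count` tags for one hash group, each with a distinct UMI."""
    rng = random.Random(seed)
    seen, tags = set(), []
    while len(tags) < count:
        umi = random_nmer(umi_len, rng)
        if umi in seen:
            continue  # redraw so that UMIs stay unique within the group
        seen.add(umi)
        tags.append(MolecularTag(hash_group_code, umi, random_nmer(primer_len, rng)))
    return tags
```

For example, `make_tags("ACGTAC", 5)` yields five tags that share one hash group code and differ in their UMIs and priming elements. Optional parts such as the recognition element or sequencing element could be added as further fields.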
[0009] In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, any of the methods provided herein, wherein the biological molecules can be RNA, can further include (1) poly(A) priming of the RNA to the priming element of the molecular tag, and (2) reverse transcribing the RNA; after step (b) and before step (c).
[0010] In some embodiments, the plurality of biological molecules can be protein. In some embodiments, the protein can include a barcode. In some embodiments, the protein can be an antibody with the barcode. In some embodiments, any of the methods provided herein, wherein the biological molecules can be protein, can further include (1) capturing the protein via the sequencing element or the unique protein binding domain, (2) priming the barcode via the priming element, and (3) extending the priming element with a strand displacing polymerase; after step (b) and before step (c). In some embodiments, any of the methods provided herein, wherein the biological molecules can be protein, can further include (i) capturing the protein via the unique protein binding domain, and (ii) releasing the protein from the molecular tag; after step (b) and before step (c). In some embodiments, a protein-molecular tag complex can be optionally stabilized via cross-linking. In some embodiments, the cross-linking can be via an amine-reactive cross-linker.
[0011] In some embodiments, the measuring in step (c) can further include determining an amount of the molecular tag associated with the plurality of biological molecules. In some embodiments, the determining can be performed using qPCR, sequencing, gel electrophoresis, isothermal amplification, ELISA, or mass spectrometry. In some embodiments, the sequencing can be Next-Generation sequencing or Sanger sequencing. In some embodiments, any of the methods provided herein can further comprise (3) computing the amount of the molecular tag such that the plurality of biological molecules in the first sample object can be differentiated from the plurality of biological molecules in the second sample object. In some embodiments, the plurality of sample objects in contact with the plurality of CUs in step (a) can be incubated for a sufficient amount of time in an incubator for the each CU to display the first set of at least one SBE and the second set of at least one SBE on its surface. In some embodiments, the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent. In some embodiments, the crowding agent can be a hydrogel.
[0012] In some embodiments, the interaction between the plurality of CUs and the plurality of sample objects can include at least one logical operator module. In some embodiments, a logical operator module of the at least one logical operator module can include the sample object and two or more CUs of the plurality of CUs. In some embodiments, the at least one logical operator module can generate one or more output signals. In some embodiments, the at least one logical operator module can include a YES gate, an AND gate, a NAND gate, an OR gate, a NOR gate, an XOR gate, an XNOR gate, a NOT gate, or any combination thereof, wherein the two or more CUs comprise a first CU and a second CU. In some embodiments, the YES gate can include generating the one or more output signals only when both the first CU and the second CU can be bound to the sample object. In some embodiments, the AND gate can include generating the one or more output signals only when both the first CU and the second CU can be bound to the sample object. In some embodiments, the NAND gate can include suppressing or diminishing the one or more output signals when both the first CU and the second CU can be bound to the sample object. In some embodiments, the OR gate can include generating the one or more output signals when either the first CU or the second CU or when both the first CU and the second CU can be bound to the sample object. In some embodiments, the NOR gate can include generating the one or more output signals when neither the first CU nor the second CU can be bound to the sample object. In some embodiments, the XOR gate can include generating the one or more output signals when either the first CU or the second CU but not both CUs can be bound to the sample object. In some embodiments, the XNOR gate can include generating the one or more output signals when either both the first CU and the second CU can be bound to the sample object or when neither the first CU nor the second CU can be bound to the sample object. In some embodiments, the NOT gate can include suppressing or diminishing the one or more output signals when the first CU can be bound to the sample object. In some embodiments, the one or more output signals can be display of the first set of at least one SBE.
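The gate behaviour described above can be summarised as a truth table over two Boolean inputs, namely whether the first CU and the second CU are bound to the sample object. The following is a minimal sketch using the standard Boolean definitions of the gates; the names `GATES`, `output_signal`, and `truth_table` are hypothetical, and the YES gate is left out of the sketch.

```python
# Each gate maps (first_cu_bound, second_cu_bound) to True when the output
# signal (e.g. display of the first set of SBEs) is generated.
GATES = {
    "AND":  lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
    "OR":   lambda a, b: a or b,
    "NOR":  lambda a, b: not (a or b),
    "XOR":  lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
    "NOT":  lambda a, b: not a,  # depends on the first CU only
}


def output_signal(gate, first_cu_bound, second_cu_bound=False):
    """Return True if the named gate generates an output signal."""
    return bool(GATES[gate](first_cu_bound, second_cu_bound))


def truth_table(gate):
    """List (first_cu_bound, second_cu_bound, output) for all four input pairs."""
    return [(a, b, output_signal(gate, a, b))
            for a in (False, True) for b in (False, True)]
```

For instance, `truth_table("XOR")` reports an output only for the two rows in which exactly one CU is bound.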
[0013] In another aspect, provided herein is a system for biological computing, the system including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object. Also provided herein is a system for biological computing, the system including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE. In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object. In some embodiments, the CU of the plurality of CUs can further display a second set of at least one SBE. In some embodiments, any of the systems provided herein can further include (c) a plurality of molecular tags. In some embodiments, the plurality of molecular tags can associate with the second set of at least one SBE. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the molecular tag can further include a recognition element. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, any of the systems provided herein can further include a strand displacing polymerase. In some embodiments, the CU of the plurality of CUs can be engineered to display the first set of at least one SBE and the second set of at least one SBE. In some embodiments, a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE.
[0014] Also provided herein is a kit for biological computing, the kit including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with a sample object and display a first set of at least one SBE upon interacting with the sample object; and (b) an instruction for use of the kit. Also provided herein is a kit for biological computing, the kit including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with a sample object and display a first set of at least one SBE; and (b) an instruction for use of the kit. In some embodiments, the CU of the plurality of CUs can be further engineered to display a second set of at least one SBE. In some embodiments, any of the kits provided herein can further include (c) a plurality of molecular tags. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the molecular tag can further include a recognition element. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, any of the kits provided herein can further include a strand displacing polymerase.
[0015] Further provided herein is a computer-implemented method, the method including: (a) compiling a measurement matrix M1 from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix M1 can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix M1 represents measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix M1.
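One way to picture step (a) is as a pivot of raw measurement records into a matrix whose rows are biological molecules and whose columns are molecular tags. The sketch below assumes, for illustration only, that the measurements arrive as `(molecule, tag, count)` tuples; the function name `compile_measurement_matrix` and the record format are hypothetical.

```python
def compile_measurement_matrix(records):
    """Pivot (molecule, tag, count) records into a matrix.

    Rows correspond to biological molecules and columns to molecular tags,
    both in sorted order. Repeated (molecule, tag) records are summed.
    """
    molecules = sorted({molecule for molecule, _, _ in records})
    tags = sorted({tag for _, tag, _ in records})
    row_of = {molecule: i for i, molecule in enumerate(molecules)}
    col_of = {tag: j for j, tag in enumerate(tags)}

    matrix = [[0] * len(tags) for _ in molecules]
    for molecule, tag, count in records:
        matrix[row_of[molecule]][col_of[tag]] += count
    return molecules, tags, matrix


# Example with made-up counts for two molecules and two tags.
records = [
    ("GAPDH", "tagA", 3),
    ("GAPDH", "tagB", 1),
    ("ACTB", "tagA", 2),
    ("GAPDH", "tagA", 1),
]
molecules, tags, m1 = compile_measurement_matrix(records)
```

In this example each column of `m1` collects everything measured under one tag, which is the partitioning by molecular tag that the paragraph describes.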
[0016] In some embodiments, the subset of sample objects can be class B sample objects denoted by matrix B, wherein the class B sample objects can be represented by a set of vectors {b1, b2, ..., bk}, and where each vector bi = (r1i, r2i, ..., rmi) (i = 1, 2, ..., k) can be indicative of typical amounts for some of the biological molecules for class B sample objects. In some embodiments, the set of sample objects comprises a plurality of classes of sample objects, wherein the plurality of classes is denoted by {B1, B2, ..., Bk}, and wherein each class Bi of sample objects can be represented by a set of vectors {bi1, bi2, ..., biki} and where the vector bij = (r1ij, r2ij, ..., rmij) can be indicative of typical amounts for some of the biological molecules for class Bi sample objects. In some embodiments, the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm. In some embodiments, the machine learning algorithm comprises a neural network algorithm. In some embodiments, computing the profile of a subset of sample objects further comprises: (a) computing proportions in which each molecular tag of the molecular tags can be assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by an optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix M1 and the optimal transformation matrix A.
[0017] In some embodiments, computing the optimal transformation matrix A comprises an operation that utilizes an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M. In some embodiments, computing the matrix A comprises computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that is defined as:

$$\text{minimize } \lvert M - BA \rvert \quad \text{subject to} \quad A \ge 0, \quad \lvert A_i \rvert_1 = c, \quad i = 1, \ldots, n$$

wherein: (a) matrix M can be compiled from truncated hashed measurements, wherein the ith column of M is a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule, (b) matrix B can be compiled from the class vectors bij, wherein columns of B correspond to the vectors and the jth row of B corresponds to the jth biological molecule, (c) matrix A can represent the optimization variable denoting the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi, and (d) optimization constraints can be indicative of physical limitations, with the constraint $\lvert A_i \rvert_1 = c$ being optional and corresponding to a case where a number of measured sample objects can be known.
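To make the optimization problem concrete, the sketch below evaluates the objective and checks the constraints for a candidate matrix A; it does not solve the linear program, which would normally be handed to an LP solver. It assumes the objective is the entrywise sum of absolute differences between M and BA, and that the optional constraint fixes the sum of each column of A to c. Both readings, and the names `matmul`, `l1_residual`, and `is_feasible`, are assumptions made for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of X (p x q) and Y (q x r)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def l1_residual(M, B, A):
    """Sum of absolute entrywise differences between M and the product BA."""
    BA = matmul(B, A)
    return sum(abs(m - p)
               for m_row, p_row in zip(M, BA)
               for m, p in zip(m_row, p_row))


def is_feasible(A, c=None, tol=1e-9):
    """Check A >= 0 and, if c is given, that every column of A sums to c."""
    if any(value < 0 for row in A for value in row):
        return False
    if c is not None:
        for j in range(len(A[0])):
            if abs(sum(row[j] for row in A) - c) > tol:
                return False
    return True


# Toy example: 3 biological molecules, 2 class vectors, 2 molecular tags.
B = [[2, 0], [1, 1], [0, 3]]        # columns are class vectors
A_true = [[1, 0], [0, 2]]           # rows: class vectors, columns: tags
M = matmul(B, A_true)               # noiseless measurements
A_other = [[1, 1], [0, 1]]
```

Here `A_true` reproduces M exactly and so has zero residual, while `A_other` is feasible but fits worse; a solver would search the feasible set for the matrix with the smallest residual.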
[0018] In some embodiments, the profile can be a transcriptomic profile, a proteomic profile, or a multiomic profile. In some embodiments, the profile can include probabilities that specific sample objects can be bound to each other.
[0019] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data comprises multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
[0020] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
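The training data mentioned above can, per the text, be synthesized from single cell molecular data. A minimal sketch of one way this might look: draw a random number of cells from each class, sum their molecular count vectors into one pooled measurement, and keep the per-class counts as the regression target (or their non-zero pattern as a classification target). The function names, the pooling-by-summation model, and the toy profiles are assumptions for illustration, not the disclosed procedure.

```python
import random


def synthesize_example(class_profiles, max_per_class, rng):
    """Build one (measurement, counts) pair from single-cell count vectors.

    class_profiles maps a class name to a list of single-cell vectors, all of
    the same length. The measurement is the elementwise sum of the sampled cells.
    """
    classes = sorted(class_profiles)
    n_molecules = len(class_profiles[classes[0]][0])
    counts = {name: rng.randint(0, max_per_class) for name in classes}
    measurement = [0] * n_molecules
    for name in classes:
        for _ in range(counts[name]):
            cell = rng.choice(class_profiles[name])
            for j, value in enumerate(cell):
                measurement[j] += value
    return measurement, counts


def synthesize_dataset(class_profiles, n_examples, max_per_class=3, seed=0):
    """Generate n_examples synthetic (measurement, counts) training pairs."""
    rng = random.Random(seed)
    return [synthesize_example(class_profiles, max_per_class, rng)
            for _ in range(n_examples)]


# Toy single-cell data: one representative cell per class, three molecules.
profiles = {"B1": [[5, 0, 1]], "B2": [[0, 4, 2]]}
dataset = synthesize_dataset(profiles, n_examples=10)
```

Each pair in `dataset` could then be fed to any supervised learner, with `counts` as the regression label and `{name: n > 0}` as the corresponding classification label.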
[0021] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements are generated in part from a sample object described by a set of vectors {b1, ..., bm}; and (c) training the numerical classification engine using supervised learning, to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
[0022] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors {b1, ..., bm} that in part generated the accessed measurements; and (c) training the numerical regression engine using supervised learning, to perform a regression on the multiplexed hashed measurements; wherein training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
[0023] Provided herein is a method of spatially profiling an input sample including a plurality of sample objects including a plurality of biological molecules without establishing a priori spatial relationship between a plurality of molecular tags and the plurality of sample objects, the method including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for spatial profiling by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs.
[0024] In some embodiments, the spatial profiling can include information on cell-cell interactions of the plurality of sample objects and information regarding the distribution of the plurality of biological molecules within each sample object.
[0025] Provided herein is a method of single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects including a plurality of biological molecules, the method including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for single cell sequencing by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs. In some embodiments, the method can further include reverse transcribing the plurality of biological molecules bound to the plurality of molecular tags for sequencing prior to establishing a posteriori spatial relationship.
[0026] In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, the sequencing can be Next-Generation sequencing or Sanger sequencing. In some embodiments, each CU can be associated with the plurality of molecular tags. In some embodiments, each CU can be associated with the plurality of molecular tags prior to contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each CU can be associated with the plurality of molecular tags following contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and can be associated with the second set of at least one SBE. In some embodiments, each molecular tag can further include a sequencing element, a release element, and/or a linker. In some embodiments, the release element can release each molecular tag from each CU. In some embodiments, the linker can prevent extension. In some embodiments, the barcode can be unique to the each CU. In some embodiments, each molecular tag can be single-stranded. In some embodiments, each molecular tag can include a hairpin structure. In some embodiments, each molecular tag can be double-stranded. In some embodiments, the second set of at least one SBE can be poly(dT). In some embodiments, the barcode can be uniquely assigned to the each CU of the plurality of CUs. In some embodiments, the UMI can be uniquely assigned to the each molecular tag. In some embodiments, each molecular tag can be associated with at least two CUs of the plurality of CUs. In some embodiments, each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects. In some embodiments, the second set of at least one SBE can further include a blocking element.
In some embodiments, the blocking element can prevent reverse transcription. In some embodiments, the blocking element can be removed when the at least one sample object interacts with each CU. In some embodiments, the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE. In some embodiments, the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE. In some embodiments, the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
[0027] In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects {a1} number of times, the first CU can interact with a second sample object of the plurality of sample objects {a2} number of times, a second CU of the plurality of CUs can interact with the first sample object {a3} number of times, and the second CU can interact with the second sample object {a4} number of times. In some embodiments, the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface. In some embodiments, at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be zero. In some embodiments, at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be one. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be the same. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be different.
[0028] In some embodiments, the evaluating proximity of each CU of the plurality of CUs can include evaluating proximity between the first CU and the second CU. In some embodiments, the evaluating proximity between the first CU and the second CU can include identifying the plurality of molecular tags associated with the first CU and the second CU. In some embodiments, the evaluating proximity between the first CU and the second CU can include identifying numbers of UMIs linked to the barcode and the plurality of biological molecules.
[0029] In some embodiments, {a1'} number of UMIs can be linked to a first barcode and a first biological molecule of the plurality of the biological molecules, {a2'} number of UMIs can be linked to the first barcode and a second biological molecule of the plurality of the biological molecules, {a3'} number of UMIs can be linked to a second barcode and a third biological molecule of the plurality of the biological molecules, and {a4'} number of UMIs can be linked to the second barcode and a fourth biological molecule of the plurality of the biological molecules. In some embodiments, the first barcode can be unique to the first CU, and the second barcode can be unique to the second CU. In some embodiments, the {a1'} number, the {a2'} number, the {a3'} number, and the {a4'} number can be input into the machine learning algorithm for the proximity analysis. In some embodiments, the machine learning algorithm can output an adjacency matrix. In some embodiments, at least two of the {a1'} number, the {a2'} number, the {a3'} number, and the {a4'} number can give an output value of 1 in the adjacency matrix. In some embodiments, the output value of 1 can indicate the first CU and the second CU are in spatial proximity. In some embodiments, at least two of the {a1'} number, the {a2'} number, the {a3'} number, and the {a4'} number can give an output value of 0 in the adjacency matrix. In some embodiments, the output value of 0 can indicate the first CU and the second CU are not in spatial proximity. In some embodiments, the adjacency matrix can be used to establish a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects.
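To illustrate the shape of the input and output described above, the sketch below turns a matrix of UMI counts (rows are biological molecules, columns are CU barcodes) into a binary adjacency matrix. It deliberately replaces the machine learning algorithm with a simple stand-in heuristic, cosine similarity between CU count profiles with a fixed threshold, so it shows only the data flow and not the disclosed model. The function names and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacency_from_umi(umi_matrix, threshold=0.9):
    """Return a CU-by-CU 0/1 matrix; 1 marks CUs judged to be in proximity.

    Stand-in heuristic: two CUs are called proximal when their UMI count
    profiles across biological molecules are sufficiently similar.
    """
    n_cus = len(umi_matrix[0])
    profiles = [[row[j] for row in umi_matrix] for j in range(n_cus)]
    adjacency = [[0] * n_cus for _ in range(n_cus)]
    for i in range(n_cus):
        for j in range(n_cus):
            if i != j and cosine(profiles[i], profiles[j]) >= threshold:
                adjacency[i][j] = 1
    return adjacency


# Made-up counts: CU 0 and CU 1 capture similar molecules, CU 2 does not.
umi_matrix = [
    [10, 9, 0],
    [5, 6, 0],
    [0, 1, 8],
]
adjacency = adjacency_from_umi(umi_matrix)
```

In this toy case the first two CUs are marked as proximal and the third is isolated, so the biological molecules bound to the first two CUs would be aggregated as coming from the same sample object.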
[0030] Provided herein is a machine learning algorithm that can be trained by a computer implemented method for training a model, wherein the computer implemented method can include: (a) maintaining a dataset including Unique Molecular Identifier (UMI) vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples; (b) generating a plurality of training inputs from the dataset, wherein each training input can include a UMI matrix representing UMI counts for number of interactions between a predetermined number of computing units (CUs) and a plurality of biological molecules, and each training input can be associated with an adjacency matrix representing known proximal relationships among CUs; and (c) training the model by adjusting model parameters to match model outputs to the adjacency matrix associated with the plurality of training inputs, wherein the trained model can generate a model output representing spatial relationships among CUs based on input where a priori spatial relationships between the plurality of biological molecules and the sample objects are not present.
[0031] In some embodiments, generating the plurality of training inputs can include: (a) randomly assigning the predetermined number of CUs to the plurality of sample objects to generate an assignment profile, wherein each of the predetermined number of CUs can be assigned to one or more sample objects, and each of the plurality of the sample objects can be assigned with one or more CUs; and (b) sampling the predetermined number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate training inputs.
[0032] In some embodiments, the assignment profile can correspond with the adjacency matrix associated with the training inputs. In some embodiments, each row of the UMI matrix can represent a biological molecule of the plurality of biological molecules, and each column of the UMI matrix can represent a CU of the predetermined number of CUs. In some embodiments, the known proximal relationships among CUs can include known proximal relationships between each pair of the CUs. In some embodiments, each training input can be generated from one scRNA-seq sample. In some embodiments, the model does not include positional embeddings to allow the trained model to handle input data with varying lengths and structures. In some embodiments, the model can include a plurality of transformer blocks, wherein a last transformer block of the plurality of transformer blocks can feature a single attention head. In some embodiments, the model can include a sigmoid activation function to allow the trained model to treat each proximal relationship between each pair of the CUs as an independent probability. In some embodiments, training the model can further include using a binary cross entropy as a loss function.
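As a non-limiting illustration of the training-input generation described above, the following Python sketch (using NumPy) randomly assigns CUs to sample objects, distributes each sample object's UMI counts among its assigned CUs, and derives the adjacency matrix from the assignment. Several simplifications are assumptions made for illustration and are not requirements of the present disclosure: each CU is assigned to exactly one sample object, every UMI of a sample object is captured by one of its CUs, and UMIs are split uniformly at random. The function name make_training_input is hypothetical.

```python
import numpy as np


def make_training_input(umi_vectors, n_cus, rng):
    """Build one (X, Y) training pair from per-object UMI vectors.

    umi_vectors: (K, G) integer array, one UMI vector per sample object.
    Returns X (G x N UMI matrix), Y (N x N adjacency matrix), and the
    assignment of each CU to a sample object.
    """
    k, g = umi_vectors.shape
    if n_cus < k:
        raise ValueError("need at least one CU per sample object")
    # Assumption: each CU belongs to exactly one sample object, and each
    # sample object receives at least one CU.
    assignment = np.concatenate([np.arange(k), rng.integers(0, k, size=n_cus - k)])
    rng.shuffle(assignment)

    x = np.zeros((g, n_cus), dtype=np.int64)
    for obj in range(k):
        cus = np.flatnonzero(assignment == obj)
        probs = np.full(len(cus), 1.0 / len(cus))
        # Assumption: the UMIs of each biological molecule are split
        # uniformly at random among the CUs assigned to the sample object.
        for molecule in np.flatnonzero(umi_vectors[obj]):
            x[molecule, cus] = rng.multinomial(umi_vectors[obj, molecule], probs)

    # Two CUs are adjacent when they were assigned to the same sample object.
    y = (assignment[:, None] == assignment[None, :]).astype(np.int64)
    return x, y, assignment


rng = np.random.default_rng(0)
umi_vectors = rng.poisson(2.0, size=(3, 5))  # K = 3 sample objects, G = 5 molecules
x, y, assignment = make_training_input(umi_vectors, n_cus=8, rng=rng)
```

In this sketch, the rows of X correspond to biological molecules and the columns to CUs, and Y is symmetric with ones on the diagonal, consistent with the assignment profile corresponding to the adjacency matrix.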
[0033] In some embodiments, the computer-implemented method of the present disclosure can further include (a) partitioning the predetermined number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of CUs comprises CUs that are proximal to one another; and (b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate resultant UMI vector for each sample object. In some embodiments, the model output representing spatial relationships among CUs can be indicative of proximal relationship among sample objects of the plurality of sample objects. In some embodiments, the proximal relationship among sample objects can be derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs can be derived from the model output representing spatial relationships among CUs. In some embodiments, the resultant UMI vector can represent a distribution of the plurality of biological molecules within each sample object. In some embodiments, the partitioning the predetermined number of CUs into a plurality of clusters can include agglomerative clustering with full linkage. In some embodiments, the number of clusters can be known a priori. In some embodiments, the number of clusters can be determined by optimizing a secondary criterion. In some embodiments, a relative reconstruction error can be calculated to measure reconstruction quality of the model. [0034] In some embodiments, the model can include a predetermined number of stacked transformer blocks, wherein the predetermined number can be configurable. In some embodiments, each block of a subset of the predetermined number of stacked transformer blocks can have a predefined number of attention heads, wherein the predefined number of attention heads can be configurable. 
In some embodiments, an Adam optimizer with a predetermined learning rate can be used to train the model. In some embodiments, the model can be trained for a predetermined number of steps with a predefined batch size.
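As a non-limiting illustration of the sigmoid output and the binary cross entropy loss described in the preceding paragraphs, the following Python sketch (using NumPy) reads out pairwise probabilities from a single attention head and evaluates the loss against an adjacency matrix. The way the pairwise scores are formed here (scaled query-key products that are then symmetrized) is an assumption made for illustration; the present disclosure does not specify this readout. The names pairwise_head and binary_cross_entropy are hypothetical. Because the sketch uses no positional embeddings, reordering the CUs only reorders the rows and columns of the output.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pairwise_head(h, w_q, w_k):
    """Map CU embeddings h (N x d) to an N x N matrix of probabilities.

    Assumption: scores are scaled query-key products, symmetrized so that
    the relationship between CU i and CU j does not depend on their order.
    """
    q, k = h @ w_q, h @ w_k
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits = 0.5 * (logits + logits.T)
    # The sigmoid treats every pair as an independent probability.
    return sigmoid(logits)


def binary_cross_entropy(p, y, eps=1e-7):
    """Mean binary cross entropy between probabilities p and 0/1 targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


rng = np.random.default_rng(0)
n_cus, d = 6, 4
h = rng.normal(size=(n_cus, d))      # hypothetical CU embeddings
w_q = rng.normal(size=(d, d))        # hypothetical query weights
w_k = rng.normal(size=(d, d))        # hypothetical key weights
y = np.kron(np.eye(2), np.ones((3, 3)))  # two sample objects, three CUs each

p = pairwise_head(h, w_q, w_k)
loss = binary_cross_entropy(p, y)

# Without positional embeddings, permuting the CUs permutes the output.
perm = rng.permutation(n_cus)
p_perm = pairwise_head(h[perm], w_q, w_k)
```

In a full model, the embeddings h would be produced by the stacked transformer blocks, and the weights would be adjusted by the optimizer to reduce the loss.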
[0035] In some embodiments, the machine learning algorithm can be a computer implemented method for training a model, including: (a) maintaining a dataset including Unique Molecular Identifier (UMI) vectors corresponding to each sample object of K number of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples, wherein, for sample object k, G number of biological molecules can be comprised in the UMI vector; (b) generating a plurality of training inputs from the dataset, wherein each training input comprises a UMI matrix X, wherein each row of the UMI matrix X can represent the G number of biological molecules, and each column of the UMI matrix X can represent N number of computing units (CUs), wherein values in the UMI matrix X indicate the UMI counts that can represent a number of interactions between the ith CU and the jth biological molecule, and each training input can be associated with an adjacency matrix Y representing known proximal relationships among CUs, wherein rows and columns of the adjacency matrix Y can represent the N number of CUs, and wherein values in the adjacency matrix Y can represent a pairwise proximal relationship between the ith CU and the jth CU; and (c) training the model by adjusting model parameters to match model outputs to the adjacency matrix Y associated with the plurality of training inputs, wherein the trained model can generate a model output representing spatial relationships among CUs based on input where a priori spatial relationships between the G number of biological molecules and the K number of sample objects are not present.
[0036] In some embodiments, generating the plurality of training inputs can include: (a) randomly assigning the N number of CUs to the K number of the sample objects to generate an assignment profile, wherein each of the N number of CUs can be assigned to one or more sample objects, and each of the K number of the sample objects can be assigned with one or more CUs; and (b) sampling the N number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate the training inputs.
[0037] In some embodiments, the assignment profile can correspond with the adjacency matrix Y associated with the training inputs. In some embodiments, the number G can be a constant. In some embodiments, the number N can be an arbitrary integer. In some embodiments, each training input can be generated from one scRNA-seq sample. In some embodiments, the model does not include positional embeddings to allow the trained model to handle input data with varying lengths and structures. In some embodiments, the model can include a plurality of transformer blocks, wherein a last transformer block of the plurality of transformer blocks can feature a single attention head. In some embodiments, the model can include a sigmoid activation function to allow the trained model to treat the pairwise proximal relationship between the ith CU and the jth CU as an independent probability. In some embodiments, training the model can further include using a binary cross entropy as a loss function. In some embodiments, the computer-implemented method can further include: (a) partitioning the N number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of the CUs can include CUs that are proximal to one another; and (b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate a resultant UMI vector for each sample object.
[0038] In some embodiments, the model output representing spatial relationships among CUs can be indicative of proximal relationship among sample objects of the K number of sample objects. In some embodiments, the proximal relationship among sample objects can be derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs can be derived from the model output representing spatial relationships among CUs. In some embodiments, the resultant UMI vector can represent a distribution of the G number of biological molecules within each sample object. In some embodiments, the partitioning the N number of CUs into a plurality of clusters can include agglomerative clustering with full linkage. In some embodiments, the number of clusters can be known a priori. In some embodiments, the number of clusters can be determined by optimizing a secondary criterion. In some embodiments, a relative reconstruction error can be calculated to measure reconstruction quality of the model. In some embodiments, the model can include a predetermined number of stacked transformer blocks, wherein the predetermined number can be configurable. In some embodiments, each block of a subset of the predetermined number of stacked transformer blocks can have a predefined number of attention heads. In some embodiments, an Adam optimizer with a predetermined learning rate can be used to train the model. In some embodiments, the model can be trained for a predetermined number of steps with a predefined batch size.
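As a non-limiting illustration of the partitioning and aggregating steps described above, the following Python sketch (using NumPy) clusters CUs by agglomerative clustering and sums the UMI vectors within each cluster. The sketch interprets "full linkage" as complete linkage (the distance between two clusters is the largest pairwise distance between their members), uses one minus the model output as the distance, and assumes the number of clusters is known a priori; these choices, and the function names complete_linkage_clusters and aggregate_umis, are assumptions made for illustration.

```python
import numpy as np


def complete_linkage_clusters(prob, n_clusters):
    """Greedy agglomerative clustering of CUs with complete linkage.

    prob: N x N matrix of pairwise proximity probabilities (model output).
    Returns an array of N cluster labels.
    """
    dist = 1.0 - prob  # assumption: distance is one minus probability
    clusters = [[i] for i in range(prob.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the farthest pair defines the distance.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(prob.shape[0], dtype=np.int64)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def aggregate_umis(x, labels):
    """Sum the CU columns of the G x N UMI matrix x within each cluster."""
    n_clusters = int(labels.max()) + 1
    out = np.zeros((x.shape[0], n_clusters), dtype=x.dtype)
    for c in range(n_clusters):
        out[:, c] = x[:, labels == c].sum(axis=1)
    return out


# Hypothetical model output: CUs 0-2 are proximal, and CUs 3-4 are proximal.
prob = np.full((5, 5), 0.1)
prob[:3, :3] = 0.9
prob[3:, 3:] = 0.9
np.fill_diagonal(prob, 1.0)

# Hypothetical UMI matrix with G = 3 biological molecules and N = 5 CUs.
x = np.array([[1, 0, 2, 0, 0],
              [0, 1, 0, 3, 1],
              [0, 0, 0, 0, 2]])

labels = complete_linkage_clusters(prob, n_clusters=2)
resultant = aggregate_umis(x, labels)
```

In this sketch, each column of the aggregated matrix plays the role of a resultant UMI vector for one sample object. The nested search is written for readability rather than speed; a library implementation of hierarchical clustering could be substituted for larger inputs.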
[0039] Also provided herein is a computer implemented method for spatially profiling an input sample including a plurality of sample objects including a plurality of biological molecules, the method including: (a) generating a measurement matrix for experimental measurement data of the input sample by aid of computing units (CUs), wherein each row of the measurement matrix represents G number of biological molecules, and each column of the measurement matrix can represent N number of CUs, wherein values in the measurement matrix indicate Unique Molecular Identifier (UMI) counts that can represent a number of interactions between the ith CU and the jth biological molecule; (b) feeding the measurement matrix into a trained AI model, wherein the AI model can be trained by: (1) obtaining a dataset comprising UMI vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples; (2) generating a plurality of training inputs from the dataset, wherein each training input can include a UMI matrix representing UMI counts for number of interactions between a predetermined number of CUs and a plurality of biological molecules, and each training input can be associated with an adjacency matrix representing known proximal relationships among CUs; and (3) training the model by adjusting model parameters to match model outputs to the adjacency matrix associated with the plurality of training inputs; (c) receiving an output matrix from the trained AI model, wherein the output matrix can represent a spatial relationship among CUs, wherein the spatial relationship among CUs can identify proximal CUs that are proximal to one another; and (d) aggregating the plurality of biological molecules bound to the proximal CUs to spatially profile the input sample.
[0040] Further provided herein is a computer program product for training a model, wherein the computer program product can include a non-transient machine-readable medium storing instructions that, when executed by at least one programmable processor, can cause the at least one programmable processor to perform any of the computer-implemented methods disclosed herein.
[0041] Further provided herein is a computer-implemented system for training a model including: (a) at least one programmable processor; and (b) a machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one programmable processor to perform any of the computer-implemented methods disclosed herein. [0042] Provided herein is a system for spatial profiling an input sample, the system including: (a) the input sample including a plurality of sample objects including a plurality of biological molecules; (b) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (c) a plurality of molecular tags; and (d) any of the presently described computer implemented methods. In some embodiments, the system can further include a sequencing method.
[0043] Provided herein is a kit for spatial profiling an input sample including a plurality of sample objects including a plurality of biological molecules, the kit including: (a) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (b) a plurality of molecular tags; and (c) an instruction for use of the kit. In some embodiments, the kit can further include an instruction for analysis using any of the presently described computer implemented methods.
[0044] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
[0045] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0046] The features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
[0047] FIG. 1 is a non-limiting schematic illustration of targeted sequestering of nucleic acids for multiomic data collection using the presently disclosed systems, kits, and methods.
[0048] FIG. 2 is a non-limiting schematic illustration of inducible targeted sequestering of RNA on the presently disclosed computing units (CUs).
[0049] FIG. 3A is a schematic illustration of non-limiting object types of FIG. 3E workflow.
[0050] FIG. 3B is a schematic illustration of non-limiting CU types of FIG. 3E workflow.
[0051] FIG. 3C is a schematic illustration of non-limiting SBE types of FIG. 3E workflow.
[0052] FIG. 3D is a schematic illustration of non-limiting other components of FIG. 3E workflow.
[0053] FIG. 3E is a non-limiting schematic illustration of hashed transcriptomic readout of target objects. In some embodiments, the steps of this schematic can be performed as a continuation of the lysis of target cell & incubation step of FIG. 1 with an addition of molecular tags.
[0054] FIG. 4A is a non-limiting schematic illustration of object types and varying elements.
[0055] FIG. 4B is a non-limiting schematic illustration of multiplex hashing on mRNA biological molecules. Each sample object interacting with CUs is interacting with a unique combination of CUs.
[0056] FIG. 4C is a non-limiting schematic illustration of spatial profiling with RNA.
[0057] FIG. 4D is a non-limiting schematic illustration of object types of FIG. 4E workflow.
[0058] FIG. 4E is a non-limiting schematic illustration of multiplex hashing on protein biological molecules. Each sample object interacting with CUs is interacting with a unique combination of CUs.
[0059] FIG. 4F is a non-limiting schematic illustration of spatial profiling with protein.
[0060] FIG. 5 is a non-limiting schematic illustration of computational analysis on multiplex hashing of the presently disclosed systems, kits, and methods.
[0061] FIG. 6 is a model architecture schematic for the design and training of transformer models towards the analysis of single cell transcriptomes.
[0062] FIG. 7A is a graph illustrating distribution of UMI counts per sample object across the validation dataset (median = 1312, total number of cells = 128,000).
[0063] FIG. 7B is a graph illustrating distribution of UMIs allocated to each computing unit across the validation inputs (median = 85). The significantly lower number of UMIs per computing unit (95% of computing units have less than 240 UMIs) in comparison to the number of UMIs per sample object is indicative of the low informational content and sparsity of the computing unit UMI vectors.
[0064] FIG. 7C is a graph illustrating distribution of the relative reconstruction error (RRE) per sample object across the validation inputs (median = 0). The distribution is bimodal with 56 % of the sample objects reconstructed perfectly (RRE = 0) and the remaining sample objects reconstructed with median RRE = 14 %. Sample objects exhibiting RRE greater than 50% can, for practical purposes, be considered as failed reconstructions. In this case, the proportion of failed reconstructions was approximately 3.5%.
[0065] FIG. 8 is a visualization of model variables. A1-3: Example model input X projected by tSNE and cell types listed in the inset. The plot shows 150 points corresponding to 150 computing units assigned to 10 sample objects as indicated by the color coding. No separated clusters are visible. Some colocalization of computing units from same sample objects is present with most sample objects spanning the full support. B1-3: Example embeddings of the last transformer layer projected by tSNE. The embeddings exhibit strong spatial separation that closely mirrors the sample objects. C1-3: Visualization of the estimated adjacency matrix Y" following agglomerative clustering. Force layout is used to indicate relationships between computing units. Computing units appearing close together are predicted to be assigned to the same sample object.
[0066] FIG. 9 is a cell type confusion matrix. The dataset GSE158055 further comprises a broad set of cell annotations. Confusion matrix indicates the probability a computing unit from a sample object on the y-axis is estimated to be from a sample object on the x-axis. Diagonal components thereby list probabilities that the origin of computing units is estimated correctly within the given cell type. Non-zero, off-diagonal components illustrate mismatches between similar cell types like NK and CD8 T cells.
[0067] FIG. 10 is the ROC curve of the prediction of edges between Sample Objects. Mark shows the optimal value of P maximizing the accuracy of predicting whether a given pair of sample objects are connected or not.
[0068] FIG. 11 is the ROC curve of the prediction of edges between Sample Objects. Mark shows the optimal value of P maximizing the accuracy of predicting whether a given pair of sample objects are connected or not. Examples of estimated Sample Object connections comprising various Sample Object Adjacency matrix variants. Left column includes a graph representation of the model output Y. Center and right columns include graph representations of the matrices W and W, respectively.
[0069] FIG. 12 is a graph illustrating the results of the association described in Example 7.
Nodes indicate computing units, edges indicate computed associations between computing units, and the colors correspond to original associations with cells.
[0070] FIG. 13 is a confusion matrix.
[0071] FIG. 14 is a diagram of associations, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells. The bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
[0072] FIG. 15 is a diagram of associations, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells. The bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
[0073] FIG. 16A is schematics illustrating non-limiting exemplary structures of the RNA molecular tags on CUs. NGS denotes sequencing handles; VS denotes variable regions containing CU specific barcode; UMI denotes a randomized sequence; dT_N denotes a poly(dT) capture sequence; and TSO denotes a template switching oligo sequence.
[0074] FIG. 16B is schematics illustrating non-limiting exemplary techniques of capturing mRNAs and synthesizing cDNA using the RNA molecular tags on CUs.
[0075] FIG. 17A is a graph illustrating the number of genes expressed above threshold. HPA denotes the reference data set obtained from the Human Protein Atlas. Test Data denotes the data set obtained from the presently disclosed system.
[0076] FIG. 17B is pie charts illustrating detected or not detected genes of varying degrees of abundance.
[0077] FIG. 17C is a graph illustrating the number of genes detected per read pairs per cell.
DETAILED DESCRIPTION
[0078] The presently described systems, kits, and methods contain, inter alia, engineered or synthetic cells that can perform tasks normally executed by complex instruments. The engineered or synthetic cells of the present disclosure can bind to cells in an input sample and evaluate their profiles, allowing analysis of as many cells in a day as currently available technologies analyze in a year, with unprecedented levels of sensitivity and specificity. Also provided herein are systems, kits, and methods for randomized multiplex hashing of discrete biological units, which allow targeted analysis of gene expression, genotype, haplotype, epigenome, and/or proteome in discrete biological units. The present disclosure can reduce resources required to analyze discrete biological units by targeting capture units to biological units that exhibit specific features, increasing specificity of analyses relative to bulk analysis. Furthermore, the presently described systems, kits, and methods are highly modular. By mixing and matching the types of engineered or synthetic cells, the target profile can be modified or customized as desired by users by using logic gates, details of which are described further below. The adaptability of the present disclosure allows for analysis of anything from, e.g., rare epithelial cells in samples of lysed blood to, e.g., apoptotic T-cells in primary cell cultures. Various aspects and embodiments of the present disclosure are described in greater detail below.
[0079] Furthermore, the present disclosure leverages natural variations in biological molecule patterns by capturing these biological molecules on computing units (CUs) and using machine learning algorithms to establish spatial relationships between CUs. For single-cell sequencing, barcodes can be efficiently partitioned to CUs, which can interact freely with single cells without requiring a priori barcode-to-cell assignments. This method achieves near 100% cell capture due to the absence of stoichiometric constraints, unlike traditional methods that avoid overloading to prevent artifacts. This is also applicable to multicellular clusters, where CUs can interact freely with suspended clusters, and relationships can be established a posteriori. This approach can provide 3D spatial information about biological molecules and corresponding sample objects (e.g., cells) without needing organized barcode sequences. Technical advantages include the ability to analyze partially dissociated tissues, measure 3D spatial organization, and assess intercellular relationships in a massively parallel manner, surpassing the capabilities of current methods.
[0080] The following descriptions and examples illustrate embodiments of the present disclosure in detail. Although the present disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims.
[0081] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
[0082] Although various features of the disclosure can be described in the context of a single embodiment, the features can also be provided separately or in any suitable combination. Conversely, although the present disclosure can be described herein in the context of separate embodiments for clarity, the present disclosure can also be implemented in a single embodiment. It is to be understood that the present disclosure is not limited to the particular embodiments described herein and as such can vary. Those of skill in the art will recognize that there are variations and modifications of the present disclosure, which are encompassed within its scope. [0083] It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
[0084] All patent filings, websites, other publications, accession numbers and the like cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or filing date of a priority application referring to the accession number if applicable. Likewise, if different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the disclosure can be used in combination with any other unless specifically indicated otherwise.
DEFINITION
[0085] All terms are intended to be understood as they would be understood by a person skilled in the art. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
[0086] The following definitions supplement those in the art and are directed to the current application and are not to be imputed to any related or unrelated cases, e.g., to any commonly owned patent or application. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, the preferred materials and methods are described herein. Accordingly, the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0087] In this application, the use of the singular includes the plural unless specifically stated otherwise. It must be noted that, as used in the specification, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
[0088] In this application, the use of “or” means “and/or” unless stated otherwise. The terms “and/or” and “any combination thereof” and their grammatical equivalents as used herein, can be used interchangeably. These terms can convey that any combination is specifically contemplated. Solely for illustrative purposes, the following phrases “A, B, and/or C” or “A, B, C, or any combination thereof” can mean “A individually; B individually; C individually; A and B; B and C; A and C; and A, B, and C”. The term “or” can be used conjunctively or disjunctively, unless the context specifically refers to a disjunctive use.
[0089] Furthermore, the use of the term “including” as well as other forms, such as “include”, “includes” and “included”, is not limiting. [0090] Reference in the specification to “some embodiments”, “an embodiment”, “one embodiment” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the present disclosures.
[0091] As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. It is contemplated that any embodiment discussed in this specification can be implemented with respect to any method or composition of the disclosure, and vice versa. Furthermore, compositions of the present disclosure can be used to achieve methods of the present disclosure. [0092] The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
[0093] The term “fragment” or “variant” refers to any functional fragment, variant, derivative or analog of a polynucleotide, polypeptide or biomolecule that possesses an in vivo or in vitro activity that is characteristic of the polynucleotide, polypeptide or biomolecule. In some embodiments, the fragment, variant or analog has a length equal to about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80% or about 90% or greater of the length of the polynucleotide, polypeptide, or biomolecule. Functional expression of the fragment or variant can be easily assayed by the person of ordinary skill in the art by testing activity and the ability to manufacture products as described herein.
[0094] It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, can also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, can also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the disclosure are specifically embraced by the present disclosure and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present disclosure and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
METHODS OF THE DISCLOSURE
[0095] The present disclosure relates to, inter alia, methods for detecting target-specific output signals within an input sample, which enhance accuracy and efficiency in single-cell research. For example, the methods provided herein can consist of contacting sample objects with computing units (CUs) engineered to display surface-bound entities (SBEs) capable of binding biological molecules. In another example, the methods provided herein can utilize multiplexed hashing, which involves partitioning CUs to interact with specific sample objects multiple times. The SBEs on CUs can bind biological molecules within the sample objects. The presence of target-specific output signals can then be determined by assigning biological molecules to hash groups based on interactions and using interaction frequencies to evaluate signals associated with sample objects.
[0096] The present disclosure further relates to methods for targeted analysis of gene expression, genotype, haplotype, epigenome, and/or proteome in discrete biological units. The present disclosure also relates to systems and kits for implementing the methods of the present disclosure. The methods of the present disclosure can reduce resources required to analyze discrete biological units by targeting capture units to biological units that exhibit specific features. In this aspect, the methods of the present disclosure can increase specificity of analyses relative to bulk analysis. Furthermore, the present disclosure can enable analysis of individual biological units by multiplexed hashing.
[0097] Also presented herein are highly adaptable and modular methods of rapidly analyzing an input sample with unprecedented levels of sensitivity and specificity, allowing analysis of as many cells in a day as currently available technologies do in a year. For example, by mixing and matching different types of engineered cells comprising one or more computing units (CUs), the target profile can be modified and customized in any way the user desires. Modular logical operator modules can also be customized based on the target profile. The logical operator modules can receive one or more input signals (e.g., sample objects) and can generate one or more output signals (e.g., output signal objects (SOs)) based on the specified modules.
[0098] For example, the sample objects (derived from the input sample) can be mixed with the Reaction Reagent and Reaction Medium to allow the sample objects to interact with the CUs of the present disclosure, thereby forming one or more computational clusters and generating output signals. If the sample objects (e.g., cells) match the targeted profile, the engineered cells exit their dormant state and enter an activated state, wherein they can produce easily measurable output signals (e.g., reporter signals). The presence or absence of output signals can be read and/or quantified by use of the readout reagent. The target profile can be modified in any way the user desires by using the logical operator module of the present disclosure. In a non-limiting example, a two-input “AND” gate requires two specific inputs (e.g., antigens) to both be present on the bound cell to trigger a positive readout. Various aspects and embodiments of the logical operator modules are described in greater detail below.
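As a minimal illustration of the two-input “AND” gate described above, the following Python sketch treats a target profile as a set of required antigens and reports a positive readout only when all of them are present on the bound cell. The function name `and_gate` and the antigen labels are hypothetical and are used only to illustrate the logic; they are not part of the disclosed logical operator modules.

```python
def and_gate(antigens_on_cell, required_antigens):
    """Return True only if every required antigen is present on the bound cell."""
    return set(required_antigens).issubset(antigens_on_cell)


# Hypothetical target profile: antigen "A" and antigen "B" must both be present.
target = {"A", "B"}
print(and_gate({"A", "B", "C"}, target))  # True: both inputs are present
print(and_gate({"A", "C"}, target))       # False: antigen "B" is missing
```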
[0099] Accordingly, in one aspect, provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of computing units (CUs), wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and can display a first set of at least one surface-bound entity (SBE) on its surface only upon interacting with the plurality of sample objects, and wherein the first set of at least one SBE is capable of binding to the plurality of biological molecules; (b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules can be released from the plurality of sample objects and can bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by measuring an amount of the plurality of biological molecules bound to the first set of at least one SBE, wherein the amount can be indicative of the characteristic of the input sample. In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times. In some embodiments, the plurality of biological molecules can be assigned to one or more hash groups using the first set of at least one SBE. In some embodiments, the first number, the second number, the third number, and the fourth number of interactions can be utilized to evaluate the one or more target specific output signals associated with the plurality of sample objects.
[0100] Also provided herein is a method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample via multiplexed hashing, the method including: (a) contacting a plurality of sample objects derived from the input sample with a plurality of CUs, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules, wherein each CU of the plurality of CUs can be engineered to interact with the plurality of sample objects and display a first set of at least one SBE on its surface, wherein the plurality of CUs can be partitioned, such that a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times, and wherein the first set of at least one SBE is capable of binding to the plurality of biological molecules; (b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules can be released from the sample object and can bind to the first set of at least one SBE; and (c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by (1) assigning the plurality of biological molecules to one or more hash groups using the first set of at least one SBE, and (2) utilizing the first number, the second number, the third number, and the fourth number of interactions to evaluate the one or more target specific output signals associated with the plurality of sample objects.
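The assignment of biological molecules to hash groups in step (c)(1) can be pictured as a simple grouping operation. The Python sketch below assumes, purely for illustration, that each captured molecule is recorded as a pair of identifiers (the SBE that bound it and the molecule itself); the record format, the identifiers, and the function name `assign_hash_groups` are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def assign_hash_groups(bound_molecules):
    """Group biological molecules by the SBE that captured them.

    bound_molecules: iterable of (sbe_id, molecule_id) pairs (hypothetical format).
    """
    groups = defaultdict(list)
    for sbe_id, molecule_id in bound_molecules:
        groups[sbe_id].append(molecule_id)
    return dict(groups)


# Hypothetical observations after lysis and binding.
observations = [("SBE1", "geneX"), ("SBE2", "geneY"), ("SBE1", "geneY")]
hash_groups = assign_hash_groups(observations)
print(hash_groups)  # {'SBE1': ['geneX', 'geneY'], 'SBE2': ['geneY']}
```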
[0101] In any of the methods described herein, at least two of the plurality of sample objects can be bound to each other. In some embodiments, each CU interacting with the plurality of sample objects can depend on the binding of the at least two of the plurality of sample objects. In some embodiments, the plurality of biological molecules bound to the first set of at least one SBE can be washed to remove any unbound biological molecules before step (c). In some embodiments, the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be one. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be greater than one. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be the same. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be different. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
[0102] Non-limiting embodiments of the presently described methods are illustrated in FIGS. 1, 2, 3A-3E, and 4A-4D.
[0103] In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, the plurality of biological molecules can be protein or peptide.
[0104] In some embodiments, the plurality of sample objects in contact with the plurality of CUs in the contacting step (a) can be incubated for a sufficient amount of time in an incubator for each CU to display the at least one SBE on its surface. In some embodiments, the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent. In some embodiments, the crowding agent can be a hydrogel. In some embodiments, the crowding agent can include, but is not limited to, polyethylene glycol (PEG), sucrose, urea, Ficoll, dextran, cellulose, chitosan, poly(lactic-co-glycolic acid), hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL), composite synthetic/proteinaceous hydrogel, such as PEG/PVA/PVP+gelatin, biopolymer hydrogel, such as chitosan, hyaluronic acid, silk fibroin, and their functionalized variants, and/or protein, such as BSA. In some embodiments, the crowding agent can be temperature responsive, such as, but not limited to, hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL). In some embodiments, the crowding agent can be temperature, pH, and/or osmolarity responsive, such as, but not limited to, composite synthetic/proteinaceous hydrogel (e.g., PEG/PVA/PVP+gelatin) and/or biopolymer hydrogel (e.g., chitosan, hyaluronic acid, silk fibroin, and their functionalized variants).
[0105] Also provided herein are methods and systems for spatially profiling and single-cell sequencing of biological samples. A plurality of sample objects, which include various biological molecules, can be contacted with computing units (CUs). Each CU displays surface-bound entities (SBEs) capable of binding to the sample objects and biological molecules. The sample objects can be permeabilized to release the biological molecules, which then bind to the SBEs associated with molecular tags. These molecular tags can facilitate the establishment of spatial relationships between the molecular tags and the sample objects by evaluating the proximity of CUs using machine learning algorithms.
[0106] In some embodiments, the presently described methods can include steps for reverse transcription of the biological molecules for sequencing, and various aspects of the molecular tags and computing units, such as the use of barcodes, unique molecular identifiers (UMIs), release elements, and linkers are described in detail infra.
[0107] In some embodiments, machine learning algorithms can be employed to train models that analyze the proximity of CUs and derive spatial relationships among the biological molecules and sample objects. The algorithms can utilize UMI vectors from single-cell RNA sequencing data to generate training inputs, which are used to train the model by matching outputs to known adjacency matrices. The trained model can then be used to process experimental data, generating output matrices that represent spatial relationships among CUs, and aggregating biological molecules bound to proximal CUs to achieve spatial profiling.
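The final aggregation step described above can be sketched once an output matrix of spatial relationships is available. The Python example below assumes, for illustration only, that the model output has already been reduced to a binary adjacency matrix over CUs and that each CU carries a vector of molecule counts; the trained model itself is not shown, and the names `aggregate_proximal`, `adjacency`, and `cu_counts` are hypothetical.

```python
def aggregate_proximal(adjacency, cu_counts):
    """For each CU, sum its own counts with those of the CUs marked adjacent to it."""
    n = len(cu_counts)
    n_molecules = len(cu_counts[0])
    aggregated = []
    for i in range(n):
        neighbors = [i] + [j for j in range(n) if j != i and adjacency[i][j]]
        aggregated.append(
            [sum(cu_counts[j][g] for j in neighbors) for g in range(n_molecules)]
        )
    return aggregated


# Hypothetical output matrix: CU 0 and CU 1 are proximal, CU 2 is isolated.
adjacency = [[0, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
# Hypothetical counts of two biological molecules bound to each CU.
cu_counts = [[5, 0], [3, 1], [0, 7]]
print(aggregate_proximal(adjacency, cu_counts))  # [[8, 1], [8, 1], [0, 7]]
```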
[0108] Various aspects and embodiments of the presently disclosed methods are explained in greater detail below. The section headings below are for organizational purposes only and are not to be construed as limiting the subject matter described.
Multiplex Hashing
[0109] The present disclosure can enable analysis of individual biological units by multiplexed hashing. In any of the methods described herein, in some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times. A non-limiting aspect of sample objects interacting with CUs is illustrated in FIGS. 1, 2, 3A-3E, and 4A-4D.
[0110] In some embodiments, the first CU can display the at least one SBE that is different from what the second CU displays on its surface. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be one. In some embodiments, at least one of the first number, the second number, the third number, and the fourth number can be greater than one. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be the same. In some embodiments, at least two of the first number, the second number, the third number, and the fourth number can be different. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
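The four interaction numbers can be arranged as a small table indexed by CU and sample object, which makes it easier to see why the first and third numbers relate to the first sample object and the second and fourth numbers relate to the second. The Python sketch below uses invented counts and the hypothetical labels `CU1`, `CU2`, `S1`, and `S2`; it only illustrates the bookkeeping, not how interactions are measured.

```python
# Hypothetical interaction counts, indexed as interactions[cu][sample_object].
interactions = {
    "CU1": {"S1": 3, "S2": 1},  # first number, second number
    "CU2": {"S1": 1, "S2": 2},  # third number, fourth number
}


def signature(sample_object, interactions):
    """Collect, in CU order, how often each CU interacted with one sample object."""
    return [interactions[cu][sample_object] for cu in sorted(interactions)]


print(signature("S1", interactions))  # [3, 1]: first and third numbers
print(signature("S2", interactions))  # [1, 2]: second and fourth numbers
```

Because the two signatures differ, measurements hashed by CU can in principle be attributed to one sample object or the other.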
[0111] In some embodiments, the at least one SBE can be two or more SBEs. In some embodiments, a plurality of molecular tags can be added after the lysing step. In some embodiments, each CU of the plurality of CUs can further display a second set of at least one SBE. In some embodiments, the plurality of molecular tags can associate with the second set of at least one SBE. In some embodiments, the second set of at least one SBE can be incapable of binding to the plurality of biological molecules. In some embodiments, the plurality of molecular tags that do not associate with the second set of at least one SBE can be removed by washing. In some embodiments, a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE. In some embodiments, the plurality of molecular tags and the second set of at least one SBE can be associated either before or after each CU interacts with the sample object. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the hash element can include a unique molecular identifier (UMI), a molecular barcode that can provide error correction and increased accuracy during sequencing. In some embodiments, the molecular tag can further include a recognition element, a unique sequence for recognizing a specific SBE. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, the plurality of biological molecules can be DNA, RNA, or protein. In some embodiments, the plurality of biological molecules can be DNA.
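For a nucleic-acid molecular tag, the elements named above can be modeled as fields of a small record. The Python sketch below is illustrative only: the element order, the element lengths, the placeholder recognition sequence, and the names `MolecularTag` and `random_nmer` are assumptions and are not specified by the disclosure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularTag:
    recognition: str  # recognition element for a specific SBE (placeholder sequence)
    umi: str          # hash element: unique molecular identifier
    primer: str       # priming element: random N-mer

    def sequence(self):
        """Concatenate the elements in an assumed 5'-to-3' order."""
        return self.recognition + self.umi + self.primer


def random_nmer(n, rng):
    """Draw a random N-mer over the DNA alphabet."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(0)  # seeded so the example is reproducible
tag = MolecularTag(
    recognition="ACGTACGT",
    umi=random_nmer(10, rng),
    primer=random_nmer(6, rng),
)
print(len(tag.sequence()))  # 24
```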
[0112] In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, any of the methods described herein, wherein the biological molecule is RNA, can further include (1) poly(A) priming of the RNA to the priming element of the molecular tag, and (2) reverse transcribing the RNA, after the lysing step and before the detecting step.
[0113] In some embodiments, the plurality of biological molecules can be protein. In some embodiments, the protein can include a barcode. In some embodiments, the protein can be an antibody with the barcode. In some embodiments, any of the methods described herein, wherein the biological molecule is protein, can further include (1) capturing the protein via the sequencing element or the unique protein binding domain, (2) priming the barcode via the priming element, and (3) extending the priming element with a strand displacing polymerase, after the lysing step and before the detecting step. In some embodiments, any of the methods described herein, wherein the biological molecule is protein, can further include (1) capturing the protein via the unique protein binding domain, and (2) releasing the protein from the molecular tag, after the lysing step and before the detecting step. In some embodiments, a protein-molecular tag complex can be optionally stabilized via crosslinking. In some embodiments, the crosslinking can be via an amine-reactive crosslinker. In some embodiments, the measuring can further comprise determining an amount of the molecular tag associated with the plurality of biological molecules.
[0114] In some embodiments, the determining can be performed using qPCR, sequencing, gel electrophoresis, isothermal amplification, ELISA, or mass spectrometry. In some embodiments, the sequencing can be Next-Generation sequencing or Sanger sequencing. In some embodiments, any of the methods described herein can further include (3) computing the amount of the molecular tag such that the plurality of biological molecules in the first sample object can be differentiated from the plurality of biological molecules in the second sample object.
[0115] In some embodiments, the plurality of sample objects in contact with the plurality of CUs in the contacting step can be incubated for a sufficient amount of time in an incubator for each CU to display the first set of at least one SBE and the second set of at least one SBE on its surface. In some embodiments, the sufficient amount of time can be about 2 hours, about 2.1 hours, about 2.2 hours, about 2.3 hours, about 2.4 hours, about 2.5 hours, about 2.6 hours, about 2.7 hours, about 2.8 hours, about 2.9 hours, about 3 hours, about 3.1 hours, about 3.2 hours, about 3.3 hours, about 3.4 hours, about 3.5 hours, about 3.6 hours, about 3.7 hours, about 3.8 hours, about 3.9 hours, about 4 hours, about 4.1 hours, about 4.2 hours, about 4.3 hours, about 4.4 hours, about 4.5 hours, about 4.6 hours, about 4.7 hours, about 4.8 hours, about 4.9 hours, about 5 hours, about 5.1 hours, about 5.2 hours, about 5.3 hours, about 5.4 hours, about 5.5 hours, about 5.6 hours, about 5.7 hours, about 5.8 hours, about 5.9 hours, or about 6 hours. In some embodiments, the sufficient amount of time can be about 3 hours. In some embodiments, the sufficient amount of time can be about 3.5 hours. In some embodiments, the sufficient amount of time can be about 4 hours. In some embodiments, the sufficient amount of time can be about 4.5 hours. In some embodiments, the sufficient amount of time can be about 5 hours. In some embodiments, the plurality of sample objects in contact with the plurality of CUs can be incubated in a crowding agent. In some embodiments, the crowding agent is a hydrogel.
[0116] Also provided herein for multiplex hashing is a computer-implemented method, the method including: (a) compiling a measurement matrix M1 from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix M1 can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix M1 can represent measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix M1.
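The layout of the measurement matrix (one row per biological molecule, one column per molecular tag) can be illustrated with a short Python sketch. The input format assumed here, a dictionary of counts keyed by (molecule, tag) pairs, as well as the gene and tag labels and the function name `compile_measurement_matrix`, are hypothetical.

```python
def compile_measurement_matrix(counts, molecules, tags):
    """Build a matrix whose rows follow `molecules` and whose columns follow `tags`.

    Pairs that were never observed are recorded as 0.
    """
    return [[counts.get((mol, tag), 0) for tag in tags] for mol in molecules]


# Hypothetical counts of each biological molecule observed with each molecular tag.
counts = {
    ("geneX", "tag1"): 10,
    ("geneY", "tag1"): 8,
    ("geneY", "tag2"): 16,
    ("geneZ", "tag1"): 4,
    ("geneZ", "tag2"): 4,
}
molecules = ["geneX", "geneY", "geneZ"]
tags = ["tag1", "tag2"]
M1 = compile_measurement_matrix(counts, molecules, tags)
print(M1)  # [[10, 0], [8, 16], [4, 4]]
```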
[0117] In some embodiments, the subset of sample objects can be class B sample objects denoted by matrix B, wherein the class B sample objects can be represented by a set of vectors bi {b1, b2, ... bk} (i = 1, 2, 3, ... k), and where the vector bi = (r1i, r2i, ... rmi) can be indicative of typical amounts for some of the biological molecules for class B sample objects. In some embodiments, the set of sample objects can include a plurality of classes of sample objects, wherein the plurality of classes can be denoted by Bi {B1, B2, ... Bk}, and wherein each class Bi of sample objects can be represented by a set of vectors {bi1, bi2, ... biki} and where the vector bij = (r1ij, r2ij, ... rmij) can be indicative of typical amounts for some of the biological molecules for class Bi sample objects. In some embodiments, the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm. In some embodiments, the machine learning algorithm can include a neural network algorithm. In some embodiments, computing the profile of a subset of sample objects can further include: (a) computing proportions in which each molecular tag of the molecular tags is assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by the optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix M1 and the optimal transformation matrix A. In some embodiments, computing the optimal transformation matrix A can include an operation that can utilize an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M.
In some embodiments, computing the matrix A can include computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that can be defined as: minimize |M - BA| subject to A > 0, |A_i|_1 = c, i = 1, ..., n, wherein (a) matrix M can be compiled from truncated hashed measurements, wherein the ith column of M can be a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule, (b) matrix B can be compiled from the class vectors bij, wherein columns of B correspond to the vectors and the jth row of B corresponds to the jth biological molecule, (c) matrix A can represent the optimization variable denoting the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi, and (d) optimization constraints can be indicative of physical limitations, with the constraint |A_i|_1 = c being optional and corresponding to a case where a number of measured sample objects is known.
[0118] In some embodiments, the profile can be a transcriptomic profile, a proteomic profile, or a multiomic profile. In some embodiments, the profile can include probabilities that specific sample objects are bound to each other.
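The optimization described in paragraph [0117] can be written in a standard linear-program form by introducing auxiliary variables. The formulation below is a sketch under stated assumptions: it takes the absolute difference to mean the sum of entrywise absolute differences, takes A_i to be the ith column of A (one column per molecular tag), relaxes the sign constraint to a non-strict inequality as is usual for linear programs, and introduces an auxiliary matrix T that does not appear in the text. M is m x n, B is m x k, and A is k x n.

```latex
\begin{aligned}
\min_{A,\,T}\quad & \sum_{j=1}^{m}\sum_{i=1}^{n} T_{ji} \\
\text{subject to}\quad
  & -T_{ji} \;\le\; M_{ji} - (BA)_{ji} \;\le\; T_{ji},
    \qquad j = 1,\dots,m,\quad i = 1,\dots,n, \\
  & A \;\ge\; 0, \\
  & \lVert A_i \rVert_1 \;=\; \sum_{p=1}^{k} A_{pi} \;=\; c,
    \qquad i = 1,\dots,n \quad \text{(optional)}.
\end{aligned}
```

At an optimum each T_{ji} equals |M_{ji} - (BA)_{ji}|, so the objective coincides with the entrywise absolute difference, and because A is non-negative the norm constraint is itself linear.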
[0119] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data can include multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
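The disclosure does not specify the form of the numerical classification engine. As one minimal, assumed instance of supervised classification, the Python sketch below trains a nearest-centroid classifier on labeled measurement vectors and assigns a new measurement to the closest class. It simplifies the task to a single class label per measurement; the class labels, the training vectors, and the function names are all hypothetical.

```python
def centroid(vectors):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(training_data):
    """Map each class label to the centroid of its training measurements."""
    return {label: centroid(vecs) for label, vecs in training_data.items()}


def classify(model, measurement):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(model, key=lambda label: sq_dist(model[label], measurement))


# Hypothetical labeled training data (e.g., synthesized from single cell data).
training_data = {
    "B1": [[10, 0, 2], [12, 1, 3]],
    "B2": [[0, 8, 2], [1, 9, 1]],
}
model = train(training_data)
print(classify(model, [11, 0, 2]))  # B1
print(classify(model, [0, 10, 2]))  # B2
```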
[0120] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data can include multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
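The numerical regression engine is likewise left unspecified. A deliberately simple stand-in, assumed here only for illustration, is a one-parameter least-squares fit through the origin that predicts the number of sample objects from the total measured molecule count. The training values and the names `fit_slope` and `predict` are hypothetical, and a practical engine would likely use the full measurement vector rather than a single total.

```python
def fit_slope(totals, object_counts):
    """Least-squares slope through the origin: object_count ~ slope * total."""
    numerator = sum(t * n for t, n in zip(totals, object_counts))
    denominator = sum(t * t for t in totals)
    return numerator / denominator


def predict(slope, total):
    """Estimate the number of sample objects behind a measured total."""
    return slope * total


# Hypothetical training data: total molecule counts and known object numbers.
totals = [10, 20, 40]
object_counts = [1, 2, 4]
slope = fit_slope(totals, object_counts)
print(round(predict(slope, 30), 6))  # 3.0
```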
[0121] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements derive in part from a sample object described by a set of vectors {b1, ..., bm}; and (c) training the numerical classification engine using supervised learning, to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
[0122] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, the method including: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors {b1, ..., bm} that in part generated the accessed measurements; and (c) training the numerical regression engine using supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
Computation After Sequencing
[0123] In an aspect, provided herein is a computer-implemented method, the method including: (a) compiling a measurement matrix M1 from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix M1 can be partitioned by molecular tags assigned to the biological molecules; wherein the molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix M1 represents measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and (b) generating a profile of a subset of sample objects based at least in part on the measurement matrix M1.
[0124] In some embodiments, the subset of sample objects can be class B sample objects denoted by matrix B, wherein the class B sample objects can be represented by a set of vectors bi {b1, b2, ... bk} (i = 1, 2, 3, ... k), and where the vector bi = (r1i, r2i, ... rmi) can be indicative of typical amounts for some of the biological molecules for class B sample objects. In some embodiments, the set of sample objects comprises a plurality of classes of sample objects, wherein the plurality of classes is denoted by Bi {B1, B2, ... Bk}, and wherein each class Bi of sample objects can be represented by a set of vectors {bi1, bi2, ... biki} and where the vector bij = (r1ij, r2ij, ... rmij) can be indicative of typical amounts for some of the biological molecules for class Bi sample objects. In some embodiments, the profile of a subset of sample objects can be estimated based on the measurements with aid of a machine learning algorithm. In some embodiments, the machine learning algorithm comprises a neural network algorithm. In some embodiments, computing the profile of a subset of sample objects further comprises: (a) computing proportions in which each molecular tag of the molecular tags can be assigned to the biological molecules derived from each sample object class Bi, wherein the proportions can be denoted by the optimal transformation matrix A; and (b) computing the profile of a subset of sample objects based at least in part on the measurement matrix M1 and the optimal transformation matrix A.
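To make the relationship between the class vectors and matrix B concrete, the Python sketch below stacks class vectors as the columns of B, so that row j of B holds the typical amounts of the jth biological molecule. The vector values and the function name `build_B` are hypothetical.

```python
def build_B(class_vectors):
    """Stack class vectors as columns: B[j][i] is the amount of molecule j in vector i."""
    return [list(row) for row in zip(*class_vectors)]


# Hypothetical class vectors, each listing typical amounts of three molecules.
b1 = [10, 0, 2]
b2 = [0, 8, 2]
B = build_B([b1, b2])
print(B)  # [[10, 0], [0, 8], [2, 2]]
```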
[0125] In some embodiments, computing the optimal transformation matrix A comprises an operation that utilizes an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M. In some embodiments, computing the matrix A comprises computing the matrix A by an optimization algorithm, wherein the optimization algorithm can be a linear program that is defined as: minimize |M - BA| subject to A > 0, |A_i|_1 = c, i = 1, ..., n, wherein: (a) matrix M can be compiled from truncated hashed measurements, wherein the ith column of M is a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule, (b) matrix B can be compiled from the class vectors bij, wherein columns of B correspond to the vectors and the jth row of B corresponds to the jth biological molecule, (c) matrix A can represent the optimization variable denoting the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi, and (d) optimization constraints can be indicative of physical limitations, with the constraint |A_i|_1 = c being optional and corresponding to a case where a number of measured sample objects can be known.
[0126] In some embodiments, the profile can be a transcriptomic profile, a proteomic profile, or a multiomic profile. In some embodiments, the profile can include probabilities that specific sample objects are bound to each other.
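The objective and constraints of paragraph [0125] can be explored on a toy problem. The Python sketch below is not a linear-program solver: it exhaustively searches small non-negative integer matrices A and keeps the one with the smallest entrywise absolute difference between M and BA. It assumes that A has one row per class vector and one column per molecular tag, that zero entries in A are allowed, and that the optional constraint means each column of A sums to c; the matrices and function names are invented for illustration.

```python
from itertools import product


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def abs_residual(M, B, A):
    """Sum of absolute entrywise differences between M and B*A."""
    BA = matmul(B, A)
    return sum(abs(m - p) for m_row, p_row in zip(M, BA) for m, p in zip(m_row, p_row))


def brute_force_A(M, B, max_count=3, c=None):
    """Search all k x n integer matrices with entries in 0..max_count."""
    k, n = len(B[0]), len(M[0])
    best_score, best_A = None, None
    for flat in product(range(max_count + 1), repeat=k * n):
        A = [list(flat[r * n:(r + 1) * n]) for r in range(k)]
        # Optional constraint (assumed reading): every column of A sums to c.
        if c is not None and any(sum(col) != c for col in zip(*A)):
            continue
        score = abs_residual(M, B, A)
        if best_score is None or score < best_score:
            best_score, best_A = score, A
    return best_score, best_A


# Hypothetical class vectors (columns of B) and hashed measurements M.
B = [[10, 0], [0, 8], [2, 2]]
M = [[10, 0], [8, 16], [4, 4]]
score, A = brute_force_A(M, B)
print(score, A)  # 0 [[1, 0], [1, 2]]
```

In this toy case the recovered A indicates that the first tag is associated with one object of each class and the second tag with two objects of the second class. Exhaustive search is only feasible for very small problems; a linear-program solver would be used at realistic sizes.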
[0127] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules, (b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and (c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data comprises multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
[0128] Further provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules; (b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and (c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data can be obtained by either direct measurement or being synthesized from single cell molecular data.
[0129] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements are in part generated from a sample object described by a set of vectors {b1, ..., bm}; and (c) training the numerical classification engine using supervised learning to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
[0130] Also provided herein is a computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising: (a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules; (b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors {b1, ..., bm} that in part generated the accessed measurements; and (c) training the numerical regression engine using supervised learning to perform a regression on the multiplexed hashed measurements; wherein the training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
[0131] Accordingly, an aspect of the presently disclosed methods pertains to a novel method of analyzing sequencing data. In some embodiments, the sequencing is single-cell RNA sequencing (scRNAseq). Multiplex hashing and transformation can encompass a series of data processing steps to refine, cluster, and derive meaningful insights from, e.g., scRNAseq datasets. The present disclosure further includes a neural network model for training and classification, as well as a transformation matrix computation process that can facilitate the extraction of biological molecule vectors from the data. An aspect of multiplex hashing can comprise several interconnected processes, each contributing to the comprehensive analysis of, e.g., scRNAseq data.
[0132] Neural network processes can begin with the training phase, where the model learns from labeled data, gaining an understanding of patterns that correspond to specific cell clusters or other relevant characteristics. These patterns can encompass gene expression profiles associated with distinct cell types or conditions. Following training, the neural network can perform classification tasks on new, unlabeled data points. Leveraging the patterns it has learned, it can assign cells to specific clusters or categories, facilitating the identification of cell types, states, or conditions. This supervised learning approach underpins the neural network's ability to make predictions on new data, even when labels are absent. Moreover, neural networks can be designed as deep learning models, featuring multiple hidden layers. This deep architecture can be particularly advantageous for, e.g., scRNAseq data analysis, which often involves high-dimensional, intricate data patterns. The flexibility of the neural network can allow customization of its architecture and complexity to suit specific analysis needs. When integrated with other processes like multiplex hashing and transformation matrix computation, the neural network can streamline, e.g., scRNAseq data analysis. By automating cell classification based on gene expression profiles, it can accelerate the analysis process, minimize human bias, and unveil subtle patterns that manual methods can overlook.
[0133] In an aspect, the disclosed method can be designed for the analysis of multiplexed hashed measurements of biological molecules. The method can include compiling a measurement matrix, denoted as M1, from measurements of biological molecules. In some embodiments, these biological molecules can be derived from a set of sample objects that interact with CUs. The measurement matrix M1 can be partitioned by molecular tags that can be assigned to the biological molecules. These molecular tags can be assigned in sufficiently different proportions to biological molecules derived from different sample objects. In some embodiments, each column of the matrix M1 can represent measurements associated with a same molecular tag, and each row can represent measurements associated with a same biological molecule. In some embodiments, the method can further include generating a profile of a subset of sample objects. This profile generation is based at least in part on the measurement matrix M1. The profile can provide a comprehensive representation of the subset of sample objects. The profile can be used to gain insights into the biological characteristics of the sample objects, facilitating further analysis and interpretation of the data.
[0134] As discussed herein elsewhere, the measurement matrix M1 can be divided based on molecular tags. These tags can be allocated to the biological molecules in a way that guarantees adequate (i.e., sufficient) distinction and/or sufficiently different proportions between molecules originating from different sample objects. The process of assigning molecular tags can facilitate the differentiation and recognition of the biological molecules according to their source and/or origin. These sufficiently different proportions between molecules from different sample objects can mathematically enable the eventual generation of a profile for a subset of sample objects, as described herein elsewhere. In some embodiments, the sufficiently different proportions can include, for example, a case where molecular tags are assigned to biological molecules derived from sample object A in a proportion of about 60%, and to biological molecules derived from sample object B in a proportion of about 40%. This difference in proportions can allow for the differentiation of the biological molecules based on their source, and subsequently, the generation of distinct profiles for subsets of sample objects. In some embodiments, molecular tags can be assigned to biological molecules derived from two different sample objects, A and B, in various proportions, for instance, about 70% for A and about 30% for B, or about 65% for A and about 35% for B, or even about 78% for A and about 22% for B. It is worth noting that these proportions are merely illustrative and do not limit the scope of the technology. The actual proportions can be adjusted based on the specific requirements of the biological analysis. It is also worth noting that the threshold and/or definition for sufficiently different proportions can vary in different situations and/or cells. As discussed herein elsewhere, these proportions can then be used in the computation of the transformation matrix A, which can encapsulate the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class. The transformation matrix A can then be applied to the measurement matrix M1 to generate the profile of a subset of sample objects, providing a comprehensive representation of the subset of sample objects. The details for the computation of the transformation matrix A are described herein elsewhere.
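The role of sufficiently different proportions can be illustrated with a small numerical sketch. It assumes, for illustration only, that the measurements are modeled as the product of a class matrix B and a transformation matrix A, consistent with the minimization of |M − BA| described elsewhere herein; all numerical values and variable names are hypothetical.

```python
import numpy as np

# Hypothetical typical amounts of three biological molecules (rows)
# for two sample objects (columns): sample object A and sample object B.
B = np.array([[10.0, 0.0],
              [0.0, 8.0],
              [2.0, 2.0]])

# Hypothetical transformation matrix: rows are sample objects, columns are
# molecular tags. Tag 1 is assigned 60%/40%, tag 2 is assigned 30%/70%.
A = np.array([[0.6, 0.3],
              [0.4, 0.7]])

# Assumed forward model: each column of M mixes the class vectors according
# to the proportions of one molecular tag.
M = B @ A

# With distinct proportions the tags carry independent information (rank 2).
rank_distinct = np.linalg.matrix_rank(A)

# If every tag used the same proportions, A would be rank deficient and the
# two sample objects could not be separated from the measurements.
A_same = np.array([[0.6, 0.6],
                   [0.4, 0.4]])
rank_same = np.linalg.matrix_rank(A_same)
```

In this sketch the rank of the transformation matrix serves as one possible proxy for whether the proportions are sufficiently different; the disclosure does not prescribe a specific threshold.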
[0135] Referring back to matrix M1, as discussed herein elsewhere, each row of the matrix M1 can represent measurements associated with a same biological molecule. This can indicate that all measurements within a single row are associated with the same biological molecule, measured across the different molecular tags. This row-wise organization of the measurements can facilitate the analysis of the data, as it allows for the efficient comparison and differentiation of measurements associated with the same biological molecule. In some embodiments, following the compilation of the measurement matrix and the assignment of molecular tags, the method can include generating a profile of a subset of sample objects. This profile generation can be based at least in part on the measurement matrix M1. The profile can provide a comprehensive representation of the subset of sample objects. The profile can be used to gain insights into the biological characteristics of the sample objects, facilitating further analysis and interpretation of the data.
[0136] In the disclosed method, a subset of sample objects can be generated, specifically class B sample objects. In some embodiments, these class B sample objects can be represented by a matrix denoted as B. The representation of these sample objects can further be detailed by a set of vectors, denoted as bi, {b1, b2, ... bk}, where i = 1, 2, 3, ... k. Each vector in this set, bi, is a mathematical construct that can encapsulate a series of values, each value represented as r1i, r2i, ... rmi. These values within the vector can be indicative of typical amounts for some of the biological molecules for class B sample objects. In some embodiments, each vector bi in the set can be a representation of a class B sample object. The elements of the vector, r1i, r2i, ... rmi, can represent measurements associated with specific biological molecules derived from the class B sample object. These measurements can capture the quantities or proportions of the biological molecules within the sample object, providing a comprehensive representation of its molecular composition.
[0137] In some embodiments, the set of sample objects can comprise a plurality of classes of sample objects. Each class of sample objects can be denoted by Bi, {B1, B2, ... Bk}. This notation can signify that there can be multiple classes of sample objects, each class being distinct and represented by a different Bi. Each class Bi of sample objects can be represented by a set of vectors, denoted as {bi1, bi2, ... biki}. This set of vectors can provide a mathematical representation of the class Bi sample objects. Each vector in the set, bij, is a mathematical construct that can encapsulate a series of values, each value represented as r1ij, r2ij, ... rmij. These values within the vector can be indicative of typical amounts for some of the biological molecules for class Bi sample objects.
[0138] In some embodiments, the computation of the profile of a subset of sample objects can include the computation of proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi. This computation can enable the differentiation and identification of the biological molecules based on their origin. As discussed herein elsewhere, the assignment of molecular tags can ensure sufficient differentiation between the molecules derived from different sample objects. The proportions and/or optimal proportions in which each molecular tag is assigned to the biological molecules can be denoted by the transformation matrix, referred to as the optimal transformation matrix A or transformation matrix A. The transformation matrix A can be a mathematical construct that can encapsulate the proportions in which each molecular tag can be assigned to the biological molecules derived from each sample object class Bi. Each element of the matrix A can represent a proportion, indicating the extent to which a particular molecular tag can be assigned to the biological molecules derived from a specific sample object class Bi. This matrix A can provide a comprehensive representation of the assignment of molecular tags, capturing the intricate details of the molecular composition and interactions within the sample objects.
[0139] In some embodiments, the computation of the profile of a subset of sample objects can be based at least in part on the measurement matrix M1 and the transformation matrix A. The computation of the profile can include applying the transformation matrix A to the measurement matrix M1. This application can transform the measurements into a form that aligns with the proportions in which each molecular tag can be assigned to the biological molecules.
[0140] In some embodiments, an optimization algorithm can be employed to compute the transformation matrix A. This algorithm can align the basis vectors with the corresponding measurements obtained from the scRNAseq data. Optimization algorithms can be mathematical techniques used to find the optimum solution from a set of possible solutions. In the context of the subject matter discussed herein, the optimization algorithm can be employed to compute the transformation matrix A that can minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements.
[0141] One of the objectives of this optimization can be to minimize the absolute difference between the transformation matrix-transformed matrix B and a truncated measurement matrix M. This objective function can quantify the dissimilarity between these two matrices, and the algorithm can seek to find the transformation matrix A that minimizes this dissimilarity. The choice of optimization algorithm can vary depending on the specific problem and the mathematical characteristics of the data. In the subject matter discussed herein, the optimization algorithm can be designed to handle the high-dimensional and complex nature of the biological data, thereby enhancing the accuracy and efficiency of the analysis.
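The minimization of the absolute difference can be sketched as a linear program. The following is a minimal illustration, not the claimed formulation: the constraints of the disclosed linear program are given in a figure not reproduced here, so the sketch assumes only elementwise non-negativity of A. The function name `fit_transformation_matrix` and the use of SciPy's `linprog` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def fit_transformation_matrix(M, B):
    """Minimize the sum of |M - B A| over A >= 0 (assumed constraint) as an LP."""
    m, n = M.shape            # m biological molecules, n molecular tags
    k = B.shape[1]            # k class (basis) vectors
    n_a, n_t = k * n, m * n   # unknowns in A, and auxiliary bounds T

    # Decision vector x = [vec(A), vec(T)] with column-major vec, where T
    # bounds |M - B A| elementwise; the objective is the sum of T.
    c = np.concatenate([np.zeros(n_a), np.ones(n_t)])

    # vec(B A) = (I_n kron B) vec(A) for column-major vectorization.
    K = np.kron(np.eye(n), B)
    I = np.eye(n_t)
    A_ub = np.block([[K, -I],      #  B A - M <= T
                     [-K, -I]])    #  M - B A <= T
    m_vec = M.flatten(order="F")
    b_ub = np.concatenate([m_vec, -m_vec])

    # bounds=(0, None) enforces A >= 0 and T >= 0.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    A = res.x[:n_a].reshape((k, n), order="F")
    return A, res.fun


# Usage with hypothetical class vectors and proportions.
B_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_demo = np.array([[0.6, 0.3], [0.4, 0.7]])
A_hat, residual = fit_transformation_matrix(B_demo @ A_demo, B_demo)
```

Introducing one auxiliary variable per matrix entry is a standard way to express an absolute-value objective linearly; additional constraints, such as a known number of measured sample objects, could be added as equality rows.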
[0142] The profile generated can provide a comprehensive representation of a subset of sample objects, capturing the intricate details of their molecular composition and interactions. This profile can be of various types, including a transcriptomic profile, a proteomic profile, or a multiomic profile, depending on the specific characteristics of the biological data and the objectives of the analysis. In some embodiments, the profile can include probabilities that specific sample objects are bound to each other. These probabilities can provide insights into the likelihood of interactions between different sample objects, facilitating the understanding of their complex biological relationships. These probabilities can be derived from the measurements of biological molecules and the assignment of molecular tags, providing a quantitative measure of the interactions between the sample objects.
[0143] In an aspect, the method can include a process for analyzing multiplexed hashed measurements of biological molecules. The method can include accessing the multiplexed hashed measurements of biological molecules. The measurements can be derived from a variety of sources, including but not limited to, biological experiments, simulations, and databases. The access to these measurements can be facilitated by various data retrieval techniques, such as database queries, file reading operations, and network communications.
[0144] In some embodiments, the method can further include utilizing a numerical classification or regression engine. This engine can be stored in one or more memories of the one or more computing devices. The classification or regression engine can be a computational model that can be designed to perform classification or regression tasks on the accessed measurements. The classification task can involve assigning each measurement to one of a set of predefined classes, while the regression task involves predicting a continuous output value based on the measurements. The classification or regression engine can utilize various mathematical and statistical techniques to perform these tasks, including but not limited to, decision trees, support vector machines, and neural networks.
[0145] The numerical classification or regression engine can be trained via supervised learning. Supervised learning is a type of machine learning where the model learns from a set of labeled training data. The training data can comprise multiplexed hashed measurements and the classes or output values of corresponding sample objects from which the measured biological molecules originate. The training data can be obtained by either direct measurement or being synthesized from single cell molecular data. During the training process, the engine can learn to map the input measurements to the output classes or values by adjusting its internal parameters. This learning process is typically guided by a loss function, which can quantify the difference between the predicted outputs and the actual outputs. The goal of the training process can be to minimize this loss function, thereby improving the accuracy of the engine’s predictions. The trained numerical classification or regression engine can be used to analyze new multiplexed hashed measurements of biological molecules. For example, the new measurements can be fed into the trained engine, which then outputs a predicted class or value.
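The loss-guided training described above can be sketched with a deliberately simple engine. The example below assumes a logistic regression classifier trained by gradient descent on a cross-entropy loss, which stands in for the neural network or other engines mentioned herein; the function names, learning rate, and synthetic data are hypothetical.

```python
import numpy as np


def train_engine(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by gradient descent on the cross-entropy loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / n           # gradient step for weights
        b -= lr * np.mean(p - y)                # gradient step for bias
    return w, b


def predict(X, w, b):
    """Return True where the engine predicts the positive class."""
    return (X @ w + b) > 0


# Synthetic stand-in for labeled measurements (two well-separated classes).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = train_engine(X, y)
accuracy = np.mean(predict(X, w, b) == (y == 1))
```

New measurements would be passed to `predict` in the same way as the training inputs; a regression engine would follow the same pattern with a squared-error loss in place of cross-entropy.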
Refinement of Sequencing Data
The initial step in the multiplex hashing and transformation process can involve the refinement of publicly available e.g., scRNAseq datasets. This stage can ensure that the subsequent procedures can be conducted on a dataset that is both manageable and meaningful. The refinement process aims to extract only the most relevant cells and genes from the raw scRNAseq data. In practice, scRNAseq datasets often contain a multitude of cells, each expressing a vast number of genes. However, not all of these cells and genes contribute significantly to the objectives at hand. Some cells can be of lesser interest due to various factors, such as low-quality data or cell populations that do not pertain to the specific biological question being addressed. Likewise, not all genes can be of relevance for a particular study, especially when focusing on specific biological processes or pathways. To refine the scRNAseq data, a selection process can be employed. This process can involve various criteria, including the quality of individual cells, gene expression levels, and biological relevance. For instance, cells that exhibit poor data quality or high levels of technical noise can be excluded. Similarly, genes that are not involved in the biological processes under investigation or exhibit minimal variation across cells can be omitted from the refined dataset. The result of this refinement process can be a curated scRNAseq dataset that includes only the cells and genes deemed relevant to the objectives. This refined dataset can serve as the foundation for subsequent analyses, ensuring that computational resources can be allocated efficiently and that the insights gained can be directly pertinent to the specific biological questions being addressed.
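As one possible illustration of the selection process, the sketch below filters a cells-by-genes count matrix using two simple criteria. The criteria and the threshold values (`min_counts_per_cell`, `min_cells_per_gene`) are assumptions chosen for illustration and are not specified by the disclosure.

```python
import numpy as np


def refine_counts(counts, min_counts_per_cell=500, min_cells_per_gene=3):
    """Filter a cells x genes count matrix; returns the matrix and both masks."""
    # Drop cells with too few total counts (a simple proxy for low quality).
    cell_mask = counts.sum(axis=1) >= min_counts_per_cell
    kept = counts[cell_mask]
    # Drop genes detected in too few of the remaining cells.
    gene_mask = (kept > 0).sum(axis=0) >= min_cells_per_gene
    return kept[:, gene_mask], cell_mask, gene_mask


# Usage on a tiny hypothetical matrix (4 cells, 3 genes).
counts = np.array([[10, 0, 5],
                   [0, 0, 1],
                   [8, 0, 7],
                   [9, 1, 6]])
refined, cell_mask, gene_mask = refine_counts(
    counts, min_counts_per_cell=5, min_cells_per_gene=2)
```

Other criteria named above, such as biological relevance or variation across cells, could be expressed as additional boolean masks combined in the same way.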
Cluster Analysis
[0146] Cluster analysis can involve grouping cells into clusters, for example, based on various characteristics, including cell gene expression profiles. This can be achieved through unsupervised machine learning algorithms, such as k-means clustering or hierarchical clustering, which identify patterns of similarity or dissimilarity among the cells. Cells that share similar gene expression profiles can be grouped together within the same cluster, while those with distinct profiles can be placed in separate clusters. The outcome of the cluster analysis can be a collection of cell clusters, each cluster representing a unique population of cells with similar gene expression patterns. These clusters can serve as a critical foundation for subsequent analyses and provide insights into the heterogeneity of the cellular population within the scRNAseq dataset.
[0147] Unsupervised machine learning algorithms can be integral to data analysis, particularly in situations where predefined outcomes or labeled data can be absent. These algorithms can be employed to uncover hidden patterns and structures within a dataset, providing valuable insights into the inherent organization of the data.
[0148] In some embodiments, the unsupervised machine learning algorithm can be a clustering algorithm. Cluster analysis can be a key application of unsupervised learning. It can involve the grouping of data points into clusters based on similarities or dissimilarities within the data. Unsupervised clustering algorithms, such as, but not limited to, k-means clustering and hierarchical clustering, can be utilized. These algorithms can identify patterns of likeness or disparity among cells in the context of scRNAseq data analysis. Cells with analogous gene expression profiles can be assigned to the same cluster, while those with distinct profiles can be grouped separately. The result of cluster analysis can be a set of cell clusters, each representing a distinct population of cells sharing similar gene expression patterns. These clusters can provide a foundational understanding of the cellular heterogeneity present within the scRNAseq dataset.
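The grouping of cells by k-means clustering can be sketched as follows. This is a minimal implementation of Lloyd's algorithm written for illustration; the function name, iteration count, and random initialization are assumptions, and a production analysis would more likely rely on an established library.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Cluster the rows of X (cells x genes) into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize the centers from k distinct, randomly chosen cells.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean profile of its assigned cells.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


# Usage on four hypothetical cells forming two obvious groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(X, k=2)
```

The returned centers are the mean expression profiles of each cluster, which is one simple candidate for a per-cluster representative vector.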
[0149] In some embodiments, the unsupervised machine learning algorithm can be a dimensionality reduction algorithm. High-dimensional datasets, like scRNAseq data, can be challenging to work with directly. Dimensionality reduction algorithms, including, but not limited to, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), can be employed to reduce the complexity of the dataset while retaining crucial information. These algorithms can enable visualization and analysis of data in lower-dimensional spaces, making it easier to discern underlying patterns and relationships.
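A dimensionality reduction step of this kind can be sketched with PCA computed through a singular value decomposition. The function name and the toy data are hypothetical; the sketch assumes rows are cells and columns are genes.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # principal directions
    scores = Xc @ components.T                    # low-dimensional coordinates
    explained_variance = (S ** 2) / (len(X) - 1)  # variance per component
    return scores, components, explained_variance[:n_components]


# Usage: four hypothetical cells whose two genes vary together.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
scores, components, explained_variance = pca(X, n_components=1)
```

Because the singular values are returned in descending order, keeping the leading rows of `Vt` retains the directions of greatest variance.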
Basis Vector Computation
[0150] Once the cells are grouped into clusters, a set of basis vectors can be computed. Basis vectors can be mathematical representations that can capture the characteristic mRNA transcript patterns within each cell cluster. They can serve as a concise way to describe the gene expression profiles specific to each cluster. Each basis vector can correspond to one of the identified clusters. The basis vector for the ith cluster can be a mathematical representation that can encapsulate the numbers of mRNA transcripts characteristic of cells within that cluster. These basis vectors can be instrumental in reducing the dimensionality of the data while retaining crucial information about the gene expression patterns of each cluster. Basis vectors can be computed using various techniques, such as principal component analysis (PCA) or singular value decomposition (SVD). PCA can serve as a linear dimensionality reduction method that can transform high-dimensional data into a lower-dimensional representation while preserving data variance. This technique can identify principal components, which can be linear combinations of original variables (e.g., gene expression levels) determined by the eigenvalues and eigenvectors of the covariance matrix of the data. Principal components can be ordered by their variance, with the first component explaining the most variance. By selecting a subset of these components, PCA can effectively reduce data dimensionality while retaining most variability. SVD is a versatile matrix factorization technique employed for dimensionality reduction, data compression, and feature extraction. SVD can decompose a data matrix into three matrices: U, S (sigma), and V^T (the transpose of V), each capturing different data characteristics. S can contain singular values, akin to eigenvalues in PCA, representing component importance. By retaining select singular values, dimensionality can be reduced. SVD can also reconstruct original data from the lower-dimensional representation. These mathematical methods can extract the key features of gene expression variation within each cluster, allowing for a more efficient representation of the scRNAseq dataset. Each basis vector, representing distinct cell clusters identified during prior cluster analysis, can be subjected to the scrutiny of the neural network. The primary objective can be to classify each basis vector as either TRUE or FALSE. This classification can be guided by the neural network’s learning and predictive capabilities. The model can leverage the information contained within the truncated multiplexed hashed measurement and the relationships learned during its training to make informed determinations regarding the validity of each basis vector. In essence, the neural network can serve as an intelligent classifier, capable of recognizing and validating basis vectors that can be indicative of distinct cell clusters within the scRNAseq dataset.
Generation of Input Data
[0151] Generation of input data can be characterized by the creation of data points, each comprising a triplet of fundamental components. Simulated multiplex hashed measurements can serve as a condensed representation of the molecular activity within individual cells present in scRNAseq datasets. To craft these measurements, random linear combinations of mRNA vectors can be calculated, where each vector can represent the expression levels of specific genes. The nominal values of these mRNA vectors can be selected from random subsamples of the refined scRNAseq dataset. This approach can ensure that the simulated measurements can be diverse and representative of the gene expression profiles within the cellular population. The simulated multiplex hashed measurements can capture details while efficiently compressing the data, thereby preserving its biological information. The second component in each data point is the random selection of a basis vector. These basis vectors, previously computed during the cluster analysis stage, encapsulate the characteristic mRNA transcript counts within distinct cell clusters. The random selection process can introduce variability into the input data, enabling the neural network model to recognize and comprehend patterns across various cell populations. The final element is the computed label, essential for the supervised learning process during neural network training. Determination of the label can hinge on whether the selected basis vector corresponds to a cell cluster associated with one of the selected cells. When such correspondence exists, the label can be designated as TRUE; otherwise, it can be marked as FALSE. This label can serve as the ground truth for the neural network model, enabling it to learn, generalize, and make predictions with accuracy.
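The construction of one training data point can be sketched as follows. The sketch assumes that each cell has a known cluster index and that basis vectors are stored one per row; the function name, the number of mixed cells (`n_mix`), and the uniform random weights are hypothetical choices for illustration.

```python
import numpy as np


def make_training_triplet(cells, cluster_of_cell, basis, rng, n_mix=3):
    """Return (simulated measurement, basis vector, label) for one data point.

    cells: n_cells x n_genes array from the refined dataset.
    cluster_of_cell: cluster index of each cell.
    basis: n_clusters x n_genes array of basis vectors.
    """
    # Random subsample of cells and a random linear combination of their
    # mRNA vectors, simulating one multiplexed hashed measurement.
    picked = rng.choice(len(cells), size=n_mix, replace=False)
    weights = rng.random(n_mix)
    measurement = weights @ cells[picked]
    # Randomly selected basis vector.
    b_idx = rng.integers(len(basis))
    # TRUE when the basis vector's cluster contains one of the picked cells.
    label = bool(np.isin(b_idx, cluster_of_cell[picked]))
    return measurement, basis[b_idx], label


# Usage with four hypothetical cells in two clusters.
cells = np.eye(4)
cluster_of_cell = np.array([0, 0, 1, 1])
basis = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5]])
rng = np.random.default_rng(0)
measurement, basis_vector, label = make_training_triplet(
    cells, cluster_of_cell, basis, rng, n_mix=4)
```

Repeating this call many times would yield a labeled training set in which TRUE and FALSE examples both occur, provided the number of mixed cells is small relative to the number of clusters.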
Transformation Matrix Computation
[0152] In some embodiments, all basis vectors classified as TRUE can be systematically grouped into a matrix of basis vectors. This grouping can consolidate the basis vectors associated with distinct cell clusters. The resulting matrix can provide a coherent representation of the basis vectors, which can be integral for subsequent data transformation.
[0153] In some embodiments, the truncated multiplexed hashed measurement, comprising entries corresponding exclusively to genes used for model training, can be organized into a matrix of measurements. This matrix structure can enhance data organization and prepare the measurements for alignment with the basis vectors.
[0154] In some embodiments, an optimization algorithm can be employed to compute the optimal transformation matrix. The primary objective of this optimization is to minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements. This transformation matrix, once computed, can serve as a bridge between the basis vectors and multiplexed hashed measurements. Through an optimization algorithm, the transformation matrix can be tailored to align the basis vectors with the corresponding measurements, facilitating accurate data transformation.
[0155] The optimization algorithm can play a pivotal role in aligning the basis vectors with the corresponding measurements obtained from scRNAseq data. Optimization algorithms can be mathematical techniques used to find the best solution from a set of possible solutions. In the context of the multiplex hashing and neural network training methodology, the optimization algorithm can be employed to compute the optimal transformation matrix. This matrix can transform basis vectors into a form that can align with the matrix of measurements, facilitating subsequent data analysis. The optimization algorithm can minimize the absolute difference between the transformed matrix of basis vectors and the matrix of measurements. This objective function can quantify the dissimilarity between these two matrices, and the algorithm can find the transformation matrix that can minimize this dissimilarity. The choice of optimization algorithm can vary depending on the specific problem and the mathematical characteristics of the data. In some embodiments, the optimization algorithm can be gradient descent, which is an iterative algorithm that can adjust the transformation matrix in small steps to minimize the objective function. In some embodiments, the optimization algorithm can be Newton’s method, which is an iterative method that can use second-order derivatives to find the optimal solution. In some embodiments, the optimization algorithm can be a quasi-Newton method, which is a variation of Newton’s method that can approximate the Hessian matrix to reduce computational complexity. In some embodiments, the optimization algorithm can be conjugate gradient, which is an iterative method suitable for large-scale optimization problems, often used when the objective function is quadratic. In some embodiments, the optimization algorithm can be a genetic algorithm, which is an evolutionary algorithm that can mimic the process of natural selection to search for optimal solutions. In some embodiments, the optimization algorithm can be simulated annealing, which is a probabilistic optimization algorithm inspired by the annealing process in metallurgy. The optimization algorithm can tailor the transformation matrix to minimize the absolute difference between the transformed basis vectors and the matrix of measurements. By adjusting the elements of the transformation matrix, it can optimize the alignment of these two matrices, ensuring that the data transformation is accurate and meaningful.
Computation of Biological Molecule Vectors
[0156] In some embodiments, the process of computation of the biological molecule vectors can constitute an important phase within the multiplex hashing and neural network training methodology. This process can involve the computation of biological molecule vectors, offering a comprehensive description of cellular composition and gene expression within the scRNAseq dataset. In some embodiments, the non-truncated multiplexed hashed measurements can be systematically organized into a matrix of measurements. This organization can transform the raw data into a structured format conducive to subsequent computations. The resulting matrix can serve as the input data for further analysis. In some embodiments, the next step involves the computation of the Moore-Penrose pseudo inverse of the optimal transformation matrix. This mathematical operation can play a pivotal role in data transformation and can ensure the robustness of the subsequent analysis. The Moore-Penrose pseudo inverse can be calculated to facilitate the subsequent steps in the process. In some embodiments, the computed Moore-Penrose pseudo inverse can be applied to the matrix of measurements. This step can provide insights into the biological molecules contained within a specified number of cells. The inverse transformation can enable the reconstruction of the biological molecule vectors, which can represent the gene expression profiles and cellular composition within the scRNAseq dataset.
[0157] The Moore-Penrose pseudo inverse, often referred to simply as the “pseudo inverse,” is a mathematical concept used primarily in linear algebra and matrix computations. It can be a generalization of the matrix inverse for matrices that may not have an exact inverse, such as rectangular matrices or matrices that are not full rank. In essence, the pseudo inverse can provide a way to approximate an inverse for a matrix that might not be invertible in the traditional sense. It can be denoted as A+, where A is the matrix for which the pseudo inverse is sought. Mathematically, the formula for the Moore-Penrose pseudo inverse can depend on the specific method used for its computation, but it can typically involve the singular value decomposition (SVD) of the matrix. The pseudo inverse of a matrix A can satisfy certain properties, such as, but not limited to:
• AA+A = A
• A+AA+ = A+
• (AA+)' = AA+
• (A+A)' = A+A
where ' denotes the (conjugate) transpose.
[0158] The pseudo inverse can be a valuable tool in various fields, including machine learning, signal processing, and scientific computing, where it can be used to handle various matrix- related problems, especially when dealing with non-square or singular matrices.
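As a non-limiting illustration, the pseudo inverse can be computed from the SVD and checked against the four properties listed above, then applied to a matrix of measurements. The function names, the tolerance, and the matrix sizes are assumptions made for this sketch. The recovery example further assumes a transformation matrix of full column rank, in which case the reconstruction is exact. For a rank-deficient transformation the pseudo inverse would instead return the minimum-norm least-squares solution.

```python
import numpy as np

def pseudo_inverse_svd(A, tol=1e-12):
    """Moore-Penrose pseudo inverse via SVD: A+ = V diag(1/s) U^H."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    if s.size and s.max() > 0:
        keep = s > tol * s.max()                # drop numerically zero singular values
        s_inv[keep] = 1.0 / s[keep]
    return Vh.conj().T @ (s_inv[:, None] * U.conj().T)

def satisfies_penrose(A, A_plus, atol=1e-9):
    """Check the four Penrose conditions."""
    return bool(np.allclose(A @ A_plus @ A, A, atol=atol)
                and np.allclose(A_plus @ A @ A_plus, A_plus, atol=atol)
                and np.allclose((A @ A_plus).conj().T, A @ A_plus, atol=atol)
                and np.allclose((A_plus @ A).conj().T, A_plus @ A, atol=atol))

# Rank-deficient rectangular matrix: no ordinary inverse exists.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
A_plus = pseudo_inverse_svd(A)

# Recovering molecule vectors X from measurements Y = T @ X (hypothetical sizes).
rng = np.random.default_rng(0)
T = rng.normal(size=(6, 4))                     # assumed full column rank
X = rng.poisson(5.0, size=(4, 3)).astype(float)
Y = T @ X
X_hat = pseudo_inverse_svd(T) @ Y
```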
Single Cell Sequencing
[0159] The present disclosure provides methods for single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects comprising biological molecules. In some embodiments, the presently described method can begin by contacting the sample objects with a plurality of computing units (CUs). Each CU can be engineered to display two sets of surface-bound entities (SBEs): a first set capable of binding to the sample objects and a second set associated with molecular tags, capable of binding to biological molecules. The sample objects can then be permeabilized, allowing the biological molecules to be released and bind to the SBEs associated with the molecular tags. Following this, a posteriori spatial relationships between the molecular tags and the sample objects can be established. This can be achieved by evaluating the proximity of each CU with the assistance of a machine learning algorithm, which aggregates the biological molecules bound to proximal CUs.
[0160] In some embodiments, the method can further include reverse transcribing the biological molecules bound to the molecular tags before establishing the spatial relationship. This reverse transcription can be crucial for sequencing, which can be performed using, for example, but not limited to, Next-Generation Sequencing or Sanger Sequencing techniques. Additionally, each CU can be associated with the molecular tags either before or after contacting the sample objects. The molecular tags can typically include a barcode and a unique molecular identifier (UMI) and can include additional elements such as sequencing elements, release elements, and linkers. These tags can be single-stranded, featuring a hairpin structure, or double-stranded, with the barcode uniquely assigned to each CU and the UMI uniquely assigned to each molecular tag.
[0161] In some embodiments, the presently described method can include evaluating the proximity between CUs, where proximity can be inferred from the interaction frequency of CUs with sample objects and the molecular tags. The evaluation can involve identifying UMIs linked to barcodes and biological molecules, with the resulting data input into a machine learning algorithm. The algorithm can output an adjacency matrix, indicating spatial proximity based on UMI linkage data. This matrix can then be used to establish a posteriori spatial relationships between molecular tags and sample objects, enabling detailed single-cell sequencing analysis.
[0162] Accordingly, provided herein is a method of single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects comprising a plurality of biological molecules, the method including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and is capable of binding to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for single cell sequencing by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs.
[0163] In some embodiments, the presently described methods can further include reverse transcribing the plurality of biological molecules bound to the plurality of molecular tags for sequencing prior to establishing a posteriori spatial relationship. In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, the sequencing can be Next-Generation sequencing or Sanger sequencing.
[0164] In some embodiments, each CU can be associated with the plurality of molecular tags. In some embodiments, each CU can be associated with the plurality of molecular tags prior to the contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each CU can be associated with the plurality of molecular tags following the contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and is associated with the second set of at least one SBE. In some embodiments, each molecular tag can further include a sequencing element, a release element, and/or a linker. In some embodiments, the release element can release each molecular tag from each CU. In some embodiments, the linker can prevent extension. In some embodiments, the barcode can be unique to each CU. In some embodiments, each molecular tag can be single-stranded. In some embodiments, each molecular tag can include a hairpin structure. In some embodiments, each molecular tag can be double-stranded. In some embodiments, the second set of at least one SBE can be poly(dT). In some embodiments, the barcode can be uniquely assigned to the each CU of the plurality of CUs. In some embodiments, the UMI can be uniquely assigned to the each molecular tag. In some embodiments, each molecular tag can be associated with at least two CUs of the plurality of CUs. In some embodiments, each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects. In some embodiments, the second set of at least one SBE can further include a blocking element. In some embodiments, the blocking element can prevent reverse transcription. In some embodiments, the blocking element can be removed when the at least one sample object interacts with each CU.
In some embodiments, the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE. In some embodiments, the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE. In some embodiments, the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
Proximity Analysis for Single Cell Sequencing
[0165] The presently described methods leverage spatial fluctuations in the number of biological molecules, such as those caused by Brownian motion and cell signaling, within subcellular portions of a sample to evaluate computing unit (CU) association by proximity. This proximity information can then be utilized to compute target-specific output signals associated with various sample objects, including interactions between sample objects (e.g., cell-cell interactions) and the distribution of biological molecules within individual sample objects (e.g., subcellular resolution of mRNA transcripts).
[0166] For example, a dataset of UMI vectors can be used for model training. This dataset can comprise millions of cells and thousands of features across samples measured from various patients. The dataset can be split into training, validation, and test sets with no intersection of patients between sets. Top genes with the least sparsity can be retained for further processing.
[0167] Training inputs can be constructed from UMI vectors of cells randomly selected from the training dataset. Each training input can include UMI vectors of K cells, a one-to-one assignment of N computing units to the K cells, and UMI vectors generated for each computing unit. The adjacency matrix can be used to describe the assignment of computing units to sample objects, ensuring equal probability of selection and independent attribution of UMIs.
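As a non-limiting illustration, the construction of one training input can be sketched as follows. The function name make_training_input, the toy sizes, and the rule that every cell receives at least one computing unit are assumptions made for this sketch. The independent, equal-probability attribution of each UMI to one computing unit of its cell follows the description above.

```python
import numpy as np

def make_training_input(cell_umis, n_units, rng):
    """Build one synthetic training input.

    cell_umis: (K, G) integer UMI counts for K cells and G genes.
    Returns (assignment, unit_umis, adjacency).
    """
    K, G = cell_umis.shape
    if n_units < K:
        raise ValueError("need at least one computing unit per cell")
    # Assumption: every cell receives at least one CU; the rest are uniform.
    extra = rng.integers(0, K, size=n_units - K)
    assignment = np.concatenate([np.arange(K), extra])
    rng.shuffle(assignment)
    unit_umis = np.zeros((n_units, G), dtype=np.int64)
    for k in range(K):
        units = np.flatnonzero(assignment == k)
        p = np.full(units.size, 1.0 / units.size)
        for g in range(G):
            # Each UMI goes to one of the cell's CUs independently, with equal probability.
            unit_umis[units, g] = rng.multinomial(int(cell_umis[k, g]), p)
    # adjacency[i, j] = 1 when CUs i and j are assigned to the same cell.
    adjacency = (assignment[:, None] == assignment[None, :]).astype(np.int64)
    return assignment, unit_umis, adjacency

rng = np.random.default_rng(0)
cell_umis = rng.poisson(3.0, size=(4, 6))       # toy data: K=4 cells, G=6 genes
assignment, unit_umis, adjacency = make_training_input(cell_umis, n_units=10, rng=rng)
```

Summing the rows of unit_umis that share an assignment recovers the original cell UMI vectors, which is the property the reconstruction step later relies on.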
[0168] The model architecture can involve inputting the measurement matrix, applying a log1p transformation, and reducing dimensionality through a dense layer. The transformed data can be processed through stacked transformer blocks with attention heads and feed-forward layers. The final transformer block’s attention logits can be scaled and subjected to a sigmoid function, producing an output matrix indicating the likelihood of computing unit interactions.
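As a non-limiting illustration, the forward pass described above can be sketched in NumPy with untrained random weights. The sketch assumes a single transformer block with a single attention head and omits layer normalization. The parameter names, layer widths, and input sizes are hypothetical. It shows only how a log1p transformation, a dense projection, attention, and a sigmoid over the scaled final attention logits combine into an N-by-N likelihood matrix.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(n_genes, d_model, rng):
    def w(rows, cols):
        return rng.normal(scale=1.0 / np.sqrt(rows), size=(rows, cols))
    return {"W_in": w(n_genes, d_model),
            "Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
            "Wv": w(d_model, d_model),
            "W1": w(d_model, 4 * d_model), "W2": w(4 * d_model, d_model),
            "Wq_out": w(d_model, d_model), "Wk_out": w(d_model, d_model)}

def forward(X, p):
    """X: (N, G) UMI counts per computing unit. Returns an (N, N) matrix in [0, 1]."""
    d = p["Wq"].shape[0]
    H = np.log1p(X) @ p["W_in"]                       # log1p, then dense reduction
    attn = softmax((H @ p["Wq"]) @ (H @ p["Wk"]).T / np.sqrt(d))
    H = H + attn @ (H @ p["Wv"])                      # self-attention with residual
    H = H + np.maximum(H @ p["W1"], 0.0) @ p["W2"]    # feed-forward with residual
    # Final block: keep only the scaled attention logits and squash them.
    logits = (H @ p["Wq_out"]) @ (H @ p["Wk_out"]).T / np.sqrt(d)
    return 0.5 * (1.0 + np.tanh(0.5 * logits))        # numerically stable sigmoid

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(8, 30)).astype(float)      # toy input: 8 CUs, 30 genes
P = forward(X, init_params(n_genes=30, d_model=16, rng=rng))
```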
[0169] The model can be trained using binary cross-entropy loss and an Adam optimizer. The training process can aim to match the output matrix to the training input adjacency matrix. The model output can further be processed to estimate sample object UMI vectors by clustering the computing units and summing the UMI vectors within each cluster. The quality of reconstruction can be assessed using relative reconstruction error (RRE).
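As a non-limiting illustration, the loss and the post-processing of the model output can be sketched as follows. The disclosure does not fix a clustering method or a formula for the relative reconstruction error at this point. The sketch therefore assumes connected components of the thresholded output matrix for clustering, and the Frobenius norm of the error divided by the Frobenius norm of the reference for the RRE. All function names are hypothetical.

```python
import numpy as np

def binary_cross_entropy(P, A, eps=1e-7):
    """Mean binary cross-entropy between predicted matrix P and target adjacency A."""
    P = np.clip(P, eps, 1.0 - eps)
    return float(-np.mean(A * np.log(P) + (1.0 - A) * np.log(1.0 - P)))

def cluster_units(P, threshold=0.5):
    """Assumed clustering: connected components of the thresholded likelihood graph."""
    n = P.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            linked = (P[i] >= threshold) | (P[:, i] >= threshold)
            for j in np.flatnonzero(linked):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def estimate_cell_umis(unit_umis, labels):
    """Sum the UMI vectors of the computing units within each cluster."""
    n_clusters = int(labels.max()) + 1
    return np.stack([unit_umis[labels == c].sum(axis=0) for c in range(n_clusters)])

def relative_reconstruction_error(estimate, truth):
    """Assumed definition: ||estimate - truth||_F / ||truth||_F."""
    return float(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))

true_labels = np.array([0, 0, 1, 1, 1])
A = (true_labels[:, None] == true_labels[None, :]).astype(float)
unit_umis = np.arange(15).reshape(5, 3)
labels = cluster_units(A)                 # a perfect prediction is used as the input here
cells = estimate_cell_umis(unit_umis, labels)
```

In practice the estimated clusters would first be matched to reference cells before the RRE is computed, since cluster indices are arbitrary.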
[0170] Additional outputs of the model can include the estimated number of cells, achieved through regression and predictive clustering techniques. The final output can comprise the interaction likelihood matrix and the estimated number of cells.
[0171] In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects {a1} number of times, the first CU can interact with a second sample object of the plurality of sample objects {a2} number of times, a second CU of the plurality of CUs can interact with the first sample object {a3} number of times, and the second CU can interact with the second sample object {a4} number of times. In some embodiments, the first CU can display the first set of at least one SBE that can be different from what the second CU displays on its surface. In some embodiments, at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be zero. In some embodiments, at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be one. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be the same. In some embodiments, at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number can be different.
[0172] In some embodiments, evaluating proximity of each CU of the plurality of CUs comprises evaluating proximity between the first CU and the second CU. In some embodiments, evaluating proximity between the first CU and the second CU can include identifying the plurality of molecular tags associated with the first CU and the second CU. In some embodiments, evaluating proximity between the first CU and the second CU can include identifying numbers of UMIs linked to the barcode and the plurality of biological molecules.
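As a non-limiting illustration, identifying the numbers of UMIs linked to each barcode and biological molecule can be sketched as a deduplicating count over sequencing reads. The read representation as (barcode, UMI, molecule) tuples, the barcode labels, and the gene names are hypothetical and serve only to make the example concrete.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (barcode, biological molecule) pair.

    reads: iterable of (barcode, umi, molecule) tuples; duplicate reads of the
    same UMI (e.g., amplification copies) are counted once.
    """
    seen = defaultdict(set)
    for barcode, umi, molecule in reads:
        seen[(barcode, molecule)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

def count_matrix(counts):
    """Arrange the counts as one row per barcode (CU) and one column per molecule."""
    barcodes = sorted({b for b, _ in counts})
    molecules = sorted({m for _, m in counts})
    rows = [[counts.get((b, m), 0) for m in molecules] for b in barcodes]
    return barcodes, molecules, rows

reads = [
    ("CU1", "u1", "GAPDH"),
    ("CU1", "u1", "GAPDH"),   # duplicate read of the same UMI
    ("CU1", "u2", "GAPDH"),
    ("CU1", "u3", "ACTB"),
    ("CU2", "u4", "GAPDH"),
]
counts = count_umis(reads)
barcodes, molecules, rows = count_matrix(counts)
```

Each row of the resulting matrix is the UMI vector of one computing unit, which is the form of input the proximity model consumes.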
[0173] In some embodiments, {a1’} number of UMIs can be linked to a first barcode and a first biological molecule of the plurality of the biological molecules, {a2’} number of UMIs can be linked to the first barcode and a second biological molecule of the plurality of the biological molecules, {a3’} number of UMIs can be linked to a second barcode and a third biological molecule of the plurality of the biological molecules, and {a4’} number of UMIs can be linked to the second barcode and a fourth biological molecule of the plurality of the biological molecules.
[0174] In some embodiments, the first barcode is unique to the first CU, and the second barcode can be unique to the second CU. In some embodiments, the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number can be input into the machine learning algorithm for the proximity analysis. In some embodiments, the machine learning algorithm can output an adjacency matrix. In some embodiments, at least two of the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number can give an output value of 1 in the adjacency matrix. In some embodiments, the output value of 1 can indicate the first CU and the second CU are in spatial proximity. In some embodiments, at least two of the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number can give an output value of 0 in the adjacency matrix. In some embodiments, the output value of 0 can indicate the first CU and the second CU are not in spatial proximity. In some embodiments, the adjacency matrix can be used to establish a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects.
Spatial Profiling
[0175] The present disclosure provides methods of spatial profiling of an input sample comprising a plurality of sample objects. The presently described methods allow for a comprehensive spatial profiling of biological samples, providing valuable insights into the interactions and distribution of biological molecules at a subcellular level. The integration of machine learning algorithms further enhances the accuracy and utility of the spatial profiling process, enabling the generation of adjacency matrices to map spatial relationships and facilitate downstream analysis.
[0176] In some embodiments, spatial profiling can be achieved without establishing a priori spatial relationships between a plurality of molecular tags and the sample objects. In an embodiment, the presently described method can involve contacting the sample objects with a plurality of computing units (CUs), each engineered to display at least one first set and one second set of surface-bound entities (SBEs) on its surface. The first set of SBEs can bind to the sample objects, while the second set of SBEs can be associated with the molecular tags and capable of binding to the biological molecules.
[0177] Upon permeabilizing the sample objects, the biological molecules can be released and subsequently bind to the second set of SBEs associated with the molecular tags. This interaction can facilitate the establishment of a posteriori spatial relationships between the molecular tags and the sample objects. The spatial profiling process can utilize the proximity of each CU with the aid of a machine learning algorithm to evaluate and aggregate the biological molecules bound to proximal CUs. The presently described method can enable the detailed spatial profiling of cell-cell interactions and the distribution of biological molecules within individual sample objects, providing subcellular resolution of molecular entities, such as mRNA transcripts.
[0178] In some embodiments, each CU can be associated with molecular tags either prior to or following the contacting of sample objects with CUs. The molecular tags can include barcodes and unique molecular identifiers (UMIs), which can be associated with the second set of SBEs. Additionally, the molecular tags can include sequencing elements, release elements, and/or linkers. The release element can facilitate the detachment of the molecular tags from the CUs, while the linker can prevent extension. The barcodes can be unique to each CU, and the UMIs can be unique to each molecular tag, ensuring precise tracking and analysis.
[0179] In some embodiments, the second set of SBEs can be engineered to display poly(dT) sequences, enhancing the binding specificity to biological molecules. The interaction between the CUs and sample objects via the first set of SBEs, and between the biological molecules and CUs via the second set of SBEs, can establish a robust system for spatial profiling. Proximity analysis can be conducted by evaluating the interaction frequencies between CUs and sample objects, as well as identifying molecular tags and UMIs linked to specific barcodes and biological molecules.
[0180] Accordingly, provided herein is a method of spatially profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules without establishing a priori spatial relationship between a plurality of molecular tags and the plurality of sample objects, the method including (a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE can bind to the plurality of sample objects, and the second set of at least one SBE can be associated with the plurality of molecular tags and can bind to the plurality of biological molecules; and (b) permeabilizing the plurality of sample objects such that the plurality of biological molecules can be released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and (c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for spatial profiling by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs.
[0181] In some embodiments, the spatial profiling can include information on cell-cell interactions of the plurality of sample objects and information regarding the distribution of the plurality of biological molecules within each sample object.
[0182] In some embodiments, each CU can be associated with the plurality of molecular tags. In some embodiments, each CU can be associated with the plurality of molecular tags prior to the contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each CU can be associated with the plurality of molecular tags following the contacting the plurality of sample objects with the plurality of CUs. In some embodiments, each molecular tag of the plurality of molecular tags can include a barcode and a unique molecular identifier (UMI), and is associated with the second set of at least one SBE. In some embodiments, each molecular tag can further include a sequencing element, a release element, and/or a linker. In some embodiments, the release element can release each molecular tag from each CU. In some embodiments, the linker can prevent extension. In some embodiments, the barcode can be unique to each CU. In some embodiments, each molecular tag can be single-stranded. In some embodiments, each molecular tag can include a hairpin structure. In some embodiments, each molecular tag can be double-stranded. In some embodiments, the second set of at least one SBE can be poly(dT). In some embodiments, the barcode can be uniquely assigned to the each CU of the plurality of CUs. In some embodiments, the UMI can be uniquely assigned to the each molecular tag. In some embodiments, each molecular tag can be associated with at least two CUs of the plurality of CUs. In some embodiments, each CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects. In some embodiments, the second set of at least one SBE can further include a blocking element. In some embodiments, the blocking element can prevent reverse transcription. In some embodiments, the blocking element can be removed when the at least one sample object interacts with each CU.
In some embodiments, the plurality of sample objects can interact with the plurality of CUs via the first set of at least one SBE. In some embodiments, the plurality of biological molecules can interact with the plurality of CUs via the second set of at least one SBE. In some embodiments, the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
Proximity Analysis for Spatial Profiling
[0183] Proximity analysis for single cell sequencing is described in detail supra. In single cell analysis, the assumption can be that each Computing Unit (CU) can be assigned to a unique sample object. Since such an assignment can be impossible in cases where some sample objects are bound to each other, the supra proximity analysis can be extended to consider CUs assigned to two or more sample objects.
[0184] Building on the supra description, the construction of the training dataset can include UMI vectors of K cells (sample objects), assignment of N computing units to K sample objects, and UMI vectors generated for each computing unit. Unlike the proximity analysis of single cell sequencing, the assignment of CUs to sample objects may no longer be surjective. Sample objects can be randomly connected, described by an adjacency matrix where off-diagonal elements can indicate the probability of connection. The assignment of CUs to sample objects can be detailed by an adjacency matrix, calculated in two steps: selecting one sample object per CU, and then partitioning the CUs assigned to the same sample object into sets and assigning those sets to neighboring sample objects. The UMI vectors can be partitioned among the CUs assigned to the same sample object, ensuring each UMI can be attributed to a single CU independently and with equal probability adjusted by a normalization factor.
[0185] The same model architecture can be used with specific parameters. The model output can be processed to estimate the sample object adjacency matrix. The output matrix can be used to partition the set of CUs into a number of disjoint sets, represented by an estimated adjacency matrix. Agglomerative clustering with full linkage can be used for partitioning, with the number of clusters determined a priori or by optimizing a secondary criterion, such as the number of CUs per cluster. A heuristic can compute the estimated sample object adjacency matrix, where sets of disjoint CUs can be compared based on summations over the CUs to determine adjacency.
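As a non-limiting illustration, the partitioning and the adjacency heuristic can be sketched as follows. The sketch interprets full linkage as complete linkage (the distance between two clusters is the largest pairwise distance), takes distance as one minus the symmetrized model output, and assumes a heuristic in which two clusters are adjacent when the summed output between their CUs, normalized by the number of CU pairs, reaches a threshold. The threshold value, function names, and toy matrix are hypothetical.

```python
import numpy as np

def complete_linkage(P, n_clusters):
    """Agglomerative clustering of CUs with complete linkage on D = 1 - sym(P)."""
    D = 1.0 - 0.5 * (P + P.T)
    clusters = [[i] for i in range(P.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

def sample_object_adjacency(P, clusters, threshold=0.25):
    """Assumed heuristic: normalized sum of P between two clusters >= threshold."""
    S = 0.5 * (P + P.T)
    k = len(clusters)
    A = np.zeros((k, k), dtype=int)
    for a in range(k):
        for b in range(a + 1, k):
            pairs = len(clusters[a]) * len(clusters[b])
            score = S[np.ix_(clusters[a], clusters[b])].sum() / pairs
            A[a, b] = A[b, a] = int(score >= threshold)
    return A

# Toy output matrix: three cells with two CUs each; cells 0 and 1 touch.
P = np.full((6, 6), 0.05)
for a, b in [(0, 1), (2, 3), (4, 5)]:
    P[a, b] = P[b, a] = 0.9
P[np.ix_([0, 1], [2, 3])] = 0.4
P[np.ix_([2, 3], [0, 1])] = 0.4
np.fill_diagonal(P, 1.0)
clusters = complete_linkage(P, n_clusters=3)
cell_adjacency = sample_object_adjacency(P, clusters)
```

Complete linkage keeps CUs on neighboring cells in separate clusters as long as their mutual likelihood stays below the within-cell likelihood, so the intermediate values remain available to the adjacency heuristic.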
[0186] The presently described method can enable the analysis of physical interactions between cells in scenarios where sample objects are not uniquely assignable to computing units, providing a robust approach for spatial profiling in single-cell transcriptomics.
Molecular Tags
[0187] A molecular tag refers to any molecule capable of (directly or indirectly) capturing and/or labeling a plurality of biological molecules. In some embodiments, the molecular tag can be a nucleic acid or a polypeptide. In some embodiments, the molecular tag can be a conjugate, for example, but not limited to, an oligonucleotide-antibody conjugate. A non-limiting exemplary structure of a molecular tag is shown in FIG. 3D. In some embodiments, the molecular tag can include a unique sequence for recognizing a specific biological molecule. In some embodiments, the molecular tag can include a recognition sequence, which can be a unique sequence for recognizing a specific SBE. In some embodiments, the molecular tag can also include a hash element and a priming element (e.g., a random N-mer, such as dT_N). In some embodiments, the molecular tag can include a unique sequencing element for, e.g., Next Generation Sequencing (NGS). In some embodiments, the molecular tag can include an oligonucleotide-antibody conjugate and a hash element. Non-limiting exemplary structures of a molecular tag are illustrated in FIG. 3D, FIG. 4D, and FIGS. 16A-16B.
[0188] After the biological molecules from sample objects have been associated with SBEs according to any of the methods described above, the biological molecules bound to the SBEs can further interact with the priming elements of the molecular tags. The resulting constructs can be analyzed via sequencing to identify the biological molecules. A wide variety of different sequencing methods can be used to analyze the resulting constructs. In general, sequenced polynucleotides can be, for example, nucleic acid molecules such as DNA or RNA, including variants or derivatives thereof (e.g., single stranded DNA or DNA/RNA hybrids, and nucleic acid molecules with a nucleotide analog). In some embodiments, individual biological molecules (e.g., cells or cellular contents following lysis of cells) can be extracted by partitioning/hashing using the hash element, which is described further below.
[0189] Sequencing can be performed by various commercial systems. More generally, sequencing can be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR and droplet digital PCR (ddPCR), quantitative PCR, real time PCR, multiplex PCR, PCR-based singleplex methods, emulsion PCR), and/or isothermal amplification.
[0190] Other examples of methods for sequencing can include, but are not limited to, DNA hybridization methods (e.g., Southern blotting), restriction enzyme digestion methods, Sanger sequencing methods, next-generation sequencing methods (e.g., single-molecule real-time sequencing, nanopore sequencing, and Polony sequencing), ligation methods, and microarray methods. Additional examples of sequencing methods that can be used include targeted sequencing, single molecule real-time sequencing, exon sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, co-amplification at lower denaturation temperature-PCR (COLD-PCR), sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and any combinations thereof.
Priming Element
[0191] As discussed above, each molecular tag can include at least one priming element. The priming element can be an oligonucleotide, a polypeptide, a small molecule, or any combination thereof, that can bind specifically to a biological molecule to be analyzed. In some embodiments, a molecular tag can be used to capture or detect a biological molecule to be analyzed.
[0192] In some embodiments, a molecular tag can be a functional nucleic acid sequence configured to interact with one or more biological molecules, such as one or more different types of nucleic acids (e.g., RNA molecules and DNA molecules). In some embodiments, the functional nucleic acid sequence can include an N-mer sequence (e.g., a random N-mer sequence), which N-mer sequences can be configured to interact with a plurality of DNA molecules and/or a plurality of RNA molecules. In some embodiments, the functional sequence can include a poly(N) sequence, which poly(N) sequences can be configured to interact with messenger RNA (mRNA) molecules via the poly(A) tail of an mRNA transcript. In some embodiments, the poly(N) sequence can be configured to interact with a DNA molecule comprising a single stranded 3’ terminus. In some embodiments, the functional sequence can include a poly(T) sequence, which poly(T) sequences can be configured to interact with messenger RNA (mRNA) molecules via the poly(A) tail of an mRNA transcript. In some embodiments, the functional nucleic acid sequence can be the binding target of a protein (e.g., a transcription factor, a DNA binding protein, or an RNA binding protein), where the protein is a desired biological molecule to be analyzed.
[0193] In some embodiments, a molecular tag can include ribonucleotides and/or deoxyribonucleotides as well as synthetic nucleotide residues that can be capable of participating in Watson-Crick type or analogous base pair interactions. In some embodiments, the molecular tag is capable of priming a reverse transcription reaction to generate cDNA that is complementary to the captured RNA biological molecules. In some embodiments, the priming element of the molecular tag can prime a DNA extension (polymerase) reaction to generate DNA that is complementary to the captured DNA biological molecules.
[0194] In some embodiments, the priming element can be located at the 3’ end of the molecular tag and can include a free 3’ end that can be extended, e.g., by template dependent polymerization, to form an extended molecular tag. In some embodiments, the priming element can include a nucleotide sequence that is capable of hybridizing to nucleic acids, e.g., RNA, DNA, or other analyte, present in the sample object interacting with one or more CUs. In some embodiments, the priming element can be selected or designed to bind selectively or specifically to a target nucleic acid. For example, the capture domain can be selected or designed to capture mRNA by way of hybridization to the mRNA poly(A) tail. Thus, in some embodiments, the capture domain can include a poly(T) DNA or poly(N) oligonucleotide, which is capable of hybridizing to a poly(A) tail of mRNA or a single stranded 3’ terminus of DNA. In some embodiments, the priming element can include nucleotides that can be functionally or structurally analogous to a poly(T) tail, for example, a poly(U) oligonucleotide or an oligonucleotide composed of deoxythymidine analogues.
[0195] In some embodiments, a molecular tag can include a priming element having a sequence that is capable of binding to mRNA and/or genomic DNA. For example, the molecular tag can include a priming element that includes a nucleic acid sequence (e.g., a poly(T) or poly(N) sequence) capable of binding to a poly(A) tail of an mRNA and/or to a poly(A) homopolymeric sequence present in genomic DNA. In some embodiments, a homopolymeric sequence can be added to an mRNA molecule or a genomic DNA molecule using a terminal transferase enzyme in order to produce a DNA or RNA biological molecule that has a poly(A) or poly(T) sequence. For example, a poly(A) sequence can be added to a biological molecule (e.g., a fragment of genomic DNA) thereby making the biological molecule capable of capture by a poly(T) priming element.
[0196] In some embodiments, random sequences, e.g., random hexamers or similar sequences, can be used to form all or a part of the priming element. For example, random sequences can be used in conjunction with poly(T) (or poly(T) analogue) sequences. Thus, where a priming element includes a poly(T) (or a “poly(T)-like”) oligonucleotide, it can also include a random oligonucleotide sequence (e.g., “poly(T)-random sequence” probe). This can, for example, be located 5’ or 3’ of the poly(T) sequence, e.g., at the 3’ end of the priming element. The poly(T)- random sequence probe can facilitate the capture of the mRNA poly(A) tail. In some embodiments, the priming element can be an entirely random sequence. In some embodiments, degenerate priming element can be used.
[0197] The priming element can be based on a particular gene sequence or particular motif sequence or common/conserved sequence, that it is designed to capture (i.e., a sequence-specific priming element). Thus, in some embodiments, the priming element is capable of binding selectively to a desired sub-type or subset of nucleic acid, for example a particular type of RNA, such as mRNA, rRNA, tRNA, SRP RNA, tmRNA, snRNA, snoRNA, SmY RNA, scaRNA, gRNA, RNase P, RNase MRP, TERC, SL RNA, aRNA, cis-NAT, crRNA, IncRNA, miRNA, piRNA, siRNA, shRNA, tasiRNA, rasiRNA, 7SK, eRNA, ncRNA or other types of RNA. In a non-limiting example, the priming element can be capable of binding selectively to a desired subset of ribonucleic acids, for example, microbiome RNA.
[0198] In some embodiments, the priming element of the molecular tag can be a non-nucleic acid domain. Examples of suitable priming elements that are not exclusively nucleic-acid based can include, but are not limited to, proteins, peptides, aptamers, antigens, antibodies, and molecular analogs that mimic the functionality of any of the priming elements described herein.
Recognition Element
[0199] In some embodiments, a molecular tag can include a recognition element, which can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that can identify and bind to an SBE (e.g., an ssDNA binding protein) on a CU. A recognition element can be unique to the SBE to which it binds. A recognition element can include one or more specific polynucleotide sequences, one or more random nucleic acid and/or amino acid sequences, and/or one or more synthetic nucleic acid and/or amino acid sequences. In some embodiments, a recognition element can be a nucleic acid sequence that does not substantially hybridize to the biological molecules (e.g., DNA and/or RNA) to be analyzed. In some embodiments, a recognition element can bind to an SBE in a reversible or irreversible manner. In some embodiments, a recognition element can be a chemical moiety that can bind to thiol groups on CUs. In some embodiments, the chemical moiety can include, but is not limited to, maleimides, iodoacetamides, disulfides, haloacetyl groups (e.g., chloroacetyl, bromoacetyl), aziridines, or vinyl sulfones.
[0200] In some embodiments, a recognition element can include a sequence that can bind covalently to a single-stranded DNA binding protein, e.g., a HUH endonuclease from the family of replication initiator domains or relaxases. In some embodiments, a recognition element can include a sequence that can form double-stranded structures recognized by a double-stranded DNA binding protein, e.g., a protein from the family of transcription factors.
Hash Element
[0201] A molecular tag can include one or more hash elements. A hash element can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that function as a label or an identifier that can convey the origin of the biological molecules. In some embodiments, a hash element can be uniquely assigned to a computing unit and can thereby be associated with sample objects with the computing unit, such that it can allow for accurate detection of a plurality of biological molecules originating from different sample objects but captured in the same computing unit.
[0202] A hash element can have a variety of different formats. For example, a hash element can include random nucleic acid and/or amino acid sequences, and synthetic nucleic acid and/or amino acid sequences. In some embodiments, a hash element can be attached to a biological molecule in a reversible or irreversible manner. In some embodiments, a hash element can be added to, for example, a fragment of a DNA or RNA biological molecule before, during, and/or after sequencing of the sample. In some embodiments, a hash element can allow for identification and/or quantification of individual sequencing reads. In some embodiments, a hash element can be a nucleic acid sequence that does not substantially hybridize to a biological molecule (e.g., DNA and/or RNA). In some embodiments, a hash element is not a nucleic acid and can be, for example, but not limited to, a polypeptide tag or an affinity tag, e.g., FLAG-tag, HA-tag, His-tag, or Myc-tag. In some embodiments, a hash element can permit partitioning of captured biological molecules by the affinity tag and quantification of biological molecules associated with each tag by methods other than sequencing, e.g., sandwich ELISA. In some embodiments, a hash element can include a heavy metal, which can allow partitioning of the captured biological molecules and quantification by mass spectrometry.
[0203] For multiple molecular tags in a multiplex analysis, the molecular tags can be divided into one or more subsets, wherein the hash elements of the multiple molecular tags can include sequences that can be the same within a subset of the molecular tags, while the sequences of the hash elements of another subset of the multiple molecular tags can be different from the sequences of the hash elements of the first subset.
[0204] In some embodiments, a hash element can be associated with the origin of the biological molecule within the multiplex analysis. In some embodiments, a hash element can be associated with a quantity of the biological molecule present within a sample object. For example, a mixed but known set of hash elements can provide a stronger address or attribution of the biological molecules to a given sample object, by providing duplicate or independent confirmation of the identity of the biological molecule. In some embodiments, multiple hash elements can represent increasing specificity of the origin of biological molecules.
[0205] In some embodiments, the hash element can include a unique molecular identifier.
Unique Molecular Identifier (UMI)
[0206] A unique molecular identifier (UMI) can be a contiguous nucleic acid segment or two or more non-contiguous nucleic acid segments that can function as a label or identifier for a particular biological molecule to be analyzed. A UMI can be unique. A UMI can include one or more specific polynucleotide sequences, one or more random nucleic acid and/or amino acid sequences, and/or one or more synthetic nucleic acid and/or amino acid sequences. In some embodiments, the UMI can be a nucleic acid sequence that does not substantially hybridize to analyte biological molecules in the sample object. In some embodiments, the UMI can have less than 80% sequence identity (e.g., less than 70%, 60%, 50%, or less than 40% sequence identity) to the nucleic acid sequences across a substantial part (e.g., 80% or more) of the nucleic acid molecules in the sample object.
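The identity criterion in [0206] can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of a candidate UMI against every same-length window of each reference sequence; the function names, the example sequences, and the windowed comparison are illustrative assumptions and are not taken from the disclosure. A practical workflow would more likely rely on a sequence alignment tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def max_window_identity(umi: str, reference: str) -> float:
    """Highest identity of the UMI against any same-length reference window."""
    k = len(umi)
    if len(reference) < k:
        raise ValueError("reference must be at least as long as the UMI")
    return max(
        percent_identity(umi, reference[i:i + k])
        for i in range(len(reference) - k + 1)
    )


def umi_is_acceptable(umi, references, max_identity=80.0, min_fraction=0.8):
    """True if the UMI stays below `max_identity` percent identity for at
    least `min_fraction` of the reference molecules (both thresholds are
    hypothetical parameters mirroring the 80% figures in the text)."""
    if not references:
        raise ValueError("at least one reference sequence is required")
    below = sum(max_window_identity(umi, r) < max_identity for r in references)
    return below / len(references) >= min_fraction


# Hypothetical example: a candidate UMI screened against two reference sequences.
ok = umi_is_acceptable("ACGTACGT", ["TTTTTTTTTTTT", "GGGGCCCCGGGG"])
```

The two thresholds are kept as separate parameters because the text distinguishes the per-molecule identity limit from the fraction of molecules to which that limit applies.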
[0207] In some embodiments, the UMI can include from about 6 to about 20 or more nucleotides within the sequence of the capture probes. In some embodiments, the length of a UMI sequence can be about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or longer. In some embodiments, the length of a UMI sequence can be at least about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or longer. In some embodiments, the length of a UMI sequence is at most about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides or shorter. In some embodiments, these nucleotides can be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they can be separated into two or more separate subsequences that can be separated by 1 or more nucleotides. Separated UMI subsequences can be from about 4 to about 16 nucleotides in length. In some embodiments, the UMI subsequence can be about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or longer. In some embodiments, the UMI subsequence can be at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or longer. In some embodiments, the UMI subsequence can be at most about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides or shorter.
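The length ranges in [0207] determine how many distinct UMI sequences are available. A minimal sketch of that arithmetic follows, assuming each position is drawn independently from the four standard bases (A, C, G, T), which the text does not state explicitly; the function name is illustrative.

```python
def umi_diversity(length: int) -> int:
    """Number of distinct sequences of `length` nucleotides over A, C, G, T."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def split_umi_diversity(subsequence_lengths) -> int:
    """Diversity of a UMI split into separate subsequences.

    Only the total number of variable positions matters, so the result is
    the same as for one contiguous UMI of the summed length.
    """
    return umi_diversity(sum(subsequence_lengths))


for n in (6, 10, 20):
    print(n, umi_diversity(n))  # 4096, 1048576, 1099511627776
```

Under this assumption, a UMI split into two subsequences of 4 and 6 nucleotides offers the same number of distinct sequences as a contiguous 10-nucleotide UMI.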
[0208] In some embodiments, a UMI can be attached to a molecular tag in a reversible or irreversible manner. In some embodiments, a UMI can be added to, for example, a fragment of a DNA or RNA sample before, during, and/or after sequencing of the biological molecule. In some embodiments, a UMI can allow for identification and/or quantification of individual sequencing reads. In some embodiments, a UMI can be used as a fluorescent barcode to which fluorescently labeled oligonucleotide probes hybridize.
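One common way a UMI supports quantification of sequencing reads is by collapsing reads that share the same UMI and the same target into a single counted molecule. The sketch below illustrates that idea under simplifying assumptions: reads are represented as hypothetical `(target, umi)` pairs, and UMIs are compared by exact match, so sequencing errors within the UMI are not corrected. It is not presented as the method of the disclosure.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs observed per target.

    `reads` is an iterable of (target, umi) pairs; reads with the same
    target and the same UMI are treated as copies of one molecule.
    """
    umis_per_target = defaultdict(set)
    for target, umi in reads:
        umis_per_target[target].add(umi)
    return {target: len(umis) for target, umis in umis_per_target.items()}


# Hypothetical reads: three reads of geneA carry only two distinct UMIs.
reads = [
    ("geneA", "ACGTAC"),
    ("geneA", "ACGTAC"),  # duplicate of the read above
    ("geneA", "TTGCAA"),
    ("geneB", "ACGTAC"),
]
counts = count_molecules(reads)
```

Here `counts` maps `geneA` to 2 and `geneB` to 1, although four reads were observed in total.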
Sequencing Element
[0209] The sequences of the molecular tag can generally be selected for compatibility with any of a variety of different sequencing systems, e.g., NGS, 454 Sequencing, Ion Torrent Proton or PGM, Illumina X10, PacBio, Nanopore, etc., and the requirements thereof. In some embodiments, functional sequences can be selected for compatibility with non-commercialized sequencing systems. Examples of such sequencing systems and techniques, for which suitable functional sequences can be used, include (but are not limited to) Roche 454 sequencing, Ion Torrent Proton or PGM sequencing, Illumina X10 sequencing, PacBio SMRT sequencing, and Oxford Nanopore sequencing. Further, in some embodiments, functional sequences can be selected for compatibility with other sequencing systems, including non-commercialized sequencing systems.
Barcode
[0210] A barcode can be a label, or identifier, that can convey or is capable of conveying information (e.g., information about a computing unit). In some embodiments, a barcode can be part of a CU. In some embodiments, a barcode can be attached to a CU. A particular barcode can be unique to a particular CU relative to other barcodes.
[0211] Barcodes can have a variety of different formats. For example, barcodes can include polynucleotide barcodes, random nucleic acid and/or amino acid sequences, and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to a CU or to another moiety or structure in a reversible or irreversible manner. Barcodes can allow for identification and/or quantification of individual sequencing reads (e.g., a barcode can be or can include a unique molecular identifier (UMI)).
[0212] Barcodes can spatially resolve biological molecules found in sample objects, for example, at single-cell resolution (e.g., a barcode can be or can include a spatial barcode). In some embodiments, a barcode can include both a UMI and a spatial barcode. In some embodiments, a barcode can include two or more sub-barcodes that together can function as a single barcode. For example, a polynucleotide barcode can include two or more polynucleotide sequences (e.g., sub-barcodes) that can be separated by one or more non-barcode sequences.
Release Element
[0213] In some embodiments, a molecular tag can optionally include a release element. The release element can represent a portion of a molecular tag that can be used to reversibly attach the molecular tag to a CU. For example, the barcode and/or UMIs can be released by cleavage of the release element.
[0214] In some embodiments, the release element can link the molecular tag to the CU via a disulfide bond. A reducing agent can be added to break the disulfide bonds, resulting in release of the molecular tag from the CU. In some embodiments, the release element can be a photosensitive chemical bond (e.g., a chemical bond that dissociates when exposed to light such as ultraviolet light). In some embodiments, the release element can be an ultrasonic cleavage domain. For example, ultrasonic cleavage can depend on nucleotide sequence, length, pH, ionic strength, temperature, and the ultrasonic frequency.
[0215] Oligonucleotides with photo-sensitive chemical bonds (e.g., photo-cleavable linkers) have various advantages. They can be cleaved efficiently and rapidly (e.g., in nanoseconds to milliseconds). When a photo-cleavable release element is used, the cleavage reaction can be triggered by light, and can be highly selective to the linker and consequently bioorthogonal. Typically, wavelength absorption for the photocleavable release element can be located in the near-UV range of the spectrum. In some embodiments, λmax of the photocleavable linker can be from about 300 nm to about 400 nm, or from about 310 nm to about 365 nm. In some embodiments, λmax of the photocleavable linker can be about 300 nm, about 312 nm, about 325 nm, about 330 nm, about 340 nm, about 345 nm, about 355 nm, about 365 nm, or about 400 nm.
[0216] Non-limiting examples of a photo-sensitive chemical bond that can be used in a release element can include those described in Leriche et al. Bioorg Med Chem. 2012 Jan 15;20(2):571-82, which is incorporated by reference herein in its entirety. For example, linkers that comprise photo-sensitive chemical bonds can include 3-amino-3-(2-nitrophenyl)propionic acid (ANP), phenacyl ester derivatives, 8-quinolinyl benzenesulfonate, dicoumarin, 6-bromo-7-alkoxycoumarin-4-ylmethoxycarbonyl, a bimane-based linker, and a bis-arylhydrazone based linker. In some embodiments, the photo-sensitive bond can be part of a release element, such as an ortho-nitrobenzyl (ONB) linker.
[0217] Other examples of a release element can include labile chemical bonds such as, but not limited to, ester linkages (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), an abasic or apurinic/apyrimidinic (AP) site (e.g., cleavable with an alkali or an AP endonuclease), or a phosphodiester linkage (e.g., cleavable via a nuclease (e.g., DNase)).
[0218] In some embodiments, the release element can include a sequence that can be recognized by one or more enzymes capable of cleaving a nucleic acid molecule, e.g., capable of breaking the phosphodiester linkage between two or more nucleotides. A bond can be cleavable via other nucleic acid molecule targeting enzymes, such as restriction enzymes (e.g., restriction endonucleases). For example, the release element can include a restriction endonuclease (restriction enzyme) recognition sequence. Restriction enzymes cut double-stranded or single-stranded DNA at specific recognition nucleotide sequences known as restriction sites. In some embodiments, a rare-cutting restriction enzyme, e.g., an enzyme with a long recognition site (at least 8 base pairs in length), can be used to reduce the possibility of cleaving elsewhere in the capture probe.
[0219] In some embodiments, the release element can include a poly(U) sequence which can be cleaved by a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII, commercially known as the USER™ enzyme.
Linker
[0220] In some embodiments, a molecular tag can include a linker, which can prevent extension. In some embodiments, the linker can be dideoxynucleotides (ddNTPs), which are modified nucleotides that lack the 3’ hydroxyl group necessary for forming phosphodiester bonds, thereby preventing further nucleotide addition. In some embodiments, the linker can be spacer phosphoramidites, which are synthetic linkers that can be incorporated into oligonucleotides to create a gap or block extension. For example, spacer phosphoramidites can include spacer C3, C6, C12, HEG (hexaethylene glycol) spacer, abasic sites, which are nucleotides with no base attached, which can prevent extension by causing a pause in the DNA polymerase activity. In some embodiments, the linker can be peptide nucleic acids (PNAs), which are synthetic polymers structurally similar to DNA but with a peptide backbone, which can hybridize to DNA and block polymerase extension. In some embodiments, the linker can be methylphosphonate linkers, which are phosphoramidite linkers with a methyl group attached to the phosphorus, preventing normal base pairing and extension. In some embodiments, the linker can be LNA (locked nucleic acid) probes, which are modified RNA nucleotides with a methylene bridge connecting the 2’ oxygen and the 4’ carbon, which can enhance binding affinity and block extension. In some embodiments, the linker can be thioate linkers, which are sulfur-containing linkers that can interfere with the polymerase activity and prevent extension. In some embodiments, the linker can be PEG (polyethylene glycol) linkers, which are flexible, hydrophilic linkers that can create a physical barrier to polymerase extension. In some embodiments, the linker can be phosphate or phosphorothioate groups at the 3’ end, wherein adding a phosphate or phosphorothioate group at the 3’ end of an oligonucleotide can prevent extension by blocking the 3’ hydroxyl group.
Blocking Element
[0221] In some embodiments, a molecular tag can further include a blocking element. In some embodiments, the blocking element can prevent reverse transcription. In some embodiments, the blocking element can be removed when the at least one sample object interacts with each CU.
[0222] In some embodiments, the blocking element can be a C3 spacer, which is a three-carbon spacer that acts as a blocking group to prevent the extension of nucleic acids during reverse transcription. In some embodiments, the blocking element can be a 3’ phosphate group. Addition of a phosphate group at the 3’ end of an oligonucleotide can prevent reverse transcription by blocking the addition of nucleotides. In some embodiments, the blocking element can be a 3’ amino modifier, which can prevent reverse transcription by blocking the polymerase from extending the nucleotide chain. In some embodiments, the blocking element can be dideoxynucleotides (ddNTPs). Incorporation of dideoxynucleotides at the end of an oligonucleotide can block reverse transcription by preventing the addition of further nucleotides. In some embodiments, the blocking element can be locked nucleic acids (LNAs), which are modified RNA nucleotides that can form highly stable duplexes with complementary DNA or RNA, blocking reverse transcriptase from accessing and extending the template. In some embodiments, the blocking element can be a thioate linker, which is a sulfur-containing linker that can interfere with reverse transcriptase activity and prevent the enzyme from synthesizing complementary DNA. In some embodiments, the blocking element can be a hairpin loop structure. Designing oligonucleotides with hairpin loop structures can create physical barriers to reverse transcription, preventing the enzyme from extending the nucleic acid strand.
In some embodiments, the blocking element can be a PEG (polyethylene glycol) linker, which can create a steric hindrance that blocks the reverse transcriptase from binding and extending the nucleic acid. In some embodiments, the blocking element can be an aptamer or other secondary structure, such as a G-quadruplex, that can fold in such a way that it blocks the binding or activity of reverse transcriptase. In some embodiments, the blocking element can be a moiety that includes a ssDNA sequence further fused to a domain that can prevent extension (e.g., C3, C6, etc.). The ssDNA sequence can be recognized by an endonuclease. Upon cleavage, extension by reverse transcription can become enabled.
Input Samples
[0223] Input samples can be collected from various origins, e.g., biological origin. In some embodiments, the input sample can be biological. In some embodiments, the biological input sample can comprise a biological fluid (e.g., whole blood, serum, plasma, sputum, urine, saliva, nipple aspirate, ductal lavage, vaginal fluid, nasal fluid, ear fluid, gastric fluid, cerebrospinal fluid, sweat, pericrevicular fluid, semen, prostatic fluid, feces, cell lysate, or tears). In some embodiments, the biological input sample can comprise a tissue sample (e.g., hair, skin, or biopsy material). In some embodiments, the biological input sample can comprise an enriched biological material (e.g., various cell types or exosomes). In some embodiments, the input sample can be a biological sample derived from a subject or an individual. In some embodiments, the input sample can be primary cell cultures. In some embodiments, the input sample can be cell line cultures. In some embodiments, the input sample can be other complex cultures, such as, but not limited to, organoids, 2D/3D cultures, mixed-cell cultures, genetically modified cultures, cell lines modified using CRISPR, and the like.
[0224] In some embodiments, the presently described methods can be compatible with whole blood, stabilized leukocyte fractions, isolated peripheral blood mononuclear cells (PBMCs), cryogenically stored PBMCs, and primary cell cultures. In some embodiments, the recommended cell resuspension solution can be 1x DPBS containing 0.1% gelatin w/v. In some embodiments, standard cultivation media can also be compatible.
[0225] In some embodiments, an input sample should not exceed given limits in a standard reaction tube. In some embodiments, sample volume can be about 10 µL, about 20 µL, about 30 µL, about 40 µL, about 50 µL, about 60 µL, about 70 µL, about 80 µL, about 90 µL, about 100 µL, about 110 µL, about 120 µL, about 130 µL, about 140 µL, about 150 µL, about 160 µL, about 170 µL, about 180 µL, about 190 µL, about 200 µL, about 210 µL, about 220 µL, about 230 µL, about 240 µL, about 250 µL, about 260 µL, about 270 µL, about 280 µL, about 290 µL, about 300 µL, about 310 µL, about 320 µL, about 330 µL, about 340 µL, about 350 µL, about 360 µL, about 370 µL, about 380 µL, about 390 µL, about 400 µL, about 410 µL, about 420 µL, about 430 µL, about 440 µL, about 450 µL, about 460 µL, about 470 µL, about 480 µL, about 490 µL, or about 500 µL. In some embodiments, sample volume can be 100 µL.
[0226] In some embodiments, the total number of cells (passive background) of the input sample can be about 100,000 cells, about 200,000 cells, about 300,000 cells, about 400,000 cells, about 500,000 cells, about 600,000 cells, about 700,000 cells, about 800,000 cells, about 900,000 cells, or about 1,000,000 cells. In some embodiments, the total number of cells (passive background) of the input sample can be about 1 million cells, about 1.5 million cells, about 2 million cells, about 2.5 million cells, or about 3 million cells. In some embodiments, the total number of cells (passive background) of the input sample can be about 1 million cells.
[0227] In some embodiments, cells expressing at least one target antigen (active background) can be about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50% of the passive background. In some embodiments, cells expressing at least one target antigen (active background) can be about 10% of the passive background.
[0228] In some embodiments, cells expressing all target antigens (target cells) can be about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50% of the active background. In some embodiments, cells expressing all target antigens (target cells) can be about 10% of the active background.
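The nested percentages in [0226] to [0228] can be made concrete with a short worked calculation. The sketch below assumes whole-number percentages and uses illustrative function and parameter names; it simply applies the active-background percentage to the passive background and then the target-cell percentage to the active background.

```python
def expected_cell_counts(total_cells: int, active_pct: int, target_pct: int):
    """Return (active_background, target_cells) as whole-cell counts.

    `active_pct` is the percentage of the passive background expressing at
    least one target antigen; `target_pct` is the percentage of that active
    background expressing all target antigens.
    """
    for pct in (active_pct, target_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    active = total_cells * active_pct // 100
    target = active * target_pct // 100
    return active, target


# Example values from the text: 1 million cells, 10% active, 10% target.
active, target = expected_cell_counts(1_000_000, 10, 10)
print(active, target)  # 100000 10000
```

With these example values, target cells make up 1% of the total input, i.e., about 10,000 cells out of 1 million.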
Sample Objects
[0229] The sample objects can be entities derived from the input sample, such that the CUs of the present disclosure, configured to interact with the sample objects, can generate an output signal indicative of a characteristic of the input sample.
[0230] In some embodiments, a sample object can include a plurality of biological molecules. In some embodiments, the plurality of biological molecules can be DNA. In some embodiments, the plurality of biological molecules can be RNA. In some embodiments, the plurality of biological molecules can be protein or peptide.
[0231] In some embodiments, a sample object can comprise a cell. In some embodiments, the cell can comprise a plurality of biological molecules. In some embodiments, the sample object can be lysed in order to release the plurality of biological molecules, which can then subsequently interact with one or more SBEs on CUs interacting with the sample object. In some embodiments, the sample object comprises a cell, which can associate with a surface-bound entity (SBE), which is further explained below. In some embodiments, the sample object can associate with an SBE of a CU. In some embodiments, a sample object can produce a signal object (SO), which is further explained below. In some embodiments, a sample object can degrade a signal object (SO). In some embodiments, a sample object can recognize an SO. In some embodiments, such recognition can lead to the sample object modulating its ability to produce or degrade an SO. In some embodiments, sample objects can be individual entities or clusters formed through aggregation of entities. For example, sample objects can be cells, and cell clusters can be formed due to specific cell-cell interactions, wherein the formation of cell clusters can signify cytotoxicity, adherence, and/or differentiation of cells or cell lineage.
[0232] In some embodiments, a sample object can comprise a molecule. In some embodiments, the molecule can comprise a plurality of biological molecules. In some embodiments, the molecule can comprise a wide array of molecules produced by the input sample. In some embodiments, the molecule can be a metabolite and/or its related compounds. In some embodiments, the molecule can be a protein or peptide and/or its derivatives. In some embodiments, the molecule can be a pheromone and/or its derivative compounds. In some embodiments, the molecule can be a signaling molecule, which can include, but is not limited to, a wide array of mammalian hormones, cytokines, interleukins, and/or chemokines.
In some embodiments, the molecule can be one or more oligonucleotides. In some embodiments, the molecule can be one or more DNA. In some embodiments, the molecule can be one or more RNA.
[0233] In some embodiments, sample objects from the input sample or within the set of computing entities can be pre-treated prior to any of the presently described methods. In some embodiments, pre-treatment can slow or promote metabolic processes through external influence (e.g., temperature change) or chemical treatment (e.g., metabolite or inducer supplement). In some embodiments, pre-treatment can also expose or conceal object surfaces. In some embodiments, SBEs can be occluded by nonspecific layers (e.g., polysaccharide, glycoprotein layers). Such layers can be removed by appropriate enzymatic or chemical pre-treatment. In some embodiments, sample objects can comprise background entities that can hinder recognition of target objects. Erroneous clustering with background objects can be minimized by pretreatment of the objects with elements that can interact non-specifically with surface entities and thereby fill the residual binding capacity. In some embodiments, blocking entities can inhibit non-specific binding (passive and covalent) between SBEs or between surfaces. Such blocking entities do not exhibit cross-reactivity with SBEs, and thus should not disrupt the sample objects.
Reaction Reagent
[0234] The Reaction Reagent is a composition comprising at least one computing unit (CU), wherein each CU of the at least one CU can be configured to interact with a sample object derived from an input sample such that an output signal indicative of a characteristic of the input sample can be generated. In some embodiments, the Reaction Reagent can further comprise one or more additional components, such as yeast extract, peptone, D-glucose, Dulbecco’s phosphate-buffered saline, D-(+)-Trehalose dihydrate, skim milk, and/or gelatin.
[0235] The components of the Reaction Reagent can be configured such that one or more computational clusters can be formed upon coming in contact with the at least one sample object, wherein each computational cluster comprises, independently, the at least one CU. For example, without limitation, by mixing and matching the types of CUs (e.g., engineered or synthetic cells), the target profile can be modified as desired by users.
[0236] In some embodiments, any components of the Reaction Reagent described herein can be formulated with acceptable excipients, such as carriers, solvents, stabilizers, diluents, etc., depending upon a customized combination of CUs and target profile. Suitable excipients can include, for example, carrier molecules that include large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive viral particles. Other exemplary excipients can include antioxidants (for example and without limitation, ascorbic acid), chelating agents (for example and without limitation, EDTA), carbohydrates (for example and without limitation, dextrin, hydroxyalkylcellulose, and hydroxyalkylmethylcellulose), stearic acid, liquids (for example and without limitation, oils, water, saline, glycerol and ethanol), wetting or emulsifying agents, pH buffering substances, and the like.
[0237] In some embodiments, the Reaction Reagent can be dehydrated and rehydrated prior to or upon contacting the at least one sample object from the input sample.
[0238] In some embodiments, the Reaction Reagent can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, or up to about 10 weeks at room temperature without loss of signal. In some embodiments, the Reaction Reagent can be stored up to about 2 weeks at room temperature without loss of signal. In some embodiments, the Reaction Reagent can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, up to about 10 weeks, up to about 11 weeks, up to about 12 weeks, up to about 13 weeks, up to about 14 weeks, up to about 15 weeks, or up to about 16 weeks at 4°C without loss of signal. In some embodiments, the Reaction Reagent can be stored up to about 8 weeks at 4°C without loss of signal.
Computing Units (CUs)
[0239] A computing unit is an object that has the potential to affect the informational output (e.g., output signal) of the presently described methods. Informational output can be affected in a number of different ways. For example, a CU can be an object with a classified surface-bound entity (SBE) profile, thereby mediating object association. In another example, a CU can be an object that can produce other CUs or signal objects (SOs), thereby affecting the informational output of a customized method. In yet another example, a CU can be an object that can recognize SOs, thereby affecting the memory capacity of the presently described system.
[0240] In some embodiments, according to any of the methods described herein, a CU can express one or more surface-bound entities (SBEs) upon interacting with one or more sample objects. In some embodiments, a CU of the at least one CU can be, independently, (i) associated with one or more surface-bound entities (SBEs); (ii) capable of recognizing a signal object (SO); (iii) capable of producing an SO; (iv) capable of degrading an SO; (v) capable of producing a change in a material property of the reaction medium; (vi) capable of producing another CU; or (vii) capable of changing its state following signal recognition or external influence.
[0241] In some embodiments, a CU can be capable of producing a change in a material property of the Reaction Medium of the present disclosure, which is further described below, upon coming in contact with the Reaction Reagent comprising CUs. In some embodiments, a CU can be capable of producing a reporter entity, such as, but not limited to, fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferase, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), or any functional fragments or variants thereof. In some embodiments, the material property of the Reaction Medium can be an optical property, an electrical property, or a thermal property.
[0242] A CU can comprise non-native elements, native elements in non-native locations, or other alterations to native elements. Introduced elements can include, but are not limited to, signal object generators, SBEs, reporter molecules, regulatory sequences, genetic selection markers, other types of non-genetic markers (e.g., magnetic, immunological), reference reporter molecules, enzyme coding genes (e.g., protease, kinase, phosphatase), etc. In some embodiments, modifications can be performed by introducing genetic changes at select chromosomal regions. Chromosomal modifications can either introduce new genetic elements at one or more desired loci or modify native elements (promoters, degradation tags, termini tags, transposon sites, etc.) at native loci. In some embodiments, modifications can be performed by introducing additional genetic material, e.g., plasmids or synthetic chromosomes.
[0243] In some embodiments, a CU can be a wildtype cell (e.g., of bacterial, yeast, mammalian origin) or an engineered or synthetic cell (e.g., any genetically engineered cell). In some embodiments, a CU can be an engineered or synthetic cell based on or derived from a yeast cell. In some embodiments, suitable yeasts can include, but are not limited to, Pichia pastoris, Saccharomyces cerevisiae, Arxula adeninivorans (Blastobotrys adeninivorans), Candida boidinii, Hansenula polymorpha (Pichia angusta), Kluyveromyces lactis, Yarrowia lipolytica, etc. In other embodiments, a CU can be a monomeric or multimeric molecule (e.g., a polypeptide, polypeptide derivative, nucleic acid).
[0244] In some embodiments, a CU can be a cell comprising an SBE covalently attached to an anchor present within the cell. In some embodiments, the SBE can be covalently attached to the anchor by a linker. In some embodiments, the linker can comprise a repeat motif. In some embodiments, the linker can facilitate accessibility of the SBE and/or increase an effective contact area between a sample object and the CU upon coming in contact with the Reaction Medium.
[0245] In some embodiments, a CU can be a cell displaying SBEs on its outermost surface (e.g., a cell wall or membrane). In such a case, the SBEs and the cell can be produced separately and subsequently associated by standard practices, e.g., through immunological labeling. Alternatively, SBEs can be produced by the CU itself and tethered or anchored on the surface. A SBE can be tethered to the surface by interaction with surface anchored proteins. Alternatively, a SBE can be anchored to the surface as a fusion polypeptide with a surface anchor moiety that interacts either with the membrane (e.g., by hydrophobic interaction with membrane lipids) or the cell wall (e.g., covalent bonding to cell wall polysaccharides or polypeptides).
[0246] In some embodiments, a CU can be a molecule (e.g., a monomeric or multimeric molecule). In some embodiments, the molecule can include, without limitation, a polypeptide, a polypeptide derivative, a nucleic acid, and/or a solid support. In some embodiments, a CU can comprise a polypeptide. In some embodiments, the polypeptide can be an antibody, e.g., a bispecific antibody (BsAb). In some embodiments, the polypeptide can be an enzyme. In some embodiments, the method can further comprise an agent, which the enzyme can convert into a signal object (SO). In some embodiments, a CU can comprise a solid support. In some embodiments, the solid support can be a functionalized bead.
[0247] In some embodiments, a CU can be a molecule with a single SBE or multiple SBE moieties of the same or different specificities. This can include a collection of immunoglobulins, their derivatives (e.g., scFv, Fab, diabody, etc.), or similar binding entities with some level of molecular specificity (e.g., DARPins, TALENS, antigens, nucleic acids).
[0248] In some embodiments, a first CU of the at least one CU can be capable of producing a second CU, wherein the second CU can be of the same type as the first CU or of a different type than the first CU.
[0249] In some embodiments, a first CU of the at least one CU can interact with a second CU of the at least one CU to influence SBE profiles of the first CU and/or the second CU. In some embodiments, a first CU of the at least one CU can be associated with a first SBE and a second CU of the at least one CU can be associated with a second SBE. In some embodiments, the first SBE and second SBE can be capable of forming a complex comprising the first SBE and second SBE.
[0250] CUs can be classified by the types of SBEs they expose to the Reaction Medium. Each SBE can be typed by how it can be recognized (e.g., by van der Waals forces, hydrogen bonds, hydrophobic and/or ionic interactions). Presentation of SBEs can be either constant, spontaneous, or induced (e.g., initiated following detection of signals transmitted in the medium or following direct interaction with objects). CUs can be classified at each point in time. A CU’s class can change in time as the CU’s SBE profile changes. A CU can be called a target object at a given point in time if its current SBE profile belongs to one of the predetermined classes. If two target objects have statistically similar SBE profiles, they can be assigned to the same target object class.
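The time-dependent classification described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an SBE profile is represented as a mapping from SBE names to relative surface levels, that each predetermined class is given by one reference profile, and that "statistically similar" is approximated by a Euclidean distance threshold. The names `profile_distance`, `classify`, `CLASSES`, and the numeric values are hypothetical and do not appear in the disclosure.

```python
from math import sqrt

def profile_distance(a, b):
    """Euclidean distance between two SBE profiles (dicts of SBE name -> level)."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def classify(profile, class_references, tolerance):
    """Return the closest predetermined class, or None when no reference
    profile lies within `tolerance` (the CU is then not a target object)."""
    best_name, best_dist = None, None
    for name, reference in class_references.items():
        d = profile_distance(profile, reference)
        if d <= tolerance and (best_dist is None or d < best_dist):
            best_name, best_dist = name, d
    return best_name

# Hypothetical reference profiles for two predetermined classes.
CLASSES = {
    "class_A": {"sbe1": 1.0, "sbe2": 0.0},
    "class_B": {"sbe1": 0.0, "sbe2": 1.0},
}

# The same CU observed at three points in time as its SBE profile changes.
profiles_over_time = [
    {"sbe1": 0.9, "sbe2": 0.1},
    {"sbe1": 0.5, "sbe2": 0.5},
    {"sbe1": 0.1, "sbe2": 0.95},
]
timeline = [classify(p, CLASSES, tolerance=0.3) for p in profiles_over_time]
```

In this sketch the CU belongs to one class at the first time point, to no predetermined class at the second, and to a different class at the third, which mirrors the statement that a CU's class can change in time as its SBE profile changes.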
Surface-Bound Entities (SBEs)
[0251] SBEs can mediate object association either directly or through intermediate objects. In some embodiments, a SBE profile can define a CU class that can subsequently serve in constructing the methods and systems that perform particular computing functions. In some embodiments, SBEs can be expressed on the surface of a CU upon the CU interacting with one or more sample objects. In some embodiments, a first CU interacting with a second CU can influence the SBE profile(s) of the first CU and/or the second CU. In some embodiments, the SBE profile can be constant in time, change spontaneously (e.g., by random change of internal state), change as a result of an internal state change (e.g., caused by an internal engineered mechanism), or change as a result of induction (e.g., by chemical, temperature, light, or electromagnetic changes). In some embodiments, an SBE can have a dual purpose of (1) binding a CU to a sample object; and (2) recognizing an SO.
[0252] In some embodiments, a SBE can interact with a plurality of biological molecules released from one or more sample objects. In some embodiments, the one or more sample objects can interact with one or more CUs, and the one or more CUs can express SBEs upon interacting with the one or more sample objects. In some embodiments, the one or more sample objects interacting with the one or more CUs can be lysed in order to release the plurality of biological molecules from the one or more sample objects. Upon lysis, the plurality of biological molecules can interact with at least one SBE on the surface of the one or more CUs. In some embodiments, the plurality of biological molecules can also interact with one or more molecular tags, which are explained further below.
[0253] In some embodiments, a SBE can be a single-stranded DNA (ssDNA) binding protein. In some embodiments, a SBE can be a mRNA cap binding protein, such as, but not limited to, eIF4E. In some embodiments, a SBE can be an oligonucleotide anchored to the CU surface, e.g., poly(dT). In some embodiments, a SBE can be an antigen on the cell surface (e.g., cell surface receptor). In some embodiments, the antigen can be a disease-associated antigen (e.g., cancer-associated antigen). In some embodiments, the SBE can be a marker indicating a state of the sample object (e.g., differentiation factors or clusters of differentiation (CDs)). In some embodiments, the SBE can be a biological signal (e.g., MHC, MHC epitope complex, or glycocalyx). In some embodiments, the SBE can be a marker indicating an activity of the sample object (e.g., receptors, receptor ligand complexes, or ion channels, and their modified forms). In some embodiments, the SBE can be a pathogenic marker (e.g., glycoproteins or lectins). In some embodiments, the SBE can be a synthetically produced molecule (e.g., surface tags, displayed epitopes, or conjugated molecules).
[0254] In some embodiments, a SBE can be covalently attached to a surface of a CU. In some embodiments, the SBE can be covalently attached to the surface of the CU by a linker. In some embodiments, the linker can comprise a repeat motif. In some embodiments, the linker can be a polypeptide linker. In some embodiments, the CU can be a cell, and the SBE can be covalently attached to an antigen present on the surface of the cell. For example, the SBE can be a polypeptide covalently attached to a surface receptor of the cell. In some embodiments, the CU can be a solid support, and the SBE can be covalently attached to the surface of the solid support. For example, the SBE (e.g., a polypeptide SBE) can be covalently attached to the surface of a bead (e.g., a functionalized bead).
[0255] In some embodiments, a SBE can be non-covalently attached to a surface of the CU. In some embodiments, the SBE can comprise a binding moiety that can bind to a surface of the CU. In some embodiments, the CU can be a cell, and the SBE can comprise a binding moiety that can bind to an antigen present on the surface of the cell. For example, the SBE can comprise a ligand that binds to a surface receptor of the cell. In some embodiments, the SBE binding moiety can be an antibody moiety. In some embodiments, the CU can be a solid support, and the SBE can comprise a binding moiety that can bind to a surface of the solid support. For example, the SBE (e.g., a polypeptide SBE) can comprise a binding moiety that can bind to a surface of a bead (e.g., a functionalized bead).
[0256] In some embodiments, SBEs can be displayed as part of a glycosylphosphatidylinositol (GPI)-anchored fusion protein. The recombinant structure of the fusion protein can be optimized to promote interaction between objects and, in particular, between cellular CUs and target objects. In some embodiments, the fusion protein construct can include domains of common yeast flocculation proteins, e.g., S. cerevisiae Flo1, Flo5, Flo9, Flo10, and Flo11. In some embodiments, the domains can include putative GPI associated moieties. In some embodiments, the domains can also include truncated fragments of the extracellular domains to increase the area of contact between adjoining objects, thereby increasing the strength of association between objects displaying complementary SBEs.
[0257] In some embodiments, a SBE can bind to a cognate binding partner. In some embodiments, the cognate binding partner can be a signal object (SO). In some embodiments, the cognate binding partner can be another SBE. In some embodiments, binding of the SBE to the other SBE can modulate an activity of the CU. In some embodiments, binding of the SBE to the other SBE can modulate the ability of the CU to produce an SO. In some embodiments, binding of the SBE to the other SBE can modulate the ability of the CU to degrade an SO. In some embodiments, the SBE can comprise an SO. In some embodiments, binding of the SBE to the SO can modulate an activity of the CU. In some embodiments, binding of the SBE to the SO can modulate the ability of the CU to produce an SO. In some embodiments, binding of the SBE to the SO can modulate the ability of the CU to degrade an SO.
[0258] In some embodiments, a SBE can be covalently attached to the surface of a sample object. In some embodiments, a SBE can be non-covalently attached to the surface of a sample object.
[0259] Engineered display of SBEs in various cellular systems has been disclosed in a number of publications, addressing a wide range of organisms ranging from phage and E. coli to yeast and general eukaryotic systems and also addressing protein folding, secretion, surface capture, and anchoring or tethering mechanisms.
[0260] In some embodiments, an SBE can be a cellular receptor. In some embodiments, receptor SBEs can be localized to the cellular membrane or cytoplasm. In some embodiments, receptor SBEs can include a transcription factor or a transmembrane receptor. In some embodiments, a SBE can be located on the surface of a cellular CU with a signaling moiety located on the cytoplasmic side of the cellular membrane. A cytoplasmic receptor can be any SO binding entity that can directly or indirectly lead to metabolic modulation, such as, but not limited to, a transcription factor or an aptamer that can change the transcription or translation rate of one or more genes. In some embodiments, a SBE can be a surface receptor or a transmembrane receptor protein. In cases where the CU can be derived from a yeast cell, a SBE can be a yeast receptor or its derivative (e.g., the signaling moiety of a yeast receptor fused to an engineered segment). Alternatively, a SBE can include a heterologous signaling moiety fused to a yeast signaling moiety and can incorporate other modifications for improved activity. In such a way, the SBE can incorporate homologous and/or heterologous segments of G protein-coupled receptors (GPCRs) in yeast derived CUs. Incorporated GPCRs can include modifications (e.g., compositions of extracellular, transmembrane, and cytoplasmic domains of GPCRs from different organisms and mutations improving their signaling properties) that can alter the receptors’ specificities, sensitivities, and signaling activities. Incorporated GPCRs can also include modifications (e.g., mutations or truncations in cytoplasmic domains) that can alter post-translational regulation of receptor activity (e.g., degradation, molecular interaction). Non-limiting exemplary yeast GPCRs can include S. cerevisiae pheromone receptors STE2 and STE3 or their derivatives (e.g., mutants with altered stability). Non-limiting exemplary bacterial pheromone transcription factor proteins can include LuxR proteins that can sense bacterial pheromones N-acyl homoserine lactones.
[0261] In some embodiments, a CU can comprise one or more SBEs. In some embodiments, a CU can comprise a first set of SBEs and a second set of SBEs. In some embodiments, a CU can be engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects. In some embodiments, the plurality of sample objects can interact with the plurality of CUs via a first set of at least one SBE. In some embodiments, the plurality of biological molecules can interact with the plurality of CUs via a second set of at least one SBE. In some embodiments, the plurality of CUs bound to each sample object of the plurality of sample objects can provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilization of the sample object.
Internal States
[0262] In some embodiments, a CU can store information regarding past interactions with objects and external influences in its internal state. For the purposes of the present disclosure, the internal state of the CU is not necessarily identical to the full state of the physical object as is defined by dynamical systems theory. Instead, the internal state contains information necessary to support and execute future computing actions. The internal state can include continuous variables (e.g., ionic concentrations, permittivities, permeabilities, internal pressures, absorbances, rigidities), discrete variables (e.g., entity copy numbers, degrees of polymerization, set of molecular conformations, molecular modifications), as well as spatial distributions (e.g., compartmentalization of entities, polarization). In some cases, for the purposes of computing system modeling, it can be convenient to define the state through probability distributions.
[0263] Internal states can be defined for both molecular and cellular CUs. Internal states of cellular CUs can be aptly described by biochemical reaction network models. Physical manifestations of the states, as related to the present disclosure, can be copy numbers of certain molecular species, in particular regulatory molecular species (e.g., transcription factors, regulatory RNAs, transferases) that contribute to maintaining cellular homeostasis. In some embodiments, the physical manifestations of the state can include copy numbers of active or inactive transcription factors in select cellular compartments. Wildtype transcription factors or their regulators can be considered. Heterologous transcription factors modified for the given host organism can also be considered. In addition, novel transcription factors can be considered. The internal state of the molecular CU can be set at the time of production or through later interaction with objects or CUs.
State Changes and Processing
[0264] In some embodiments, a CU can implement mechanisms that change its state following signal recognition or external influence. The effect of a mechanism can be predicted precisely (e.g., rapid and stable conformational change) or can have a stochastic nature (e.g., a change in an entity’s time averaged copy number). In some embodiments, the state change can happen immediately following interaction of the SBE with its cognate ligand or following external influence. For example, the SO can interact directly with a transcription factor or the transcription factor can undergo conformational changes as a result of a shift in, e.g., temperature or illumination. In some embodiments, the state change can follow a transient internal process during which the state change can permeate but the effector process can be reset upon signal removal. The effector process can include one or more intermediate steps, wherein molecular species can undergo modifications that can build in parallel or in sequence. In some embodiments, the effectors can form a cascade, where the first effector can modify the second effector, and so on. In some embodiments, the cascade can also involve additional elements that can support or inhibit its progress. For example, eukaryotic cells can implement widely conserved MAPK cascades that make possible various signal processing functions and accept multiple regulators by which the cascades can be redirected and repurposed. In such an example, the effector process can include a MAPK cascade coupled through adapter proteins and G proteins to signal-sensing GPCRs.
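The cascade behavior described above, in which each effector modifies the next and the process resets upon signal removal, can be sketched with a simple numerical model. This is an illustrative toy model only, not the disclosed mechanism: it assumes a linear chain of three effectors, each activated in proportion to the activity of the preceding stage and deactivated at a constant rate, integrated with a forward Euler step. The function name `simulate_cascade` and all rate constants are hypothetical.

```python
def simulate_cascade(signal, n_stages=3, k_act=1.0, k_deact=0.5, dt=0.1):
    """Forward-Euler simulation of a linear effector cascade.

    signal: sequence of input levels in [0, 1], one per time step.
    Returns the activity of the last effector after each time step.
    """
    levels = [0.0] * n_stages
    output = []
    for s in signal:
        # Stage 0 is driven by the signal; stage i by the previous stage.
        upstream = [s] + levels[:-1]
        levels = [
            x + dt * (k_act * u * (1.0 - x) - k_deact * x)
            for x, u in zip(levels, upstream)
        ]
        output.append(levels[-1])
    return output

# Signal present for 200 steps, then removed for 200 steps.
signal = [1.0] * 200 + [0.0] * 200
trace = simulate_cascade(signal)
level_with_signal = trace[199]
level_after_removal = trace[-1]
```

Under these assumed parameters the last effector rises with a delay while the signal is present and decays back toward zero once the signal is removed, which corresponds to a transient process that is reset upon signal removal.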
[0265] In some embodiments, transient modification of e.g., transcription factor complexes following signal recognition can also serve in changing the internal state of a cellular CU. In some embodiments, a transcription factor can be modified directly by the SO. In some embodiments, a transcription factor can be modified as part of the transient process that can ensue following a signal recognition event. In some embodiments, activated transcription factors or associated elements can yield changes in transcription that can subsequently alter many other processes. In some embodiments, novel transcription factors can be used to transform signal recognition events into transcriptional changes. In some embodiments, novel transcription factors can allow pathway rerouting to promoters that can be independent of a native response. In addition, novel transcription factors can enable further modulation or signal processing.
[0266] In some embodiments, a CU can implement mechanisms that can process and change the state without external influence. In some embodiments, the mechanisms can stabilize the current state by nullifying the effects of random and external actions (e.g., regulation by negative feedback) or execute conditional state transitions that persist (e.g., periodic changes) or terminate in a finite number of steps (e.g., evaluation of logical operations).
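The stabilizing effect of negative feedback mentioned above can be illustrated with a minimal sketch. It assumes, as a hypothetical model not taken from the disclosure, a single state variable whose production is repressed by its own level through a Hill-type term, compared against an unregulated variable with constant production. Both are exposed to the same constant external disturbance, and the resulting shift of the steady state is compared. All names and parameter values are illustrative.

```python
def steady_state(production, disturbance=0.0, degradation=1.0,
                 dt=0.05, steps=4000, x0=1.0):
    """Integrate dx/dt = production(x) + disturbance - degradation * x
    with forward Euler and return the final value."""
    x = x0
    for _ in range(steps):
        x += dt * (production(x) + disturbance - degradation * x)
    return x

def feedback_production(x, k=2.0, K=1.0, n=4):
    """Production repressed by its own product (negative feedback)."""
    return k / (1.0 + (x / K) ** n)

def constant_production(x, rate=1.0):
    """Unregulated production, independent of the current level."""
    return rate

disturbance = 0.5
shift_with_feedback = (steady_state(feedback_production, disturbance)
                       - steady_state(feedback_production))
shift_without_feedback = (steady_state(constant_production, disturbance)
                          - steady_state(constant_production))
```

With these assumed parameters both systems rest at the same level without disturbance, and the regulated system is displaced less by the disturbance than the unregulated one, which is the sense in which feedback can nullify part of an external action.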
[0267] In some embodiments, the internal state of the CU can affect intracellular entities that are not themselves part of the state. For example, in cases where the state determines gene activities, the current state of the system can determine the copy numbers of the corresponding gene products and thereby the states of any entity those products can affect. The affected entities can include SBEs. Hence, in some embodiments, the SBE profile of a CU can change as a result of a state change. The affected entities can include SBEs and any elements that can relay signal recognition events to other parts of the cell. The affected entities can include produced CUs or signal objects. In some embodiments, any of the entities affected through state change can be equally affected by signal recognition events or external influences.
[0268] In some embodiments, entities can be introduced that are not affected by signal recognition events, external influences, or internal state changes. These entities can be taken from the same family of entities as the reporter entities (i.e., fluorescent proteins, luminescent proteins, enzymes, etc.) and can be used to generate control measurements to which other measurements can be compared.
[0269] In some embodiments, the CU state change can detect cell state changes, such as, but not limited to, cell specific apoptosis, cell specific activation, cell specific suppression, or stem cell differentiation. In some embodiments, the cell state change is cell specific apoptosis. In some embodiments, the Apoptosis Reaction Tubes comprise Reaction Reagent. In some embodiments, the Apoptosis Reaction Tubes comprise strain Annexin, strain EGFR, and strain BARI (SEQ ID NOs: 1-3). Apoptosis detection is further described in Example 3.
Reaction Medium
[0270] The Reaction Medium is a composition comprising a semi-permeable nutritive medium. In some embodiments, a material property of the Reaction Medium can be an optical property, an electrical property, or a thermal property. In some embodiments, the Reaction Medium has thermoresponsive properties.
[0271] The components of the Reaction Medium can be configured such that one or more computational clusters can be formed upon coming in contact with the at least one sample object, wherein each computational cluster comprises, independently, the at least one CU.
[0272] In some embodiments, any components of the Reaction Medium described herein can be formulated with acceptable excipients, such as carriers, solvents, stabilizers, diluents, etc., depending upon a customized combination of CUs and target profile. Suitable excipients can include, for example, carrier molecules that include large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive viral particles. Other exemplary excipients can include antioxidants (for example and without limitation, ascorbic acid), chelating agents (for example and without limitation, EDTA), carbohydrates (for example and without limitation, dextrin, hydroxyalkylcellulose, and hydroxyalkylmethylcellulose), stearic acid, liquids (for example and without limitation, oils, water, saline, glycerol and ethanol), wetting or emulsifying agents, pH buffering substances, and the like. For example, the Reaction Medium can comprise gelatin, alginic acid sodium salt, yeast extract, peptone, D-glucose, and/or an antibiotic (e.g., tetracycline, doxycycline, ampicillin, etc.). In another example, the Reaction Medium can comprise gelatin, alginic acid sodium salt, yeast extract, peptone, D-glucose, and ampicillin.
[0273] In some embodiments, the Reaction Medium can include a crowding agent. In some embodiments, the crowding agent can be a hydrogel. In some embodiments, the crowding agent can include, but is not limited to, polyethylene glycol (PEG), sucrose, urea, Ficoll, dextran, cellulose, chitosan, poly(lactic-co-glycolic acid), hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), poly(vinyl caprolactam) (PVCL), a composite synthetic/proteinaceous hydrogel, such as PEG/PVA/PVP+gelatin, a biopolymer hydrogel, such as chitosan, hyaluronic acid, silk fibroin, and their functionalized variants, and/or a protein, such as BSA. In some embodiments, the crowding agent can be temperature responsive, such as, but not limited to, hydroxypropyl methylcellulose (HPMC), poly(N-isopropylacrylamide) (PNIPAAm), poly(2-hydroxyethyl methacrylate) (PHEMA), or poly(vinyl caprolactam) (PVCL). In some embodiments, the crowding agent can be temperature, pH, and/or osmolarity responsive, such as, but not limited to, a composite synthetic/proteinaceous hydrogel (e.g., PEG/PVA/PVP+gelatin) and/or a biopolymer hydrogel (e.g., chitosan, hyaluronic acid, silk fibroin, and their functionalized variants).
[0274] In some embodiments, the Reaction Medium can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, or up to about 10 weeks at room temperature without loss of signal. In some embodiments, the Reaction Medium can be stored up to about 2 weeks at room temperature without loss of signal. In some embodiments, the Reaction Medium can be stored up to about 1 week, up to about 2 weeks, up to about 3 weeks, up to about 4 weeks, up to about 5 weeks, up to about 6 weeks, up to about 7 weeks, up to about 8 weeks, up to about 9 weeks, up to about 10 weeks, up to about 11 weeks, up to about 12 weeks, up to about 13 weeks, up to about 14 weeks, up to about 15 weeks, or up to about 16 weeks at 4°C without loss of signal. In some embodiments, the Reaction Medium can be stored up to about 8 weeks at 4°C without loss of signal.
Readout Reagent
[0275] The readout reagent is a composition comprising a reporter entity. In some embodiments, the readout reagent further comprises a buffer. A reporter entity can be any suitable molecular entity that can affect quantitative or qualitative measurements. Non-limiting exemplary reporter entities can include fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferase, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), and any functional fragments or variants thereof. The origin of these reporters and the coding sequences are fully disclosed in the current state of the art.
[0276] In some embodiments, signal recognition events, external influences, or state changes in the presently disclosed system can lead to production of reporter entities. In some embodiments, the produced reporter entities can be cytoplasmic. In some embodiments, the produced reporter entities can be secreted and linked to the surface. In some embodiments, the produced reporter entities can be secreted and released into the medium of the system described herein. In some embodiments, secretion of reporter entities can increase their accessibility or increase their reporting function.
[0277] In some embodiments, entities can be introduced that are not affected by signal recognition events, external influences, or internal state changes. These entities can be taken from the same family of entities as the reporter entities (i.e., fluorescent proteins, luminescent proteins, enzymes, etc.) and can be used to generate control measurements to which other measurements are compared.
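One way a control measurement of this kind could be used is to express a reporter reading relative to the reference reading. The following sketch assumes a simple background-subtracted ratio; the disclosure does not specify a normalization formula, so the function `normalized_signal`, its parameters, and the example readings are hypothetical.

```python
def normalized_signal(reporter, reference, background=0.0):
    """Background-subtracted reporter reading divided by the
    background-subtracted reference (control) reading."""
    if reference <= background:
        raise ValueError("reference reading must exceed background")
    return (reporter - background) / (reference - background)

# Hypothetical raw readings in arbitrary units.
ratio = normalized_signal(reporter=1200.0, reference=400.0, background=200.0)
```

Dividing by a reference that is unaffected by signal recognition events can help compare measurements across samples that differ in, for example, the number of CUs present.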
[0278] In some embodiments, a readout reagent comprises a luminescence substrate and a luminescence buffer. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:10 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:20 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:30 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:40 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:50 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:60 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:70 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:80 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:90 prior to assay readout. In some embodiments, the luminescence substrate and luminescence buffer can be mixed at 1:100 prior to assay readout.
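The volumes needed for a given mixing ratio can be computed as in the sketch below. It assumes that a ratio such as 1:10 means one part substrate to ten parts buffer by volume (eleven parts in total); if the ratio is instead meant as a one-in-ten dilution, the arithmetic differs. The function `mix_volumes` and the example volume are hypothetical.

```python
def mix_volumes(total_volume_ul, buffer_parts, substrate_parts=1):
    """Split a total volume into substrate and buffer volumes for a
    substrate:buffer ratio of substrate_parts:buffer_parts (by volume)."""
    if total_volume_ul <= 0 or buffer_parts <= 0 or substrate_parts <= 0:
        raise ValueError("volume and ratio parts must be positive")
    total_parts = substrate_parts + buffer_parts
    substrate_ul = total_volume_ul * substrate_parts / total_parts
    buffer_ul = total_volume_ul - substrate_ul
    return substrate_ul, buffer_ul

# Example: 1100 microliters of readout reagent at an assumed 1:10 ratio.
substrate_ul, buffer_ul = mix_volumes(1100.0, buffer_parts=10)
```

Under the stated parts-by-volume assumption, the example corresponds to 100 microliters of substrate and 1000 microliters of buffer.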
Signal Objects
[0279] Signal objects (SOs), unlike CUs which can be intended to generate new information, do not generate new information. SOs can be intended to complement CUs by transferring information between entities. For practical purposes, a SO can be any object that can interact with a SBE. In some cases, an object can be both a SO and a CU. For instance, a SO can be fused to a molecular entity that itself has signal processing functions. This can result in a SO that can interact with other SOs or CUs that can be recognized by SBEs. In other cases, SOs can be suspended in the medium before association with their respective SBEs. In some embodiments, the SO can be produced by a CU in the system. In some embodiments, the SO can be degraded by a CU in the system. In some embodiments, the SO can be produced by a sample object in the system. In some embodiments, SOs can be added to a medium. In some embodiments, SOs can be generated by system entities (e.g., CUs and/or sample objects). In some embodiments, SOs can be generated in the medium from other SOs. In some embodiments, SOs can be generated within CUs from elementary metabolic precursors (e.g., by protein expression). In some embodiments, SOs can be secreted.
[0280] In some embodiments, secreted SOs can belong to families of yeast pheromones. Expression of yeast pheromones can then rely on the presence of either wildtype coding sequences or their recombinant derivatives introduced in tandem with appropriate regulatory sequences (e.g., promoters, untranslated regions, transcription factor binding sites). In some embodiments, yeast pheromones can belong to a family of lipidated pheromones of the a-factor type that can be expressed as precursors and secreted using non-traditional pathways. In some embodiments, yeast pheromones can belong to a family of peptide pheromones of the alpha-factor type that can also be expressed as precursors but can be secreted using traditional pathways. In some embodiments, several modes of regulation can be used to control and activate biogenesis of both pheromone types.
[0281] Suspended SOs can have various properties that can affect their recognition. In some embodiments, a SO can be membrane permeable and hence free to interact with cytoplasmic SBEs. In some embodiments, a SO can be membrane impermeable and hence can require SBEs that can be exposed to the medium. In some embodiments, recognition of SOs can be inhibited by other mechanisms (e.g., hydrophobic sequestration) and can require additional treatments.
[0282] In some embodiments, SOs can be metabolites (e.g., amino acids, carbohydrates, ions, etc.), antibiotics (e.g., tetracycline, doxycycline, ampicillin, etc.), and various synthetic compounds (e.g., isopropyl beta-D-1-thiogalactopyranoside (IPTG), nitrocefin, anhydrotetracycline (aTc), toxins, etc.). In some embodiments, SOs can be biogenerated entities, e.g., signaling molecules of viral, bacterial, or mammalian origin. Well-known families of biogenerated signaling molecules can include, but are not limited to, bacterial acylated homoserine lactones (AHL) and type 2 autoinducers, yeast mating pheromones, plant hormones, animal morphogens, and a wide array of mammalian hormones, cytokines, interleukins, chemokines, etc. Biogenerated signaling molecules can also be engineered for altered specificity or function.
Internal Signals
[0283] SOs produced by CUs can be referred to as internal signals. In some embodiments, an SO can be an internal SO.
[0284] In some embodiments, internal signals can be produced from other signals, e.g., by cleavage of existing signals or from metabolic precursors within cellular CUs. In some embodiments, internal signals can be engineered for altered behavior. In some embodiments, SOs or enzymes contributing to their production can be expressed from recombined genes that comprise specific regulatory sequences (e.g., promoters, operators, untranslated regions). In some embodiments, signal behavior can be engineered through addition of synthetic open reading frame (ORF) elements. In some embodiments, single codon changes can redirect the signal to non-native SBEs.
[0285] In some embodiments, peptide signal degradation can be modulated by extending the terminal domains with a peptidase recognition tag. In some embodiments, peptide signals can be translated as pre-pro-peptides, where a series of additional cytoplasmic, periplasmic, or extracellular processing steps can be required to produce mature pheromones. The pre-pro-peptide format can provide a platform for engineering signal activation, strength, and specificity and can therefore increase informational content of a single internal SO. In some embodiments, a SO can have multiple activity states, each of which can be characterized by its affinity for SBEs. In some embodiments, cleavage of pro-peptide sequences can transition SOs between the multiple activity states. In some embodiments, putative protease recognition sites within the pro-peptide sequence can be used to encode transitions between the multiple activity states.
[0286] In some embodiments, the peptide signals can belong to or be derived from a family of lipidated or non-lipidated yeast pheromones (e.g., S. cerevisiae alpha mating factor). For example, yeast pheromone pro-peptide sequences can be engineered to increase or decrease the signal strength of a single pre-pro-peptide by varying the number of mature peptides encoded within a single ORF. For example, pheromone coding sequences can be flanked by protease recognition sites and repeated within a single ORF. In some embodiments, the yeast pheromone alpha factor can be translated as a pre-pro-peptide with up to about four mature peptide coding sequences, the number of which can be increased or decreased using the same flanking motifs established in the wildtype sequence. In some embodiments, the pro-peptide sequence can also be extended to increase the number of activity states. In some embodiments, additional sequences that can include recognition sites for non-native proteases can be inserted into the wildtype sequence. By such modification, intermediate activity states can be produced, wherein the SOs recognize SBEs that can be different from those recognized by the fully mature pheromone.
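The repeat structure described in paragraph [0286] can be illustrated with a short string-level sketch. It is a simplified model under stated assumptions, not the disclosed construct: the function names `build_pro_peptide` and `cleave` are hypothetical, the dibasic `KR` motif stands in for a protease recognition site, the `EAEA` spacer is a simplification of the wildtype flanking motifs, and the 13-residue string is the commonly cited mature S. cerevisiae alpha-factor sequence. The number of mature peptides released is used here only as a rough proxy for signal strength.

```python
def build_pro_peptide(mature: str, copies: int,
                      site: str = "KR", spacer: str = "EAEA") -> str:
    """Concatenate `copies` repeats of (protease site + spacer + mature peptide).

    The site/spacer layout is an illustrative assumption, not the exact
    wildtype pro-peptide architecture.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return (site + spacer + mature) * copies


def cleave(pro_peptide: str, site: str = "KR", spacer: str = "EAEA") -> list[str]:
    """Split at every protease site, then trim the leading spacer of each fragment."""
    fragments = [f for f in pro_peptide.split(site) if f]
    return [f[len(spacer):] if f.startswith(spacer) else f for f in fragments]


# Commonly cited mature alpha-factor peptide (assumed here for illustration).
ALPHA_FACTOR = "WHWLQLKPGQPMY"

orf_product = build_pro_peptide(ALPHA_FACTOR, copies=4)
peptides = cleave(orf_product)
print(len(peptides), "mature peptides released")
```

Changing `copies` models the increase or decrease in mature peptides per ORF; inserting a different `site` motif for selected repeats would be one way to model intermediate activity states that depend on a non-native protease.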
[0287] In some embodiments, the mature pheromone coding sequence can also be altered. In some embodiments, mature pheromone sequences from one species can be exchanged for pheromone sequences from another species. Previous work has shown that crosstalk between pheromone-receptor pairs from different species can be negligible. In some embodiments, internal signals can also include signaling molecules from unrelated species. Such internal signals can exhibit properties desirable for some applications.
[0288] In some embodiments, internal signals can be membrane permeable and hence detectable by simple mechanisms. In some embodiments, internal signals can provide complete orthogonality with respect to other signal objects in a medium (e.g., plant hormones). In some embodiments, SOs can be targeted towards SBEs of target objects in a medium. Such signals can include mammalian cytokines, chemokines, interleukins, growth hormones, neuropeptides, etc.
[0289] In some embodiments, recombination technologies can be used to produce heterologous signals in various cellular CUs. In some embodiments, yeast cells can provide a cellular platform wherein signals of various origins (e.g., bacterial, mammalian, or other higher eukaryotes) can be produced. In some embodiments, production of such signals can require additional metabolic engineering to enable necessary post-translational modifications and metabolic processing.
[0290] In some embodiments, internal signals can also exhibit properties that can enable their easy measurement by external devices. Measurable signals can include molecules with high specificity that can be quantified directly (e.g., immunosorbent assay or polymerase chain reaction). Measurable signals can also include molecules that can alter bulk properties (e.g., absorbance, fluorescence, etc.) of a medium comprising the presently described system. In some embodiments, easily measurable internal signals can primarily be used for readout of device states. Non-limiting exemplary internal signals can include fluorescent proteins (e.g., GFP, RFP, YFP, or CFP), luminescent proteins (e.g., luciferase, such as, but not limited to, Gaussia princeps luciferase (GLuc), Metridia longa luciferase (MLuc), Renilla reniformis luciferase (RLuc), Cypridina noctiluca luciferase (CLuc)), enzymes (e.g., beta-lactamase, beta-galactosidase, SEAP), or any functional fragments or variants thereof.
[0291] The internal signal classification does not bar a signal from also being classified as external. For instance, signals that can be produced within a medium comprising the presently described system, isolated, and manually returned to the medium at a later time can be both internal and external signals.
External Signals
[0292] SOs that are not produced by CUs can be referred to as external signals. In some embodiments, an SO can be an external SO.
[0293] In some embodiments, external signals can include synthesized molecules or purified biogenerated products. In some embodiments, external signals can include metabolites and related compounds or short peptides. In some embodiments, SOs can be added to a medium comprising the presently described system, and the added SOs can include transcriptional inducers (e.g., IPTG, aTc, AHL), classes of amino acids (e.g., aromatic amino acids), pheromones (e.g., alpha factor), etc.
[0294] In some embodiments, external signals can include optical signals that can illuminate a medium comprising the presently described system. In some embodiments, external signals can include magnetic signals that can magnetize system objects. In some embodiments, external signals can include electrical signals that can polarize object charge.
[0295] In some embodiments, external signals can also be produced by entities in the input sample. In some embodiments, such signals can include a wide array of molecules that can be well-mixed throughout a medium comprising the presently described system. In some embodiments, such signals can be localized to specific target objects. Well-mixed signals can serve to coordinate system-wide computing functions. Localized signals can affect only those CUs that recognize the target object (i.e., CUs with SBEs that interact with SBEs displayed by the target object class).
[0296] Production of external signals can be constant in time or time-varying according to some predetermined pattern (e.g., exponentially decaying in time).
Output Signals
[0297] In any of the methods described herein, the logical operator modules can receive one or more input signals (e.g., sample objects) and generate one or more output signals (e.g., output SOs) based on the specified modules.
Signal Modulations
[0298] In some embodiments, a CU can affect internal SOs or external SOs. A CU can affect a single or multiple signals directly (e.g., by producing, degrading, or transforming signals) or indirectly (e.g., by producing CUs). In some embodiments, effects on SOs can be constant or time-dependent, where changes can occur spontaneously, following a change in internal state, or during signal induction. Signals can be affected while at least partially exposed to the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium). In some embodiments, a CU can catalyze processing of signals. Such a catalyst can be an enzyme that can nullify SOs by affecting their degradation or an enzyme that can change the specificity of the SO towards SBEs. For example, an inactive SO can be split into one or more active SOs recognized by specific SBEs. For example, in the case of peptide signals or their derivatives, the enzyme can be a protease that can specifically or nonspecifically recognize and cleave the signal amino acid sequence. In some embodiments, purified proteases can be added to the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium) in an active or inactive form that can be cis or trans activated. A wide variety of commercially available proteases can be used for this purpose. In some embodiments, heterologous, native, or engineered proteases can also be secreted by the CUs.
[0299] In some embodiments, CUs can affect signals directly by secreting SOs into the medium. In some embodiments, SOs and cellular CUs can be secreted and linked to the CU surface. In some embodiments, SOs and cellular CUs can be secreted and released into the medium. Such secretion can be constant in time, induced by recognized signals, or induced following an internal state change. In some embodiments, intracellular SOs can be loaded in the CU any time prior to use (e.g., by electroporation or other disclosed methods for peptide transfection); however, in most cases, intracellular SOs can be synthesized by the CUs using available or engineered metabolic processes. In some embodiments, SOs can be produced from available metabolites by appropriate synthesizing enzymes. In some embodiments, polypeptide SOs can be produced by either heterologous or homologous gene expression, where the expression itself can be either constant in time or time-dependent with changes occurring either upon signal induction or internal state change. In some embodiments, SOs or potential CUs can be stored by the current CU for a period of time prior to secretion or secreted directly. Storage of SOs and their conditional secretion can be accomplished by synthesis of object precursors and rapid secretion following final processing (e.g., addition of functional groups or release by cleavage of pre-domains), where the processing itself can be induced through regulated gene expression or post-translational activation of catalyzing agents.
[0300] In some embodiments, biogeneration of SOs can require heterologous metabolic mechanisms. In such cases, the secreted SOs can be recognized by entities in the input sample, rather than being recognized by other CUs. In such cases, the CU can be metabolically engineered to produce and regulate the various metabolic factors that can be necessary to produce the SOs from the native metabolic precursors.
[0301] In some embodiments, CUs can affect signals indirectly by producing CUs of the same or different type. In some embodiments, the capability of CUs to produce or degrade SOs can be enhanced or attenuated by binding of the SBE to its cognate binding partner. In some embodiments, the cognate binding partner can be a SBE associated with at least one CU. In some embodiments, the cognate binding partner can be a SBE associated with at least one sample object. In some embodiments, the cognate binding partner can be another SO.
[0302] In some embodiments, a sample object can be associated with an SBE and can produce or degrade an SO. In such embodiments, the capability of the sample object to produce the SO can be enhanced or attenuated by binding of the SBE to its cognate binding partner. In some embodiments, the cognate binding partner can be a SBE associated with at least one CU. In some embodiments, the cognate binding partner can be a SBE associated with at least one sample object. In some embodiments, the cognate binding partner can be another SO.
Signal Recognition
[0303] In some embodiments, a CU can be affected by internal or external SOs through interaction with a SBE. In some embodiments, a SBE can be specific for a single signal or multi-specific for a subset of signals with variable sensitivity for each member of the subset. In some embodiments, a SBE can recognize a signal through physical interactions, e.g., hydrogen bonds, van der Waals forces, hydrophobic interactions, and/or ionic interactions. In some embodiments, a SBE can display some change in activity once an interaction is initiated. In some embodiments, signal interactions can activate or inhibit, e.g., an enzymatic process of a molecular CU with a SBE moiety and an enzymatic function. In some embodiments, a CU can be a protease with a SBE located near its active site, where signal interactions can lead to protease inhibition. In some embodiments, signal interactions with a SBE do not necessarily prevent processing of the SO. For example, further processing of the signal interaction by another protease can reverse an inhibitory effect of the original signal.
[0304] In some embodiments, a SBE can be located on the surface of a cellular CU with a signaling moiety located on the cytoplasmic side of the cellular membrane. In some embodiments, a SBE can be a surface receptor or a transmembrane receptor protein. In some embodiments, a SBE can be a cytoplasmic protein with a binding moiety. In some embodiments, a SBE can be a transcription factor with a sensory domain that can recognize membrane permeable SOs.
[0305] In some embodiments, a CU can be configured to respond to an SO in the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium). In some embodiments, the CU can be configured to respond to an SO in the medium by enhancing the level of the SO in the medium. In some embodiments, the SO can be an internal SO that can be recognized by at least one CU. In some embodiments, the SO can be an internal SO that can be recognized by at least one sample object. In some embodiments, the SO can be an output SO, and the output signal can comprise the SO level. In some embodiments, a CU can produce an SO, and interaction of the SO with the CU can enhance the capability of the CU to produce the SO. In some embodiments, the CU can be configured to respond to the SO by producing another SO, wherein interaction of the other SO with another CU can enhance the capability of the other CU to produce the SO.
Computational Clusters
[0306] According to any of the methods provided herein, the CUs can form one or more clusters, also referred to herein as a computational cluster, in a medium (i.e., Reaction Reagent + Reaction Medium) comprising the presently described system, upon coming in contact with at least one sample object derived from an input sample. In some embodiments, the computational cluster can further comprise at least one sample object derived from the input sample. In some embodiments, the computational cluster can comprise one or more CUs, and optionally one or more sample objects.
[0307] In some embodiments, the computational clusters can be formed after a sufficient amount of time in an incubator. In some embodiments, the sufficient amount of time can be about 2 hours, about 2.1 hours, about 2.2 hours, about 2.3 hours, about 2.4 hours, about 2.5 hours, about 2.6 hours, about 2.7 hours, about 2.8 hours, about 2.9 hours, about 3 hours, about 3.1 hours, about 3.2 hours, about 3.3 hours, about 3.4 hours, about 3.5 hours, about 3.6 hours, about 3.7 hours, about 3.8 hours, about 3.9 hours, about 4 hours, about 4.1 hours, about 4.2 hours, about 4.3 hours, about 4.4 hours, about 4.5 hours, about 4.6 hours, about 4.7 hours, about 4.8 hours, about 4.9 hours, about 5 hours, about 5.1 hours, about 5.2 hours, about 5.3 hours, about 5.4 hours, about 5.5 hours, about 5.6 hours, about 5.7 hours, about 5.8 hours, about 5.9 hours, or about 6 hours. In some embodiments, the sufficient amount of time can be about 3 hours. In some embodiments, the sufficient amount of time can be about 3.5 hours. In some embodiments, the sufficient amount of time can be about 4 hours. In some embodiments, the sufficient amount of time can be about 4.5 hours. In some embodiments, the sufficient amount of time can be about 5 hours.
[0308] In some embodiments, the computational cluster can be formed when the CUs can associate with sample objects by diffusion of the CUs and the sample object in a medium comprising the presently described system. In some embodiments, the computational cluster can be formed when a force can be applied to a medium comprising the presently described system, such that the CUs and the sample objects can be placed in close proximity. In some embodiments, the force can be a centrifugal force. In some embodiments, the centrifugal force can be continuous. In some embodiments, the continuous centrifugal force can be applied by twisting the reaction tube around its z-axis between spin-downs, wherein such twisting of the reaction tubes can create a sliding effect along the reaction tube wall, promoting better interactions between Sample Objects and CUs. In some embodiments, the force can be a magnetic force. In some embodiments, the force can be an electrostatic force. In some embodiments, the computational cluster can be formed when the CU can be configured to respond to an SO present in a boundary layer around the cluster by producing another SO that can be not restricted to the boundary layer.
[0309] In some embodiments, the computational cluster can comprise a first CU, and the presently described system can further comprise a second CU that can be not localized to the cluster, wherein the second CU can be configured to degrade an SO produced by the first CU.
Logical Operator Modules
[0310] The CUs of the present disclosure can enable a rich set of behaviors. Standard mathematical notation can be introduced to precisely describe certain orchestrated functionalities of CU compositions. Functions, which can be made up in part with arithmetic operators and logical operators, can be encoded and evaluated by the CUs and signals that interact and self-organize within the medium of the present disclosure (i.e., Reaction Reagent + Reaction Medium). The input that drives the computation can be the sample objects derived from an input sample.
[0311] From the perspective of computing theory, the computation of the present disclosure can implement a rich set of operations that can include all forms of combinational logic and sequential logic. In other words, the fundamental configurations provided below can be the building blocks that can form a functionally complete system of Boolean operations. Moreover, the internal states of the CUs jointly make up the composite system state that can condition future Boolean operations for sequential logic functions.
[0312] The presently described systems and the methods of using the systems utilize Boolean logical operations. The logical operator modules of the present disclosure can act as logical circuits that can perform logical operations. A non-limiting exemplary logical operator module can receive one or more input signals (e.g., sample objects) and can generate one or more output signals (e.g., output SOs). In some embodiments, a logical operator module can comprise one or more CUs. In some embodiments, a logical operator module can comprise at least one sample object and at least one CU. In some embodiments, a logical operator module can comprise at least one sample object and two or more CUs. In some embodiments, a logical operator module can generate one or more output signals. In some embodiments, the methods of the present disclosure can comprise at least one logical operator module. In some embodiments, a logical operator module can comprise a YES gate, an AND gate, a NAND gate, an OR gate, a NOR gate, a XOR gate, a XNOR gate, a NOT gate, or any combination thereof.
[0313] In some embodiments, a logical operator module can operate as a YES gate, wherein the YES gate can comprise generating one or more output signals only when a first CU can interact with at least one sample object.
[0314] In some embodiments, a logical operator module can operate as an AND gate, wherein the AND gate can comprise generating one or more output signals only when both a first CU and a second CU can interact with at least one sample object.
[0315] In some embodiments, a logical operator module can operate as a NAND gate, wherein the NAND gate can comprise suppressing or diminishing one or more output signals when both a first CU and a second CU can interact with at least one sample object.
[0316] In some embodiments, a logical operator module can operate as an OR gate, wherein the OR gate can comprise generating one or more output signals when either a first CU or a second CU or when both the first CU and the second CU can interact with at least one sample object.
[0317] In some embodiments, a logical operator module can operate as a NOR gate, wherein the NOR gate can comprise generating one or more output signals when neither a first CU nor a second CU interacts with at least one sample object.
[0318] In some embodiments, a logical operator module can operate as a XOR gate, wherein the XOR gate can comprise generating one or more output signals when either a first CU or a second CU but not both CUs can interact with at least one sample object.
[0319] In some embodiments, a logical operator module can operate as a XNOR gate, wherein the XNOR gate can comprise generating one or more output signals when either both a first CU and a second CU can interact with at least one sample object or neither the first CU nor the second CU interacts with at least one sample object.
[0320] In some embodiments, a logical operator module can operate as a NOT gate, wherein the NOT gate can comprise suppressing or diminishing one or more output signals when a first CU can interact with at least one sample object.
[0321] Negation can be an essential aspect of any general Boolean network since it can allow for recognition of absent SBEs. Negated implication can be a negation of all outputs so that an output can be true only if first input can be true and second input can be false. To compute a negated implication, the configurations can be further extended with signal degrading objects. In some embodiments, signal degradation can be achieved by target associated CUs. This can comprise secretion of signal degrading enzymes or display of signal degrading enzymes by target associated CUs.
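The gate behaviors in paragraphs [0313] to [0321] can be summarized as Boolean truth tables. The sketch below is an abstraction only: each input is assumed to be `True` when the corresponding CU interacts with at least one sample object, the output is `True` when an output signal is generated, YES and NOT are treated as conventional single-input gates, and the names `TWO_INPUT_GATES`, `ONE_INPUT_GATES`, and `truth_table` are hypothetical. It does not model the underlying biochemistry.

```python
from itertools import product

# a, b: True when the first / second CU interacts with at least one sample object.
TWO_INPUT_GATES = {
    "AND":    lambda a, b: a and b,
    "NAND":   lambda a, b: not (a and b),
    "OR":     lambda a, b: a or b,
    "NOR":    lambda a, b: not (a or b),
    "XOR":    lambda a, b: a != b,
    "XNOR":   lambda a, b: a == b,
    "NIMPLY": lambda a, b: a and not b,   # negated implication: a true, b false
}

ONE_INPUT_GATES = {
    "YES": lambda a: a,        # assumed single-input buffer
    "NOT": lambda a: not a,
}


def truth_table(gate, arity):
    """Map every input combination to the gate's Boolean output."""
    return {inputs: bool(gate(*inputs))
            for inputs in product((False, True), repeat=arity)}


for name, gate in TWO_INPUT_GATES.items():
    print(name, truth_table(gate, 2))
for name, gate in ONE_INPUT_GATES.items():
    print(name, truth_table(gate, 1))
```

Because NAND (or NOR) alone is functionally complete, any of the other tables can in principle be composed from it, which is one way to read the statement that these configurations form a functionally complete system of Boolean operations.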
[0322] More detailed information on configurations and operations of the computation comprising the presently described entities is described in International Patent Application No. PCT/US2019/50068 and U.S. Patent Publication No. US 2021-0319279, each of which is incorporated by reference in its entirety for all purposes.
MACHINE LEARNING METHODS OF THE DISCLOSURE
[0323] FIG. 6 is a non-limiting model architecture schematic for the design and training of transformer models for omics analysis, according to one or more embodiments herein. As shown in FIG. 6, a transformer-based model can be selected and trained for spatial profiling of single-cell transcriptomes. In some embodiments, other models can be used, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or graph neural networks (GNNs), depending on the specific requirements and characteristics of the data. In some embodiments, the training data construction process for the transformer-based model can involve several steps. In some embodiments, the GSE158055 dataset can be utilized to generate the training dataset. As shown in Example 5, the GSE158055 dataset, comprising Unique Molecular Identifier (UMI) vectors from blood samples of COVID-19 patients, can be split into training, development, and test sets, for example in an 80/10/10 ratio. However, different split ratios can be used to optimize model performance. In some embodiments, each training input can be generated from one scRNA-seq sample to focus on biological differences. Generating training inputs from individual scRNA-seq samples can ensure that the model learns from the inherent biological variability present within each sample rather than confounding batch effects that can arise if multiple samples were mixed. This approach can allow the model to identify meaningful patterns and relationships between cells based on their gene expression profiles. Batches of training inputs can be created, each including UMI vectors of K cells, assignments of N computing units (CUs) to these cells, and UMI vectors generated for each CU. The assignment of CUs to cells can be represented by an adjacency matrix, ensuring random, independent, and equal probability assignment of UMIs to CUs. 
In some embodiments, different methods of CU assignment and sampling can be employed to explore alternative training dynamics. Validation and test inputs can be similarly constructed to maintain consistency.
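As a minimal sketch of the 80/10/10 split mentioned in paragraph [0323], the following shuffles item indices and cuts them into training, development, and test sets. The function name `split_indices`, the fixed seed, and the choice to split over a flat index range are assumptions for illustration; the source does not state whether the split is made over cells or over scRNA-seq samples.

```python
import numpy as np


def split_indices(n_items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly partition range(n_items) into train/dev/test index arrays.

    `fractions` are the train and dev shares; the test set takes the remainder.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(round(fractions[0] * n_items))
    n_dev = int(round(fractions[1] * n_items))
    return (order[:n_train],
            order[n_train:n_train + n_dev],
            order[n_train + n_dev:])


train_idx, dev_idx, test_idx = split_indices(1000)
print(len(train_idx), len(dev_idx), len(test_idx))
```

Splitting at the level of whole scRNA-seq samples rather than individual cells would keep cells from one sample out of more than one set, which fits the stated aim of generating each training input from a single sample.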
Training Data Construction
[0324] In some embodiments, the construction of training data can involve generating training inputs that can include UMI matrices and can be associated with adjacency matrices to represent the assignment of computing units (CUs) to cells. For example, each training input can include a UMI matrix representing UMI counts for the number of interactions between a predetermined number of computing units (CUs) and a plurality of biological molecules. The training input can be associated with an adjacency matrix that represents known proximal relationships among CUs. The UMI matrices representing UMI counts for CUs can be used as inputs. In some embodiments, the adjacency matrix can indicate spatial relationships between CUs, which can then be used to derive spatial relationship between sample objects (e.g., cells). The use of binary values in the adjacency matrix can ensure clarity in defining these interactions. For example, the adjacency matrix can be constructed to maintain consistency across training, validation, and test inputs. This consistency can be crucial for ensuring that the model training, validation, and testing processes can be comparable and reliable. The random and independent assignment of UMIs to CUs can be employed to prevent bias and ensure that the training inputs are representative of the underlying biological data. In some embodiments, different methods of CU assignment and sampling can be explored to investigate alternative training dynamics. For example, weighted random assignment based on cell size or type, or stratified sampling to ensure diverse representation of cell types and states, can be utilized. These alternative methods can provide insights into the impact of different training dynamics on the model’s performance. Furthermore, validation and test inputs can be constructed similarly to the training inputs to maintain consistency. 
This can include using the same pre-processing steps for the raw data, ensuring random and independent assignment of UMIs to CUs, and maintaining the same structure for UMI matrices and the associated adjacency matrices. This consistency can be essential for accurately evaluating the model’s performance and ensuring its generalizability. In some embodiments, post-processing and quality control measures can be applied to validate the quality of the constructed training inputs. For example, checking the adjacency matrices for correctness in representing cell interactions, verifying the randomness and independence of UMI assignments, and ensuring the training inputs accurately reflect the biological and spatial characteristics of the dataset can be part of these measures. Additionally, in some embodiments, the constructed training, validation, and test datasets can be maintained in a structured format for easy access and reproducibility.
[0325] In some embodiments, for all training inputs, the numbers K and N can be predetermined. The K represents the number of sample objects, and the N represents the number of computing units (CUs). The assignment of computing units to sample objects can be represented by the adjacency matrix $Y \in \mathbb{R}^{N \times N}$, where $Y_{i,j} = 1$ if computing unit $i$ and computing unit $j$ can be assigned to the same sample object and $Y_{i,j} = 0$ otherwise. This assignment can be calculated by selecting, with equal probability, exactly one sample object for each computing unit. For all $i = 1, \ldots, N$, the UMI vector $X_i \in \mathbb{R}^{G}$, where $X_{i,j}$ can be equal to the number of UMIs corresponding to gene $j$ attributed to computing unit $i$, can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned. This can be done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that:
(a) each UMI can be attributed to a single computing unit,
(b) the attribution of any one UMI is independent of all other UMIs, and
(c) each UMI can have equal probability of being attributed to each relevant computing unit.
[0326] For simplicity, the UMI vectors of sample objects and computing units can be also expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively.
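The construction in paragraphs [0325] and [0326] can be sketched with NumPy as follows. This is an illustrative reading under stated assumptions, not the disclosed implementation: the function name `make_training_input` and the `owner` array are hypothetical, the diagonal of Y is set to 1 (a CU trivially shares a sample object with itself), and the UMIs of a sample object that receives no CU are simply left unattributed. Drawing each gene's counts from a multinomial with equal probabilities is equivalent to attributing every UMI independently and uniformly to one of the relevant CUs, which matches conditions (a) to (c).

```python
import numpy as np


def make_training_input(T, N, seed=0):
    """Build CU-level UMI matrix X (N x G) and adjacency matrix Y (N x N).

    T : (K, G) integer array of UMI counts per sample object and gene.
    N : number of computing units (CUs).
    """
    rng = np.random.default_rng(seed)
    K, G = T.shape

    # Select exactly one sample object per CU, with equal probability.
    owner = rng.integers(0, K, size=N)

    # Y[i, j] = 1 if CU i and CU j share a sample object, else 0.
    Y = (owner[:, None] == owner[None, :]).astype(int)

    X = np.zeros((N, G), dtype=np.int64)
    for k in range(K):
        cus = np.flatnonzero(owner == k)
        if cus.size == 0:
            continue  # assumption: UMIs of an unoccupied sample object stay unobserved
        p = np.full(cus.size, 1.0 / cus.size)
        for g in range(G):
            # Each UMI goes to one CU, independently, with equal probability.
            X[cus, g] = rng.multinomial(T[k, g], p)
    return X, Y, owner


T_demo = np.array([[5, 0, 3], [2, 7, 1], [0, 4, 4]])
X_demo, Y_demo, owner_demo = make_training_input(T_demo, N=6, seed=1)
print(X_demo.shape, Y_demo.shape)
```

Summing the rows of `X_demo` that share an `owner_demo` value recovers the corresponding row of `T_demo`, which is a convenient sanity check on the partition.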
[0327] In some embodiments, as described herein elsewhere, a training input can include three elements: (a) UMI vectors of K cells (sample objects) randomly selected from the training dataset, (b) the assignment of N computing units to K sample objects, and (c) UMI vectors generated for each computing unit. In some embodiments, in cell-cell interactions, the assignment process of computing units to sample objects can be different than described above, which can be no longer surjective. For all training inputs, the numbers K and N can be predetermined. Before assigning computing units to sample objects, the sample objects can be randomly connected with each other, described by an adjacency matrix representing these connections. The assignment of computing units to sample objects can be done in two steps: first, selecting one sample object for each computing unit with equal probability; and second, partitioning the computing units assigned to the same sample object into multiple sets, which can also be assigned to neighboring sample objects. The UMI vector for each computing unit can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned, ensuring each UMI can be attributed to a single computing unit independently and with equal probability. For simplicity, the UMI vectors of sample objects and computing units can be also expressed as matrices. Similarly, validation and test inputs can be constructed from their respective datasets to maintain consistency and reliability in evaluating the model’s performance. [0328] In some embodiments, for all training inputs, the numbers K and N can be fixed. Prior to the assignment of CUs to sample objects, the sample objects can be randomly connected with each other. This can be described by the adjacency matrix W E RKxK, where the off-diagonal elements W( = 1, i > j, with probability a > 0. 
The assignment of computing units to sample objects can be described by the adjacency matrix $Y \in \mathbb{R}^{N \times N}$, where $Y_{i,j} = 1$ if computing unit $i$ and computing unit $j$ can be assigned to the same sample object and $Y_{i,j} = 0$ otherwise. The assignment can be calculated in two steps. First, with equal probability, exactly 1 sample object can be selected for each computing unit. Second, CUs assigned to the same sample object can be partitioned randomly into $b + 1$ sets, where $b$ can be the number of neighboring sample objects, and the CUs in the $i$th set, $i = 1, \ldots, b+1$, can also be assigned to the $i$th neighbor. For all $i = 1, \ldots, N$, the UMI vector $X_i \in \mathbb{R}^{G}$, where $X_{i,j}$ can be equal to the number of UMIs corresponding to gene $j$ attributed to computing unit $i$, can be constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned. This can be done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that (a) each UMI can be attributed to a single computing unit, (b) the attribution of any one UMI can be independent of all other UMIs, and (c) the probability of a UMI being attributed to a CU can be equal to , where c can be the number of sample objects to which the CU can be assigned and a can be the normalization factor. For simplicity, the UMI vectors of sample objects and CUs can also be expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively.
Model Architecture
[0329] In some embodiments, the model architecture for subcellular resolution by associative lattice (Sureal) can include multiple components designed to process and analyze the UMI matrices and adjacency matrices associated with the training inputs. The architecture can utilize transformer blocks, sigmoid activation functions, and binary cross-entropy loss functions to achieve high accuracy in recognizing biological molecule patterns and spatial relationships.
Input Layer
[0330] The input layer can accept UMI matrices X and optionally estimated adjacency matrices Y as inputs. In some embodiments, the adjacency matrix estimate can be omitted. The UMI matrix $X \in \mathbb{R}^{N \times G}$ can represent the UMI counts for the interactions between computing units (CUs) and biological molecules, while the adjacency matrix Y can capture the proximal relationships among CUs.
Embedding Layer
[0331] In some embodiments, the model can include an embedding layer to transform the high-dimensional UMI vectors into lower-dimensional representations. This step can involve the use of linear transformations or other dimensionality reduction techniques.
Transformer Blocks
[0332] The core of the model architecture can consist of multiple transformer blocks. Each transformer block can include self-attention mechanisms and feed-forward neural networks. The self-attention mechanisms can allow the model to weigh the importance of each element in the input sequence, capturing the complex dependencies between CUs and their associated biological molecules. For example, each transformer block can compute attention scores based on the UMI matrix and the adjacency matrix, facilitating the recognition of spatial relationships and molecular interactions.
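The self-attention step described above can be sketched in a few lines. This is a minimal single-head, scaled dot-product attention over N computing-unit embeddings, written with the standard library only. The function names, the projection matrices `Wq`, `Wk`, `Wv`, and in particular the way an estimated adjacency matrix enters the scores (as an additive bias scaled by `bias`) are assumptions for illustration; the disclosure does not specify how the adjacency matrix is combined with the attention scores.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(H, Wq, Wk, Wv, Y=None, bias=1.0):
    """Single-head scaled dot-product attention over N CU embeddings H (N x d)."""
    Q, K, V = matmul(H, Wq), matmul(H, Wk), matmul(H, Wv)
    scale = math.sqrt(len(Q[0]))
    scores = [[sum(q * k for q, k in zip(qi, kj)) / scale for kj in K] for qi in Q]
    if Y is not None:
        # Assumption: an estimated adjacency matrix enters as an additive bias.
        scores = [[s + bias * y for s, y in zip(srow, yrow)]
                  for srow, yrow in zip(scores, Y)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights
```

A full transformer block would additionally apply a feed-forward network, residual connections, and normalization, which are omitted here for brevity.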
Activation Functions
[0333] The model can utilize sigmoid activation functions to introduce non-linearity into the network. The sigmoid function can map the input values to the range (0, 1), making it suitable for binary classification tasks, such as determining the presence or absence of specific molecular interactions.
Output Layer
[0334] The output layer can produce the final predictions based on the processed input data. In some embodiments, the output can be a binary classification indicating whether a specific interaction or spatial relationship exists. For example, the output layer can use a sigmoid activation function to generate probabilities, which can then be thresholded to produce binary outcomes.
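The sigmoid-then-threshold step can be illustrated as follows. The sketch assumes that the network produces an N x N matrix of pairwise scores (logits), one per pair of computing units, and that a threshold of 0.5 is used; both the function name `predict_pairs` and the default threshold are illustrative choices rather than values stated in the disclosure.

```python
import math

def sigmoid(z):
    """Map a real-valued score to (0, 1) without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_pairs(logits, threshold=0.5):
    """Turn pairwise logits into probabilities and thresholded binary labels."""
    probs = [[sigmoid(z) for z in row] for row in logits]
    labels = [[int(p >= threshold) for p in row] for row in probs]
    return probs, labels
```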
Loss Function
[0335] The model can be trained using a binary cross-entropy loss function. This loss function can measure the difference between the predicted probabilities and the actual binary labels, guiding the optimization process to minimize prediction errors.
[0336] In some embodiments, the loss function can be defined as:
[Equation imgf000105_0001: definition of the loss function]
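For reference, binary cross-entropy over pairs of computing units is commonly written in the form below. This is the textbook expression, offered under the assumption that the labels are the entries $Y_{i,j}$ of the CU adjacency matrix and that the predictions $\hat{Y}_{i,j}$ are sigmoid outputs; the normalization and index set used in the disclosure may differ.

```latex
\mathcal{L}(Y, \hat{Y}) = -\frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N}
  \left[ Y_{i,j} \log \hat{Y}_{i,j}
       + \left(1 - Y_{i,j}\right) \log\left(1 - \hat{Y}_{i,j}\right) \right]
```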
Post-Processing
[0337] In some embodiments, post-processing steps can be applied to the output of the machine learning models to improve the accuracy and interpretability of the results. These steps can validate the model’s predictions and make the findings accessible and useful for further biological analysis. The model output can be further processed to obtain estimates of sample object UMI vectors. The output matrix can be used to partition the set of computing units into a number of disjoint sets. This partition can be represented by the estimated adjacency matrix. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster. Following clustering, the UMI vectors of computing units in each cluster can be summed into single vectors. A relative reconstruction error (RRE) can be calculated as a measure of the reconstruction quality. Post-processing steps can include verification of the accuracy of the model’s predictions. This can involve comparing the predicted interactions with known biological data or experimental results to assess the model’s performance. In some embodiments, statistical measures such as precision and recall can be used to evaluate the accuracy of the predictions. These metrics can help in identifying any discrepancies and refining the model further. In some embodiments, post-processing can involve consistency checks to verify that the model’s predictions can be reproducible and stable across different datasets or experimental conditions. Post-processing can culminate in the generation of comprehensive reports that document the findings. These reports can include detailed descriptions of the predicted interactions and clusters. 
In some embodiments, the reports can also include annotations and interpretations of the findings, providing context and insights into the biological significance of the results. The processed data can be stored in a structured format to ensure easy access and reproducibility. This can include maintaining records of the post-processing steps, parameters used, and the resulting outputs. In some embodiments, the data can be made accessible through online platforms or databases, facilitating further analysis and collaboration among researchers. By employing these post-processing techniques, the framework can enhance the accuracy and interpretability of subcellular interactions and spatial relationships, providing valuable insights into the underlying biological processes. The purpose of the clustering can be to obtain more relevant biological information regarding the sample objects. For example, relative gene expression of two or more genes can change when UMIs from computing units belonging to a single cluster can be considered in aggregate. In some embodiments, other ways of interpreting the data can include predicting the number of cells, as the number of cells can also be an output of the model. Training the model can provide both regression for the number of clusters and predictions for the number of clusters. Therefore, the output can include the model output and the number of cells. This post-processing approach is illustrated in Example 6.
[0338] In some embodiments, the model output $Y$ can be further processed to obtain estimates of sample object UMI vectors $\hat{T} \in \mathbb{R}^{K \times G}$. First, the output matrix $Y$ can be used to partition the set of computing units into K number of disjoint sets. This partition can be represented by the estimated adjacency matrix $\hat{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can either be known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster. Following clustering, for all $i = 1, \ldots, K$, the UMI vectors of computing units in cluster $i$ can be summed into a single vector $\hat{T}_i$. The relative reconstruction error (RRE) can be calculated as a measure of the reconstruction quality.
[Equation imgf000107_0001: definition of the relative reconstruction error (RRE)]
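The clustering, summation, and error steps can be sketched as follows. The example implements agglomerative clustering with complete linkage (referred to as full linkage in the text) directly, so that it runs without external libraries. Two points are assumptions rather than details from the disclosure: the distance between computing units is taken as one minus the predicted pairwise score, and the RRE is taken as the L1 distance between the true and reconstructed UMI vectors divided by the true total UMI count. All function names are hypothetical.

```python
def complete_linkage(D, n_clusters):
    """Agglomerative clustering with complete (full) linkage on a distance matrix D."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair of clusters
    return clusters

def reconstruct(X, Y_pred, n_clusters):
    """Sum CU UMI vectors within each cluster to estimate sample-object vectors."""
    D = [[1.0 - p for p in row] for row in Y_pred]  # assumption: distance = 1 - score
    clusters = complete_linkage(D, n_clusters)
    T_hat = [[sum(X[i][g] for i in c) for g in range(len(X[0]))] for c in clusters]
    return clusters, T_hat

def relative_reconstruction_error(t_true, t_hat):
    """Assumed RRE: L1 distance between vectors divided by the true total UMI count."""
    return sum(abs(a - b) for a, b in zip(t_true, t_hat)) / sum(t_true)
```

The naive search over cluster pairs is cubic in the number of computing units, which is acceptable for a small illustration but not for large inputs. The error function also assumes the true vector contains at least one UMI.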
[0339] In some embodiments, the model output can be further processed to estimate the relationships between sample objects. Initially, the set of computing units (CUs) can be partitioned into a predetermined number of disjoint sets using clustering techniques. This partitioning can be represented by an estimated adjacency matrix. The clustering can be performed by agglomerative clustering, where the number of clusters can be determined by a predefined criterion such as the number of computing units per cluster. A heuristic approach can then be used to compute the estimated sample object adjacency matrix, which can indicate the relationships between these disjoint sets. The adjacency matrix can represent the likelihood that sets of computing units are assigned to the same sample object based on their relationships. To evaluate the accuracy of these predictions, a ROC curve can be generated, indicating the model’s performance in predicting connections between sample objects. The optimal threshold for maximizing the accuracy of these predictions can be identified, enhancing the model’s ability to accurately reflect the underlying biological interactions.
[0340] A model output $Y$ can be further processed to obtain the estimated Sample Object adjacency matrix. First, the output matrix $Y$ can be used to partition the set of computing units into K number of disjoint sets. This partition can be represented by the estimated adjacency matrix $\hat{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster. A simple heuristic can then be used to compute the estimated Sample Object adjacency matrix $\hat{W} \in \mathbb{R}^{K \times K}$. For the $i$th and $j$th sets of the K number of disjoint sets, $\hat{W}_{i,j} = 1$
[Equation imgf000107_0002]
where the summation can be over all Computing Units $l_i$ in the $i$th set and Computing Units $m_j$ in the $j$th set, and $\beta > 0$. The ROC curve of the prediction of edges between sample objects can then be analyzed, with a mark showing the optimal value of $\beta$ maximizing the accuracy of predicting whether a given pair of sample objects are connected or not.
As described in Example 6, a model output Y can be further processed to obtain the estimated sample object adjacency matrix. First, the output matrix Y can be used to partition the set of computing units into K number of disjoint sets. This partition can be represented by the estimated adjacency matrix. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster. A simple heuristic can then be used to compute the estimated sample object adjacency matrix. The ROC curve of the prediction of edges between sample objects can then be analyzed, with a mark showing the optimal threshold value that maximizes the accuracy of predicting whether a given pair of sample objects are connected or not.
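One way such a heuristic and threshold search could look is sketched below. Because the exact condition is given only as an equation image in the source, the form used here is an assumption: two clusters are connected when the mean predicted score over all cross-cluster pairs of computing units exceeds a threshold `beta`. The function names and the accuracy-maximizing sweep over candidate thresholds are likewise illustrative.

```python
def sample_object_adjacency(Y_pred, clusters, beta):
    """Estimate W_hat: connect clusters whose mean cross-cluster score exceeds beta."""
    K = len(clusters)
    W_hat = [[0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1, K):
            total = sum(Y_pred[l][m] for l in clusters[i] for m in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            if mean > beta:  # assumed form of the thresholding condition
                W_hat[i][j] = W_hat[j][i] = 1
    return W_hat

def best_threshold(Y_pred, clusters, W_true, candidates):
    """Pick the candidate beta that maximizes edge-prediction accuracy."""
    K = len(clusters)
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]

    def accuracy(beta):
        W_hat = sample_object_adjacency(Y_pred, clusters, beta)
        return sum(W_hat[i][j] == W_true[i][j] for i, j in pairs) / len(pairs)

    return max(candidates, key=accuracy)
```

Sweeping the threshold and recording true-positive and false-positive rates at each value would yield the points of the ROC curve mentioned in the text.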
Model Performance Evaluation and Validation
[0341] In some embodiments, data validation steps can be incorporated to ensure the accuracy and reliability of the machine learning model’s performance. Data validation can involve checking the integrity, consistency, and accuracy of the data used for training, validation, and testing the model. The distribution of UMI counts per sample object across the validation dataset can be examined, with a median UMI count per sample object and a total number of cells. This distribution can provide insight into the overall molecular content captured within the dataset, which is essential for assessing the robustness and representativeness of the model.
Additionally, the distribution of UMIs allocated to each computing unit across the validation inputs can be analyzed. The significantly lower number of UMIs per computing unit (with 95% of computing units having less than 240 UMIs) compared to the number of UMIs per sample object can indicate the low informational content and sparsity of the computing unit UMI vectors. This sparsity can impact the model’s ability to accurately capture and reconstruct the underlying biological information. The distribution of the relative reconstruction error (RRE) per sample object across the validation inputs can also be evaluated. The distribution can be bimodal, with a percentage of the sample objects being reconstructed perfectly (RRE = 0) and the remaining sample objects reconstructed with a median RRE. Sample objects exhibiting RRE greater than 50% can be considered failed reconstructions for practical purposes. This metric can be vital for assessing the reconstruction quality and identifying potential issues in the model’s performance. Example 5 provides a visual representation of the data validation results, illustrating the distribution of UMI counts, UMIs per computing unit, and relative reconstruction error (RRE) across the validation dataset. By incorporating these data validation steps and visualizing the results as shown in FIG. 7, the framework can ensure that the model’s predictions are accurate, reliable, and provide valuable insights into the underlying biological processes. The detailed examination of UMI distributions and reconstruction errors can help in fine-tuning the model and addressing any potential issues in the data or the modeling process.
Cell Type Confusion Matrix
[0342] FIG. 9 provides an exemplary visualization of the cell type confusion matrix associated with the input data from, for example, the dataset GSE158055, which includes a broad set of cell annotations, allowing for a comprehensive analysis of cell type classification accuracy. The confusion matrix can indicate the probability that a computing unit from a sample object on the y-axis is estimated to be from a sample object on the x-axis. Diagonal components of the matrix can list the probabilities that the origin of computing units is correctly estimated within the given cell type. High values along the diagonal indicate a high accuracy of correctly identifying the cell type of the computing units, reflecting the model’s ability to accurately classify cell types based on the input data. Non-zero, off-diagonal components of the confusion matrix illustrate mismatches between cell types. For example, similarities between certain cell types, such as NK cells and CD8 T cells, can lead to misclassifications, due to errors in the estimated CU adjacency matrix Y, which can be represented by the off-diagonal values. These mismatches highlight areas where the model may need improvement in distinguishing between similar cell types. By analyzing the confusion matrix, researchers can identify specific cell types that are often confused with each other and focus on refining the model to improve classification accuracy. The matrix can provide a clear and quantifiable measure of the model’s performance in classifying cell types, which is crucial for validating the accuracy and reliability of the model’s predictions.
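A row-normalized confusion matrix of this kind can be computed as in the following sketch. It assumes that each computing unit has a true cell type (that of its sample object of origin) and an estimated cell type (that of the sample object it was assigned to after clustering); how the estimated type is obtained, and the function name `confusion_matrix`, are assumptions for this example.

```python
from collections import Counter

def confusion_matrix(true_types, estimated_types):
    """Return row-normalized probabilities P(estimated type | true type) per CU."""
    labels = sorted(set(true_types) | set(estimated_types))
    counts = Counter(zip(true_types, estimated_types))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, e)] for e in labels)
        # Each row (true type, y-axis) sums to 1 when that type occurs at all.
        matrix[t] = {e: (counts[(t, e)] / row_total if row_total else 0.0)
                     for e in labels}
    return matrix
```

Diagonal entries `matrix[t][t]` then give the fraction of computing units of type `t` whose origin was estimated within the same cell type.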
[0343] FIG. 13 is another exemplary confusion matrix, listing statistics of computing unit associations for samples comprising B lymphocytes. The results shown in Example 7 illustrate that computing units in a highly connected cluster can be likely associated with the same cell, validating the model’s accuracy in predicting cell-cell interactions. They also demonstrate how the model can use mRNA transcript data to associate computing units with their respective sample objects, thereby providing insights into cell-cell interactions and improving the estimation of mRNA transcripts for individual cells in the sample.
SYSTEMS OF THE DISCLOSURE
[0344] Provided herein is a system for biological computing, including: (a) a plurality of sample objects, wherein a sample object of the plurality of sample objects can include a plurality of biological molecules; and (b) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object. In some embodiments, a first CU of the plurality of CUs can interact with a first sample object of the plurality of sample objects a first number of times, the first CU can interact with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs can interact with the first sample object a third number of times, and the second CU can interact with the second sample object a fourth number of times. In some embodiments, the first number and the third number can be indicative of a characteristic of the first sample object, and the second number and the fourth number can be indicative of a characteristic of the second sample object.
[0345] In some embodiments, the CU of the plurality of CUs can further display a second set of at least one SBE. In some embodiments, the system can further include (c) a plurality of molecular tags, which is described in detail above. In some embodiments, the plurality of molecular tags can associate with the second set of at least one SBE. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the molecular tag can further include a recognition element. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, the system can further include a strand displacing polymerase. In some embodiments, the CU of the plurality of CUs can be engineered to display the first set of at least one SBE and the second set of at least one SBE. In some embodiments, a first molecular tag of the plurality of molecular tags can associate uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags can associate uniquely with a second SBE of the second set of at least one SBE.
[0346] Also provided herein is a system for biological computing of the presently disclosed methods, the system comprising: (a) at least one sample object derived from an input sample; (b) a Reaction Reagent comprising at least one computing unit (CU) configured to interact with the at least one sample object; (c) a Reaction Medium; and (d) a readout reagent comprising a reporter entity and a reporter buffer. In some embodiments, the system can further comprise one or more evaluation strips and a plate adapter.
In some embodiments, the system can further comprise one or more 96-well plates and one or more microcentrifuge tubes. In some embodiments, the system can further comprise a thermocycler. In some embodiments, the system can further comprise a plate reader.
[0347] In some embodiments, reactions can be analyzed separately or in bulk on any plate reader with readout reagent (e.g., luminescence) functionality. In some embodiments, the plate reader with luminescence functionality can have sensitivity about < 100 amol ATP. In some embodiments, the read setting of the plate reader with luminescence functionality can be endpoint/kinetic read type, luminescence fiber optics type, 255 gain, 1 s integration time, autoadjust or maximum read height, custom (defined by user) plate type, and 5 reads (average used as the final readout values).
[0348] In some embodiments, the presently described systems can maintain single target sensitivity in the presence of high background. In some embodiments, the presently described systems can exhibit exceptional false positive rate characteristics. In some embodiments, the presently described system can be applied in targeted sequestering of nucleic acids for multiomic data collection. In some embodiments, the presently described system can be used for inducible targeted sequestering of RNA on computing units. See Example 2. In some embodiments, the presently described system can be used for hashed transcriptomic readout of target objects. See Example 3.
[0349] The present disclosure implements a computing device. The presently disclosed methods can be organized in a computing architecture that can cast broad terms from computer science into the device domain. Standard computer architectures (e.g., von Neumann or Harvard architectures) are well known and widely used to organize operations of stored-program electronic digital computers. More detailed information on the biological computing architecture and its implementation comprising the presently described entities is described in International Patent Application No. PCT/US2019/50068 and U.S. Patent Publication No. US2021-0319279, each of which is incorporated by reference in its entirety for all purposes.
[0350] Also provided herein is a system for spatial profiling an input sample, the system including: (a) the input sample comprising a plurality of sample objects comprising a plurality of biological molecules; (b) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (c) a plurality of molecular tags; and (d) a computer implemented method, which is described in detail supra. In some embodiments, the system can further include a sequencing method.
Object Clustering and Signal Evaluation
[0351] In some embodiments, a clustering reaction can be used to ensure SBE recognition can be completed for biological computing. In various embodiments, object sizes (e.g., cellular CUs) can be on the order of micrometers. At these scales, exogenous forces can be necessary to achieve sufficient mixing of objects. For example, a 1 μm particle can diffuse 1000 times slower than a 1 nm particle. The clustering reaction can use other mechanisms besides diffusion to promote interaction between SBEs. At the same time, the clustering reaction must minimize the effects that these mechanisms have on signaling. Forced interaction between objects can resemble object co-localization, which can be the basis for most computational operations. Hence, the clustering reaction must also implement mechanisms that prevent signaling between objects.
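The factor of 1000 between a micrometer-sized and a nanometer-sized particle follows from the Stokes-Einstein relation for a sphere in a viscous fluid, D = k_B T / (6 π η r), in which the diffusion coefficient is inversely proportional to the particle radius. The relation is not named in the text and is used here as an assumption about the underlying reasoning; the temperature and the viscosity of water are illustrative values that cancel in the ratio.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

def stokes_einstein(radius_m, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Diffusion coefficient (m^2/s) of a sphere; defaults approximate water at 25 C."""
    return BOLTZMANN * temperature_k / (6 * math.pi * viscosity_pa_s * radius_m)

d_nm = stokes_einstein(0.5e-9)  # particle of 1 nm diameter
d_um = stokes_einstein(0.5e-6)  # particle of 1 micrometer diameter
ratio = d_nm / d_um             # how many times slower the larger particle diffuses
```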
[0352] The clustering method of the present disclosure can simultaneously force interaction and block signaling. During the clustering reaction, system objects (i.e., sample objects and CUs) can be combined. Combination of objects can be done all at once or progressively. In some embodiments, the reaction can commence with resuspension of objects in the medium of the present disclosure, which can be optimized for specific clustering (e.g., low viscosity, high ion content, containing blocking agents or solubility agents). The medium can also contain signal blockers (e.g., binding compounds or signal degrading enzymes) or metabolic suppressors (e.g., translation inhibitors). Alternatively, the medium can be depleted of essential metabolites (e.g., amino acids, carbon sources) required for high levels of signal production. In some embodiments, the reaction can proceed by forcing object interaction (e.g., hydrodynamic mixing, electromagnetic manipulation, mechanical compression). The choice of mechanism can depend on object properties. Hydrodynamic mixing can be appropriate in mixtures comprising objects of various masses or sizes. Electromagnetic manipulation can be applicable in mixtures where target objects or CUs can be magnetizable. Objects can be magnetized by linkage (e.g., covalent or non-covalent bonding) to magnetizable entities or by loading (e.g., by electroporation, diffusion, carrier particles) the objects with magnetizable particles. Mechanical compression (e.g., by centrifugation) can require no special properties but can be more prone to erroneous processing. In all cases, the clustering reaction can be performed in a one-time batch process, continuously throughout the computing process (e.g., in a reactor), or repeated at multiple times during the computing process. In each execution, the results of the reaction can be clusters that can co-localize CUs.
[0353] In some embodiments, the signal production can be binary, that is, a signal can be either transmitted or not transmitted. In some embodiments, the signaling processes (e.g., signal production, signal degradation, signal recognition, state transition, etc.) can be mediated within a reaction optimized for signal evaluation. The evaluation reaction can initiate or terminate signal production and regulate chemical or spatial conditions. The primary goal can be to protect cluster integrity and to ensure completion of relevant physiological and enzymatic processes.
[0354] In some embodiments, the evaluation and clustering reactions can occur simultaneously. In some embodiments, the evaluation reaction can be explicitly initiated following the clustering reaction. In such cases, the two reactions can share the same medium but remain separate through changes in temperature, object density, or illumination. In some embodiments, separation of the two stages can include a medium exchange. Whereas the medium of the clustering reaction can be optimized for interaction (e.g., low viscosity, increased ion content, added blocking entities and solubility agents), the medium of the evaluation reaction can include enriched signaling precursors. The medium can be enriched in metabolites that can support signal production and signal recognition. For instance, synthetic growth media can be optimized for CU metabolisms and enriched with inducers of wild type or recombinant cellular systems. In addition, the properties of the media (e.g., pH, salt content) can be optimized for buffering and for any extracellular enzymatic processes (e.g., proteolysis, hydrolysis).
[0355] Spatial arrangement of the evaluation reaction can affect diffusion. Convective diffusion can weaken signal strength requiring more sensitive SBEs that can be more error prone. Hence, the evaluation reaction can take precautions to limit object motion. For instance, medium viscosity can be increased by addition of certain polysaccharides or other polymers. In addition, the additives can be cross-linked, forming a structured matrix that can trap and immobilize system objects. The result can be greater signal accumulation and decreased signal transmission between clusters.
[0356] In some embodiments, according to any of the systems described herein, the system can further comprise an agent that can increase the viscosity of a medium comprising the system. In some embodiments, the agent can be a polymer. In some embodiments, the polymer can be a polysaccharide. In some embodiments, the agent can be cross-linked to form a matrix configured to immobilize one or more of the system components.
KITS OF THE DISCLOSURE
[0357] Provided herein is a kit for biological computing, including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; and (b) an instruction for use of the kit. In some embodiments, the kit can further include (c) a plurality of molecular tags, which is explained in greater detail above.
[0358] Provided herein is a kit for biological computing, including (a) a plurality of CUs, wherein a CU of the plurality of CUs can be engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; (b) one or more reaction tubes including Reaction Reagent and one or more master tubes including Reaction Medium; and (c) an instruction for use of the kit.
[0359] In some embodiments, the kit can also include Reaction Reagent and one or more master tubes containing Reaction Medium.
[0360] In some embodiments, the CU of the plurality of CUs can be further engineered to display a second set of at least one SBE. In some embodiments, a molecular tag of the plurality of molecular tags can include a hash element and a priming element. In some embodiments, the priming element can be a random N-mer. In some embodiments, the molecular tag can further include a recognition element. In some embodiments, the molecular tag can further include a sequencing element. In some embodiments, the molecular tag can further include a unique protein binding domain. In some embodiments, the unique protein binding domain can be an antibody binding domain. In some embodiments, the antibody binding domain can be protein G. In some embodiments, the kit can further include a strand displacing polymerase.
[0361] Also provided herein is a kit that contains any of the above described system components and compositions described herein. Provided herein is a kit for biological computing, the kit comprising: (a) one or more reaction tubes comprising Reaction Reagent; (b) one or more master tubes comprising Reaction Medium; (c) a readout reagent comprising a reporter entity and a reporter buffer; and (d) an instruction for use of the kit. In some embodiments, the kit can further comprise one or more evaluation strips and a plate adapter. In some embodiments, the kit can further comprise a blank sample.
[0362] In some embodiments, the blank sample can be prepared with the Reaction Medium and the Reaction Reagent of the present disclosure and can constitute a reaction tube with resuspension buffer used in a place of a test sample. In some embodiments, sample readout signal can be normalized using the following formula:
$\mathrm{SNR} = \dfrac{\mathrm{Signal} - \mathrm{Blank}}{\mathrm{Blank}}$
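The normalization above amounts to subtracting the blank readout from the sample readout and dividing by the blank. A minimal sketch is given below; averaging several reads per well before normalizing mirrors the plate reader setting described earlier in which the average of multiple reads is used as the final readout value, while the function name `normalized_signal` and the list inputs are assumptions for this example.

```python
def normalized_signal(sample_reads, blank_reads):
    """Compute SNR = (Signal - Blank) / Blank from repeated plate-reader reads."""
    signal = sum(sample_reads) / len(sample_reads)  # average sample readout
    blank = sum(blank_reads) / len(blank_reads)     # average blank readout
    return (signal - blank) / blank
```

A value of 0 means the sample is indistinguishable from the blank, and a value of 2 means the sample readout is three times the blank readout.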
[0363] In some embodiments, the plate adapter can accommodate up to 12 evaluation strips at once. In some embodiments, prior to first use, the plate adapter can be defined as a standard 96-well plate with adjusted parameters of 22000 μm plate height and 5000 μm well diameter.
[0364] In some embodiments, quantity of Reaction Reagent in the reaction tube can determine the processing power of a reaction. In some embodiments, the active component of the presently described kit is supplied in dried form in the reaction tube.
[0365] In some embodiments, the kit can contain evaluation strips. In some embodiments, the evaluation strips can be a set of standard PCR strips. In some embodiments, up to 8 tubes can be used to analyze a reaction. In some embodiments, black and white PCR strips can be alternatively used, but read setting may need to be adjusted to account for the shift in signal intensity.
[0366] Also provided herein is a kit for spatial profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules, the kit comprising: (a) a plurality of CUs, wherein each CU of the plurality of CUs can be engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface; (b) a plurality of molecular tags; and (c) an instruction for use of the kit. In some embodiments, the kit can further include an instruction for analysis using a computer implemented method, which is described in detail supra.
[0367] The instructions for practicing the presently described methods can be generally recorded on a suitable recording medium. For example, the instructions can be printed on a substrate, such as paper or plastic, etc. The instructions can be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging), etc. The instructions can be present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc. In some instances, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source (e.g. via the Internet), can be provided. An example of this embodiment can be a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions can be recorded on a suitable substrate.
EXAMPLES
[0368] These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.
EXAMPLE 1. Application of Targeted Sequestering of Nucleic Acids for Multiomic Data Collection
[0369] This experiment is performed for targeted transcriptome readout with the presently described systems and methods. A non-limiting exemplary work-flow of this experiment is described in FIG. 1.
[0370] Yeast cells are engineered to comprise at least one logical operator module as presently described herein, wherein the at least one logical operator module recognizes a given marker combination. The yeast cells are further engineered to conditionally display either an RNA or DNA binding protein based on the activation of the at least one logical operator module. One or more reaction tubes comprising Reaction Reagent are prepared with the engineered yeast cells and an input sample (e.g., human cells) with spiked-in targets (with a known marker combination). The one or more reaction tubes are incubated with Reaction Medium, as shown in FIG. 1 (see Incubation step), to enable activation of the at least one logical operator module on the target. Activation of the at least one logical operator module triggers expression and display of RNA/DNA binding protein on the engineered yeast cells interacting with the targets, as shown in FIG. 1 (see Target specific display step). The input sample cells are then lysed to release nucleic acid into the medium. Genomic DNA is further enzymatically fragmented, and the reaction is incubated in a crowding agent rich environment that enables local binding of nucleic acids to the respective binding proteins, as shown in FIG. 1 (see Lysis of target cell & Incubation step). The reaction is washed to dilute out unbound nucleic acids, as shown in FIG. 1 (see Wash out unbound step). If the bound nucleic acids are RNA molecules, the RNA molecules are reverse transcribed into DNA, as shown in FIG. 1 (see Reverse transcription step). Multiple different readouts can be produced with a plate reader, such as qPCR or NGS, as shown in FIG. 1 (see Readout step).
EXAMPLE 2. Inducible Targeted Sequestering of RNA on Computing Units
[0371] This experiment is performed to validate the ability of yeast displaying an RNA binding protein to bind an RNA substrate with the presently described systems and methods. A non-limiting exemplary work-flow of this experiment is described in FIG. 2.
[0372] Yeast cells are engineered to conditionally express and display an RNA binding protein tethered to the cell wall. Induced cells are compared against non-induced cells. Parent strains are used as controls throughout the experiment. The cells are grown according to standard laboratory practices in a rich medium and diluted to exponential phase. The cells are standardized to a known concentration and mixed with RNA substrate containing a recognized epitope. The mixtures are incubated in a binding buffer so that RNA substrate is able to attach to the binding protein. The cells are washed multiple times with a washing buffer to dilute out unbound RNA substrate. The samples are spun down so that the cells are retained while molecules are diluted. The samples that are not spun down are used as a dilution control. The amount of cells left after the washing steps is quantified through an enzyme reporter to normalize RNA counts. Reverse transcriptase is added so that the bound RNA substrate is reverse transcribed into its DNA equivalent. A qPCR system is used to quantify the amount of DNA equivalent present in each reaction. The sample containing cells expressing RNA BP shows significantly more retained RNA substrate than other samples.
EXAMPLE 3. Hashed Transcriptomic Readout of Target Objects
[0373] This experiment is performed to demonstrate hashed transcriptomic readout of target objects. A non-limiting exemplary work-flow of this experiment is described in FIGS. 3A-3E and 4A-4C, which can be performed after the lysis of target cell & incubation step of FIG. 1.
[0374] CUs are engineered to comprise at least one logical operator module recognizing a given marker combination, as described above. CUs are further engineered to conditionally display an mRNA cap binding protein (RBP) (e.g., an IF4E) based on the logical operator module activation and to conditionally display a specific ssDNA binding protein (DBP) (e.g., a HUH-tag). Different CU types display different DBPs specific towards different ssDNA sequences.
[0375] A reaction with the Reaction Reagent is prepared with the CUs and an input sample with spiked-in sample objects (with the recognized marker combination). The reaction is incubated in Reaction Medium to enable activation of the logical operator modules on sample objects. The activation triggers expression and display of SBEs, such as RNA binding proteins and DNA binding proteins, on the CUs interacting with the sample objects. The sample objects are then lysed, which leads to localized mRNA release into the Reaction Medium, where localization is promoted by a crowding agent (e.g., a hydrogel) in the Reaction Medium.
Molecular tags are mixed into the reaction, wherein each molecular tag includes a unique recognition element (e.g., ssDNA binding sequence), a unique hash element, and a priming element (e.g., dT_N). The priming element is optimized for hybridization to polyA tails of mRNA molecules for non-specific retention. The reaction is washed to dilute out unbound biological molecules. Co-localization of mRNAs and molecular tags on the CU surface results in hybridization of the polyA tail to the dT_N priming element. First strand synthesis proceeds using standard methods. The synthesized tagged cDNA is subsequently released and analyzed/sequenced using NGS or similar methods. Computational algorithms (ML/optimization) are used to estimate the transcriptomic profile of individual target objects as well as distributions of CU types on individual target objects.
EXAMPLE 4. Hashed Proteomic Readout of Target Objects
[0376] This experiment is performed to demonstrate hashed proteomic readout of target objects. A non-limiting exemplary work-flow of this experiment is described in FIGS. 4E-4F.
[0377] Computing units (CUs) are brought into contact with sample objects, and interactions occur between the CUs and the sample objects. Subsequently, the sample objects are lysed, leading to the release of biological molecules (e.g., proteins). Unbound proteins are then removed through a washing step. Following this, molecular tag sequencing elements featuring recognition antibodies are introduced and come into contact with the sample. Any unbound sequencing elements with recognition antibodies are subsequently washed away. Specific priming of the antibody elements to the sequencing elements is achieved, and this is followed by the extension of the sequencing elements with antibody elements. DNA is quantified using established molecular methods. Computational algorithms (ML/optimization) are used to estimate the proteomic profile of individual target objects as well as distributions of CU types on individual target objects.
EXAMPLE 5. Design and Training of Machine Learning Algorithms for Subcellular Resolution by Associative Lattice
[0378] This experiment describes the design and training of the large language models towards the analysis of single cell transcriptomes.
[0379] This Example employed spatial fluctuations in the number of biological molecules, such as those caused by Brownian motion and cell signaling, within subcellular portions of the sample to evaluate CU association by proximity. This proximity information was subsequently utilized to compute target-specific output signals associated with a plurality of sample objects. This included analyzing sample object interactions, such as cell-cell interactions, and determining the distribution of biological molecules within specific sample objects, achieving subcellular resolution of entities like mRNA transcripts. By leveraging the spatial fluctuations, this Example provides enhanced accuracy and utility in computing these output signals across various configurations and disease contexts.
Dataset
[0380] The dataset GSE158055 of UMI (Unique Molecular Identifier) vectors was used for model training. This dataset, corresponding to a large study of blood samples collected from a COVID-19 patient cohort, comprises 1.4 million cells and 28,000 features across 284 samples measured from 205 patients. This dataset includes a broad set of cell annotations, allowing for a comprehensive analysis of cell type classification accuracy.
Training Dataset Construction
[0381] The GSE158055 dataset was split into individual scRNA-seq samples and then organized randomly into train, dev and test datasets with zero intersection of patients between individual sets. The split ratio was roughly 80/10/10, resulting in 1,168,236 train cells, 128,084 dev cells and 116,358 test cells. Across all samples, the top G genes with the least sparsity were retained for further processing as described below.
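A patient-disjoint split of this kind can be sketched as follows. This is a minimal illustration only: it assumes that whole patients are shuffled and bucketed by the stated ratios, and the function name `split_by_patient` and the toy sample-to-patient mapping are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(sample_to_patient, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/dev/test so no patient spans two sets."""
    patients = sorted(set(sample_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = round(ratios[0] * len(patients))
    n_dev = round(ratios[1] * len(patients))
    bucket_of = {}
    for idx, patient in enumerate(patients):
        if idx < n_train:
            bucket_of[patient] = "train"
        elif idx < n_train + n_dev:
            bucket_of[patient] = "dev"
        else:
            bucket_of[patient] = "test"
    # Every sample follows its patient into that patient's bucket.
    split = defaultdict(list)
    for sample, patient in sample_to_patient.items():
        split[bucket_of[patient]].append(sample)
    return dict(split)


# Hypothetical mapping: 20 patients, some contributing two samples.
sample_to_patient = {f"S{i}": f"P{i % 20}" for i in range(30)}
split = split_by_patient(sample_to_patient)
```

Because the ratio is applied to patients rather than cells, the resulting cell counts only approximate 80/10/10, which is consistent with the "roughly" in the text.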
[0382] The UMI vector for cell k in the training dataset was denoted by $T_k \in \mathbb{R}^G$, where $T_{k,i}$ was equal to the number of UMIs corresponding to gene i detected in cell k. From the UMI vectors, a batch of training inputs was generated, forming a single training epoch. Each input was generated from a single scRNA-seq sample to ensure computation was based on biological differences and not batch-to-batch differences.
[0383] A training input included 3 elements: (a) UMI vectors of K cells (sample objects) randomly selected from the training dataset; (b) surjective assignment of N computing units to the K sample objects; and (c) UMI vectors $X_i$ generated for each computing unit i = 1, ..., N. These elements are described in more detail below.
[0384] For all training inputs, the numbers K and N were fixed. The assignment of computing units to sample objects was described by the adjacency matrix $Y \in \mathbb{R}^{N \times N}$, where $Y_{i,j} = 1$ if computing unit i and computing unit j were assigned to the same sample object and $Y_{i,j} = 0$ otherwise. The assignment was calculated by selecting, with equal probability, exactly 1 sample object for each computing unit. For all i = 1, ..., N, the UMI vector $X_i \in \mathbb{R}^G$, where $X_{i,j}$ was equal to the number of UMIs corresponding to gene j attributed to computing unit i, was constructed by sampling the UMI vector of the sample object to which the computing unit was assigned. This was done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that (a) each UMI was attributed to a single computing unit, (b) the attribution of any one UMI was independent of all other UMIs, and (c) each UMI had equal probability of being attributed to each relevant computing unit. For simplicity, the UMI vectors of sample objects and computing units were also expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively. Similarly, validation and test inputs were constructed from the validation and test datasets.
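The construction of one training input can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in this Example: the function name and the toy values are hypothetical, and the handling of a sample object that receives no computing unit (its UMIs are simply skipped) is an assumption, since the text describes the assignment as surjective.

```python
import random


def make_training_input(T, n_units, rng):
    """Build (assignment, Y, X) from sample-object UMI vectors T (K x G)."""
    K, G = len(T), len(T[0])
    # Each computing unit picks exactly one sample object, uniformly at random.
    assignment = [rng.randrange(K) for _ in range(n_units)]
    # Y[i][j] = 1 when units i and j share a sample object, else 0.
    Y = [[int(assignment[i] == assignment[j]) for j in range(n_units)]
         for i in range(n_units)]
    # Attribute every UMI independently and uniformly to one assigned unit.
    X = [[0] * G for _ in range(n_units)]
    for k in range(K):
        units = [i for i, a in enumerate(assignment) if a == k]
        if not units:
            continue  # assumption: an object with no unit contributes nothing
        for g in range(G):
            for _ in range(T[k][g]):
                X[rng.choice(units)][g] += 1
    return assignment, Y, X


# Toy values (hypothetical): K = 2 sample objects, G = 3 genes, N = 5 units.
T = [[4, 0, 2], [1, 3, 0]]
assignment, Y, X = make_training_input(T, n_units=5, rng=random.Random(1))
```

By construction, summing the rows of X over the computing units assigned to one sample object recovers that object's UMI vector, which is the property the post-processing step later relies on.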
[0385] The distribution of UMI counts per sample object across the validation dataset was examined, with a median UMI count per sample object of 1312 and a total of 128,000 cells. This distribution provides insight into the overall molecular content captured within the dataset, which is essential for assessing the robustness and representativeness of the model. Additionally, the distribution of UMIs allocated to each computing unit across the validation inputs was analyzed. The median number of UMIs per computing unit was 85. The significantly lower number of UMIs per computing unit (with 95% of computing units having fewer than 240 UMIs) compared to the number of UMIs per sample object can indicate the low informational content and sparsity of the computing unit UMI vectors. This sparsity can impact the model’s ability to accurately capture and reconstruct the underlying biological information. The distribution of the relative reconstruction error (RRE) per sample object across the validation inputs was also evaluated. The median RRE per sample object was 0. The distribution was bimodal, with 56% of the sample objects reconstructed perfectly (RRE = 0) and the remaining sample objects reconstructed with a median RRE of 14%. Sample objects exhibiting RRE greater than 50% can be considered failed reconstructions for practical purposes. In this case, the proportion of failed reconstructions was approximately 3.5%. This metric is vital for assessing the reconstruction quality and identifying potential issues in the model’s performance.
[0386] FIG. 7 provides a visual representation of the data validation results, illustrating the distribution of UMI counts, UMIs per computing unit, and relative reconstruction error (RRE) across the validation dataset. Panel A shows the distribution of UMI counts per sample object, with a median of 1312 UMIs and a total of 128,000 cells, highlighting the variability and range of molecular content captured within each sample object. Panel B depicts the distribution of UMIs allocated to each computing unit, with a median of 85 UMIs per computing unit and 95% of computing units having fewer than 240 UMIs, illustrating the sparsity and low informational content of the computing unit UMI vectors compared to the sample objects. Panel C displays the distribution of RRE per sample object across the validation inputs, showing a bimodal distribution where 56% of sample objects are reconstructed perfectly (RRE = 0) and the remaining sample objects have a median RRE of 14%. Sample objects with RRE greater than 50% can be considered failed reconstructions, with approximately 3.5% of the sample objects falling into this category. This panel provides a clear visualization of the reconstruction quality and highlights areas where the model’s performance can be improved. By incorporating these data validation steps and visualizing the results as shown in FIG. 7, the framework ensures that the model’s predictions are accurate, reliable, and provide valuable insights into the underlying biological processes. The detailed examination of UMI distributions and reconstruction errors can help in fine-tuning the model and addressing any potential issues in the data or the modeling process.
Model Architecture
[0387] The model architecture is shown in FIG. 6. The input to the model was the measurement matrix $X \in \mathbb{R}^{N \times G}$, where N was the number of computing units and G was the number of genes. G was a constant for any given model while N could be arbitrarily chosen in inference mode.
[0388] The model first applied a log1p transformation to the input data. Then a linear dense layer was applied to reduce the dimensionality to $\mathbb{R}^{N \times E}$, where E was an embedding dimension. After dimension reduction, the tensor was passed through L stacked transformer blocks, each with H attention heads, a feed-forward dimension of F, and a ReLU activation function. Since no positional embedding was used, the model could handle inputs $X \in \mathbb{R}^{N \times G}$ with arbitrary integer N. The last transformer block featured a single attention head. Its scaled attention logits were extracted, and, instead of applying softmax as in a standard transformer block, a sigmoid function was applied to each element. This resulted in an output matrix $\hat{Y} \in [0,1]^{N \times N}$, where each element $\hat{Y}_{i,j}$ indicated the relative likelihood that computing unit i and computing unit j were interacting with the same sample object.
[0389] The following parameter values were used herein.
G = 800
K = 10
N = 150
E = 300
F = 900
L = 30
H = 4
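The input and output stages of the architecture can be illustrated with a short plain-Python sketch. It shows only the log1p transformation, the linear embedding, and the final single-head attention whose scaled logits pass through a sigmoid instead of a softmax. The L stacked transformer blocks are omitted, the weights are random rather than trained, and all names and toy sizes are assumptions made for illustration.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x m) and B (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def sigmoid_attention(Z, Wq, Wk):
    """Scaled attention logits of one head, squashed element-wise by a sigmoid."""
    Q, K = matmul(Z, Wq), matmul(Z, Wk)
    scale = math.sqrt(len(Wq[0]))
    logits = [[sum(q * k for q, k in zip(qi, kj)) / scale for kj in K]
              for qi in Q]
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


rng = random.Random(0)
N, G, E = 4, 6, 3  # toy sizes; the Example reports N = 150, G = 800, E = 300
X = [[rng.randrange(5) for _ in range(G)] for _ in range(N)]
X_log = [[math.log1p(v) for v in row] for row in X]   # log1p transformation
W_embed = [[rng.gauss(0, 0.1) for _ in range(E)] for _ in range(G)]
Z = matmul(X_log, W_embed)                            # N x E embedding
Wq = [[rng.gauss(0, 0.1) for _ in range(E)] for _ in range(E)]
Wk = [[rng.gauss(0, 0.1) for _ in range(E)] for _ in range(E)]
Y_hat = sigmoid_attention(Z, Wq, Wk)                  # N x N, entries in (0, 1)
```

Because nothing in this computation depends on the row order or the row count of X, the same weights apply to any N, which mirrors the remark that N can be chosen freely at inference time.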
Training Process
[0390] The model was trained to match its outputs $\hat{Y}$ to the training input adjacency matrix $Y$.
[0391] This problem can be viewed as an element-wise binary classification. Binary cross entropy was used as the loss function L:
Figure imgf000123_0001
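For reference, element-wise binary cross entropy between a 0/1 target matrix and a matrix of predicted probabilities can be computed as in the sketch below. Averaging over all matrix elements and clamping the predictions by a small epsilon are assumptions made here for illustration; the exact normalization of the loss used in this Example is the one given in the figure above.

```python
import math


def binary_cross_entropy(Y, Y_hat, eps=1e-12):
    """Mean element-wise binary cross entropy; Y in {0, 1}, Y_hat in [0, 1]."""
    total, count = 0.0, 0
    for row_true, row_pred in zip(Y, Y_hat):
        for y, p in zip(row_true, row_pred):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


Y = [[1, 0], [0, 1]]
confident = [[0.9, 0.1], [0.1, 0.9]]
uninformed = [[0.5, 0.5], [0.5, 0.5]]
loss_confident = binary_cross_entropy(Y, confident)
loss_uninformed = binary_cross_entropy(Y, uninformed)
```

An uninformed prediction of 0.5 everywhere gives a loss of ln 2, and predictions closer to the true adjacency give a lower loss.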
[0392] The Adam optimizer with a learning rate of 1e-4 was used to train the model. The model was trained for 110,000 steps with a batch size of 100.
Post Processing
[0393] The model output $\hat{Y}$ was further processed to obtain estimates of sample object UMI vectors $\hat{T} \in \mathbb{R}^{K \times G}$. First, the output matrix $\hat{Y}$ was used to partition the set of computing units into K disjoint sets. This partition was represented by the estimated adjacency matrix $\tilde{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters is either known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster.
Following clustering, for all i = 1, ..., K, the UMI vectors of computing units in cluster i were summed into a single vector $\hat{T}_i$. The relative reconstruction error (RRE) was calculated as a measure of the reconstruction quality.
[0394] $$\mathrm{RRE} = \frac{\sum_i \left| T_i - \hat{T}_i \right|}{\sum_i \left| T_i \right|}$$
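The summation of computing-unit UMI vectors per cluster and the RRE can be sketched as follows. The sketch evaluates the RRE per sample object with the sums taken over genes, and it assumes that cluster label i has already been matched to sample object i. The function names and toy values are hypothetical.

```python
def sum_by_cluster(X, labels, n_clusters):
    """Sum computing-unit UMI vectors X (N x G) within each cluster label."""
    G = len(X[0])
    T_hat = [[0] * G for _ in range(n_clusters)]
    for row, label in zip(X, labels):
        for g, count in enumerate(row):
            T_hat[label][g] += count
    return T_hat


def relative_reconstruction_error(t_true, t_est):
    """Sum of absolute differences divided by the sum of absolute true counts."""
    denom = sum(abs(v) for v in t_true)
    if denom == 0:
        raise ValueError("reference UMI vector is all zeros")
    return sum(abs(a - b) for a, b in zip(t_true, t_est)) / denom


# Toy values (hypothetical): 4 computing units, 3 genes, 2 clusters.
X = [[2, 0, 1], [2, 0, 1], [0, 3, 0], [1, 0, 0]]
labels = [0, 0, 1, 1]  # the last unit truly belongs to object 0 (a clustering error)
T_true = [[5, 0, 2], [0, 3, 0]]
T_hat = sum_by_cluster(X, labels, n_clusters=2)
rre = [relative_reconstruction_error(t, e) for t, e in zip(T_true, T_hat)]
```

In this toy case a single misassigned computing unit carrying one UMI produces an RRE of 1/7 for the first object and 1/3 for the second, while a perfect clustering would give an RRE of 0 for both.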
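[0395] Alternatively, the number of cells can also be an output of the model. In such embodiments, training provides a regression estimate of the number of clusters, i.e., a predicted number of clusters.
[0396] In such embodiments, the model output comprises the matrix $\hat{Y}$ together with the estimated number of cells.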
[0397] FIG. 9 provides a detailed visualization of the cell type confusion matrix associated with the input data from the dataset GSE158055. The confusion matrix (FIG. 9) indicates the probability that a computing unit from a sample object on the y-axis was estimated to be from a sample object on the x-axis. Diagonal components of the matrix list the probabilities that the origin of computing units was correctly estimated within the given cell type. High values along the diagonal indicate a high accuracy of correctly identifying the cell type of the computing units, reflecting the model’s ability to accurately classify cell types based on the input data. Non-zero, off-diagonal components of the confusion matrix illustrate mismatches between cell types. For example, similarities between certain cell types, such as NK cells and CD8 T cells, can lead to misclassifications, which are represented by the off-diagonal values. These mismatches highlight areas where the model needs improvement in distinguishing between similar cell types. By analyzing the confusion matrix, specific cell types that are often confused with each other can be identified, and the model can be refined to improve classification accuracy. The matrix provides a clear and quantifiable measure of the model’s performance in classifying cell types, which is crucial for validating the accuracy and reliability of the model’s predictions. FIG. 9 effectively illustrates the classification performance of the model, providing insights into the strengths and weaknesses of the cell type classification.
EXAMPLE 6. Design and Training of Machine Learning Algorithms for SUREAL Spatial
[0398] This experiment describes the design and training of large language models towards the analysis of physical interactions between single cells. This method builds on Example 5, wherein the method was described for the applications of single cell sequencing.
[0399] Example 5 assumed that each Computing Unit was assigned to a unique Sample Object. In cases where some Sample Objects are bound to each other, such assignment is not possible. In practice, most cases fall in this category, for example, as a result of incomplete dissociation or cell-cell interactions. For this purpose, the large language model was extended to consider Computing Units assigned to 2 or more Sample Objects.
Training Dataset Construction
[0400] A training input included 3 elements: (a) UMI vectors of K cells (sample objects) randomly selected from the training dataset; (b) assignment of N computing units to K sample objects; and (c) UMI vectors $X_i$ generated for each computing unit i = 1, ..., N. In this Example, the notable difference was in the assignment of Computing Units to Sample Objects, which was no longer surjective. These elements are described in more detail below.
[0401] For all training inputs, the numbers K and N were fixed. Prior to the assignment of Computing Units to Sample Objects, the Sample Objects were randomly connected with each other. This was described by the adjacency matrix $W \in \mathbb{R}^{K \times K}$, where the off-diagonal elements $W_{i,j} = 1$, $i > j$, with probability $\alpha > 0$. The assignment of computing units to sample objects was described by the adjacency matrix $Y \in \mathbb{R}^{N \times N}$, where $Y_{i,j} = 1$ if computing unit i and computing unit j can be assigned to the same sample object and $Y_{i,j} = 0$ otherwise. The assignment was calculated in two steps. First, with equal probability, exactly 1 sample object was selected for each computing unit. Second, the CUs assigned to the same sample object were partitioned randomly into b + 1 sets, where b was the number of neighboring sample objects, and the CUs in the ith set, i = 1, ..., b + 1, were assigned to the ith neighbor. For all i = 1, ..., N, the UMI vector $X_i \in \mathbb{R}^G$, where $X_{i,j}$ was equal to the number of UMIs corresponding to gene j attributed to computing unit i, was constructed by sampling the UMI vector of the sample object to which the computing unit can be assigned. This was done by randomly partitioning the UMIs of the sample object amongst the computing units assigned to the sample object, such that (a) each UMI was attributed to a single computing unit, (b) the attribution of any one UMI was independent of all other UMIs, and (c) the probability of a UMI being attributed to a CU was equal to
Figure imgf000125_0001
, where c can be the number of sample objects to which the CU can be assigned and a was the normalization factor. For simplicity, the UMI vectors of sample objects and CUs were also expressed as matrices $T \in \mathbb{R}^{K \times G}$ and $X \in \mathbb{R}^{N \times G}$, respectively.
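The random connection of Sample Objects described in this paragraph can be sketched as follows. The sketch draws each pair i > j with a fixed connection probability (0.2 in this Example) and mirrors the entry so that the matrix is symmetric with a zero diagonal. The symmetric mirroring, the zero diagonal, and the helper names are assumptions for illustration. The subsequent partition of CUs among neighbors and the UMI attribution probability are not reproduced here.

```python
import random


def random_object_adjacency(K, alpha, rng):
    """Symmetric K x K 0/1 matrix; each pair i > j is linked with probability alpha."""
    W = [[0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i):
            if rng.random() < alpha:
                W[i][j] = W[j][i] = 1
    return W


def neighbors(W, k):
    """Indices of sample objects connected to sample object k."""
    return [j for j, linked in enumerate(W[k]) if linked]


W = random_object_adjacency(K=3, alpha=0.2, rng=random.Random(0))
b = [len(neighbors(W, k)) for k in range(3)]  # neighbor count per sample object
```

The neighbor count b computed per sample object is the quantity that sets the number of sets, b + 1, into which that object's CUs are partitioned.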
[0402] Similarly, validation and test inputs were constructed from the validation and test datasets.
Model Architecture
[0403] The same model architecture was used with the following parameter values.
G = 800
K = 3
N = 90
E = 300
F = 900
L = 30
H = 4
α = 0.2
Post Processing
[0404] The model output $\hat{Y}$ was further processed to obtain estimates of sample object UMI vectors $\hat{T} \in \mathbb{R}^{K \times G}$. First, the output matrix $\hat{Y}$ was used to partition the set of computing units into K disjoint sets. This partition was represented by the estimated adjacency matrix $\tilde{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can either be known a priori or determined by optimization of a secondary criterion, such as the average number of computing units per cluster. Following clustering, for all i = 1, ..., K, the UMI vectors of computing units in cluster i can be summed into a single vector $\hat{T}_i$. The relative reconstruction error (RRE) was calculated as a measure of the reconstruction quality.
Figure imgf000126_0001
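The partitioning step can be illustrated with a small plain-Python agglomerative clustering routine. The sketch interprets "full linkage" as complete linkage and uses one minus the symmetrized model output as the distance between computing units. Both choices, as well as the function name and the toy matrix, are assumptions made for illustration.

```python
def complete_linkage_clusters(Y_hat, n_clusters):
    """Greedy agglomerative clustering on distance 1 - symmetrized Y_hat."""
    n = len(Y_hat)
    dist = [[1.0 - 0.5 * (Y_hat[i][j] + Y_hat[j][i]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge the closest pair of clusters
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy model output (hypothetical): units 0-1 and units 2-3 score highly together.
Y_hat = [
    [1.0, 0.9, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.9, 1.0],
]
labels = complete_linkage_clusters(Y_hat, n_clusters=2)
# Estimated 0/1 adjacency implied by the partition.
Y_est = [[int(a == b) for b in labels] for a in labels]
```

The routine rescans all cluster pairs at every merge, which is adequate for N on the order of 100 computing units; a library implementation would be preferable for larger inputs.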
[0405] The model output $\hat{Y}$ was further processed to obtain the estimated Sample Object adjacency matrix. First, the output matrix $\hat{Y}$ was used to partition the set of computing units into K disjoint sets. This partition was represented by the estimated adjacency matrix $\tilde{Y} \in \mathbb{R}^{N \times N}$. For instance, the partitioning can be performed by agglomerative clustering with full linkage, where the number of clusters can be either known a priori or determined by optimization of a secondary criterion, such as the number of computing units per cluster. A simple heuristic can then be used to compute the estimated Sample Object adjacency matrix $\hat{W} \in \mathbb{R}^{K \times K}$. For the ith and jth sets of the K disjoint sets, $\hat{W}_{i,j} = 1$ if $\sum_{l_i, m_j} \hat{Y}_{l_i, m_j} > \beta$, where the summation was over all Computing Units $l_i$ in the ith set and Computing Units $m_j$ in the jth set, and $\beta > 0$. The ROC curve of the prediction of edges between sample objects was then analyzed, with a mark showing the optimal value of $\beta$ maximizing the accuracy of predicting whether a given pair of sample objects are connected or not. As shown in FIG. 11, the left column includes a graph representation of the model output $\hat{Y}$. The center and right columns include graph representations of the matrices $W$ and $\hat{W}$, respectively.
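The thresholding heuristic for the Sample Object adjacency matrix can be sketched as follows. The sketch sums the model output over all pairs of computing units drawn from two different clusters and compares the sum against a threshold. The function name, the toy matrix, and the two threshold values are hypothetical.

```python
def estimate_object_adjacency(Y_hat, labels, n_clusters, beta):
    """W_hat[i][j] = 1 when the summed cross-cluster scores exceed beta."""
    members = [[u for u, lab in enumerate(labels) if lab == c]
               for c in range(n_clusters)]
    W_hat = [[0] * n_clusters for _ in range(n_clusters)]
    for i in range(n_clusters):
        for j in range(n_clusters):
            if i == j:
                continue  # only edges between distinct sample objects
            score = sum(Y_hat[u][v] for u in members[i] for v in members[j])
            W_hat[i][j] = int(score > beta)
    return W_hat


# Toy model output (hypothetical) and a partition of 4 units into 2 clusters.
Y_hat = [
    [1.0, 0.9, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.9, 1.0],
]
labels = [0, 0, 1, 1]
W_low = estimate_object_adjacency(Y_hat, labels, n_clusters=2, beta=0.5)
W_high = estimate_object_adjacency(Y_hat, labels, n_clusters=2, beta=1.0)
```

In this toy case the cross-cluster scores sum to about 0.6, so the two clusters are reported as connected at a threshold of 0.5 and as unconnected at a threshold of 1.0. Sweeping the threshold in this way is what produces the ROC curve mentioned in the text.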
[0406] This post-processing approach corresponds to panels C1-C3 of FIG. 8. The visualization of the computing units in C1-C3 in FIG. 8 appears clearer and more organized compared to B1-B3, providing additional value by offering a more coherent representation of the biological data. This improved clarity in visualization can enhance the understanding of biological processes and the relationships between computing units and sample objects, leading to more accurate and insightful analyses.
[0407] A visualization of the model variables through different stages of the machine learning process was provided, wherein the model uses t-SNE projections and agglomerative clustering to illustrate the spatial relationships and assignments of computing units. Panels A1-A3 in FIG. 8 show the example model input X projected by t-SNE, with cell types listed in the inset. This plot displays 150 points corresponding to 150 computing units assigned to 10 sample objects, indicated by the color coding. The projection reveals that no separated clusters are visible, indicating that the initial model input does not exhibit clear spatial separation. However, some colocalization of computing units from the same sample objects is present, with most sample objects spanning the full support of the plot. This visualization can help in understanding the initial distribution and relationship of computing units before any transformation is applied by the model. Panels B1-B3 in FIG. 8 present the example embeddings of the last transformer layer projected by t-SNE. In contrast to the initial input, the embeddings exhibit strong spatial separation that closely mirrors the sample objects. The t-SNE projection reveals distinct clusters, each corresponding to a different sample object, demonstrating the model’s ability to learn and represent the underlying structure and relationships within the data. This spatial separation indicates that the transformer layers effectively capture and encode the relevant features, facilitating accurate downstream analysis and predictions. Panels C1-C3 in FIG. 8 visualize the estimated adjacency matrix $\tilde{Y}$ following agglomerative clustering. A force layout can be used to indicate the relationships between computing units, where computing units appearing close together can be predicted to be assigned to the same sample object.
This visualization can provide a clear representation of the clustering results, showing how the model groups computing units based on their learned relationships and interactions. The force layout can help in intuitively understanding the predicted assignments and the structural organization of the computing units within the sample objects. By incorporating these visualizations, FIG. 8 effectively illustrates the progression and transformation of model variables from the initial input through to the final clustering. The t-SNE projections and force layout can provide valuable insights into the model’s performance and its ability to capture and represent complex biological relationships. These visualizations can be crucial for validating the model’s predictions and ensuring that the learned representations are consistent with the underlying biological data.
EXAMPLE 7. Estimating mRNA Transcripts of a Cell in an Input Sample
[0408] This Example considers samples of 6 cells taken from patient derived tissue. The number of mRNA transcripts for each cell was taken from a publicly available dataset.
[0409] The number of mRNA transcripts bound to each computing unit was simulated by, first randomly associating each of the 300 computing units with one of the 6 cells, and second associating each transcript from each of the 6 cells with one associated computing unit. Hence, mRNA transcripts from one cell were thereby randomly distributed amongst the computing units associated with that cell. Lastly, each computing unit was associated with a unique molecular tag.
[0410] Next, the associations between the computing units and the cells were hidden. Then, mRNA transcript numbers associated with each computing unit were used to associate computing units with each other. For this purpose, the large language model described in Examples 5-6 was used. Select results of these associations are illustrated in FIG. 12, where nodes indicate computing units, edges indicate computed associations between computing units, and colors correspond to original associations with cells. FIG. 13 is the confusion matrix, listing statistics of computing unit associations for samples comprising B lymphocytes. These results show that computing units in a highly connected cluster were likely associated with the same cell, validating the model’s accuracy in predicting cell-cell interactions. This example demonstrates how the model can use mRNA transcript data to associate computing units with their respective sample objects, thereby providing insights into cell-cell interactions and improving the estimation of mRNA transcripts for individual cells in the sample.
EXAMPLE 8. Computing Unit Proximity Analysis
[0411] This Example is performed to demonstrate how the model can effectively use mRNA transcript data to associate computing units with their respective cells, both before and after postprocessing. It highlights the model’s ability to reveal cell-cell interactions and the distribution of mRNA transcripts within cells, thereby providing valuable insights into the complex relationships within the sample.
[0412] This Example considered three cells from a peripheral blood mononuclear cell (PBMC) sample bound to each other in different configurations. The number of mRNA transcripts for each cell was taken from a publicly available dataset.
[0413] The number of mRNA transcripts bound to each computing unit was simulated by, first randomly associating each of the 300 computing units with a subset of the 3 cells, and second associating each transcript from each of the 3 cells with one associated computing unit. Computing units associated with singleton cells were not associated with other cells. Computing units associated with non-singleton cells, i.e., cells bound to other cells, were with some probability also associated with one of the bound cells. Hence, mRNA transcripts from one cell were thereby randomly distributed amongst the computing units associated with that cell. Lastly, each computing unit was associated with a unique molecular tag.
[0414] Next, the associations between the computing units and the cells were hidden. Then, mRNA transcript numbers associated with each computing unit were used to associate computing units with each other. For this purpose, the large language model described in Examples 5-6 was used. Select results of these associations are illustrated in FIGS. 14-15, where nodes indicate computing units, edges indicate associations between computing units, and colors correspond to original associations with cells. In FIGS. 14-15, the bottom diagram indicates the true association of computing units, where computing units associated with the same cell are linked by an edge, and the top diagram indicates the computed associations.
[0415] FIG. 14 shows the cell-cell visualization before post-processing, illustrating the initial associations computed by the model, which indicate the relationships between computing units based on the mRNA transcript data. FIG. 15 shows the cell-cell visualization after post-processing, illustrating the refined associations after applying the large language model and additional post-processing steps.
[0416] The results show that the computing unit associations were indicative of binding between cells as well as the distribution of mRNA transcripts within cells. The post-processing enhanced the accuracy of these associations, providing a clearer and more accurate representation of the underlying biological interactions.
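The post-processing steps are not detailed in this Example. As a minimal sketch only, and assuming that the post-processing resembles the clustering and aggregation recited in the claims (agglomerative clustering with full linkage, interpreted here as complete linkage, followed by summing transcript counts within each cluster), the refinement may be illustrated as follows. The function names, the use of `1 - probability` as a distance, and the input layout are assumptions of this sketch.

```python
def complete_linkage_clusters(prob, n_clusters):
    """Group CUs by complete-linkage agglomerative clustering.

    prob[i][j] is an assumed symmetric model output: the predicted probability
    that CUs i and j are proximal. Distance is taken as 1 - prob[i][j].
    """
    clusters = [[i] for i in range(len(prob))]

    def distance(a, b):
        # Complete linkage: two clusters are as far apart as their farthest members.
        return max(1.0 - prob[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda xy: distance(clusters[xy[0]], clusters[xy[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return [sorted(c) for c in clusters]

def aggregate_counts(cu_counts, clusters):
    """Sum the per-CU gene counts (dicts) of all CUs in each cluster."""
    totals = []
    for members in clusters:
        merged = {}
        for i in members:
            for gene, n in cu_counts[i].items():
                merged[gene] = merged.get(gene, 0) + n
        totals.append(merged)
    return totals
```

Here the number of clusters is supplied by the caller, which corresponds to the case where the number of cells is known a priori; selecting it by a secondary criterion would require additional logic not shown.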
EXAMPLE 9. Leukocyte mRNA Capture and Synthesis of cDNA
[0417] This Example demonstrates isolation of mRNA from leukocytes and synthesis of cDNA from the isolated mRNA.
[0418] Sample Preparation. Cells were harvested from PBMCs or cell culture and resuspended to a concentration of 10^3 cells/ml in Resuspension Buffer. For each reaction, 100 µl of cell suspension was transferred into a 1.7 ml microcentrifuge tube.
[0419] Capsule Formation. A thermoblock was preheated to 37 °C. The Encapsulating Reagent (a buffer containing computing units of the present disclosure, engineered to display human CD45 and covalently labeled with unique molecular tags comprising poly(dT) such that each computing unit is denoted by a molecular tag with a unique barcode) was equilibrated to room temperature for 5 minutes. The Encapsulating Reagent was briefly centrifuged at 100 × g for 5 seconds. 100 µl of the Resuspension Buffer (H2O, NaCl, KCl, Na2HPO4, KH2PO4, P407) was added and gently mixed by pipetting. The Encapsulating Reagent was placed on a thermoshaker at 1400 RPM, 37 °C for 5 minutes. 10 µl of the Encapsulating Reagent was transferred to each reaction for analysis. Each reaction was gently vortexed and placed into a fixed-angle centrifuge, orienting the hinge of the tube away from the center, and spun down at 200 × g for 1 minute. Without removing the tubes from the centrifuge, the tubes were twisted 180° until the hinge faced inwards to the center. The tubes were spun down nine more times in the same manner, rotating the tube 180° between each spin-down. At the end of the spinning, a small pellet was visible in each reaction tube. The pellets were dispersed by gentle pipetting in each tube.
[0420] Capsule Stabilization. 10 µl of the Stabilization Buffer A (H2O, LiCl) was added to each reaction, and each reaction tube was gently vortexed and incubated at room temperature for 10 minutes. 10 µl of the Stabilization Buffer B (dimethylsulfoxide (DMSO), bis(sulfosuccinimidyl) suberate (BS3)) was added to each reaction, and each reaction tube was gently vortexed and incubated at room temperature for 10 minutes.
[0421] Cell Lysis. 400 µl of the Lysis Buffer was added to each reaction, and each reaction tube was inverted over the lid 3 times to mix and then incubated at room temperature for 10 minutes.
[0422] Wash. Each reaction was loaded onto a spin filter and spun down at 200 × g for 1 minute. The flowthrough was removed, and 400 µl of the Washing Buffer (H2O, NaCl, KCl, Na2HPO4, KH2PO4, P407) was added to the filter and spun down at 200 × g for 1 minute. This washing step was repeated.
[0423] Elution. The filter insert for each reaction was transferred into a new 1.7 ml microcentrifuge tube and 20 µl of the Elution Buffer (NaCl, Tris, USER enzyme) was added. The tubes were incubated for 15 minutes at 37 °C with a closed lid and spun down at 200 × g for 1 minute. The spin filters were discarded and the flowthrough from each reaction was further analyzed.
[0424] Reverse Transcription (RT) and Template Switching. The RT Buffer was vortexed briefly, and then the RT mix was prepared in a separate tube as shown in Table 1, adding the RT Enzyme Mix last.
Table 1. Reverse Transcription (RT) Reaction
[Table 1 is reproduced as an image in the original publication: imgf000131_0001]
[0425] Each RT reaction was mixed thoroughly by pipetting several times and centrifuged briefly to collect the solution at the bottom of the tube. Each RT reaction mix was incubated in a thermal cycler with the following program: (1) heated lid set to 105 °C; (2) 90 minutes at 42 °C; (3) 10 minutes at 70 °C; and (4) hold at 4 °C.
EXAMPLE 10. Estimating mRNA Transcripts of a Cell in an Input Sample Using Sequencing Data
[0426] This Example demonstrates how the large language model described in Examples 5-6 can use mRNA sequencing data to associate computing units with their respective sample objects, thereby providing insights into the distributions of biological molecules at the sub-cellular level and cell-cell interactions, improving the estimation of mRNA transcripts for individual cells in the sample.
[0427] Samples of approximately 1,000 cells were taken from patient derived tissues. The number of mRNA transcripts bound to each computing unit was obtained by Next Generation Sequencing (NGS) of cDNA obtained following the steps of Example 9, where mRNA transcripts from one cell or a cluster of cells were captured by molecular tags bound to computing units associated with that cell or cluster of cells.
[0428] The mRNA transcript numbers associated with each computing unit and with each gene were used to associate with one another the computing units bound to the same cell or cluster of cells. For this purpose, the large language model described in Examples 5-6 was used. Statistics of genes detected and comparisons to reference bulk data are illustrated in FIGS. 17A-17C, where a public dataset (Human Protein Atlas) was used as the reference. Thus, the present Example provides insights into the distributions of biological molecules at the sub-cellular level and cell-cell interactions.
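As a non-limiting sketch, the per-computing-unit, per-gene transcript numbers used as model input may be assembled from sequencing reads as shown below. It is assumed here, purely for illustration, that each read has already been parsed and aligned into a `(barcode, umi, gene)` tuple, that one barcode identifies one computing unit, and that rows correspond to genes and columns to computing units; the function name `build_umi_matrix` is hypothetical.

```python
def build_umi_matrix(reads, genes, barcodes):
    """Count distinct UMIs per (gene, CU barcode).

    reads: iterable of (barcode, umi, gene) tuples (assumed pre-parsed format)
    genes: ordered list of gene names (rows)
    barcodes: ordered list of CU barcodes (columns)
    """
    gene_row = {g: r for r, g in enumerate(genes)}
    cu_col = {b: c for c, b in enumerate(barcodes)}
    matrix = [[0] * len(barcodes) for _ in genes]
    seen = set()
    for barcode, umi, gene in reads:
        if gene not in gene_row or barcode not in cu_col:
            continue  # unknown gene or barcode: ignored in this sketch
        key = (barcode, umi, gene)
        if key in seen:
            continue  # same UMI seen again: treated as an amplification duplicate
        seen.add(key)
        matrix[gene_row[gene]][cu_col[barcode]] += 1
    return matrix
```

Counting distinct UMIs rather than raw reads means that each captured transcript contributes once, regardless of how many times it was amplified and sequenced.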
EXAMPLE 11. Estimating mRNA Transcripts of a Cell in an Input Sample Using Allele Specific Sequencing Data
[0429] This Example demonstrates how the large language model of Examples 5-6 can use allele specific mRNA sequencing data to associate computing units with their respective sample objects, thereby providing insights into distribution of biological molecules at the sub-cellular level and cell-cell interactions, improving the estimation of mRNA transcripts for individual cells in the sample.
[0430] Samples of approximately 1,000 cells are taken from patient derived tissues. The number of mRNA transcripts bound to each computing unit is obtained by Next Generation Sequencing (NGS) of cDNA obtained following the steps of Example 9, where mRNA transcripts from one cell or a cluster of cells are captured by molecular tags bound to computing units associated with that cell or cluster of cells.
[0431] mRNA transcript numbers associated with each computing unit and with each allele are used to associate with one another the computing units bound to the same cell or cluster of cells. For this purpose, the large language model described in Examples 5-6 is used. The number of genes G is equal to the number of allele assignments. For a diploid cell, G is three times the number of genes, corresponding to the first-allele genes, second-allele genes, and indeterminate-allele genes. For homozygous genes, where both alleles have the same sequence, or for mRNA transcripts whose sequenced portion does not include the allele-specific sequence, the transcript is by default assigned to the indeterminate allele. All other mRNA transcripts are assigned to alleles using a priori information regarding allele sequences.
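The allele assignment rule above may be illustrated by the following minimal sketch. The block layout of the feature vector (all first-allele features, then all second-allele features, then all indeterminate features), the function names, and the boolean inputs are assumptions chosen for this example; the disclosure specifies only that G equals three times the number of genes for a diploid cell.

```python
ALLELE_BLOCKS = ("first", "second", "indeterminate")

def n_features(n_genes):
    """G for a diploid cell: one feature per gene per allele assignment."""
    return len(ALLELE_BLOCKS) * n_genes

def assign_allele(is_homozygous, covers_variant_site, matches_first_allele):
    """Apply the default rule: indeterminate unless the read is informative."""
    if is_homozygous or not covers_variant_site:
        return "indeterminate"
    return "first" if matches_first_allele else "second"

def feature_index(gene_idx, n_genes, allele):
    """Row index of a (gene, allele) pair under the assumed block layout."""
    return ALLELE_BLOCKS.index(allele) * n_genes + gene_idx
```

For example, with 4 genes the feature vector has 12 entries, and a transcript of gene index 2 assigned to the second allele is counted at index 6.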
[0432] While the disclosure has been particularly shown and described with reference to specific embodiments (some of which are preferred embodiments), it should be understood by those having skill in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure as disclosed herein.

Claims

We claim:
1. A method of spatially profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules without establishing a priori spatial relationship between a plurality of molecular tags and the plurality of sample objects, the method comprising
(a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs is engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE is capable of binding to the plurality of sample objects, and the second set of at least one SBE is associated with the plurality of molecular tags and is capable of binding to the plurality of biological molecules; and
(b) permeabilizing the plurality of sample objects such that the plurality of biological molecules are released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and
(c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for spatial profiling by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs.
2. The method of claim 1, wherein the spatial profiling comprises information on cell-cell interactions of the plurality of sample objects and information regarding the distribution of the plurality of biological molecules within each sample object.
3. A method of single-cell sequencing without establishing a priori spatial relationship between a plurality of molecular tags and a plurality of sample objects comprising a plurality of biological molecules, the method comprising
(a) contacting the plurality of sample objects with a plurality of computing units (CUs), wherein each CU of the plurality of CUs is engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface, wherein the first set of at least one SBE is capable of binding to the plurality of sample objects, and the second set of at least one SBE is associated with the plurality of molecular tags and is capable of binding to the plurality of biological molecules; and
(b) permeabilizing the plurality of sample objects such that the plurality of biological molecules are released and bind to the second set of at least one SBE associated with the plurality of molecular tags; and
(c) establishing a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects for single cell sequencing by evaluating proximity of the each CU of the plurality of CUs with aid of a machine learning algorithm and aggregating the plurality of biological molecules bound to proximal CUs.
4. The method of claim 3, further comprising reverse transcribing the plurality of biological molecules bound to the plurality of molecular tags for sequencing prior to establishing a posteriori spatial relationship.
5. The method of any one of claims 3-4, wherein the plurality of biological molecules is RNA.
6. The method of any one of claims 3-4, wherein the sequencing is Next-Generation sequencing or Sanger sequencing.
7. The method of any one of claims 1-5, wherein the each CU is associated with the plurality of molecular tags.
8. The method of claim 7, wherein the each CU is associated with the plurality of molecular tags prior to the contacting the plurality of sample objects with the plurality of CUs.
9. The method of claim 7, wherein the each CU is associated with the plurality of molecular tags following the contacting the plurality of sample objects with the plurality of CUs.
10. The method of any one of claims 1-8, wherein each molecular tag of the plurality of molecular tags comprises a barcode and a unique molecular identifier (UMI), and is associated with the second set of at least one SBE.
11. The method of claim 9, wherein the each molecular tag further comprises a sequencing element, a release element, and/or a linker.
12. The method of claim 10, wherein the release element releases the each molecular tag from the each CU.
13. The method of claim 10, wherein the linker prevents extension.
14. The method of any one of claims 9-12, wherein the barcode is unique to the each CU.
15. The method of any one of claims 9-13, wherein the each molecular tag is single-stranded.
16. The method of claim 14, wherein the each molecular tag comprises a hairpin structure.
17. The method of any one of claims 9-13, wherein the each molecular tag is double-stranded.
18. The method of any one of claims 1-16, wherein the second set of at least one SBE is poly(dT).
19. The method of any one of claims 9-17, wherein the barcode is uniquely assigned to the each CU of the plurality of CUs.
20. The method of any one of claims 9-19, wherein the UMI is uniquely assigned to the each molecular tag.
21. The method of any one of claims 9-20, wherein the each molecular tag is associated with at least two CUs of the plurality of CUs.
22. The method of any one of claims 1-21, wherein the each CU is engineered to display the second set of at least one SBE upon interacting with at least one sample object of the plurality of sample objects.
23. The method of claim 22, wherein the second set of at least one SBE further comprises a blocking element.
24. The method of claim 23, wherein the blocking element prevents reverse transcription.
25. The method of any one of claims 23-24, wherein the blocking element is removed when the at least one sample object interacts with the each CU.
26. The method of any one of claims 1-25, wherein the plurality of sample objects interacts with the plurality of CUs via the first set of at least one SBE.
27. The method of any one of claims 1-25, wherein the plurality of biological molecules interacts with the plurality of CUs via the second set of at least one SBE.
28. The method of any one of claims 1-27, wherein the plurality of CUs bound to each sample object of the plurality of sample objects provide a physical barrier, preventing diffusion of the plurality of biological molecules upon permeabilizing of the sample object.
29. The method of any one of claims 1-28, wherein a first CU of the plurality of CUs interacts with a first sample object of the plurality of sample objects {a1} number of times, the first CU interacts with a second sample object of the plurality of sample objects {a2} number of times, a second CU of the plurality of CUs interacts with the first sample object {a3} number of times, and the second CU interacts with the second sample object {a4} number of times.
30. The method of claim 29, wherein the first CU displays the first set of at least one SBE that is different from what the second CU displays on its surface.
31. The method of any one of claims 29-30, wherein at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number is zero.
32. The method of any one of claims 29-30, wherein at least one of the {a1} number, the {a2} number, the {a3} number, and the {a4} number is one.
33. The method of any one of claims 29-30, wherein at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number are the same.
34. The method of any one of claims 29-30, wherein at least two of the {a1} number, the {a2} number, the {a3} number, and the {a4} number are different.
35. The method of any one of claims 1-34, wherein the evaluating proximity of the each CU of the plurality of CUs comprises evaluating proximity between the first CU and the second CU.
36. The method of claim 35, wherein the evaluating proximity between the first CU and the second CU comprises identifying the plurality of molecular tags associated with the first CU and the second CU.
37. The method of claim 35, wherein the evaluating proximity between the first CU and the second CU comprises identifying numbers of UMIs linked to the barcode and the plurality of biological molecules.
38. The method of any one of claims 1-37, wherein {a1’} number of UMIs are linked to a first barcode and a first biological molecule of the plurality of the biological molecules, {a2’} number of UMIs are linked to the first barcode and a second biological molecule of the plurality of the biological molecules, {a3’} number of UMIs are linked to a second barcode and a third biological molecule of the plurality of the biological molecules, and {a4’} number of UMIs are linked to the second barcode and a fourth biological molecule of the plurality of the biological molecules.
39. The method of claim 38, wherein the first barcode is unique to the first CU, and the second barcode is unique to the second CU.
40. The method of any one of claims 38-39, wherein the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number are input into the machine learning algorithm for the proximity analysis.
41. The method of claim 40, wherein the machine learning algorithm outputs an adjacency matrix.
42. The method of any one of claims 38-41, wherein at least two of the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number give an output value of 1 in the adjacency matrix.
43. The method of claim 42, wherein the output value of 1 indicates the first CU and the second CU are in spatial proximity.
44. The method of any one of claims 38-41, wherein at least two of the {a1’} number, the {a2’} number, the {a3’} number, and the {a4’} number give an output value of 0 in the adjacency matrix.
45. The method of claim 44, wherein the output value of 0 indicates the first CU and the second CU are not in spatial proximity.
46. The method of any one of claims 41-45, wherein the adjacency matrix is used to establish the a posteriori spatial relationship between the plurality of molecular tags and the plurality of sample objects.
47. The method of any one of claims 1-46, wherein the machine learning algorithm is trained by a computer implemented method for training a model, wherein the computer implemented method comprises:
(a) maintaining a dataset comprising Unique Molecular Identifier (UMI) vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples;
(b) generating a plurality of training inputs from the dataset, wherein each training input comprises a UMI matrix representing UMI counts for number of interactions between a predetermined number of computing units (CUs) and a plurality of biological molecules, and each training input is associated with an adjacency matrix representing known proximal relationships among CUs; and
(c) training the model by adjusting model parameters to match model outputs to the adjacency matrix associated with the plurality of training inputs, wherein the trained model is capable of generating a model output representing spatial relationships among CUs based on input where a priori spatial relationships between the plurality of biological molecules and the sample objects are not present.
48. The method of claim 47, wherein generating the plurality of training inputs comprises:
(a) randomly assigning the predetermined number of CUs to the plurality of sample objects to generate an assignment profile, wherein each of the predetermined number of CUs is assigned to one or more sample objects, and each of the plurality of the sample objects is assigned with one or more CUs; and
(b) sampling the predetermined number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate training inputs.
49. The method of claim 48, wherein the assignment profile corresponds with the adjacency matrix associated with the training inputs.
50. The method of any one of claims 47 to 49, wherein each row of the UMI matrix represents a biological molecule of the plurality of biological molecules, and each column of the UMI matrix represents a CU of the predetermined number of CUs.
51. The method of any one of claims 47 to 50, wherein the known proximal relationships among CUs comprise known proximal relationships between each pair of the CUs.
52. The method of any one of claims 47 to 51, wherein each training input is generated from one scRNA-seq sample.
53. The method of any one of claims 47 to 52, wherein the model does not comprise positional embeddings to allow the trained model to handle input data with varying lengths and structures.
54. The method of any one of claims 47 to 53, wherein the model comprises a plurality of transformer blocks, and wherein a last transformer block of the plurality of transformer blocks features a single attention head.
55. The method of any one of claims 47 to 54, wherein the model comprises a sigmoid activation function to allow the trained model to treat each proximal relationship between each pair of the CUs as an independent probability.
56. The method of any one of claims 47 to 55, wherein training the model further comprises using a binary cross entropy as a loss function.
57. The method of any one of claims 47 to 56, further comprising:
(a) partitioning the predetermined number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of CUs comprises CUs that are proximal to one another; and
(b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate resultant UMI vector for each sample object.
58. The method of any one of claims 47 to 57, wherein the model output representing spatial relationships among CUs is indicative of proximal relationship among sample objects of the plurality of sample objects.
59. The method of claim 58, wherein the proximal relationship among sample objects is derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs is derived from the model output representing spatial relationships among CUs.
60. The method of claim 59, wherein the resultant UMI vector represents a distribution of the plurality of biological molecules within each sample object.
61. The method of any one of claims 57 to 60, wherein the partitioning the predetermined number of CUs into a plurality of clusters comprises agglomerative clustering with full linkage.
62. The method of any one of claims 57 to 61, wherein the number of clusters is known a priori.
63. The method of any one of claims 57 to 62, wherein the number of clusters is determined by optimizing a secondary criterion.
64. The method of any one of claims 57 to 63, wherein a relative reconstruction error is calculated to measure reconstruction quality of the model.
65. The method of any one of claims 47 to 64, wherein the model comprises a predetermined number of stacked transformer blocks, wherein the predetermined number is configurable.
66. The method of any one of claims 47 to 65, wherein each block of a subset of the predetermined number of stacked transformer blocks has a predefined number of attention heads, wherein the predefined number of attention heads are configurable.
67. The method of any one of claims 47 to 66, wherein an Adam optimizer with a predetermined learning rate is used to train the model.
68. The method of any one of claims 47 to 67, wherein the model is trained for a predetermined number of steps with a predefined batch size.
69. The method of any one of claims 1-46, wherein the machine learning algorithm is a computer implemented method for training a model, comprising:
(a) maintaining a dataset comprising Unique Molecular Identifier (UMI) vectors corresponding to each sample object of K number of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples, wherein for sample object k, G number of biological molecules are comprised in the UMI vector;
(b) generating a plurality of training inputs from the dataset, wherein each training input comprises a UMI matrix X, wherein each row of the UMI matrix X represents the G number of biological molecules, and each column of the UMI matrix X represents N number of computing units (CUs), wherein values in the UMI matrix X indicate the UMI counts that represent a number of interactions between the ith CU and the jth biological molecules, and each training input is associated with an adjacency matrix Y representing known proximal relationships among CUs, wherein rows and columns of the adjacency matrix Y represent the N number of CUs, and wherein values in the adjacency matrix Y represent a pairwise proximal relationship between the ith CU and the jth CU; and
(c) training the model by adjusting model parameters to match model outputs to the adjacency matrix Y associated with the plurality of training inputs, wherein the trained model is capable of generating a model output representing spatial relationships among CUs based on input where a priori spatial relationships between the G number of biological molecules and the K number of sample objects are not present.
70. The method of claim 69, wherein generating the plurality of training inputs comprises:
(a) randomly assigning the N number of CUs to the K number of the sample objects to generate an assignment profile, wherein each of the N number of CUs is assigned to one or more sample objects, and each of the K number of the sample objects is assigned with one or more CUs; and
(b) sampling the N number of CUs and corresponding biological molecules from the UMI vectors of the dataset to generate the training inputs.
71. The method of claim 70, wherein the assignment profile corresponds with the adjacency matrix Y associated with the training inputs.
72. The method of any one of claims 69 to 71, wherein the number G is a constant.
73. The method of any one of claims 69 to 72, wherein the number N is an arbitrary integer.
74. The method of any one of claims 69 to 73, wherein each training input is generated from one scRNA-seq sample.
75. The method of any one of claims 69 to 74, wherein the model does not comprise positional embeddings to allow the trained model to handle input data with varying lengths and structures.
76. The method of any one of claims 69 to 75, wherein the model comprises a plurality of transformer blocks, and wherein a last transformer block of the plurality of transformer blocks features a single attention head.
77. The method of any one of claims 69 to 76, wherein the model comprises a sigmoid activation function to allow the trained model to treat the pairwise proximal relationship between the ith CU and the jth CU as an independent probability.
78. The method of any one of claims 69 to 77, wherein training the model further comprises using a binary cross entropy as a loss function.
79. The method of any one of claims 69 to 78, further comprising:
(a) partitioning the N number of CUs into a plurality of clusters based on the model output representing spatial relationships among CUs, wherein each cluster of the CUs comprises CUs that are proximal to one another; and
(b) aggregating the UMI vectors of the CUs within each cluster of the plurality of clusters to generate resultant UMI vector for each sample object.
80. The method of any one of claims 69 to 79, wherein the model output representing spatial relationships among CUs is indicative of proximal relationship among sample objects of the K number of sample objects.
81. The method of claim 80, wherein the proximal relationship among sample objects is derived from spatial relationships among the plurality of clusters of CUs, wherein the spatial relationships among the plurality of clusters of CUs is derived from the model output representing spatial relationships among CUs.
82. The method of claim 81, wherein the resultant UMI vector represents a distribution of the G number of biological molecules within each sample object.
83. The method of any one of claims 79 to 82, wherein the partitioning the N number of CUs into a plurality of clusters comprises agglomerative clustering with full linkage.
84. The method of any one of claims 79 to 83, wherein the number of clusters is known a priori.
85. The method of any one of claims 79 to 84, wherein the number of clusters is determined by optimizing a secondary criterion.
86. The method of any one of claims 69 to 85, wherein a relative reconstruction error is calculated to measure reconstruction quality of the model.
87. The method of any one of claims 69 to 86, wherein the model comprises a predetermined number of stacked transformer blocks, wherein the predetermined number is configurable.
88. The method of any one of claims 69 to 87, wherein each block of a subset of the predetermined number of stacked transformer blocks has a predefined number of attention heads.
89. The method of any one of claims 69 to 88, wherein an Adam optimizer with a predetermined learning rate is used to train the model.
90. The method of any one of claims 69 to 89, wherein the model is trained for a predetermined number of steps with a predefined batch size.
91. A computer implemented method for spatially profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules, comprising:
(a) generating a measurement matrix for experimental measurement data of the input sample by aid of computing units (CUs), wherein each row of the measurement matrix represents G number of biological molecules, and each column of the measurement matrix represents N number of CUs, wherein values in the measurement matrix indicate Unique Molecular Identifier (UMI) counts that represent a number of interactions between the ith CU and the jth biological molecules;
(b) feeding the measurement matrix into a trained AI model, wherein the AI model has been trained by:
(1) obtaining a dataset comprising UMI vectors corresponding to a plurality of sample objects derived from single-cell RNA sequencing (scRNA-seq) samples;
(2) generating a plurality of training inputs from the dataset, wherein each training input comprises a UMI matrix representing UMI counts for number of interactions between a predetermined number of CUs and a plurality of biological molecules, and each training input is associated with an adjacency matrix representing known proximal relationships among CUs; and
(3) training the model by adjusting model parameters to match model outputs to the adjacency matrix associated with the plurality of training inputs;
(c) receiving an output matrix from the trained AI model, wherein the output matrix represents a spatial relationship among CUs, wherein the spatial relationship among CUs identifies proximal CUs that are proximal to one another; and
(d) aggregating the plurality of biological molecules bound to the proximal CUs to spatially profile the input sample.
92. A computer program product for training a model, wherein the computer program product comprises a non-transient machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform the method of any one of claims 47-91.
93. A computer-implemented system for training a model comprising:
(a) at least one programmable processor; and
(b) a machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one programmable processor to perform the method of any one of claims 47-91.
94. A system for spatial profiling an input sample, the system comprising:
(a) the input sample comprising a plurality of sample objects comprising a plurality of biological molecules;
(b) a plurality of CUs, wherein each CU of the plurality of CUs is engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface;
(c) a plurality of molecular tags; and
(d) the computer implemented method of any one of claims 47-93.
95. The system of claim 94, further comprising a sequencing method.
96. A kit for spatial profiling an input sample comprising a plurality of sample objects comprising a plurality of biological molecules, the kit comprising:
(a) a plurality of CUs, wherein each CU of the plurality of CUs is engineered to display a first set of at least one surface-bound entity (SBE) and a second set of at least one SBE on its surface;
(b) a plurality of molecular tags; and
(c) an instruction for use of the kit.
97. The kit of claim 95, further comprising an instruction for analysis using the computer implemented method of any one of claims 47-93.
98. A method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample, comprising:
(a) contacting a plurality of sample objects derived from the input sample with a plurality of computing units (CUs), wherein a sample object of the plurality of sample objects comprises a plurality of biological molecules, wherein each CU of the plurality of CUs is engineered to interact with the plurality of sample objects and display a first set of at least one surface-bound entity (SBE) on its surface only upon interacting with the plurality of sample objects, and wherein the first set of at least one SBE is capable of binding to the plurality of biological molecules;
(b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules are released from the plurality of sample objects and bind to the first set of at least one SBE; and
(c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by measuring an amount of the plurality of biological molecules bound to the first set of at least one SBE, wherein the amount is indicative of the characteristic of the input sample.
99. The method of claim 98, wherein a first CU of the plurality of CUs interacts with a first sample object of the plurality of sample objects a first number of times, the first CU interacts with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs interacts with the first sample object a third number of times, and the second CU interacts with the second sample object a fourth number of times.
100. The method of any one of claims 98-99, wherein the plurality of biological molecules are assigned to one or more hash groups using the first set of at least one SBE.
101. The method of any one of claims 99-100, wherein the first number, the second number, the third number, and the fourth number of interactions are utilized to evaluate the one or more target specific output signals associated with the plurality of sample objects.
102. A method of detecting presence of one or more target specific output signals indicative of a characteristic of an input sample via multiplexed hashing, comprising:
(a) contacting a plurality of sample objects derived from the input sample with a plurality of CUs, wherein a sample object of the plurality of sample objects comprises a plurality of biological molecules, wherein each CU of the plurality of CUs is engineered to interact with the plurality of sample objects and display a first set of at least one SBE on its surface, wherein the plurality of CUs are partitioned, such that a first CU of the plurality of CUs interacts with a first sample object of the plurality of sample objects a first number of times, the first CU interacts with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs interacts with the first sample object a third number of times, and the second CU interacts with the second sample object a fourth number of times, and wherein the first set of at least one SBE is capable of binding to the plurality of biological molecules;
(b) lysing the plurality of sample objects to release the plurality of biological molecules such that the plurality of biological molecules is released from the sample object and binds to the first set of at least one SBE; and
(c) detecting presence of the one or more target specific output signals indicative of the characteristic of the input sample by i. assigning the plurality of biological molecules to one or more hash groups using the first set of at least one SBE, and ii. utilizing the first number, the second number, the third number, and the fourth number of interactions to evaluate the one or more target specific output signals associated with the plurality of sample objects.
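One way to picture how the four interaction numbers of claim 102 could be used is a linear-mixing sketch. It assumes, purely for illustration, that the signal captured by a CU scales linearly with the number of times it interacts with each sample object; the claims do not state this model, and the names `hashed_measurements` and `unmix_2x2` are hypothetical.

```python
def hashed_measurements(profiles, interactions):
    """Forward model (assumed): each CU collects molecules in proportion to
    how many times it interacted with each sample object.

    profiles[obj][mol]:    amount of molecule `mol` in sample object `obj`
    interactions[cu][obj]: number of times CU `cu` interacted with `obj`
    """
    n_mol = len(profiles[0])
    return [
        [sum(n * profiles[obj][mol] for obj, n in enumerate(row))
         for mol in range(n_mol)]
        for row in interactions
    ]


def unmix_2x2(measured, interactions):
    """Recover per-object profiles for two CUs and two sample objects by
    inverting the 2x2 interaction matrix (requires a non-zero determinant)."""
    (a, b), (c, d) = interactions
    det = a * d - b * c
    if det == 0:
        raise ValueError("interaction counts do not distinguish the objects")
    n_mol = len(measured[0])
    first = [(d * measured[0][m] - b * measured[1][m]) / det for m in range(n_mol)]
    second = [(a * measured[1][m] - c * measured[0][m]) / det for m in range(n_mol)]
    return [first, second]


profiles = [[10, 0], [0, 4]]     # toy sample objects, two molecules each
interactions = [[2, 1], [1, 3]]  # first, second, third, fourth numbers
measured = hashed_measurements(profiles, interactions)
recovered = unmix_2x2(measured, interactions)
```

Under this assumption the per-object profiles are recoverable whenever the two CUs interact with the two sample objects in different proportions, which is why identical interaction counts for both CUs raise an error here.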
103. The method of any one of claims 98-102, wherein at least two of the plurality of sample objects are bound to each other.
104. The method of claim 103, wherein the each CU interacting with the plurality of sample objects depends on the binding of the at least two of the plurality of sample objects.
105. The method of any one of claims 98-104, wherein the plurality of biological molecules bound to the first set of at least one SBE is washed to remove any unbound biological molecules before step (c).
106. The method of any one of claims 99-105, wherein the first CU displays the first set of at least one SBE that is different from what the second CU displays on its surface.
107. The method of any one of claims 99-106, wherein at least one of the first number, the second number, the third number, and the fourth number is one.
108. The method of any one of claims 99-106, wherein at least one of the first number, the second number, the third number, and the fourth number is greater than one.
109. The method of any one of claims 99-106, wherein at least two of the first number, the second number, the third number, and the fourth number are the same.
110. The method of any one of claims 99-106, wherein at least two of the first number, the second number, the third number, and the fourth number are different.
111. The method of any one of claims 99-110, wherein the first number and the third number are indicative of a characteristic of the first sample object, and the second number and the fourth number are indicative of a characteristic of the second sample object.
112. The method of any one of claims 98-111, wherein a plurality of molecular tags is added after lysing in step (b).
113. The method of any one of claims 98-112, wherein the each CU of the plurality of CUs further displays a second set of at least one SBE.
114. The method of any one of claims 112-113, wherein the plurality of molecular tags associate with the second set of at least one SBE.
115. The method of any one of claims 112-114, wherein the second set of at least one SBE is incapable of binding to the plurality of biological molecules.
116. The method of claim 114, wherein the plurality of molecular tags that do not associate with the second set of at least one SBE are removed by washing.
117. The method of any one of claims 112-116, wherein a first molecular tag of the plurality of molecular tags associates uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags associates uniquely with a second SBE of the second set of at least one SBE.
118. The method of any one of claims 112-117, wherein the plurality of molecular tags and the second set of at least one SBE are associated either before or after the each CU interacts with the sample object.
119. The method of any one of claims 112-118, wherein a molecular tag of the plurality of molecular tags comprises a hash element and a priming element.
120. The method of claim 119, wherein the priming element is a random N-mer.
121. The method of claim 119, wherein the hash element comprises a unique molecular identifier (UMI).
122. The method of any one of claims 119-121, wherein the molecular tag further comprises a recognition element.
123. The method of any one of claims 119-122, wherein the molecular tag further comprises a sequencing element.
124. The method of any one of claims 119-123, wherein the molecular tag further comprises a unique protein binding domain.
125. The method of claim 124, wherein the unique protein binding domain is an antibody binding domain.
126. The method of claim 125, wherein the antibody binding domain is protein G.
127. The method of any one of claims 98-126, wherein the plurality of biological molecules is RNA or protein.
128. The method of claim 127, wherein the plurality of biological molecules is RNA.
129. The method of claim 128, further comprising
(1) poly(A) priming of the RNA to the priming element of the molecular tag, and
(2) reverse transcribing the RNA; after step (b) and before step (c).
130. The method of claim 127, wherein the plurality of biological molecules is protein.
131. The method of claim 130, wherein the protein comprises a barcode.
132. The method of claim 131, wherein the protein is an antibody with the barcode.
133. The method of any one of claims 131-132, further comprising
(1) capturing the protein via the sequencing element or the unique protein binding domain,
(2) priming the barcode via the priming element, and
(3) extending the priming element with a strand displacing polymerase; after step (b) and before step (c).
134. The method of any one of claims 131-132, further comprising i. capturing the protein via the unique protein binding domain, ii. releasing the protein from the molecular tag; after step (b) and before step (c).
135. The method of any one of claims 133-134, wherein a protein-molecular tag complex is optionally stabilized via crosslinking.
136. The method of claim 135, wherein the cross-linking is via an amine-reactive cross-linker.
137. The method of any one of claims 112-136, wherein the measuring in step (c) further comprises determining an amount of the molecular tag associated with the plurality of biological molecules.
138. The method of claim 137, wherein the determining is performed using qPCR, sequencing, gel electrophoresis, isothermal amplification, ELISA, or mass spectrometry.
139. The method of claim 138, wherein the sequencing is Next-Generation sequencing or Sanger sequencing.
140. The method of any one of claims 112-139, further comprising (3) computing the amount of the molecular tag such that the plurality of biological molecules in the first sample object can be differentiated from the plurality of biological molecules in the second sample object.
141. The method of any one of claims 113-140, wherein the plurality of sample objects in contact with the plurality of CUs in step (a) is incubated for a sufficient amount of time in an incubator for the each CU to display the first set of at least one SBE and the second set of at least one SBE on its surface.
142. The method of any one of claims 98-141, wherein the plurality of sample objects in contact with the plurality of CUs is incubated in a crowding agent.
143. The method of claim 142, wherein the crowding agent is a hydrogel.
144. The method of any one of claims 98-143, wherein the interaction between the plurality of CUs and the plurality of sample objects comprises at least one logical operator module.
145. The method of claim 144, wherein a logical operator module of the at least one logical operator module comprises the sample object and two or more CUs of the plurality of CUs.
146. The method of claim 144 or 145, wherein the at least one logical operator module generates one or more output signals.
147. The method of any one of claims 144-146, wherein the at least one logical operator module comprises a YES gate, an AND gate, a NAND gate, an OR gate, a NOR gate, a XOR gate, a XNOR gate, a NOT gate, or any combination thereof, wherein the two or more CUs comprise a first CU and a second CU.
148. The method of claim 147, wherein the YES gate comprises generating the one or more output signals only when both the first CU and the second CU are bound to the sample object.
149. The method of claim 147, wherein the AND gate comprises generating the one or more output signals only when both the first CU and the second CU are bound to the sample object.
150. The method of claim 147, wherein the NAND gate comprises suppressing or diminishing the one or more output signals when both the first CU and the second CU are bound to the sample object.
151. The method of claim 147, wherein the OR gate comprises generating the one or more output signals when either the first CU or the second CU or when both the first CU and the second CU are bound to the sample object.
152. The method of claim 147, wherein the NOR gate comprises generating the one or more output signals when both the first CU and the second CU are not bound to the sample object.
153. The method of claim 147, wherein the XOR gate comprises generating the one or more output signals when either the first CU or the second CU but not both CUs are bound to the sample object.
154. The method of claim 147, wherein the XNOR gate comprises generating the one or more output signals when either both the first CU and the second CU or when both the first CU and the second CU are bound to the sample object.
155. The method of claim 147, wherein the NOT gate comprises suppressing or diminishing the one or more output signals when the first CU is bound to the sample object.
156. The method of any one of claims 146-155, wherein the one or more output signals is display of the first set of at least one SBE.
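The gate behaviours recited in claims 147-155 can be tabulated with a short sketch. It uses the conventional Boolean definitions of each gate, with the two inputs standing for "first CU bound" and "second CU bound" and the output standing for "output signal generated"; this mapping and the names `GATES` and `truth_table` are illustrative assumptions, and where the wording of an individual claim differs from the textbook definition, the claim wording governs.

```python
# Inputs: a = first CU bound to the sample object, b = second CU bound.
# Output: True if the output signal (e.g. display of the first set of SBE)
# is generated. Textbook gate definitions are assumed throughout.
GATES = {
    "YES":  lambda a, b: a,              # follows the first input only
    "NOT":  lambda a, b: not a,          # suppressed when the first CU is bound
    "AND":  lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
    "OR":   lambda a, b: a or b,
    "NOR":  lambda a, b: not (a or b),
    "XOR":  lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
}


def truth_table(name):
    """Return (a, b, output) for all four binding states of two CUs."""
    gate = GATES[name]
    return [(a, b, bool(gate(a, b)))
            for a in (False, True)
            for b in (False, True)]


for name in GATES:
    outputs = ["1" if out else "0" for _, _, out in truth_table(name)]
    print(f"{name:<5}", " ".join(outputs))  # columns: 00 01 10 11
```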
157. A system for biological computing, comprising:
(a) a plurality of sample objects, wherein a sample object of the plurality of sample objects comprises a plurality of biological molecules; and
(b) a plurality of CUs, wherein a CU of the plurality of CUs is engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object.
158. A system for biological computing, comprising:
(a) a plurality of sample objects, wherein a sample object of the plurality of sample objects comprises a plurality of biological molecules; and
(b) a plurality of CUs, wherein a CU of the plurality of CUs is engineered to interact with the sample object and display a first set of at least one SBE.
159. The system of claim 157 or 158, wherein a first CU of the plurality of CUs interacts with a first sample object of the plurality of sample objects a first number of times, the first CU interacts with a second sample object of the plurality of sample objects a second number of times, a second CU of the plurality of CUs interacts with the first sample object a third number of times, and the second CU interacts with the second sample object a fourth number of times.
160. The system of claim 159, wherein the first number and the third number are indicative of a characteristic of the first sample object, and the second number and the fourth number are indicative of a characteristic of the second sample object.
161. The system of any one of claims 157-160, wherein the CU of the plurality of CUs further displays a second set of at least one SBE.
162. The system of any one of claims 157-161, further comprising (c) a plurality of molecular tags.
163. The system of any one of claims 157-162, wherein the plurality of molecular tags associate with the second set of at least one SBE.
164. The system of any one of claims 157-163, wherein a molecular tag of the plurality of molecular tags comprises a hash element and a priming element.
165. The system of claim 164, wherein the priming element is a random N-mer.
166. The system of any one of claims 157-165, wherein the molecular tag further comprises a recognition element.
167. The system of any one of claims 164-166, wherein the molecular tag further comprises a sequencing element.
168. The system of any one of claims 164-167, wherein the molecular tag further comprises a unique protein binding domain.
169. The system of claim 168, wherein the unique protein binding domain is an antibody binding domain.
170. The system of claim 169, wherein the antibody binding domain is protein G.
171. The system of any one of claims 157-170, further comprising a strand displacing polymerase.
172. The system of any one of claims 161-171, wherein the CU of the plurality of CUs is engineered to display the first set of at least one SBE and the second set of at least one SBE.
173. The system of any one of claims 161-172, wherein a first molecular tag of the plurality of molecular tags associates uniquely with a first SBE of the second set of at least one SBE, and a second molecular tag of the plurality of molecular tags associates uniquely with a second SBE of the second set of at least one SBE.
174. A kit for biological computing, comprising
(a) a plurality of CUs, wherein a CU of the plurality of CUs is engineered to interact with the sample object and display a first set of at least one SBE upon interacting with the sample object; and
(b) an instruction for use of the kit.
175. A kit for biological computing, comprising
(a) a plurality of CUs, wherein a CU of the plurality of CUs is engineered to interact with the sample object and display a first set of at least one SBE; and
(b) an instruction for use of the kit.
176. The kit of claim 174 or 175, wherein the CU of the plurality of CUs is further engineered to display a second set of at least one SBE.
177. The kit of any one of claims 174-176, further comprising (c) a plurality of molecular tags.
178. The kit of claim 177, wherein a molecular tag of the plurality of molecular tags comprises a hash element and a priming element.
179. The kit of claim 178, wherein the priming element is a random N-mer.
180. The kit of any one of claims 178-179, wherein the molecular tag further comprises a recognition element.
181. The kit of any one of claims 178-180, wherein the molecular tag further comprises a sequencing element.
182. The kit of any one of claims 178-181, wherein the molecular tag further comprises a unique protein binding domain.
183. The kit of claim 182, wherein the unique protein binding domain is an antibody binding domain.
184. The kit of claim 183, wherein the antibody binding domain is protein G.
185. The kit of any one of claims 174-184, further comprising a strand displacing polymerase.
186. A computer-implemented method, comprising:
(a) compiling a measurement matrix Ml from measurements of biological molecules derived from a set of sample objects interacting with CUs, wherein the measurement matrix Ml is partitioned by molecular tags assigned to the biological molecules, wherein the molecular tags are assigned in sufficiently different proportions to biological molecules derived from different sample objects, wherein each column of the matrix Ml represents measurements associated with a same molecular tag, and each row represents measurements associated with a same biological molecule; and
(b) generating a profile of a subset of sample objects based at least in part on the measurement matrix Ml.
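The matrix layout of step (a), with rows as biological molecules and columns as molecular tags, can be sketched as follows. The input format (a list of `(molecule, tag, count)` records), the function name `compile_measurement_matrix`, and the molecule and tag labels are assumptions made for illustration only.

```python
def compile_measurement_matrix(records):
    """Build a molecules-by-tags count matrix from per-read records.

    records: list of (molecule, tag, count) tuples (assumed input format)
    Returns (molecules, tags, matrix) where matrix[i][j] is the total count
    of molecules[i] observed with tags[j].
    """
    molecules = sorted({molecule for molecule, _, _ in records})
    tags = sorted({tag for _, tag, _ in records})
    row = {molecule: i for i, molecule in enumerate(molecules)}
    col = {tag: j for j, tag in enumerate(tags)}
    matrix = [[0] * len(tags) for _ in molecules]
    for molecule, tag, count in records:
        matrix[row[molecule]][col[tag]] += count  # accumulate repeated records
    return molecules, tags, matrix


records = [
    ("GAPDH", "tagA", 5),
    ("GAPDH", "tagB", 1),
    ("CD3E", "tagA", 2),
    ("GAPDH", "tagA", 3),
]
molecules, tags, matrix = compile_measurement_matrix(records)
```

Each column of the result then holds all measurements sharing one molecular tag, and each row holds all measurements of one biological molecule.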
187. The method of claim 186, wherein the subset of sample objects is class B sample objects denoted by matrix B, and wherein the class B sample objects is represented by a set of vectors $b_i$, $\{b_1, b_2, \ldots, b_k\}$ ($i = 1, 2, 3, \ldots, k$), and where the vector $b_i = (r_{1i}, r_{2i}, \ldots, r_{mi})$ is indicative of typical amounts for some of biological molecules for class B sample objects.
188. The method of claim 187, wherein the set of sample objects comprises a plurality of classes of sample objects, wherein the plurality of classes is denoted by $B_i$, $\{B_1, B_2, \ldots, B_k\}$, and wherein each class $B_i$ of sample objects is represented by a set of vectors $\{b_{i1}, b_{i2}, \ldots, b_{ik_i}\}$ and where the vector $b_{ij} = (r_{1ij}, r_{2ij}, \ldots, r_{mij})$ is indicative of typical amounts for some of the biological molecules for class $B_i$ sample objects.
189. The method of any one of claims 186-188, wherein the profile of a subset of sample objects is estimated based on the measurements with aid of a machine learning algorithm.
190. The method of claim 189, wherein the machine learning algorithm comprises neural network algorithm.
191. The method of claim 188, wherein computing the profile of a subset of sample objects further comprises:
(a) computing proportions in which each molecular tag of the molecular tags is assigned to the biological molecules derived from each sample object class Bi, wherein the proportions are denoted by an optimal transformation matrix A; and
(b) computing the profile of a subset of sample objects based at least in part on the measurement matrix Ml and the optimal transformation matrix A.
192. The method of claim 191, wherein computing the optimal transformation matrix A comprises an operation that utilizes an optimization algorithm to minimize the absolute difference between transformation matrix A-transformed matrix B and a truncated measurement matrix M.
193. The method of claim 192, wherein computing the matrix A comprises computing the matrix A by an optimization algorithm, wherein the optimization algorithm is a linear program that is defined as:
$$\text{minimize } \lvert M - BA \rvert \quad \text{subject to } A \geq 0, \quad \lVert A_i \rVert_1 = c, \quad i = 1, \ldots, n$$
wherein:
(a) matrix M is compiled from truncated hashed measurements, wherein the ith column of M is a vector of measurements associated with the same molecular tag, and the jth row of M corresponds to measurements of the jth biological molecule,
(b) matrix B is compiled from the class vectors $b_{ij}$, wherein columns of B correspond to the vectors and the jth row of B corresponds to the jth biological molecule,
(c) matrix A represents the optimization variable denoting the proportions in which each molecular tag is assigned to the biological molecules derived from each sample object class Bi, and
(d) optimization constraints are indicative of physical limitations, with the constraint $\lVert A_i \rVert_1 = c$ being optional and corresponding to a case where a number of measured sample objects is known.
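For readers who want to see why the optimization of claim 193 qualifies as a linear program, the following sketch rewrites it in standard epigraph form. It assumes that the objective is the entrywise sum of absolute differences between M and BA and that $A_i$ denotes the part of A associated with the ith molecular tag; the auxiliary matrix T is introduced here for illustration and does not appear in the claims.

```latex
\begin{aligned}
\min_{A,\,T}\quad & \sum_{j}\sum_{i} T_{ji} \\
\text{subject to}\quad
  & -T_{ji} \;\le\; \bigl(M - BA\bigr)_{ji} \;\le\; T_{ji}
      && \text{for all } j,\ i, \\
  & A \;\ge\; 0, \\
  & \mathbf{1}^{\top} A_i \;=\; c, \qquad i = 1, \ldots, n
      && \text{(optional)}.
\end{aligned}
```

At the optimum each $T_{ji}$ equals $\lvert (M - BA)_{ji} \rvert$, so the objective matches the absolute difference, and because $A \ge 0$ the 1-norm of $A_i$ reduces to the linear expression $\mathbf{1}^{\top} A_i$. Every constraint and the objective are therefore linear in the unknowns A and T.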
194. The method of any one of claims 186-193, wherein the profile is a transcriptomic profile, a proteomic profile, or a multiomic profile.
195. The method of any one of claims 186-194, wherein the profile includes probabilities that specific sample objects are bound to each other.
196. A computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising:
(a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules;
(b) utilizing a numerical classification engine stored in one or more memories of the one or more computing devices, to ascertain if the accessed measurements partially derive from a sample object of class Bi; and
(c) training the numerical classification engine via supervised learning, to classify the multiplexed hashed measurements into a plurality of classes; wherein the training data comprises multiplexed hashed measurements and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
197. A computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising:
(a) accessing, at one or more computing devices, the multiplexed hashed measurements of biological molecules;
(b) utilizing a numerical regression engine stored in one or more memories of the one or more computing devices, to determine the number of sample objects of class Bi that generated the accessed measurements; and
(c) training the numerical regression engine via supervised learning, to perform regression on the multiplexed hashed measurements; wherein the training data comprises multiplexed hashed measurements, the number and the classes of corresponding sample objects from which the measured biological molecules originate, and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
198. A computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising:
(a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules;
(b) determining, using a numerical classification engine stored in one or more memories of the one or more computing devices, whether the accessed measurements in part generate from a sample object described by a set of vectors {b1, ..., bm}; and
(c) training the numerical classification engine using supervised learning to classify multiplexed hashed measurements into two classes, true or false, based on training data comprising multiplexed hashed measurements and the representative vectors of sample objects from which the measured biological molecules originate; wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
199. A computer-implemented method for analyzing multiplexed hashed measurements of biological molecules, comprising:
(a) accessing, at one or more computing devices, multiplexed hashed measurements of biological molecules;
(b) determining, using a numerical regression engine stored in one or more memories of the one or more computing devices, the number of sample objects described by a set of vectors {b1, ..., bm} that in part generated the accessed measurements; and
(c) training the numerical classification engine using supervised learning, to perform a regression on the multiplexed hashed measurements; wherein training data comprises the multiplexed hashed measurements and the number and representative vectors of sample objects from which the measured biological molecules originate; and wherein the training data is obtained by either direct measurement or being synthesized from single cell molecular data.
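Claims 196-199 recite training data that may be synthesized from single cell molecular data. A minimal sketch of that idea follows, assuming a hashed measurement can be approximated by summing the molecule counts of a few randomly chosen single-cell profiles. The names `synthesize` and `predict_counts`, the toy profiles, and the use of a 1-nearest-neighbour lookup as a stand-in for the classification and regression engines are all illustrative assumptions, not the engines described in the claims.

```python
import random


def synthesize(cell_profiles, n_examples, max_cells, rng):
    """Create (measurement, label) pairs by summing single-cell profiles.

    cell_profiles: dict mapping class name -> list of per-cell count vectors
    label:         dict mapping class name -> number of cells of that class
                   that contributed to the synthetic measurement
    """
    classes = sorted(cell_profiles)
    n_mol = len(cell_profiles[classes[0]][0])
    examples = []
    for _ in range(n_examples):
        measurement = [0] * n_mol
        label = {c: 0 for c in classes}
        for _ in range(rng.randint(1, max_cells)):
            c = rng.choice(classes)
            cell = rng.choice(cell_profiles[c])
            measurement = [x + y for x, y in zip(measurement, cell)]
            label[c] += 1
        examples.append((measurement, label))
    return examples


def predict_counts(train, query):
    """Stand-in regression: return the label of the closest training example
    (squared Euclidean distance)."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(train, key=lambda example: dist(example[0], query))[1]


# Toy single-cell data: class B1 expresses molecule 0, class B2 molecule 1.
cell_profiles = {"B1": [[5, 0]], "B2": [[0, 3]]}
train = synthesize(cell_profiles, n_examples=50, max_cells=3,
                   rng=random.Random(0))

query = train[0][0]
counts = predict_counts(train, query)  # regression: cells per class
contains_b1 = counts["B1"] > 0         # classification: is class B1 present?
```

The same synthetic labels serve both tasks: the per-class cell counts are the regression target, and whether a count is non-zero gives the true or false classification target.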
PCT/US2024/052668 2023-10-24 2024-10-23 Biological computing methods and systems for analyzing biological units Pending WO2025090677A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363592687P 2023-10-24 2023-10-24
US63/592,687 2023-10-24
US202463666556P 2024-07-01 2024-07-01
US63/666,556 2024-07-01

Publications (2)

Publication Number Publication Date
WO2025090677A1 (en) 2025-05-01
WO2025090677A9 WO2025090677A9 (en) 2025-11-06

Family

ID=95516431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/052668 Pending WO2025090677A1 (en) 2023-10-24 2024-10-23 Biological computing methods and systems for analyzing biological units

Country Status (1)

Country Link
WO (1) WO2025090677A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019055852A1 (en) * 2017-09-15 2019-03-21 Apton Biosystems, Inc. Heterogeneous single cell profiling using molecular barcoding
US20210319279A1 (en) * 2018-09-07 2021-10-14 Xeno Cell Innovations s.r.o. Biological computing systems and methods for multivariate surface analysis and object detection
US20220145361A1 (en) * 2019-03-15 2022-05-12 10X Genomics, Inc. Methods for using spatial arrays for single cell sequencing
US20230113092A1 (en) * 2020-02-14 2023-04-13 Caris Mpi, Inc. Panomic genomic prevalence score

Also Published As

Publication number Publication date
WO2025090677A9 (en) 2025-11-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24883283

Country of ref document: EP

Kind code of ref document: A1