Simatcher

Introduction

Simatcher is a lightweight NLP framework. It follows a data-processing pipeline design and incorporates LLM capabilities.

I drew on the design of the Rasa NLU module and added LangChain's LLM processing logic.

With this project, you can quickly build a personal knowledge base and use an LLM to summarize and reason about the results.

The LLMs you can use are those that LangChain supports. To add your own model, subclass LangChain's LLM class.
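As a rough illustration of adding your own model, the sketch below subclasses LangChain's `LLM` base class and implements its two required members, `_llm_type` and `_call`. `EchoLLM` and its `prefix` field are hypothetical names made up for this example, and the import path depends on the installed LangChain version, so the block tries two known locations and falls back to a stand-in base class so it can still be run without LangChain.

```python
from typing import Any, List, Optional

try:
    # Newer releases expose the base class from langchain_core.
    from langchain_core.language_models.llms import LLM
except ImportError:
    try:
        # Older langchain releases keep it here.
        from langchain.llms.base import LLM
    except ImportError:
        class LLM:  # stand-in so the sketch runs without langchain installed
            pass


class EchoLLM(LLM):
    """Hypothetical wrapper that returns the prompt with a prefix."""

    prefix: str = "echo: "

    @property
    def _llm_type(self) -> str:
        # Short identifier LangChain uses to label this model type.
        return "echo"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Any = None,
        **kwargs: Any,
    ) -> str:
        # A real wrapper would send `prompt` to a model endpoint here.
        return self.prefix + prompt


llm = EchoLLM()
print(llm._call("hello"))
```

In a real wrapper, `_call` would forward the prompt to your model's API and return the generated text as a string.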

Design

Engine

Intent

Component

Native components

├── demo2.py
├── demo.py
└── splitters
|   └── langchain_splitter
|               └── ChineseRecursiveTextSplitter (from langchain-chatglm)
|               └── TextSplitter
└── featurizers
|   └── bert_featurizer
|               └── all-MiniLM-L6-v2
|               └── sbert-chinese-general-v2
|               └── text2vec-base-chinese
└── classifiers
|   └── faiss
|         └── l2     
|         └── cosine
└── extractors
|   └── regex
└── refine
    └── chatglm  
    └── jarvis
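The classifiers entry in the tree lists two metrics, l2 and cosine. The pure-Python sketch below is not Simatcher's or FAISS's code; it only illustrates, on made-up two-dimensional vectors, how the two metrics can rank the same documents differently. All names in it (`rank_by_l2`, `rank_by_cosine`, the toy `docs`) are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_l2(query: Vector, docs: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    scored = [(name, l2_distance(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


def rank_by_cosine(query: Vector, docs: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings: one vector points the same way as the query but is far away,
# the other is close in space but points in a different direction.
docs = {"same_direction": [10.0, 0.0], "nearby": [1.0, 1.0]}
query = [1.0, 0.0]

print(rank_by_l2(query, docs, k=1)[0][0])      # nearby
print(rank_by_cosine(query, docs, k=1)[0][0])  # same_direction
```

L2 favours vectors that are close in absolute position, while cosine ignores vector length and compares direction only, which is why the top hit differs here.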

How to Use

Environment Installation

First, you need to download this repository:

git clone https://github.com/xiashuqin89/simatcher
cd simatcher

Then use pip to install the dependencies:

pip install -r requirements.txt

Download the encoder models

mkdir model && cd model
# sbert-chinese-general-v2
git clone https://huggingface.co/DMetaSoul/sbert-chinese-general-v2
# text2vec-base-chinese
git clone https://huggingface.co/shibing624/text2vec-base-chinese

Build your pipeline

Create a config

{
    "language": "zh",
    "training_data": "",
    "pipeline": [
        {
            "name": "LangchainSplitter",
            "classifier_file": "LangchainSplitter.pkl",
            "class": "simatcher.nlp.splitters.LangchainSplitter",
            "chunk_size": 100,
            "chunk_overlap": 0,
            "zh_title_enhance": False
        },
        {
            "name": "LangchainFeaturizer",
            "classifier_file": "LangchainFeaturizer.pkl",
            "class": "simatcher.nlp.featurizers.LangchainFeaturizer",
            "pre_model": "text2vec-base-chinese"
        },
        {
            "name": "LangchainClassifier",
            "classifier_file": "LangchainClassifier.pkl",
            "class": "simatcher.nlp.classifiers.LangchainClassifier",
            "knowledge_base_id": "default",
            "top_k": 4,
            "score_threshold": 1,
            "with_score": True
        },
        {
            "name": "SummaryRefiner",
            "class": "simatcher.nlp.refiners.SummaryRefiner",
            "llm_model": "chatglm2-6b",
            "endpoint_url": "http://127.0.0.1",
            "api_key": "xxx",
            "model": "xxxx",
            "history": [],
        }
    ],
    "version": "0.0.0"
}
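Each pipeline entry above names its component through a dotted "class" path, and the remaining keys look like component parameters. The README does not show how Simatcher turns these entries into objects, so the following is only a sketch of one common approach, resolving the dotted path with `importlib`. The helper names `resolve_class` and `build_pipeline` are hypothetical, and the demo config points at standard-library classes so the block runs without Simatcher installed. Because the config uses `True`/`False` and a trailing comma, the sketch treats it as a Python dict, not strict JSON.

```python
import importlib
from typing import Any, Dict, List, Tuple


def resolve_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def build_pipeline(config: Dict[str, Any]) -> List[Tuple[str, type, Dict[str, Any]]]:
    """Return (name, class, params) for every entry under config['pipeline']."""
    components = []
    for entry in config["pipeline"]:
        # Everything except the bookkeeping keys is passed on as a parameter.
        params = {k: v for k, v in entry.items() if k not in ("name", "class")}
        components.append((entry["name"], resolve_class(entry["class"]), params))
    return components


# Stand-in config with the same shape as the one above, but using
# standard-library classes in place of simatcher components.
demo_config = {
    "language": "zh",
    "pipeline": [
        {"name": "Counter", "class": "collections.Counter"},
        {"name": "Wrapper", "class": "textwrap.TextWrapper", "width": 100},
    ],
    "version": "0.0.0",
}

pipeline = build_pipeline(demo_config)
for name, cls, params in pipeline:
    component = cls(**params)  # instantiate with the entry's parameters
    print(name, type(component).__name__)
```

With the real config, an entry such as "simatcher.nlp.splitters.LangchainSplitter" would resolve the same way, provided the package is importable.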
