Development Roadmap (2024 Q4) #1487

Ying1123 · 2024-09-21T22:38:00Z

fengyang95 · 2024-09-22T02:02:41Z

Are there any plans to optimize long context latency?

lumiere-ml · 2024-10-17T02:24:33Z

Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I'm really interested in that.

tanzelin430 · 2024-10-17T11:58:58Z

I am interested in contributing to the P-D split inference architecture, and I have machines available to develop it on. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95

merrymercy · 2024-10-19T13:58:47Z

@lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.

zhyncs · 2024-10-20T06:01:03Z

@lumiere-ml @tanzelin430 You're welcome to join our Slack channel: https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

tanzelin430 · 2024-10-20T06:14:54Z

Thanks for the invitation, I am in Slack now. Looking forward to collaborating with you.

lumiere-ml · 2024-10-20T09:01:30Z

Thanks for your invitation!

Edenzzzz · 2024-11-11T03:30:14Z

@lumiere-ml @zhyncs I'm also very interested, could you share which channel you're using to discuss?
Perhaps we can combine radix tree prefix matching with P-D disaggregation similar to Mooncake?

Ying1123 changed the title ~~[WIP] Development Roadmap (2024 Q4)~~ Development Roadmap (2024 Q4) Sep 22, 2024

zhyncs pinned this issue Sep 22, 2024

zhyncs mentioned this issue Sep 22, 2024

[Feature] Are there plans to implement a prefill-decode split inference architecture? #1080

Closed

ByronHsu mentioned this issue Oct 4, 2024

Provide an offline engine API #1567

Merged

3 tasks

ByronHsu mentioned this issue Oct 15, 2024

Support vLLM-style rope flashinfer-ai/flashinfer#530

Closed

zhaochenyang20 mentioned this issue Oct 20, 2024

Add documentations for Installation #1733

Closed

3 tasks

zhyncs mentioned this issue Nov 1, 2024

Development Roadmap (2024 Q3) #634

Closed

29 tasks

liangzelang mentioned this issue Nov 15, 2024

[Feature] Expert parallelism support #1435

Open

2 tasks

Performance

Parallelism

Hardware Coverage

Model Coverage

LoRA support

LMCache Integration

Quantization

Server API

Observability

Others