PaddlePaddle EDL: Elastic Deep Learning

While many hardware and software manufacturers are working on improving the running time of deep learning jobs, EDL optimizes

the global utilization of the cluster, and
the waiting time of job submitters.

For more about EDL, please refer to this invited blog post on the official Kubernetes blog.

EDL includes two parts:

a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and
work on making PaddlePaddle a fault-tolerant deep learning framework.

This directory contains the Kubernetes controller. For more information about fault tolerance, please refer to the design.
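To illustrate what "elastic scheduling" means here, the following is a toy Go sketch. It is not the EDL controller's actual algorithm or API. It assumes each job declares a hypothetical minimum and maximum number of trainers (`MinTrainers`, `MaxTrainers`) and that the cluster exposes a fixed number of free trainer slots. The sketch works in two passes. First, each job in queue order is admitted at its minimum if that minimum still fits. Then any leftover slots are handed out round-robin until every admitted job reaches its maximum. A job whose minimum does not fit gets 0 trainers and keeps waiting. This is how elasticity can raise cluster utilization and shorten queueing at the same time.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of an elastic training job:
// it can run with anywhere between MinTrainers and MaxTrainers trainers.
type Job struct {
	Name        string
	MinTrainers int
	MaxTrainers int
}

// planTrainers assigns a trainer count to every job given `capacity` free
// trainer slots. It is an illustrative policy, not EDL's implementation.
func planTrainers(jobs []Job, capacity int) map[string]int {
	plan := make(map[string]int, len(jobs))
	remaining := capacity

	// Pass 1: in queue order, admit each job at its minimum if it still fits;
	// jobs whose minimum does not fit get 0 trainers and keep waiting.
	var admitted []Job
	for _, j := range jobs {
		if j.MinTrainers <= remaining {
			plan[j.Name] = j.MinTrainers
			remaining -= j.MinTrainers
			admitted = append(admitted, j)
		} else {
			plan[j.Name] = 0
		}
	}

	// Pass 2: hand out leftover slots one at a time, round-robin,
	// to admitted jobs that are still below their maximum.
	for remaining > 0 {
		progressed := false
		for _, j := range admitted {
			if remaining == 0 {
				break
			}
			if plan[j.Name] < j.MaxTrainers {
				plan[j.Name]++
				remaining--
				progressed = true
			}
		}
		if !progressed {
			break // every admitted job is already at its maximum
		}
	}
	return plan
}

func main() {
	jobs := []Job{
		{Name: "job-a", MinTrainers: 2, MaxTrainers: 6},
		{Name: "job-b", MinTrainers: 1, MaxTrainers: 3},
		{Name: "job-c", MinTrainers: 4, MaxTrainers: 8},
	}
	plan := planTrainers(jobs, 8)
	for _, j := range jobs {
		fmt.Printf("%s: %d trainer(s)\n", j.Name, plan[j.Name])
	}
}
```

With 8 free slots, all three jobs are admitted at their minimums (7 slots), and the last slot goes to job-a. The result is job-a 3, job-b 1, job-c 4. Rerunning the same function after slots are freed or taken lets running jobs grow or shrink, which is the essence of elastic scheduling. A real controller must also create and delete pods and rely on the framework's fault tolerance to survive trainers being removed.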

We deployed EDL on a real Kubernetes cluster, dlnel.com, which is open to graduate students of Tsinghua University. The report of EDL's performance tests on this cluster is here.

Tutorials

Design Docs

Future

Resource Adjustments by EDL
Support Fault-Tolerant Distributed Training in PaddlePaddle Fluid.

FAQ

TBD

License

PaddlePaddle EDL is provided under the Apache-2.0 license.
