Milvus Build Up the Unstructured Data Service - Jun Gu 09.2020 - ITU

Milvus
Build Up the Unstructured Data Service

               Jun Gu
               09.2020

                                         © 2020 Zilliz. All rights reserved.
Speaker bio

              Jun Gu
              Database engineer, SME

                       Voting member in Technical Advisory Council (TAC)
                       Partner, Chief Evangelist

              Career history

              Education

Zilliz: Who we are

• Open source software company based in Shanghai
• Mission: Reinvent data science
• Main contributor to the Milvus project

Unlock the treasure of unstructured data
AI algorithms transform images, video, voice, and natural language into vectors, which lets unstructured data be understood and used at scale.

Unstructured data → Deep learning models → Vectors → Knowledge, insight, $
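The slide leaves the deep learning model abstract. The sketch below uses feature hashing, a bag-of-words trick and not a deep model, purely to show the shape of the step: variable-length text goes in and a fixed-length vector comes out. `embed_text` and its 16-dimension default are hypothetical choices for illustration. A real pipeline would call a trained encoder here.

```python
import math
import zlib
from typing import List


def embed_text(text: str, dim: int = 16) -> List[float]:
    """Toy encoder (hypothetical): hashed bag-of-words, L2-normalized.

    Stands in for a deep learning model and only illustrates the
    transformation: variable-length text -> fixed-length vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives the same bucket on every run, unlike Python's hash().
        bucket = zlib.crc32(token.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


v = embed_text("Milvus stores vectors")
print(len(v))  # 16
```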

The flow-based AI applications
Sample pipelines for video processing:
• Video → extract frames → image → visual model (e.g. VGG) → visual vectors
• Video → voice model → voice vectors
• Video → extract tags → attributes

The most popular way
• Flexible
• Easy to compose, web-based UI
• Sample pipelines

The challenge
• Data fragmentation
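One pipeline fan-out can be sketched as follows, assuming the frames and audio track have already been extracted from the video. `VideoRecord`, `process_video`, and the model callables are hypothetical, and the lambdas are stubs so the sketch runs without real inference. Keeping all three branch outputs under one video id is one way to counter the fragmentation the slide warns about.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class VideoRecord:
    """All outputs for one video, kept under a single id (hypothetical schema)."""
    video_id: str
    visual: List[Vector] = field(default_factory=list)       # one vector per frame
    voice: Vector = field(default_factory=list)               # one vector for the audio
    attributes: Dict[str, str] = field(default_factory=dict)  # extracted tags


def process_video(
    video_id: str,
    frames: List[bytes],
    audio: bytes,
    visual_model: Callable[[bytes], Vector],
    voice_model: Callable[[bytes], Vector],
    tagger: Callable[[List[bytes]], Dict[str, str]],
) -> VideoRecord:
    """Run the three branches of the sample video pipeline."""
    return VideoRecord(
        video_id=video_id,
        visual=[visual_model(frame) for frame in frames],  # frames -> visual model
        voice=voice_model(audio),                          # audio -> voice model
        attributes=tagger(frames),                         # frames -> tags
    )


# Stub models so the sketch runs without real inference.
record = process_video(
    "v1",
    frames=[b"ab", b"abc"],
    audio=b"xyz",
    visual_model=lambda f: [float(len(f))],
    voice_model=lambda a: [float(len(a))],
    tagger=lambda fs: {"frame_count": str(len(fs))},
)
print(record.visual, record.voice, record.attributes)
```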
The unstructured data service (UDS) for AI

Input: unstructured data (image, video, voice, natural language), submitted for search or insert

Inference layer
• Model inference runtime: TensorRT, ONNX RT, TFRT
• On insert, the original objects are also stored in object storage

Data service layer
• Milvus: vectors (high dense + sparse), attributes (will be in 0.11), multimodal (will be in 0.14), scoring (will be in 0.16)
• Object storage

Output: result set (image, video, voice, natural language)
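The insert and search paths can be sketched in memory. `ToyUDS` is hypothetical. One dict stands in for Milvus, another for object storage, and any callable that returns a vector stands in for the inference runtime. Search here is an exact scan, whereas Milvus uses approximate indexes.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


class ToyUDS:
    """In-memory sketch of the UDS insert/search paths (hypothetical class)."""

    def __init__(self, encoder: Callable[[str], Vector]) -> None:
        self.encoder = encoder                # stands in for the inference runtime
        self.vectors: Dict[int, Vector] = {}  # stands in for Milvus
        self.objects: Dict[int, str] = {}     # stands in for object storage
        self._next_id = 0

    def insert(self, obj: str) -> int:
        """Store the raw object and its vector under the same id."""
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = obj
        self.vectors[oid] = self.encoder(obj)
        return oid

    def search(self, query: str, k: int) -> List[str]:
        """Encode the query, rank stored vectors by distance, return the objects."""
        q = self.encoder(query)
        ranked = sorted(self.vectors, key=lambda oid: math.dist(q, self.vectors[oid]))
        return [self.objects[oid] for oid in ranked[:k]]


uds = ToyUDS(encoder=lambda s: [float(len(s))])  # toy encoder: string length
for text in ["a", "abcd", "abcdefgh"]:
    uds.insert(text)
print(uds.search("abc", 2))  # ['abcd', 'a']
```

Because the vector and the object share one id, the search result maps straight back to the original data. The slide's result set depends on that link.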
Why Milvus: Vectors are different
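• Operation: numbers support arithmetic operations; vectors are compared by similarity, e.g. Euclidean distance:
  $$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
• Comparison: numbers are compared directly ($a$ vs. $b$); similarity comparison returns the vectors $B$ closest to $A$:
  $$\mathrm{TopK}(A) = \arg\min_{B} d(A, B)$$
• Organization: numbers can be organized into nested ordered ranges, e.g. 1–10 splits into 1–5 and 6–10, which split into 1 2 3 4 5 and 6 7 8 9 10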
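Both formulas translate directly into code. The sketch below computes the Euclidean distance $d(A, B)$ and a brute-force TopK. `euclidean` and `top_k` are illustrative names. At scale, vector indexes exist precisely to avoid an exhaustive scan like this one.

```python
import heapq
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """d(A, B) = sqrt(sum_i (a_i - b_i)^2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension n")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Ids of the k vectors closest to `query`, by exhaustive scan."""
    return heapq.nsmallest(k, vectors, key=lambda vid: euclidean(query, vectors[vid]))


vectors = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(top_k([0.0, 0.0], vectors, 2))      # ['a', 'c']
```

Unlike numbers, vectors have no natural total order, so the range-splitting organization shown for numbers does not carry over directly.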

Milvus: The big picture

• Client access: SDK / Web API, which takes query objects and insert objects and returns the top-K result
• Query path: query scheduler, segment selection, processing engine, reducer, result
• Processing engine: ANNS (Mi-FAISS, Mi-Annoy), collaborative query on tag/structured data, multi-modal scoring (app specific)
• Buffer pool holding index files
• Storage tier: segments, metadata, index files, and new index files
• Various processors: x86, ARM, GPU, Kunpeng, Loongson, RISC-V

Processor support
• x86: supports SSE4.2, AVX2, AVX512
• GPU: Pascal microarchitecture or later, CUDA 10.0 or later
• Arm: requires aarch64
• Kunpeng: tested on Kunpeng 920 with CentOS 7.x
• Loongson: tested on Loongson in a Docker container
• RISC-V: in early development
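The reducer's role can be sketched as a merge of per-segment results. The sketch assumes each segment returns its own top-K list of `(distance, id)` pairs sorted by distance. Segment search here is a brute-force scan standing in for Mi-FAISS or Mi-Annoy, and the function names are hypothetical.

```python
import heapq
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, vector id)


def search_segment(segment: Dict[int, Sequence[float]], query: Sequence[float], k: int) -> List[Hit]:
    """Brute-force top-k inside one segment, sorted by ascending distance."""
    hits = ((math.dist(query, vec), vid) for vid, vec in segment.items())
    return heapq.nsmallest(k, hits)


def reduce_hits(per_segment: List[List[Hit]], k: int) -> List[Hit]:
    """Reducer: merge the sorted per-segment lists and keep the global top-k."""
    return list(itertools.islice(heapq.merge(*per_segment), k))


def search(segments: List[Dict[int, Sequence[float]]], query: Sequence[float], k: int) -> List[Hit]:
    # Segments are independent, so they could also be searched in parallel.
    return reduce_hits([search_segment(s, query, k) for s in segments], k)


segments = [{1: [0.0], 2: [5.0]}, {3: [1.0], 4: [9.0]}]
print(search(segments, [0.0], 2))  # [(0.0, 1), (1.0, 3)]
```

Each segment only has to return k hits, because the global top-k can never contain more than k results from any single segment.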
Milvus: Distributed deployment

Milvus: The ANN benchmark

      Milvus: 0.8.0
      OS: Ubuntu 18.04
      Instance: AWS c5.4xlarge (16 vCPUs, 32 GB), Intel Xeon Platinum 8275CL
      Dataset: sift-128-euclidean (1 million vectors)
      More info: https://milvus.io/docs/benchmarks_aws

Special thanks to ANN-Benchmarks (developed by Martin Aumueller, Erik Bernhardsson and Alec Faithfull)
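ANN-Benchmarks plots recall against queries per second. Below is a simplified recall@k sketch that assumes the exact neighbour ids are available as ground truth. Benchmark suites may define recall slightly differently, for example by a distance threshold rather than an id match, so ties can be counted differently. The function names are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Share of the true k nearest neighbours found by the approximate search."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall(approx: List[Sequence[int]], exact: List[Sequence[int]], k: int) -> float:
    """Average recall@k over a batch of queries."""
    if len(approx) != len(exact) or not approx:
        raise ValueError("need matching, non-empty result lists")
    return sum(recall_at_k(a, e, k) for a, e in zip(approx, exact)) / len(approx)


print(mean_recall([[1, 2], [5, 6]], [[1, 2], [5, 7]], k=2))  # 0.75
```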
Milvus: The journey

• 2018.10: The idea
• 2019.04: Milvus 0.1
• 2019.06: 1st seed user
• 2019.10: Open source
• 2020.03: Joined LF AI
• One of the most active AI projects in the Linux Foundation

Progress
Unstoppable momentum since its debut.

• 5.9K commits
• 3.9K GitHub stars
• 104 contributors
• 16 releases
• 200+ users
• 19 patents filed

Milvus
Features & benefits
Our target: the world's most advanced
• Comprehensive similarity metrics
• Leading-edge performance
• Dynamic data management
• Near real-time search
• Rich data types & advanced search
• Cost efficient
• Highly scalable and robust
• Cloud native
• Ease of use

Use case: Intelligent writing assistant

• Corpus data (natural language) → data cleansing → feature engineering: extract paragraphs and summaries
• Encoders: TextCNN and InferSent → Milvus and object storage
• Writing intention → search → result: an auto-generated essay

Use case: News recommendation on mobile

• Daily batch of news titles and news titles from the feed → encoder (SimBert) → Milvus and object storage
• Reading preference → Milvus search → recommended news
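The slide does not say how a reading preference becomes a query vector. The sketch below assumes one simple approach: average the embeddings of the titles the user has read, then rank unread titles by cosine similarity. `preference_vector` and `recommend` are hypothetical, and the toy 2-D vectors stand in for SimBert outputs.

```python
import math
from typing import Dict, List, Sequence, Set


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def preference_vector(read_vectors: List[Sequence[float]]) -> List[float]:
    """Hypothetical reading preference: normalized mean of read-title vectors."""
    if not read_vectors:
        raise ValueError("need at least one read title")
    dim = len(read_vectors[0])
    mean = [sum(v[i] for v in read_vectors) / len(read_vectors) for i in range(dim)]
    return normalize(mean)


def recommend(pref: Sequence[float], catalog: Dict[str, Sequence[float]],
              read_ids: Set[str], k: int) -> List[str]:
    """Rank unread titles by cosine similarity to the preference vector."""
    def score(news_id: str) -> float:
        # Cosine similarity = inner product of two unit vectors.
        return sum(p * x for p, x in zip(pref, normalize(catalog[news_id])))

    unread = [nid for nid in catalog if nid not in read_ids]
    return sorted(unread, key=score, reverse=True)[:k]


catalog = {"sports1": [1.0, 0.0], "sports2": [0.9, 0.1], "tech1": [0.0, 1.0]}
pref = preference_vector([catalog["sports1"]])
print(recommend(pref, catalog, read_ids={"sports1"}, k=2))  # ['sports2', 'tech1']
```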

Use case: Image search for company trademark

• Images: company trademarks → encoder: VGG (fine-tuned) → Milvus and object storage
• Search with a trademark image → company info
• 55 million images
• Search elapsed time: 20 ms on a cloud GPU server

Use case: Pharmaceutical molecule analysis

• 800 million molecules
• Search elapsed time: 500 ms on a single server
• Molecular formula (SMILES): CC(=O)Nc1ccc(S(=O)(=O)NCC(=O)N2CCS(=O)CC2)cc1
• Encoder: RDKit → molecular fingerprint: 1024 bits, e.g. 00001100...10000000
• Milvus search with Tanimoto similarity → candidate list: molecular similarity, substructure, superstructure
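Tanimoto similarity on bit fingerprints is the number of bits set in both fingerprints divided by the number set in either, $|A \wedge B| / |A \vee B|$. The sketch below stores a fingerprint as a Python `int`, which easily holds 1024 bits. It also adds a bit-containment test that can pre-screen substructure and superstructure candidates. The function names are hypothetical. A bit-level match only suggests a structural one, so a true match still needs a graph check, such as RDKit's `HasSubstructMatch`.

```python
def popcount(x: int) -> int:
    """Number of set bits (works on any Python 3, unlike int.bit_count)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A AND B| / |A OR B| of two bit fingerprints."""
    union = popcount(a | b)
    # Convention chosen here: two all-zero fingerprints score 0.0.
    return popcount(a & b) / union if union else 0.0


def bits_contained(inner: int, outer: int) -> bool:
    """True if every bit set in `inner` is also set in `outer`."""
    return (inner & outer) == inner


query = 0b1100
candidate = 0b1110
print(tanimoto(query, candidate))        # 0.6666666666666666
print(bits_contained(query, candidate))  # True: query may be a substructure of candidate
print(bits_contained(candidate, query))  # False: candidate cannot be a substructure of query
```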
Useful Links

• https://milvus.io
• https://github.com/milvus-io/milvus
• https://milvusio.slack.com
• https://twitter.com/milvusio
• https://medium.com/unstructured-data-service
• https://zhuanlan.zhihu.com/ai-search

Performance benchmark: https://milvus.io/docs/benchmarks_aws

Live demo: https://milvus.io/scenarios
• Content-based image retrieval system
• Q&A chatbot powered by NLP
• Molecular analysis

Thanks!
