Clustered LIO Using RBD

Mike Christie 
Red Hat
Oct 28, 2014
Agenda

    ●   State of HA SCSI Target Support in Linux.
    ●   Difficulties adding Active/Active Support.
    ●   Future Work.

2
State of Linux Open Source HA SCSI Targets

●   Active/Passive.
●   Pacemaker support for IET, TGT, SCST and LIO.
     ●   Node level failover when target node goes down.
●   Relies on virtual ports/portals (IP takeover for iSCSI,
    NPIV for FC) or implicit ALUA failover.
●   Missing final pieces of support for distributed SCSI
    Persistent Reservations.

3
iSCSI Active/Passive With Virtual IPs

[Diagram: Server1 connects through Switch A to two gateway nodes,
(Active) iqn.2003-04.com.test and (Passive) iqn.2003-04.com.test,
reached via Virtual IP 192.168.56.22 over eth1/eth3; eth2 and eth4
form the cluster interconnect.]

●   Server1 accesses the two targets/GWs one at a time through one or
    more Virtual IPs.
●   eth2 and eth4 are used by Corosync/Pacemaker for cluster membership
    and cluster-aware devices like DRBD.
●   If the active target goes down, Corosync/Pacemaker will activate
    the passive target.
●   Server1's TCP/IP layer and/or iSCSI/multipath layer will detect the
    disruption and perform recovery like packet retransmission,
    iSCSI/SCSI command retry, or relogin.

4
Active/Active HA LIO Support

    ●   Benefits:
         ●   Simple initiator support.
              ●   Boot, failover, failback, setup.
         ●   Support for all SCSI transports in a common
             implementation.
         ●   Possible performance improvement.
    ●   Drawbacks:
         ●   Complex target implementation.
              ●   Distributed error handling, setup, and command execution.

5
iSCSI HA Active/Active

[Diagram: Server1 connects through Switch A to two active targets, both
iqn.2003-04.com.test, at 192.168.100.22 (eth1) and 192.168.1.23 (eth3);
both nodes share the same RBD device, and eth2/eth4 form the cluster
interconnect.]

●   Server1 accesses the two targets/GWs through two paths:
    192.168.100.22 and 192.168.1.23.
●   Both targets access the same RBD devices at the same time.
●   eth2 and eth4 are used by Corosync/Pacemaker for DLM/CPG and
    cluster membership.
●   If a node, or the paths to a node, become unreachable, Server1's
    multipath layer will mark those paths as unusable until they come
    back online.

  6
Implementation Challenges

    ●   Request execution.
    ●   Synchronizing error recovery across nodes.
    ●   Distributing setup information.

7
Distributed Request Execution

    ●   COMPARE AND WRITE
        ●   Atomically read, compare, and if matching, write N bytes
            of data.
        ●   Used by ESXi (known as ATS) for fine-grained locking.
        ●   If multiple nodes are executing this request at the same
            time then locking is needed (see the sketch after this
            list).
             ●   Patches posted upstream to push the execution to the backing
                 device.
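
To make the atomicity requirement concrete, here is a minimal userspace
sketch of COMPARE AND WRITE semantics. The bs_* helpers are hypothetical
stand-ins for a backing-store API (they are not LIO or RBD calls); the
point is that the read, compare, and conditional write must run under a
cluster-wide lock so the sequence appears atomic to every initiator.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical backing-store helpers (illustrative names only). */
    int bs_read(uint64_t lba, void *buf, size_t len);
    int bs_write(uint64_t lba, const void *buf, size_t len);
    void bs_lock_range(uint64_t lba, size_t len);   /* cluster-wide lock */
    void bs_unlock_range(uint64_t lba, size_t len);

    /*
     * COMPARE AND WRITE: read len bytes at lba, compare them to verify,
     * and only if they match, write data. The range lock makes the
     * read-compare-write sequence atomic across all target nodes.
     */
    int compare_and_write(uint64_t lba, const void *verify,
                          const void *data, size_t len)
    {
            unsigned char *buf = malloc(len);
            int rc = -1;

            if (!buf)
                    return rc;

            bs_lock_range(lba, len);
            if (bs_read(lba, buf, len))
                    goto out;
            if (memcmp(buf, verify, len)) {
                    rc = 1;   /* miscompare -> report MISCOMPARE status */
                    goto out;
            }
            rc = bs_write(lba, data, len);   /* 0 on success */
    out:
            bs_unlock_range(lba, len);
            free(buf);
            return rc;
    }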

8
Persistent Reservation (PR) Support

    ●   PRs are a set of commands used to control access to a
        device.
    ●   Used by clustering software like Windows Clustering and
        Red Hat Cluster Suite to prevent failed or excluded client
        nodes from accessing the device.
    ●   Initiator sends PR requests to the target which inform it
        what set of I_T Nexuses (SCSI ports) can access the
        device, and what type of access they have.
         ●   This info must be copied across the cluster (a sketch of
             the replicated state follows below).
         ●   Ports can be added/removed and access restrictions can be
             changed at any time.
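
A rough sketch of the state that would have to be replicated; the field
names are assumptions for illustration, not LIO's actual data
structures:

    #include <stdint.h>

    /* One registration: an I_T nexus and the key it registered. */
    struct pr_registration {
            char     initiator_port[288];   /* e.g. iSCSI name + ISID */
            char     target_port[288];
            uint64_t res_key;               /* 8-byte reservation key */
    };

    /* Per-device PR state; every node must hold an identical copy. */
    struct pr_state {
            uint32_t  pr_generation;        /* bumped on register/unregister */
            uint8_t   pr_type;              /* e.g. Write Exclusive */
            int       holder;               /* index into regs[], -1 = none */
            uint32_t  num_regs;
            struct pr_registration regs[];  /* all current registrations */
    };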

9
HA Active/Active PR example

[Diagram: Server1 connects through Switch A to Node 1 (eth1) and
Node 2 (eth3), which both export the same RBD device; the numbered
arrows correspond to the steps below.]

1) Server1 sends a PR register command to register Server1 and Node1's
   ports to allow access to LUN $N (an initiator-side sketch of this
   command follows below).
2) Node1 stores the PR info locally.
3) Node1 copies the data to Node2.
4) Node1 returns a successful status to Server1.
5) The process is now repeated for Server1 and Node2's ports (the
   remote copy and return of status are skipped in this example).
6) Server1 sends a PR reserve command to establish the reservation.
   This prevents other server nodes from being able to access LUN $N
   (this info will also be copied to Node2, and Node1 will return a
   status code to Server1).
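
For reference, step 1 as seen from Server1 could look like the
following initiator-side sketch, which issues a PERSISTENT RESERVE
OUT / REGISTER through the Linux SG_IO ioctl. The device path and key
are examples, and sense-buffer handling is omitted:

    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <scsi/sg.h>

    /* PERSISTENT RESERVE OUT (0x5F), service action REGISTER (0x00);
     * the parameter list length (24) sits in CDB bytes 5-8. */
    int pr_register(const char *dev, uint64_t new_key)
    {
            uint8_t cdb[10] = { 0x5f, 0x00, 0, 0, 0, 0, 0, 0, 24, 0 };
            uint8_t param[24] = { 0 };   /* PR OUT parameter list */
            struct sg_io_hdr hdr;
            int i, rc, fd = open(dev, O_RDWR);

            if (fd < 0)
                    return -1;

            /* SERVICE ACTION RESERVATION KEY: bytes 8-15, big-endian. */
            for (i = 0; i < 8; i++)
                    param[8 + i] = new_key >> (56 - 8 * i);

            memset(&hdr, 0, sizeof(hdr));
            hdr.interface_id = 'S';
            hdr.cmd_len = sizeof(cdb);
            hdr.cmdp = cdb;
            hdr.dxfer_direction = SG_DXFER_TO_DEV;
            hdr.dxferp = param;
            hdr.dxfer_len = sizeof(param);
            hdr.timeout = 10000;   /* milliseconds */

            rc = ioctl(fd, SG_IO, &hdr);
            close(fd);
            return rc;
    }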

10
Persistent Reservation Implementation

 ●   Possible solutions:
      ●   Use Corosync/Pacemaker and DLM to distribute PR info across
          nodes.
      ●   Pass the PR execution to userspace and use the Corosync cpg
          library messaging to send the PR info to the nodes in the
          cluster (see the sketch after this list).
      ●   Use a cluster FS/device to store the PR info in.
      ●   Add callbacks to the LIO kernel modules or pass PR execution to
          userspace, so devices like RBD can utilize their own
          locking/messaging.
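
A minimal sketch of the cpg approach, assuming a hypothetical
apply_pr_update() helper that pushes received PR info into the local
target; error handling is trimmed and the group name is an example:

    #include <string.h>
    #include <sys/uio.h>
    #include <corosync/cpg.h>

    /* Hypothetical: apply a received PR update to local LIO state,
     * e.g. through an extended configfs interface. */
    extern void apply_pr_update(const void *buf, size_t len);

    /* Deliver callback: every group member, including the sender,
     * receives each update. */
    static void pr_deliver(cpg_handle_t handle,
                           const struct cpg_name *group,
                           uint32_t nodeid, uint32_t pid,
                           void *msg, size_t msg_len)
    {
            apply_pr_update(msg, msg_len);
    }

    static cpg_callbacks_t pr_callbacks = {
            .cpg_deliver_fn = pr_deliver,
    };

    int broadcast_pr_update(const void *pr_info, size_t len)
    {
            cpg_handle_t handle;
            struct cpg_name group;
            struct iovec iov = {
                    .iov_base = (void *)pr_info,
                    .iov_len  = len,
            };

            if (cpg_initialize(&handle, &pr_callbacks) != CS_OK)
                    return -1;

            strcpy(group.value, "lio_pr_group");   /* example name */
            group.length = strlen(group.value);
            if (cpg_join(handle, &group) != CS_OK)
                    return -1;

            /* CPG_TYPE_AGREED: totally ordered delivery, so all nodes
             * apply PR updates in the same order. */
            if (cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1) != CS_OK)
                    return -1;
            return 0;
    }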

11
Distributed Task Management

 ●   When a command times out, the OS will send SCSI
     Task Management requests (TMFs) to abort
     commands and reset devices.
 ●   The SCSI device reset request is called LOGICAL
     UNIT RESET (LUN RESET).
      ●   SCSI spec defines the required behavior.
           ●   Abort all commands.
           ●   Terminate other task management functions.
            ●   Send an event (a SCSI Unit Attention) through all paths
                indicating the device was reset.

12
HA Active/Active LUN RESET Example

[Diagram: Server1 connects through Switch A to Node1 (eth1) and
Node2 (eth3), which both export the same RBD device; the numbered
arrows correspond to the steps below.]

1) Server1 cannot determine the state of a command. To get the device
   into a known state, it sends a LUN RESET.
2) Node1 begins processing the reset by internally blocking new
   commands and aborting running commands.
3) Node1 sends a message to Node2 instructing it to execute the
   distributed reset process.
4) After all the reset steps, like command cleanup, have been completed
   on both nodes, Node1 returns a success status to Server1.
5) and 6) Node1 and Node2 send Unit Attention notifications through all
   paths that are accessing the device that was reset.

     13
LUN RESET Handling

     ●   Experimenting with passing part of the TMF handling to
         userspace.
          ●   Use cpg to interact with LIO on all nodes.
          ●   Extend LIO configfs interface, so userspace can block devices and
              perform the required reset operations.
     ●   Possible future work/alternative:
          ●   Add a Linux kernel block layer interface to abort commands and
              reset devices:
               ●   request_queue->rq_abort_fn(struct request *)
               ●   request_queue->reset_q_fn(struct request_queue *)
               ●   New BLK_QUEUE_RESET notifier_block event.
          ●   LIO would use this to allow the backing device to do the heavy
              lifting (see the sketch below).
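
A rough sketch of how LIO might consume the proposed notifier event.
The callbacks and BLK_QUEUE_RESET are the proposals above, not mainline
interfaces, and the event value and lio_queue_ua_for_queue() helper are
assumptions:

    #include <linux/blkdev.h>
    #include <linux/notifier.h>

    /* Proposed event (value is a placeholder): fired by a driver such
     * as rbd when the shared device was reset by another node. */
    #define BLK_QUEUE_RESET 0x01

    /* Hypothetical LIO helper: queue Unit Attentions for every I_T
     * nexus accessing LUNs backed by this request_queue. */
    extern void lio_queue_ua_for_queue(struct request_queue *q);

    static int lio_blk_reset_notify(struct notifier_block *nb,
                                    unsigned long event, void *data)
    {
            struct request_queue *q = data;

            if (event == BLK_QUEUE_RESET)
                    lio_queue_ua_for_queue(q);
            return NOTIFY_OK;
    }

    /* Registered with the (proposed) block layer reset notifier chain. */
    static struct notifier_block lio_reset_nb = {
            .notifier_call = lio_blk_reset_notify,
    };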

14
Offloaded Task Management

[Diagram: Server1 connects through Switch A to Node1 (eth1) and
Node2 (eth3), which both export the same RBD device; the numbered
arrows correspond to the steps below.]

1) Server1 sends a LUN RESET to Node1.
2) Node1 calls RBD's request_queue->reset_q_fn(); RBD translates that
   to a new rbd/rados reset operation.
3) RBD/rados aborts commands and notifies the other clients accessing
   the device that their commands were aborted due to a reset.
4) The RBD client on Node2 handles the rados reset notification by
   firing the new BLK_QUEUE_RESET event.
5) LIO handles the BLK_QUEUE_RESET event by sending SCSI UAs on paths
   accessing the LUN through that node.
6) The RBD client on Node1 notifies the reset_q_fn caller that the
   reset was successful. LIO then returns a success status and sends
   UAs as needed.

15
Management

     ●   Have only just begun to look into this.
     ●   Must support VMware VASA, Oracle Storage Connect,
         Red Hat libStorageMgmt, etc.
     ●   Must keep setup info like UUIDs, inquiry data, and SCSI
         settings synced on all nodes.
     ●   Prefer to integrate with existing projects.
          ●   Extend the LIO target library, rtslib, and lio-utils to
              support clustering?
          ●   Extend existing remote configuration daemons like
              targetd (https://github.com/agrover/targetd)?

16
Questions?

     ●   I can be reached at mchristi@redhat.com.

17