How does a Ceph OSD handle a read message ? (in Firefly and up)
When an OSD handles an operation it is queued to a PG, it is added to the op_wq work queue ( or to the waiting_for_map list if the queue_op method of PG finds that it must wait for an OSDMap ) and will be dequeued asynchronously. The dequeued operation is processed by the ReplicatedPG::do_request method which calls the the do_op method because it is a CEPH_MSG_OSD_OP. An OpContext is allocated and is executed.
2014-02-24 09:28:34.571489 7fc18006f700 10 osd.4 pg_epoch: 26 pg[3.7s0( v 26'1 (0'0,26'1] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [4,6,9] r=0 lpr=25 crt=0’0 lcod 0’0 mlcod 0’0 active+clean] execute_ctx 0x7fc16c08a3b0
A transaction (which is either a RPGTransaction for a replicated backend or an ECTransaction for an erasure coded backend) is obtained from the PGBackend. The transaction is attached to a OpContext (which was allocated by do_op). Note that in the following log line although do_op shows, it comes from the execute_ctx method.
2014-02-24 09:28:34.571563 7fc18006f700 10 osd.4 pg_epoch: 26 pg[3.7s0( v 26'1 (0'0,26'1] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [4,6,9] r=0 lpr=25 crt=0’0 lcod 0’0 mlcod 0’0 active+clean] do_op 847441d7/SOMETHING/head//3 [read 0~4194304] ov 26’1
The execute_ctx method calls prepare_transaction which calls do_osd_ops which prepares the CEPH_OSD_OP_READ.
2014-02-24 09:28:34.571663 7fc18006f700 10 osd.4 pg_epoch: 26 pg[3.7s0( v 26'1 (0'0,26'1] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [4,6,9] r=0 lpr=25 crt=0’0 lcod 0’0 mlcod 0’0 active+clean] async_read noted for 847441d7/SOMETHING/head//3
The execute_ctx method continues when prepare_transaction returns and creates the MOSDOpReply object. Then it calls start_async_reads which calls objects_read_async on the backend (which is either ReplicatedBackend::objects_read_async or ECBackend::objects_read_async). When the read completes (this code path is not explored here), it calls the OnReadComplete::finish method (because the OnReadComplete object was given as an argument to objects_read_async) which calls ReplicatedPG::OpContext::finish_read each time a ready completes (i.e. if reading from an erasure coded pool, on each chunk) which calls ReplicatedPG::complete_read_ctx (if there are no pending reads) which sends the reply to the client .