I'll try to distill this down to the essence of the question; please feel free to ask for details. Also, I'm not a software person to begin with, so my approaches might have some obvious bugs, which I would appreciate pointers to.
So, I have two serial ports. One is half-duplex, the other is full-duplex. My goal is to buffer the data from the full-duplex port in a pair of queues and send it out over the half-duplex one, running on a Raspberry Pi (Zero ideally, could be 3B+). There is also an initialization routine to establish a connection on one of the ports that has to be serviced before any data gets sent, and some handshaking with control characters to switch directions after a set number of bytes are sent.
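To make the handshaking concrete, here is a minimal sketch of the framing, using the control byte values that appear in the listings further down (0x11 start, 0x12 connect, 0x13 end, 0x14 connect acknowledge). The helper names `build_burst` and `parse_burst` are hypothetical and not part of my code, and the sketch assumes the payload never contains a control byte, which the listings also assume.

```python
from collections import deque

# Control bytes as used in the listings below.
START, CONNECT, END, ACK = b'\x11', b'\x12', b'\x13', b'\x14'

def build_burst(pending, max_bytes):
    """Frame up to max_bytes queued single bytes as START + data + END."""
    payload = bytearray()
    while pending and len(payload) < max_bytes:
        payload += pending.popleft()
    return START + bytes(payload) + END

def parse_burst(stream):
    """Return the data between START and END.

    Returns None if a CONNECT byte shows up (peer wants to reconnect)
    or if the stream ends before END is seen.
    """
    data = bytearray()
    in_frame = False
    for value in stream:
        byte = bytes([value])
        if byte == CONNECT:
            return None
        if byte == START:
            in_frame = True
        elif byte == END and in_frame:
            return bytes(data)
        elif in_frame:
            data.append(value)
    return None

pending = deque([b'a', b'b', b'c'])
frame = build_burst(pending, 2)   # only two bytes fit in this burst
leftover = list(pending)          # the third byte waits for the next turn
decoded = parse_burst(frame)
```

After one burst the direction switches, so whatever is left in `pending` has to wait until the other side has sent its own START ... END burst.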
So far I have written two working solutions. The first one uses multiprocessing to service each port in a separate process, and the second one uses asyncio to deal with both at once. I also had a third, naive solution which didn't use either, and used 100% CPU on one core with correspondingly slow throughput.
The multiprocessing solution works well in that it doesn't use 100% of any core on a laptop OR a Raspberry Pi, but when running on the Pi it only achieves about 1/4 the throughput compared to running on a laptop. Question 1: Why might this be? If CPU speed was the bottleneck, I would expect the cores to run at 100% on the Pi but they don't according to htop.
The asyncio solution turned out to be much faster: it achieves twice the throughput of the multiprocessing one when both are run on a laptop. However, it uses nearly 100% of one core, as I would expect.
Either way, when I push data through the serial ports at maximum speed, I have not seen under ~12% aggregate CPU utilization (1/8 cores fully or ~6% for 2 cores). This seems extremely high for just buffering some data and servicing serial ports. I understand that nonblocking reads will incur this, which is why my very initial approach was suboptimal.
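As a rough sanity check on why the polling is expensive, the arithmetic below estimates how often a one-byte read loop has to wake up. It assumes 8N1 framing (10 bits on the wire per byte, which is the pyserial default) and ignores the cost of the loop body itself, so the numbers are only estimates. The function names are made up for this illustration.

```python
def bytes_per_second(baud, bits_per_byte=10):
    """Line rate in bytes/s, assuming 8N1 framing (10 bits per byte)."""
    return baud / bits_per_byte

def polls_per_byte(baud, read_timeout_s):
    """Roughly how many read timeouts fit between two bytes at full line rate."""
    seconds_per_byte = 1 / bytes_per_second(baud)
    return seconds_per_byte / read_timeout_s

radio_rate = bytes_per_second(115200)     # byte-at-a-time reads needed per second
local_polls = polls_per_byte(57600, 0.00005)
print(f"{radio_rate:.0f} one-byte reads/s on the 115200 baud port")
print(f"up to ~{local_polls:.1f} read timeouts per byte on the 57600 baud port")
```

So even with the line saturated, a 50 microsecond read timeout at 57600 baud expires several times for every byte that actually arrives, and when the line is idle every iteration is a wasted wakeup.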
Question 2: For either of these solutions, how would I go about reducing the CPU usage further so it can run more comfortably on a Raspberry Pi? I think it's possible to use multiprocessing with asyncio (service each port in its own process using async) - but this is getting into the weeds and over my head.
Multiprocessing version (abridged - the control character checking in receivebytes() can probably be optimized - but I'm pretty sure this is not my main problem):
    import serial
    import time
    import sys
    import string
    from multiprocessing import Queue, Process
    from queue import Empty as QueueEmpty

    bytecount = 5000
    ser = serial.Serial('/dev/ttyUSB1', 115200, timeout=5)
    ser1 = serial.Serial('./reader', 57600, timeout=0.00005)
    tx = Queue()
    rx = Queue()
    index = 0
    index1 = 0
    conn_good = 0

    def init():
        print("waiting for server...")
        while(ser.read(1) != b'\x12'):
            continue
        print("got connect byte")
        ser.write(b'\x14')
        global conn_good
        conn_good = 1

    def serviceport():
        print("servicing port!")
        while(1):
            newbytes = ser1.read()
            if(len(newbytes) != 0):
                tx.put(newbytes)
            try:
                rxbyte = rx.get(block=False)
                ser1.write(rxbyte)
            except QueueEmpty:
                continue

    def sendbytes():
        ser.write(b'\x11')
        for index1 in range(bytecount, 0, -1):
            try:
                byte = tx.get(block=False)
                ser.write(byte)
            except QueueEmpty:
                break
        ser.write(b'\x13')

    def receivebytes():
        global conn_good
        rxbyte = ser.read(1)
        if(len(rxbyte) == 0):
            print("Timing out waiting for remote byte...")
            conn_good = 0
            return
        while(rxbyte != b'\x11'):
            if(rxbyte == b'\x12'):
                print("It's a connect byte, quit and reconnect")
                conn_good = 0
                return
            rxbyte = ser.read(1)
            if(len(rxbyte) == 0):
                print("Timing out waiting for start byte...")
                conn_good = 0
                return
        rxbyte = ser.read(1)
        if(len(rxbyte) == 0):
            print("Timing out waiting for data or end byte...")
            conn_good = 0
            return
        while(rxbyte != b'\x13'):
            if(rxbyte == b'\x12'):
                print("It's a connect byte, quit and reconnect")
                conn_good = 0
                return
            rx.put(rxbyte)
            rxbyte = ser.read(1)
            if(len(rxbyte) == 0):
                print("Timing out waiting for remote byte...")
                conn_good = 0
                return

    def serviceradio():
        while(1):
            init()
            while(conn_good == 1):
                receivebytes()
                sendbytes()

    if __name__ == '__main__':
        radioprocess = Process(target=serviceradio)
        radioprocess.start()
        serialinputprocess = Process(target=serviceport)
        serialinputprocess.start()
Asyncio version:
    import asyncio
    import serial
    import serial_asyncio

    bytes_to_send = 5000
    reading = 0
    writing = 0
    conn_good = 0

    # Create asyncio event for signaling data reception
    data_received_event = asyncio.Event()

    async def read_serial(reader, rx_local):
        print("Reading local port")
        while True:
            msg = await reader.read(1)
            rx_local.put_nowait(msg)
            data_received_event.set()

    async def write_serial(writer, tx_local):
        print("Writing local port..")
        while True:
            try:
                if(not tx_local.empty()):
                    sendbytes = tx_local.get_nowait()
                    writer.write(sendbytes)
                await asyncio.sleep(0.00002)
            except (asyncio.QueueEmpty, asyncio.TimeoutError):
                pass

    async def sendbytes(ser, tx_radio_bytes, bytecount):
        global reading
        global conn_good
        print("Sending")
        while(True):
            while(conn_good == 1 and reading == 0):
                ser.write(b'\x11')  # Send start-byte
                for _ in range(bytecount):
                    try:
                        byte = tx_radio_bytes.get_nowait()
                        ser.write(byte)
                    except (asyncio.QueueEmpty, asyncio.TimeoutError):
                        break
                ser.write(b'\x13')
                reading = 1
            await asyncio.sleep(0.001)

    async def receivebytes(rd, wrt, rx_radio_bytes):
        print("Receiving")
        global conn_good
        global reading
        while(True):
            rxbyte = await rd.read(1)
            #print(rxbyte)
            if(conn_good == 0):
                if(rxbyte == b'\x12'):
                    print("got connect byte")
                    wrt.write(b'\x14')
                    conn_good = 1
                    continue
                #print("Waiting for server...")
                continue
            if(rxbyte == b'\x12'):
                print("Connect-byte out of seq, quit and restart")
                conn_good = 0
                continue
            if (rxbyte == b'\x11'):
                reading = 1
                #print("Got data start byte")
                continue
            if (rxbyte == b'\x13'):
                #print("Got data end byte")
                reading = 0
                continue
            #print("Adding to receive queue")
            rx_radio_bytes.put_nowait(rxbyte)

    async def main():
        radio_to_local = asyncio.Queue(maxsize=0)
        local_to_radio = asyncio.Queue(maxsize=0)
        # Create serial connection for reading
        reader, writer = await serial_asyncio.open_serial_connection(url='./reader', baudrate=115200)
        radioreader, radiowriter = await serial_asyncio.open_serial_connection(url='/dev/ttyUSB1', baudrate=115200)
        read_task = asyncio.create_task(read_serial(reader, local_to_radio))
        write_task = asyncio.create_task(write_serial(writer, radio_to_local))
        receive_task = asyncio.create_task(receivebytes(radioreader, radiowriter, radio_to_local))
        radiowrite_task = asyncio.create_task(sendbytes(radiowriter, local_to_radio, bytes_to_send))
        # Run all tasks concurrently
        await asyncio.wait([read_task, write_task, radiowrite_task, receive_task])

    if __name__ == '__main__':
        asyncio.run(main())
For this version, increasing the asyncio.sleep() calls reduces CPU usage, but cuts into throughput first - so that's not great.
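For comparison, a minimal sketch of the alternative to sleep-based polling in `write_serial()`: awaiting `Queue.get()` suspends the task until data exists, and draining whatever else is already queued turns many one-byte writes into one larger write. `FakeWriter` is a hypothetical stand-in for the serial `StreamWriter` so the sketch runs without hardware; I have not measured this on a Pi, so treat it as an idea to try rather than a verified fix.

```python
import asyncio

class FakeWriter:
    """Hypothetical stand-in for the serial StreamWriter; records writes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

    async def drain(self):
        pass

async def write_serial(writer, tx_local):
    while True:
        # Suspends here with no CPU use until at least one item is queued.
        chunk = bytearray(await tx_local.get())
        # Batch everything that is already waiting into a single write.
        while not tx_local.empty():
            chunk += tx_local.get_nowait()
        writer.write(bytes(chunk))
        await writer.drain()

async def demo():
    queue = asyncio.Queue()
    writer = FakeWriter()
    task = asyncio.create_task(write_serial(writer, queue))
    for item in (b'a', b'b', b'c'):
        queue.put_nowait(item)
    await asyncio.sleep(0.01)   # give the writer task a chance to run
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return writer.chunks

chunks = asyncio.run(demo())
```

The same idea would apply to `sendbytes()`, where an `asyncio.Event` set by `receivebytes()` could replace the 1 ms sleep loop that watches `reading` and `conn_good`.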
Question 3: For the multiprocessing version, could I add some sleep times somewhere without missing data to reduce load too? If so, where? I tried adding some in the loops that service the ports, but they have the same issues as my previous point.
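A minimal sketch of one option for the queue side, assuming the port reads and the queue draining can be split into separate loops: call `get()` with a timeout instead of `block=False`, so the loop sleeps inside the queue rather than spinning. The demo uses `queue.Queue` and a thread so it runs anywhere; `multiprocessing.Queue.get()` accepts the same `block` and `timeout` arguments. `drain_to_port` and its parameters are hypothetical names, not taken from my code.

```python
import queue
import threading
import time

def drain_to_port(rx, write, stop, timeout=0.05):
    """Forward queued bytes to write(); block in get() instead of spinning."""
    wakeups = 0
    while not stop.is_set():
        wakeups += 1
        try:
            data = rx.get(timeout=timeout)   # sleeps until data or timeout
        except queue.Empty:
            continue                         # nothing arrived; check stop flag
        write(data)
    return wakeups

written = []
result = {}
stop = threading.Event()
rx = queue.Queue()

def run_worker():
    result['wakeups'] = drain_to_port(rx, written.append, stop)

worker = threading.Thread(target=run_worker)
worker.start()
for item in (b'x', b'y', b'z'):
    rx.put(item)
time.sleep(0.2)
stop.set()
worker.join()
print(f"forwarded {len(written)} items in {result['wakeups']} loop iterations")
```

No data is lost by blocking here, because the queue holds items until they are fetched; the timeout only bounds how long the loop takes to notice the stop flag.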
Note: I do not want to fully rewrite this code, but I do want to learn how to optimize it for CPU usage - and if I can't, why not.
Cheers, R