sipd is a VoIP (Session Initiation Protocol) engine I designed and developed during my previous employment at an artificial intelligence firm. Back then, I was working on-site at Samsung Electronics of America to build SIP processing logic that intercepted and mirrored their call center network traffic to an isolated environment, where we would post-process the calls for risk assessment, analytics, sentiment, reputation, liability, etc. Our ultimate goal was to apply machine learning and various heuristic techniques to pre-emptively detect and prevent potential incidents like the Samsung Galaxy S7 battery combustion way ahead of the curve.
But one of the major blockers was that we didn’t have any SIP solution we could customize in the ways we and they wanted (if you have worked for/with Samsung before, you know how much they LOVE customization). So I volunteered to build it.
Although this is heavily oversimplified and a lot of the hardware/architecture has been redacted, the general project flow was:
Assuming there are at least two members in a conversation (one customer, one agent), both parties establish a VoIP session through a Genesys Resource Manager, which mirrors the session to both their primary SIP servers and my sipd engine.
The voice traffic (Real-time Transport Protocol), encoded in either G.711 (an open ITU-T standard, 64 kbit/s) or G.729 (compressed to 8 kbit/s, 1/8 the bitrate of G.711; royalty-free since its patents expired in 2017), is sent to one of our rtpd instances (an engine that decodes RTP packets into Pulse-code Modulation format), depending on how sipd constructs the Session Description Protocol section.
Everything else, which concerns text analytics, I’ll defer to another blog post.
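To make the rtpd step a little more concrete, here is a minimal Python sketch of what "decoding RTP into PCM" involves for the G.711 μ-law case. It is not the actual rtpd code: the function names `parse_rtp` and `decode_pcmu` are hypothetical, it only handles the static payload types from RFC 3551, and G.729 is left out because it needs a real codec implementation.

```python
import struct

# Static payload type numbers from RFC 3551.
PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 18: "G729"}


def parse_rtp(packet: bytes) -> dict:
    """Parse an RTP packet into a few header fields plus the raw payload."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("only RTP version 2 is supported")
    offset = 12 + 4 * (first & 0x0F)  # skip any CSRC identifiers
    if first & 0x10:  # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:  # padding bit: the last byte holds the padding length
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    payload_type = second & 0x7F
    return {
        "payload_type": payload_type,
        "codec": PAYLOAD_TYPES.get(payload_type, "unknown"),
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],
    }


def _ulaw_sample(byte: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if u & 0x80 else sample


def decode_pcmu(payload: bytes) -> bytes:
    """Decode a G.711 mu-law payload into little-endian 16-bit PCM."""
    samples = [_ulaw_sample(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A caller would check `parse_rtp(packet)["codec"]` and only pass `PCMU` payloads to `decode_pcmu`; anything else would be routed to a different decoder.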
There were a few constraints I had to keep in mind.
An initial POC in 2 weeks, then incrementally matured to production level.
Caching and queueing are done in code instead of with external software (like Redis).
(Non-)production network is isolated from internet access - except DNS traffic.
Any changes had to be surgically precise as rollbacks were impossible. This means if a kernel [security] patch broke something, it’s pretty much “debug and fix it live”.
Must be able to handle approx. 24 million calls a year, or around 2,740 calls an hour on average (24,000,000 / 8,760 hours).
Pre-deployment testing had to be coordinated with a network engineer located in South Korea. This meant frequent 20-hour work days from 09:00 EST to 05:00 EST and back to the office by 09:00 EST 💪
And many, many more.
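To illustrate what caching and queueing in code (rather than in something like Redis) can look like, here is a minimal Python sketch. It is an assumption about the general shape, not the real sipd internals: `TTLCache`, `make_packet_queue`, and `enqueue_or_drop` are hypothetical names, and the cache is not thread-safe as written.

```python
import queue
import time
from collections import OrderedDict


class TTLCache:
    """In-process cache with a fixed time-to-live and a size cap."""

    def __init__(self, ttl_seconds, max_entries, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def put(self, key, value):
        now = self._clock()
        self._evict_expired(now)
        self._items.pop(key, None)  # re-inserting moves the key to the end
        self._items[key] = (now + self._ttl, value)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._items[key]
            return default
        return value

    def _evict_expired(self, now):
        # The TTL is constant, so insertion order is also expiry order.
        while self._items:
            key, (expires_at, _) = next(iter(self._items.items()))
            if expires_at > now:
                break
            del self._items[key]


def make_packet_queue(max_size):
    """Bounded in-process queue between a receiver and its workers."""
    return queue.Queue(maxsize=max_size)


def enqueue_or_drop(q, item):
    """Enqueue without blocking; return False if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```

A cache like this could hold per-call state keyed by Call-ID, with the bounded queue providing back-pressure when workers fall behind. Whether to drop or block on a full queue is a design choice this sketch does not settle.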
Before I started on the initial POC work, I had to spend a couple of days first researching the protocol as I was unfamiliar with it. Here are some of my notes:
SIP is similar to the TCP 3-way handshake in that it’s a signaling protocol for establishing session boundaries/parameters. It doesn’t carry the call [audio] payload.
Text-based: there is no need to shift bits and apply masks to parse a SIP packet.
Communicated only over UDP in Samsung call centers, so dropped packets are critical.
OPTIONS is used for keep-alive. Always respond to these to prevent the network load balancer from evicting “non-responsive” SIP server nodes.
The SDP body is separated from the SIP headers by two consecutive CRLF tokens (an empty line).
No SIPS, no encryption.
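A few of these notes can be sketched in Python: SIP being text-based, the double-CRLF split between headers and SDP, and answering OPTIONS keep-alives. This is a simplified illustration, not the sipd implementation. `parse_sip` and `build_options_ok` are hypothetical names, the `to_tag` default is a placeholder, and compact header names and folded header lines are not handled.

```python
CRLF = "\r\n"

# Headers a response echoes back from the request (RFC 3261, section 8.2.6.2).
COPIED_HEADERS = ("via", "from", "to", "call-id", "cseq")


def parse_sip(raw: bytes):
    """Split a SIP message into (start_line, [(name, value), ...], body)."""
    text = raw.decode("utf-8", errors="replace")
    # The first empty line (CRLF CRLF) ends the headers; the rest is the body (SDP).
    head, _, body = text.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers.append((name.strip(), value.strip()))
    return lines[0], headers, body


def build_options_ok(request: bytes, to_tag: str = "sketch") -> bytes:
    """Build a 200 OK reply to an OPTIONS keep-alive request."""
    start_line, headers, _ = parse_sip(request)
    if start_line.split(" ", 1)[0] != "OPTIONS":
        raise ValueError("not an OPTIONS request")
    out = ["SIP/2.0 200 OK"]
    for name, value in headers:
        lowered = name.lower()
        if lowered not in COPIED_HEADERS:
            continue
        if lowered == "to" and ";tag=" not in value.lower():
            value = f"{value};tag={to_tag}"  # the responder adds its own To tag
        out.append(f"{name}: {value}")
    out.append("Content-Length: 0")
    return (CRLF.join(out) + CRLF + CRLF).encode("utf-8")
```

In a UDP server loop, the bytes returned by `build_options_ok` would be sent back to the address the request came from, which is what keeps the load balancer from marking the node as dead.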
Due to the constraints noted above, I had to implement custom logic for: