Sometimes audio data comes in a nested per-channel container like `[[l,l,l,l], [r,r,r,r]]`, with one array of samples per channel. Should we add a `channelarray` format, or something similar, to handle that?
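
To make the request concrete, here is a minimal sketch of what handling that container could involve: converting between the nested per-channel layout and a flat interleaved layout. The function names `channels_to_interleaved` and `interleaved_to_channels` are hypothetical and exist only for illustration, not as part of any existing API. Plain Python lists stand in for whatever sample containers the library actually uses.

```python
from typing import List, Sequence


def channels_to_interleaved(channels: Sequence[Sequence[float]]) -> List[float]:
    """Flatten [[l,l,...], [r,r,...]] into [l,r,l,r,...].

    Hypothetical helper: the name and signature are assumptions.
    """
    if not channels:
        return []
    length = len(channels[0])
    # Every channel must carry the same number of samples.
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one frame (one sample per channel) at a time.
    return [sample for frame in zip(*channels) for sample in frame]


def interleaved_to_channels(samples: Sequence[float], channel_count: int) -> List[List[float]]:
    """Split [l,r,l,r,...] back into [[l,l,...], [r,r,...]].

    Hypothetical helper: the name and signature are assumptions.
    """
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    if len(samples) % channel_count != 0:
        raise ValueError("sample count must be a multiple of channel_count")
    # Channel c owns every channel_count-th sample, starting at offset c.
    return [list(samples[c::channel_count]) for c in range(channel_count)]


stereo = [[0.1, 0.2, 0.3, 0.4], [-0.1, -0.2, -0.3, -0.4]]
flat = channels_to_interleaved(stereo)
restored = interleaved_to_channels(flat, 2)
```

One open design question is whether ragged input (channels of different lengths) should be rejected, as in this sketch, or padded with silence.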