pyspark, research, analytics, named entity recognition, pandas, sentiment analysis, forecasting, lasso, document classification, optical character recognition (ocr), performance measurement, text extraction, python, data pipeline, statistics, bit error rate tester (bert), tensorflow, error analysis, deep learning, assortment optimization, time series, algorithms, credit risk, demand management, category management, natural language processing, trend analysis, numpy, text classification, scikit-learn, track geometry, risk analysis