PySpark, MySQL, unstructured data, reliability, technical documentation, GNU Make, functional requirement, workflows, operations, current source, DevOps, Python, innovation, Azure Data Lake, scripting, denormalization, Sqoop, data analysis, memory management, dual table, troubleshooting, Hadoop Distributed File System (HDFS), shell script, functional specification, compression, cloud, AMX programming, management, Agile methodology, integration, Jira, data engineering, usability, GitHub, Airflow, track geometry, data quality