Oh, ok, then I've misunderstood the role test and trace play. No worries.
Out of interest, have you got a source that confirms what you've said?
Sky News' 10pm bulletin went into great detail last night, including showing a flow chart; it might be available On Demand. The tests come from a variety of sources. Pillar 1 tests (NHS tests) are directly integrated into the system. The issue is Pillar 2 tests, which come from a range of sources from Lighthouse laboratories to universities etc., whose data is sent to PHE in .CSV format. PHE then processes the data in a centralised system and sends it on to Test and Trace, from where it goes out to multiple locations.
The whole flow chart looked like a knockout competition bracket, like Wimbledon, with PHE in the middle like the final. This centralised location is where the glitch was; if the failure had been elsewhere it wouldn't have affected so many people.
Test and Trace is a bit of a misnomer, which causes some confusion - they are responsible for tracing more than testing. Many of the positives at the moment, for instance, come from universities, which are testing their students using their own laboratories and are not under the auspices of Test and Trace but are sending their data to PHE using the CSV files, with PHE then supposed to send the data on to Test and Trace - but PHE was losing the data.
Here you go gogo:
Twitter Link
The data was sent to PHE but PHE never sent it on to Test and Trace due to their glitch.
Thanks.
So it's not Excel at all then? CSV makes even more sense. Absolutely nothing wrong with that.
I wonder how this happened. It could be:
- The team didn't have the QA expertise to know this kind of non-functional testing was needed
- NF testing -was- performed but they were given the wrong forecasts (i.e. they never expected the volume they got, and didn't put in appropriate fail-safes)
- NF testing was cut from scope due to budget/time constraints
Either way, someone in the management team needs a spanking.
Edit: I wonder who's in charge of the whole thing? That end to end process is a "system" and as such it should be tested as a working, end-to-end process.
PHE were receiving CSV files, but processing them as XLS files. They wouldn't have hit the limit if they'd continued to use them as CSV files.
PHE are in charge of the process. The whole system is already being replaced but the new system isn't ready yet so PHE are still using their legacy system which worked until they hit the XLS limit.
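Purely as an illustration - the exact PHE pipeline hasn't been published, so the filenames and the conversion step here are made up - a quick Python sketch of how a worksheet capped at the old .xls row limit silently loses records:
```python
import csv

XLS_MAX_ROWS = 65_536  # legacy .xls worksheet row limit; modern .xlsx allows 1,048,576

# Hypothetical stand-in for the conversion step: load a results CSV into a
# worksheet-shaped structure that can only hold XLS_MAX_ROWS rows.
with open("pillar2_results.csv", newline="") as f:   # made-up filename
    records = list(csv.reader(f))

loaded = records[:XLS_MAX_ROWS]   # anything beyond the limit silently disappears
print(f"{len(records)} records received, {len(loaded)} loaded, "
      f"{len(records) - len(loaded)} lost")
```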
Ah yes, I should have read the thread you posted rather than just look at the picture, sorry!
I wonder why PHE are in charge of the entire end-to-end process. I'd be interested to find out who took responsibility for testing it all the way through.
Btw, processing CSVs is the only programming I've done in years, and I'm pretty sure I could have done a better job. At least mine processes into XLSX.
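For what it's worth, a minimal CSV-to-XLSX conversion in Python is only a few lines - this one uses openpyxl, and the filenames are just placeholders - and .xlsx takes you up to 1,048,576 rows per sheet:
```python
import csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
with open("results.csv", newline="") as f:   # placeholder filename
    for row in csv.reader(f):
        ws.append(row)                       # .xlsx sheets hold up to 1,048,576 rows
wb.save("results.xlsx")
```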
Keep on keepin' the beat alive!
Yeh this all smacks of just a fundamental schoolboy error. Easily done in any development cycle.
Any proper development and deployment of any system however should have proper testing, which should certainly have picked this up. Even the most basic of reconciliations conducted as part of testing should have checks for # records in vs # records processed vs # records out, which should meet common-sense basic test requirements.
My analysts deal with datasets containing millions and occasionally billions of records. Unfortunately we occasionally need to stick resulting analyses into Excel for dissemination of information to stakeholders. One very quickly becomes aware of the 1,048,576-row limit in current Excel, and even more so of the old 65,536-row limit in the .xls days. And that's just for internal business purposes, not for processing of live data upon which live decisions are made.
Deploying to a production environment a system which relies on input of .csv files for processing of data which generates results upon which fairly critical health decisions are made, may have been pragmatically necessary with short timescales here etc, but the lack of very basic fundamental testing really highlights how shoddy the deployment actually is. At my bank I would be summarily shot if the team I lead deployed something similar.
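To illustrate what I mean by a basic reconciliation - this is just a sketch with made-up filenames, not how PHE's process actually works - the check is as simple as counting what went in against what came out:
```python
import csv

def count_records(path):
    """Count data rows in a CSV, excluding the header."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1

# Made-up filenames; the principle is simply: records in must equal records out.
records_in = count_records("received_from_labs.csv")
records_out = count_records("sent_to_test_and_trace.csv")

assert records_in == records_out, (
    f"Reconciliation failed: {records_in} records in, {records_out} records out"
)
```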
Excel should never be used for this type of application. There is a high risk of errors, and detecting errors by checking someone's work is too challenging. For the same reason, using excel is frowned upon in serious research (although many people do it anyway). In this case, the proximal cause was the choice of the wrong file format, but PHE should've had devs who can set up a modern data handling system—and leadership that would've requested it from their devs. My guess is that some parts of PHE or agencies PHE works with still use excel, and they were trying to accommodate them on the cheap. Extremely unfortunate and completely preventable mistake.
"One day, we shall die. All the other days, we shall live."
I don't know enough about the format of data or the process to say whether Excel is suitable or not. What I said was that it's not remarkable that it's been used.
In some instances it's better to use Excel for data interfacing than to build something bespoke.
There's a high risk of errors whenever you deal with large volumes of data. You need to build in validation whatever approach you take and using Excel doesn't necessarily mean that you increase risk or need more validation.
I'm actually pretty surprised they didn't use R. That's pretty standard for manipulating datasets in the world of epidemiology and biostatistics. And it's got a very large maximum size.
That being said, I don't really know the conditions under which they deployed this system, the types of users they were working for, and whether there was an opportunity to make incremental improvements after rollout. It's possible that this was the least bad option until they could build something more sustainable.
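Just to illustrate the scale point - sketched in Python with pandas, since that's what others here mention, though the equivalent in R is just as short, and the filename and column name are assumptions - a scripted pipeline streams the file rather than loading it into a fixed-size worksheet:
```python
import pandas as pd

total = 0
positives = 0
# Stream the file in chunks so a multi-million-row extract never meets a row limit.
for chunk in pd.read_csv("pillar2_results.csv", chunksize=100_000):  # assumed filename
    total += len(chunk)
    positives += (chunk["result"] == "Positive").sum()  # assumed column name and values

print(f"{total} records processed, {positives} positive")
```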
"When I meet God, I am going to ask him two questions: Why relativity? And why turbulence? I really believe he will have an answer for the first." - Werner Heisenberg (maybe)
Polling: Biden has a 16 point lead. This can't be worse for Trump.
Trump: Hold my beer.
Twitter Link
Didn't they recently rename some genes because their abbreviations conflicted with Excel's autoformat feature? Oh, yes, they did:
https://www.theverge.com/2020/8/6/21...sreading-dates
When the stars threw down their spears
And watered heaven with their tears:
Did he smile his work to see?
Did he who made the lamb make thee?
Brief analysis of another one of the UK's unforced cock-ups:
https://www.nytimes.com/2020/10/06/w...versities.html
"One day, we shall die. All the other days, we shall live."
Yes, well, these people are biologists. They're not known for being particularly sophisticated users of computers.
An amusing related story: a friend of mine is an excellent software engineer working for a very well-known software company. A few years ago she was working on a collaboration with arguably the best bioinformatics/genomics research institute in the world - one that has literally done much of the seminal work in the field. She was helping them build out their software, and when she actually got a look at their code she was... well, shocked would be an understatement. It was written horribly, with all sorts of kludgy fixes added on by various researchers over the years (and badly documented to boot) - there were crucial sections of the code containing unexplained statistical corrections to genomics data that everyone was afraid to touch, because nobody understood them and the code didn't return intelligible results if you took them out. This is the underpinning of much of the scientific discovery in biology in the last two decades, and it was written in a way that even an undergrad in computer science would know is a no-no.
Well, yes. But the question is why. I would have imagined that the people building a software system that would be tracking an epidemic might be, in fact, epidemiologists. And all of the epidemiologists I know are actually pretty savvy at data analysis. So it raises the question of who actually was building the system if they didn't use epidemiologists.
"When I meet God, I am going to ask him two questions: Why relativity? And why turbulence? I really believe he will have an answer for the first." - Werner Heisenberg (maybe)
Time. Pragmatism.
Results are needed now now now, not after the time it takes to develop and test a comprehensive, fully-functioning bespoke system in R or Python or SAS or whatever. Excel is ubiquitous, filling data into Excel sheets is widely understood by those submitting it, and importing the resulting .csv files is simple and straightforward for the engine doing the processing.
And schoolboy errors like this are what happen when you're rushed to deliver.
Oh dear, Swiss pharma giant Roche have announced a major problem at the warehouse from which much of the supply for Covid-19, cancer and other tests comes. Could be a major bottleneck if they can't fix it soon.
https://www.bbc.co.uk/news/health-54435226
I'm about 90% sure the reason to use Excel rather than something else will have been organisational rather than technical. Interpreting CSVs is some tutorial level shit.
I guess I am a bit shocked that PHE didn't already have something better in place. You'd think 'how many people are sick' is a pretty basic question in the world of being a public health authority.
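(And "tutorial level" really does mean tutorial level - reading a CSV with Python's standard library, with a made-up filename, is a handful of lines and has no row limit beyond memory:)
```python
import csv

with open("lab_results.csv", newline="") as f:   # made-up filename
    rows = list(csv.DictReader(f))               # one dict per record

print(f"{len(rows)} records parsed")
```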
When the sky above us fell
We descended into hell
Into kingdom come
I don't really agree. The power of something like R is that you're not limited by the built in statistical functionality of something as limited as Excel and the data is much more manipulable. It's not at all hard to do, is much more scalable, and it's what epidemiologists do all the time. *shrugs* I'm sure there's a story there but without knowing who set up the system and what their goals were, it's hard to understand.
"When I meet God, I am going to ask him two questions: Why relativity? And why turbulence? I really believe he will have an answer for the first." - Werner Heisenberg (maybe)
Isn't R more for the number crunching? It looks like this was also meant to process names etc. for the contact tracing; I didn't think R was really for that. But as a disclaimer, I'm one of those fossils who uses Excel (in combination with CSV and Python scripts).
Keep on keepin' the beat alive!
Yeah, R is a statistical tool. I manage statistical modellers and analysts. Masses of number crunching, yes, up to and including statistical pattern recognition and AI with platforms such as Ayasdi. R is a tool my analysts used to use fairly extensively for building models and extensive ad-hoc scripts and algorithms, but we've moved on to Python as a better tool for the job.
Such tools would certainly be far in advance of what PHE requires here, which is primarily basic data gathering and processing of small numbers into the handy (if inaccurate :/ ) little reports we see in the media and in government press releases.
And let's remember it's a government-funded body (AFAIK), not a private software company. We've had 10 years of cuts.
I'm impressed they have the money to splash out on Excel. They probably don't have a development team, so to speak, and it may be more a case of a system whipped up by some talent in IT Ops who can do some basic scripting. It also probably has high staff turnover, so it has stayed that way for years because no one is really responsible for it.