Sunday, August 13, 2017

Tracing the Curves: Digital Scanning

Graphs and plots are helpful ways of presenting data. Many sites offering free data, such as government websites, share their information with the public through graphs. However, they sometimes do not give away the raw coordinates behind those graphs. For data scientists, this is a huge pain, since far more can be investigated with the raw data.

In the second activity, we were tasked to obtain the actual data points from a digitally scanned plot. By tabulating the pixel locations of the points in the graph and applying ratio and proportion, the plot can be replicated with its actual coordinates, with the accuracy limited by the quality of the scanned image. After this, the data points are fresh and ready to be processed!

I used the image in figure 1, obtained from an old dissertation found in IPL[1]. Since it was captured with a phone camera, the plot is slightly distorted, which can be seen in the curved horizontal axis.

Figure 1. A figure obtained from an old dissertation[1], captured using a phone camera.

The pixel locations in the plot can be read off using Photoshop, GIMP, ImageJ, or Paint. I used Paint because it was readily available. In Paint, the origin pixel (0,0) is at the upper left corner of the image. The pixel locations of the points in the plot, tabulated in Excel, are shown in figure 2.

Figure 2. Obtaining pixel locations with Paint. Hovering the mouse over the point of interest (indicated by the intersection of the red lines) shows the x and y pixel locations at the lower left corner of the window (marked by the blue box).

To convert the pixel locations to actual data points, a conversion factor can be derived from the pixel locations of known data points in the plot. In my case, I used the x and y tick marks as reference points since they were properly labeled. This method assumes that the x and y axes of the graph are exactly horizontal and vertical. For the x-axis tick marks, only the x-pixels were noted, and for the y-axis tick marks, only the y-pixels. Distortion in the image was therefore not accounted for, since the y-locations along the x-axis and the x-locations along the y-axis were implicitly taken to be constant.

Through ratio and proportion, a difference in pixels along x or y is converted to a difference in the actual coordinates. However, the offset of the points with respect to the origin is neglected: the method assumes that the origin of the image is the same as that of the graph. Unfortunately, the origin of the graph is not at the upper left corner of the image but at the lower left, after a thick margin.
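
One way to keep ratio and proportion while still handling the offset is to anchor the conversion on two labeled tick marks instead of the image origin. The snippet below is only a sketch of that idea: the function name `pixel_to_data` and the tick pixel locations are made up for illustration and are not taken from my actual table.

```python
def pixel_to_data(p, p_ref0, v_ref0, p_ref1, v_ref1):
    """Map a pixel coordinate p to a data value using two reference ticks.

    (p_ref0, v_ref0) and (p_ref1, v_ref1) are the pixel location and the
    labeled value of two tick marks on the same axis.
    """
    # Ratio and proportion: the fraction of the way from tick 0 to tick 1
    # in pixels equals the fraction of the way in data units.
    return v_ref0 + (v_ref1 - v_ref0) * (p - p_ref0) / (p_ref1 - p_ref0)


# Hypothetical y-axis ticks: value 0 at pixel row 900, value 100 at pixel row 300.
# Pixel rows increase downward, so the larger value sits at the smaller row.
y_mid = pixel_to_data(600, 900, 0.0, 300, 100.0)
print(y_mid)
```

Because the conversion is measured from a tick mark and not from pixel (0,0), the margin and the flipped vertical direction are both taken care of by the two reference points.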

Linear regression is a good way to convert the pixel locations to the actual data points automatically. The y-intercept of the fitted line absorbs the shift due to the margins, and a negative slope (for the y-axis) accounts for the origin of the image being at the upper left instead of the lower left.
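
As a rough sketch of this calibration step outside Excel, an ordinary least-squares line can be fitted to the tick marks in plain Python. The helper `fit_line` and the tick values below are hypothetical and only illustrate the idea. They are not my actual calibration data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical y-axis tick marks: pixel rows and the values printed beside them.
tick_pixels = [2800.0, 2100.0, 1400.0, 700.0]
tick_values = [0.0, 100.0, 200.0, 300.0]

slope, intercept = fit_line(tick_pixels, tick_values)
print(f"y_actual = {slope:.4f} * y_pixel + {intercept:.2f}")
```

The slope comes out negative because pixel rows count downward from the top of the image, and the intercept takes care of the margin.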

Figures 3 and 4 show the calibration plots for the x and y axes. Looking back at the reference image in figure 1, the plot is in semilogx: the x-axis is logarithmically scaled, so the fitted trendline should be logarithmic as well. However, the nearest fit available in Excel is exponential, which might cause errors since the approximation may not match. So, I took the base-10 logarithm of the x tick values and fitted a linear equation, which converts the pixel values to log x coordinates, as shown in figure 3(b). The calibration equation for converting the x-pixels to the log x of the actual coordinates is log x = 0.0041 x_pixel - 2.0358, and the calibration equation for converting the y-pixels to the actual y coordinates is y_actual = -0.1488 y_pixel + 423.63. Note the negative slope in the y_actual calibration equation, which accounts for the inverted origin.

Figure 3. The actual x-axis of the plot vs the pixel location of the x-axis in the image. (a) With the real x value of the x-axis, an exponential trend line can be plotted. (b) With the logarithm of the x value of the actual x axis, a linear trend line can be used for the calibration equation.


Figure 4. The actual y-axis of the plot vs the y-pixel location of the y-axis in the image. A linear trendline can be used for the calibration equation of the y-axis.

With the calibrated points, the plot can already be reconstructed in lin-lin scale (figure 5). However, the x coordinates are still in log x. As an additional step, the actual x-coordinates of the points were obtained by raising 10 to the log x values. The complete reconstruction of the plot with the actual point coordinates, in semilogx, is shown in figure 6.
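
The whole conversion can be sketched in a few lines of Python. The coefficients are the ones from the calibration equations quoted earlier. The function name `pixel_to_plot` and the traced pixel locations are hypothetical and are not my actual traced points.

```python
# Calibration coefficients from the Excel trendlines quoted in the text.
X_SLOPE, X_INTERCEPT = 0.0041, -2.0358   # log10(x) = X_SLOPE * x_pixel + X_INTERCEPT
Y_SLOPE, Y_INTERCEPT = -0.1488, 423.63   # y = Y_SLOPE * y_pixel + Y_INTERCEPT


def pixel_to_plot(x_pixel, y_pixel):
    """Convert one traced pixel location to (x, y) in plot coordinates."""
    log_x = X_SLOPE * x_pixel + X_INTERCEPT
    y = Y_SLOPE * y_pixel + Y_INTERCEPT
    # The x calibration gives log10(x), so raise 10 to it to get x itself.
    return 10 ** log_x, y


# Hypothetical traced pixel locations, for illustration only.
traced_pixels = [(800, 2500), (1000, 2000), (1200, 1500)]
points = [pixel_to_plot(px, py) for px, py in traced_pixels]

for x, y in points:
    print(f"x = {x:.3g}, y = {y:.2f}")
```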



Figure 5. The reconstructed plot in lin-lin scale. The actual y coordinates of the plot vs the logarithm of the actual x coordinates of the plot.

Figure 6. The reconstructed plot in semilogx using the actual x and y coordinates of the plot.

Overlaying the reconstructed plot on the image (figure 7) shows that the produced plot overlaps the captured curve. Notice, however, that the x-axis of the image does not completely overlap with that of the produced graph. This is due to distortion, which depends on the angle of the camera and the flatness of the page when the image was taken. In figure 3, the R^2 value is slightly less than 1, which shows that the fitted equation does not perfectly fit the tick marks. This affects the conversion, since the fitted equation was used as the calibration equation.
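
For reference, the R^2 that Excel reports for a trendline can be reproduced by hand. The sketch below uses the x calibration line from the text, but the tick pixels and tick labels are invented for illustration, so the number it prints is not the R^2 in figure 3.

```python
def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the line ys = slope * xs + intercept."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical x-axis ticks: pixel columns and log10 of the labeled values.
tick_pixels = [500.0, 738.0, 985.0, 1226.0]
tick_log_x = [0.0, 1.0, 2.0, 3.0]

r2 = r_squared(tick_pixels, tick_log_x, slope=0.0041, intercept=-2.0358)
print(f"R^2 = {r2:.5f}")
```

An R^2 slightly below 1 means the ticks do not fall exactly on a straight line in pixel space, which is what a slightly tilted or curved page would produce.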

Figure 7. The reconstructed plot overlaid on the captured image.


For this activity, I would give myself 10/10 since I successfully overlaid the graph on the image, although the picture I took was a bit distorted, which affected the overlap. In reality, most images have some distortion, so to obtain the actual plot coordinates, the image should first be corrected.

It was kinda fun doing the activity, especially the part where I was trying to overlap the image and the plot. It was quite fulfilling to see that they match. The tracing part, however, tested my patience since at first I chose a plot that was super wiggly.

I would like to acknowledge Micherene Clauzette Lofamia and Adonis Villagomez for the help they offered throughout the activity, from the conversion method using linear regression down to which dataset should go on the x and y axes of the calibration plot. I would also like to thank Sir Mario for his opinion on the method of converting to log x and for approving the use of the images.

Towards the end of the lesson, when Kuya Mario was checking whether we had accomplished anything during class, I explained the method and the linear regression I did to convert the pixel locations to actual locations. Sir made a face, looking a bit unconvinced about the method. He was actually expecting us to use the ratio and proportion method for the conversion. But then, thinking more about it, the linear regression method is more direct, since the ratio and proportion is already given by the slope of the actual vs pixel graph. Plus, it takes several points into account, which is equivalent to averaging several measured ratios.

Also, the group of Anton Cruz (w/ Jarms, Jethro and Ardie) kind of made the room super happy and lively with their nice remarks, especially since everyone else had left the building. Their presence helped me finish the activity faster and with more enthusiasm.

In life, as in art, the beautiful moves in curves. - Edward G. Bulwer Lytton

[1] Lumawag, Efren (1992) Real-time Spectral Characteristics of Laser Diodes Under Large-Signal Current Modulation at Low Frequencies, Dissertation, UP Diliman
